gffutils.interface.FeatureDB.create_introns
- FeatureDB.create_introns(exon_featuretype='exon', grandparent_featuretype='gene', parent_featuretype=None, new_featuretype='intron', merge_attributes=True, numeric_sort=False)
Create introns from existing annotations.
- exon_featuretype : string
Feature type to use in order to infer introns. Typically "exon".
- grandparent_featuretype : string
If grandparent_featuretype is not None, then group exons by children of this featuretype. If grandparent_featuretype is "gene" (default), then introns will be created for all first-level children of genes. This may include mRNA, rRNA, ncRNA, etc. If you only want to infer introns from one of these featuretypes (e.g., mRNA), then use the parent_featuretype kwarg, which is mutually exclusive with grandparent_featuretype.
- parent_featuretype : string
If parent_featuretype is not None, then only use this featuretype to infer introns. Use this if you only want a subset of featuretypes to have introns (e.g., "mRNA" only, and not ncRNA or rRNA). Mutually exclusive with grandparent_featuretype.
- new_featuretype : string
Feature type to use for the inferred introns; default is "intron".
- merge_attributes : bool
Whether or not to merge attributes from all exons. If False then no attributes will be created for the introns.
- numeric_sort : bool
If True, then merged attributes that can be cast to float will be sorted by their numeric values (but will still be returned as strings). This is useful, for example, when creating introns between exons whose exon_number attributes are integers. Using numeric_sort=True ensures that an intron between exons 9 and 10 has a merged exon_number attribute of ['9', '10'] (numerically sorted) rather than ['10', '9'] (alphabetically sorted).
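The parameters above can be illustrated with a minimal, library-free sketch. It is not gffutils' implementation: plain `(start, end)` tuples stand in for `Feature` objects, and the helper names `infer_introns` and `merge_values` are hypothetical. It assumes 1-based inclusive coordinates and that every merged value can be cast to float.

```python
# Illustrative sketch only; these helpers are hypothetical, not gffutils API.

def infer_introns(exons):
    """Yield (start, end) gaps between consecutive exons of one transcript.

    Coordinates are assumed to be 1-based and inclusive, as in GFF/GTF.
    """
    ordered = sorted(exons)
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        # Skip adjacent or overlapping exons: there is no gap between them.
        if next_start > prev_end + 1:
            yield (prev_end + 1, next_start - 1)


def merge_values(values, numeric_sort=False):
    """Sort merged attribute values; they remain strings either way."""
    if numeric_sort:
        return sorted(values, key=float)
    return sorted(values)


exons = [(100, 200), (300, 400), (500, 600)]
introns = list(infer_introns(exons))

alphabetical = merge_values(["9", "10"])
numeric = merge_values(["9", "10"], numeric_sort=True)

print(introns)        # [(201, 299), (401, 499)]
print(alphabetical)   # ['10', '9']
print(numeric)        # ['9', '10']
```

The string comparison puts '10' before '9' because '1' sorts before '9', which is the ordering that numeric_sort=True is meant to avoid.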
A generator object that yields Feature objects representing new introns.
The returned generator can be passed directly to the FeatureDB.update() method to permanently add them to the database. However, this needs to be done carefully to avoid deadlocks from simultaneous reading/writing.
When using update() you should also use the same keyword arguments used to create the db in the first place (with the exception of force).
Here are three options for getting the introns back into the database, depending on the circumstances.
OPTION 1: Create list of introns.
Consume the create_introns() generator completely before writing to the database. If you have sufficient memory, this is the easiest option:
    db.update(list(db.create_introns(**intron_kwargs)), **create_kwargs)
OPTION 2: Use WAL (https://sqlite.org/wal.html)
The WAL pragma enables simultaneous read/write. WARNING: this does not work if the database is on a networked filesystem, like those used on many HPC clusters.
    db.set_pragmas({"journal_mode": "WAL"})
    db.update(db.create_introns(**intron_kwargs), **create_kwargs)
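To show what the journal mode setting means at the SQLite level, here is a small sketch using only Python's standard sqlite3 module rather than gffutils. The scratch database path is hypothetical, and the sketch assumes a local (non-networked) filesystem; that set_pragmas() results in an equivalent PRAGMA statement is an assumption here, not something stated above.

```python
import os
import sqlite3
import tempfile

# Hypothetical scratch database; WAL needs a real file, not ":memory:".
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "example.db")

conn = sqlite3.connect(path)

# Journal mode before any change (depends on how SQLite was built).
default_mode = conn.execute("PRAGMA journal_mode").fetchone()[0]

# Request write-ahead logging; SQLite replies with the mode now in effect.
wal_mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]

conn.close()

print(default_mode, "->", wal_mode)
```

If the reported mode is not "wal" after the request, the switch did not take effect (for example on a filesystem that cannot support it), and OPTION 1 or OPTION 3 is likely the safer choice.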
OPTION 3: Write to intermediate file.
Use this if you are memory limited and using a networked filesystem:
    with open('tmp.gtf', 'w') as fout:
        for intron in db.create_introns(**intron_kwargs):
            fout.write(str(intron) + "\n")
    db.update(gffutils.DataIterator('tmp.gtf'), **create_kwargs)