gffutils.interface.FeatureDB.create_introns

FeatureDB.create_introns(exon_featuretype='exon', grandparent_featuretype='gene', parent_featuretype=None, new_featuretype='intron', merge_attributes=True, numeric_sort=False)

Create introns from existing annotations.
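Conceptually, an intron is the gap between two consecutive exons of the same transcript. The sketch below illustrates that idea with plain 1-based, inclusive coordinates. The function name `infer_introns` and the tuple representation are hypothetical, and this is not gffutils' actual implementation.

```python
def infer_introns(exons):
    """Return (start, end) gaps between consecutive exons.

    `exons` is an iterable of (start, end) tuples using 1-based,
    inclusive coordinates, as in GFF/GTF files.
    """
    introns = []
    ordered = sorted(exons)
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        # Adjacent or overlapping exons leave no gap, so no intron is emitted.
        if next_start - prev_end > 1:
            introns.append((prev_end + 1, next_start - 1))
    return introns


# Exons of one hypothetical transcript, given out of order on purpose.
exons = [(300, 400), (100, 200), (500, 600)]
introns = infer_introns(exons)
```

Here `introns` holds the two gaps between the three exons; a transcript with a single exon would yield an empty list.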

exon_featuretype : string

Feature type to use in order to infer introns. Typically "exon".

grandparent_featuretype : string

If grandparent_featuretype is not None, then group exons by children of this featuretype. If grandparent_featuretype is "gene" (default), then introns will be created for all first-level children of genes. This may include mRNA, rRNA, ncRNA, etc. If you only want to infer introns from one of these featuretypes (e.g., mRNA), then use the parent_featuretype kwarg, which is mutually exclusive with grandparent_featuretype.

parent_featuretype : string

If parent_featuretype is not None, then only use this featuretype to infer introns. Use this if you only want a subset of featuretypes to have introns (e.g., "mRNA" only, and not ncRNA or rRNA). Mutually exclusive with grandparent_featuretype.

new_featuretype : string

Feature type to use for the inferred introns; default is "intron".

merge_attributes : bool

Whether or not to merge attributes from all exons. If False then no attributes will be created for the introns.

numeric_sort : bool

If True, then merged attributes that can be cast to float will be sorted by their numeric values (but will still be returned as strings). This is useful, for example, when creating introns between exons whose exon_number attributes are integers. Using numeric_sort=True will ensure that the returned introns have a merged exon_number attribute of ['9', '10'] (numerically sorted) rather than ['10', '9'] (alphabetically sorted).
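The difference between the two orderings can be reproduced with plain Python. This is only an illustration of string versus numeric sorting, not the code gffutils runs internally; the variable names are hypothetical.

```python
exon_numbers = ["10", "9"]

# A plain string sort compares character by character, so "10" sorts before "9".
alphabetical = sorted(exon_numbers)

# Sorting by numeric value orders them 9, 10 while keeping the values as strings.
numeric = sorted(exon_numbers, key=float)
```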

Returns a generator object that yields Feature objects representing the new introns.

The returned generator can be passed directly to the FeatureDB.update() method to permanently add the introns to the database. However, this needs to be done carefully to avoid deadlocks from simultaneous reading and writing.

When using update() you should also use the same keyword arguments used to create the db in the first place (with the exception of force).
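One way to keep the two sets of keyword arguments in sync is to store them in a single dictionary and drop force before calling update(). The sketch below uses hypothetical values; merge_strategy and keep_order are shown only as examples of arguments that may have been used when the database was created.

```python
# Hypothetical keyword arguments that were passed to gffutils.create_db().
create_db_kwargs = {
    "merge_strategy": "create_unique",
    "keep_order": True,
    "force": True,
}

# Reuse everything except `force` when calling db.update().
create_kwargs = {k: v for k, v in create_db_kwargs.items() if k != "force"}
```

The resulting `create_kwargs` plays the role of the `**create_kwargs` used in the options that follow.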

Here are three options for getting the introns back into the database, depending on the circumstances.

OPTION 1: Create list of introns.

Consume the create_introns() generator completely before writing to the database. If you have sufficient memory, this is the easiest option:

db.update(list(db.create_introns(**intron_kwargs)), **create_kwargs)

OPTION 2: Use WAL (https://sqlite.org/wal.html)

The WAL pragma enables simultaneous read/write. WARNING: this does not work if the database is on a networked filesystem, like those used on many HPC clusters.

db.set_pragmas({"journal_mode": "WAL"})
db.update(db.create_introns(**intron_kwargs), **create_kwargs)
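To see what the pragma does at the SQLite level, the following sketch uses only Python's standard sqlite3 module on a throwaway database file (the file name is hypothetical). The PRAGMA statement returns the journal mode actually in effect, which is worth inspecting; note that this value alone does not prove that the underlying filesystem is safe for WAL.

```python
import os
import sqlite3
import tempfile

# WAL needs an on-disk database; an in-memory database cannot switch to it.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "example.db")
    conn = sqlite3.connect(path)
    # The PRAGMA returns the journal mode that is in effect after the request.
    journal_mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    conn.close()
```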

OPTION 3: Write to intermediate file.

Use this if you are memory limited and using a networked filesystem:

with open('tmp.gtf', 'w') as fout:
    for intron in db.create_introns(**intron_kwargs):
        fout.write(str(intron) + "\n")

db.update(gffutils.DataIterator('tmp.gtf'), **create_kwargs)