From the GTF definition:
The following feature types are required: ‘CDS’, ‘start_codon’, ‘stop_codon’. The features ‘5UTR’, ‘3UTR’, ‘inter’, ‘inter_CNS’, ‘intron_CNS’ and ‘exon’ are optional. All other features will be ignored.
So genes and transcripts are not explicitly defined. The extent of a transcript is, after all, implied by all exons sharing a single transcript_id, and the extent of a gene is implied by all exons sharing a single gene_id. However, this can be tedious to calculate by hand.
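To make the idea of an implied extent concrete, here is a minimal sketch in plain Python. It is not how gffutils does it internally; the function name infer_extents and the exon records are made up for illustration, and it assumes the IDs have already been parsed out of each exon line.

```python
from collections import defaultdict

# Hypothetical exon records: (start, end, gene_id, transcript_id).
exons = [
    (100, 200, "g1", "t1"),
    (300, 400, "g1", "t1"),
    (150, 250, "g1", "t2"),
    (500, 600, "g1", "t2"),
]


def infer_extents(exons):
    """Return (transcript_extents, gene_extents) as {id: (start, end)}."""
    transcripts = defaultdict(list)
    genes = defaultdict(list)
    for start, end, gene_id, transcript_id in exons:
        transcripts[transcript_id].append((start, end))
        genes[gene_id].append((start, end))

    def span(intervals):
        # The extent runs from the smallest start to the largest end.
        return (min(s for s, _ in intervals), max(e for _, e in intervals))

    return (
        {tid: span(iv) for tid, iv in transcripts.items()},
        {gid: span(iv) for gid, iv in genes.items()},
    )


transcript_extents, gene_extents = infer_extents(exons)
```

Each transcript spans its own exons only, while the gene spans the exons of all of its transcripts.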
gffutils infers the gene and transcript extents when the file is imported into a database, and adds new “derived” features for each gene and transcript. That way, a gene can be easily accessed by its ID.
However, not all files meet the official GTF specification, in which each feature has a transcript_id to indicate its parent feature and a gene_id to indicate its “grandparent” feature. To accommodate this, gffutils provides some extra options when the file is imported.
Example that shows the use of
The transcript_key and gene_key kwargs are used to extract the parent and grandparent feature, respectively. By default, transcript_key="transcript_id" and gene_key="gene_id". If your particular data file does not conform to this, they can be changed.
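The effect of changing these keys can be sketched without gffutils itself. The helper parent_ids below is hypothetical (it is not part of the gffutils API), and the nonstandard attribute names "Gene" and "Transcript" are invented for illustration; the sketch only shows how configurable keys select the parent and grandparent IDs from an attribute string.

```python
import re

# Matches GTF-style attribute pairs such as: gene_id "g1";
_ATTR = re.compile(r'(\S+)\s+"([^"]*)"\s*;')


def parent_ids(attributes, transcript_key="transcript_id", gene_key="gene_id"):
    """Return (parent, grandparent) IDs from a GTF attribute string.

    Either value is None when its key is absent.
    """
    fields = dict(_ATTR.findall(attributes))
    return fields.get(transcript_key), fields.get(gene_key)


standard = 'gene_id "g1"; transcript_id "t1";'
nonstandard = 'Gene "g1"; Transcript "t1";'  # hypothetical nonstandard keys

default_result = parent_ids(standard)
missed = parent_ids(nonstandard)  # default keys find nothing here
custom_result = parent_ids(nonstandard, transcript_key="Transcript", gene_key="Gene")
```

With the default keys the nonstandard line yields no parent or grandparent, which is why the keys have to be overridden for such files.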
Examples that show the use of
Genes and transcripts are inferred from their component “exon” features. If your particular data file does not conform to the GTF standard, you can use the subfeature kwarg to change this. By default, subfeature="exon", but see the example wormbase_gff2_alt.txt for an instance of where this needs to be changed.
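As a rough illustration of why the subfeature type matters, the sketch below infers transcript extents from only one feature type. The function extents_from_subfeature and the "coding_exon" feature type are assumptions made for this example, not taken from gffutils or from the wormbase file.

```python
def extents_from_subfeature(features, subfeature="exon"):
    """Infer {transcript_id: (start, end)} from features of one type."""
    extents = {}
    for featuretype, start, end, transcript_id in features:
        if featuretype != subfeature:
            continue  # only the chosen subfeature type contributes
        lo, hi = extents.get(transcript_id, (start, end))
        extents[transcript_id] = (min(lo, start), max(hi, end))
    return extents


# Hypothetical records from a file that labels its exons "coding_exon".
features = [
    ("coding_exon", 100, 200, "t1"),
    ("coding_exon", 300, 400, "t1"),
    ("intron", 201, 299, "t1"),
]

with_default = extents_from_subfeature(features)
with_override = extents_from_subfeature(features, subfeature="coding_exon")
```

With the default of "exon", no transcript is inferred from this file at all; the extents only appear once the subfeature type matches what the file actually uses.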