Attributes
The last field of a GFF or GTF file contains attributes. As described in
Dialects, these can be inconsistently formatted, but we parse them as best
we can. Once the attributes have been parsed, they can be accessed via
Feature.attributes or by using getitem syntax on the Feature itself.
An attributes.Attributes object behaves much like a dictionary, except
that each of its values is stored internally as a list. By default, all
attribute values are returned as lists, even 1-item lists. However, this can
be changed using the constants.always_return_list setting.
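To make the dictionary-of-lists idea concrete, here is a minimal, self-contained sketch. It is not the gffutils implementation: the class name ListValuedDict and its always_return_list class attribute are hypothetical stand-ins for attributes.Attributes and constants.always_return_list, and only item assignment and lookup are covered.

```python
class ListValuedDict(dict):
    """Illustrative only: a dict that stores every value as a list."""

    # Hypothetical stand-in for gffutils.constants.always_return_list.
    always_return_list = True

    def __setitem__(self, key, value):
        # Wrap scalars so the stored value is always a list.
        if not isinstance(value, (list, tuple)):
            value = [value]
        super().__setitem__(key, list(value))

    def __getitem__(self, key):
        value = super().__getitem__(key)
        # Unwrap 1-item lists only when the flag has been switched off.
        if not self.always_return_list and len(value) == 1:
            return value[0]
        return value


attrs = ListValuedDict()
attrs['ID'] = 'exon1'
attrs['Parent'] = ['FBtr0300689', 'FBtr0300690']
print(attrs['ID'])      # ['exon1']
print(attrs['Parent'])  # ['FBtr0300689', 'FBtr0300690']
```

In this sketch the flag only affects what is returned on lookup. The stored value is a list either way, which mirrors the "stored internally as a list" behavior described above.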
Let’s get an example Attributes object to work with by parsing a GFF line:
>>> import gffutils
>>> f = gffutils.feature.feature_from_line(
... 'chr2L\tFlyBase\texon\t8193\t8589\t.\t+\t.\tID=exon1; Parent=FBtr0300689,FBtr0300690')
The attributes.Attributes object is accessed like this:
>>> f.attributes
<gffutils.attributes.Attributes object at ...>
It behaves like a dictionary of lists:
>>> for i in sorted(f.attributes.items()):
... print('{i[0]}: {i[1]}'.format(i=i))
ID: ['exon1']
Parent: ['FBtr0300689', 'FBtr0300690']
>>> f.attributes['ID']
['exon1']
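For intuition about where these lists come from, the following is a simplified sketch of turning a GFF3-style attribute string into a dictionary of lists. The function parse_attributes is hypothetical and is not how gffutils parses attributes: it assumes semicolon-separated key=value pairs with comma-separated values, and it ignores dialects, quoting, and escaping entirely.

```python
def parse_attributes(field):
    """Simplified sketch: split 'key=v1,v2; key2=v3' into a dict of lists."""
    attrs = {}
    for item in field.split(';'):
        item = item.strip()
        if not item:
            continue  # skip empty pieces, e.g. from a trailing semicolon
        key, _, value = item.partition('=')
        value = value.strip()
        # Every value becomes a list, even when there is only one item.
        attrs[key.strip()] = value.split(',') if value else []
    return attrs


parsed = parse_attributes('ID=exon1; Parent=FBtr0300689,FBtr0300690')
print(parsed)
# {'ID': ['exon1'], 'Parent': ['FBtr0300689', 'FBtr0300690']}
```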
Usually it’s more convenient to access the attributes directly from the feature, like this:
>>> f['ID']
['exon1']
We can add attributes, again directly from the feature:
>>> f['parent_type'] = 'mRNA'
By default, a list is always returned, even for 1-item lists:
>>> f['parent_type'] == f.attributes['parent_type'] == ['mRNA']
True
However, we can change this behavior like so:
>>> gffutils.constants.always_return_list = False
Now the single values are returned as strings rather than 1-item lists:
>>> for i in sorted(f.attributes.items()):
... print('{i[0]}: {i[1]}'.format(i=i))
ID: exon1
Parent: ['FBtr0300689', 'FBtr0300690']
parent_type: mRNA
Reset back to the original behavior:
>>> gffutils.constants.always_return_list = True
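Because a setting like this is global, it is easy to forget to reset it. One possible pattern, sketched here under assumptions, is a small context manager that restores the previous value automatically. The helper temporary_setting is hypothetical and not part of gffutils, and a SimpleNamespace stands in for the settings module so that the example runs on its own.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def temporary_setting(obj, name, value):
    """Set obj.<name> to value inside the block, then restore the old value."""
    old = getattr(obj, name)
    setattr(obj, name, value)
    try:
        yield
    finally:
        # Restore even if the body of the with-block raises.
        setattr(obj, name, old)


# Stand-in for a settings module such as gffutils.constants (an assumption).
settings = SimpleNamespace(always_return_list=True)

with temporary_setting(settings, 'always_return_list', False):
    inside = settings.always_return_list
after = settings.always_return_list
print(inside, after)  # False True
```

Passing a real module object in place of settings should work the same way, since the helper relies only on getattr and setattr.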