gffutils.feature.Feature
- class gffutils.feature.Feature(seqid='.', source='.', featuretype='.', start='.', end='.', score='.', strand='.', frame='.', attributes=None, extra=None, bin=None, id=None, dialect=None, file_order=None, keep_order=False, sort_attribute_values=False)
- __init__(seqid='.', source='.', featuretype='.', start='.', end='.', score='.', strand='.', frame='.', attributes=None, extra=None, bin=None, id=None, dialect=None, file_order=None, keep_order=False, sort_attribute_values=False)
Represents a feature from the database.
Usually you won’t want to use this directly, since it has various implementation details needed for operating in the context of FeatureDB objects. Instead, try the gffutils.feature.feature_from_line() function.
When printed, reproduces the original line from the file as faithfully as possible using dialect.
- Parameters:
seqid (string) – Name of the sequence (often chromosome)
source (string) – Source of the feature; typically the originating database or program that predicted the feature
featuretype (string) – Type of feature, for example “gene”, “exon”, or “TSS”.
start (int or ".") – 1-based coordinates; start must be <= end. If “.” (the default placeholder for GFF files), then the corresponding attribute will be None.
end (int or ".") – 1-based coordinates; start must be <= end. If “.” (the default placeholder for GFF files), then the corresponding attribute will be None.
score (string) – Stored as a string.
strand ("+" | "-" | ".") – Strand of the feature; “.” when strand is not relevant.
frame ("0" | "1" | "2") – Coding frame. 0 means in-frame; 1 means there is one extra base at the beginning, so the first codon starts at the second base; 2 means two extra bases at the beginning. Interpretation is strand specific; “beginning” for a minus-strand feature is at the end coordinate.
attributes (string or dict) –
If a string, first assume it is serialized JSON; if this fails then assume it’s the original key/vals string. If it’s a dictionary already, then use as-is.
The end result is that this instance’s attributes attribute will always be a dictionary.
Upon printing, the attributes will be reconstructed based on this dictionary and the dialect – except if the original attributes string was provided, in which case that will be used directly.
Notes on encoding/decoding: the only time unquoting (e.g., “%2C” becomes “,”) happens is if attributes is a string and if settings.ignore_url_escape_characters = False. If dict or JSON, the contents are used as-is.
Similarly, the only time characters are quoted (“,” becomes “%2C”) is when the feature is printed (__str__ method).
extra (string or list) –
Additional fields after the canonical 9 fields for GFF/GTF.
If a string, then first assume it’s serialized JSON; if this fails then assume it’s a tab-delimited string of additional fields. If it’s a list already, then use as-is.
bin (int) – UCSC genomic bin. If None, will be created based on provided start/end; if start or end is “.” then bin will be None.
id (None or string) – Database-specific primary key for this feature. The only time this should not be None is if this feature is coming from a database, in which case it will be filled in automatically.
dialect (dict or None) – The dialect to use when reconstructing attribute strings; defaults to the GFF3 spec. FeatureDB objects will automatically attach the dialect from the original file.
file_order (int) – This is the rowid special field used in a sqlite3 database; this is provided by FeatureDB.
keep_order (bool) – If True, then the attributes in the printed string will be in the order specified in the dialect. Disabled by default, since this sorting step is time-consuming over many features.
sort_attribute_values (bool) – If True, then the values of each attribute will be sorted when the feature is printed. Mostly useful for testing, where the order is important for checking against expected values. Disabled by default, since it can be time-consuming over many features.
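The fallback rules described for attributes and extra (dict or list used as-is, then serialized JSON, then the raw string) can be sketched with the standard library alone. This is an illustration of the documented behaviour, not the gffutils implementation: resolve_attributes, resolve_extra and the unquote_values flag are hypothetical names, the key=value;key=value parser is a simplified GFF3-style stand-in for real dialect handling, and storing each value as a list and returning an empty container for None are assumptions of the sketch.

```python
import json
from urllib.parse import unquote


def resolve_attributes(attributes, unquote_values=True):
    """Hypothetical helper: turn the `attributes` argument into a dict."""
    if attributes is None:
        return {}  # assumption: no attributes means an empty dict
    if isinstance(attributes, dict):
        return attributes  # a dict is used as-is, with no unquoting
    try:
        parsed = json.loads(attributes)
        if isinstance(parsed, dict):
            return parsed  # JSON contents are used as-is, with no unquoting
    except ValueError:
        pass  # not JSON, so treat it as the original key/vals string
    result = {}
    for field in attributes.strip().split(";"):
        if not field:
            continue
        key, _, value = field.partition("=")
        values = value.split(",") if value else []
        if unquote_values:
            # Unquoting ("%2C" -> ",") only happens on this string path.
            values = [unquote(v) for v in values]
        result[key] = values
    return result


def resolve_extra(extra):
    """Hypothetical helper: turn the `extra` argument into a list."""
    if extra is None:
        return []  # assumption: no extra fields means an empty list
    if isinstance(extra, list):
        return extra
    try:
        parsed = json.loads(extra)
        if isinstance(parsed, list):
            return parsed
    except ValueError:
        pass  # not JSON, so treat it as tab-delimited fields
    return extra.split("\t")
```

Here unquote_values=False stands in for the case where settings.ignore_url_escape_characters is True, so a value such as “a%2Cb” is kept verbatim instead of becoming “a,b”.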
Methods
__init__([seqid, source, featuretype, ...]) – Represents a feature from the database.
astuple([encoding]) – Return a tuple suitable for import into a database.
calc_bin([_bin]) – Calculate the smallest UCSC genomic bin that will contain this feature.
sequence(fasta[, use_strand]) – Retrieves the sequence of this feature as a string.
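To make “smallest UCSC genomic bin” concrete, here is a minimal sketch of the standard UCSC binning scheme (five levels, from 128 kb bins up to a single 512 Mb bin), applied to 1-based inclusive GFF coordinates. The function name smallest_bin is hypothetical, and this is not the code behind calc_bin, which may handle edge cases differently. Returning None when a coordinate is missing follows the description of the bin parameter.

```python
# Offsets of the first bin at each level, finest (128 kb) to coarsest (512 Mb).
BIN_OFFSETS = (512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0)
FIRST_SHIFT = 17  # 2**17 = 128 kb, the size of the finest bins
NEXT_SHIFT = 3    # each coarser level groups 8 bins of the level below


def smallest_bin(start, end):
    """Hypothetical helper: smallest UCSC bin containing [start, end] (1-based)."""
    if start is None or end is None:
        return None  # no bin can be computed without both coordinates
    if start > end:
        raise ValueError("start must be <= end")
    # Convert to 0-based positions of the first and last base.
    start_bin = (start - 1) >> FIRST_SHIFT
    end_bin = (end - 1) >> FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            # Both ends fall in the same bin at this level.
            return offset + start_bin
        start_bin >>= NEXT_SHIFT
        end_bin >>= NEXT_SHIFT
    raise ValueError("feature does not fit in the standard 512 Mb scheme")
```

A feature that sits inside one 128 kb window gets a fine-grained bin number, while one that straddles a window boundary is pushed up to the next coarser level, which is what lets range queries check only a handful of bins.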
Attributes
chrom
stop