gffutils.feature.Feature

class gffutils.feature.Feature(seqid='.', source='.', featuretype='.', start='.', end='.', score='.', strand='.', frame='.', attributes=None, extra=None, bin=None, id=None, dialect=None, file_order=None, keep_order=False, sort_attribute_values=False)
__init__(seqid='.', source='.', featuretype='.', start='.', end='.', score='.', strand='.', frame='.', attributes=None, extra=None, bin=None, id=None, dialect=None, file_order=None, keep_order=False, sort_attribute_values=False)

Represents a feature from the database.

Usually you won’t want to use this directly, since it has various implementation details needed for operating in the context of FeatureDB objects. Instead, try the gffutils.feature.feature_from_line() function.

When printed, reproduces the original line from the file as faithfully as possible using dialect.

Parameters:
  • seqid (string) – Name of the sequence (often chromosome)

  • source (string) – Source of the feature; typically the originating database or program that predicted the feature

  • featuretype (string) – Type of feature. For example “gene”, “exon”, “TSS”, etc

  • start (int or ".") – 1-based, inclusive start coordinate; must be <= end. If “.” (the default placeholder for GFF files), then the start attribute will be None.

  • end (int or ".") – 1-based, inclusive end coordinate; must be >= start. If “.” (the default placeholder for GFF files), then the end attribute will be None.

  • score (string) – Score of the feature, or “.” if there is none. Stored as a string; it is not converted to a number.

  • strand ("+" | "-" | ".") – Strand of the feature; “.” when strand is not relevant.

  • frame ("0" | "1" | "2" | ".") – Coding frame; “.” (the default) when frame is not relevant. 0 means in-frame; 1 means there is one extra base at the beginning, so the first codon starts at the second base; 2 means two extra bases at the beginning. Interpretation is strand-specific; the “beginning” of a minus-strand feature is at its end coordinate.

  • attributes (string or dict) –

    If a string, first assume it is serialized JSON; if this fails then assume it’s the original key/vals string. If it’s a dictionary already, then use as-is.

    The end result is that this instance’s attributes attribute will always be a dictionary.

    Upon printing, the attributes will be reconstructed based on this dictionary and the dialect – except if the original attributes string was provided, in which case that will be used directly.

    Notes on encoding/decoding: unquoting (e.g., “%2C” becomes “,”) happens only when attributes is a string of original key/vals and settings.ignore_url_escape_characters is False. If attributes is a dict or serialized JSON, its contents are used as-is.

    Similarly, the only time characters are quoted (“,” becomes “%2C”) is when the feature is printed (__str__ method).

  • extra (string or list) –

    Additional fields after the canonical 9 fields for GFF/GTF.

    If a string, then first assume it’s serialized JSON; if this fails then assume it’s a tab-delimited string of additional fields. If it’s a list already, then use as-is.

  • bin (int) – UCSC genomic bin. If None, will be created based on provided start/end; if start or end is “.” then bin will be None.

  • id (None or string) – Database-specific primary key for this feature. The only time this should not be None is if this feature is coming from a database, in which case it will be filled in automatically.

  • dialect (dict or None) – The dialect to use when reconstructing attribute strings; defaults to the GFF3 spec. FeatureDB objects will automatically attach the dialect from the original file.

  • file_order (int) – This is the rowid special field used in a sqlite3 database; this is provided by FeatureDB.

  • keep_order (bool) – If True, then the attributes in the printed string will be in the order specified in the dialect. Disabled by default, since this sorting step is time-consuming over many features.

  • sort_attribute_values (bool) – If True, then the values of each attribute will be sorted when the feature is printed. Mostly useful for testing, where the order is important for checking against expected values. Disabled by default, since it can be time-consuming over many features.
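
The handling of the attributes parameter described above (try JSON first, fall back to the original key/vals string, unquote only in the fallback case, quote only when printing) can be sketched with the standard library alone. This is an illustrative sketch and not gffutils' own parser: parse_attributes and format_attributes are hypothetical helpers, the unquote_values flag merely stands in for settings.ignore_url_escape_characters = False, and the fallback assumes GFF3-style "key=v1,v2;key2=v3" syntax rather than consulting a dialect.

```python
import json
from urllib.parse import quote, unquote


def parse_attributes(attributes, unquote_values=True):
    """Return a dict mapping attribute names to lists of values.

    Hypothetical helper for illustration; not part of gffutils.
    """
    if attributes is None:
        return {}
    if isinstance(attributes, dict):
        # A dictionary is used as-is: no parsing, no unquoting.
        return attributes
    try:
        # First assumption: the string is serialized JSON.
        parsed = json.loads(attributes)
        if isinstance(parsed, dict):
            return parsed
    except ValueError:
        pass
    # Fallback: treat the string as GFF3-style key/vals (an assumption here).
    result = {}
    for field in attributes.strip().strip(";").split(";"):
        field = field.strip()
        if not field:
            continue
        key, _, raw = field.partition("=")
        values = raw.split(",") if raw else []
        if unquote_values:
            # Unquoting happens only on this path, e.g. "%2C" becomes ",".
            values = [unquote(v) for v in values]
        result[key] = values
    return result


def format_attributes(attrs):
    """Rebuild a GFF3-style string, quoting reserved characters in values."""
    parts = []
    for key, values in attrs.items():
        # Quoting happens only at output time, e.g. "," becomes "%2C".
        quoted = [quote(v, safe=" ") for v in values]
        parts.append(f"{key}={','.join(quoted)}")
    return ";".join(parts)


attrs = parse_attributes("ID=gene1;Note=a%2Cb,c")
line = format_attributes(attrs)
```

Splitting on "," before unquoting is what lets an escaped comma survive as part of a single value instead of being mistaken for a value separator.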

Methods

__init__([seqid, source, featuretype, ...])

Represents a feature from the database.

astuple([encoding])

Return a tuple suitable for import into a database.

calc_bin([_bin])

Calculate the smallest UCSC genomic bin that will contain this feature.

sequence(fasta[, use_strand])

Retrieves the sequence of this feature as a string.
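
To make the idea behind calc_bin concrete, here is a minimal sketch of the standard UCSC binning scheme: five nested levels of bins, the finest covering 128 kb, each coarser level eight times larger. It is an assumption that this matches the values gffutils stores; the function name ucsc_bin is hypothetical, and gffutils may use different offsets or bin sizes internally.

```python
def ucsc_bin(start, end):
    """Smallest standard UCSC bin containing a 1-based, inclusive interval.

    Hypothetical helper for illustration; not gffutils' calc_bin.
    Returns None if either coordinate is missing.
    """
    if start is None or end is None:
        return None
    # Offset of the first bin at each level, finest (128 kb) to coarsest (512 Mb).
    offsets = (512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0)
    # Convert to 0-based positions, then to indexes of 128 kb (2**17) bins.
    start_bin = (start - 1) >> 17
    end_bin = (end - 1) >> 17
    for offset in offsets:
        if start_bin == end_bin:
            # Both ends fall in the same bin at this level.
            return offset + start_bin
        # Move up one level: each coarser bin spans 8 finer bins.
        start_bin >>= 3
        end_bin >>= 3
    raise ValueError(f"interval {start}-{end} is out of range for standard bins")


small = ucsc_bin(1, 100)
spanning = ucsc_bin(131072, 131073)
```

A feature that fits inside one 128 kb window lands in a finest-level bin, while one that straddles a window boundary is pushed up to the next level, so a range query only needs to check a handful of bins per level.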

Attributes

chrom

stop