gffutils.interface.FeatureDB.region

FeatureDB.region(region=None, seqid=None, start=None, end=None, strand=None, featuretype=None, completely_within=False)

Return features within specified genomic coordinates.

Genomic coordinates can be specified in two ways: through the single region argument, or through the separate seqid, start, and end arguments.

Parameters:
  • region (string, tuple, or Feature instance) –

    If string, then of the form “seqid:start-end”. If tuple, then (seqid, start, end). If Feature, then use the feature’s seqid, start, and end values.

    This argument is mutually exclusive with start/end/seqid.

    Note: By design, even if a feature is provided, its strand will be ignored. If you want to restrict the output by strand, use the separate strand kwarg.

  • strand – If strand is provided, then only those features exactly matching strand will be returned. So strand='.' will only return unstranded features. Default is strand=None which does not restrict by strand.

  • seqid, start, end – Mutually exclusive with region. These kwargs can be used to approximate slice notation; see the “Notes” section below.

  • featuretype (None, string, or iterable) – If not None, then restrict output. If string, then only report that feature type. If iterable, then report all featuretypes in the iterable.

  • completely_within (bool) – By default (completely_within=False), returns features that partially or completely overlap region. If completely_within=True, only features that lie completely within region are returned.
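The three accepted forms of region can be illustrated with a minimal sketch. This is not the library's own parser: normalize_region and the namedtuple Feature below are hypothetical names used only for illustration, and the sketch assumes the string form always contains exactly one “start-end” pair after the last colon.

```python
from collections import namedtuple

# Hypothetical stand-in for a gffutils Feature; only the fields used here.
Feature = namedtuple("Feature", ["seqid", "start", "end", "strand"])


def normalize_region(region):
    """Return (seqid, start, end) from a string, a tuple, or a Feature-like object."""
    if isinstance(region, str):
        # "seqid:start-end"; rpartition tolerates seqids that contain ':'.
        seqid, _, coords = region.rpartition(":")
        start, _, end = coords.partition("-")
        return seqid, int(start), int(end)
    if hasattr(region, "seqid"):
        # Feature-like object: the strand is deliberately ignored.
        return region.seqid, int(region.start), int(region.end)
    # Plain (seqid, start, end) tuple.
    seqid, start, end = region
    return seqid, int(start), int(end)


print(normalize_region("chr1:1-100"))
print(normalize_region(("chr1", 1, 100)))
print(normalize_region(Feature("chr2", 5, 10, "-")))
```

Because the strand of a Feature is dropped in this step, any strand restriction has to come from the separate strand kwarg.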

Notes

The meaning of seqid, start, and end is interpreted as follows:

seqid   start   end     meaning
str     int     int     equivalent to region kwarg
None    int     int     features from all chroms within coords
str     None    int     equivalent to [:end] slice notation
str     int     None    equivalent to [start:] slice notation
None    None    None    equivalent to FeatureDB.all_features()
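The combinations above can be modelled by treating each None as “no restriction”. The sketch below is an in-memory illustration of that reading, not the library's implementation; select and the sample tuples are hypothetical, and the default overlap behaviour (completely_within=False) with inclusive coordinates is assumed.

```python
def select(features, seqid=None, start=None, end=None):
    """Yield (seqid, start, end) tuples that overlap the query; None means unbounded."""
    for f_seqid, f_start, f_end in features:
        if seqid is not None and f_seqid != seqid:
            continue  # wrong chromosome
        if start is not None and f_end < start:
            continue  # feature ends before the query starts
        if end is not None and f_start > end:
            continue  # feature starts after the query ends
        yield (f_seqid, f_start, f_end)


features = [
    ("chr1", 1, 100),
    ("chr1", 500, 1500),
    ("chr1", 2000, 3000),
    ("chr2", 50, 150),
]

print(list(select(features, "chr1", 1, 100)))      # like the region kwarg
print(list(select(features, None, 1, 100)))        # all chroms within coords
print(list(select(features, "chr1", None, 1000)))  # like [:end]
print(list(select(features, "chr1", 1000, None)))  # like [start:]
print(list(select(features)))                      # everything
```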

If performance is a concern, use completely_within=True. This allows the query to be optimized by only looking for features that fall in the precise genomic bin (same strategy as UCSC Genome Browser and BEDTools). Otherwise all features’ start/stop coords need to be searched to see if they partially overlap the region of interest.
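For readers unfamiliar with genomic binning, the following is a sketch of the standard UCSC scheme, in which every feature is assigned to the smallest bin that fully contains it. It is included only to illustrate the idea; the constants are those of the standard five-level UCSC scheme, and gffutils' own bin layout may differ. The function name bin_from_range and the use of 0-based, half-open coordinates are assumptions of this sketch.

```python
# Offsets of the first bin at each level, from the finest (128 kb) to the coarsest (512 Mb).
_BIN_OFFSETS = [512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0]
_FIRST_SHIFT = 17  # finest bins span 2**17 = 128 kb
_NEXT_SHIFT = 3    # each coarser level is 8 times larger


def bin_from_range(start, end):
    """Return the smallest bin containing [start, end) (0-based, half-open)."""
    start_bin = start >> _FIRST_SHIFT
    end_bin = (end - 1) >> _FIRST_SHIFT
    for offset in _BIN_OFFSETS:
        if start_bin == end_bin:
            # Both ends fall in the same bin at this level.
            return offset + start_bin
        start_bin >>= _NEXT_SHIFT
        end_bin >>= _NEXT_SHIFT
    raise ValueError("range %d-%d is out of bounds for this scheme" % (start, end))


print(bin_from_range(0, 1000))         # fits in the first 128 kb bin
print(bin_from_range(0, 131073))       # crosses a 128 kb boundary, so moves up a level
print(bin_from_range(131072, 131100))  # second 128 kb bin
```

A feature that fits inside a query region can only live in a limited set of bins, which is why a bin lookup can narrow the candidates before any coordinate comparison is made.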

Examples

  • region(seqid="chr1", start=1000) returns all features on chr1 that start or extend past position 1000.

  • region(seqid="chr1", start=1000, completely_within=True) returns all features on chr1 that start past position 1000.

  • region("chr1:1-100", strand="+", completely_within=True) returns only plus-strand features that completely fall within positions 1 to 100 on chr1.
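The difference between the default overlap behaviour and completely_within=True, combined with the strand filter, can be sketched on an in-memory list. This models the documented semantics only; region_filter and the sample features are hypothetical, and inclusive coordinates are assumed.

```python
def region_filter(features, seqid, start=None, end=None, strand=None,
                  completely_within=False):
    """Yield (seqid, start, end, strand) tuples matching the query."""
    for feature in features:
        f_seqid, f_start, f_end, f_strand = feature
        if f_seqid != seqid:
            continue
        if strand is not None and f_strand != strand:
            continue  # exact strand match, so "." selects only unstranded features
        if completely_within:
            # Both ends of the feature must lie inside the query.
            keep = ((start is None or f_start >= start)
                    and (end is None or f_end <= end))
        else:
            # Any overlap with the query is enough.
            keep = ((start is None or f_end >= start)
                    and (end is None or f_start <= end))
        if keep:
            yield feature


features = [
    ("chr1", 10, 90, "+"),
    ("chr1", 50, 150, "+"),
    ("chr1", 20, 80, "-"),
    ("chr1", 900, 1100, "."),
    ("chr1", 1200, 1300, "+"),
]

# Features that start or extend past position 1000.
print(list(region_filter(features, "chr1", start=1000)))
# Features lying entirely at or beyond position 1000.
print(list(region_filter(features, "chr1", start=1000, completely_within=True)))
# Plus-strand features completely within 1-100.
print(list(region_filter(features, "chr1", 1, 100, strand="+", completely_within=True)))
```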

Return type:

A generator object that yields Feature objects.