gffutils.interface.FeatureDB.region
- FeatureDB.region(region=None, seqid=None, start=None, end=None, strand=None, featuretype=None, completely_within=False)
Return features within specified genomic coordinates.
Coordinates can be specified flexibly: either as a single region argument or as separate seqid, start, and end keyword arguments.
- Parameters:
region (string, tuple, or Feature instance) – If a string, it must have the form “seqid:start-end”. If a tuple, it must be (seqid, start, end). If a Feature, its seqid, start, and end values are used. This argument is mutually exclusive with start/end/seqid.
Note: By design, even if a Feature is provided, its strand is ignored. To restrict the output by strand, use the separate strand kwarg.
strand – If strand is provided, only features whose strand matches it exactly are returned, so strand='.' returns only unstranded features. The default, strand=None, does not restrict by strand.
seqid, start, end – Mutually exclusive with region. These kwargs can be used to approximate slice notation; see the “Notes” section below.
featuretype (None, string, or iterable) – If not None, restricts the output. If a string, only that feature type is reported. If an iterable, all featuretypes in the iterable are reported.
completely_within (bool) – By default (completely_within=False), features that partially or completely overlap region are returned. If completely_within=True, only features that lie completely within region are returned.
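The three accepted forms of the region argument all reduce to the same (seqid, start, end) triple. The sketch below illustrates that reduction; it is not gffutils' own parsing code, and normalize_region and FakeFeature are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass


@dataclass
class FakeFeature:
    """Hypothetical stand-in for a gffutils Feature."""
    seqid: str
    start: int
    end: int
    strand: str = "."


def normalize_region(region):
    """Reduce a region argument to a (seqid, start, end) tuple.

    Illustrative only; this is not how gffutils is implemented.
    """
    if isinstance(region, str):
        # "seqid:start-end"; split from the right so that a seqid
        # containing ":" is still handled.
        seqid, coords = region.rsplit(":", 1)
        start, end = coords.split("-")
        return seqid, int(start), int(end)
    if isinstance(region, tuple):
        seqid, start, end = region
        return seqid, int(start), int(end)
    # Feature-like object: strand is deliberately not read.
    return region.seqid, region.start, region.end


print(normalize_region("chr1:1-100"))
print(normalize_region(("chr1", 1, 100)))
print(normalize_region(FakeFeature("chr1", 1, 100, strand="-")))
```

All three calls describe the same interval; the strand of the Feature-like object plays no part, matching the note above.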
Notes
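The values of seqid, start, and end are interpreted as follows:

=====  =====  ====  =======================================
seqid  start  end   meaning
=====  =====  ====  =======================================
str    int    int   equivalent to the region kwarg
None   int    int   features from all chroms within coords
str    None   int   equivalent to [:end] slice notation
str    int    None  equivalent to [start:] slice notation
None   None   None  equivalent to FeatureDB.all_features()
=====  =====  ====  =======================================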
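The combinations above can be read as a filter in which any argument left as None places no constraint. The following is a minimal pure-Python sketch of that reading, assuming inclusive coordinates; Interval and matches are hypothetical names, and the code does not reflect how gffutils actually queries its database.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    """Hypothetical minimal feature: a seqid plus inclusive coordinates."""
    seqid: str
    start: int
    end: int


def matches(feature, seqid=None, start=None, end=None, completely_within=False):
    """Return True if the feature would be selected (a sketch only)."""
    if seqid is not None and feature.seqid != seqid:
        return False
    if completely_within:
        # The whole feature must lie inside whichever bounds were given.
        if start is not None and feature.start < start:
            return False
        if end is not None and feature.end > end:
            return False
    else:
        # Any overlap with the bounds that were given is enough.
        if start is not None and feature.end < start:
            return False
        if end is not None and feature.start > end:
            return False
    return True


features = [
    Interval("chr1", 500, 1500),
    Interval("chr1", 2000, 3000),
    Interval("chr2", 900, 1100),
]

# seqid given, end omitted: behaves like [start:] on chr1.
overlapping = [f for f in features if matches(f, seqid="chr1", start=1000)]
# Same query, but each feature must begin at or after position 1000.
within = [f for f in features
          if matches(f, seqid="chr1", start=1000, completely_within=True)]
# seqid omitted: the coordinates apply to every chromosome.
all_chroms = [f for f in features if matches(f, start=800, end=1200)]
```

With no arguments at all, matches accepts every feature, which corresponds to the last row of the table.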
If performance is a concern, use completely_within=True. This allows the query to be optimized by looking only for features that fall in the precise genomic bin (the same strategy used by the UCSC Genome Browser and BEDTools). Otherwise, the start/stop coords of all features need to be searched to see whether they partially overlap the region of interest.
Examples
region(seqid="chr1", start=1000) returns all features on chr1 that start or extend past position 1000.
region(seqid="chr1", start=1000, completely_within=True) returns all features on chr1 that start past position 1000.
region("chr1:1-100", strand="+", completely_within=True) returns only plus-strand features that fall completely within positions 1 to 100 on chr1.
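For readers unfamiliar with the genomic binning mentioned in the performance note, here is a sketch of the standard UCSC scheme, in which each interval is assigned the smallest bin that fully contains it. The constants are those of the standard UCSC scheme; whether gffutils uses exactly these values is an assumption, and bin_from_range is a name chosen for this example.

```python
# Standard UCSC binning: five levels, the smallest bins span 128 kb and
# each level above is 8x larger. Offsets give the first bin number of
# each level, from smallest bins to the single top-level bin.
BIN_OFFSETS = [512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0]
FIRST_SHIFT = 17  # 2**17 = 128 kb
NEXT_SHIFT = 3    # 2**3 = 8x per level


def bin_from_range(start, end):
    """Smallest bin fully containing [start, end), 0-based half-open."""
    start_bin = start >> FIRST_SHIFT
    end_bin = (end - 1) >> FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        # Interval straddles a boundary at this level; try the next one up.
        start_bin >>= NEXT_SHIFT
        end_bin >>= NEXT_SHIFT
    raise ValueError("range does not fit in the standard binning scheme")


small = bin_from_range(0, 100)          # fits in the first 128 kb bin
straddling = bin_from_range(0, 200000)  # crosses a 128 kb boundary
```

Because every feature carries a bin number, a query can first narrow the candidates to a small set of bins and only then compare coordinates, rather than comparing the coordinates of every feature.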
- Return type:
A generator object that yields Feature objects.
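Because the return value is a generator, it is consumed as it is iterated. A minimal sketch of one way to use it, assuming db is an already opened gffutils.FeatureDB and that each yielded Feature exposes a featuretype attribute; count_featuretypes is a hypothetical helper, not part of gffutils.

```python
from collections import Counter


def count_featuretypes(db, **region_kwargs):
    """Tally the featuretype of every Feature yielded by db.region().

    Hypothetical helper; db is assumed to be an open gffutils.FeatureDB.
    """
    # region() yields lazily, so Counter consumes it in a single pass.
    return Counter(f.featuretype for f in db.region(**region_kwargs))
```

A call might look like count_featuretypes(db, region="chr1:1-100", strand="+"). If the same features are needed more than once, wrap the call to region() in list() first, since a generator can only be iterated once.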