Each
Similar to BedTool.filter(), which applies a function to return True or False given an Interval, the BedTool.each() method applies a function to return a new, possibly modified Interval.
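To make the distinction concrete, here is a pure-Python sketch that does not use pybedtools at all. Each "feature" is a hypothetical stand-in, a plain list of BED fields (strings), rather than an Interval. A filter-style function returns True or False and decides which features survive. An each-style function returns a feature, possibly modified, for every input.

```python
# Hypothetical stand-ins for Intervals: plain lists of BED fields (strings).
features = [
    ["chr1", "1", "100", "feature1", "0", "+"],
    ["chr1", "150", "500", "feature3", "0", "-"],
]

def is_plus_strand(feature):
    # filter-style: returns True or False for a feature
    return feature[5] == "+"

def upper_name(feature):
    # each-style: returns a new, modified feature (name field uppercased)
    return feature[:3] + [feature[3].upper()] + feature[4:]

kept = [f for f in features if is_plus_strand(f)]   # analogous to filter()
modified = [upper_name(f) for f in features]        # analogous to each()
print(kept)
print(modified)
```

filter() can shrink the collection, while each() produces one output feature per input feature.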
The BedTool.each() method applies a function to every feature. As with BedTool.filter(), you can use your own function or one of the pre-defined ones in the featurefuncs module. Also like filter(), any additional *args and **kwargs passed to each() are forwarded to the function.
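The argument forwarding can be pictured as a generator that calls the function once per feature, passing along any extra arguments. The each() and add_to_score() below are hypothetical helpers written for this sketch, not the pybedtools implementation. Features are again plain lists of string fields.

```python
def each(features, func, *args, **kwargs):
    """Hypothetical stand-in for BedTool.each(): lazily apply func to every
    feature, forwarding extra positional and keyword arguments."""
    for feature in features:
        yield func(feature, *args, **kwargs)

def add_to_score(feature, amount=1):
    # Return a copy with the score field (index 4) incremented by `amount`.
    new = list(feature)
    new[4] = str(int(new[4]) + amount)
    return new

features = [["chr1", "1", "100", "feature1", "0", "+"]]
by_keyword = list(each(features, add_to_score, amount=5))   # kwargs forwarded
by_position = list(each(features, add_to_score, 2))         # args forwarded
print(by_keyword[0][4], by_position[0][4])
```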
>>> import pybedtools
>>> a = pybedtools.example_bedtool('a.bed')
>>> b = pybedtools.example_bedtool('b.bed')
>>> # The results of an "intersect" with c=True will return features
>>> # with an additional field representing the counts.
>>> with_counts = a.intersect(b, c=True)
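Conceptually, c=True attaches to each feature in a the number of features in b that overlap it. The following is a simplified pure-Python sketch of that counting, not BEDTools itself. It assumes all intervals are on the same chromosome and ignores BEDTools options such as strandedness or minimum overlap. The A coordinates match the features printed later in this section. The B coordinates are illustrative values chosen for the example.

```python
def count_overlaps(a_intervals, b_intervals):
    """For each (start, end) in A, count the (start, end) intervals in B that
    overlap it. BED coordinates are half-open [start, end), so two intervals
    overlap when each one starts before the other ends."""
    counts = []
    for a_start, a_end in a_intervals:
        n = sum(1 for b_start, b_end in b_intervals
                if a_start < b_end and b_start < a_end)
        counts.append(n)
    return counts

a_intervals = [(1, 100), (100, 200), (150, 500), (900, 950)]
b_intervals = [(155, 200), (800, 901)]   # illustrative B intervals
counts = count_overlaps(a_intervals, b_intervals)
print(counts)
```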
Let’s define a function that takes the count for each feature, as calculated above, and divides it by the number of bases in that feature. We can also supply an optional scalar, like 0.001, to get the results in “number of intersections per kb”. We then insert that value into the score field of the feature. Here’s the function:
>>> def normalize_count(feature, scalar=0.001):
...     """
...     assume feature's last field is the count
...     """
...     counts = float(feature[-1])
...     normalized = round(counts / (len(feature) * scalar), 2)
...
...     # need to convert back to string to insert into feature
...     feature.score = str(normalized)
...     return feature
And we apply it like this:
>>> normalized = with_counts.each(normalize_count)
>>> print(normalized)
chr1 1 100 feature1 0.0 + 0
chr1 100 200 feature2 10.0 + 1
chr1 150 500 feature3 2.86 - 1
chr1 900 950 feature4 20.0 + 1
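As a sanity check on these numbers, the same arithmetic can be redone in plain Python. The start, end, and count values are read off the printed output, and len(feature) is assumed to equal end - start since BED intervals are half-open. For example, feature3 has 1 hit over 350 bp, so 1 / (350 * 0.001) ≈ 2.857, which rounds to 2.86.

```python
# (start, end, count) read from the printed features above
rows = [(1, 100, 0), (100, 200, 1), (150, 500, 1), (900, 950, 1)]
scalar = 0.001  # dividing by length * 0.001 gives counts per kb

per_kb = [round(count / ((end - start) * scalar), 2)
          for start, end, count in rows]
print(per_kb)
```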
Similar to BedTool.filter(), we could have used the Python built-in function map to map a function to each Interval. In fact, this can still be useful if you don’t want a BedTool object as a result. For example:
>>> feature_lengths = list(map(len, a))
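In Python 3, map() returns a lazy iterator rather than a list, so wrapping it in list() is needed if you want all the lengths at once. Here is a self-contained sketch using a hypothetical Feature dataclass in place of an Interval, assuming that a feature's length is end - start.

```python
from dataclasses import dataclass

@dataclass
class Feature:
    # Hypothetical minimal stand-in for an Interval: only what len() needs.
    chrom: str
    start: int
    end: int

    def __len__(self):
        # BED intervals are half-open, so length is end - start
        return self.end - self.start

features = [Feature("chr1", 1, 100), Feature("chr1", 150, 500)]
lengths_iter = map(len, features)      # lazy iterator, not a list
feature_lengths = list(lengths_iter)   # materialize the results
print(feature_lengths)
```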
However, the BedTool.each() method returns a BedTool object, which can be used in a chain of commands, e.g.,
>>> a.intersect(b, c=True).each(normalize_count).filter(lambda x: float(x[4]) < 1e-5)
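One way to picture such a chain is as nested generators, where each stage pulls features from the previous one. The sketch below assumes that design. each(), keep(), and normalize() are hypothetical helpers that are not taken from pybedtools, and features are plain lists of strings with the overlap count in the last field.

```python
def each(features, func, *args, **kwargs):
    # Transform every feature lazily.
    for f in features:
        yield func(f, *args, **kwargs)

def keep(features, predicate):
    # Yield only the features for which predicate(feature) is True.
    for f in features:
        if predicate(f):
            yield f

def normalize(feature, scalar=0.001):
    # Write count / (length * scalar) into the score field (index 4).
    new = list(feature)
    length = int(new[2]) - int(new[1])
    new[4] = str(round(float(new[-1]) / (length * scalar), 2))
    return new

rows = [
    ["chr1", "1", "100", "feature1", "0", "+", "0"],
    ["chr1", "100", "200", "feature2", "0", "+", "1"],
]
chained = keep(each(rows, normalize), lambda x: float(x[4]) < 1e-5)
result = list(chained)   # nothing is computed until the chain is consumed
print(result)
```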