Filtering

The BedTool.filter() method lets you pass in a function that accepts an Interval as its first argument and returns True for False. This allows you to perform “grep”-like operations on BedTool objects. For example, here’s how to get a new BedTool containing features from a that are more than 100 bp long:

>>> a = pybedtools.example_bedtool('a.bed')
>>> b = a.filter(lambda x: len(x) > 100)
>>> print(b)
chr1        150     500     feature3        0       -

The filter() method will pass its *args and **kwargs to the function provided. So here is a more generic case, where the function is defined once and different arguments are passed in for filtering on different lengths:

>>> def len_filter(feature, L):
...     "Returns True if feature is longer than L"
...     return len(feature) > L

Now we can pass different lengths without defining a new function for each length of interest, like this:

>>> a = pybedtools.example_bedtool('a.bed')

>>> print(a.filter(len_filter, L=10))
chr1        1       100     feature1        0       +
chr1        100     200     feature2        0       +
chr1        150     500     feature3        0       -
chr1        900     950     feature4        0       +


>>> print(a.filter(len_filter, L=99))
chr1        100     200     feature2        0       +
chr1        150     500     feature3        0       -


>>> print(a.filter(len_filter, L=200))
chr1        150     500     feature3        0       -

See Using BedTool objects as iterators/generators for more advanced and space-efficient usage of filter() using iterators.

Note that we could have used the built-in Python function, filter(), but that would have returned an iterator that we would have to construct a new pybedtools.BedTool out of. The BedTool.filter() method returns a ready-to-use BedTool object, which allows embedding of BedTool.filter() calls in a chain of commands, e.g.:

>>> a.intersect(b).filter(lambda x: len(x) < 100).merge()

Fast filtering functions in Cython

The featurefuncs module contains some ready-made functions written in Cython that will be faster than pure Python equivalents. For example, there are greater_than() and less_than() functions, which are about 70% faster. In IPython:

>>> from pybedtools.featurefuncs import greater_than

>>> len(a)
310456

>>> def L(x,width=100):
...     return len(x) > 100

>>> # The %timeit command is from IPython, and won't work
>>> # in a regular Python script:
>>> %timeit a.filter(greater_than, 100)
1 loops, best of 3: 1.74 s per loop

>>> %timeit a.filter(L, 100)
1 loops, best of 3: 2.96 s per loop