More detailed answers to these questions can often be found on the Issues page.
“Does pybedtools have a simple reader/writer for BED files?”
While pybedtools is designed to work with BEDTools, its reading, writing, and
parsing functionality can easily be used for other things.
Simply iterating over a BedTool object will parse each line into an
Interval object. You can then manipulate this object or access its fields
as needed:
    import pybedtools

    x = pybedtools.example_bedtool('a.bed')
    for interval in x:
        # do something with interval, e.g. inspect its fields
        print(interval.chrom, interval.start, interval.stop)
However, if you’re planning on writing the results out to file, it may be more
useful to write a transformation function along with the BedTool.each()
method. This allows you to read, transform, and write all in one command:
    import pybedtools

    def my_func(f):
        """
        adds 10 bp to the stop
        """
        f.stop += 10
        return f

    pybedtools.example_bedtool('a.bed')\
        .each(my_func)\
        .saveas('out.bed')
Another useful idiom is creating a generator function. For example, here we change the name field to reflect the value of a counter. We create a BedTool from the iterator and then save it:
    import pybedtools

    def gen():
        counter = 0
        for i in pybedtools.example_bedtool('a.bed'):
            # rename each feature after its position in the file
            i.name = str(counter)
            counter += 1
            yield i

    pybedtools.BedTool(gen()).saveas('counted.bed')
See Saving the results for more on saving the results.
“Can I create a BedTool object from an existing list?”
Sure, the BedTool constructor will figure it out:
    import pybedtools

    items = [
        ('chr1', 100, 200),
        ('chr1', 500, 600),
    ]
    x = pybedtools.BedTool(items)
“I’m getting an empty BedTool”
Check to make sure you’re not consuming a BedTool generator. Note that
BedTool.each() will return a generator BedTool
object. Keep in mind that checking the length of a generator BedTool will
completely consume it.
It’s probably best to save intermediate versions to file using
BedTool.saveas(). If you don’t provide a filename, it’ll save to an
automatically cleaned up tempfile:
    my_bedtool\
        .filter(my_filter_func)\
        .saveas()\
        .intersect(y)\
        .filter(lambda x: len(x) > 1000)\
        .saveas('filtered-intersected-large.bed')
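The generator-consumption pitfall described above is not specific to pybedtools. As a minimal sketch using only a plain Python generator (the `intervals` function and its tuples are made up for illustration and are not pybedtools API), counting the items exhausts the generator, so a second pass sees nothing:

```python
def intervals():
    """Hypothetical stand-in for a generator-backed BedTool."""
    yield ('chr1', 100, 200)
    yield ('chr1', 500, 600)


gen = intervals()

# Counting the items walks through the whole generator...
first_pass = sum(1 for _ in gen)

# ...so nothing is left for a second pass.
second_pass = sum(1 for _ in gen)

print(first_pass, second_pass)
```

Saving to a file avoids this because a file can be read again from the start.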
“I’m getting a MalformedBedLineError”
This error can be raised by BEDTools itself. Typical reasons are that start > end, or the fields are not tab-delimited.
You can try the
BedTool.remove_invalid() method to clean up your
file, or manually edit the offending lines.
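Before handing a file to BEDTools, it can help to locate the offending lines yourself. The following is a minimal sketch in plain Python that checks only the two typical causes named above. `find_malformed_lines` is a hypothetical helper, not part of pybedtools, and it assumes a BED-like layout with start and end in the second and third columns:

```python
def find_malformed_lines(lines):
    """Return (line_number, reason) pairs for suspicious BED lines.

    Hypothetical helper, not part of pybedtools. It only checks for
    tab-delimited fields and start <= end.
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip('\n')
        # Skip blank lines and common header lines.
        if not line or line.startswith(('#', 'track', 'browser')):
            continue
        fields = line.split('\t')
        if len(fields) < 3:
            problems.append((number, 'fewer than 3 tab-delimited fields'))
            continue
        try:
            start, end = int(fields[1]), int(fields[2])
        except ValueError:
            problems.append((number, 'start or end is not an integer'))
            continue
        if start > end:
            problems.append((number, 'start > end'))
    return problems


example = [
    'chr1\t10\t100\n',   # well-formed
    'chr1\t100\t10\n',   # start > end
    'chr1 10 100\n',     # spaces instead of tabs
]
for number, reason in find_malformed_lines(example):
    print(number, reason)
```

For a file on disk, an open file handle can be passed in place of the `example` list, since the function only needs an iterable of lines.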
“I get a segfault when iterating over a BedTool object”
See Issue #88, which addresses this problem. In summary, Cython’s handling
of iterators works unexpectedly. It’s best to call the
next() method explicitly when doing
complex manipulations while iterating over a BedTool object.
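In Python 3, calling next() explicitly means using the built-in next() function on an iterator. As a rough sketch of that pattern (`apply_explicitly` is a hypothetical helper, not part of pybedtools, and whether it avoids the segfault in a given setup is not something this sketch can guarantee):

```python
def apply_explicitly(iterable, func):
    """Apply `func` to each item, advancing the iterator by hand.

    Hypothetical helper, not part of pybedtools.
    """
    iterator = iter(iterable)
    results = []
    while True:
        try:
            item = next(iterator)  # explicit call instead of a for-loop
        except StopIteration:
            break
        results.append(func(item))
    return results


# Works with any iterable; plain tuples stand in for intervals here.
lengths = apply_explicitly(
    [('chr1', 1, 10), ('chr1', 5, 30)],
    lambda iv: iv[2] - iv[1],
)
print(lengths)
```

Since a BedTool is iterable, it could be passed as the first argument in the same way.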
“Can I add extra information to FASTA headers when using BedTool.sequence()?”
Since BEDTools adds the feature name to the FASTA header, you can manipulate the feature name on the fly with a custom modifier function:
    import pybedtools

    def fields2name(f):
        "replace GFF featuretype field with the attributes field"
        # field 2 is the GFF featuretype; the last field holds the attributes
        f[2] = f[-1]
        return f

    g = pybedtools.BedTool("my.gff").each(fields2name).sequence(fi='my.fasta')
    print(open(g.seqfn).readline())
“Too many files open” error
Sometimes you may get the error:
* Too many files open -- please submit a bug report so that this can be fixed
This error occurs because you have hit your operating system’s limit on the
number of open files. This usually happens when creating many BedTool
objects, often within a for-loop.
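To see what that limit is on your system, you can query it from Python. This is a minimal sketch using the standard-library `resource` module, which is only available on Unix-like systems; `open_file_limits` is a name chosen here for illustration:

```python
import resource


def open_file_limits():
    """Return the (soft, hard) limits on open files for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


soft, hard = open_file_limits()
print('soft limit:', soft)
print('hard limit:', hard)
```

The soft limit is the one a running process actually hits. In most shells, `ulimit -n` reports the same value.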
In general, try to create as few
BedTool objects as you can. Every time you create a
BedTool object, you create a new open file. There is usually
a BEDTools program that already does what you want, and will do it faster.
For example, say we want to:
start with all annotations
only consider exons
write a file containing just exons
count reads in multiple BAM files for each exon
Here is a first draft. Note that the for-loop creates a BedTool
object each iteration, and the
result is yet another BedTool.
This version will raise the “Too many files open” error.
    # This version will be slow and, with many exons, will raise the "Too many
    # files open" error
    import pybedtools

    all_features = pybedtools.BedTool('annotations.gff')
    fout = open('exons.gff', 'w')
    for feature in all_features:
        # field 2 of a GFF line is the featuretype
        if feature[2] != 'exon':
            continue
        fout.write(str(feature))
        bt = pybedtools.BedTool([feature])
        result = bt.multi_bam_coverage(bams=['reads1.bam', 'reads2.bam'])
        # ...do something with result
    fout.close()
In contrast, it would be better to construct an “exon-only” BedTool at
the beginning. The
BedTool.filter() method is a good way to do this.
Then, there is only one call to BedTool.multi_bam_coverage().
In this version there are only 3
BedTool objects: the
one that opens
annotations.gff, the one that uses
exons.gff after it is saved, and
result. (Note that the one created from the filter operation is
a “streaming” BedTool, so there is no open file that will contribute to the
total.)
    # This is the recommended way.
    import pybedtools

    exons = pybedtools.BedTool('annotations.gff')\
        .filter(lambda x: x[2] == 'exon')\
        .saveas('exons.gff')
    result = exons.multi_bam_coverage(bams=['reads1.bam', 'reads2.bam'])
    # ...do something with result