Saving BedTool
results¶
In general, there are three different ways of saving results from
BedTool
operations:
Use the BedTool.saveas()
method¶
The BedTool.saveas()
method makes a copy of the results, so beware
that for large files, this can be time and/or memory-consuming. However, when
working with a streaming or iterating BedTool
, this is a great way to
render the results to disk in the middle of a pipeline.
A good example of this is saving the results from a BedTool.each()
call:
>>> from pybedtools.featurefuncs import TSS
>>> a = pybedtools.example_bedtool('a.bed')
>>> result = a.each(TSS, upstream=1000, downstream=0)\
... .saveas('upstream_regions.bed')
Use the BedTool.moveto()
method¶
The BedTool.moveto()
method does a move operation of the results.
This is best used when the results have been written to disk already (perhaps
to a tempfile) but you’d like to give the file a more reasonable/memorable
name.
>>> a = pybedtools.example_bedtool('a.bed')
>>> b = pybedtools.example_bedtool('b.bed')
>>> c = a.intersect(b).moveto('intersection_of_a_and_b.bed')
Use the output
keyword argument¶
If you know ahead of time that you want to save the output to a particular
file, use the output
keyword argument to any wrapped BedTool
method that returns another BedTool
object. This will override the
default behavior of creating a tempfile.
>>> a = pybedtools.example_bedtool('a.bed')
>>> b = pybedtools.example_bedtool('b.bed')
>>> c = a.intersect(b, output='intersection_of_a_and_b.bed')
Working with non-interval output files¶
BEDTools
commands offer lots of flexibility. This means it is possible to
return results that are not supported interval files like
BED/GFF/GTF/BAM/SAM/VCF.
Consider the following example, which uses BedTool.groupby()
to get
a 2-column file containing the number of intervals in each featuretype:
>>> a = pybedtools.example_bedtool('gdc.gff')
>>> b = pybedtools.example_bedtool('gdc.bed')
>>> c = a.intersect(b, c=True)
>>> d = c.groupby(g=[3], c=10, o=['sum'])
We can read the file created by d
looks like this:
(note: the latest version of BEDTools, v2.26.0, causes this to fail. This will be fixed in the next BEDTools release (see https://github.com/arq5x/bedtools2/issues/453, https://github.com/arq5x/bedtools2/issues/450, https://github.com/arq5x/bedtools2/issues/435, https://github.com/arq5x/bedtools2/issues/436 for details).
>>> # bedtools v2.26.0
>>> print(open(d.fn).read())
UTR 0
CDS 2
intron 4
CDS 0
UTR 1
exon 3
mRNA 7
CDS 2
exon 2
tRNA 2
gene 7
Trying to iterate over d
([i for i in d]
) or save it (d.saveas()
) raises
exceptions. This is because:
saveas()
is expected to return aBedTool
object that can be used with otherBEDTools
tools. We can’t create aBedTool
object out of an unsupported file format like thisiterating over a
BedTool
object is expected to yieldInterval
objects, but these lines can’t be converted into the supported formats
To save the output to a filename of your choosing, provide the output
argument instead of saveas()
, like this:
>>> # only works with bedtools != v2.26.0
>>> # d = c.groupby(g=[3], c=10, o=['sum'], output='counts.txt')
To iterate over the lines of the file, you can use standard Python tools, e.g.:
>>> # only works with bedtools != v2.26.0
>>> # for line in open(d.fn):
>>> # featuretype, count = line.strip().split()