Design principles¶
Hopefully, understanding (or just being aware of) these design principles
will help in getting the most out of pybedtools
and working
efficiently.
Principle 1: Temporary files are created (and deleted) automatically¶
Using BedTool
instances typically has the side effect of creating
temporary files on disk. Even when using the iterator protocol of
BedTool
objects, temporary files may be created in order to run
BEDTools programs (see Using BedTool objects as iterators/generators for more on this latter topic).
Let’s illustrate some of the design principles behind pybedtools
by
merging features in a.bed
that are 100 bp or less apart (d=100
)
in a strand-specific way (s=True
):
>>> from pybedtools import BedTool
>>> import pybedtools
>>> a = BedTool(pybedtools.example_filename('a.bed'))
>>> merged_a = a.merge(d=100, s=True)
Now merged_a
is a BedTool
instance that contains the results of the
merge.
BedTool
objects must always point to a file on disk. So in the
example above, merged_a
is a BedTool
, but what file does it
point to? You can always check the BedTool.fn
attribute to find
out:
>>> # what file does `merged_a` point to?
>>> merged_a.fn
'/tmp/pybedtools.MPPp5f.tmp'
Note that the specific filename will be different for you since it is a
randomly chosen name (handled by Python’s tempfile
module). This
shows one important aspect of pybedtools
: every operation results in
a new temporary file. Temporary files are stored in /tmp
by
default, and have the form /tmp/pybedtools.*.tmp
.
By default, at exit all temp files created during the session will be deleted. However, if Python does not exit cleanly (e.g., from a bug in client code), then the temp files will not be deleted.
If this happens, from the command line you can always do a:
rm /tmp/pybedtools.*.tmp
In the middle of a session, you can force a deletion of all tempfiles created thus far:
>>> # Don't do this yet if you're following the tutorial!
>>> pybedtools.cleanup()
Alternatively, in this session or another session you can use:
>>> pybedtools.cleanup(remove_all=True)
to remove all files that match the pattern
<tempdir>/pybedtools.*.tmp
where <tempdir>
is the current value
of pybedtools.get_tempdir()
.
If you need to specify a different directory than that used by default by Python’s tempdir module, then you can set it with:
>>> pybedtools.set_tempdir('/scratch')
You’ll need write permissions to this directory, and it needs to already exist. All temp files will then be written to that directory, until the tempdir is changed again.
Principle 2: Names and arguments are as similar as possible to BEDTools¶
As much as possible, BEDTools programs and BedTool
methods share
the same names and arguments.
Returning again to this example:
>>> merged_a = a.merge(d=100, s=True)
This demonstrates that the BedTool
methods that wrap BEDTools
programs do the same thing and take the exact same arguments as the
BEDTools program. Here we can pass d=100
and s=True
only because the
underlying BEDTools program, mergeBed
, can accept these arguments.
Need to know what arguments mergeBed
can take? See the docs for
BedTool.merge()
; for more on this see Principle 7: Check the help.
In general, remove the “Bed” from the end of the BEDTools program to get
the corresponding BedTool
method. So there’s a
BedTool.subtract()
method for subtractBed
, a
BedTool.intersect()
method for intersectBed
, and so on.
Principle 3: Indifference to BEDTools version¶
Since BedTool
methods just wrap BEDTools programs, they are as up-to-date as
the version of BEDTools you have installed on disk. If you are using a
cutting-edge version of BEDTools that has some hypothetical argument
-z
for intersectBed
, then you can use a.intersectBed(z=True)
.
pybedtools
will also raise an exception if you try to use a method
that relies on a more recent version of BEDTools than you have installed.
Principle 4: Sensible default args¶
If we were running the mergeBed
program from the command line, we
would have to specify the input file with the mergeBed -i
option.
pybedtools
assumes that if we’re calling the merge()
method on
the BedTool
, a
, we want to operate on the bed file that a
points to.
In general, BEDTools programs that accept a single BED file as input
(by convention typically specified with the -i
option) the
default behavior for pybedtools
is to use the BedTool
’s
file (indicated in the BedTool.fn
attribute) as input.
We can still pass a file using the i
keyword argument if we wanted to be
absolutely explicit. In fact, the following two versions produce the same
output:
>>> # The default is to use existing file for input -- no need
>>> # to specify "i" . . .
>>> result1 = a.merge(d=100, s=True)
>>> # . . . but you can always be explicit if you'd like
>>> result2 = a.merge(i=a.fn, d=100, s=True)
>>> # Confirm that the output is identical
>>> result1 == result2
True
Methods that have this type of default behavior are indicated by the following text in their docstring:
.. note::
For convenience, the file this BedTool object points to is passed as "-i"
There are some BEDTools programs that accept two BED files as input, like
intersectBed
where the the first file is specified with -a
and the
second file with -b
. The default behavior for pybedtools
is to
consider the BedTool
’s file as -a
and the first non-keyword
argument to the method as -b
, like this:
>>> b = pybedtools.example_bedtool('b.bed')
>>> result3 = a.intersect(b)
This is exactly the same as passing the a
and b
arguments explicitly:
>>> result4 = a.intersect(a=a.fn, b=b.fn)
>>> result3 == result4
True
Furthermore, the first non-keyword argument used as -b
can either be a
filename or another BedTool
object; that is, these commands also
do the same thing:
>>> result5 = a.intersect(b=b.fn)
>>> result6 = a.intersect(b=b)
>>> str(result5) == str(result6)
True
Methods that accept either a filename or another BedTool
instance
as their first non-keyword argument are indicated by the following text in
their docstring:
.. note::
This method accepts either a BedTool or a file name as the first
unnamed argument
Principal 5: Other arguments have no defaults¶
Only the BEDTools arguments that refer to BED (or other interval) files have
defaults. In the current version of BEDTools, this means only the -i
,
-a
, and -b
arguments have defaults. All others have no defaults
specified by pybedtools
; they pass the buck to BEDTools programs. This
means if you do not specify the d
kwarg when calling BedTool.merge()
,
then it will use whatever the installed version of BEDTools uses for -d
(currently, mergeBed
’s default for -d
is 0).
-d
is an option to BEDTools mergeBed
that accepts a value, while
-s
is an option that acts as a switch. In pybedtools
, simply
pass a value (integer, float, whatever) for value-type options like -d
,
and boolean values (True
or False
) for the switch-type options like
-s
.
Here’s another example using both types of keyword arguments; the
BedTool
object b
(or it could be a string filename too) is
implicitly passed to intersectBed
as -b
(see Principle 4: Sensible default args above):
>>> a.intersect(b, v=True, f=0.5)
Again, any option that can be passed to a BEDTools program can be passed
to the corresonding BedTool
method.
Principle 6: Chaining together commands¶
Most methods return new BedTool
objects, allowing you to chain
things together just like piping commands together on the command line. To
give you a flavor of this, here is how you would get the merged regions of
features shared between a.bed
(as referred to by the
BedTool
a
we made previously) and b.bed
: (as referred to
by the BedTool
b
):
>>> a.intersect(b).merge().saveas('shared_merged.bed')
<BedTool(shared_merged.bed)>
This is equivalent to the following BEDTools commands:
intersectBed -a a.bed -b b.bed | merge -i stdin > shared_merged.bed
Methods that return a new BedTool
instance are indicated with the following text in their docstring:
.. note::
This method returns a new BedTool instance
Principle 7: Check the help¶
If you’re unsure of whether a method uses a default, or if you want to read
about what options an underlying BEDTools program accepts, check the help.
Each pyBedTool
method that wraps a BEDTools program also wraps
the BEDTools program help string. There are often examples of how to use
a method in the docstring as well. The documentation is also run through
doctests, so the code you read here is guaranteed to work and be
up-to-date.