Using the history and tags
BEDTools makes it very easy to do rather complex genomic algebra. Sometimes
when you’re doing some exploratory work, you’d like to rewind back to a
previous step, or clean up temporary files that have been left on disk over the
course of some experimentation.
To assist this sort of workflow, BedTool instances keep track of their history in the BedTool.history attribute. Let’s make an example BedTool, c, that has some history:
>>> a = pybedtools.example_bedtool('a.bed')
>>> b = pybedtools.example_bedtool('b.bed')
>>> c = a.intersect(b, u=True)
c now has a history which tells you all sorts of useful things (described in more detail below):
>>> print(c.history)
[<HistoryStep> bedtool("/home/ryan/pybedtools/pybedtools/test/a.bed").intersect("/home/ryan/pybedtools/pybedtools/test/b.bed", u=True), parent tag: klkreuay, result tag: egzgnrvj]
There are several things to note here. First, the history describes the full command, including the names of any temp files and all the arguments you would need in order to re-create the result. Since BedTool objects are fundamentally file-based, the command refers to the underlying filenames (i.e., a.bed and b.bed) instead of the BedTool instances (i.e., a and b). A simple copy-paste of the command is enough to re-run it. While this may be useful in some situations, be aware that if you do run the command again you’ll get another temp file that has the same contents as c’s temp file.
To avoid such cluttering of your temp dir, the history also reports tags. BedTool objects, when created, get a random tag assigned to them. You can get the BedTool associated with a tag using the pybedtools.find_tagged() function. These tags are used to keep track of instances during the current session.
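To make the idea of tags concrete, here is a toy model of a session-wide tag registry. It is only a sketch of the concept and is not how pybedtools implements tags; the names `Tagged`, `new_tag`, `find_tagged_sketch`, and `_registry` are all hypothetical.

```python
import random
import string

# Hypothetical session-wide registry mapping tag -> object (not pybedtools code).
_registry = {}


def new_tag(length=8):
    """Return a random lowercase tag, similar in shape to 'klkreuay'."""
    return "".join(random.choice(string.ascii_lowercase) for _ in range(length))


class Tagged:
    """Toy object that registers itself under a random tag when created."""

    def __init__(self, fn):
        self.fn = fn
        self.tag = new_tag()
        _registry[self.tag] = self


def find_tagged_sketch(tag):
    """Return the live object registered under `tag`, or None in this sketch."""
    return _registry.get(tag)


a_toy = Tagged("a.bed")
same_object = find_tagged_sketch(a_toy.tag)
```

Because the registry lives only in memory, a tag in this model is meaningful only within the session that created it, which mirrors the statement above that tags track instances during a session.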
So in this case, we could get a reference to the a instance with:
>>> should_be_a = pybedtools.find_tagged('klkreuay')
Here’s confirmation that the parent of the first step of c’s history is a (note that HistoryStep objects have a HistoryStep.parent_tag and a HistoryStep.result_tag):
>>> pybedtools.find_tagged(c.history[0].parent_tag) == a
True
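The same lookup can be wrapped in a small helper that resolves both ends of a history step. This is a sketch rather than part of pybedtools: `resolve_step` is a hypothetical name, and the lookup function is passed in so the helper relies only on the `parent_tag` and `result_tag` attributes described above. The demonstration uses stand-in objects and a plain dictionary instead of real BedTool instances.

```python
from types import SimpleNamespace


def resolve_step(step, lookup):
    """Return (parent, result) objects for a step, using `lookup(tag)`."""
    return lookup(step.parent_tag), lookup(step.result_tag)


# Stand-ins for a session: a tag -> object mapping and one history step.
objects_by_tag = {"klkreuay": "a", "egzgnrvj": "c"}
first_step = SimpleNamespace(parent_tag="klkreuay", result_tag="egzgnrvj")

parent, result = resolve_step(first_step, objects_by_tag.get)
```

In a live session the call would presumably look like `resolve_step(c.history[0], pybedtools.find_tagged)`.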
Let’s make something with a more complicated history:
>>> a = pybedtools.example_bedtool('a.bed')
>>> b = pybedtools.example_bedtool('b.bed')
>>> c = a.intersect(b)
>>> d = c.slop(g=pybedtools.chromsizes('hg19'), b=1)
>>> e = d.merge()
>>> # this step adds complexity!
>>> f = e.subtract(b)
Let’s see what the history of f (the last BedTool created) looks like. Note that the output here has been reformatted to make it easier to read:
>>> print(f.history)
[
| [
| | [
| | | [
| | | |<HistoryStep> BedTool("/usr/local/lib/python2.6/dist-packages/pybedtools/test/data/a.bed").intersect(
| | | | "/usr/local/lib/python2.6/dist-packages/pybedtools/test/data/b.bed",
| | | | ),
| | | | parent tag: rzrztxlw,
| | | | result tag: ifbsanqk
| | | ],
| | |
| | |<HistoryStep> BedTool("/tmp/pybedtools.BgULVj.tmp").slop(
| | | b=1,genome="hg19"
| | | ),
| | | parent tag: ifbsanqk,
| | | result tag: omfrkwjp
| | ],
| |
| |<HistoryStep> BedTool("/tmp/pybedtools.SFmbYc.tmp").merge(),
| | parent tag: omfrkwjp,
| | result tag: zlwqblvk
| ],
|
|<HistoryStep> BedTool("/tmp/pybedtools.wlBiMo.tmp").subtract(
| "/usr/local/lib/python2.6/dist-packages/pybedtools/test/data/b.bed",
| ),
| parent tag: zlwqblvk,
| result tag: reztxhen
]
Those first three history steps correspond to c, d, and e respectively, as we can see by comparing the code snippet above with the commands in each history step. In other words, e can be described by the sequence of 3 commands in the first three history steps. In fact, if we checked e.history, we’d see exactly those same 3 steps.
When f was created above, it operated both on e, which had its own history, and on b; note the nesting of the list. You can do arbitrarily complex “genome algebra” operations, and the history of each BedTool will keep track of them. It may not be useful in every situation, but the ability to backtrack and have a record of what you’ve done can sometimes be helpful.
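A nested history like the one printed for f can be walked recursively to recover the steps in the order they were run. The sketch below assumes the history behaves like a nested Python list whose leaves expose `parent_tag` and `result_tag`; `flatten_history` and `is_linear_chain` are hypothetical helpers, and the steps are stand-in objects carrying the tags from the output above rather than real HistoryStep instances.

```python
from types import SimpleNamespace


def flatten_history(history):
    """Yield history steps in creation order from a nested history list."""
    for item in history:
        if isinstance(item, list):
            # A nested list is the history of a parent; walk it first.
            yield from flatten_history(item)
        else:
            yield item


def is_linear_chain(steps):
    """True if every step's parent tag is the previous step's result tag."""
    return all(
        prev.result_tag == cur.parent_tag
        for prev, cur in zip(steps, steps[1:])
    )


def step(parent_tag, result_tag):
    """Build a stand-in for a history step with only the two tag attributes."""
    return SimpleNamespace(parent_tag=parent_tag, result_tag=result_tag)


c_step = step("rzrztxlw", "ifbsanqk")  # a.intersect(b)
d_step = step("ifbsanqk", "omfrkwjp")  # c.slop(...)
e_step = step("omfrkwjp", "zlwqblvk")  # d.merge()
f_step = step("zlwqblvk", "reztxhen")  # e.subtract(b)

# Same nesting as the printed history of f.
nested = [[[[c_step], d_step], e_step], f_step]
flat = list(flatten_history(nested))
```

Here `flat` holds the four steps in the order c, d, e, f, and `is_linear_chain(flat)` confirms that each result tag feeds the next step's parent tag.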