pybedtools documentation
Overview
The BEDTools suite of programs is widely
used for genomic interval manipulation or “genome algebra”. pybedtools wraps
and extends BEDTools and offers feature-level manipulations from within
Python.
See full online documentation, including installation instructions, at https://daler.github.io/pybedtools/.
The GitHub repo is at https://github.com/daler/pybedtools.
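The sketch below is a minimal pure-Python illustration of the "genome algebra" idea. It is not part of pybedtools. It assumes intervals are `(chrom, start, end)` tuples in BED-style 0-based, half-open coordinates. Under that convention, two intervals overlap when they share a chromosome and each one starts before the other one ends. The `Interval` type and the `overlaps`/`intersect` helpers are hypothetical names chosen for illustration. BEDTools itself also handles strands, many file formats, and large inputs far more efficiently.

```python
from typing import NamedTuple, Optional


class Interval(NamedTuple):
    """Hypothetical interval type: BED-style 0-based, half-open [start, end)."""
    chrom: str
    start: int
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    # Half-open intervals overlap if they are on the same chromosome and
    # each one starts before the other one ends.
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def intersect(a: Interval, b: Interval) -> Optional[Interval]:
    # Return the shared region, or None when the intervals do not overlap.
    if not overlaps(a, b):
        return None
    return Interval(a.chrom, max(a.start, b.start), min(a.end, b.end))


gene = Interval("chr1", 100, 500)
print(intersect(gene, Interval("chr1", 400, 900)))  # Interval(chrom='chr1', start=400, end=500)
print(overlaps(gene, Interval("chr1", 500, 600)))   # False: book-ended, no shared base
```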
Why pybedtools?
Here is an example to get the names of genes that are <5 kb away from intergenic SNPs:
from pybedtools import BedTool

snps = BedTool('snps.bed.gz')  # [1]
genes = BedTool('hg19.gff')    # [1]
intergenic_snps = snps.subtract(genes)                        # [2]
nearby = genes.closest(intergenic_snps, d=True, stream=True)  # [2, 3]
for gene in nearby:             # [4]
    if int(gene[-1]) < 5000:    # [4] last field is the distance added by d=True
        print(gene.name)        # [4]
Useful features shown here include:
[1] support for all BEDTools-supported formats (here, gzipped BED and GFF);
[2] wrapping of all BEDTools programs and arguments (here, subtract and closest, and passing the -d flag to closest);
[3] streaming results, like Unix pipes (here, specified by stream=True);
[4] iterating over results while accessing feature data by index or by attribute (here, [-1] and .name).
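To clarify what `subtract` and `closest` compute in the example above, here is the same logic sketched in pure Python on toy data. This is a hedged illustration only: pybedtools wraps the BEDTools programs rather than doing anything like this. Several simplifications apply:

- All coordinates are assumed to be BED-style 0-based, half-open.
- Distance is taken as the gap between intervals, which may differ by one from what `closest -d` reports.
- Ties and strand are ignored.
- The helper names and records are hypothetical.

Generators stand in for `stream=True`, so a stage consumes its input lazily, like a Unix pipe.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical toy records: (chrom, start, end, name), 0-based half-open.
Feature = Tuple[str, int, int, str]


def overlaps(a: Feature, b: Feature) -> bool:
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def subtract(features: Iterable[Feature], others: List[Feature]) -> Iterator[Feature]:
    # Keep whole features that overlap nothing in `others`, as `intersectBed -v` does.
    # For 1-bp SNPs this matches subtract; BEDTools subtract would instead trim
    # longer features that only partially overlap.
    for f in features:
        if not any(overlaps(f, o) for o in others):
            yield f


def gap(a: Feature, b: Feature) -> int:
    # Bases between two features on the same chromosome; 0 if they overlap.
    return max(0, b[1] - a[2], a[1] - b[2])


def closest(features: Iterable[Feature], others: List[Feature]) -> Iterator[Tuple[Feature, int]]:
    # Pair each feature with the distance to its nearest feature on the same chromosome.
    for f in features:
        dists = [gap(f, o) for o in others if o[0] == f[0]]
        if dists:
            yield f, min(dists)


snps = [("chr1", 150, 151, "rs1"), ("chr1", 1200, 1201, "rs2"), ("chr1", 90000, 90001, "rs3")]
genes = [("chr1", 100, 1000, "GENE_A"), ("chr1", 3000, 4000, "GENE_B"), ("chr1", 50000, 60000, "GENE_C")]

intergenic_snps = list(subtract(snps, genes))  # materialized, since closest() rescans it per gene
nearby = closest(genes, intergenic_snps)       # a lazy generator, loosely like stream=True
print([gene[3] for gene, dist in nearby if dist < 5000])  # ['GENE_A', 'GENE_B']
```

Here rs1 falls inside GENE_A and is removed. GENE_C's nearest intergenic SNP (rs3) is 30 kb away, so only GENE_A and GENE_B pass the 5 kb filter.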
In contrast, here is the same analysis using shell scripting. Note that this
requires knowledge of Perl, bash, and awk. The run time is identical to the
pybedtools version above:
snps=snps.bed.gz
genes=hg19.gff
intergenic_snps=/tmp/intergenic_snps
snp_fields=`zcat $snps | awk '(NR == 2){print NF; exit;}'`
gene_fields=9
distance_field=$(($gene_fields + $snp_fields + 1))
intersectBed -a $snps -b $genes -v > $intergenic_snps
closestBed -a $genes -b $intergenic_snps -d \
| awk -F'\t' -v f="$distance_field" '($f < 5000){print $9}' \
| perl -lne 'print $1 if m/(?:ID|Name|gene_id)=([^;]+)/'
rm $intergenic_snps
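For comparison, the text-processing half of that pipeline (the `awk` distance filter and the `perl` name extraction) could be written in plain Python. This is only a sketch, and it makes these assumptions:

- Each line is `closestBed -d` output: columns 1 to 9 are the GFF gene record, followed by the SNP columns, with the distance in the last column.
- Names come from the `ID`, `Name`, or `gene_id` attributes, the same keys the shell script targets. The first one present in the line is used.
- The function names and the example line are hypothetical.

```python
import re
from typing import Iterable, Iterator, Optional

# One GFF3-style "key=value" attribute whose key is exactly ID, Name, or gene_id.
NAME_ATTR = re.compile(r"(?:^|;)\s*(?:ID|Name|gene_id)=([^;]+)")


def gene_name(attributes: str) -> Optional[str]:
    # Value of the first ID/Name/gene_id attribute in GFF column 9, if any.
    match = NAME_ATTR.search(attributes)
    return match.group(1) if match else None


def nearby_gene_names(lines: Iterable[str], max_distance: int = 5000) -> Iterator[str]:
    # Split on tabs (not any whitespace), filter on the last column, then extract the name.
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if int(fields[-1]) < max_distance:
            name = gene_name(fields[8])
            if name is not None:
                yield name


example = "chr1\tsrc\tgene\t101\t1000\t.\t+\t.\tID=gene1;Name=GENE_A\tchr1\t1200\t1201\trs2\t200"
print(list(nearby_gene_names([example])))  # ['gene1']
```

Lines with no matching attribute are skipped.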
See the Shell script comparison section of the docs for more details, or keep reading the full documentation at http://daler.github.io/pybedtools.
As of 2022, pybedtools is released under the MIT license; see LICENSE.txt for
more info.
Note
If you use pybedtools in your work, please cite the pybedtools manuscript
and the BEDTools manuscript:
Dale RK, Pedersen BS, and Quinlan AR. 2011. Pybedtools: a flexible Python library for manipulating genomic datasets and annotations. Bioinformatics 27(24):3423–3424.
Quinlan AR and Hall IM. 2010. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26(6):841–842.
Getting started
The documentation is separated into four main parts, depending on the depth you'd like to cover:
Lazy, or just want to jump in? Check out Three brief examples to get a feel for the package.
Want a guided tour? Give the Tutorial Contents a shot.
More advanced features are described in the Topical Documentation section.
Finally, doctested module documentation can be found in pybedtools Reference.
Contents:
- Installation
- Running tests, compiling docs
- Three brief examples
- Tutorial Contents
- Topical Documentation
- Design principles
- Creating a BedTool
- Saving BedTool results
- Using BedTool objects as iterators/generators
- Low-level operations
- Working with BAM files
- Notes on BAM file semantics
- Specifying genomes
- Randomization
- Wrapping new tools
- Comparisons
- Shell script comparison
- pybedtools development model
- Under the hood
- FAQs
- “Does pybedtools have a simple reader/writer for BED files?”
- “Can I create a BedTool object from an existing list?”
- “I’m getting an empty BedTool”
- “I’m getting a MalformedBedLineError”
- “I get a segfault when iterating over a BedTool object”
- “Can I add extra information to FASTA headers when using BedTool.sequence()?”
- “Too many files open” error
- Scripts
- pybedtools Reference
- Changelog
- Changes in v0.12.0
- Changes in v0.11.0
- Changes in v0.10.1
- Changes in v0.9.1
- Changes in v0.9
- Changes in v0.8.2
- Changes in v0.8.1
- Changes in v0.8.0
- Changes in v0.7.10
- Changes in v0.7.9
- Changes in v0.7.8
- Changes in v0.7.7
- Changes in v0.7.6
- Changes in v0.7.5
- Changes in v0.7.4
- Changes in v0.7.1
- Changes in v0.7.0
- Changes in v0.6.9
- Changes in v0.6.8
- Changes in v0.6.7
- Changes in v0.6.6
- Changes in v0.6.5
- Changes in v0.6.4
- Changes in v0.6.3
- Changes in v0.6.2
- Changes in v0.6.1
- Changes in v0.6
- Changes in v0.5.5
- Changes in v0.5