pybedtools Reference

The following tables summarize the methods and functions; click on a method or function name to see the complete documentation.

BedTool creation

The main BedTool documentation, with a list of all methods in alphabetical order at the bottom. For more details, please see Creating a BedTool.

pybedtools.bedtool.BedTool([fn, …])

BEDTools wrappers

These methods wrap BEDTools programs for easy use with Python; you can then use the other pybedtools functionality for further manipulation and analysis.

The documentation of each of these methods starts with pybedtools-specific documentation, possibly followed by an example. Finally, the BEDTools help is copied verbatim from whatever version was installed when generating these docs.

In general, the BEDTools wrapper methods adhere to the Design principles:

pybedtools.bedtool.BedTool.intersect(*args, …) Wraps bedtools intersect.
pybedtools.bedtool.BedTool.window(*args, …) Wraps bedtools window.
pybedtools.bedtool.BedTool.closest(*args, …) Wraps bedtools closest.
pybedtools.bedtool.BedTool.coverage(*args, …) Wraps bedtools coverage.
pybedtools.bedtool.BedTool.map(*args, **kwargs) Wraps bedtools map; See also BedTool.each().
pybedtools.bedtool.BedTool.genome_coverage(…) Wraps bedtools genomecov.
pybedtools.bedtool.BedTool.merge(*args, **kwargs) Wraps bedtools merge.
pybedtools.bedtool.BedTool.cluster(*args, …) Wraps bedtools cluster.
pybedtools.bedtool.BedTool.complement(*args, …) Wraps bedtools complement.
pybedtools.bedtool.BedTool.subtract(*args, …) Wraps bedtools subtract.
pybedtools.bedtool.BedTool.slop(*args, **kwargs) Wraps bedtools slop.
pybedtools.bedtool.BedTool.flank(*args, **kwargs) Wraps bedtools flank.
pybedtools.bedtool.BedTool.sort(*args, **kwargs) Wraps bedtools sort.
pybedtools.bedtool.BedTool.random(*args, …) Wraps bedtools random.
pybedtools.bedtool.BedTool.shuffle(*args, …) Wraps bedtools shuffle.
pybedtools.bedtool.BedTool.annotate(*args, …) Wraps bedtools annotate.
pybedtools.bedtool.BedTool.multi_intersect(…) Wraps bedtools multiintersect.
pybedtools.bedtool.BedTool.union_bedgraphs(…) Wraps bedtools unionbedg.
pybedtools.bedtool.BedTool.pair_to_bed(…) Wraps bedtools pairtobed.
pybedtools.bedtool.BedTool.pair_to_pair(…) Wraps bedtools pairtopair.
pybedtools.bedtool.BedTool.bam_to_bed(*args, …) Wraps bedtools bamtobed.
pybedtools.bedtool.BedTool.to_bam(*args, …) Wraps bedtools bedtobam.
pybedtools.bedtool.BedTool.bedpe_to_bam(…) Wraps bedtools bedpetobam.
pybedtools.bedtool.BedTool.bed6(*args, **kwargs) Wraps bedtools bed12tobed6.
pybedtools.bedtool.BedTool.bam_to_fastq(…) Wraps bedtools bamtofastq.
pybedtools.bedtool.BedTool.sequence(*args, …) Wraps bedtools getfasta.
pybedtools.bedtool.BedTool.mask_fasta(*args, …) Wraps bedtools maskfasta.
pybedtools.bedtool.BedTool.nucleotide_content(…) Wraps bedtools nuc.
pybedtools.bedtool.BedTool.multi_bam_coverage(…) Wraps bedtools multicov.
pybedtools.bedtool.BedTool.tag_bam(*args, …) Wraps bedtools tag.
pybedtools.bedtool.BedTool.jaccard(*args, …) Returns a dictionary with keys (intersection, union, jaccard).
pybedtools.bedtool.BedTool.reldist(*args, …) If detail=False, then return a dictionary with keys (reldist, count,
pybedtools.bedtool.BedTool.overlap(*args, …) Wraps bedtools overlap.
pybedtools.bedtool.BedTool.links(*args, **kwargs) Wraps linksBed.
pybedtools.bedtool.BedTool.igv(*args, **kwargs) Wraps bedtools igv.
pybedtools.bedtool.BedTool.window_maker(…) Wraps bedtools makewindows.
pybedtools.bedtool.BedTool.groupby(*args, …) Wraps bedtools groupby.
pybedtools.bedtool.BedTool.expand(*args, …) Wraps bedtools expand.
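
As a hedged illustration of the wrapper pattern, here is a minimal sketch that assumes pybedtools and the BEDTools binaries are installed. Keyword arguments are passed through as the corresponding BEDTools flags (here u=True for -u), and each wrapper returns a new BedTool, so calls can be chained. The helper name features_with_any_overlap and its BED-formatted string arguments are hypothetical and exist only for this example.

```python
def features_with_any_overlap(a_text, b_text):
    """Return (chrom, start, end) for features of `a_text` overlapping `b_text`.

    Hypothetical helper for illustration; both arguments are BED-formatted
    strings, one feature per line.
    """
    # Imported lazily so the function can be defined without pybedtools installed.
    import pybedtools

    a = pybedtools.BedTool(a_text, from_string=True)
    b = pybedtools.BedTool(b_text, from_string=True)

    # u=True corresponds to `bedtools intersect -u`: report each feature in
    # `a` once if it overlaps anything in `b`. The result is a new BedTool,
    # so another wrapper (sort) can be chained directly.
    hits = a.intersect(b, u=True).sort()

    return [(f.chrom, f.start, f.end) for f in hits]
```

Called with "chr1 1 100\nchr1 500 600" and "chr1 50 150", this would be expected to keep only the first feature, since the second does not overlap anything in the other input.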

Other BedTool methods

These methods are some of the ways in which pybedtools extends the BEDTools suite.

Feature-by-feature operations

Methods that operate on a feature-by-feature basis to modify or filter features on the fly.

pybedtools.bedtool.BedTool.each(func, *args, …) Modify each feature with a user-defined function.
pybedtools.bedtool.BedTool.filter(func, …) Filter features by user-defined function.
pybedtools.bedtool.BedTool.split(func, …) Split each feature using a user-defined function.
pybedtools.bedtool.BedTool.truncate_to_chrom(genome) Ensure all features fall within chromosome limits.
pybedtools.bedtool.BedTool.remove_invalid(…) Remove invalid features that may break BEDTools programs.

The pybedtools.featurefuncs module contains some commonly-used functions that can be passed to BedTool.each():

pybedtools.featurefuncs.three_prime Returns the 3’-most coordinate, plus upstream and downstream bp; adds the string add_to_name to the feature’s name if provided (e.g., “_polyA_site”)
pybedtools.featurefuncs.five_prime Returns the 5’-most coordinate, plus upstream and downstream bp; adds the string add_to_name to the feature’s name if provided (e.g., “_TSS”)
pybedtools.featurefuncs.TSS Alias for five_prime.
pybedtools.featurefuncs.extend_fields Pads the fields of the feature with “.” to a total length of n fields,
pybedtools.featurefuncs.center Return the width bp from the center of a feature.
pybedtools.featurefuncs.midpoint Specialized version of center() that just returns the single-bp midpoint
pybedtools.featurefuncs.normalized_to_length Normalizes the value at feature[idx] to the feature’s length, in kb.
pybedtools.featurefuncs.rename Forces a rename of all features, e.g., for renaming everything in a file ‘exon’
pybedtools.featurefuncs.greater_than Return True if feature length > size
pybedtools.featurefuncs.less_than Return True if feature length < size
pybedtools.featurefuncs.bedgraph_scale
pybedtools.featurefuncs.add_color Signature:
pybedtools.featurefuncs.gff2bed Signature:
pybedtools.featurefuncs.bed2gff Signature:
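
To make the strand-aware behavior of a function like five_prime concrete, here is a pure-Python sketch of the idea on plain coordinates. It is an illustration under stated assumptions, not the library's implementation: the function name five_prime_window, the default window sizes, and the clamping of negative starts to 0 are all choices made for this example. Coordinates are assumed to be 0-based and half-open, as in BED.

```python
def five_prime_window(start, stop, strand, upstream=500, downstream=500):
    """Window around the 5'-most coordinate of a feature.

    Hypothetical helper: works on plain integers rather than Interval objects.
    """
    if strand == "-":
        # On the minus strand the 5' end is the stop coordinate, and
        # "upstream" points toward larger coordinates.
        new_start = stop - downstream
        new_stop = stop + upstream
    else:
        # Plus strand (or unstranded): the 5' end is the start coordinate.
        new_start = start - upstream
        new_stop = start + downstream
    # Assumption for this sketch: never return a negative start.
    return max(new_start, 0), new_stop


print(five_prime_window(1000, 2000, "+", upstream=100, downstream=50))
print(five_prime_window(1000, 2000, "-", upstream=100, downstream=50))
```

With pybedtools itself, a per-feature function of this kind is what gets passed to BedTool.each(), which applies it to every feature in turn.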

Searching for features

These methods take a single interval as input and return the intervals of the BedTool that overlap.

This can be useful when searching across many BED files for a particular coordinate range – for example, they can be used to identify all binding sites, stored in many different BED files, that fall within a gene’s coordinates.

pybedtools.bedtool.BedTool.all_hits(interval) Return all intervals that overlap interval.
pybedtools.bedtool.BedTool.any_hits(interval) Return whether or not any intervals overlap interval.
pybedtools.bedtool.BedTool.count_hits(interval) Return the number of intervals that overlap interval.
pybedtools.bedtool.BedTool.tabix_intervals(…) Retrieve all intervals within coordinates from a “tabixed” BedTool.
pybedtools.bedtool.BedTool.tabix([in_place, …]) Prepare a BedTool for use with Tabix.
pybedtools.bedtool.BedTool.bgzip([in_place, …]) Helper function for more control over “tabixed” BedTools.
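
The three search flavors (all hits, any hit, number of hits) can be sketched in pure Python on (chrom, start, end) tuples. This is a conceptual illustration, not pybedtools code: the function names, the file names, and the coordinates are hypothetical, and intervals are assumed to be 0-based and half-open.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share at least 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def find_all_hits(intervals, query):
    """All intervals that overlap `query`."""
    return [iv for iv in intervals if overlaps(iv, query)]


def has_any_hits(intervals, query):
    """Whether at least one interval overlaps `query`."""
    return any(overlaps(iv, query) for iv in intervals)


def count_all_hits(intervals, query):
    """Number of intervals that overlap `query`."""
    return len(find_all_hits(intervals, query))


# Hypothetical binding sites from two files, searched with one gene's coordinates.
sites_by_file = {
    "tf1.bed": [("chr1", 100, 200), ("chr1", 900, 950)],
    "tf2.bed": [("chr1", 150, 160), ("chr2", 100, 200)],
}
gene = ("chr1", 120, 500)
for name, sites in sites_by_file.items():
    print(name, count_all_hits(sites, gene), find_all_hits(sites, gene))
```

The linear scan here is only for clarity; it says nothing about how the BedTool methods locate overlaps internally.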

BedTool introspection

These methods provide information on the BedTool object.

If using BedTool.head(), don’t forget that you can index into BedTool objects, too.

pybedtools.bedtool.BedTool.head([n, as_string]) Prints the first n lines or returns them if as_string is True
pybedtools.bedtool.BedTool.count() Count the number of features in this BedTool.
pybedtools.bedtool.BedTool.field_count([n]) Number of fields in each line of this BedTool (checks n lines)
pybedtools.bedtool.BedTool.file_type Return the type of the current file.

Randomization helpers

Helper methods useful for assessing empirical intersection distributions between interval files.

pybedtools.bedtool.BedTool.parallel_apply(…) Generalized method for applying a function in parallel.
pybedtools.bedtool.BedTool.randomstats(…) Dictionary of results from many randomly shuffled intersections.
pybedtools.bedtool.BedTool.randomintersection(…) Perform iterations shufflings, each time intersecting with other.
pybedtools.bedtool.BedTool.randomintersection_bp(…) Like randomintersection, but return the bp overlap instead of the number of intersecting intervals.
pybedtools.bedtool.BedTool.random_subset(…) Return a BedTool containing a random subset.
pybedtools.bedtool.BedTool.random_jaccard(other) Computes the naive Jaccard statistic (intersection divided by union).
pybedtools.bedtool.BedTool.random_op(*args, …) For backwards compatibility; see BedTool.parallel_apply instead.
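
The idea behind an empirical intersection distribution can be sketched in pure Python: shuffle one set of intervals many times, count overlaps with the other set each time, and compare the observed count with the shuffled counts. Everything below is an assumption made for illustration (a single chromosome, the function names, the returned keys, and the add-one correction in the p-value); it does not reproduce the output format of the methods listed above.

```python
import random


def count_overlapping(query, target):
    """Number of query (start, end) intervals overlapping any target interval."""
    return sum(any(qs < te and ts < qe for ts, te in target) for qs, qe in query)


def shuffle_intervals(intervals, chrom_length, rng):
    """Move each interval to a random position, preserving its length."""
    shuffled = []
    for start, end in intervals:
        length = end - start
        new_start = rng.randint(0, chrom_length - length)  # inclusive bounds
        shuffled.append((new_start, new_start + length))
    return shuffled


def empirical_overlap(query, target, chrom_length, iterations=1000, seed=0):
    """Compare the observed overlap count with counts from shuffled queries."""
    rng = random.Random(seed)  # seeded so repeated runs give the same result
    observed = count_overlapping(query, target)
    null = [
        count_overlapping(shuffle_intervals(query, chrom_length, rng), target)
        for _ in range(iterations)
    ]
    # Fraction of shuffles with at least as many overlaps as observed,
    # with an add-one correction so the estimate is never exactly zero.
    p_value = (1 + sum(n >= observed for n in null)) / (iterations + 1)
    return {"observed": observed, "null": null, "p_value": p_value}
```

A call such as empirical_overlap([(10, 20), (100, 120)], [(15, 30)], chrom_length=1000) returns the observed count alongside the shuffled counts, which can then be plotted or summarized.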

Managing BedTool objects on disk

These methods are used to specify where to save results from BedTool operations.

pybedtools.bedtool.BedTool.saveas(*args, …) Make a copy of the BedTool.
pybedtools.bedtool.BedTool.moveto(*args, …) Move to a new filename (can be much quicker than BedTool.saveas())

Misc operations

Methods that can’t quite be categorized into the above sections.

pybedtools.bedtool.BedTool.cat(*args, **kwargs) Concatenate interval files together.
pybedtools.bedtool.BedTool.at(inds) Returns a new BedTool with only intervals at lines inds
pybedtools.bedtool.BedTool.absolute_distance(other) Returns an iterator of the absolute distances between features in self and other.
pybedtools.bedtool.BedTool.cut(indexes[, stream]) Analogous to unix cut.
pybedtools.bedtool.BedTool.total_coverage() Return the total number of bases covered by this interval file.
pybedtools.bedtool.BedTool.with_attrs(*args, …) Helper method for adding attributes in the middle of a pipeline.
pybedtools.bedtool.BedTool.as_intervalfile() Returns an IntervalFile of this BedTool for low-level interface.
pybedtools.bedtool.BedTool.introns([gene, exon]) Create intron features (requires specific input format).
pybedtools.bedtool.BedTool.set_chromsizes(…) Prepare BedTool for operations that require chromosome coords.
pybedtools.bedtool.BedTool.print_sequence() Print the sequence that was retrieved by BedTool.sequence.
pybedtools.bedtool.BedTool.save_seqs(fn) Save sequences, after calling BedTool.sequence.
pybedtools.bedtool.BedTool.seq(loc, fasta) Return just the sequence from a region string or a single location.
    >>> fn = pybedtools.example_filename('test.fa')
    >>> BedTool.seq('chr1:2-10', fn)
    'GATGAGTCT'
    >>> BedTool.seq(('chr1', 1, 10), fn)
    'GATGAGTCT'
pybedtools.bedtool.BedTool.liftover(chainfile) Returns a new BedTool of the liftedOver features, saving the unmapped ones as unmapped.
pybedtools.bedtool.BedTool.colormap_normalize([…]) Returns a normalization instance for use by featurefuncs.add_color().
pybedtools.bedtool.BedTool.relative_distance(other) Returns an iterator of relative distances between features in self and other.
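
As a worked illustration of what "total number of bases covered" means when features overlap, here is a pure-Python sketch that merges overlapping or touching intervals before summing their lengths, so that no base is counted twice. The function name total_covered_bases and the tuple representation are assumptions for this example; it is not the library's implementation.

```python
def total_covered_bases(intervals):
    """Count bases covered by (chrom, start, end) intervals, counting overlaps once.

    Intervals are assumed to be 0-based and half-open, as in BED.
    """
    total = 0
    current_chrom, current_start, current_end = None, 0, 0
    for chrom, start, end in sorted(intervals):
        if chrom == current_chrom and start <= current_end:
            # Overlaps or touches the running block: extend it.
            current_end = max(current_end, end)
        else:
            # Close the previous block and start a new one.
            total += current_end - current_start
            current_chrom, current_start, current_end = chrom, start, end
    # Add the last open block.
    return total + (current_end - current_start)


features = [("chr1", 0, 100), ("chr1", 50, 150), ("chr1", 200, 250), ("chr2", 0, 10)]
print(total_covered_bases(features))
```

Here the first two features merge into a single 150 bp block, so summing the raw lengths (260) would overcount.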

Module-level functions

Working with example files

pybedtools comes with many example files. Here are some useful functions for accessing them.

pybedtools.bedtool.example_bedtool(fn) Return a bedtool using a bed file from the pybedtools examples directory.
pybedtools.filenames.list_example_files() Returns a list of files in the examples dir.
pybedtools.filenames.example_filename(fn) Return a bed file from the pybedtools examples directory.

Creating Interval objects from scratch

Interval objects are the core object in pybedtools to represent a genomic interval, written in Cython for speed.

pybedtools.cbedtools.Interval Class to represent a genomic interval.
pybedtools.cbedtools.create_interval_from_list Create an Interval object from a list of strings.

pybedtools setup and config

Use these functions right after importing in order to use custom paths or to clean up the temp directory.

pybedtools.helpers.set_bedtools_path([path]) Explicitly set path to BEDTools installation dir.
pybedtools.helpers.get_tempdir() Gets the current tempdir for the module.
pybedtools.helpers.set_tempdir(tempdir) Set the directory for temp files.
pybedtools.helpers.cleanup([verbose, remove_all]) Deletes all temp files from the current session (or optionally all sessions)
pybedtools.debug_mode(x) Enable debug mode.
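
A minimal sketch of how these setup functions might be combined at the start and end of a session, assuming pybedtools is installed. The wrapper names configure_tempdir and finish_session are hypothetical, and creating the directory first is an assumption made to keep the example self-contained.

```python
import os


def configure_tempdir(tempdir):
    """Point pybedtools at a custom temp directory and return the active one."""
    # Imported lazily so this sketch can be defined without pybedtools installed.
    from pybedtools import helpers

    # Assumption: make sure the directory exists before handing it over.
    os.makedirs(tempdir, exist_ok=True)
    helpers.set_tempdir(tempdir)
    return helpers.get_tempdir()


def finish_session():
    """Delete temp files created during the current session only."""
    from pybedtools import helpers

    # remove_all=False leaves temp files from other sessions untouched.
    helpers.cleanup(remove_all=False)
```

Calling configure_tempdir() right after import, and finish_session() once results have been saved elsewhere, keeps intermediate files out of the system temp directory.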

Working with “chromsizes” or assembly coordinate files

Many BEDTools programs need “genome files” or “chromsizes” files so as to remain within the coordinates of the assembly you’re working on. These functions help manage these files.

pybedtools.helpers.get_chromsizes_from_ucsc(genome) Download chrom size info for genome from UCSC and returns the dictionary.
pybedtools.helpers.chromsizes(genome) Looks for a genome already included in the genome registry; if not found then it looks it up on UCSC.
pybedtools.helpers.chromsizes_to_file(…[, fn]) Converts a chromsizes dictionary to a file.
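
To show what a "genome file" looks like on disk, here is a stdlib-only sketch that writes a chromsizes dictionary as tab-delimited chromosome and size lines. The dictionary layout {chrom: (start, end)}, the function name write_genome_file, and the toy sizes are assumptions for illustration; this is not the code behind chromsizes_to_file.

```python
import os
import tempfile


def write_genome_file(chromsizes, path):
    """Write {chrom: (start, end)} as `chrom<TAB>size` lines, sorted by name."""
    with open(path, "w") as out:
        for chrom, (start, end) in sorted(chromsizes.items()):
            out.write(f"{chrom}\t{end - start}\n")
    return path


# Toy assembly with made-up sizes.
toy_sizes = {"chr1": (0, 1000), "chr2": (0, 500)}
genome_path = os.path.join(tempfile.mkdtemp(), "toy.genome")
write_genome_file(toy_sizes, genome_path)
with open(genome_path) as handle:
    print(handle.read())
```

A file in this two-column form is what BEDTools programs expect wherever they ask for a genome file.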

Performing operations in parallel (multiprocessing)

pybedtools.parallel.parallel_apply(…[, …]) Call an arbitrary BedTool method many times in parallel.

pybedtools.contrib

The pybedtools.contrib module contains higher-level code that leverages BedTool objects for common analyses.

Plotting

Plotting results from BEDTools/pybedtools operations is very useful for exploring and understanding the tools as well as for teaching purposes.

pybedtools.contrib.plotting.Track(features)
pybedtools.contrib.plotting.TrackCollection(config)
pybedtools.contrib.plotting.binary_heatmap(…) Plots a “binary heatmap”, showing the results of a multi-intersection.
pybedtools.contrib.plotting.binary_summary(d) Convenience function useful for printing the results from binary_heatmap().
pybedtools.contrib.plotting.BedToolsDemo(…)
pybedtools.contrib.plotting.ConfiguredBedToolsDemo(…)

Working with bigWig files

At this time, pybedtools does not support reading bigWig files, only creating them via UCSC utilities.

pybedtools.contrib.bigwig.bam_to_bigwig(bam, …) Given a BAM file bam and assembly genome, create a bigWig file scaled such that the values represent scaled reads – that is, reads per million mapped reads.
pybedtools.contrib.bigwig.bedgraph_to_bigwig(…)
pybedtools.contrib.bigwig.wig_to_bigwig(wig, …)

Working with bigBed files

pybedtools.contrib.bigbed.bigbed(x, genome, …) Converts a BedTool object to a bigBed format and returns the new filename.
pybedtools.contrib.bigbed.bigbed_to_bed(fn)

IntersectionMatrix

The IntersectionMatrix class makes it easy to intersect a large number of interval files with each other.

pybedtools.contrib.IntersectionMatrix(beds, …) Class to handle many pairwise comparisons of interval files

contrib.venn_maker

The venn_maker module helps you make Venn diagrams using the R package VennDiagram.

Note that Venn diagrams are not well suited to nested intersections. See the docs for pybedtools.contrib.venn_maker.cleaned_intersect() and its source for more details.

pybedtools.contrib.venn_maker Interface between pybedtools and the R package VennDiagram.
pybedtools.contrib.venn_maker.venn_maker(beds) Given a list of interval files, write an R script to create a Venn diagram of overlaps (and optionally run it).
pybedtools.contrib.venn_maker.cleaned_intersect(items) Perform interval intersections such that the end products have identical features for overlapping intervals.

contrib.long_range_interaction

pybedtools.contrib.long_range_interaction.tag_bedpe(…) Tag each end of a BEDPE with a set of (possibly many) query BED files.
pybedtools.contrib.long_range_interaction.cis_trans_interactions(…) Converts the output from tag_bedpe into a pandas DataFrame containing information about regions that contact each other in cis (same fragment) or trans (different fragments).

Scripts

These scripts demonstrate ways of using pybedtools for genomic analyses.

Typically a script will be added here and if the functionality is useful, it is abstracted out into a more powerful and flexible module. For example, the pybedtools.contrib.venn_maker module is a more powerful and flexible way of making Venn diagrams than the simpler venn_mpl and venn_gchart scripts below.

Another example is the pybedtools.contrib.IntersectionMatrix class, which extends the intersection_matrix.py script. The class stores results and timestamps in a local sqlite3 database to avoid re-computing up-to-date results.

pybedtools.scripts.venn_mpl Given 3 files, creates a 3-way Venn diagram of intersections using matplotlib; see pybedtools.contrib.venn_maker for more flexibility.
pybedtools.scripts.venn_gchart Given 3 files, creates a 3-way Venn diagram of intersections using the Google Chart API; see pybedtools.contrib.venn_maker for more flexibility.
pybedtools.scripts.intersection_matrix Create a matrix of many pairwise intersections; see pybedtools.contrib.IntersectionMatrix for more flexibility.
pybedtools.scripts.annotate Annotate a file with the nearest features in another.
pybedtools.scripts.intron_exon_reads Example from pybedtools documentation: find reads in introns and exons using multiple CPUs.
pybedtools.scripts.py_ms_example Example from the manuscript; see sh_ms_example.sh for the shell script equivalent.