pybedtools.bedtool.BedTool.genome_coverage¶
- BedTool.genome_coverage(*args, **kwargs)[source]¶
Wraps
bedtools genomecov
.Note that some invocations of
bedtools genomecov
do not result in a properly-formatted BED file. For example, the default behavior is to report a histogram of coverage. Iterating over the resulting, non-BED-format file will raise exceptions in pybedtools’ parser.Consider using the
BedTool.to_dataframe
method to convert these non-BED files into a pandas DataFrame for further use.Example usage:
BAM file input does not require a genome:
>>> a = pybedtools.example_bedtool('x.bam') >>> b = a.genome_coverage(bg=True) >>> b.head(3) chr2L 9329 9365 1 chr2L 10212 10248 1 chr2L 10255 10291 1
Other input does require a genome:
>>> a = pybedtools.example_bedtool('x.bed') >>> b = a.genome_coverage(bg=True, genome='dm3') >>> b.head(3) chr2L 9329 9365 1 chr2L 10212 10248 1 chr2L 10255 10291 1
Non-BED format results: >>> a = pybedtools.example_bedtool(‘x.bed’) >>> b = a.genome_coverage(genome=’dm3’) >>> df = b.to_dataframe(names=[‘chrom’, ‘depth’, ‘n’, ‘chromsize’, ‘fraction’])
For convenience, the file or stream this BedTool points to is implicitly passed as the
-i
argument togenomeCoverageBed
There are two alternatives for supplying a genome. Use
g="genome.filename"
if you have a genome’s chrom sizes saved as a file. This is the what BEDTools expects when using it from the command line. Alternatively, use thegenome="assembly.name"
(for example,genome="hg19"
) to use chrom sizes for that assembly without having to manage a separate file. Thegenome
argument triggers a callpybedtools.chromsizes
, so see that method for more details.Original BEDTools help::
Tool: bedtools genomecov (aka genomeCoverageBed) Version: v2.31.1 Summary: Compute the coverage of a feature file among a genome. Usage: bedtools genomecov [OPTIONS] -i <bed/gff/vcf> -g <genome> OR -ibam <bam/cram> Options: -ibam The input file is in BAM format. Note: BAM **must** be sorted by position -g Provide a genome file to define chromosome lengths. Note:Required when not using -ibam option. -d Report the depth at each genome position (with one-based coordinates). Default behavior is to report a histogram. -dz Report the depth at each genome position (with zero-based coordinates). Reports only non-zero positions. Default behavior is to report a histogram. -bg Report depth in BedGraph format. For details, see: genome.ucsc.edu/goldenPath/help/bedgraph.html -bga Report depth in BedGraph format, as above (-bg). However with this option, regions with zero coverage are also reported. This allows one to quickly extract all regions of a genome with 0 coverage by applying: "grep -w 0$" to the output. -split Treat "split" BAM or BED12 entries as distinct BED intervals. when computing coverage. For BAM files, this uses the CIGAR "N" and "D" operations to infer the blocks for computing coverage. For BED12 files, this uses the BlockCount, BlockStarts, and BlockEnds fields (i.e., columns 10,11,12). -ignoreD Ignore local deletions (CIGAR "D" operations) in BAM entries when computing coverage. -strand Calculate coverage of intervals from a specific strand. With BED files, requires at least 6 columns (strand is column 6). - (STRING): can be + or - -pc Calculate coverage of pair-end fragments. Works for BAM files only -fs Force to use provided fragment size instead of read length Works for BAM files only -du Change strand af the mate read (so both reads from the same strand) useful for strand specific Works for BAM files only -5 Calculate coverage of 5" positions (instead of entire interval). -3 Calculate coverage of 3" positions (instead of entire interval). -max Combine all positions with a depth >= max into a single bin in the histogram. Irrelevant for -d and -bedGraph - (INTEGER) -scale Scale the coverage by a constant factor. Each coverage value is multiplied by this factor before being reported. Useful for normalizing coverage by, e.g., reads per million (RPM). - Default is 1.0; i.e., unscaled. - (FLOAT) -trackline Adds a UCSC/Genome-Browser track line definition in the first line of the output. - See here for more details about track line definition: http://genome.ucsc.edu/goldenPath/help/bedgraph.html - NOTE: When adding a trackline definition, the output BedGraph can be easily uploaded to the Genome Browser as a custom track, BUT CAN NOT be converted into a BigWig file (w/o removing the first line). -trackopts Writes additional track line definition parameters in the first line. - Example: -trackopts 'name="My Track" visibility=2 color=255,30,30' Note the use of single-quotes if you have spaces in your parameters. - (TEXT) Notes: (1) The genome file should tab delimited and structured as follows: <chromName><TAB><chromSize> For example, Human (hg19): chr1 249250621 chr2 243199373 ... chr18**gl000207**random 4262 (2) The input BED (-i) file must be grouped by chromosome. A simple "sort -k 1,1 <BED> > <BED>.sorted" will suffice. (3) The input BAM (-ibam) file must be sorted by position. A "samtools sort <BAM>" should suffice. Tip 1. Use samtools faidx to create a genome file from a FASTA: One can the samtools faidx command to index a FASTA file. The resulting .fai index is suitable as a genome file, as bedtools will only look at the first two, relevant columns of the .fai file. For example: samtools faidx GRCh38.fa bedtools genomecov -i my.bed -g GRCh38.fa.fai Tip 2. Use UCSC Table Browser to create a genome file: One can use the UCSC Genome Browser's MySQL database to extract chromosome sizes. For example, H. sapiens: mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -e \ "select chrom, size from hg19.chromInfo" > hg19.genome