pybedtools.bedtool.BedTool.genome_coverage¶

BedTool.genome_coverage(*args, **kwargs)[source]¶

Wraps bedtools genomecov.

Note that some invocations of bedtools genomecov do not result in a properly-formatted BED file. For example, the default behavior is to report a histogram of coverage. Iterating over the resulting, non-BED-format file will raise exceptions in pybedtools’ parser.

Consider using the BedTool.to_dataframe method to convert these non-BED files into a pandas DataFrame for further use.

Example usage:

BAM file input does not require a genome:
>>> a = pybedtools.example_bedtool('x.bam')
>>> b = a.genome_coverage(bg=True)
>>> b.head(3) 
chr2L   9329    9365    1
chr2L   10212   10248   1
chr2L   10255   10291   1
Other input does require a genome:
>>> a = pybedtools.example_bedtool('x.bed')
>>> b = a.genome_coverage(bg=True, genome='dm3')
>>> b.head(3) 
chr2L   9329    9365    1
chr2L   10212   10248   1
chr2L   10255   10291   1
Non-BED format results: >>> a = pybedtools.example_bedtool(‘x.bed’) >>> b = a.genome_coverage(genome=’dm3’) >>> df = b.to_dataframe(names=[‘chrom’, ‘depth’, ‘n’, ‘chromsize’, ‘fraction’])

For convenience, the file or stream this BedTool points to is implicitly passed as the -i argument to genomeCoverageBed

There are two alternatives for supplying a genome. Use g="genome.filename" if you have a genome’s chrom sizes saved as a file. This is the what BEDTools expects when using it from the command line. Alternatively, use the genome="assembly.name" (for example, genome="hg19") to use chrom sizes for that assembly without having to manage a separate file. The genome argument triggers a call pybedtools.chromsizes, so see that method for more details.

Original BEDTools help::

Tool:    bedtools genomecov (aka genomeCoverageBed)
Version: v2.31.1
Summary: Compute the coverage of a feature file among a genome.

Usage: bedtools genomecov [OPTIONS] -i <bed/gff/vcf> -g <genome> OR -ibam <bam/cram>

Options: 
        -ibam           The input file is in BAM format.
                        Note: BAM **must** be sorted by position

        -g              Provide a genome file to define chromosome lengths.
                        Note:Required when not using -ibam option.

        -d              Report the depth at each genome position (with one-based coordinates).
                        Default behavior is to report a histogram.

        -dz             Report the depth at each genome position (with zero-based coordinates).
                        Reports only non-zero positions.
                        Default behavior is to report a histogram.

        -bg             Report depth in BedGraph format. For details, see:
                        genome.ucsc.edu/goldenPath/help/bedgraph.html

        -bga            Report depth in BedGraph format, as above (-bg).
                        However with this option, regions with zero 
                        coverage are also reported. This allows one to
                        quickly extract all regions of a genome with 0 
                        coverage by applying: "grep -w 0$" to the output.

        -split          Treat "split" BAM or BED12 entries as distinct BED intervals.
                        when computing coverage.
                        For BAM files, this uses the CIGAR "N" and "D" operations 
                        to infer the blocks for computing coverage.
                        For BED12 files, this uses the BlockCount, BlockStarts, and BlockEnds
                        fields (i.e., columns 10,11,12).

        -ignoreD        Ignore local deletions (CIGAR "D" operations) in BAM entries
                        when computing coverage.

        -strand         Calculate coverage of intervals from a specific strand.
                        With BED files, requires at least 6 columns (strand is column 6). 
                        - (STRING): can be + or -

        -pc             Calculate coverage of pair-end fragments.
                        Works for BAM files only
        -fs             Force to use provided fragment size instead of read length
                        Works for BAM files only
        -du             Change strand af the mate read (so both reads from the same strand) useful for strand specific
                        Works for BAM files only
        -5              Calculate coverage of 5" positions (instead of entire interval).

        -3              Calculate coverage of 3" positions (instead of entire interval).

        -max            Combine all positions with a depth >= max into
                        a single bin in the histogram. Irrelevant
                        for -d and -bedGraph
                        - (INTEGER)

        -scale          Scale the coverage by a constant factor.
                        Each coverage value is multiplied by this factor before being reported.
                        Useful for normalizing coverage by, e.g., reads per million (RPM).
                        - Default is 1.0; i.e., unscaled.
                        - (FLOAT)

        -trackline      Adds a UCSC/Genome-Browser track line definition in the first line of the output.
                        - See here for more details about track line definition:
                              http://genome.ucsc.edu/goldenPath/help/bedgraph.html
                        - NOTE: When adding a trackline definition, the output BedGraph can be easily
                              uploaded to the Genome Browser as a custom track,
                              BUT CAN NOT be converted into a BigWig file (w/o removing the first line).

        -trackopts      Writes additional track line definition parameters in the first line.
                        - Example:
                           -trackopts 'name="My Track" visibility=2 color=255,30,30'
                           Note the use of single-quotes if you have spaces in your parameters.
                        - (TEXT)

Notes: 
        (1) The genome file should tab delimited and structured as follows:
         <chromName><TAB><chromSize>

        For example, Human (hg19):
        chr1    249250621
        chr2    243199373
        ...
        chr18**gl000207**random 4262

        (2) The input BED (-i) file must be grouped by chromosome.
         A simple "sort -k 1,1 <BED> > <BED>.sorted" will suffice.

        (3) The input BAM (-ibam) file must be sorted by position.
         A "samtools sort <BAM>" should suffice.

Tip 1. Use samtools faidx to create a genome file from a FASTA: 
        One can the samtools faidx command to index a FASTA file.
        The resulting .fai index is suitable as a genome file, 
        as bedtools will only look at the first two, relevant columns
        of the .fai file.

        For example:
        samtools faidx GRCh38.fa
        bedtools genomecov -i my.bed -g GRCh38.fa.fai

Tip 2. Use UCSC Table Browser to create a genome file: 
        One can use the UCSC Genome Browser's MySQL database to extract
        chromosome sizes. For example, H. sapiens:

        mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -e \
        "select chrom, size from hg19.chromInfo"  > hg19.genome

pybedtools.bedtool.BedTool.genome_coverage¶

pybedtools

Navigation

Related Topics