pybedtools.bedtool.BedTool.shuffle

BedTool.shuffle(*args, **kwargs)

Wraps bedtools shuffle.

Example usage:

>>> a = pybedtools.example_bedtool('a.bed')
>>> seed = 1 # so this test always returns the same results
>>> b = a.shuffle(genome='hg19', chrom=True, seed=seed)
>>> print(b) 
chr1    123081365       123081464       feature1        0       +
chr1    243444570       243444670       feature2        0       +
chr1    194620241       194620591       feature3        0       -
chr1    172792873       172792923       feature4        0       +

For convenience, the file or stream this BedTool points to is implicitly passed as the -i argument to shuffleBed.
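As a rough illustration of how keyword arguments become command-line options, the sketch below builds a bedtools shuffle argument list by hand. It is a simplified model, not pybedtools' internals: the helper name build_shuffle_cmd is hypothetical, and it assumes that True becomes a bare flag, False drops the flag, and any other value is passed as "-key value". The genome= lookup is not modeled.

```python
from typing import Any, List


def build_shuffle_cmd(bed_path: str, **kwargs: Any) -> List[str]:
    """Hypothetical helper: turn keyword arguments into a bedtools shuffle argv.

    Assumed convention (a simplification, not pybedtools' real code):
      * value True  -> bare flag, e.g. chrom=True -> "-chrom"
      * value False -> flag omitted
      * other value -> "-key value", e.g. seed=1 -> "-seed 1"
    """
    # The BedTool's own file is supplied implicitly as -i.
    cmd = ["bedtools", "shuffle", "-i", bed_path]
    for key, value in kwargs.items():
        if value is True:
            cmd.append("-" + key)
        elif value is False:
            continue
        else:
            cmd.extend(["-" + key, str(value)])
    return cmd


print(build_shuffle_cmd("a.bed", g="hg19.genome", chrom=True, seed=1))
# -> ['bedtools', 'shuffle', '-i', 'a.bed', '-g', 'hg19.genome', '-chrom', '-seed', '1']
```

Under these assumptions, chrom=True and seed=1 from the example above correspond to -chrom -seed 1 on the command line.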

There are two alternatives for supplying a genome. Use g="genome.filename" if you have a genome’s chrom sizes saved as a file; this is what BEDTools expects when using it from the command line. Alternatively, use genome="assembly.name" (for example, genome="hg19") to use chrom sizes for that assembly without having to manage a separate file. The genome argument triggers a call to pybedtools.chromsizes(), so see that function for more details.
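If you go the g= route, the genome file is just the two-column, tab-delimited layout described in the BEDTools notes (<chromName><TAB><chromSize>). A minimal sketch, assuming you already have the sizes in a dict; write_genome_file is a hypothetical helper, and the two hg19 sizes are the ones quoted in the BEDTools help on this page.

```python
import os
import tempfile
from typing import Dict


def write_genome_file(sizes: Dict[str, int], path: str) -> str:
    """Hypothetical helper: write chrom sizes as <chromName><TAB><chromSize> lines."""
    with open(path, "w") as fh:
        for chrom, size in sizes.items():
            fh.write(f"{chrom}\t{size}\n")
    return path


# hg19 sizes for chr1 and chr2, as listed in the BEDTools help text.
sizes = {"chr1": 249250621, "chr2": 243199373}
genome_path = write_genome_file(sizes, os.path.join(tempfile.mkdtemp(), "hg19.genome"))

with open(genome_path) as fh:
    print(fh.read(), end="")
```

The resulting path can then be passed as a.shuffle(g=genome_path, chrom=True, seed=1).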

Original BEDTools help:

Tool:    bedtools shuffle (aka shuffleBed)
Version: v2.31.1
Summary: Randomly permute the locations of a feature file among a genome.

Usage:   bedtools shuffle [OPTIONS] -i <bed/gff/vcf> -g <genome>

Options: 
        -excl   A BED/GFF/VCF file of coordinates in which features in -i
                should not be placed (e.g. gaps.bed).

        -incl   Instead of randomly placing features in a genome, the -incl
                option defines a BED/GFF/VCF file of coordinates in which 
                features in -i should be randomly placed (e.g. genes.bed). 
                Larger -incl intervals will contain more shuffled regions. 
                This method DISABLES -chromFirst. 
        -chrom  Keep features in -i on the same chromosome.
                - By default, the chrom and position are randomly chosen.
                - NOTE: Forces use of -chromFirst (see below).

        -seed   Supply an integer seed for the shuffling.
                - By default, the seed is chosen automatically.
                - (INTEGER)

        -f      Maximum overlap (as a fraction of the -i feature) with an -excl
                feature that is tolerated before searching for a new, 
                randomized locus. For example, -f 0.10 allows up to 10%
                of a randomized feature to overlap with a given feature
                in the -excl file. **Cannot be used with -incl file.**
                - Default is 1E-9 (i.e., 1bp).
                - FLOAT (e.g. 0.50)

        -chromFirst     
                Instead of choosing a position randomly among the entire
                genome (the default), first choose a chrom randomly, and then
                choose a random start coordinate on that chrom.  This leads
                to features being ~uniformly distributed among the chroms,
                as opposed to features being distributed as a function of chrom size.

        -bedpe  Indicate that the A file is in BEDPE format.

        -maxTries       
                Max. number of attempts to find a home for a shuffled interval
                in the presence of -incl or -excl.
                Default = 1000.
        -noOverlapping  
                Don't allow shuffled intervals to overlap.
        -allowBeyondChromEnd    
                Allow shuffled intervals to be relocated to a position
                in which the entire original interval cannot fit w/o exceeding
                the end of the chromosome.  In this case, the end coordinate of the
                shuffled interval will be set to the chromosome's length.
                By default, an interval's original length must be fully-contained
                within the chromosome.
Notes: 
        (1)  The genome file should be tab delimited and structured as follows:
             <chromName><TAB><chromSize>

        For example, Human (hg19):
        chr1    249250621
        chr2    243199373
        ...
        chr18_gl000207_random    4262

Tip 1. Use samtools faidx to create a genome file from a FASTA: 
        One can use the samtools faidx command to index a FASTA file.
        The resulting .fai index is suitable as a genome file, 
        as bedtools will only look at the first two, relevant columns
        of the .fai file.

        For example:
        samtools faidx GRCh38.fa
        bedtools shift -i my.bed -l 100 -g GRCh38.fa.fai

Tip 2. Use UCSC Table Browser to create a genome file: 
        One can use the UCSC Genome Browser's MySQL database to extract
        chromosome sizes. For example, H. sapiens:

        mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -e \
        "select chrom, size from hg19.chromInfo"  > hg19.genome
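
To make the difference between the default placement, -chromFirst and -chrom concrete, here is a toy model in plain Python. It illustrates the behavior described in the help text and is not BEDTools' actual algorithm: random_start is a hypothetical function, -incl/-excl are ignored, the default mode is approximated by weighting chromosomes by size, and intervals must fit entirely on the chromosome (the behavior without -allowBeyondChromEnd).

```python
import random
from typing import Dict, Optional, Tuple


def random_start(
    sizes: Dict[str, int],
    length: int,
    rng: random.Random,
    chrom_first: bool = False,
    keep_chrom: Optional[str] = None,
) -> Tuple[str, int]:
    """Toy model (not BEDTools' code): choose (chrom, start) for a `length`-bp interval."""
    if keep_chrom is not None:
        # Like -chrom: stay on the interval's original chromosome.
        chrom = keep_chrom
        if sizes[chrom] < length:
            raise ValueError(f"{length} bp interval cannot fit on {chrom}")
    else:
        # Only chromosomes long enough to hold the whole interval are eligible.
        candidates = [c for c in sizes if sizes[c] >= length]
        if not candidates:
            raise ValueError(f"no chromosome can hold a {length} bp interval")
        if chrom_first:
            # Like -chromFirst: every eligible chromosome is equally likely.
            chrom = rng.choice(candidates)
        else:
            # Default: weight chromosomes by size, approximating a position
            # drawn uniformly across the whole genome.
            weights = [sizes[c] for c in candidates]
            chrom = rng.choices(candidates, weights=weights, k=1)[0]
    # Keep the half-open interval [start, start + length) inside the chromosome.
    start = rng.randint(0, sizes[chrom] - length)
    return chrom, start


rng = random.Random(1)  # fixed seed for repeatable draws, in the spirit of -seed
sizes = {"big": 900, "small": 100}
n = 10_000
default_big = sum(random_start(sizes, 10, rng)[0] == "big" for _ in range(n)) / n
first_big = sum(random_start(sizes, 10, rng, chrom_first=True)[0] == "big" for _ in range(n)) / n
print(default_big, first_big)  # roughly 0.9 vs 0.5
```

With a 900 bp and a 100 bp chromosome, the size-weighted default lands on the larger one about nine times in ten, while the -chromFirst model splits placements roughly evenly.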

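The -f threshold in the help text is a fraction of the shuffled (-i) feature's length, checked against each -excl feature. A small sketch of that acceptance rule, assuming half-open BED coordinates; overlap_fraction and placement_ok are hypothetical names, treating an overlap of exactly f as acceptable is an assumption, and the retry loop bounded by -maxTries is not modeled.

```python
from typing import Iterable, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), BED-style half-open


def overlap_fraction(feature: Interval, other: Interval) -> float:
    """Fraction of `feature` covered by `other` (0.0 if on different chroms)."""
    chrom, start, end = feature
    o_chrom, o_start, o_end = other
    if chrom != o_chrom:
        return 0.0
    overlap = min(end, o_end) - max(start, o_start)
    return max(overlap, 0) / (end - start)


def placement_ok(feature: Interval, excl: Iterable[Interval], f: float = 1e-9) -> bool:
    """True if no single -excl feature covers more than `f` of `feature`."""
    return all(overlap_fraction(feature, e) <= f for e in excl)


gaps = [("chr1", 100, 200)]
print(placement_ok(("chr1", 0, 100), gaps))           # True: adjacent, no shared base
print(placement_ok(("chr1", 0, 101), gaps))           # False: 1 bp exceeds the default
print(placement_ok(("chr1", 10, 110), gaps, f=0.10))  # True: 10 of 100 bp is 10%
print(placement_ok(("chr1", 20, 120), gaps, f=0.10))  # False: 20 of 100 bp is 20%
```

With the default of 1E-9, any single shared base pair is enough to reject a candidate position, which matches the "(i.e., 1bp)" note in the help.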