pybedtools.bedtool.BedTool.shuffle
- BedTool.shuffle(*args, **kwargs)
Wraps bedtools shuffle.
Example usage:
>>> import pybedtools
>>> a = pybedtools.example_bedtool('a.bed')
>>> seed = 1  # so this test always returns the same results
>>> b = a.shuffle(genome='hg19', chrom=True, seed=seed)
>>> print(b)
chr1	123081365	123081464	feature1	0	+
chr1	243444570	243444670	feature2	0	+
chr1	194620241	194620591	feature3	0	-
chr1	172792873	172792923	feature4	0	+
For convenience, the file or stream this BedTool points to is implicitly passed as the -i argument to shuffleBed.
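To illustrate that convenience, the sketch below uses a hypothetical helper, shuffle_command (not part of pybedtools), that places the BedTool's file after -i, turns keyword arguments set to True into bare flags such as -chrom, and turns other values into a flag followed by its value. This is a simplified assumption about how such a wrapper maps keyword arguments to command-line options; it only builds the argument list and does not run bedtools.

```python
from typing import Any, List


def shuffle_command(bed_fn: str, **kwargs: Any) -> List[str]:
    """Hypothetical sketch: build a `bedtools shuffle` argument list.

    The input file is always passed as -i, mirroring the implicit
    argument described above. Not pybedtools' actual implementation.
    """
    cmd = ["bedtools", "shuffle", "-i", bed_fn]
    for key, value in kwargs.items():
        if value is False or value is None:
            continue  # disabled options are omitted entirely
        cmd.append("-" + key)
        if value is not True:
            cmd.append(str(value))  # valued options, e.g. -seed 1
    return cmd


print(shuffle_command("a.bed", g="hg19.genome", chrom=True, seed=1))
# → ['bedtools', 'shuffle', '-i', 'a.bed', '-g', 'hg19.genome', '-chrom', '-seed', '1']
```

Keyword order is preserved (Python 3.7+ keeps keyword arguments in call order), so the flags appear in the order they were given.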
There are two alternatives for supplying a genome. Use g="genome.filename" if you have a genome’s chrom sizes saved as a file; this is what BEDTools expects when it is used from the command line. Alternatively, use genome="assembly.name" (for example, genome="hg19") to use the chrom sizes for that assembly without having to manage a separate file. The genome argument triggers a call to pybedtools.chromsizes, so see that function for more details.

Original BEDTools help:
Tool:    bedtools shuffle (aka shuffleBed)
Version: v2.31.1
Summary: Randomly permute the locations of a feature file among a genome.

Usage:   bedtools shuffle [OPTIONS] -i <bed/gff/vcf> -g <genome>

Options:
    -excl   A BED/GFF/VCF file of coordinates in which features in -i
            should not be placed (e.g. gaps.bed).

    -incl   Instead of randomly placing features in a genome, the -incl
            option defines a BED/GFF/VCF file of coordinates in which
            features in -i should be randomly placed (e.g. genes.bed).
            Larger -incl intervals will contain more shuffled regions.
            This method DISABLES -chromFirst.

    -chrom  Keep features in -i on the same chromosome.
            - By default, the chrom and position are randomly chosen.
            - NOTE: Forces use of -chromFirst (see below).

    -seed   Supply an integer seed for the shuffling.
            - By default, the seed is chosen automatically.
            - (INTEGER)

    -f      Maximum overlap (as a fraction of the -i feature) with an -excl
            feature that is tolerated before searching for a new,
            randomized locus. For example, -f 0.10 allows up to 10%
            of a randomized feature to overlap with a given feature
            in the -excl file. **Cannot be used with -incl file.**
            - Default is 1E-9 (i.e., 1bp).
            - FLOAT (e.g. 0.50)

    -chromFirst
            Instead of choosing a position randomly among the entire
            genome (the default), first choose a chrom randomly, and then
            choose a random start coordinate on that chrom. This leads
            to features being ~uniformly distributed among the chroms,
            as opposed to features being distributed as a function of
            chrom size.

    -bedpe  Indicate that the A file is in BEDPE format.

    -maxTries
            Max. number of attempts to find a home for a shuffled interval
            in the presence of -incl or -excl.
            Default = 1000.

    -noOverlapping
            Don't allow shuffled intervals to overlap.

    -allowBeyondChromEnd
            Allow shuffled intervals to be relocated to a position
            in which the entire original interval cannot fit w/o exceeding
            the end of the chromosome. In this case, the end coordinate of
            the shuffled interval will be set to the chromosome's length.
            By default, an interval's original length must be
            fully-contained within the chromosome.

Notes:
    (1) The genome file should be tab delimited and structured as follows:
        <chromName><TAB><chromSize>

    For example, Human (hg19):
    chr1    249250621
    chr2    243199373
    ...
    chr18_gl000207_random    4262

    Tip 1. Use samtools faidx to create a genome file from a FASTA:
    One can use the samtools faidx command to index a FASTA file.
    The resulting .fai index is suitable as a genome file,
    as bedtools will only look at the first two, relevant columns
    of the .fai file.

    For example:
    samtools faidx GRCh38.fa
    bedtools shift -i my.bed -l 100 -g GRCh38.fa.fai

    Tip 2. Use UCSC Table Browser to create a genome file:
    One can use the UCSC Genome Browser's MySQL database to extract
    chromosome sizes. For example, H. sapiens:

    mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -e \
    "select chrom, size from hg19.chromInfo" > hg19.genome
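If chromosome sizes are only available in memory, one way to use the g= route is to write them out in the tab-delimited <chromName><TAB><chromSize> layout from note (1) and pass the resulting path. The helpers write_genome_file and read_genome_file below are hypothetical, and the sizes are the hg19 values quoted in the help. The reader looks only at the first two columns, which is the same property that lets a samtools .fai index serve as a genome file.

```python
import os
import tempfile
from typing import Dict


def write_genome_file(sizes: Dict[str, int], path: str) -> str:
    """Write chrom sizes as <chromName><TAB><chromSize>, one per line."""
    with open(path, "w") as fh:
        for chrom, size in sizes.items():
            fh.write(f"{chrom}\t{size}\n")
    return path


def read_genome_file(path: str) -> Dict[str, int]:
    """Read chrom sizes, using only the first two tab-separated columns."""
    sizes: Dict[str, int] = {}
    with open(path) as fh:
        for line in fh:
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 2:  # skip blank or malformed lines
                sizes[fields[0]] = int(fields[1])
    return sizes


# A few hg19 sizes, as quoted in the BEDTools help above.
hg19_subset = {"chr1": 249250621, "chr2": 243199373, "chr18_gl000207_random": 4262}
genome_fn = write_genome_file(hg19_subset, os.path.join(tempfile.mkdtemp(), "hg19.genome"))
assert read_genome_file(genome_fn) == hg19_subset
# genome_fn could then be passed as g=genome_fn, e.g. a.shuffle(g=genome_fn, seed=1)
```

Because extra columns are ignored, read_genome_file also accepts a .fai file produced by samtools faidx.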
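To make the placement rules in the help more concrete, here is a toy, pure-Python sketch of how a single interval might be relocated under the default, -chromFirst, -chrom, -excl/-f, -maxTries and -allowBeyondChromEnd semantics as the help describes them. It is written from the help text, not from bedtools' source, and is not its actual algorithm: it ignores -incl, -bedpe and -noOverlapping, approximates the default genome-wide placement by picking a chromosome weighted by its size, checks -f against each excluded feature separately, and returns None after max_tries failures (what bedtools does in that case is not covered here). The function and parameter names are hypothetical, and a seeded random.Random will not reproduce bedtools' output.

```python
import random
from typing import Dict, List, Optional, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open like BED


def overlap_bp(a: Interval, b: Interval) -> int:
    """Number of bases shared by two intervals (0 if on different chroms)."""
    if a[0] != b[0]:
        return 0
    return max(0, min(a[2], b[2]) - max(a[1], b[1]))


def shuffle_one(
    iv: Interval,
    sizes: Dict[str, int],
    rng: random.Random,
    chrom_first: bool = False,              # like -chromFirst
    same_chrom: bool = False,               # like -chrom
    excl: Optional[List[Interval]] = None,  # like -excl
    f: float = 1e-9,                        # like -f
    max_tries: int = 1000,                  # like -maxTries
    allow_beyond_end: bool = False,         # like -allowBeyondChromEnd
) -> Optional[Interval]:
    """Toy placement of one interval; returns None if no home is found."""
    chrom, start, end = iv
    length = end - start
    chroms = list(sizes)
    for _ in range(max_tries):
        if same_chrom:
            new_chrom = chrom  # keep the original chromosome
        elif chrom_first:
            new_chrom = rng.choice(chroms)  # every chromosome equally likely
        else:
            # default: weight chromosomes by size, approximating a uniform
            # pick over all positions in the genome
            new_chrom = rng.choices(chroms, weights=[sizes[c] for c in chroms])[0]
        chrom_len = sizes[new_chrom]
        if allow_beyond_end:
            new_start = rng.randrange(chrom_len)
            new_end = min(new_start + length, chrom_len)  # clip at chrom end
        elif length <= chrom_len:
            new_start = rng.randrange(chrom_len - length + 1)
            new_end = new_start + length  # fully contained in the chrom
        else:
            continue  # interval cannot fit on this chrom; try again
        candidate = (new_chrom, new_start, new_end)
        # reject if overlap with any single excluded feature exceeds fraction f
        if any(overlap_bp(candidate, e) / max(length, 1) > f for e in excl or []):
            continue
        return candidate
    return None


rng = random.Random(1)  # fixed seed, in the spirit of -seed
sizes = {"chr1": 249250621, "chr2": 243199373}
print(shuffle_one(("chr1", 100, 200), sizes, rng, same_chrom=True))
```

Switching chrom_first between False and True changes only how the chromosome is drawn, which is the distinction the -chromFirst entry describes: size-proportional versus roughly uniform across chromosomes.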