pybedtools.contrib.long_range_interaction.tag_bedpe

pybedtools.contrib.long_range_interaction.tag_bedpe(bedpe, queries, verbose=False)

Tag each end of a BEDPE with a set of (possibly many) query BED files.

For example, given a BEDPE of interacting fragments from a Hi-C experiment, identify the contacts between promoters and ChIP-seq peaks. In this case, promoters and ChIP-seq peaks of interest would be provided as BED files.

The strategy is to split the BEDPE into two separate files. Each file is intersected independently with the set of queries. The results are then iterated through in parallel to tie the ends back together. It is this iterator that is returned (see example below).
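
The splitting step can be pictured with a minimal pure-Python sketch. It assumes the standard 10-column BEDPE layout (chrom1, start1, end1, chrom2, start2, end2, name, score, strand1, strand2, then optional extra fields). The helper `split_bedpe_line` is hypothetical and only illustrates the idea; it is not the code pybedtools runs internally.

```python
def split_bedpe_line(line):
    """Split one BEDPE line into BED-like field lists for end1 and end2."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 10:
        raise ValueError("expected at least 10 BEDPE fields")
    chrom1, start1, end1, chrom2, start2, end2 = fields[:6]
    name, score, strand1, strand2 = fields[6:10]
    extra = fields[10:]  # any additional BEDPE columns
    # Both ends share the pair's single name and score.
    end1_fields = [chrom1, start1, end1, name, score, strand1] + extra
    end2_fields = [chrom2, start2, end2, name, score, strand2] + extra
    return end1_fields, end2_fields


# Made-up example pair with one extra field.
line = "chr1\t100\t200\tchr1\t5000\t5100\tpair1\t7\t+\t-\textra_a"
end1, end2 = split_bedpe_line(line)
print("\t".join(end1))
print("\t".join(end2))
```

Because each end keeps the shared name, the two intersection results can later be matched up again by that name.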

Parameters:
bedpe : str

BEDPE-format file. Must be name-sorted.
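
One way to name-sort a small BEDPE before passing it in is sketched below. The helper `name_sort_bedpe` is hypothetical, assumes tab-delimited lines with the name in column 7, and uses plain Python string ordering; the documentation here only states that the file must be name-sorted, so the exact collation is an assumption.

```python
def name_sort_bedpe(lines):
    """Return BEDPE lines sorted by the name field (column 7)."""
    return sorted(lines, key=lambda line: line.rstrip("\n").split("\t")[6])


# Made-up, unsorted example records.
unsorted_lines = [
    "chr1\t100\t200\tchr1\t5000\t5100\tpair2\t7\t+\t-",
    "chr2\t10\t20\tchr2\t900\t950\tpair1\t3\t+\t+",
]
sorted_lines = name_sort_bedpe(unsorted_lines)
sorted_names = [line.split("\t")[6] for line in sorted_lines]
print(sorted_names)
```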

queries : dict

Dictionary of BED/GFF/GTF/VCF files to use. After splitting the BEDPE, these query files (the values in the dictionary) will be passed as the -b argument to bedtools intersect. The keys are passed as the -names argument for bedtools intersect.

Features in each file must have unique names. Use pybedtools.featurefuncs.UniqueID() to help fix this.

Each file must be BED3 to BED6.
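
Before building the dictionary, it can help to check a query file for repeated names. The sketch below is a pure-Python illustration, not part of pybedtools: `duplicate_names` is a hypothetical helper that assumes tab-delimited BED lines with the name in column 4 and skips lines that have no name field.

```python
from collections import Counter


def duplicate_names(bed_lines):
    """Return the sorted list of feature names that occur more than once."""
    names = []
    for line in bed_lines:
        # Skip blank lines and common header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 4:
            names.append(fields[3])
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)


# Made-up example features; "geneA" appears twice.
bed_lines = [
    "chr1\t100\t200\tgeneA",
    "chr1\t300\t400\tgeneB",
    "chr2\t100\t200\tgeneA",
]
dups = duplicate_names(bed_lines)
print(dups)
```

A non-empty result suggests the names in that file need to be made unique before it is used as a query.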

Returns:
Tuple of (iterator, n, extra).

iterator is described below. n is the total number of lines in the
BEDPE file, which is useful for calculating percentage complete for
downstream work. extra is the number of extra fields found in the BEDPE
(also useful for downstream processing).

iterator yields tuples of (label, end1_hits, end2_hits), where label is
the name field of one line of the original BEDPE file. end1_hits and
end2_hits are each iterators of BED-like lines representing all
identified intersections across all query BED files for end1 and end2 of
this pair.

Recall that BEDPE format defines a single name and a single score for each
pair. For each item in end1_hits, the fields are:

chrom1 start1 end1 name score strand1 [extra fields] query_label fields_from_query_intersecting_end1

where [extra fields] are any additional fields from the original BEDPE,
query_label is one of the keys in the queries input dictionary, and the
remaining fields in the line are the intersecting line from the
corresponding BED file in the queries input dictionary.

Similarly, each item in end2_hits consists of:

chrom2 start2 end2 name score strand2 [extra fields] query_label fields_from_query_intersecting_end2
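
Given these layouts, the query label sits at index 6 + extra in either kind of hit. The following sketch splits a hit into its parts; `split_hit` is a hypothetical helper, and it assumes the hit can be converted to a list of string fields.

```python
def split_hit(fields, extra):
    """Split a hit's fields into the BEDPE end, extras, label and query part."""
    fields = list(fields)
    label_index = 6 + extra  # six BED-like fields, then the extra fields
    return {
        "end": fields[:6],
        "extra": fields[6:label_index],
        "query_label": fields[label_index],
        "query_fields": fields[label_index + 1:],
    }


# Made-up hit with one extra BEDPE field and a BED4 query feature.
hit = ["chr1", "100", "200", "pair1", "7", "+", "extra_a",
       "promoters", "chr1", "150", "180", "geneA"]
parsed = split_hit(hit, extra=1)
print(parsed["query_label"], parsed["query_fields"])
```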

At least one line is reported for every line in the BEDPE file. If there
was no intersection, the standard BEDTools null fields will be shown. In
end1_hits and end2_hits, a line will be reported for each hit in each
query.
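
As a rough sketch of downstream processing, the function below collects, for each pair, the set of query labels hit by each end. It runs on mock data rather than a real tag_bedpe call. It assumes each hit can be indexed by field position and that a missing intersection appears with "." in the query_label position; both the hypothetical `collect_labels` helper and the exact null-field layout in the mock data are assumptions to check against real output.

```python
def collect_labels(iterator, extra):
    """Map each pair label to the query labels hit by end1 and by end2."""
    label_index = 6 + extra
    contacts = {}
    for label, end1_hits, end2_hits in iterator:
        # Drop the assumed null marker "." so only real hits remain.
        end1_labels = {hit[label_index] for hit in end1_hits} - {"."}
        end2_labels = {hit[label_index] for hit in end2_hits} - {"."}
        contacts[label] = (end1_labels, end2_labels)
    return contacts


# Mock iterator standing in for the first element of the returned tuple.
mock_iterator = [
    ("pair1",
     iter([["chr1", "100", "200", "pair1", "7", "+",
            "promoters", "chr1", "150", "180", "geneA"]]),
     iter([["chr1", "5000", "5100", "pair1", "7", "-",
            "peaks", "chr1", "5050", "5060", "peak3"]])),
    ("pair2",
     iter([["chr2", "10", "20", "pair2", "3", "+",
            ".", ".", "-1", "-1", "."]]),  # assumed null-hit layout
     iter([["chr2", "900", "950", "pair2", "3", "+",
            "peaks", "chr2", "910", "920", "peak9"]])),
]
contacts = collect_labels(mock_iterator, extra=0)
for pair, (end1_labels, end2_labels) in contacts.items():
    print(pair, sorted(end1_labels), sorted(end2_labels))
```

With real output, a contact between a promoter and a peak would correspond to a pair whose two label sets contain the respective dictionary keys.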