pybedtools.contrib.long_range_interaction.tag_bedpe
- pybedtools.contrib.long_range_interaction.tag_bedpe(bedpe, queries, verbose=False)
Tag each end of a BEDPE with a set of (possibly many) query BED files.
For example, given a BEDPE of interacting fragments from a Hi-C experiment, identify the contacts between promoters and ChIP-seq peaks. In this case, promoters and ChIP-seq peaks of interest would be provided as BED files.
The strategy is to split the BEDPE into two separate files. Each file is intersected independently with the set of queries. The results are then iterated through in parallel to tie the ends back together. It is this iterator that is returned (see example below).
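The splitting step can be illustrated with a small pure-Python sketch. It is not the library's implementation: split_bedpe_line is a hypothetical helper, and it assumes the standard 10-column BEDPE layout (chrom1 start1 end1 chrom2 start2 end2 name score strand1 strand2) followed by any extra fields. Each end keeps the shared name, score, and extra fields, which is what allows the two ends to be matched up again later.

```python
def split_bedpe_line(line):
    """Split one tab-delimited BEDPE line into two BED-like lines.

    Hypothetical helper for illustration only; assumes the standard
    10-column BEDPE layout plus optional extra fields.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 10:
        raise ValueError("expected at least 10 BEDPE fields")
    (chrom1, start1, end1, chrom2, start2, end2,
     name, score, strand1, strand2) = fields[:10]
    extra = fields[10:]
    # Both ends carry the shared name, score, and extra fields.
    end1_fields = [chrom1, start1, end1, name, score, strand1] + extra
    end2_fields = [chrom2, start2, end2, name, score, strand2] + extra
    return "\t".join(end1_fields), "\t".join(end2_fields)


line = "chr1\t100\t200\tchr2\t500\t600\tpair1\t5\t+\t-\textraA"
end1, end2 = split_bedpe_line(line)
```

The column order of each half matches the leading fields of the hit lines described under Returns.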
- Parameters:
- bedpe : str
BEDPE-format file. Must be name-sorted.
- queries : dict
Dictionary of BED/GFF/GTF/VCF files to use. After splitting the BEDPE, these query files (values in the dictionary) will be passed as the -b arg to bedtools intersect. The keys are passed as the names argument for bedtools intersect.
Features in each file must have unique names. Use pybedtools.featurefuncs.UniqueID() to help fix this.
Each file must be BED3 to BED6.
- Returns:
- Tuple of (iterator, n, extra).
iterator is described below.
n is the total number of lines in the BEDPE file, which is useful for calculating percentage complete for downstream work.
extra is the number of extra fields found in the BEDPE (also useful for downstream processing).
iterator yields tuples of (label, end1_hits, end2_hits), where label is the name field of one line of the original BEDPE file. end1_hits and end2_hits are each iterators of BED-like lines representing all identified intersections across all query BED files for end1 and end2 for this pair.
- Recall that BEDPE format defines a single name and a single score for each pair. For each item in end1_hits, the fields are:
chrom1 start1 end1 name score strand1 [extra fields] query_label fields_from_query_intersecting_end1
- where [extra fields] are any additional fields from the original BEDPE, query_label is one of the keys in the queries input dictionary, and the remaining fields in the line are the intersecting line from the corresponding BED file in the queries input dictionary.
- Similarly, each item in end2_hits consists of:
chrom2 start2 end2 name score strand2 [extra fields] query_label fields_from_query_intersecting_end2
- At least one line is reported for every line in the BEDPE file. If there was no intersection, the standard BEDTools null fields will be shown. In end1_hits and end2_hits, a line will be reported for each hit in each query.
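The field layout described above can be unpacked with a short sketch. The helpers parse_hit and contact_labels are hypothetical and not part of pybedtools. The sketch assumes each hit has already been converted to a list of string fields and that extra is the third element of the returned tuple. Filtering of null (no-intersection) hits is deliberately left out, since the exact null fields depend on the BEDTools output.

```python
from itertools import product


def parse_hit(fields, extra):
    """Split one hit (list of string fields) into its documented parts.

    Hypothetical helper: six fixed fields, then `extra` BEDPE fields,
    then the query label, then the intersecting query feature.
    """
    n_fixed = 6  # chrom, start, end, name, score, strand
    label_idx = n_fixed + extra
    return {
        "end": fields[:n_fixed],
        "extra": fields[n_fixed:label_idx],
        "query_label": fields[label_idx],
        "query_fields": fields[label_idx + 1:],
    }


def contact_labels(end1_hits, end2_hits, extra):
    """Return sorted (end1 label, end2 label) combinations for one pair."""
    labels1 = {parse_hit(h, extra)["query_label"] for h in end1_hits}
    labels2 = {parse_hit(h, extra)["query_label"] for h in end2_hits}
    return sorted(product(labels1, labels2))


# Made-up data standing in for one (label, end1_hits, end2_hits) tuple.
end1_hits = [
    "chr1\t100\t200\tpair1\t5\t+\textraA\tpromoters\tchr1\t150\t160\tgeneX".split("\t"),
    "chr1\t100\t200\tpair1\t5\t+\textraA\tpeaks\tchr1\t120\t180\tpeak7".split("\t"),
]
end2_hits = [
    "chr2\t500\t600\tpair1\t5\t-\textraA\tpeaks\tchr2\t550\t560\tpeak9".split("\t"),
]
parsed = parse_hit(end1_hits[0], extra=1)
pairs = contact_labels(end1_hits, end2_hits, extra=1)
```

Here pairs lists every combination of query labels seen at the two ends, which is one way to find, for example, promoter-to-peak contacts.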