pybedtools.contrib.long_range_interaction.cis_trans_interactions

pybedtools.contrib.long_range_interaction.cis_trans_interactions(iterator, n, extra, verbose=True)[source]

Converts the output from tag_bedpe into a pandas DataFrame containing information about regions that contact each other in cis (same fragment) or trans (different fragments).

For example, given a BEDPE file representing 3D interactions in the genome, we want to identify which transcription start sites are connected to distal regions containing a peak.

>>> bedpe = pybedtools.example_bedtool('test_bedpe.bed')
>>> print(bedpe) 
chr1  1  10  chr1  50   90   pair1  5  +  -  x1
chr1  2  15  chr1  200  210  pair2  1  +  +  y1
>>> tsses = pybedtools.example_bedtool('test_tsses.bed')
>>> print(tsses) 
chr1  5   6   gene1
chr1  60  61  gene2
chr1  88  89  gene3
>>> peaks = pybedtools.example_bedtool('test_peaks.bed')
>>> print(peaks) 
chr1  3  4  peak1  50  .

Here’s what the tracks look like. Note that pair1 is evidence of a gene1-gene2 interaction and a gene1-gene3 interaction:

 TRACKS:

               1         2  /         5         6    / 8         9     / 20
     0123456789012345678901 / 2345678901234567890123 / 012345678901234 / 0123456789
pair1 |||||||||------------ / --------|||||||||||||| / ||||||||||||||| /
pair2  |||||||||||||------- / ---------------------- / --------------- / ||||||||||
tsses     1                 /                   2    /         3
peaks   1
>>> from collections import OrderedDict
>>> queries = OrderedDict()
>>> queries['tss'] = tsses
>>> queries['pk'] = peaks
>>> iterator, n, extra = pybedtools.contrib.long_range_interaction.tag_bedpe(bedpe, queries)
>>> for (label, group1, group2) in iterator:
...    group1 = sorted(group1, key=lambda x: str(x))
...    group2 = sorted(group2, key=lambda x: str(x))
...    for i in group1:
...        print(i, end='')  
...    for i in group2:
...        print(i, end='')  
chr1       1       10      pair1   5       +       x1      pk      chr1    3       4       peak1   50      .       1
chr1       1       10      pair1   5       +       x1      tss     chr1    5       6       gene1   1
chr1       50      90      pair1   5       -       x1      tss     chr1    60      61      gene2   1
chr1       50      90      pair1   5       -       x1      tss     chr1    88      89      gene3   1
chr1       2       15      pair2   1       +       y1      pk      chr1    3       4       peak1   50      .       1
chr1       2       15      pair2   1       +       y1      tss     chr1    5       6       gene1   1
chr1       200     210     pair2   1       +       y1      .       .       -1      -1      .       0

Now we run the same thing, but now aggregate it. Note that each piece of interaction evidence has its own line. The first line shows that pair1 has gene1 and peak1 in the same fragment, and that they are connected to gene2. The second line shows again that gene1 and peak1 are in the same fragmet and that they are also connected to gene3:

>>> import pandas; pandas.set_option('display.max_columns', 10)
>>> iterator, n, extra = pybedtools.contrib.long_range_interaction.tag_bedpe(bedpe, {'tss': tsses, 'pk': peaks})
>>> df =  pybedtools.contrib.long_range_interaction.cis_trans_interactions(iterator, n, extra)
>>> print(df.sort_values(list(df.columns)).reset_index(drop=True))
  target_label target_name cis_label cis_name distal_label distal_name  label
0           pk       peak1       tss    gene1            .           .  pair2
1           pk       peak1       tss    gene1          tss       gene2  pair1
2           pk       peak1       tss    gene1          tss       gene3  pair1
3          tss       gene1        pk    peak1            .           .  pair2
4          tss       gene1        pk    peak1          tss       gene2  pair1
5          tss       gene1        pk    peak1          tss       gene3  pair1
6          tss       gene2       tss    gene3           pk       peak1  pair1
7          tss       gene2       tss    gene3          tss       gene1  pair1
8          tss       gene3       tss    gene2           pk       peak1  pair1
9          tss       gene3       tss    gene2          tss       gene1  pair1

If we only care about genes:

>>> print((df[df.target_label == 'tss']).sort_values(list(df.columns)).reset_index(drop=True))
  target_label target_name cis_label cis_name distal_label distal_name  label
0          tss       gene1        pk    peak1            .           .  pair2
1          tss       gene1        pk    peak1          tss       gene2  pair1
2          tss       gene1        pk    peak1          tss       gene3  pair1
3          tss       gene2       tss    gene3           pk       peak1  pair1
4          tss       gene2       tss    gene3          tss       gene1  pair1
5          tss       gene3       tss    gene2           pk       peak1  pair1
6          tss       gene3       tss    gene2          tss       gene1  pair1

Note that in pair2, there is no evidence of interaction between gene1 and gene2.

What interacts distally with gene2’s TSS?

>>> assert set(df.loc[df.target_name == 'gene2', 'distal_name']).difference('.') == set([u'gene1', u'peak1'])