Track hub from Excel

If you are an avid user of the UCSC Genome Browser and the trackhub package, you might find it tedious to write a script for every single hub you create. The command-line tool uses this package to further automate the trak hub making process. Additionally, if you are not familiar with Python, this tool makes it even easier to make track hubs. Follow this guide for how to use the package and how to fill out an Excel workbook to make any simple or complex visualization on the UCSC Genome Browser.

1. Create a template

The Excel file must have a specific format to be parsed correctly.

To create a template, run:

trackhub_from_excel --template

This will create a file template.xlsx in the current directory, with the correct sheets that you can fill out with the instructions below.

Alternatively, you can get a working example to study and test:

trackhub_from_excel --create-example my_example.xlsx

2. Fill out the Excel workbook

Using the template just created, fill out the sheets with the data you’d like to visualize.

hub and genome sheets

The following special sheet names are used for configuring the hub and the genome (for assembly tracks):

hub – This sheet is necessary for all track hubs. It defines the hub name and labels and genome.

genome – This sheet is only necessary when using a genome assembly. It points to the 2bit file and gives the genome a name and label.

Example hub sheet

hub

examplehub

short_label

Example hub

long_label

Example track hub from Excel

email

user@example.com

genome

hg38

Container track sheets

Container tracks must be configured in their own sheet.

The following special sheet names are required when using the corresponding container track. These sheets are created in the template: aggregate_config, view_config, super_config, and composite_config.

Each type of container track must be on its own sheet. For example the sheet view_config can have several view tracks defined but there can be no other types of container tracks defined on that sheet. This applies to all container track types.

Columns in these sheets correspond to valid track parameters for the respective track type. There are also some special fields for container track configuration:

Extra aggregate_config field: An aggregate track can be placed in a super track. In this case, include add a super track in super_config, and in aggregate_config add a super column and add the name of the super track.

Extra view_config field: Views must be placed inside of a composite track. Configure this by adding the name of the composite track in the column labeled composite.

Extra composite_config field: A composite track can be placed in a super track. In this case, include the name of the super track in the column labeled super similar to as described above for view_config.

Super tracks are within the track hub and therefore do not need special fields.

Note

Subgroups are not specified in the composite config. Rather, they are automatically inferred and created based on the subgroups assigned to individual tracks (and which composites those tracks are assigned to). This makes it much more convenient to organize your tracks. See the tracks section below for details.

Example view_config sheet

name

view

short_label

long_label

visibility

tracktype

composite

signal

signalview

Genomic signal

Genomic signal (CPM)

full

bigWig

experiment1

peaks

peaksview

Peaks

Called peaks (macs2)

dense

bigBed

experiment1

Example composite_config sheet

name

short_label

long_label

tracktype

super

experiment1

experiment 1

Experiment 1

bigWig

supertrack1

experiment2

experiment 2

Experiment 2

bigWig

Example super_config sheet

name

short_label

long_label

supertrack1

Super track

Super track

Tracks

All other sheets that do not have the special names indicated above are assumed to configure tracks.

Each row defines a track and must have values in the name, tracktype, and source (or bigDataUrl) columns. Use source when the file is on disk and use bigDataUrl when the file is publicly hosted. The user can define more fields according to the specific track type.

Different track types can be listed on the same sheet. Tracks in different containers can be listed on the same sheet and tracks in the same containers can be listed on different sheets.

Leave the cell in the Excel sheet blank to omit that track field for that track. The program will remove this field for the track.

To use container tracks, be sure to define the container and use the container and container_type fields for the track.

For example, to place a track in a view track you need first add a row for the view in the view_config sheet that includes a name field. In another sheet (containing tracks, so you can name it whatever you want), fill out a row for the track including the container_type and container fields in addition to the required fields described above. For the container_type column, fill in “view” and for the container column fill in the same name that is in the view_config sheet.

To add a subgroup to a track, make a column with the prefix subgroup_. The value after the underscore will become the name of the subgroup. Fill in the group that data file fits into.

For example, to make subgroups based on genotype, you might label the column subgroup_genotype and fill in the rows with “WT” or “KO”. You can make as many subgroups as you need.

Example tracks sheet

name

short_label

long_label

tracktype

source

visibility

color

container

container_type

dimensions

subgroup_celltype

subgroup_genotype

k562_wt

K562 WT signal

K562 cells, WT signal

bigWig

data/kwt.bigwig

full

120,51,154

signalview

view

dimX=genotype dimY=celltype

k562

wt

k562_wt_pk

K562 WT peaks

K562 cells, WT peaks

bigBed

data/kwt.bigbed

dense

120,51,154

peakview

view

dimX=genotype dimY=celltype

k562

wt

3. Run the script

This will default to naming the track hub directory as “staging”

python trackhub_from_excel.py --excel_file experiment.xlsx

You can run it with the --staging flag to specify the name

python trackhub_from_excel.py --excel_file experiment.xlsx --staging experiment

The output directory will then be ready for uploading to a host.