Quickstart

Preliminaries

If you’ve never created a track hub before, it’s important to read the track hub help so you know some of the terminology.

In order to create the structure of a track hub, no data files are necessary and no host is required – it’s simply a handful of text files on disk. However, if you want to use the track hub as it was intended – to browse data – you’ll need a publicly accessible place to host files (http or ftp) and some data. For testing small datasets, you can upload to a GitHub repository. The example below demonstrates this.

Here, we’ll set up a simple hub with 2 subtracks using data that comes with trackhub to illustrate the workflow. More complicated examples can be found in Assembly example and Organizing larger track hubs.

For the impatient:

Create hub components

A track hub consists of connected components of files that point to each other. The hub file points to a genomes file, which points to a trackDb file, which points to the various data tracks.

In trackhub, these are modeled as classes with parents and children. The hub is the parent of the genomes file, which is the parent of the trackDb file, and so on. In this quickstart, we use the default_hub() function to get all of these components already connected. See Assembly example for connecting them one-by-one or for more control over filenames.

import os
import trackhub

hub, genomes_file, genome, trackdb = trackhub.default_hub(
    hub_name="quickstart",
    genome="hg38",
    email="you@email.com")

We can indvidually print each object to see the contents of the text file that will be created. Note that until a hub is “rendered” (see below), the files won’t actually exist.

print(hub)

hub hub
shortLabel quickstart
longLabel quickstart
genomesFile quickstart.genomes.txt
email you@email.com

Note that we could have specified the short_label and long_label kwargs, but by default the hub name is carried over to the shortLabel and longLabel lines. Furthermore, the genomes.txt file’s name is prefixed by the hub name.

print(genomes_file)

genome hg38
trackDb hg38/trackDb.txt

So far, the trackdb object has no tracks added:

>>> print(trackdb)
Traceback (most recent call last):
  ...
ValueError: No Track objects specified

Adding tracks

In this example, we are adding tracks one at a time. This is not much easier than just writing the trackDb file in a text editor. However, creating tracks in Python makes it easy to scale up to hundreds of tracks and unlocks all of Python for configuring tracks. As one example, we could color tracks by some aspect (sample, or treatment, for example) and rapidly make the changes across the hub. See Organizing larger track hubs for an example of this.

Here’s how to create a single track. The most important argument is the source argument, which points to the location of the track’s file on disk. Unlike the objects we made above which don’t yet exist as a file, a Track object must point to an existing file. Sometimes you might want to include a file from a different hub in your own hub. In this case, instead of source, use the url kwarg.

Here were are using the helpers.data_dir() function to get the path to the example data shipped with trackhub. In practice, these paths would be inside your analysis directories.

First, the simplest track to add: a name, source, and type. Note that we add it to the trackdb as well:

track1 = trackhub.Track(
    name="signal1",
    source=os.path.join(trackhub.helpers.data_dir(), 'sine-hg38-0.bedgraph.bw'),
    tracktype='bigWig -2 2',
)

trackdb.add_tracks(track1)

Next, a slightly more complex one, which changes the visibility, view height, color, and view limits as well as sets nicer labels. See the Track Database Definition page for more details as well as all the other settings that are possible.

track2 = trackhub.Track(
    name='signal2',
    short_label='Signal 2',
    long_label='Signal track for sample 2',
    source=os.path.join(trackhub.helpers.data_dir(), 'sine-hg38-1.bedgraph.bw'),
    tracktype='bigWig',
    color='128,0,0',
    maxHeightPixels='8:50:128',
    viewLimits='-2:2',
    visibility='full'
)

trackdb.add_tracks(track2)

Now that we have added tracks to the trackdb object, printing it should show the updates:

print(trackdb)

track signal1
bigDataUrl signal1.bigWig
shortLabel signal1
longLabel signal1
type bigWig -2 2

track signal2
bigDataUrl signal2.bigWig
shortLabel Signal 2
longLabel Signal track for sample 2
type bigWig
color 128,0,0
maxHeightPixels 8:50:128
viewLimits -2:2
visibility full

Render (stage) the hub

To get ready for uploading, trackhub renders the hub files and symlinks all the track source files into a local directory. This allows us to inspect the hub before uploading.

By default the staging directory is a temporary directory, but here to be more explicit we will set staging="quickstart-staging"

trackhub.upload.stage_hub(hub, staging="quickstart-staging")

The quickstart-staging should have these contents:

quickstart-staging/
  quickstart.genomes.txt
  quickstart.hub.txt
quickstart-staging/hg38/
  signal1.bigWig
  signal2.bigWig
  trackDb.txt

This directory is now ready for transferring to a host to serve it. You can either use rsync directly from the terminal (be sure to use the -L argument, which follows symlinks), or use the trackhub.upload.upload_hub() function.

Another workflow would be to create a Github repo, then either set the path to the repo as the staging diretory, or move the contents of the staging directory into the repo:

For example, at the command line:

git clone https://github.com/user/repo.git
cd repo
rsync -L ../quickstart-staging .
git add .
git commit -m 'update hub'
git push origin