.. hubward documentation master file, created by
   sphinx-quickstart on Tue Jul 9 22:26:36 2013.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

.. include:: ../README.rst

`hubward` uses the following concepts:

:track:
    Data that can be represented as a single track in the UCSC Genome Browser.
    Examples include a file of called peaks; read pileup from a single RNA-seq
    sample; CNV scores for one sample; or anything that can be converted into
    bigBed, bigWig, BAM, or VCF format.

:study:
    A collection of tracks, typically all from the same published article.

:group:
    A collection of studies, typically related in some way.

Studies
-------
The minimal definition of a `hubward` "study" is a directory with
a `metadata.yaml` file. In practice, the directory also contains raw data and
conversion scripts. A study generally corresponds to data from a single
published paper, but this is not required.

The `metadata.yaml` file describes and configures one or more `tracks` grouped
together. These are uploaded to a track hub as a single composite track.

The `metadata.yaml` file consists of several sections. The `study` section
stores bibliographic information. It is converted to HTML documentation and
added to the study's configuration page in the UCSC Genome Browser.

.. code-block:: yaml

    study:
        reference: 'Ho, J. W. K. et al. Nature 512, 449-452 (2014).'
        PMID: 25164756
        description: 'ENCODE predicted enhancers'
        label: encode-enhancers
        processing: "Downloaded data were converted to bigBed format"

The `tracks` section is a list, with one item for each track to be included in
the hub. Here is one such item in the `tracks` list:

.. code-block:: yaml

    tracks:
      - label: "enhancers [K562]"
        description: "K562 enhancers"
        genome: hg19
        original: "raw-data/p300_enhancers_K562.txt"
        processed: "processed-data/p300_enhancers_K562.bigbed"
        script: "scr/process.py"
        source:
          fn: "comparative_enhancer_calls.tar.gz"
          url: "http://compbio.med.harvard.edu/modencode/webpage/enh_calls_final/comparative_enhancer_calls.tar.gz"
        trackinfo:
          tracktype: "bigBed 3"
          visibility: "dense"
          itemRgb: "on"
          color: "#FF0000"
        type: bigbed

The config file format and fields are described in detail later in the
documentation. In summary, this block defines the source data, the output file
to create, and the conversion script that creates a bigBed file, with features
colored red, for enhancers in K562 cells from the ENCODE project. A logical
extension would be to add tracks for the other cell lines in this data set.

To process the data for a study, use::

    hubward process <directory>

where `directory` contains the `metadata.yaml` file. For each defined track,
this will:

- ensure the original data exist. If not, the `source` url is downloaded to the
  `source` fn and extracted.
- ensure the processed file exists and is up-to-date. If it is older than
  `original` or older than `script`, the script is re-run.

Groups
------
Multiple studies can be grouped together using a higher-level config file,
here called `group.yaml`. Each study can have multiple tracks; each group can
have multiple studies.

For example, if the path to the above `metadata.yaml` file is
`encode/hg19/encode-enhancers`, then that directory can be included in the
`studies` list so that the K562 enhancers track will be uploaded:

.. code-block:: yaml

    group: encode
    genome: hg19
    name: "encodetracks"
    short_label: "Supp. ENCODE"
    long_label: "Supplemental ENCODE tracks"
    hub_url: "http://localhost/encode/hg19/compiled.hub.txt"
    server:
      hub_remote: "/root/encode/hg19/compiled.hub.txt"
      host: "localhost"
      user: "www"
    email: "www@localhost"
    studies:
      - encode/hg19/encode-enhancers

To process all studies in a group, run::

    hubward process

This processes all configured studies to ensure their output is up-to-date.

To create the track hub files and upload them to a remote server, run::

    hubward upload

When it finishes, it shows the URL that can be used to load the hub into the
Genome Browser.

Workflow
--------
To visualize a new dataset, the workflow is as follows:

1. Write a `metadata.yaml` file and the corresponding conversion scripts.

2. Write a group config file that groups together individual studies.

3. Run `hubward process`. This parses the group config file and, for each
   defined study, parses its `metadata.yaml` file, downloads data if needed,
   and runs conversion scripts if necessary.

4. Run `hubward upload`. This builds the track hub config files using the
   `trackhub` Python package and uploads them to the server configured in the
   group config.

Going further
-------------
Use `hubward skeleton` to create a template study, including directories and
a `metadata-builder.py` script to aid in programmatic generation of
`metadata.yaml`.

`hubward` includes many helper functions which can be imported into the
processing script.

Contents:

.. toctree::
   :maxdepth: 2

   installation
   hubward
   contributing
   authors
   history

Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
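
As a note on the `tracks` example in the Studies section: extending a study to
another cell line amounts to appending a second item to the `tracks` list. The
fragment below is only a sketch. It reuses the keys and the `source` entry
from the K562 example, while the GM12878 label, the two GM12878 file names,
and the blue color are hypothetical values chosen for illustration; they are
not taken from the data set.

.. code-block:: yaml

    tracks:
      # First track: the K562 item from the example in the Studies section.
      - label: "enhancers [K562]"
        description: "K562 enhancers"
        genome: hg19
        original: "raw-data/p300_enhancers_K562.txt"
        processed: "processed-data/p300_enhancers_K562.bigbed"
        script: "scr/process.py"
        source:
          fn: "comparative_enhancer_calls.tar.gz"
          url: "http://compbio.med.harvard.edu/modencode/webpage/enh_calls_final/comparative_enhancer_calls.tar.gz"
        trackinfo:
          tracktype: "bigBed 3"
          visibility: "dense"
          itemRgb: "on"
          color: "#FF0000"
        type: bigbed

      # Second track: hypothetical cell line and file names (assumptions).
      - label: "enhancers [GM12878]"
        description: "GM12878 enhancers"
        genome: hg19
        original: "raw-data/p300_enhancers_GM12878.txt"
        processed: "processed-data/p300_enhancers_GM12878.bigbed"
        script: "scr/process.py"
        source:
          fn: "comparative_enhancer_calls.tar.gz"
          url: "http://compbio.med.harvard.edu/modencode/webpage/enh_calls_final/comparative_enhancer_calls.tar.gz"
        trackinfo:
          tracktype: "bigBed 3"
          visibility: "dense"
          itemRgb: "on"
          color: "#0000FF"
        type: bigbed

Because both items point at the same `source` archive and the same `script`,
this sketch assumes the conversion script can handle either input file.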