.. currentmodule:: trackhub .. _excelinstructions: Track hub from Excel ==================== If you are an avid user of the UCSC Genome Browser and the *trackhub* package, you might find it tedious to write a script for every single hub you create. The command-line tool uses this package to further automate the trak hub making process. Additionally, if you are not familiar with Python, this tool makes it even easier to make track hubs. Follow this guide for how to use the package and how to fill out an Excel workbook to make any simple or complex visualization on the UCSC Genome Browser. 1. Create a template -------------------- The Excel file must have a specific format to be parsed correctly. To create a template, run:: trackhub_from_excel --template This will create a file ``template.xlsx`` in the current directory, with the correct sheets that you can fill out with the instructions below. Alternatively, you can get a working example to study and test:: trackhub_from_excel --create-example my_example.xlsx 2. Fill out the Excel workbook ------------------------------ Using the template just created, fill out the sheets with the data you'd like to visualize. hub and genome sheets ~~~~~~~~~~~~~~~~~~~~~ The following special sheet names are used for configuring the hub and the genome (for assembly tracks): ``hub`` – This sheet is necessary for all track hubs. It defines the hub name and labels and genome. ``genome`` – This sheet is only necessary when using a genome assembly. It points to the 2bit file and gives the genome a name and label. Example ``hub`` sheet ````````````````````` .. list-table:: * - hub - examplehub * - short_label - Example hub * - long_label - Example track hub from Excel * - email - user@example.com * - genome - hg38 Container track sheets ~~~~~~~~~~~~~~~~~~~~~~ Container tracks must be configured in their own sheet. The following special sheet names are required when using the corresponding container track. These sheets are created in the template: ``aggregate_config``, ``view_config``, ``super_config``, and ``composite_config``. Each type of container track must be on its own sheet. For example the sheet ``view_config`` can have several view tracks defined but there can be no other types of container tracks defined on that sheet. This applies to all container track types. Columns in these sheets correspond to valid track parameters for the respective track type. There are also some special fields for container track configuration: Extra ``aggregate_config`` field: An aggregate track can be placed in a super track. In this case, include add a super track in ``super_config``, and in ``aggregate_config`` add a ``super`` column and add the name of the super track. Extra ``view_config`` field: Views must be placed inside of a composite track. Configure this by adding the name of the composite track in the column labeled ``composite``. Extra ``composite_config`` field: A composite track can be placed in a super track. In this case, include the name of the super track in the column labeled ``super`` similar to as described above for ``view_config``. Super tracks are within the track hub and therefore do not need special fields. .. note:: Subgroups are not specified in the composite config. Rather, they are automatically inferred and created based on the subgroups assigned to individual tracks (and which composites those tracks are assigned to). This makes it much more convenient to organize your tracks. See the tracks section below for details. Example ``view_config`` sheet ````````````````````````````` .. list-table:: :header-rows: 1 * - name - view - short_label - long_label - visibility - tracktype - composite * - signal - signalview - Genomic signal - Genomic signal (CPM) - full - bigWig - experiment1 * - peaks - peaksview - Peaks - Called peaks (macs2) - dense - bigBed - experiment1 Example ``composite_config`` sheet `````````````````````````````````` .. list-table:: :header-rows: 1 * - name - short_label - long_label - tracktype - super * - experiment1 - experiment 1 - Experiment 1 - bigWig - supertrack1 * - experiment2 - experiment 2 - Experiment 2 - bigWig - Example ``super_config`` sheet `````````````````````````````` .. list-table:: :header-rows: 1 * - name - short_label - long_label * - supertrack1 - Super track - Super track Tracks ~~~~~~ All other sheets that do not have the special names indicated above are assumed to configure tracks. Each row defines a track and must have values in the ``name``, ``tracktype``, and ``source`` (or ``bigDataUrl``) columns. Use ``source`` when the file is on disk and use ``bigDataUrl`` when the file is publicly hosted. The user can define more fields according to the specific track type. Different track types can be listed on the same sheet. Tracks in different containers can be listed on the same sheet and tracks in the same containers can be listed on different sheets. Leave the cell in the Excel sheet blank to omit that track field for that track. The program will remove this field for the track. To use container tracks, be sure to define the container and use the ``container`` and ``container_type`` fields for the track. For example, to place a track in a view track you need first add a row for the view in the ``view_config`` sheet that includes a ``name`` field. In another sheet (containing tracks, so you can name it whatever you want), fill out a row for the track including the ``container_type`` and ``container`` fields in addition to the required fields described above. For the ``container_type`` column, fill in "view" and for the ``container`` column fill in the same name that is in the ``view_config`` sheet. To add a subgroup to a track, make a column with the prefix ``subgroup_``. The value after the underscore will become the name of the subgroup. Fill in the group that data file fits into. For example, to make subgroups based on genotype, you might label the column ``subgroup_genotype`` and fill in the rows with "WT" or "KO". You can make as many subgroups as you need. Example ``tracks`` sheet ```````````````````````` .. list-table:: :header-rows: 1 * - name - short_label - long_label - tracktype - source - visibility - color - container - container_type - dimensions - subgroup_celltype - subgroup_genotype * - k562_wt - K562 WT signal - K562 cells, WT signal - bigWig - data/kwt.bigwig - full - 120,51,154 - signalview - view - dimX=genotype dimY=celltype - k562 - wt * - k562_wt_pk - K562 WT peaks - K562 cells, WT peaks - bigBed - data/kwt.bigbed - dense - 120,51,154 - peakview - view - dimX=genotype dimY=celltype - k562 - wt 3. Run the script ----------------- This will default to naming the track hub directory as “staging” ``python trackhub_from_excel.py --excel_file experiment.xlsx`` You can run it with the ``--staging`` flag to specify the name ``python trackhub_from_excel.py --excel_file experiment.xlsx --staging experiment`` The output directory will then be ready for uploading to a host.