General Usage

XPRESSpipe can be run from beginning to end as a pipeline or as individual sub-modules. Each option is described in more detail in its own section of the documentation. The purpose of XPRESSpipe is to automate the alignment, quality control, and initial analysis of single-end (SE), paired-end (PE), and ribosome profiling data. Input data should be placed in its own directory, and each file should be a properly formatted FASTQ file. The file suffix can be fastq, fq, or txt, and files can be compressed (zip or gz) or uncompressed. When using intermediate sub-modules, such as align or readDistribution, the expected input varies and is described in the --help menu for each sub-module.

Further analysis on the resulting datasets can be performed using XPRESSplot.

File Naming

For many XPRESSpipe functions to perform properly, and for the output after alignment to be reliable (except for generation of a raw counts table), the following file naming conventions must be followed.

1. Download your raw sequence data and place it in a folder. This folder should contain all the sequence data and nothing else.
2. If you are working with single-end data, each file must be FASTQ-formatted and end with one of the following suffixes: fastq, fastq.gz, fq, fq.gz, txt, or txt.gz. We recommend the fastq or fastq.gz suffix.
3. If you are working with paired-end data, the rules from Step 2 apply, but the suffix must be prefaced by the paired read group number, as below:

ExperimentName_Rep1_a_WT.r1.fastq.gz
ExperimentName_Rep1_a_WT.r2.fastq.gz
ExperimentName_Rep2_a_WT.r1.fastq.gz
ExperimentName_Rep2_a_WT.r2.fastq.gz

or

ExperimentName_Rep1_a_WT.read1.fastq.gz
ExperimentName_Rep1_a_WT.read2.fastq.gz
ExperimentName_Rep2_a_WT.read1.fastq.gz
ExperimentName_Rep2_a_WT.read2.fastq.gz
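
The naming rules above can be checked before a run with a short script. The following is a minimal sketch, not part of XPRESSpipe. The names has_accepted_suffix and pair_reads are hypothetical helpers introduced here for illustration, and the pattern assumes lowercase r1/r2 or read1/read2 tags exactly as in the examples above.

```python
import re
from collections import defaultdict

# Suffixes listed in the File Naming rules above.
ACCEPTED_SUFFIXES = ("fastq", "fastq.gz", "fq", "fq.gz", "txt", "txt.gz")

# Assumed pattern: "<sample>.<r1|r2|read1|read2>.<suffix>", lowercase tags only.
PAIRED_PATTERN = re.compile(
    r"^(?P<sample>.+)\.(?:r|read)(?P<mate>[12])\.(?:fastq|fq|txt)(?:\.gz)?$"
)


def has_accepted_suffix(filename):
    """Return True if the file name ends with one of the accepted suffixes."""
    return filename.endswith(tuple("." + s for s in ACCEPTED_SUFFIXES))


def pair_reads(filenames):
    """Group paired-end file names by sample (hypothetical helper).

    Returns (pairs, problems): pairs maps a sample name to its
    (read 1, read 2) file names; problems lists names that do not match
    the convention or that are missing their mate.
    """
    groups = defaultdict(dict)
    problems = []
    for name in sorted(filenames):
        match = PAIRED_PATTERN.match(name)
        if match is None:
            problems.append(name)
            continue
        groups[match["sample"]][match["mate"]] = name

    pairs = {}
    for sample, mates in groups.items():
        if set(mates) == {"1", "2"}:
            pairs[sample] = (mates["1"], mates["2"])
        else:
            # Only one mate was found for this sample.
            problems.extend(mates.values())
    return pairs, problems
```

Passing the output of os.listdir() for the input folder to pair_reads gives a quick pre-flight check: any name returned in problems either breaks the convention or is a stray file that should be moved out of the folder.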

Data Output

Running seRNAseq, peRNAseq, or riboseq will output all intermediate and final data files as shown in this schematic: