Read Quantification
Quantifying and Collating Reads
To quantify aligned reads, they must be counted against a reference transcriptome, which indicates in relative terms how much each transcript is expressed in a system. The following sub-module performs this quantification and compiles all sample quantifications into a single data matrix for downstream use.
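As a rough illustration of the collation step, the sketch below merges per-sample count tables into one matrix. It is not XPRESSpipe's implementation: the function names (`parse_counts`, `collate`), the sample names, and the counts are all made up, and the input is assumed to be two-column, tab-separated text in the style of htseq-count output, whose summary rows begin with `__`.

```python
import csv
import io


def parse_counts(text):
    """Parse tab-separated 'feature<TAB>count' text into a dict."""
    counts = {}
    for feature, value in csv.reader(io.StringIO(text), delimiter="\t"):
        if feature.startswith("__"):
            # Skip htseq-count style summary rows such as __no_feature
            continue
        counts[feature] = int(value)
    return counts


def collate(samples):
    """Merge {sample_name: {feature: count}} into a header row plus one row per feature."""
    names = sorted(samples)
    features = sorted(set().union(*(samples[name] for name in names)))
    matrix = [["feature"] + names]
    for feature in features:
        # Features missing from a sample are filled with 0 (an assumption)
        matrix.append([feature] + [samples[name].get(feature, 0) for name in names])
    return matrix


# Toy, made-up counts for two hypothetical samples
sample_a = "GENE1\t10\nGENE2\t0\n__no_feature\t5\n"
sample_b = "GENE1\t7\nGENE3\t3\n__no_feature\t2\n"

matrix = collate({
    "sample_a": parse_counts(sample_a),
    "sample_b": parse_counts(sample_b),
})
for row in matrix:
    print("\t".join(str(cell) for cell in row))
```

Each row of the result is one feature and each column one sample, which is the general shape a downstream normalization or differential expression step expects.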
Arguments
The help menu can be accessed by calling the following from the command line:
$ xpresspipe count --help
Required Arguments | Description |
---|---|
-i <path>, --input <path> | Path to input directory of SAM files |
-o <path>, --output <path> | Path to output directory |
-g </path/transcripts.gtf>, --gtf </path/transcripts.gtf> | Path and file name of the GTF used for alignment quantification (if a modified GTF was created, provide it here; if using Cufflinks and you want isoform abundance estimates, it is important that you do not provide a longest-transcript-only GTF) |
Optional Arguments | Description |
---|---|
--suppress_version_check | Suppress version checks and other features that require internet access during processing |
-e , --experiment | Experiment name |
-c , --quantification_method | Specify quantification method (default: htseq; other option: cufflinks. If using Cufflinks, no downstream sample normalization is required) |
--feature_type <feature> | Specify feature type (3rd column in GTF file) to be used if quantifying with htseq (default: CDS) |
--stranded <fr-unstranded/fr-firststrand/fr-secondstrand\|\|no/yes> | Specify whether library preparation was stranded (options before \|\| correspond with Cufflinks inputs, options after correspond with htseq inputs) |
--deduplicate | Include flag to quantify reads with de-duplication (will search for files with suffix _dedupRemoved.bam) |
--bam_suffix | Change from default suffix of _Aligned.sort.bam |
-m | Maximum number of processors to use for tasks (default: No limit) |
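Because the accepted `--stranded` values depend on the quantification method, it can help to check the combination before launching a long run. The following is a minimal sketch of such a check. `build_count_command` is a hypothetical helper, not part of XPRESSpipe; it only assembles an argument list from flags documented above and does not run anything. The GTF path in the usage lines is a made-up example.

```python
import shlex

# Accepted --stranded values per quantification method, as listed in the table above
STRANDED_CHOICES = {
    "cufflinks": {"fr-unstranded", "fr-firststrand", "fr-secondstrand"},
    "htseq": {"no", "yes"},
}


def build_count_command(input_dir, output_dir, gtf, method="htseq",
                        stranded=None, experiment=None, deduplicate=False):
    """Assemble an `xpresspipe count` argument list without running it."""
    if method not in STRANDED_CHOICES:
        raise ValueError(f"unknown quantification method: {method}")
    if stranded is not None and stranded not in STRANDED_CHOICES[method]:
        raise ValueError(f"--stranded {stranded} is not valid for {method}")

    cmd = ["xpresspipe", "count", "-i", input_dir, "-o", output_dir,
           "-g", gtf, "-c", method]
    if stranded is not None:
        cmd += ["--stranded", stranded]
    if experiment is not None:
        cmd += ["-e", experiment]
    if deduplicate:
        cmd.append("--deduplicate")
    return cmd


# Hypothetical paths, for illustration only
cmd = build_count_command(
    "pe_out/alignments/", "pe_out/", "pe_reference/transcripts.gtf",
    method="cufflinks", stranded="fr-firststrand",
)
print(shlex.join(cmd))
```

The returned list could be passed to `subprocess.run` once the paths point at real files; passing `stranded="yes"` together with `method="cufflinks"` raises a `ValueError` instead of producing a command.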
Example 1: Count ribosome profiling alignments
- Input points to directory with SAM alignment files that are sorted by name
- An experiment name is provided to name the final data matrix
- Reads are quantified only to coding genes and are not counted if they map to the first x nucleotides of exon 1 of each transcript (x being the truncation value provided when the reference files were created)
$ xpresspipe count -i riboseq_out/alignments/ -o riboseq_out/ -r se_reference/ -g se_reference/transcripts_codingOnly_truncated.gtf -e se_test
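To make the truncation rule in this example concrete, the sketch below shows how removing the first x nucleotides from the 5' end of a feature changes which read positions are counted. It is an illustration under stated assumptions, not XPRESSpipe's code: the function names are hypothetical, coordinates are 1-based and inclusive as in GTF, the value 45 is an arbitrary example, and a feature no longer than x is simply dropped here, whereas a real tool may handle that case differently.

```python
def truncate_five_prime(start, end, strand, x):
    """Return the (start, end) interval left after removing x nt from the 5' end.

    Coordinates are 1-based and inclusive, as in GTF. Returns None when the
    feature is not longer than x (a simplifying assumption of this sketch).
    """
    if end - start + 1 <= x:
        return None
    if strand == "+":
        # The 5' end of a plus-strand feature is its lowest coordinate
        return (start + x, end)
    # The 5' end of a minus-strand feature is its highest coordinate
    return (start, end - x)


def is_counted(position, interval):
    """Report whether a read position falls inside the truncated interval."""
    return interval is not None and interval[0] <= position <= interval[1]


plus = truncate_five_prime(100, 300, "+", 45)
minus = truncate_five_prime(100, 300, "-", 45)

print(plus, is_counted(120, plus), is_counted(150, plus))
print(minus, is_counted(280, minus), is_counted(150, minus))
```

In the pipeline itself this filtering is not applied at count time: the truncated coordinates are already written into the GTF passed with `-g`, so reads in the excluded region simply do not overlap any feature.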
Example 2: Count paired-end alignments
- Input points to directory with SAM alignment files that are sorted by name
- No experiment name is provided, so the data matrix is given a default name based on the date and time
- Reads are quantified to the entire transcriptome (coding and non-coding, no truncation)
$ xpresspipe count -i pe_out/alignments/ -o pe_out/ -r pe_reference/