Prealignments
One feature of the pipeline is prealignments, which siphons off reads by aligning to small genomes before the main alignment to the primary reference genome. Ideas for common prealignment references are provided by ref_decoy.
Using prealignments with looper
To change the prealignments, you just need to add a prealignment
sample attribute with a space-separated list of all genomes you want to prealign to.
For example, adding this to your PEP project config yaml file will instruct the pipeline to pre-align to human_rDNA
, then rCRSd
, before doing the primary alignment to hg38
:
sample_modifiers:
imply:
- if:
organism: ["human", "Homo sapiens", "Human", "Homo_sapiens"]
then:
prealignments: human_rDNA rCRSd
You could accomplish the same thing in a less elegant way by adding these columns to your sample table so it looks something like this:
sample_name | genome | prealignments | other columns... |
---|---|---|---|
sample1 | hg38 | human_rDNA rCRSd | ... |
sample2 | hg38 | human_rDNA rCRSd | ... |
sample3 | hg38 | human_rDNA rCRSd | ... |
Using prealignments on the command-line
If you want to adjust this for a single sample run, you just pass the values on the command line:
/pipelines/pepatac.py \
--sample-name test \
--genome hg38 \
--prealignments human_rDNA rCRSd \
--input examples/data/test_r1.fq.gz \
--single-or-paired single \
-O $HOME/example/