One feature of the pipeline is prealignments, which siphon off reads by aligning them to small genomes before the main alignment to the primary reference genome. Ideas for common prealignment references are provided by ref_decoy.
Using prealignments with looper
To change the prealignments, add a prealignments sample attribute containing a space-separated list of all the genomes you want to prealign to.
For example, adding the following to your PEP project config YAML file instructs the pipeline to prealign human samples to human_rDNA and rCRSd before doing the primary alignment:
    sample_modifiers:
      imply:
        - if:
            organism: ["human", "Homo sapiens", "Human", "Homo_sapiens"]
          then:
            prealignments: human_rDNA rCRSd
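To make the effect of the `imply` modifier concrete, here is a minimal Python sketch of the rule above. It is an illustration only, not the PEP implementation: the function `apply_imply` and the dictionary-based sample representation are hypothetical, and the sketch assumes a rule matches when every attribute named under `if` holds one of the listed values.

```python
def apply_imply(sample, rules):
    """Return a copy of `sample` with the attributes implied by matching rules."""
    result = dict(sample)
    for rule in rules:
        conditions = rule["if"]
        # Assumed semantics: each attribute under `if` must hold one of the listed values.
        if all(result.get(attr) in allowed for attr, allowed in conditions.items()):
            result.update(rule["then"])
    return result


# The rule from the config above, written as plain Python data.
RULES = [
    {
        "if": {"organism": ["human", "Homo sapiens", "Human", "Homo_sapiens"]},
        "then": {"prealignments": "human_rDNA rCRSd"},
    }
]

human = apply_imply({"sample_name": "test", "organism": "human"}, RULES)
mouse = apply_imply({"sample_name": "other", "organism": "mouse"}, RULES)

print(human["prealignments"].split())  # the two prealignment genomes
print("prealignments" in mouse)        # a non-matching sample gains no attribute
```

In this sketch a sample whose organism is not in the list is left unchanged, which is why a single rule in the project config can cover every human sample without touching the others.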
You could accomplish the same thing in a less elegant way by adding a prealignments column to your sample table, with the space-separated genome list as each sample's value.
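As an illustration of the sample-table approach, the sketch below parses a small table with a `prealignments` column and splits each value into a list of genome names. The table contents (sample names, organisms, and column layout) are made up for this example; only the idea of a space-separated `prealignments` value comes from the text above. The helper `read_prealignments` is likewise hypothetical.

```python
import csv
import io

# Hypothetical sample table; everything except the `prealignments` idea is illustrative.
SAMPLE_TABLE = """\
sample_name,organism,prealignments
sample1,human,human_rDNA rCRSd
sample2,human,human_rDNA rCRSd
"""


def read_prealignments(table_text):
    """Map each sample name to its list of prealignment genomes."""
    reader = csv.DictReader(io.StringIO(table_text))
    # str.split() with no argument splits on whitespace and ignores empty values.
    return {row["sample_name"]: row["prealignments"].split() for row in reader}


prealignments_by_sample = read_prealignments(SAMPLE_TABLE)
print(prealignments_by_sample)
```

The repetition of the same value on every row is what makes this approach less elegant than a single `imply` rule in the project config.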
Using prealignments on the command line
If you want to adjust this for a single sample run, you just pass the values on the command line:
    /pipelines/pepatac.py \
      --sample-name test \
      --genome hg38 \
      --prealignments human_rDNA rCRSd \
      --input examples/data/test_r1.fq.gz \
      --single-or-paired single \
      -O $HOME/example/
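If you launch single-sample runs from a script, the same command can be assembled programmatically. The following is a sketch rather than part of the pipeline: the helper `build_command` is hypothetical, and it uses only the options that appear in the command above. Each prealignment genome becomes its own argument after `--prealignments`.

```python
import os
import shlex


def build_command(sample_name, genome, prealignments, input_file, read_type,
                  output_dir, script="/pipelines/pepatac.py"):
    """Build the argument list for one pipeline run (hypothetical helper)."""
    cmd = [script, "--sample-name", sample_name, "--genome", genome]
    if prealignments:
        # One argument per genome, matching the space-separated form on the command line.
        cmd += ["--prealignments", *prealignments]
    cmd += ["--input", input_file, "--single-or-paired", read_type, "-O", output_dir]
    return cmd


cmd = build_command(
    sample_name="test",
    genome="hg38",
    prealignments=["human_rDNA", "rCRSd"],
    input_file="examples/data/test_r1.fq.gz",
    read_type="single",
    output_dir=os.path.join(os.path.expanduser("~"), "example"),
)

# Print a shell-quoted version for inspection; pass `cmd` to subprocess.run to launch it.
print(shlex.join(cmd))
```

Building the command as a list avoids shell-quoting problems, and passing an empty list of prealignments simply omits the option.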