This guide walks you through extending PEPATAC to run on multiple samples using looper. The pipeline can be run directly from the command line for a single sample (see Install and run). If you need to run it on many samples, you could write your own sample handling code, but we have pre-configured everything to work nicely with looper, our sample handling engine.
Looper is a pipeline submission engine that makes it easy to deploy any pipeline across samples. It will let you run the jobs locally, in containers, using any cluster resource manager, or in containers on a cluster.
You can install looper with pip:
pip install --user loopercli
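Because `pip install --user` places executables in a per-user directory, it can be worth confirming that your shell can find looper before going further. The following is a minimal sketch of such a check; the helper name `check_on_path` is made up for illustration and is not part of looper or PEPATAC.

```shell
# check_on_path is a hypothetical helper (not part of looper).
# It reports whether a command can be resolved through $PATH.
check_on_path() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at $(command -v "$1")"
    else
        echo "$1 not found on PATH"
        return 1
    fi
}
```

Running `check_on_path looper` after the install shows whether the executable is reachable. If it is not, `python3 -m site --user-base` prints the base directory used for `--user` installs; on Linux and macOS the scripts normally live in its `bin` subfolder.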
Start by running the example project in the examples/test_project/ folder. Let's use looper's -d argument to do a dry run, which will create job scripts for every sample in the project but will not execute them:
cd pepatac
looper run -d examples/test_project/test_config.yaml
If the looper executable is not in your $PATH, add the following line to your
If that worked, let's actually run the example by taking out the -d flag:
looper run examples/test_project/test_config.yaml
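The dry run and the real run shown above can also be chained so that jobs are only submitted when the dry run succeeds. This is just a sketch under two assumptions: that looper exits with a nonzero status when a dry run fails, and that you are happy with a wrapper of your own. Both the function name `run_project` and the `LOOPER` override variable are hypothetical and exist only for this example.

```shell
# run_project is a hypothetical wrapper, not a looper feature.
# LOOPER is an illustrative override for the executable name (defaults to "looper").
run_project() {
    config="$1"
    looper_cmd="${LOOPER:-looper}"

    # Dry run first: create the job scripts without executing them.
    if ! "$looper_cmd" run -d "$config"; then
        echo "dry run failed; not submitting jobs" >&2
        return 1
    fi

    # The dry run succeeded, so run the project for real.
    "$looper_cmd" run "$config"
}
```

For the example project, this would be called as `run_project examples/test_project/test_config.yaml` from inside the pepatac folder.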
There are lots of other cool things you can do with looper, such as performing dry runs, summarizing results, checking on pipeline run status, cleaning intermediate files to save disk space, lumping multiple samples into one job, and more. For details, consult the looper docs.
To run your own samples, you'll need to organize them in PEP format, which is explained in how to create a PEP and is universal to all pipelines that read PEPs, including PEPATAC. To get you started, there are multiple examples you can adapt in the examples/ folder (e.g. example test PEP). In short, you need two files for your project:
The sample annotation file must specify these columns: