We have produced both docker and singularity containers that hold all the necessary software for PEPATAC. You can run PEPATAC as an individual pipeline on a single sample using these containers by directly calling docker run or singularity exec. Or, you can rely on looper, which is already set up to run any pipeline in existing containers using the divvy templating system. Instructions for both follow:
First, make sure your environment is set up to run either docker or singularity containers. Then, pull the container image:
Docker: You can pull the docker databio/pepatac image from dockerhub like this:
docker pull databio/pepatac
Or build the image using the included Dockerfile (you can use a recipe in the included Makefile):
cd pepatac/
make docker
Singularity: You can download the singularity image or build it from the docker image using the Makefile:
cd pepatac/
make singularity
Now you'll need to tell the pipeline where you saved the singularity image. You can either create an environment variable called
$SIMAGES that points to the folder where your image is stored, or you can tweak the
pipeline_interface.yaml file so that the
compute.singularity_image attribute is pointing to the right location on disk.
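A minimal sketch of the environment-variable route. It assumes a hypothetical image folder, `$HOME/simages`, and an image file named `pepatac` (matching the `$SIMAGES/pepatac` path in the singularity example); the helper `check_simage` is illustrative and not part of PEPATAC.

```shell
# Point SIMAGES at the folder holding the singularity image.
# "$HOME/simages" is a placeholder location, not a PEPATAC default.
export SIMAGES="${SIMAGES:-$HOME/simages}"

# Report whether the image is where the pipeline will look for it.
# The file name "pepatac" is an assumption taken from the example command.
check_simage() {
    if [ -f "$SIMAGES/pepatac" ]; then
        echo "found image: $SIMAGES/pepatac"
    else
        echo "no image at $SIMAGES/pepatac" >&2
        return 1
    fi
}

check_simage || echo "move the image there or update SIMAGES"
```

Adding the `export` line to your shell startup file (for example `~/.bashrc`) keeps the setting across sessions.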
If your containers are set up correctly, then you won't need to install any additional software.
Individual jobs can be run in a container by simply running the
pepatac.py command through
docker run or
singularity exec. You can run containers either on your local computer, or in an HPC environment, as long as you have
singularity installed. For example, run it locally in singularity like this:
singularity exec --bind $GENOMES $SIMAGES/pepatac pipelines/pepatac.py --help
With docker, you can use:
docker run --rm -it databio/pepatac pipelines/pepatac.py --help
Be sure to mount the volumes you need with
--volume. If you're utilizing any environment variables (e.g.
$GENOMES), don't forget to include those in your docker command with the
-e option. For a more detailed example, check out our guide to learn how to run pepatac in a container.
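As a rough illustration of combining `--volume` and `-e`, the sketch below prints a docker command that mounts the genome folder at the same path inside the container and passes `$GENOMES` through. The default `$HOME/genomes` and the function name `pepatac_docker_help` are assumptions for illustration only.

```shell
# Placeholder default; set GENOMES to wherever your genome assets live.
GENOMES="${GENOMES:-$HOME/genomes}"

# Print the docker command for review; remove "echo" to actually run it.
# Mounting the folder at the same path keeps host paths valid in the container,
# and -e hands the GENOMES variable to the pipeline.
pepatac_docker_help() {
    echo docker run --rm -it \
        --volume "$GENOMES:$GENOMES" \
        -e "GENOMES=$GENOMES" \
        databio/pepatac \
        pipelines/pepatac.py --help
}

pepatac_docker_help
```

Printing the command first makes it easy to confirm that the mounted path and the variable's value agree before launching a container.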
The pipeline has been successfully run in both Linux and macOS environments. With
docker you need to bind mount your volume that contains the pipeline and your
$GENOMES location, as well as provide the container the same environment variables your host environment is using.
In the first example, we're mounting our home user directory (/home/jps3ag/), which contains the parent directories to our $GENOMES folder and to the pipeline itself. We'll also provide the pipeline two environment variables, $GENOMES and $HOME.
Here's that example command in a Linux environment to run the test example through the pipeline:
docker run --rm -it --volume /home/jps3ag/:/home/jps3ag/ \
  -e GENOMES='/home/jps3ag/genomes/' \
  -e HOME='/home/jps3ag/' \
  databio/pepatac \
  /home/jps3ag/src/pepatac/pipelines/pepatac.py --single-or-paired paired \
  --prealignments rCRSd human_repeats \
  --genome hg38 \
  --sample-name test1 \
  --input /home/jps3ag/src/pepatac/examples/data/test1_r1.fastq.gz \
  --input2 /home/jps3ag/src/pepatac/examples/data/test1_r2.fastq.gz \
  --genome-size hs \
  -O $HOME/pepatac_test
In this second example, we'll perform the same command in a Mac environment using Docker for Mac.
This necessitates a few minor changes to run that same example: paths under /home/ become paths under /Users/.
Remember to allocate sufficient memory (6-8GB should generally be more than adequate) in Docker for Mac.
docker run --rm -it --volume /Users/jps3ag/:/Users/jps3ag/ \
  -e GENOMES="/Users/jps3ag/genomes" \
  -e HOME="/Users/jps3ag/" \
  databio/pepatac \
  /Users/jps3ag/src/pepatac/pipelines/pepatac.py --single-or-paired paired \
  --prealignments rCRSd human_repeats \
  --genome hg38 \
  --sample-name test1 \
  --input /Users/jps3ag/src/pepatac/examples/data/test1_r1.fastq.gz \
  --input2 /Users/jps3ag/src/pepatac/examples/data/test1_r2.fastq.gz \
  --genome-size hs \
  -O $HOME/pepatac_test
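The Linux and Mac commands differ only in the home directory prefix, so one way to cover both is to parameterize on `$HOME`. The sketch below assumes the layout used in the examples (pipeline in `$HOME/src/pepatac`, genomes in `$HOME/genomes`). The variable `PEPATAC_DIR` and the function `pepatac_test_cmd` are hypothetical names introduced here, not part of PEPATAC.

```shell
# Assumed layout, taken from the example commands; override if yours differs.
PEPATAC_DIR="${PEPATAC_DIR:-$HOME/src/pepatac}"   # hypothetical variable name
GENOMES="${GENOMES:-$HOME/genomes}"

# Print the test-run command for the current machine; remove "echo" to run it.
# $HOME resolves to /home/<user> on Linux and /Users/<user> on a Mac, so the
# same function yields the right paths on either system.
pepatac_test_cmd() {
    echo docker run --rm -it \
        --volume "$HOME:$HOME" \
        -e "GENOMES=$GENOMES" \
        -e "HOME=$HOME" \
        databio/pepatac \
        "$PEPATAC_DIR/pipelines/pepatac.py" --single-or-paired paired \
        --prealignments rCRSd human_repeats \
        --genome hg38 \
        --sample-name test1 \
        --input "$PEPATAC_DIR/examples/data/test1_r1.fastq.gz" \
        --input2 "$PEPATAC_DIR/examples/data/test1_r2.fastq.gz" \
        --genome-size hs \
        -O "$HOME/pepatac_test"
}

pepatac_test_cmd
```

The function reads its variables when it is called, so changing `GENOMES` or `PEPATAC_DIR` afterwards and calling it again prints an updated command.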
To run multiple samples in a container, you simply need to configure
looper to use a container-compatible template. The looper documentation has detailed instructions for how to run pipelines in containers.