How does PEPATAC handle technical or biological replicates?

Currently, PEPATAC intentionally does not incorporate replicate information. There is no universally accepted way to handle replicates, and the right approach depends on the biology of the particular samples. Instead, we recommend a two-stage approach. First, run each replicate through the pipeline individually, and evaluate each replicate separately for quality control. Then, either merge the replicates' scores at the peak level, or merge the raw FASTQ files for the replicates you wish to keep and re-run the pipeline on the merged sample. For an example of this, see Corces, et al. The chromatin accessibility landscape of primary human cancers. Science. 2018;362 (Supplemental methods under "Constructing a counts matrix and normalization.").

When deciding whether to merge technical replicates, first apply the basic QC checks you would perform on any sample (see the FAQ question below). In addition, compare the replicates with one another to confirm that they agree. There are several ways to do this; for example, calculate the correlation of ATAC-seq log2(CPM*) values between each pair of replicates.

*CPM = counts per million mapped reads, computed after adding a prior count scaled with edgeR (see Corces et al. (2018), Supplemental methods).
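The replicate comparison described above can be sketched in Python. This is a minimal illustration, not PEPATAC code: it assumes you already have raw read counts for each replicate over the same set of peaks, and the helper names (`log2_cpm`, `pearson`, `replicate_correlation`) are hypothetical. The prior-count scaling is modeled on edgeR's log-CPM calculation (the prior is scaled by each library's size relative to the mean library size), but it omits edgeR's normalization factors, so values will not match edgeR exactly.

```python
import math
from typing import List, Optional, Sequence


def log2_cpm(counts: Sequence[float], lib_size: float, mean_lib_size: float,
             prior_count: float = 2.0) -> List[float]:
    """Log2 counts-per-million for one replicate (edgeR-like approximation)."""
    # Scale the prior by this library's size relative to the mean library size.
    scaled_prior = prior_count * lib_size / mean_lib_size
    # Enlarge the library size by the prior so zero counts stay finite after log2.
    adjusted_lib = lib_size + 2.0 * scaled_prior
    return [math.log2((c + scaled_prior) / adjusted_lib * 1e6) for c in counts]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def replicate_correlation(rep1: Sequence[float], rep2: Sequence[float],
                          lib_sizes: Optional[Sequence[float]] = None,
                          prior_count: float = 2.0) -> float:
    """Correlate log2(CPM) values of two replicates counted over the same peaks.

    lib_sizes should be each replicate's total mapped reads; if omitted, the
    sum of the peak counts is used instead (a simplification).
    """
    if len(rep1) != len(rep2):
        raise ValueError("replicates must be counted over the same peaks")
    if lib_sizes is None:
        lib_sizes = (sum(rep1), sum(rep2))
    mean_lib = sum(lib_sizes) / len(lib_sizes)
    cpm1 = log2_cpm(rep1, lib_sizes[0], mean_lib, prior_count)
    cpm2 = log2_cpm(rep2, lib_sizes[1], mean_lib, prior_count)
    return pearson(cpm1, cpm2)


# Hypothetical peak counts for two replicates (same peaks, same order).
rep1 = [10, 200, 35, 80, 0]
rep2 = [14, 180, 40, 95, 3]
print(f"log2(CPM) Pearson r = {replicate_correlation(rep1, rep2):.3f}")
```

Closely matching replicates should give a correlation near 1; what counts as close enough depends on your data set. Passing total mapped reads as `lib_sizes`, rather than relying on the default sum of peak counts, follows the "per million mapped reads" definition above more closely.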

How do I know if my samples or replicates are high quality?

  • Look over the fragment length distribution plot(s) for each sample. A good-quality sample should show a well-defined peak below 100 bp representing nucleosome-free regions, a second peak around 200 bp representing mono-nucleosomes, and then progressively weaker peaks representing multiple nucleosomes.
  • Check the TSS enrichment score for each sample, which is a measure of signal to noise. As a general rule, a score below 6 is a cause for concern. This cutoff is empirical and may vary between data sets, but it is a reasonable starting point.
  • Review the library complexity metrics (for complete explanations, see the terms and definitions from ENCODE):
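ENCODE's library complexity metrics include the non-redundant fraction (NRF) and the PCR bottlenecking coefficients PBC1 and PBC2: NRF is distinct read positions divided by total reads, PBC1 is M1/M_distinct, and PBC2 is M1/M2, where M1 and M2 are the numbers of positions with exactly one and exactly two reads. The sketch below computes them from a list of uniquely mapped read positions. It assumes duplicates are reads that share the same position key; the function name `library_complexity` and the `(chrom, start, strand)` key are illustrative choices, and the values PEPATAC reports may be derived with different tools and conventions (for example, fragment coordinates for paired-end data).

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def library_complexity(read_positions: Iterable[Hashable]) -> Dict[str, float]:
    """ENCODE-style NRF, PBC1 and PBC2 from uniquely mapped read positions.

    Each item identifies where one read maps, e.g. a (chrom, start, strand)
    tuple; reads with the same key are treated as duplicates.
    """
    reads_per_position = Counter(read_positions)
    total_reads = sum(reads_per_position.values())
    if total_reads == 0:
        raise ValueError("no reads supplied")
    m_distinct = len(reads_per_position)  # positions with at least one read
    # How many positions carry exactly k reads.
    positions_with_k = Counter(reads_per_position.values())
    m1 = positions_with_k.get(1, 0)
    m2 = positions_with_k.get(2, 0)
    return {
        "NRF": m_distinct / total_reads,
        "PBC1": m1 / m_distinct,
        "PBC2": m1 / m2 if m2 else float("inf"),
    }


# Hypothetical reads: 7 reads at 4 distinct positions.
reads = [
    ("chr1", 100, "+"), ("chr1", 100, "+"),
    ("chr1", 250, "-"),
    ("chr2", 40, "+"), ("chr2", 40, "+"), ("chr2", 40, "+"),
    ("chr3", 7, "+"),
]
print(library_complexity(reads))  # NRF = 4/7, PBC1 = 0.5, PBC2 = 2.0
```

Lower values of all three metrics indicate more duplication, that is, a less complex library.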

What if I need to restart a run?

Restarting a run involves two steps. First, when executing looper run, pass the --ignore-flags argument, e.g.:

  looper run --looper-config /path_to_your/.looper_config.yaml --ignore-flags

By default, looper skips samples that already have a flag file; this argument makes it ignore any flags associated with the samples. Further reading: Sample Flags

Second, if a run failed or timed out, lock files will remain on intermediate files and block the pipeline from rerunning them. Adding the -R argument to the pipeline interface makes the pipeline manager (pypiper) run in recover mode, which overrides these locks and allows the pipeline to restart and proceed. Further reading: Pypiper CLI: built in arguments

Modify PEPATAC's sample pipeline interface, adding the -R argument to the command template, e.g.:

command_template: >
  --output-parent { looper.results_subdir }
  --cores { compute.cores }
  --mem { compute.mem }
  --sample-name { sample.sample_name }
  --input { sample.read1 }
  -R