Configure peak callers

PEPATAC has the ability to use five different peak callers out of the box. If you're running the pipeline via containers, you should already have this capability, but if you installed requirements natively, you may still need to install additional peak callers should you wish to use them.

Default peak caller

PEPATAC uses MACS2 as the default peak caller. You do not need to specify any additional parameters to use MACS2. The default approach is to define fixed width peaks, to facilitate the generation of consensus peaks, which you can read more about here.

For a single sample, you may wish to enable variable width peak calling to achieve a more nuanced and specific peak call for a particular sample. Or maybe you don't wish to generate consensus peaks at all, and you want this for all your samples. You can still use MACS2 and specify PEPATAC to use --peak-type variable to call peaks in this manner.
Here's an example with the included test sample:

# This project config file describes your project. See looper docs for details.
name: test_project # The name that summary files will be prefaced with

pep_version: 2.0.0
sample_table: test_annotation.csv  # sheet listing all samples in the project

looper:  # relative paths are relative to this config file
  output_dir: pepatac_test 
  pipeline_interfaces: ../../project_pipeline_interface.yaml  # PATH to the directory where looper will find the pipeline repository. 

sample_modifiers:
  append:
    pipeline_interfaces: ../../sample_pipeline_interface.yaml
  derive:
    attributes: [read1, read2]
    sources:
      test_data_R1: "examples/data/{sample_name}_r1.fastq.gz"
      test_data_R2: "examples/data/{sample_name}_r2.fastq.gz"
  imply:
    - if: 
        organism: ["human", "Homo sapiens", "Human", "Homo_sapiens"]
      then: 
        genome: hg38
        macs_genome_size: hs
        prealignments: rCRSd
        aligner: bowtie2         # [options: bowtie2(default), bwa]
        deduplicator: samblaster # [options: samblaster(default), picard, samtools]
        trimmer: skewer          # [options: skewer (default), pyadapt, trimmomatic]
        peak_caller: macs2       # [options: macs2 (default), fseq, genrich, hmmratac, homer]
        peak_type: variable      # [options: fixed(default), variable]
        extend: "250"            # Default. For fixed-width peaks, extend this distance up- and down-stream.
        frip_ref_peaks: None     # Default. Use an external reference set of peaks instead of the peaks called from this run

Furthermore, you can customize the settings for variable width peak calling in the pipeline configuration file, pepatac.yaml. For example, you can change the 'params:' line for macs2_variable settings:

macs2_variable:
    params: '-f BED -q 0.01 --shift 0 --nomodel'

Specifying a different peak caller

For each of the alternative peak callers, you can specify them on the command line with the --peak-caller <peak_caller> --peak-type variable variables, or you can specify them in your PEP project configuration file. You must tell PEPATAC to use --peak-type variable when specifying alternative peak callers.

Here are examples with the included test sample:

F-Seq

# This project config file describes your project. See looper docs for details.
name: test_project # The name that summary files will be prefaced with

pep_version: 2.0.0
sample_table: test_annotation.csv  # sheet listing all samples in the project

looper:  # relative paths are relative to this config file
  output_dir: pepatac_test 
  pipeline_interfaces: ../../project_pipeline_interface.yaml  # PATH to the directory where looper will find the pipeline repository. 

sample_modifiers:
  append:
    pipeline_interfaces: ../../sample_pipeline_interface.yaml
  derive:
    attributes: [read1, read2]
    sources:
      test_data_R1: "examples/data/{sample_name}_r1.fastq.gz"
      test_data_R2: "examples/data/{sample_name}_r2.fastq.gz"
  imply:
    - if: 
        organism: ["human", "Homo sapiens", "Human", "Homo_sapiens"]
      then: 
        genome: hg38
        macs_genome_size: hs
        prealignments: rCRSd
        aligner: bowtie2         # [options: bowtie2(default), bwa]
        deduplicator: samblaster # [options: samblaster(default), picard, samtools]
        trimmer: skewer          # [options: skewer (default), pyadapt, trimmomatic]
        peak_caller: fseq        # [options: macs2 (default), fseq, genrich, hmmratac, homer]
        peak_type: variable      # [options: fixed(default), variable]
        extend: "250"            # Default. For fixed-width peaks, extend this distance up- and down-stream.
        frip_ref_peaks: None     # Default. Use an external reference set of peaks instead of the peaks called from this run

HMMRATAC

# This project config file describes your project. See looper docs for details.
name: test_project # The name that summary files will be prefaced with

pep_version: 2.0.0
sample_table: test_annotation.csv  # sheet listing all samples in the project

looper:  # relative paths are relative to this config file
  output_dir: pepatac_test 
  pipeline_interfaces: ../../project_pipeline_interface.yaml  # PATH to the directory where looper will find the pipeline repository. 

sample_modifiers:
  append:
    pipeline_interfaces: ../../sample_pipeline_interface.yaml
  derive:
    attributes: [read1, read2]
    sources:
      test_data_R1: "examples/data/{sample_name}_r1.fastq.gz"
      test_data_R2: "examples/data/{sample_name}_r2.fastq.gz"
  imply:
    - if: 
        organism: ["human", "Homo sapiens", "Human", "Homo_sapiens"]
      then: 
        genome: hg38
        macs_genome_size: hs
        prealignments: rCRSd
        aligner: bowtie2         # [options: bowtie2(default), bwa]
        deduplicator: samblaster # [options: samblaster(default), picard, samtools]
        trimmer: skewer          # [options: skewer (default), pyadapt, trimmomatic]
        peak_caller: hmmratac    # [options: macs2 (default), fseq, genrich, hmmratac, homer]
        peak_type: variable      # [options: fixed(default), variable]
        extend: "250"            # Default. For fixed-width peaks, extend this distance up- and down-stream.
        frip_ref_peaks: None     # Default. Use an external reference set of peaks instead of the peaks called from this run

HOMER

# This project config file describes your project. See looper docs for details.
name: test_project # The name that summary files will be prefaced with

pep_version: 2.0.0
sample_table: test_annotation.csv  # sheet listing all samples in the project

looper:  # relative paths are relative to this config file
  output_dir: pepatac_test 
  pipeline_interfaces: ../../project_pipeline_interface.yaml  # PATH to the directory where looper will find the pipeline repository. 

sample_modifiers:
  append:
    pipeline_interfaces: ../../sample_pipeline_interface.yaml
  derive:
    attributes: [read1, read2]
    sources:
      test_data_R1: "examples/data/{sample_name}_r1.fastq.gz"
      test_data_R2: "examples/data/{sample_name}_r2.fastq.gz"
  imply:
    - if: 
        organism: ["human", "Homo sapiens", "Human", "Homo_sapiens"]
      then: 
        genome: hg38
        macs_genome_size: hs
        prealignments: rCRSd
        aligner: bowtie2         # [options: bowtie2(default), bwa]
        deduplicator: samblaster # [options: samblaster(default), picard, samtools]
        trimmer: skewer          # [options: skewer (default), pyadapt, trimmomatic]
        peak_caller: homer       # [options: macs2 (default), fseq, genrich, hmmratac, homer]
        peak_type: variable      # [options: fixed(default), variable]
        extend: "250"            # Default. For fixed-width peaks, extend this distance up- and down-stream.
        frip_ref_peaks: None     # Default. Use an external reference set of peaks instead of the peaks called from this run

Genrich

# This project config file describes your project. See looper docs for details.
name: test_project # The name that summary files will be prefaced with

pep_version: 2.0.0
sample_table: test_annotation.csv  # sheet listing all samples in the project

looper:  # relative paths are relative to this config file
  output_dir: pepatac_test 
  pipeline_interfaces: ../../project_pipeline_interface.yaml  # PATH to the directory where looper will find the pipeline repository. 

sample_modifiers:
  append:
    pipeline_interfaces: ../../sample_pipeline_interface.yaml
  derive:
    attributes: [read1, read2]
    sources:
      test_data_R1: "examples/data/{sample_name}_r1.fastq.gz"
      test_data_R2: "examples/data/{sample_name}_r2.fastq.gz"
  imply:
    - if: 
        organism: ["human", "Homo sapiens", "Human", "Homo_sapiens"]
      then: 
        genome: hg38
        macs_genome_size: hs
        prealignments: rCRSd
        aligner: bowtie2         # [options: bowtie2(default), bwa]
        deduplicator: samblaster # [options: samblaster(default), picard, samtools]
        trimmer: skewer          # [options: skewer (default), pyadapt, trimmomatic]
        peak_caller: genrich     # [options: macs2 (default), fseq, genrich, hmmratac, homer]
        peak_type: variable      # [options: fixed(default), variable]
        extend: "250"            # Default. For fixed-width peaks, extend this distance up- and down-stream.
        frip_ref_peaks: None     # Default. Use an external reference set of peaks instead of the peaks called from this run

Each peak caller also has a separate parameterization option in the pipeline configuration file, pepatac.yaml.

For example:

  fseq:
    params: '-of npf -l 600 -t 4.0 -s 1'
  genrich:
    params: ''
    # -j: ATAC-seq mode on by default in the main pipeline, pepatac.py
  hmmratac:
    params: '--fragmem True --upper 10 --lower 5 --peaks True --window 500000'
  homer_findpeaks:
    params: '-minDist 150 -region'