MPUSP/snakemake-bacterial-riboseq
Bacterial-Riboseq: A Snakemake workflow for the analysis of riboseq data in bacteria.
Overview
Last update: 2026-04-07
Latest release: v1.6.0
Topics: bioinformatics-pipeline conda riboseq ribosome-profiling singularity snakemake workflow
Configuration
The following configuration details are extracted from the config's README file.
Running the workflow
Input data
Reference genome
The reference genome is specified either as an NCBI RefSeq ID, e.g. GCF_000006945.2, or as a custom pair of `*.fasta` and `*.gff` files that describe the genome of choice. You can find your genome assembly and its ID on NCBI Genomes.
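Before filling in the config, it can help to check whether a value looks like a RefSeq assembly accession or should instead be treated as a custom FASTA/GFF pair. The helper below is hypothetical (`is_refseq_assembly_id` is not part of the workflow). It assumes the usual accession shape of `GCF_`, nine digits, a dot, and a version number, as in GCF_000006945.2.

```python
import re

# Assumption: RefSeq assembly accessions look like "GCF_" + 9 digits + "." + version.
REFSEQ_ASSEMBLY = re.compile(r"GCF_\d{9}\.\d+")


def is_refseq_assembly_id(value: str) -> bool:
    """Return True if value looks like a RefSeq assembly accession (hypothetical helper)."""
    return REFSEQ_ASSEMBLY.fullmatch(value.strip()) is not None


if __name__ == "__main__":
    for candidate in ["GCF_000006945.2", "GCA_000006945.2", "genome.fasta"]:
        print(candidate, is_refseq_assembly_id(candidate))
```

GenBank accessions (`GCA_`) and file names are rejected here, since the text only mentions RefSeq IDs.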
Important requirements when using custom `*.fasta` and `*.gff` files:
- the `*.gff` genome annotation must use the same chromosome/region names as the `*.fasta` file (example: `NC_003197.2`)
- the `*.gff` genome annotation must have `gene` and `CDS` type annotations, which are automatically parsed to extract transcripts
- all chromosomes/regions in the `*.gff` genome annotation must be present in the `*.fasta` sequence, but not all sequences in the `*.fasta` file need to have annotated genes in the `*.gff` file
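The requirements above can be checked before launching the workflow. The following is a minimal sketch in plain Python, not the workflow's own validation. Assumptions: the GFF is GFF3 with nine tab-separated columns, a FASTA sequence ID is the first word after `>`, and `check_custom_genome` and its helpers are hypothetical names.

```python
def fasta_ids(path):
    """Collect sequence IDs: the first whitespace-delimited word after each '>'."""
    ids = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    ids.add(fields[0])
    return ids


def gff_summary(path):
    """Return (seqids, feature types) found in the feature lines of a GFF3 file."""
    seqids, types = set(), set()
    with open(path) as handle:
        for line in handle:
            if line.startswith("##FASTA"):
                break  # an embedded FASTA section ends the feature lines
            if not line.strip() or line.startswith("#"):
                continue  # skip blank lines, directives and comments
            cols = line.rstrip("\n").split("\t")
            if len(cols) < 9:
                continue  # not a 9-column feature line
            seqids.add(cols[0])
            types.add(cols[2])
    return seqids, types


def check_custom_genome(fasta_path, gff_path):
    """Return a list of problems; an empty list means the basic requirements hold."""
    problems = []
    fa_ids = fasta_ids(fasta_path)
    gff_ids, gff_types = gff_summary(gff_path)
    # Every annotated region must exist in the FASTA; extra FASTA records are allowed.
    missing = gff_ids - fa_ids
    if missing:
        problems.append(f"GFF regions missing from FASTA: {sorted(missing)}")
    for required in ("gene", "CDS"):
        if required not in gff_types:
            problems.append(f"GFF has no '{required}' features")
    return problems
```

Calling `check_custom_genome("genome.fasta", "genome.gff")` with your own file names returns an empty list when the region names match and both `gene` and `CDS` features are present.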
Read data
Ribosome footprint sequencing data in `*.fastq.gz` format. Currently, only single-end, strand-specific reads are supported. Input data files are supplied via a mandatory sample sheet, whose location is set in the `config.yml` file (default: `samples.tsv`). The sample sheet has the following layout:
| sample | condition | replicate | fq1 |
|---|---|---|---|
| RPF-RTP1 | RPF-RTP | 1 | data/RPF-RTP1_R1_001.fastq.gz |
| RPF-RTP2 | RPF-RTP | 2 | data/RPF-RTP2_R1_001.fastq.gz |
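A sample sheet with this layout is a plain tab-separated file, so it can be sanity-checked with the standard library. This is a sketch only: `read_sample_sheet` is a hypothetical helper, and the checks for unique sample names and a `.fastq.gz` suffix in `fq1` are assumptions drawn from the table above rather than rules documented by the workflow.

```python
import csv

# Column names taken from the sample sheet layout shown above.
REQUIRED_COLUMNS = ("sample", "condition", "replicate", "fq1")


def read_sample_sheet(path):
    """Read a tab-separated sample sheet and check its basic layout (hypothetical helper)."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        present = reader.fieldnames or []
        missing = [col for col in REQUIRED_COLUMNS if col not in present]
        if missing:
            raise ValueError(f"sample sheet lacks columns: {missing}")
        rows = list(reader)
    # Assumption: each sample name identifies exactly one row.
    names = [row["sample"] for row in rows]
    if len(names) != len(set(names)):
        raise ValueError("sample names must be unique")
    for row in rows:
        if not row["fq1"].endswith(".fastq.gz"):
            raise ValueError(f"{row['sample']}: expected a .fastq.gz file, got {row['fq1']}")
    return rows
```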
Some configuration parameters of the pipeline may be specific to your data and library preparation protocol. Adjust these options in the config.yml file. For example:
- Minimum and maximum read length after adapter removal (see option `cutadapt: default`). Here, the test data has a minimum read length of 15 + 7 = 22 nt (2 nt UMI on the 5'-end + 5 nt UMI on the 3'-end) and a maximum of 45 + 7 = 52 nt.
- Unique molecular identifiers (UMIs). For example, the protocol by McGlincy & Ingolia, 2017 creates UMIs on both the 5'-end (2 nt) and the 3'-end (5 nt). These UMIs are extracted with `umi_tools` (see options `umi_extraction: method` and `pattern`).
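The read-length arithmetic and the UMI layout from the list above can be illustrated in plain Python. This is a sketch for illustration only: `length_bounds` and `split_umi` are hypothetical helpers, the workflow itself delegates UMI extraction to `umi_tools`, and the sketch assumes both UMIs sit at the very ends of the adapter-trimmed read.

```python
# Assumed UMI layout (McGlincy & Ingolia, 2017): 2 nt at the 5'-end, 5 nt at the 3'-end.
UMI_5P = 2
UMI_3P = 5


def length_bounds(min_footprint, max_footprint, umi_5p=UMI_5P, umi_3p=UMI_3P):
    """Read-length bounds after adapter removal: footprint length plus total UMI length."""
    total_umi = umi_5p + umi_3p
    return min_footprint + total_umi, max_footprint + total_umi


def split_umi(read, umi_5p=UMI_5P, umi_3p=UMI_3P):
    """Split an adapter-trimmed read into (umi, footprint) by clipping both ends."""
    if len(read) < umi_5p + umi_3p:
        raise ValueError("read shorter than the combined UMI length")
    end = len(read) - umi_3p  # explicit index so that umi_3p == 0 also works
    umi = read[:umi_5p] + read[end:]
    footprint = read[umi_5p:end]
    return umi, footprint


print(length_bounds(15, 45))  # (22, 52)
print(split_umi("AC" + "G" * 15 + "TTTTT"))
```

Setting the length filter on the UMI-containing read (22 to 52 nt here) keeps footprints of 15 to 45 nt once the 7 nt of UMI are removed.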
Example configuration files for different sequencing protocols can be found in resources/protocols/.