MPUSP/snakemake-assembly-postprocessing
A Snakemake workflow for the post-processing of microbial genome assemblies.
Overview
Last update: 2026-04-05
Latest release: v1.1.0
Topics: apptainer bacteria conda genome-assembly genome-sequencing microbes pipeline postprocessing quality-control snakemake-workflow genomics
Configuration
The following configuration details are extracted from the config's README file.
Workflow overview
- Parse the samples.csv table containing the samples' metadata (Python)
- Annotate assemblies using one of the following tools:
  - NCBI's Prokaryotic Genome Annotation Pipeline (PGAP). Note: PGAP must be installed manually
  - prokka, a fast and lightweight prokaryotic annotation tool
  - bakta, a fast, alignment-free annotation tool. Note: Bakta automatically downloads its companion database from Zenodo (light: 1.5 GB, full: 40 GB)
- Create a QC report for the assemblies using QUAST
- Create a pangenome analysis (orthologs/homologs) using Panaroo
- Compute pairwise average nucleotide identity (ANI) between the assemblies using FastANI and plot a phylogenetic tree based on the ANI distances.
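The ANI step turns pairwise similarities into a tree. The sketch below shows one common way to do this and is not taken from the workflow's code. It assumes FastANI's tab-separated output (query, reference, ANI in percent, mapped fragments, total fragments), defines the distance as 1 - ANI/100, averages the two directions of each pair, and clusters with UPGMA. The function names are hypothetical, branch lengths are omitted, and the workflow itself may use a different distance or tree-building method.

```python
from itertools import combinations


def parse_fastani(lines):
    """Collect ANI values from FastANI tab-separated output lines.

    Returns {(query, reference): ani_percent} for every reported direction.
    """
    ani = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        ani[(fields[0], fields[1])] = float(fields[2])
    return ani


def ani_distances(ani, samples):
    """Convert ANI (%) to distances d = 1 - ANI/100, averaging both directions.

    Assumption of this sketch: pairs that FastANI did not report (it omits
    very dissimilar genomes) get the maximum distance 1.0.
    """
    dist = {}
    for a, b in combinations(samples, 2):
        values = [ani[p] for p in ((a, b), (b, a)) if p in ani]
        mean = sum(values) / len(values) if values else 0.0
        dist[frozenset((a, b))] = 1.0 - mean / 100.0
    return dist


def upgma(samples, dist):
    """Cluster samples with UPGMA and return the topology as a Newick string."""
    clusters = {s: 1 for s in samples}  # Newick label -> number of leaves
    d = dict(dist)
    while len(clusters) > 1:
        # Merge the closest pair of current clusters.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        merged = f"({a},{b})"
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        for c in clusters:
            # UPGMA: size-weighted average of the merged clusters' distances.
            d[frozenset((merged, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b
    return next(iter(clusters)) + ";"
```

UPGMA is used here only because it is short and easy to follow; it assumes a roughly constant rate of divergence, so neighbor-joining is often preferred for real phylogenetic interpretation.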
Running the workflow
Input data
This workflow requires assemblies in FASTA format as input.
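Before starting a run, it can help to confirm that each assembly file is well-formed FASTA. The following is a minimal, standard-library-only sketch that is not part of the workflow: it parses FASTA records (allowing wrapped sequence lines) and reports the number of contigs, total length, and N50, a contiguity metric that QUAST also reports. The function names `read_fasta` and `assembly_stats` are hypothetical.

```python
def read_fasta(lines):
    """Yield (header, sequence) records from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)  # wrapped sequence lines are concatenated
    if header is not None:
        yield header, "".join(chunks)


def assembly_stats(lines):
    """Return contig count, total length and N50 for one FASTA assembly."""
    lengths = sorted((len(seq) for _, seq in read_fasta(lines)), reverse=True)
    if not lengths:
        raise ValueError("no FASTA records found")
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # N50: length of the contig at which the cumulative length of the
        # longest contigs first reaches half of the total assembly length.
        if 2 * running >= total:
            n50 = length
            break
    return {"contigs": len(lengths), "total_length": total, "n50": n50}
```

For a real file, pass an open handle, e.g. `assembly_stats(open("assembly.fasta"))`.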
The sample sheet (samples.csv) has the following layout:
| sample | species | strain | id_prefix | file |
|---|---|---|---|---|
| EC2224 | "Streptococcus pyogenes" | SF370 | SPY | assembly.fasta |
| ... | ... | ... | ... | ... |
Note: Pangenome analysis with Panaroo and pairwise similarity analysis with FastANI require at least two samples.
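A sample sheet can be checked before launching the workflow. This sketch is an illustration, not the workflow's own parser: it assumes a comma-separated file with the five columns shown above (quoted values such as "Streptococcus pyogenes" are handled by Python's `csv` module), and it enforces the two-sample minimum only when the comparative steps are wanted. The function `load_samples` and its `comparative` parameter are hypothetical.

```python
import csv
import io

REQUIRED_COLUMNS = ("sample", "species", "strain", "id_prefix", "file")


def load_samples(handle, comparative=True):
    """Read a samples.csv-style table and run basic sanity checks.

    comparative=True means the Panaroo/FastANI steps are requested,
    which need at least two samples.
    """
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        raise ValueError(f"sample sheet is missing columns: {', '.join(missing)}")
    rows = list(reader)
    names = [row["sample"] for row in rows]
    duplicates = sorted({n for n in names if names.count(n) > 1})
    if duplicates:
        raise ValueError(f"duplicate sample names: {', '.join(duplicates)}")
    for row in rows:
        # Short rows yield None for missing fields, so treat None as empty.
        empty = [c for c in REQUIRED_COLUMNS if not (row[c] or "").strip()]
        if empty:
            raise ValueError(f"sample {row['sample']!r} has empty fields: {', '.join(empty)}")
    if comparative and len(rows) < 2:
        raise ValueError("Panaroo and FastANI need at least two samples")
    return rows


if __name__ == "__main__":
    example = (
        "sample,species,strain,id_prefix,file\n"
        'EC2224,"Streptococcus pyogenes",SF370,SPY,assembly.fasta\n'
    )
    # A single sample is fine for annotation and QC only.
    print(load_samples(io.StringIO(example), comparative=False))
```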