MPUSP/snakemake-assembly-postprocessing

A Snakemake workflow for the post-processing of microbial genome assemblies.

Overview

Last update: 2026-04-05

Latest release: v1.1.0

Topics: apptainer bacteria conda genome-assembly genome-sequencing microbes pipeline postprocessing quality-control snakemake-workflow genomics

Authors: @m-jahn @rabioinf

Configuration

The following configuration details are extracted from the config's README file.


Workflow overview

A Snakemake workflow for the post-processing of microbial genome assemblies.

  1. Parse the samples.csv table containing the samples' metadata (python)
  2. Annotate assemblies using one of the following tools:
    1. NCBI's Prokaryotic Genome Annotation Pipeline (PGAP). Note: PGAP needs to be installed manually.
    2. Prokka, a fast and lightweight prokaryotic annotation tool
    3. Bakta, a fast, alignment-free annotation tool. Note: Bakta will automatically download its companion database from Zenodo (light: 1.5 GB, full: 40 GB).
  3. Create a QC report for the assemblies using Quast
  4. Create a pangenome analysis (orthologs/homologs) using Panaroo
  5. Compute pairwise average nucleotide identity (ANI) between the assemblies using FastANI and plot a phylogenetic tree based on the ANI distances.
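As an illustration of step 5, the sketch below turns FastANI's tab-separated output (query, reference, ANI in percent, followed by fragment counts) into a symmetric distance matrix that a tree-building method could consume. The conversion "distance = 100 - ANI", the averaging of both comparison directions, and the function name ani_to_distances are assumptions made for this example; the workflow itself may compute distances differently.

```python
import io


def ani_to_distances(handle):
    """Build a symmetric distance matrix (nested dicts) from FastANI output."""
    ani = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip empty or malformed lines
        query, ref, value = fields[0], fields[1], float(fields[2])
        ani[(query, ref)] = value

    names = sorted({name for pair in ani for name in pair})
    dist = {a: {} for a in names}
    for a in names:
        for b in names:
            if a == b:
                dist[a][b] = 0.0
                continue
            # Average both directions when FastANI reported both.
            values = [ani[k] for k in ((a, b), (b, a)) if k in ani]
            # Assumption: pairs without a reported ANI get the maximal distance.
            mean_ani = sum(values) / len(values) if values else 0.0
            dist[a][b] = 100.0 - mean_ani
    return names, dist


# Hypothetical FastANI output for two assemblies, compared in both directions.
example = io.StringIO(
    "a.fasta\tb.fasta\t98.0\t100\t110\n"
    "b.fasta\ta.fasta\t97.0\t100\t112\n"
)
names, dist = ani_to_distances(example)
```

FastANI omits pairs whose identity falls far below roughly 80 %, which is why the sketch needs a fallback for missing pairs.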

Running the workflow

Input data

This workflow requires genome assemblies in FASTA format as input. The sample sheet has the following layout:

sample | species                  | strain | id_prefix | file
EC2224 | "Streptococcus pyogenes" | SF370  | SPY       | assembly.fasta
...    | ...                      | ...    | ...       | ...

Note: Pangenome analysis with Panaroo and pairwise similarity analysis with FastANI require at least two samples.
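A sample sheet with the columns described above can be checked before starting a run. The following is a minimal sketch, assuming a comma-separated file with a header row; the helper names read_samplesheet and can_compare are hypothetical and not part of the workflow.

```python
import csv
import io

# Column names taken from the sample sheet layout.
REQUIRED_COLUMNS = ["sample", "species", "strain", "id_prefix", "file"]


def read_samplesheet(handle):
    """Parse a sample sheet and return one dict per sample."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError("missing columns: " + ", ".join(missing))
    # Keep only the required columns and strip surrounding whitespace.
    return [
        {col: (row[col] or "").strip() for col in REQUIRED_COLUMNS}
        for row in reader
    ]


def can_compare(rows):
    """Panaroo and FastANI need at least two samples."""
    return len(rows) >= 2


# Example with the single row shown in the layout.
example = io.StringIO(
    "sample,species,strain,id_prefix,file\n"
    'EC2224,"Streptococcus pyogenes",SF370,SPY,assembly.fasta\n'
)
samples = read_samplesheet(example)
```

With only one sample, can_compare(samples) returns False, which mirrors the note on Panaroo and FastANI.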