MPUSP/snakemake-ont-basecalling

A Snakemake workflow for basecalling and demultiplexing of Oxford Nanopore data using Dorado.

Overview


Last update: 2026-01-08

Latest release: v1.5.2

Topics: basecalling cluster dorado nanopore-sequencing oxford-nanopore parallel-computing slurm snakemake snakemake-workflow

Authors: @m-jahn @rabioinf

Configuration

The following configuration details are extracted from the config's README file.

Running the workflow

Input data

This workflow requires pod5 input files, which are supplied through a mandatory runs table referenced in the config.yml file (default: .test/config/runs.csv). Each row of the runs table describes a single run, and all pod5 files for that run are located in the folder given in the data_folder column. Multiple runs can be defined in the table. The runs table has the following layout:

run_id	data_folder	basecalling_model	barcode_kit
MK1C_run_01	".test/data"	dna_r10.4.1_e8.2_400bps_sup@v5.0.0	SQK-PCB114-24
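The runs table can be checked before Snakemake is launched. The Python sketch below is not part of the workflow. It assumes the table is comma-separated (as the default runs.csv name suggests), that input files sit directly inside data_folder rather than in subfolders, and that run_id values are unique. The function name `load_runs` is hypothetical.

```python
import csv
from pathlib import Path

# Column names as shown in the runs table layout above.
REQUIRED_COLUMNS = {"run_id", "data_folder", "basecalling_model", "barcode_kit"}


def load_runs(runs_table: str, extension: str = "pod5") -> dict[str, list[Path]]:
    """Map each run_id to the sorted input files found in its data_folder."""
    with open(runs_table, newline="") as handle:
        reader = csv.DictReader(handle)  # assumes comma-separated values
        missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"runs table lacks columns: {sorted(missing)}")
        runs: dict[str, list[Path]] = {}
        for row in reader:
            run_id = row["run_id"]
            if run_id in runs:
                raise ValueError(f"duplicate run_id: {run_id}")
            # Only files directly inside data_folder are considered here.
            folder = Path(row["data_folder"])
            files = sorted(folder.glob(f"*.{extension}"))
            if not files:
                raise FileNotFoundError(f"no *.{extension} files in {folder} ({run_id})")
            runs[run_id] = files
    return runs
```

Relative data_folder paths such as ".test/data" are resolved against the current working directory, so the sketch should be called from the directory those paths are written relative to.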

Execution

Rule-specific resources, such as GPU usage, are defined using configuration profiles. See the snakemake docs on profiles for more information. This workflow provides a default profile for local testing and a slurm-specific cluster profile.

To run the workflow from command line, change to the working directory and activate the conda environment.

cd snakemake-ont-basecalling
conda activate snakemake-ont-basecalling

Adjust options in the default config file config/config.yml. Before running the entire workflow, perform a dry run using:

snakemake --cores 3 --sdm conda --directory .test --dry-run

To run the workflow with test files using conda:

snakemake --cores 3 --sdm conda --directory .test

To run the workflow with test files using conda and apptainer, set the Dorado path in the config file to /share/resources/dorado-<version>-linux-x64/bin/dorado and make this directory available inside the container with a bind mount:

snakemake --cores 3 --sdm conda apptainer --directory .test --apptainer-args "--bind ../resources:/share/resources"
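The bind mount makes the host directory ../resources visible inside the container at /share/resources, which is why the Dorado path points to a /share/resources/... location. The Python sketch below illustrates that translation for a bind spec of the form SRC[:DST[:OPTS]]. The helper `container_path` is hypothetical and not part of the workflow, and `X.Y.Z` stands in for an actual Dorado version.

```python
from pathlib import PurePosixPath


def container_path(host_path: str, bind: str) -> str:
    """Map a host path to its location inside the container.

    `bind` follows the SRC[:DST[:OPTS]] form; when DST is omitted the
    directory is mounted at the same path. Paths are compared lexically,
    without resolving symlinks or relative components.
    """
    parts = bind.split(":")
    src = parts[0]
    dst = parts[1] if len(parts) > 1 else src
    # relative_to() raises ValueError if host_path is not under src
    rel = PurePosixPath(host_path).relative_to(PurePosixPath(src))
    return str(PurePosixPath(dst) / rel)


print(container_path(
    "../resources/dorado-X.Y.Z-linux-x64/bin/dorado",
    "../resources:/share/resources",
))
# /share/resources/dorado-X.Y.Z-linux-x64/bin/dorado
```

A path outside the bound directory raises ValueError, which mirrors the fact that such a file would not be visible inside the container.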

To run the workflow with test files on a slurm cluster, adjust the slurm-specific profile in workflow/profiles/slurm/config.yaml and run:

snakemake --cores 3 --sdm conda --workflow-profile workflow/profiles/slurm/ --directory .test

Note: It is recommended to start the snakemake pipeline on the cluster using a session multiplexer like screen or tmux.

Parameters

This table lists all parameters that can be used to run the workflow.

Parameter	Type	Details	Default
input
runs	string	table with sequencing runs	`config/runs.csv`
file_extension	string	extension for input files	`pod5`
file_regex	string	pattern to match input files	`[A-Z]{3}[0-9]{5}...`
barcodes	string	barcodes used for demultiplexing	`1-24`
dorado
path	string	path to the Dorado executable
simplex / cuda	string	CUDA device: `auto`, `cuda:0`, `cuda:all`	`cuda:all`
simplex / trim	string	`all` or `none`	`none`
simplex / extra	string	extra parameters passed to dorado basecaller	`""`
demultiplexing	bool	whether to perform demultiplexing	`True`
report
tools	array	list of tools to include in the report	`["pycoQC", "NanoPlot"]`
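Several of the `input` and `dorado` parameters above have a fixed shape. The Python sketch below illustrates one way to interpret them; it is not taken from the workflow. The helper names, the `barcodeNN` naming of barcodes, comma-separated barcode lists, and matching file_regex anywhere in a file name (rather than anchored at the start) are all assumptions. The CUDA check accepts only `auto`, `cuda:all`, and single-device `cuda:N` values, mirroring the examples in the table.

```python
import re
from pathlib import Path


def expand_barcodes(spec: str) -> list[str]:
    """Expand a barcode spec such as "1-24" into names like "barcode01".

    Assumption: items may be comma-separated ("1-12,15") and barcodes are
    named "barcodeNN"; the workflow's own naming may differ.
    """
    names = []
    for item in spec.split(","):
        item = item.strip()
        if "-" in item:
            start, end = (int(x) for x in item.split("-", 1))
        else:
            start = end = int(item)
        if start < 1 or start > end:
            raise ValueError(f"invalid barcode range: {item!r}")
        names.extend(f"barcode{n:02d}" for n in range(start, end + 1))
    return names


def select_inputs(folder: str, extension: str, pattern: str) -> list[Path]:
    """List files with the given extension whose names contain a regex match."""
    regex = re.compile(pattern)
    return sorted(
        path for path in Path(folder).glob(f"*.{extension}")
        if regex.search(path.name)
    )


def check_simplex(cuda: str, trim: str) -> None:
    """Reject simplex/cuda and simplex/trim values outside the documented forms."""
    # Only `auto`, `cuda:all` and single-device `cuda:N` are accepted here.
    if cuda not in {"auto", "cuda:all"} and not re.fullmatch(r"cuda:\d+", cuda):
        raise ValueError(f"unsupported CUDA device: {cuda!r}")
    if trim not in {"all", "none"}:
        raise ValueError(f"trim must be 'all' or 'none', got {trim!r}")
```

With the default barcodes value, `expand_barcodes("1-24")` yields 24 names from barcode01 to barcode24. Validating these values up front surfaces configuration mistakes before any GPU job is submitted.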
