Katuali

Katuali is a flexible consensus pipeline implemented in Snakemake to basecall, assemble, and polish Oxford Nanopore Technologies’ sequencing data.

Features

  • fast5 to high-quality consensus in a single command.

  • Recommended fixed standard and fast pipelines.

  • Interchange basecaller, assembler, and consensus components of the pipelines simply by changing the target filepath.

  • Medaka training pipeline including generation of training data, model training and model evaluation.

  • Seamless distribution of tasks over local or distributed compute.

  • Open source (Mozilla Public License 2.0).

Research Release

Research releases are provided as technology demonstrators to give early access to features or to stimulate Community development of tools. Support for this software is minimal and is provided only directly by the developers. Feature requests, improvements, and discussions are welcome and can be implemented by forking and pull requests. Much as we would like to rectify every issue and act on every piece of feedback users may have, the developers may have limited resources for supporting this software. Research releases may be unstable and subject to rapid iteration by Oxford Nanopore Technologies.

Quickstart

The Katuali Basic Usage and Tests sections contain examples of how to basecall, assemble, and polish a small dataset that comes bundled with Katuali.

To run with other data, start by creating a directory of reads (which could contain subdirectories of reads) within a run directory (run1 in this example):

mkdir -p run1
cd run1
ln -s /path/to/fast5 reads  # create a softlink to the fast5 data
cd ..
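
Before launching a long run it can be worth confirming that the softlink resolves and that fast5 files are visible beneath it. The following is a minimal sketch of the same layout, not part of Katuali itself: the temporary directory, the `fast5_source` and `batch0` names, and the empty placeholder files are all assumptions made so the example is self-contained; with real data, point the link at your own fast5 directory instead.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for /path/to/fast5: a temporary directory holding
# two empty placeholder files inside a subdirectory.
workdir=$(mktemp -d)
fast5_dir="$workdir/fast5_source"
mkdir -p "$fast5_dir/batch0"
touch "$fast5_dir/batch0/read1.fast5" "$fast5_dir/batch0/read2.fast5"

# Same layout as in the quickstart: run1/reads is a softlink to the fast5 data.
mkdir -p "$workdir/run1"
ln -s "$fast5_dir" "$workdir/run1/reads"

# -L makes find follow the softlink; files in subdirectories are counted too.
n_fast5=$(find -L "$workdir/run1/reads" -type f -name '*.fast5' | wc -l | tr -d ' ')
echo "found $n_fast5 fast5 files under run1/reads"
```

A count of zero usually means the link target is wrong or the link is dangling, which is cheaper to discover here than partway through a pipeline.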

Then make a copy of the katuali config in your working directory:

katuali_config my_config.yaml

and update the katuali config to reflect your data:

DATA:
    'run1':
        'GENOME_SIZE': '4.0M'  # for canu/flye we need to specify genome size
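
In the example above the key under DATA has the same name as the run directory. Assuming that correspondence is intended, a quick consistency check can be sketched in shell. The snippet below is illustrative only: it writes a stand-in `my_config.yaml` containing just the DATA section shown above into a temporary directory, and it assumes run names are single-quoted keys indented by exactly four spaces, so it is not a general YAML parser.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"
mkdir -p run1

# Stand-in for my_config.yaml holding only the DATA section from the quickstart.
cat > my_config.yaml <<'EOF'
DATA:
    'run1':
        'GENOME_SIZE': '4.0M'
EOF

# Pick out quoted keys indented by exactly four spaces (the run names in this
# small file) and strip the quotes and the trailing colon.
runs=$(sed -n "s/^    '\([^']*\)':[[:space:]]*$/\1/p" my_config.yaml)

# Report any run named in the config that has no matching directory.
missing=0
for run in $runs; do
    if [ -d "$run" ]; then
        echo "ok: $run"
    else
        echo "missing run directory: $run" >&2
        missing=$((missing + 1))
    fi
done
```

For a full config with more sections, a proper YAML parser would be a safer choice than a line-based pattern.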

There are two predefined pipelines that can be used starting from fast5 input:

  1. To basecall the reads, assemble them with miniasm, and polish the assembly with racon and medaka, simply run:

    katuali fast_assm_polish
    
  2. To basecall the reads, assemble them with canu, and polish the assembly with racon and medaka, run:

    katuali standard_assm_polish
    

See Custom pipelines for details on creating flexible multistep pipelines.
