Welcome to Megalodon’s documentation!
Megalodon is a research command line tool that extracts high-accuracy modified base and sequence variant calls from raw nanopore reads by anchoring the information-rich output of the basecalling neural network to a reference genome or transcriptome.
Raw nanopore reads are processed by a single command to produce basecalls (FASTA/Q), reference mappings (SAM/BAM/CRAM), sequence variant calls (per-read and VCF) and modified base calls (per-read and bedgraph/bedmethyl/modVCF).
As of version 2.0, the primary megalodon run mode requires the guppy basecaller. See the community page for download/installation instructions.
All other requirements are handled by pip or conda.
If installing from source, numpy must be installed before running the installation so that the cython optimizations can be built.
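The installation order described above can be sketched as a small POSIX shell function. This is a hedged sketch rather than the project's own install procedure: the function name `install_megalodon_from_source` and the `PYTHON` override variable are hypothetical, and it assumes a local checkout of the megalodon source at the path given as the first argument.

```shell
#!/bin/sh
# Hypothetical helper: install numpy first, then megalodon from a local
# source checkout, so that the cython optimizations can be built.
# PYTHON may point at a specific interpreter; it defaults to python3.
install_megalodon_from_source() {
    src_dir="$1"
    python_bin="${PYTHON:-python3}"
    # Step 1: numpy must already be installed when megalodon's build runs
    "$python_bin" -m pip install numpy || return 1
    # Step 2: build and install megalodon from the source directory
    "$python_bin" -m pip install "$src_dir"
}

# Example usage (the checkout path is hypothetical):
# install_megalodon_from_source ./megalodon
```

Splitting the two `pip install` calls makes the ordering explicit: if numpy fails to install, the function stops before attempting the megalodon build.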
Required python packages are:
Taiyaki is no longer required to run megalodon, but it must be installed for two specific run modes: 1) outputting mapped signal files (for basecall model training) and 2) running the taiyaki basecalling backend (for neural network designs that include experimental layers).
Megalodon is a command line tool.
pip and conda are the recommended installation interfaces for megalodon.
pip install megalodon # or conda install megalodon
Megalodon is accessed via the megalodon command line interface.
The path to the guppy_basecall_server executable is required to run megalodon. By default, megalodon assumes this path is …; use the --guppy-server-path argument to specify a different path.
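A quick pre-flight check can catch a wrong server path before a long megalodon run is launched. The POSIX shell function below is a hypothetical helper, not part of megalodon: it checks that the given path is an executable file and prints it for use with --guppy-server-path. The install location in the usage comment is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: verify that a guppy_basecall_server path points to an
# executable file, print the path on success, and fail with a message otherwise.
check_guppy_server() {
    server_path="$1"
    if [ ! -f "$server_path" ] || [ ! -x "$server_path" ]; then
        echo "guppy_basecall_server not found or not executable: $server_path" >&2
        return 1
    fi
    printf '%s\n' "$server_path"
}

# Example usage (install location is an assumption):
# server=$(check_guppy_server /opt/ont-guppy/bin/guppy_basecall_server) || exit 1
# megalodon raw_fast5s/ --guppy-server-path "$server" --outputs basecalls
```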
# megalodon help (common args)
megalodon -h
# megalodon help (all args)
megalodon --help-long

# Example command to output basecalls, mappings, variants and CpG methylation
# Compute settings: GPU devices 0 and 1 with 40 CPU cores
megalodon raw_fast5s/ \
    --outputs basecalls mappings variants mods \
    --reference reference.fa --variant-filename variants.vcf.gz \
    --mod-motif Z CG 0 --devices 0 1 --processes 40 \
    --verbose-read-progress 3
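When the example command is rerun with different inputs or hardware, it can be convenient to assemble it in a bash array so that paths containing spaces stay intact. The sketch below reuses only the arguments from the example command; the function name `build_megalodon_cmd` and the `MEGALODON_CMD` variable are hypothetical, and bash (not plain sh) is assumed for array support.

```shell
#!/usr/bin/env bash
# Hypothetical helper: assemble the example megalodon command into the
# MEGALODON_CMD array. Arguments: fast5 directory, reference FASTA,
# variant VCF, number of CPU processes, then one or more GPU device IDs.
build_megalodon_cmd() {
    local fast5_dir="$1" reference="$2" variants="$3" processes="$4"
    shift 4
    # Remaining positional arguments are the GPU device IDs
    MEGALODON_CMD=(
        megalodon "$fast5_dir"
        --outputs basecalls mappings variants mods
        --reference "$reference" --variant-filename "$variants"
        --mod-motif Z CG 0
        --devices "$@"
        --processes "$processes"
        --verbose-read-progress 3
    )
}

# Example usage, equivalent to the example command:
# build_megalodon_cmd raw_fast5s/ reference.fa variants.vcf.gz 40 0 1
# "${MEGALODON_CMD[@]}"
```

Expanding the array with `"${MEGALODON_CMD[@]}"` passes each element as a separate argument, which is why this form is safer than building the command as a single string.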
This command produces the megalodon_results output directory containing all requested output files and logs.
The majority of Megalodon’s functionality is accessed via the megalodon command (exemplified above), though a number of additional operations are made available via the …
These operations include modified base or variant aggregation (much faster than re-computing per-read calls), modified base result validation, and model statistic calibration.
Helper commands to perform sequence variant phasing (see Variant Phasing for details) are also included in …
In the future these scripts will move to a dedicated command line interface (likely …).
- Megalodon Algorithm Details
- Common Arguments
- Advanced Megalodon Arguments
- Computing Considerations
- Variant Phasing
- File Formats
- Megalodon Model Training
- Megalodon Modified Base Model Training