Welcome to Megalodon’s documentation!¶
Megalodon is a research command line tool to extract high accuracy modified base and sequence variant calls from raw nanopore reads by anchoring the information rich basecalling neural network output to a reference genome/transcriptome.
Raw nanopore reads are processed by a single command to produce basecalls (FASTA/Q), reference mappings (SAM/BAM/CRAM), modified base calls (per-read and aggregated per-reference site), sequence variant calls (per-read and aggregated per-reference site) and more.
Prerequisites¶
The primary Megalodon run mode requires the Guppy basecaller (version >= 4.0). See the community page for download/installation instructions [login required].
Megalodon is a python-based command line software package.
Given a python (version >= 3.5) installation, all other requirements are handled by pip or conda.
Taiyaki is no longer required to run Megalodon, but installation is required for two specific run modes:
output mapped signal files (for basecall model training)
running the Taiyaki basecalling backend (for neural network designs including experimental layers)
Installation¶
pip is recommended for Megalodon installation.
pip install megalodon
conda installation is available, but not fully supported.
ont_pyguppy_client_lib is not available on conda and thus must be installed with pip.
conda install megalodon
pip install ont_pyguppy_client_lib
To install from github source for development, the following commands can be run.
git clone https://github.com/nanoporetech/megalodon
pip install -e megalodon/
It is recommended that Megalodon be installed in a control compute environment. See the python documentation for preparing virtual environments
Quick Start¶
Megalodon must obtain the intermediate output from the basecall neural network.
Guppy (production nanopore basecalling software) is the recommended backend to obtain this output from raw nanopore signal (from FAST5 files).
Nanopore basecalling is compute intensive and thus it is highly recommended that GPU resources are specified (--devices) for optimal Megalodon performance.
Megalodon is accessed via the command line interface megalodon command.
# megalodon help (common args)
megalodon -h
# megalodon help (advanced args)
megalodon --help-long
# Example command to output basecalls, mappings, and 5mC CpG methylation in both per-read (``mod_mappings``) and aggregated (``mods``) formats
# Compute settings: GPU devices 0 and 1 with 40 CPU cores
megalodon \
raw_fast5s/ \
--outputs basecalls mappings mod_mappings mods \
--reference reference.fa --mod-motif m CG 0 \
--devices 0 1 --processes 40
This command produces the megalodon_results output directory containing all requested output files and logs.
The format for common outputs is described briefly below and in more detail in the full documentation
The above command uses the modified base model included in Guppy.
As of the 2.3.0 megalodon release (March 2021) the models included with Guppy (4.5.2) provide the most accurate modified basecalling models.
As more accurate basecalling models are trained, they are first released into the Rerio repository for research models.
Once training pipelines are more thoroughly standardized and tested models will be transferred into Guppy.
The code below shows how to obtain and run the R9.4.1, MinION/GridION, 5mC CpG model from Rerio.
Note that this is the same model now included in Guppy 4.5.2.
# Obtain and run R9.4.1, MinION, 5mC CpG model from Rerio
git clone https://github.com/nanoporetech/rerio
rerio/download_model.py rerio/basecall_models/res_dna_r941_min_modbases_5mC_CpG_v001
megalodon \
raw_fast5s/ \
--guppy-params "-d ./rerio/basecall_models/" \
--guppy-config res_dna_r941_min_modbases_5mC_CpG_v001.cfg \
--outputs basecalls mappings mod_mappings mods \
--reference reference.fa --mod-motif m CG 0 \
--devices 0 1 --processes 40
The path to the
guppy_basecall_serverexecutable is required to run Megalodon. By default, Megalodon assumes Guppy (Linux GPU) is installed in the current working directory (i.e../ont-guppy/bin/guppy_basecall_server). Use the--guppy-server-pathargument to specify a different path.
Contents¶
- Megalodon Algorithm Details
- Common Arguments
- Advanced Megalodon Arguments
- Computing Considerations
- Variant Phasing
- File Formats
- Megalodon Model Training
- Megalodon Modified Base Model Training
megalodon_extras aggregatemegalodon_extras calibratemegalodon_extras calibrate generate_modified_base_statsmegalodon_extras calibrate generate_mod_stats_from_msfmegalodon_extras calibrate generate_variant_statsmegalodon_extras calibrate modified_basesmegalodon_extras calibrate merge_modified_basesmegalodon_extras calibrate merge_modified_bases_statsmegalodon_extras calibrate variants
megalodon_extras mergemegalodon_extras modified_basesmegalodon_extras modified_bases describe_alphabetmegalodon_extras modified_bases estimate_threshold- [DEPRECATED]
megalodon_extras modified_bases update_database megalodon_extras modified_bases create_ground_truthmegalodon_extras modified_bases create_motif_bedmegalodon_extras modified_bases per_site_thresholdsmegalodon_extras modified_bases index_databasemegalodon_extras modified_bases split_by_motif
megalodon_extras phase_variantsmegalodon_extras per_read_textmegalodon_extras validatemegalodon_extras variants