Welcome to Megalodon’s documentation!
Megalodon is a research command-line tool that extracts high-accuracy modified base and sequence variant calls from raw nanopore reads by anchoring the information-rich output of the basecalling neural network to a reference genome or transcriptome.
Raw nanopore reads are processed by a single command to produce basecalls (FASTA/Q), reference mappings (SAM/BAM/CRAM), modified base calls (per-read and aggregated per-reference site), sequence variant calls (per-read and aggregated per-reference site) and more.
Prerequisites
The primary Megalodon run mode requires the Guppy basecaller (version >= 4.0). See the community page for download/installation instructions [login required].
Megalodon is a Python-based command-line software package.
Given a Python (version >= 3.5) installation, all other requirements are handled by pip or conda.
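A quick way to confirm that the interpreter on your PATH meets the Python >= 3.5 requirement is sketched below. It assumes the interpreter is invoked as python3, which may differ on your system, and the function name check_python_version is purely illustrative.

```bash
#!/usr/bin/env bash
# Print the python3 version and return non-zero if it is older than 3.5 (or missing).
check_python_version() {
    if python3 -c 'import sys; sys.exit(0 if sys.version_info >= (3, 5) else 1)'; then
        echo "python3 $(python3 -c 'import sys; print("%d.%d.%d" % sys.version_info[:3])') satisfies >= 3.5"
    else
        echo "python3 is older than 3.5 (or missing); upgrade before installing Megalodon" >&2
        return 1
    fi
}

check_python_version
```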
Taiyaki is no longer required to run Megalodon, but it must be installed for two specific run modes:
- outputting mapped signal files (for basecall model training)
- running the Taiyaki basecalling backend (for neural network designs that include experimental layers)
Installation
pip is recommended for Megalodon installation.
pip install megalodon
conda installation is available, but not fully supported.
ont_pyguppy_client_lib is not available on conda and must therefore be installed with pip.
conda install megalodon
pip install ont_pyguppy_client_lib
To install from the GitHub source for development, run the following commands.
git clone https://github.com/nanoporetech/megalodon
pip install -e megalodon/
It is recommended that Megalodon be installed in a controlled compute environment. See the Python documentation for preparing virtual environments.
Quick Start
Megalodon must obtain the intermediate output from the basecall neural network.
Guppy (production nanopore basecalling software) is the recommended backend to obtain this output from raw nanopore signal (from FAST5 files).
Nanopore basecalling is compute intensive, so it is highly recommended that GPU resources be specified (--devices) for optimal Megalodon performance.
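Rather than hard-coding compute settings, they can be derived from the machine. The sketch below is a hypothetical helper (compute_args is not part of Megalodon) that counts CPU cores with nproc and, when nvidia-smi is available, lists NVIDIA GPU indices to pass to --devices. It only prints arguments; using every core for --processes is an assumption you may want to lower on shared machines.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Megalodon): print --devices/--processes
# arguments based on the CPU cores and NVIDIA GPUs visible on this machine.
compute_args() {
    local procs devices=""
    procs=$(nproc)  # number of CPU cores available to this process
    if command -v nvidia-smi >/dev/null 2>&1; then
        # nvidia-smi prints one GPU index per line ("0", "1", ...); join them as "0 1 ..."
        devices=$(nvidia-smi --query-gpu=index --format=csv,noheader 2>/dev/null | paste -sd ' ' -)
    fi
    if [ -n "$devices" ]; then
        printf '%s\n' "--devices $devices --processes $procs"
    else
        # No GPU detected: omit --devices entirely
        printf '%s\n' "--processes $procs"
    fi
}

compute_args
```

For example, megalodon raw_fast5s/ --outputs basecalls mappings mod_mappings mods --reference reference.fa --mod-motif m CG 0 $(compute_args) leaves the substitution unquoted on purpose so the arguments split into separate words. If no GPU is found the helper simply omits --devices; see Computing Considerations before relying on a CPU-only run.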
Megalodon is accessed via the megalodon command-line interface.
# megalodon help (common args)
megalodon -h
# megalodon help (advanced args)
megalodon --help-long
# Example command to output basecalls, mappings, and 5mC CpG methylation in both per-read (``mod_mappings``) and aggregated (``mods``) formats
# Compute settings: GPU devices 0 and 1 with 40 CPU cores
megalodon \
raw_fast5s/ \
--outputs basecalls mappings mod_mappings mods \
--reference reference.fa --mod-motif m CG 0 \
--devices 0 1 --processes 40
This command produces the megalodon_results output directory, containing all requested output files and logs.
The formats of common outputs are described in more detail in the File Formats section of the full documentation.
The above command uses the modified base model included in Guppy.
As of the Megalodon 2.3.0 release (March 2021), the models included with Guppy (4.5.2) are the most accurate modified base calling models.
As more accurate basecalling models are trained, they are first released into the Rerio repository for research models.
Once training pipelines are more thoroughly standardized and tested, models will be transferred into Guppy.
The code below shows how to obtain and run the R9.4.1, MinION/GridION, 5mC CpG model from Rerio.
Note that this is the same model now included in Guppy 4.5.2.
# Obtain and run R9.4.1, MinION, 5mC CpG model from Rerio
git clone https://github.com/nanoporetech/rerio
rerio/download_model.py rerio/basecall_models/res_dna_r941_min_modbases_5mC_CpG_v001
megalodon \
raw_fast5s/ \
--guppy-params "-d ./rerio/basecall_models/" \
--guppy-config res_dna_r941_min_modbases_5mC_CpG_v001.cfg \
--outputs basecalls mappings mod_mappings mods \
--reference reference.fa --mod-motif m CG 0 \
--devices 0 1 --processes 40
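Both example commands pass --mod-motif m CG 0, whose three values are the single-letter modified base code (m, here 5mC), the sequence motif (CG) and the position of the modified base within the motif (0, i.e. the C). The hypothetical motif_sites function below illustrates which 0-based sequence positions such a motif selects. It is a sketch assuming GNU grep, scans only the strand given, handles literal motifs only (no ambiguity codes), and is not how Megalodon itself locates motifs.

```bash
#!/usr/bin/env bash
# Hypothetical illustration: print the 0-based position of the modified base
# for every match of a literal motif in a single-strand sequence string.
# Usage: motif_sites SEQUENCE MOTIF OFFSET
motif_sites() {
    local seq=$1 motif=$2 offset=$3
    # grep -ob prints "<0-based byte offset>:<match>" for each non-overlapping match
    printf '%s' "$seq" | grep -ob -- "$motif" | while IFS=: read -r start _; do
        echo $(( start + offset ))
    done
}

motif_sites ACGTTCGACG CG 0
```

Here motif_sites ACGTTCGACG CG 0 prints 1, 5 and 8 (the C of each CG), one per line; an offset of 1 would select the G positions instead.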
The path to the guppy_basecall_server executable is required to run Megalodon. By default, Megalodon assumes Guppy (Linux GPU) is installed in the current working directory (i.e. ./ont-guppy/bin/guppy_basecall_server). Use the --guppy-server-path argument to specify a different path.
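The hypothetical guppy_server_args helper below (not part of Megalodon) checks that a guppy_basecall_server executable exists before a run and prints --guppy-server-path only when a non-default location is used. The default path is the one stated above; the helper and variable names are illustrative assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical helper: validate the Guppy server path and print the matching
# Megalodon argument. Not part of Megalodon itself.
DEFAULT_GUPPY_SERVER=./ont-guppy/bin/guppy_basecall_server  # documented default location

guppy_server_args() {
    # Use the first argument if given, otherwise fall back to the default path.
    local path=${1:-$DEFAULT_GUPPY_SERVER}
    if ! { [ -f "$path" ] && [ -x "$path" ]; }; then
        echo "guppy_basecall_server not found or not executable: $path" >&2
        return 1
    fi
    # The default location needs no extra argument; any other path does.
    if [ "$path" != "$DEFAULT_GUPPY_SERVER" ]; then
        printf '%s %s\n' --guppy-server-path "$path"
    fi
}
```

For example, server_args=$(guppy_server_args /path/to/ont-guppy/bin/guppy_basecall_server) || exit 1 followed by megalodon raw_fast5s/ ... $server_args stops early when the executable is missing. The unquoted expansion assumes the path contains no spaces.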
Contents
- Megalodon Algorithm Details
- Common Arguments
- Advanced Megalodon Arguments
- Computing Considerations
- Variant Phasing
- File Formats
- Megalodon Model Training
- Megalodon Modified Base Model Training
- megalodon_extras aggregate
- megalodon_extras calibrate
  - megalodon_extras calibrate generate_modified_base_stats
  - megalodon_extras calibrate generate_mod_stats_from_msf
  - megalodon_extras calibrate generate_variant_stats
  - megalodon_extras calibrate modified_bases
  - megalodon_extras calibrate merge_modified_bases
  - megalodon_extras calibrate merge_modified_bases_stats
  - megalodon_extras calibrate variants
- megalodon_extras merge
- megalodon_extras modified_bases
  - megalodon_extras modified_bases describe_alphabet
  - megalodon_extras modified_bases estimate_threshold
  - [DEPRECATED] megalodon_extras modified_bases update_database
  - megalodon_extras modified_bases create_ground_truth
  - megalodon_extras modified_bases create_motif_bed
  - megalodon_extras modified_bases per_site_thresholds
  - megalodon_extras modified_bases index_database
  - megalodon_extras modified_bases split_by_motif
- megalodon_extras phase_variants
- megalodon_extras per_read_text
- megalodon_extras validate
- megalodon_extras variants