Tombo Commands

This page contains breif descriptions of the most common Tombo commands. For more detail on each commands’ options and further algorithm details, please see the corresponding documentation sections.

Re-squiggle (Raw Signal Genomic Alignment)

The re-squiggle algorithm aligns raw signal (electric current nanopore measurements) to reference genomic or transcriptomic sequence.

One of the major assumptions of the re-squiggle algorithm is that the provided reference sequence is correct. Thus for a poorly assembled reference or divergent sample, an assembly polishing step (possibly from the same data/sample) may improve results.

The resquiggle command will add infomation (the mapped reference location and the raw signal to sequence assignment) to the read files provided (in FAST5 format), as well as producing an index file for more efficient file access in downstream commands.

Important

The resquiggle command must be run before any further processing by Tombo commands.

Note: Tombo currenly includes default canonical models for both DNA or RNA data (supporting R9.4 and R9.5; 1D and 1D^2; R9.*.1 chemistries). Analysis of other nanopore data types is not supported at this time (e.g. R7 data). If DNA or RNA sample type is not explicitly specified (via --dna or --rna options) the sample type will be detected automatically from the raw read files.

For more details see the re-squiggle documentation.

# annotate raw FAST5s with FASTQ files produced from the same reads
# skip this step if raw read files already contain basecalls
tombo preprocess annotate_raw_with_fastqs --fast5-basedir <fast5s-base-directory> --fastq-filenames <reads.fastq>

tombo resquiggle <fast5s-base-directory> <reference-fasta> --processes 4 --num-most-common-errors 5

Modified Base Detection

Tombo provides four (including two types of sample comparison) methods for the investigation of modified bases (within the tombo detect_modifications command group). Each method has different advantages and requirements.


_images/testing_method_comparison.png

Tombo modified base testing methods.


All modified base detection methods, except the level_sample_compare method, produce per-read, per-reference position test statistics (which can be saved via the --per-read-statistics-basename option). A threshold is then applied to these statistics to produce an estimate for the fraction of reads that appear modified at each reference location.

  1. Specific alternative base detection (recommended)

  • Run using tombo detect_modifications alternative_model command.

  • This method identifies sites where signal matches a specific alternative base expected signal levels better than the canonical expected levels producing a statistic similar to a log likelihood ratio.

  • All-context alternative DNA models are currently available for 5-methylcytosine (5mC) and N6-methyladenosine (6mA; not currently available for RNA).

  • More accurate motif specific models are available for dam and dcm methylation (found in E. coli) and CpG methylation (found in human samples).

  1. De novo canonical model comparison

  • Run using tombo detect_modifications de_novo command.

  • This method identifies sites where signal deviates from the expected canonical signal levels.

  • While this method has the highest error rates, it can be used effectively on any sample and is particularly useful for motif discovery for motif-specific modifications (e.g. bacterial samples).

  1. Sample comparison

    • These two methods requires the production of a second set of reads for comparison.

    1. level_sample_compare

      • Run using tombo detect_modifications level_sample_compare command.

      • This method performs either a ks-test (default), u-test or t-test to identify sites where signal levels deviate between two samples.

      • This method is ideal for high coverage samples (in order to accurately estimate the effect size measures) and comparison of two potentailly non-canonical samples (e.g. a KO experiment).

    2. model_sample_compare

      • Run using tombo detect_modifications model_sample_compare command.

      • This uses a canonical control sample to adjust the expected signal levels due to un-modeled effects.

      • This adjusted model is then used to test for modifications as in the de novo method.

      • This was the sample_compare method prior to version 1.5.

Both the sample comparison and the de novo methods may not identify the exact modified base location (as the shifted signal does not always center on a modified base) and gives no information as to the identity of a modified base.

The result of all tombo detect_modifications calls will be a binary statistics file(s), which can be passed to other Tombo commands.

For more details see the modified base detection documentation.

Text Output

Genome Browser File Output

In order to output re-squiggle and/or modified base detection results in a genome browser compatible format (either wiggle format or bedgraph format), the tombo text_output genome_browser command is provided.

tombo text_output browser_files --fast5-basedirs <fast5s-base-directory> \
    --statistics-filename sample_alt_model.CpG.tombo.stats \
    --browser-file-basename sample_alt_model --file-types dampened_fraction coverage

Hint

All --file-types available are:

  • fraction, dampened_fraction, and valid_coverage derived from a (non-level_sample_compare) statistics file

  • statistic derived from a level_sample_compare statistics files

  • coverage derived from the Tombo index (fast)

  • signal, signal_sd, dwell, and difference derived from read FAST5 files (slow)

Reference Sequence Output

For modified base analysis pipelines (e.g. motif detection), it may be useful to output the reference sequence surrounding the most likely modified sites. The text_output signif_sequence_context command is provided for this purpose.

tombo text_output signif_sequence_context --statistics-filename sample_alt_model.6mA.tombo.stats \
    --genome-fasta <reference-fasta> --sequences-filename sample_alt_model.6mA.most_signif.fasta

Example meme command line modified base motif detection command.

./meme -oc motif_output.meme -dna -mod zoops sample_alt_model.6mA.most_signif.fasta

For more details see the text output documentation.

Plotting Commands

Tombo provides many plotting functions for the visualization of modified bases within raw nanopore signal.

Most plotting commands are reference-anchored. That is the normalized raw signal is plotted as the re-squiggle algorithm has assigned it to the reference sequence.

Each reference anchored plotting command allows for the selection of reference positions based on generally applicable criterion.

tombo plot max_coverage --fast5-basedirs <fast5s-base-directory> --plot-standard-model

tombo plot motif_centered --fast5-basedirs <fast5s-base-directory> --motif AWC \
    --genome-fasta genome.fasta --control-fast5-basedirs <control-fast5s-base-directory>

tombo plot per_read --per-read-statistics-filename <per_read_statistics.tombo.stats> \
    --genome-locations chromosome:1000 chromosome:2000:- \
    --genome-fasta genome.fasta

Note

For regions with higher coverage, several over-plotting options are available. For those options producing a distribution, these are taken over each reads average signal assigned to a base. This requires extraction of these levels from all relevant FAST5 files and thus can be slow for very deep coverage regions.

For more details see the plotting documentation.

Read Filtering

Read filtering commands can be useful to extract the most out out of a set of reads for modified base detection. Read filtering commands effect only the Tombo index file, and so filters can be cleared or applied iteratively without re-running the re-squiggle command. Five filters are currently made available (genome_locations, raw_signal_matching, q_score, level_coverage and stuck).

# filter reads to a specific genomic location
tombo filter genome_locations --fast5-basedirs path/to/native/rna/fast5s/ \
    --include-regions chr1:0-10000000

# apply a more strigent observed to expected signal score (default: 1.1 for DNA reads)
tombo filter raw_signal_matching --fast5-basedirs path/to/native/rna/fast5s/ \
    --signal-matching-score 1.0

For more details see the filter documentation.