Summarizing a modBAM.

The modkit modbam summary sub-command collects read-level statistics on a sample of reads, a region, or an entire modBAM. Note that by default modkit modbam summary samples the reads to produce a quick estimate rather than using every read.

Summarize the base modification calls in a modBAM.

modkit modbam summary input.bam 

will output a table similar to this

> parsing region chr20  # only present if --region option is provided
> sampling 10042 reads from BAM # modulated with --num-reads
> calculating threshold at 10% percentile # modulated with --filter-percentile
> calculated thresholds: C: 0.7167969 # calculated per-canonical base, on the fly
# bases             C
# total_reads_used  9989
# count_reads_C     9989
# pass_threshold_C  0.7167969
# region            chr20:0-64444167
 base  code  pass_count  pass_frac   all_count  all_frac
 C     m     1192533     0.58716166  1305956    0.5790408
 C     h     119937      0.0590528   195335     0.086608544
 C     -     718543      0.3537855   754087     0.33435062

Description of the modkit modbam summary output:

Totals table

The lines of the totals table are prefixed with a # character.

| row | name | description | type |
|-----|------|-------------|------|
| 1 | bases | comma-separated list of canonical bases with modification calls | str |
| 2 | total_reads_used | total number of reads from which base modification calls were extracted | int |
| 3+ | count_reads_{base} | total number of reads that contained base modifications for {base} | int |
| 4+ | pass_threshold_{base} | filter threshold used for {base} | float |

Modification calls table

The modification calls table follows immediately after the totals table.

| column | name | description | type |
|--------|------|-------------|------|
| 1 | base | canonical base with modification call | char |
| 2 | code | base modification code, or - for canonical | char |
| 3 | pass_count | total number of passing (confidence >= threshold) calls for the modification in column 2 | int |
| 4 | pass_frac | fraction of passing (>= threshold) calls for the modification in column 2 | float |
| 5 | all_count | total number of calls for the modification code in column 2 | int |
| 6 | all_frac | fraction of all calls for the modification in column 2 | float |
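
As a rough illustration of this layout, the following Python sketch splits captured summary text into the totals table (lines prefixed with #) and the modification calls table. The function name parse_summary is hypothetical and not part of modkit, and the sketch assumes the whitespace-delimited text form shown in the example output above. In that example the fractions appear to be taken over all calls for the same canonical base, so the last lines recompute pass_frac that way as a sanity check.

```python
from collections import defaultdict


def parse_summary(text):
    """Split summary text into (totals, calls).

    Assumes whitespace-delimited columns, as in the example output.
    parse_summary is an illustrative helper, not a modkit API.
    """
    totals = {}
    calls = []
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # blank line or log message
        if line.startswith("#"):
            # Totals row: "# name  value"
            parts = line[1:].split()
            if len(parts) >= 2:
                totals[parts[0]] = parts[1]
            continue
        fields = line.split()
        if header is None:
            header = fields  # base, code, pass_count, ...
            continue
        row = dict(zip(header, fields))
        for key in ("pass_count", "all_count"):
            row[key] = int(row[key])
        for key in ("pass_frac", "all_frac"):
            row[key] = float(row[key])
        calls.append(row)
    return totals, calls


EXAMPLE = """\
> sampling 10042 reads from BAM
# bases             C
# total_reads_used  9989
# count_reads_C     9989
# pass_threshold_C  0.7167969
# region            chr20:0-64444167
 base  code  pass_count  pass_frac   all_count  all_frac
 C     m     1192533     0.58716166  1305956    0.5790408
 C     h     119937      0.0590528   195335     0.086608544
 C     -     718543      0.3537855   754087     0.33435062
"""

totals, calls = parse_summary(EXAMPLE)

# Recompute pass_frac per canonical base and compare with the reported value.
pass_totals = defaultdict(int)
for row in calls:
    pass_totals[row["base"]] += row["pass_count"]
for row in calls:
    recomputed = row["pass_count"] / pass_totals[row["base"]]
    print(row["base"], row["code"], row["pass_frac"], round(recomputed, 6))
```

The totals values are kept as strings here because their types differ by row (str, int, float); convert them as needed.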

For more details on thresholds see filtering base modification calls.

By default modkit modbam summary will only use ten thousand reads when generating the summary (or fewer if the modBAM has fewer than that). To use all of the reads in the modBAM set the --no-sampling flag.

modkit modbam summary input.bam --no-sampling

There are --no-filtering, --filter-percentile, and --filter-threshold options that can be used with or without sampling.
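
To make the idea of a percentile threshold concrete, here is a minimal Python sketch of how a threshold at the 10th percentile separates passing from failing calls. The confidence values are invented for illustration, and the simple index-based percentile used here is an assumption; modkit's own estimator may differ in detail.

```python
def percentile_threshold(confidences, percentile):
    """Return the value below which roughly `percentile`% of the values fall.

    Illustrative only: a simple index-based percentile, not modkit's code.
    """
    ordered = sorted(confidences)
    index = min(len(ordered) - 1, (len(ordered) * percentile) // 100)
    return ordered[index]


# Hypothetical call confidences for one canonical base.
confidences = [0.51, 0.62, 0.70, 0.81, 0.88, 0.93, 0.95, 0.97, 0.99, 0.995]

threshold = percentile_threshold(confidences, 10)
# A call passes when its confidence is >= threshold.
passing = [c for c in confidences if c >= threshold]

print("threshold:", threshold)
print("passing calls:", len(passing), "of", len(confidences))
```

With ten values, the lowest-confidence call falls below the threshold and the remaining nine pass, which mirrors the pass_count versus all_count distinction in the summary table.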

Passing a threshold directly.

To estimate the pass thresholds on a subset of reads, but then summarize all of the reads, there is a two-step process. First, determine the thresholds with modkit modbam sample-probs (see usage for more details). Then run modkit modbam summary with the threshold value specified.

modkit modbam sample-probs input.bam [--sampling-frac <frac> | --num-reads <num>]

This command will output a table like this:

> sampling 10042 reads from BAM
 base  percentile  threshold
 C     10          0.6972656
 C     50          0.96484375
 C     90          0.9941406

You can then pass the threshold for the desired percentile (here the 10th) directly to modkit modbam summary:

# --filter-threshold 0.6972656 filters out the 10% lowest-confidence calls
modkit modbam summary input.bam \
    --filter-threshold 0.6972656 \
    --no-sampling
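
The two-step process can also be scripted. The sketch below assumes the sample-probs table has been captured as text in the whitespace-delimited form shown above. It looks up the threshold for a given base and percentile and assembles the argument list for the second step. The helper names threshold_for and summary_args are hypothetical, and only flags that appear in this section are used.

```python
def threshold_for(table_text, base, percentile):
    """Find the threshold for `base` at `percentile` in sample-probs output.

    Illustrative helper; assumes rows of the form "base percentile threshold".
    """
    for line in table_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith(">"):
            continue  # blank line or log message
        fields = stripped.split()
        if len(fields) != 3 or fields[0] == "base":
            continue  # header or unexpected row
        row_base, row_percentile, row_threshold = fields
        if row_base == base and int(row_percentile) == percentile:
            return float(row_threshold)
    raise ValueError(f"no threshold for base {base!r} at percentile {percentile}")


def summary_args(bam_path, threshold):
    """Build the argument list for the second step (all reads, fixed threshold)."""
    return [
        "modkit", "modbam", "summary", bam_path,
        "--filter-threshold", str(threshold),
        "--no-sampling",
    ]


SAMPLE_PROBS = """\
> sampling 10042 reads from BAM
 base  percentile  threshold
 C     10          0.6972656
 C     50          0.96484375
 C     90          0.9941406
"""

threshold = threshold_for(SAMPLE_PROBS, "C", 10)
args = summary_args("input.bam", threshold)
print(" ".join(args))
```

The resulting list can be handed to subprocess.run(args, check=True) on a machine where modkit is installed.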