Inspecting base modification probabilities
For details on how base modification probabilities are calculated, see the FAQ page
For most use cases the automatic filtering enabled in modkit will produce nearly ideal results.
However, in some cases such as exotic organisms or specialized assays, you may want to interrogate the base modification probabilities directly and tune the pass thresholds.
The modkit sample-probs command is designed for this task.
There are two ways to use this command, first by simply running modkit sample-probs $mod_bam to get a tab-separated file of threshold values for each modified base.
This can save time in downstream steps where you wish to re-use the threshold value by passing --filter-threshold and skip re-estimating the value.
To generate more advanced output, add --hist --out-dir $output_dir to the command and generate per-modification histograms of the output probabilities.
Using the command this way produces 3 files in the $output_dir:
- An HTML document containing a histogram of the total counts of each probability emitted for each modification code (including canonical) in the sampled reads.
- Another HTML document containing the proportions of each probability emitted.
- A tab-separated table with the same information as the histograms and the percentile rank of each probability value.
The schema of the table is as follows:
| column | name | description | type | 
|---|---|---|---|
| 1 | code | modification code or '-' for canonical | string | 
| 2 | primary base | the primary DNA base for which the code applies | string | 
| 3 | range_start | the inclusive start probability of the bin | float | 
| 4 | range_end | the exclusive end probability of the bin | float | 
| 5 | count | the total count of probabilities falling in this bin | int | 
| 6 | frac | the fraction of the total calls for this code/primary base in this bin | float | 
| 7 | percentile_rank | the percentile rank of this probability bin | float | 
From these plots and tables you can decide on a pass threshold per-modification code and use --mod-threshold/--filter-threshold accordingly.