Calculating modification statistics in regions

There are many analysis operations available in modkit once you've generated a bedMethyl table. One such operation is to calculate aggregation statistics on specific regions, for example in CpG islands or gene promoters. The modkit stats command is designed for this purpose.

# these files can be found in the modkit repository
cpgs=tests/resources/cpg_chr20_with_orig_names_selection.bed
sample=tests/resources/lung_00733-m_adjacent-normal_5mc-5hmc_chr20_cpg_pileup.bed.gz
modkit stats ${sample} --regions ${cpgs} -o ./stats.tsv [--mod-codes "h,m"]

Note that the argument --mod-codes can alternatively be passed multiple times, e.g. this is equivalent:
--mod-codes c --mod-codes h

The output TSV has the following schema:

columnNameDescriptiontype
1chromname of reference sequence from BAM headerstr
2start position0-based start positionint
3end position0-based exclusive end positionint
4namename of the region from input BED (. if not provided)str
5strandStrand (+, -, .) from the input BED (. assumed for when not provided)str
6+count_xtotal number of x base modification codes in the regionint
7+count_valid_xtotal valid calls for the primary base modified by code xint
8+percent_xcount_x / count_vali_x * 100float

Columns 6, 7, and 8 are repeated for each modification code found in the bedMethyl file or provided with --mod-codes argument.

An example output:

chrom  start     end       name     strand   count_h        count_valid_h  percent_h   count_m        count_valid_m  percent_m
chr20  9838623   9839213   CpG: 47  .        12             1777           0.6752954   45             1777           2.532358
chr20  10034962  10035266  CpG: 35  .        7              1513           0.46265697  0              1513           0
chr20  10172120  10172545  CpG: 35  .        15             1229           1.2205045   28             1229           2.278275
chr20  10217487  10218336  CpG: 59  .        29             2339           1.2398461   108            2339           4.617358
chr20  10433628  10434345  CpG: 71  .        29             2750           1.0545455   2              2750           0.07272727
chr20  10671925  10674963  CpG: 255 .        43             9461           0.45449743  24             9461           0.25367296