# Command Line Interface ```{contents} Table of Contents --- depth: 3 --- ``` ## refgenome ### prepare Pre-process a reference genome #### Usage ```bash pore_c refgenome prepare [OPTIONS] REFERENCE_FASTA OUTPUT_PREFIX Pre-process a reference genome for use by pore-C tools. Prepare a reference genome or draft assembly use by pore-c tools. This tool creates the following files:  .fa - A decompressed fasta file .chromsizes - A tab-separated list of chromosome lengths .metadata.csv - A csv with metadata about each of .catalog.yaml - An intake catalog of these files ``` #### Parameters: - *reference_fasta* [required] - *output_prefix* [required] - *--genome-id TEXT*: An ID for this genome assembly *** ### virtual-digest Virtual digest of a reference genome. #### Usage ```bash pore_c refgenome virtual-digest [OPTIONS] FASTA CUT_ON OUTPUT_PREFIX Carry out a virtual digestion of the genome/assembly FASTA. The DIGEST_TYPE sets what type of value you can use for CUT_ON.  --------------------------------------------------------------------- | digest_type | cut_on | notes | --------------------------------------------------------------------- | enzyme | NlaIII | Enzyme name is case sensitive | | enzyme | HindIII | | | regex | (GAATTC|GCGGCCGC) | Two site EcoRI and NotI | | regex | RAATY | Degenerate site ApoI | | bin | 50000 | Create equal-width bins of 50k | | bin | 50k | Create equal-width bins of 50k | ===================================================================== This tool will create the following output files:  .fragments.parquet A table containing the coordinates and some metadata on each fragment .digest_stats.csv Per chromosome/contig summary statistics .catalog.yaml An intake catalog of the digest files ``` #### Parameters: - *fasta* [required] - *cut_on* [required] - *output_prefix* [required] - *--digest-type [enzyme|regex|bin]*: The type of digest you want to do - *-n, --n_workers INTEGER*: The number of dask_workers to use [default: 1] *** ### fragments-to-hicref Create a hicRef file for a virtual digest. #### Usage ```bash pore_c refgenome fragments-to-hicref [OPTIONS] FRAGMENTS_PARQUET HICREF Convert a .fragments.parquet file to hicRef format ``` #### Parameters: - *fragments_parquet* [required] - *hicref* [required] *** ## reads ### prepare Create a catalog file for a set of reads #### Usage ```bash pore_c reads prepare [OPTIONS] FASTQ OUTPUT_PREFIX Preprocess a set of reads for use with pore_c tools. This tool creates the following files:  .batch[batch_idx].fq.gz - Fastq with all the reads that pass the qscore and length filters. Fastqs are split so there are at most --batch-size reads per fastq .fail.fq.gz - Reads that fail the filters .read_metadata.parquet - Length and qscore metadata for each read and whether they pass the filter .summary.csv - Summary stats for all/pass/fail reads .catalog.yaml - An intake catalog ``` #### Parameters: - *fastq* [required] - *output_prefix* [required] - *--batch-size INTEGER*: The reads will be split into batches of this size for downstream processing [default: 10000] - *--min-read-length INTEGER*: The minimum length read to run through pore_c [default: 1] - *--max-read-length INTEGER*: The maximum length read to run through pore_c. Note that bwa mem can crash on very long reads [default: 150000] - *--min-qscore INTEGER*: The minimum read qscore [default: 0] - *--max-qscore INTEGER*: The maximum read qscore [default: 266] - *--user-metadata TEXT*: Additional user metadata to associate with this run *** ## alignments ### reformat-bam Reformat a BAM file to have a unique read name per alignment #### Usage ```bash pore_c alignments reformat-bam [OPTIONS] INPUT_SAM OUTPUT_SAM Reformat query_name in INPUT_SAM and write to OUTPUT_SAM This tool reformats an alignment file so that it works with downstream steps in the Pore-C pipeline. For both files you can supply '-' if you want to read/write from/to stdin/stdout. The 'query_name' field of the alignment file will be reformatted so that each alignment in the SAM file has a unique query name:  -> :: Where 'read_idx' is a unique integer id for each read within the file and 'align_idx' is a unique integer id for each alignment within the file. The tool also adds a 'BX' tag consisting of the 'read_id' to each record. ``` #### Parameters: - *input_sam* [required] - *output_sam* [required] - *--input-is-bam*: If piping a BAM from stdin (rather than sam) [default: False] - *--output-is-bam*: If piping a BAM to stdout (rather than sam) [default: False] - *--set-bx-flag*: Set the BX tag to the read name [default: False] *** ### create-table Parse a namesortd bam to pore-C alignment format #### Usage ```bash pore_c alignments create-table [OPTIONS] INPUT_BAM OUTPUT_TABLE Convert a BAM file to a tabular format sorted by read for downstream analysis ``` #### Parameters: - *input_bam* [required] - *output_table* [required] - *--alignment-haplotypes PATH*: The alignment to haplotype mapping from whatshap *** ### assign-fragments Parse a namesortd bam to pore-C alignment format #### Usage ```bash pore_c alignments assign-fragments [OPTIONS] ALIGN_TABLE FRAGMENTS_TABLE PORE_C_TABLE For each alignment in ALIGN_TABLE either filter out or assign a fragment from FRAGMENT_TABLE ``` #### Parameters: - *align_table* [required] - *fragments_table* [required] - *pore_c_table* [required] - *--mapping_quality_cutoff INTEGER*: Minimum mapping quality for an alignment to be considered [default: 1] - *--min_overlap_length INTEGER*: Minimum overlap in base pairs between an alignment and restriction fragment [default: 10] - *--containment_cutoff FLOAT*: Minimum percentage of a fragment included in an overlap for that fragment to be considered 'contained' within an alignment [default: 99.0] *** ### filter-bam Filter bam using pore_c table #### Usage ```bash pore_c alignments filter-bam [OPTIONS] INPUT_BAM PORE_C_TABLE OUTPUT_BAM ``` #### Parameters: - *input_bam* [required] - *pore_c_table* [required] - *output_bam* [required] - *--clean-read-name*: Strip out the extra information placed in the BAM by reformat_bam *** ### assign-consensus-haplotype Parse a namesortd bam to pore-C alignment format #### Usage ```bash pore_c alignments assign-consensus-haplotype [OPTIONS] PORE_C_TABLE OUTPUT_PORE_C_TABLE Calculate a per-read consensus haplotype for each phase_set in ALIGN_TABLE and write the results back to OUTPUT_ALIGN_TABLE ``` #### Parameters: - *pore_c_table* [required] - *output_pore_c_table* [required] - *--threshold FLOAT*: major:minor haplotype fraction must be greater than this value to assign a consensus [default: 0.8] *** ### to-contacts Parses the alignment table and converts to pairwise contacts #### Usage ```bash pore_c alignments to-contacts [OPTIONS] PORE_C_TABLE CONTACT_TABLE Covert the alignment table to a pairwise contact table and associated concatemer table ``` #### Parameters: - *pore_c_table* [required] - *contact_table* [required] *** ## contacts ### merge Summarise a contact table #### Usage ```bash pore_c contacts merge [OPTIONS] [SRC_CONTACT_TABLES]... DEST_CONTACT_TABLE ``` #### Parameters: - *src_contact_tables* [required] - *dest_contact_table* [required] - *--fofn*: If this flag is set then the SRC_CONTACT_TABLES is a file of filenames corresponding to the contact tables you want to merge. This is workaround for when the command line gets too long. *** ### downsample Downsample a contact table #### Usage ```bash pore_c contacts downsample [OPTIONS] SRC_CONTACT_TABLE DEST_CONTACT_TABLE_PREFIX [DOWNSAMPLE_INCREMENTS]... ``` #### Parameters: - *src_contact_table* [required] - *dest_contact_table_prefix* [required] - *downsample_increments* [required] - *--downsample-unit [Gb|Mb|Kb]*: - *--random-seed INTEGER*: - *--tol FLOAT*: Check if the difference between the sampled amout and the target amount is greater than this proportion - *--warn*: If the a sample fails the --tol check print a warning rather than exiting - *--max-attempts INTEGER*: The number of times to try and find a set of subsamples all within --tol *** ### summarize Summarise a contact table #### Usage ```bash pore_c contacts summarize [OPTIONS] CONTACT_TABLE READ_SUMMARY_TABLE CONCATEMER_TABLE CONCATEMER_SUMMARY_CSV ``` #### Parameters: - *contact_table* [required] - *read_summary_table* [required] - *concatemer_table* [required] - *concatemer_summary_csv* [required] - *--user-metadata TEXT*: Add additional user metadata to the summary table, must be a dictionary in json format *** ### export Export contacts to various formats #### Usage ```bash pore_c contacts export [OPTIONS] CONTACT_TABLE [cooler|salsa_bed|paired_end_fastq|pairs|merged_no_dups] OUTPUT_PREFIX Export contacts to the following formats: - cooler: a sparse representation of a contact matrix - paired_end_fastq: for each contact create a pseudo pair-end read using the reference genome sequence ``` #### Parameters: - *contact_table* [required] - *format* [required] - *output_prefix* [required] - *--min-mapping-quality INTEGER*: Both alignments have mapping qualities greater than this [default: 0] - *--min-align-base-qscore INTEGER*: Both alignments have mean base qualities greater than this [default: 0] - *--cooler-resolution INTEGER*: The bin width of the resulting matrix [default: 1000] - *--fragment-table TEXT*: The fragment table for the corresponding virtual digest(required if export format is in cooler) - *--by-haplotype*: Create a cooler for each pair of haplotypes (eg 1-1, 1-2, 2-2,...). Only valid with 'cooler' - *--chromsizes TEXT*: The chromsizes file for the corresponding genome(required if export format is in cooler,pairs) - *--reference-fasta TEXT*: The reference genome used to generate the contact table(required if export format is in paired_end_fastq,merged_no_dups) *** ## utils ### parquet-to-csv #### Usage ```bash pore_c utils parquet-to-csv [OPTIONS] INPUT_PARQUET OUTPUT_CSV Convert a parquet file to CSV ``` #### Parameters: - *input_parquet* [required] - *output_csv* [required]