Command Line Interface
refgenome
prepare
Pre-process a reference genome
Usage
pore_c refgenome prepare [OPTIONS] REFERENCE_FASTA OUTPUT_PREFIX
Prepare a reference genome or draft assembly for use by pore-c tools.
This tool creates the following files:
    <output_prefix>.fa - A decompressed fasta file
    <output_prefix>.chromsizes - A tab-separated list of chromosome lengths
    <output_prefix>.metadata.csv - A csv with metadata about each of
    <output_prefix>.catalog.yaml - An intake catalog of these files
Parameters:
- reference_fasta [required]
- output_prefix [required]
- --genome-id TEXT: An ID for this genome assembly
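The .chromsizes output listed above is described as a tab-separated list of chromosome lengths. A minimal sketch of reading such a file follows, assuming two columns (name, then length) and no header row. The function name `read_chromsizes` is hypothetical and is not part of pore_c.

```python
import io


def read_chromsizes(handle):
    """Read a tab-separated chromsizes file into a dict of name -> length.

    Assumes two columns (chromosome name, length) and no header row.
    """
    sizes = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        name, length = line.split("\t")[:2]
        sizes[name] = int(length)
    return sizes


# Illustrative in-memory file; the names and lengths are made up.
example = io.StringIO("chr1\t1000\nchr2\t500\n")
sizes = read_chromsizes(example)
```

With a real file, `open("<output_prefix>.chromsizes")` would take the place of the in-memory handle.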
virtual-digest
Virtual digest of a reference genome.
Usage
pore_c refgenome virtual-digest [OPTIONS] FASTA CUT_ON OUTPUT_PREFIX
Carry out a virtual digestion of the genome/assembly FASTA.
The --digest-type option sets what type of value you can use for CUT_ON:
    ---------------------------------------------------------------------
    | digest_type | cut_on            | notes                           |
    ---------------------------------------------------------------------
    | enzyme      | NlaIII            | Enzyme name is case sensitive   |
    | enzyme      | HindIII           |                                 |
    | regex       | (GAATTC|GCGGCCGC) | Two sites: EcoRI and NotI       |
    | regex       | RAATY             | Degenerate site ApoI            |
    | bin         | 50000             | Create equal-width bins of 50k  |
    | bin         | 50k               | Create equal-width bins of 50k  |
    ---------------------------------------------------------------------
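To illustrate how the regex and bin values in the table could be interpreted, here is a minimal sketch. It assumes that degenerate (IUPAC) bases are expanded to regular-expression character classes and that a trailing "k" means thousands. The helpers `site_to_regex`, `match_starts` and `parse_bin_width` are hypothetical and may not match how pore_c does this internally.

```python
import re
from typing import List

# Subset of IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "N": "[ACGT]",
}


def site_to_regex(site: str) -> str:
    """Expand degenerate bases; regex syntax such as ( | ) passes through."""
    return "".join(IUPAC.get(ch, ch) for ch in site)


def match_starts(sequence: str, cut_on: str) -> List[int]:
    """Return the 0-based start of every non-overlapping site match."""
    pattern = re.compile(site_to_regex(cut_on))
    return [m.start() for m in pattern.finditer(sequence)]


def parse_bin_width(value: str) -> int:
    """Accept a plain integer such as '50000' or a shorthand such as '50k'."""
    value = value.strip().lower()
    if value.endswith("k"):
        return int(value[:-1]) * 1000
    return int(value)
```

For example, `match_starts("CCGAATCTTAAATTGG", "RAATY")` finds matches starting at positions 2 and 9 of that made-up sequence.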
This tool will create the following output files:
    <output_prefix>.fragments.parquet
      A table containing the coordinates and some metadata on each fragment
    <output_prefix>.digest_stats.csv
      Per chromosome/contig summary statistics
    <output_prefix>.catalog.yaml
      An intake catalog of the digest files
Parameters:
- fasta [required]
- cut_on [required]
- output_prefix [required]
- --digest-type [enzyme|regex|bin]: The type of digest you want to do
- -n, --n_workers INTEGER: The number of Dask workers to use [default: 1]
fragments-to-hicref
Create a hicRef file for a virtual digest.
Usage
pore_c refgenome fragments-to-hicref [OPTIONS] FRAGMENTS_PARQUET HICREF
Convert a .fragments.parquet file to hicRef format.
Parameters:
- fragments_parquet [required] 
- hicref [required] 
reads
prepare
Create a catalog file for a set of reads
Usage
pore_c reads prepare [OPTIONS] FASTQ OUTPUT_PREFIX
Preprocess a set of reads for use with pore_c tools.
This tool creates the following files:
    <output_prefix>.batch[batch_idx].fq.gz - Fastq with all the reads that pass the qscore and
      length filters. Fastqs are split so there are at most --batch-size reads per fastq
    <output_prefix>.fail.fq.gz - Reads that fail the filters
    <output_prefix>.read_metadata.parquet - Length and qscore metadata for each read and
      whether they pass the filter
    <output_prefix>.summary.csv - Summary stats for all/pass/fail reads
    <output_prefix>.catalog.yaml - An intake catalog
Parameters:
- fastq [required]
- output_prefix [required]
- --batch-size INTEGER: The reads will be split into batches of this size for downstream processing [default: 10000]
- --min-read-length INTEGER: The minimum length read to run through pore_c [default: 1]
- --max-read-length INTEGER: The maximum length read to run through pore_c. Note that bwa mem can crash on very long reads [default: 150000]
- --min-qscore INTEGER: The minimum read qscore [default: 0]
- --max-qscore INTEGER: The maximum read qscore [default: 266]
- --user-metadata TEXT: Additional user metadata to associate with this run
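The filtering and batching described above can be sketched as follows. This is an illustration only: it assumes the length and qscore bounds are inclusive and that batches are filled in input order. The names `passes_filters` and `batched` are hypothetical, and the default values are copied from the parameter list.

```python
from itertools import islice


def passes_filters(length, qscore,
                   min_read_length=1, max_read_length=150000,
                   min_qscore=0, max_qscore=266):
    """Return True if a read is within the length and qscore bounds.

    Inclusive bounds are an assumption, not confirmed pore_c behaviour.
    """
    return (min_read_length <= length <= max_read_length
            and min_qscore <= qscore <= max_qscore)


def batched(items, batch_size=10000):
    """Yield lists holding at most batch_size items each, in input order."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch
```

Under these assumptions a 200,000 bp read fails the default length filter, and five passing reads with a batch size of 2 are split into three batches.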
alignments
reformat-bam
Reformat a BAM file to have a unique read name per alignment
Usage
pore_c alignments reformat-bam [OPTIONS] INPUT_SAM OUTPUT_SAM
Reformat query_name in INPUT_SAM and write to OUTPUT_SAM.
This tool reformats an alignment file so that it works with downstream
steps in the Pore-C pipeline. For both files you can supply '-' if you want
to read/write from/to stdin/stdout. The 'query_name' field of the alignment
file will be reformatted so that each alignment in the SAM file has a
unique query name:
    <read_id> -> <read_id>:<read_idx>:<align_idx>
Where 'read_idx' is a unique integer id for each read within the file and
'align_idx' is a unique integer id for each alignment within the file. The
tool also adds a 'BX' tag consisting of the 'read_id' to each record.
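The renaming scheme above can be sketched in a few lines. The sketch assumes that both indices are zero-based and assigned in file order, which the text does not state. The function `reformat_query_names` is hypothetical and works on plain strings rather than on SAM records.

```python
def reformat_query_names(read_ids):
    """Map each alignment's read_id to <read_id>:<read_idx>:<align_idx>.

    read_ids holds one entry per alignment, in file order. Zero-based
    indices assigned in order of appearance are an assumption.
    """
    read_index = {}  # read_id -> read_idx, assigned on first appearance
    new_names = []
    for align_idx, read_id in enumerate(read_ids):
        read_idx = read_index.setdefault(read_id, len(read_index))
        new_names.append(f"{read_id}:{read_idx}:{align_idx}")
    return new_names
```

Two alignments from one read followed by one alignment from a second read would therefore share a `read_idx` within the first pair while every `align_idx` stays distinct.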
Parameters:
- input_sam [required]
- output_sam [required]
- --input-is-bam: If piping a BAM from stdin (rather than sam) [default: False]
- --output-is-bam: If piping a BAM to stdout (rather than sam) [default: False]
- --set-bx-flag: Set the BX tag to the read name [default: False]
create-table
Parse a name-sorted BAM to pore-C alignment format
Usage
pore_c alignments create-table [OPTIONS] INPUT_BAM OUTPUT_TABLE
Convert a BAM file to a tabular format sorted by read for downstream analysis
Parameters:
- input_bam [required]
- output_table [required]
- --alignment-haplotypes PATH: The alignment to haplotype mapping from whatshap
assign-fragments
Parse a name-sorted BAM to pore-C alignment format
Usage
pore_c alignments assign-fragments [OPTIONS] ALIGN_TABLE FRAGMENTS_TABLE PORE_C_TABLE
For each alignment in ALIGN_TABLE, either filter it out or assign it a fragment from FRAGMENTS_TABLE.
Parameters:
- align_table [required]
- fragments_table [required]
- pore_c_table [required]
- --mapping_quality_cutoff INTEGER: Minimum mapping quality for an alignment to be considered [default: 1]
- --min_overlap_length INTEGER: Minimum overlap in base pairs between an alignment and restriction fragment [default: 10]
- --containment_cutoff FLOAT: Minimum percentage of a fragment included in an overlap for that fragment to be considered ‘contained’ within an alignment [default: 99.0]
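To make the overlap and containment thresholds concrete, here is a minimal sketch of the arithmetic. It assumes half-open coordinates and inclusive comparisons against the cutoffs. The functions `overlap_length` and `assess_overlap` are hypothetical and do not reproduce the pore_c assignment logic.

```python
def overlap_length(align_start, align_end, frag_start, frag_end):
    """Overlap in base pairs between two half-open intervals [start, end)."""
    return max(0, min(align_end, frag_end) - max(align_start, frag_start))


def assess_overlap(align, fragment,
                   min_overlap_length=10, containment_cutoff=99.0):
    """Compare one alignment against one fragment.

    align and fragment are (start, end) tuples on the same chromosome.
    """
    overlap = overlap_length(align[0], align[1], fragment[0], fragment[1])
    fragment_length = fragment[1] - fragment[0]
    perc_of_fragment = 100.0 * overlap / fragment_length
    return {
        "overlap_length": overlap,
        "perc_of_fragment": perc_of_fragment,
        "passes_min_overlap": overlap >= min_overlap_length,
        "contained": perc_of_fragment >= containment_cutoff,
    }
```

An alignment spanning 100 to 600 fully covers a fragment spanning 150 to 400, so that fragment counts as contained. An alignment that covers only 50 bp of a 1,000 bp fragment passes the minimum overlap but is not contained.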
filter-bam
Filter bam using pore_c table
Usage
pore_c alignments filter-bam [OPTIONS] INPUT_BAM PORE_C_TABLE OUTPUT_BAM
Parameters:
- input_bam [required]
- pore_c_table [required]
- output_bam [required]
- --clean-read-name: Strip out the extra information placed in the BAM by reformat_bam
assign-consensus-haplotype
Parse a name-sorted BAM to pore-C alignment format
Usage
pore_c alignments assign-consensus-haplotype [OPTIONS] PORE_C_TABLE OUTPUT_PORE_C_TABLE
Calculate a per-read consensus haplotype for each phase_set in PORE_C_TABLE and write the results back
to OUTPUT_PORE_C_TABLE.
Parameters:
- pore_c_table [required]
- output_pore_c_table [required]
- --threshold FLOAT: major:minor haplotype fraction must be greater than this value to assign a consensus [default: 0.8]
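One possible reading of the threshold is sketched below: the most common haplotype among a read's alignments in a phase set is assigned only when its share of the haplotyped alignments is strictly greater than the threshold. That interpretation is an assumption, and `consensus_haplotype` is a hypothetical helper, not the pore_c implementation.

```python
from collections import Counter


def consensus_haplotype(haplotypes, threshold=0.8):
    """Return the majority haplotype, or None if no consensus is reached.

    haplotypes holds one call per alignment for a single read and
    phase set; None marks an alignment without a haplotype call.
    """
    counts = Counter(h for h in haplotypes if h is not None)
    if not counts:
        return None
    major, major_count = counts.most_common(1)[0]
    fraction = major_count / sum(counts.values())
    return major if fraction > threshold else None
```

Under this reading, four calls for haplotype 1 and one for haplotype 2 give a fraction of exactly 0.8, which does not exceed the default threshold, so no consensus is assigned.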
to-contacts
Parses the alignment table and converts to pairwise contacts
Usage
pore_c alignments to-contacts [OPTIONS] PORE_C_TABLE CONTACT_TABLE
Convert the alignment table to a pairwise contact table and associated concatemer table
Parameters:
- pore_c_table [required] 
- contact_table [required] 
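Converting a multi-way concatemer into pairwise contacts can be pictured as taking every unordered pair of its alignments. The sketch below shows that idea only. The function `concatemer_to_contacts` and its output layout are hypothetical and say nothing about the columns pore_c actually writes.

```python
from itertools import combinations


def concatemer_to_contacts(read_id, fragment_ids):
    """Expand one read's ordered fragment assignments into pairwise contacts.

    Returns (read_id, fragment_a, fragment_b) tuples, one per unordered
    pair, so n alignments give n * (n - 1) / 2 contacts.
    """
    return [(read_id, a, b) for a, b in combinations(fragment_ids, 2)]


contacts = concatemer_to_contacts("read_1", [10, 42, 77])
```

A read with three assigned fragments yields three contacts and a read with four yields six, which is why high-order concatemers dominate the contact table.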
contacts
merge
Merge multiple contact tables into one
Usage
pore_c contacts merge [OPTIONS] [SRC_CONTACT_TABLES]... DEST_CONTACT_TABLE
Parameters:
- src_contact_tables [required]
- dest_contact_table [required]
- --fofn: If this flag is set then SRC_CONTACT_TABLES is a file of filenames corresponding to the contact tables you want to merge. This is a workaround for when the command line gets too long.
downsample
Downsample a contact table
Usage
pore_c contacts downsample [OPTIONS] SRC_CONTACT_TABLE DEST_CONTACT_TABLE_PREFIX
        [DOWNSAMPLE_INCREMENTS]...
Parameters:
- src_contact_table [required]
- dest_contact_table_prefix [required]
- downsample_increments [required]
- --downsample-unit [Gb|Mb|Kb]:
- --random-seed INTEGER:
- --tol FLOAT: Check if the difference between the sampled amount and the target amount is greater than this proportion
- --warn: If a sample fails the --tol check, print a warning rather than exiting
- --max-attempts INTEGER: The number of times to try to find a set of subsamples all within --tol
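The tolerance check can be read as a relative-difference test against the target amount. The sketch below assumes decimal unit multipliers and a tolerance expressed as a proportion of the target. Both helpers, `to_bases` and `within_tolerance`, are hypothetical and only illustrate the arithmetic.

```python
# Assumed decimal multipliers for the --downsample-unit choices.
UNIT_TO_BASES = {"Kb": 1_000, "Mb": 1_000_000, "Gb": 1_000_000_000}


def to_bases(amount, unit):
    """Convert an increment such as (2, 'Mb') to a number of bases."""
    return amount * UNIT_TO_BASES[unit]


def within_tolerance(sampled, target, tol):
    """True if the sampled amount is within tol (a proportion) of target."""
    return abs(sampled - target) / target <= tol
```

With a target of 1 Mb and a tolerance of 0.02, a subsample of 1,010,000 bases is accepted and one of 1,100,000 bases is not.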
summarize
Summarise a contact table
Usage
pore_c contacts summarize [OPTIONS] CONTACT_TABLE READ_SUMMARY_TABLE CONCATEMER_TABLE
        CONCATEMER_SUMMARY_CSV
Parameters:
- contact_table [required]
- read_summary_table [required]
- concatemer_table [required]
- concatemer_summary_csv [required]
- --user-metadata TEXT: Add additional user metadata to the summary table, must be a dictionary in json format
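Because --user-metadata expects a JSON dictionary, serialising it with a JSON library and letting `shlex` handle shell quoting avoids hand-written escaping. The sketch below only builds the command string and does not run it. The file names and metadata keys are made up, and `build_summarize_command` is a hypothetical helper.

```python
import json
import shlex


def build_summarize_command(metadata, contact_table, read_summary_table,
                            concatemer_table, concatemer_summary_csv):
    """Build a shell-safe `pore_c contacts summarize` command string."""
    args = [
        "pore_c", "contacts", "summarize",
        "--user-metadata", json.dumps(metadata),
        contact_table, read_summary_table,
        concatemer_table, concatemer_summary_csv,
    ]
    return shlex.join(args)  # quotes the JSON so the shell passes it intact


# Hypothetical metadata and file names, for illustration only.
metadata = {"sample": "sample_01", "enzyme": "NlaIII"}
command = build_summarize_command(
    metadata, "contacts.parquet", "read_summary.parquet",
    "concatemers.parquet", "concatemer_summary.csv",
)
```

Splitting the resulting string with `shlex.split` returns the original argument list, with the JSON dictionary as a single argument.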
export
Export contacts to various formats
Usage
pore_c contacts export [OPTIONS] CONTACT_TABLE
        [cooler|salsa_bed|paired_end_fastq|pairs|merged_no_dups] OUTPUT_PREFIX
Export contacts to the following formats:
 - cooler: a sparse representation of a contact matrix
 - paired_end_fastq: for each contact create a pseudo paired-end read using the reference genome sequence
Parameters:
- contact_table [required]
- format [required]
- output_prefix [required]
- --min-mapping-quality INTEGER: Both alignments have mapping qualities greater than this [default: 0]
- --min-align-base-qscore INTEGER: Both alignments have mean base qualities greater than this [default: 0]
- --cooler-resolution INTEGER: The bin width of the resulting matrix [default: 1000]
- --fragment-table TEXT: The fragment table for the corresponding virtual digest (required if the export format is cooler)
- --by-haplotype: Create a cooler for each pair of haplotypes (eg 1-1, 1-2, 2-2,…). Only valid with ‘cooler’
- --chromsizes TEXT: The chromsizes file for the corresponding genome (required if the export format is cooler or pairs)
- --reference-fasta TEXT: The reference genome used to generate the contact table (required if the export format is paired_end_fastq or merged_no_dups)
