fast5_research package

Submodules

fast5_research.extract module

class fast5_research.extract.MultiWriter(out_path, by_id, prefix='', reads_per_file=4000)[source]

Bases: fast5_research.extract.ReadWriter

close()[source]
write_read(read)[source]

Write a read.

Parameters

read – either a Read object or an hdf group handle from a source multi-read file.

class fast5_research.extract.Read(read_id, read_number, tracking_id, channel_id, context_tags, raw)[source]

Bases: object

class fast5_research.extract.ReadWriter(out_path, by_id, prefix='')[source]

Bases: object

write_read()[source]
class fast5_research.extract.SingleWriter(out_path, by_id, prefix='')[source]

Bases: fast5_research.extract.ReadWriter

write_read(read)[source]
fast5_research.extract.build_read_index()[source]
fast5_research.extract.extract_channel_reads(source, output, prefix, flat, by_id, max_files, multi, channel, summary=None)[source]
fast5_research.extract.extract_read_summary()[source]
fast5_research.extract.extract_read_summary_internal(src, channels, out_fh, logger)[source]
fast5_research.extract.extract_reads()[source]
fast5_research.extract.filter_file_from_bam()[source]
fast5_research.extract.filter_multi_reads()[source]
fast5_research.extract.reads_in_multi(src, filt=None)[source]

Get list of read IDs contained within a multi-read file.

Parameters
  • src – source file.

  • filt – perform filtering by given set.

Returns

set of read UUIDs (as string and recorded in hdf group name).

fast5_research.extract.time_cast(time, sample_rate)[source]

Convert a float time to sample index, or return time unmodified
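As a rough illustration of the conversion described above, the sketch below multiplies a float time in seconds by the sample rate and passes any other value through unchanged. The name time_cast_sketch and the choice to truncate rather than round are assumptions, not the package's confirmed behaviour.

```python
def time_cast_sketch(time, sample_rate):
    """Hypothetical stand-in for time_cast (not the package's code)."""
    if isinstance(time, float):
        # Assumption: a float is a time in seconds; truncate to a sample index.
        return int(time * sample_rate)
    # Anything else (e.g. an int index or None) is returned unmodified.
    return time


print(time_cast_sketch(1.5, 4000))
print(time_cast_sketch(6000, 4000))
```

Under this reading, callers can pass either seconds (as a float) or a sample index (as an int) to the same argument.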

fast5_research.extract.triplewise(iterable)[source]

fast5_research.fast5 module

class fast5_research.fast5.Fast5(fname, read='r')[source]

Bases: h5py._hl.files.File

Class for grabbing data from single-read fast5 files. Many attributes and groups are currently assumed to exist (we’re concerned mainly with reading). Needs some development to make it robust and to support writing.

classmethod New(fname, read='w', tracking_id={}, context_tags={}, channel_id={})[source]

Construct a fresh single-read file, with meta data written to standard locations.

assert_writable()[source]
property attributes

Attributes for a read, assumes one read in file

property channel_meta

Channel meta information as python dict

property context_tags

Context tags meta information as python dict

static convert_channel_id(channel_id)[source]
static convert_raw_meta(meta)[source]
static convert_tracking_id(tracking_id)[source]
get_alignment_attrs(section='template', analysis='Alignment')[source]

Read the annotated alignment meta data from the fast5 file.

Parameters
  • section – String to use in paths, e.g. ‘template’.

  • analysis – Base analysis name (under /Analyses)

get_analysis_latest(name)[source]

Get group of latest (present) analysis with a given base path.

Parameters

name – Get the (full) path of newest analysis with a given base name.

get_analysis_new(name)[source]

Get group path for new analysis with a given base name.

Parameters

name – desired analysis name

get_any_mapping_data(section='template', attrs_only=False, get_model=False)[source]

Convenience method for extracting whatever mapping data might be present, favouring squiggle_mapping output over basecall_mapping.

Parameters
  • section – (Probably) ‘template’

  • attrs_only – Use attrs_only=True to return mapping attributes without events

Returns

the tuple (events, attrs) or attrs only

get_basecall_data(section='template', analysis='Basecall_1D')[source]

Read the annotated basecall_1D events from the fast5 file.

Parameters
  • section – String to use in paths, e.g. ‘template’.

  • analysis – Base analysis name (under /Analyses)

get_engine_state(state, time=None)[source]

Retrieve engine state from /EngineStates/, either across the whole read (default) or at a given time.

Parameters
  • state – name of engine state

  • time – time (in seconds) at which to retrieve the engine state

get_fastq(analysis='Basecall_1D', section='template', custom=None)[source]

Get the fastq (sequence) data.

Parameters
  • analysis – Base analysis name (under /Analyses)

  • section – (Probably) ‘template’

  • custom – Custom hdf path overriding all of the above.

get_mapping_attrs(section='template', analysis='Squiggle_Map')[source]

Read the annotated mapping meta data from the fast5 file. Names which are inconsistent between squiggle_mapping and basecall_mapping are added to basecall_mapping (thus duplicating the attributes in basecall mapping).

Parameters
  • section – String to use in paths, e.g. ‘template’.

  • analysis – Base analysis name (under /Analyses). For basecall mapping use analysis = ‘Alignment’.

get_mapping_data(section='template', analysis='Squiggle_Map', get_model=False)[source]

Read the annotated mapping events from the fast5 file.

Note

The seq_pos column for the events table returned from basecall_mapping is adjusted to be the genome position (consistent with squiggle_mapping)

Parameters
  • section – String to use in paths, e.g. ‘template’.

  • analysis – Base analysis name (under /Analyses). For basecall mapping use analysis = ‘AlignToRef’.

get_model(section='template', analysis='Squiggle_Map')[source]

Get model used for squiggle mapping

get_raw(scale=True)[source]

Get raw data in file, might not be present.

Parameters

scale – Scale data to pA? (rather than ADC values)

Warning

This method is deprecated and should not be used; instead use .get_read(raw=True) to read both MinKNOW-conformant files and previous Tang files.

get_read(group=False, raw=False, read_number=None)[source]

Like get_reads, but only the first read in the file

Parameters

group – return hdf group rather than event/raw data

get_read_stats()[source]

Combines stats based on events with output of .summary, assumes a one read file.

get_reads(group=False, raw=False, read_numbers=None)[source]

Iterator across event data for all reads in file

Parameters

group – return hdf group rather than event data

get_reference_fasta(analysis='Alignment', section='template', custom=None)[source]

Get fasta sequence of known DNA fragment for the read.

Parameters
  • analysis – Base analysis name (under /Analyses)

  • section – (Probably) ‘template’

  • custom – Custom hdf path overriding all of the above.

get_sam(analysis='Alignment', section='template', custom=None)[source]

Get SAM (alignment) data.

Parameters
  • analysis – Base analysis name (under /Analyses)

  • section – (Probably) ‘template’

  • custom – Custom hdf path overriding all of the above.

get_section_events(section, analysis='Segment_Linear')[source]

Get the event data for a signal section

Parameters

analysis – Base analysis path (under /Analyses)

get_section_indices(analysis='Segment_Linear')[source]

Get two tuples indicating the event indices for signal segmentation boundaries.

Parameters

analysis – Base analysis path (under /Analyses)

get_split_data(analysis='Segment_Linear')[source]

Get signal segmentation data.

Parameters

analysis – Base analysis name (under /Analyses)

get_temperature(time=None, field='heatsink')[source]

Retrieve temperature data from /EngineStates/, either across the whole read (default) or at a given time.

Parameters
  • time – time at which to get temperature

  • field – one of (‘heatsink’, ‘asic’)

repack(pack_opts='')[source]

Run h5repack on the current file. Returns a fresh object.

set_basecall_data(events, scale, path, model, seq, section='template', name='unknown', post=None, score=None, quality_data=None, qstring=None, analysis='Basecall_1D')[source]

Create an annotated event table and 1D basecalling summary similar to chimaera and add them to the fast5 file.

Parameters
  • events – Numpy record array of events. Must contain the mean, stdv, start and length fields.

  • scale – Scaling object.

  • path – Viterbi path containing model pointers (1D np.array).

  • model – Model object.

  • seq – Basecalled sequence string for fastq.

  • section – String to use in paths, e.g. ‘template’.

  • name – Identifier string for fastq.

  • post – Numpy 2D array containing the posteriors (event, state), used to annotate events.

  • score – Quality value for the whole strand.

  • quality_data – Numpy 2D array containing quality_data, used to annotate events.

  • qstring – Quality string for fastq.

  • analysis – Base analysis name (under /Analyses)

set_engine_state(data)[source]

Set the engine state data.

Parameters

data – a 1D-array containing two fields, the first of which must be named ‘time’. The name of the second field will be used to name the engine state and be used in the dataset path.

set_mapping_data(events, scale, path, model, seq, ref_name, section='template', post=None, score=None, is_reverse=False, analysis='Squiggle_Map')[source]

Create an annotated event table and mapping summary similar to chimaera and add them to the fast5 file.

Parameters
  • events – np.ndarray of events. Must contain mean, stdv, start and length fields.

  • scale – Scaling object.

  • path – np.ndarray containing position in reference. Negative values will be interpreted as “bad emissions”.

  • model – Model object to use.

  • seq – String representation of the reference sequence.

  • section – Section of strand, e.g. ‘template’.

  • ref_name – Reference name.

  • post – Two-dimensional np.ndarray containing posteriors.

  • score – Mapping quality score.

  • is_reverse – Mapping refers to ‘-’ strand (bool).

  • analysis – Base analysis name (under /Analyses)

set_raw(raw, meta=None, read_number=None)[source]

Set the raw data in file.

Parameters
  • raw – raw data to add

  • read_number – read number (as usually given in filename and contained within HDF paths, viz. Reads/Read_<>/). If not given attempts will be made to guess the number (assumes single read per file).

set_raw_old(raw, meta)[source]

Set the raw data in file.

Parameters
  • raw – raw data to add

  • meta – meta data dictionary

Warning

This method does not write raw data conforming to the Fast5 specification. This class will currently still read data written by this method.

set_read(data, meta)[source]

Write event data to file

Parameters
  • data – event data

  • meta – meta data to attach to read

  • read_number – per-channel read counter

set_split_data(data, analysis='Segment_Linear')[source]

Write a dict containing split point data.

Parameters
  • data – dict-like object containing attrs to add

  • analysis – Base analysis name (under /Analyses)

Warning

Not checking currently for required fields.

strip_analyses(keep=('EventDetection_000', 'RawData'))[source]

Remove all analyses from file

Parameters

keep – whitelist of analysis groups to keep

summary(rename=True, delete=True, scale=True)[source]

A read summary, assumes one read in file

property tracking_id

Tracking id meta information as python dict

property writable

Can we write to the file.

fast5_research.fast5.iterate_fast5(path='Stream', strand_list=None, paths=False, mode='r', limit=None, shuffle=False, robust=False, progress=False, recursive=False)[source]

Iterate over directory of fast5 files, optionally only returning those in list

Parameters
  • path – Directory in which single read fast5 are located or filename.

  • strand_list – List of strands; can be a Python list or a delimited table. If the latter and a filename field is present, this is used to locate files. If a file is given and a strand field is present, the directory index file is searched for and filenames are built from that.

  • paths – Yield file paths instead of fast5 objects.

  • mode – Mode for opening files.

  • limit – Limit number of files to consider.

  • shuffle – Shuffle files to randomize yield of files.

  • robust – Carry on with iterating over FAST5 files after an exception was raised.

  • progress – Display progress bar.

  • recursive – Perform a recursive search for files in subdirectories of path.
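The recursive option amounts to searching subdirectories of path for matching files. A minimal standard-library sketch of such a search is given here; the name recursive_glob_sketch and the '*.fast5' pattern are assumptions for illustration, and this is not the package's implementation.

```python
import fnmatch
import os


def recursive_glob_sketch(treeroot, pattern):
    """Return sorted paths under treeroot whose file name matches pattern."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(treeroot):
        # fnmatch.filter keeps only the names matching the shell-style pattern.
        for name in fnmatch.filter(filenames, pattern):
            matches.append(os.path.join(dirpath, name))
    return sorted(matches)


if __name__ == "__main__":
    for found in recursive_glob_sketch(".", "*.fast5"):
        print(found)
```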

fast5_research.fast5.recursive_glob(treeroot, pattern)[source]

fast5_research.fast5_bulk module

class fast5_research.fast5_bulk.AsicBCommand(command)[source]

Bases: object

Wrapper around the asicb command structure

property configuration
property min_temperature
class fast5_research.fast5_bulk.AsicBConfiguration(config)[source]

Bases: object

Wrapper around the asicb configuration struct passed to the asicb over usb

active_mux(channel)[source]

Gets the active mux for the specified channel.

Parameters

channel – 0 based

property bias_voltage
bits_at(start, end)[source]
int_at(start, end)[source]
class fast5_research.fast5_bulk.BulkFast5(filename, mode='r')[source]

Bases: h5py._hl.files.File

Class for reading data from a bulk fast5 file

classmethod New(fname, read='a', tracking_id={}, context_tags={}, channel_id={})[source]

Construct a fresh bulk file, with meta data written to standard locations. There is currently no checking of this meta data. TODO: Add meta data checking.

get_bias_voltage_changes()[source]

Get changes in the bias voltage.

Note

For a long (-long-long) time the only logging of the common electrode voltage was the experimental history (accurate to one second). The addition of the voltage trace changed this, but this dataset is cumbersome. MinKNOW 1.x(.3?) added the asic command history, which is typically much shorter and therefore quicker to query. The bias voltage is recorded in numerous places. For MinION asics there is typically a -5X multiplier to convert the data into correct units with the sign people are used to.

get_bias_voltage_changes_in_window(times=None, raw_indices=None)[source]

Find all mux voltage changes within a time window.

Parameters
  • times – tuple of floats (start_second, end_second)

  • raw_indices – tuple of ints (start_index, end_index)

Note

This is the bias voltage from the expt history (accurate to 1 second), and will not include any changes in voltage related to waveforms. For the full voltage trace, use get_voltage.

Exactly one of the slice keyword arguments needs to be specified, as the method will override them in the order of times > raw_indices.

get_context_meta()[source]

Get context meta

get_engine_state(state, time=None)[source]

Get changes in an engine state or the value of an engine state at a given time.

Parameters
  • state – the engine state to retrieve.

  • time – the time at which to grab engine state.

get_event_detection_parameters()[source]

Get the full set of parameters related to event detection

get_events(channel, times=None, raw_indices=None, event_indices=(None, None), use_scaling=True)[source]

Parse channel event data.

Parameters
  • channel – channel number int

  • times – tuple of floats (start_second, end_second)

  • raw_indices – tuple of ints (start_index, end_index)

  • event_indices – tuple of ints (start_index, end_index)

  • use_scaling – if True, scale the current level

Note

Exactly one of the slice keyword arguments needs to be specified, as the method will override them in the order of times > raw_indices > event_indices.
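The precedence stated in the note can be pictured with a small helper. This is a hypothetical sketch: resolve_window, its sample_rate argument and its return format are all assumptions made for illustration and do not appear in the package.

```python
def resolve_window(sample_rate, times=None, raw_indices=None, event_indices=None):
    """Pick one slice specification in the order times > raw_indices > event_indices.

    Returns a tuple (kind, (start, end)) where kind is 'raw' or 'event'.
    """
    if times is not None:
        # Assumption: times in seconds are converted to raw sample indices.
        start, end = times
        return "raw", (int(start * sample_rate), int(end * sample_rate))
    if raw_indices is not None:
        return "raw", tuple(raw_indices)
    if event_indices is not None:
        return "event", tuple(event_indices)
    raise ValueError("specify one of times, raw_indices or event_indices")


# times wins even when raw_indices is also given.
print(resolve_window(4000, times=(1.0, 2.0), raw_indices=(5, 10)))
```

Passing exactly one of the arguments avoids relying on this silent override.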

get_metadata(channel)[source]

Get the metadata for the specified channel.

Look first for events metadata, and fall back on raw metadata, returning an empty dict if neither could be found.

get_mux(channel, raw_index=None, time=None, wells_only=False, return_raw_index=False)[source]

Find the multiplex well_id (“the mux”) at a given time

Parameters
  • channel – channel number int

  • raw_index – sample index

  • time – time in seconds

  • wells_only – bool, if True, ignore changes to mux states not in [1,2,3,4] and hence return the last well mux.

  • return_raw_index – bool, if True, return tuple (mux, raw_index), raw_index being raw index when the mux was set.

Note

There are multiple mux states associated with each well (e.g. common_voltage_1 and unblock_voltage_1). Here, we return the well_id associated with the mux state (using self.enum_to_mux), i.e. 1 in both these cases.

Exactly one of the slice keyword arguments needs to be specified, as the method will override them in the order of time > raw_index.

get_mux_changes(channel, wells_only=False)[source]

Get changes in multiplex settings for given channel.

Parameters
  • channel – channel for which to fetch data

  • wells_only – bool, if True, ignore changes to mux states not in [1,2,3,4]

Note

There are multiple mux states associated with each well (e.g. 1:common_voltage_1 and 6:unblock_voltage_1). Here, we return mux state numbers, e.g. 1 and 6, which can be linked to the well_id using self.enum_to_mux

get_mux_changes_in_window(channel, times=None, raw_indices=None)[source]

Find all mux changes within a time window.

Parameters
  • channel – channel number int

  • times – tuple of floats (start_second, end_second)

  • raw_indices – tuple of ints (start_index, end_index)

Note

There are multiple mux values associated with each well (e.g. 1:common_voltage_1 and 6:unblock_voltage_1). Here, we return mux values, e.g. 1 and 6, which can be linked to the well_id using self.enum_to_mux.

Exactly one of the slice keyword arguments needs to be specified, as the method will override them in the order of times > raw_indices.

get_raw(channel, times=None, raw_indices=(None, None), use_scaling=True)[source]

If available, parse channel raw data.

Parameters
  • channel – channel number int

  • times – tuple of floats (start_second, end_second)

  • raw_indices – tuple of ints (start_index, end_index)

  • use_scaling – if True, scale the current level

Note

Exactly one of the slice keyword arguments needs to be specified, as the method will override them in the order of times > raw_indices.

get_reads(channel, transitions=False, multi_row_class='auto')[source]

Parse channel read data to yield details of reads.

Parameters
  • channel – channel number int

  • transitions – if True, include transition reads

  • multi_row_class – options: ‘auto’, ‘modal’, ‘penultimate’, ‘final’. For reads which span multiple rows, use the classification from: ‘auto’: modal class if present, penultimate row if not; ‘modal’: modal class if present; ‘penultimate’: penultimate row; ‘final’: final row. Modal classification is not supported by very old versions of MinKNOW.

get_state(channel, raw_index=None, time=None)[source]

Find the channel state at a given time

Parameters
  • channel – channel number int

  • raw_index – sample index

  • time – time in seconds

Note

Exactly one of the slice keyword arguments needs to be specified, as the method will override them in the order of time > raw_index.

get_state_changes(channel)[source]

Parse channel state changes.

Parameters

channel – channel number int

get_states_in_window(channel, times=None, raw_indices=None)[source]

Find all channel states within a time window.

Parameters
  • channel – channel number int

  • times – tuple of floats (start_second, end_second)

  • raw_indices – tuple of ints (start_index, end_index)

Note

Exactly one of the slice keyword arguments needs to be specified, as the method will override them in the order of times > raw_indices.

get_temperature(time=None, field='heatsink')[source]
get_tracking_meta()[source]

Get tracking meta data

get_voltage(times=None, raw_indices=(None, None), use_scaling=True)[source]

Extracts raw common electrode trace

Parameters
  • raw_indices – tuple of ints to limit section of voltage data loaded.

  • use_scaling – bool, whether to scale voltage data. If no scaling meta is found, scale by -5 (as appropriate for MinION).

Returns

voltage as array (including 5x multiplier for MinKNOW)

get_waveform_timings()[source]

Extract the timings of the waveforms (if any).

Returns

list of tuples of start and end times

has_raw(channel)[source]

Return True if there is raw data for this channel.

has_reads(channel)[source]

Return True if there is read data for this channel.

has_states(channel)[source]

Return True if there is State data for this channel.

parse_history()[source]

Parse the experimental history to pull out various environmental factors. The functions below are quite nasty, don’t enquire too hard.

set_events(data, meta, channel)[source]

Write event data to file

Parameters
  • data – event data

  • meta – meta data to attach to read

  • read_number – per-channel read counter

set_raw(raw, channel, meta=None)[source]

Set the raw data in file.

Parameters
  • raw – raw data to add

  • channel – channel number

set_voltage(data, meta)[source]

fast5_research.util module

class fast5_research.util.MockZeroArray(shape)[source]

Bases: numpy.ndarray

argmax(axis=0)[source]

Fake argmax values of an array.

fast5_research.util.build_mapping_summary_table(mapping_summary)[source]

Build a mapping summary table

Parameters

mapping_summary – List of curr_map dictionaries

Returns

Numpy record array containing summary contents. One record per array element of mapping_summary

fast5_research.util.build_mapping_table(events, ref_seq, post, scale, path, model)[source]

Build a mapping table based on output of a dragonet.mapper style object. Taken from chimaera.common.utilities.

Parameters
  • events – Numpy record array of events. Must contain the mean, stdv, start and length fields.

  • ref_seq – String representation of the reference sequence.

  • post – Numpy 2D array containing the posteriors (event, state).

  • scale – Scaling object.

  • path – Numpy 1D array containing position in reference. May contain negative values, which will be interpreted as “bad emissions”.

  • model – Model object to use.

Returns

numpy record array containing summary fields. One record per event.

Output Field         Description
mean                 mean value of event samples (level)
scaled_mean          mean scaled to the bare level emission (mean/mode)
stdv                 standard deviation of event samples (noise)
scaled_stdv          stdv scaled to the bare stdv emission (mode)
start                start time of event (s)
length               length of event (s)
model_level          modelled event level, i.e. the level emission associated with the kmer given in the kmer field, scaled to the data
model_scaled_level   bare level emission
model_sd             modelled event noise, i.e. the sd emission associated with the kmer given in the kmer field, scaled to the data
model_scaled_sd      bare noise emission
seq_pos              aligned sequence position, position on Viterbi path
p_seq_pos            posterior probability of states on Viterbi path
kmer                 kmer identity of seq_pos
mp_pos               aligned sequence position, position with highest posterior
p_mp_pos             posterior probability of most probable states
mp_kmer              kmer identity of mp_pos
good_emission        whether or not the HMM has tagged event as fitting the model

fast5_research.util.compute_movement_stats(path)[source]

Compute movement stats from a mapping state path

Parameters

path – np.ndarray containing position in reference. Negative values are interpreted as “bad emissions”.

fast5_research.util.create_basecall_1d_output(raw_events, scale, path, model, post=None)[source]

Create the annotated event table and basecalling summaries similar to chimaera.

Parameters
  • raw_events – np.ndarray with fields mean, stdv, start, and length.

  • scale – dragonet.basecall.scaling.Scaler object (or object with attributes shift, scale, drift, var, scale_sd, and var_sd).

  • path – list containing state indices with respect to model.

  • model – dragonet.util.model.Model object.

  • post – Two-dimensional np.ndarray containing posteriors (event, state).

  • quality_data – np.ndarray containing quality_data, used to annotate events.

Returns

A tuple of:

  • the annotated input event table

  • a dict of result

fast5_research.util.create_mapping_output(raw_events, scale, path, model, seq, post=None, n_states=None, is_reverse=False, substates=False)[source]

Create the annotated event table and summaries similar to chimaera.

Parameters
  • raw_events – np.ndarray with fields mean, stdv, start, and length.

  • scale – dragonet.basecall.scaling.Scaler object (or object with attributes shift, scale, drift, var, scale_sd, and var_sd).

  • path – list containing state indices with respect to model.

  • model – dragonet.util.model.Model object.

  • seq – String representation of the reference sequence.

  • post – Two-dimensional np.ndarray containing posteriors (event, state).

  • is_reverse – Mapping refers to ‘-’ strand (bool).

  • substates – Mapping contains substates?

Returns

A tuple of:

  • the annotated input event table

  • a dict of result

fast5_research.util.docstring_parameter(*sub)[source]

Allow docstrings to contain parameters.
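One common way to achieve this is a decorator that runs str.format over the decorated object's docstring. The sketch below assumes positional {0}-style placeholders; the name docstring_parameter_sketch and the example function are hypothetical, and the package's own decorator may differ in detail.

```python
def docstring_parameter_sketch(*sub):
    """Return a decorator that substitutes sub into the docstring placeholders."""
    def decorator(obj):
        # Docstrings can be None (e.g. when running with python -OO).
        if obj.__doc__ is not None:
            obj.__doc__ = obj.__doc__.format(*sub)
        return obj
    return decorator


@docstring_parameter_sketch("template", 4000)
def example(section):
    """Process the {0} section at {1} Hz."""
    return section


print(example.__doc__)
```

Note that with this approach any literal braces in the docstring need to be doubled.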

fast5_research.util.dtype_descr(arr)[source]

Get arr.dtype.descr. Views of structured arrays in which columns have been re-ordered no longer support arr.dtype.descr; see https://github.com/numpy/numpy/commit/dd8a2a8e29b0dc85dca4d2964c92df3604acc212

fast5_research.util.file_has_fields(fname, fields=None)[source]

Check that a tsv file has given fields

Parameters
  • fname – filename to read. If the filename extension is gz or bz2, the file is first decompressed.

  • fields – list of required fields.

Returns

boolean

fast5_research.util.get_changes(data, ignore_cols=None, use_cols=None)[source]

Return only rows of a structured array which are not equal to the previous row.

Parameters
  • data – Numpy record array.

  • ignore_cols – iterable of column names to ignore in checking for equality between rows.

  • use_cols – iterable of column names to include in checking for equality between rows (only used if ignore_cols is None).

Returns

Numpy record array.
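The row filtering described above can be sketched without numpy by treating each row as a dict. This illustration assumes the first row is always kept (it has no previous row to compare against); get_changes_sketch and the example columns are hypothetical names.

```python
def get_changes_sketch(rows, ignore_cols=None, use_cols=None):
    """Keep rows that differ from the previous row in the compared columns.

    rows is a list of dicts sharing the same keys.
    """
    if not rows:
        return []
    if ignore_cols is not None:
        ignored = set(ignore_cols)
        cols = [c for c in rows[0] if c not in ignored]
    elif use_cols is not None:
        # use_cols is only consulted when ignore_cols is None.
        cols = list(use_cols)
    else:
        cols = list(rows[0])
    changed = [rows[0]]
    for prev, cur in zip(rows, rows[1:]):
        if any(prev[c] != cur[c] for c in cols):
            changed.append(cur)
    return changed


rows = [
    {"time": 0, "mux": 1},
    {"time": 1, "mux": 1},
    {"time": 2, "mux": 2},
]
print(get_changes_sketch(rows, ignore_cols=["time"]))
```

Ignoring the time column is what makes this useful for logs in which every row has a distinct timestamp.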

fast5_research.util.group_vector(arr)[source]

Group a vector by unique values.

Parameters

arr – input vector to be grouped.

Returns

a dictionary mapping unique values to arrays of indices of the input vector.
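The grouping can be illustrated in plain Python as below. The sketch returns lists of indices rather than arrays, and group_vector_sketch is a hypothetical name, not the package's function.

```python
from collections import defaultdict


def group_vector_sketch(arr):
    """Map each unique value in arr to the list of positions where it occurs."""
    groups = defaultdict(list)
    for index, value in enumerate(arr):
        groups[value].append(index)
    return dict(groups)


print(group_vector_sketch([3, 1, 3, 2, 1]))
```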

fast5_research.util.kmer_overlap_gen(kmers, moves=None)[source]

From a list of kmers return the character shifts between them. (Movement from i to i+1 entry, e.g. [AATC,ATCG] returns [0,1]). Allowed moves may be specified in moves argument in order of preference. Taken from dragonet.bio.seq_tools

Parameters
  • kmers – sequence of kmer strings.

  • moves – allowed movements, if None all movements to length of kmer are allowed.
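The shift between consecutive kmers can be found by testing, for each allowed move in order of preference, whether the suffix of one kmer equals the prefix of the next. The sketch below is one possible reading of the description: it assumes equal-length kmers, reports 0 for the first kmer, and raises an error when no allowed move fits. The name kmer_overlap_sketch is hypothetical.

```python
def kmer_overlap_sketch(kmers, moves=None):
    """Return the list of character shifts between consecutive kmers."""
    kmers = list(kmers)
    if not kmers:
        return []
    k = len(kmers[0])
    if moves is None:
        # All movements up to the kmer length, smallest first.
        moves = range(k + 1)
    shifts = [0]  # the first kmer has no predecessor
    for prev, cur in zip(kmers, kmers[1:]):
        for move in moves:
            # A shift of `move` means prev without its first `move` characters
            # matches the start of cur.
            if prev[move:] == cur[:k - move]:
                shifts.append(move)
                break
        else:
            raise ValueError("no allowed move from %s to %s" % (prev, cur))
    return shifts


print(kmer_overlap_sketch(["AATC", "ATCG"]))
```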

fast5_research.util.mad(data, factor=None, axis=None, keepdims=False)[source]

Compute the Median Absolute Deviation, i.e., the median of the absolute deviations from the median, and (by default) adjust by a factor for asymptotically normal consistency.

Parameters
  • data – A ndarray object

  • factor – Factor to scale MAD by. Default (None) is to be consistent with the standard deviation of a normal distribution (i.e. mad( N(0,sigma^2) ) = sigma).

  • axis – For multidimensional arrays, which axis to calculate the median over.

  • keepdims – If True, axis is kept as dimension of length 1

Returns

the (scaled) MAD
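For a one-dimensional sequence the calculation can be sketched with the standard library alone. The default factor of approximately 1.4826 is the usual constant for consistency with the standard deviation of a normal distribution; the name mad_sketch is hypothetical and the axis and keepdims options are left out.

```python
from statistics import median


def mad_sketch(data, factor=None):
    """Scaled median absolute deviation of a 1D sequence of numbers."""
    if factor is None:
        # Approximately 1 / Phi^-1(0.75), so that MAD estimates sigma
        # for normally distributed data.
        factor = 1.4826
    data = list(data)
    centre = median(data)
    return factor * median(abs(x - centre) for x in data)


# The outlier at 100 barely affects the result.
print(mad_sketch([1, 2, 3, 4, 100]))
```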

fast5_research.util.mean_qscore(scores)[source]

Returns the phred score corresponding to the mean of the probabilities associated with the phred scores provided. Taken from chimaera.common.utilities.

Parameters

scores – Iterable of phred scores.

Returns

Phred score corresponding to the average error rate, as estimated from the input phred scores.
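Since a phred score Q corresponds to an error probability of 10^(-Q/10), the averaging can be sketched as follows: convert each score to a probability, take the mean, and convert back. The name mean_qscore_sketch is hypothetical.

```python
import math


def mean_qscore_sketch(scores):
    """Phred score of the mean error probability of the given phred scores."""
    scores = list(scores)
    if not scores:
        raise ValueError("scores must not be empty")
    mean_error = sum(10 ** (-q / 10) for q in scores) / len(scores)
    return -10 * math.log10(mean_error)


print(mean_qscore_sketch([10, 20]))
```

Because the averaging is done on probabilities, low scores dominate: the result for [10, 20] lies well below the arithmetic mean of 15.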

fast5_research.util.med_mad(data, factor=None, axis=None, keepdims=False)[source]

Compute the Median Absolute Deviation, i.e., the median of the absolute deviations from the median, and the median

Parameters
  • data – A ndarray object

  • factor – Factor to scale MAD by. Default (None) is to be consistent with the standard deviation of a normal distribution (i.e. mad( N(0,sigma^2) ) = sigma).

  • axis – For multidimensional arrays, which axis to calculate over

  • keepdims – If True, axis is kept as dimension of length 1

Returns

a tuple containing the median and MAD of the data

fast5_research.util.qstring_to_phred(quality)[source]

Compute standard phred scores from a quality string.
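A minimal sketch of the conversion, assuming the common Sanger/Illumina 1.8+ encoding in which each character's ASCII code minus 33 gives the score. The offset and the name qstring_to_phred_sketch are assumptions; the package may return an array rather than a list.

```python
def qstring_to_phred_sketch(quality):
    """Convert a fastq quality string to a list of integer phred scores."""
    # Assumption: phred+33 encoding, so '!' is 0 and 'I' is 40.
    return [ord(char) - 33 for char in quality]


print(qstring_to_phred_sketch("!5I"))
```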

fast5_research.util.readtsv(fname, fields=None, **kwargs)[source]

Read a tsv file into a numpy array with required field checking

Parameters
  • fname – filename to read. If the filename extension is gz or bz2, the file is first decompressed.

  • fields – list of required fields.

fast5_research.util.seq_to_kmers(seq, length)[source]

Turn a string into a list of (overlapping) kmers.

e.g. perform the transformation:

‘ATATGCG’ => [‘ATA’, ‘TAT’, ‘ATG’, ‘TGC’, ‘GCG’]

Parameters
  • seq – character string

  • length – length of kmers in output

Returns

A list of overlapping kmers
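The transformation can be written as a single slice per starting position. This is an illustrative sketch under the name seq_to_kmers_sketch, not the package's code.

```python
def seq_to_kmers_sketch(seq, length):
    """Return every substring of the given length, stepping one character at a time."""
    return [seq[i:i + length] for i in range(len(seq) - length + 1)]


print(seq_to_kmers_sketch("ATATGCG", 3))
```

A sequence of n characters yields n - length + 1 kmers, and none at all if it is shorter than length.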

fast5_research.util.validate_event_table(table)[source]

Check if an object contains all columns of a basic event array.

fast5_research.util.validate_model_table(table)[source]

Check if an object contains all columns of a dragonet Model.

fast5_research.util.validate_scale_object(obj)[source]

Check if an object contains all attributes of dragonet Scaler.

fast5_research.util.window(iterable, size)[source]

Create an iterator returning a sliding window from another iterator

Parameters
  • iterable – Iterator

  • size – Size of window

Returns

an iterator returning a tuple containing the data in the window
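A sliding window of this kind is commonly built on a bounded deque. The sketch below assumes that inputs shorter than size yield nothing; window_sketch is a hypothetical name.

```python
from collections import deque
from itertools import islice


def window_sketch(iterable, size):
    """Yield tuples of `size` consecutive items, advancing one item at a time."""
    iterator = iter(iterable)
    # Fill the first window; maxlen makes later appends drop the oldest item.
    win = deque(islice(iterator, size), maxlen=size)
    if len(win) == size:
        yield tuple(win)
    for item in iterator:
        win.append(item)
        yield tuple(win)


print(list(window_sketch(range(5), 3)))
```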

Module contents