fast5_research package¶

Submodules¶

fast5_research.extract module¶

class fast5_research.extract.MultiWriter(out_path, by_id, prefix='', reads_per_file=4000)[source]¶

Bases: fast5_research.extract.ReadWriter

close()[source]¶

write_read(read)[source]¶

Write a read.

Parameters: read – either a Read object or an hdf group handle from a source multi-read file.

class fast5_research.extract.Read(read_id, read_number, tracking_id, channel_id, context_tags, raw)[source]¶: Bases: object

class fast5_research.extract.ReadWriter(out_path, by_id, prefix='')[source]¶

Bases: object

write_read()[source]¶

class fast5_research.extract.SingleWriter(out_path, by_id, prefix='')[source]¶

Bases: fast5_research.extract.ReadWriter

write_read(read)[source]¶

fast5_research.extract.build_read_index()[source]¶

fast5_research.extract.extract_channel_reads(source, output, prefix, flat, by_id, max_files, multi, channel, summary=None)[source]¶

fast5_research.extract.extract_read_summary()[source]¶

fast5_research.extract.extract_read_summary_internal(src, channels, out_fh, logger)[source]¶

fast5_research.extract.extract_reads()[source]¶

fast5_research.extract.filter_file_from_bam()[source]¶

fast5_research.extract.filter_multi_reads()[source]¶

fast5_research.extract.reads_in_multi(src, filt=None)[source]¶

Get list of read IDs contained within a multi-read file.

Parameters

src – source file.
filt – perform filtering by given set.

Returns

set of read UUIDs (as strings, as recorded in the hdf group names).

fast5_research.extract.time_cast(time, sample_rate)[source]¶: Convert a float time to sample index, or return time unmodified
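
The conversion described for time_cast can be pictured with the minimal sketch below. It is not the library's code: the name time_cast_sketch is hypothetical, and truncating (rather than rounding) the product is an assumption.

```python
def time_cast_sketch(time, sample_rate):
    """Return a sample index for float times; pass other values through."""
    if isinstance(time, float):
        # Assumption: the product is truncated toward zero, not rounded.
        return int(time * sample_rate)
    # Integers (already sample indices) and None are returned unmodified.
    return time
```

With a 4000 Hz sample rate, a time of 1.5 seconds maps to sample 6000, while an integer index of 6000 is left alone.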

fast5_research.extract.triplewise(iterable)[source]¶

fast5_research.fast5 module¶

class fast5_research.fast5.Fast5(fname, read='r')[source]¶

Bases: h5py._hl.files.File

Class for grabbing data from single-read fast5 files. Many attributes and groups are currently assumed to exist (the class is mainly concerned with reading). It needs further development to be robust and to fully support writing.

classmethod New(fname, read='w', tracking_id={}, context_tags={}, channel_id={})[source]¶: Construct a fresh single-read file, with meta data written to standard locations.

assert_writable()[source]¶

property attributes¶: Attributes for a read, assumes one read in file

property channel_meta¶: Channel meta information as python dict

property context_tags¶: Context tags meta information as python dict

static convert_channel_id(channel_id)[source]¶

static convert_raw_meta(meta)[source]¶

static convert_tracking_id(tracking_id)[source]¶

get_alignment_attrs(section='template', analysis='Alignment')[source]¶

Read the annotated alignment meta data from the fast5 file.

Parameters

section – String to use in paths, e.g. ‘template’.
analysis – Base analysis name (under /Analyses)

get_analysis_latest(name)[source]¶

Get group of latest (present) analysis with a given base path.

Parameters: name – base name of the analysis; the (full) path of the newest analysis with this base name is returned.

get_analysis_new(name)[source]¶

Get group path for new analysis with a given base name.

Parameters: name – desired analysis name
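
As an illustration of how "latest" and "new" analysis names might relate, here is a minimal sketch working on a plain list of group names. It assumes analyses are numbered with a zero-padded three-digit suffix, as the name 'EventDetection_000' elsewhere in this page suggests; the helper names latest_analysis and new_analysis are hypothetical and are not part of fast5_research.

```python
import re


def latest_analysis(existing, name):
    """Return the highest-numbered '<name>_NNN' entry in existing, or None."""
    pattern = re.compile(r"^{}_(\d+)$".format(re.escape(name)))
    numbered = []
    for group in existing:
        match = pattern.match(group)
        if match:
            numbered.append((int(match.group(1)), group))
    return max(numbered)[1] if numbered else None


def new_analysis(existing, name):
    """Return the next unused '<name>_NNN' name (assumed numbering scheme)."""
    latest = latest_analysis(existing, name)
    next_index = 0 if latest is None else int(latest.rsplit("_", 1)[1]) + 1
    return "{}_{:03d}".format(name, next_index)


groups = ["Basecall_1D_000", "Basecall_1D_001", "EventDetection_000"]
print(latest_analysis(groups, "Basecall_1D"))
print(new_analysis(groups, "Basecall_1D"))
```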

get_any_mapping_data(section='template', attrs_only=False, get_model=False)[source]¶

Convenience method for extracting whatever mapping data might be present, favouring squiggle_mapping output over basecall_mapping.

Parameters

section – (Probably) ‘template’
attrs_only – Use attrs_only=True to return mapping attributes without events

Returns

the tuple (events, attrs) or attrs only

get_basecall_data(section='template', analysis='Basecall_1D')[source]¶

Read the annotated basecall_1D events from the fast5 file.

Parameters

section – String to use in paths, e.g. ‘template’.
analysis – Base analysis name (under /Analyses)

get_engine_state(state, time=None)[source]¶

Retrieve engine state from /EngineStates/, either across the whole read (default) or at a given time.

Parameters

state – name of engine state
time – time (in seconds) at which to retrieve the engine state

get_fastq(analysis='Basecall_1D', section='template', custom=None)[source]¶

Get the fastq (sequence) data.

Parameters

analysis – Base analysis name (under /Analyses)
section – (Probably) ‘template’
custom – Custom hdf path overriding all of the above.

get_mapping_attrs(section='template', analysis='Squiggle_Map')[source]¶

Read the annotated mapping meta data from the fast5 file. Names which are inconsistent between squiggle_mapping and basecall_mapping are added to basecall_mapping (thus duplicating the attributes in basecall mapping).

Parameters

section – String to use in paths, e.g. ‘template’.
analysis – Base analysis name (under /Analyses). For basecall mapping use analysis = ‘Alignment’.

get_mapping_data(section='template', analysis='Squiggle_Map', get_model=False)[source]¶

Read the annotated mapping events from the fast5 file.

Note

The seq_pos column for the events table returned from basecall_mapping is adjusted to be the genome position (consistent with squiggle_mapping)

Parameters

section – String to use in paths, e.g. ‘template’.
analysis – Base analysis name (under /Analyses). For basecall mapping use analysis = ‘AlignToRef’.

get_model(section='template', analysis='Squiggle_Map')[source]¶: Get model used for squiggle mapping

get_raw(scale=True)[source]¶

Get raw data in file, might not be present.

Parameters: scale – Scale data to pA? (rather than ADC values)

Warning

This method is deprecated and should not be used; instead use .get_read(raw=True) to read both MinKNOW-conformant files and previous Tang files.

get_read(group=False, raw=False, read_number=None)[source]¶

Like get_reads, but only the first read in the file

Parameters: group – return hdf group rather than event/raw data

get_read_stats()[source]¶: Combines stats based on events with output of .summary, assumes a one read file.

get_reads(group=False, raw=False, read_numbers=None)[source]¶

Iterator across event data for all reads in file

Parameters: group – return hdf group rather than event data

get_reference_fasta(analysis='Alignment', section='template', custom=None)[source]¶

Get fasta sequence of known DNA fragment for the read.

Parameters

analysis – Base analysis name (under /Analyses)
section – (Probably) ‘template’
custom – Custom hdf path overriding all of the above.

get_sam(analysis='Alignment', section='template', custom=None)[source]¶

Get SAM (alignment) data.

Parameters

analysis – Base analysis name (under /Analyses)
section – (Probably) ‘template’
custom – Custom hdf path overriding all of the above.

get_section_events(section, analysis='Segment_Linear')[source]¶

Get the event data for a signal section

Parameters: analysis – Base analysis path (under /Analyses)

get_section_indices(analysis='Segment_Linear')[source]¶

Get two tuples indicating the event indices for signal segmentation boundaries.

Parameters: analysis – Base analysis path (under /Analyses)

get_split_data(analysis='Segment_Linear')[source]¶

Get signal segmentation data.

Parameters: analysis – Base analysis name (under /Analyses)

get_temperature(time=None, field='heatsink')[source]¶

Retrieve temperature data from /EngineStates/, either across the whole read (default) or at a given time.

Parameters

time – time at which to get temperature
field – one of (‘heatsink’, ‘asic’)

repack(pack_opts='')[source]¶: Run h5repack on the current file. Returns a fresh object.

set_basecall_data(events, scale, path, model, seq, section='template', name='unknown', post=None, score=None, quality_data=None, qstring=None, analysis='Basecall_1D')[source]¶

Create an annotated event table and 1D basecalling summary similar to chimaera and add them to the fast5 file.

Parameters

events – Numpy record array of events. Must contain the mean, stdv, start and length fields.
scale – Scaling object.
path – Viterbi path containing model pointers (1D np.array).
model – Model object.
seq – Basecalled sequence string for fastq.
section – String to use in paths, e.g. ‘template’.
name – Identifier string for fastq.
post – Numpy 2D array containing the posteriors (event, state), used to annotate events.
score – Quality value for the whole strand.
quality_data – Numpy 2D array containing quality_data, used to annotate events.
qstring – Quality string for fastq.
analysis – Base analysis name (under /Analyses)

set_engine_state(data)[source]¶

Set the engine state data.

Parameters: data – a 1D-array containing two fields, the first of which must be named ‘time’. The name of the second field will be used to name the engine state and be used in the dataset path.

set_mapping_data(events, scale, path, model, seq, ref_name, section='template', post=None, score=None, is_reverse=False, analysis='Squiggle_Map')[source]¶

Create an annotated event table and mapping summary similar to chimaera and add them to the fast5 file.

Parameters

events – np.ndarray of events. Must contain mean, stdv, start and length fields.
scale – Scaling object.
path – np.ndarray containing position in reference. Negative values will be interpreted as “bad emissions”.
model – Model object to use.
seq – String representation of the reference sequence.
section – Section of strand, e.g. ‘template’.
ref_name – Reference name.
post – Two-dimensional np.ndarray containing posteriors.
score – Mapping quality score.
is_reverse – Mapping refers to ‘-’ strand (bool).
analysis – Base analysis name (under /Analyses)

set_raw(raw, meta=None, read_number=None)[source]¶

Set the raw data in file.

Parameters

raw – raw data to add
read_number – read number (as usually given in filename and contained within HDF paths, viz. Reads/Read_<>/). If not given attempts will be made to guess the number (assumes single read per file).

set_raw_old(raw, meta)[source]¶

Set the raw data in file.

Parameters

raw – raw data to add
meta – meta data dictionary

Warning

This method does not write raw data conforming to the Fast5 specification. This class will currently still read data written by this method.

set_read(data, meta)[source]¶

Write event data to file

Parameters

data – event data
meta – meta data to attach to read
read_number – per-channel read counter

set_split_data(data, analysis='Segment_Linear')[source]¶

Write a dict containing split point data.

Parameters

data – dict-like object containing attrs to add
analysis – Base analysis name (under /Analyses)

Warning

Not checking currently for required fields.

strip_analyses(keep=('EventDetection_000', 'RawData'))[source]¶

Remove all analyses from file

Parameters: keep – whitelist of analysis groups to keep

summary(rename=True, delete=True, scale=True)[source]¶: A read summary, assumes one read in file

property tracking_id¶: Tracking id meta information as python dict

property writable¶: Can we write to the file.

fast5_research.fast5.iterate_fast5(path='Stream', strand_list=None, paths=False, mode='r', limit=None, shuffle=False, robust=False, progress=False, recursive=False)[source]¶

Iterate over directory of fast5 files, optionally only returning those in list

Parameters

path – Directory in which single read fast5 are located or filename.
strand_list – List of strands, can be a Python list or a delimited table. If the latter and a filename field is present, this is used to locate files. If a file is given and a strand field is present, the directory index file is searched for and filenames built from that.
paths – Yield file paths instead of fast5 objects.
mode – Mode for opening files.
limit – Limit number of files to consider.
shuffle – Shuffle files to randomize yield of files.
robust – Carry on with iterating over FAST5 files after an exception was raised.
progress – Display progress bar.
recursive – Perform a recursive search for files in subdirectories of path.
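
The path-discovery side of these options (recursive, limit, shuffle) can be sketched with the standard library alone. This is a minimal sketch and not the library's implementation: find_fast5_paths is a hypothetical helper, the '*.fast5' pattern is an assumption, and applying limit after shuffling is a guess at the ordering.

```python
import fnmatch
import os
import random


def find_fast5_paths(path, recursive=False, limit=None, shuffle=False):
    """Collect *.fast5 paths under path (or return path itself if a file)."""
    if os.path.isfile(path):
        found = [path]
    elif recursive:
        # Walk every subdirectory of path.
        found = [
            os.path.join(root, name)
            for root, _dirs, names in os.walk(path)
            for name in fnmatch.filter(names, "*.fast5")
        ]
    else:
        # Only look at the top level of path.
        found = [
            os.path.join(path, name)
            for name in fnmatch.filter(os.listdir(path), "*.fast5")
        ]
    found.sort()
    if shuffle:
        random.shuffle(found)
    # Slicing with limit=None keeps everything.
    return found[:limit]
```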

fast5_research.fast5.recursive_glob(treeroot, pattern)[source]¶

fast5_research.fast5_bulk module¶

class fast5_research.fast5_bulk.AsicBCommand(command)[source]¶

Bases: object

Wrapper around the asicb command structure

property configuration¶

property min_temperature¶

class fast5_research.fast5_bulk.AsicBConfiguration(config)[source]¶

Bases: object

Wrapper around the asicb configuration struct passed to the asicb over usb

active_mux(channel)[source]¶: Gets the active mux for the specified channel. Parameters: channel – channel number, 0-based.

property bias_voltage¶

bits_at(start, end)[source]¶

int_at(start, end)[source]¶

class fast5_research.fast5_bulk.BulkFast5(filename, mode='r')[source]¶

Bases: h5py._hl.files.File

Class for reading data from a bulk fast5 file

classmethod New(fname, read='a', tracking_id={}, context_tags={}, channel_id={})[source]¶: Construct a fresh bulk file, with meta data written to standard locations. There is currently no checking of this meta data. TODO: Add meta data checking.

get_bias_voltage_changes()[source]¶: Get changes in the bias voltage.

Note

For a long (-long-long) time the only logging of the common electrode voltage was the experimental history (accurate to one second). The addition of the voltage trace changed this, but this dataset is cumbersome. MinKNOW 1.x(.3?) added the asic command history, which is typically much shorter and therefore quicker to query. The bias voltage is thus recorded in several places. For MinION asics there is typically a -5X multiplier to convert the data into correct units with the sign people are used to.

get_bias_voltage_changes_in_window(times=None, raw_indices=None)[source]¶

Find all mux voltage changes within a time window.

Parameters

times – tuple of floats (start_second, end_second)
raw_indices – tuple of ints (start_index, end_index)

Note

This is the bias voltage from the expt history (accurate to 1 second), and will not include any changes in voltage related to waveforms. For the full voltage trace, use get_voltage.

Exactly one of the slice keyword arguments needs to be specified, as the method will override them in the order of times > raw_indices.

get_context_meta()[source]¶: Get context meta

get_engine_state(state, time=None)[source]¶

Get changes in an engine state or the value of an engine state at a given time.

Parameters

state – the engine state to retrieve.
time – the time at which to grab engine state.

get_event_detection_parameters()[source]¶: Get the full set of parameters related to event detection

get_events(channel, times=None, raw_indices=None, event_indices=(None, None), use_scaling=True)[source]¶

Parse channel event data.

Parameters

channel – channel number int
times – tuple of floats (start_second, end_second)
raw_indices – tuple of ints (start_index, end_index)
event_indices – tuple of ints (start_index, end_index)
use_scaling – if True, scale the current level

Note

Exactly one of the slice keyword arguments needs to be specified, as the method will override them in the order of times > raw_indices > event_indices.
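
The stated precedence (times > raw_indices > event_indices) can be illustrated with a small resolver. This is only a sketch of the rule as worded in the note; pick_slice is a hypothetical helper, and treating both None and a pair of Nones as "not specified" is an assumption.

```python
def pick_slice(times=None, raw_indices=None, event_indices=None):
    """Return (kind, bounds) for the highest-priority slice that was given."""
    candidates = (
        ("times", times),
        ("raw_indices", raw_indices),
        ("event_indices", event_indices),
    )
    for kind, bounds in candidates:
        # Assumption: None and (None, None) both mean "not specified".
        if bounds is not None and tuple(bounds) != (None, None):
            return kind, tuple(bounds)
    return None, (None, None)
```

If a caller passes both times and raw_indices, this sketch silently uses times, which is why passing exactly one slice argument is the safer habit.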

get_metadata(channel)[source]¶

Get the metadata for the specified channel.

Look first for events metadata, and fall back on raw metadata, returning an empty dict if neither could be found.

get_mux(channel, raw_index=None, time=None, wells_only=False, return_raw_index=False)[source]¶

Find the multiplex well_id (“the mux”) at a given time

Parameters

channel – channel number int
raw_index – sample index
time – time in seconds
wells_only – bool, if True, ignore changes to mux states not in [1,2,3,4] and hence return the last well mux.
return_raw_index – bool, if True, return tuple (mux, raw_index), raw_index being raw index when the mux was set.

Note

There are multiple mux states associated with each well (e.g. common_voltage_1 and unblock_voltage_1). Here, we return the well_id associated with the mux state (using self.enum_to_mux), i.e. 1 in both these cases.

Exactly one of the slice keyword arguments needs to be specified, as the method will override them in the order of times > raw_indices.

get_mux_changes(channel, wells_only=False)[source]¶

Get changes in multiplex settings for given channel.

Parameters

channel – channel for which to fetch data
wells_only – bool, if True, ignore changes to mux states not in [1,2,3,4]

Note

There are multiple mux states associated with each well (e.g. 1:common_voltage_1 and 6:unblock_voltage_1). Here, we return mux state numbers, e.g. 1 and 6, which can be linked to the well_id using self.enum_to_mux

get_mux_changes_in_window(channel, times=None, raw_indices=None)[source]¶

Find all mux changes within a time window.

Parameters

channel – channel number int
times – tuple of floats (start_second, end_second)
raw_indices – tuple of ints (start_index, end_index)

Note

There are multiple mux values associated with each well (e.g. 1:common_voltage_1 and 6:unblock_voltage_1). Here, we return mux values, e.g. 1 and 6, which can be linked to the well_id using self.enum_to_mux.

Exactly one of the slice keyword arguments needs to be specified, as the method will override them in the order of times > raw_indices.

get_raw(channel, times=None, raw_indices=(None, None), use_scaling=True)[source]¶

If available, parse channel raw data.

Parameters

channel – channel number int
times – tuple of floats (start_second, end_second)
raw_indices – tuple of ints (start_index, end_index)
use_scaling – if True, scale the current level

Note

Exactly one of the slice keyword arguments needs to be specified, as the method will override them in the order of times > raw_indices.

get_reads(channel, transitions=False, multi_row_class='auto')[source]¶

Parse channel read data to yield details of reads.

Parameters

channel – channel number int
transitions – if True, include transition reads
multi_row_class – for reads which span multiple rows, which row’s classification to use. Options: ‘auto’ (modal class if present, penultimate row if not), ‘modal’ (modal class if present), ‘penultimate’ (penultimate row), ‘final’ (final row). Modal classification is not supported by very old versions of MinKNOW.

get_state(channel, raw_index=None, time=None)[source]¶

Find the channel state at a given time

Parameters

channel – channel number int
raw_index – sample index
time – time in seconds

Note

Exactly one of the slice keyword arguments needs to be specified, as the method will override them in the order of times > raw_indices.

get_state_changes(channel)[source]¶

Parse channel state changes.

Parameters: channel – channel number int

get_states_in_window(channel, times=None, raw_indices=None)[source]¶

Find all channel states within a time window.

Parameters

channel – channel number int
times – tuple of floats (start_second, end_second)
raw_indices – tuple of ints (start_index, end_index)

Note

Exactly one of the slice keyword arguments needs to be specified, as the method will override them in the order of times > raw_indices.

get_temperature(time=None, field='heatsink')[source]¶

get_tracking_meta()[source]¶: Get tracking meta data

get_voltage(times=None, raw_indices=(None, None), use_scaling=True)[source]¶

Extracts raw common electrode trace

Parameters

raw_indices – tuple of ints to limit section of voltage data loaded.
use_scaling – bool, whether to scale voltage data. If no scaling meta is found, scale by -5 (as appropriate for MinION).

Returns

voltage as array (including 5x multiplier for MinKNOW)

get_waveform_timings()[source]¶

Extract the timings of the waveforms (if any).

Returns: list of tuples of start and end times

has_raw(channel)[source]¶: Return True if there is raw data for this channel.

has_reads(channel)[source]¶: Return True if there is read data for this channel.

has_states(channel)[source]¶: Return True if there is State data for this channel.

parse_history()[source]¶: Parse the experimental history to pull out various environmental factors. The functions below are quite nasty, don’t enquire too hard.

set_events(data, meta, channel)[source]¶

Write event data to file

Parameters

data – event data
meta – meta data to attach to read
read_number – per-channel read counter

set_raw(raw, channel, meta=None)[source]¶

Set the raw data in file.

Parameters

raw – raw data to add
channel – channel number

set_voltage(data, meta)[source]¶

fast5_research.util module¶

class fast5_research.util.MockZeroArray(shape)[source]¶

Bases: numpy.ndarray

argmax(axis=0)[source]¶: Fake argmax values of an array.

fast5_research.util.build_mapping_summary_table(mapping_summary)[source]¶

Build a mapping summary table

Parameters: mapping_summary – List of curr_map dictionaries
Returns: Numpy record array containing summary contents. One record per array element of mapping_summary

fast5_research.util.build_mapping_table(events, ref_seq, post, scale, path, model)[source]¶

Build a mapping table based on output of a dragonet.mapper style object. Taken from chimaera.common.utilities.

Parameters

events – Numpy record array of events. Must contain the mean, stdv, start and length fields.
ref_seq – String representation of the reference sequence.
post – Numpy 2D array containing the posteriors (event, state).
scale – Scaling object.
path – Numpy 1D array containing position in reference. May contain negative values, which will be interpreted as “bad emissions”.
model – Model object to use.

Returns

numpy record array containing summary fields. One record per event.

Output Field	Description
mean	mean value of event samples (level)
scaled_mean	mean scaled to the bare level emission (mean/mode)
stdv	standard deviation of event samples (noise)
scaled_stdv	stdv scaled to the bare stdv emission (mode)
start	start time of event /s
length	length of event /s
model_level	modelled event level, i.e. the level emission associated with the kmer, scaled to the data
model_scaled_level	bare level emission
model_sd	modelled event noise, i.e. the sd emission associated with the kmer, scaled to the data
model_scaled_sd	bare noise emission
seq_pos	aligned sequence position, position on Viterbi path
p_seq_pos	posterior probability of states on Viterbi path
kmer	kmer identity of seq_pos
mp_pos	aligned sequence position, position with highest posterior
p_mp_pos	posterior probability of most probable states
mp_kmer	kmer identity of mp_pos
good_emission	whether or not the HMM has tagged event as fitting the model

fast5_research.util.compute_movement_stats(path)[source]¶

Compute movement stats from a mapping state path

Parameters: path – np.ndarray containing position in reference. Negative values are interpreted as “bad emissions”.

fast5_research.util.create_basecall_1d_output(raw_events, scale, path, model, post=None)[source]¶

Create the annotated event table and basecalling summaries similar to chimaera.

Parameters

raw_events – np.ndarray with mean, stdv, start, and length fields.
scale – dragonet.basecall.scaling.Scaler object (or object with attributes shift, scale, drift, var, scale_sd, and var_sd).
path – list containing state indices with respect to model.
model – dragonet.util.model.Model object.
post – Two-dimensional np.ndarray containing posteriors (event, state).
quality_data – np.ndarray containing quality_data, used to annotate events.

Returns

A tuple of:

the annotated input event table
a dict of results

fast5_research.util.create_mapping_output(raw_events, scale, path, model, seq, post=None, n_states=None, is_reverse=False, substates=False)[source]¶

Create the annotated event table and summaries similar to chimaera.

Parameters

raw_events – np.ndarray with mean, stdv, start, and length fields.
scale – dragonet.basecall.scaling.Scaler object (or object with attributes shift, scale, drift, var, scale_sd, and var_sd).
path – list containing state indices with respect to model.
model – dragonet.util.model.Model object.
seq – String representation of the reference sequence.
post – Two-dimensional np.ndarray containing posteriors (event, state).
is_reverse – Mapping refers to ‘-’ strand (bool).
substates – Mapping contains substates?

Returns

A tuple of:

the annotated input event table
a dict of results

fast5_research.util.docstring_parameter(*sub)[source]¶: Allow docstrings to contain parameters.

fast5_research.util.dtype_descr(arr)[source]¶: Get arr.dtype.descr. Views of structured arrays in which columns have been re-ordered no longer support arr.dtype.descr; see https://github.com/numpy/numpy/commit/dd8a2a8e29b0dc85dca4d2964c92df3604acc212

fast5_research.util.file_has_fields(fname, fields=None)[source]¶

Check that a tsv file has given fields

Parameters

fname – filename to read. If the filename extension is gz or bz2, the file is first decompressed.
fields – list of required fields.

Returns

boolean
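
A header check of this kind could look like the sketch below, which uses only the standard library. It is not the library's code: header_has_fields is a hypothetical name, and treating fields=None as "no requirements" is an assumption.

```python
import bz2
import gzip


def header_has_fields(fname, fields=None):
    """Return True if the first line of a tsv names every required field."""
    # Pick an opener from the filename extension; gz and bz2 are
    # decompressed transparently in text mode.
    if fname.endswith(".gz"):
        opener = gzip.open
    elif fname.endswith(".bz2"):
        opener = bz2.open
    else:
        opener = open
    with opener(fname, "rt") as handle:
        header = handle.readline().rstrip("\r\n").split("\t")
    # Assumption: no required fields means the check passes.
    return fields is None or set(fields).issubset(header)
```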

fast5_research.util.get_changes(data, ignore_cols=None, use_cols=None)[source]¶

Return only rows of a structured array which are not equal to the previous row.

Parameters

data – Numpy record array.
ignore_cols – iterable of column names to ignore in checking for equality between rows.
use_cols – iterable of column names to include in checking for equality between rows (only used if ignore_cols is None).

Returns

Numpy record array.
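
The row-filtering idea can be shown without NumPy, using a list of dicts in place of a record array. This is a minimal sketch under assumptions: changed_rows is a hypothetical helper, and keeping the first row unconditionally is an assumption about the behaviour.

```python
def changed_rows(rows, ignore_cols=None, use_cols=None):
    """Keep only rows that differ from the previous row in the chosen columns."""
    if not rows:
        return []
    if ignore_cols is not None:
        ignored = set(ignore_cols)
        cols = [c for c in rows[0] if c not in ignored]
    elif use_cols is not None:
        # use_cols only applies when ignore_cols is None.
        cols = list(use_cols)
    else:
        cols = list(rows[0])
    kept = [rows[0]]  # assumption: the first row is always kept
    for prev, cur in zip(rows, rows[1:]):
        if any(prev[c] != cur[c] for c in cols):
            kept.append(cur)
    return kept


rows = [
    {"time": 0, "mux": 1},
    {"time": 1, "mux": 1},
    {"time": 2, "mux": 6},
    {"time": 3, "mux": 6},
]
print(changed_rows(rows, ignore_cols=["time"]))
```

Ignoring the time column here reduces the four rows to the two points where the mux value actually changes.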

fast5_research.util.group_vector(arr)[source]¶

Group a vector by unique values.

Parameters: arr – input vector to be grouped.
Returns: a dictionary mapping unique values to arrays of indices of the input vector.
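
The grouping described above can be sketched in plain Python, returning lists of indices where the library is documented to return arrays. The name group_indices is hypothetical.

```python
from collections import defaultdict


def group_indices(values):
    """Map each unique value to the list of positions where it occurs."""
    groups = defaultdict(list)
    for index, value in enumerate(values):
        groups[value].append(index)
    return dict(groups)


print(group_indices(["a", "b", "a", "c", "b"]))
```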

fast5_research.util.kmer_overlap_gen(kmers, moves=None)[source]¶

From a list of kmers return the character shifts between them. (Movement from i to i+1 entry, e.g. [AATC,ATCG] returns [0,1]). Allowed moves may be specified in moves argument in order of preference. Taken from dragonet.bio.seq_tools

Parameters

kmers – sequence of kmer strings.
moves – allowed movements, if None all movements to length of kmer are allowed.
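
The shift calculation can be sketched as follows. This is a simplified illustration rather than the dragonet or fast5_research implementation: kmer_shifts is a hypothetical name, only non-negative moves are handled, and falling back to the kmer length when no allowed move fits is an assumption.

```python
def kmer_shifts(kmers, moves=None):
    """Return the character shift between each kmer and its predecessor."""
    kmers = list(kmers)
    if not kmers:
        return []
    k = len(kmers[0])
    if moves is None:
        # All movements up to the kmer length are allowed.
        moves = range(k + 1)
    shifts = [0]  # the first kmer has no predecessor
    for prev, cur in zip(kmers, kmers[1:]):
        for move in moves:
            # A move of m means the last k-m characters of prev
            # equal the first k-m characters of cur.
            if prev[move:] == cur[:k - move]:
                shifts.append(move)
                break
        else:
            # Assumption: no allowed move fits, so report a full-length jump.
            shifts.append(k)
    return shifts


print(kmer_shifts(["AATC", "ATCG"]))
```

Because moves are tried in the order given, listing them by preference decides which shift is reported when several are consistent with the two kmers.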

fast5_research.util.mad(data, factor=None, axis=None, keepdims=False)[source]¶

Compute the Median Absolute Deviation, i.e., the median of the absolute deviations from the median, and (by default) adjust by a factor for asymptotically normal consistency.

Parameters

data – A ndarray object
factor – Factor to scale MAD by. Default (None) is to be consistent with the standard deviation of a normal distribution (i.e. mad( N(0,sigma^2) ) = sigma).
axis – For multidimensional arrays, which axis to calculate the median over.
keepdims – If True, axis is kept as dimension of length 1

Returns

the (scaled) MAD
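
For a one-dimensional input the calculation reduces to the sketch below. It is not the library's code: mad_1d is a hypothetical name, the axis and keepdims options are omitted, and the default factor of 1.4826 is the usual normal-consistency constant, assumed here to match the default described above.

```python
from statistics import median

# Approximately 1 / Phi^-1(3/4); makes the MAD estimate sigma for normal data.
NORMAL_CONSISTENCY = 1.4826


def mad_1d(values, factor=None):
    """Scaled median absolute deviation of a flat sequence of numbers."""
    if factor is None:
        factor = NORMAL_CONSISTENCY
    centre = median(values)
    return factor * median(abs(v - centre) for v in values)


print(mad_1d([1, 2, 3, 4, 100]))
```

The outlier 100 barely moves the result, which is why the MAD is a common robust alternative to the standard deviation for noisy signal.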

fast5_research.util.mean_qscore(scores)[source]¶

Returns the phred score corresponding to the mean of the probabilities associated with the phred scores provided. Taken from chimaera.common.utilities.

Parameters: scores – Iterable of phred scores.
Returns: Phred score corresponding to the average error rate, as estimated from the input phred scores.
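
The averaging happens in probability space, not on the scores themselves. A minimal sketch, assuming the standard Phred relation $p = 10^{-Q/10}$ (the name mean_qscore_sketch is hypothetical):

```python
import math


def mean_qscore_sketch(scores):
    """Phred score of the mean error probability of the given Phred scores."""
    scores = list(scores)
    if not scores:
        raise ValueError("need at least one score")
    # Convert each score to an error probability, average, convert back.
    mean_error = sum(10 ** (-q / 10) for q in scores) / len(scores)
    return -10 * math.log10(mean_error)


print(mean_qscore_sketch([10, 20]))
```

For scores 10 and 20 the result is about 12.6, lower than the arithmetic mean of 15, because the larger error probability dominates the average.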

fast5_research.util.med_mad(data, factor=None, axis=None, keepdims=False)[source]¶

Compute the median and the Median Absolute Deviation, i.e., the median of the absolute deviations from the median.

Parameters

data – A ndarray object
factor – Factor to scale MAD by. Default (None) is to be consistent with the standard deviation of a normal distribution (i.e. mad( N(0,sigma^2) ) = sigma).
axis – For multidimensional arrays, which axis to calculate over
keepdims – If True, axis is kept as dimension of length 1

Returns

a tuple containing the median and MAD of the data

fast5_research.util.qstring_to_phred(quality)[source]¶: Compute standard phred scores from a quality string.
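
A sketch of the conversion, assuming "standard" means the Sanger/Phred+33 ASCII encoding used in fastq files. The name qstring_to_phred_sketch and the offset parameter are hypothetical; this version returns a plain list.

```python
def qstring_to_phred_sketch(quality, offset=33):
    """Convert a fastq quality string to a list of integer Phred scores."""
    # Assumption: Phred+33 encoding, so '!' is 0 and 'I' is 40.
    return [ord(char) - offset for char in quality]


print(qstring_to_phred_sketch("!I5"))
```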

fast5_research.util.readtsv(fname, fields=None, **kwargs)[source]¶

Read a tsv file into a numpy array with required field checking

Parameters

fname – filename to read. If the filename extension is gz or bz2, the file is first decompressed.
fields – list of required fields.

fast5_research.util.seq_to_kmers(seq, length)[source]¶

Turn a string into a list of (overlapping) kmers.

e.g. perform the transformation:

‘ATATGCG’ => [‘ATA’, ‘TAT’, ‘ATG’, ‘TGC’, ‘GCG’]

Parameters

seq – character string
length – length of kmers in output

Returns

A list of overlapping kmers
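
The transformation shown above amounts to slicing the string at every offset. A minimal sketch (seq_to_kmers_sketch is a hypothetical name; returning an empty list when the sequence is shorter than the kmer length is an assumption):

```python
def seq_to_kmers_sketch(seq, length):
    """Return every overlapping substring of the given length, in order."""
    return [seq[i:i + length] for i in range(len(seq) - length + 1)]


print(seq_to_kmers_sketch("ATATGCG", 3))
```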

fast5_research.util.validate_event_table(table)[source]¶: Check if an object contains all columns of a basic event array.

fast5_research.util.validate_model_table(table)[source]¶: Check if an object contains all columns of a dragonet Model.

fast5_research.util.validate_scale_object(obj)[source]¶: Check if an object contains all attributes of dragonet Scaler.

fast5_research.util.window(iterable, size)[source]¶

Create an iterator returning a sliding window from another iterator

Parameters

iterable – Iterator
size – Size of window

Returns

an iterator returning a tuple containing the data in the window
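
A sliding window over an arbitrary iterator can be built from a bounded deque. The sketch below illustrates the idea and is not the library's code; sliding_window is a hypothetical name, and yielding nothing when the input is shorter than the window is an assumption.

```python
from collections import deque
from itertools import islice


def sliding_window(iterable, size):
    """Yield successive overlapping tuples of size items from iterable."""
    iterator = iter(iterable)
    # Fill the first window; maxlen makes later appends drop the oldest item.
    current = deque(islice(iterator, size), maxlen=size)
    if len(current) == size:
        yield tuple(current)
    for item in iterator:
        current.append(item)
        yield tuple(current)


print(list(sliding_window([1, 2, 3, 4], 2)))
```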

