FASTQ

Format version: 0.1

FASTQ is a text-based format for storing both a biological sequence (usually nucleotide sequence) and its corresponding quality scores.

Paths

The following path patterns are used to place the data on disk:

File Path pattern
FASTQ file fastq{basecall_status}{duplex_status}/{alias}/{flow_cell_id}{basecall_status}{duplex_status}_{alias_}{short_protocol_run_id}_{short_run_id}_{batch_number}.fastq.gz

See the Patterns documentation for more information on file patterns.

Read batching

The following batching options are used by default:

Option Value
Duration 3600s

For more information on batching see Batching.

Record structure

Oxford Nanopore Technologies FASTQ records contain a key-value section after the required unique read id. This section should be treated as an unordered set of key-value pairs.

The approximate structure of a record is:

@<read-id>(\s<key>=<value>)*
ATCG...
+
QQQQ...

For example:

@bd8655fb-383c-45cc-bff3-eb1dc86533e0 key1=value1 key2=value2
ATCG
+
QQQQ
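As a minimal sketch of reading such a header line, the snippet below splits it into the read id and a dictionary of attributes. The function name `parse_header` is hypothetical and not part of any Oxford Nanopore tool. It assumes fields are separated by whitespace and that values contain no spaces, as the structure above suggests.

```python
def parse_header(line):
    """Split a FASTQ header line into (read_id, attributes).

    Hypothetical helper: assumes whitespace-separated key=value fields.
    """
    line = line.rstrip("\n")
    if not line.startswith("@"):
        raise ValueError("FASTQ header must start with '@'")
    fields = line[1:].split()
    if not fields:
        raise ValueError("FASTQ header has no read id")
    read_id = fields[0]
    attributes = {}
    for field in fields[1:]:
        key, separator, value = field.partition("=")
        if separator:  # ignore tokens that are not key=value pairs
            attributes[key] = value
    return read_id, attributes


read_id, attributes = parse_header(
    "@bd8655fb-383c-45cc-bff3-eb1dc86533e0 key1=value1 key2=value2"
)
```

Because the attributes are stored in a dictionary, lookups do not depend on the order in which the keys appear in the header.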

Attributes included in the key value section are listed below.

Required header attributes

runid

Regex [0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}
Required
Common fields BAM: DS; Sequencing summary: run_id
Examples
e4994c62-93f9-439a-bc8f-d20c95a137a5

A randomly generated UUID for the sequencing protocol (e.g. e4994c62-93f9-439a-bc8f-d20c95a137a5). It consists only of lower-case ASCII letters (a-z), digits (0-9) and dashes (-). It maps to the protocol_run_id in the POD5 file.

ch

Regex [0-9]+
Required
Common fields BAM: ch; Sequencing summary: channel
Examples
1
512
3000

The number of the channel the read was acquired on. The first channel is 1.

start_time

Regex ((?:(\d{4}-\d{2}-\d{2})T(\d{2}:\d{2}:\d{2}(?:\.\d+)?))(Z|[\+-]\d{2}:\d{2})?)
Required
Common fields BAM: st; Sequencing summary: start_time
Examples
2025-01-13T10:45:28.681306+00:00
2016-01-19T15:21:32.59+02:00

The time the read started, in RFC 3339 format.
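The following sketch shows one way to turn a start_time value into a Python datetime using the regex given above. The function name `parse_start_time` is hypothetical. The sketch assumes that fractional seconds beyond microsecond precision can be truncated, and it returns a naive datetime when no offset is present.

```python
import re
from datetime import datetime, timedelta, timezone

# Regex copied from the start_time attribute above.
START_TIME_RE = re.compile(
    r"((?:(\d{4}-\d{2}-\d{2})T(\d{2}:\d{2}:\d{2}(?:\.\d+)?))(Z|[\+-]\d{2}:\d{2})?)"
)


def parse_start_time(value):
    """Parse a start_time attribute value (hypothetical helper)."""
    match = START_TIME_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"not a valid start_time: {value!r}")
    _, date_part, time_part, offset = match.groups()
    clock, _, fraction = time_part.partition(".")
    parsed = datetime.strptime(f"{date_part}T{clock}", "%Y-%m-%dT%H:%M:%S")
    if fraction:
        # Pad or truncate the fraction to exactly six digits (microseconds).
        parsed = parsed.replace(microsecond=int((fraction + "000000")[:6]))
    if offset is None:
        return parsed  # no offset given: return a naive datetime
    if offset == "Z":
        return parsed.replace(tzinfo=timezone.utc)
    sign = -1 if offset[0] == "-" else 1
    delta = timedelta(hours=int(offset[1:3]), minutes=int(offset[4:6]))
    return parsed.replace(tzinfo=timezone(sign * delta))


started = parse_start_time("2016-01-19T15:21:32.59+02:00")
```

The offset is handled by hand rather than with `datetime.fromisoformat`, because older Python versions reject values such as a two-digit fraction or a trailing `Z`.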

flow_cell_id

Regex [A-Z0-9_-]+
Required
Common fields BAM: PU; Sample sheet: flow_cell_id
Examples
FXX12345
PXX12345
AAA123

The human-readable identifier for the flow cell (eg: FAK54854).

protocol_group_id

Regex [a-zA-Z0-9_\.-]+
Required
Examples
My_Group
my-group-1

Set by the user in the GUI as "Experiment ID".

sample_id

Regex [a-zA-Z0-9_\.-]+
Required
Common fields BAM: LB; Sequencing summary: sample_id; Sample sheet: sample_id
Examples
My_Sample
my-sample-1

Set by the user in the GUI as "Sample ID".

barcode

Regex unclassified|barcode([0-9]+)
Required Only when barcoding
Common fields BAM: SM; Sequencing summary: barcode_arrangement; Sample sheet: barcode
Examples
unclassified
barcode01

The barcode assigned to this read by the basecaller (eg: "barcode01"). unclassified if no barcode was detected.

barcode_alias

Regex unclassified|[A-Za-z0-9\-_\.]+
Required Only when barcoding
Common fields BAM: al; Sequencing summary: alias; Sample sheet: alias
Examples
my_sample
sample01

The user-supplied alias for the barcode. Empty if barcoding is not running. The same as barcode if the user did not supply an alias.

parent_read_id

Regex [0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}
Required
Common fields BAM: pi; Sequencing summary: parent_read_id
Examples
e4994c62-93f9-439a-bc8f-d20c95a137a5

The read_id of the read that was the source of this FASTQ entry. It is the same as the FASTQ entry id if no read splitting was performed for this read. If the basecaller split this read out of another read, the FASTQ entry id is a new globally unique UUID and parent_read_id is the id of the read it was split from.
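The comparison described here can be sketched as a small check. The function name `was_split` is hypothetical, and the sketch assumes the attributes have already been parsed into a dictionary keyed by attribute name.

```python
def was_split(read_id, attributes):
    """Return True if the entry's read id differs from its parent_read_id.

    Hypothetical helper: `attributes` is a dict of header key-value pairs.
    """
    parent = attributes.get("parent_read_id")
    return parent is not None and parent != read_id


unsplit = was_split(
    "e4994c62-93f9-439a-bc8f-d20c95a137a5",
    {"parent_read_id": "e4994c62-93f9-439a-bc8f-d20c95a137a5"},
)
```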

basecall_model_version_id

Regex [a-z0-9_@\.]+
Required
Common fields BAM: DS
Examples
rna004_130bps_fast@v5.1.0

The unique identifier for the basecall model used to generate this FASTQ file, as supplied by the basecaller (e.g. rna004_130bps_fast@v5.1.0).

basecall_gpu

Regex .*
Required Only when gpu_calling
Common fields BAM: DS
Examples
Nvidia_3090

A string description of the connected GPU.
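As a rough sketch of how the regexes listed for each attribute could be used, the snippet below checks a parsed attribute dictionary against them. The names `ATTRIBUTE_PATTERNS`, `CONDITIONAL` and `check_attributes` are hypothetical. The sketch treats barcode, barcode_alias and basecall_gpu as optional because they are only required under the conditions stated above, and it requires each pattern to match the whole value.

```python
import re

UUID = r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"

# Patterns copied from the attribute descriptions above.
ATTRIBUTE_PATTERNS = {
    "runid": UUID,
    "ch": r"[0-9]+",
    "start_time": r"((?:(\d{4}-\d{2}-\d{2})T(\d{2}:\d{2}:\d{2}(?:\.\d+)?))(Z|[\+-]\d{2}:\d{2})?)",
    "flow_cell_id": r"[A-Z0-9_-]+",
    "protocol_group_id": r"[a-zA-Z0-9_\.-]+",
    "sample_id": r"[a-zA-Z0-9_\.-]+",
    "barcode": r"unclassified|barcode([0-9]+)",
    "barcode_alias": r"unclassified|[A-Za-z0-9\-_\.]+",
    "parent_read_id": UUID,
    "basecall_model_version_id": r"[a-z0-9_@\.]+",
    "basecall_gpu": r".*",
}

# Attributes that are only required under certain conditions.
CONDITIONAL = {"barcode", "barcode_alias", "basecall_gpu"}


def check_attributes(attributes):
    """Return a list of problems found in a dict of header attributes."""
    problems = []
    for key, pattern in ATTRIBUTE_PATTERNS.items():
        if key not in attributes:
            if key not in CONDITIONAL:
                problems.append(f"missing required attribute: {key}")
            continue
        if re.fullmatch(pattern, attributes[key]) is None:
            problems.append(f"invalid value for {key}: {attributes[key]!r}")
    return problems
```

An empty list means every attribute that is present matches its pattern and no unconditionally required attribute is missing. Keys not listed in the table are ignored.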

Header patterns

Additional header patterns are available for FASTQ files on top of the normal Patterns.

Name Value
gpu_header_info A string description of the connected GPU (without spaces), prefixed with the FASTQ attribute name, e.g. 'basecall_gpu=Nvidia_3090'