FASTQ
Format version: 0.1
FASTQ is a text-based format for storing both a biological sequence (usually nucleotide sequence) and its corresponding quality scores.
Paths
The following path patterns are used to place the data on disk:
File | Path pattern |
---|---|
FASTQ file | fastq{basecall_status}{duplex_status}/{alias}/{flow_cell_id}{basecall_status}{duplex_status}_{alias_}{short_protocol_run_id}_{short_run_id}_{batch_number}.fastq.gz |
See the Patterns documentation for more information on file patterns.
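Files written under this pattern end in `.fastq.gz`, so they are gzip-compressed text. The following is a minimal sketch of iterating over such a file with the Python standard library. It assumes each record occupies exactly four lines (no wrapped sequence lines), and the names `FastqRecord` and `read_fastq_gz` are hypothetical helpers, not part of any Oxford Nanopore tool.

```python
import gzip
from typing import Iterator, NamedTuple


class FastqRecord(NamedTuple):
    header: str    # full header line, including the leading '@'
    sequence: str
    quality: str


def read_fastq_gz(path: str) -> Iterator[FastqRecord]:
    """Yield records from a gzip-compressed FASTQ file (four lines per record)."""
    with gzip.open(path, "rt") as handle:
        while True:
            header = handle.readline().rstrip("\n")
            if not header:
                # End of file (a blank line is also treated as the end).
                return
            sequence = handle.readline().rstrip("\n")
            separator = handle.readline().rstrip("\n")
            quality = handle.readline().rstrip("\n")
            if not header.startswith("@") or not separator.startswith("+"):
                raise ValueError(f"malformed FASTQ record near {header!r}")
            if len(sequence) != len(quality):
                raise ValueError(f"sequence/quality length mismatch in {header!r}")
            yield FastqRecord(header, sequence, quality)
```

Reading lazily with a generator avoids loading a whole batch file into memory.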
Read batching
The following batching options are used by default:
Option | Value |
---|---|
Duration | 3600s |
For more information on batching see Batching.
Record structure
Oxford Nanopore Technologies FASTQ records contain a key-value section after the required unique read ID. Treat this section as an unordered set of key-value pairs.
The approximate structure of a record is:
@<read-id>(\s<key>=<value>)*
ATCG...
+
QQQQ...
For example:
@bd8655fb-383c-45cc-bff3-eb1dc86533e0 key1=value1 key2=value2
ATCG
+
QQQQ
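A header line with this structure can be split into a read ID and a dictionary of attributes. The sketch below follows the approximate grammar `@<read-id>(\s<key>=<value>)*` and assumes that values contain no whitespace. The function name `parse_header` is a hypothetical helper. Using a dictionary reflects that the key-value section is unordered.

```python
def parse_header(line: str) -> tuple[str, dict[str, str]]:
    """Split a FASTQ header line into (read_id, attributes)."""
    if not line.startswith("@"):
        raise ValueError("FASTQ header lines must start with '@'")
    # Drop the leading '@' and split on any run of whitespace.
    fields = line[1:].split()
    if not fields:
        raise ValueError("missing read id")
    read_id = fields[0]
    attributes: dict[str, str] = {}
    for field in fields[1:]:
        # Split on the first '=' only, so values may themselves contain '='.
        key, sep, value = field.partition("=")
        if not sep:
            raise ValueError(f"not a key=value pair: {field!r}")
        attributes[key] = value
    return read_id, attributes
```

For the example record above, `parse_header` returns the read ID `bd8655fb-383c-45cc-bff3-eb1dc86533e0` and the dictionary `{"key1": "value1", "key2": "value2"}`.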
Attributes included in the key value section are listed below.
Required header attributes
runid
[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}
Examples |
---|
e4994c62-93f9-439a-bc8f-d20c95a137a5 |
A randomly generated UUID for the sequencing protocol (eg: e4994c62-93f9-439a-bc8f-d20c95a137a5). This consists only of lower-case hexadecimal digits (0-9, a-f) and dashes (-). This maps to the protocol_run_id in the POD5 file.
ch
[0-9]+
Examples |
---|
1 |
512 |
3000 |
The number of the channel the read was acquired on. The first channel is 1.
start_time
((?:(\d{4}-\d{2}-\d{2})T(\d{2}:\d{2}:\d{2}(?:\.\d+)?))(Z|[\+-]\d{2}:\d{2})?)
BAM: st
Sequencing summary: start_time
Examples |
---|
2025-01-13T10:45:28.681306+00:00 |
2016-01-19T15:21:32.59+02:00 |
The time the read started in RFC3339 format.
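Because the fractional-seconds part can have any number of digits (for example `.59`), some versions of `datetime.fromisoformat` may reject these values. The sketch below uses the pattern given above and builds the `datetime` by hand. The function name `parse_start_time` is hypothetical, and returning a naive `datetime` when no offset is present is a choice made for this example.

```python
import re
from datetime import datetime, timedelta, timezone

# Pattern copied from the start_time attribute definition.
START_TIME = re.compile(
    r"((?:(\d{4}-\d{2}-\d{2})T(\d{2}:\d{2}:\d{2}(?:\.\d+)?))(Z|[\+-]\d{2}:\d{2})?)"
)


def parse_start_time(value: str) -> datetime:
    """Convert a start_time attribute value into a datetime."""
    match = START_TIME.fullmatch(value)
    if match is None:
        raise ValueError(f"not a valid start_time: {value!r}")
    date_part, time_part, offset = match.group(2), match.group(3), match.group(4)
    clock, _, fraction = time_part.partition(".")
    # Pad or truncate the fraction to exactly six digits (microseconds).
    microsecond = int((fraction + "000000")[:6]) if fraction else 0
    parsed = datetime.strptime(f"{date_part}T{clock}", "%Y-%m-%dT%H:%M:%S")
    parsed = parsed.replace(microsecond=microsecond)
    if offset is None:
        return parsed  # no offset given: leave the datetime naive
    if offset == "Z":
        return parsed.replace(tzinfo=timezone.utc)
    sign = -1 if offset[0] == "-" else 1
    delta = timedelta(hours=int(offset[1:3]), minutes=int(offset[4:6]))
    return parsed.replace(tzinfo=timezone(sign * delta))
```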
flow_cell_id
[A-Z0-9_-]+
BAM: PU
Sample sheet: flow_cell_id
Examples |
---|
FXX12345 |
PXX12345 |
AAA123 |
The human-readable identifier for the flow cell (eg: FAK54854).
protocol_group_id
[a-zA-Z0-9_\.-]+
Examples |
---|
My_Group |
my-group-1 |
Set by the user in the GUI as "Experiment ID".
sample_id
[a-zA-Z0-9_\.-]+
Examples |
---|
My_Sample |
my-sample-1 |
Set by the user in the GUI as "Sample ID".
barcode
unclassified|barcode([0-9]+)
barcoding
Examples |
---|
unclassified |
barcode01 |
The barcode assigned to this read by the basecaller (eg: "barcode01"), or unclassified if no barcode was detected.
barcode_alias
unclassified|[A-Za-z0-9\-_\.]+
barcoding
Examples |
---|
my_sample |
sample01 |
The user-supplied alias for the barcode. Empty if barcoding is not running. The same as barcode if the user did not supply an alias.
parent_read_id
[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}
BAM: pi
Sequencing summary: parent_read_id
Examples |
---|
e4994c62-93f9-439a-bc8f-d20c95a137a5 |
The read_id of the read which was the source of this FASTQ entry. This may be the same as the FASTQ entry ID if no read splitting was performed for this read, or will be a new globally unique UUID value if this read was split out of another read by the basecaller.
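One practical use of this attribute is to flag entries that came from read splitting. The sketch below assumes the header has already been parsed into a read ID and a dictionary of attributes, and treats any entry whose parent_read_id differs from its own ID as a split product. The function name `is_split_read` is hypothetical.

```python
def is_split_read(read_id: str, attributes: dict[str, str]) -> bool:
    """Return True if parent_read_id is present and differs from the entry's own ID."""
    parent = attributes.get("parent_read_id")
    # A missing attribute is treated as "not known to be split".
    return parent is not None and parent != read_id
```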
basecall_model_version_id
[a-z0-9_@\.]+
BAM: DS
Examples |
---|
rna004_130bps_fast@v5.1.0 |
The unique identifier for the basecall model used to generate this FASTQ file, as supplied by the basecaller (e.g. 2021-05-17_dna_r9.4.1_minion_384_d37a2ab9).
basecall_gpu
.*
gpu_calling
BAM: DS
Examples |
---|
Nvidia_3090 |
A string description of the connected GPU.
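The patterns listed for each attribute can be collected into a table and used to check parsed header values. The following is a minimal sketch. It assumes each pattern is meant to match the whole value (hence `re.fullmatch`), and it ignores keys that are not listed in this section. The names `ATTRIBUTE_PATTERNS` and `invalid_attributes` are hypothetical.

```python
import re

UUID = r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"

# Patterns copied from the attribute definitions in this section.
ATTRIBUTE_PATTERNS = {
    "runid": UUID,
    "ch": r"[0-9]+",
    "start_time": r"((?:(\d{4}-\d{2}-\d{2})T(\d{2}:\d{2}:\d{2}(?:\.\d+)?))(Z|[\+-]\d{2}:\d{2})?)",
    "flow_cell_id": r"[A-Z0-9_-]+",
    "protocol_group_id": r"[a-zA-Z0-9_\.-]+",
    "sample_id": r"[a-zA-Z0-9_\.-]+",
    "barcode": r"unclassified|barcode([0-9]+)",
    "barcode_alias": r"unclassified|[A-Za-z0-9\-_\.]+",
    "parent_read_id": UUID,
    "basecall_model_version_id": r"[a-z0-9_@\.]+",
    "basecall_gpu": r".*",
}


def invalid_attributes(attributes: dict[str, str]) -> list[str]:
    """Return the sorted names of known attributes whose values fail their pattern."""
    return sorted(
        key
        for key, value in attributes.items()
        if key in ATTRIBUTE_PATTERNS
        and re.fullmatch(ATTRIBUTE_PATTERNS[key], value) is None
    )
```

Unknown keys are skipped rather than rejected, so that headers carrying additional attributes still pass the check.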
Header patterns
Additional header patterns are available for FASTQ files on top of the normal Patterns.
Name | Value |
---|---|
gpu_header_info | A string description of the connected GPU (without spaces), prefixed with the FASTQ attribute name, eg 'basecall_gpu=Nvidia_3090' |