Most NGS sequencers save their output as text files in FASTQ format. In the modern incarnation of this format, each sequence is written on four lines:
- The first line starts with the “@” symbol, followed by the sequence name
- The DNA sequence itself
- A separator line starting with “+”, optionally followed by the sequence name again
- The quality line
An example of a single sequencing read written in FASTQ format is:
@SRR5232030.1 1 length=101
NATCAATAGTATTCGTACCAATAGAACGAATATCCGCCAGCACCATTTGTTTGGCGGCGTCGCCCACCACGACAATGGAAACCACCGACGCAATACCGATT
+
#>BBABFFFFFFGGGGGGGGGGHHHHHGGGGGHHHGGGGGGHHHHHHHHHGHHGHGGGGGGGGGGGGGHHGGGGGHHHHHGHHHGGGGGGGGFGHHHGGGG
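To make the four-line layout concrete, here is a minimal Python sketch that reads records from such a file. It is not a full parser: it assumes strictly four lines per read (no sequences wrapped over several lines), and the names `FastqRecord` and `read_fastq` are made up for this illustration. The first record is the read shown above, truncated to its first ten bases to keep the block short; the second is an invented read that repeats its name on the “+” line.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str      # header line without the leading "@"
    sequence: str  # the bases
    quality: str   # one quality character per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield one record per group of four lines (hypothetical helper)."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return  # end of input (or an empty line): stop reading
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@"):
            raise ValueError(f"header must start with '@': {header!r}")
        if not separator.startswith("+"):
            raise ValueError(f"third line must start with '+': {separator!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence and quality lengths differ in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


# The read from the text, truncated to its first ten bases for brevity,
# followed by a made-up read that repeats its name on the "+" line.
example = io.StringIO(
    "@SRR5232030.1 1 length=101\n"
    "NATCAATAGT\n"
    "+\n"
    "#>BBABFFFF\n"
    "@read2\n"
    "ACGT\n"
    "+read2\n"
    "IIII\n"
)
records = list(read_fastq(example))
for record in records:
    print(record.name, len(record.sequence))
```

For real data, an established library is usually a safer choice than a hand-written reader, since it also handles compressed files and malformed input.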
Quality is encoded with one character per base: the ASCII code of each character represents the Phred score of the corresponding base. This means that the quality of the tenth base is encoded in the tenth character of the quality line.
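As a rough illustration, the Python sketch below turns quality characters back into Phred scores. It assumes the Phred+33 encoding (score = ASCII code minus 33) used by Sanger-style and recent Illumina files; the text does not state the offset of this particular file, and older Illumina pipelines used an offset of 64. The function names `phred_scores` and `error_probability` are made up for this example.

```python
from typing import List


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a quality string into one Phred score per base.

    The default offset of 33 is an assumption (Phred+33 encoding).
    """
    return [ord(char) - offset for char in quality]


def error_probability(score: int) -> float:
    """Convert a Phred score Q into an error probability, 10^(-Q/10)."""
    return 10 ** (-score / 10)


# First ten quality characters of the read shown in the text.
quality = "#>BBABFFFF"
scores = phred_scores(quality)
print(scores)  # [2, 29, 33, 33, 32, 33, 37, 37, 37, 37]

# The tenth base corresponds to the tenth character (index 9 in Python).
tenth = scores[9]
print(tenth, error_probability(tenth))
```

Under this assumption the leading “#” decodes to a score of 2, which fits the first base of the read being called as N.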