#!/bin/bash

## This script uses the FASTX-Toolkit to check the quality of sequences in a
## given FastQ file and discards any sequences that are of low quality.

## This method is generally better than "trimming" methods of quality
## filtering for analysis of in vitro selection sequences. Trimming removes
## low-quality bases (typically near the ends) from each sequence, which is
## acceptable for genomic sequencing. Functional DNA needs the entire
## sequence, so trimming the ends is not useful. Instead, if too many bases
## of a sequence are low quality, the entire sequence is discarded.

## Requires installation of the FASTX-Toolkit, found here:
## http://hannonlab.cshl.edu/fastx_toolkit/

## Quality Filter Variables
filter=25                                ## Minimum quality score to keep.
percent=100                              ## Minimum % of bases that must have a quality of at least $filter for a sequence to be kept.
input_file=/full/path/to/file.fastq      ## Input file to analyze.
output_file=/full/path/to/file.fastq     ## Output file.

fastq_quality_filter -Q 33 -q "$filter" -p "$percent" -i "$input_file" -o "$output_file"

## Note: The "-Q" parameter is undocumented, but allows a different Phred
## quality encoding to be used (default is 64). Modern HTS typically uses
## Phred+33 encoding.
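
## Optional sanity check, added as a minimal sketch and not part of the
## FASTX-Toolkit: report how many sequences survived the filter.
## Assumptions: the FastQ files use standard four-line records (no wrapped
## sequence lines), and "count_reads" is a hypothetical helper name
## introduced here for illustration only.
count_reads() {
    ## Each FastQ record spans 4 lines, so the read count is lines / 4.
    local lines
    lines=$(wc -l < "$1")
    echo $(( lines / 4 ))
}

## Only report when both files exist, so this block is harmless if the
## paths above are still placeholders.
if [ -f "${input_file:-}" ] && [ -f "${output_file:-}" ]; then
    echo "Sequences before filtering: $(count_reads "$input_file")"
    echo "Sequences after filtering:  $(count_reads "$output_file")"
fi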