#!/bin/bash
## FASTAptamer Enrich compares the sequences in any two or three input files, reports the abundance of every unique sequence in each file, and calculates all pairwise enrichment values (from RPM) for every sequence found in more than one file.
## Accepts either Count or Cluster files as input. Output is a tab-separated values (.tsv) file that can be opened in any spreadsheet program for analysis and sorting.
## An optional "filter" restricts the analysis to sequences that are greater than a cut-off value (given in reads per million, RPM, NOT as a number of reads) in ALL specified files. This is useful for pools with large numbers of unique sequences, where the output file can be too large to open in typical spreadsheet programs. In that case, the "grep searcher" script offers a more "manual" way to find the most abundant sequences in each pool.
## Requires installation of the FASTAptamer program, found here: https://burkelab.missouri.edu/fastaptamer.html
## FASTAptamer Publication: https://doi.org/10.1038%2Fmtna.2015.4
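
## Sketch (not part of FASTAptamer): the cut-off described above is in RPM, where
## RPM = (reads for a sequence / total reads in the file) * 1,000,000.
## The helper name reads_to_rpm is hypothetical; it converts a read count to RPM
## so that a read-based cut-off can be translated into a value for "filter".
reads_to_rpm() {
    ## $1 = reads for one sequence, $2 = total reads in the file
    awk -v reads="$1" -v total="$2" 'BEGIN { printf "%.2f\n", reads / total * 1000000 }'
}
## Example: reads_to_rpm 50 10000000   (50 reads out of 10 million total is 5.00 RPM)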

## FASTAptamer Enrich Variables
filter=5                                  ## Cut-off: only output sequences above this RPM value (NOT reads).
fastapt_dir=/full/path/to/dir             ## Directory where FASTAptamer Perl Scripts are located.
rX_input_path=/full/path/to/file-1.fasta  ## First Input File to compare.
rY_input_path=/full/path/to/file-2.fasta  ## Second Input File to compare.
rZ_input_path=/full/path/to/file-3.fasta  ## Third Input File to compare.
output_path=/full/path/to/file.tsv        ## Output Enrich File (.tsv).
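
## Optional pre-flight check (a sketch, not part of FASTAptamer): report any input
## file that cannot be read before Enrich runs. The function name check_inputs is
## hypothetical. Empty arguments are skipped, so an unset third file is ignored.
check_inputs() {
    local status=0 f
    for f in "$@"; do
        if [ -z "$f" ]; then
            continue
        fi
        if [ ! -r "$f" ]; then
            echo "Warning: cannot read input file: $f" >&2
            status=1
        fi
    done
    return "$status"
}
check_inputs "${rX_input_path:-}" "${rY_input_path:-}" "${rZ_input_path:-}" \
    || echo "Warning: check the input paths set above." >&2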

## Note: To compare only two files, remove ' -z "$rZ_input_path"'.
## Variables are quoted so that paths containing spaces are passed intact.
perl "$fastapt_dir/fastaptamer_enrich" -x "$rX_input_path" -y "$rY_input_path" -z "$rZ_input_path" -f "$filter" -o "$output_path"