fastx-toolkit

The FASTX-Toolkit is a handy set of Linux utilities for processing FASTQ files.  You can find the documentation on the FASTX-Toolkit website.

First, make sure you have installed the FASTX-Toolkit on your Virtual Machine:

jabelsky@dinesh-mdh:~$ sudo apt-get install fastx-toolkit

Confirm that you want to install the program (and any necessary dependencies) by entering "Y"; the tool will then be installed on your virtual machine.
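Once the installation finishes, it can be worth checking that the tools are actually on your PATH. The snippet below is a minimal sketch of one way to do that with the shell built-in `command -v`; the function name `tool_installed` is a hypothetical helper chosen for this illustration, not part of the FASTX-Toolkit.

```shell
# Report whether a command can be found on the PATH.
# tool_installed is a hypothetical helper name used only for this illustration.
tool_installed() {
    command -v "$1" >/dev/null 2>&1
}

if tool_installed fastx_trimmer; then
    echo "fastx_trimmer found at: $(command -v fastx_trimmer)"
else
    echo "fastx_trimmer not found; re-run the apt-get command above"
fi
```

The same check works for any of the other programs in the package by changing the name passed to the function.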

Installation of most Linux packages results in the creation of several binaries (i.e. programs).  You can see which ones were installed with the following command:

jabelsky@dinesh-mdh:~$ dpkg-query -L fastx-toolkit
/.
/usr
/usr/share
/usr/share/man
/usr/share/man/man1
/usr/share/man/man1/fastx_uncollapser.1.gz
...
/usr/bin
/usr/bin/fastx_uncollapser
/usr/bin/fastx_barcode_splitter.pl
/usr/bin/fastx_clipper

/usr/bin/fastq_quality_boxplot_graph.sh
/usr/bin/fastx_quality_stats
/usr/bin/fasta_clipping_histogram.pl
/usr/bin/fasta_nucleotide_changer
/usr/bin/fastx_trimmer
... 
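The full listing mixes man pages and directories in with the programs, so it can help to keep only the entries under /usr/bin/. The following is a small sketch of that filter; `list_bin_entries` is a hypothetical helper name, and the demonstration feeds it a few lines copied from the listing so that it runs even on a machine without dpkg.

```shell
# Keep only the entries under /usr/bin/ from a package file listing read on stdin.
# list_bin_entries is a hypothetical helper name, not part of dpkg or the FASTX-Toolkit.
list_bin_entries() {
    grep '^/usr/bin/'
}

# Intended use on a Debian/Ubuntu machine with the package installed:
#   dpkg-query -L fastx-toolkit | list_bin_entries

# Self-contained demonstration on a few lines taken from the listing
printf '%s\n' \
    /usr/share/man/man1/fastx_uncollapser.1.gz \
    /usr/bin \
    /usr/bin/fastx_clipper \
    /usr/bin/fastx_trimmer | list_bin_entries
```

The pattern ends in a slash on purpose, so the bare /usr/bin directory entry is dropped and only the files inside it remain.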

Everything placed in the /usr/bin/ directory is an executable (compiled programs as well as the .pl and .sh scripts), and can be run by just entering its name:

jabelsky@dinesh-mdh:~$ fastx_trimmer -help
usage: fastx_trimmer [-h] [-f N] [-l N] [-t N] [-m MINLEN] [-z] [-v] [-i INFILE] [-o OUTFILE]
Part of FASTX Toolkit 0.0.14 by A. Gordon (assafgordon@gmail.com)

[-h] = This helpful help screen.
[-f N] = First base to keep. Default is 1 (=first base).
[-l N] = Last base to keep. Default is entire read.
[-t N] = Trim N nucleotides from the end of the read. '-t' can not be used with '-l' and '-f'.
[-m MINLEN] = With [-t], discard reads shorter than MINLEN.
[-z] = Compress output with GZIP.
[-i INFILE] = FASTA/Q input file. default is STDIN.
[-o OUTFILE] = FASTA/Q output file. default is STDOUT.
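To make the -f and -l options concrete: according to the help text, a call such as `fastx_trimmer -f 2 -l 5 -i reads.fastq -o trimmed.fastq` (the file names are hypothetical) would keep bases 2 through 5 of every read. The sketch below imitates that effect with awk so the result can be inspected without the toolkit installed. It is only an illustration of the semantics, assuming simple 4-line FASTQ records; it is not how fastx_trimmer is implemented, and `trim_like_fastx` is a made-up helper name.

```shell
# trim_like_fastx FIRST LAST: read FASTQ on stdin, write trimmed FASTQ on stdout.
# Hypothetical helper that imitates the effect of "-f FIRST -l LAST" with awk;
# it assumes 4-line FASTQ records and is not how fastx_trimmer is implemented.
trim_like_fastx() {
    awk -v first="$1" -v last="$2" '
        # Lines 2 and 4 of each record hold the bases and the quality string
        NR % 4 == 2 || NR % 4 == 0 { print substr($0, first, last - first + 1); next }
        # Header and "+" separator lines pass through unchanged
        { print }
    '
}

# Demonstration: keep bases 2 through 5 of one 8-base read
printf '%s\n' '@read1' 'ACGTACGT' '+' 'IIIIHHHH' | trim_like_fastx 2 5
```

Note that the quality string is trimmed to the same positions as the sequence, so the two lines stay the same length, which is what a FASTQ record requires.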

TruSeq RNA v2 Adapter Sequences: found on page 19 of the Illumina Adapter Sequences PDF.

Excellent resource about how TruSeq Illumina adapters are appended to sequences: TUCF_Understanding_Illumina_TruSeq_Adapters.pdf