General information

dadaist2 is a wrapper designed to perform a basic DADA2 analysis from the command line, starting from a directory containing a set of paired-end samples in FASTQ (eventually gzipped) format.

The wrapper is written in Perl, and will run R scripts via Rscript --vanilla.

Main dependencies

  • Perl, with some modules: FASTX::Reader, File::Temp, and other standard modules (notably Pod::Usage and Digest::MD5)
  • R, and some libraries: dada2, phyloseq, and DECIPHER
    • DADA2 is used for denoising, feature table generation and representative sequences picking
    • DECIPHER (or alternatively DADA2) is used for taxonomy assignments
  • cutadapt for primer removal
    • fastp (alternative to SeqFu QC, no trimming is performed)
  • clustalo (for multiple sequence alignment)
  • fasttree (to generate a tree of the representative sequences)

Bibliography

  • Benjamin J Callahan, Paul J McMurdie, Michael J Rosen, Andrew W Han, Amy Jo A Johnson, and Susan P Holmes. Dada2: high-resolution sample inference from illumina amplicon data. Nature methods, 13(7):581, 2016. doi:10.1038/nmeth.3869.
  • Sievers F, Wilm A, Dineen D, et al. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Molecular Systems Biology. 2011 Oct;7:539. doi:10.1038/msb.2011.75.
  • Price MN, Dehal PS, Arkin AP. FastTree 2–approximately maximum-likelihood trees for large alignments. Plos one. 2010 Mar;5(3):e9490. doi:10.1371/journal.pone.0009490.
  • McMurdie PJ, Holmes S. phyloseq: an R package for reproducible interactive analysis and graphics of microbiome census data. Plos one. 2013 ;8(4):e61217. doi:10.1371/journal.pone.0061217.
  • Lagkouvardos I, Fischer S, Kumar N, Clavel T. (2017) Rhea: a transparent and modular R pipeline for microbial profiling based on 16S rRNA gene amplicons. PeerJ 5:e2836 doi:10.7717/peerj.2836
  • Martin M. Cutadapt Removes Adapter Sequences From High-Throughput Sequencing Reads doi:10.14806/ej.17.1.200

The DADA2 wrapper was based on the R scripts from: