dadaist2

dadaist2 - a shell wrapper for DADA2, to detect representative sequences and generate a feature table starting from Illumina Paired End reads. This is the main program of the dadaist2 toolkit that includes several wrappers and utilities to streamline the analysis of metabarcoding reads from the Linux shell to R.

Author

Andrea Telatin andrea.telatin@quadram.ac.uk

Synopsis

dadaist2 [options] -i INPUT_DIR -o OUTPUT_DIR

Parameters

Main Parameters

  • -i, --input-directory DIRECTORY

    Directory containing the paired end files in FASTQ format, gzipped or not.

  • -o, --output-directory DIRECTORY

    Output directory (will be created). Using a new directory for each run is recommended.

  • -d, --database DATABASE

    Reference database in gzipped FASTA format. Optional (default: skip) but highly recommended.

  • -m, --metadata FILE

    Metadata file in TSV format; the first column must match the sample IDs. If not supplied, a template will be autogenerated using dadaist2-metadata.

  • -t, --threads INT

    Number of threads (default: 2)

  • --primers FOR:REV

    Strip primers with cutadapt. Supply both sequences separated by a colon.

  • -j, --just-concat

    Do not try to merge paired end reads, just concatenate them.

  • --fastp

    Perform the legacy "fastp" QC

  • --no-trim

    Do not trim primers (using fastp). Equivalent to --trim-primer-for 0 --trim-primer-rev 0.

  • --force

    Overwrite the output folder if it already exists, and attempt to produce the MicrobiomeAnalyst and Rhea folders even when DADA2 filters too many reads.

  • --dada-pool

    Pool samples in DADA2 analysis (experimental)
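
A typical run can be sketched as follows. The directory names and the thread count are placeholders rather than values taken from this page; only options documented above are used, and further ones (for example -d or -m) can be appended in the same way. The command is printed first and executed only if dadaist2 is installed and the input directory exists.

```shell
# Placeholder values: replace with your own directories and thread count
INPUT_DIR="reads"
OUTPUT_DIR="dadaist2-out"
THREADS=4

# Assemble the command from the options documented above
CMD="dadaist2 -i $INPUT_DIR -o $OUTPUT_DIR -t $THREADS"
echo "$CMD"

# Run only when the program is in PATH and the input directory exists.
# $CMD is left unquoted on purpose so the shell splits it into words;
# this is only safe because the placeholder values contain no spaces.
if command -v dadaist2 >/dev/null 2>&1 && [ -d "$INPUT_DIR" ]; then
    $CMD
fi
```

Printing the command before running it makes it easy to copy the exact invocation into a lab notebook or a log.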

Input reads

We recommend preparing a clean directory of input reads with filenames like Samplename_R1.fastq.gz and Samplename_R2.fastq.gz.

Filenames starting with a number are not accepted.

  • -1, --for-tag TAG

    Tag to recognize forward reads (default: _R1)

  • -2, --rev-tag TAG

    Tag to recognize reverse reads (default: _R2)

  • -s, --id-separator CHAR

    Character separating the sample name from the rest of the filename (default: _)
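
The naming rules above can be checked before a run with a small shell sketch. The helper names sample_id and starts_with_digit are hypothetical, and taking the text before the first separator as the sample name is an assumption based on the -s description, not a copy of the program's own parsing.

```shell
# Hypothetical helper: print the sample ID, assumed here to be the part of
# the file name that precedes the first separator character.
sample_id() {
    # $1 = file name or path, $2 = separator (default "_")
    sep="${2:-_}"
    base="$(basename "$1")"
    printf '%s\n' "${base%%"$sep"*}"
}

# Hypothetical helper: succeed when the file name starts with a digit,
# which this page says is not accepted.
starts_with_digit() {
    case "$(basename "$1")" in
        [0-9]*) return 0 ;;
        *)      return 1 ;;
    esac
}

sample_id "Samplename_R1.fastq.gz"
if starts_with_digit "01_R1.fastq.gz"; then
    echo "01_R1.fastq.gz would need renaming"
fi
```

Looping over the input directory with these two helpers gives a quick preview of the sample IDs and flags files that need renaming.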

Metabarcoding processing

  • --trunc-len-1 and --trunc-len-2 INT

    Position at which to truncate reads (forward and reverse, respectively).

  • -q, --min-qual FLOAT

    Minimum average quality for DADA2 truncation (default: 28)

  • --no-trunc

    Do not truncate reads at the end (required for non-overlapping amplicons, like ITS)

  • --maxee1 and --maxee2 FLOAT

    Maximum Expected Errors in R1 and R2, respectively (default: 1.0 and 1.5)

  • --trunc-qual FLOAT

    DADA2 truncate quality (default: 10)

  • -s1, --trim-primer-for INT

    Trim the primer from the R1 read by specifying the number of bases. Similarly, use -s2 (--trim-primer-rev) to remove the leading bases from the reverse read (R2). Default: 20 bases on each side.

  • --save-rds

    Save a copy of the RDS file (default: off)

  • --max-loss FLOAT

    After the DADA2 run, check the fraction of input reads (across all samples) that remain after chimera removal, and abort if it is below the threshold (default: 0.2)
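
Two of the thresholds above can be made concrete with a short sketch. The expected errors of a read are conventionally computed as the sum of 10^(-Q/10) over its Phred quality scores, and --max-loss compares the fraction of reads surviving to the non-chimeric stage against a threshold. The helpers expected_errors and passes_max_loss are hypothetical illustrations that assume Phred+33 encoded qualities; they are not taken from the dadaist2 or DADA2 source.

```shell
# Hypothetical helper: expected errors of one read from its quality string.
expected_errors() {
    # $1 = FASTQ quality string (Phred+33 assumed)
    printf '%s\n' "$1" | awk '
        BEGIN { for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i }
        {
            ee = 0
            n = length($0)
            # Each base contributes its error probability 10^(-Q/10)
            for (i = 1; i <= n; i++)
                ee += 10 ^ (-(ord[substr($0, i, 1)] - 33) / 10)
            printf "%.4f\n", ee
        }'
}

# Hypothetical helper: succeed when the surviving fraction of reads is
# at or above the threshold, fail when it is below.
passes_max_loss() {
    # $1 = input reads, $2 = non-chimeric reads, $3 = threshold (default 0.2)
    awk -v in_reads="$1" -v kept="$2" -v thr="${3:-0.2}" \
        'BEGIN { exit !(in_reads > 0 && kept / in_reads >= thr) }'
}

expected_errors "5555"          # four bases at Q20
if passes_max_loss 100000 15000; then
    echo "enough reads retained"
else
    echo "too many reads lost"
fi
```

With the default --maxee1 of 1.0, a forward read made of ten Q10 bases would sit exactly at the limit, since each base contributes 0.1 expected errors.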

Other parameters

  • --crosstalk

    Remove crosstalk using the UNCROSS2 algorithm, described in https://doi.org/10.1101/400762.

  • -p, --prefix STRING

    Prefix for the output FASTA file. If "MD5" is specified, the MD5 hash of each sequence will be used instead (default: "ASV").

  • -l, --log-file FILE

    Filename for the program log.

  • --tmp-dir DIR

    Where to place the temporary directory (default: the system temporary directory or $TMPDIR).

  • --skip-tree

    Do not generate the tree (experimental, not recommended).

  • --skip-plots

    Do not generate quality plots.

  • --popup

    Display popup notifications (tested on macOS and Ubuntu)

  • --quiet

    Reduce verbosity

  • --verbose and --debug

    Increase reported information
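
To illustrate the "MD5" value of --prefix, the sketch below computes the MD5 hex digest of a sequence string, which is presumably the kind of identifier that replaces the "ASV" prefix. The helper seq_md5 is hypothetical and assumes md5sum (GNU coreutils) is available; whether dadaist2 normalises case or whitespace before hashing is not stated on this page, so here the sequence is hashed exactly as given.

```shell
# Hypothetical helper: MD5 hex digest of a sequence string.
# printf '%s' avoids adding a trailing newline to the hashed text.
seq_md5() {
    printf '%s' "$1" | md5sum | cut -d ' ' -f 1
}

seq_md5 "ACGTACGTACGT"
```

Hash-based names depend only on the sequence itself, so the same sequence receives the same name in separate runs, which numbered names cannot guarantee.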

Source code and documentation

The program is freely available at https://quadram-institute-bioscience.github.io/dadaist2, released under the MIT licence. The website contains the full documentation, and we recommend checking it for updates.

The paper describing Dadaist2 was published in:

Ansorge, R.; Birolo, G.; James, S.A.; Telatin, A. Dadaist2: A Toolkit to Automate and Simplify Statistical Analysis and Plotting of Metabarcoding Experiments. Int. J. Mol. Sci. 2021, 22, 5309. https://doi.org/10.3390/ijms22105309