VADR_Update

Quick Facts

| Workflow Type | Applicable Kingdom | Last Known Changes | Command-line Compatibility | Workflow Level | Dockstore |
|---|---|---|---|---|---|
| Genomic Characterization, Standalone | HAV, Influenza, Monkeypox virus, RSV-A, RSV-B, SARS-CoV-2, Viral, WNV | v4.0.0 | Yes | Sample-level | VADR_Update_PHB |

VADR_Update_PHB

VADR_Update_PHB is a standalone workflow dedicated to running VADR. By default, the workflow uses a slimmed-down docker image running VADR (v1.6.4), which requires models to be provided separately. The table below outlines the recommended models and VADR parameters for use in the workflow.

| Organism | vadr_model_file | vadr_opts | max_length |
|---|---|---|---|
| sars-cov-2 | "gs://theiagen-public-resources-rp/reference_data/databases/vadr_models/vadr-models-sarscov2-1.6.3-1.tar.gz" | "--mkey sarscov2 --glsearch -s -r --nomisc --lowsim5seq 6 --lowsim3seq 6 --alt_fail lowscore,insertnn,deletinn --noseqnamemax --out_allfasta" | 30000 |
| MPXV | "gs://theiagen-public-resources-rp/reference_data/databases/vadr_models/vadr-models-mpxv-1.4.2-1.tar.gz" | "--mkey mpxv --glsearch --minimap2 -s -r --nomisc --r_lowsimok --r_lowsimxd 100 --r_lowsimxl 2000 --alt_pass discontn,dupregin --s_overhang 150 --out_allfasta" | 210000 |
| WNV | "gs://theiagen-public-resources-rp/reference_data/databases/vadr_models/vadr-models-flavi-1.2-1.tar.gz" | "--mkey flavi --nomisc --noprotid --out_allfasta" | 11000 |
| flu | "gs://theiagen-public-resources-rp/reference_data/databases/vadr_models/vadr-models-flu-1.6.3-2.tar.gz" | "--mkey flu --atgonly --xnocomp --nomisc --alt_fail extrant5,extrant3" | 13500 |
| rsv_a | "gs://theiagen-public-resources-rp/reference_data/databases/vadr_models/vadr-models-rsv-1.5-2.tar.gz" | "--mkey rsv --xnocomp -r" | 15500 |
| rsv_b | "gs://theiagen-public-resources-rp/reference_data/databases/vadr_models/vadr-models-rsv-1.5-2.tar.gz" | "--mkey rsv --xnocomp -r" | 15500 |
| measles | "gs://theiagen-public-resources-rp/reference_data/databases/vadr_models/vadr-models-mev-1.02.tar.gz" | "--mkey mev -r --indefclass 0.01" | 18000 |
| mumps | "gs://theiagen-public-resources-rp/reference_data/databases/vadr_models/vadr-models-muv-1.01.tar.gz" | "--mkey muv -r --indefclass 0.025" | 18000 |
| rubella | "gs://theiagen-public-resources-rp/reference_data/databases/vadr_models/vadr-models-ruv-1.01.tar.gz" | "--mkey ruv -r" | 10000 |
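
As an illustration of how the recommended settings might be turned into workflow inputs, the following Python sketch builds an inputs dictionary for a chosen organism. It is only a sketch under stated assumptions: the `vadr_update.` key prefix is inferred from the "Terra Task Name" column of the inputs table, the `build_inputs` helper is hypothetical, and `gs://my-bucket/sample1.fasta` is a placeholder path. Only three organisms are included to keep the example short.

```python
import json

# Common prefix of the model files listed in the table above.
MODELS_PREFIX = "gs://theiagen-public-resources-rp/reference_data/databases/vadr_models/"

# Recommended settings copied from the table (subset shown for brevity).
RECOMMENDED = {
    "sars-cov-2": {
        "vadr_model_file": MODELS_PREFIX + "vadr-models-sarscov2-1.6.3-1.tar.gz",
        "vadr_opts": (
            "--mkey sarscov2 --glsearch -s -r --nomisc --lowsim5seq 6 "
            "--lowsim3seq 6 --alt_fail lowscore,insertnn,deletinn "
            "--noseqnamemax --out_allfasta"
        ),
        "vadr_max_length": 30000,
    },
    "WNV": {
        "vadr_model_file": MODELS_PREFIX + "vadr-models-flavi-1.2-1.tar.gz",
        "vadr_opts": "--mkey flavi --nomisc --noprotid --out_allfasta",
        "vadr_max_length": 11000,
    },
    "rsv_a": {
        "vadr_model_file": MODELS_PREFIX + "vadr-models-rsv-1.5-2.tar.gz",
        "vadr_opts": "--mkey rsv --xnocomp -r",
        "vadr_max_length": 15500,
    },
}


def build_inputs(organism, genome_fasta, workflow_name="vadr_update"):
    """Return an inputs dict for one sample (hypothetical helper).

    The "<workflow_name>.<variable>" key format is an assumption based on
    the usual Cromwell/Terra inputs JSON convention.
    """
    settings = RECOMMENDED[organism]
    inputs = {
        f"{workflow_name}.genome_fasta": genome_fasta,
        f"{workflow_name}.organism": organism,
    }
    for variable, value in settings.items():
        inputs[f"{workflow_name}.{variable}"] = value
    return inputs


# Placeholder FASTA path; replace with a real assembly location.
print(json.dumps(build_inputs("WNV", "gs://my-bucket/sample1.fasta"), indent=2))
```

Keeping the three organism-specific values together in one mapping makes it harder to pair a model file with the wrong `--mkey` option.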

Inputs

Please note the default values are for SARS-CoV-2.

This workflow runs on the sample level.

| Terra Task Name | Variable | Type | Description | Default Value | Terra Status |
|---|---|---|---|---|---|
| vadr_update | genome_fasta | File | Consensus genome assembly | | Required |
| consensus_qc | cpu | Int | Number of CPUs to allocate to the task | 1 | Optional |
| consensus_qc | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional |
| consensus_qc | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/theiagen/utility:1.1 | Optional |
| consensus_qc | genome_length | Int | Internal component, do not modify | | Optional |
| consensus_qc | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 2 | Optional |
| consensus_qc | reference_genome | File | Internal component, do not modify | | Optional |
| organism_parameters | auspice_config | File | Internal component, do not modify | | Optional |
| organism_parameters | clades_tsv | File | Internal component, do not modify | | Optional |
| organism_parameters | flu_genoflu_genotype | String | Internal component, do not modify | N/A | Optional |
| organism_parameters | flu_segment | String | Internal component, do not modify | N/A | Optional |
| organism_parameters | flu_subtype | String | Internal component, do not modify | N/A | Optional |
| organism_parameters | gene_locations_bed_file | File | Internal component, do not modify | | Optional |
| organism_parameters | genome_length_input | Int | Internal component, do not modify | | Optional |
| organism_parameters | hiv_primer_version | String | Internal component, do not modify | v1 | Optional |
| organism_parameters | kraken_target_organism_input | String | Internal component, do not modify | | Optional |
| organism_parameters | min_date | Float | Internal component, do not modify | | Optional |
| organism_parameters | min_num_unambig | Int | Internal component, do not modify | | Optional |
| organism_parameters | narrow_bandwidth | Float | Internal component, do not modify | | Optional |
| organism_parameters | nextclade_dataset_name_input | String | Internal component, do not modify | | Optional |
| organism_parameters | nextclade_dataset_tag_input | String | Internal component, do not modify | | Optional |
| organism_parameters | pangolin_docker_image | String | Internal component, do not modify | | Optional |
| organism_parameters | pivot_interval | Int | Internal component, do not modify | | Optional |
| organism_parameters | primer_bed_file | File | Internal component, do not modify | | Optional |
| organism_parameters | proportion_wide | Float | Internal component, do not modify | | Optional |
| organism_parameters | reference_genbank | File | Internal component, do not modify | | Optional |
| organism_parameters | reference_genome | File | Internal component, do not modify | | Optional |
| organism_parameters | reference_gff_file | File | Internal component, do not modify | | Optional |
| vadr | cpu | Int | Number of CPUs to allocate to the task | 4 | Optional |
| vadr | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional |
| vadr | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/vadr:1.6.4 | Optional |
| vadr | min_length | Int | Minimum length subsequence to possibly replace Ns for the fasta-trim-terminal-ambigs.pl VADR script | 50 | Optional |
| vadr_update | organism | String | Target organism for VADR | sars-cov-2 | Optional |
| vadr_update | vadr_max_length | Int | Maximum length for the fasta-trim-terminal-ambigs.pl VADR script | 30000 | Optional |
| vadr_update | vadr_memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 16 | Optional |
| vadr_update | vadr_model_file | File | Path to a tar + gzipped VADR model file | gs://theiagen-public-resources-rp/reference_data/databases/vadr_models/vadr-models-sarscov2-1.6.3-1.tar.gz | Optional |
| vadr_update | vadr_opts | String | Options for the v-annotate.pl VADR script | --noseqnamemax --glsearch -s -r --nomisc --mkey sarscov2 --lowsim5seq 6 --lowsim3seq 6 --alt_fail lowscore,insertnn,deletinn --out_allfasta | Optional |
| vadr_update | vadr_skip_length | Int | Minimum assembly length (unambiguous) to run VADR | 10000 | Optional |
| version_capture | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0 | Optional |
| version_capture | timezone | String | Set the time zone to get an accurate date of analysis (uses UTC by default) | | Optional |
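
The `vadr_skip_length` input sets a minimum unambiguous assembly length below which VADR is not run. The Python sketch below shows one way such a check could work. It assumes that "unambiguous" means the bases A, C, G, and T, and that an assembly exactly at the threshold is still analyzed. Both are assumptions for illustration, and the function names are hypothetical rather than taken from the workflow.

```python
def count_unambiguous(fasta_text):
    """Count A/C/G/T bases in FASTA-formatted text, ignoring header lines."""
    count = 0
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            continue  # skip sequence headers
        count += sum(1 for base in line.upper() if base in "ACGT")
    return count


def should_run_vadr(fasta_text, vadr_skip_length=10000):
    """Return True when the assembly is long enough to run VADR.

    The >= comparison is an assumption; the workflow may use a strict
    comparison instead.
    """
    return count_unambiguous(fasta_text) >= vadr_skip_length


# Small demonstration with a toy sequence and a toy threshold.
example = ">sample1\nACGTNNNN\nacgt-R\n"
print(count_unambiguous(example))
print(should_run_vadr(example, vadr_skip_length=8))
```

For organisms with genomes shorter than the SARS-CoV-2 default, the threshold would likely need to be lowered alongside `vadr_max_length`.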

Workflow Tasks

VADR: Assembly Validation and Annotation

VADR (Viral Annotation DefineR) annotates and validates completed assembly files. It was primarily developed to test whether viral sequences would be accepted into NCBI's GenBank data repository, but it is now widely used for general sequence validation and annotation. For details on VADR default models/parameters, see the organism-specific parameters and logic section.

During analysis, VADR can report more than 70 types of unexpected characteristics, known as alerts. Any identified alerts can be found in the vadr_alerts_list output. Fatal alerts indicate that the sample is unlikely to be accepted to GenBank; sequences with only non-fatal alerts are designated as passing but may still require further investigation. A full description of the potential alerts can be found in the VADR README, including details on how to allow sequences to pass despite having fatal alerts.
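
The inputs `min_length`, `vadr_max_length`, and `vadr_opts` feed two VADR scripts, fasta-trim-terminal-ambigs.pl and v-annotate.pl. The Python sketch below shows how those values might be assembled into command lines. It only builds the argument lists and does not run anything. The exact commands used inside the workflow are not shown in this document, so the file names, the `build_vadr_commands` helper, and the use of `--mdir` to point at an extracted model directory are assumptions for illustration.

```python
import shlex


def build_vadr_commands(genome_fasta, trimmed_fasta, model_dir, out_dir,
                        vadr_opts, min_length=50, vadr_max_length=30000):
    """Build argument lists for the trimming and annotation steps.

    Hypothetical helper: the real workflow may invoke VADR differently.
    """
    # The trimming script writes the trimmed FASTA to standard output,
    # so a caller would redirect it into trimmed_fasta.
    trim_cmd = [
        "fasta-trim-terminal-ambigs.pl",
        "--minlen", str(min_length),
        "--maxlen", str(vadr_max_length),
        genome_fasta,
    ]
    # vadr_opts is a single string in the workflow inputs; split it into
    # separate arguments the way a shell would.
    annotate_cmd = [
        "v-annotate.pl",
        *shlex.split(vadr_opts),
        "--mdir", model_dir,
        trimmed_fasta,
        out_dir,
    ]
    return trim_cmd, annotate_cmd


trim_cmd, annotate_cmd = build_vadr_commands(
    "sample1.fasta",          # placeholder input assembly
    "sample1.trimmed.fasta",  # placeholder trimmed output
    "vadr-models",            # placeholder extracted model directory
    "sample1_vadr",           # placeholder output directory
    "--mkey rsv --xnocomp -r",
)
print(shlex.join(trim_cmd))
print(shlex.join(annotate_cmd))
```

Building argument lists instead of one shell string avoids quoting problems when an option value contains commas or spaces.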

Outputs

| Variable | Type | Description |
|---|---|---|
| vadr_alerts_list | File | A file containing all of the fatal alerts as determined by VADR |
| vadr_all_outputs_tar_gz | File | A .tar.gz file (gzip-compressed tar archive file) containing all outputs from the VADR command v-annotate.pl. This file must be uncompressed & extracted to see the many files within. See https://github.com/ncbi/vadr/blob/master/documentation/formats.md#format-of-v-annotatepl-output-files for more complete description of all files present within the archive. Useful when deeply investigating a sample's genome & annotations. |
| vadr_classification_summary_file | File | Per-sequence tabular classification file. See https://github.com/ncbi/vadr/blob/master/documentation/formats.md#explanation-of-sqc-suffixed-output-files for more complete description. |
| vadr_docker | String | Docker image used to run VADR |
| vadr_fastas_zip_archive | File | Zip archive containing all fasta files created during VADR analysis |
| vadr_feature_tbl_fail | File | 5 column feature table output for failing sequences. See https://github.com/ncbi/vadr/blob/master/documentation/formats.md#format-of-v-annotatepl-output-files for more complete description. |
| vadr_feature_tbl_pass | File | 5 column feature table output for passing sequences. See https://github.com/ncbi/vadr/blob/master/documentation/formats.md#format-of-v-annotatepl-output-files for more complete description. |
| vadr_num_alerts | String | Number of fatal alerts as determined by VADR |
| vadr_update_analysis_date | String | Date of analysis |
| vadr_update_version | String | Version of the Public Health Bioinformatics (PHB) repository used |
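
Because vadr_all_outputs_tar_gz must be uncompressed and extracted before its contents can be inspected, a short Python sketch using the standard-library `tarfile` module is given here. The helper name and the example paths are hypothetical, and the archive is assumed to come from a trusted workflow run.

```python
import tarfile
from pathlib import Path


def extract_vadr_outputs(archive_path, dest_dir):
    """Extract a .tar.gz archive and return the names of its members.

    Hypothetical helper; only use it on archives from a trusted source,
    since extracting untrusted archives can overwrite files.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        names = tar.getnames()  # list members before extracting
        tar.extractall(dest)
    return names


# Example usage with placeholder paths:
# for name in extract_vadr_outputs("sample1.vadr.tar.gz", "sample1_vadr_outputs"):
#     print(name)
```

Listing the member names first gives a quick overview of which v-annotate.pl output files are present before opening any of them.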