VADR_Update
Quick Facts
| Workflow Type | Applicable Kingdom | Last Known Changes | Command-line Compatibility | Workflow Level | Dockstore |
|---|---|---|---|---|---|
| Genomic Characterization, Standalone | HAV, Influenza, Monkeypox virus, RSV-A, RSV-B, SARS-CoV-2, Viral, WNV | v4.0.0 | Yes | Sample-level | VADR_Update_PHB |
VADR_Update_PHB
VADR_Update_PHB is a standalone workflow dedicated to running VADR. By default, the workflow uses a slimmed-down docker image running VADR (v1.6.4), which requires models to be provided separately. The table below outlines the recommended models and VADR parameters for use in the workflow.
| Organism | vadr_model_file | vadr_opts | max_length |
|---|---|---|---|
| sars-cov-2 | "gs://theiagen-public-resources-rp/reference_data/databases/vadr_models/vadr-models-sarscov2-1.6.3-1.tar.gz" | "--mkey sarscov2 --glsearch -s -r --nomisc --lowsim5seq 6 --lowsim3seq 6 --alt_fail lowscore,insertnn,deletinn --noseqnamemax --out_allfasta" | 30000 |
| MPXV | "gs://theiagen-public-resources-rp/reference_data/databases/vadr_models/vadr-models-mpxv-1.4.2-1.tar.gz" | "--mkey mpxv --glsearch --minimap2 -s -r --nomisc --r_lowsimok --r_lowsimxd 100 --r_lowsimxl 2000 --alt_pass discontn,dupregin --s_overhang 150 --out_allfasta" | 210000 |
| WNV | "gs://theiagen-public-resources-rp/reference_data/databases/vadr_models/vadr-models-flavi-1.2-1.tar.gz" | "--mkey flavi --nomisc --noprotid --out_allfasta" | 11000 |
| flu | "gs://theiagen-public-resources-rp/reference_data/databases/vadr_models/vadr-models-flu-1.6.3-2.tar.gz" | "--mkey flu --atgonly --xnocomp --nomisc --alt_fail extrant5,extrant3" | 13500 |
| rsv_a | "gs://theiagen-public-resources-rp/reference_data/databases/vadr_models/vadr-models-rsv-1.5-2.tar.gz" | "--mkey rsv --xnocomp -r" | 15500 |
| rsv_b | "gs://theiagen-public-resources-rp/reference_data/databases/vadr_models/vadr-models-rsv-1.5-2.tar.gz" | "--mkey rsv --xnocomp -r" | 15500 |
| measles | "gs://theiagen-public-resources-rp/reference_data/databases/vadr_models/vadr-models-mev-1.02.tar.gz" | "--mkey mev -r --indefclass 0.01" | 18000 |
| mumps | "gs://theiagen-public-resources-rp/reference_data/databases/vadr_models/vadr-models-muv-1.01.tar.gz" | "--mkey muv -r --indefclass 0.025" | 18000 |
| rubella | "gs://theiagen-public-resources-rp/reference_data/databases/vadr_models/vadr-models-ruv-1.01.tar.gz" | "--mkey ruv -r" | 10000 |
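As a minimal sketch of how the recommended values above could be turned into a workflow inputs file, the Python below builds an inputs dictionary for one organism. The `vadr_update.` key prefix and the mapping of the table's max_length column onto the `vadr_max_length` input are assumptions based on the Inputs table, `build_inputs` is a hypothetical helper, and the bucket path in the example call is a placeholder. Only a subset of the organisms is included for brevity.

```python
import json

MODEL_DIR = "gs://theiagen-public-resources-rp/reference_data/databases/vadr_models"

# Values copied from the table above: (model archive, vadr_opts, max_length).
# Only a subset of organisms is shown here.
RECOMMENDED = {
    "sars-cov-2": (
        "vadr-models-sarscov2-1.6.3-1.tar.gz",
        "--mkey sarscov2 --glsearch -s -r --nomisc --lowsim5seq 6 --lowsim3seq 6 "
        "--alt_fail lowscore,insertnn,deletinn --noseqnamemax --out_allfasta",
        30000,
    ),
    "flu": (
        "vadr-models-flu-1.6.3-2.tar.gz",
        "--mkey flu --atgonly --xnocomp --nomisc --alt_fail extrant5,extrant3",
        13500,
    ),
    "rsv_a": ("vadr-models-rsv-1.5-2.tar.gz", "--mkey rsv --xnocomp -r", 15500),
    "rsv_b": ("vadr-models-rsv-1.5-2.tar.gz", "--mkey rsv --xnocomp -r", 15500),
}


def build_inputs(organism: str, genome_fasta: str) -> dict:
    """Return an inputs dictionary for one sample (hypothetical helper)."""
    model, opts, max_length = RECOMMENDED[organism]
    return {
        # Key names assume the workflow is called `vadr_update`.
        "vadr_update.genome_fasta": genome_fasta,
        "vadr_update.organism": organism,
        "vadr_update.vadr_model_file": f"{MODEL_DIR}/{model}",
        "vadr_update.vadr_opts": opts,
        "vadr_update.vadr_max_length": max_length,
    }


# Placeholder path; replace with the location of a real consensus assembly.
example = build_inputs("rsv_a", "gs://my-bucket/sample.fasta")
print(json.dumps(example, indent=2))
```

Keeping the three values together per organism makes it harder to pair a model archive with options meant for a different virus.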
Inputs
Please note the default values are for SARS-CoV-2.
This workflow runs on the sample level.
| Terra Task Name | Variable | Type | Description | Default Value | Terra Status |
|---|---|---|---|---|---|
| vadr_update | genome_fasta | File | Consensus genome assembly | | Required |
| consensus_qc | cpu | Int | Number of CPUs to allocate to the task | 1 | Optional |
| consensus_qc | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional |
| consensus_qc | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/theiagen/utility:1.1 | Optional |
| consensus_qc | genome_length | Int | Internal component, do not modify | | Optional |
| consensus_qc | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 2 | Optional |
| consensus_qc | reference_genome | File | Internal component, do not modify | | Optional |
| organism_parameters | auspice_config | File | Internal component, do not modify | | Optional |
| organism_parameters | clades_tsv | File | Internal component, do not modify | | Optional |
| organism_parameters | flu_genoflu_genotype | String | Internal component, do not modify | N/A | Optional |
| organism_parameters | flu_segment | String | Internal component, do not modify | N/A | Optional |
| organism_parameters | flu_subtype | String | Internal component, do not modify | N/A | Optional |
| organism_parameters | gene_locations_bed_file | File | Internal component, do not modify | | Optional |
| organism_parameters | genome_length_input | Int | Internal component, do not modify | | Optional |
| organism_parameters | hiv_primer_version | String | Internal component, do not modify | v1 | Optional |
| organism_parameters | kraken_target_organism_input | String | Internal component, do not modify | | Optional |
| organism_parameters | min_date | Float | Internal component, do not modify | | Optional |
| organism_parameters | min_num_unambig | Int | Internal component, do not modify | | Optional |
| organism_parameters | narrow_bandwidth | Float | Internal component, do not modify | | Optional |
| organism_parameters | nextclade_dataset_name_input | String | Internal component, do not modify | | Optional |
| organism_parameters | nextclade_dataset_tag_input | String | Internal component, do not modify | | Optional |
| organism_parameters | pangolin_docker_image | String | Internal component, do not modify | | Optional |
| organism_parameters | pivot_interval | Int | Internal component, do not modify | | Optional |
| organism_parameters | primer_bed_file | File | Internal component, do not modify | | Optional |
| organism_parameters | proportion_wide | Float | Internal component, do not modify | | Optional |
| organism_parameters | reference_genbank | File | Internal component, do not modify | | Optional |
| organism_parameters | reference_genome | File | Internal component, do not modify | | Optional |
| organism_parameters | reference_gff_file | File | Internal component, do not modify | | Optional |
| vadr | cpu | Int | Number of CPUs to allocate to the task | 4 | Optional |
| vadr | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional |
| vadr | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/staphb/vadr:1.6.4 | Optional |
| vadr | min_length | Int | Minimum length subsequence to possibly replace Ns for the fasta-trim-terminal-ambigs.pl VADR script | 50 | Optional |
| vadr_update | organism | String | Target organism for VADR | sars-cov-2 | Optional |
| vadr_update | vadr_max_length | Int | Maximum length for the fasta-trim-terminal-ambigs.pl VADR script | 30000 | Optional |
| vadr_update | vadr_memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 16 | Optional |
| vadr_update | vadr_model_file | File | Path to a tar + gzipped VADR model file | gs://theiagen-public-resources-rp/reference_data/databases/vadr_models/vadr-models-sarscov2-1.6.3-1.tar.gz | Optional |
| vadr_update | vadr_opts | String | Options for the v-annotate.pl VADR script | --noseqnamemax --glsearch -s -r --nomisc --mkey sarscov2 --lowsim5seq 6 --lowsim3seq 6 --alt_fail lowscore,insertnn,deletinn --out_allfasta | Optional |
| vadr_update | vadr_skip_length | Int | Minimum assembly length (unambiguous) to run VADR | 10000 | Optional |
| version_capture | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0 | Optional |
| version_capture | timezone | String | Set the time zone to get an accurate date of analysis (uses UTC by default) | | Optional |
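The `vadr_skip_length` input is described as the minimum unambiguous assembly length needed to run VADR. The sketch below shows one plausible reading of that check, assuming "unambiguous" means A, C, G, and T bases and that the comparison is inclusive; neither detail is stated in this document, and both function names are hypothetical.

```python
def count_unambiguous_bases(fasta_text: str) -> int:
    """Count A, C, G, and T characters across all sequence lines."""
    total = 0
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            continue  # skip FASTA header lines
        total += sum(1 for base in line.upper() if base in "ACGT")
    return total


def should_run_vadr(fasta_text: str, vadr_skip_length: int = 10000) -> bool:
    """Assumed rule: run VADR only when enough unambiguous bases are present."""
    return count_unambiguous_bases(fasta_text) >= vadr_skip_length
```

Under this reading, an assembly that is long but mostly Ns would be skipped, which is why the threshold may need lowering for organisms with short genomes.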
Workflow Tasks
VADR: Assembly Validation and Annotation
VADR (Viral Annotation DefineR) annotates and validates completed assembly files. It was primarily developed to test whether viral sequences would be accepted into NCBI's GenBank data repository, but it is now widely used for general sequence validation and annotation. For details on VADR default models/parameters, see the organism-specific parameters and logic section.
During analysis of the assemblies, VADR can report more than 70 types of unexpected characteristics, known as alerts. Any identified alerts can be found in the vadr_alerts_list output. Fatal alerts indicate that the sample is unlikely to be accepted to GenBank; sequences with only non-fatal alerts are designated as passing, but may still require further investigation. A full description of the potential alerts can be found in the VADR README, including details on how to allow sequences to pass despite having fatal alerts.
VADR Technical Details
| Links | |
|---|---|
| Task | task_vadr.wdl |
| Software Source Code | https://github.com/ncbi/vadr |
| Software Documentation | https://github.com/ncbi/vadr/wiki |
| Original Publication(s) | For SARS-CoV-2: "Faster SARS-CoV-2 sequence validation and annotation for GenBank using VADR"; for other viruses: "VADR: validation and annotation of virus sequence submissions to GenBank" |
Outputs
| Variable | Type | Description |
|---|---|---|
| vadr_alerts_list | File | A file containing all of the fatal alerts as determined by VADR |
| vadr_all_outputs_tar_gz | File | A .tar.gz file (gzip-compressed tar archive) containing all outputs from the VADR command v-annotate.pl. This file must be uncompressed and extracted to see the many files within. See https://github.com/ncbi/vadr/blob/master/documentation/formats.md#format-of-v-annotatepl-output-files for a more complete description of all files present within the archive. Useful when deeply investigating a sample's genome and annotations. |
| vadr_classification_summary_file | File | Per-sequence tabular classification file. See https://github.com/ncbi/vadr/blob/master/documentation/formats.md#explanation-of-sqc-suffixed-output-files for a more complete description. |
| vadr_docker | String | Docker image used to run VADR |
| vadr_fastas_zip_archive | File | Zip archive containing all fasta files created during VADR analysis |
| vadr_feature_tbl_fail | File | 5-column feature table output for failing sequences. See https://github.com/ncbi/vadr/blob/master/documentation/formats.md#format-of-v-annotatepl-output-files for a more complete description. |
| vadr_feature_tbl_pass | File | 5-column feature table output for passing sequences. See https://github.com/ncbi/vadr/blob/master/documentation/formats.md#format-of-v-annotatepl-output-files for a more complete description. |
| vadr_num_alerts | String | Number of fatal alerts as determined by VADR |
| vadr_update_analysis_date | String | Date of analysis |
| vadr_update_version | String | Version of the Public Health Bioinformatics (PHB) repository used |
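Because vadr_all_outputs_tar_gz must be unpacked before its contents can be inspected, a small sketch using Python's standard `tarfile` module may help. It lists the files in a downloaded archive and reads one of them by filename suffix without extracting everything to disk. Both function names are hypothetical, and the exact file names inside the archive depend on the VADR run.

```python
import tarfile


def list_vadr_outputs(archive_path: str) -> list[str]:
    """Return the sorted names of all regular files in the archive."""
    with tarfile.open(archive_path, "r:gz") as archive:
        return sorted(m.name for m in archive.getmembers() if m.isfile())


def read_vadr_output(archive_path: str, suffix: str) -> str:
    """Return the text of the first file whose name ends with `suffix`."""
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive.getmembers():
            if member.isfile() and member.name.endswith(suffix):
                handle = archive.extractfile(member)
                return handle.read().decode("utf-8")
    raise FileNotFoundError(f"no file ending with {suffix!r} in {archive_path}")
```

For example, calling `read_vadr_output` with a locally downloaded archive path and the suffix `".sqc"` would return the per-sequence classification table, assuming the archive contains a file with that suffix.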