CZGenEpi_Prep

Quick Facts

Workflow Type	Applicable Kingdom	Last Known Changes	Command-line Compatibility	Workflow Level
Phylogenetic Construction	Monkeypox virus, SARS-CoV-2, Viral	v1.3.0	No	Set-level

CZGenEpi_Prep_PHB

The CZGenEpi_Prep workflow prepares data for upload to the Chan Zuckerberg GEN EPI platform, where phylogenetic trees and additional data processing can occur. This workflow extracts the necessary metadata fields from your Terra table.

Inputs

To customize which Terra table columns the workflow reads each field from, specify alternative column names in the corresponding optional variables. For example, to use the "clearlabs_fasta" column for the assembly file instead of the default "assembly_fasta" column, enter "clearlabs_fasta" for the assembly_fasta_column_name optional variable.
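
As a rough illustration of the column override described above, the following Python sketch assembles an inputs dictionary keyed as `<task>.<variable>`, using the task and variable names from the Inputs table. The helper `build_inputs`, the project, workspace, and table names, and the sample IDs are all hypothetical placeholders; only overridden column variables are written, on the assumption that the workflow applies its own defaults for the rest.

```python
import json

# Column-name variables listed in the Inputs table for the czgenepi_prep task.
KNOWN_COLUMN_VARIABLES = {
    "assembly_fasta_column_name",
    "collection_date_column_name",
    "continent_column_name",
    "country_column_name",
    "county_column_name",
    "private_id_column_name",
    "state_column_name",
    "genbank_accession_column_name",
    "sequencing_date_column_name",
}


def build_inputs(required, column_overrides=None):
    """Return an inputs dict keyed as 'czgenepi_prep.<variable>'.

    Hypothetical helper: only overridden column names are included;
    any variable left out is assumed to fall back to its default.
    """
    column_overrides = column_overrides or {}
    unknown = set(column_overrides) - KNOWN_COLUMN_VARIABLES
    if unknown:
        raise KeyError(f"Unknown column variables: {sorted(unknown)}")
    inputs = {f"czgenepi_prep.{name}": value for name, value in required.items()}
    for name, value in column_overrides.items():
        inputs[f"czgenepi_prep.{name}"] = value
    return inputs


# Placeholder values for illustration only.
inputs = build_inputs(
    required={
        "sample_names": ["sample01", "sample02"],
        "terra_project_name": "my-terra-project",
        "terra_workspace_name": "my-workspace",
        "terra_table_name": "sample",
    },
    column_overrides={"assembly_fasta_column_name": "clearlabs_fasta"},
)
print(json.dumps(inputs, indent=2))
```

Keeping the overrides separate from the required values makes it easy to see which columns deviate from the defaults.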

Variables tagged both "Optional" and "Required" have a default column name that can be changed, but the column itself (whatever its name) must be present in the data table.
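
One way to check this requirement before launching the workflow is sketched below in Python. It reads the header row of a tab-separated export of the data table and reports which required columns are absent after applying any overrides. The function names and the example table are hypothetical, and the private ID column is left out because its default depends on the table name.

```python
import csv
import io

# Defaults for the "Optional, Required" column variables in the Inputs table
# (private_id_column_name is omitted; its default depends on the table name).
REQUIRED_COLUMN_DEFAULTS = {
    "assembly_fasta_column_name": "assembly_fasta",
    "collection_date_column_name": "collection_date",
    "continent_column_name": "continent",
    "country_column_name": "country",
    "county_column_name": "county",
    "state_column_name": "state",
}


def read_header(handle):
    """Return the column names from the first row of a TSV file handle."""
    return next(csv.reader(handle, delimiter="\t"))


def missing_columns(header, overrides=None):
    """List the required columns (after overrides) that the header lacks."""
    resolved = {**REQUIRED_COLUMN_DEFAULTS, **(overrides or {})}
    return sorted(col for col in resolved.values() if col not in header)


# Hypothetical table that uses "clearlabs_fasta" and has no county column.
tsv = "sample_id\tclearlabs_fasta\tcollection_date\tcontinent\tcountry\tstate\n"
header = read_header(io.StringIO(tsv))
print(missing_columns(header, {"assembly_fasta_column_name": "clearlabs_fasta"}))
```

With the override in place only the county column is reported; without it, the default assembly_fasta column would be reported as well.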

This workflow runs on the set level.

Terra Task Name	Variable	Type	Description	Default Value	Terra Status
czgenepi_prep	sample_names	Array[String]	The array of sample IDs you want to prepare for CZ GEN EPI		Required
czgenepi_prep	terra_project_name	String	The name of the Terra project where the data is hosted		Required
czgenepi_prep	terra_table_name	String	The name of the Terra table where the data is hosted		Required
czgenepi_prep	terra_workspace_name	String	The name of the Terra workspace where the data is hosted		Required
czgenepi_prep	assembly_fasta_column_name	String	The column name where the sample's assembly file can be found	assembly_fasta	Optional, Required
czgenepi_prep	collection_date_column_name	String	The column name where the sample's collection date can be found	collection_date	Optional, Required
czgenepi_prep	continent_column_name	String	The column name where the sample's originating continent can be found	continent	Optional, Required
czgenepi_prep	country_column_name	String	The column name where the sample's originating country can be found	country	Optional, Required
czgenepi_prep	county_column_name	String	The column name where the sample's originating county can be found	county	Optional, Required
czgenepi_prep	private_id_column_name	String	The column name where the Private ID for the sample can be found	terra_table_name_id	Optional, Required
czgenepi_prep	state_column_name	String	The column name where the sample's originating state can be found	state	Optional, Required
czgenepi_prep	genbank_accession_column_name	String	The column name where the GenBank accession for the sample can be found	genbank_accession	Optional
czgenepi_prep	is_private	Boolean	Sets whether the sample status is private or not	TRUE	Optional
czgenepi_prep	organism	String	The organism for data preparation. Options: "mpox" or "sars-cov-2"	sars-cov-2	Optional
czgenepi_prep	sequencing_date_column_name	String	The column name where the sample's sequencing date can be found	sequencing_date	Optional
czgenepi_wrangling	cpu	Int	Number of CPUs to allocate to the task	1	Optional
czgenepi_wrangling	disk_size	Int	Amount of storage (in GB) to allocate to the task	100	Optional
czgenepi_wrangling	docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/theiagen/terra-tools:2023-08-08-2	Optional
czgenepi_wrangling	memory	Int	Amount of memory/RAM (in GB) to allocate to the task	8	Optional
download_terra_table	cpu	Int	Number of CPUs to allocate to the task	1	Optional
download_terra_table	disk_size	Int	Amount of storage (in GB) to allocate to the task	10	Optional
download_terra_table	docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/theiagen/terra-tools:2023-06-21	Optional
download_terra_table	memory	Int	Amount of memory/RAM (in GB) to allocate to the task	2	Optional
version_capture	docker	String	The Docker container to use for the task	us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0	Optional
version_capture	timezone	String	Set the time zone to get an accurate date of analysis (uses UTC by default)		Optional

Outputs

The concatenated_czgenepi_fasta and concatenated_czgenepi_metadata files can be uploaded directly to CZ GEN EPI without any adjustments.

Variable	Type	Description
concatenated_czgenepi_fasta	File	The concatenated FASTA file with renamed headers (the headers are renamed to account for Clear Labs data, which has unique headers)
concatenated_czgenepi_metadata	File	The concatenated metadata extracted from the Terra table using the specified columns
czgenepi_prep_analysis_date	String	The date the workflow was run
czgenepi_prep_version	String	The version of PHB the workflow is in
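
To give a sense of what a concatenated FASTA file with renamed headers looks like, here is a minimal Python sketch. It is not the workflow's implementation: it assumes each sample has a single-record assembly and that each header is replaced with a sample identifier, and the function name, identifiers, and sequences are made up for illustration.

```python
def rename_and_concatenate(assemblies):
    """Concatenate FASTA records, replacing each header with the sample ID.

    `assemblies` maps a sample identifier to the text of its FASTA file.
    Assumes one record per file; a multi-record file would produce
    repeated headers.
    """
    lines = []
    for sample_id, fasta_text in assemblies.items():
        for line in fasta_text.splitlines():
            if line.startswith(">"):
                # Drop the original header and use the sample ID instead.
                lines.append(f">{sample_id}")
            elif line.strip():
                lines.append(line.strip())
    return "\n".join(lines) + "\n"


# Made-up inputs with differently styled original headers.
assemblies = {
    "sampleA": ">instrument-specific header text\nACGT\nTTGA\n",
    "sampleB": ">contig_1\nGGCC\n",
}
combined = rename_and_concatenate(assemblies)
print(combined, end="")
```

Replacing the headers with one consistent identifier per sample means the sequences can be matched to rows in the metadata file regardless of how the original assembler labeled them.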

References

CZ GEN EPI Help Center "Uploading Data" https://help.czgenepi.org/hc/en-us/articles/6160372401172-Uploading-data