CZGenEpi_Prep¶
Quick Facts¶
Workflow Type | Applicable Kingdom | Last Known Changes | Command-line Compatibility | Workflow Level |
---|---|---|---|---|
Phylogenetic Construction | Viral | PHB v1.3.0 | No | Set-level |
CZGenEpi_Prep_PHB¶
The CZGenEpi_Prep workflow prepares data for upload to the Chan Zuckerberg GEN EPI platform, where phylogenetic trees and additional data processing can occur. This workflow extracts the necessary metadata fields from your Terra table.
Inputs¶
In order to enable customization for where certain fields should be pulled from the Terra table, the user can specify different column names in the appropriate location. For example, if the user wants to use the "clearlabs_fasta" column for the assembly file instead of the default "assembly_fasta" column, they can write "clearlabs_fasta" for the assembly_fasta_column_name
optional variable.
Variables with both the "Optional" and "Required" tag require the column (regardless of name) to be present in the data table.
This workflow runs on the set level.
Terra Task Name | Variable | Type | Description | Default Value | Terra Status |
---|---|---|---|---|---|
czgenepi_prep | sample_names | Array[String] | The array of sample ids you want to prepare for CZ GEN EPI | Required | |
czgenepi_prep | terra_table_name | String | The name of the Terra table where the data is hosted | Required | |
czgenepi_prep | terra_project_name | String | The name of the Terra project where the data is hosted | Required | |
czgenepi_prep | terra_workspace_name | String | The name of the Terra workspace where the data is hosted | Required | |
download_terra_table | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 10 | Optional |
download_terra_table | docker | String | The Docker container to use for the task | quay.io/theiagen/terra-tools:2023-06-21 | Optional |
download_terra_table | disk_size | String | The size of the disk used when running this task | 1 | Optional |
download_terra_table | cpu | Int | Number of CPUs to allocate to the task | 1 | Optional |
czgenepi_prep | assembly_fasta_column_name | String | The column name where the sample's assembly file can be found | assembly_fasta | Optional, Required |
czgenepi_prep | county_column_name | String | The column name where the samples' originating county can be found | county | Optional, Required |
czgenepi_prep | organism | String | The organism for data preparation. Options: "mpox" or "sars-cov-2" | sars-cov-2 | Optional |
czgenepi_prep | is_private | Boolean | Sets whether sample status is provate, or not | true | Optional |
czgenepi_prep | genbank_accession_column_name | String | The column name where the genbank accession for the sample can be found | genbank_accession | Optional |
czgenepi_prep | country_column_name | String | The column name where the sample's originating country can be found | country | Optional, Required |
czgenepi_prep | collection_date_column_name | String | The column name where the sample's collection date can be found | collection_date | Optional, Required |
czgenepi_prep | state_column_name | String | The column name where the sample's originating state can be found | state | Optional, Required |
czgenepi_prep | continent_column_name | String | The column name where the sample's originating continent can be found | continent | Optional, Required |
czgenepi_prep | sequencing_date_column_name | String | The column name where the sample's sequencing data can be found | sequencing_date | Optional |
czgenepi_prep | private_id_column_name | String | The column name where the Private ID for the sample can be found | terra_table_name_id | Optional, Required |
czgenepi_wrangling | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 8 | Optional |
czgenepi_wrangling | docker | String | The Docker container to use for the task | us-docker.pkg.dev/general-theiagen/theiagen/terra-tools:2023-08-08-2 | Optional |
czgenepi_wrangling | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional |
czgenepi_wrangling | cpu | Int | Number of CPUs to allocate to the task | 1 | Optional |
version_capture | docker | String | The Docker container to use for the task | "us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0" | Optional |
version_capture | timezone | String | Set the time zone to get an accurate date of analysis (uses UTC by default) | Optional |
Outputs¶
The concatenated_czgenepi_fasta and concatenated_czgenepi_metadata files can be uploaded directly to CZ GEN EPI without any adjustments.
Variable | Type | Description |
---|---|---|
concatenate_czgenepi_fasta | File | The concatenated fasta file with the renamed headers (the headers are renamed to account for clearlabs data which has unique headers) |
concatenate_czgenepi_metadata | File | The concatenated metadata that was extracted from the terra table using the specified columns |
czgenepi_prep_version | String | The version of PHB the workflow is in |
czgenepi_prep_analysis_date | String | The date the workflow was run |
References¶
CZ GEN EPI Help Center "Uploading Data" https://help.czgenepi.org/hc/en-us/articles/6160372401172-Uploading-data