SRA_Fetch

Quick Facts

Workflow Type	Applicable Kingdom	Last Known Changes	Command-line Compatibility	Workflow Level
Data Import	Any taxa	PHB v2.2.0	Yes	Sample-level

SRA_Fetch_PHB

The SRA_Fetch workflow downloads sequence data from NCBI's Sequence Read Archive (SRA). It takes an SRA run accession as input and populates the associated read files into a Terra data table.

Read files associated with the SRA run accession provided as input are copied to a Terra-accessible Google bucket. Hyperlinks to those files are shown in the "read1" and "read2" columns of the Terra data table.
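How downloaded files could end up in the "read1" and "read2" columns is shown in the minimal sketch below. It assumes the file naming that fastq-dl typically uses (`<run>_1.fastq.gz` and `<run>_2.fastq.gz` for paired-end runs, `<run>.fastq.gz` for single-end runs); the helper `assign_reads` is hypothetical and is not part of the workflow.

```python
from pathlib import Path
from typing import Optional, Tuple


def assign_reads(files: list[str]) -> Tuple[Optional[str], Optional[str]]:
    """Hypothetical helper: map downloaded FASTQ names to (read1, read2).

    Assumes paired-end mates end in _1.fastq.gz / _2.fastq.gz and that a
    single-end run produces one file without a mate suffix.
    """
    names = sorted(files)
    r1 = [f for f in names if Path(f).name.endswith("_1.fastq.gz")]
    r2 = [f for f in names if Path(f).name.endswith("_2.fastq.gz")]
    if r1 and r2:
        # Paired-end: both columns are populated.
        return r1[0], r2[0]
    # Single-end (or ONT): only read1 is populated, read2 stays empty.
    others = [f for f in names if f not in r1 and f not in r2]
    first = (r1 or others or [None])[0]
    return first, None
```

For example, `assign_reads(["SRR1_2.fastq.gz", "SRR1_1.fastq.gz"])` returns the `_1` file as read1 and the `_2` file as read2, while a lone `SRR2.fastq.gz` fills only read1.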

Inputs

This workflow runs on the sample level.

Terra Task Name	Variable	Type	Description	Default Value	Terra Status
fetch_sra_to_fastq	sra_accession	String	SRA, ENA, or DRA accession number		Required
fetch_sra_to_fastq	cpu	Int	Number of CPUs to allocate to the task	2	Optional
fetch_sra_to_fastq	disk_size	Int	Amount of storage (in GB) to allocate to the task	100	Optional
fetch_sra_to_fastq	docker_image	String	The Docker container to use for the task	"us-docker.pkg.dev/general-theiagen/biocontainers/fastq-dl:2.0.4--pyhdfd78af_0"	Optional
fetch_sra_to_fastq	fastq_dl_options	String	Additional parameters to pass to fastq-dl	"--provider sra"	Optional
fetch_sra_to_fastq	memory	Int	Amount of memory/RAM (in GB) to allocate to the task	8	Optional
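To illustrate how the `sra_accession` and `fastq_dl_options` inputs might combine into a single fastq-dl call, here is a sketch that only builds the argument list and does not run anything. It assumes the fastq-dl 2.x `--accession` flag (check `fastq-dl --help` in the default Docker image); the exact command used inside the `fetch_sra_to_fastq` task may differ, and `build_fastq_dl_command` is a hypothetical name.

```python
import shlex


def build_fastq_dl_command(sra_accession: str,
                           fastq_dl_options: str = "--provider sra") -> list[str]:
    """Hypothetical sketch: assemble a fastq-dl argument list.

    Assumes fastq-dl 2.x accepts --accession; the free-form options string
    is appended after being split the way a POSIX shell would split it.
    """
    cmd = ["fastq-dl", "--accession", sra_accession]
    cmd += shlex.split(fastq_dl_options)
    return cmd
```

With the default options, `build_fastq_dl_command("SRR000001")` yields `["fastq-dl", "--accession", "SRR000001", "--provider", "sra"]`. Splitting with `shlex` keeps quoted values in `fastq_dl_options` intact as single arguments.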

The only required input for the SRA_Fetch workflow is a run accession: an SRA run accession beginning with "SRR", an ENA run accession beginning with "ERR", or a DRA run accession beginning with "DRR".
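A quick pre-check before launching the workflow can catch accessions of the wrong type. The sketch below assumes a run accession is one of the three prefixes above followed only by digits; `is_run_accession` is a hypothetical helper, not something the workflow itself runs.

```python
import re

# Run accessions: SRR (NCBI SRA), ERR (ENA), or DRR (DDBJ DRA), then digits.
RUN_ACCESSION = re.compile(r"(SRR|ERR|DRR)\d+")


def is_run_accession(accession: str) -> bool:
    """Return True if the string looks like an SRA/ENA/DRA run accession."""
    return RUN_ACCESSION.fullmatch(accession.strip()) is not None
```

For example, `is_run_accession("SRR1234567")` is `True`, while an experiment accession such as `"SRX1234567"` is rejected.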

Please see the NCBI Metadata and Submission Overview for assistance with identifying accessions. Briefly, NCBI-accessioned objects have the following naming scheme:

STUDY	SRP#
SAMPLE	SRS#
EXPERIMENT	SRX#
RUN	SRR#
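The naming scheme above can be turned into a small lookup that reports which kind of object an NCBI SRA accession refers to, which helps confirm that the input is a RUN before starting the workflow. This is a hypothetical sketch covering only the four prefixes listed in the table.

```python
import re

# Prefixes taken from the NCBI naming scheme above.
OBJECT_TYPES = {"SRP": "STUDY", "SRS": "SAMPLE", "SRX": "EXPERIMENT", "SRR": "RUN"}


def accession_type(accession: str) -> str:
    """Return the object type for an NCBI SRA accession, or 'UNKNOWN'."""
    match = re.fullmatch(r"([A-Z]{3})\d+", accession.strip().upper())
    if match is None:
        return "UNKNOWN"
    return OBJECT_TYPES.get(match.group(1), "UNKNOWN")
```

For example, `accession_type("SRX5678")` returns `"EXPERIMENT"`, signalling that a run accession under that experiment is needed instead.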

Outputs

Read data are available either with full, per-base quality scores (SRA Normalized Format) or with simplified quality scores (SRA Lite). In SRA Lite files, all quality scores have been artificially set to Q30 or Q3. More information about these formats can be found in NCBI's SRA documentation.
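FASTQ quality strings are commonly Phred+33 encoded, so each character's score is its ASCII code minus 33. Under that assumption, Q30 is written as `?` and Q3 as `$`, which is why SRA Lite quality lines contain only those two characters. A minimal decoding sketch:

```python
def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string (Phred+33 by default) to integer scores."""
    return [ord(ch) - offset for ch in quality]


# Characters for the two SRA Lite quality values under Phred+33.
Q30_CHAR = chr(30 + 33)  # '?'
Q3_CHAR = chr(3 + 33)    # '$'
```

For example, `phred_scores("??$?")` returns `[30, 30, 3, 30]`, whereas a full-resolution Illumina string such as `"I"` decodes to `[40]`.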

Because SRA Lite FASTQ files are of limited use, the workflow tries to avoid them by downloading from SRA directly (SRA Lite files are more likely to be the ones synced to other repositories). However, downloading SRA Lite files is sometimes unavoidable. To make the user aware of this, the fastq_dl_warning column is populated whenever an SRA Lite file is detected.
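To check a downloaded file yourself, a simple heuristic is to test whether every quality score in the first reads is either 30 or 3. The sketch below assumes a four-line-per-record FASTQ with Phred+33 qualities, optionally gzipped; `looks_like_sra_lite` is a hypothetical helper and is not the detection logic the workflow uses to populate fastq_dl_warning.

```python
import gzip
from itertools import islice


def looks_like_sra_lite(path: str, max_reads: int = 1000) -> bool:
    """Heuristic: True if every quality score in the first reads is Q30 or Q3.

    Assumes 4-line FASTQ records with Phred+33 qualities; .gz files are
    opened with gzip. Returns False if no quality lines are found.
    """
    opener = gzip.open if path.endswith(".gz") else open
    allowed = {chr(30 + 33), chr(3 + 33)}  # '?' (Q30) and '$' (Q3)
    seen_any = False
    with opener(path, "rt") as handle:
        # Quality strings are every 4th line, starting at line index 3.
        for line in islice(handle, 3, 4 * max_reads, 4):
            quality = line.rstrip("\r\n")
            if not quality:
                continue
            seen_any = True
            if not set(quality) <= allowed:
                return False  # any other score means full-resolution qualities
    return seen_any
```

Sampling only the first reads keeps the check fast on large files; a file that passes is very likely SRA Lite, although a tiny file could pass by chance.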

Variable	Type	Description	Production Status
read1	File	File containing the forward reads	Always produced
read2	File	File containing the reverse reads (not available for single-end or ONT data)	Produced only for paired-end data
fastq_dl_date	String	The date of download	Always produced
fastq_dl_docker	String	The Docker image used	Always produced
fastq_dl_metadata	File	File containing metadata for the provided accession, such as submission_accession, library_selection, and instrument_platform	Always produced
fastq_dl_version	String	The fastq-dl version used	Always produced
fastq_dl_warning	String	Populated if SRA Lite files are detected; in these files, all quality scores are encoded as Phred 30 or Phred 3	Produced only when SRA Lite files are detected
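The fastq_dl_metadata file can be inspected programmatically, for example to confirm the instrument_platform of a run. The sketch below assumes the file is tab-separated with a header row whose column names include the fields listed above; `read_fastq_dl_metadata` is a hypothetical helper.

```python
import csv


def read_fastq_dl_metadata(path: str) -> list[dict[str, str]]:
    """Load the metadata file as one dict per row, keyed by column name.

    Assumes a tab-separated file with a header row.
    """
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))
```

For example, `read_fastq_dl_metadata(path)[0]["instrument_platform"]` gives the sequencing platform recorded for the first row; use `row.get(...)` if a column may be absent.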

References

This workflow relies on fastq-dl, a bioinformatics tool developed by Robert A. Petit III.