Clair3 Variants

Quick Facts

| Workflow Type | Applicable Kingdom | Last Known Changes | Command-line Compatibility | Workflow Level |
|---|---|---|---|---|
| Phylogenetic Construction | Any taxa | PHB v3.0.0 | Yes | Sample-level |

Clair3_Variants_ONT

The Clair3_Variants workflow processes Oxford Nanopore Technologies (ONT) sequencing data to identify genetic variations compared to a reference genome. It combines minimap2's long-read alignment capabilities with Clair3's deep learning-based variant calling, designed specifically for ONT data characteristics. The workflow first aligns raw reads to a reference genome using ONT-optimized parameters, processes these alignments into sorted and indexed BAM files, and then employs Clair3's specialized models to detect variants including single nucleotide polymorphisms (SNPs) and insertions/deletions (indels). If enabled, the workflow can also identify longer indels and generate genome-wide variant calls in gVCF format for downstream analysis.

Clair3_Variants Workflow Diagram

Example Use Cases

  • Variant Discovery: Identify genetic variations in ONT sequencing data compared to a reference genome
  • SNP and Indel Detection: Accurately detect both small variants and longer indels
  • Population Studies: Generate standardized variant calls suitable for population-level analyses

Supported Clair3 Models

| Model | Chemistry | Source |
|---|---|---|
| r941_prom_sup_g5014 | R9.4.1 | Clair3 1.0.10 Release |
| r941_prom_hac_g360+g422 | R9.4.1 | Clair3 1.0.10 Release |
| r941_prom_hac_g238 | R9.4.1 | Clair3 1.0.10 Release |
| r1041_e82_400bps_sup_v500 | R10.4.1 | nanoporetech/rerio |
| r1041_e82_400bps_hac_v500 | R10.4.1 | nanoporetech/rerio |
| r1041_e82_400bps_sup_v410 | R10.4.1 | nanoporetech/rerio |
| r1041_e82_400bps_hac_v410 | R10.4.1 | nanoporetech/rerio |
| ont | Various | Legacy (Recommended for Guppy3 and Guppy4) |
| ont_guppy2 | Various | Legacy (For Guppy2 data) |
| ont_guppy5 | Various | Legacy (For Guppy5 data) |

The latest ONT models are downloaded from the nanoporetech/rerio GitHub repository. Please let us know if you would like us to add a model that is not listed here.
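
As a rough illustration of how the table above might be used, the sketch below checks a requested model name against the supported list and reports its chemistry. The dictionary simply mirrors the table; `SUPPORTED_MODELS` and `chemistry_for` are hypothetical names and are not part of the workflow.

```python
# Supported model names and their chemistries, copied from the table above.
SUPPORTED_MODELS = {
    "r941_prom_sup_g5014": "R9.4.1",
    "r941_prom_hac_g360+g422": "R9.4.1",
    "r941_prom_hac_g238": "R9.4.1",
    "r1041_e82_400bps_sup_v500": "R10.4.1",
    "r1041_e82_400bps_hac_v500": "R10.4.1",
    "r1041_e82_400bps_sup_v410": "R10.4.1",
    "r1041_e82_400bps_hac_v410": "R10.4.1",
    "ont": "Various",
    "ont_guppy2": "Various",
    "ont_guppy5": "Various",
}


def chemistry_for(model: str) -> str:
    """Return the chemistry listed for a model, or raise if it is not in the table."""
    try:
        return SUPPORTED_MODELS[model]
    except KeyError:
        raise ValueError(f"Unsupported Clair3 model: {model!r}") from None


# The workflow's default model (see the Inputs table).
print(chemistry_for("r941_prom_hac_g360+g422"))
```

A check like this can catch a mistyped model name before a run is submitted.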

Inputs

Note on Haploid Settings

Several parameters are set by default for haploid genome analysis:

  • clair3_disable_phasing is set to true since phasing is not relevant for haploid genomes
  • clair3_include_all_contigs is set to true to ensure complete genome coverage
  • clair3_enable_haploid_precise is set to true to only consider homozygous variants (1/1), which is appropriate for haploid genomes
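
To make the haploid defaults concrete, the sketch below maps the three Boolean inputs onto the Clair3 command-line options they most plausibly control. The flag names are real `run_clair3.sh` options, but the exact mapping inside the workflow task is an assumption, and `haploid_flags` is a hypothetical helper.

```python
def haploid_flags(disable_phasing=True,
                  include_all_contigs=True,
                  enable_haploid_precise=True):
    """Translate the workflow's haploid-related Booleans into Clair3 flags.

    Defaults mirror the workflow defaults listed above. The mapping to
    flags is assumed for illustration, not read from the task itself.
    """
    flags = []
    if disable_phasing:
        # Skip whatshap phasing during full-alignment calling.
        flags.append("--no_phasing_for_fa")
    if include_all_contigs:
        # Call variants on every contig in the reference.
        flags.append("--include_all_ctgs")
    if enable_haploid_precise:
        # Only 1/1 genotypes are reported as variants.
        flags.append("--haploid_precise")
    return flags


print(haploid_flags())
```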

| Terra Task Name | Variable | Type | Description | Default Value | Terra Status |
|---|---|---|---|---|---|
| clair3_variants_ont | clair3_cpu | Int | Number of CPUs to use | 4 | Optional |
| clair3_variants_ont | clair3_disable_phasing | Boolean | Disable whatshap phasing | true | Optional |
| clair3_variants_ont | clair3_disk_size | Int | Disk size in GB | 100 | Optional |
| clair3_variants_ont | clair3_docker | String | Docker container for task | us-docker.pkg.dev/general-theiagen/staphb/clair3:1.0.10 | Optional |
| clair3_variants_ont | clair3_enable_gvcf | Boolean | Output gVCF format | false | Optional |
| clair3_variants_ont | clair3_enable_haploid_precise | Boolean | Enable haploid precise calling, only 1/1 is considered as a variant | true | Optional |
| clair3_variants_ont | clair3_enable_long_indel | Boolean | Enable long indel calling | false | Optional |
| clair3_variants_ont | clair3_include_all_contigs | Boolean | Call variants on all contigs, should always be true for non-human samples | true | Optional |
| clair3_variants_ont | clair3_memory | Int | Memory allocation in GB | 8 | Optional |
| clair3_variants_ont | clair3_model | String | Model name for variant calling (see supported models for available options) | r941_prom_hac_g360+g422 | Optional |
| clair3_variants_ont | clair3_variant_quality | Int | Minimum variant quality score | 2 | Optional |
| clair3_variants_ont | read1 | File | ONT sequencing reads in FASTQ format | | Required |
| clair3_variants_ont | reference_genome_file | File | Reference genome in FASTA format | | Required |
| clair3_variants_ont | samplename | String | Name of Samples | | Required |
| minimap2 | cpu | Int | Number of CPUs to allocate to the task | 2 | Optional |
| minimap2 | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional |
| minimap2 | docker | String | Docker image used for this task. | "us-docker.pkg.dev/general-theiagen/staphb/minimap2:2.22" | Optional |
| minimap2 | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 8 | Optional |
| minimap2 | query2 | File | Internal component. Do not modify | None | Do not modify, Optional |
| sam_to_sorted_bam | cpu | Int | Number of CPUs to allocate to the task | 2 | Optional |
| sam_to_sorted_bam | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional |
| sam_to_sorted_bam | docker | String | Docker image used for this task. | "us-docker.pkg.dev/general-theiagen/staphb/samtools:1.17" | Optional |
| sam_to_sorted_bam | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 8 | Optional |
| samtools_faidx | cpu | Int | Number of CPUs to allocate to the task | 1 | Optional |
| samtools_faidx | disk_size | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional |
| samtools_faidx | docker | String | Docker image used for this task. | "us-docker.pkg.dev/general-theiagen/staphb/samtools:1.17" | Optional |
| samtools_faidx | memory | Int | Amount of memory/RAM (in GB) to allocate to the task | 8 | Optional |
| version_capture | docker | String | Docker container for versioning | us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0 | Optional |
| version_capture | timezone | String | Set the time zone to get an accurate date of analysis (uses UTC by default) | | Optional |
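
For command-line runs, inputs are typically supplied as a JSON file keyed by workflow and variable name. The sketch below builds such a file from the three required inputs plus any optional overrides. It assumes the workflow namespace is `clair3_variants_ont`, as the "Terra Task Name" column suggests; the file paths and the `build_inputs` helper are hypothetical.

```python
import json

# Assumed workflow namespace, taken from the "Terra Task Name" column.
WORKFLOW = "clair3_variants_ont"


def build_inputs(samplename, read1, reference_genome_file, **optional):
    """Return an inputs dictionary with the required inputs and any overrides."""
    inputs = {
        f"{WORKFLOW}.samplename": samplename,
        f"{WORKFLOW}.read1": read1,
        f"{WORKFLOW}.reference_genome_file": reference_genome_file,
    }
    # Optional inputs use the variable names from the table above.
    for name, value in optional.items():
        inputs[f"{WORKFLOW}.{name}"] = value
    return inputs


inputs = build_inputs(
    "sample01",                 # hypothetical sample name
    "sample01.fastq.gz",        # hypothetical read file
    "reference.fasta",          # hypothetical reference
    clair3_model="r1041_e82_400bps_sup_v500",
    clair3_enable_gvcf=True,
)
print(json.dumps(inputs, indent=2))
```

Any input left out of the file falls back to the default value listed in the table.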

Workflow Tasks

minimap2: Read Alignment

minimap2 aligns the ONT reads to the reference genome using long-read-specific flags (-L, --cs, --MD). The -L flag stores CIGAR strings with more than 65,535 operations in the CG tag, which very long reads can require, while --cs and --MD add tags that record the differences between each read and the reference. Together these options give the downstream steps detailed alignment information for long reads.
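
The alignment step can be sketched as a command line assembled in Python. The `-L`, `--cs`, and `--MD` flags come from the description above; the `map-ont` preset and the file names are assumptions, since the exact invocation lives in the task definition.

```python
import shlex


def minimap2_command(reference, reads, threads=2, preset="map-ont"):
    """Build a minimap2 command that writes SAM alignments to stdout."""
    return [
        "minimap2",
        "-a",            # output in SAM format
        "-x", preset,    # assumed ONT preset
        "-L",            # write long CIGARs (>65535 operations) to the CG tag
        "--cs",          # emit the cs tag
        "--MD",          # emit the MD tag
        "-t", str(threads),
        reference,
        reads,
    ]


cmd = minimap2_command("reference.fasta", "sample01.fastq.gz")
print(shlex.join(cmd))
```

Building the command as a list keeps each argument separate, so it can be passed to `subprocess.run` without shell quoting concerns.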

minimap2 Technical Details

| | Links |
|---|---|
| Task | task_minimap2.wdl |
| Software Source Code | minimap2 on GitHub |
| Software Documentation | minimap2 |
| Original Publication(s) | Minimap2: pairwise alignment for nucleotide sequences |

samtools: BAM Processing

The BAM processing step prepares the minimap2 alignments for variant calling. The task converts the SAM output to BAM format, sorts the BAM file by coordinate, and creates a BAM index file. Clair3 requires this sorted, indexed BAM as input.
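
The three steps described above correspond to three samtools subcommands. The sketch below only assembles the commands; the file naming scheme and the `bam_processing_commands` helper are hypothetical, and the actual task may combine or order the steps differently.

```python
def bam_processing_commands(sam, prefix, threads=2):
    """Return the samtools commands for SAM -> sorted, indexed BAM."""
    bam = f"{prefix}.bam"
    sorted_bam = f"{prefix}.sorted.bam"
    return [
        # 1. Convert SAM to BAM.
        ["samtools", "view", "-b", "-@", str(threads), "-o", bam, sam],
        # 2. Sort the BAM by coordinate (the samtools sort default).
        ["samtools", "sort", "-@", str(threads), "-o", sorted_bam, bam],
        # 3. Index the sorted BAM; this writes <sorted_bam>.bai.
        ["samtools", "index", sorted_bam],
    ]


for command in bam_processing_commands("sample01.sam", "sample01"):
    print(" ".join(command))
```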

samtools Technical Details

| | Links |
|---|---|
| Task | task_samtools.wdl |
| Software Source Code | samtools on GitHub |
| Software Documentation | samtools |
| Original Publication(s) | The Sequence Alignment/Map format and SAMtools; Twelve Years of SAMtools and BCFtools |

samtools faidx: Reference Genome Indexing

samtools faidx creates necessary index files for the reference. This indexing step is essential for enabling efficient random access to the reference sequence during variant calling.
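
To show why the index enables random access, the sketch below parses `.fai` records and computes where a given base sits in the FASTA file. The five tab-separated columns (name, length, offset, bases per line, bytes per line) follow the faidx format; the example record and the helper names are made up for illustration.

```python
from typing import NamedTuple


class FaiRecord(NamedTuple):
    name: str
    length: int      # total bases in the sequence
    offset: int      # byte offset of the first base in the FASTA file
    line_bases: int  # bases per line
    line_width: int  # bytes per line, including the newline


def parse_fai(text):
    """Parse .fai content into a dict of records keyed by sequence name."""
    records = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, length, offset, line_bases, line_width = line.split("\t")[:5]
        records[name] = FaiRecord(
            name, int(length), int(offset), int(line_bases), int(line_width)
        )
    return records


def byte_offset(record, position):
    """Byte offset in the FASTA file of a 0-based position in the sequence."""
    full_lines, remainder = divmod(position, record.line_bases)
    return record.offset + full_lines * record.line_width + remainder


# Illustrative record: 1000 bases, 60 per line, sequence starts at byte 6.
index = parse_fai("contig_1\t1000\t6\t60\t61\n")
print(byte_offset(index["contig_1"], 60))  # first base of the second line
```

With the offset known, a reader can seek directly to the requested region instead of scanning the whole reference.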

samtools Technical Details

| | Links |
|---|---|
| Task | task_samtools.wdl |
| Software Source Code | samtools on GitHub |
| Software Documentation | samtools |
| Original Publication(s) | The Sequence Alignment/Map format and SAMtools; Twelve Years of SAMtools and BCFtools |

Clair3: Variant Calling

Clair3 performs deep learning-based variant detection using a multi-stage approach. The process begins with pileup-based calling for initial variant identification, followed by full-alignment analysis for comprehensive variant detection. Results are merged into a final high-confidence call set.

The variant calling pipeline employs specialized neural networks trained on ONT data to accurately identify:

  • Single nucleotide variants (SNVs)
  • Small insertions and deletions (indels)
  • Structural variants
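
A Clair3 run for this kind of analysis might be assembled roughly as follows. The option names are real `run_clair3.sh` options, but which ones the task passes, and where the model directory lives inside the container, are assumptions; the paths shown are placeholders.

```python
def clair3_command(bam, reference, model_path, output_dir, sample_name,
                   threads=4, min_qual=2, gvcf=False, long_indel=False,
                   extra_flags=()):
    """Build a run_clair3.sh command for ONT data (illustrative only)."""
    cmd = [
        "run_clair3.sh",
        f"--bam_fn={bam}",             # sorted, indexed BAM
        f"--ref_fn={reference}",       # FASTA with a .fai index alongside
        f"--threads={threads}",
        "--platform=ont",
        f"--model_path={model_path}",  # directory holding the chosen model
        f"--output={output_dir}",
        f"--sample_name={sample_name}",
        f"--qual={min_qual}",          # minimum variant quality
    ]
    if gvcf:
        cmd.append("--gvcf")
    if long_indel:
        cmd.append("--enable_long_indel")
    cmd.extend(extra_flags)            # e.g. haploid-related options
    return cmd


cmd = clair3_command(
    "sample01.sorted.bam",
    "reference.fasta",
    "/models/r941_prom_hac_g360+g422",  # placeholder model location
    "clair3_out",
    "sample01",
    extra_flags=["--include_all_ctgs", "--haploid_precise", "--no_phasing_for_fa"],
)
print(" \\\n  ".join(cmd))
```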

Clair3 Technical Details

| | Links |
|---|---|
| Task | task_clair3.wdl |
| Software Source Code | Clair3 on GitHub |
| Software Documentation | Clair3 Documentation |
| Original Publication(s) | Symphonizing pileup and full-alignment for deep learning-based long-read variant calling |

Outputs

| Variable | Type | Description |
|---|---|---|
| aligned_bam | File | Sorted BAM file containing the minimap2 alignments of reads to the reference genome |
| aligned_bai | File | Index file for the aligned BAM |
| aligned_fai | File | Index file for the reference genome |
| clair3_docker_image | String | Version of the Docker container used for Clair3 variant calling |
| clair3_model_used | String | Name of the Clair3 model used for variant calling |
| clair3_variants_vcf | File | Final merged VCF file containing high-confidence variant calls, combining results from both pileup and full-alignment approaches |
| clair3_variants_gvcf | File | Optional genome VCF file containing information about all genomic positions, including non-variant sites |
| clair3_variants_wf_version | String | Version of the PHB workflow used |
| clair3_version | String | Clair3 Version being used |
| samtools_version | String | Version of samtools used for BAM processing |
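
As a quick downstream check on clair3_variants_vcf, the sketch below counts SNVs, insertions, and deletions above a quality threshold using only the standard VCF columns. The records are made-up examples, and `classify_variant` and `count_variants` are hypothetical helpers rather than workflow outputs.

```python
from collections import Counter


def classify_variant(ref, alt):
    """Classify one REF/ALT pair by allele length."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(ref) < len(alt):
        return "insertion"
    if len(ref) > len(alt):
        return "deletion"
    return "other"  # equal-length multi-base substitution


def count_variants(vcf_lines, min_qual=2.0):
    """Count variant types in an iterable of VCF lines, skipping low-QUAL records."""
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # header or blank line
        fields = line.rstrip("\n").split("\t")
        ref, alts, qual = fields[3], fields[4], fields[5]
        if qual != "." and float(qual) < min_qual:
            continue
        for alt in alts.split(","):
            if alt in (".", "*") or alt.startswith("<"):
                continue  # no call, spanning deletion, or symbolic allele
            counts[classify_variant(ref, alt)] += 1
    return counts


# Made-up records for illustration only.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample01",
    "contig_1\t10\t.\tA\tG\t30.5\tPASS\t.\tGT\t1/1",
    "contig_1\t20\t.\tAT\tA\t12\tPASS\t.\tGT\t1/1",
    "contig_1\t30\t.\tC\tCGG\t1.5\tLowQual\t.\tGT\t1/1",
]
print(count_variants(example))
```

Because the function accepts any iterable of lines, a compressed VCF can be passed in via `gzip.open(path, "rt")`.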