Coverage report
The coverage report lists every gene and its percent gene coverage over a minimum depth (default: 10) relative to the H37Rv genome; regions are determined by using the --coverage_bed input file. This report is useful for determining whether any genes of interest have low coverage that may impact the reliability of resistance calls.
Coverage report columns
ERR columns only appear if an --err_coverage_bed input file is provided. Please see the Inputs for more details.
| Column name | Explanation |
|---|---|
| sample_name | The name of the sample |
| locus_tag | The locus tag for the region |
| gene_name | The name of the gene |
| percent_coverage | The percentage of a region (specified by the --coverage_bed input file) that has a read depth over the minimum value (default: 10; user-customizable by altering --min_depth) |
| average_depth | The average read depth across that region (specified by the --coverage_bed input file) |
| err_percent_coverage | The percentage of the region (specified by the --err_coverage_bed input file) that has a read depth over the minimum value (default: 10; user-customizable by altering --min_depth). Only appears if --err_coverage_bed is provided |
| err_average_depth | The average read depth across a region (specified by the --err_coverage_bed input file). Only appears if --err_coverage_bed is provided |
| qc_warning | Indicates if any deletions were identified in the gene which may contribute to lower than expected coverage |
Locus vs Target coverage reports
Each line/entry in your BED file defines a BedRecord — a stretch of the genome where tbp-parser counts reads and calculates coverage. How you define these regions directly controls what gets measured and what appears in your coverage reports.
- The locus coverage report aggregates all BedRecords that share the same locus tag and considers them as a single unit.
- The target coverage report keeps each BedRecord as its own separate unit.
Which report gets generated?
If any BedRecords share the same locus tag, both the locus and target coverage reports will be generated. Otherwise, only the locus coverage report will be generated.
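The rule above can be sketched in a few lines of Python. This is a minimal illustration, not tbp-parser's implementation: the five-column BED layout (chromosome, start, end, locus tag, gene name) is assumed from the examples on this page, and `parse_bed` and `reports_to_generate` are hypothetical helper names.

```python
from collections import defaultdict


def parse_bed(text):
    """Parse lines of 'chrom start end locus_tag gene_name' into tuples."""
    records = []
    for line in text.strip().splitlines():
        chrom, start, end, locus_tag, gene_name = line.split()
        records.append((chrom, int(start), int(end), locus_tag, gene_name))
    return records


def reports_to_generate(records):
    """Return the report names implied by the locus tags in the BED records."""
    by_locus = defaultdict(list)
    for record in records:
        by_locus[record[3]].append(record)
    # Both reports are produced as soon as one locus tag is shared.
    shared = any(len(group) > 1 for group in by_locus.values())
    return ["locus", "target"] if shared else ["locus"]


single = parse_bed("Chrom 100 200 Rv0005 gyrB")
split = parse_bed("Chrom 100 200 Rv0667 rpoB_1\nChrom 300 400 Rv0667 rpoB_2")

print(reports_to_generate(single))  # ['locus']
print(reports_to_generate(split))   # ['locus', 'target']
```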
QC uses locus-level coverage
All coverage-based QC determinations use the locus-level breadth of coverage.
For example, if primer 1 for katG has poor coverage but primer 2 for katG has sufficient coverage, QC checks use the overall ratio of the aggregated coverage, not the individual primer regions.
Example 1: one target region in BED file
When a BED file contains a single BedRecord (one gene name for one locus tag),
only the locus coverage report is produced:
locus_coverage_report.csv:
sample_name locus_tag gene_name percent_coverage average_depth qc_warning
sample01 Rv0005 gyrB 100.000 542.310
→ percent_coverage = (num_positions ≥ min_depth) / num_positions
→ average_depth = (sum of reads at every position) / num_positions
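As a worked illustration of the two formulas, the sketch below computes both values from a list of per-position depths. The function name `coverage_stats` and the sample depths are hypothetical, and the ratio is multiplied by 100 on the assumption that this is how the percentages shown in the report are obtained.

```python
def coverage_stats(depths, min_depth=10):
    """Return (percent_coverage, average_depth) for one region.

    depths: read depth at each position of the region, one integer per position.
    A position counts as covered when its depth is >= min_depth, following
    the formula above.
    """
    num_positions = len(depths)
    if num_positions == 0:
        return 0.0, 0.0
    covered = sum(1 for d in depths if d >= min_depth)
    percent_coverage = 100.0 * covered / num_positions
    average_depth = sum(depths) / num_positions
    return percent_coverage, average_depth


# Hypothetical five-position region; one position (depth 9) is below the threshold.
pct, avg = coverage_stats([12, 15, 9, 30, 10])
print(f"{pct:.3f} {avg:.3f}")  # 80.000 15.200
```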
Example 2: split regions
A gene can be covered by multiple non-overlapping target regions (common in tNGS), resulting in multiple BedRecords with the same locus tag but different gene names and a gap between them:
BED file:
Chrom 100 200 Rv0667 rpoB_1
Chrom 300 400 Rv0667 rpoB_2
├── rpoB_1 ──┤          ├── rpoB_2 ──┤
100        200          300        400
                  ↑
            (gap: 201–299)
Because there are more targets (2) than loci (1), both reports are generated:
locus_coverage_report.csv:
sample_name locus_tag gene_name percent_coverage average_depth qc_warning
sample01 Rv0667 rpoB 95.000 280.500
↑ aggregated across positions 100–200 AND 300–400 (202 total positions)
target_coverage_report.csv:
sample_name locus_tag gene_name percent_coverage average_depth qc_warning
sample01 Rv0667 rpoB_1 100.000 350.100
sample01 Rv0667 rpoB_2 90.000 210.900
↑ each `BedRecord` measured independently
Note that the gap (positions 201–299) is not measured at all and no reads are counted there. The locus report aggregates only the positions defined by the BedRecords (100–200 and 300–400), meaning locus-level breadth of coverage reflects only the regions defined in the BED file, not the entire gene.
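A small sketch of this aggregation, assuming the BED coordinates are treated as inclusive (which matches the 202 positions counted above). `locus_stats` and the depth profile are hypothetical; the point is only that positions in the gap are never queried.

```python
def locus_stats(regions, depth_at, min_depth=10):
    """Aggregate coverage over the positions listed in `regions` only.

    regions:  list of non-overlapping (start, end) intervals, treated as inclusive.
    depth_at: function mapping a genomic position to its read depth.
    Returns (num_positions, percent_coverage, average_depth).
    """
    positions = []
    for start, end in regions:
        positions.extend(range(start, end + 1))
    depths = [depth_at(p) for p in positions]
    num_positions = len(depths)
    covered = sum(1 for d in depths if d >= min_depth)
    return num_positions, 100.0 * covered / num_positions, sum(depths) / num_positions


# Hypothetical depth profile: 20 reads everywhere, dropping to 0 from position 391.
def depth_at(position):
    return 0 if position >= 391 else 20


n, pct, avg = locus_stats([(100, 200), (300, 400)], depth_at)
print(n)  # 202
print(f"{pct:.3f} {avg:.3f}")
```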
Example 3: overlapping regions
A gene can be covered by multiple overlapping target regions (common in tNGS), resulting in multiple BedRecords with the same locus tag, different gene names, and overlapping coordinates:
BED file:
Chrom 1 200 Rv1908c katG1
Chrom 150 400 Rv1908c katG2
├──── katG1 ────┤
1             200
            ├────── katG2 ──────┤
            150               400
Because there are more targets (2) than loci (1), both reports are generated:
locus_coverage_report.csv:
sample_name locus_tag gene_name percent_coverage average_depth qc_warning
sample01 Rv1908c katG 100.000 3471.463
↑ aggregated across ALL positions 1–400
target_coverage_report.csv:
sample_name locus_tag gene_name percent_coverage average_depth qc_warning
sample01 Rv1908c katG1 100.000 2202.284
sample01 Rv1908c katG2 100.000 3556.657
↑ each `BedRecord` measured independently (positions 1–200 for katG1, 150–400 for katG2)
The locus coverage report combines all reads from katG1 and katG2 into a single measurement across positions 1–400. The target report calculates coverage for each BedRecord separately.
Example 4: resolving overlapping regions
In Example 3 above, katG1 and katG2 share an overlapping region (positions 150–200). Without overlap resolution, a read spanning that region gets counted in both BedRecords — inflating the locus coverage when they're aggregated together.
With --resolve_overlapping_regions enabled, tbp-parser identifies the reads that appear in the non-overlapping portion of each BedRecord and uses their names as a whitelist:
katG1: ├──────────────────────┤
       1                    200
                    ├ overlap ┤
katG2:              ├───────────────────────┤
                    150                   400
Non-overlapping katG1: 1 ─── 149 (unique to katG1)
Overlap zone: 150 ─── 200 (shared)
Non-overlapping katG2: 201 ─── 400 (unique to katG2)
For each BedRecord, tbp-parser will:
- Find all reads that appear in the non-overlapping portion of that BedRecord
- Use those read names as a whitelist
- Reanalyze all positions (including the overlap zone), keeping only the whitelisted reads
This prevents double-counting when the BedRecords are aggregated together in the locus coverage report. The impact is visible in both reports:
WITHOUT --resolve_overlapping_regions:
locus_coverage_report.csv:
sample_name locus_tag gene_name percent_coverage average_depth qc_warning
sample01 Rv1908c katG 100.000 3471.463 ← inflated by double-counted reads
target_coverage_report.csv:
sample_name locus_tag gene_name percent_coverage average_depth qc_warning
sample01 Rv1908c katG1 100.000 2202.284 ← inflated by double-counted reads
sample01 Rv1908c katG2 100.000 3556.657 ← inflated by double-counted reads
WITH --resolve_overlapping_regions:
locus_coverage_report.csv:
sample_name locus_tag gene_name percent_coverage average_depth qc_warning
sample01 Rv1908c katG 100.000 2683.554 ← only whitelisted reads
target_coverage_report.csv:
sample_name locus_tag gene_name percent_coverage average_depth qc_warning
sample01 Rv1908c katG1 100.000 2068.351 ← only whitelisted reads
sample01 Rv1908c katG2 100.000 2766.803 ← only whitelisted reads
Trade-off: short reads and large overlaps
Note that overlap resolution may reduce coverage in both reports. Reads that fall entirely within an overlapping region, without extending into any unique region, are excluded because they cannot be confidently assigned to a specific BedRecord. This means that larger overlaps relative to the read length result in more reads being excluded. This is a conservative and more accurate approach for handling tNGS data with overlapping regions. Consider this trade-off when deciding whether to enable --resolve_overlapping_regions.
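The whitelist steps and the trade-off can be illustrated with a toy model. This is a sketch under stated assumptions, not tbp-parser's code: reads are represented as hypothetical `(name, start, end)` tuples with inclusive coordinates, and `unique_positions`, `whitelist`, and `depth_at` are illustrative helper names.

```python
def unique_positions(record, others):
    """Positions of `record` that no other BedRecord covers."""
    start, end = record
    taken = set()
    for o_start, o_end in others:
        taken.update(range(o_start, o_end + 1))
    return {p for p in range(start, end + 1) if p not in taken}


def whitelist(record, others, reads):
    """Names of reads that touch the non-overlapping portion of `record`."""
    unique = unique_positions(record, others)
    return {
        name
        for name, r_start, r_end in reads
        if any(p in unique for p in range(r_start, r_end + 1))
    }


def depth_at(position, reads, allowed=None):
    """Depth at one position, optionally restricted to whitelisted read names."""
    return sum(
        1
        for name, r_start, r_end in reads
        if r_start <= position <= r_end and (allowed is None or name in allowed)
    )


katG1, katG2 = (1, 200), (150, 400)
reads = [
    ("a", 100, 180),  # reaches the unique part of katG1
    ("b", 160, 190),  # lies entirely inside the overlap zone
    ("c", 170, 260),  # reaches the unique part of katG2
]

wl1 = whitelist(katG1, [katG2], reads)
wl2 = whitelist(katG2, [katG1], reads)
print(wl1, wl2)                     # {'a'} {'c'}
print(depth_at(175, reads))         # 3 (every read counted)
print(depth_at(175, reads, wl1))    # 1 (katG1 after resolution)
print(depth_at(175, reads, wl2))    # 1 (katG2 after resolution)
```

Read `b` never reaches a unique region, so it is dropped from both BedRecords, which is the coverage reduction described above.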
A visual example
This example shows how the reads associated with each BedRecord are whitelisted based on their presence in the non-overlapping regions when --resolve_overlapping_regions is enabled.
QC warnings in coverage reports
A deletion is considered valid if:
- It is a deletion within the ORF (open reading frame)
- It passes positional QC (sufficient depth, frequency, and read support)
- Its genomic coordinates overlap with the coverage region (either locus or target) being measured
Valid deletions are reported in the qc_warning column and may explain why a gene's coverage is lower than expected.
How --err_coverage_bed affects deletion warnings
The qc_warning column reports deletions differently depending on whether ERR coverage is active:
- Without --err_coverage_bed: All valid deletions in the full target/locus region are reported
- With --err_coverage_bed: Only valid deletions that fall within the ERR region are reported
A deletion that exists in the full locus/target region but falls outside the ERR region will not appear in qc_warning when an --err_coverage_bed file is provided.
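The filtering described here can be sketched as follows. Everything in this example is illustrative: the `Deletion` class, the coordinates, and the deletion names are hypothetical, and "falls within the ERR region" is modeled as a simple interval overlap, which is an assumption rather than a documented rule.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Region = Tuple[int, int]  # inclusive (start, end) genomic coordinates


@dataclass
class Deletion:
    name: str
    start: int
    end: int
    in_orf: bool
    passes_positional_qc: bool


def overlaps(a: Region, b: Region) -> bool:
    """True when two inclusive intervals share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


def reported_deletions(deletions, full_region: Region, err_region: Optional[Region] = None):
    """Names of the deletions that would be listed in qc_warning for one region."""
    # Assumption: when an ERR region is supplied, it replaces the full region
    # as the interval a deletion must overlap.
    scope = err_region if err_region is not None else full_region
    return [
        d.name
        for d in deletions
        if d.in_orf and d.passes_positional_qc and overlaps((d.start, d.end), scope)
    ]


# Hypothetical deletions and coordinates, for illustration only.
deletions = [
    Deletion("del_inside_err", 150, 160, True, True),
    Deletion("del_outside_err", 350, 360, True, True),
    Deletion("del_failed_qc", 155, 158, True, False),
]
full = (100, 400)
err = (140, 200)

print(reported_deletions(deletions, full))       # ['del_inside_err', 'del_outside_err']
print(reported_deletions(deletions, full, err))  # ['del_inside_err']
```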
The --use_err_for_qc flag does not affect the coverage report
The --use_err_for_qc flag controls whether ERR regions are used in place of full regions for locus QC decisions (e.g., determining "Insufficient Coverage"). The coverage report's qc_warning column is controlled solely by whether --err_coverage_bed is provided — if it is, only deletions within the ERR region are reported regardless of the --use_err_for_qc setting.
Deletions are shown only for the region they overlap. Consider rpoB split across two BedRecords, with two deletions:
- c.727_728delGTinsAC: falls within rpoB_1
- c.1291_1292delAGinsCC: falls within rpoB_2
In the locus coverage report, all deletions across both BedRecords are aggregated into a single warning for the locus:
locus_coverage_report.csv:
sample_name locus_tag gene_name percent_coverage average_depth qc_warning
sample01 Rv0667 rpoB 85.000 280.500 Contains valid deletion(s) in full locus region: c.727_728delGTinsAC; c.1291_1292delAGinsCC
In the target coverage report, each deletion only appears on the specific BedRecord it overlaps with:
target_coverage_report.csv:
sample_name locus_tag gene_name percent_coverage average_depth qc_warning
sample01 Rv0667 rpoB_1 90.000 350.100 Contains valid deletion(s) in full target region: c.727_728delGTinsAC
sample01 Rv0667 rpoB_2 80.000 210.900 Contains valid deletion(s) in full target region: c.1291_1292delAGinsCC
The target coverage report lets you see exactly which region contains which deletion, while the locus coverage report gives the combined picture.
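The difference between the two reports comes down to which regions each deletion is tested against. The sketch below reproduces the warning text from the example; the genomic coordinates are invented for illustration, and `qc_warning` is a hypothetical helper rather than a tbp-parser function.

```python
def qc_warning(deletions, regions, scope):
    """Build a warning string for the deletions overlapping any given region.

    deletions: list of (name, start, end) with inclusive coordinates.
    regions:   list of (start, end) inclusive intervals.
    scope:     "locus" or "target", used only in the message text.
    """
    hits = [
        name
        for name, d_start, d_end in deletions
        if any(d_start <= r_end and r_start <= d_end for r_start, r_end in regions)
    ]
    if not hits:
        return ""
    return f"Contains valid deletion(s) in full {scope} region: " + "; ".join(hits)


# Hypothetical coordinates; only the deletion names come from the rpoB example.
targets = {"rpoB_1": (100, 200), "rpoB_2": (300, 400)}
deletions = [
    ("c.727_728delGTinsAC", 150, 151),
    ("c.1291_1292delAGinsCC", 350, 351),
]

# Target report: each BedRecord is checked on its own.
for gene_name, region in targets.items():
    print(gene_name, "|", qc_warning(deletions, [region], "target"))

# Locus report: all BedRecords of the locus are checked together.
print("rpoB |", qc_warning(deletions, list(targets.values()), "locus"))
```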
Customizing column names
To overwrite any of the output column names or text in the coverage reports, please use the following format in a configuration file or use the command-line parameter --find_and_replace:
FIND_AND_REPLACE:
"gene_name": "My_Gene_Column"
"percent_coverage": "My_Percent_Coverage_Column"
...
Please note that this will rename every instance of that text in all output reports (every instance of "gene_name" will be renamed to "My_Gene_Column" in all output files, etc.).
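To make the scope of this replacement concrete, here is a minimal sketch that assumes plain substring replacement; tbp-parser's actual matching may differ. `apply_find_and_replace` is a hypothetical function and the report text is made up.

```python
def apply_find_and_replace(text, mapping):
    """Replace every occurrence of each key with its value, in mapping order."""
    for find, replace in mapping.items():
        text = text.replace(find, replace)
    return text


mapping = {"gene_name": "My_Gene_Column"}

# The header is renamed as intended...
report = "sample_name,locus_tag,gene_name\nsample01,Rv0005,gyrB\n"
print(apply_find_and_replace(report, mapping))

# ...but so is the same text anywhere else it appears.
note = "See the gene_name column for details."
print(apply_find_and_replace(note, mapping))  # See the My_Gene_Column column for details.
```

Under this assumption, choosing a search string that only occurs where a rename is wanted avoids unintended changes elsewhere in the outputs.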
