Skip to content

Getting Started

Installation

Docker

We highly recommend using the following Docker iamge to run tbp-parser:

docker pull us-docker.pkg.dev/general-theiagen/theiagen/tbp-parser:2.3.0 #(1)!
  1. We host our Docker images on the Google Artifact Registry so that they are always availble for usage.

The entrypoint for this Docker image is the tbp-parser help message. To run this container interactively, you can use the following command:

docker run -it --entrypoint=/bin/bash us-docker.pkg.dev/general-theiagen/theiagen/tbp-parser:2.1.1

# Once inside the container interactively, you can run the tbp-parser tool
python3 /tbp-parser/tbp_parser/tbp_parser.py -v
# 2.3.0

Locally with Python

tbp-parser is not yet available with pip or conda. To run tbp-parser in your local command-line environment, install the following dependencies:

  • python3
  • pandas >= 1.4.2
  • importlib_resources
  • samtools

After installation of these dependencies, download and extract the latest release of tbp-parser and run the script with python3.

Usage

Example Usage

This shows how the script can be run if used inside the Docker container provided above.

python3 /tbp-parser/tbp_parser/tbp_parser.py \
    /path/to/data/tbprofiler_output.json \
    /path/to/data/tbprofiler_output.bam \
    -o "example" \
    --min_depth 12 \
    --min_frequency 0.9 \
    --sequencing_method "Illumina NextSeq" \
    --operator "John Doe" 

Please note that the BAM file must have the accompanying BAI file in the same directory.

Help Message

The help message printed by tbp-parser is quite extensive, but has a lot of useful information regarding the input parameters. Here is the entire message in full. You can find more information regarding these inputs in the Inputs section.

usage: python3 /tbp-parser/tbp_parser/tbp_parser.py [-h|-v] <input_json> <input_bam> [<args>]

Parses Jody Phelon's TB-Profiler JSON output into four files:
- a Laboratorian report,
- a LIMS report
- a Looker report, and
- a coverage report

positional arguments:
  input_json
          the JSON file produced by TBProfiler
  input_bam
          the BAM file produced by TBProfiler

optional arguments:
  -h, --help
          show this help message and exit
  -v, --version
          show program's version number and exit

quality control arguments:
  options that determine what passes QC

  -d, --min_depth
          the minimum depth of coverage for a site to pass QC
          default=10
  -c, --min_percent_coverage
          the minimum percentage of a region that has depth above the threshold set by min_depth
            (used for a gene/locus to pass QC)
          default=100
  -s, --min_read_support
          the minimum read support for a mutation to pass QC
          default=10
  -f, --min_frequency
          the minimum frequency for a mutation to pass QC (0.1 -> 10%)
          default=0.1
  -r, --coverage_regions
          the BED file containing the regions to calculate percent coverage for
          default=data/tbdb-modified-regions.bed

text arguments:
  arguments that are used verbatim in the reports or to name the output files

  -m, --sequencing_method
          the sequencing method used to generate the data; used in the LIMS & Looker reports
          ** Enclose in quotes if includes a space
          default="Sequencing method not provided"
  -p, --operator
          the operator who ran the sequencing; used in the LIMS & Looker reports
          ** Enclose in quotes if includes a space
          default="Operator not provided"
  -o, --output_prefix
          the output file name prefix
          ** Do not include any spaces

tNGS-specific arguments:
  options that are primarily used for tNGS data
  (all frequency arguments are compatible with WGS data)

  --tngs
          indicates that the input data was generated using Deeplex + CDPH modified protocol
          Turns on tNGS-specific global parameters
  --tngs_expert_regions
          the BED file containing the regions to calculate coverage for expert rule regions
            (used to determine coverage quality in the regions where resistance-conferring
            mutations are found, or where a CDC expert rule is applied; not for QC)
          default=data/tngs-expert-rule-regions.bed
  --rrs_frequency
          the minimum frequency for an rrs mutation to pass QC
            (rrs has several problematic sites in the Deeplex tNGS assay)
          default=0.1
  --rrl_frequency
          the minimum frequency for an rrl mutation to pass QC
            (rrl has several problematic sites in the Deeplex tNGS assay)
          default=0.1
  --rpob449_frequency
          the minimum frequency for an rpoB mutation at protein position 449 to pass QC
            (this is a problematic site in the Deeplex tNGS assay)
          default=0.1
  --etha237_frequency
          the minimum frequency for an ethA mutation at protein position 237 to pass QC
            (this is a problematic site in the Deeplex tNGS assay)
          default=0.1

logging arguments:
  options that change the verbosity of the stdout log

  --verbose
          increase output verbosity
  --debug
          increase output verbosity to debug; overwrites --verbose

Please contact support@theiagen.com with any questions