Command-line Arguments
The inputs on this page are the parameters that apply to the command-line tool. To see the inputs required for `tbp-parser` when it is run as part of the TheiaProk workflow series, please refer to the TheiaProk Inputs page.
Required Inputs
`tbp-parser` is designed to run immediately after Jody Phelan's TBProfiler tool. Only three inputs are required, all of which are produced by TBProfiler: the JSON file, the BAM file, and the BAI index file that accompanies the BAM file.
The JSON file contains information about the mutations detected in the sample: their quality, their type, and whether each mutation confers resistance to an antimicrobial drug. The BAM file contains the alignment information for the sample and is needed to determine sequencing quality.
Parameter | Description |
---|---|
input_json | The path to the JSON file that was produced by TBProfiler |
input_bam | The path to the BAM file that was produced by TBProfiler |
BAM index file required
The BAM file must be accompanied by its BAI index file in the same directory. The BAI file must have exactly the same name as the BAM file, ending with a `.bai` suffix.
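The index requirement above can be checked before running `tbp-parser`. The following Python sketch is a hypothetical helper (`find_bam_index` is not part of `tbp-parser`) that looks for the BAI file next to the BAM file. Because "ending with a `.bai` suffix" can describe either `sample.bam.bai` (the name `samtools index` writes by default) or `sample.bai`, it checks both; which of these `tbp-parser` itself accepts is an assumption here.

```python
from pathlib import Path
from typing import Optional


def find_bam_index(bam_path: str) -> Optional[Path]:
    """Return the BAI file that sits next to ``bam_path``, or None if none exists.

    Hypothetical helper: checks both common naming conventions.
    """
    bam = Path(bam_path)
    candidates = [
        bam.with_name(bam.name + ".bai"),  # sample.bam -> sample.bam.bai
        bam.with_suffix(".bai"),           # sample.bam -> sample.bai
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

If the helper returns `None`, running `samtools index sample.bam` creates `sample.bam.bai` in the same directory as the BAM file.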
Optional Inputs
`tbp-parser` can be customized with a number of optional input parameters. These parameters control the quality control thresholds, the text that appears in the reports, and the names of the output files. The following sections list all of the input parameters that can be used with `tbp-parser`.
In addition to these arguments, `tbp-parser` has a `-h, --help` argument that prints the list of possible arguments and their descriptions, and a `-v, --version` argument that prints the installed version of `tbp-parser`. Both arguments exit the program after printing their output.
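This exit behavior mirrors what Python's `argparse` does for its built-in help and `version` actions. The sketch below assumes an `argparse`-style parser, which is not confirmed here; `build_parser` and `exit_code` are hypothetical, the parser is not `tbp-parser`'s actual parser, and `0.0.0` is a placeholder version. It shows that both options print their output and then raise `SystemExit` with status 0.

```python
import argparse
from typing import List, Optional, Union


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser; only the -v/--version option is modelled here."""
    parser = argparse.ArgumentParser(prog="tbp-parser")  # adds -h/--help automatically
    parser.add_argument(
        "-v", "--version", action="version", version="%(prog)s 0.0.0"  # placeholder version
    )
    return parser


def exit_code(argv: List[str]) -> Optional[Union[int, str]]:
    """Return the SystemExit code raised while parsing argv, or None if parsing finished."""
    try:
        build_parser().parse_args(argv)
    except SystemExit as exc:
        return exc.code
    return None
```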
Configuration File
Instead of providing the input parameters on the command line, you can provide them in a YAML configuration file passed with the `--config` argument. Values in the configuration file override all command-line arguments except `--verbose` and `--debug`.
The configuration file can also be used to override the global variables that `tbp-parser` uses. The available global variables are listed on the Global Variables page.
To override a variable, use the following format in the configuration file. Variable names are case-sensitive.
```yaml
# this variable is found in the globals.py file
GENES_FOR_LIMS:
- "rpoB"
- "inhA"
- "pncA"
# although this variable can be set with an input parameter, it must be in uppercase here as it appears in the globals.py file
MIN_DEPTH: 15
# these command-line input parameters are not found in the globals.py file so they are indicated in lowercase
add_cs_lims: True
output_prefix: "Test"
coverage_regions: "/path/to/file"
# only variables that can be found in either globals.py or in the command-line arguments will be used
extra_variable: "This will be ignored"
```
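The precedence rules described above can be summarized as a merge step. The sketch below is a hypothetical illustration, not `tbp-parser`'s code. It assumes the YAML has already been parsed into a dictionary (for example with PyYAML's `yaml.safe_load`), that keys exactly matching a global variable name (case-sensitive) override that global, that other keys override a same-named command-line argument unless it is `verbose` or `debug`, and that everything else is ignored.

```python
from typing import Any, Dict, Set, Tuple

# Per the documentation, the config file cannot override these two options.
PROTECTED_OPTIONS = {"verbose", "debug"}


def apply_config(
    cli_args: Dict[str, Any],
    config: Dict[str, Any],
    global_names: Set[str],
) -> Tuple[Dict[str, Any], Dict[str, Any], Set[str]]:
    """Merge parsed YAML into CLI arguments and global overrides (hypothetical).

    Returns (merged_args, global_overrides, ignored_keys).
    """
    merged = dict(cli_args)  # leave the caller's dictionary untouched
    global_overrides: Dict[str, Any] = {}
    ignored: Set[str] = set()
    for key, value in config.items():
        if key in global_names:  # exact, case-sensitive match, e.g. MIN_DEPTH
            global_overrides[key] = value
        elif key in cli_args and key not in PROTECTED_OPTIONS:
            merged[key] = value  # config value replaces the command-line value
        else:
            ignored.add(key)  # unknown keys, plus verbose/debug, are skipped
    return merged, global_overrides, ignored
```

Under these assumptions, the example file above would turn `MIN_DEPTH` and `GENES_FOR_LIMS` into global overrides, replace the command-line values of `add_cs_lims`, `output_prefix`, and `coverage_regions`, and ignore `extra_variable`.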
Quality Control Arguments
These options determine the thresholds for quality control.
Short Version | Long Version | Description | Default Value |
---|---|---|---|
-d | --min_depth | The minimum depth of coverage required for a site to pass QC | 10 |
-c | --min_percent_coverage | The minimum percentage of a region that must have depth above the threshold set by --min_depth for the gene/locus to pass QC | 100 |
-s | --min_read_support | The minimum read support for a mutation to pass QC | 10 |
-f | --min_frequency | The minimum frequency for a mutation to pass QC (0.1 -> 10%) | 0.1 |
-r | --coverage_regions | A BED file containing the regions to calculate percent coverage for | /data/tbdb-modified-regions.md |
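To make these thresholds concrete, here is a minimal Python sketch of the checks they describe. It is an illustration under stated assumptions, not `tbp-parser`'s implementation: per-position depths are given as a dictionary keyed by 0-based position, regions use BED-style coordinates (0-based start, exclusive end), a value equal to a threshold counts as passing, and a mutation is assumed to need depth, read support, and frequency all to meet their minimums. The function names are hypothetical.

```python
from typing import Mapping


def percent_coverage(depths: Mapping[int, int], start: int, end: int, min_depth: int = 10) -> float:
    """Percent of positions in [start, end) with depth >= min_depth.

    `depths` maps 0-based positions to read depth; absent positions count as 0x.
    """
    length = end - start
    if length <= 0:
        raise ValueError("region must have a positive length")
    covered = sum(1 for pos in range(start, end) if depths.get(pos, 0) >= min_depth)
    return 100.0 * covered / length


def region_passes(
    depths: Mapping[int, int],
    start: int,
    end: int,
    min_depth: int = 10,
    min_percent_coverage: float = 100.0,
) -> bool:
    """A gene/locus passes when enough of its positions reach min_depth."""
    return percent_coverage(depths, start, end, min_depth) >= min_percent_coverage


def mutation_passes(
    depth: int,
    read_support: int,
    frequency: float,
    min_depth: int = 10,
    min_read_support: int = 10,
    min_frequency: float = 0.1,
) -> bool:
    """A mutation passes only if every threshold is met (assumed >= comparisons)."""
    return depth >= min_depth and read_support >= min_read_support and frequency >= min_frequency
```

With the defaults (`min_depth` 10, `min_percent_coverage` 100), a single position below 10x anywhere in a region is enough for that region to fail QC.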
Text Arguments
These options are used verbatim in the reports, or are used to name the output files.
Short Version | Long Version | Description | Default Value |
---|---|---|---|
-m | --sequencing_method | The sequencing method used to generate the data; used in the LIMS & Looker reports. Enclose in quotes if the value contains a space | "Sequencing method not provided" |
-p | --operator | The operator who ran the analysis; used in the LIMS & Looker reports. Enclose in quotes if the value contains a space | "Operator not provided" |
-o | --output_prefix | The prefix to use for the output files. Do not include any spaces | "tbp-parser" |
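The quoting advice follows from how a POSIX shell splits a command line into arguments. Python's `shlex.split` applies the same rules, so it can show the difference. The operator name and prefix are made-up examples, and `check_output_prefix` is a hypothetical check mirroring the no-spaces rule for `--output_prefix`, not a `tbp-parser` function.

```python
import shlex

# Without quotes, a two-word operator name becomes two separate arguments.
unquoted = shlex.split("--operator Jane Doe --output_prefix run01")
# With quotes, the shell passes "Jane Doe" as a single value.
quoted = shlex.split('--operator "Jane Doe" --output_prefix run01')


def check_output_prefix(prefix: str) -> str:
    """Hypothetical check: reject prefixes that contain any whitespace."""
    if any(ch.isspace() for ch in prefix):
        raise ValueError(f"output prefix must not contain whitespace: {prefix!r}")
    return prefix
```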
LIMS Arguments
These options are used to customize the LIMS report.
Name | Description | Default Value |
---|---|---|
--add_cs_lims | Adds Cycloserine (CS) fields to the LIMS report | false |
tNGS-specific Arguments
These options are primarily used for tNGS data, although all frequency and read support arguments are compatible with WGS data.
Name | Description | Default Value |
---|---|---|
--tngs | Indicates that the input data was generated using the Deeplex + CDPH modified protocol. Turns on tNGS-specific global parameters | false |
--tngs_expert_regions | A BED file containing the expert rule regions for which coverage is calculated. This is used to determine coverage quality in the regions where resistance-conferring mutations are found, or where a CDC expert rule is applied; it is not used for QC purposes | /data/tbdb-expert-regions.bed |
--rrs_frequency | The minimum frequency for an rrs mutation to pass QC, as rrs has several problematic sites in the Deeplex tNGS assay | 0.1 |
--rrl_frequency | The minimum frequency for an rrl mutation to pass QC, as rrl has several problematic sites in the Deeplex tNGS assay | 0.1 |
--rrs_read_support | The minimum read support for an rrs mutation to pass QC, as rrs has several problematic sites in the Deeplex tNGS assay | 10 |
--rrl_read_support | The minimum read support for an rrl mutation to pass QC, as rrl has several problematic sites in the Deeplex tNGS assay | 10 |
--rpob449_frequency | The minimum frequency for an rpoB mutation at protein position 449 to pass QC, as this site is problematic in the Deeplex tNGS assay | 0.1 |
--etha237_frequency | The minimum frequency for an ethA mutation at protein position 237 to pass QC, as this site is problematic in the Deeplex tNGS assay | 0.1 |
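The gene- and position-specific options refine the general `--min_frequency` and `--min_read_support` cutoffs. The sketch below shows one plausible way to pick the cutoff for a given mutation. The selection logic, the `Thresholds` dataclass, and the function names are hypothetical, and whether `tbp-parser` applies these overrides only when `--tngs` is set is not stated here. The default values are copied from the tables above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Thresholds:
    """Default cutoffs copied from the documentation tables."""
    min_frequency: float = 0.1
    min_read_support: int = 10
    rrs_frequency: float = 0.1
    rrl_frequency: float = 0.1
    rrs_read_support: int = 10
    rrl_read_support: int = 10
    rpob449_frequency: float = 0.1
    etha237_frequency: float = 0.1


def frequency_cutoff(t: Thresholds, gene: str, protein_position: Optional[int] = None) -> float:
    """Pick the frequency cutoff for a mutation (hypothetical selection logic)."""
    if gene == "rrs":
        return t.rrs_frequency
    if gene == "rrl":
        return t.rrl_frequency
    if gene == "rpoB" and protein_position == 449:
        return t.rpob449_frequency
    if gene == "ethA" and protein_position == 237:
        return t.etha237_frequency
    return t.min_frequency  # every other gene/position uses the general cutoff


def read_support_cutoff(t: Thresholds, gene: str) -> int:
    """Pick the read-support cutoff; only rrs and rrl have their own option."""
    if gene == "rrs":
        return t.rrs_read_support
    if gene == "rrl":
        return t.rrl_read_support
    return t.min_read_support
```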
Logging Arguments
These options change the verbosity of the log written to stdout.
Name | Description | Default Value |
---|---|---|
--verbose | Increases the output verbosity to describe which stage of the analysis is currently running | false |
--debug | The highest level of output verbosity, detailing every step of the analysis and the logic implemented; overrides --verbose | false |
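The relationship between the two flags can be expressed with Python's standard `logging` levels. This sketch assumes `--verbose` maps to `INFO`, `--debug` to `DEBUG`, and that the default is `WARNING`; these mappings and the logger name are assumptions, not documented `tbp-parser` behavior. As in the table, `--debug` wins when both flags are set.

```python
import logging
import sys


def log_level(verbose: bool = False, debug: bool = False) -> int:
    """Map --verbose/--debug to a logging level; --debug overrides --verbose."""
    if debug:
        return logging.DEBUG
    if verbose:
        return logging.INFO
    return logging.WARNING  # assumed default when neither flag is given


def configure_logging(verbose: bool = False, debug: bool = False) -> logging.Logger:
    """Send log records to stdout at the level chosen by the flags."""
    logger = logging.getLogger("tbp_parser_sketch")  # hypothetical logger name
    logger.setLevel(log_level(verbose, debug))
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
        logger.addHandler(handler)
    return logger
```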