Command-line tools for barcode extraction and processing
Introduction
This notebook implements the command-line interface (CLI) for BarcodeSeqKit, allowing users to easily extract barcodes from BAM and FASTQ files without writing Python code.
Command-Line Argument Parser
Let’s define the argument parser for the command-line interface.
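A minimal sketch of such a parser, assuming only the flags that appear in the test runs in this notebook. The function name `build_parser`, the program name `barcodeseqkit`, and the help strings are illustrative assumptions, not the library's actual definitions; the real parser may accept more options (for example, FASTQ inputs or a configuration file).

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser covering only the flags used in this notebook's test runs."""
    parser = argparse.ArgumentParser(
        prog="barcodeseqkit",  # assumed program name
        description="Extract barcodes from sequencing files (illustrative sketch).",
    )
    parser.add_argument("--bam", help="Input BAM file")
    parser.add_argument("--barcode5", help="Sequence of the 5' barcode")
    parser.add_argument("--barcode3", help="Sequence of the 3' barcode")
    parser.add_argument("--output-prefix", help="Prefix for output file names")
    parser.add_argument("--output-dir", help="Directory for output files")
    # The meaning of the next two flags is inferred from their names and the logs.
    parser.add_argument("--search-softclipped", action="store_true",
                        help="Search soft-clipped regions of reads (assumed meaning)")
    parser.add_argument("--only-stats", action="store_true",
                        help="Report statistics without writing BAM files (assumed meaning)")
    parser.add_argument("--verbose", action="store_true", help="Enable verbose logging")
    return parser


# Parse an explicit argument list instead of sys.argv, as the tests in this notebook do.
args = build_parser().parse_args(
    ["--bam", "../tests/test.bam", "--barcode5", "CTGACTCCTTAAGGGCC", "--verbose"]
)
print(args.bam, args.barcode5, args.verbose, args.only_stats)
```

Passing a list to `parse_args` keeps the parser testable from a notebook cell, which is the same pattern the `run_cli(test_args)` calls rely on.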
# Testing the CLI with example arguments
# Note: In a real environment, these would be passed via the command line
# Define test arguments
test_args = [
    "--bam", "../tests/test.bam",
    "--barcode5", "CTGACTCCTTAAGGGCC",
    "--barcode3", "TAACTGAGGCCGGC",
    "--output-prefix", "test_cli_out",
    "--output-dir", "../tests/cli_output",
    "--search-softclipped",
    "--verbose",
]
# Uncomment to run the test
return_code = run_cli(test_args)
print(f"CLI test returned: {return_code}")
Input BAM file: ../tests/test.bam
Using 5' barcode with sequence: CTGACTCCTTAAGGGCC
Using 3' barcode with sequence: TAACTGAGGCCGGC
Saved configuration to ../tests/cli_output/test_cli_out_config.yaml
2025-03-24 13:59:32,029 - BarcodeSeqKit - INFO - First pass complete: classified 18 reads
2025-03-24 13:59:32,056 - BarcodeSeqKit - INFO - Sorting and indexing ../tests/cli_output/test_cli_out_barcode5_orientFR.bam
2025-03-24 13:59:32,068 - BarcodeSeqKit - INFO - Sorting and indexing ../tests/cli_output/test_cli_out_barcode5_orientRC.bam
2025-03-24 13:59:32,077 - BarcodeSeqKit - INFO - Sorting and indexing ../tests/cli_output/test_cli_out_barcode3_orientFR.bam
2025-03-24 13:59:32,085 - BarcodeSeqKit - INFO - Sorting and indexing ../tests/cli_output/test_cli_out_barcode3_orientRC.bam
2025-03-24 13:59:32,093 - BarcodeSeqKit - INFO - Sorting and indexing ../tests/cli_output/test_cli_out_noBarcode.bam
Extraction complete
CLI test returned: 0
# Testing the CLI with only-stats
# Note: In a real environment, these would be passed via the command line
# Define test arguments
test_args = [
    "--bam", "../tests/test.bam",
    "--barcode5", "CTGACTCCTTAAGGGCC",
    "--barcode3", "TAACTGAGGCCGGC",
    "--output-prefix", "test_cli_out",
    "--output-dir", "../tests/cli_output_only_stats",
    "--search-softclipped",
    "--only-stats",
    "--verbose",
]
# Uncomment to run the test
return_code = run_cli(test_args)
print(f"CLI test returned: {return_code}")
Input BAM file: ../tests/test.bam
Using 5' barcode with sequence: CTGACTCCTTAAGGGCC
Using 3' barcode with sequence: TAACTGAGGCCGGC
Saved configuration to ../tests/cli_output_only_stats/test_cli_out_config.yaml
2025-03-24 14:02:05,456 - BarcodeSeqKit - INFO - First pass complete: classified 18 reads
Extraction complete
CLI test returned: 0
Command-Line Examples
Here are some example command-line invocations for reference:
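One way to derive such an invocation is to render the argument list from the tests above as a shell command. The sketch below does this with `shlex.join` from the standard library. The program name `barcodeseqkit` is an assumption, so substitute whatever entry point your installation provides.

```python
import shlex

# Assumed name of the installed console script; adjust to your installation.
program = "barcodeseqkit"

# Same arguments as the first test run in this notebook.
test_args = [
    "--bam", "../tests/test.bam",
    "--barcode5", "CTGACTCCTTAAGGGCC",
    "--barcode3", "TAACTGAGGCCGGC",
    "--output-prefix", "test_cli_out",
    "--output-dir", "../tests/cli_output",
    "--search-softclipped",
    "--verbose",
]

# shlex.join quotes any argument that needs it, so the result is safe to paste into a shell.
command = shlex.join([program, *test_args])
print(command)
```

Adding `--only-stats` to the list gives the equivalent of the second test run.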
Here’s an example of a barcode configuration YAML file:
barcodes:
  - sequence: CTGACTCCTTAAGGGCC
    location: 5
    name: 5prime
    description: 5' barcode for my experiment
  - sequence: TAACTGAGGCCGGC
    location: 3
    name: 3prime
    description: 3' barcode for my experiment
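Before handing a configuration like this to the CLI, it can help to sanity-check the entries. The sketch below is a hypothetical validator, not part of BarcodeSeqKit: the `BarcodeEntry` class and its rules (bases limited to A, C, G, T, N; location limited to 5 or 3) are assumptions made for illustration. The `config` dictionary mirrors what a YAML loader would typically return for the file above.

```python
from dataclasses import dataclass

VALID_BASES = set("ACGTN")  # assumed alphabet for barcode sequences


@dataclass
class BarcodeEntry:
    """Hypothetical container for one entry of the `barcodes` list."""
    sequence: str
    location: int
    name: str
    description: str = ""

    def __post_init__(self) -> None:
        # Normalize case, then reject characters outside the assumed alphabet.
        self.sequence = self.sequence.upper()
        invalid = set(self.sequence) - VALID_BASES
        if invalid:
            raise ValueError(f"{self.name}: invalid bases {sorted(invalid)}")
        if self.location not in (5, 3):
            raise ValueError(f"{self.name}: location must be 5 or 3")


# Dictionary equivalent of the YAML example above.
config = {
    "barcodes": [
        {"sequence": "CTGACTCCTTAAGGGCC", "location": 5, "name": "5prime",
         "description": "5' barcode for my experiment"},
        {"sequence": "TAACTGAGGCCGGC", "location": 3, "name": "3prime",
         "description": "3' barcode for my experiment"},
    ]
}

barcodes = [BarcodeEntry(**entry) for entry in config["barcodes"]]
for barcode in barcodes:
    print(barcode.name, barcode.location, len(barcode.sequence))
```

Validating in `__post_init__` means a malformed entry fails at load time with the barcode's name in the message, rather than later during extraction.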
Conclusion
This notebook implements a command-line interface for BarcodeSeqKit, making it easy to use the library’s functionality without writing Python code. The CLI provides access to all the major features of the library, including barcode extraction from BAM and FASTQ files, customization options for searching and output, and comprehensive logging.