Command-Line Interface

Command-line tools for barcode extraction and processing

Introduction

This notebook implements the command-line interface (CLI) for BarcodeSeqKit, allowing users to easily extract barcodes from BAM and FASTQ files without writing Python code.

Command-Line Argument Parser

Let’s define the argument parser for the command-line interface.



add_extract_arguments

 add_extract_arguments (parser:argparse.ArgumentParser)

Add arguments for the extract command.

Args: parser: ArgumentParser to add arguments to
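
To illustrate what a function of this shape might register, here is a minimal sketch built only from flags that appear in the examples elsewhere in this notebook. The name sketch_add_extract_arguments, the help strings, and the use of store_true for the switches are assumptions for illustration, not BarcodeSeqKit's actual definitions.

```python
import argparse


def sketch_add_extract_arguments(parser: argparse.ArgumentParser) -> None:
    """Hypothetical stand-in for add_extract_arguments (not the library's code)."""
    # Input sources used in the examples in this notebook
    parser.add_argument("--bam", help="Input BAM file")
    parser.add_argument("--fastq1", help="Read 1 FASTQ file")
    parser.add_argument("--fastq2", help="Read 2 FASTQ file")
    parser.add_argument("--fastq-dir", help="Directory containing FASTQ files")
    # Barcode definitions
    parser.add_argument("--barcode", help="Barcode sequence")
    parser.add_argument("--barcode5", help="5' barcode sequence")
    parser.add_argument("--barcode3", help="3' barcode sequence")
    parser.add_argument("--barcode-config", help="YAML barcode configuration file")
    # Output naming
    parser.add_argument("--output-prefix", help="Prefix for output file names")
    parser.add_argument("--output-dir", help="Directory for output files")
    # Boolean switches (assumed to default to False)
    parser.add_argument("--search-softclipped", action="store_true")
    parser.add_argument("--search-both-reads", action="store_true")
    parser.add_argument("--only-stats", action="store_true")
    parser.add_argument("--verbose", action="store_true")


parser = argparse.ArgumentParser(prog="barcodeseqkit")
sketch_add_extract_arguments(parser)
ns = parser.parse_args(
    ["--bam", "../tests/test.bam", "--barcode5", "CTGACTCCTTAAGGGCC", "--search-softclipped"]
)
print(ns.bam, ns.barcode5, ns.search_softclipped)
```

Taking the parser as a parameter, as the real signature does, lets the same argument set be attached to a top-level parser or to a subcommand parser.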

Main Entry Points



run_cli

 run_cli (args:Optional[List[str]]=None)

Handle the extract command.

Args: args: Command-line arguments

Returns: Exit code (0 for success, non-zero for error)



main

 main ()

Main entry point for command-line execution.
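
The two signatures above suggest a common pattern: one function parses arguments and returns an exit code, and a thin entry point passes that code to the shell. The sketch below shows that pattern under stated assumptions. The names run_cli_sketch and main_sketch, the two flags, and the error handling are hypothetical and are not taken from the BarcodeSeqKit source.

```python
import argparse
import sys
from typing import List, Optional


def run_cli_sketch(args: Optional[List[str]] = None) -> int:
    """Hypothetical stand-in for run_cli: parse arguments, return an exit code."""
    parser = argparse.ArgumentParser(prog="barcodeseqkit")
    parser.add_argument("--bam")
    parser.add_argument("--verbose", action="store_true")
    try:
        # With args=None, argparse reads sys.argv[1:]
        parsed = parser.parse_args(args)
    except SystemExit as exc:
        # argparse exits on invalid input (code 2) or --help (code 0);
        # return the code instead so callers such as notebooks keep running
        return exc.code if isinstance(exc.code, int) else 1
    if parsed.bam is None:
        print("error: no input file given", file=sys.stderr)
        return 1
    return 0


def main_sketch() -> None:
    """Hypothetical stand-in for main: hand the exit code to the shell."""
    sys.exit(run_cli_sketch())
```

Returning the code rather than exiting inside the parsing function is what makes it possible to call the function from a notebook cell with an explicit argument list.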

# Testing the CLI with example arguments
# Note: In a real environment, these would be passed via the command line
# Define test arguments 
test_args = [
    "--bam", "../tests/test.bam",
    "--barcode5", "CTGACTCCTTAAGGGCC",
    "--barcode3", "TAACTGAGGCCGGC",
    "--output-prefix", "test_cli_out",
    "--output-dir", "../tests/cli_output",
    "--search-softclipped",
    "--verbose"
]

# Run the test
return_code = run_cli(test_args)
print(f"CLI test returned: {return_code}")

2025-03-24 13:59:31,992 - BarcodeSeqKit - INFO - BAM file: ../tests/test.bam (498 reads)
2025-03-24 13:59:31,993 - BarcodeSeqKit - INFO - Output categories: ['barcode5_orientFR', 'barcode5_orientRC', 'barcode3_orientFR', 'barcode3_orientRC', 'noBarcode']
Input BAM file: ../tests/test.bam
Using 5' barcode with sequence: CTGACTCCTTAAGGGCC
Using 3' barcode with sequence: TAACTGAGGCCGGC
Saved configuration to ../tests/cli_output/test_cli_out_config.yaml
2025-03-24 13:59:32,029 - BarcodeSeqKit - INFO - First pass complete: classified 18 reads
2025-03-24 13:59:32,056 - BarcodeSeqKit - INFO - Sorting and indexing ../tests/cli_output/test_cli_out_barcode5_orientFR.bam
2025-03-24 13:59:32,068 - BarcodeSeqKit - INFO - Sorting and indexing ../tests/cli_output/test_cli_out_barcode5_orientRC.bam
2025-03-24 13:59:32,077 - BarcodeSeqKit - INFO - Sorting and indexing ../tests/cli_output/test_cli_out_barcode3_orientFR.bam
2025-03-24 13:59:32,085 - BarcodeSeqKit - INFO - Sorting and indexing ../tests/cli_output/test_cli_out_barcode3_orientRC.bam
2025-03-24 13:59:32,093 - BarcodeSeqKit - INFO - Sorting and indexing ../tests/cli_output/test_cli_out_noBarcode.bam
Extraction complete
CLI test returned: 0

# Testing the CLI with --only-stats
# Note: In a real environment, these would be passed via the command line
# Define test arguments
test_args = [
    "--bam", "../tests/test.bam",
    "--barcode5", "CTGACTCCTTAAGGGCC",
    "--barcode3", "TAACTGAGGCCGGC",
    "--output-prefix", "test_cli_out",
    "--output-dir", "../tests/cli_output_only_stats",
    "--search-softclipped",
    "--only-stats",
    "--verbose"
]

# Run the test
return_code = run_cli(test_args)
print(f"CLI test returned: {return_code}")

2025-03-24 14:02:05,410 - BarcodeSeqKit - INFO - BAM file: ../tests/test.bam (498 reads)
2025-03-24 14:02:05,412 - BarcodeSeqKit - INFO - Output categories: ['barcode5_orientFR', 'barcode5_orientRC', 'barcode3_orientFR', 'barcode3_orientRC', 'noBarcode']
Input BAM file: ../tests/test.bam
Using 5' barcode with sequence: CTGACTCCTTAAGGGCC
Using 3' barcode with sequence: TAACTGAGGCCGGC
Saved configuration to ../tests/cli_output_only_stats/test_cli_out_config.yaml
2025-03-24 14:02:05,456 - BarcodeSeqKit - INFO - First pass complete: classified 18 reads
Extraction complete
CLI test returned: 0
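
The log of the first run names one BAM file per output category plus a saved configuration file, while the --only-stats run reports only the configuration file. The helper below reconstructs those paths so they can be checked after a run. It is a sketch inferred from the log lines alone: expected_output_files is a hypothetical name, and the runs may write further files (for example statistics or index files) that do not appear in the log.

```python
from pathlib import Path
from typing import List


def expected_output_files(
    output_dir: str, prefix: str, categories: List[str], only_stats: bool = False
) -> List[Path]:
    """Hypothetical helper: file names inferred from the log output above."""
    out = Path(output_dir)
    # The configuration file is reported in both runs
    files = [out / f"{prefix}_config.yaml"]
    if not only_stats:
        # One BAM per category, as in the "Sorting and indexing" log lines
        files += [out / f"{prefix}_{category}.bam" for category in categories]
    return files


categories = [
    "barcode5_orientFR",
    "barcode5_orientRC",
    "barcode3_orientFR",
    "barcode3_orientRC",
    "noBarcode",
]
for path in expected_output_files("../tests/cli_output", "test_cli_out", categories):
    print(path, "exists" if path.exists() else "missing")
```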

Command-Line Examples

Here are some example command-line invocations for reference:

Example 1: Extract barcodes from a BAM file

barcodeseqkit --bam tests/test.bam --barcode5 CTGACTCCTTAAGGGCC --output-prefix barcode_extraction --output-dir results

Example 2: Extract barcodes from paired FASTQ files

barcodeseqkit --fastq1 tests/test.1.fastq.gz --fastq2 tests/test.2.fastq.gz --barcode GCCTCGCGA --output-prefix fastq_results --output-dir results

Example 3: Extract 5’ and 3’ barcodes from a directory containing FASTQ files

barcodeseqkit --fastq-dir ./fastq_dir --barcode5 ACTGACTG --barcode3 GTCAGTCA --output-prefix sample_barcoded --output-dir ./output --search-both-reads

Example 4: Use a barcode configuration file

barcodeseqkit --bam tests/test.bam --barcode-config barcode_config.yaml --output-prefix config_extraction --output-dir results
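
Because run_cli accepts a list of strings, as in the test cells earlier in this notebook, any of these command lines can also be tried from Python by splitting it into tokens and dropping the program name. The helper command_to_args below is a hypothetical convenience, not part of BarcodeSeqKit. The call to run_cli is left commented out because it needs the input files to exist.

```python
import shlex
from typing import List


def command_to_args(command: str) -> List[str]:
    """Split a shell-style command line and drop the leading program name."""
    tokens = shlex.split(command)
    return tokens[1:]


args = command_to_args(
    "barcodeseqkit --bam tests/test.bam --barcode5 CTGACTCCTTAAGGGCC "
    "--output-prefix barcode_extraction --output-dir results"
)
print(args)
# return_code = run_cli(args)  # run_cli as defined earlier in this notebook
```

shlex.split follows POSIX shell quoting rules, so quoted paths containing spaces stay together as one argument.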

Barcode Configuration YAML Example

Here’s an example of a barcode configuration YAML file:

barcodes:
  - sequence: CTGACTCCTTAAGGGCC
    location: 5
    name: 5prime
    description: 5' barcode for my experiment
  - sequence: TAACTGAGGCCGGC
    location: 3
    name: 3prime
    description: 3' barcode for my experiment
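
Before passing such a file to the CLI, it can help to sanity-check its contents. The sketch below works on the dictionary that a YAML loader (for example PyYAML's yaml.safe_load) would return for the file above. The function check_barcode_config is hypothetical, and its rules are assumptions: it accepts only the A, C, G, T, N alphabet and only the two location values shown in the example, whereas the library itself may accept more.

```python
from typing import Any, Dict, List

# The example file above, as a YAML loader would return it
config: Dict[str, Any] = {
    "barcodes": [
        {
            "sequence": "CTGACTCCTTAAGGGCC",
            "location": 5,
            "name": "5prime",
            "description": "5' barcode for my experiment",
        },
        {
            "sequence": "TAACTGAGGCCGGC",
            "location": 3,
            "name": "3prime",
            "description": "3' barcode for my experiment",
        },
    ]
}


def check_barcode_config(cfg: Dict[str, Any]) -> List[str]:
    """Hypothetical sanity check; returns a list of problems (empty if none)."""
    problems: List[str] = []
    for i, entry in enumerate(cfg.get("barcodes", [])):
        seq = str(entry.get("sequence", ""))
        # Assumed alphabet: A, C, G, T, N (case-insensitive)
        if not seq or set(seq.upper()) - set("ACGTN"):
            problems.append(f"barcode {i}: invalid or empty sequence {seq!r}")
        # Assumed valid locations: the two values used in the example
        if entry.get("location") not in (5, 3):
            problems.append(f"barcode {i}: unexpected location {entry.get('location')!r}")
    return problems


print(check_barcode_config(config))
```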

Conclusion

This notebook implements a command-line interface for BarcodeSeqKit, so the library can be used without writing Python code. The CLI covers barcode extraction from BAM and FASTQ files, options that control searching (such as --search-softclipped and --search-both-reads) and output (--output-prefix, --output-dir, --only-stats), and log messages that report each processing step.