Counting Module

Overview

wasp2-count counts reads supporting reference and alternate alleles at variant positions in BAM files.

It provides two commands:

  • count-variants for bulk data

  • count-variants-sc for single-cell data whose reads carry cell barcodes in the CB tag

Bulk Counting

Basic usage:

wasp2-count count-variants sample.bam variants.vcf.gz --out_file counts.tsv

With sample filtering and region annotation:

wasp2-count count-variants \
  sample.bam \
  variants.vcf.gz \
  --samples SAMPLE1 \
  --region genes.gtf \
  --out_file counts.tsv
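
When the bulk command is launched from a pipeline script, the argument list can be assembled programmatically. The following is a minimal Python sketch and not part of WASP2: build_count_command is a hypothetical helper, it emits only the flags shown on this page, and it assumes a single sample name because the multi-sample syntax is not described here.

```python
import shlex


def build_count_command(bam, vcf, out_file, sample=None, region=None):
    """Assemble the argument list for `wasp2-count count-variants`.

    Hypothetical helper: only flags that appear on this page are emitted.
    """
    cmd = ["wasp2-count", "count-variants", bam, vcf]
    if sample is not None:
        cmd += ["--samples", sample]
    if region is not None:
        cmd += ["--region", region]
    cmd += ["--out_file", out_file]
    return cmd


cmd = build_count_command(
    "sample.bam",
    "variants.vcf.gz",
    "counts.tsv",
    sample="SAMPLE1",
    region="genes.gtf",
)
# Print the command in a form that can be pasted into a shell.
print(shlex.join(cmd))
```

To actually run the command, the list can be passed to subprocess.run(cmd, check=True), assuming wasp2-count is installed and on PATH.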

Supported region files:

  • BED

  • MACS2 narrowPeak / broadPeak

  • GTF

  • GFF3

For GTF/GFF3 inputs, WASP2 derives interval annotations from feature rows and defaults to gene features when present.
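
To make the GTF case concrete, here is a rough Python sketch of how interval annotations can be derived from feature rows. It illustrates the general idea only and is not WASP2's implementation: gtf_intervals is a hypothetical function, its feature and attribute parameters merely mirror the role of a feature type and a feature ID attribute, and the conversion to 0-based half-open coordinates is an assumption.

```python
import re

# GTF attributes look like: gene_id "G1"; gene_name "Alpha";
_ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def gtf_intervals(lines, feature="gene", attribute="gene_id"):
    """Yield (chrom, start0, end, feature_id) for matching GTF rows."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != feature:
            continue  # malformed row, or a feature type we did not ask for
        attrs = dict(_ATTR_RE.findall(fields[8]))
        if attribute not in attrs:
            continue
        # GTF coordinates are 1-based and inclusive; shifting the start
        # gives a 0-based half-open interval like BED (an assumption here).
        yield fields[0], int(fields[3]) - 1, int(fields[4]), attrs[attribute]


# Made-up rows for demonstration only.
example = [
    "#!genome-build example\n",
    'chr1\tsrc\tgene\t101\t200\t.\t+\t.\tgene_id "G1"; gene_name "Alpha";\n',
    'chr1\tsrc\texon\t101\t150\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n',
]
intervals = list(gtf_intervals(example))
print(intervals)
```

Calling gtf_intervals(example, feature="exon", attribute="transcript_id") would select the exon row instead, which is the kind of choice a feature type and ID attribute control.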

Useful options:

  • --samples / -s: select heterozygous sites for one or more samples

  • --region / -r: restrict variants to overlapping regions and annotate them with those regions

  • --gene_feature: choose the GTF/GFF3 feature type

  • --gene_attribute: choose the GTF/GFF3 attribute used as the feature ID

  • --gene_parent: choose the parent/grouping attribute for gene annotations

  • --use_region_names: prefer region names over coordinate strings as region identifiers

  • --include-indels: count indels in addition to SNPs

Output columns always include:

  • chrom

  • pos, or the pair pos0 and pos, depending on the input path

  • ref

  • alt

  • ref_count

  • alt_count

  • other_count

When sample filtering is active, genotype columns are included. When region annotation is active, region or gene columns are included as well.
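
As an illustration of working with the bulk output, the sketch below computes a per-site reference allele fraction. It is a hedged example, not a WASP2 utility: ref_fractions is a hypothetical helper, and it assumes the output is tab-separated with a header row that uses the column names listed above. The demo table is made up.

```python
import csv
import os
import tempfile


def ref_fractions(path, min_total=1):
    """Return [(chrom, ref, alt, ref_fraction)] for sites with enough reads."""
    rows = []
    with open(path, newline="") as handle:
        for record in csv.DictReader(handle, delimiter="\t"):
            ref_n = int(record["ref_count"])
            alt_n = int(record["alt_count"])
            total = ref_n + alt_n  # other_count is left out of the ratio
            if total < min_total:
                continue  # too few informative reads at this site
            rows.append(
                (record["chrom"], record["ref"], record["alt"], ref_n / total)
            )
    return rows


# Tiny made-up table, written to a temporary file for demonstration only.
demo = (
    "chrom\tpos\tref\talt\tref_count\talt_count\tother_count\n"
    "chr1\t1000\tA\tG\t6\t2\t0\n"
    "chr1\t2000\tC\tT\t0\t0\t1\n"
)
with tempfile.NamedTemporaryFile("w", suffix=".tsv", delete=False) as tmp:
    tmp.write(demo)
fractions = ref_fractions(tmp.name)
os.remove(tmp.name)
print(fractions)
```

The helper reads only chrom, ref, alt, ref_count, and alt_count, so it does not depend on whether the position is reported as pos or as pos0 and pos.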

Single-Cell ATAC Counting

Single-cell counting is designed for scATAC-seq data. It requires a BAM file whose reads carry CB tags and a barcode file, passed as a positional argument, that contains one barcode per line.

wasp2-count count-variants-sc \
  sc_atac.bam \
  variants.vcf.gz \
  barcodes.tsv \
  --samples sample1 \
  --feature peaks.bed \
  --out_file allele_counts.h5ad
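
Because the barcode file is a plain list with one barcode per line, it can be sanity-checked before a long run. The following Python sketch is an assumption-laden convenience and not part of WASP2: read_barcodes is a hypothetical helper, the duplicate check is an extra precaution rather than a documented requirement, and the barcodes in the demo are made up.

```python
from collections import Counter
import os
import tempfile


def read_barcodes(path):
    """Read one barcode per line, skipping blanks and rejecting duplicates."""
    with open(path) as handle:
        barcodes = [line.strip() for line in handle if line.strip()]
    repeated = sorted(bc for bc, n in Counter(barcodes).items() if n > 1)
    if repeated:
        raise ValueError(f"duplicate barcodes: {repeated}")
    return barcodes


# Made-up barcode list, written to a temporary file for demonstration only.
with tempfile.NamedTemporaryFile("w", suffix=".tsv", delete=False) as tmp:
    tmp.write("AAACGAAAGTCGACCC-1\nAAACGAACATTGCGAT-1\n\n")
barcodes = read_barcodes(tmp.name)
os.remove(tmp.name)
print(len(barcodes), "barcodes")
```

The barcodes presumably need to match the CB tag values in the BAM exactly, including any suffix such as -1, so it is worth comparing a few entries against the BAM by eye.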

Important points:

  • barcodes.tsv is a positional argument, not --barcode_map

  • --feature and --region are aliases on the single-cell command

  • --feature accepts BED and MACS2 peak files (GTF/GFF3 are supported only by the bulk count-variants command)

The output is an AnnData .h5ad file with:

  • sparse count layers for ref, alt, and other

  • variant metadata in adata.obs

  • barcode names in adata.var_names

  • feature-to-variant mapping in adata.uns["feature"] when annotations are used
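
Given the layout described above (variants along adata.obs, barcodes along adata.var_names), counts can be pooled across cells to get a pseudobulk reference fraction per variant. This is a sketch under stated assumptions, not a WASP2 function: both helpers are hypothetical, the layer keys "ref" and "alt" are guessed from the description above and may differ in the actual file, and the anndata and numpy packages must be installed for the second function.

```python
def allelic_fraction(ref_total, alt_total):
    """Reference fraction for one variant, or None when there are no reads."""
    total = ref_total + alt_total
    return ref_total / total if total else None


def per_variant_fractions(h5ad_path):
    """Sum ref/alt counts across all cells for each variant (matrix rows)."""
    # Third-party packages, imported lazily so the helper above has no
    # dependencies of its own.
    import anndata as ad
    import numpy as np

    adata = ad.read_h5ad(h5ad_path)
    # Layer keys "ref" and "alt" are assumptions; inspect adata.layers
    # to confirm the real names.
    ref = np.asarray(adata.layers["ref"].sum(axis=1)).ravel()
    alt = np.asarray(adata.layers["alt"].sum(axis=1)).ravel()
    return {
        name: allelic_fraction(int(r), int(a))
        for name, r, a in zip(adata.obs_names, ref, alt)
    }


# The pure helper can be exercised without any input file.
print(allelic_fraction(3, 1))
```

With a real output file, per_variant_fractions("allele_counts.h5ad") would return a dictionary keyed by the variant index of adata.obs.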

Examples

Count variants without regional annotation:

wasp2-count count-variants \
  filtered.bam \
  variants.vcf.gz \
  --samples SAMPLE1 \
  --out_file counts.tsv

Count variants inside peaks:

wasp2-count count-variants \
  filtered.bam \
  variants.vcf.gz \
  --samples SAMPLE1 \
  --region peaks.bed \
  --out_file counts_peaks.tsv

Count variants inside genes:

wasp2-count count-variants \
  filtered.bam \
  variants.vcf.gz \
  --samples SAMPLE1 \
  --region genes.gtf \
  --gene_feature gene \
  --gene_attribute gene_id \
  --out_file counts_genes.tsv

Next Steps