Analysis Module#
Overview#
wasp2-analyze runs allelic imbalance analysis on bulk or single-cell count
outputs.
Commands:
find-imbalance: bulk allelic imbalance from TSV countsfind-imbalance-sc: per-group single-cell imbalance from.h5adcountscompare-imbalance: differential imbalance between single-cell groups
Bulk Analysis#
wasp2-analyze find-imbalance \
counts.tsv \
--min 10 \
--pseudocount 1 \
--output ai_results.tsv
Useful options:
--min/--min_count: minimum total count threshold--pseudocount: pseudocount added before modeling--model: dispersion model (currentlysingleorlinearinput)--output/--out_file/-o: output TSV path--region_col: explicit region column name if auto-detection is not desired--groupby: group on an alternate annotation column, such as a parent gene column
Single-Cell Analysis#
wasp2-analyze find-imbalance-sc \
allele_counts.h5ad \
barcode_groups.tsv \
--sample SAMPLE1 \
--min 10 \
--out_file ai_results.tsv
barcode_groups.tsv is a two-column TSV:
BARCODE<TAB>GROUP
The command writes one output file per group using the requested output prefix.
Comparative Single-Cell Analysis#
wasp2-analyze compare-imbalance \
allele_counts.h5ad \
barcode_groups.tsv \
--sample SAMPLE1 \
--groups B_cell T_cell \
--out_file compare_ai.tsv
This compares allelic imbalance between the requested groups and writes one TSV
per comparison. Each output has per-group columns mu1 and mu2 alongside
the pooled-null combined_mu, plus null_ll / alt_ll / pval /
fdr_pval.
Notes#
If your count file contains genotype columns for multiple samples, you must provide
--samplefor single-cell analysis.For bulk analysis, region columns are auto-detected when present in the count TSV. Use
--region_colonly when you need to override that behavior.
Defaults and contracts#
pseudocountdefaults to 1 (added to bothref_countandalt_count— Laplace-style shrinkage toward \(\mu = 0.5\), conservative under \(H_0\)).min_countdefaults to 10; regions with \(N < 10\) are dropped.Beta-binomial LRT: \(\rho\) is held at its null-model MLE while maximizing the alternative likelihood over \(\mu\) (profile likelihood, df = 1). \(\rho\) is not jointly re-estimated under \(H_1\).
Dispersion optimizer bounds \(\rho \in (10^{-6},\, 1-10^{-6})\); the linear-dispersion model clips the logit at \(\pm 10\) for numerical stability on extreme \(N\).
FDR correction uses
scipy.stats.false_discovery_control()withmethod='bh'(Benjamini–Hochberg).scipyraises on NaN p-values, but a hand-written BH loop usingnp.minimum.accumulatesilently propagates NaN through the cumulative minimum — always drop or impute NaN p-values before BH correction.
See the upstream mapping-filter docs (WASP Mapping Bias Correction) for the WASP2 canonical filter contract that applies to both remapping and counting steps.
Outputs#
Typical bulk outputs include:
region or feature identifier
aggregated
ref_countandalt_countp-values and FDR-adjusted p-values
Typical single-cell outputs include the same statistics stratified by barcode group.
Next Steps#
Counting Module to generate bulk or single-cell counts