Frequently Asked Questions
==========================

Installation
------------

**Which install method should I use?**

For most users: ``mamba install -c conda-forge -c bioconda wasp2`` (Bioconda).
This installs WASP2 and all dependencies (samtools, bcftools, bedtools) in one
step.

Use ``pip install wasp2`` if you are on a system without conda or want a
specific Python environment. You will need to install samtools, bcftools, and
bedtools separately.

Use Docker or Singularity on HPC clusters or when you need full
reproducibility.

**What Python versions are supported?**

Python 3.10, 3.11, 3.12, and 3.13. Pre-built wheels are available for all four
on Linux (x86_64, aarch64) and macOS (Intel, Apple Silicon).

**Why do I get an error about missing samtools/bcftools/bedtools?**

The PyPI wheel bundles the Rust extension and htslib but not the system
binaries. Install them via conda
(``mamba install -c bioconda samtools bcftools bedtools``) or your system
package manager.

Input Data
----------

**Do I need phased genotypes?**

Yes. WASP2 assigns reads to haplotypes using phased heterozygous variants.
Without phase information, WASP2 cannot distinguish which allele a read came
from. Use WhatsHap, SHAPEIT4, or Eagle2 to phase your VCF before running WASP2.

**What VCF formats does WASP2 support?**

* VCF or BCF (bgzip-compressed + tabix-indexed: ``.vcf.gz`` + ``.tbi``)
* PLINK2 PGEN format (``.pgen`` + ``.pvar`` + ``.psam``)

Multi-sample VCFs are supported; use ``-s SAMPLE_ID`` to specify the target
sample.

**Can I use an unphased VCF?**

The counting step (``wasp2-count``) will still run but the allele assignments
will be arbitrary. The statistical results will have reduced power and
increased false positives. Always use phased genotypes when possible.

**My BAM doesn't have read groups. Will WASP2 work?**

Yes, for counting. Read groups are not required for allele counting.
For the remapping step (``wasp2-map``), the sample ID is needed to look up
variants in a multi-sample VCF; pass it explicitly with ``-s SAMPLE_ID``.

Running WASP2
-------------

**How long does each step take?**

Typical runtimes on a single core for a 30× whole-genome BAM (~100M reads):

* ``wasp2-map make-reads``: 2–4 hours
* Re-alignment (external): depends on aligner
* ``wasp2-map filter-remapped``: 1–2 hours
* ``wasp2-count count-variants``: 30–60 minutes
* ``wasp2-analyze find-imbalance``: < 5 minutes

Use the Nextflow pipelines for automatic parallelization across
chromosomes/samples.

**Can I run WASP2 on multiple samples at once?**

Yes, but not in a single command. The WASP2 CLI processes one sample at a
time; run multiple samples in parallel with a job scheduler (SLURM, PBS), or
use the Nextflow pipelines, which handle parallelization automatically.

**What is the ``--region`` flag for?**

It restricts counting to a specific genomic region (e.g.,
``chr1:1000000-2000000``). This is useful for testing on a subset of data or
for chromosome-level parallelization.

Single-Cell
-----------

**What single-cell chemistries are supported?**

All 10x Genomics Chromium chemistries (scRNA v1/v2/v3, scATAC v1/v2) and any
other protocol with a cell barcode tag in the BAM (CB tag by default). See
:doc:`user_guide/single_cell` for barcode format details.

**Do I need Cell Ranger output?**

No, but it is the most common input. WASP2 needs:

* A BAM with cell barcodes in a BAM tag (default: ``CB``)
* A whitelist of valid barcodes (optional but recommended)
* A phased VCF

Any aligner that produces CB-tagged BAMs will work (STARsolo, Alevin-fry,
etc.).

**How do I get per-cell-type results?**

Run WASP2 on the full BAM to get per-cell allele counts, then use the output
with your cell type annotations in Python (AnnData/Scanpy) to aggregate by
cell type. See :doc:`tutorials/single_cell_workflow` for an example.

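
The aggregation step in the previous answer can be sketched in plain Python.
This is a minimal illustration only, not WASP2's API: the record layout
``(barcode, site, ref_count, alt_count)``, the function name
``aggregate_by_cell_type``, and the toy barcodes are all hypothetical
stand-ins for whatever your per-cell counts table and annotations contain. In
practice the same grouping would usually be done with AnnData/Scanpy.

.. code-block:: python

   from collections import defaultdict


   def aggregate_by_cell_type(per_cell_counts, cell_types):
       """Sum ref/alt counts per (cell type, site).

       per_cell_counts: iterable of (barcode, site, ref_count, alt_count)
           tuples. This layout is an assumption for illustration, not the
           actual WASP2 output schema.
       cell_types: dict mapping barcode -> cell type label.
       Barcodes without an annotation are skipped.
       """
       totals = defaultdict(lambda: [0, 0])
       for barcode, site, ref, alt in per_cell_counts:
           label = cell_types.get(barcode)
           if label is None:
               continue  # unannotated cell: leave it out of the pseudobulk
           totals[(label, site)][0] += ref
           totals[(label, site)][1] += alt
       return {key: tuple(value) for key, value in totals.items()}


   # Toy data: four cells covering one heterozygous site.
   counts = [
       ("AAAC-1", "chr1:1000", 3, 1),
       ("AAAG-1", "chr1:1000", 2, 2),
       ("AAAT-1", "chr1:1000", 0, 4),
       ("AAAX-1", "chr1:1000", 5, 5),  # no annotation, so it is skipped
   ]
   annotations = {"AAAC-1": "T cell", "AAAG-1": "T cell", "AAAT-1": "B cell"}

   pseudobulk = aggregate_by_cell_type(counts, annotations)

Summing counts within a cell type before testing trades single-cell
resolution for coverage, which is usually what makes per-site imbalance
testable at all.
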

Output and Results
------------------

**What does the p-value in the output represent?**

The p-value comes from a likelihood ratio test comparing the beta-binomial
model under allelic imbalance vs. the null model of balanced expression. The
test is calibrated for the overdispersion typical of RNA-seq count data.

**What FDR threshold should I use?**

The standard threshold is FDR < 0.05. For discovery analyses you may want
FDR < 0.1. For validation or follow-up experiments, consider FDR < 0.01. See
:doc:`user_guide/analysis` for the BH procedure and the NaN-propagation
warning.

**My output has very few significant sites. What's wrong?**

Common causes:

* Low coverage at heterozygous sites (increase ``--min_count``)
* Too few heterozygous variants in the VCF
* VCF and BAM use different chromosome naming conventions (``chr1`` vs ``1``)
* VCF is not phased

**My output has too many significant sites (inflated FDR).**

This typically means mapping bias is driving the signal. Run the WASP
remapping step (``wasp2-map``) before counting. See :doc:`user_guide/mapping`.

**For ATAC-seq, do I need to use WASP-remapped BAMs?**

Yes. WASP2 counting applies only the unmapped filter (see
:doc:`methods/mapping_filter` "Canonical Filter Contract"); it does **not**
correct reference mapping bias on its own. You must run
``wasp2-map make-reads`` + re-alignment + ``wasp2-map filter-remapped`` first,
then pass the resulting ``*_wasp_filt_rmdup.bam`` to ``wasp2-count``. Counting
on raw BWA output leaves reference bias uncorrected: reads carrying the alt
allele are systematically under-represented.

The same requirement applies to RNA-seq and scATAC-seq. The only difference is
the aligner used in the re-alignment step (STAR for RNA, BWA for ATAC).

Troubleshooting
---------------

**I get "chromosome not found" errors.**

VCF and BAM must use the same chromosome naming convention.
If your VCF uses ``chr1`` and your BAM uses ``1`` (or vice versa), use
``bcftools annotate --rename-chrs`` to harmonize the VCF.

**The Rust extension fails to load.**

This happens if the wheel was built for a different platform or Python
version. Try reinstalling: ``pip install --force-reinstall wasp2``. If
building from source, run ``pixi run verify`` to rebuild.

**WASP2 runs but produces an empty counts file.**

Check that:

* The BAM is coordinate-sorted and indexed (``.bai`` file present)
* The VCF overlaps the regions in your BAM
* The sample name passed with ``-s`` matches a sample in the VCF

Use ``bcftools query -l variants.vcf.gz`` to list VCF sample names.
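
Both the "chromosome not found" error and an empty counts file are often
caused by a ``chr1`` vs ``1`` mismatch. The sketch below shows one way to
detect it and to build the two-column ``old_name new_name`` map that
``bcftools annotate --rename-chrs`` reads. It is not part of WASP2: the
helper names ``naming_style`` and ``rename_map`` are hypothetical, and the
contig lists are toy values that you would normally collect yourself, for
example from ``samtools idxstats`` and ``bcftools index -s``.

.. code-block:: python

   def naming_style(contigs):
       """Classify contig names as chr-prefixed, unprefixed, or mixed."""
       contigs = list(contigs)
       if not contigs:
           return "empty"
       prefixed = sum(name.startswith("chr") for name in contigs)
       if prefixed == len(contigs):
           return "chr-prefixed"
       if prefixed == 0:
           return "unprefixed"
       return "mixed"


   def rename_map(vcf_contigs, bam_contigs):
       """Map VCF contig names onto BAM names by adding or stripping 'chr'.

       Only contigs whose converted name exists in the BAM are included;
       anything else (e.g. MT vs chrM) is left for manual handling.
       """
       bam = set(bam_contigs)
       mapping = {}
       for name in vcf_contigs:
           if name in bam:
               continue  # already consistent
           candidate = name[3:] if name.startswith("chr") else "chr" + name
           if candidate in bam:
               mapping[name] = candidate
       return mapping


   # Toy contig lists (assumed values, for illustration only).
   vcf_contigs = ["1", "2", "X", "MT"]
   bam_contigs = ["chr1", "chr2", "chrX", "chrM"]

   mapping = rename_map(vcf_contigs, bam_contigs)
   # One "old_name new_name" pair per line, as --rename-chrs expects.
   map_text = "\n".join(f"{old} {new}" for old, new in mapping.items())

Here ``MT`` gets no entry because ``chrMT`` is not in the BAM; the
mitochondrial contig is a common case that needs an explicit line in the map
file.
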