Single-Cell Workflow (scRNA-seq / scATAC-seq) ============================================== End-to-end allele-specific workflow for single-cell data — 10X Chromium scRNA-seq and 10X scATAC-seq. Pipeline is the same in both cases; the data-type difference shows up as GTF (for scRNA-seq genes) vs. BED (for scATAC-seq peaks) in the ``--feature`` argument. Inputs ------ - Cell Ranger BAM with cell barcodes in the ``CB:Z:...`` tag + index - Phased VCF/BCF/PGEN for the donor - Barcode-to-group TSV (cell type or other assignment — see :doc:`/user_guide/single_cell` for Seurat/Scanpy export code and format) - **scRNA-seq**: GTF gene annotation - **scATAC-seq**: BED peak file (usually from Cell Ranger ``filtered_peak_bc_matrix`` or a consensus peak set) Step 1 — Count alleles per cell -------------------------------- **scRNA-seq (genes):** .. code-block:: bash wasp2-count count-variants-sc \ cellranger_output/outs/possorted_genome_bam.bam \ phased_variants.vcf.gz \ barcodes_celltype.tsv \ --feature genes.gtf \ --samples SAMPLE_ID \ --out_file allele_counts.h5ad **scATAC-seq (peaks):** .. code-block:: bash wasp2-count count-variants-sc \ cellranger_output/outs/possorted_bam.bam \ phased_variants.vcf.gz \ barcodes_celltype.tsv \ --feature peaks.bed \ --samples SAMPLE_ID \ --out_file allele_counts.h5ad Output: an AnnData ``.h5ad`` with ``ref`` / ``alt`` / ``other`` layers, genotype columns in ``.obs``, and cell-type assignments in ``.var``. See :doc:`/user_guide/single_cell` for the full schema. Step 2 — Per-group imbalance ---------------------------- .. code-block:: bash wasp2-analyze find-imbalance-sc \ allele_counts.h5ad \ barcodes_celltype.tsv \ --sample SAMPLE_ID \ --phased --min 10 -z 3 \ --out_file imbalance_by_celltype.tsv Output columns: ``region``, ``cell_type``, aggregated ``ref_count`` / ``alt_count``, ``pval``, ``fdr_pval``, ``effect_size`` (log₂ ref/alt). Step 3 — Compare groups (optional) ----------------------------------- .. code-block:: bash wasp2-analyze compare-imbalance \ allele_counts.h5ad \ barcodes_celltype.tsv \ --groups "CD4_T_cell,CD8_T_cell" \ --phased \ --out_file differential_imbalance.tsv Omit ``--groups`` to compare all available groups pairwise. See :doc:`/user_guide/analysis` for the full CLI reference and output columns. Per-cell vs. pseudo-bulk ------------------------ Single-cell ATAC data is especially sparse — most cells contribute zero reads to most peaks. Two analysis modes are common: .. list-table:: :header-rows: 1 :widths: 20 40 40 * - Aspect - Per-cell - Pseudo-bulk (per-cell-type) * - Resolution - Single cell - Cell population * - Power - Low (sparse) - High (aggregated) * - Use case - Outlier cells - Population-level imbalance Pseudo-bulk (the default, via the barcode-to-group TSV) is the right starting point for most scATAC experiments. Per-cell analysis is useful when investigating rare subpopulations or outlier effects. Interpreting results -------------------- .. code-block:: python import pandas as pd results = pd.read_csv('imbalance_by_celltype.tsv', sep='\t') sig = results[results['fdr_pval'] < 0.05] top = (sig.groupby('cell_type') .apply(lambda x: x.nsmallest(10, 'fdr_pval')) .reset_index(drop=True)) print(top[['region', 'cell_type', 'effect_size', 'fdr_pval']]) Troubleshooting --------------- **Zero barcodes matched.** Confirm barcode format in the BAM vs. the TSV — the ``CB:Z:...`` tag often has a ``-1`` suffix that your export must match: .. code-block:: bash samtools view your.bam | head -10000 | grep -o 'CB:Z:[^[:space:]]*' \ | cut -d: -f3 | sort -u > bam_bc.txt cut -f1 barcodes.tsv | sort -u > file_bc.txt comm -12 bam_bc.txt file_bc.txt | wc -l # should be > 0 Fix a missing suffix: .. code-block:: bash awk -F'\t' '{print $1"-1\t"$2}' barcodes_no_suffix.tsv > barcodes.tsv **Sparse counts / low power.** Aggregate to pseudo-bulk by cell type, lower ``--min`` / ``--min_count``, or focus on highly expressed genes (scRNA-seq) / high-coverage peaks (scATAC-seq). **Memory.** For large cohorts, split the feature file by chromosome and process chunks: .. code-block:: bash for chr in chr{1..22}; do grep "^${chr}\s" peaks.bed > peaks_${chr}.bed wasp2-count count-variants-sc sample.bam variants.vcf.gz barcodes.tsv \ --feature peaks_${chr}.bed --out_file counts_${chr}.h5ad done See Also -------- - :doc:`/user_guide/single_cell` — barcode format, Seurat/Scanpy export - :doc:`/user_guide/analysis` — analysis CLI reference - :doc:`/methods/mapping_filter` — canonical WASP filter contract - :doc:`bulk_workflow` — sibling tutorial for bulk RNA-seq / ATAC-seq