Mapping Module ============== Overview -------- ``wasp2-map`` implements the WASP remap-and-filter workflow for removing reference mapping bias before allele counting. The public CLI has two commands: 1. ``make-reads``: find reads overlapping sample variants and generate allele-swapped FASTQ files for remapping 2. ``filter-remapped``: keep only remapped reads that return to the same locus There is no separate ``find-intersecting-snps`` command in WASP2. That overlap step is part of ``make-reads``. Typical Workflow ---------------- Step 1: Generate swapped reads ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: bash wasp2-map make-reads \ sample.bam \ variants.vcf.gz \ --samples SAMPLE1 \ --out_dir wasp_output This writes: * ``sample_to_remap.bam``: original reads that must be remapped * ``sample_keep.bam``: reads that never overlapped eligible variants * ``sample_swapped_alleles_r1.fq`` and ``sample_swapped_alleles_r2.fq``: swapped FASTQ reads to realign * ``sample_wasp_data_files.json``: metadata for ``filter-remapped`` Step 2: Realign swapped reads ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Use the same aligner and alignment settings used for the original BAM. .. code-block:: bash bwa mem -M -t 8 genome.fa \ wasp_output/sample_swapped_alleles_r1.fq \ wasp_output/sample_swapped_alleles_r2.fq | \ samtools sort -o wasp_output/sample_remapped.bam - samtools index wasp_output/sample_remapped.bam Step 3: Filter remapped reads ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: bash wasp2-map filter-remapped \ wasp_output/sample_remapped.bam \ --wasp_data_json wasp_output/sample_wasp_data_files.json \ --out_bam wasp_output/sample_wasp_filtered.bam You can also provide ``to_remap_bam`` and ``keep_bam`` positionally instead of ``--wasp_data_json``. Command Reference ----------------- ``make-reads`` ~~~~~~~~~~~~~~ .. code-block:: bash wasp2-map make-reads [OPTIONS] BAM VARIANTS Important options: * ``--samples`` / ``-s``: sample name(s) used to select het variants * ``--out_dir`` / ``-o``: output directory * ``--out_json`` / ``-j``: explicit metadata JSON path * ``--indels``: include indels as well as SNPs * ``--threads``: BAM I/O threads Notes: * paired-end input is required * phased genotypes are strongly recommended * supported variant formats are VCF, VCF.GZ, BCF, and PGEN ``filter-remapped`` ~~~~~~~~~~~~~~~~~~~ .. code-block:: bash wasp2-map filter-remapped [OPTIONS] REMAPPED_BAM [TO_REMAP_BAM] [KEEP_BAM] Important options: * ``--wasp_data_json`` / ``-j``: load ``to_remap_bam`` and ``keep_bam`` from ``make-reads`` metadata * ``--out_bam`` / ``-o``: output BAM path * ``--remap_keep_bam``: optional BAM of remapped reads that passed filtering * ``--remap_keep_file``: optional text file of kept read names * ``--same-locus-slop``: positional tolerance for same-locus matching * ``--threads``: BAM I/O threads Interpreting Outputs -------------------- Common outcomes after ``filter-remapped``: * reads kept because they remap to the same locus * reads dropped because they remap elsewhere * reads dropped because they fail to remap cleanly The final WASP-corrected BAM is the output of ``filter-remapped`` merged with the ``*_keep.bam`` reads that never required remapping. Example ------- .. code-block:: bash wasp2-map make-reads \ sample.bam \ variants.vcf.gz \ --samples SAMPLE1 \ --out_dir wasp_output bwa mem -M -t 8 genome.fa \ wasp_output/sample_swapped_alleles_r1.fq \ wasp_output/sample_swapped_alleles_r2.fq | \ samtools sort -o wasp_output/sample_remapped.bam - samtools index wasp_output/sample_remapped.bam wasp2-map filter-remapped \ wasp_output/sample_remapped.bam \ --wasp_data_json wasp_output/sample_wasp_data_files.json \ --out_bam wasp_output/sample_wasp_filtered.bam Next Steps ---------- * :doc:`counting` to count alleles from the WASP-filtered BAM * :doc:`analysis` to test for allelic imbalance