Mapping Module#
Overview#
wasp2-map implements the WASP remap-and-filter workflow for removing
reference mapping bias before allele counting.
The public CLI has two commands:
make-reads: find reads overlapping sample variants and generate allele-swapped FASTQ files for remappingfilter-remapped: keep only remapped reads that return to the same locus
There is no separate find-intersecting-snps command in WASP2. That overlap
step is part of make-reads.
Typical Workflow#
Step 1: Generate swapped reads#
wasp2-map make-reads \
sample.bam \
variants.vcf.gz \
--samples SAMPLE1 \
--out_dir wasp_output
This writes:
sample_to_remap.bam: original reads that must be remappedsample_keep.bam: reads that never overlapped eligible variantssample_swapped_alleles_r1.fqandsample_swapped_alleles_r2.fq: swapped FASTQ reads to realignsample_wasp_data_files.json: metadata forfilter-remapped
Step 2: Realign swapped reads#
Use the same aligner and alignment settings used for the original BAM.
bwa mem -M -t 8 genome.fa \
wasp_output/sample_swapped_alleles_r1.fq \
wasp_output/sample_swapped_alleles_r2.fq | \
samtools sort -o wasp_output/sample_remapped.bam -
samtools index wasp_output/sample_remapped.bam
Step 3: Filter remapped reads#
wasp2-map filter-remapped \
wasp_output/sample_remapped.bam \
--wasp_data_json wasp_output/sample_wasp_data_files.json \
--out_bam wasp_output/sample_wasp_filtered.bam
You can also provide to_remap_bam and keep_bam positionally instead of
--wasp_data_json.
Command Reference#
make-reads#
wasp2-map make-reads [OPTIONS] BAM VARIANTS
Important options:
--samples/-s: sample name(s) used to select het variants--out_dir/-o: output directory--out_json/-j: explicit metadata JSON path--indels: include indels as well as SNPs--threads: BAM I/O threads
Notes:
paired-end input is required
phased genotypes are strongly recommended
supported variant formats are VCF, VCF.GZ, BCF, and PGEN
filter-remapped#
wasp2-map filter-remapped [OPTIONS] REMAPPED_BAM [TO_REMAP_BAM] [KEEP_BAM]
Important options:
--wasp_data_json/-j: loadto_remap_bamandkeep_bamfrommake-readsmetadata--out_bam/-o: output BAM path--remap_keep_bam: optional BAM of remapped reads that passed filtering--remap_keep_file: optional text file of kept read names--same-locus-slop: positional tolerance for same-locus matching--threads: BAM I/O threads
Interpreting Outputs#
Common outcomes after filter-remapped:
reads kept because they remap to the same locus
reads dropped because they remap elsewhere
reads dropped because they fail to remap cleanly
The final WASP-corrected BAM is the output of filter-remapped merged with
the *_keep.bam reads that never required remapping.
Example#
wasp2-map make-reads \
sample.bam \
variants.vcf.gz \
--samples SAMPLE1 \
--out_dir wasp_output
bwa mem -M -t 8 genome.fa \
wasp_output/sample_swapped_alleles_r1.fq \
wasp_output/sample_swapped_alleles_r2.fq | \
samtools sort -o wasp_output/sample_remapped.bam -
samtools index wasp_output/sample_remapped.bam
wasp2-map filter-remapped \
wasp_output/sample_remapped.bam \
--wasp_data_json wasp_output/sample_wasp_data_files.json \
--out_bam wasp_output/sample_wasp_filtered.bam
Next Steps#
Counting Module to count alleles from the WASP-filtered BAM
Analysis Module to test for allelic imbalance