ProfH2SO4/verrea

VerRea (Verac Realigner)

Introduction

In certain cases, standard aligners misalign reads because of limitations in their scoring schemes.
This can lead to soft-clipped reads or to false-positive variant calls,
where reads are aligned to a shorter or suboptimal variant representation instead of the true underlying sequence.

VerRea aims to address these issues by re-evaluating alignments and improving the accuracy of variant detection.

Project Goal

The goal of this project is to improve variant detection accuracy by correcting misaligned reads produced by standard aligners.

VerRea performs a targeted realignment step after the initial alignment. It analyzes selected genomic regions and identifies reads that were likely misaligned (for example, soft-clipped reads or reads with suboptimal alignment scores).
These reads are then realigned against alternative reference sequences to better represent the underlying variants.

This approach aims to:

  • reduce false-positive variant calls
  • recover missed or incorrectly represented variants
  • improve alignment quality in challenging regions
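
As an illustration of the candidate-selection idea described above, the following is a minimal sketch of how soft-clipped reads could be flagged from a SAM-style CIGAR string. It is not VerRea's actual implementation: the function names and the `min_clip` threshold of 5 are hypothetical, and the README does not state which criteria the tool really uses.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <string>

// Sum the lengths of all soft-clip ('S') operations in a SAM-style CIGAR
// string such as "10S140M". The placeholder "*" contains no digits and yields 0.
inline std::uint32_t soft_clipped_bases(const std::string& cigar) {
    std::uint32_t total = 0;
    std::uint32_t run = 0;  // length of the operation currently being read
    for (char c : cigar) {
        if (std::isdigit(static_cast<unsigned char>(c))) {
            run = run * 10 + static_cast<std::uint32_t>(c - '0');
        } else {
            if (c == 'S') total += run;  // only soft clips are counted
            run = 0;                     // any operation letter ends the run
        }
    }
    return total;
}

// Hypothetical rule (an assumption, not taken from VerRea): a read is a
// realignment candidate when at least `min_clip` bases are soft-clipped.
inline bool is_realignment_candidate(const std::string& cigar,
                                     std::uint32_t min_clip = 5) {
    return soft_clipped_bases(cigar) >= min_clip;
}

// Small usage check.
inline void soft_clip_example() {
    assert(soft_clipped_bases("10S140M") == 10);
    assert(!is_realignment_candidate("150M"));
}
```

When reading BAM records through HTSlib, the same count can be taken from the binary CIGAR with `bam_get_cigar`, `bam_cigar_op` and `bam_cigar_oplen` instead of parsing text.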

Requirements and Dependencies

  • macOS or Linux
  • C++17 or later
  • HTSlib
  • spdlog (fetched automatically before compilation)

Installation and Run

Debug

make dev
./build/debug/app  \
        "--ref" "path_to_hg38" \
        "--in" "tests/inputs/bams/100_reads_only_carl_Seraseq-STD-10-ng_so_rmdp.bam" \
        "--kmer" "41" \
        "--sga" "8,-4,-15,-1" \
        "--ca" "5,2" \
        "--log" "debug" \
        "--mmr" "0.05" \
        "--out" "tests/temp/out_seraseq.rea.bam" \
        "--targets" "tests/inputs/beds/carl.bed"

Prod

make all
./build/release/app  \
        "--ref" "path_to_hg38" \
        "--in" "tests/inputs/bams/100_reads_only_carl_Seraseq-STD-10-ng_so_rmdp.bam" \
        "--kmer" "41" \
        "--sga" "8,-4,-15,-1" \
        "--ca" "5,2" \
        "--log" "info" \
        "--mmr" "0.05" \
        "--out" "tests/temp/out_seraseq.rea.bam" \
        "--targets" "tests/inputs/beds/carl.bed"
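
Several options in the commands above take comma-separated numeric lists (`--sga "8,-4,-15,-1"`, `--ca "5,2"`). The README does not document what the individual values mean, so the sketch below makes no claim about that. It only shows, under the assumption that the values are plain integers, one way such an argument could be split and validated. The name `parse_int_list` is hypothetical and is not part of VerRea.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Split a comma-separated list such as "8,-4,-15,-1" into integers.
// Throws std::invalid_argument for empty or non-numeric fields and
// std::out_of_range for values that do not fit in an int.
inline std::vector<int> parse_int_list(const std::string& value) {
    std::vector<int> out;
    std::istringstream in(value);
    std::string field;
    while (std::getline(in, field, ',')) {
        std::size_t used = 0;
        const int v = std::stoi(field, &used);
        if (used != field.size()) {
            throw std::invalid_argument("trailing characters in: " + field);
        }
        out.push_back(v);
    }
    return out;
}

// Small usage check with the value passed to --ca above.
inline void int_list_example() {
    const std::vector<int> ca = parse_int_list("5,2");
    assert(ca.size() == 2);
    assert(ca[0] == 5 && ca[1] == 2);
}
```

Rejecting malformed fields up front means a typo in a list fails at startup rather than changing the alignment scoring silently.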

Tests and limitations

After compiling, you can run the tests:

./build/release/basis_tests "--ref" "path_to_hg38"

All test input files (.bam) come from plasmids, standards, or artificial simulation.
They contain only short reads (150 bp).
The tool currently operates on targeted regions specified via a BED file,
as the primary use case focuses on selected genomic loci. Whole-genome performance has not yet been evaluated,
and the current implementation is single-threaded.
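
For readers unfamiliar with the target file format, here is a minimal sketch of reading one BED line. BED intervals are 0-based and half-open, so a region's length is `end - start`. The `Target` struct and the function names are hypothetical and not taken from VerRea. The sketch reads only the first three columns and does not handle `track` or `browser` header lines.

```cpp
#include <cassert>
#include <optional>
#include <sstream>
#include <string>

// One target region: 0-based start, exclusive end (BED convention).
struct Target {
    std::string chrom;
    long start = 0;
    long end = 0;
};

// Parse the first three whitespace-separated BED columns.
// Returns std::nullopt for blank lines, '#' comments, and invalid intervals.
inline std::optional<Target> parse_bed_line(const std::string& line) {
    if (line.empty() || line[0] == '#') return std::nullopt;
    std::istringstream in(line);
    Target t;
    if (!(in >> t.chrom >> t.start >> t.end)) return std::nullopt;
    if (t.start < 0 || t.end < t.start) return std::nullopt;
    return t;
}

// Number of bases covered by a half-open interval.
inline long target_length(const Target& t) { return t.end - t.start; }

// Small usage check on a made-up region (not a real VerRea target).
inline void bed_example() {
    const auto t = parse_bed_line("chr1\t100\t250\tregion_a");
    assert(t.has_value());
    assert(target_length(*t) == 150);
}
```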

License Information

MIT

Acknowledgments

This project was developed in response to the need for better detection of problematic variants during the development of an amplicon kit for myeloproliferative neoplasms (MPN), specifically targeting the CALR (calreticulin) locus.
The laboratory development of the kit was done by Veronika Chladova (BioVendor R&D).
She was also consulted on the interpretation and evaluation of variant allele frequency (VAF) within the MPN kit.

Contact Information

For questions and bug reports:
Matej Forgac — forgac.matej@gmail.com

For sequencing-related questions (lab part):
Veronika Chladova — chladova@biovendor.com
