SAVAGE: De Novo Assembly of Viral Quasispecies using Overlap Graphs

SAVAGE reconstructs the viral strains present in a patient, including strains of very low frequency, without requiring reference data. It is therefore particularly helpful in fresh outbreaks, where reliable reference genome data has become obsolete. Discovering low-frequency strains is crucial for preventing the development of treatment-resistant strains.
Documentation and Download
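
As a rough illustration of the overlap-graph idea behind SAVAGE, the sketch below links reads whose suffix exactly matches the prefix of another read. The function names, the toy reads, and the exact-match criterion are assumptions made for this example; a real assembler must tolerate sequencing errors and scale to far larger read sets.

```python
from itertools import permutations

def suffix_prefix_overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b` (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0

def build_overlap_graph(reads, min_len=3):
    """Return directed edges {(i, j): overlap_length} between read indices."""
    edges = {}
    for i, j in permutations(range(len(reads)), 2):
        k = suffix_prefix_overlap(reads[i], reads[j], min_len)
        if k:
            edges[(i, j)] = k
    return edges

# Hypothetical toy reads, chosen only to illustrate the data structure.
reads = ["ACGTAC", "GTACGG", "ACGGTT"]
print(build_overlap_graph(reads))  # {(0, 1): 4, (1, 2): 4}
```

Following the edges 0 -> 1 -> 2 spells out a longer sequence from the three reads, which is the basic step that overlap-based assembly builds on.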

  • De Novo Assembly of Viral Quasispecies using Overlap Graphs
    J. Baaijens, A.Z. El Aabidine, E. Rivals†, A. Schönhuth
    Genome Research, 27(5), 835-848, 2017
    Publisher Link | bioRxiv:080341

PROSIC: Postprocessing Somatic Indel Calls

PROSIC implements a latent variable model that makes it possible to control the false discovery rate (FDR) when discovering somatic insertions and deletions. PROSIC uses re-alignment techniques that, combined with FDR control during variant detection, drastically increase sensitivity while keeping precision at a maximum.
Documentation and Download
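
To make the notion of FDR control concrete, the sketch below shows one generic way to pick a set of calls at a target FDR once each call has a posterior probability of being a true variant. It assumes those probabilities are already given; the function name and the example values are hypothetical, and this is not PROSIC's latent variable model itself.

```python
def select_calls_at_fdr(posteriors, alpha):
    """Indices of the largest top-ranked set of calls whose expected FDR is <= alpha.

    posteriors[i] is the (assumed given) posterior probability that call i is real.
    """
    order = sorted(range(len(posteriors)), key=lambda i: posteriors[i], reverse=True)
    selected, expected_false = [], 0.0
    for i in order:
        expected_false += 1.0 - posteriors[i]
        # Expected FDR of the set = expected number of false calls / set size.
        if expected_false / (len(selected) + 1) > alpha:
            break
        selected.append(i)
    return selected

# Hypothetical posterior probabilities for four candidate indels.
print(select_calls_at_fdr([0.99, 0.60, 0.95, 0.90], alpha=0.10))  # [0, 2, 3]
```

Because calls are visited in order of decreasing posterior probability, the expected FDR can only grow as calls are added, so stopping at the first violation yields the largest admissible set.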

  • Enhancing Sensitivity And Controlling False Discovery Rate In Somatic Indel Discovery Using A Latent Variable Model
    L.J. Dijkstra*, J. Köster*, T. Marschall†, A. Schönhuth

WhatsHap: Weighted Haplotype Assembly for Future-Generation Sequencing Reads

WhatsHap is software for phasing genomic variants using DNA sequencing reads, also called read-based phasing or haplotype assembly. It is especially suitable for long reads, but also works well with short reads. It achieves very accurate results, works well with Illumina, PacBio and Oxford Nanopore reads, phases SNVs, indels and complex variants, and can be run in pedigree phasing mode.
Documentation and Download
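
Read-based phasing is commonly formalized as weighted minimum error correction: split the reads into two groups and flip as little allele weight as possible so that each group agrees with a single haplotype. The sketch below solves a toy instance by exhaustive search. The read encoding (position mapped to an allele and a weight) and the function names are assumptions for this example; exhaustive search is only feasible for a handful of reads and is not how WhatsHap works.

```python
from itertools import product

def partition_cost(reads, assignment):
    """Total weight that must be flipped so each read group fits one haplotype."""
    positions = {p for read in reads for p in read}
    cost = 0
    for group in (0, 1):
        for p in positions:
            weight = {0: 0, 1: 0}
            for read, g in zip(reads, assignment):
                if g == group and p in read:
                    allele, w = read[p]
                    weight[allele] += w
            cost += min(weight.values())  # flip the cheaper allele at this position
    return cost

def brute_force_phase(reads):
    """Try every bipartition of the reads and return (best_assignment, cost)."""
    best = min(product((0, 1), repeat=len(reads)),
               key=lambda a: partition_cost(reads, a))
    return best, partition_cost(reads, best)

# Hypothetical reads: {variant position: (allele, weight)}.
reads = [
    {0: (0, 10), 1: (0, 10)},
    {0: (1, 10), 1: (1, 10)},
    {1: (0, 10), 2: (1, 10)},
    {0: (0, 10), 2: (0, 2)},   # low-weight allele at position 2 conflicts with read 2
]
print(brute_force_phase(reads))  # ((0, 1, 0, 0), 2)
```

The weights let a low-confidence allele be overruled cheaply, which is why the optimal solution here pays only 2 rather than splitting reads that otherwise agree.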

  • WhatsHap: Fast and Accurate Read-Based Phasing
    M. Martin, M. Patterson, S. Garg, S.O. Fischer, N. Pisanti, G.W. Klau, A. Schönhuth, T. Marschall
  • WhatsHap: Weighted Haplotype Assembly for Future-Generation Sequencing Reads
    M. Patterson*, T. Marschall*, N. Pisanti, L. van Iersel, L. Stougie, G.W. Klau†, A. Schönhuth
    Journal of Computational Biology, 22(6), 498-509, 2015
    Publisher Link

HaploClique: Viral Quasispecies Assembly via Maximal Clique Enumeration

HaploClique reconstructs viral quasispecies from next-generation sequencing data. To this end, it uses a graph-based approach in which nodes are reads and an edge indicates that two reads stem from (locally) identical haplotypes. Maximal cliques then correspond to maximal groups of reads from identical haplotypes. To enumerate these maximal cliques, HaploClique uses a highly engineered algorithm for big genome data.
Documentation and Download
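
The idea of maximal clique enumeration on a read graph can be sketched with the textbook Bron-Kerbosch recursion (without pivoting). The read names and edges below are hypothetical, and HaploClique's own enumeration algorithm and its statistical edge criterion are considerably more involved than this minimal version.

```python
def maximal_cliques(adj):
    """Enumerate maximal cliques of an undirected graph {node: set_of_neighbours}."""
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates that extend it, x: already-processed nodes.
        if not p and not x:
            cliques.append(sorted(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return sorted(cliques)

# Hypothetical read graph: an edge means two reads look like the same haplotype.
read_graph = {
    "r1": {"r2", "r3"},
    "r2": {"r1", "r3"},
    "r3": {"r1", "r2", "r4"},
    "r4": {"r3"},
}
print(maximal_cliques(read_graph))  # [['r1', 'r2', 'r3'], ['r3', 'r4']]
```

Each reported clique is a group of reads that are pairwise compatible and cannot be extended by any further read.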

  • Viral Quasispecies Assembly via Maximal Clique Enumeration
    A. Töpfer, T. Marschall, R. Bull, F. Luciani, A. Schönhuth*, N. Beerenwinkel*
    PLoS Computational Biology, 10(3), e1003515, 2014
    *Joint last authors
    Publisher Link

CLEVER: Clique-Enumerating Variant Finder

CLEVER is an insert-size-based approach to discovering structural variants in next-generation sequenced genomes. Its main advantage is that it integrates all read information in a sound, graph-based statistical framework. It outperforms all previous insert-size-based approaches and has distinct advantages over split-read approaches. MATE-CLEVER is a hybrid approach that additionally uses split-read information; it was used in the Genome of the Netherlands project.
Documentation and Download
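
The basic signal behind insert-size-based methods can be illustrated by comparing each read pair's insert size with the library distribution. The sketch below labels pairs individually by z-score; the function name, the cutoff, and the library parameters are assumptions for this example. CLEVER does not threshold single pairs like this but combines the evidence of all reads within its graph-based statistical framework.

```python
def flag_discordant(insert_sizes, lib_mean, lib_sd, z_cutoff=3.0):
    """Label each read pair by how far its insert size deviates from the library."""
    labels = []
    for size in insert_sizes:
        z = (size - lib_mean) / lib_sd
        if z > z_cutoff:
            labels.append("deletion-like")    # mates map farther apart than expected
        elif z < -z_cutoff:
            labels.append("insertion-like")   # mates map closer together than expected
        else:
            labels.append("concordant")
    return labels

# Hypothetical library with mean insert size 300 bp and standard deviation 30 bp.
print(flag_discordant([310, 295, 520, 180], lib_mean=300, lib_sd=30))
# ['concordant', 'concordant', 'deletion-like', 'insertion-like']
```

A per-pair cutoff of this kind misses variants whose size is small relative to the library's standard deviation, which is one motivation for pooling evidence across many pairs.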

  • Mendelian-Inheritance-Aware Discovery and Genotyping of Midsize and Long Indels (MATE-CLEVER)
    T. Marschall*, I. Hajirasouliha, A. Schönhuth
    presented at ISMB-HitSeq
    Bioinformatics, 29(24), 3143-3150, 2013
    Publisher Link
  • CLEVER: Clique-Enumerating Variant Finder
    T. Marschall*, I. Costa*, S. Canzar, M. Bauer, G. Klau, A. Schliep, A. Schönhuth
    presented at RECOMB-Seq and ISMB-HitSeq
    Bioinformatics, 28(22), 2875-2882, 2012
    *Joint first authors
    Publisher Link

DCCSE: Discovering Context-Specific Sequencing Errors

DCCSE systematically screens sequencing experiments for strand biases in sequencing errors and assigns the resulting positions to sequence motifs. These motifs are the likely cause of the errors because they induce dephasing during sequencing.
Documentation and Download
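
One way to screen a position for strand bias is to test whether mismatches are distributed unevenly between forward and reverse reads. The sketch below uses a two-sided Fisher exact test on a 2x2 table; the choice of test, the function name, and the counts are assumptions for this example and are not taken from DCCSE.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are at most as likely as the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= observed * (1 + 1e-9))

# Rows: forward / reverse strand; columns: mismatching / matching reads (hypothetical).
biased = fisher_exact_two_sided(18, 82, 2, 98)
balanced = fisher_exact_two_sided(10, 90, 10, 90)
print(biased < 0.01, balanced > 0.99)  # True True
```

Positions with a small p-value would then be candidates for the motif analysis described above.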

  • Discovering Motifs that Induce Sequencing Errors
    M. Allhoff, A. Schönhuth, M. Martin, I.G. Costa, S. Rahmann, T. Marschall
    BMC Bioinformatics, 14(Suppl 5) (Proc. RECOMB-Seq 2013): S1, 2013
    Publisher Link

MirrorTreeTop: Mirroring Co-Evolving Tree Topologies

Software that efficiently aligns gene trees by means of a tree-topology-constrained optimization scheme. The advantage of this approach is its speed: it is orders of magnitude faster than existing distance-matrix-based approaches to correctly identifying interacting gene/protein family members.
Hosted by Cenk Sahinalp's lab at Simon Fraser University.
Documentation and Download

  • Mirroring trees in the light of their topologies
    I. Hajirasouliha*, A. Schönhuth*, D. Juan, A. Valencia, S.C. Sahinalp
    Bioinformatics, 28(9), 1202-1208, 2012
    *Joint first authors
    Publisher Link

wDCB: weighted Density-Constrained Biclustering

The software has mainly been applied to infer systemic cancer markers. Systemic markers are determined as subnetworks of confidence-scored (hence weighted) protein-interaction networks whose genes are differentially expressed in sufficiently many cancer patients, subject to the constraint that the subnetworks are dense enough. It also works generically; see the reference below for a formal problem definition.
Hosted by Martin Ester's lab at Simon Fraser University.
Software available on request.

  • Inferring cancer subnetwork markers using density-constrained biclustering
    P. Dao*, R. Colak*, R. Salari, F. Moser, A. Schönhuth†, M. Ester†
    Bioinformatics (Proc. ECCB 2010), 26(18), i625-631, 2010
    *Joint first authors, †Joint last, corresponding authors
    Publisher Link

Coding CpG Islands

Software for determining CpG islands in coding regions. Coding CpG islands are exonic regions that deviate significantly from the pattern statistics of a fifth-order Markov chain, which serves as an appropriate exonic null model. Islands are determined by solving a reasonably constrained optimization problem.
In collaboration with Lior Pachter's lab at UC Berkeley.
Software available on request.
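
The role of a Markov chain as a null model can be sketched as follows: estimate order-k transition probabilities from background sequences, then score a candidate region by its log-likelihood under that model. The function names, the pseudocount smoothing, and the toy sequences are assumptions for this example. The example uses order 1 to keep the data small (the tool's null model is fifth-order), and the significance computation for pattern statistics is not reproduced here.

```python
from collections import defaultdict
from math import log

def train_markov(sequences, order, alphabet="ACGT", pseudocount=1.0):
    """Estimate order-k transition probabilities; returns prob(context, symbol)."""
    counts = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        for i in range(order, len(seq)):
            counts[seq[i - order:i]][seq[i]] += 1

    def prob(context, symbol):
        total = sum(counts[context].values()) + pseudocount * len(alphabet)
        return (counts[context][symbol] + pseudocount) / total

    return prob

def log_likelihood(seq, prob, order):
    """Log-probability of seq under the chain, conditioned on its first `order` symbols."""
    return sum(log(prob(seq[i - order:i], seq[i])) for i in range(order, len(seq)))

# Hypothetical background with alternating A and C.
null = train_markov(["ACACACAC"], order=1)
print(null("A", "C"))  # 0.625
print(log_likelihood("ACAC", null, 1) > log_likelihood("AACC", null, 1))  # True
```

Regions that score unusually low under the null model are the ones that deviate from the expected exonic pattern.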

  • Determining coding CpG islands by identifying regions significant for pattern statistics on Markov chains
    M. Singer*, A. Engström*, A. Schönhuth†, L. Pachter†
    Statistical Applications in Genetics and Molecular Biology, 10(1):43, 2011
    *Joint first authors, †Joint last, corresponding authors
    DOI: 10.2202/1544-6115.1677

Pair HMM Gap Statistics

Software for computation of indel length and multiplicity significance in local and global alignments with affine gap penalties.
Short Documentation and Download
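
Two building blocks underlie gap statistics of this kind: the affine gap cost, and the geometric gap-length distribution that a pair HMM with a constant gap-extension probability implies. The sketch below shows both; the parameter names, the convention that the opening cost covers the first gap position, and the example values are assumptions, and the software's actual significance computation (including multiplicity) is not reproduced.

```python
def affine_gap_cost(length, gap_open, gap_extend):
    """Affine penalty: the first gap position costs gap_open, each further one gap_extend."""
    if length <= 0:
        return 0
    return gap_open + (length - 1) * gap_extend

def gap_length_tail(length, extend_prob):
    """P(gap length >= length) for a geometric length with continuation probability extend_prob."""
    return extend_prob ** (length - 1)

print(affine_gap_cost(5, gap_open=10, gap_extend=1))  # 14
print(gap_length_tail(4, extend_prob=0.5))            # 0.125
```

Under this simple model, a long gap is surprising exactly when its tail probability is small, which is the intuition behind assessing indel length significance.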

  • Pair HMM based gap statistics for re-evaluation of indels in alignments with affine gap penalties
    A. Schönhuth*, R. Salari*, S.C. Sahinalp
    Proceedings of the WABI 2010, LNCS 6293, 350-361, 2010
    *Joint first authors
  • Towards improved assessment of functional similarity in large-scale screens: an indel study
    A. Schönhuth*, R. Salari*, F. Hormozdiari, A. Cherkasov, S.C. Sahinalp
    Journal of Computational Biology 17(1), 1-20, 2010
    *Joint first authors

DCB: Density-Constrained Biclustering

Software to determine functional gene modules as sets of co-expressed genes whose products form densely connected protein-interaction subnetworks. It has outperformed prior approaches on Gene Ontology-based module assessment metrics. For generic usage, see the reference below.
Hosted by Martin Ester's lab at Simon Fraser University.
Recep Colak's Software Page
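
The density constraint in DCB can be illustrated by the fraction of gene pairs in a candidate module that are connected in the interaction network. The function name, the edge representation, and the toy network are assumptions for this sketch; the co-expression criterion and the search procedure are omitted.

```python
from itertools import combinations

def edge_density(nodes, edges):
    """Fraction of node pairs within `nodes` that are connected (edges: set of frozensets)."""
    nodes = list(nodes)
    if len(nodes) < 2:
        return 0.0
    possible = len(nodes) * (len(nodes) - 1) / 2
    present = sum(1 for u, v in combinations(nodes, 2) if frozenset((u, v)) in edges)
    return present / possible

# Hypothetical protein-interaction network on four genes.
interactions = {frozenset(e) for e in [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]}
print(edge_density({"A", "B", "C"}, interactions))                 # 1.0
print(round(edge_density({"A", "B", "C", "D"}, interactions), 2))  # 0.67
```

A module would pass the density constraint only if this value reaches a chosen threshold, so adding the loosely connected gene D can disqualify an otherwise dense module.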

  • Module discovery by exhaustive search for densely connected, co-expressed regions in biomolecular interaction networks
    R. Colak, F. Moser, J. Shu, A. Schönhuth*, N. Chen*, M. Ester*
    PLoS One, 5(10): e13348, 2010
    *Joint last authors
    Publisher Link

GQL: Graphical Query Language

Comprehensive software package for the analysis of gene expression time courses.
Hosted by Alexander Schliep's lab at Rutgers.
Documentation, Download and more

  • Constrained mixture estimation for analysis and robust classification of clinical time series
    I. Costa, A. Schönhuth, C. Hafemeister, A. Schliep
    Bioinformatics, 25(12) (Proc. ISMB/ECCB 2009), i6-i14, 2009
  • Semi-supervised clustering of yeast gene expression data
    A. Schönhuth, I. Costa, A. Schliep
    In: A. Okada et al. (eds.), Cooperation in Classification and Data Analysis
    Studies in Classification, Data Analysis, and Knowledge Organization, 151-159, Springer, 2009
  • Analyzing gene expression time courses
    A. Schliep, I. Costa, C. Steinhoff, A. Schönhuth
    IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2(3), 179-193, 2005
  • The Graphical Query Language: a tool for analysis of gene expression time-courses
    I. Costa, A. Schönhuth, A. Schliep
    Bioinformatics, 21(10), 2544-2545, 2005
  • Robust inference of groups in gene expression time-courses using mixtures of HMMs
    A. Schliep, C. Steinhoff, A. Schönhuth
    Bioinformatics, 20, Supp. 1 (Proc. ISMB/ECCB 2004), 283-289, 2004
  • Using hidden Markov models to analyze gene expression time course data
    A. Schliep, A. Schönhuth, C. Steinhoff
    Bioinformatics, 19, Supp. 1 (Proc. ISMB 2003), 255-263, 2003

GHMM Library

Comprehensive software package for working with hidden Markov models (HMMs).
Maintained by Alexander Schliep's lab at Rutgers.
Documentation and more, also: Sourceforge Repository.
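
As an example of the kind of computation an HMM library provides, the sketch below implements the forward algorithm for a discrete HMM in plain Python. It does not use the GHMM API; the function name and the model parameters are assumptions for this example, and it omits the scaling that is needed to avoid underflow on long sequences.

```python
def forward_likelihood(observations, initial, transition, emission):
    """P(observations) under a discrete HMM, computed with the forward algorithm."""
    states = range(len(initial))
    # alpha[s] = P(observations so far, current state = s)
    alpha = [initial[s] * emission[s][observations[0]] for s in states]
    for obs in observations[1:]:
        alpha = [
            emission[t][obs] * sum(alpha[s] * transition[s][t] for s in states)
            for t in states
        ]
    return sum(alpha)

# Hypothetical two-state model with two output symbols (0 and 1).
initial = [0.5, 0.5]
transition = [[0.9, 0.1], [0.2, 0.8]]
emission = [[0.8, 0.2], [0.3, 0.7]]
print(round(forward_likelihood([0, 1], initial, transition, emission), 4))  # 0.19
```

The likelihood sums over all hidden state paths without enumerating them, which is what makes HMMs practical when the states themselves cannot be observed.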

  • The General Hidden Markov Model Library: Analyzing systems with unobservable states
    A. Schliep, B. Georgi, W. Rungsarityotin, I. Costa da Filho, A. Schönhuth
    In: K. Kremer, V. Macho (eds.): Forschung und Wissenschaftliches Rechnen 2004
    GWDG-Bericht 68, 121-136, 2004
  • and many more, see the GHMM homepage.