Please use this identifier to cite or link to this item:
Publication type: Article in scientific journal
Type of review: Peer review (publication)
Title: Progressive multiple sequence alignment with indel evolution
Authors: Maiolo, Massimo
Zhang, Xiaolei
Gil, Manuel
Anisimova, Maria
DOI: 10.21256/zhaw-4775
Published in: BMC Bioinformatics
Volume(Issue): 19
Issue: 331
Issue Date: 2018
Publisher / Ed. Institution: BioMed Central
ISSN: 1471-2105
Language: English
Subjects: Dynamic programming; Indel; Phylogeny; Poisson process; Probability; Sequence alignment; Algorithms; Molecular evolution; INDEL mutation
Subject (DDC): 572: Biochemistry
Abstract: Background: Sequence alignment is crucial in genomics studies. However, optimal multiple sequence alignment (MSA) is NP-hard. Thus, modern MSA methods employ progressive heuristics, breaking the problem into a series of pairwise alignments guided by a phylogeny. Changes between homologous characters are typically modelled by a Markov substitution model. In contrast, the dynamics of indels are not modelled explicitly, because the computation of the marginal likelihood under such models has exponential time complexity in the number of taxa. But the failure to model indel evolution may lead to artificially short alignments due to biased indel placement, inconsistent with phylogenetic relationship. Results: Recently, the classical indel model TKF91 was modified to describe indel evolution on a phylogeny via a Poisson process, termed PIP. PIP allows to compute the joint marginal probability of an MSA and a tree in linear time. We present a new dynamic programming algorithm to align two MSAs –represented by the underlying homology paths– by full maximum likelihood under PIP in polynomial time, and apply it progressively along a guide tree. We have corroborated the correctness of our method by simulation, and compared it with competitive methods on an illustrative real dataset. Conclusions: Our MSA method is the first polynomial time progressive aligner with a rigorous mathematical formulation of indel evolution. The new method infers phylogenetically meaningful gap patterns alternative to the popular PRANK, while producing alignments of similar length. Moreover, the inferred gap patterns agree with what was predicted qualitatively by previous studies. The algorithm is implemented in a standalone C++ program.
Fulltext version: Published version
License (according to publishing contract): CC BY 4.0: Attribution 4.0 International
Departement: Life Sciences and Facility Management
Organisational Unit: Institute of Computational Life Sciences (ICLS)
Appears in collections:Publikationen Life Sciences und Facility Management

Files in This Item:
File Description SizeFormat 
2018_Maiolo_Progressive multiple sequence alignment.pdf1.89 MBAdobe PDFThumbnail

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.