Non-Negative Matrix Factorization for Learning Alignment-Specific Models of Protein Evolution

Murrell, Ben; Weighill, Thomas; Buys, Jan; Ketteringham, Robert; Moola, Sasha; Benade, Gerdus; du Buisson, Lise; Kaliski, Daniel; Hands, Tristan; Scheffler, Konrad

dc.contributor.author	Murrell, Ben
dc.contributor.author	Weighill, Thomas
dc.contributor.author	Buys, Jan
dc.contributor.author	Ketteringham, Robert
dc.contributor.author	Moola, Sasha
dc.contributor.author	Benade, Gerdus
dc.contributor.author	du Buisson, Lise
dc.contributor.author	Kaliski, Daniel
dc.contributor.author	Hands, Tristan
dc.contributor.author	Scheffler, Konrad
dc.date.accessioned	2013-03-15T07:59:30Z
dc.date.available	2013-03-15T07:59:30Z
dc.date.issued	2011-12-22
dc.description	The original publication is at www.plosone.org	en_ZA
dc.description.abstract	Models of protein evolution currently come in two flavors: generalist and specialist. Generalist models (e.g. PAM, JTT, WAG) adopt a one-size-fits-all approach, where a single model is estimated from a number of different protein alignments. Specialist models (e.g. mtREV, rtREV, HIVbetween) can be estimated when a large quantity of data are available for a single organism or gene, and are intended for use on that organism or gene only. Unsurprisingly, specialist models outperform generalist models, but in most instances there simply are not enough data available to estimate them. We propose a method for estimating alignment-specific models of protein evolution in which the complexity of the model is adapted to suit the richness of the data. Our method uses non-negative matrix factorization (NNMF) to learn a set of basis matrices from a general dataset containing a large number of alignments of different proteins, thus capturing the dimensions of important variation. It then learns a set of weights that are specific to the organism or gene of interest and for which only a smaller dataset is available. Thus the alignment-specific model is obtained as a weighted sum of the basis matrices. Having been constrained to vary along only as many dimensions as the data justify, the model has far fewer parameters than would be required to estimate a specialist model. We show that our NNMF procedure produces models that outperform existing methods on all but one of 50 test alignments. The basis matrices we obtain confirm the expectation that amino acid properties tend to be conserved, and allow us to quantify, on specific alignments, how the strength of conservation varies across different properties. We also apply our new models to phylogeny inference and show that the resulting phylogenies are different from, and have improved likelihood over, those inferred under standard models.	en_ZA
dc.description.sponsorship	Europeaid grant number SANTE/2007/174-790 from the European Commission.
dc.description.sponsorship	Funding for the UCSD computing cluster was provided by the Joint DMS/NIGMS Mathematical Biology Initiative through Grant NSF-0714991 and the National Institutes of Health grant AI47745.
dc.description.version	Publisher's version	en_ZA
dc.format.extent	11 p. : col. ill.
dc.identifier.citation	Murrell B, Weighill T, Buys J, Ketteringham R, Moola S, et al. (2011) Non-Negative Matrix Factorization for Learning Alignment-Specific Models of Protein Evolution. PLoS ONE 6(12): e28898.	en_ZA
dc.identifier.other	10.1371/journal.pone.0028898
dc.identifier.uri	http://hdl.handle.net/10019.1/80401
dc.language.iso	en_ZA	en_ZA
dc.publisher	PLOS	en_ZA
dc.rights.holder	The authors hold the copyright	en_ZA
dc.subject	Proteins -- Separation	en_ZA
dc.subject	Biomedical research	en_ZA
dc.subject	Generalist models	en_ZA
dc.subject	Specialist models	en_ZA
dc.subject	Non-negative matrix factorization (NNMF)	en_ZA
dc.subject	Amino acid synthesis	en_ZA
dc.title	Non-Negative Matrix Factorization for Learning Alignment-Specific Models of Protein Evolution	en_ZA
dc.type	Article	en_ZA
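
The abstract describes learning a set of basis matrices by NNMF from many alignments, then fitting a small set of alignment-specific weights so that the model is a weighted sum of the bases. A minimal sketch of that idea follows. It is not the authors' procedure: it assumes each alignment is summarized as a flattened vector of 190 non-negative exchangeabilities (20 x 19 / 2 amino acid pairs), uses synthetic random data as a stand-in, and minimizes a plain least-squares objective with Lee-Seung multiplicative updates. The function names `nnmf` and `fit_weights` are hypothetical.

```python
import numpy as np

def nnmf(V, rank, iters=500, seed=0, eps=1e-9):
    """Factor non-negative V (features x samples) as W @ H using
    Lee-Seung multiplicative updates for the least-squares objective."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(iters):
        # Each update keeps entries non-negative because it only
        # multiplies by ratios of non-negative quantities.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def fit_weights(W, v, iters=500, eps=1e-9):
    """Hold the basis W fixed and fit non-negative weights h so W @ h ~ v."""
    h = np.ones(W.shape[1])
    for _ in range(iters):
        h *= (W.T @ v) / (W.T @ W @ h + eps)
    return h

# Synthetic stand-in data (assumption): 40 "alignments", each a vector of
# 190 exchangeabilities generated from 3 hidden non-negative components.
rng = np.random.default_rng(1)
n_rates, n_alignments, rank = 190, 40, 3
true_basis = rng.random((n_rates, rank))
V = true_basis @ rng.random((rank, n_alignments))

# Step 1: learn the basis from the general dataset.
W, H = nnmf(V, rank)

# Step 2: for a new alignment, estimate only `rank` weights, not 190 rates.
target = true_basis @ np.array([0.2, 1.0, 0.5])
weights = fit_weights(W, target)

# Step 3: the alignment-specific model is the weighted sum of basis vectors.
model = W @ weights
rel_error = np.linalg.norm(model - target) / np.linalg.norm(target)
print("weights:", weights)
print("relative error:", rel_error)
```

The point of the design is the parameter count: once the basis is fixed, the new alignment contributes only `rank` free parameters, which matches the abstract's claim that the model has far fewer parameters than a full specialist model.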

Files

Original bundle

Name: murrell_nonnegative_2011.pdf
Size: 566.54 KB
Format: Adobe Portable Document Format
Description: Publisher's version

License bundle

Name: license.txt
Size: 1.95 KB
Format: Item-specific license agreed to upon submission
Description:

Collections

Research Articles (Mathematical Sciences)