Correcting the bias of empirical frequency parameter estimators in codon models
Date
2010-07
Authors
Kosakovsky Pond, Sergei
Delport, Wayne
Muse, Spencer V.
Scheffler, Konrad
Publisher
Public Library of Science -- PLOS
Abstract
Markov models of codon substitution are powerful inferential tools for studying biological processes such as natural
selection and preferences in amino acid substitution. The equilibrium character distributions of these models are almost
always estimated using nucleotide frequencies observed in a sequence alignment, primarily as a matter of historical
convention. In this note, we demonstrate that a popular class of such estimators is biased, and that this bias has an
adverse effect on goodness of fit and estimates of substitution rates. We propose a "corrected" empirical estimator that
begins with observed nucleotide counts, but accounts for the nucleotide composition of stop codons. We show via
simulation that the corrected estimates outperform the de facto standard F3×4 estimates not just by providing better
estimates of the frequencies themselves, but also by leading to improved estimation of other parameters in the
evolutionary models. On a curated collection of 856 sequence alignments, our estimators show a significant improvement in
goodness of fit compared to the F3×4 approach. Maximum likelihood estimation of the frequency parameters appears to
be warranted in many cases, albeit at a greater computational cost. Our results demonstrate that there is little justification,
either statistical or computational, for continued use of the F3×4-style estimators.
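
The bias described in the abstract can be illustrated with a minimal sketch. It assumes the conventional form of the de facto standard estimator: each sense codon's frequency is the product of three position-specific nucleotide frequencies, renormalized after stop codons are excluded. The function names, the uniform input frequencies, and the use of the universal genetic code are all assumptions made for illustration. The sketch does not implement the authors' corrected estimator, which the abstract does not specify.

```python
from itertools import product

NUCLEOTIDES = "TCAG"
# Assumption: universal genetic code, so three stop codons are excluded.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def f3x4_frequencies(position_freqs):
    """Product-of-position-frequencies estimator over the 61 sense codons.

    position_freqs is a list of three dicts (one per codon position),
    each mapping a nucleotide to its frequency at that position.
    """
    raw = {}
    for triple in product(NUCLEOTIDES, repeat=3):
        codon = "".join(triple)
        if codon in STOP_CODONS:
            continue  # stop codons get zero mass
        raw[codon] = (position_freqs[0][triple[0]]
                      * position_freqs[1][triple[1]]
                      * position_freqs[2][triple[2]])
    total = sum(raw.values())  # less than 1 because stop codons were dropped
    return {codon: p / total for codon, p in raw.items()}


def implied_position_frequencies(codon_freqs):
    """Marginal nucleotide frequencies per position implied by codon_freqs."""
    marginals = [dict.fromkeys(NUCLEOTIDES, 0.0) for _ in range(3)]
    for codon, p in codon_freqs.items():
        for position, nucleotide in enumerate(codon):
            marginals[position][nucleotide] += p
    return marginals


# Hypothetical input: uniform nucleotide frequencies at every position.
observed = [{n: 0.25 for n in NUCLEOTIDES} for _ in range(3)]

codon_freqs = f3x4_frequencies(observed)
implied = implied_position_frequencies(codon_freqs)

for position in range(3):
    print(position + 1, {n: round(f, 4) for n, f in implied[position].items()})
```

With these uniform inputs every sense codon receives frequency 1/61, so the implied first-position frequency of T is 13/61 (about 0.213) rather than the 0.25 that was supplied, because all three stop codons begin with T. The nucleotide composition implied by the fitted codon frequencies therefore no longer matches the input composition, which is the kind of mismatch an estimator that accounts for stop codon composition is meant to remove.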
Description
The original publication is available at http://www.plosone.org/
Keywords
Markov processes, Biological processes, Goodness of fit, Nucleotide counts, Codon substitution models
Citation
Kosakovsky Pond, S., Delport, W., Muse, S.V. & Scheffler, K. 2010. Correcting the bias of empirical frequency parameter estimators in codon models. PLoS ONE, 5(7): e11230, doi:10.1371/journal.pone.0011230.