Model Selection

Model selection addresses the problem of choosing among several competing statistical models. Recently, two specific frameworks have frequently been used to achieve a trade-off between model fit and complexity. On the one hand, Bayesian model selection rests on the Bayes factor, the ratio of the marginal likelihoods of two models (in a way, a generalized likelihood ratio). On the other hand, minimum description length (MDL) is a principle rooted in data compression that prefers the model providing the shortest description of the data. Both methods account for the functional flexibility of models and take order constraints into account.

Fisher Information Approximation for MDL

The Fisher information approximation (FIA), a specific minimum description length (MDL) measure, is prone to errors in small samples. We have proposed a simple way to compute a lower-bound sample size that ensures that this specific bias cannot occur. This lower-bound N can be computed with this Excel Sheet. For details, see:

  • Heck, D. W., Moshagen, M., & Erdfelder, E. (2014). Model selection by minimum description length: Lower-bound sample sizes for the Fisher information approximation. Journal of Mathematical Psychology, 60, 29–34. https://doi.org/10.1016/j.jmp.2014.06.002

    The Fisher information approximation (FIA) is an implementation of the minimum description length principle for model selection. Unlike information criteria such as AIC or BIC, it has the advantage of taking the functional form of a model into account. Unfortunately, FIA can be misleading in finite samples, resulting in an inversion of the correct rank order of complexity terms for competing models in the worst case. As a remedy, we propose a lower-bound N’ for the sample size that suffices to preclude such errors. We illustrate the approach using three examples from the family of multinomial processing tree models.

    @article{heck2014model,
    title = {Model Selection by Minimum Description Length: {{Lower-bound}} Sample Sizes for the {{Fisher}} Information Approximation},
    author = {Heck, Daniel W and Moshagen, Morten and Erdfelder, Edgar},
    date = {2014},
    journaltitle = {Journal of Mathematical Psychology},
    volume = {60},
    pages = {29--34},
    doi = {10.1016/j.jmp.2014.06.002},
    abstract = {The Fisher information approximation (FIA) is an implementation of the minimum description length principle for model selection. Unlike information criteria such as AIC or BIC, it has the advantage of taking the functional form of a model into account. Unfortunately, FIA can be misleading in finite samples, resulting in an inversion of the correct rank order of complexity terms for competing models in the worst case. As a remedy, we propose a lower-bound N' for the sample size that suffices to preclude such errors. We illustrate the approach using three examples from the family of multinomial processing tree models.},
    arxivnumber = {1808.00212},
    github = {https://github.com/danheck/FIAminimumN},
    keywords = {heckfirst}
    }
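
The idea behind such a lower bound can be sketched in Python. The snippet below is only an illustration under stated assumptions: it uses the standard FIA complexity term, (k/2) ln(N / 2π) plus the log of the integrated square root of the Fisher information determinant, and solves for the sample size at which a model with more free parameters starts to receive the larger penalty. The function names and the numerical inputs are hypothetical, and the exact definition of N' in the paper may differ in detail.

```python
from math import exp, log, pi


def fia_complexity(k, log_volume, n):
    """FIA complexity term for k free parameters and sample size n.

    log_volume is ln of the integral of sqrt(det I(theta)) over the
    parameter space; it has to be supplied by the user.
    """
    return 0.5 * k * log(n / (2 * pi)) + log_volume


def lower_bound_n(k_a, log_volume_a, k_b, log_volume_b):
    """Sample size at which the complexity terms of A and B are equal.

    Assumes model B has more free parameters than model A. For every n
    above the returned value, B is penalized more strongly than A.
    """
    if k_b <= k_a:
        raise ValueError("model B must have more free parameters than model A")
    return 2 * pi * exp(2 * (log_volume_a - log_volume_b) / (k_b - k_a))


# Hypothetical inputs, chosen only for illustration.
K_A, LOG_VOL_A = 1, 1.2
K_B, LOG_VOL_B = 3, -0.5

n_min = lower_bound_n(K_A, LOG_VOL_A, K_B, LOG_VOL_B)
print(f"complexity terms are ordered correctly for N > {n_min:.2f}")
```

Below the returned value, the simpler model A would receive the larger complexity term, which is the kind of rank-order inversion described in the abstract.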

MDL and the Bayes Factor  

MDL and Bayes factors are asymptotically identical under specific conditions. However, some qualitative differences exist in finite samples, as shown in:

  • Heck, D. W., Wagenmakers, E.-J., & Morey, R. D. (2015). Testing order constraints: Qualitative differences between Bayes factors and normalized maximum likelihood. Statistics & Probability Letters, 105, 157–162. https://doi.org/10.1016/j.spl.2015.06.014

    We compared Bayes factors to normalized maximum likelihood for the simple case of selecting between an order-constrained versus a full binomial model. This comparison revealed two qualitative differences in testing order constraints regarding data dependence and model preference.

    @article{heck2015testing,
    title = {Testing Order Constraints: {{Qualitative}} Differences between {{Bayes}} Factors and Normalized Maximum Likelihood},
    author = {Heck, Daniel W and Wagenmakers, Eric-Jan and Morey, Richard D.},
    date = {2015},
    journaltitle = {Statistics \& Probability Letters},
    volume = {105},
    pages = {157--162},
    doi = {10.1016/j.spl.2015.06.014},
    abstract = {We compared Bayes factors to normalized maximum likelihood for the simple case of selecting between an order-constrained versus a full binomial model. This comparison revealed two qualitative differences in testing order constraints regarding data dependence and model preference.},
    arxivnumber = {1411.2778},
    keywords = {Inequality constraint,Minimum description length,Model complexity,Model selection,Polytope\_Sampling}
    }
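
To make the two criteria concrete, the following Python sketch computes both for a single binomial observation. The setup is an assumption chosen for illustration and not necessarily the one analyzed in the paper: the full model allows theta in [0, 1], the order-constrained model restricts theta to [0, 0.5], and both Bayesian models use uniform priors on their parameter range. All function names are hypothetical.

```python
from math import comb


def max_lik(k, n, upper=1.0):
    """Maximized binomial likelihood with theta restricted to [0, upper]."""
    theta_hat = min(k / n, upper)
    return comb(n, k) * theta_hat**k * (1 - theta_hat) ** (n - k)


def nml(k, n, upper=1.0):
    """Normalized maximum likelihood: the maximized likelihood of the
    observed data divided by its sum over all possible data sets."""
    normalizer = sum(max_lik(y, n, upper) for y in range(n + 1))
    return max_lik(k, n, upper) / normalizer


def bayes_factor(k, n):
    """Bayes factor of theta ~ Uniform(0, 0.5) against theta ~ Uniform(0, 1).

    It equals the posterior probability of theta <= 0.5 under the full
    model divided by the prior probability 0.5. The posterior is
    Beta(k + 1, n - k + 1); for integer shape parameters its CDF at 0.5
    equals P(X >= k + 1) for X ~ Binomial(n + 1, 0.5).
    """
    posterior = sum(comb(n + 1, j) for j in range(k + 1, n + 2)) / 2 ** (n + 1)
    return posterior / 0.5


n = 10
for k in (2, 8):
    nml_ratio = nml(k, n, upper=0.5) / nml(k, n, upper=1.0)
    print(f"k={k}: NML ratio={nml_ratio:.3f}, Bayes factor={bayes_factor(k, n):.3f}")
```

In this setup the Bayes factor in favor of the constraint can never exceed 2, the inverse of the prior probability of the constrained region, whereas the NML ratio is bounded by the ratio of the two normalizing sums.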

Priors for Reparameterized Models

The Bayes factor strongly depends on the specific prior distributions on the parameters. This becomes especially relevant if models with substantively meaningful parameters are reparameterized. We discuss this issue for multinomial processing tree (MPT) models, which are often reparameterized to include order constraints, and show how to derive adjusted priors for the new model:

  • Heck, D. W., & Wagenmakers, E.-J. (2016). Adjusted priors for Bayes factors involving reparameterized order constraints. Journal of Mathematical Psychology, 73, 110–116. https://doi.org/10.1016/j.jmp.2016.05.004

    Many psychological theories that are instantiated as statistical models imply order constraints on the model parameters. To fit and test such restrictions, order constraints of the form theta_i < theta_j can be reparameterized with auxiliary parameters eta in [0,1] to replace the original parameters by theta_i = eta*theta_j. This approach is especially common in multinomial processing tree (MPT) modeling because the reparameterized, less complex model also belongs to the MPT class. Here, we discuss the importance of adjusting the prior distributions for the auxiliary parameters of a reparameterized model. This adjustment is important for computing the Bayes factor, a model selection criterion that measures the evidence in favor of an order constraint by trading off model fit and complexity. We show that uniform priors for the auxiliary parameters result in a Bayes factor that differs from the one that is obtained using a multivariate uniform prior on the order-constrained original parameters. As a remedy, we derive the adjusted priors for the auxiliary parameters of the reparameterized model. The practical relevance of the problem is underscored with a concrete example using the multi-trial pair-clustering model.

    @article{heck2016adjusted,
    title = {Adjusted Priors for {{Bayes}} Factors Involving Reparameterized Order Constraints},
    author = {Heck, Daniel W and Wagenmakers, Eric-Jan},
    date = {2016},
    journaltitle = {Journal of Mathematical Psychology},
    volume = {73},
    pages = {110--116},
    doi = {10.1016/j.jmp.2016.05.004},
    abstract = {Many psychological theories that are instantiated as statistical models imply order constraints on the model parameters. To fit and test such restrictions, order constraints of the form theta\_i {$<$} theta\_j can be reparameterized with auxiliary parameters eta in [0,1] to replace the original parameters by theta\_i = eta*theta\_j. This approach is especially common in multinomial processing tree (MPT) modeling because the reparameterized, less complex model also belongs to the MPT class. Here, we discuss the importance of adjusting the prior distributions for the auxiliary parameters of a reparameterized model. This adjustment is important for computing the Bayes factor, a model selection criterion that measures the evidence in favor of an order constraint by trading off model fit and complexity. We show that uniform priors for the auxiliary parameters result in a Bayes factor that differs from the one that is obtained using a multivariate uniform prior on the order-constrained original parameters. As a remedy, we derive the adjusted priors for the auxiliary parameters of the reparameterized model. The practical relevance of the problem is underscored with a concrete example using the multi-trial pair-clustering model.},
    arxivnumber = {1511.08775},
    osf = {https://osf.io/cz827},
    keywords = {heckfirst,Polytope\_Sampling}
    }
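
The effect of the prior adjustment can be illustrated with a small simulation for the simplest case of two parameters with theta_i < theta_j. The sketch below relies on a change-of-variables argument worked out here for this special case only (it is not taken from the paper's general derivation): a uniform prior on the order-constrained region has density 2, and the substitution theta_i = eta * theta_j has Jacobian theta_j, so the implied prior is theta_j ~ Beta(2, 1) with an independent eta ~ Uniform(0, 1). All function names are hypothetical.

```python
import random


def sample_target(rng):
    """Uniform prior on the region theta_i < theta_j (sort two uniforms)."""
    a, b = rng.random(), rng.random()
    return min(a, b), max(a, b)


def sample_naive(rng):
    """Reparameterized model with uniform priors on eta and theta_j."""
    theta_j = rng.random()
    eta = rng.random()
    return eta * theta_j, theta_j


def sample_adjusted(rng):
    """Reparameterized model with the adjusted prior theta_j ~ Beta(2, 1)."""
    theta_j = rng.betavariate(2, 1)
    eta = rng.random()
    return eta * theta_j, theta_j


def prior_means(sampler, n_draws=100_000, seed=1):
    """Monte Carlo estimate of the prior means of (theta_i, theta_j)."""
    rng = random.Random(seed)
    draws = [sampler(rng) for _ in range(n_draws)]
    mean_i = sum(d[0] for d in draws) / n_draws
    mean_j = sum(d[1] for d in draws) / n_draws
    return mean_i, mean_j


for name, sampler in [("target", sample_target),
                      ("naive", sample_naive),
                      ("adjusted", sample_adjusted)]:
    mean_i, mean_j = prior_means(sampler)
    print(f"{name:>8}: E[theta_i]={mean_i:.3f}, E[theta_j]={mean_j:.3f}")
```

The target and adjusted priors should both yield prior means close to 1/3 and 2/3, whereas the naive uniform priors imply means close to 1/4 and 1/2. The reparameterized model with uniform priors therefore encodes different prior beliefs about the original parameters, and the resulting Bayes factor changes accordingly.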