How do you correctly report a fitted mixed-effect model in a manuscript or a thesis?

Ronald Fisher was a genius, to some the greatest since Darwin. A look at his Wikipedia page is enough to understand how much he achieved. And thanks to that work we can all use mixed-effect models today.

Using mixed-effect models (or mixed models, or multilevel models, as you prefer) is complicated in itself, and reporting the results to readers who do not know these statistical techniques well is even harder.

So let’s see how to do it when you have fitted one of these models and now need to report the results.

ATTENTION: this post is not a tutorial on how to fit a mixed model. The assumption is that you are already familiar with these techniques.

Let’s go ahead.

The main point is obvious and should always be kept in mind: a mixed-effect model requires more elements to be reported than a simple linear or logistic regression. Some extra effort is needed.

Who is your audience?

The first question you need to ask yourself is: who are you writing for?

In general, there are only two cases: people who know statistics and people who do not. The first group includes reviewers, regulatory bodies and so on. The second includes thesis committees whose members do not know statistics well, or any client who wants you to analyse their data.

With the second group, those who do not know statistics, you could settle the matter as follows: “I had to use a mixed-effect model because some data, such as those from the same hospital or surgeon, were correlated.”

Unless they are unusually curious, you should not need to provide further explanation.

If, on the other hand, you are dealing with statisticians, you must be very convincing as to why you used a mixed model and not a simpler technique. Remember that many reviewers regard these models as mathematical tricks for exploring the data and obtaining the ‘significant’ results that researchers so appreciate.

So be careful: you need to provide a clear rationale and enough detail to make your results reproducible.

Let’s see how to avoid undesirable risks.

Many researchers address the issue of reporting in this way and in this one. You can also find good guidelines and very detailed checklists, like this one.

However, without going into extreme detail, the reporting scheme that I use, and that has always worked, is given below. It is not all you need to report, of course.

  • Describe very clearly how the data are structured: repeated measures, multilevel data structure, crossed and nested factors. And remember to say that a simpler model, one that ignores this structure, would produce less accurate estimates.
  • List the random effects (intercepts and slopes) that you have included in the model, in addition to the fixed effects, and justify the choice.
  • Indicate the covariance structures you have used. I hardly need to remind you that when you model repeated or correlated measures, you have to choose a covariance structure for the model residuals: exchangeable, unstructured, etc. You will need to report the one you used. Likewise, report the covariance matrix of the random effects included in the model, if any.
  • Report and justify the estimation method you used: Maximum Likelihood (ML) or Restricted Maximum Likelihood (REML).
  • Report the results obtained: in addition to the fixed effects, report the variance of the random effects, interpreting it through the corresponding Intraclass Correlation Coefficient (ICC).
  • Describe what type of diagnostics you used to evaluate your model.
  • Obviously, report any other details related to the type of mixed model you used: linear, logistic, Poisson?
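
On the choice between ML and REML, a small sketch may help show why sample size matters. It uses the ordinary linear model, where the ML estimate of the residual variance divides the residual sum of squares by n and the REML estimate divides it by n - p (p being the number of fixed-effect parameters). In a real mixed model the correction is less simple, so take this only as an illustration; the function names and the value p = 4 are my own assumptions.

```python
def ml_variance(residual_ss: float, n: int) -> float:
    """ML estimate of the residual variance: divides by n (biased downwards)."""
    return residual_ss / n


def reml_variance(residual_ss: float, n: int, p: int) -> float:
    """REML estimate: divides by n - p to account for the p fixed effects."""
    if n <= p:
        raise ValueError("need more observations than fixed-effect parameters")
    return residual_ss / (n - p)


def ml_shrinkage(n: int, p: int) -> float:
    """Ratio ML / REML = (n - p) / n; close to 1 when n is much larger than p."""
    return (n - p) / n


if __name__ == "__main__":
    p = 4  # hypothetical: intercept + age + sex + surgeon experience
    for n in (20, 200, 5000):
        print(f"n={n:5d}  ML/REML ratio = {ml_shrinkage(n, p):.4f}")
```

With four fixed-effect parameters the ML estimate is 20% too small at n = 20 but less than 0.1% too small at n = 5000, which is the kind of argument you can give when justifying ML on a large sample.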

Here is an example that might help you write a paper. Take a study in which we measure the number of days of hospitalization after surgery in different surgical departments of different hospitals, taking into consideration other variables such as the patient’s age and sex and the surgeon’s experience (a somewhat banal example, I know, but I do not want to complicate things).

The aim of the analysis is to estimate the days of post-operative hospitalization of the patients on the basis of the data we have collected. Given the multilevel structure of the data (patients nested in surgeons, in turn nested in hospitals), it was considered necessary to build the model by including the fixed effects age, sex and surgeon-experience, a random intercept for the hospital participating in the survey and a random slope for the level of experience of the surgeon.
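
For readers who prefer notation, the model in the paragraph above could be written roughly as follows. This is only a sketch: the symbols are my own, and I am assuming that patient i is observed in hospital j and that the random slope for surgeon experience varies by hospital, which the description does not state explicitly.

```latex
\begin{align*}
y_{ij} &= \beta_0 + \beta_1\,\mathrm{age}_{ij} + \beta_2\,\mathrm{sex}_{ij}
          + \beta_3\,\mathrm{exp}_{ij}
          + u_{0j} + u_{1j}\,\mathrm{exp}_{ij} + \varepsilon_{ij}, \\
(u_{0j},\, u_{1j})^{\top} &\sim \mathcal{N}(\mathbf{0},\, \mathbf{G}),
\qquad
\boldsymbol{\varepsilon}_{j} \sim \mathcal{N}(\mathbf{0},\, \mathbf{R}_{j})
\end{align*}
```

Here y is the length of stay in days, the betas are the fixed effects, u_{0j} and u_{1j} are the hospital-level random intercept and slope with covariance matrix G, and R_j is the residual covariance matrix within hospital j.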

Given the large number of patients involved, it was considered appropriate to use maximum likelihood as the estimation method, since the bias attributable to this method is negligible for large sample sizes.

The covariance matrices of the random effects and of the residuals were assumed to be Toeplitz and unstructured, respectively.

Age reaches statistical significance (coefficient, 95% confidence interval), while sex and surgeon-experience do not (coefficients, 95% confidence intervals). The residual variability is explained by the random intercept and the random slope included in the model, with Intraclass Correlation Coefficients of 0.19 and 0.27 respectively.

The model diagnostics consisted of checking the linearity and independence of all the variables and the normal distribution of the residuals.
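
If you want to see where an ICC like the ones quoted in the example comes from, here is a minimal sketch for the simplest case, a model with a random intercept only: the ICC is the group-level variance divided by the total variance. The variance values below are invented for illustration, and the function name is mine; with random slopes the ICC depends on the value of the covariate, which this sketch does not cover. For a logistic mixed model a common convention is to use π²/3 as the residual variance on the latent scale.

```python
import math

# Latent-scale residual variance conventionally used for logistic mixed models.
LOGISTIC_RESIDUAL_VARIANCE = math.pi ** 2 / 3


def icc(group_variance: float, residual_variance: float) -> float:
    """Intraclass correlation for a random-intercept model.

    ICC = group-level variance / (group-level variance + residual variance).
    """
    if group_variance < 0 or residual_variance <= 0:
        raise ValueError("variances must be non-negative (residual strictly positive)")
    return group_variance / (group_variance + residual_variance)


if __name__ == "__main__":
    # Hypothetical variance components, not taken from any real study.
    hospital_variance = 1.9
    residual_variance = 8.1
    print(f"Linear model ICC:   {icc(hospital_variance, residual_variance):.2f}")
    print(f"Logistic model ICC: {icc(hospital_variance, LOGISTIC_RESIDUAL_VARIANCE):.2f}")
```

In the linear case above, 19% of the total variance in length of stay would be attributable to differences between hospitals.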

That’s all. I repeat, not all models are the same (otherwise, biostatisticians would be out of a job) but for the vast majority of mixed models that you might use in your analyses, the formula I have described is sufficient.
