Variable “centering” is a procedure that researchers often ignore when working with empirical data.
But what is it? Why can it be very important?
Let’s look at a trivial example:
Suppose we have the annual income of 10 subjects and want to assess whether this income is related to:
age (in years);
sex (0/males, 1/females);
the level of education (0/“normal” education, 1/“high” education).
When we fit a classic linear regression model we get:
a coefficient for age;
a coefficient for the level of education;
a coefficient for sex.
Let’s say that now you want to interpret the intercept.
The intercept is the expected (average) income when all the variables are equal to 0.
In our case the intercept would represent the average income of males (sex = 0) with a normal education (education = 0) and … aged zero years!
Clearly, in this form the intercept is “not interpretable”, because nobody earns an income at zero years of age.
Therefore, to obtain an interpretable value, you can analyze the data using a “centered” age variable, obtained by subtracting the average age from each age value, instead of using the age variable directly.
You don’t necessarily have to subtract the average; you can “center” your variable on any other value, but the average is perhaps the most widespread choice.
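To see what centering does to the model, it may help to write the regression out. The sketch below uses symbols that are not in the original text and are introduced here only for illustration: \beta_0 for the intercept, \beta_1, \beta_2, \beta_3 for the coefficients, \varepsilon for the error term and c for whatever constant you subtract from age (the average, or any other value).

```latex
\begin{align*}
\text{income} &= \beta_0 + \beta_1\,\text{age} + \beta_2\,\text{education} + \beta_3\,\text{sex} + \varepsilon \\
              &= (\beta_0 + \beta_1 c) + \beta_1\,(\text{age} - c) + \beta_2\,\text{education} + \beta_3\,\text{sex} + \varepsilon
\end{align*}
```

Both lines describe the same model. Only the intercept is rewritten, from \beta_0 to \beta_0 + \beta_1 c, which is the expected income of a male with a normal education whose age equals c.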
Let’s assume that your 10 subjects had the following ages:
27, 35, 37, 37, 40, 45, 46, 52, 55 and 62 years old.
The average age is 43.6 years.
Instead of using these values in your regression model, you subtract the average from each of them, resulting in the following values:
-16.6, -8.6, -6.6, -6.6, -3.6, 1.4, 2.4, 8.4, 11.4 and 18.4.
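The arithmetic above can be reproduced in a few lines of Python. This is only a minimal sketch using the standard library; the variable names (ages, mean_age, centered_ages) are my own and the rounding to one decimal is there just to match the values listed above.

```python
from statistics import mean

# Ages of the 10 subjects, as listed in the text.
ages = [27, 35, 37, 37, 40, 45, 46, 52, 55, 62]

# Average age used as the centering constant.
mean_age = mean(ages)

# Centered ages: each age minus the average, rounded to one decimal
# only to hide floating-point noise.
centered_ages = [round(age - mean_age, 1) for age in ages]

print(mean_age)       # 43.6
print(centered_ages)  # [-16.6, -8.6, -6.6, -6.6, -3.6, 1.4, 2.4, 8.4, 11.4, 18.4]
```

A quick sanity check is that the centered values sum to (approximately) zero, which is always the case when you center on the average.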
In this case the coefficients for age, education and sex do not change, but the new intercept is interpretable as the average income of males with a normal education and of average age.
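If you want to check this on numbers, here is a small sketch in plain Python. Please note the assumptions: the ages are the ones from the example, but the sex, education and income values are invented purely for illustration (the example gives none), and the helper functions solve and ols are hypothetical names for a bare-bones least-squares fit. In real work you would normally use a statistics package instead.

```python
from statistics import mean

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: bring the largest entry of this column up.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(rows, y):
    """Least-squares fit via the normal equations; returns [intercept, slopes...]."""
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

ages = [27, 35, 37, 37, 40, 45, 46, 52, 55, 62]  # from the example
# HYPOTHETICAL data, invented for this sketch only:
education = [0, 0, 1, 0, 1, 1, 0, 1, 0, 1]       # 0 = normal, 1 = high
sex = [0, 1, 0, 1, 1, 0, 1, 0, 0, 1]             # 0 = male, 1 = female
income = [21, 24, 30, 25, 33, 36, 29, 40, 35, 45]  # thousands per year

mean_age = mean(ages)
centered_ages = [age - mean_age for age in ages]

# Fit the same model twice: once with raw age, once with centered age.
raw = ols(list(zip(ages, education, sex)), income)
centered = ols(list(zip(centered_ages, education, sex)), income)

print("raw age:      intercept, age, education, sex =", raw)
print("centered age: intercept, age, education, sex =", centered)
print("raw intercept + age coefficient * mean age   =", raw[0] + raw[1] * mean_age)
```

Up to floating-point noise, the three coefficients should be identical in the two fits, and the centered intercept should equal the raw intercept plus the age coefficient times the average age.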
This is just an example of centering a variable on its average value, but I hope you can see the potential of this strategy.