# “Centering” the variables: what it means and what it is used for

“Centering” variables is a procedure that researchers quite often overlook when working with empirical data.
But what is it? Why can it be very important?

Let’s look at a trivial example.
We have the annual income of 10 subjects and want to assess whether this income is related to:
- age;
- sex (0 = male, 1 = female);
- the level of education (0 = “normal” education, 1 = “high” education).

When we fit a classic linear regression model, we get:
- an intercept;
- a coefficient for age;
- a coefficient for sex;
- a coefficient for the level of education.

Now suppose you want to interpret the intercept.

The intercept is the expected (average) income when every variable in the model is equal to 0.
In our case, the intercept would represent the average income of males (sex = 0) with a normal education (education = 0) and … an age of zero years!

Clearly, the value of this intercept is “not interpretable”, because nobody has an income at zero years of age.
To obtain an interpretable value, you can analyse the data using a “centered” age variable, obtained by subtracting the average age from each age value, instead of using the age variable directly. The intercept then represents the average income of a male with a normal education and of average age, while the coefficients for age, sex and education stay the same.
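
To make the effect on the intercept concrete, here is a minimal sketch in Python with NumPy. The ten data rows are invented purely for illustration, and `fit_ols` is a hypothetical helper, not part of any library. It fits the same model twice, once with raw age and once with mean-centered age, so the two sets of coefficients can be compared.

```python
import numpy as np

# Hypothetical data for 10 subjects, invented purely for illustration.
age = np.array([25, 32, 47, 51, 38, 29, 60, 44, 35, 55], dtype=float)
sex = np.array([0, 1, 0, 1, 1, 0, 0, 1, 0, 1], dtype=float)        # 0 = male, 1 = female
education = np.array([0, 1, 1, 0, 1, 0, 1, 0, 0, 1], dtype=float)  # 0 = normal, 1 = high
income = np.array([21, 30, 42, 33, 36, 23, 50, 31, 27, 45], dtype=float)  # in thousands


def fit_ols(age_column):
    """Return [intercept, b_age, b_sex, b_education] from ordinary least squares."""
    X = np.column_stack([np.ones_like(age_column), age_column, sex, education])
    coefs, *_ = np.linalg.lstsq(X, income, rcond=None)
    return coefs


raw = fit_ols(age)                    # age used as recorded
centered = fit_ols(age - age.mean())  # age centered on its mean

print("raw age:     ", np.round(raw, 3))
print("centered age:", np.round(centered, 3))
```

In this sketch only the first entry (the intercept) differs between the two fits. With centered age it equals the raw intercept plus the age coefficient multiplied by the average age, that is, the predicted income of a male with a normal education at the average age.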

You don’t necessarily have to subtract the average; you can “center” your variable on any other meaningful value, but the average is perhaps the most widespread choice.
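
As a small illustration of centering on different reference values, the sketch below uses a hypothetical helper `center` and invented ages. The median and the fixed value of 40 years are arbitrary choices made for the example, not recommendations.

```python
import numpy as np


def center(values, reference=None):
    """Subtract a reference value from every observation (default: the mean)."""
    values = np.asarray(values, dtype=float)
    if reference is None:
        reference = values.mean()
    return values - reference


# Hypothetical ages, invented for illustration.
age = np.array([25, 32, 47, 51, 38, 29, 60, 44, 35, 55], dtype=float)

age_mean_centered = center(age)                     # 0 now means "average age"
age_median_centered = center(age, np.median(age))   # 0 now means "median age"
age_centered_at_40 = center(age, 40)                # 0 now means "40 years old"
```

Whichever reference is used, the value 0 of the new variable corresponds to that reference, so the intercept describes a subject of that age rather than a newborn.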