The motivation for developing Multiple Linear Regression is that Simple Linear Regression contains only one explanatory variable.

That is, it will face a serious Omitted Variable Bias (OVB) problem.

OLS under OVB

The true model:

$$ y_i = \beta_0 + \beta_1 x_i + \beta_2 z_i + \epsilon_i $$

However, we incorrectly assume:

$$ y_i = \beta_0 + \beta_1 x_i + u_i $$

where

$$ u_i = \beta_2 z_i + \epsilon_i $$


From Simple Linear Regression in Matrix Notation, we can also use matrix notation to denote the model.

The initial model is:

$$ y_i = \beta_1 + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \epsilon_i, \quad i = 1, \dots, n $$

We have $k$ unknowns and $n$ equations. Only when there are at least as many equations as unknowns can we solve for the unknowns.

However, once we stack the unknowns and the data into vectors and matrices:

$$ y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}, \quad X = \begin{pmatrix} 1 & x_{12} & \cdots & x_{1k} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n2} & \cdots & x_{nk} \end{pmatrix}, \quad \beta = \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_k \end{pmatrix}, \quad \epsilon = \begin{pmatrix} \epsilon_1 \\ \vdots \\ \epsilon_n \end{pmatrix} $$

we can denote the model in a much simpler form:

$$ y = X\beta + \epsilon $$

Recall from Simple Linear Regression in Matrix Notation that the FOC of the least-squares problem is exactly the same, thanks to the matrix notation:

$$ X'(y - Xb) = 0 \quad \Longrightarrow \quad b = (X'X)^{-1}X'y $$

The difference is that $X$ is now an $n \times k$ matrix, not $n \times 2$.
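The normal equations above can be checked numerically. A minimal sketch with made-up data (assuming `numpy` is available): compute $b = (X'X)^{-1}X'y$ directly and compare with a library least-squares solver.

```python
import numpy as np

# Illustrative data, not from the text: n observations, k regressors.
rng = np.random.default_rng(0)
n, k = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])  # n x k, with intercept
beta = np.array([1.0, 2.0, -0.5])
y = X @ beta + rng.normal(size=n)

# Solve the normal equations X'X b = X'y (avoids forming the inverse).
b = np.linalg.solve(X.T @ X, X.T @ y)
# Library least-squares solver gives the same answer.
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

assert np.allclose(b, b_lstsq)
```

Using `np.linalg.solve` on the normal equations, rather than computing `inv(X.T @ X)` explicitly, is the numerically preferred way to evaluate $(X'X)^{-1}X'y$.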

We can now derive $e$:

$$ e = y - Xb = y - X(X'X)^{-1}X'y = \underbrace{ (I_n - X(X'X)^{-1}X') }_{ M }y = My $$

We call $M$ the "residual maker matrix".

Thus,

$$ MX = (I_n - X(X'X)^{-1}X')X = X - X(X'X)^{-1}(X'X) = X - X = 0 $$

Thus we prove that the residual vector $e$ is orthogonal to the matrix of explanatory variables $X$: since $M$ is symmetric, $X'e = X'My = (MX)'y = 0$.
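A quick numerical check of these facts, with made-up data (`numpy` assumed): build $M$ and verify that $MX = 0$ and that the residuals are orthogonal to $X$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = rng.normal(size=n)

# Residual maker matrix M = I - X(X'X)^{-1}X'.
M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T
e = M @ y                       # residuals e = My

assert np.allclose(M @ X, 0)    # M annihilates the columns of X
assert np.allclose(X.T @ e, 0)  # X'e = 0: residuals orthogonal to X
```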

OLS part

Ordinary Least Squares

We can then write $y$:

$$ y = \hat{y} + e = Xb + e $$

We denote $P = X(X'X)^{-1}X'$, so that $\hat{y} = Xb = Py$ and

$$ y = Py + My $$

Try to figure out that $\hat{y}'e = 0$ ($\hat{y}$ and $e$ are orthogonal).

Hint:

  1. $P + M = I$ (identity matrix)
  2. $P$ is the orthogonal projection matrix onto the column space of $X$

$P$ and $M$ are both idempotent; see Idempotency.
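These hints can be verified numerically. A minimal sketch with made-up data (`numpy` assumed): $P$ and $M$ are idempotent, sum to the identity, and are mutually orthogonal, so the fitted values and residuals are orthogonal.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 30, 4
X = rng.normal(size=(n, k))
y = rng.normal(size=n)

P = X @ np.linalg.inv(X.T @ X) @ X.T   # projection (hat) matrix
M = np.eye(n) - P                      # residual maker matrix

assert np.allclose(P @ P, P)           # P idempotent
assert np.allclose(M @ M, M)           # M idempotent
assert np.allclose(P + M, np.eye(n))   # P + M = I
assert np.allclose(P @ M, 0)           # P and M orthogonal
assert np.isclose((P @ y) @ (M @ y), 0)  # y_hat'e = 0
```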

The Least Squares estimator is unbiased

See Unbiasedness.

To see why the OLS estimator is most efficient, see attached: Proof of Gauss-Markov Theorem.

Estimating the Disturbance Variance

That is, to estimate $\sigma^2 = \operatorname{Var}(\epsilon_i)$.

Recall that we have

$$ \mathbf{e = (I_n - X (X'X)^{-1}X' )y} $$

We denote $M = I_n - X (X'X)^{-1}X'$ so $e = My$. We already know that $MX = 0$, so

$$ e = My = M(X\beta + \epsilon) = \underbrace{ MX\beta }_{ =0 } + M\epsilon = M\epsilon $$

Thus

$$ e'e = (M\epsilon)'(M\epsilon) = \epsilon'M'M\epsilon = \epsilon'M\epsilon $$

Under the current assumptions, $M$ is a fixed (non-stochastic) matrix, and $E[e'e] = E[\epsilon'M\epsilon]$.

Hint: $M$ is idempotent, so $M'M = M$. See Idempotency.

So we know that:

$$ E[e'e] = E[\epsilon'M\epsilon] = E[\operatorname{tr}(\epsilon'M\epsilon)] = E[\operatorname{tr}(M\epsilon\epsilon')] = \operatorname{tr}(M\,E[\epsilon\epsilon']) = \sigma^{2}\operatorname{tr}(M) $$

A simple proof: to see why a scalar quadratic form is equal to its trace (so that expectation and trace can be exchanged), see here

This shows that $\operatorname{tr}(M) = \operatorname{tr}(I_n) - \operatorname{tr}(X(X'X)^{-1}X') = n - k$, so that:

$$ E[e'e] = \sigma^{2}(n - k) \quad \Longrightarrow \quad E\left[ \frac{e'e}{n-k} \right] = \sigma^{2} $$

$s^2 = \dfrac{e'e}{n-k}$ is an unbiased estimator of $\sigma^2$; $s$ is called the standard error of the regression.

$n - k$ is also the degrees of freedom.
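A small sketch (made-up design matrix, `numpy` assumed) checks both facts: $\operatorname{tr}(M) = n - k$, and a short Monte Carlo shows that the average of $s^2 = e'e/(n-k)$ is close to the true $\sigma^2$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k, sigma2 = 25, 3, 4.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T

assert np.isclose(np.trace(M), n - k)        # tr(M) = n - k

s2_draws = []
for _ in range(5000):
    eps = rng.normal(scale=np.sqrt(sigma2), size=n)
    e = M @ eps                              # e = M.eps: beta drops out
    s2_draws.append(e @ e / (n - k))         # s^2 = e'e / (n - k)
print(np.mean(s2_draws))                     # close to the true sigma2 = 4.0
```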

Omitting Relevant Variables

The source of omitted relevant variables: if we leave out explanatory variables that are significantly related to the dependent variable and correlated with the included regressors, we cause bias in the estimated coefficients.

Eg: we are focusing on the relation between wage and education, but we forget to include experience as an explanatory variable.

Discussion

Suppose the true model is:

$$ y = X_1\beta_1 + X_2\beta_2 + \epsilon $$

Suppose we omit $X_2$; the model now becomes:

$$ y = X_1\beta_1 + u, \quad \text{where } u = X_2\beta_2 + \epsilon $$

Now the restricted estimator is:

$$ b_1 = (X_1'X_1)^{-1}X_1'y = \beta_1 + (X_1'X_1)^{-1}X_1'X_2\beta_2 + (X_1'X_1)^{-1}X_1'\epsilon $$

so that, taking expectations,

$$ E[b_1] = \beta_1 + \underbrace{ (X_1'X_1)^{-1}X_1'X_2 }_{ P_{1.2} }\beta_2 $$

It shows that $b_1$ will be biased in a predictable direction: the bias is equal to $\beta_2$ times $P_{1.2}$, the coefficients from regressing the columns of $X_2$ on $X_1$.
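The bias formula can be illustrated numerically. A sketch with made-up data (`numpy` assumed): with the noise set to zero, the short regression of $y$ on the included regressors alone recovers exactly $\beta_1$ plus the omitted-variable bias term.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 60
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])       # included regressors
X2 = (0.8 * X1[:, 1] + rng.normal(size=n)).reshape(-1, 1)    # omitted, correlated with X1
beta1 = np.array([1.0, 2.0])
beta2 = np.array([3.0])

y = X1 @ beta1 + X2 @ beta2                  # true model, epsilon set to 0
b1 = np.linalg.solve(X1.T @ X1, X1.T @ y)    # restricted (short) regression
bias = np.linalg.solve(X1.T @ X1, X1.T @ X2) @ beta2

assert np.allclose(b1, beta1 + bias)         # b1 = beta1 + (X1'X1)^{-1} X1'X2 beta2
print(bias)                                  # nonzero: the slope estimate is biased
```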

We use $e_*$ to represent the corresponding restricted residuals.

Now we have a clear sense that there must be a difference between $e$ and $e_*$.

Comparison between residuals

Now we compare $e'e$ and $e_*'e_*$:

First, we try to get $e_*$:

$$ e_* = M_1 y, \quad \text{where } M_1 = I_n - X_1(X_1'X_1)^{-1}X_1' $$

How should we intuitively understand $M_1$? Simply put, $M_1$ is a projection matrix: it projects a vector onto the orthogonal complement of the column space of $X_1$. In other words, $M_1$ removes any component lying in the column space of $X_1$.

After calculating the results, we can figure out:

$$ e_*'e_* = e'e + b_2'(X_2'M_1X_2)b_2 \geq e'e $$

where $b_2$ is the OLS coefficient on $X_2$ in the unrestricted regression. Only when $b_2 = 0$ is the equal sign satisfied.
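A numerical illustration of this inequality, with made-up data (`numpy` assumed): the restricted regression (on the included regressors only) never achieves a smaller residual sum of squares than the full regression.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 50
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])   # included regressors
X2 = rng.normal(size=(n, 1))                             # omitted regressor
X = np.hstack([X1, X2])                                  # full design matrix
y = X1 @ np.array([1.0, 2.0]) + X2 @ np.array([1.5]) + rng.normal(size=n)

def ssr(Z, y):
    """Residual sum of squares from an OLS fit of y on Z."""
    e = y - Z @ np.linalg.solve(Z.T @ Z, Z.T @ y)
    return e @ e

assert ssr(X1, y) >= ssr(X, y)   # restricted SSR >= unrestricted SSR
```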