Computational part (type and submit your answers in WORD only)

Please make sure that you include a cover page.

Question 1. (15 points) Alumni Donation Data (Multiple Linear Regression). Continue with the same data from homework 1 and fit a multiple linear regression model to the data, where the alumni giving rate is the response variable (\(Y\)), and the percentage of classes with fewer than 20 students (\(X_1\)) and Student/Faculty Ratio (\(X_2\)) as the predictors.

  1. What is your final estimated model?

  2. What is the predicted alumni giving rate for an observation with \(\left(X_1 = 50, X_2 = 10\right)\)?

  3. Test the statistical significance of the regression coefficient using t-tests; use \(\alpha = 0.05\). Obtain the t-statistics and p-values, interpret the results, make a conclusion (i.e. reject or not reject), and explain why. Note: please explain what the null hypothesis is.

  4. What is the F statistic? Is it significant? Clearly write out the null hypothesis, F-statistic, and p-value and interpret the test results.

  5. What is the value of the coefficient of determination? Please interpret.

  6. What is the correlation coefficient \(r_1\) between \(X_1\) and \(Y\) and the correlation coefficient \(r_2\) between \(X_2\) and \(Y\)? Do you see any relationship between \(r_1\), \(r_2\), and \(R^2\)?

Question 2. (10 points) Simulation Study (Simple Linear Regression). Assume mean function \(E\left(Y|X\right) = 10 + 5 X_1 - 2X_2\)

  1. Generate data with \(X_1 \sim N\left(\mu = 2, \sigma = 0.1\right)\), \(X_2 \sim N\left(\mu = 0, \sigma = 0.4\right)\), sample size \(n = 100\), and error term \(\epsilon \sim N\left(\mu = 0, \sigma = 0.5\right)\).

  2. Fit a multiple linear regression to the simulated data from part a. What is the estimated prediction equation? Report the estimated coefficients and their standard errors. Are they significant? Clearly write out the null and alternative hypotheses, observed t-statistic(s), p-value(s), and interpret the estimates and test results. What is fitted model’s MSE?

  3. Repeat part b), but re-simulate the data and change the error term to \(\epsilon \sim N\left(0, \sigma = 1\right)\)

  4. Repeat parts a)–c) using \(n = 400\). What do you conclude? What is the effect to the model parameter estimates when the error variance gets smaller? What is the effect when the sample size gets bigger?

  5. What about the MSE from each model?

Mathematics part (feel free to type or handwrite your solutions)

Question 3. (5 points) Multiple Linear Regression in Matrix Notation

  1. Write out the multiple linear regression model with normal errors in matrix form.

  2. State the model assumptions from part a).

  3. Write out the model matrix \(X\) for the model in part a).

  4. What is the least squares estimate of \(\boldsymbol{\beta}\) in part a)?

  5. What does it mean for the estimate in part e) to be unbiased?