Computational part (type and submit your answers in WORD only)

Please make sure that you include a cover page.

Question 1. (12 points) Alumni Donation Data (Simple Linear Regression). Continue with the same data from homework 1 and fit a simple linear regression on the data, where the alumni giving rate is the response variable \(Y\) of interest and the percentage of classes with fewer than 20 students as the predictor variable \(X\).

  1. What is the estimated slope? Is it significant at the \(\alpha = 0.05\) level? Clearly write out the null and alternative hypotheses, observed t-statistic, p-value, and interpret the estimate and test results.

  2. Repeat part a. above using the equivalent F-test.

  3. What is the value of \(R^2\)? Please interpret.

  4. What is the correlation coefficient \(r\) between \(X\) and \(Y\)? What is the relationship between \(r\) and \(R^2\)?

  5. Plot the training data (i.e., the data used to fit the model) with the fitted regression line and include a 95% (pointwise) confidence band for the mean responses. What do you observe about the confidence band at the point \(\left(\bar{X}, \bar{Y}\right)\)? Is it narrower or wider compared to the rest?

Question 2. (12 points) Simulation Study (Simple Linear Regression). Assume mean function \(E\left(Y|X\right) = 10 + 5 * X\). For this exercise, use set.seed(7052) to ensure reproducibility.

  1. Generate data with \(X \sim N\left(\mu = 2, \sigma = 0.1\right)\), sample size \(n = 100\), and error term \(\epsilon \sim N\left(\mu = 0, \sigma = 0.5\right)\).

  2. Fit a simple linear regression to the simulated data from part a. What is the estimated prediction equation? Report the estimated coefficients and their standard errors. Are they significant? Clearly write out the null and alternative hypotheses, observed t-statistic(s), p-value(s), and interpret the estimates and test results. What is fitted model’s MSE?

  3. Repeat part b), but re-simulate the data and change the error term to \(\epsilon \sim N\left(0, \sigma = 1\right)\)

  4. Repeat parts a)–c) using \(n = 400\). What do you conclude? What is the effect on the model parameter estimates when error variance gets smaller? What is the effect when sample size gets bigger?

  5. What about the MSE from each model?

Mathematics part (feel free to type or handwrite your solutions)

Question 3. (6 points) Bias and Variance of Parameter Estimates.

  1. What are the bias and variance of the OLS estimates \(\widehat{\beta}_0\) and \(\widehat{\beta}_1\)?

  2. What do you expect to happen to the variances of the OLS estimates of \(\beta_0\) and \(\beta_1\) when the sample size \(n\) increases? What do you expect when the error variance \(\sigma^2\) increases?

  3. What is the bias of the model’s MSE? What about the ML estimate of \(\sigma^2\)? What is the difference between these two estimates of \(\sigma^2\)? Why do we use MSE instead of the ML estimate?