STAT 501 Mid-Term Exam 2 | Solution

STAT 501 – Mid-Term Exam 2 – Spring 2015 – Due April 12

Instructions: Use Word to type your answers within this document. Then, submit your answers in the appropriate dropbox in ANGEL by the due date and within 3 hours of downloading the exam. The point distribution is located next to each question.

 

  1. (4x2 = 8 points) State which of the following statements is TRUE and which is FALSE. For the statements that are false, explain why they are false.
    1. Removing an outlier in a regression analysis will result in narrower confidence intervals.
    2. In a simple linear regression (SLR) model, if a log transformation is performed on X to remedy some non-linearity, the mean value of Y is bound to change.
    3. In model selection, the highest adjusted R²-value and the smallest S-value criteria always yield the same "best" models.
    4. Regression models with different responses, but the same predictor X matrix, will have the same leverage values.
  2. (3+3+4+4+3+3 = 20 points) Open the “Salary Data.” The dataset consists of current salaries (Salary in thousands of dollars) for 63 individuals with information about their years of work experience (YrsExp) and highest degree attained (Degree). Your goal is to fit a regression model to express the dependence of Y (Salary) on X (YrsExp) and Degree.
    1. Clearly define a set of indicator variables that could be used in a regression model to represent the qualitative variable Degree. [Hint: Think carefully about the number of indicator variables needed given the number of levels of Degree and use “Bachelor” as the reference level.]
    2. Write a population multiple linear regression equation for predicting the current salary in terms of YrsExp and Degree. Since education level could impact the dependence of Y on X, the model should contain an interaction effect between YrsExp and Degree, together with their main effects. [Hint: Your equation should include Y, X, the indicator variables you defined in part (a), interaction terms, and population regression coefficients (β’s).]
    3. Conduct a hypothesis test for whether the average annual salary increase per year of experience differs by level of education (i.e., test if the slopes for two or more Degree categories differ). Write out the null and alternative hypotheses, the test statistic, the p-value, and the conclusion. [Minitab v17: Select Salary as the Response, YrsExp as the Continuous predictor, Degree as the categorical predictor, click “Model,” select both YrsExp and Degree together in the Predictors box and click the Add button next to “Interactions through order 2.” Minitab v16: Create interaction terms using Calc > Calculator before fitting the regression model.]
    4. Write a new population regression equation based on your conclusion to part (c). Fit this model and conduct two separate hypothesis tests for whether the mean salary for a fixed number of years’ experience differs by education level. For each test, write out the null and alternative hypotheses, the test statistic, the p-value, and the conclusion.
    5. Based on your conclusion to part (d), write three fitted sample regression equations that can be used to predict the current salary for each education level. [Hint: Your equations should include number values, not β’s.]
    6. Based on one of the equations from part (e), predict the current salary of a PhD degree holder with 10 years of work experience. [Hint: A point estimate is sufficient so there is no need for an interval.]
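The indicator coding and interaction model asked for in this question can be sketched in Python as follows. This is only an illustration: the exam text names the levels "Bachelor" and "PhD" but not the third level, so "Master" below is an assumed placeholder, and the function names are hypothetical.

```python
# Sketch of indicator (dummy) coding for a three-level Degree variable
# with "Bachelor" as the reference level. "Master" is an ASSUMED name for
# the middle level; only "Bachelor" and "PhD" appear in the exam text.

LEVELS = ("Bachelor", "Master", "PhD")  # first entry is the reference level


def degree_indicators(degree):
    """Return (D1, D2): D1 = 1 for Master, D2 = 1 for PhD, both 0 for Bachelor."""
    if degree not in LEVELS:
        raise ValueError(f"unknown degree level: {degree!r}")
    return (int(degree == "Master"), int(degree == "PhD"))


def design_row(yrs_exp, degree):
    """Row of the design matrix for the interaction model
    Salary = b0 + b1*X + b2*D1 + b3*D2 + b4*X*D1 + b5*X*D2 + error."""
    d1, d2 = degree_indicators(degree)
    return [1.0, yrs_exp, d1, d2, yrs_exp * d1, yrs_exp * d2]


if __name__ == "__main__":
    for deg in LEVELS:
        print(deg, design_row(10.0, deg))
```

Three levels need two indicators, and in this coding the test for differing slopes corresponds to the null hypothesis that both interaction coefficients (b4 and b5) are zero.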
  3. (4x2 = 8 points) Consider the following four graphs where the vertical axis represents Y and the horizontal axis represents X.

 

Choose the most appropriate plot for each of the following models (where D1 and D2 represent a set of indicator variables):

[Figure and model list: four graphs of Y versus X and four model equations, not reproduced in this copy.]

  4. (5+2+5+3+3 = 18 points) The file “Savings Data” contains savings of 33 individuals along with their age. It is apparent that Y = Savings (in $) has a positive association with X = Age (in years). An appropriate regression model relating Savings to Age could be useful for predicting savings based on age. The most straightforward approach would be to fit a simple linear regression (SLR) model for Y vs X, provided that the LINE assumptions are satisfied. [Consult “Worked Examples Using Minitab” in the Online Notes for help with any Minitab procedures.]
    1. Fit an SLR model for Y vs X and perform a residual plot analysis to determine if the LINE assumptions are satisfied. Include a numerical test when checking for normality (use the Ryan-Joiner test in Minitab). Discuss your findings and include any relevant graphs.
    2. Based on your conclusion in part (a), determine if any transformations are suggested for X and/or Y. [Hint: You should find that both X and Y need to be transformed.]
    3. Fit an SLR model for the transformed variable(s) and comment on this model’s validity with supporting statements, numerical tests and/or plots.
    4. Use Minitab to compute a 95% confidence interval for the mean amount of savings (in $) expected for 40 year-olds based on the fitted model in part (c). [Hint: Remember to take into account the transformations to X and Y.]
    5. Use Minitab to compute a 95% prediction interval for the amount of savings (in $) predicted for a randomly selected 40 year-old based on the fitted model in part (c). [Hint: Remember to take into account the transformations to X and Y.]
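The hints about taking the transformations into account can be illustrated with a small Python sketch. It assumes natural-log transformations of both X and Y (the exam only says that both need transforming, so the choice of log is an assumption), and the data are hypothetical values made up for illustration, not the "Savings Data".

```python
import math

# HYPOTHETICAL data for illustration only; these are not the "Savings Data".
ages = [22, 25, 28, 31, 35, 38, 42, 47, 53, 60]
savings = [900, 1500, 2100, 3200, 4800, 6100, 8500, 12000, 17500, 26000]

# ASSUMPTION: natural-log transformations of both X and Y.
x = [math.log(a) for a in ages]
y = [math.log(s) for s in savings]
n = len(x)

# Least-squares fit of log(Y) on log(X).
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))  # residual standard error on the log scale

T_CRIT = 2.306  # t(0.975, df = 8); look up the value for your own n - 2

x0 = math.log(40)  # transform the new X value the same way as the data
fit = b0 + b1 * x0
se_mean = s * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)
se_pred = s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)

# Build the intervals on the log scale, then exponentiate the endpoints.
ci = (math.exp(fit - T_CRIT * se_mean), math.exp(fit + T_CRIT * se_mean))
pi = (math.exp(fit - T_CRIT * se_pred), math.exp(fit + T_CRIT * se_pred))

print("95% CI in $:", ci)
print("95% PI in $:", pi)
```

The key steps are transforming the new X value before plugging it in and back-transforming the interval endpoints afterwards. Under the usual normal-errors assumption, the exponentiated confidence interval is, strictly speaking, an interval for the median of Y on the dollar scale.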
  5. (2+1+3+2 = 8 points) The following Minitab output resulted from a multiple linear regression model fit to response variable, Y, and predictor terms, X1, X2, and X1X2:

Coefficients

Term        Coef  SE Coef  T-Value  P-Value
Constant    4.49     1.89     2.37    0.022
X1         0.759    0.374     2.03    0.048
X2         0.965    0.426     2.26    0.028
X1*X2     0.1742   0.0821     2.12    0.039

    1. Conduct a hypothesis test for whether the interaction term, X1X2, can be dropped from the model. Write out the population model, null and alternative hypotheses, the test statistic, the p-value, and the conclusion.
    2. Based on your conclusion to part (a), write the fitted sample regression equation.
    3. State whether the following statements are supported by the Minitab output (simply write “yes” or “no” for each statement).
      1. X1 and X2 are positively associated.
      2. Y and X1 are positively associated for fixed values of X2.
      3. The linear association between Y and X1 increases as X2 increases.
    4. Use the fitted equation in part (b) to predict Y for an observation with X1 = 6 and X2 = 5. [Hint: A point estimate is sufficient.]
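The point-prediction arithmetic for this question can be checked with a few lines of Python. The sketch assumes the interaction term is kept in the model, which is what a 0.05 significance level would suggest given its P-Value of 0.039; the significance level is an assumption, since the exam does not state one. The function names are hypothetical.

```python
# Coefficients copied from the Minitab coefficient table in this question.
b0, b1, b2, b3 = 4.49, 0.759, 0.965, 0.1742


def predict(x1, x2):
    """Point prediction from y-hat = b0 + b1*x1 + b2*x2 + b3*x1*x2."""
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2


def slope_x1(x2):
    """Estimated change in y-hat per unit increase in x1 at a fixed x2."""
    return b1 + b3 * x2


y_hat = predict(6, 5)
print(round(y_hat, 3))  # 4.49 + 4.554 + 4.825 + 5.226 = 19.095
print(slope_x1(0), slope_x1(5))
```

Because the interaction coefficient is positive, the estimated slope for X1 grows as X2 increases, which is the idea behind the third yes/no statement.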

 

 

  6. (6x3 = 18 points) The table below was obtained from the Best Subsets regression procedure for the “Infection Risk Data.”

Response is InfctRsk

             R-Sq    R-Sq  Mallows
Vars  R-Sq  (adj)  (pred)       Cp        S  Stay  Cultures  Xrays  Census  Nurses
   1  35.5   34.8    30.2     51.2   1.1351  X
   1  34.7   34.0    30.7     53.2   1.1428        X
   2  53.0   52.0    48.3     14.0  0.97380  X     X
   2  46.3   45.1    40.9     29.2   1.0415        X                X
   3  57.0   55.6    51.4      7.1  0.93657  X     X                        X
   3  56.0   54.6    49.5      9.4  0.94740  X     X                X
   4  59.3   57.5    53.1      4.0  0.91622  X     X         X              X
   4  58.7   56.9    52.0      5.4  0.92323  X     X         X      X
   5  59.3   57.1    51.0      6.0  0.92120  X     X         X      X       X

    1. Based on the criteria listed in the table above select what you believe to be the “Best” model and write down its population regression equation. Support your answer.
    2. Would you consider this model to yield an unbiased predicted response? Support your answer.
    3. Name a model in the table that may yield a biased predicted response. Support your answer.
    4. Calculate SSTO using information in the table.
    5. Use Minitab’s Backward Elimination procedure on this dataset and write down the fitted sample regression equation for the resulting “best” model. Use α_R = 0.15 and the Minitab v17 command sequence: Stat > Regression > Regression > Fit regression model > Stepwise (select Backward Elimination for Method). For Minitab v16 use Stat > Regression > Stepwise.
    6. State any extra useful information provided by the Backward Elimination output that is not available in the Best Subsets table above.
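One way to recover SSTO from a Best Subsets row is to combine S (the square root of MSE) with R-Sq. The Python sketch below uses the 4-predictor row with Stay, Cultures, Xrays and Nurses. The sample size is not shown in the Best Subsets table, so n = 97 is an assumption taken from Total DF = 96 in the Analysis of Variance output quoted in the answer preview on this page.

```python
# Values read from the 4-predictor row of the Best Subsets table
# (Stay, Cultures, Xrays, Nurses).
s = 0.91622        # S = sqrt(MSE)
r_sq = 0.593       # R-Sq = 59.3%
n_predictors = 4

# ASSUMPTION: n = 97, taken from Total DF = 96 in the ANOVA output quoted
# in the answer preview; the Best Subsets table itself does not give n.
n = 97

df_error = n - (n_predictors + 1)  # error DF = n - (number of coefficients)
sse = s ** 2 * df_error            # SSE = MSE * error DF
ssto = sse / (1 - r_sq)            # rearranged from R-Sq = 1 - SSE/SSTO

print(f"SSE  = {sse:.2f}")
print(f"SSTO = {ssto:.2f}")
```

The result lands close to the Total Adj SS of 189.842 in that ANOVA output; the small gap comes from R-Sq being rounded to one decimal place in the table.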
  7. (4+2+2+2+4+3+3 = 20 points) Open the “Profits Data.” The data indicate a positive linear association between interest rates and broker profits. The data are to be used primarily to obtain a regression model and compute confidence/prediction intervals.
    1. Fit an SLR model for Y = profits and X = interest rate and create a scatterplot of Y vs X with the fitted regression line added.
    2. For the model in part (a) discuss whether there are any “extreme X values.” [Hint: Use leverages.]
    3. For the model in part (a) discuss whether there are any “outliers” (unusual Y values). [Hint: Use internally studentized residuals, which Minitab calls standardized residuals.]
    4. For the model in part (a) discuss whether there are any “influential data points.” [Hint: Use Cook’s distances.]
    5. You should have identified one outlier in part (c). Repeat your regression analysis after deleting this outlier. Again create a scatterplot of Y vs X with the fitted regression line added.
    6. Compare the results of your regression analyses and plots obtained from parts (a) and (e).
    7. In the context of this problem, comment on any detrimental effects if the outlier was not removed.
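The three diagnostics named in the hints (leverages, internally studentized residuals and Cook's distances) can be computed by hand for an SLR model, as in the Python sketch below. The data are hypothetical values with one deliberately unusual response, not the "Profits Data", and the cutoffs (3p/n, 2 and 0.5) are common rules of thumb assumed here rather than values stated in the exam.

```python
import math

# HYPOTHETICAL data for illustration; these are not the "Profits Data".
# The sixth response value is deliberately made unusual.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 20.0, 14.2, 15.9, 18.1, 20.0]

n, p = len(x), 2  # p = number of regression coefficients in SLR
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar

resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
mse = sum(e ** 2 for e in resid) / (n - p)

# Leverage for SLR: h_i = 1/n + (x_i - x_bar)^2 / Sxx (depends on X only).
lev = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
# Internally studentized ("standardized") residuals.
std_res = [e / math.sqrt(mse * (1 - h)) for e, h in zip(resid, lev)]
# Cook's distance from the standardized residual and the leverage.
cook = [r ** 2 * h / (p * (1 - h)) for r, h in zip(std_res, lev)]

for i in range(n):
    flags = []
    if lev[i] > 3 * p / n:        # ASSUMED rule of thumb
        flags.append("high leverage")
    if abs(std_res[i]) > 2:       # ASSUMED rule of thumb
        flags.append("possible outlier")
    if cook[i] > 0.5:             # ASSUMED rule of thumb
        flags.append("possibly influential")
    print(i + 1, round(lev[i], 3), round(std_res[i], 2), round(cook[i], 3), flags)
```

Refitting after dropping a flagged observation and comparing the slope, S and interval widths is the comparison that parts 5 to 7 of this question ask for.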
[Solved] STAT 501 Mid-Term Exam 2 | Solution

  • This solution has been purchased 5 times
  • Submitted On 11 Apr, 2015 01:29:51
Analysis of Variance

Source      DF   Adj SS   Adj MS  F-Value  P-Value
Regression   4  112.612  28.1529    33.54    0.000
  Stay       1   15.703  15.7032    18.71    0.000
  Cultures   1   19.536  19.5358    23.27    0.000
  Xrays      1    4.345   4.3451     5.18    0.025
  Nurses     1    7.480   7.4801     8.91    0.004
Error       92   77.231   0.8395
Total       96  189.842
...