Project 2 | Complete Solution

For this assignment use the data set senic.xlsx. This data set consists of a random sample of 113 hospitals. The objective is to study the infection risk and what factors influence it. The variables from the data set are:


Variable Name: Description
Identification number
Length of stay: Average length of stay in hospital (in days)
Age: Average age of patients (in years)
Infection risk: Average estimated probability of acquiring infection in hospital (in percent)
Routine culturing ratio: Ratio of number of cultures performed to number of patients without signs or symptoms of pneumonia, times 100
Routine chest X-ray ratio: Ratio of number of X-rays performed to number of patients without signs or symptoms of pneumonia, times 100
Number of beds: Average number of beds in hospital
Medical school affiliation: 0 = Yes, 1 = No
Average daily census: Average number of patients in hospital per day
Number of nurses: Average number of full-time licensed practical nurses
Available facilities and services: Percent of 35 potential facilities and services that are provided by the hospital


The goal is to fit the best multiple regression model to the response (infection risk).


Do an analysis using the first 108 observations.


Use the stepwise regression method to see which model is the best. Repeat using best-subsets regression. Do the two methods agree?

Are there any outliers in the data? Look for x-outliers, y-outliers, and high-influence points.

Come up with one model that you think best describes the data and can be used for future predictions. Show the residual plot for this one. Does the model seem appropriate?

Use this model to predict y (with a prediction interval) for the last 5 observations of the data, and check whether the model does well on them.
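The last step, fitting on the first 108 observations and checking prediction intervals on the last 5, could be sketched roughly as follows. This is only an illustration under stated assumptions: the data are synthetic stand-ins (senic.xlsx is not read), the three predictors and their coefficients are made up, and the helper names `fit_ols` and `prediction_interval` are hypothetical. It assumes NumPy and SciPy are available.

```python
import numpy as np
from scipy import stats


def fit_ols(X, y):
    """Least-squares fit with an intercept; returns the pieces a prediction interval needs."""
    Xd = np.column_stack([np.ones(len(X)), X])      # design matrix with intercept column
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    dof = Xd.shape[0] - Xd.shape[1]                 # n - p residual degrees of freedom
    mse = resid @ resid / dof                       # estimate of the error variance
    xtx_inv = np.linalg.inv(Xd.T @ Xd)
    return beta, mse, xtx_inv, dof


def prediction_interval(X_new, beta, mse, xtx_inv, dof, level=0.95):
    """Prediction interval for a new observation at each row of X_new."""
    Xn = np.column_stack([np.ones(len(X_new)), X_new])
    y_hat = Xn @ beta
    # x0' (X'X)^-1 x0 for every new row
    h = np.einsum("ij,jk,ik->i", Xn, xtx_inv, Xn)
    se = np.sqrt(mse * (1.0 + h))                   # the "1 +" term is what makes it a prediction interval
    t_crit = stats.t.ppf(0.5 + level / 2.0, dof)
    return y_hat, y_hat - t_crit * se, y_hat + t_crit * se


# Synthetic stand-in for the 113 hospitals (NOT the SENIC data).
rng = np.random.default_rng(0)
n, k = 113, 3
X = rng.normal(size=(n, k))
y = 4.0 + X @ np.array([0.5, -0.3, 0.2]) + rng.normal(size=n)

# Fit on the first 108 rows, hold out the last 5.
X_train, y_train = X[:108], y[:108]
X_test, y_test = X[108:], y[108:]

beta, mse, xtx_inv, dof = fit_ols(X_train, y_train)
y_hat, lower, upper = prediction_interval(X_test, beta, mse, xtx_inv, dof)

covered = (y_test >= lower) & (y_test <= upper)
for obs, lo, hi, ok in zip(y_test, lower, upper, covered):
    print(f"observed={obs:6.2f}  interval=({lo:6.2f}, {hi:6.2f})  inside={ok}")
```

With only 5 held-out hospitals, one observation falling outside a 95% interval is not strong evidence against the model; the width of the intervals relative to the range of infection risk is also worth reporting.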

Available Solution
  • Submitted On 25 Apr, 2015 04:47:20
First, we checked for outliers in the data, considering all variables and the first 108 observations. There are a few outliers, as shown in the tables below, which classify them by type. The same can be seen in the box plots below. We tr...
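One common way to classify points as x-outliers, y-outliers, or high-influence points uses leverage, studentized residuals, and Cook's distance. The sketch below is not necessarily the method used in this solution. It runs on synthetic data with two planted outliers, the function name `influence_diagnostics` is hypothetical, and the cutoffs (2p/n, 3, and 4/n) are rules of thumb rather than fixed standards.

```python
import numpy as np


def influence_diagnostics(X, y):
    """Leverage, internally studentized residuals, and Cook's distance for an OLS fit."""
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])            # design matrix with intercept
    p = Xd.shape[1]
    H = Xd @ np.linalg.inv(Xd.T @ Xd) @ Xd.T         # hat matrix
    leverage = np.diag(H)                            # h_ii: how unusual each row's x-values are
    resid = y - H @ y
    mse = resid @ resid / (n - p)
    student = resid / np.sqrt(mse * (1.0 - leverage))
    cooks = student**2 * leverage / (p * (1.0 - leverage))
    return leverage, student, cooks


# Synthetic stand-in for the first 108 hospitals (NOT the SENIC data).
rng = np.random.default_rng(1)
n, k = 108, 3
X = rng.normal(size=(n, k))
X[0] = [6.0, -6.0, 6.0]                              # planted x-outlier (extreme predictors)
y = 4.0 + X @ np.array([0.5, -0.3, 0.2]) + rng.normal(size=n)
y[1] += 8.0                                          # planted y-outlier (extreme response)

leverage, student, cooks = influence_diagnostics(X, y)
p = k + 1

x_outliers = np.where(leverage > 2 * p / n)[0]       # rule of thumb: h_ii > 2p/n
y_outliers = np.where(np.abs(student) > 3)[0]        # rule of thumb: |r_i| > 3
influential = np.where(cooks > 4 / n)[0]             # rule of thumb: D_i > 4/n

print("x-outliers (high leverage):", x_outliers)
print("y-outliers (large studentized residual):", y_outliers)
print("high-influence points (Cook's distance):", influential)
```

A flagged point is a candidate for closer inspection, not automatic removal; refitting with and without it shows how much the coefficients actually change.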