DISCUSSION QUESTION

Part 1. Boating trips.csv

Boating trips.csv contains a dataset describing the number of boating trips taken by leisure boat owners. It contains the following variables:

Trips: number of boating trips in the survey year

Quality: respondent’s rating of the facility’s quality

Ski: whether the respondent engages in water-ski activities

Income: household income, in thousands of dollars

Userfee: whether the respondent has paid the annual user fee for the destination

CostC: average expenditure ($) during a boat trip at lake C

CostS: average expenditure ($) during a boat trip at lake S

CostH: average expenditure ($) during a boat trip at lake H

 

(a) Plot a histogram of the outcome variable (trips). Copy it here.

(b) Check the X variables (all variables except trips) for potential multicollinearity. Use threshold = 10. Which variables are eliminated?

(c) Run a Poisson regression of the outcome variable (trips) on the variables remaining after (b). Which variables are significant predictors of trips at the 0.05 level?

(d) How does the expected number of boating trips differ between owners who have paid the annual user fee and those who have not?

(e) Run a negative binomial regression of the outcome variable (trips) on the same set of variables as in (c). Now what is your answer to (d)?

(f) Which model provides the better fit, Poisson or negative binomial? Support your answer.
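For parts (d) through (f), the following minimal Python sketch may help with interpretation. It assumes the usual log link for both count models, so a coefficient converts to a multiplicative change in the expected count via the exponential, and it compares fit using AIC, which is one common criterion among several. All numbers (the coefficient, log-likelihoods, and parameter counts) are hypothetical placeholders, not estimates from Boating trips.csv.

```python
import math


def rate_ratio(coef):
    """Multiplicative change in the expected count for a one-unit increase
    in a predictor, assuming a log-link count model."""
    return math.exp(coef)


def aic(log_likelihood, n_params):
    """Akaike information criterion; a lower value suggests a better fit."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical coefficient on Userfee (illustration only).
userfee_coef = 0.40
ratio = rate_ratio(userfee_coef)
percent_change = (ratio - 1) * 100
print(f"rate ratio = {ratio:.3f} ({percent_change:+.1f}% expected trips)")

# Hypothetical log-likelihoods. The negative binomial model estimates one
# extra (dispersion) parameter, so its parameter count is one higher.
aic_poisson = aic(log_likelihood=-1500.0, n_params=6)
aic_negbin = aic(log_likelihood=-900.0, n_params=7)
better = "negative binomial" if aic_negbin < aic_poisson else "Poisson"
print(f"AIC Poisson = {aic_poisson:.1f}, AIC NB = {aic_negbin:.1f}, lower AIC: {better}")
```

To use it with real output, substitute the Userfee coefficient and the log-likelihoods reported by your own software for each fitted model.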

 

Part 2. Admdata.csv

Admdata.csv contains a dataset on admission outcomes of graduate school applicants. There are 4 variables in this dataset.

Admit: equals yes if the applicant is admitted

Gre: GRE score of the applicant

Gpa: GPA of the applicant

Rank: rank of the undergraduate institution of the applicant (Tier 1 is the highest and Tier 4 is the lowest).

(a) Tabulate the variable admit. What is the average probability of an applicant being admitted to a graduate program in this dataset?

(b) Run a logistic regression of the outcome variable (admit) on the other variables (gre, gpa, rank). Copy your results.

(c) Based on the estimation results, compare the odds of admission for applicants graduating from each tier of undergraduate institution. (Hint: use Tier 1 as the base, and explain whether the odds for Tiers 2, 3, and 4 are higher or lower, and by how much, respectively.)

(d) Based on the model, what is the predicted admission probability for an applicant with a GRE score of 720 and a GPA of 3.85 who graduated from a Tier 2 undergraduate institution?

(e) Predict the probability of admission for every applicant in the dataset. Use 0.5 as the cutoff to predict admission status (yes vs. no). Copy the confusion matrix here. What are the accuracy, sensitivity, and specificity of the model? What do you think about the model predictions? Are you satisfied or concerned, and why?

(f) Suppose you want to improve the sensitivity of the model. How would you adjust the cutoff, and why?
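The mechanics behind parts (c) through (f) can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than a solution: the intercept, the coefficients, the dummy-variable names (rank2, rank3, rank4, with Tier 1 as the base), and the toy labels and probabilities are all hypothetical and do not come from Admdata.csv.

```python
import math


def predicted_probability(intercept, coefs, values):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum of b_k * x_k)))."""
    linear = intercept + sum(coefs[name] * values[name] for name in coefs)
    return 1.0 / (1.0 + math.exp(-linear))


def confusion_metrics(actual, probs, cutoff=0.5):
    """Predict 1 (admitted) when p >= cutoff, then compare with the actual
    0/1 labels. Assumes both classes appear in `actual`."""
    predicted = [1 if p >= cutoff else 0 for p in probs]
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, y in pairs if a == 1 and y == 1)
    tn = sum(1 for a, y in pairs if a == 0 and y == 0)
    fp = sum(1 for a, y in pairs if a == 0 and y == 1)
    fn = sum(1 for a, y in pairs if a == 1 and y == 0)
    return {
        "accuracy": (tp + tn) / len(actual),
        "sensitivity": tp / (tp + fn),  # share of actual admits caught
        "specificity": tn / (tn + fp),  # share of actual rejects caught
    }


# Hypothetical estimates (illustration only); Tier 1 is the omitted base.
intercept = -4.0
coefs = {"gre": 0.002, "gpa": 0.8, "rank2": -0.7, "rank3": -1.3, "rank4": -1.6}

# Odds of admission for Tier 2 relative to Tier 1: exp(coefficient).
odds_ratio_tier2 = math.exp(coefs["rank2"])

# Applicant from part (d): GRE 720, GPA 3.85, Tier 2.
applicant = {"gre": 720, "gpa": 3.85, "rank2": 1, "rank3": 0, "rank4": 0}
p = predicted_probability(intercept, coefs, applicant)
print(f"Tier 2 odds ratio = {odds_ratio_tier2:.3f}, predicted p = {p:.3f}")

# Toy labels and predicted probabilities, evaluated at two cutoffs.
actual = [1, 1, 1, 0, 0, 0, 0, 0]
probs = [0.70, 0.45, 0.30, 0.55, 0.40, 0.20, 0.15, 0.10]
at_half = confusion_metrics(actual, probs, cutoff=0.5)
at_lower = confusion_metrics(actual, probs, cutoff=0.35)
print(at_half)
print(at_lower)
```

Evaluating the same predictions at more than one cutoff makes the trade-off between sensitivity and specificity visible on your own data.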