1. A study examined whether pack-years of lifetime smoking (smokepy) predict the level of C-reactive protein (CRP), an inflammatory marker, after controlling for age, socioeconomic status (SES), and education. SES and education were numerical variables; CRP was normally distributed. The results of the analysis are shown below.
The significance criterion is set at p < 0.05.
R-squared = .43
Adjusted R-squared = .41
Parameter Estimates
Variable     DF   Estimate   Standard Error   t value   p-value
Intercept     1     169.39             7.92     21.39     <.001
Smokepy       1      -0.38             0.05     -7.49     <.001
Age           1       0.03             0.04      0.77       .44
SES           1       1.41             0.67      2.1        .04
Education     1      -1.66             0.87     -1.91       .06
A) What type of statistical procedure is this?
B) Describe the findings, including interpretation of all values in the column for parameter estimates and whether or not they are significant.
C) After controlling for model complexity (i.e. number of independent variables), what is the proportion of variability in CRP explained by this model?
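As a hedged aid for reading the output in question 1, the sketch below recomputes each t value as estimate divided by standard error and shows the usual adjusted R-squared formula. The sample size is not given in the question, so `HYPOTHETICAL_N` is an assumption made purely for illustration; the helper names are also illustrative.

```python
# Rows copied from the Parameter Estimates table:
# (estimate, standard error, reported t value).
ROWS = {
    "Intercept": (169.39, 7.92, 21.39),
    "Smokepy": (-0.38, 0.05, -7.49),
    "Age": (0.03, 0.04, 0.77),
    "SES": (1.41, 0.67, 2.1),
    "Education": (-1.66, 0.87, -1.91),
}


def t_ratio(estimate, std_error):
    """t statistic for one coefficient: estimate / standard error."""
    return estimate / std_error


def adjusted_r2(r2, n, p):
    """Adjusted R-squared for n observations and p predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# ASSUMPTION: the question does not report n; 120 is a made-up value.
HYPOTHETICAL_N = 120
N_PREDICTORS = 4  # smokepy, age, SES, education

for name, (est, se, reported_t) in ROWS.items():
    print(f"{name:<10} recomputed t = {t_ratio(est, se):7.2f}   reported t = {reported_t:7.2f}")

print(f"adjusted R-squared with n = {HYPOTHETICAL_N}: "
      f"{adjusted_r2(0.43, HYPOTHETICAL_N, N_PREDICTORS):.3f}")
```

The recomputed t values differ slightly from the reported ones because the printed estimates and standard errors are rounded to two decimals.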
2. An investigator conducted a study to find the relationship between the number of decayed, missing, or filled teeth (DMFT) and sugar consumption. The investigator produced an estimate for the correlation coefficient and provided the following statement: “The correlation between DMFT and sugar consumption is 0.7. There is a strong correlation between DMFT and sugar consumption. Therefore, it is recommended that patients be advised to reduce sugar consumption to prevent tooth decay.”
State why you are or are not confident in this investigator’s conclusion. In other words, explain whether something is missing from the analysis or whether everything you need has been provided.
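One way to think about question 2 is to ask how the same correlation coefficient behaves at different sample sizes. The sketch below uses the standard t statistic for a Pearson correlation, t = r * sqrt(n - 2) / sqrt(1 - r^2). The sample sizes are hypothetical, since the investigator's statement does not report one.

```python
import math


def correlation_t(r, n):
    """t statistic (n - 2 degrees of freedom) for testing a Pearson r against zero."""
    if n < 3:
        raise ValueError("need at least 3 observations")
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


# ASSUMPTION: these sample sizes are invented for illustration only.
for n in (5, 30):
    print(f"r = 0.7, n = {n:2d}: t = {correlation_t(0.7, n):.2f} on {n - 2} df")
```

The same r = 0.7 yields very different test statistics depending on n, which is one reason a correlation coefficient reported on its own is hard to evaluate.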
3. You are conducting a study to analyze gender differences in neurocognitive impairment (NCI) within a sample of cocaine-dependent methadone-maintained patients. You found 3 demographic characteristics that produced significant effects on NCI. They are gender, race, and age.
A) What statistical analysis would you use to examine simultaneously the contributions of the socio-demographic variables (gender (male/female), race (White, Black, Latino, Asian), and age (in years)) to self-reported NCI, a normally distributed continuous outcome (higher scores indicate higher NCI)?
B) How many independent variables will there be in your model? Describe (1) what they are, (2) how you would create them, and (3) how you would interpret each coefficient.
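For question 3, a common way to enter a categorical predictor into a regression model is indicator (dummy) coding against a reference category. The sketch below is a minimal illustration in plain Python. The choice of "White" as the reference level, the column names, and the helper functions are all assumptions, not something the question specifies.

```python
RACE_LEVELS = ["White", "Black", "Latino", "Asian"]


def dummy_code(value, levels, reference, prefix):
    """Return one 0/1 indicator for every level except the reference."""
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    return {
        f"{prefix}_{level}": int(value == level)
        for level in levels
        if level != reference
    }


def design_row(gender, race, age, reference="White"):
    """Build the predictor values for one patient (hypothetical layout)."""
    row = {"male": int(gender == "male")}  # 1 = male, 0 = female
    row.update(dummy_code(race, RACE_LEVELS, reference, "race"))
    row["age"] = age  # continuous, in years
    return row


print(design_row("female", "Latino", 42))
```

Under this coding, each race indicator compares one group with the reference group, and a patient in the reference group has zeros on all race indicators.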
4. We ran an inferential test to study whether gender (0 = female; 1 = male) is associated with a diagnosis of Type 2 Diabetes Mellitus (t2dm: 0 = absent; 1 = present) in a group of patients, controlling for age. The results are shown in the table below:
Analysis of Maximum Likelihood Estimates
Variable    Estimate   Standard Error   Test statistic   p-value   Exp(B)
Intercept     -12.77           1.9759          41.8176    <.0001        -
Gender          0.41            0.124          10.9799    0.0009      1.5
Age           0.0948           0.0305           9.6883    0.0019     1.09
A) What type of model is this, and why is this type of analysis appropriate in this case?
B) Describe the finding: is gender associated with a diagnosis of t2dm? Explain why or why not, providing the test statistic and p-value. Interpret the coefficients for gender and age.
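The Exp(B) column in question 4 is the exponential of each estimate. The sketch below reproduces that step with the estimates from the table. The function name and the 10-year example are illustrative assumptions.

```python
import math


def exp_b(estimate, units=1.0):
    """Exponentiate a coefficient, optionally for a change of several units."""
    return math.exp(estimate * units)


GENDER_ESTIMATE = 0.41   # from the table
AGE_ESTIMATE = 0.0948    # from the table

print(f"Exp(B) for gender:          {exp_b(GENDER_ESTIMATE):.3f}")
print(f"Exp(B) for age, per year:   {exp_b(AGE_ESTIMATE):.3f}")
# Illustrative only: the multiplicative change for a 10-year age difference.
print(f"Exp(B) for age, per decade: {exp_b(AGE_ESTIMATE, units=10):.3f}")
```

The recomputed values may differ from the printed Exp(B) column in the last digit, depending on how the table was rounded.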
5. In 1998, there was a major ice storm in Maine. Researchers wanted to know whether there was an association between generator location (inside or outside) and CO poisoning after the ice storm. Results from their case-control study are summarized in the table below (cases are individuals who experienced CO poisoning; controls are individuals who did not):
A) What type of table is this?
B) Name at least 2 tests you can perform to investigate the association between generator location and CO poisoning after an ice storm.
C) Calculate the odds ratio and risk ratio based on this table. Which one is more appropriate for this type of study design?
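The arithmetic behind an odds ratio and a risk ratio can be sketched as follows. The counts used here are invented for illustration and are NOT the study's data; substitute the cell counts from the table in the question. The cell labels a, b, c, d are a conventional layout assumed for this sketch.

```python
def odds_ratio_2x2(a, b, c, d):
    """Odds ratio (a*d)/(b*c) for the layout:
                     cases   controls
        exposed        a        b
        unexposed      c        d
    """
    return (a * d) / (b * c)


def risk_ratio_2x2(a, b, c, d):
    """Risk ratio: risk among exposed divided by risk among unexposed."""
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed


# HYPOTHETICAL counts, chosen only to make the arithmetic easy to follow.
a, b, c, d = 20, 10, 5, 40

print(f"odds ratio = {odds_ratio_2x2(a, b, c, d):.2f}")
print(f"risk ratio = {risk_ratio_2x2(a, b, c, d):.2f}")
```

With these made-up counts the two measures differ noticeably, which illustrates why the choice between them depends on how subjects were sampled.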