MATH 1342 Elementary Statistics Review for test 1

                 Miscarriage   Total
Aspirin                    5      22
Ibuprofen                 13      53
Acetaminophen             24     172
No painkiller            103     762
Total                    145    1009

a. What percent of the pregnancies ended in miscarriage?
b. For each painkiller group, compute the percent of women taking that painkiller who had a miscarriage. Discuss the results.
c. Is this an experiment or an observational study?
d. Identify a possible confounding variable. Clearly indicate how the confounding variable is associated with both the explanatory and response variables.
e. What percent of all women who miscarried had taken no painkillers?

20. Dotplot of Arsenic in Toenails
The dotplot below shows arsenic concentrations in toenails for 19 individuals.
a. Describe the shape of this distribution. Is it symmetric or skewed? Are there any obvious outliers?
b. Estimate the five-number summary of this sample.
c. Estimate the mean of this sample without calculating it, based on the median and the shape.
d. Is it appropriate to use the 95% rule to estimate the standard deviation of these data? Why or why not?

21. A.69 Prostate Cancer and a Drug for Baldness
The drug finasteride is marketed as Propecia to help protect against male pattern baldness, and it also may protect against prostate cancer. A large sample of healthy men over age 55 were randomly assigned to receive either a daily finasteride pill or a placebo. The study lasted seven years, and the men had annual check-ups and a biopsy at the end of the study. Prostate cancer was found in 804 of 4368 men taking finasteride and in 1145 of the 4692 men taking a placebo.
a. Is this an experiment or an observational study? The study was double-blind; what does that mean?
b. What are the variables in the study?
c. Make a two-way table to display the results. Include the row and column totals.
d. What percent of men in the study received finasteride?
e. What percent of the men with prostate cancer were in the placebo group?
f. Compare the percent of men in each group who got prostate cancer.
Does finasteride appear to offer some protection against prostate cancer?

22. A.77 Infection in Dialysis Patients
The table below gives data showing the time to infection, at the point of insertion of the catheter, for kidney patients using portable dialysis equipment. There are 38 patients, and the data give the first observation for each patient.

Time to infection for dialysis patients
  2    5    6    7    7    8   12   13   15   15
 17   22   22   23   24   27   30   34   39   53
 54   63   96  113  119  130  132  141  149  152
152  185  190  292  402  447  511  536

a. Estimate the five-number summary from the data.
b. Identify any outliers in the data by the IQR method. Show your calculations.

23. US monthly retail sales in billions of dollars for the 136 months starting January 2000 are given in the dotplot below. Estimate and interpret the 90th percentile. Justify your answer.

24. Barry Bonds hit 73 home runs during the 2001 Major League Baseball season. The histogram below displays the distribution of the distances, in feet, that the 73 home runs traveled. Each bin (bucket) is 20 feet wide, with the first one starting at 310 feet and ending at 330 feet. Approximately what percentage of the home runs traveled less than 390 feet? Show work.

25. Alex Rodriguez ("A-Rod"), Ken Griffey Jr., Barry Bonds, and David ("Big Papi") Ortiz played during roughly the same period in Major League Baseball. Rodriguez, Griffey, and Bonds each played 22 seasons, while Ortiz played 20. Side-by-side boxplots of their seasonal home run totals follow. The circle with the cross inside indicates the location of the mean.
Identify the players for which the distribution of seasonal home run totals has each of the following:
i) Right skew
ii) Left skew
iii) Approximate symmetry

26. I have data on ACT math scores for 200 students at Georgia Southern University. A dotplot of the distribution of scores follows.
a. Estimate the mean.
b. Is it appropriate to use the 95% rule with these data? Why or why not?
c. Use the 95% rule to estimate the standard deviation.
d. Estimate and interpret the 10th percentile.

Solutions

1. (a) Experiment. Students are assigned to treatments (teaching methods) by the researchers.
(b) Yes. This is a randomized, controlled experiment. Results can be used to demonstrate a causal relationship.
(c) Children are not randomly assigned into this group. There may be common characteristics among children in families who either choose not to try new methods or fail to return consent forms.

2. (a) Increase (b) Decrease (c) Increase (d) Decrease

3. (a) Completely randomized design: Randomly assign 15 students to Group 1 (easy mazes) and the other 15 to Group 2 (hard mazes). Compare the time estimates of Group 1 to those of Group 2.
(b) Matched pairs design: The matched pairs need to be formed by using the same individual. Each student completes both types of mazes. Randomly assign 15 students to Group 1 (easy mazes first) and the other 15 to Group 2 (hard mazes first). Compare each student's easy maze time estimate to his or her hard maze time estimate.

4. (a) Observational
(b) The explanatory variable is alcohol consumption, and the response variable is whether or not the participant dies within four years.
(c) A causal relationship cannot be established by this study, because it is observational. The explanatory variable was not randomly assigned to participants (the subjects choose their own treatments).
(d) People who drink alcohol in moderation tend to also exercise regularly, and lack of exercise is a contributing factor to early death. Confounding occurs here because lack of exercise is associated with both alcohol consumption (the explanatory variable) and early death (the response variable).

5. A.1 (a) Yes (b) Yes

6. A.2 (a) No (b) No

7. A.3 (a) No (b) Yes (c) No (d) Yes (e) Explanatory: drug vs. placebo (categorical).
Response: remission (yes/no) (categorical).

8. A.4 (a) Yes (b) Yes (c) Yes (d) Yes

9. A.7 (a) One categorical variable (b) Bar chart (c) Proportion

10. A.10 (a) One quantitative variable (b) Histogram (c) Mean

11. A.11 (a) Two categorical variables (b) Side-by-side bar charts (c) Difference in proportions

12. A.12 (a) One categorical variable and one quantitative variable (b) Side-by-side boxplots, dotplots, or histograms (c) Statistics by group or difference in means

13. A.17 (a) This is an experiment, since subjects were randomly assigned to one of three groups, which determined what method was used.
(b) This study could not be "blind," since both the participants and those recording the results could see what each had applied.
(c) The sample is the 46 subjects participating in the experiment. The intended population is probably anyone who might consider using black grease under the eyes to cut down on glare from the sun.
(d) The explanatory variable is the eye treatment (black grease, black tape, petroleum jelly), which is categorical. The response variable is the improvement in contrast sensitivity, which is quantitative.

14. A.42 (a) There are two variables: percent college graduates (quantitative) and region of the country (categorical).
(b) The Northeast has the states with the highest percent of college graduates, while the South has the states with the lowest percent of college graduates. The only outlier is a high outlier in the South. (This is the state of Virginia.)
(c) There seems to be a clear association between region and percent of college graduates.

15. A.70 (a) The length of a phone call cannot fall below 0 minutes. However, the upper bound on what phone call length is possible is very high, and there may be a few very long phone calls during a month. We expect that the bulk of the data values will be between 0 and 20 minutes, with a tail extending out to the right to some phone calls extending perhaps as long as two hours (120 minutes).
This describes a distribution that is skewed to the right.
(b) The long phone calls will pull up the mean but not the median, so we expect the mean to be 13.7 minutes and the median to be 2.5 minutes. Notice that this implies that half the phone calls made on this cell phone are less than 2.5 minutes in length.

16. A.33 (a) No, we cannot conclude that about 79% of all people think physical beauty matters, since this was a volunteer sample in which only people who decided to vote were included in the sample, and only people looking at cnn.com even had the opportunity to vote.
(b) The sample is the 38,485 people who voted. The population, if we made such an incorrect conclusion, would be all people.
(c) There is potential for sampling bias in every volunteer sample.
(d) People are likely to incorrectly answer "no" to this question if they believe an answer of "yes" would make them appear shallow.

17. A.38 (a) The explanatory variable is walking speed, which is quantitative. The response variable is whether or not the person died in the next five years, which is categorical.
(b) The article is assuming causation when it should not be, since walking speed was not randomly assigned to the cases.
(c) A possible confounding variable is the health of the men due to other factors when they were tested at age 70. Poor health would cause slower walking and greater risk of death in the next five years.

18. A.48 (a) The total is n = 2253, so we divide each of the frequencies by the total. See the table below. Notice that the relative frequencies add to 1.0, as we expect.
(b) 13% do not own a cell phone.
(c) 46% own a smartphone.

Cell Phone Owned             Proportion
Android smartphone                0.203
iPhone smartphone                 0.194
Blackberry smartphone             0.063
Cell phone not smartphone         0.410
No cell phone                     0.130
Total                             1.000

19. A.50 (a) The percent of pregnancies ending in miscarriage is 145/1009 = 14.4%.
(b) For each category, we compute the percent ending in miscarriage.
Aspirin: Percent = 5/22 = 22.7%
Ibuprofen: Percent = 13/53 = 24.5%
Acetaminophen: Percent = 24/172 = 14.0%
No painkiller: Percent = 103/762 = 13.5%
The percent ending in miscarriage seems to be higher for those women who used aspirin or ibuprofen. Acetaminophen does not seem to pose a greater risk of miscarrying.
(c) This is an observational study.
(d) Stress may lead to headaches for which women take painkillers, and may also influence the chance of a miscarriage. Many other health-related examples could work here as well.
(e) We have 103/145 = 71.0%. Notice that although certain painkillers appear to increase the risk of a miscarriage, it is still true that within this sample 71% of all miscarriages happened to women who did not use any painkiller.

20. A.64 (a) The data are heavily skewed to the right, and there appear to be some large outliers.
(b) The five-number summary (Min, Q1, Median, Q3, Max) is (0.05, 0.1, 0.15, 0.35, 0.85).
(c) The mean is to the right of the median (0.15) because of the right skew. It is also the balance point of the distribution. We estimate the mean to be about 0.25.
(d) No, it is not appropriate to use the 95% rule with this distribution. That rule is useful when data are symmetric and bell-shaped.

21. A.69 (a) This is an experiment. Double-blind means that neither the patients nor the doctors making the cancer diagnosis knew who was getting the drug and who was getting a placebo.
(b) There are two variables: one records the presence or absence of prostate cancer, and the other records whether the individual was in the finasteride group or the placebo group.
(c) A two-way table for treatment groups and cancer diagnosis is below.
(d) We have Percent receiving finasteride = 4368/9060 = 48.2%.
(e) A total of 1949 men were found to have cancer, and 1145 of these were in the placebo group, so we have 1145/1949 = 58.7%.
(f) We have the following cancer rates in each group. Percent on finasteride getting cancer = 804/4368 = 18.4%. Percent on a placebo getting cancer = 1145/4692 = 24.4%. The percent getting cancer appears to be quite a bit lower for those taking finasteride.

               Cancer   No cancer   Total
Finasteride       804        3564    4368
Placebo          1145        3547    4692
Total            1949        7111    9060

22. A.77 (a) The five-number summary (Min, Q1, Median, Q3, Max) is (2, 15, 46, 149, 536).
(b) We see that the interquartile range is IQR = Q3 - Q1 = 149 - 15 = 134. We compute:
Q1 - 1.5(IQR) = 15 - 1.5(134) = 15 - 201 = -186
Q3 + 1.5(IQR) = 149 + 1.5(134) = 149 + 201 = 350
Outliers are any values outside these fences. In this case, there are four outliers, all larger than 350: 402, 447, 511, and 536.

23. Since there are 136 dots, we will find the 90th percentile at 10% of 136 (13.6) dots from the right. This is around 345. US retail sales were below $345 billion in 90% of the 136 months starting January 2000.

24. Approximately 21/73 * 100 = 28.77%

25. i) Right skew: Griffey Jr.
ii) Left skew: A-Rod and Ortiz
iii) Roughly symmetric: Bonds

26. a. Approximately 24.
b. Yes; the distribution is close to perfectly symmetric.
c. s = 3.5 to 4
d. 18
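
The hand calculation in solution 22 can be checked with a short script. The following is a minimal Python sketch and is not part of the original review. It assumes that Q1 and Q3 are taken as the medians of the lower and upper halves of the sorted data, which reproduces the values in the solution; other quartile conventions used by some software can give slightly different Q1 and Q3. The function names five_number_summary and iqr_outliers are illustrative.

```python
from statistics import median

# Time-to-infection data for the 38 dialysis patients in problem 22.
times = [2, 5, 6, 7, 7, 8, 12, 13, 15, 15,
         17, 22, 22, 23, 24, 27, 30, 34, 39, 53,
         54, 63, 96, 113, 119, 130, 132, 141, 149, 152,
         152, 185, 190, 292, 402, 447, 511, 536]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max).

    Assumption: Q1 and Q3 are the medians of the lower and upper
    halves of the sorted data (the middle value is left out of both
    halves when the number of values is odd).
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half:] if n % 2 == 0 else data[half + 1:]
    return (data[0], median(lower), median(data), median(upper), data[-1])


def iqr_outliers(values):
    """Return (lower fence, upper fence, outliers) using the 1.5 * IQR rule."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [v for v in sorted(values) if v < low_fence or v > high_fence]
    return low_fence, high_fence, outliers


print(five_number_summary(times))
print(iqr_outliers(times))
```

Running the script prints the five-number summary followed by the two fences and the list of values outside them, which can be compared directly with parts (a) and (b) of solution 22.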