Statistics Question

In this mini-project, you will use data collected by the U.S. National Center for Health Statistics through the 2011-2012 National Health and Nutrition Examination Survey (NHANES). The NHANES has been conducted every two years since the early 1960s. The data collected from the survey include demographics, various body and health measurements, and information about various lifestyle choices. The NHANES is unique in that the health measurements are collected through physical examinations. Data on all other variables are self-reported.

Before you begin the analysis, we will discuss the data and expectations for this project.

Objectives for the mini-project

You will understand:

  • Graphical displays can be used to communicate information about a single variable or the relationship between two or more variables.
  • Creating effective graphical displays is typically an iterative process.

You will be able to:

  • Create an appropriate display to visualize the distribution of a single variable (univariate graphs) or the relationship between two or more variables (multivariable graphs).
  • Write a short report to communicate statistical results to a general audience.
  • Apply principles of making effective graphical displays to improve an existing graph.

1) Locate the NHANES dataset (DCMP_STAT_5D_nhanes_alldata) containing the following variables:

  • Age: Age in years at screening of the study participant (Note: Subjects 80 years or older were recorded as 80)
  • HealthGen: Self-reported rating of the study participant’s health in general (Excellent, Vgood, Good, Fair, Poor)
  • PhysActive: Whether the study participant reported performing moderate or vigorous sports, fitness, or recreational activities (Yes, No)
  • AttendCollege: Whether the study participant attended or completed college at the time of the study (Yes, No)
  • Credit: iStock/Nattakorn Maneerat Introductory Statistics First Edition (2021) Student Pages Copyright © 2021, The Charles A. Dana Center at The University of Texas at Austin
  • BadPhysHlthDay: Whether the study participant self-reported having at least one bad physical health day in the past 30 days (Yes = at least 1 day; No = 0 days)
  • BadMentlHlthDay: Whether the study participant self-reported having at least one bad mental health day in the past 30 days (Yes = at least 1 day; No = 0 days)
  • BMI: Body mass index (weight/height² in kg/m²)
  • SleepTrouble: Participant told a doctor or other health professional that they had trouble sleeping (Yes, No)
  • SleepHrsNight: Self-reported number of sleep hours the study participant usually got on weekdays or workdays
  • TotChol: Total HDL cholesterol in mmol/L
  • BPSysAve: Average of three systolic blood pressure readings in mm Hg

  • Part A: Identify two quantitative variables in the data. What possible values do these variables take in the dataset?
  • Part B: Identify two categorical variables in the data. List the possible categories these variables can take.

2) In this project, you will create univariate, bivariate, and multivariate graphical displays for quantitative and categorical variables. Identify one or two DCMP Data Analysis Tools at https://utdanacenter.org/data-analysis-tools (or other tools) you can use to create these graphs. Include the name of the tool and the type of graph you can create in your answer.

3) One goal of this project is to continue developing your statistical writing skills. This includes writing your interpretations in a way that can be clearly understood by a general audience and presenting your results in a report suitable for an academic or professional setting.

  • Part A: Describe what is meant by “writing in a way that could be clearly understood by a general audience.”
  • Part B: Refer to the project rubric. Based on the rubric, what are qualities of a report that are suitable for an academic or professional setting?

4) Let’s begin by looking at the ages of the survey respondents.

  • Part A: Use technology to create a histogram of Age.
  • Part B: Describe the distribution. Include the shape, center, spread, and the presence of outliers in your description, using appropriate summary statistics as needed.
  • Part C: The distribution shows a small peak around 80 years old. (Note: Changing the binwidth of the histogram makes the peak more or less noticeable.) Briefly explain why there may be a peak around this value.
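The effect asked about in Part C can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the ages below are invented, not NHANES values, and the helper names `bin_counts` and `print_histogram` are hypothetical. The sketch applies the recording rule from the Age description (80 or older recorded as 80) and compares bin counts before and after, at two bin widths.

```python
from collections import Counter
import statistics

# Hypothetical true ages, invented for illustration (NOT NHANES data).
true_ages = [23, 35, 41, 47, 52, 58, 63, 67, 71, 76, 79, 80, 82, 84, 85, 88, 91]

# Recording rule from the Age description: 80 or older is recorded as 80.
recorded_ages = [min(age, 80) for age in true_ages]


def bin_counts(values, binwidth):
    """Count values per bin; each bin covers [start, start + binwidth)."""
    counts = Counter((v // binwidth) * binwidth for v in values)
    return dict(sorted(counts.items()))


def print_histogram(values, binwidth):
    """Print a simple text histogram, one row per non-empty bin."""
    for start, count in bin_counts(values, binwidth).items():
        print(f"{start:>3}-{start + binwidth - 1:<3} {'#' * count}")


print("Recorded ages, binwidth 5:")
print_histogram(recorded_ages, binwidth=5)
print("Recorded ages, binwidth 10:")
print_histogram(recorded_ages, binwidth=10)

# Summary statistics that are useful when describing center and spread.
q1, _, q3 = statistics.quantiles(recorded_ages, n=4)
print("median:", statistics.median(recorded_ages))
print("IQR:", q3 - q1)
```

In this made-up sample, everyone aged 80 or older lands in the single bin that starts at 80, so that bin is taller than it would be with the true ages. A narrower bin width concentrates the pile-up into a thinner bar, which is why the peak becomes more or less noticeable as the bin width changes.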

5) Next let’s examine how the respondents generally perceived their health. Create an appropriate graphical display of HealthGen. Then use the graph to describe two different observations about the respondents’ general perceptions of their health.
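A bar chart of HealthGen is built from the count or share of respondents in each category. The sketch below shows one way to compute those shares and print a rough text bar chart. The responses are invented for illustration (not NHANES data), and `relative_frequencies` is a hypothetical helper name. Keeping the categories in their natural order, from Excellent to Poor, usually makes the display easier to read than alphabetical order.

```python
from collections import Counter

# Category order taken from the HealthGen description, best to worst.
HEALTH_ORDER = ["Excellent", "Vgood", "Good", "Fair", "Poor"]

# Hypothetical responses, invented for illustration (NOT NHANES data).
responses = [
    "Good", "Vgood", "Good", "Fair", "Excellent",
    "Good", "Vgood", "Poor", "Vgood", "Good",
]


def relative_frequencies(values, order):
    """Return the share of responses in each category, in the given order."""
    counts = Counter(values)
    total = sum(counts[c] for c in order)
    return {c: counts[c] / total for c in order}


# One text bar per category; bar length is proportional to the share.
for category, share in relative_frequencies(responses, HEALTH_ORDER).items():
    print(f"{category:<10} {'#' * round(share * 20):<10} {share:.0%}")
```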

6) Now let’s examine whether there is an association between health perception and whether a person performs regular physical activity. Create a graphical display to visualize the distribution of HealthGen for each category of PhysActive. Does there appear to be a difference in general health perception between people who perform regular physical activity and those who do not? Write two observations from the graphical display to support your response.
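To compare HealthGen across the two PhysActive groups, it helps to look at percentages within each group rather than raw counts, since the groups may differ in size. The sketch below computes that conditional distribution, which is what a side-by-side or stacked percentage bar chart would display. The records are invented for illustration (not NHANES data), and `conditional_distribution` is a hypothetical helper name.

```python
from collections import Counter, defaultdict

# Category order taken from the HealthGen description, best to worst.
HEALTH_ORDER = ["Excellent", "Vgood", "Good", "Fair", "Poor"]

# Hypothetical (PhysActive, HealthGen) pairs, invented for illustration.
records = [
    ("Yes", "Excellent"), ("Yes", "Vgood"), ("Yes", "Vgood"),
    ("Yes", "Good"), ("Yes", "Good"),
    ("No", "Vgood"), ("No", "Good"), ("No", "Good"),
    ("No", "Fair"), ("No", "Poor"),
]


def conditional_distribution(pairs, order):
    """Share of each HealthGen category within each PhysActive group."""
    by_group = defaultdict(Counter)
    for group, category in pairs:
        by_group[group][category] += 1
    result = {}
    for group, counts in by_group.items():
        total = sum(counts[c] for c in order)
        result[group] = {c: counts[c] / total for c in order}
    return result


for group, shares in conditional_distribution(records, HEALTH_ORDER).items():
    row = "  ".join(f"{c}: {s:.0%}" for c, s in shares.items())
    print(f"PhysActive = {group:<3} | {row}")
```

If the percentages in the two rows differ noticeably, for example a larger share of Excellent and Vgood in one group, that difference is the kind of observation the question asks for.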

7) A popular health and wellness website is writing an article that explores the following questions:

  • (1) Does having a habit of regular physical activity change with age?
  • (2) Does the association between age and health perception differ based on performing regular physical activity?

They would like to include a graphical display in the article to help readers visualize conclusions in the article about the associations between the variables. The authors propose the following graph, but the website editor is concerned it may be confusing to readers. She has asked for your help to improve the graph and write an interpretation of the graph that will be included in the article.

  • Part A: Describe two ways this graphical display may be confusing and/or makes it difficult to explore the questions of interest for the article.
  • Part B: Use technology to make an improved graphical display that can be effectively used to answer the two questions of interest. You may use multiple graphs, if needed. In statistics, we often need to manipulate the structure of the dataset in order to create the visualizations. Locate the reorganized dataset (DCMP_STAT_5D_nhanesYesNo), which includes only the variables for this question, to aid in the creation of your visual display.
  • Part C: Use the graph(s) to answer the two questions of interest for the article. Write a short paragraph (three to six sentences) that includes your answers to the questions and observations from the graph(s) that support your response.

The response should be written in a way that is clearly understood by a general audience.

ONLY NEED HELP/ANSWERS WITH 4-7
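One possible way to approach the two article questions in 7) is to split them into two simpler summaries: the share of physically active respondents in each age group (question 1), and the typical age within each HealthGen category, computed separately for active and inactive respondents (question 2). The sketch below is a minimal illustration of that idea, not the intended solution. The rows are invented (not NHANES data), the 20-year age groups are an arbitrary choice, and the helper names are hypothetical. In a graph, the first summary could be a bar chart and the second a set of side-by-side boxplots of Age by HealthGen, one panel per PhysActive group.

```python
from collections import defaultdict
import statistics

# Category order taken from the HealthGen description, best to worst.
HEALTH_ORDER = ["Excellent", "Vgood", "Good", "Fair", "Poor"]

# Hypothetical (Age, PhysActive, HealthGen) rows, invented for illustration.
rows = [
    (22, "Yes", "Excellent"), (28, "Yes", "Vgood"), (34, "No", "Good"),
    (39, "Yes", "Good"), (45, "No", "Fair"), (51, "Yes", "Vgood"),
    (58, "No", "Good"), (63, "No", "Fair"), (67, "Yes", "Good"),
    (72, "No", "Poor"), (78, "No", "Fair"), (80, "No", "Good"),
]


def age_group_start(age, width=20):
    """Lower bound of the age group; ages recorded as 80 form their own group."""
    return min((age // width) * width, 80)


def share_active_by_age_group(data):
    """Question 1: share of PhysActive == 'Yes' within each age group."""
    totals, active = defaultdict(int), defaultdict(int)
    for age, phys, _ in data:
        group = age_group_start(age)
        totals[group] += 1
        active[group] += phys == "Yes"
    return {g: active[g] / totals[g] for g in sorted(totals)}


def median_age_by_health(data, phys_active):
    """Question 2: median age per HealthGen category for one PhysActive group."""
    ages = defaultdict(list)
    for age, phys, health in data:
        if phys == phys_active:
            ages[health].append(age)
    return {h: statistics.median(ages[h]) for h in HEALTH_ORDER if ages[h]}


print("Share active by age group:", share_active_by_age_group(rows))
for status in ("Yes", "No"):
    print(f"Median age by HealthGen (PhysActive = {status}):",
          median_age_by_health(rows, status))
```

Comparing the two sets of medians side by side shows whether the pattern between age and health perception looks similar or different for the active and inactive groups, which is the comparison the second question asks about.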