What proportion of babies weigh between 10 and 14 pounds? Is it unusual for a baby to weigh more than 17 pounds?

Statistics Question

Baby weights: According to a recent National Health Statistics Reports, the weight of male babies less than 2 months old in the United States is normally distributed with mean 11.5 pounds and standard deviation 2.7 pounds.

  • What proportion of babies weigh between 10 and 14 pounds?
  • Is it unusual for a baby to weigh more than 17 pounds?
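
A minimal sketch of how the two baby-weight questions could be set up, assuming Python's standard-library `statistics.NormalDist`. The 0.05 cutoff for "unusual" is an assumed convention, not something the problem states.

```python
from statistics import NormalDist

# Population model stated in the problem: weights ~ Normal(mean 11.5 lb, sd 2.7 lb).
weights = NormalDist(mu=11.5, sigma=2.7)

# P(10 < X < 14) is the difference of two cumulative probabilities.
p_between = weights.cdf(14) - weights.cdf(10)

# P(X > 17) is the upper-tail area beyond 17 lb.
p_over_17 = 1 - weights.cdf(17)

# Assumed convention (not from the problem): "unusual" means probability below 0.05.
is_unusual = p_over_17 < 0.05

print(f"P(10 < X < 14) = {p_between:.4f}")
print(f"P(X > 17)      = {p_over_17:.4f}, unusual: {is_unusual}")
```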

SAT scores: The College Board reports that in a recent year, the mean mathematics SAT score was 514, and the standard deviation was 118. A sample of 65 scores is chosen.

  • What is the probability that the sample mean score is less than 500?
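
For the SAT question, one possible approach treats the sample mean as approximately normal with standard error σ/√n, which the central limit theorem supports for n = 65. A short sketch under that assumption:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 514, 118, 65

# Standard deviation of the sample mean (standard error).
standard_error = sigma / sqrt(n)

# Approximate sampling distribution of the mean of 65 scores.
sample_mean = NormalDist(mu=mu, sigma=standard_error)

# Probability that the sample mean falls below 500.
p_below_500 = sample_mean.cdf(500)

print(f"standard error = {standard_error:.3f}")
print(f"P(sample mean < 500) = {p_below_500:.4f}")
```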

 

How can you control for variables trending linearly over time in your regression model? Estimate a model that controls for linearly trending variables. Do the results from iii) change?

Relationship between consumer prices and approval rates

Question

You are commissioned by the White House to investigate the relationship between consumer prices and approval rates of the President. You are given access to the APPROVAL dataset. The dataset consists of 78 months of data during the presidency of George W. Bush (ending in July 2007, before Bush left office). In addition to economic variables and binary indicators for various events, it includes an approval rate, approve, collected by Gallup.

  i) Create a line graph with values for cpi on the left y-axis and values for approve on the right y-axis. Check for a linear trend in both variables. Hint: This time, you need to create the ‘date’ variable before even declaring it for time-series use. These are monthly data, so you might want to type ‘help ym’ for more insights. The full command is ‘gen date = ym(year, month)’. You also need to generate the initial linear time trend. The command is ‘gen t = _n’.
  ii) What is the median of the variable approve? Is it larger or smaller than the average value? Use the graph from part i) to explain why.

iii) Estimate the model

$$\text{lapprove}_t = \beta_0 + \beta_1\,\text{lcpi}_t + u_t,$$

where $\text{lapprove}_t$ and $\text{lcpi}_t$ are the logarithms of $\text{approve}_t$ and $\text{cpi}_t$, respectively. Interpret the coefficients.

  iv) How can you control for variables trending linearly over time in your regression model? Estimate a model that controls for linearly trending variables. Do the results from iii) change?
  v) What other variables might be important to include? What happens if you omit them from your regression model?
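
One common way to control for a linear trend is to add the time index t as a regressor. The sketch below illustrates the idea in plain Python with a small hand-written OLS helper (`ols` is a hypothetical name). The series are made up for illustration and are not the APPROVAL data; with the real data the same comparison would be run in Stata.

```python
def ols(columns, y):
    """Return OLS coefficients [intercept, b1, b2, ...] for y on the given columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented normal equations: (X'X | X'y).
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(k):
            if r != c:
                factor = A[r][c] / A[c][c]
                A[r] = [a - factor * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical series (NOT the APPROVAL data): lcpi drifts upward over time, and
# lapprove is built with a true lcpi effect of -0.5 plus a trend of -0.05 per month.
lcpi = [0.2, 0.5, 0.1, 0.9, 0.4, 1.3, 0.8, 1.6, 1.1, 2.0]
t = list(range(1, len(lcpi) + 1))          # same idea as Stata's gen t = _n
lapprove = [4.0 - 0.5 * x - 0.05 * ti for x, ti in zip(lcpi, t)]

no_trend = ols([lcpi], lapprove)           # model from part iii)
with_trend = ols([lcpi, t], lapprove)      # model from part iv), adding t

print("slope on lcpi without trend:", round(no_trend[1], 4))
print("slope on lcpi with trend:   ", round(with_trend[1], 4))
print("coefficient on t:           ", round(with_trend[2], 4))
```

In this constructed example the slope on lcpi absorbs part of the trend when t is omitted, which is the kind of change part iv) asks about.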

 

 

Describe the shape of the observed distribution for each group. Do you think that the sampled data come from a population with a normal distribution? Why or why not?

DISCUSSION ESSAY

Problems not marked with [SPSS] should be done by hand, as problems similar to those could be on the Midterm and/or Final.  (Of course, feel free to check your work using SPSS!)

Problems marked with [SPSS] are intended to be done with SPSS.  For these problems, please attach the output file.

 

  1. A study was conducted investigating the long-term prognosis of children who have suffered an acute episode of bacterial meningitis, an inflammation of the membranes enclosing the brain and spinal cord. Listed below are the times to the onset of seizure for 13 children who took part in the study. In months, the measurements are:

0.10   0.25   0.50   4   12   12   24   24   31   36   42   55   96

Find the following numerical summary measures of the data:

  1. Mean
  2. Median
  3. Mode
  4. Range
  5. Interquartile Range
  6. Standard Deviation
  7. How many standard deviations away from the mean is a child whose time to the onset of seizure was 50 months? (Note: for the purpose of this problem, please assume that the population standard deviation is the same as the sample standard deviation.)
  8. What proportion of children have an onset to seizure time of 50 or more months?
  9. What proportion of children have an onset to seizure time between the mean and 50 months?
  10. Calculate a 95% confidence interval around the mean assuming that the data are normally distributed with a known population variance of 20.
  11. Calculate a 95% confidence interval around the mean assuming that the data are normally distributed with an unknown population variance.
  12. Calculate a 99% confidence interval around the mean assuming that the data are normally distributed with an unknown population variance.
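
The hand calculations above can be checked with a short script. This is a sketch using Python's standard library; it makes three assumptions worth flagging: quartiles use the library's default "exclusive" method (textbook and SPSS conventions may give a different IQR), items 8 and 9 are read as normal-model probabilities, and the t critical values for 12 degrees of freedom are taken from a standard t table.

```python
from math import sqrt
from statistics import NormalDist, mean, median, multimode, quantiles, stdev

months = [0.10, 0.25, 0.50, 4, 12, 12, 24, 24, 31, 36, 42, 55, 96]
n = len(months)

avg = mean(months)
med = median(months)
modes = multimode(months)                  # every value tied for most frequent
data_range = max(months) - min(months)
q1, _, q3 = quantiles(months, n=4)         # default "exclusive" method; conventions vary
iqr = q3 - q1
sd = stdev(months)                         # sample standard deviation (n - 1 divisor)

# Item 7: distance of 50 months from the mean, in standard deviations.
z_50 = (50 - avg) / sd

# Items 8-9, ASSUMING a normal model with the sample mean and sd as parameters.
model = NormalDist(mu=avg, sigma=sd)
p_50_or_more = 1 - model.cdf(50)
p_mean_to_50 = model.cdf(50) - 0.5

# Item 10: known variance of 20, so a z interval applies.
half_known = NormalDist().inv_cdf(0.975) * sqrt(20) / sqrt(n)
ci_known = (avg - half_known, avg + half_known)

# Items 11-12: unknown variance, t interval with n - 1 = 12 degrees of freedom.
t_crit = {0.95: 2.179, 0.99: 3.055}        # values from a standard t table
ci_t = {level: (avg - t * sd / sqrt(n), avg + t * sd / sqrt(n))
        for level, t in t_crit.items()}

print(f"mean={avg:.3f} median={med} modes={modes} range={data_range:.2f}")
print(f"IQR={iqr:.2f} sd={sd:.3f} z(50)={z_50:.3f}")
print(f"95% CI (known variance): ({ci_known[0]:.2f}, {ci_known[1]:.2f})")
for level, (lo, hi) in ci_t.items():
    print(f"{level:.0%} CI (t): ({lo:.2f}, {hi:.2f})")
```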

 

  1. [SPSS] A study was conducted comparing female adolescents who suffer from bulimia to healthy females with similar body compositions and levels of physical activity. The file sav contains measures of daily caloric intake, recorded in kilocalories per kilogram, for samples of adolescents from each group.

 

  1. Find the median daily caloric intake for both the bulimic adolescents and the healthy ones.
  2. Compute the IQR for each group.
  3. Construct box-and-whisker plots for each group.
  4. Describe the shape of the observed distribution for each group. Do you think that the sampled data come from a population with a normal distribution?  Why or why not?
  5. Describe the qualitative differences between the two groups based on the box-and-whisker plots. (For example, which average is higher?  Which group has more variability?  Are there outlying values in either group?)

 

  1. [SPSS] The declared concentrations of nicotine in milligrams for 35 brands of Canadian cigarettes are saved under the variable name nicotine in the file sav.

 

  1. Find the mean and median concentrations of nicotine.
  2. Produce a histogram of the nicotine measurements. Describe the shape of the observed distribution.  Do you think that the sampled data come from a population with a normal distribution?  Why or why not?
  3. Which number do you think provides the best measure of central tendency for these concentrations, the mean or the median? Why?

 

  1. [SPSS] The data set sav contains information for the sample of 100 low birth weight infants born in Boston, Massachusetts. This data set contains information on the infants, including systolic blood pressure (SBP), gender, and gestational age of the infant, as well as APGAR score at 5 minutes, toxemia diagnosis for mother and germinal matrix hemorrhage.
    1. Run descriptive statistics in SPSS on all numeric variables, including all possible dispersion statistics, as well as skewness and kurtosis. Attach the output file.
    2. Use SPSS to provide a 95% confidence interval around the mean and show the quartiles (25th, 50th, 75th percentiles) for each numeric variable. Attach the output file (can be all one file).
    3. Create frequency tables in SPSS of all categorical variables. Attach the output file (can be all one file).
    4. Create a cross-tabulation table of toxemia diagnosis for mother and germinal matrix hemorrhage, including the expected frequencies and column percentages. Attach the output file (can be all one file).

 

  1. [SPSS] The dataset sav contains data examining the mean pulse rate of students taking a midterm for PM 510. Two TAs each measured the pulse rate of 10 students taking the midterm in the class after 1 hour. Each TA selected 10 students at random. Let 𝜇 represent the true (population) mean pulse of the students taking the PM 510 midterm.

 

  1. Calculate the 90% confidence interval for 𝜇 based on the data collected by the 1st TA.
  2. Calculate the 90% confidence interval for 𝜇 based on the data collected by the 2nd TA.
  3. Interpret the confidence intervals.
  4. Compare the two confidence intervals. Give some possible reasons why they are different.

 

  1. A library wants to determine the effectiveness of its summer literacy program among low-income children. Because surveying the large number of students in the program would require too many resources, the library staff interviews 30 randomly chosen children among the low-income program attendees.  The 30 sampled children are given a reading test before and after the program.
    • Describe the population of this study.

 

  • The difference in the reading test scores (after – before) has mean = 10 and SD = 4. Assuming the score differences are normally distributed, what percent of the children showed any improvement (difference > 0) in reading ability?

 

  • What percent of children improved by more than 15 points?
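
The two percentage questions follow directly from the stated normal model for the score differences (mean 10, SD 4). A brief sketch, assuming Python's `statistics.NormalDist`:

```python
from statistics import NormalDist

# Score differences (after - before) as stated: Normal(mean 10, sd 4).
diff = NormalDist(mu=10, sigma=4)

# Any improvement means a difference greater than 0.
pct_improved = 100 * (1 - diff.cdf(0))

# Improvement of more than 15 points.
pct_over_15 = 100 * (1 - diff.cdf(15))

print(f"improved at all: {pct_improved:.2f}%")
print(f"improved by more than 15: {pct_over_15:.2f}%")
```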

 

  1. [SPSS] Use SPSS (use a blank dataset) to calculate the following probabilities:  Consider the standard normal distribution with mean μ = 0 and standard deviation 𝜎 = 1. Provide the answers to each question and attach the output file.

 

  1. What is the probability that an outcome z is < -2.05?
  2. What is the probability that an outcome z is > 1.82?
  3. What is the probability that an outcome z is > -1.82?
  4. What is the probability that an outcome z is between –2.28 and 1.92?
  5. What value of z cuts off the upper 30% of the standard normal distribution?
  6. What value of z cuts off the lower 8% of the standard normal distribution?
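
The SPSS results for these six items can be cross-checked outside SPSS. The following is a sketch using Python's standard-library `statistics.NormalDist`, not a substitute for the required SPSS output file.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

p1 = z.cdf(-2.05)                 # P(z < -2.05)
p2 = 1 - z.cdf(1.82)              # P(z > 1.82)
p3 = 1 - z.cdf(-1.82)             # P(z > -1.82)
p4 = z.cdf(1.92) - z.cdf(-2.28)   # P(-2.28 < z < 1.92)
z5 = z.inv_cdf(1 - 0.30)          # cutoff with 30% of the area above it
z6 = z.inv_cdf(0.08)              # cutoff with 8% of the area below it

for label, value in [("1", p1), ("2", p2), ("3", p3), ("4", p4), ("5", z5), ("6", z6)]:
    print(f"item {label}: {value:.4f}")
```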

 

 

Discuss where it came from. What was the context? Summarize the meaning that was communicated. Provide a critique of the data by discussing the population, sample, and any descriptive or inferential information related to the data.

DISCUSSION ESSAY

Data and statistics are shared with us every day. In this course, you have explored data, visualizing data, central tendency, descriptive statistics, and the basics of inferential statistics. You reflected on your experience and thoughts about data. Now that you have learned some additional statistics concepts, let’s reflect again.

Respond to the following in a minimum of 175 words:

  • Identify 4 different types of data you have encountered today or this week. Maybe it’s data you read, heard, or saw on television. For each identified data type, do the following:
  • Discuss where it came from. What was the context?
  • Summarize the meaning that was communicated.
  • Provide a critique of the data by discussing the population, sample, and any descriptive or inferential information related to the data. Use examples from concepts in this class to help inform your discussion. What information was present? What was missing? How does that change your perception of the data?

 

What factors should the assistant chief consider in determining the presence of gender bias in firefighter promotion? Is the promotional status of recently promoted firefighters independent of their gender?

DISCUSSION QUESTIONS

You volunteer some of your spare time to your local fire department and have been asked by an assistant chief to analyze data on firefighters who applied for promotion. The assistant chief wants to ensure that gender bias is not a concern in the promotion of firefighters. Shown below is data for 50 firefighters who applied for promotion and the results of a chi-square analysis of the data.

               Male   Female
Promoted         13       22
Not Promoted     10        5

Chi-Square Statistic   3.6845
P value                0.054919

  1. What factors should the assistant chief consider in determining the presence of gender bias in firefighter promotion?
  2. Is the promotional status of recently promoted firefighters independent of their gender?
  3. What reasons should the assistant chief convey to the fire chief to justify the absence of gender bias in the most recent class of firefighters who were promoted?
  4. How might the presence of gender bias in promotions impact the fire department?
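
The reported chi-square statistic and p value can be reproduced from the 2x2 table. The sketch below assumes a Pearson chi-square test of independence without a continuity correction, which is what matches the reported 3.6845; for one degree of freedom the upper-tail probability can be written with the complementary error function.

```python
from math import erfc, sqrt

observed = [[13, 22],   # Promoted:     male, female
            [10, 5]]    # Not promoted: male, female

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Pearson statistic: sum over cells of (observed - expected)^2 / expected.
chi_square = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        expected = row_totals[i] * col_totals[j] / total
        chi_square += (count - expected) ** 2 / expected

# A 2x2 table has 1 degree of freedom; P(chi2_1 > x) = erfc(sqrt(x / 2)).
p_value = erfc(sqrt(chi_square / 2))

print(f"chi-square = {chi_square:.4f}, p = {p_value:.6f}")
```

At the conventional 5% level this p value falls just above the threshold, so the conclusion depends on the significance level chosen.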

 

What are the odds of responding “yes” for a female subject who is 60 years old? What is the estimated odds ratio corresponding to the age variable? Interpret this estimate.

Calculate Odds within Logistic Regression Model

A survey was conducted and asked: “Do you consider your health to be poor?” This was coded as 1 for “yes” and 0 for “no.” The effect of age (in years) and sex (1 = male, 0 = female) on the outcome was examined using logistic regression. The fitted logistic model is:

What are the odds of responding “yes” for a female subject who is 60 years old?

What is the estimated odds ratio corresponding to the age variable? Interpret this estimate.

Suppose that, based on the data from the survey, the odds of responding “yes” for males are 3.79% greater than the odds for females, adjusting for age. What is the value of ?
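
The fitted equation is not reproduced in the text above, so the sketch below leaves the coefficients as parameters rather than guessing them. The function names `odds_yes` and `odds_ratio` are hypothetical. It shows the general relationships in a logit model: odds are the exponential of the linear predictor, and an odds ratio is the exponential of a coefficient. The last step assumes the missing symbol in the final question is the coefficient on sex.

```python
from math import exp, log

def odds_yes(intercept, b_age, b_sex, age, sex):
    """Odds of answering "yes" under a logit model: exp(linear predictor)."""
    return exp(intercept + b_age * age + b_sex * sex)

def odds_ratio(coefficient):
    """Odds ratio for a one-unit increase in the corresponding predictor."""
    return exp(coefficient)

# Odds for males 3.79% greater than for females means an odds ratio of 1.0379,
# so the implied sex coefficient (assumed to be the missing symbol) is its log.
b_sex = log(1.0379)

print(f"implied coefficient on sex = {b_sex:.4f}")
```

For the first question, `odds_yes` would be called with age = 60 and sex = 0 once the fitted intercept and age coefficient are filled in from the model.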

 

A sampling distribution of a statistic is the distribution of values taken by the statistic in all possible samples of different sizes from the same population. True or false

True or false question

If the population distribution is normal, then the sampling distribution of the sample mean (with sample size n) is normal. True or false.

A sampling distribution of a statistic is the distribution of values taken by the statistic in all possible samples of different sizes from the same population. True or false

For the same population, a sampling distribution with a greater sample size has a greater standard deviation. True or false

The sampling distribution for a population depends on a fixed sample size. True or false

For sampling distributions of sample means for a normally distributed population, a greater sample size leads to less concentration of the sample means around the population mean. True or false
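
Several of these statements turn on how the spread of the sample mean depends on the sample size. A small sketch of the standard error formula σ/√n, using a hypothetical population standard deviation of 10:

```python
from math import sqrt

def standard_error(sigma, n):
    """Standard deviation of the sample mean for samples of size n."""
    return sigma / sqrt(n)

sigma = 10.0  # hypothetical population standard deviation, for illustration only

for n in (4, 25, 100):
    print(f"n = {n:3d}: standard error = {standard_error(sigma, n):.2f}")
```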

 

How many factors and dependent variables does this analysis have? Which are they? Should the threshold for the test of each dependent variable be 5%? Explain your answer.

Effect of a drug on a disease

In order to evaluate the effect of a drug on a disease, a pharmaceutical company wants to perform a study comparing the effect of the drug with that of a placebo. The drug and the placebo will be given to both male and female patients to check for any difference in response between the two sexes.

The effect of the drug will be measured in tenths of reduction in any of the 5 possible symptoms of the disease.

How many factors and dependent variables does this analysis have? Which are they? Should the threshold for the test of each dependent variable be 5%? Explain your answer.
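
The threshold question concerns multiple testing across the 5 symptom measures. The sketch below shows one common adjustment, the Bonferroni correction; it is offered as an illustration, not necessarily the answer the assignment expects, and the familywise error calculation assumes the five tests are independent.

```python
alpha = 0.05     # per-test threshold mentioned in the question
n_tests = 5      # one test per symptom (dependent variable)

# Chance of at least one false positive if each test uses alpha
# (assumes independent tests).
familywise_error = 1 - (1 - alpha) ** n_tests

# Bonferroni: divide alpha by the number of tests.
bonferroni_alpha = alpha / n_tests

print(f"familywise error at 5% each: {familywise_error:.4f}")
print(f"Bonferroni per-test threshold: {bonferroni_alpha:.3f}")
```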

 

Using JASP and the sample data, determine the internal consistency of the Presentations make me pass out subscale.

QUESTION

  1. The attached data set provides responses of 2571 students on a recently developed Anxiety Questionnaire intended to be used to study the anxiety level of doctoral students. This analysis will use Cronbach’s alpha analysis as presented in class to compute the reliability (in terms of internal consistency) of a specific subscale. The subscales with their specific items of interest are:
    1. Statistics makes me cry: Q01, Q06, Q11, Q16, Q21
    2. Writing wrecks my nerves: Q02, Q07, Q12, Q17, Q22
    3. Presentations make me pass out: Q03, Q08, Q13, Q18, Q23
    4. Reading makes me want to run away: Q04, Q09, Q14, Q19
    5. APA format gives me hives: Q05, Q10, Q15, Q20
  2. All items are 5-point Likert scale (1 = strongly disagree, 5 = strongly agree). Using JASP and the sample data, determine the internal consistency of the Presentations make me pass out subscale. In your write-up, include at a minimum the following:
    1. The results in APA format.
    2. What the results indicate.
    3. What you would do to improve the internal consistency.
    4. What your thoughts are on the process.
    5. Have at least one citation (peer-reviewed and/or textbook) to support your discussion.
  3. Read the following learning materials: Which tests should I use? Statistical Analysis for JASP: A Guide for Students (pp. 176–179)

    1. Creswell & Creswell text:
      • Chapter 7: Research Questions and Hypotheses
      • Chapter 8: Quantitative Methods
  4. Supplemental Reading:
    1. Barbera, J., Naibert, N., Komperda, R., & Pentecost, T. C. (2021). Clarity on Cronbach’s alpha use. Journal of Chemical Education, 98(2), 257-258.
    2. De Vito, R., Bellio, R., Trippa, L., & Parmigiani, G. (2019). Multi-study factor analysis. Biometrics, 75(1), 337-346.
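
For readers who want to see what JASP computes, Cronbach's alpha is k/(k − 1) × (1 − Σ item variances / variance of the total score), where k is the number of items. The sketch below applies that formula in plain Python. The six response rows are invented for illustration and are not the 2571-student data set; `cronbach_alpha` is a hypothetical helper name.

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha for a list of item-score lists aligned by respondent."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]   # total score per respondent
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))

# Hypothetical responses from six students on the five items of the
# "Presentations make me pass out" subscale (1-5 Likert scale).
responses = {
    "Q03": [4, 2, 5, 3, 1, 4],
    "Q08": [5, 2, 4, 3, 2, 4],
    "Q13": [4, 1, 5, 2, 1, 3],
    "Q18": [3, 2, 4, 3, 1, 5],
    "Q23": [4, 3, 5, 2, 2, 4],
}

alpha = cronbach_alpha(list(responses.values()))
print(f"Cronbach's alpha (toy data) = {alpha:.3f}")
```

Any reverse-scored items would need to be recoded before running this calculation, in JASP or elsewhere.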

Read the supplementary reading entitled “Judicial Sentencing Decisions in Taiwanese Economic Crimes.” Use your own words to describe the OLS Model.

Judicial Sentencing Decisions in Taiwanese Economic Crimes

Read the supplementary reading entitled “Judicial Sentencing Decisions in Taiwanese Economic Crimes.” Use your own words (paraphrasing) to describe the OLS Model. The length of this assignment is about 250-300 words (one typed, double-spaced page). In your answer, you must specifically address the meaning of and findings about (1) R Square (Coefficient of Determination), (2) the F-test, and (3) the p-value of the t-test in the OLS Model.