ASSIGNMENT

  1. Compute the mean, median, mode, range, sample standard deviation, and sample estimate of the population standard deviation for the data in Table 1 (below). Use your calculator and scrap paper for computations as necessary. Write your answers in the spaces provided.

Module 15 Homework 1 – Distribution of Sample Proportions

Progress Check

Use this activity to assess whether you and your peers can:

  • Use a simulation to obtain a sampling distribution along with its mean, standard deviation, and error.
  • Interpret the mean and error for a sampling distribution.

Prompt

The National Health Survey uses household interviews to describe the health-related habits of U.S. adults. From these interviews they estimate population parameters associated with behaviors such as alcohol consumption, cigarette smoking, and hours of sleep for all U.S. adults.

In the 2005-2007 report, they estimated that 30% of all current smokers started smoking before the age of 16. Imagine that we want to verify this estimate, so we randomly select a sample of 100 smokers and calculate the proportion who started smoking before the age of 16. How much error do we expect in the sample proportions if the 30% is correct for the population overall? Use the applet and give an error based on 2 standard deviations.

Below is the applet we used in the simulations on the previous page. Here are the directions.

  • Use the first dropdown list to select the appropriate population proportion p.
  • Use the second dropdown list to select the appropriate sample size n.
  • Check the Show Standard Deviation Bars box.
  • Click the Run Simulation (5000 Samples) button.
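For readers without access to the applet, the same experiment can be approximated in a few lines of Python. This is only a sketch under stated assumptions: the function name `simulate_sample_proportions` and the fixed random seed are illustrative choices, not part of the course materials, and the applet's own internals are not known here. It draws 5000 samples of size n = 100 from a population with p = 0.30 and reports the mean and standard deviation of the sample proportions, plus an error of 2 standard deviations.

```python
import random
import statistics

def simulate_sample_proportions(p, n, num_samples=5000, seed=0):
    """Draw num_samples random samples of size n; return each sample's proportion."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    proportions = []
    for _ in range(num_samples):
        # Each individual is a "success" (started before 16) with probability p.
        successes = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(successes / n)
    return proportions

props = simulate_sample_proportions(p=0.30, n=100)

sim_mean = statistics.mean(props)        # centre of the sampling distribution
sim_sd = statistics.pstdev(props)        # spread of the simulated proportions
theory_sd = (0.30 * 0.70 / 100) ** 0.5   # sqrt(p(1 - p) / n), for comparison
margin = 2 * sim_sd                      # error based on 2 standard deviations

print(f"mean of sample proportions: {sim_mean:.3f}")
print(f"simulated SD: {sim_sd:.4f}   theoretical SD: {theory_sd:.4f}")
print(f"error (2 SD): {margin:.3f}")
```

The simulated mean should sit close to the population proportion, and the simulated standard deviation close to the theoretical value, which is the pattern the prompt asks you to interpret.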

Module 15 Discussion Board

Use the Module 15 discussion board to ask questions or provide feedback about the problems in any Module 15 activity, including this peer-reviewed assignment.

Review Feedback

  • Peer feedback should be available before final drafts are due.
  • Instructor feedback is only available after an assignment is graded.
  • Use these directions to learn how to review feedback.

Content by the Open Learning Initiative and licensed under CC BY.

 

Stata about the Econometrics

Part 1: Analysis with Galton’s original data set

Galton’s work on children and parents’ height was published in: Galton, F. (1886): “Regression towards mediocrity in hereditary stature”, Journal of the Anthropological Institute, 15: 246-63. In this first part of the project you are asked to reconstruct the original data from this original article and replicate his analysis.

  • Question 1. Find Galton’s original article (on jstor.org or LEARN). Table I of his article summarizes the data used. You need to create a STATA data set that contains the 928 observations that Galton collected. It is recommended that you first type the data into an Excel file and then have STATA read that file. Some versions of the Galton data set are available online. You are advised NOT to use them. It is part of this project that you show that you understand how to make a data set from such a table. There are important conceptual issues that you will miss if you borrow the data from somewhere else.

For those observations reported in Table I of Galton’s article as “below” or “above” the minimum and maximum height values, you need to assume some particular values. Please state these explicitly in a table and justify each in one sentence. Define “tall parents” and “short parents” according to your data. Then divide your sample into these two groups and report relevant statistics for the adult children and for the parents in each group. Report this information in a table and comment on it.
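The conceptual step in Question 1 is turning a table of cell counts into one row per observation. The sketch below illustrates that idea only. The counts in `toy_counts` are made-up placeholders, NOT values from Galton's Table I, and the helper name `expand_counts` is hypothetical.

```python
# Toy frequency table, NOT Galton's data:
# (parent_height, child_height) -> number of children in that cell.
toy_counts = {
    (68.5, 67.2): 3,
    (68.5, 68.2): 2,
    (69.5, 68.2): 1,
}

def expand_counts(counts):
    """Turn a {(parent, child): count} table into one row per observation."""
    rows = []
    for (parent, child), count in counts.items():
        # Repeat the cell's pair of heights once for every child counted in it.
        rows.extend([(parent, child)] * count)
    return rows

rows = expand_counts(toy_counts)
print(len(rows), "observations")
```

With the real table, the number of rows should equal the 928 observations mentioned above, which is a useful check before the file is read into STATA.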

  • Question 2. Galton was the first to describe and explain the phenomenon of “regression towards the mean”. Being concerned about the height of the English aristocracy, he interpreted results as “regression to mediocrity” (hence the name “regression”). Regress the height of adult children against the height of parents. Report your results in a table and interpret the estimated coe

 

Confidence interval

To study the correlation between height and shoe size, you need to collect a sample of nine (9) people using a systematic sampling method.

  1. What is the population of people? Where and how are you going to collect your sample? Does your sample accurately represent your population? Why or why not?
  2. Collect the sample and record the data. Use a single unit for height. Do not use a mixed unit like feet and inches.

     |           | Person 1 | Person 2 | Person 3 | Person 4 | Person 5 | Person 6 | Person 7 | Person 8 | Person 9 |
     | Height    |          |          |          |          |          |          |          |          |          |
     | Shoe Size |          |          |          |          |          |          |          |          |          |

(CLO 1) Construct a confidence interval to estimate the mean height and the mean shoe size. Complete the following questions after first choosing a confidence level. You may choose from the familiar 90%, 95%, or 99% level of confidence. Denote this by recording α = ____.

  • Find the sample mean and sample standard deviation of the heights. Denote them as x̄ and s_x, respectively.
  • Find the sample mean and sample standard deviation of the shoe sizes. Denote them as ȳ and s_y, respectively.
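Once the sample mean and standard deviation are in hand, a common form for the interval, assuming the population standard deviation is unknown and the measurements are roughly normal, is the one-sample t interval. This is offered as a sketch of the usual textbook formula, not as the formula required by the instructor. With nine people, the degrees of freedom are n − 1 = 8.

```latex
\bar{x} \;\pm\; t_{\alpha/2,\,n-1}\,\frac{s_x}{\sqrt{n}},
\qquad n = 9,\quad \mathrm{df} = n - 1 = 8
```

For 8 degrees of freedom, the two-sided critical values from a standard t table are about 1.860 (90%), 2.306 (95%), and 3.355 (99%). The same form applies to shoe size with ȳ and s_y in place of x̄ and s_x.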

  1. Construct and interpret a confidence interval to estimate the mean height of the population. You must first write the formula for the confidence interval and then substitute your appropriate numbers.
  2. Construct and interpret a confidence interval to estimate the mean shoe size of the population. You must first write the formula for the confidence interval and then substitute your appropriate numbers.

(CLO 2) Test a claim that the mean height of your population is different from 64 inches. Use the appropriate significance level α you fixed earlier.

  1. State the null and alternative hypotheses.
  2. Find the test statistic and the P-value. You must first write the formula for the test statistic and then substitute your appropriate numbers.
  3. Draw a conclusion in the context of the situation. Your conclusion should include both the formal language and an informal explanation.

(CLO 3) Find a correlation between height and shoe size.

  1. Create a scatterplot of the data. Height is on the x-axis and shoe size is on the y-axis. Attach your scatterplot to the end of this document.
  2. Find the linear correlation coefficient. What does this tell you about your data?
  3. Write the equation of the regression line and use it to predict the shoe size of a person who is 68 inches tall.

Write a paragraph or two about what you have learned from this process. When you read, see, or hear a statistic in the future, what skills will you apply to know whether you can trust the result?
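The test statistic, correlation coefficient, and regression line asked for above can be checked with a short script. The sketch below uses made-up heights and shoe sizes purely as placeholders (they are NOT collected data), and the variable names are illustrative. It computes the one-sample t statistic against 64 inches, Pearson's r, and the least-squares line, then predicts the shoe size at 68 inches. The P-value is left to a t table or calculator with n − 1 = 8 degrees of freedom.

```python
import math
import statistics

# Hypothetical sample of nine people (placeholders, not real measurements).
heights = [62, 64, 65, 66, 67, 68, 70, 71, 72]       # inches
shoes = [6, 7.5, 7, 8, 9, 8.5, 10, 10.5, 11.5]       # shoe sizes

n = len(heights)
x_bar, y_bar = statistics.mean(heights), statistics.mean(shoes)
s_x, s_y = statistics.stdev(heights), statistics.stdev(shoes)  # sample SDs

# One-sample t statistic for H0: mu = 64 versus H1: mu != 64.
t_stat = (x_bar - 64) / (s_x / math.sqrt(n))

# Sample covariance and Pearson correlation coefficient.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(heights, shoes)) / (n - 1)
r = s_xy / (s_x * s_y)

# Least-squares regression line: shoe = intercept + slope * height.
slope = r * s_y / s_x
intercept = y_bar - slope * x_bar
predicted = intercept + slope * 68

print(f"t = {t_stat:.3f}, r = {r:.3f}")
print(f"shoe = {intercept:.3f} + {slope:.3f} * height")
print(f"predicted shoe size at 68 in: {predicted:.2f}")
```

Replacing the two lists with your own nine measurements is enough to reuse the script.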

 

National market

You have been hired by your regional real estate company to determine if your region’s housing prices and housing square footage are significantly different from those of the national market. The regional sales director has three questions that they want to see addressed in the report:

  1. Are housing prices in your regional market lower than the national market average?
  2. Is the square footage for homes in your region different than the average square footage for homes in the national market?
  3. For your region, what is the range of values for the 95% confidence interval of square footage for homes in your market?

You are given a real estate data set that has houses listed for every county in the United States. In addition, you have been given national statistics and graphs that show the national averages for housing prices and square footage. Your job is to analyze the data, complete the statistical analyses, and provide a report to the regional sales director. You will do so by completing the Project Two Template located in the What to Submit area below.

Directions

Introduction

  1. Region: Start by picking one region from the following list of regions:
    West South Central, West North Central, East South Central, East North Central, Mid Atlantic
  2. Purpose: What was the purpose of your analysis, and what is your approach?
    1. Define a random sample and two hypotheses (means) to analyze.
  3. Sample: Define your sample. Take a random sample of 500 observations for your region.
    1. Describe what is included in your sample (i.e., states, region, years or months).
  4. Questions and type of test: For your selected sample, define two hypothesis questions and the appropriate type of hypothesis test for each. Address the following for each hypothesis:
    1. Describe the population parameter for the variable you are analyzing.
    2. Describe your hypothesis in your own words.
    3. Describe the inference test you will use.
      1. Identify the test statistic.
  5. Level of confidence: Discuss how you will use estimation and confidence intervals to help you solve the problem.
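The two hypothesis tests and the 95% confidence interval described in these directions can be sketched as follows. Everything numeric here is a placeholder: the simulated `sqft_sample`, the value `NATIONAL_SQFT`, and the helper names `z_test` and `mean_ci` are assumptions for illustration, not figures from the project data set. With 500 observations the sketch uses the normal approximation; a t-based test, which the template may expect, gives nearly identical results at this sample size.

```python
import math
import random
import statistics
from statistics import NormalDist

def z_test(sample, mu0, tail):
    """Large-sample test of H0: mu = mu0. tail is 'left', 'right', or 'two'.
    Returns (z, p_value)."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)   # standard error of the mean
    z = (statistics.mean(sample) - mu0) / se
    cdf = NormalDist().cdf(z)
    if tail == "left":        # H1: mu < mu0 (e.g. regional prices are lower)
        p = cdf
    elif tail == "right":     # H1: mu > mu0
        p = 1 - cdf
    else:                     # H1: mu != mu0 (e.g. square footage differs)
        p = 2 * min(cdf, 1 - cdf)
    return z, p

def mean_ci(sample, level=0.95):
    """Normal-approximation confidence interval for the population mean."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    centre = statistics.mean(sample)
    return centre - z_crit * se, centre + z_crit * se

# Placeholder data standing in for a random sample of 500 regional listings.
rng = random.Random(1)
sqft_sample = [rng.gauss(1900, 400) for _ in range(500)]
NATIONAL_SQFT = 1950   # hypothetical national mean, NOT from the project files

z, p = z_test(sqft_sample, NATIONAL_SQFT, tail="two")
low, high = mean_ci(sqft_sample)
print(f"z = {z:.2f}, p = {p:.4f}, 95% CI = ({low:.0f}, {high:.0f})")
```

The price question would use `tail="left"` with the national mean price as `mu0`.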

 

Classical probability theory

Question 1) Solve the following questions using classical probability theory. Show your math or describe your answer clearly.

  1. Given P(A) = 0.6, P(B) = 0.5, and P(A ∩ B) = 0.2, what is P(A|B)?
  2. If P(A) = 0.3 and P(B|A) = 0.3, are events A and B independent? If not, why?
  3. In May in Belgium, the high temperature exceeded 70°F on 20 days. It rained on 8 of the days when the high temperature was over 70°F. If we choose any day in May, the event that it rained on that day is independent of the event that the high temperature was over 70°F. Therefore, how many days did it rain in May?
  4. Hex Yahtzee is a game in which each person rolls six dice at once. The dice are ordinary 6-sided cubes, with a different number on each face. If exactly five of the six dice come up the same, the roll is called five-of-a-kind, and scores very well. What is the probability of rolling five-of-a-kind on your first roll in Hex Yahtzee?
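The arithmetic behind items 1, 3, and 4 can be checked with exact fractions. This is a sketch of one way to set up the calculations, not an official answer key. It assumes May has 31 days and reads "exactly five the same" as five matching dice plus one different die. The brute-force count over all 6^6 rolls is included only as a sanity check on the counting argument.

```python
from fractions import Fraction
from itertools import product
from math import comb

# Item 1: P(A|B) = P(A and B) / P(B).
p_a_given_b = Fraction(2, 10) / Fraction(5, 10)
print(p_a_given_b)            # 2/5

# Item 3: independence means P(rain) = P(rain | hot) = 8/20.
p_rain = Fraction(8, 20)
rainy_days = p_rain * 31      # expected count over the 31 days of May
print(float(rainy_days))      # 12.4, i.e. about 12 days

# Item 4: choose the repeated face (6 ways), the position of the odd die
# (C(6,5) = 6 ways to place the five matching dice), and the odd die's face (5).
favourable = 6 * comb(6, 5) * 5
p_five_kind = Fraction(favourable, 6 ** 6)
print(p_five_kind)            # 5/1296

# Sanity check: enumerate every roll and count those whose most common
# face appears exactly five times.
brute = sum(
    1
    for roll in product(range(1, 7), repeat=6)
    if max(roll.count(face) for face in set(roll)) == 5
)
print(brute == favourable)    # True
```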

Question 2) You work at a company and are searching for the Higgs boson. You have one trillion particle traces from the company’s Large Hadron Collider. You believe that ten thousand of the particles are Higgs bosons. Each particle trace is fed into the expert software. If a particle is a Higgs boson, the software will identify it correctly with a 0.9 probability. If it is not a Higgs boson, the software will reject it with a 0.99 probability.

If a particle is identified by HiggsView as a Higgs boson, what is the probability that it actually is a Higgs boson?
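This question is an application of Bayes' theorem, and counting expected true and false positives makes the structure visible. The sketch below assumes the belief about ten thousand Higgs bosons among one trillion traces is treated as the prior, and the variable names are illustrative.

```python
from fractions import Fraction

total_traces = 10 ** 12
higgs = 10 ** 4
sensitivity = Fraction(9, 10)      # P(flagged | Higgs)
specificity = Fraction(99, 100)    # P(rejected | not Higgs)

# Expected counts of flagged traces in each group.
true_pos = higgs * sensitivity
false_pos = (total_traces - higgs) * (1 - specificity)

# Bayes' theorem: P(Higgs | flagged) = true positives / all positives.
posterior = true_pos / (true_pos + false_pos)

print(true_pos, false_pos)
print(float(posterior))
```

Because genuine Higgs bosons are so rare in the sample, false positives outnumber true positives by roughly a million to one, so the posterior probability stays below one in a million even with an accurate classifier.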

 

 

Voting Exercise

A pollster believes that 25% of the voters in a certain area favor a bond issue.

  • a) Suppose that n = 200 voters are randomly sampled from the large number of voters in this area. What is the probability that at most 60 voters favor the bond issue?
  • b) How many voters should be sampled such that the sampled fraction of voters favoring the bond issue will be between 0.15 and 0.35 with probability at least 0.96?

Find the sample size n by using (i) the Chebyshev inequality and (ii) the normal approximation.
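Both parts can be checked numerically. The sketch below assumes the pollster's 25% is the true proportion, so the interval from 0.15 to 0.35 is p ± 0.10. Part (a) compares the exact binomial probability with a normal approximation using a continuity correction. Part (b) solves the Chebyshev bound p(1 − p)/(nε²) ≤ 0.04 and the normal condition z·sqrt(p(1 − p)/n) ≤ ε for n. Variable names are illustrative.

```python
import math
from fractions import Fraction
from statistics import NormalDist

p, n = 0.25, 200

# (a) Exact binomial P(X <= 60) and its normal approximation.
exact = sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(61))
mu, sigma = n * p, math.sqrt(n * p * (1 - p))
approx = NormalDist(mu, sigma).cdf(60.5)     # 60.5 = continuity correction
print(f"exact = {exact:.4f}, normal approx = {approx:.4f}")

# (b)(i) Chebyshev: P(|p_hat - p| >= eps) <= p(1-p) / (n * eps^2) <= 0.04.
var_unit = Fraction(1, 4) * Fraction(3, 4)   # p(1 - p) as an exact fraction
eps = Fraction(1, 10)
alpha = Fraction(4, 100)
n_chebyshev = math.ceil(var_unit / (eps ** 2 * alpha))

# (b)(ii) Normal approximation: need z * sqrt(p(1-p)/n) <= eps,
# where z is the 0.98 quantile (0.02 in each tail).
z = NormalDist().inv_cdf(1 - 0.04 / 2)
n_normal = math.ceil(z ** 2 * float(var_unit) / float(eps) ** 2)

print(n_chebyshev, n_normal)   # 469 80
```

The large gap between the two sample sizes reflects how conservative the Chebyshev inequality is, since it makes no use of the shape of the distribution.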

 

Final Paper

Prompt: Identify a research topic that you are interested in. Formulate your research question. Then complete the following:

  • Choose the research variables needed to answer your research questions.
  • Formulate hypotheses for your study.
  • Collect data from secondary sources (e.g., historical stock price, weather, census).
  • Test the hypotheses using one or more of the techniques you have learned in this class.
  • Describe your research findings and provide your conclusion.

Requirements: Prepare the Final Paper using the following guidelines:

  • 3,500 – 4,000 words
  • APA-compliant formatting, including title and reference pages
  • Minimum of three scholarly references
  • Remember to provide a link to your data source.

Banking transaction

Problem 3. In a certain country it is required that the force of interest for any banking transaction follow the function

4 (St t > 0 50 t’ where t is the time in years since the transaction began.

(1) (5 points) You have deposited $500 in the bank and make no other deposits or withdrawals. How much is this deposit worth at the end of 20 years?

(2) (5 points) You have loaned $1,000 for 35 years. You are to repay the loan by making a payment of $400 at time t = 12 and a final payment of X at time t = 35. Compute X.
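Whatever the specific force of interest is, both parts follow the same general pattern. The relations below are the standard accumulation-function identities, written with δ_s for the force of interest given in the problem and a(t) for the accumulated value of 1 deposited at time 0. The symbol a(t) is notation introduced here for illustration. The second line is an equation of value at time 35.

```latex
a(t) = \exp\!\left(\int_0^t \delta_s \, ds\right),
\qquad \text{value of the deposit at } t = 20:\quad 500\,a(20)

1000\,a(35) = 400\,\frac{a(35)}{a(12)} + X
\quad\Longrightarrow\quad
X = 1000\,a(35) - 400\,\frac{a(35)}{a(12)}
```

The factor a(35)/a(12) carries the $400 payment forward from time 12 to time 35.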

 

Compare the coefficients of the public variable in Model A and Model B. Explain carefully why the results are different, relating your discussion to sector wage discrimination.

Before estimating the regression equation, conduct an overall preliminary analysis of the relationship between workers’ wages and

  • sector,
  • gender,
  • educational attainment,
  • age and
  • marital status.

Use tables and/or appropriate graphs for the categorical variables (male, public, degree, married) and the numerical variable (age).

Interpret your findings by comparing and contrasting the earnings of the counterparts based on each of these dummy variables, and explain the kind of relationship you observe between workers’ earnings and age. (5 marks)

  • Use a simple linear regression to estimate the relationship between workers’ earnings and the variable public (Model A). You may use the Data Analysis Tool Pack. Based on the Excel regression output:
    • Write down the estimated regression equation,
    • Interpret the slope coefficient,
    • Carry out any relevant two-tailed hypothesis test of the slope coefficient using the critical value approach, at the 5% significance level, showing the step-by-step workings/diagram in your report.
    • Interpret your hypothesis test results.

(4 marks)
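A point worth seeing before reading the Excel output for Model A: when the only regressor is a 0/1 dummy, the slope equals the difference between the two group means and the intercept equals the mean of the group coded 0. The sketch below shows this with made-up wages. The numbers are placeholders, NOT the assignment data, and `simple_ols` is a hypothetical helper.

```python
import math
import statistics

# Hypothetical hourly wages; public = 1 for public-sector workers.
public = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
wage = [38.0, 42.5, 35.0, 40.5, 30.0, 36.5, 28.0, 33.0, 41.0, 29.5]

def simple_ols(x, y):
    """Return (intercept, slope, se_slope) for the model y = b0 + b1*x + e."""
    n = len(x)
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # Residual variance uses n - 2 degrees of freedom.
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(sse / (n - 2) / sxx)
    return b0, b1, se_b1

b0, b1, se_b1 = simple_ols(public, wage)
t_stat = b1 / se_b1   # compare |t| with the two-tailed critical value, df = n - 2

mean_public = statistics.mean(w for w, d in zip(wage, public) if d == 1)
mean_private = statistics.mean(w for w, d in zip(wage, public) if d == 0)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, t = {t_stat:.2f}")
print(f"difference in group means = {mean_public - mean_private:.2f}")
```

For the critical value approach, the 5% two-tailed critical value with n − 2 degrees of freedom can be read from a t table or obtained in Excel with `T.INV.2T(0.05, n-2)`.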

 

  • Use a multiple regression model to explore the relationship of workers’ earnings with variables related to sector, gender, educational attainment, age and marital status (Model B). You may use Data Analysis Tool Pack for this. Based on the Excel regression output:
    • Write down the estimated regression equation,
    • Interpret the slope coefficients,
    • Carry out any relevant two-tailed hypothesis tests for each individual slope coefficient using the p-value approach, at the 5% significance level.
    • Carry out an overall significance test using the p-value approach.
    • Carefully interpret your hypothesis test results.
    • Are your regression findings with regard to the public-private wage gap broadly consistent with those reported in the study of Mahuteau et al. (2017)?

(8 marks)

 

  • Interpret the R-squared in Model A and adjusted R-squared in Model B. Which one is a better model? Explain why, relating your answer to the interpretations.

(2 marks)

 

  • Compare the coefficients of the public variable in Model A and Model B. Explain carefully why the results are different, relating your discussion to sector wage discrimination.

(4 marks)
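One way to frame this comparison is the omitted-variable decomposition from standard regression algebra. The notation below is introduced here for illustration and is not taken from the assignment: the α's are Model A coefficients, the β's are Model B coefficients, and each δ̂_j is the slope from a simple regression of control variable j (male, degree, age, married) on public.

```latex
\text{Model A: } \widehat{wage} = \hat{\alpha}_0 + \hat{\alpha}_1\, public
\qquad
\text{Model B: } \widehat{wage} = \hat{\beta}_0 + \hat{\beta}_1\, public
  + \sum_{j=2}^{5} \hat{\beta}_j\, x_j

\hat{\alpha}_1 \;=\; \hat{\beta}_1 \;+\; \sum_{j=2}^{5} \hat{\beta}_j\, \hat{\delta}_j
```

The Model A coefficient therefore mixes the sector effect with any differences in gender mix, qualifications, age, and marital status between public and private workers. The two coefficients coincide only when each control is either uncorrelated with public or has no effect on wages.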

 

  • Predict the earnings of a 40-year-old, university-qualified, married male public-sector worker. Next, predict the earnings of a female worker with the same characteristics.

(2 marks)

 

  • Another conclusion from Mahuteau et al. (2017) is that the wage premium (comparatively higher wages) for the workers in the public sector is slightly higher for females than males. Conduct appropriate regression analyses to examine whether your findings based on 2019 data are broadly consistent with those reported in the study.

(4 marks)
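One possible specification for this comparison, offered as a sketch rather than the required approach, adds an interaction between sector and gender. Here female = 1 − male is a variable constructed for illustration, and the remaining controls are those of Model B.

```latex
wage = \beta_0 + \beta_1\, public + \beta_2\, female
     + \beta_3\,(public \times female)
     + \beta_4\, degree + \beta_5\, age + \beta_6\, married + u

\text{public premium for males} = \beta_1,
\qquad
\text{public premium for females} = \beta_1 + \beta_3
```

A positive and statistically significant estimate of β₃ would be consistent with a larger public-sector premium for females. An alternative is to estimate Model B separately for males and females and compare the two public coefficients.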

 

  • If you could request additional data to study the factors that influence workers’ earnings, what extra variables would you request? Discuss two such variables, explaining why you chose them and how each of your proposed variables could be measured in the regression model. [You could draw evidence from journal articles, newspapers, etc.]

(3 marks)