Comments on the accountant’s recommendation to eliminate product K as an alternative: Does the recommendation appear reasonable? What is your reaction to the recommendation? How would the optimal solution change if product K were eliminated?

Case study questions

Skillings Industrial Chemicals, Inc., operates a refinery in southwestern Ohio near the Ohio River. The company’s primary product is manufactured from a chemical process that requires the use of two raw materials: material A and material B. The production of 1 pound of the primary product requires the use of 1 pound of material A and 2 pounds of material B. The output of the chemical process is 1 pound of the primary product, 1 pound of liquid waste material, and 1 pound of solid waste by-product. The solid waste by-product is given to a local fertilizer plant as payment for picking it up and disposing of it. The liquid waste material has no market value, so the refinery has been dumping it directly into the Ohio River. The company’s manufacturing process is shown schematically in Figure 1.

Figure 1: Manufacturing Process

Government pollution guidelines established by the Environmental Protection Agency will no longer permit disposal of the liquid waste directly into the river. The refinery’s research group has developed the following set of alternative uses for the liquid waste material:

  1. Produce a secondary product K by adding 1 pound of raw material A to every pound of liquid waste.
  2. Produce a secondary product M by adding 1 pound of raw material B to every pound of liquid waste.
  3. Specially treat the liquid waste so that it meets pollution standards before dumping it into the river.

These three alternatives are depicted in Figure 2.

Figure 2: Manufacturing Process

The company’s management knows that the secondary products will be low in quality and may not be very profitable. However, management also recognizes that the special treatment alternative will be a relatively expensive operation. The company’s problem is to determine how to satisfy the pollution regulations and still maintain the highest possible profit. How should the liquid waste material be handled? Should Skillings produce product K, produce product M, use the special treatment, or employ some combination of the three alternatives?

Last month 10,000 pounds of the company’s primary product were produced. The accounting department has prepared a cost report showing the breakdown of fixed and variable expenses that were incurred during the month.

 

Cost Analysis for 10,000 Pounds of Primary Product

Fixed costs
  Administrative expenses      $12,000
  Refinery overhead              4,000
Variable costs
  Raw material A                15,000
  Raw material B                16,000
  Direct labor                   5,000
Total                          $52,000

 

In this cost analysis, the fixed-cost portion of the expenses is the same every month regardless of production level. Direct labor costs are expected to run $0.20 per pound for product K and $0.10 per pound for product M.

The company’s primary product sells for $5.70 per pound. Secondary products K and M sell for $0.85 and $0.65 per pound, respectively. The special treatment of the liquid waste will cost $0.25 per pound.

A company accountant believes that product K is too expensive to manufacture and cannot be sold at a price that recovers its material and labor cost. The accountant’s recommendation is to eliminate product K as an alternative.

For the upcoming production period, 5000 pounds of raw material A and 7000 pounds of raw material B will be available.

Develop an approach to the problem that will allow the company to determine how much primary product to produce, given the limitations on the amounts of the raw material available. Include recommendations as to how the company should dispose of the liquid waste to satisfy the environmental protection guidelines. How many pounds of product K should be produced? How many pounds of product M should be produced? How many pounds of liquid waste should be specially treated and dumped into the river? Include a discussion and analysis of the following in your report:

  1. A cost analysis showing the profit contribution per pound for the primary product, product K, and product M
  2. The optimal production quantities and waste disposal plan, including the projected profit
  3. A discussion of the value of additional pounds of each raw material
  4. A discussion of the sensitivity analysis of the objective function coefficients
  5. Comments on the accountant’s recommendation to eliminate product K as an alternative: Does the recommendation appear reasonable? What is your reaction to the recommendation? How would the optimal solution change if product K were eliminated?
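The cost analysis requested in item 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: unit costs are inferred from last month's cost report, fixed costs are left out because they do not vary with production, and each secondary product is assumed to weigh 2 pounds per pound of liquid waste used (1 pound of waste plus 1 pound of raw material). The function and variable names are hypothetical.

```python
# Unit costs implied by last month's report (10,000 lb of primary product).
PRIMARY_LB = 10_000
cost_a_per_lb = 15_000 / (PRIMARY_LB * 1)  # 1 lb of A per lb of primary product
cost_b_per_lb = 16_000 / (PRIMARY_LB * 2)  # 2 lb of B per lb of primary product
labor_primary = 5_000 / PRIMARY_LB         # direct labor per lb of primary product


def contribution(price, lb_a, lb_b, labor):
    """Profit contribution per pound: price minus variable material and labor cost."""
    return price - lb_a * cost_a_per_lb - lb_b * cost_b_per_lb - labor


primary = contribution(5.70, lb_a=1.0, lb_b=2.0, labor=labor_primary)

# ASSUMPTION: 1 lb of waste + 1 lb of raw material yields 2 lb of secondary
# product, so each pound of K (or M) carries 0.5 lb of raw material A (or B).
product_k = contribution(0.85, lb_a=0.5, lb_b=0.0, labor=0.20)
product_m = contribution(0.65, lb_a=0.0, lb_b=0.5, labor=0.10)

for name, value in [("Primary", primary), ("Product K", product_k), ("Product M", product_m)]:
    print(f"{name:<10} {value:6.2f} $/lb")
```

Under this per-pound accounting, product K does not cover its own material and labor cost, which is the accountant's argument. Whether that justifies eliminating K depends on the full production model, since the alternative uses of the liquid waste also have a cost.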

Provide a general introduction, background, and purpose of the paper, with your thesis resting on the idea of using statistical analysis to make better business decisions and increase profitability.

Introduction/Background of Project Topic

The company to use for this task is Wal-Mart.

Provide a general introduction, background, and purpose of the paper, with your thesis resting on the idea of using statistical analysis to make better business decisions and increase profitability. Also, include a discussion of the real estate industry and the factors that influence the health, viability, and success of the real estate marketplace. Use this assignment to set the stage for the rest of your paper. It should be 1.5 to 2.0 pages, double spaced, with a cover page, reference page, and citations, as appropriate. Understand that the cover page and reference page do not count against the expected page length for this assignment.

present each of the datasets you are analysing, identify research questions that can be addressed by your analysis and, ideally, present relevant existing literature and contrast your results against it.

ST3189: Assessed Coursework Project

You will undertake a project that counts for 30 per cent of your final mark for the course. The project will require you to analyse one or more real-world datasets of your choice. You can use the OpenML website, the UCI repository, or any other open access domain to select your dataset(s).

The project will consist of completing the following three tasks that can be implemented on one or more of your chosen real-world datasets.

  1. Unsupervised Learning: where the problem consists of identifying homogeneous population groups or applying dimension reduction techniques, which can then be used in the context of the empirical application.
  2. Regression: where the problem consists of continuous target variable(s).
  3. Classification: where the problem consists of categorical target variable(s).

You will be expected to present each of the datasets you are analysing, identify research questions that can be addressed by your analysis and, ideally, present relevant existing literature and contrast your results against it. You are expected to use multiple techniques for the regression and classification tasks, and compare their results.

In all cases, your analysis should be presented in a paper-like format, avoiding highly technical language where possible. It may be helpful to think of your audience as consisting of people with some quantitative background but no prior knowledge of Machine Learning. Your ability to present and interpret the results will be regarded as being as important as your ability to apply the taught techniques.

The results of the project should be presented in a 10-page article in A4 format. The 10-page limit includes figures and tables but excludes the title page, table of contents and references. Make sure to include your candidate number, but not your name, on the title page and in the filename. If your candidate number has not been generated at the time of submission, use your student registration number (SRN) instead. In addition to the 10-page article, which should be submitted as a Word or PDF file, your R code should also be submitted with appropriate comments and description as an R script or an RMarkdown file. You may alternatively use Python code, in which case you should submit a Jupyter notebook or a Spyder script file.

 

You may choose to conduct all the above three tasks on a single dataset or conduct some of the tasks on separate datasets; this is up to you. Do not submit your data, just provide the open access links in your code files, from which the data can be downloaded.

 

To sum up, the following two files are required, where your candidate number (or if not yet available, your student registration number) should be visible:

  1. A Word or PDF file with your report, which should not contain any code (the 10-page limit applies as mentioned above).
  2. Your code in a single file of appropriate format (R script, RMarkdown, Spyder script, Jupyter notebook).

 

 

Provide the STATA output and STATA code needed for generating the results reported in Table A column 2 for the pretreatment characteristic years of education.

Statistics Question

This assignment is based on the 2019 paper of Arielle Bernhardt, Erica Field, Rohini Pande, and Natalia Rigol (henceforth BFPR) “Household matters: Re-visiting the returns to capital among female microentrepreneurs” published in AER: Insights. In their paper, BFPR make use of different field experiments run in India, Sri Lanka, and Ghana. The data file for this assignment is based on the Indian experiment and drawn from the data used in BFPR. The paper, data file, and answer sheet are now available for download at CANVAS. Before answering the questions, it is strongly recommended that you read the paper thoroughly. Please answer the questions as clearly and concisely as possible, and in accordance with the instructions. At the end of each question, the instructions are written in italics between brackets. Parts of answers that deviate from the requested format or are difficult to decipher will reduce the grade.

The Indian experiment
BFPR evaluate an experiment among female microentrepreneurs in low-income neighborhoods who all received individual loans that ranged from Rs 4.000 up to Rs 10.000 (which is equivalent to €45 up to €112,5 at the 2023 exchange rate). These female microentrepreneurs were organized in microfinance groups of 5. All these groups had to attend a group-specific repayment meeting, in which repayment conditions were discussed. For the experiment, these groups were randomly assigned to different repayment conditions. One set of microfinance groups received a standard contract in which loan repayment was organized through bi-weekly loan installments starting two weeks after the microentrepreneurs received their loan (control groups). The other set of microfinance groups received a contract with a two-month grace period before they had to start repaying their loan through bi-weekly loan installments (treatment groups). Apart from the grace period, all other contract features were identical. The hypothesis is that microentrepreneurs who receive a grace period in their contract face weaker liquidity constraints and, as a result, make better business decisions, leading to higher business profits.
As we already mentioned above, these female microentrepreneurs were organized in microfinance groups of 5. The randomization occurred within batches of 20 of such groups. There were in total 9 different batches. This means that treatment assignment is random within each batch (that is, treatment is random conditional upon a full set of batch group indicators).

Question 1
BFPR collected pre-treatment information of the female microentrepreneurs including their age, marital status (marriage 0/1), religion (muslim 0/1), household size, whether they experienced some unexpected household event (household shock 0/1), whether there is water nearby (no drain 0/1), whether they had financial control over their resources (financial control 0/1), years of education, whether they are homeowners (homeowner 0/1), the number of enterprises in the household, and 6 loan amount indicators for having a loan of Rs 4.000, Rs 5.000, Rs 6.000, Rs 8.000, Rs 9.000, and Rs 10.000, respectively. We refer to these pretreatment characteristics as $X_{ihg}$ (where i, h and g stand for female microentrepreneur i in household h in batch group g). In Online Appendix Table A1, BFPR report means and standard deviations for the pretreatment characteristics of female microentrepreneurs assigned to the control groups. BFPR make a distinction between households with multiple enterprise owners (column 1) and households where only the female microentrepreneur owns enterprises (column 3).

1. Replicate the results of Online Appendix Table A1 (only column 1) and report all the results with 3 decimals for a selected set of pretreatment characteristics in Table A in the answer sheet (see Note 1). As an example, we have already provided the first entry for the age of female microentrepreneurs assigned to the control group in households with multiple enterprise owners, which equals 34.028 with standard deviation 7.322. [Complete Table A column 1 in the answer sheet].

Note 1: Results expressed with 2 or 4 decimals will give zero points. With results expressed with 3 decimals, we mean the exact digits as depicted in the STATA output. If, for example, the output reads 34.02878 we want 34.028 and not 34.029.

2. Provide the STATA output and STATA code needed for generating the results reported in Table A column 1, that is, the means and standard deviations for pre-treatment characteristics of the female microentrepreneurs in families with multiple enterprise owners. [Take a screenshot of the STATA output of column 1, including the STATA command line responsible for the output, and paste it in the answer sheet].

The randomized experiment requires that the female entrepreneurs in treated and control groups are, on average, identical. To test this, BFPR estimate for each pretreatment characteristic in $X_{ihg}$ the following OLS regression:

$$X_{ihg} = \alpha_0 + \alpha_1 G_g + \delta_1 B_g + \varepsilon_{ihg} \tag{1}$$

where $G_g$ is the treatment indicator (which is 1 for those groups who received the grace period contract, and 0 otherwise), $B_g$ represents a set of dummy indicators for the different batch groups, and $\varepsilon_{ihg}$ is the error term. The coefficient $\alpha_1$ measures the difference in pretreatment characteristics between female entrepreneurs in treated and control groups. The term $\delta_1$ is a set of coefficients attached to the different batch group indicators. In this regression it is key to control for the batch group indicators (and not batch group number) because treatment is randomly assigned within each batch group. Recall that the female microentrepreneurs were organized (and treated) in microfinance groups of 5. BFPR have clustered their standard errors at the microentrepreneurial group level. To get the correct standard errors, add the following option at the end of your regression command: cluster(group). In Online Appendix Table A1, BFPR report these estimates for $\alpha_1$ in columns 2 and 4. Again, they make a distinction between households with multiple enterprise owners (column 2) and households where only the female microentrepreneur owns enterprises (column 4).

3. Replicate the results of Online Appendix Table A1 (only column 2) and report the results with 3 decimals for the same set of pretreatment characteristics in Table A in the answer sheet (see Note 2). As an example, we have already provided the second entry for the estimated α1 for the age of female microentrepreneurs in households with multiple enterprise owners, which equals -1.515 with standard error 0.946. [Complete Table A column 2 in the answer sheet.]

Note 2: Results expressed with 2 or 4 decimals will give zero points. With results expressed with 3 decimals, we mean the exact digits as depicted in the STATA output. If, for example, the output reads 34.02878 we want 34.028 and not 34.029.

4. Provide the STATA output and STATA code needed for generating the results reported in Table A column 2 for the pretreatment characteristic years of education. [Take a screenshot of the STATA output of column 2, including the STATA command line responsible for the output, and paste it in the answer sheet].

5. The estimate attached to the Rs 10.000 loan indicator is statistically significant, which suggests that female entrepreneurs in the treatment groups more often borrowed the highest amount than female entrepreneurs in the control groups. Is this a concern? [Circle the correct answer in the answer sheet].
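The 3-decimal rule used in this question is truncation rather than rounding (34.02878 becomes 34.028, not 34.029). A minimal Python sketch of that rule, using a hypothetical helper name and taking the STATA output as a string so that no floating-point rounding sneaks in:

```python
from decimal import Decimal, ROUND_DOWN


def truncate_3dp(value: str) -> str:
    """Cut a number (given as text) to 3 decimals without rounding."""
    # ROUND_DOWN rounds toward zero, i.e. it simply drops the extra digits.
    return str(Decimal(value).quantize(Decimal("0.001"), rounding=ROUND_DOWN))


print(truncate_3dp("34.02878"))
print(truncate_3dp("-1.5159"))  # hypothetical negative coefficient
```

Passing the value as text rather than as a float is a deliberate choice here: it keeps the digits exactly as they appear in the output window.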

Question 2
BFPR estimate the effect of the grace period treatment on enterprise profits by OLS, estimating the following equation:

$$Y_{ihg} = \beta_0 + \beta_1 G_g + \theta_1 B_g + \gamma_1 X_{ihg} + \mu_{ihg} \tag{2}$$

where $Y_{ihg}$ are the weekly enterprise profits of female entrepreneur i in household h in batch group g. The variables $G_g$, $B_g$, $X_{ihg}$ are as defined earlier and $\mu_{ihg}$ is the error term. The coefficient $\beta_1$ is the average treatment effect of being assigned to the grace period contract. The coefficients $\theta_1$ and $\gamma_1$ are attached to the different batch group indicators and pretreatment characteristics. In Table 2, BFPR report the $\beta_1$ estimates for female enterprise profits (column 1) and all household enterprise profits (column 2). In the notes of Table 2, BFPR indicate that they want to estimate their regressions on the largest sample possible. They therefore include all controls in $X_{ihg}$ (we list these characteristics in Question 1). In cases where a control variable (in $X_{ihg}$) is missing, they set its value to zero and include a dummy for whether the variable is missing.
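The missing-control treatment described above (set the value to zero and add a missing-value dummy) can be illustrated with a short Python sketch. The helper name and the sample values are hypothetical, and `None` stands in for a missing observation; the assignment itself is to be done in STATA.

```python
def fill_missing_with_flag(values):
    """Return (filled, missing_dummy) for one control variable.

    filled: the original values with missing entries replaced by 0.
    missing_dummy: 1 where the original value was missing, 0 otherwise.
    """
    filled = [0.0 if v is None else float(v) for v in values]
    missing_dummy = [1 if v is None else 0 for v in values]
    return filled, missing_dummy


# Hypothetical years-of-education values for five respondents.
education = [8, None, 10, 5, None]
education_filled, education_missing = fill_missing_with_flag(education)
print(education_filled)
print(education_missing)
```

Both the filled variable and its dummy then enter the regression, so that observations with a missing control stay in the sample.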

1. Replicate the main estimation results of Table 2 (columns 1 and 2) and report all the β1 estimates in 3 decimals in Table B in the answer sheet (together with the standard error). Do not forget to control for the dummies for whether the control variables are missing. As before, BFPR have clustered their standard errors at the microentrepreneurial group level. To get the correct standard errors, add the command at the end of your regression command: cluster(group). [Complete Table B in the answer sheet].

2. Provide the STATA output and STATA code needed for the average treatment effect estimates presented in column 1. [Take a screenshot of the STATA regression results using the specification of column 1, including the STATA command line responsible for the output, and paste it in the answer sheet].

Question 3
In their experiment, BFPR measure pre-treatment characteristics in the baseline survey and profit measures in the follow-up survey. In between surveys, some of the enterprises under study went bankrupt. BFPR keep these enterprises in the analysis and code their profits as zero. Bankruptcy itself, however, is a relevant and interesting outcome to consider when estimating the effect of grace period treatment.

1. If you focus on households where only the female microentrepreneur owns enterprises, what is the share of female enterprises that went bankrupt? [Report the bankruptcy share in 3 decimals in Table C in the answer sheet.]

2. What happens to the treatment effect estimate reported in Table 2 column 1 when you switch the left-hand side variable in (2) to a bankruptcy indicator and estimate the effect of the grace period treatment on enterprise bankruptcy by OLS? [Construct the bankruptcy indicator yourself based on female enterprise profits. Report the treatment effect estimate together with the standard error in 3 decimals in Table C in the answer sheet. Use the same right-hand side specification as the one you used to replicate the estimation results of Table 2 (column 1).]

3. Apart from liquidity constraints, female microentrepreneurs may experience difficulties running their enterprise when they have young children (under 6). Report the treatment effect estimates for bankruptcy (together with the standard error) with 3 decimals for female microentrepreneurs with and without young children (under 6). [Report the corresponding treatment effect estimates in Table C columns 2 and 3 in the answer sheet.]

4. According to these regression results, are female microentrepreneurs with young children more, less, or equally responsive to the grace period contract than female microentrepreneurs without young children? [Circle the correct answer in the answer sheet].
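Because bankrupt enterprises are kept in the data with profits coded as zero, the bankruptcy indicator asked for in this question can be derived from the profit variable. A minimal Python sketch with hypothetical profit values, assuming that a profit of exactly zero identifies a bankrupt enterprise and that there are no missing profits:

```python
# Hypothetical weekly profits for five female-owned enterprises.
profits = [120.0, 0.0, 45.5, 0.0, 300.0]

# ASSUMPTION: profit == 0 marks an enterprise that went bankrupt.
bankrupt = [1 if p == 0 else 0 for p in profits]

# The bankruptcy share is the mean of the 0/1 indicator.
share = sum(bankrupt) / len(bankrupt)
print(f"{share:.3f}")
```

The same 0/1 variable can then replace profits as the left-hand side of equation (2).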

How many sets of numbers do you want to generate? How many numbers per set? Do you wish each number in a set to remain unique? Do you wish to sort the numbers that are generated? How do you wish to view your random numbers?

David Badal Math 40 Lab 2: Descriptive Statistics Revised 3/4/21

Math 40 Lab 2: Descriptive Statistics

Notes
You are to complete the following work in Excel. These instructions are for my version of Excel on a PC. Depending on your version of Excel or if you’re using a Mac, certain aspects may vary slightly.

Once completed, you are required to upload your Excel file into the Lab 2 Assignment in Canvas.

If you don’t have Excel, you may download it for free through Office 365.

http://www.laspositascollege.edu/students/office365.php

You need to make sure that you have completed all work yourself, and that you’re not sharing files with other students.

Random Number Generator

1. Generating Random Numbers
This is the same random number generator that you used in Lab 1, but please make sure to generate a new set of numbers.

You’re going to generate 50 random integers between 1 and 100, inclusive.

Since random numbers generated in Excel will often change with each new calculation, you’re going to generate your random numbers outside of Excel.

Follow this link
https://www.randomizer.org to go to the Research Randomizer site.
At the site, enter the following information.

  • How many sets of numbers do you want to generate? 1
  • How many numbers per set? 50
  • Number range? 1 to 100
  • Do you wish each number in a set to remain unique? No
  • Do you wish to sort the numbers that are generated? Yes, least to greatest.
  • How do you wish to view your random numbers? Place markers off
  • Select Randomize Now!

You should now see your list of random numbers. Nice work!

2. Entering your random numbers into Excel.
On the results page, you have the option of either printing out your numbers or downloading them.

Selecting download will create a csv file, which you can open.

Open the csv file and save it as an Excel file.

If for some reason you’re not able to save the document as an Excel workbook, simply open a new Excel file and type in your list of random numbers starting in cell A5.

3. Administrative
Delete anything that appears in rows 1-4.

Type your first name and last name in cells A1 and B1, respectively.

Enter your course name in cell A2 and your section number in cell B2.

Type Lab 2 in cell A3.

Leave row 4 blank.

Your data should begin in cell A5 and end in cell A54.

Create Two Transformed Lists
You’re going to create two new lists of numbers based upon your random numbers in column A.

For the first new list, you’re going to add 4 to each of your random numbers.

o Starting in cell B5, type =, click on cell A5, type +4, and hit Enter. The number in cell B5 should be 4 more than the first random number in cell A5.

o Now let’s copy this formula to generate the rest of the transformed list.

o Click on cell B5. Hover the cursor over the lower right-hand corner of the cell until you see the black plus sign.

o Holding the mouse button down, you can copy the formula by dragging down to cell B54. Once you release the mouse button, the new numbers should be created. Please make sure that they’re 4 more than the original random numbers in column A.

For the second new list, you’re going to multiply each of your random numbers by 4.
o Starting in cell C5, type =, click on cell A5, type *4, and hit Enter. The number in cell C5 should be 4 times the first random number in cell A5.

o Copy this formula to generate the rest of the transformed list.

o Please make sure that they’re 4 times the original random numbers in column A.

Descriptive Statistics
You’re going to use Excel functions to calculate descriptive statistics for each set of numbers in columns A, B, and C.

Since our data is from a random sample, you’ll want to make sure to use the sample versions of the functions. Please note that functions may vary slightly due to your version of Excel.

Starting in cell E5, put the function labels listed below in the left column.

Calculate the descriptive statistics of each set of numbers in columns A, B, and C.

o Column A: Starting in cell F5, use the formulas listed below in the right column.

o Column B: Starting in cell G5, use the formulas listed below referencing cells B5:B54.

Z-score of 5th smallest value: =(B9-G5)/G6

o Column C: Starting in cell H5, use the formulas listed below referencing cells C5:C54.

Z-score of 5th smallest value: =(C9-H5)/H6

Round the z-score to two decimals and all other results to one decimal. In Excel, rounding is for display only. The actual values are still stored in the cells.

o Mean (average) =average(A5:A54)

o Standard Deviation =stdev.s(A5:A54)

o Variance =var.s(A5:A54)

o Minimum =min(A5:A54)

o 1st Quartile =quartile.exc(A5:A54,1)

o Median =median(A5:A54)

o 3rd Quartile =quartile.exc(A5:A54,3)

o Maximum =max(A5:A54)

o 40th percentile =percentile.exc(A5:A54,40/100)

o Z-score of 5th smallest value =(A9-F5)/F6
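For readers who want to cross-check the spreadsheet, the same statistics can be sketched in Python with the standard library. This is an illustration, not part of the lab: the seed and the generated numbers are hypothetical stand-ins for your own random list, and `statistics.quantiles(..., method="exclusive")` is used on the assumption that it matches Excel's `.EXC` functions, so results are worth spot-checking against Excel.

```python
import random
import statistics as st

random.seed(40)  # hypothetical seed, only to make the sketch repeatable
col_a = sorted(random.randint(1, 100) for _ in range(50))  # like A5:A54
col_b = [x + 4 for x in col_a]  # first transformed list (add 4)
col_c = [x * 4 for x in col_a]  # second transformed list (multiply by 4)


def describe(data):
    """Sample descriptive statistics mirroring the lab's list of functions."""
    q1, median, q3 = st.quantiles(data, n=4, method="exclusive")
    p40 = st.quantiles(data, n=100, method="exclusive")[39]  # 40th percentile
    mean, sd = st.mean(data), st.stdev(data)
    return {
        "mean": mean, "stdev": sd, "variance": st.variance(data),
        "min": min(data), "q1": q1, "median": median, "q3": q3,
        "max": max(data), "p40": p40,
        "z_5th": (sorted(data)[4] - mean) / sd,  # z-score of 5th smallest value
    }


stats_a, stats_b, stats_c = describe(col_a), describe(col_b), describe(col_c)
for label in stats_a:
    print(f"{label:<9} {stats_a[label]:9.2f} {stats_b[label]:9.2f} {stats_c[label]:9.2f}")
```

Each printed row shows one statistic for columns A, B, and C side by side, which makes it easy to see which statistics shift, which scale, and which stay the same.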

What Did You Learn?

Type your answers to the following questions at the bottom of your spreadsheet.

1. Compare the descriptive statistics of the numbers in columns A and B.
a. What happened to the z-score of the fifth-smallest value?

b. What happened to the standard deviation and variance?

c. What happened to the other statistics?

2. Compare the descriptive statistics of the numbers in columns A and C.
a. What happened to the z-score of the fifth-smallest value?

b. What happened to the variance?

c. What happened to the other statistics?

3. Reflect on your experience with completing this lab assignment.


What type of Variable? Categorical or Numerical. If numerical, is it discrete or continuous? Define a survey with 10 questions about a presidential race.

Applied Statistics to Social Science

1) What type of Variable? Categorical or Numerical. If numerical, is it discrete or continuous?

  1. Types of diseases
  2. Ticket Price ($)
  3. Freshmen
  4. Marriage
  5. Music Beats
  6. San Francisco (City)
  7. Wage Rate ($/hour)
  8. Amount of Sugar in a cup of coffee (grams)
  9. Weight (pounds)
  10. Social Class (Low Income, Middle Income, High Income)

 

 

  • Define a survey with 10 questions about a presidential race.
  • Give an example of “poor wording” when designing a survey about a presidential race.
  • Give an example of non-response bias in a survey of a presidential race.

 

 

Identify the following items (if possible). If you can’t tell, then say so – this often happens when we read about a survey.

  1. The population
  2. The population parameter of interest
  3. The sampling frame
  4. The sample
  5. The sampling method, including whether or not randomization was employed
  6. Any potential sources of bias you can detect, and any problems in generalizing to the population of interest

 

Researchers waited outside a bar they had randomly selected from a list of such establishments. They stopped every 10th person who came out of the bar and asked whether he or she thought drinking and driving was a serious problem.

 

 

6) Bob is a taxi driver who keeps a record of his meter readings. The results of the past twenty meter readings are given in miles:

 

328, 333, 358, 400, 433, 520, 340, 455, 465, 300,

280, 512, 425, 328, 395, 433, 395, 448, 390, 260

Make a stem and leaf plot.
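One way to organize the readings for a stem-and-leaf plot is sketched below in Python. The choice of stem is an assumption made for illustration: the hundreds and tens digits form the stem and the units digit is the leaf. The function name is hypothetical.

```python
from collections import defaultdict

readings = [328, 333, 358, 400, 433, 520, 340, 455, 465, 300,
            280, 512, 425, 328, 395, 433, 395, 448, 390, 260]


def stem_and_leaf(values):
    """Group sorted values by stem (value // 10) with leaves (value % 10)."""
    plot = defaultdict(list)
    for v in sorted(values):
        plot[v // 10].append(v % 10)
    return dict(plot)


plot = stem_and_leaf(readings)
# Print every stem in range, including empty ones, so gaps stay visible.
for stem in range(min(plot), max(plot) + 1):
    leaves = " ".join(str(leaf) for leaf in plot.get(stem, []))
    print(f"{stem:>2} | {leaves}")
```

A key such as "32 | 8 means 328 miles" should accompany the finished plot.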

 

7)  Alexandra took a survey during 4th hour about what students enjoyed doing most. Here is what they responded:

S= Sleep, TV= Television, SP=Sports, F=Friends, V=Videos

 

S  TV  TV  F  F  V  F  F  SP  SP  SP  SP  SP  SP

S  S  F  SP  TV  F  TV  SP  F  S  F  SP  S  TV

 

  • Make a pie chart.
  • Make a bar chart.
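Both charts start from the same frequency table. A short Python sketch that tallies the responses and prints a text bar chart, along with each category's share and the matching pie-slice angle (variable names are hypothetical):

```python
from collections import Counter

responses = ("S TV TV F F V F F SP SP SP SP SP SP "
             "S S F SP TV F TV SP F S F SP S TV").split()

counts = Counter(responses)
total = sum(counts.values())

# One row per activity: bar of '#' marks, count, percentage, pie angle.
for activity, n in counts.most_common():
    share = n / total
    print(f"{activity:<3} {'#' * n:<10} {n:>2}  {share:6.1%}  {360 * share:5.1f} deg")
```

The counts give the bar heights directly, and the angles (each share times 360 degrees) give the slice sizes for the pie chart.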

 

8) Sports: Dog Sled Racing. How long does it take to finish the 1161-mile Iditarod Dog Sled Race from Anchorage to Nome, Alaska? The finish times to the nearest hour for 57 dogsled teams are given below:

 

261 271 236 244 279 296 284 299 288 288 247 256

338 360 341 333 261 266 287 296 313 311 307 307

299 303 277 283 304 305 288 290 288 289 287 299

332 330 309 328 307 328 285 291 295 298 306 315

310 318 318 320 333 321 323 324 327

 

Make a histogram using five classes.
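The class limits and frequencies behind the histogram can be sketched in Python. The class-width rule used here is an assumption: divide the range by the number of classes and move up to the next whole number, so that the largest value still falls in the last class. Your textbook's convention may differ.

```python
times = [
    261, 271, 236, 244, 279, 296, 284, 299, 288, 288, 247, 256,
    338, 360, 341, 333, 261, 266, 287, 296, 313, 311, 307, 307,
    299, 303, 277, 283, 304, 305, 288, 290, 288, 289, 287, 299,
    332, 330, 309, 328, 307, 328, 285, 291, 295, 298, 306, 315,
    310, 318, 318, 320, 333, 321, 323, 324, 327,
]

NUM_CLASSES = 5
low, high = min(times), max(times)
# ASSUMPTION: width = range / classes, increased to the next whole number.
width = (high - low) // NUM_CLASSES + 1

counts = [0] * NUM_CLASSES
for t in times:
    counts[(t - low) // width] += 1  # index of the class containing t

for k, n in enumerate(counts):
    lower = low + k * width
    print(f"{lower}-{lower + width - 1}: {'*' * n} ({n})")
```

The printed rows are the frequency table; the histogram draws one bar per class with height equal to the count.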

Identify two quantitative variables in the data. What possible values do these variables take in the dataset? Identify two categorical variables in the data. List the possible categories these variables can take.

Statistics Question

In this mini-project, you will use data collected by the U.S. National Center for Health Statistics through the 2011-2012 National Health and Nutrition Examination Survey (NHANES). The NHANES has been conducted every two years since the early 1960s. The data collected from the survey include demographics, various body and health measurements, and information about various lifestyle choices. The NHANES is unique in that the health measurements are collected through physical examinations. Data on all other variables are self-reported. Before you begin the analysis, we will discuss the data and expectations for this project.

Objectives for the mini-project

You will understand:

  • Graphical displays can be used to communicate information about a single variable or the relationship between two or more variables.
  • Creating effective graphical displays is typically an iterative process.

You will be able to:

  • Create an appropriate display to visualize the distribution of a single variable (univariate graphs) or the relationship between two or more variables (multivariable graphs).
  • Write a short report to communicate statistical results to a general audience.
  • Apply principles of making effective graphical displays to improve an existing graph.

1) Locate the NHANES dataset (DCMP_STAT_5D_nhanes_alldata) containing the following variables:

  • Age: Age in years at screening of the study participant (Note: Subjects 80 years or older were recorded as 80)
  • HealthGen: Self-reported rating of the study participant’s health in general (Excellent, Vgood, Good, Fair, Poor)
  • PhysActive: Whether the study participant reported performing moderate or vigorous sports, fitness, or recreational activities (Yes, No)
  • AttendCollege: Whether the study participant attended or completed college at the time of the study (Yes, No)
  • BadPhysHlthDay: Whether the study participant self-reported having at least one bad physical health day in the past 30 days (Yes = at least 1 day; No = 0 days)
  • BadMentlHlthDay: Whether the study participant self-reported having at least one bad mental health day in the past 30 days (Yes = at least 1 day; No = 0 days)
  • BMI: Body mass index (weight/height² in kg/m²)
  • SleepTrouble: Participant told a doctor or other health professional that they had trouble sleeping (Yes, No)
  • SleepHrsNight: Self-reported number of sleep hours the study participant usually got on weekdays or workdays
  • TotChol: Total HDL cholesterol in mmol/L
  • BPSysAve: Average of three systolic blood pressure readings in mm Hg

  • Part A: Identify two quantitative variables in the data. What possible values do these variables take in the dataset?
  • Part B: Identify two categorical variables in the data. List the possible categories these variables can take.

Introductory Statistics, First Edition (2021), Student Pages. Copyright © 2021, The Charles A. Dana Center at The University of Texas at Austin

2) In this project, you will create univariate, bivariate, and multivariate graphical displays for quantitative and categorical variables. Identify one or two DCMP Data Analysis Tools at https://utdanacenter.org/data-analysis-tools (or other tools) you can use to create these graphs. Include the name of the tool and the type of graph you can create in your answer.

3) One goal of this project is to continue developing your statistical writing skills. This includes writing your interpretations in a way that can be clearly understood by a general audience and presenting your results in a report suitable for an academic or professional setting.

  • Part A: Describe what is meant by “writing in a way that could be clearly understood by a general audience.”
  • Part B: Refer to the project rubric. Based on the rubric, what are the qualities of a report that is suitable for an academic or professional setting?

4) Let’s begin by looking at the ages of the survey respondents.

(Note: Subjects 80 years or older were recorded as 80.)

  • Part A: Use technology to create a histogram of Age.
  • Part B: Describe the distribution. Include the shape, center, spread, and the presence of outliers in your description, using appropriate summary statistics as needed.
  • Part C: The distribution shows a small peak around 80 years old. (Note: Changing the binwidth of the histogram makes the peak more or less noticeable.) Briefly explain why there may be a peak around this value.
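
The effect behind Part C can be illustrated with a tiny sketch. The ages below are hypothetical and are not taken from the NHANES file; the only thing carried over from the dataset description is the rule that ages of 80 or older are recorded as 80.

```python
from collections import Counter

# Hypothetical true ages; NOT from the NHANES dataset.
true_ages = [62, 71, 79, 80, 83, 85, 88, 91, 95]

# Recording rule from the variable description: 80 or older is stored as 80.
recorded_ages = [min(age, 80) for age in true_ages]

# Count ages in 5-year bins labelled by their lower edge (60, 65, ..., 80).
bin_counts = Counter((age // 5) * 5 for age in recorded_ages)

for lower_edge in sorted(bin_counts):
    print(f"{lower_edge}-{lower_edge + 4}: {bin_counts[lower_edge]}")
```

Every age from 80 upward lands in the same bin, so that bin is taller than it would be if the true ages had been kept.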

5) Next let’s examine how the respondents generally perceived their health. Create an appropriate graphical display of HealthGen. Then use the graph to describe two different observations about the respondents’ general perceptions of their health.

6) Now let’s examine whether there is an association between health perception and whether a person performs regular physical activity. Create a graphical display to visualize the distribution of HealthGen for each category of PhysActive. Does there appear to be a difference in general health perception between people who perform regular physical activity and those who do not? Write two observations from the graphical display to support your response.

7) A popular health and wellness website is writing an article that explores the following questions:

  • (1) Does having a habit of regular physical activity change with age?
  • (2) Does the association between age and health perception differ based on performing regular physical activity?

They would like to include a graphical display in the article to help readers visualize conclusions in the article about the associations between the variables. The authors propose the following graph, but the website editor is concerned it may be confusing to readers. She has asked for your help to improve the graph and write an interpretation of the graph that will be included in the article.

  • Part A: Describe two ways this graphical display may be confusing and/or makes it difficult to explore the questions of interest for the article.
  • Part B: Use technology to make an improved graphical display that can be effectively used to answer the two questions of interest. You may use multiple graphs, if needed. In statistics, we often need to manipulate the structure of the dataset in order to create the visualizations. Locate the reorganized dataset (DCMP_STAT_5D_nhanesYesNo), which includes only the variables for this question that aid in the creation of your visual display.
  • Part C: Use the graph(s) to answer the two questions of interest for the article. Write a short paragraph (three to six sentences) that includes your answers to the questions and observations from the graph(s) that support your response.

The response should be written in a way that is clearly understood by a general audience.

ONLY NEED HELP/ANSWERS WITH 4-7

Calculate the probability that Boston Celtics will win the series. Construct a probability distribution for your net win (X) in the series. Calculate your expected net win (E(X)) and the standard deviation of X.

One Entry Level Statistics Task

Problem:
Suppose that Boston Celtics and Miami Heat (two American NBA teams) are scheduled to play a best of three (3) series. The winner of the series will be the first team that wins two of the three games. The probability that Boston Celtics wins a game in their home stadium (@TD Garden) is 0.63 and the probability that Miami Heat wins their home game (@MiamiDade Arena) is 0.61. Next, suppose that you place a bet on each game played where you win $100 if Boston Celtics win and you lose $103 if Boston Celtics lose the game.

In parts 1–4 below, assume that the outcomes of the games are independent of each other.

Part 1:
If the first game is played in Miami, the second game is played in Boston, and the third game (if it becomes necessary) is in Miami, then complete parts (i)–(v) below.

(i) Calculate the probability that Boston Celtics will win the series.

(ii) Construct a probability distribution for your net win (X) in the series. Calculate your expected net win (E(X)) and the standard deviation of X.

(iii) Use Excel or R to create 2,500 random values for X. Let these random values be denoted by Y. Use these Y values to estimate your expected net win by using a 90% confidence interval. Does this confidence interval contain the E(X) in (ii)?

(iv) Construct a frequency distribution for Y. Next, use the chi-square goodness-of-fit test to verify how closely the distribution of Y has estimated the distribution of X.

(v) Use your observations in parts (ii) and (iii) above to describe whether your betting strategy is favorable to you. Write a summary of your observations and analyses in the Word document.
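
A minimal Python sketch of how parts (i) to (iv) could be set up for the Part 1 schedule is shown here. It is one possible approach, not a prescribed solution. The function name `series_distribution`, the seed, and the choice of a z-interval with 1.645 for the 90% confidence level are assumptions made for illustration; the probabilities and payoffs come from the problem statement.

```python
import random
from math import sqrt

P_BOS_HOME = 0.63        # P(Boston wins) when the game is in Boston
P_BOS_AWAY = 1 - 0.61    # P(Boston wins) when the game is in Miami
WIN, LOSS = 100, -103    # payoff of the bet on a single game


def series_distribution(venues, wins_needed):
    """Return (P(Boston wins the series), {net win x: P(X = x)})."""
    dist = {}
    p_boston = 0.0

    def play(game, bos, mia, net, prob):
        nonlocal p_boston
        if bos == wins_needed or mia == wins_needed:   # series is over
            dist[net] = dist.get(net, 0.0) + prob
            if bos == wins_needed:
                p_boston += prob
            return
        p = P_BOS_HOME if venues[game] == "BOS" else P_BOS_AWAY
        play(game + 1, bos + 1, mia, net + WIN, prob * p)          # Boston wins
        play(game + 1, bos, mia + 1, net + LOSS, prob * (1 - p))   # Boston loses

    play(0, 0, 0, 0, 1.0)
    return p_boston, dist


# Parts (i) and (ii): schedule Miami, Boston, Miami (best of three).
p_boston, dist = series_distribution(["MIA", "BOS", "MIA"], wins_needed=2)
mean_x = sum(x * p for x, p in dist.items())
sd_x = sqrt(sum((x - mean_x) ** 2 * p for x, p in dist.items()))

# Part (iii): 2,500 draws Y from the distribution of X, then a 90% z-interval.
rng = random.Random(2025)                 # hypothetical seed, for repeatability
values = sorted(dist)
y = rng.choices(values, weights=[dist[v] for v in values], k=2500)
y_bar = sum(y) / len(y)
s = sqrt(sum((v - y_bar) ** 2 for v in y) / (len(y) - 1))
half_width = 1.645 * s / sqrt(len(y))     # z critical value for 90% confidence
ci = (y_bar - half_width, y_bar + half_width)

# Part (iv): chi-square goodness-of-fit statistic, df = (number of values) - 1.
chi2 = sum(
    (y.count(v) - len(y) * dist[v]) ** 2 / (len(y) * dist[v]) for v in values
)

print(f"P(Boston wins series) = {p_boston:.6f}")
for v in values:
    print(f"P(X = {v:>4}) = {dist[v]:.6f}")
print(f"E(X) = {mean_x:.4f}, SD(X) = {sd_x:.4f}")
print(f"90% CI for E(X): ({ci[0]:.2f}, {ci[1]:.2f}); chi-square = {chi2:.3f}")
```

Passing a different venue list, for example `["BOS", "MIA", "BOS"]`, covers Part 2 without changing the function. The chi-square statistic still has to be compared with a critical value or converted to a p-value, for example in Excel or R.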

Part 2:

Repeat part 1 above but assume that the first game is played in Boston, the second game is played in Miami, and the third game (if it becomes necessary) is in Boston.

Part 3:
Repeat part 1 above but now assume that the series is a best of five (5) series, where the first team that wins three games will win the series, with games alternating between Boston and Miami and the first game being played in Miami.

Part 4:
Repeat part 1 above but now assume both teams will play in the 2025 NBA Finals. The series is a best of seven (7) series where the first team that wins four games will win the series. The team with home-court advantage hosts games 2, 3, 5, and 6, while the opponent hosts games 1, 4, and 7. Let’s assume Boston Celtics has the home-court advantage against Miami Heat.

Hint: You can use R or Python (Jupyter Notebook) to solve Part 4.
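
Following the hint, here is a rough sketch of a game-by-game simulation for the Part 4 schedule. The function name `simulate_series`, the seed, and the 2,500 repetitions are illustrative assumptions; the venue order encodes the hosting rule stated in Part 4 (Boston hosts games 2, 3, 5, and 6).

```python
import random

P_BOS_HOME = 0.63        # P(Boston wins) when the game is in Boston
P_BOS_AWAY = 1 - 0.61    # P(Boston wins) when the game is in Miami
WIN, LOSS = 100, -103    # payoff of the bet on a single game

# Part 4 schedule: Miami hosts games 1, 4, 7; Boston hosts games 2, 3, 5, 6.
VENUES = ["MIA", "BOS", "BOS", "MIA", "BOS", "BOS", "MIA"]


def simulate_series(venues, wins_needed, rng):
    """Play one series; return (net win, games played, whether Boston won)."""
    bos = mia = net = 0
    for games, venue in enumerate(venues, start=1):
        p = P_BOS_HOME if venue == "BOS" else P_BOS_AWAY
        if rng.random() < p:
            bos += 1
            net += WIN
        else:
            mia += 1
            net += LOSS
        if bos == wins_needed or mia == wins_needed:
            return net, games, bos == wins_needed
    raise ValueError("schedule is too short for wins_needed")


rng = random.Random(7)   # hypothetical seed, for repeatability
results = [simulate_series(VENUES, 4, rng) for _ in range(2500)]

p_hat = sum(won for _, _, won in results) / len(results)
mean_net = sum(net for net, _, _ in results) / len(results)

print(f"Estimated P(Boston wins series): {p_hat:.3f}")
print(f"Estimated expected net win:      {mean_net:.2f}")
```

These are simulation estimates, so they vary with the seed; an exact answer needs the full probability distribution of X.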

What type of ANOVA did the authors use? What’s the evidence that there is an effect of time on MADRS? What’s the evidence that there are differences of MADRS across treatment conditions?

Statistics question

Read and answer

Lam, Raymond W., et al. “Efficacy of bright light treatment, fluoxetine, and the combination in patients with nonseasonal major depressive disorder: a randomized clinical trial.” JAMA psychiatry 73.1 (2016): 56-63.

a. What type of ANOVA did the authors use?

b. What’s the evidence that there is an effect of time on MADRS?

c. What’s the evidence that there are differences of MADRS across treatment conditions?

d. Which treatment appears most effective?

Find the standard deviation; make a cumulative frequency table; draw a histogram, with frequency polygon drawn; draw a frequency distribution graph (ogive); describe the general shape of the frequency distribution polygon; describe the general shape of the ogive.

Statistics question

Part 1 (15 Points)

For each of the following 5 problems: (a) find the standard deviation; (b) make a Cumulative frequency table; (c) draw a histogram, with frequency polygon drawn; (d) a frequency distribution graph (ogive); (e) describe the general shape of the frequency distribution polygon; (f) describe the general shape of the ogive;
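
As a rough illustration of steps (a) and (b), the sketch below computes a standard deviation and a cumulative frequency table. The data list is hypothetical, since the five problems are not reproduced here, and the use of the sample standard deviation (dividing by n − 1) is an assumption; use `statistics.pstdev` if the course expects the population formula.

```python
from collections import Counter
from itertools import accumulate
from statistics import stdev

# Hypothetical data; replace with the values from each problem.
data = [2, 3, 3, 5, 5, 5, 7, 8]

# Frequency of each distinct value, in increasing order of value.
freq = Counter(data)
values = sorted(freq)
frequencies = [freq[v] for v in values]

# Running total of the frequencies gives the cumulative frequency column.
cumulative = list(accumulate(frequencies))

# Sample standard deviation (n - 1 in the denominator).
sample_sd = stdev(data)

print("value  freq  cum_freq")
for v, f, c in zip(values, frequencies, cumulative):
    print(f"{v:>5}  {f:>4}  {c:>8}")
print(f"sample standard deviation = {sample_sd:.4f}")
```

Plotting the cumulative column against the values gives the ogive for step (d).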

 

Part 2 (5 Points)

For Part 2, you need to use the data you generated for last week’s homework. Remember, we called the data you generated SGDS.

For SGDS, (a) Make a Cumulative Frequency Table; (b) draw a histogram, with frequency polygon drawn; (c) a frequency distribution graph (ogive); (d) describe the general shape of the frequency distribution polygon; (e) describe the general shape of the ogive;

Chapter 3.pdf