Create a two-way frequency table of Smoker and Sex and draw the bar diagram. Write the comment on it.

Department of Applied Statistics & Research Methods

Attempt all the questions (Follow the instructions)

Question 1

Create a two-way frequency table of Smoker and Sex and draw the bar diagram. Write the comment on it. (10 points)

Instructions

Download student survey data file from CANVAS

  • Step 1. Open Statkey and click on two categorical variables.
  • Step 2. Click on upload data. Select sex and smoke from the data. Hit OK. You will see visualizations and a two-way table.
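For readers who want to check the Statkey output by hand, the two-way table in Question 1 can be sketched in a few lines of Python. This is only an illustration: the records below are hypothetical stand-ins for the survey file, and the function name `two_way_table` is an assumption, not part of the assignment.

```python
from collections import Counter

# Hypothetical (sex, smoke) records standing in for the student survey file.
records = [
    ("F", "No"), ("F", "No"), ("F", "Yes"), ("M", "No"),
    ("M", "Yes"), ("M", "No"), ("F", "No"), ("M", "No"),
]

def two_way_table(pairs):
    """Return a nested dict of cell counts with row and column totals."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    table = {r: {c: counts[(r, c)] for c in cols} for r in rows}
    for r in rows:
        table[r]["Total"] = sum(table[r][c] for c in cols)
    table["Total"] = {c: sum(table[r][c] for r in rows) for c in cols + ["Total"]}
    return table

table = two_way_table(records)
for row_name, row in table.items():
    print(row_name, row)
```

The cell counts are what a side-by-side or stacked bar diagram would display, one bar per cell.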

Question 2
Create a five-number summary of SAT score along with its mean, and construct a histogram. Comment on the shape of the distribution of SAT score. Also write a summary of the five-number summary. (10 points)

Instructions

  • Step 1. Open Statkey and click on one quantitative variable.
  • Step 2. Click on upload data. Select SAT from the data. Hit OK. You will be able to create a histogram and a five-number summary.
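As a rough cross-check of the Statkey output, a five-number summary and mean can be computed as sketched below. The SAT scores are hypothetical, and the quartiles use the median-of-halves convention, which is an assumption; software packages use several quartile rules, so Statkey's values may differ slightly.

```python
import statistics

def five_number_summary(values):
    """Min, Q1, median, Q3, max using the median-of-halves convention."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # For odd n the middle value is left out of both halves.
    upper = data[half + 1:] if n % 2 else data[half:]
    return (data[0], statistics.median(lower), statistics.median(data),
            statistics.median(upper), data[-1])

# Hypothetical SAT scores, not the survey data.
sat = [980, 1050, 1100, 1120, 1150, 1200, 1230, 1300, 1450]
summary = five_number_summary(sat)
sat_mean = statistics.mean(sat)
print("five-number summary:", summary)
print("mean:", round(sat_mean, 1))
```

A mean noticeably above the median, as in this made-up sample, is one hint of a right-skewed distribution; the histogram should confirm it.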

 

Question 3

Create a five-number summary of GPA by gender. Create the side by side box plot of GPA score by gender. Comment on your results. (10 points)

Instruction

  • Step 1. Open Statkey and click on one quantitative and one categorical variable.
  • Step 2. Click on upload data. Select GPA and SEX from the data. Hit OK. You will be able to create a five-number summary and a side-by-side box plot.

Question 4
Test whether the student survey data provide evidence of a difference in GPA between male and female students. (10 points)

Instruction

  • Step 1. Open Statkey and click on test for difference between means (under Randomization Hypothesis Test).
  • Step 2. Click on upload data. Select GPA and SEX from the data. Hit OK. You will be able to find the test statistic and p-value.
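The idea behind a randomization test for a difference in means can be sketched as follows. This is a minimal illustration under stated assumptions, not Statkey's implementation: the GPA values are hypothetical, and the function name, number of shuffles, and seed are arbitrary choices.

```python
import random

def mean(values):
    return sum(values) / len(values)

def randomization_test(group_a, group_b, n_shuffles=2000, seed=0):
    """Two-sided randomization test for a difference in means."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        # Count shuffles at least as extreme as the observed difference.
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    return observed, extreme / n_shuffles

# Hypothetical GPA samples, not the survey data.
gpa_female = [3.4, 3.1, 3.6, 3.0, 3.3, 3.5]
gpa_male = [3.0, 2.9, 3.2, 3.1, 2.8, 3.3]
observed_diff, p_value = randomization_test(gpa_female, gpa_male)
print("observed difference:", round(observed_diff, 3))
print("approximate two-sided p-value:", p_value)
```

For a one-sided question, such as whether one group's mean is higher, the comparison inside the loop would use `diff >= observed` instead of absolute values.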

Question 5
Test whether the student survey data provide evidence that non-smokers’ GPA is higher than smokers’ GPA. (10 points)

Instruction

  • Step 1. Open Statkey and click on test for difference between means (under Randomization Hypothesis Test).
  • Step 2. Click on upload data. Select GPA and SMOKE from the data. Hit OK. You will be able to find the test statistic and p-value.

Question 6

Write a summary of your findings from Questions 1 through 5 in one paragraph (don’t include any numbers). (10 points)

 

Explain how these statistics inform the conclusions in the article. Describe any other conclusions you can draw from these statistics that are not captured in the article.

Statistics Question

Overview

Biostatisticians employ various methods to collect client data. For example, various organizations collect a large amount of public health data that they can analyze and interpret to understand health outcomes and other important client information. A statistically significant clinical study includes statistical summaries of the underlying data and describes how these statistics inform the study’s conclusions. You should also be able to interpret these statistics to understand and evaluate the study and verify their conclusions.

Prompt

Choose an article to review from the list in the Module Two Short Paper Journal Articles document. Then evaluate how the data and summary statistics were used by the authors in your chosen article. Provide specific evidence from at least two recent scholarly sources to support your claims.

(I chose this article: Prevalence and correlates of adolescents’ e-cigarette use frequency and dependence, https://www-sciencedirect-com.ezproxy.snhu.edu/science/article/pii/S0376871618302631?via%3Dihub)

Specifically, you must address the following rubric criteria:

  1. Data Collection Method: Identify the data collection method used in your chosen article. Include the following details in your response:
    1. Explain why you think the method used is best suited for the purpose of the research.
  2. Statistical Summary: Interpret the data in the statistical summary table provided in the chosen article. Include the following details in your response:
    1. Select two variables and interpret their mean and standard deviation. What do they tell you about the data?
    2. Select two other variables and interpret their frequency and percentage. What do they tell you about the data?
  3. Significance: Describe the significance of the statistical summary. Include the following details in your response:
    1. Explain how these statistics inform the conclusions in the article.
    2. Describe any other conclusions you can draw from these statistics that are not captured in the article.

Analyze the resulting series using sample autocorrelations, sample partial autocorrelations, etc. to understand the correlation structure of the model.

Fitting several possible time series models to a given dataset

The goal of this project is to fit several possible time series models to a given dataset, select the preferable one using model diagnostic procedures, and then perform a forecasting task.

  • Remember that the analysis of the dataset starts with a time plot and, possibly, some scatterplots. Based on them, you may want to consider a deterministic trend model with a time series error process or a straightforward time series model with a possibly constant mean.

  • Your next stage should consist of looking at whether the (possibly detrended) data process seems stationary or not. If not, consider applying appropriate transformations to the data to make it stationary.

  • Next, you will need to analyze the resulting series using sample autocorrelations, sample partial autocorrelations, etc. to understand the correlation structure of the model. At this stage, you should be able to pick a few possible ARIMA(p,d,q) model candidates.

  • For all resulting candidate models, perform appropriate diagnostic procedures. Remember that you always have to start with the residual analysis. If necessary, you may also consider overfitting as a second diagnostic tool.

  • The final step of your analysis will be forecasting. Please remember to include a plot of the original series with the addition of forecasted values and prediction limits. Include other plots from the model selection and diagnostic stage on an as-needed basis: enough to illustrate your thought flow but not an overwhelming flood of unnecessary plots.

You will be using the dataset called robot, which is included with the TSA package. It consists of 324 observations. The measurements are expressed as deviations from a target position. The robot is put through this planned set of exercises in the hope that its behavior is repeatable and, therefore, predictable. Forecast five values ahead and obtain 95% forecast limits.
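To make the stationarity and autocorrelation steps concrete, here is a minimal Python sketch of the sample autocorrelation function and first differencing. It does not use the robot data or the TSA package; the series below is a hypothetical trend-plus-oscillation example, and the function names are assumptions chosen for illustration.

```python
def sample_acf(series, max_lag):
    """Sample autocorrelations r_0..r_max_lag (r_0 is always 1)."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(max_lag + 1)]

def difference(series, lag=1):
    """Differenced series x_t - x_(t-lag), one common way to remove a trend."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

# Hypothetical series: linear trend plus an alternating component.
series = [0.5 * t + (-1) ** t for t in range(20)]
acf_raw = sample_acf(series, 5)
acf_diff = sample_acf(difference(series), 5)
print("raw ACF:        ", [round(r, 2) for r in acf_raw])
print("differenced ACF:", [round(r, 2) for r in acf_diff])
```

A slowly decaying positive ACF in the raw series is a typical sign of non-stationarity; after differencing, the remaining pattern is what guides the choice of p and q.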

Collect and compare quantitative data from two populations. Use statistical methods to determine if there is a difference between the means of the two populations.

Math 109 Project Part 2

Description: For this semester-long project, you will collect and compare quantitative data from two populations. You will use statistical methods to determine if there is a difference between the means of the two populations.

You will present your results and conclusions for this second part of project in an essay format that is at least 800 words in length. Do NOT include definitions or state how to calculate results in your project. State your findings and conclusions in your own words.

Discussion of Topic (Total: 4pts)
Give a clear and concise overview of the topic chosen for the project: This should be basically the same as the Discussion of Topic from Part 1. Introduce your topic and why it was interesting to you. You can copy and paste your introduction from Part 1 of the project; just make sure that you didn’t lose any points for it the first time around. This section should be at most 1 short paragraph.

Discussion of Part 1 (Total: 5pts)
Summarize methods from part 1: What were some of the methods you used in part one to compare shape, center, and spread? Comment briefly about your visual comparison of the samples.

Summarize conclusions from part 1: What was the answer to your main overarching question that you came to by the end of part one? Did you decide the populations were likely about the same, or did you decide that the two populations were different?

Describe how confident you were in your conclusions from part 1: Do you feel like your samples led you to accurate conclusions? What are some things you know about your samples and this project that you feel make your data good, and what are some things you know that make your data unreliable?

This section should be at most 1 paragraph.

Discussion of Hypothesis Statement (Total: 4pts)
State the null and alternative hypotheses in symbol form: Provide the symbol form of your null and alternative hypotheses for the two-tailed test of whether there is a difference between the population means.

State the null and alternative hypothesis in sentence form: Interpret the symbol form of the hypotheses in the context of the project. Use specifics about your specific data and populations.

Discussion of Hypothesis Test Results (Total: 12pts)

Provide a clear analysis of the hypothesis test results: What does it mean to run a ‘two-tailed’ test? What is the test statistic? What is the P-value?
State your conclusion and support the decision: Given your test results, do you reject or fail to reject the null hypothesis? How did you (mathematically) come to that conclusion?

Discuss what the conclusion means for the two populations: Interpret your conclusion in the context of the project. Use specifics about your populations. Based on your conclusion, what is the answer to the main overarching question? Are your population means the same or different?

Discuss if the results of the hypothesis test match the previous conclusions from part 1 of the project: Go back to your conclusions from part 1. Is it the same conclusion? It is okay if it is different. Just compare the two conclusions. If the conclusions are different, which is more reliable?

Discussion of Confidence Interval (Total: 12pts)
Discuss the confidence interval: What is the confidence interval? State the plausible range of differences between the two populations.

Interpret the confidence interval: What does your confidence interval represent in the context of the project? Are the population means the same or different? Is one likely larger than the other? How do you know?

Discuss if the values make sense: Do these values seem reasonable to be the differences between your population means? Does the interval seem too wide or narrow to be reasonable?

Comparison of Hypothesis Test and CI (Total: 6pts)
Compare the results of the Hypothesis Test and Confidence Interval: You had to answer the main question “Are the population means the same or different?” based on the hypothesis test and also based on your confidence interval. Did you come to the same conclusion?
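The link between a two-tailed test and a confidence interval for the difference in means can be illustrated with a short sketch. The samples are hypothetical, and the normal (z) approximation is an assumption made only because Python's standard library has no t distribution; the course method most likely uses a t-based procedure, which gives slightly wider intervals for small samples.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

def compare_means(sample_a, sample_b, confidence=0.95):
    """Large-sample z test and confidence interval for a difference in means."""
    diff = mean(sample_a) - mean(sample_b)
    se = sqrt(variance(sample_a) / len(sample_a)
              + variance(sample_b) / len(sample_b))
    z = diff / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-tailed
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    return {"diff": diff, "z": z, "p_value": p_value,
            "ci": (diff - z_crit * se, diff + z_crit * se)}

# Hypothetical samples from two populations.
sample_a = [12, 15, 14, 10, 13, 16, 11, 14, 15, 13]
sample_b = [9, 11, 10, 12, 8, 10, 11, 9, 10, 12]
result = compare_means(sample_a, sample_b)
ci_low, ci_high = result["ci"]
rejects_null = result["p_value"] < 0.05
ci_excludes_zero = not (ci_low <= 0 <= ci_high)
print(result)
print("reject H0:", rejects_null, "| CI excludes 0:", ci_excludes_zero)
```

With matching significance and confidence levels (5% and 95% here), rejecting the null hypothesis and the interval excluding zero are the same decision, which is why the two conclusions are expected to agree.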

Discussion of Expectations (Total: 4pts)

Briefly discuss what the expectations were at the beginning of the project: When you chose your topic, what did you think the result would be?
Discuss how the analysis has supported or changed any initial thoughts: Did your results from part 1 support your original expectation? Did your results from part 2 support your original expectation? Now that you have completed the analysis, was your original expectation right or wrong?

Discussion of Possible Changes (Total: 4pts)
Clearly discuss any changes that could be made if the project was to be redone: Imagine that you are a real statistician. What would you change about your project? Think about the things you know about your project topic and data collection that make your analysis accurate or inaccurate. How would you improve on those methods, what mistakes would you want to correct?


If the response variable is (sorted by treatment and block), analyze the results. Is there a significant main effect at treatment group? Is there a significant block effect?

Stat 490 Group project

Due 04/26

In experimental design, designing an experiment is as important as analyzing the results. Through the semester we have learned 4 ways to design an experiment:

  1. Design an experiment that is BIBD.
  2. Design an experiment that is an optimal design.

For these two types of designs you need to use the R language to help you, except in simple cases. Even for the same input, you may end up with different designs if you rerun the code. For example, the runs in the same treatment group may end up in different blocks. Or you may end up with a different subset of the total runs that is still optimal.

  3. Design a blocked experiment with confounding structure.
  4. Design a fractional experiment with alias structure.

For these two types of designs you can use either R or Excel to help you design the experiment. With the same confounding structure or the same alias structure, you will end up with the same design.

In this group project, you are asked to design 4 experiments and analyze the results. There is an excel file called “group_response.csv” on canvas, with two variables  and  in it. These will be the response variables for your analysis below.

Experiment 1

  1. Design a BIBD that has 6 treatment groups and 10 blocks. What is the number of runs in each treatment group? What is the number of runs in each block? What is the value of ?
  2. If the response variable is (sorted by treatment and block), analyze the results. Is there a significant main effect at treatment group? Is there a significant block effect?
  3. Analyze the residual to check whether there are any potential concerns about the validity of the assumptions.
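Before constructing a BIBD it helps to check which block sizes are even possible. The sketch below enumerates parameter sets that satisfy the two standard counting conditions, b·k = v·r and λ·(v − 1) = r·(k − 1), where v is the number of treatments, b the number of blocks, r the runs per treatment, k the runs per block, and λ the number of blocks in which each pair of treatments appears together. The function name is an assumption, and these conditions are necessary but not sufficient, so a design still has to be constructed (for example in R).

```python
def bibd_parameters(v, b):
    """All (r, k, lam) with b*k == v*r and lam*(v-1) == r*(k-1), 1 < k < v."""
    solutions = []
    for k in range(2, v):
        if (b * k) % v:
            continue  # r would not be an integer
        r = b * k // v
        if (r * (k - 1)) % (v - 1):
            continue  # lambda would not be an integer
        solutions.append((r, k, r * (k - 1) // (v - 1)))
    return solutions

candidates = bibd_parameters(v=6, b=10)
for r, k, lam in candidates:
    print(f"r = {r} runs per treatment, k = {k} runs per block, lambda = {lam}")
```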

Experiment 2

  1. Suppose that factor A has 3 levels, and factors B and C each have 2 levels. Assuming you only have the budget for 30 runs, design a D-optimal experiment with these 3 factors such that a model with all first-order terms and the second-order term for A can be estimated, and there are 3 replicates in each treatment combination.
  2. If the response variable again is (sorted by treatment A, B, C), analyze the results. Is there a significant main effect at factor A, B or C?
  3. Produce appropriate interaction plots and explain whether each interaction is significant or not. Develop the final regression model and produce the relevant contour plot from your final model.
  4. Analyze the residual to check whether there are any potential concerns about the validity of the assumptions.

Experiment 3

  1. Design a blocked experiment. Choose two third-order or higher-order interactions to be confounded with the blocks. To choose the confounded interaction terms, use the letters from every member’s name. Pick one distinct letter (A-E) from each person’s name and form the interaction.
  2. If the response variable is (sorted by the order from the output from conf.design() function, that is the order of Blocks, E, D, C, B, A), analyze the results. Identify the significant factors and develop your model.
    1. Be careful that when you run Yates analysis the data should be in standard order, while the output from conf.design() function is not in standard order.
  3. Is there a significant block effect?
  4. Write down the complete confounding structure. Confirm the confounding structure using SS.
  5. Analyze the residual to check whether there are any potential concerns about the validity of the assumptions. Analyze dispersion effect if there is any.

Experiment 4

  1. Design a To choose the generators for the design, use the names from every member’s name. Pick one distinct letter (A-G) from each person’s name and form the generator.
  2. If the response variable again is , analyze the results. Identify the potential significant factors and develop your model.
    1. Use the same order as your data in experiment 3, that is you can just add two more factors to your experiment 3 data using the generators you have
    2. Be careful that conf.design() function is based on 0/1 coding, while defining relation is based on -1/1 coding.
  3. Write down the alias structure for the main effects and two-factor interactions (ignore higher-order interactions) and confirm the resolution of the design.
  4. Analyze the residual to check whether there are any potential concerns about the validity of the assumptions. Analyze dispersion effect if there is any.
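The warning about 0/1 versus -1/1 coding can be made concrete with a small sketch. It says nothing about the actual output of conf.design(); it only shows the arithmetic relation between a mod-2 sum on the 0/1 scale and a product on the -1/1 scale. The generators used here (a three-letter word and a two-letter word) are hypothetical, not the ones your group will pick.

```python
from itertools import product

def to_pm1(level01):
    """Map 0/1 coding to -1/+1 coding."""
    return 2 * level01 - 1

runs = []
for a, b, c in product([0, 1], repeat=3):
    A, B, C = to_pm1(a), to_pm1(b), to_pm1(c)
    d01 = (a + b + c) % 2  # hypothetical three-letter generator on the 0/1 scale
    e01 = (a + b) % 2      # hypothetical two-letter generator on the 0/1 scale
    runs.append({"A": A, "B": B, "C": C,
                 "D": to_pm1(d01), "E": to_pm1(e01)})

# A word with an odd number of letters keeps its sign after recoding ...
odd_word_matches = all(r["D"] == r["A"] * r["B"] * r["C"] for r in runs)
# ... while a word with an even number of letters flips sign.
even_word_flips = all(r["E"] == -r["A"] * r["B"] for r in runs)
print("D equals +ABC in every run:", odd_word_matches)
print("E equals -AB in every run: ", even_word_flips)
```

So a column built as a mod-2 sum of an even number of 0/1 factors corresponds to the negative of the product in -1/1 coding, which changes the sign in the defining relation.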

What estimator and statistical test should you use to test if survival curves of two groups are significantly different from each other? How many dummy variables would you need to include in the regression analysis to represent a variable with k categories?

Easy stats questions

1. What is the VIF test used for?

2. When should we use a Poisson regression?

3. When should we use a mixed-effects model?

4. What estimator and statistical test should you use to test if survival curves of two groups are significantly different from each other?

5. Explain what an interaction is.

6. Explain why a test with 100% specificity might not be a perfect test (or might not even be a good test).

7. Give an example of an interval-censored observation in survival analysis.

8. How many dummy variables would you need to include in the regression analysis to represent a variable with k categories?

Write 2-3 sentences for each. Do NOT use outside sources.

What type of statistical procedure is this? Describe the findings, including interpretation of all values in the column for parameter estimates and whether or not they are significant.

5 questions

1. A study looked at whether pack years of lifetime smoking (smokepy) can predict the level of C-reactive protein (CRP), an inflammatory marker, after controlling for age, socioeconomic status (SES) and education. SES and education were numerical variables; CRP was normally distributed. The results of analyses are shown below.
Significance criterion is set at p<0.05.

R-squared = .43
Adjusted R-squared = .41

Parameter Estimates
Variable     DF   Estimate   Standard Error   t value   p-value
Intercept    1    169.39     7.92             21.39     <.001
Smokepy      1    -0.38      0.05             -7.49     <.001
Age          1    0.03       0.04             0.77      .44
SES          1    1.41       0.67             2.1       .04
Education    1    -1.66      0.87             -1.91     .06

A) What type of statistical procedure is this?
B) Describe the findings, including interpretation of all values in the column for parameter estimates and whether or not they are significant.
C) After controlling for model complexity (i.e. number of independent variables), what is the proportion of variability in CRP explained by this model?
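When reading a parameter-estimates table like the one in this question, it can help to recall that each t value is the estimate divided by its standard error. The sketch below recomputes the t values from the rounded figures in the table (so small discrepancies are expected) and shows the usual adjusted R-squared formula. The sample size n = 120 is purely hypothetical, since the question does not report n.

```python
# (estimate, standard error, reported t value) as printed in the question.
estimates = {
    "Intercept": (169.39, 7.92, 21.39),
    "Smokepy": (-0.38, 0.05, -7.49),
    "Age": (0.03, 0.04, 0.77),
    "SES": (1.41, 0.67, 2.1),
    "Education": (-1.66, 0.87, -1.91),
}

# t = estimate / standard error; inputs are rounded, so matches are approximate.
t_values = {name: est / se for name, (est, se, _) in estimates.items()}
for name, (_, _, reported) in estimates.items():
    print(f"{name:10s} recomputed t = {t_values[name]:6.2f}, reported t = {reported}")

def adjusted_r_squared(r_squared, n, p):
    """Adjusted R^2 for a model with p predictors fitted to n observations."""
    return 1 - (1 - r_squared) * (n - 1) / (n - p - 1)

# n = 120 is a hypothetical sample size used only to illustrate the formula.
adj_example = adjusted_r_squared(0.43, n=120, p=4)
print("adjusted R-squared with hypothetical n = 120:", round(adj_example, 2))
```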

2. An investigator conducted a study to find the relationship between the number of decayed, missing, or filled teeth (DMFT) and sugar consumption. The investigator produced an estimate for the correlation coefficient and provided the following statement: “The correlation between DMFT and sugar consumption is 0.7. There is a strong correlation between DMFT and sugar consumption. Therefore, it is recommended that patients be advised to reduce sugar consumption to prevent tooth decay.”
State why you are or you are not confident about this investigator’s conclusion. In other words, explain if something is missing from this investigator’s analysis, or if all you need is provided.

3. You are conducting a study to analyze gender differences in neurocognitive impairment (NCI) within a sample of cocaine-dependent methadone-maintained patients. You found 3 demographic characteristics that produced significant effects on NCI. They are gender, race, and age.

A) What statistical analysis would you use to see simultaneously the contributions of socio-demographic variables (gender (male/female), race (White, Black, Latino, Asian), and age (in years) on self-reported NCI, a normal continuous outcome (higher scores indicate higher NCI)?
B) How many independent variables will there be in your model? Describe (1) what they are, (2) how you would create them, and (3) interpretation for each coefficient.
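Creating indicator (dummy) variables for a categorical predictor such as race can be sketched as below. A variable with k categories needs k - 1 indicators, with the omitted category acting as the reference. The choice of "White" as the reference and the function name `dummy_code` are assumptions for illustration; the sample values are hypothetical.

```python
def dummy_code(values, reference):
    """Return the non-reference levels and one 0/1 indicator row per value."""
    levels = [lv for lv in sorted(set(values)) if lv != reference]
    rows = [[1 if v == lv else 0 for lv in levels] for v in values]
    return levels, rows

# Hypothetical race values for five patients.
race = ["White", "Black", "Latino", "Asian", "Black"]
levels, rows = dummy_code(race, reference="White")
print("indicator columns:", levels)
for value, row in zip(race, rows):
    print(f"{value:7s} -> {row}")
```

Each indicator's coefficient is then read as the difference in the mean outcome between that category and the reference category, holding the other predictors fixed.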

4. We ran an inference test to study if gender (0=female; 1=male) is associated with a diagnosis of Type 2 Diabetes Mellitus (t2dm: 0=absent; 1=present) in a group of patients, controlling for age. The results table is shown below:

Analysis of Maximum Likelihood Estimates

Parameter    Estimate   Standard Error   Test statistic   p-value   Exp(B)
Intercept    -12.77     1.9759           41.8176          <.0001    -
Gender       0.41       0.124            10.9799          0.0009    1.5
Age          0.0948     0.0305           9.6883           0.0019    1.09

A) What type of model is this, and why is this type of analysis appropriate in this case?
B) Describe the finding: is gender associated with diagnosis of t2dm, why or why not (provide the test statistic and p-value)? Interpret the coefficient for gender and age.
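The Exp(B) column in this kind of output is the exponentiated coefficient, which is read as an odds ratio. The sketch below recomputes it from the printed estimates. It also squares estimate divided by standard error, which appears consistent with the printed test statistics being Wald chi-square values; that reading is an assumption, and small differences come from rounding in the table.

```python
from math import exp

# (estimate, standard error, reported test statistic, reported Exp(B)).
coefficients = {
    "Gender": (0.41, 0.124, 10.9799, 1.5),
    "Age": (0.0948, 0.0305, 9.6883, 1.09),
}

odds_ratios = {}
wald_stats = {}
for name, (est, se, stat, exp_b) in coefficients.items():
    odds_ratios[name] = exp(est)       # odds ratio per one-unit increase
    wald_stats[name] = (est / se) ** 2  # assumed Wald chi-square form
    print(f"{name}: exp(B) = {odds_ratios[name]:.3f} (reported {exp_b}), "
          f"(B/SE)^2 = {wald_stats[name]:.2f} (reported {stat})")
```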

5. In 1998, there was a major ice storm in Maine. Researchers wanted to know whether there was an association between generator location (inside or outside) and CO poisoning after an ice storm. Results from their case-control study are summarized in the table below (cases are observations that have experienced the CO poisoning, controls are observations that have not experienced the CO poisoning):

(A) What type of table is this?
(B) Name at least 2 tests you can perform to investigate the association between generator location and CO poisoning after an ice storm.
(C) Calculate the odds ratio and risk ratio based on this table. Which one is more appropriate for this type of study design?
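The odds ratio and risk ratio calculations can be sketched for a generic 2x2 table. The counts below are hypothetical and are not the study's data, since the table itself is not reproduced here; the function names are also assumptions.

```python
def odds_ratio(exposed_cases, exposed_controls, unexposed_cases, unexposed_controls):
    """Cross-product ratio (a*d) / (b*c) for a 2x2 table."""
    return (exposed_cases * unexposed_controls) / (exposed_controls * unexposed_cases)

def risk_ratio(exposed_cases, exposed_controls, unexposed_cases, unexposed_controls):
    """Proportion of cases among the exposed divided by that among the unexposed."""
    risk_exposed = exposed_cases / (exposed_cases + exposed_controls)
    risk_unexposed = unexposed_cases / (unexposed_cases + unexposed_controls)
    return risk_exposed / risk_unexposed

# Hypothetical counts: generator inside = exposed, outside = unexposed.
counts = dict(exposed_cases=30, exposed_controls=10,
              unexposed_cases=20, unexposed_controls=40)
or_value = odds_ratio(**counts)
rr_value = risk_ratio(**counts)
print("odds ratio:", or_value)
print("risk ratio:", round(rr_value, 2))
```

In a case-control study the numbers of cases and controls are fixed by the investigator, so the row proportions used in the risk ratio do not estimate true risks; this is the point part (C) is asking about.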

Use the compound interest calculator to determine what you need to do to build a substantial nest egg.

Compound interest simulation

Use the compound interest calculator (Compound Interest Calculator | Investor.gov) to determine what you need to do to build a substantial nest egg and how much your money can grow using the power of compound interest. For this discussion forum, pick a money goal, set 3 timelines, and decide what you will invest in to get there (e.g., stocks, ETFs). Present in a PPT format.
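The growth behind a compound interest calculator can be approximated with a short loop. This is a sketch under stated assumptions, not the Investor.gov calculator's exact method: contributions are added at the end of each period, and the starting amount, monthly contribution, 6% annual return, and three timelines are all hypothetical.

```python
def future_value(principal, contribution, annual_rate, years, periods_per_year=12):
    """Balance after compounding each period and adding an end-of-period contribution."""
    balance = principal
    rate = annual_rate / periods_per_year
    for _ in range(years * periods_per_year):
        balance = balance * (1 + rate) + contribution
    return balance

# Hypothetical plan: 5,000 to start, 500 per month, 6% assumed annual return.
timelines = [10, 20, 30]
projections = {years: future_value(5000, 500, 0.06, years) for years in timelines}
for years, value in projections.items():
    print(f"{years} years: {value:,.0f}")
```

Comparing the three timelines shows how much of the final balance comes from time in the market rather than from the contributions themselves.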

State the conditions under which the moving average method can be recommended for trend analysis. How will you determine the period of the moving average? Calculate the 4-yearly moving average of the following data relating to sales in a departmental store.

MST 4104 – Time Series Analysis

Assignment 1 (Total Marks = 10)

Question 1

(a) State the conditions under which the moving average method can be recommended for trend analysis. How will you determine the period of the moving average? Calculate the 4-yearly moving average of the following data relating to sales in a departmental store:

  • Year   2000  2001  2002  2003  2004  2005  2006  2007
  • Sales   960   976   974   996  1024  1040  1688  1128
  • Year   2008  2009  2010  2011  2012  2013  2014  2015
  • Sales  1144  1120  1140  1168  1196  1212  1200  1180

(b) Plot a time series graph for the above data.

(c) Plot the corresponding trend line using the method of moving average.
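The 4-yearly moving average for the sales data can be sketched as follows. Because the period is even, each 4-year average falls between two years, so the sketch also centers the averages by averaging adjacent pairs; centering is a common textbook step and is assumed here rather than stated in the question. The figures are used exactly as printed above.

```python
years = list(range(2000, 2016))
sales = [960, 976, 974, 996, 1024, 1040, 1688, 1128,
         1144, 1120, 1140, 1168, 1196, 1212, 1200, 1180]

def moving_average(values, window):
    """Simple moving average over every run of `window` consecutive values."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

def centered_moving_average(values, window):
    """For an even window, average adjacent moving averages to line up with a year."""
    ma = moving_average(values, window)
    return [(ma[i] + ma[i + 1]) / 2 for i in range(len(ma) - 1)]

ma4 = moving_average(sales, 4)
trend = centered_moving_average(sales, 4)
trend_years = years[2:2 + len(trend)]  # first centered value belongs to 2002
for year, value in zip(trend_years, trend):
    print(year, round(value, 2))
```

The centered values are the trend line asked for in part (c); no trend values are available for the first two and last two years.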

Question 2

Explain what is meant by seasonal fluctuations of a time series. A company manufactures bicycles. Given the quarterly production figures of the company for the last 4 years, explain the procedure to compute seasonal indices by the ‘link relatives’ method. Use the link relatives method to compute seasonal indices from the recorded production figures given below:

YEAR   Q1    Q2    Q3    Q4
2016   420   414   502   365
2017   491   456   516   337
2018   463   365   478   310
2019   502   487   536   404
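One common textbook formulation of the link relatives method can be sketched as below: compute each quarter's value as a percentage of the previous quarter, average these link relatives by quarter, chain them starting from 100, spread the discrepancy in the second Q1 chain relative evenly across the quarters, and rescale so the indices average 100. Using the arithmetic mean of the link relatives (rather than the median) and the function name are assumptions; your course notes may specify a different variant.

```python
production = {
    2016: [420, 414, 502, 365],
    2017: [491, 456, 516, 337],
    2018: [463, 365, 478, 310],
    2019: [502, 487, 536, 404],
}

def link_relative_indices(table):
    """Seasonal indices by the link relatives method (mean of link relatives)."""
    years = sorted(table)
    periods = len(table[years[0]])
    series = [x for y in years for x in table[y]]
    # Link relative: each value as a percentage of the previous period's value.
    links = [[] for _ in range(periods)]
    for t in range(1, len(series)):
        links[t % periods].append(series[t] / series[t - 1] * 100)
    avg_link = [sum(v) / len(v) for v in links]
    # Chain relatives, taking the first period as 100.
    chain = [100.0]
    for q in range(1, periods):
        chain.append(avg_link[q] * chain[q - 1] / 100)
    # Second chain relative of the first period, and the per-period correction.
    second_first = avg_link[0] * chain[-1] / 100
    d = (second_first - 100) / periods
    corrected = [chain[q] - q * d for q in range(periods)]
    # Rescale so that the indices average 100.
    average = sum(corrected) / periods
    return [c / average * 100 for c in corrected]

indices = link_relative_indices(production)
for quarter, index in enumerate(indices, start=1):
    print(f"Q{quarter}: {index:.2f}")
```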

Conduct a one-way ANOVA to determine if the three groups differ significantly on the GRE-Q measure. Report and summarize the results in correct APA style. What conclusions can you draw from these results? Do the three groups differ significantly in quantitative reasoning skills, and if so, how? Is it justified to use GRE-Q as a covariate? Why?

CASE STUDY

Dr. Antonia Natalie, Professor of Psychology at Thoreau University, wants to determine if her optional weekly Zoom lectures are having an impact on students’ grades in a graduate-level Behavioral Statistics course.

During the semester the professor records attendance for each of her ten Zoom lectures. Based on student attendance the professor creates 3 equal student groups. Group 1 (n = 50) consists of students that did not attend any Zoom Lectures. Group 2 (n = 50) consists of students that attended one to five lectures. And Group 3 (n = 50) consists of students that attended six to ten Zoom lectures.

Concerned that students that attend more Zoom lectures may also have higher Quantitative Reasoning Skills, the professor also records GRE Quantitative scores (GRE-Q) for each student.

Your job is to conduct an ANCOVA to determine if the Zoom attendance groups have significantly different final grade averages (on a 100-point scale) in the Behavioral Statistics course, after controlling for their GRE-Q scores.

Conduct a one-way ANOVA to determine if the three groups differ significantly on the GRE-Q measure. Report and summarize the results in correct APA style. What conclusions can you draw from these results? Do the three groups differ significantly in quantitative reasoning skills, and if so, how? Is it justified to use GRE-Q as a covariate? Why?
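The arithmetic behind the one-way ANOVA F statistic that SPSS reports can be sketched as follows. The GRE-Q scores are hypothetical and far smaller than the n = 50 groups in the case study, and the sketch stops at the F ratio and degrees of freedom because Python's standard library has no F distribution for the p-value.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]
    # Between-groups sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-groups sum of squares: spread of scores around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical GRE-Q scores for the three Zoom attendance groups.
gre_q = [[150, 152, 154], [151, 153, 155], [152, 154, 156]]
f_stat, df_between, df_within = one_way_anova(gre_q)
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```

The F value and the two degrees of freedom are the pieces that go into the APA-style report, alongside the p-value and an effect size from the SPSS output.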

Using the attached SPSS data file, conduct the appropriate preliminary EDA to a) determine whether the data meet the assumptions for ANCOVA (i.e., test for skewness/kurtosis), b) test for homogeneity of regression, and c) test for homogeneity of variances.

Conduct an ANCOVA in which Statistics grade (StatGrade) is the DV, Zoom group (ZGroup) is the IV, and GRE_Q is the covariate (CV). If the main effect for ZGroup is significant, conduct Bonferroni post hoc mean comparison tests.

In your summary:

1: State the goals of the present study

2: Write the correct Null (H0) and Alternative (H1) hypotheses

3: Report the results for the one-way ANOVA in which ZGroup is the IV and GRE-Q is the DV.

4: Report and summarize the results for the tests for normality, homogeneity of regression slopes, and homogeneity of variances.

5: Report and summarize the results for the ANCOVA and post hoc mean comparison tests.

6: Be sure to include in your summary the conclusions and recommendations that can be drawn from the study.

7: Be sure to include the appropriate tables and figures. See my sample summary.

8: Model your summary after my sample summary.