Basketball survey

  • In Assignment #1, the basketball survey, why should you be dubious of the statistical results? Were you consistent with others in gathering the data, and what could be done to improve this sort of observational data gathering?

 

  • Regarding Assignment #2, Individual Sports Analysis, what were some limitations you encountered, and what other analytical ideas occurred to you while working on it?

 

  • If the probability of an athlete scoring a point is 23%, about how many attempts (sample size) should be made to validate this probability? (see “More Information” below)
    1. 36
    2. 42
    3. 80
    4. 63
  • LeBron James has a 75% probability of making a free throw. How many free throws would LeBron need to attempt to validate this success rate?
    1. 8
    2. 2
    3. 80
    4. 63

More information:

Determining the sample size of a coin-flip experiment depends on an acceptable standard error. What counts as acceptable is subjective and depends on the amount of risk associated with the decision. This can lead to lengthy calculations and discussions, because the required sample size depends on an individual’s acceptable standard error (for a coin flip, it can range from 100 to 40,000 flips).
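As a rough sketch, assuming the usual standard-error formula for a proportion, sqrt(p(1 - p)/n), with p = 0.5 for a fair coin (the text itself does not give a formula), the number of flips needed for a target standard error is n = p(1 - p)/SE². The function names below are illustrative only.

```python
import math

def standard_error(n, p=0.5):
    """Standard error of an observed proportion after n trials (p = 0.5 for a fair coin)."""
    return math.sqrt(p * (1 - p) / n)

def sample_size_for(se, p=0.5):
    """Flips needed to reach a target standard error: n = p(1 - p) / se^2."""
    # Subtract a tiny tolerance so floating-point noise cannot push ceil() up by one.
    return math.ceil(p * (1 - p) / se ** 2 - 1e-9)

for target in (0.05, 0.01, 0.0025):
    print(f"target standard error {target}: {sample_size_for(target)} flips")
```

Under this assumption, a standard error of 0.05 (5%) needs 100 flips and 0.0025 (0.25%) needs 40,000 flips, the two ends of the range quoted above.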

If a coin flip decides who buys a cup of coffee, there is no reason to calculate the appropriate number of flips. If the decision is to select an athlete based on a scoring probability, there is more to consider.

A quick rule-of-thumb calculation is two divided by the square root of the sample size. The spreadsheet formula for a sample of 25 is:

=2/Sqrt(25)

This equals .40, or 40%. If the probability of an athlete scoring is 30%, then a sample of 25 attempts is too small. A sample of 50 would be better: 2/sqrt(50) = .28, or 28%. As long as the rule-of-thumb value is less than the probability in question, the sample size is acceptable for a quick validation.
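The rule of thumb above can be scripted as follows. This is a minimal sketch: `rule_of_thumb`, `smallest_valid_sample`, and `pick_option` are hypothetical helper names, and `pick_option` assumes the intended quiz answer is the smallest listed sample size that passes the check.

```python
import math

def rule_of_thumb(n):
    """Quick check from the text: 2 divided by the square root of the sample size."""
    return 2 / math.sqrt(n)

def smallest_valid_sample(p):
    """Smallest n whose rule-of-thumb value is below the probability p."""
    n = 1
    while rule_of_thumb(n) >= p:
        n += 1
    return n

def pick_option(p, options):
    """Smallest listed sample size that passes the rule-of-thumb check (None if none pass)."""
    passing = [n for n in options if rule_of_thumb(n) < p]
    return min(passing) if passing else None

print(round(rule_of_thumb(25), 2))   # 0.4
print(round(rule_of_thumb(50), 2))   # 0.28
print(smallest_valid_sample(0.30))   # 45
print(pick_option(0.23, [36, 42, 80, 63]))  # 80
print(pick_option(0.75, [8, 2, 80, 63]))    # 8
```

For a 23% scoring probability the smallest passing sample is 76 attempts, so 80 is the only listed option that passes; for a 75% free-throw rate, 8 attempts already pass.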

 

  • Correlating a player’s points scored with games won provides insight into their contribution to winning games (presumably, the higher the correlation, the more effective the player). Suppose a decision requires recruiting one of two players, player A or player B, and each player’s correlation between points per game and wins is:

                      Player A    Player B
Correlation (r)         .67         .63
R-Squared (R2)          .41         .56

The best decision is to select:

  1. Player B, because although the correlation is lower, the higher R-Squared suggests a tighter fit and a closer relationship between the player’s scoring and wins
  2. Player A because of the higher correlation and the lower R-Squared
  3. It doesn’t matter, because the skill level of the players is about the same.
  • Consider the following scatterplot of the Gator Basketball game scores as of February 19, 2002 (with opponents’ scores as x and Florida’s scores as y). Which option best describes the relationship between Florida’s scores and their opponents’ scores?
    1. There is no real relationship between x and y in this scatterplot.
    2. There is a strong, positive linear relationship between x and y.
    3. There is a strong, negative linear relationship between x and y.
    4. There is a strong, curved relationship between x and y.

More information:

A correlation coefficient (r, or Pearson’s r) is a number between -1 and 1 that measures the strength and direction of the linear relationship between two sets of data, summarized by a best-fit line (trend line). A scatter plot displays this information in a graph using (x, y) coordinates.

The R-squared (R2), a number between 0 and 1, describes how closely the data points fit the best-fit line. Read it as the proportion of the variation in y that the line explains; for a straight-line fit, R2 is the square of r.
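To see how r and R2 are computed, here is a small sketch with made-up (points per game, wins) pairs; the data and function names are hypothetical. It calculates Pearson’s r directly and R2 from the least-squares best-fit line, and for this straight-line case the two agree: R2 equals r squared.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def r_squared_of_line(xs, ys):
    """R-squared of the least-squares best-fit line y = a + b*x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained variation
    ss_tot = sum((y - mean_y) ** 2 for y in ys)                    # total variation in y
    return 1 - ss_res / ss_tot

# Hypothetical, made-up pairs: points per game vs. wins
points = [12, 18, 9, 22, 15, 20, 11, 17]
wins = [3, 5, 2, 6, 4, 4, 3, 5]
r = pearson_r(points, wins)
r2 = r_squared_of_line(points, wins)
print(f"r = {r:.2f}, R2 = {r2:.2f}, r squared = {r * r:.2f}")
```

In a spreadsheet, the same two numbers come from a correlation formula and the R-squared shown on a linear trend line.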

Judging the pair of graphs below by r (the correlation coefficient) alone, graph “b”, with its higher r, may be preferred. When R2 is considered, graph “a” would be preferred because of the tighter relationship between the correlated numbers.

Graph “a”: r = .95, R2 = .94
Graph “b”: r = .98, R2 = .63

 

  • A manufacturer of balloons claims that p, the proportion of its balloons that burst when inflated to a diameter of up to 12 inches, is no more than 0.05. Customers complain that balloons are bursting more frequently.

If the customers want to conduct an experiment to test the manufacturer’s claim, which of the following hypotheses would be appropriate?

  1. H0: p ≠ 0.05, Ha: p = 0.05
  2. H0: p = 0.05, Ha: p > 0.05
  3. H0: p = 0.05, Ha: p ≠ 0.05
  4. H0: p = 0.05, Ha: p < 0.05
  5. H0: p < 0.05, Ha: p = 0.05
  • The manufacturer assumes that 5% will fail. This means:
    1. 1 standard deviation will not fail
    2. 2 standard deviations will not fail
    3. .5 standard deviation will not fail
    4. 3 standard deviations will not fail
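If one assumes the outcomes follow a normal distribution (the question does not say so), the 68-95-99.7 rule links percentages to standard deviations: about 95% of values fall within 2 standard deviations of the mean, leaving about 5% outside. The sketch below checks this with Python’s `statistics.NormalDist`; `share_within` is a hypothetical helper.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (0.5, 1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")

# z-value that leaves exactly 5% outside (2.5% in each tail)
print(f"two-sided 95% cutoff: {std_normal.inv_cdf(0.975):.2f} SD")
```

The exact two-sided cutoff for 95% is about 1.96 standard deviations, which the rule of thumb rounds to 2.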

More information:

A statistical hypothesis is an assumption about a population parameter (a measurable characteristic of a population such as a mean or a standard deviation).

Hypothesis testing uses sample statistics to decide whether the data give enough evidence to reject a given hypothesis. The statement assumed to be true is called the null hypothesis (notation H0), and the contradictory statement is called the alternative hypothesis (notation Ha or H1).

In this case, the manufacturer claims that no more than 5 percent of the balloons will burst when inflated to a diameter of up to 12 inches. This claim, stated as p = 0.05, is the null hypothesis (H0).

Customers have an alternative hypothesis (Ha): the proportion of balloons that burst when inflated to a diameter of up to 12 inches is more than 5 percent (p > 0.05).
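To make the balloon example concrete, here is a minimal sketch of a one-sided, one-proportion z-test of H0: p = 0.05 against Ha: p > 0.05. The sample (30 bursts out of 400 balloons) is invented for illustration, and the test relies on the normal approximation; a common guideline is that n × p0 should be at least about 10 (here 400 × 0.05 = 20). The function name is hypothetical.

```python
import math
from statistics import NormalDist

def one_proportion_z_test(successes, n, p0):
    """One-sided z-test of H0: p = p0 against Ha: p > p0 (normal approximation)."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)   # standard error assuming H0 is true
    z = (p_hat - p0) / se
    p_value = 1 - NormalDist().cdf(z)   # upper-tail area, since Ha is p > p0
    return z, p_value

# Hypothetical customer test: 30 of 400 balloons burst
z, p_value = one_proportion_z_test(successes=30, n=400, p0=0.05)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print("reject H0 at the 5% level" if p_value < 0.05 else "fail to reject H0 at the 5% level")
```

A small p-value (below the chosen significance level, commonly 0.05) is evidence against the manufacturer’s claim; a large one means the data do not contradict it.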

Hypothesis testing. Null vs alternative https://www.youtube.com/watch?v=ZzeXCKd5a18