Hypothesis Testing

Assignment 4
Michael Lewis
November 14, 2017

Part 1

Introduction: The objective of this assignment is to learn how to use hypothesis testing and apply it to real-world examples. Hypothesis testing is a statistical method for drawing conclusions about a population from a sample. It is most useful when collecting data from an entire population is unrealistic, such as all residents of San Francisco or all farms in Iowa. Instead, hypothesis testing lets you generalize about the population from a smaller, more manageable sample. We break it down into six steps:
1.       State the null hypothesis
2.       State the alternative hypothesis
3.       Choose a statistical test
4.       Choose the level of significance
5.       Calculate the statistic
6.       Make a decision about the null and alternative hypotheses

Each step can be described in more detail as follows:
1.       There is no significant difference between the hypothesized mean and the sample mean
2.       There is a significant difference between the hypothesized mean and the sample mean
3.       Choose between a z-test and a t-test based on the number of observations (a t-test for n < 30, a z-test for n > 30)
4.       Common confidence levels include 95%, 97.5% and 99%, which correspond to significance levels of 0.05, 0.025 and 0.01
5.       Calculate the test statistic using the chosen test
6.       Use your calculations to determine whether you reject or fail to reject the null hypothesis

It is important to be mindful of which test you select. While the z and t formulas are nearly identical, the tables of critical values used with each are different, and reading from the wrong table can change the outcome of your test. Once you've selected your test, the best way to choose a level of significance is to do a short literature review. What levels have others used for similar studies in the past? This piece of research is critical to your test.

Assignment: The assignment posed several questions involving different tests and scenarios. First, we were given a chart listing the interval type, confidence level and number of observations for a set of data points. From this information, we had to find the significance level, determine which test should be used, and look up the corresponding critical value. Figure 1 below is the completed chart.


     Figure 1: Completed chart of data points and respective values.
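
As a rough sketch of how the values in such a chart can be derived, the snippet below converts a confidence level into a significance level and a z critical value using only Python's standard library. The function names are hypothetical and the levels shown are generic examples, not the rows of Figure 1. Critical values for the t-test depend on the degrees of freedom and would come from a t-table or a statistics library instead.

```python
from statistics import NormalDist

def significance_level(confidence_level):
    """Significance level (alpha) is 1 minus the confidence level."""
    return 1.0 - confidence_level

def critical_z(confidence_level, two_tailed=True):
    """z critical value from the standard normal distribution.
    A two-tailed test splits alpha evenly between the two tails."""
    alpha = significance_level(confidence_level)
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1.0 - tail_area)

# Example levels only (not taken from the assignment chart)
for level in (0.90, 0.95, 0.99):
    print(level,
          round(significance_level(level), 2),
          round(critical_z(level, two_tailed=True), 3),
          round(critical_z(level, two_tailed=False), 3))
```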


Second, we were given a scenario in which we carried out our own hypothesis tests from a set of supplied values. For a district in Kenya, we were given the following estimated yields of groundnuts, cassava and beans.
-          Groundnuts = 0.55 metric tons
-          Cassava = 3.8 metric tons
-          Beans = 0.28 metric tons
A sample of 23 farmers was taken, which gave us the following results (in metric tons):

Crop          Mean    Std. Dev.
Groundnuts    0.51    0.3
Cassava       3.4     0.74
Beans         0.33    0.13

We then did a hypothesis test for each of the crops.

Groundnuts
1.       Null Hypothesis: There is no significant difference between the estimated yield of groundnuts and the sample yield of groundnuts.
2.       Alternative Hypothesis: There is a significant difference between the estimated yield of groundnuts and the sample yield of groundnuts.
3.       Statistical Test: A t-test was used because n < 30.
4.       Choose α: A 95% confidence level was given, so α = 0.05. Because this is a two-tailed test, α is split to 0.025 in each tail.
5.       Calculate the test: t = (0.51 - 0.55) / (0.3 / sqrt(23)), so t = -0.6394.
6.       With a t value of -0.6394, we fail to reject the null hypothesis, as it is within the range of -2.074 to 2.074 (the critical values for 22 degrees of freedom).

Additionally, the probability was found to be 0.246.

Cassava
1.       Null Hypothesis: There is no significant difference between the estimated yield of cassava and the sample yield of cassava.
2.       Alternative Hypothesis: There is a significant difference between the estimated yield of cassava and the sample yield of cassava.
3.       Statistical Test: A t-test was used because n < 30.
4.       Choose α: A 95% confidence level was given, so α = 0.05. Because this is a two-tailed test, α is split to 0.025 in each tail.
5.       Calculate the test: t = (3.4 - 3.8) / (0.74 / sqrt(23)), so t = -2.5923.
6.       With a t value of -2.5923, we reject the null hypothesis, as it is outside the range of -2.074 to 2.074 (the critical values for 22 degrees of freedom).

Additionally, the probability was found to be 0.00856.

Beans
1.       Null Hypothesis: There is no significant difference between the estimated yield of beans and the sample yield of beans.
2.       Alternative Hypothesis: There is a significant difference between the estimated yield of beans and the sample yield of beans.
3.       Statistical Test: A t-test was used because n < 30.
4.       Choose α: A 95% confidence level was given, so α = 0.05. Because this is a two-tailed test, α is split to 0.025 in each tail.
5.       Calculate the test: t = (0.33 - 0.28) / (0.13 / sqrt(23)), so t = 1.8446.
6.       With a t value of 1.8446, we fail to reject the null hypothesis, as it is within the range of -2.074 to 2.074 (the critical values for 22 degrees of freedom).

Additionally, the probability was found to be 0.95652.

It is important to note the similarities and differences among the three results. For example, the estimates for groundnuts and cassava were both higher than the sample means, yet the conclusions differed: we failed to reject the null hypothesis for groundnuts and rejected it for cassava. The t values for groundnuts and beans both fell inside the range of -2.074 to 2.074, which is why we failed to reject the null hypothesis for those crops. Cassava was the only crop whose t value fell outside the range, leading us to reject the null hypothesis.
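
The three crop tests can be reproduced with a short script. This is a sketch using the estimates and sample results listed above; the critical value of 2.074 is hard-coded from the t-table rather than computed, and the variable names are my own.

```python
import math

N_FARMERS = 23
T_CRITICAL = 2.074  # two-tailed, alpha = 0.05, 22 degrees of freedom (from the t-table)

# crop: (estimated yield, sample mean, sample standard deviation), in metric tons
crops = {
    "Groundnuts": (0.55, 0.51, 0.3),
    "Cassava": (3.8, 3.4, 0.74),
    "Beans": (0.28, 0.33, 0.13),
}

def t_statistic(estimate, sample_mean, std_dev, n):
    """t = (sample mean - estimate) / (std. dev. / sqrt(n))."""
    return (sample_mean - estimate) / (std_dev / math.sqrt(n))

results = {}
for crop, (estimate, sample_mean, std_dev) in crops.items():
    t = t_statistic(estimate, sample_mean, std_dev, N_FARMERS)
    reject = abs(t) > T_CRITICAL  # outside the range -2.074 to 2.074
    results[crop] = (t, reject)
    decision = "reject" if reject else "fail to reject"
    print(f"{crop}: t = {t:.4f}, {decision} the null hypothesis")
```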

A third scenario was proposed in which a stream is believed to be more polluted than the allowable limit. The data provided were: n = 17, a sample mean of 6.8, a hypothesized mean (the allowable limit) of 4.4, and a standard deviation of 4.2.

1.       Null Hypothesis: The sample pollutant level is not significantly greater than the allowable limit.
2.       Alternative Hypothesis: The sample pollutant level is significantly greater than the allowable limit.
3.       Statistical Test: A t-test was used because n < 30.
4.       Choose α: A 95% confidence level was given, so α = 0.05. Because the question is whether the stream exceeds the limit, this is a one-tailed test and α is not split.
5.       Calculate the test: t = (6.8 - 4.4) / (4.2 / sqrt(17)), so t = 2.3561.
6.       With a t value of 2.3561, we reject the null hypothesis, as it is over the critical value of 1.746 (one-tailed, 16 degrees of freedom).

The probability was found to be 0.0017.
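
For comparison with the two-tailed crop tests, here is a minimal sketch of the one-tailed stream test. It assumes that the scenario's 6.8 is the sample mean and 4.4 is the allowable limit, and it hard-codes the critical value of 1.746 from the t-table; the variable names are illustrative.

```python
import math

n = 17
sample_mean = 6.8       # assumed to be the sample mean pollutant level
allowable_limit = 4.4   # assumed to be the hypothesized mean (allowable limit)
std_dev = 4.2
T_CRITICAL_ONE_TAILED = 1.746  # alpha = 0.05, 16 degrees of freedom (from the t-table)

t = (sample_mean - allowable_limit) / (std_dev / math.sqrt(n))

# One-tailed test: only a t value above the critical value counts as evidence
# that the stream exceeds the limit, so no absolute value is taken.
reject_null = t > T_CRITICAL_ONE_TAILED

print(f"t = {t:.4f}, reject null: {reject_null}")
```

Because all of α sits in one tail, the critical value (1.746) is smaller than it would be for a two-tailed test at the same confidence level.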

Part 2

We were asked to compare the average values of homes by block group of the city of Eau Claire and Eau Claire County. By using the data provided by ArcMap, a comparison could be made. The following is the Hypothesis Test for this scenario:

1.       Null Hypothesis: There is no significant difference between the average home values in Eau Claire County and the sample average home values in the City of Eau Claire.
2.       Alternative Hypothesis: There is a significant difference between the average home values in Eau Claire County and the sample average home values in the City of Eau Claire.
3.       Statistical Test: A z-test was used because n > 30.
4.       Choose α: A level of 0.05 was selected.
5.       Calculate the test: z = (151876.5094 - 169438.1304) / (49706.91892 / sqrt(69)), so z = -2.9348.
6.       With a z value of -2.9348, we reject the null hypothesis, as it fell outside the range of -1.96 to 1.96.


     Figure 2: Average Home Value by Block Group in Eau Claire County
As you can see, the average home values in the city of Eau Claire are considerably lower than those in the county. While this is geographically evident in Figure 2, it is also supported by our hypothesis test. By rejecting the null hypothesis, we conclude that the difference between the city's block groups and the county's is statistically significant at the 0.05 level.
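
The home value test can be checked with Python's standard library, which provides the normal distribution needed for a z-test. This is a sketch under a few assumptions: that 151876.5094 is the mean for the city block groups, that 169438.1304 is the county mean, and that the standard deviation and n = 69 describe the city sample. The variable names are my own.

```python
import math
from statistics import NormalDist

city_mean = 151876.5094    # assumed: sample mean, City of Eau Claire block groups
county_mean = 169438.1304  # assumed: mean for Eau Claire County block groups
std_dev = 49706.91892      # assumed: standard deviation of the city sample
n = 69                     # assumed: number of city block groups
alpha = 0.05

z = (city_mean - county_mean) / (std_dev / math.sqrt(n))

# Two-tailed critical value from the standard normal distribution (about 1.96)
z_critical = NormalDist().inv_cdf(1 - alpha / 2)

# Two-tailed probability of a z value at least this extreme
p_value = 2 * NormalDist().cdf(-abs(z))

reject_null = abs(z) > z_critical
print(f"z = {z:.4f}, critical = {z_critical:.2f}, reject null: {reject_null}")
```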


Conclusion: Applying hypothesis testing to these scenarios shows that it is a valuable statistical tool. It allows the researcher to generalize about a population from a smaller sample. Choosing the appropriate test is critical, but it is not hard to do: by looking at the number of observations, you can quickly and confidently select the correct one. While our significance level was often given to us, in future work it can be determined through a literature review.
