Hypothesis Testing
Assignment 4
Michael Lewis
November 14, 2017
Part 1
Introduction:
The objective of this assignment is to learn how to use hypothesis testing and
apply it to real-world examples. Hypothesis testing is a statistical method for
drawing conclusions about a population from a sample. It is most useful when it
is unrealistic to collect data from an entire population, such as all residents
of San Francisco or every farm in Iowa. Instead, hypothesis testing lets you
generalize about the population from a smaller, more manageable sample. We
break it down into six steps:
1. State the null hypothesis
2. State the alternative hypothesis
3. Choose a statistical test
4. Choose the level of significance
5. Calculate the statistic
6. Make a decision about the null and alternative hypotheses
Each step is further broken down as follows:
1. There is no significant difference between the hypothesized (population) mean and the sample mean.
2. There is a significant difference between the hypothesized (population) mean and the sample mean.
3. Choose between a z-test and a t-test based on the number of observations.
4. Choose a significance level (α). Common choices include 0.05, 0.025 and 0.01, corresponding to 95%, 97.5% and 99% confidence.
5. Calculate the statistic using the chosen test.
6. Use your calculation to determine whether to reject or fail to reject the null hypothesis.
It is important to be mindful of which test you select. While their formulas
are nearly identical, each test is compared against a different table of
critical values (the t table also depends on the degrees of freedom, and it
differs most from the z table for small samples), so the wrong choice can make
or break your test. Once you've selected your test, the best way to choose your
level of significance is a simple literature review: what have others done in
the past? This piece of research is critical to your test.
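Written side by side, the two statistics show how similar the formulas are. In this sketch x̄ is the sample mean, μ₀ the hypothesized mean, σ the population standard deviation, s the sample standard deviation and n the number of observations. These symbols are introduced here for illustration. The t statistic is compared against a t table with n - 1 degrees of freedom, while z is compared against the standard normal table.

```latex
z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}},
\qquad
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, \quad \mathrm{df} = n - 1
```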
Assignment:
The assignment had multiple questions designed to get the reader involved and
to incorporate different tests and scenarios. First, we were given a chart
listing the interval type, confidence level and number of observations for a
set of data points. With this information, we had to find the significance
level, determine which test should be used, and find the corresponding critical
value. Figure 1 below shows the completed chart.
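The chart logic can be sketched in Python. This is a minimal sketch that assumes the n < 30 rule of thumb used in this assignment and n - 1 degrees of freedom for the t-test. The function names (`chart_row`, `t_ppf` and so on) are hypothetical. Python's standard library has no t distribution, so the t critical value is found numerically: Simpson's rule gives the CDF, and bisection inverts it. A statistics package such as SciPy (`scipy.stats.t.ppf`) or a printed t table should give matching values. The example calls reuse settings that appear later in this post (n = 23, n = 17 and n = 69), not the rows of Figure 1.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=1000):
    """P(T <= x): integrate the density from 0 to |x| with Simpson's rule (steps must be even)."""
    h = abs(x) / steps
    total = t_pdf(0.0, df) + t_pdf(abs(x), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # area under the density between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area  # symmetry about 0


def t_ppf(p, df):
    """Inverse of t_cdf, found by bisection on [-50, 50]."""
    lo, hi = -50.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def chart_row(tails, confidence, n):
    """Return (alpha, test name, critical value) for one row of the chart."""
    alpha = round(1 - confidence, 10)
    area = alpha / tails  # a two-tailed test splits alpha between the two tails
    if n < 30:  # rule of thumb used in this assignment
        return alpha, "t", t_ppf(1 - area, n - 1)
    return alpha, "z", NormalDist().inv_cdf(1 - area)


print(chart_row(2, 0.95, 23))  # two-tailed, 95% confidence, n = 23
print(chart_row(1, 0.95, 17))  # one-tailed, 95% confidence, n = 17
print(chart_row(2, 0.95, 69))  # two-tailed, 95% confidence, n = 69
```

The first two rows should match the t critical values used below (2.074 and 1.746), and the third should match the z value of 1.96.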
Second, we were given a scenario in which we performed our own hypothesis
tests on supplied values. For crop yields in a district in Kenya, we were given
the following estimated yields of groundnuts, cassava and beans:
- Groundnuts = 0.55 metric tons
- Cassava = 3.8 metric tons
- Beans = 0.28 metric tons
A sample of 23 farmers gave the following results:
| Crop       | Mean (metric tons) | Std. Dev. (metric tons) |
| Groundnuts | 0.51               | 0.3                     |
| Cassava    | 3.4                | 0.74                    |
| Beans      | 0.33               | 0.13                    |
We then did a hypothesis test for each of the crops.
Groundnuts
1. Null Hypothesis: There is no significant difference between the estimated yield of groundnuts and the sample yield of groundnuts.
2. Alternative Hypothesis: There is a significant difference between the estimated yield of groundnuts and the sample yield of groundnuts.
3. Statistical Test: a t-test was used because n < 30.
4. Choose α: α was given as 0.05 (95% confidence). It is a two-tailed test, so α is split between the two tails, 0.025 in each.
5. Calculate the test: t = (0.51 - 0.55) / (0.3 / sqrt(23)) = -0.6394
6. With a t value of -0.6394, we fail to reject the null hypothesis because it falls within the range -2.074 to 2.074. Additionally, the probability was found to be 0.246.
Cassava
1. Null Hypothesis: There is no significant difference between the estimated yield of cassava and the sample yield of cassava.
2. Alternative Hypothesis: There is a significant difference between the estimated yield of cassava and the sample yield of cassava.
3. Statistical Test: a t-test was used because n < 30.
4. Choose α: α was given as 0.05 (95% confidence). It is a two-tailed test, so α is split between the two tails, 0.025 in each.
5. Calculate the test: t = (3.4 - 3.8) / (0.74 / sqrt(23)) = -2.5923
6. With a t value of -2.5923, we reject the null hypothesis because it falls outside the range -2.074 to 2.074. Additionally, the probability was found to be 0.00856.
Beans
1. Null Hypothesis: There is no significant difference between the estimated yield of beans and the sample yield of beans.
2. Alternative Hypothesis: There is a significant difference between the estimated yield of beans and the sample yield of beans.
3. Statistical Test: a t-test was used because n < 30.
4. Choose α: α was given as 0.05 (95% confidence). It is a two-tailed test, so α is split between the two tails, 0.025 in each.
5. Calculate the test: t = (0.33 - 0.28) / (0.13 / sqrt(23)) = 1.8446
6. With a t value of 1.8446, we fail to reject the null hypothesis because it falls within the range -2.074 to 2.074. Additionally, the probability was found to be 0.95652.
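A short Python sketch can check the arithmetic in steps 5 and 6 for all three crops. It assumes the two-tailed critical value of 2.074 quoted above (α = 0.05, df = 22). The `crops` dictionary and the `t_statistic` function are illustrative names, not part of the assignment.

```python
import math

N = 23            # farmers sampled
CRITICAL = 2.074  # two-tailed t critical value, alpha = 0.05, df = 22 (quoted above)

# crop: (estimated yield, sample mean, sample standard deviation), metric tons
crops = {
    "Groundnuts": (0.55, 0.51, 0.3),
    "Cassava": (3.8, 3.4, 0.74),
    "Beans": (0.28, 0.33, 0.13),
}


def t_statistic(sample_mean, mu0, sd, n):
    """One-sample t statistic: difference of means divided by the standard error."""
    return (sample_mean - mu0) / (sd / math.sqrt(n))


results = {}
for crop, (estimate, mean, sd) in crops.items():
    t = t_statistic(mean, estimate, sd, N)
    reject = abs(t) > CRITICAL  # outside -2.074 to 2.074 means reject H0
    results[crop] = (t, reject)
    print(f"{crop}: t = {t:.4f} -> {'reject' if reject else 'fail to reject'} H0")
```

Running it prints one line per crop. Only cassava's t value falls outside the range -2.074 to 2.074.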
It is important to note the differences and similarities among the data sets.
For example, the estimates for groundnuts and cassava were both higher than the
sample means, yet the two tests reached different conclusions: the cassava test
rejected the null hypothesis, while the groundnuts test failed to reject it. The
t values for groundnuts and beans both fell inside the critical range of -2.074
to 2.074, which is why we failed to reject the null hypothesis for them. Cassava
was the only crop whose t value fell outside that range, and so the only one for
which we rejected the null hypothesis.
A third scenario was proposed in which a stream is believed to be more polluted
than the allowable limit. We were given the data needed: n = 17, a sample mean
(m) of 6.8, a hypothesized mean (mh, the allowable limit) of 4.4 and a standard
deviation of 4.2.
1. Null Hypothesis: The stream's mean pollutant level is not significantly higher than the allowable level.
2. Alternative Hypothesis: The stream's mean pollutant level is significantly higher than the allowable level.
3. Statistical Test: a t-test was used because n < 30.
4. Choose α: α was given as 0.05 (95% confidence). Because we only test whether the stream is more polluted than the limit, this is a one-tailed test.
5. Calculate the test: t = (6.8 - 4.4) / (4.2 / sqrt(17)) = 2.3561
6. With a t value of 2.3561, we reject the null hypothesis because it is above the critical value of 1.746. The probability was found to be 0.0017.
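The stream calculation can be checked the same way. This sketch assumes, as the calculation above implies, that m is the sample mean and mh is the allowable limit. It uses the critical value of 1.746 quoted in step 6. Unlike the crop tests, the comparison is one-sided.

```python
import math

n = 17
m = 6.8           # sample mean pollutant level
mh = 4.4          # hypothesized mean: the allowable limit
sd = 4.2          # standard deviation
CRITICAL = 1.746  # one-tailed t critical value, alpha = 0.05, df = 16 (quoted above)

t = (m - mh) / (sd / math.sqrt(n))
# One-tailed (upper) test: only a large positive t is evidence of excess pollution.
reject = t > CRITICAL
print(f"t = {t:.4f} -> {'reject' if reject else 'fail to reject'} H0")
```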
Part 2
We were asked to compare the average home values by block group in the City of
Eau Claire with those in Eau Claire County. The data provided in ArcMap allowed
us to make this comparison. The following is the hypothesis test for this
scenario:
1. Null Hypothesis: There is no significant difference between the observed average home value in Eau Claire County and the sample average home value in the City of Eau Claire.
2. Alternative Hypothesis: There is a significant difference between the observed average home value in Eau Claire County and the sample average home value in the City of Eau Claire.
3. Statistical Test: a z-test was used because n > 30.
4. Choose α: a level of 0.05 was selected. It is a two-tailed test.
5. Calculate the test: z = (151876.5094 - 169438.1304) / (49706.91892 / sqrt(69)) = -2.9348
6. With a z value of -2.9348, we reject the null hypothesis because it falls outside the range -1.96 to 1.96.
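For the larger sample, the critical value comes from the standard normal distribution, which Python's standard library provides through `statistics.NormalDist`. This sketch reproduces step 5 and also computes a two-tailed p-value, which the assignment did not report. The variable names are illustrative, and n = 69 is taken from the sqrt(69) term above.

```python
import math
from statistics import NormalDist

n = 69                      # block groups in the sample (from sqrt(69) above)
city_mean = 151876.5094     # sample mean home value, City of Eau Claire
county_mean = 169438.1304   # comparison mean, Eau Claire County
sd = 49706.91892            # standard deviation used in the calculation above
alpha = 0.05

z = (city_mean - county_mean) / (sd / math.sqrt(n))
critical = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed, about 1.96
reject = abs(z) > critical
p_value = 2 * NormalDist().cdf(-abs(z))         # two-tailed p-value
print(f"z = {z:.4f}, critical = {critical:.2f}, p = {p_value:.4f}, reject H0: {reject}")
```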
As you can see, the average home value in the City of Eau Claire is
considerably lower than in the county. This is geographically evident and shown
in figure 2, and our hypothesis test supports it as well: rejecting the null
hypothesis shows that the two sets of block groups differ significantly.
Conclusion:
By applying hypothesis testing to numerous scenarios, it is clear that it is a
valuable statistical tool. It allows the researcher to generalize from a smaller
sample to a larger population. Deciding which test is appropriate is critical,
but not very hard: by looking at the number of observations (fewer than 30
calls for a t-test), you can quickly and confidently select the appropriate
test. While our significance levels were often given to us, in the future we can
determine them through literature review.