Spatial Autocorrelation

Part I: Correlation

This section of the assignment focused on correlation and on using tools to analyze data. The tools used include software such as Microsoft Excel and SPSS, along with methods such as Pearson correlation and scatter plots. With these, we are able to look critically at data to find patterns. First, we were given data on sound levels at various distances. One column had distance in feet and the other had sound level in decibels. The data is shown below in Figure 1.

Figure 1: Sound level data provided
Using Excel, we made a scatter plot to visually represent the data. The resulting scatter plot is shown below in figure 2. 
Figure 2: Scatter plot of sound level, Microsoft Excel

Taking it further and putting the data into SPSS, we can analyze it using Pearson correlation, which measures the strength and direction of the linear relationship between two variables. Figure 3 below is the result of running that correlation.
Figure 3: Correlation Table for Distance and Sound Level, SPSS

From the scatter plot and the correlation table, it is clear that there is a strong negative correlation between distance from the source and sound level. The Pearson correlation coefficient of -0.896 is considered strong because it is close to -1. This correlation tells us that the farther you are from the source, the lower the sound level. This is reflected in both the correlation coefficient and the trend line on the scatter plot.
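As a rough illustration of what the Pearson correlation computes, here is a minimal sketch in plain Python. The distance and decibel values are made up for the example and are not the data from Figure 1, and the function name `pearson_r` is my own.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Illustrative values only (assumed, not the assignment's data)
distance_ft = [10, 20, 30, 40, 50, 60]
sound_db = [95, 88, 84, 80, 79, 75]

r = pearson_r(distance_ft, sound_db)
print(round(r, 3))  # negative: sound level drops as distance grows
```

A value near -1 means the points fall close to a downward-sloping line, which is what the trend line on the scatter plot shows visually.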

Next, we were asked to make a correlation matrix for a number of variables in Detroit. These variables were part of a data set that was provided for this assignment. The variables include:

White = White Pop. for the 1000 Census Tracts in and around Detroit
Black = Black Pop
Asian = Asian Pop.
His = Hispanic Pop
BachDegree = Number with a Bachelor’s Degree
MedHHInc = Median Household Income
MedHomeValue = Median Home Value
Manu = Number of Manufacturing Employees
Retail = Number of Retail Employees
Finance = Number of Finance Employees

Figure 4: Correlation Matrix, SPSS

In the correlation matrix, we are able to see the correlation coefficient for each pair of variables. While many of the comparisons do not stand out, several show strong correlations. First, the strongest correlation is between Median Household Income and Median Home Value. To most people this seems like common sense: if you make more money, you can afford a more expensive house. While we cannot prove causation, a correlation coefficient of 0.883 indicates a strong positive correlation. Next, an interesting correlation is the one between the white population and the number with a bachelor's degree. Here we see a correlation coefficient of 0.698, indicating a relatively strong correlation. It is the highest among the racial categories, but not by much. Lastly, it is equally important to point out negative correlations as well as positive ones. We see a moderately strong negative correlation between white populations and black populations. A correlation coefficient of -0.604 indicates that areas with large white populations tend to have smaller black populations, and vice versa. This would make sense as Detroit has a long history of gentrification and segregation.
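The way a correlation matrix is built and read can be sketched in a few lines of plain Python. This is only an illustration of the idea, not what SPSS does internally. The tract values below are invented, only four of the ten variables are included, and the helper names (`correlation_matrix`, `strongest_pairs`) are my own.

```python
from itertools import combinations
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return a nested dict: matrix[a][b] is Pearson's r between columns a and b."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names} for a in names}


def strongest_pairs(matrix):
    """List each pair of variables once, strongest absolute correlation first."""
    pairs = [(a, b, matrix[a][b]) for a, b in combinations(matrix, 2)]
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)


# Invented values for five hypothetical census tracts (not the Detroit data)
tracts = {
    "White": [3200, 500, 2800, 300, 1500],
    "Black": [200, 3100, 400, 2900, 1600],
    "MedHHInc": [72000, 28000, 65000, 25000, 45000],
    "MedHomeValue": [210000, 60000, 180000, 55000, 120000],
}

matrix = correlation_matrix(tracts)
for a, b, r in strongest_pairs(matrix):
    print(f"{a} vs {b}: {r:+.3f}")
```

Sorting by absolute value puts strong negative correlations next to strong positive ones, so a pair like White vs. Black is not overlooked just because its sign is negative.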

Part II: Spatial Autocorrelation 

Introduction: We were tasked by the Texas Election Commission (TEC) to analyze patterns between the 1980 and 2012 elections. They want to know if there are patterns in voting and voter turnout. The objective is to determine any existing patterns and present them to the governor to show how voting in the state has changed over a period of 32 years. Questions to be answered include:

Is there a pattern in voting in the state?
Is there a pattern of voter turnout in the state?
Are there other population variables that may be related to these patterns?
How have these patterns changed over time?

One potential shortcoming of the data is the lack of data between the selected years. With a gap of 32 years, patterns may have emerged or strengthened well before 2012. The data does not reflect the elections in between; it only compares the two election years.

Methodology: Using data provided by the TEC, we analyzed it with several tests and tools.

First, as reference for the mapping aspect of the question, the Texas shapefile was downloaded from the U.S. Census database online. By selecting all counties in the state of Texas, we get a shapefile with each individual county in it. This is key for breaking down election data to the county level. 

Next, using ArcMap, we can join the downloaded shapefile with the provided data, along with an additional variable such as percent Hispanic. Joining on the GEO_ID field allows us to display each data set on the shapefile.
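The logic of a join on GEO_ID can be sketched outside of ArcMap as a simple key lookup. This is only a conceptual illustration, not ArcMap's implementation. The county names, GEO_ID strings, field name `pct_dem_2012`, and values are all hypothetical.

```python
def join_by_key(features, table_rows, key="GEO_ID"):
    """Attach table attributes to each feature that shares the same key value.

    Returns the joined records plus the keys that found no match, so missing
    counties can be checked before mapping.
    """
    lookup = {row[key]: row for row in table_rows}
    joined, unmatched = [], []
    for feature in features:
        match = lookup.get(feature[key])
        if match is None:
            unmatched.append(feature[key])
            continue
        merged = dict(feature)
        # Copy every table field except the key itself onto the feature
        merged.update({k: v for k, v in match.items() if k != key})
        joined.append(merged)
    return joined, unmatched


# Hypothetical shapefile attribute records (made-up IDs and names)
counties = [
    {"GEO_ID": "0500000US48001", "NAME": "County A"},
    {"GEO_ID": "0500000US48003", "NAME": "County B"},
    {"GEO_ID": "0500000US48005", "NAME": "County C"},
]
# Hypothetical election table with one county missing
election_rows = [
    {"GEO_ID": "0500000US48001", "pct_dem_2012": 23.0},
    {"GEO_ID": "0500000US48003", "pct_dem_2012": 17.0},
]

joined, unmatched = join_by_key(counties, election_rows)
print(len(joined), "joined;", "no match for:", unmatched)
```

Checking the unmatched list is worthwhile because a county that fails to join will show up with empty values on the map.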

Third, after exporting the new shapefile, we can open it in GeoDa. GeoDa allows us to run several key analyses. The two main tools used were Moran's I and LISA cluster maps. Moran's I measures spatial autocorrelation, meaning how similar each county's value is to the values of its neighbors, and LISA cluster maps display where clusters of a selected variable occur on a map.
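To make the Moran's I statistic more concrete, here is a minimal sketch of the global calculation in plain Python. It assumes row-standardized contiguity weights, where each county's neighbors share equal weight, and a hand-written neighbor list. The six "counties" in a row and their values are invented for illustration.

```python
def morans_i(values, neighbors):
    """Global Moran's I with row-standardized weights.

    values    -- list of observations, one per areal unit
    neighbors -- dict mapping unit index to a list of neighboring unit indices
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]  # deviations from the mean
    cross = 0.0  # weighted sum of cross-products z_i * z_j
    s0 = 0.0     # sum of all weights
    for i, nbrs in neighbors.items():
        if not nbrs:
            continue  # units without neighbors contribute nothing
        w = 1.0 / len(nbrs)  # row standardization
        for j in nbrs:
            cross += w * z[i] * z[j]
            s0 += w
    return (n / s0) * cross / sum(d * d for d in z)


# Six hypothetical units in a row; each unit touches the ones beside it
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}

clustered = [1, 1, 1, 0, 0, 0]    # similar values sit next to each other
alternating = [1, 0, 1, 0, 1, 0]  # every neighbor is different

print(morans_i(clustered, chain))    # positive: like values cluster
print(morans_i(alternating, chain))  # negative: neighbors are dissimilar
```

Positive values mean neighboring units tend to have similar values, values near zero suggest no spatial pattern, and negative values mean neighbors tend to differ.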

Results: Five separate scatter plots and five separate cluster maps were created to display potential patterns in each variable. Figure 5 and figure 6 show the Democratic voting patterns in a Moran's I scatter plot and a cluster map respectively. 
Figure 5: Moran's I of percent Democrat Voter in 1980

Figure 6: LISA Cluster Map of percent Democrat Voter in 1980

From these two figures, we can look at the patterns present to give us insight into voting in 1980. There is a clear cluster of low Democratic voting in the northern counties of the state. In contrast, we see a cluster of high Democratic voting in the southern counties. Outside of these two regions, the majority of counties register as not significant on the cluster map, so the local clustering is limited. Even so, the Moran's I statistic of 0.57 on the scatter plot indicates moderately strong positive spatial autocorrelation across the state.
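The High-High and Low-Low clusters on a LISA map come from a local version of Moran's I computed for each county. The sketch below shows only the classification step, under the same assumption of row-standardized weights. It leaves out the permutation-based significance test that GeoDa uses to decide which counties are shown as not significant. The values and the neighbor layout are invented.

```python
def local_moran(values, neighbors):
    """Return (local_I, quadrant_label) for each unit.

    Uses row-standardized weights. Significance testing is NOT included,
    so every unit receives a label in this sketch.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(d * d for d in z) / n  # variance of the deviations
    results = []
    for i in range(n):
        nbrs = neighbors[i]
        lag = sum(z[j] for j in nbrs) / len(nbrs)  # average neighbor deviation
        local_i = z[i] * lag / m2
        if z[i] > 0 and lag > 0:
            label = "High-High"
        elif z[i] < 0 and lag < 0:
            label = "Low-Low"
        elif z[i] > 0:
            label = "High-Low"  # high value among low neighbors
        else:
            label = "Low-High"  # low value among high neighbors
        results.append((local_i, label))
    return results


# Six hypothetical units in a row, with made-up percent values
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
pct_vote = [60, 55, 50, 30, 25, 20]

lisa = local_moran(pct_vote, chain)
for index, (local_i, label) in enumerate(lisa):
    print(index, round(local_i, 3), label)
```

High-High and Low-Low units are the clusters, while High-Low and Low-High units are spatial outliers that differ from their surroundings.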
Figure 7: Moran's I of Percent Democrat Vote in 2012

Figure 8:  LISA Cluster Map of percent Democrat Voter in 2012

Figures 7 and 8 show the percent voting Democratic in the 2012 election. Here we see that the pattern first shown in 1980 has strengthened, with high Democratic voting in the southern counties. The Moran's I statistic has increased as well, to 0.695, indicating strong positive spatial autocorrelation.
Figure 9: Moran's I scatter plot of Voter Turnout in 1980

Figure 10: LISA Cluster Map of Voter Turnout in 1980

Figures 9 and 10 show voter turnout in 1980 and the patterns associated with it. As the Moran's I statistic is relatively low, at 0.468, the pattern here is weaker. This is shown again in the sparse distribution on the cluster map.
Figure 11: Moran's I scatter plot of Voter Turnout in 2012

Figure 12: LISA Cluster Map of Voter Turnout in 2012


Figures 11 and 12 display the voter turnout data for 2012. Again we see only a weak pattern, with a Moran's I statistic of 0.335. There is no clear pattern on the cluster map other than lower turnout rates in the southern counties, a pattern that was also evident in the 1980 data.

Lastly, to explore what might factor into these patterns, I decided to look at percent Hispanic per county as an additional variable. Figures 13 and 14 represent this data.
Figure 13: Percent Hispanic Population by County

Figure 14: Percent Hispanic Population by County Cluster Map

Here we see some very clear patterns. First, the Moran's I statistic is 0.778, which indicates strong positive spatial autocorrelation. The cluster map represents this geographically, as it is clear that counties in the southwest region of the state have higher percentages of Hispanic residents.

This pattern prompted a further look into correlation using data on Hispanic populations. By putting all of the data together, I ran a Pearson's correlation analysis. Figure 15 is the table it created. 

Figure 15: Correlation Matrix for voting data and Hispanic data

The correlation matrix shows a strong positive correlation between Hispanic populations and Democratic voting in the 2012 election. A correlation coefficient of 0.718 indicates that counties with higher percentages of Hispanics had higher Democratic voting in 2012. Going back to Figures 8 and 14, this makes sense as they both indicate high values in the same region. At the same time, we see an equally strong negative correlation with voter turnout in 2012. This is interesting because there is lower turnout in counties with high percentages of Hispanics, yet Democratic voting is up.

Conclusion: While this data is just one slice of the data that is out there, it offers us insight into the changing patterns of elections in the state of Texas. To further understand these patterns, more research should be done to look at variables like race, shown here as percentages of Hispanics, or other variables like Median Household Income. With the data provided, it is clear that some patterns do exist. First, in 1980 there is a split between north and south when it comes to Democratic voting. This pattern is still evident in the 2012 election. Voter turnout has remained sporadic, but when paired with the Hispanic data, a pattern emerges: counties with higher percentages of Hispanics have lower voter turnout. These patterns, while seemingly unrelated and unimportant, give a political snapshot of the state. This data could be used to target areas for increased voter turnout, or to show where a politician can expect support. Using these tools for analysis offers greater insight into each dataset.



