Regression Analysis
Part I
After a study gathered data on poverty and crime rates in Town X, certain assumptions were made about the connection between the two. One such assumption was that in areas where the percentage of kids who get free lunches increases, so does the crime rate. Finding this a bit far-fetched and suspecting that the correlation might be weak, I decided to take a look at the data myself and run a regression analysis. The data I had to work with included:
- Percent of kids given a free lunch in an area
- Crime rate per 100,000 people
The question that I set out to answer was "Is the percentage of kids receiving a free lunch in an area correlated with higher crime rates?" The claim made by the news station is the reason for the question, and by running a regression analysis, I can test whether the data support it.
By putting the data into SPSS and running a linear regression analysis, I can see the regression equation, the strength of the correlation, and its significance. Figure 1 shows the tables that were generated during the analysis.
Figure 1: Regression analysis tables generated in SPSS
Several numbers are important to point out. First, using the values above, our regression equation comes out to be Y = 21.819 + 1.685x. Our constant is 21.819, meaning that if an area had 0% of kids receiving a free lunch, the predicted crime rate would be 21.819 per 100,000. Additionally, our slope is 1.685. This means that for every one percentage point increase in kids receiving free lunches, the predicted crime rate increases by 1.685 per 100,000.
These values and the equation alone would sound pretty convincing when applied to the news station's claim. The important figure to look at next is the level of significance. Here, the significance is 0.005, which is below the conventional 0.05 cutoff, meaning the relationship is statistically significant.
Next, as a way of double-checking this connection between variables, we look at the R value, which is the correlation between our variables. At .416, we can draw a few conclusions. First, there is a correlation between the variables, and this correlation is positive. This is also where we determine the strength of the correlation. At .416, R is not very close to +/- 1, indicating that the correlation is not very strong.
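One way to make the strength of this correlation more concrete is to square R. In a regression with a single predictor, R squared is the proportion of the variation in the outcome that the fitted line accounts for. A minimal sketch in Python, using only the R value quoted above (the variable names are my own):

```python
# Correlation coefficient reported in the text for free lunch vs. crime rate.
r = 0.416

# In simple (one-predictor) linear regression, r squared is the share of
# variation in the outcome that the fitted line accounts for.
r_squared = r ** 2

print(f"r = {r:.3f}, r^2 = {r_squared:.3f}")
```

In other words, roughly 17% of the variation in crime rate is accounted for by the free lunch percentage, which leaves about 83% to other factors.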
This finding is arguably the most important when applying it to the news station's claim. While there is a correlation between the percent of kids getting a free lunch and the crime rate, the claim is weakened by the correlation's lack of strength. To answer the study question: yes, the crime rate in an area is correlated with the percentage of kids getting free lunches. Using our equation, we could estimate the crime rate in areas with different x values. For example, if an area in Town X was found to have 30% of kids getting a free lunch, we would estimate the crime rate in that area to be 72.369 per 100,000. Again, as this correlation isn't very strong, this would at best be an approximation, not an exact figure.
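The arithmetic behind that estimate can be reproduced in a few lines of Python. This is only a sketch: the coefficients come from the equation above, while the function name `predict_crime_rate` is a hypothetical one chosen for illustration.

```python
# Coefficients taken from the regression equation in the text.
INTERCEPT = 21.819  # predicted crime rate per 100,000 at 0% free lunch
SLOPE = 1.685       # change in crime rate per one-point rise in free lunch %


def predict_crime_rate(free_lunch_pct: float) -> float:
    """Return the crime rate per 100,000 predicted by the fitted line."""
    return INTERCEPT + SLOPE * free_lunch_pct


for pct in (0, 30):
    print(f"{pct}% free lunch -> {predict_crime_rate(pct):.3f} per 100,000")
```

At 0% the function returns the constant, and at 30% it returns the 72.369 figure used in the example.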
Part II
A quick response to 911 calls can often be the difference between life and death, and the distance from incident to hospital plays a huge role in the outcome for each patient. Emergency services rely on quick action and well-thought-out placement of facilities. In major metropolitan areas, any number of factors can influence how often 911 is called. Cities that actively improve public safety continuously analyze their area and monitor fluctuating numbers of 911 calls.
In this exercise, a company in Portland, OR is looking to build a hospital. Hoping to help the city out, the company wants to build this hospital in an area with high numbers of 911 calls. During this search for a suitable location, the city is interested in finding out what factors are causing increased numbers of calls. 911 calls come from all over the city, but it is theorized that certain areas may have more due to certain variables.
The study question is "Are certain variables correlated with increased 911 calls, and how are they spatially distributed?" By answering this question, we should also arrive at an answer as to where to build this hospital.
Methodology
The purpose of this exercise is to use the data given and statistical tools like regression analysis to find a suitable place to build the hospital. Using software like SPSS, Excel, and ArcMap, we should be able to spatially analyze the data and display the findings in a clear and concise manner.
The data was provided by the city of Portland and includes 9 variables. They are:
- Calls (number of 911 calls per census tract)
- Jobs
- Renters
- LowEduc (Number of people with no HS Degree)
- AlcoholX (alcohol sales)
- Unemployed
- ForgnBorn (Foreign Born Pop)
- Med Income
- CollGrads (Number of College Grads)
I was asked to narrow down the variables to 3. For this analysis I chose Renters, Unemployed and LowEduc. I chose these as they are all socioeconomic indicators of an area. By using hypothesis testing, I analyzed the relationship between each selected independent variable and the dependent variable, Calls. Independent and dependent are defined as:
Independent Variable: The variable used to explain or predict changes in another variable (here Renters, Unemployed, or LowEduc)
Dependent Variable: The variable whose values are explained or predicted by the independent variable (here Calls)
Using SPSS, 3 regression analyses were run, one for each of the selected variables. In each test, the null hypothesis is that there is no linear relationship between the selected variable and Calls. Running each through hypothesis testing, I was able to determine the importance of each as applied to the study question and task.
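For readers without SPSS, the core of each of these regressions can be sketched by hand. The Python below fits a least-squares line, computes R, and computes the t statistic used to test the null hypothesis that the slope is zero. The data values are made up purely for illustration and are not the Portland data, and `simple_regression` is a hypothetical helper name, not something SPSS exposes.

```python
import math

# Hypothetical (renters, calls) values for six made-up census tracts.
# These are NOT the Portland data; they only illustrate the calculation.
renters = [200.0, 450.0, 700.0, 1100.0, 1500.0, 2100.0]
calls = [4.0, 7.0, 9.0, 10.0, 14.0, 16.0]


def simple_regression(x, y):
    """Least-squares fit of y = intercept + slope * x, plus r and a t statistic."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    # t statistic for the null hypothesis "slope is zero", with n - 2
    # degrees of freedom (undefined if the fit is perfect, r = +/-1).
    t_stat = r * math.sqrt((n - 2) / (1 - r ** 2))
    return intercept, slope, r, t_stat


intercept, slope, r, t_stat = simple_regression(renters, calls)
print(f"calls = {intercept:.3f} + {slope:.4f} * renters")
print(f"r = {r:.3f}, t = {t_stat:.2f} on {len(renters) - 2} degrees of freedom")
```

The significance value that SPSS prints is the probability of seeing a t statistic at least this extreme if the null hypothesis were true. A value below the chosen cutoff leads to rejecting the null hypothesis.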
To help display the findings spatially, two maps were created. The first is a choropleth map showing the number of calls per census tract. The second is a residual map showing outliers in the data.
Residual is defined as: the difference between an observed data point and the value the trend line predicts for it.
Findings
The first variable examined was Renters. Figure 2 shows the regression analysis generated from SPSS.
Figure 2: Regression Analysis for Renters and Calls
Here we see a pattern in the data. Based on the analysis, the number of 911 calls in an area is strongly correlated with the number of renters in the area. Our regression equation is y = 3.803 + .006x. This means that predicted 911 calls increase by .006 for every added renter in an area, and based on the R value of 0.785, this correlation is strong. The significance value is reported as 0.000 (that is, below 0.001), indicating that this correlation is statistically significant. This positive relationship could be one reason for an area to have a higher number of 911 calls. With this data, we reject the null hypothesis, as we see a correlation between the two.
Figure 3 looks at the next variable selected, Unemployed (the number of unemployed persons).
Figure 3: Regression Analysis for Unemployed and Calls
We see another pattern in the data. Based on the analysis, the number of 911 calls in an area is strongly correlated with the number of unemployed persons in the area. Our regression equation is y = 1.106 + .507x. This means that predicted 911 calls increase by .507 for every added unemployed person in an area, and based on the R value of 0.737, this correlation is strong. The significance value is reported as 0.000 (that is, below 0.001), indicating that this correlation is statistically significant. This positive relationship could be one reason for an area to have a higher number of 911 calls. With this data, we reject the null hypothesis, as we see a correlation between the two.
The last regression, between LowEduc and Calls, is shown in Figure 4 below.
Figure 4: Regression Analysis between LowEduc and Calls.
Based on the analysis, the number of 911 calls in an area is strongly correlated with the percent of people without a HS degree in the area. Our regression equation is y = 3.931 + .166x. This means that predicted 911 calls increase by .166 for every added percent of people without a HS degree in an area, and based on the R value of 0.753, this correlation is strong. In addition, the significance value is reported as 0.000 (that is, below 0.001), indicating that this correlation is statistically significant. This positive relationship could be one reason for an area to have a higher number of 911 calls. With this data, we reject the null hypothesis, as we see a correlation between the two.
The 3 selected variables were all strongly correlated with 911 calls. These 3 variables are also tied to an overarching issue of poverty. People in poverty often rent rather than buy, tend to have lower levels of education, and struggle with unemployment. We also know that crime rates are often higher in areas of poverty, increasing the number of 911 calls. These all tend to raise the number of 911 calls in an area, and the data is consistent with that.
Furthermore, to display the data spatially, Figures 5 and 6 show the number of calls per census tract and the residuals of the Renters regression.
Figure 5: Choropleth Map of 911 Calls per Census Tract.
We see in the choropleth map that the census tracts with the highest numbers of 911 calls are mostly clustered in the north central part of the city. Census tracts 60, 61, 62, 65, 66 and 79 each have over 57 calls. This is important to see when looking for a place to put a new hospital.
Figure 6: Standardized Residual Map of 911 Calls by Census Tract.
Figure 6 looks at the residuals of the Renters regression. It shows how far each tract's observed number of calls is from the value predicted by the regression equation, measured in standard deviations. We see that tracts 62 and 65 are both more than 2.5 standard deviations above their predicted values. This means that each has significantly more 911 calls than the number of renters alone would predict. The raw counts show this as well: Tract 62 has 176 calls and Tract 65 has 97 calls. Under the equation generated from the analysis, these tracts' data points had large residuals and were far from the trend line.
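To show what a standardized residual involves, here is a small Python sketch. It fits a line to ten made-up tracts (not the Portland data), divides each residual by the standard error of the estimate, and flags anything beyond 2.5. Dividing by the standard error of the estimate is one common definition. Whether it matches exactly what the mapping software computed is an assumption on my part, and the tract labels and the `standardized_residuals` helper are hypothetical.

```python
import math

# Hypothetical (label, renters, calls) rows for ten made-up tracts.
# These are NOT the Portland values; they only illustrate the calculation.
tracts = [
    ("T1", 500, 7), ("T2", 800, 9), ("T3", 1000, 9), ("T4", 1200, 12),
    ("T5", 1500, 12), ("T6", 1800, 15), ("T7", 2000, 15), ("T8", 2200, 18),
    ("T9", 2500, 19), ("T10", 1600, 40),
]


def standardized_residuals(rows):
    """Fit calls = a + b * renters by least squares, then scale each residual."""
    x = [row[1] for row in rows]
    y = [row[2] for row in rows]
    n = len(rows)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual = observed calls minus the calls the fitted line predicts.
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Standard error of the estimate: sqrt(SSE / (n - 2)) for one predictor.
    std_error = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    return {row[0]: e / std_error for row, e in zip(rows, residuals)}


z_scores = standardized_residuals(tracts)
flagged = [label for label, z in z_scores.items() if abs(z) > 2.5]
print("Tracts beyond 2.5 standard deviations:", flagged)
```

In this toy data set only the tract with far more calls than its renter count would suggest is flagged, which is the same kind of pattern the residual map highlights for tracts 62 and 65.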
Conclusion
With the data presented and the analysis done, we can go back to our study question. A number of factors were found to be related to higher numbers of 911 calls. All three variables studied showed strong correlations with the number of 911 calls. This may be attributed to each variable's connection to poverty and crime. Additionally, we see that higher numbers of 911 calls are spatially clustered in the north central part of the city.
In regards to where to put the hospital, my suggestion after taking in all of the data would be to build it in the north west corner of census tract 65, as this tract had one of the highest numbers of 911 calls and is clustered with other tracts with high numbers of 911 calls. This would offer the most centralized location to serve the needs of the surrounding communities.
It is important to also note the shortcomings of this analysis and data. This analysis was done with only 4 of the 9 variables that were provided (Calls and the three selected predictors). Further analysis with expanded parameters may reveal different relationships. Additionally, no time frame was given for the data. If these are monthly averages, then yes, another hospital could drastically improve public safety. The data would be viewed differently if these were yearly averages. The time frame plays a bigger part in the question of how big the new hospital's ER should be.
To conclude, a new hospital can help with emergency response times and improve public safety. Additionally, after reviewing the factors that may be connected to increased 911 calls, it was determined that each variable examined was correlated with the number of calls.