Saturday, November 14, 2015

Calculating Correlation

I have been using the Gapminder data set for my research questions. Unlike my previous submissions, which looked at female employment rate, this post looks at the correlation between life expectancy and urban rate.

Code


# -*- coding: utf-8 -*-
"""
Created on Thu Nov 12 14:06:59 2015

@author: Abhishek
"""
import pandas
import seaborn
import scipy.stats
import matplotlib.pyplot as plt

data = pandas.read_csv('gapminder.csv', low_memory=False)
pandas.set_option('display.float_format', lambda x: '%f' % x)

# Convert both variables to numbers; blank cells (' ') become NaN.
# convert_objects() has been removed from pandas; to_numeric() replaces it.
data['lifeexpectancy'] = pandas.to_numeric(data['lifeexpectancy'], errors='coerce')
data['urbanrate'] = pandas.to_numeric(data['urbanrate'], errors='coerce')

# Scatter plot of the two variables with a fitted regression line
scat1 = seaborn.regplot(x='urbanrate', y='lifeexpectancy', fit_reg=True, data=data)
plt.xlabel('Urban Rate')
plt.ylabel('Life Expectancy')
plt.title('Scatterplot Urban Rate VS Life Expectancy')
plt.show()

# Keep only the countries that have values for both variables
data_clean = data[['urbanrate', 'lifeexpectancy']].dropna()

print('association between urbanrate and lifeexpectancy')
print(scipy.stats.pearsonr(data_clean['urbanrate'], data_clean['lifeexpectancy']))
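The cleaning step matters because Gapminder stores missing values as blank strings rather than empty cells. A minimal sketch of what `to_numeric(errors='coerce')` followed by `dropna()` does, using a small hypothetical table (the countries and values below are made up for illustration, not taken from gapminder.csv):

```python
import pandas as pd

# Hypothetical rows mimicking Gapminder's layout: missing cells are stored as ' '
raw = pd.DataFrame({
    'country': ['A', 'B', 'C', 'D'],
    'urbanrate': ['70.5', ' ', '35.2', '55.0'],
    'lifeexpectancy': ['78.1', '60.3', ' ', '72.4'],
})

# errors='coerce' turns anything that is not a number (including ' ') into NaN
for col in ['urbanrate', 'lifeexpectancy']:
    raw[col] = pd.to_numeric(raw[col], errors='coerce')

# Drop a row if either variable is missing; other columns are not considered
clean = raw[['urbanrate', 'lifeexpectancy']].dropna()
print(clean)
```

Restricting `dropna()` to the two columns of interest avoids throwing away countries that only have gaps in unrelated columns.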

Results


The p-value is statistically significant, and the correlation coefficient is r = 0.61870.
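The correlation coefficient returned by `pearsonr` is Pearson's r: the covariance of the two variables divided by the product of their standard deviations. A minimal sketch computing it directly from that definition with NumPy, on hypothetical urban-rate and life-expectancy values (made up for illustration, not the Gapminder data), and checking it against NumPy's built-in `corrcoef`:

```python
import numpy as np

def pearson_r(x, y):
    """Pearson's r: sum of co-deviations over the root of the summed squared deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations from the mean of x
    dy = y - y.mean()  # deviations from the mean of y
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# Hypothetical (urban rate, life expectancy) pairs, for illustration only
urban = [20.0, 35.0, 50.0, 65.0, 80.0]
life = [55.0, 62.0, 60.0, 71.0, 76.0]

r = pearson_r(urban, life)
print(round(r, 4))
# Should agree with the off-diagonal entry of NumPy's correlation matrix
print(np.isclose(r, np.corrcoef(urban, life)[0, 1]))
```

A positive r means the two variables tend to rise together; values closer to 1 mean the points lie closer to a straight line.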


The scatter plot above shows a positive linear relationship between life expectancy and urban rate. As the correlation coefficient indicates, the relationship is of moderate strength.

The relationship is statistically significant: countries with a larger share of their population living in urban areas tend to have higher life expectancy.
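The p-value printed by `pearsonr` tests the null hypothesis that there is no linear correlation. A minimal sketch of where it comes from, assuming SciPy's default two-sided test: r is converted to a t statistic with n - 2 degrees of freedom. The helper `pearson_p_value` is a hypothetical name, and the data is randomly generated for illustration, not the Gapminder sample.

```python
import numpy as np
from scipy import stats

def pearson_p_value(r, n):
    """Two-sided p-value for H0: no linear correlation, via t = r * sqrt((n - 2) / (1 - r^2))."""
    t = r * np.sqrt((n - 2) / (1.0 - r ** 2))
    return 2 * stats.t.sf(abs(t), df=n - 2)

# Synthetic sample with a built-in positive relationship (illustration only)
rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 0.6 * x + rng.normal(size=50)

r, p = stats.pearsonr(x, y)
print(r, p)
# The t-based p-value should match the one SciPy reports
print(np.isclose(p, pearson_p_value(r, len(x))))
```

For a fixed r, a larger sample gives a larger t and therefore a smaller p-value, which is why a moderate correlation across many countries can still be highly significant.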

Squaring r gives r² ≈ 0.38. This means that if we know the x variable in the scatter plot (in this case urban rate), we can account for about 38% of the variability in life expectancy. The remaining 62% is unaccounted for.
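The arithmetic behind those percentages, as a short sketch using the r reported above. For a straight-line fit of one variable on the other, r² is the fraction of the variance in life expectancy explained by urban rate:

```python
r = 0.61870  # correlation coefficient reported above

r_squared = r ** 2           # fraction of variance in life expectancy explained by urban rate
unexplained = 1 - r_squared  # fraction left to other factors

print(f"r^2 = {r_squared:.2f}")            # r^2 = 0.38
print(f"unexplained = {unexplained:.2f}")  # unexplained = 0.62
```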
