I have been using the Gapminder data set for my research questions.
Unlike my previous submissions, where I looked at female employment rate,
here I am looking at the correlation between life expectancy and urban rate.
Code
# -*- coding: utf-8 -*-
"""
Created on Thu Nov 12 14:06:59 2015
@author: Abhishek
"""
import pandas
import numpy
import seaborn
import scipy.stats  # pearsonr is in the stats submodule, which a bare "import scipy" may not load
import matplotlib.pyplot as plt
data = pandas.read_csv('gapminder.csv', low_memory=False)
pandas.set_option('display.float_format', lambda x: '%f'%x)
# convert_objects has been removed from pandas; to_numeric with errors='coerce' turns blanks into NaN
data['lifeexpectancy'] = pandas.to_numeric(data['lifeexpectancy'], errors='coerce')
data['urbanrate'] = pandas.to_numeric(data['urbanrate'], errors='coerce')
# Scatter plot of the two variables with a fitted regression line
scat1 = seaborn.regplot(x='urbanrate', y='lifeexpectancy', fit_reg=True, data=data)
plt.xlabel('Urban Rate')
plt.ylabel('Life Expectancy')
plt.title('Scatterplot of Urban Rate vs Life Expectancy')
plt.show()
# Blanks are already NaN after the numeric conversion, so no replace(' ', nan) is needed.
# Drop only the rows missing one of the two variables; pearsonr needs complete pairs.
data_clean = data.dropna(subset=['urbanrate', 'lifeexpectancy'])
print('association between urbanrate and lifeexpectancy')
# pearsonr returns the correlation coefficient r and the p-value
print(scipy.stats.pearsonr(data_clean['urbanrate'], data_clean['lifeexpectancy']))
Results
The correlation coefficient is r = 0.61870 and the p-value is significant.
The scatter plot shows a positive linear relationship between
lifeexpectancy and urbanrate. With a correlation coefficient of about 0.62,
the relationship is of moderate strength.
Because the relationship is statistically significant, countries with
a higher urban rate tend to have a higher life expectancy.
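To make the correlation number a little more concrete, here is a minimal sketch of what pearsonr computes, checked against scipy. The six data points are made up for illustration and are not values from gapminder.csv, and pearson_r is just a helper name I chose for this example. The last line squares the coefficient reported above, which gives the share of the variability in one variable that a straight-line fit on the other accounts for (roughly 38% here).

```python
import numpy
import scipy.stats

# Made-up illustrative values, NOT taken from gapminder.csv
urbanrate = numpy.array([20.0, 35.0, 50.0, 65.0, 80.0, 90.0])
lifeexpectancy = numpy.array([55.0, 62.0, 60.0, 71.0, 74.0, 79.0])


def pearson_r(x, y):
    """Sum of the products of the deviations from the mean, divided by the
    square root of the product of the summed squared deviations."""
    dx = x - x.mean()
    dy = y - y.mean()
    return (dx * dy).sum() / numpy.sqrt((dx ** 2).sum() * (dy ** 2).sum())


r_manual = pearson_r(urbanrate, lifeexpectancy)
r_scipy, p_value = scipy.stats.pearsonr(urbanrate, lifeexpectancy)

print('r (by hand) = %f' % r_manual)
print('r (scipy)   = %f, p = %f' % (r_scipy, p_value))
print('r squared   = %f' % (r_manual ** 2))

# Squared coefficient for the value reported in this post
print('0.61870 squared = %f' % (0.61870 ** 2))
```

The hand-written formula and scipy should agree to many decimal places; the only thing scipy adds is the p-value.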