Questions Graphs and Numbers: Running an analysis of variance

I am using the Gapminder dataset. My response variable is the female employment rate (femaleemployrate) and my explanatory variable is the polity score (polityscore).

My hypothesis is that female employment rate is related to polity score. Polity score captures the regime authority spectrum on a 21-point scale ranging from -10 (hereditary monarchy) to +10 (consolidated democracy). I treat polity score as a categorical variable with 21 possible categories.
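
Stated formally, and writing \mu_k for the mean female employment rate among countries with polity score k (notation introduced here only for illustration), the analysis of variance tests roughly the following:

```latex
H_0:\ \mu_{-10} = \mu_{-9} = \dots = \mu_{10}
\qquad \text{versus} \qquad
H_a:\ \mu_j \neq \mu_k \ \text{for at least one pair } j \neq k
```

In practice only the polity scores that actually occur among the selected countries enter the test.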

I have chosen to look at just the G20 countries for my research, so the data set is subset to those countries before the analysis.

Code


"""
Created on Fri Oct 30 06:50:52 2015 
@author: Abhishek

"""
import pandas
import numpy
import statsmodels.formula.api as smf
import statsmodels.stats.multicomp as multi

data = pandas.read_csv('gapminder.csv', low_memory=False)
pandas.set_option('display.float_format', lambda x: '%f'%x)

# convert_objects() was removed from pandas; to_numeric with errors='coerce'
# turns non-numeric entries into NaN in the same way
data['femaleemployrate'] = pandas.to_numeric(data['femaleemployrate'], errors='coerce')
data['incomeperperson'] = pandas.to_numeric(data['incomeperperson'], errors='coerce')
data['polityscore'] = pandas.to_numeric(data['polityscore'], errors='coerce')

# Keep only the G20 countries, using the names from the original filter
g20_countries = ['Argentina', 'Australia', 'Brazil', 'Canada', 'China',
                 'France', 'Germany', 'India', 'Indonesia', 'Italy',
                 'Japan', 'Mexico', 'Russia', 'Saudi Arabia', 'South Africa',
                 'Korea, Rep.', 'Turkey', 'United Kingdom', 'United States']
dataG20Copy = data[data['country'].isin(g20_countries)]


# Not always necessary, but working on an explicit copy avoids the SettingWithCopyWarning
dataG20 = dataG20Copy.copy()

subPolity = dataG20[['femaleemployrate','polityscore']].dropna()

modelPolity = smf.ols(formula='femaleemployrate ~ C(polityscore)',data=subPolity).fit()
print(modelPolity.summary())

mean = subPolity.groupby('polityscore').mean()
print(mean)

sd = subPolity.groupby('polityscore').std()
print(sd)

mc1 = multi.MultiComparison(subPolity['femaleemployrate'],subPolity['polityscore'])
res1 = mc1.tukeyhsd()
print(res1.summary())

OLS Test Results

Group Means

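The p-value of the F-statistic in the OLS results suggests that the null hypothesis (equal mean female employment rates across polity score categories) can be rejected. Because polity score has more than two categories, the post hoc test results are needed to determine which pairs of categories actually differ.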
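
To make the F-test behind that p-value more concrete, here is a minimal sketch of how a one-way ANOVA F-statistic can be computed by hand. The function name `one_way_f` and the toy numbers are illustrative assumptions, not values from the Gapminder data; for a single categorical predictor, the F-statistic reported in the OLS summary is the same quantity.

```python
from statistics import mean


def one_way_f(groups):
    """Return (F, df_between, df_within) for a dict of {category: [values]}."""
    all_values = [v for values in groups.values() for v in values]
    grand_mean = mean(all_values)

    # Between-group sum of squares: how far each group mean sits from the grand mean
    ss_between = sum(len(vals) * (mean(vals) - grand_mean) ** 2
                     for vals in groups.values())
    # Within-group sum of squares: spread of observations around their own group mean
    ss_within = sum((v - mean(vals)) ** 2
                    for vals in groups.values() for v in vals)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up employment rates for three hypothetical polity score categories
toy = {8: [40.0, 42.0, 44.0], 9: [50.0, 52.0, 54.0], 10: [45.0, 47.0, 49.0]}
f_stat, df_b, df_w = one_way_f(toy)
print(f_stat, df_b, df_w)
```

For the toy data the group means are 42, 52 and 47, which gives F = 18.75 on 2 and 6 degrees of freedom. A large F relative to the F distribution with those degrees of freedom is what produces a small p-value.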

Tukey HSD / Post Hoc Test Results

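The pairs of groups for which the reject column is True in the above results are the pairs of polity score categories whose mean female employment rates differ significantly, that is, where the null hypothesis of equal means can be rejected.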
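
As a sketch of reading such a table programmatically, the helper below keeps only the rows whose reject flag is true. It assumes the table is available as a list of rows whose first row is a header containing `group1`, `group2` and `reject`; for the code in this post, `res1.summary().data` should have that shape, though that is an assumption about the statsmodels table object. The function name `significant_pairs` and the example rows are hypothetical placeholders, not results from this analysis.

```python
def significant_pairs(table_rows):
    """Return (group1, group2) pairs whose 'reject' flag is true.

    table_rows: list of rows, the first one being the header row.
    """
    header, *rows = table_rows
    # Locate the columns by name so extra columns (e.g. p-adj) do not matter
    g1, g2, rej = (header.index(name) for name in ('group1', 'group2', 'reject'))
    return [(row[g1], row[g2]) for row in rows if bool(row[rej])]


# Placeholder rows only, to show the expected shape; not actual output
example = [
    ['group1', 'group2', 'meandiff', 'reject'],
    [-10.0, 8.0, 30.5, True],
    [8.0, 9.0, 1.2, False],
    [9.0, 10.0, -12.0, True],
]
pairs = significant_pairs(example)
print(pairs)
```

Looking the columns up by header name keeps the helper independent of the exact column order, which may differ between statsmodels versions.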

In conclusion, the results indicate that female employment rate is associated with the polity score of a G20 country.

Saturday, October 31, 2015
