Sunday, September 27, 2015

Running my first program

Code

# -*- coding: utf-8 -*-
"""
Frequency Distribution

"""
import pandas
import numpy

# low_memory=False makes pandas parse the whole file in one pass rather than in
# chunks, so each column gets a single inferred dtype instead of mixed types
data = pandas.read_csv('gapminder.csv', low_memory=False)

# Convert all DataFrame column names to lower case
data.columns = data.columns.str.lower()

# Bug fix for display format to avoid runtime errors
pandas.set_option('display.float_format', lambda x: '%f'%x)


# Get data for only the G20 countries (19 countries)
g20_countries = ['Argentina', 'Australia', 'Brazil', 'Canada', 'China',
                 'France', 'Germany', 'India', 'Indonesia', 'Italy',
                 'Japan', 'Mexico', 'Russia', 'Saudi Arabia', 'South Africa',
                 'Korea, Rep.', 'Turkey', 'United Kingdom', 'United States']
dataG20Copy = data[data['country'].isin(g20_countries)]

# Not always necessary, but working on an explicit copy avoids the
# SettingWithCopyWarning that pandas can display on later assignments
dataG20 = dataG20Copy.copy()

print('------- femaleemployrate of G20 countries -------')
# convert_objects() is no longer available in newer pandas releases;
# to_numeric(errors='coerce') turns non-numeric entries into NaN instead
female_employ_rate = pandas.to_numeric(dataG20['femaleemployrate'], errors='coerce')
filter_values = [10, 25, 50, 75, 100]
binned = pandas.cut(female_employ_rate, bins=filter_values)

print('Count: ')
ranges = binned.value_counts(sort=True, dropna=True)
print(ranges)

print('Percentages: ')
p1 = binned.value_counts(sort=True, dropna=True, normalize=True) * 100
print(p1)

print('------- incomeperperson of G20 countries -------')
# convert_objects() is no longer available in newer pandas releases;
# to_numeric(errors='coerce') turns non-numeric entries into NaN instead
income_per_person = pandas.to_numeric(dataG20['incomeperperson'], errors='coerce')
filter_values = [10000, 25000, 50000, 75000, 100000, 200000]
binned = pandas.cut(income_per_person, bins=filter_values)

print('Count: ')
ranges = binned.value_counts(sort=True, dropna=True)
print(ranges)

print('Percentages: ')
p1 = binned.value_counts(sort=True, dropna=True, normalize=True) * 100
print(p1)

print('------- polityscore of G20 countries -------')
# convert_objects() is no longer available in newer pandas releases;
# to_numeric(errors='coerce') turns non-numeric entries into NaN instead
polity_score = pandas.to_numeric(dataG20['polityscore'], errors='coerce')

print('Count: ')
c1 = polity_score.value_counts(sort=True, dropna=True)
print(c1)

print('Percentages: ')
p1 = polity_score.value_counts(sort=True, dropna=True, normalize=True) * 100
print(p1)


Results

I have shown the frequency distributions of femaleemployrate, incomeperperson and polityscore for the G20 countries. The initial data from the Gapminder CSV file was filtered down to just the G20 countries (19 countries). Please see the code above.
For femaleemployrate and incomeperperson, each individual value occurred only once, so I grouped these variables into ranges. It was easy to find the max and min of these fields, and I chose the ranges accordingly. For femaleemployrate the bin edges are 10, 25, 50, 75 and 100. pandas.cut makes each range include its upper edge, so the ranges are (10, 25], (25, 50], (50, 75] and (75, 100]. For incomeperperson the ranges are (10000, 25000], (25000, 50000], (50000, 75000], (75000, 100000] and (100000, 200000]. Values outside the outer edges are left out of the counts. The distribution is much easier to read this way.
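
To make the range boundaries concrete, here is a minimal sketch of how pandas.cut assigns values to ranges. The sample values are made up for illustration and are not taken from gapminder.csv; only the bin edges match the code above.

```python
import pandas

# Made-up values standing in for the femaleemployrate column
# (an assumption for illustration, not real Gapminder data).
rates = pandas.Series([12.0, 25.0, 38.5, 50.0, 61.2, 80.0])

bin_edges = [10, 25, 50, 75, 100]

# By default pandas.cut builds right-closed intervals, so a value equal to
# an edge (such as 25.0) lands in the lower range, here (10, 25].
binned = pandas.cut(rates, bins=bin_edges)

# Count of values per range, and the same counts as percentages.
counts = binned.value_counts(sort=False)
percentages = binned.value_counts(sort=False, normalize=True) * 100

print(counts)
print(percentages)
```

If the ranges should include their lower edge instead, pandas.cut accepts right=False.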

Sunday, September 20, 2015

Getting My Research Project Started

Data Set

I am working with the Gapminder Codebook PDF.

Topic

I am interested in the female employment rate of nations. The association I would like to explore:
  • Female employment rate of a nation and GDP
I am also curious whether any association exists between:
  • Polity score and female employment rate
  • Polity score and GDP

Code Book

For this research I will need the following variables listed in the Gapminder Codebook:
  1. Femaleemployrate: 2007 female employees age 15+ (% of population). Percentage of the female population, age above 15, that has been employed during the given year.
  2. Incomeperperson: 2010 Gross Domestic Product per capita in constant 2000 US$. Inflation, but not the differences in the cost of living between countries, has been taken into account.
  3. Polityscore: 2009 Democracy score (Polity). Overall polity score from the Polity IV dataset, calculated by subtracting an autocracy score from a democracy score. It is a summary measure of a country's democratic and free nature. -10 is the lowest value, 10 the highest.
  4. Country: The primary key of this data set. It is assumed that country names are unique.
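
The two assumptions in this list (unique country names, and numeric values in the other three columns) can be checked before any analysis. Below is a minimal sketch under stated assumptions: the rows are hypothetical and only mimic the layout of the data set, with a blank cell standing in for a missing value.

```python
import pandas

# Hypothetical rows for illustration; these are not real Gapminder values.
sample = pandas.DataFrame({
    'country': ['Argentina', 'Australia', 'Brazil'],
    'femaleemployrate': ['45.9', ' ', '53.5'],
    'polityscore': ['8', '10', '8'],
})

# True only if no country name appears more than once,
# which is what a primary key requires.
country_is_key = sample['country'].is_unique

# Convert the text columns to numbers; anything that cannot be parsed
# (such as the blank cell) becomes NaN rather than raising an error.
numeric = sample[['femaleemployrate', 'polityscore']].apply(
    pandas.to_numeric, errors='coerce')

print(country_is_key)
print(numeric.isna().sum())
```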

Questions I would like to answer

  1. Is there any relationship between female employment rate and GDP of a nation?
  2. Is there any relationship between female employment rate in G20 countries and their GDP?
  3. Is there any relationship between female employment rate and polity score of a country?
  4. What about polity score vs female employment rate in G20 countries?
  5. Finally, is there any relationship between polity score and GDP of G20 countries?
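
One possible first look at these questions is a table of pairwise correlation coefficients. This is only a sketch of that idea, not an analysis I have run on the data: the numbers below are made up, and the column names are assumed to match the lower-cased names in gapminder.csv.

```python
import pandas

# Made-up numbers for illustration only; they are not Gapminder values.
sample = pandas.DataFrame({
    'femaleemployrate': [30.0, 40.0, 50.0, 60.0],
    'incomeperperson': [2000.0, 4000.0, 6000.0, 8000.0],
    'polityscore': [10, 8, -7, 9],
})

# Pairwise Pearson correlation coefficients. Values near +1 or -1 suggest a
# strong linear association; values near 0 suggest little or none.
correlations = sample.corr()
print(correlations)
```

A correlation coefficient only captures linear association and says nothing about cause, so it would be a starting point rather than an answer.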

What does Google say about questions above?

  1. http://goo.gl/nqXmpW: This is an article from the UN Women website. It makes a strong argument for gender empowerment. My research question does not deal directly with gender empowerment or equality; I am only interested in female employment rate. However, this article strongly suggests that a higher female employment rate will lead to strong economic growth.
  2. http://goo.gl/TEwfDo: Report prepared for the G20 Labor and Employment Ministerial Meeting, Melbourne, Australia, 10-11 September 2014. This paper looks at achieving a stronger economy through a gender-balanced economy. Even though I am not really asking any questions about gender equality, this research is useful because:
    • The paper specifically talks about G20 countries
    • The premise is that greater labor force participation of women will lead to stronger economic growth. Again, this is not the question I asked, but their premise and supporting report lead to my hypothesis about G20 countries.

Hypothesis

  1. Female employment rate should have a positive effect on a nation's GDP; that is, the GDP of nations with a higher female employment rate should be relatively greater.
  2. I think the above hypothesis holds true for the G20 countries.
  3. I think female employment rate will be higher for countries with higher polity scores.
  4. The above hypothesis will hold true for G20 countries.
  5. I think countries with polity scores somewhere in the middle will have higher GDP.