Software: Python

The goal of this project was to gain more experience analyzing and presenting a real-world dataset using Python. Because this project was part of the intermediate programming class, the data and analyses were slightly more complex than those in the final project for the beginner programming course.

Similar to the Rolling Stone project, the final deliverables for the course included a project proposal, a written report, a slide deck presentation, source code, an individual reflection, and a peer review.

In our report, we placed specific emphasis on the ethical considerations involved in how the data we used was collected and how bias may be reflected.

Sustainable Development Goal 5, Target 5.5

  • Ensure women’s full and effective participation and equal opportunities for leadership at all levels of decision-making in political, economic, and public life

    • SDG 5.5.1

      • Proportion of seats held by women in (a) national parliaments and (b) local governments

    • SDG 5.5.2

      • Proportion of women in managerial positions

Sustainable Development Goal 5, Target 5.6

  • Ensure universal access to sexual and reproductive health and reproductive rights

    • SDG 5.6.1

      • Proportion of women aged 15-49 years who make their own informed decisions regarding sexual relations, contraceptive use and reproductive health care

    • SDG 5.6.2

      • Extent of laws and regulations that guarantee full and equal access to women and men to sexual and reproductive health care, information and education

Data sources

The two data sets used in our analysis came from the SDG Indicators Database on the United Nations website.

There are 23 total columns including indicator number, countries, years (2013 - 2021), values that satisfy the indicator, sources, and the units of measurement of the indicator for each target.

Screenshot of data

Ethical considerations + hypotheses

Ethical considerations:

  • Protection of individual privacy (ages, locations)

  • Misrepresentation (accuracy, countries/populations left out) 

Hypothesis 1: We will see an increase in the number of government seats and managerial positions held by women in less developed countries in 2024, and a continued increase over the following five years.

  • Acceptance criteria: Multiple and/or linear regression predicts a higher proportion of women in leadership in 2024 than in 2021, and increases until 2029 (Indicators 5.5.1 and 5.5.2).

Hypothesis 2: We will see an increase in the number of laws that guarantee equal access to health care.

  • Acceptance criteria: Linear regression predicts an increase in the extent of laws that guarantee full and equal access to sexual and reproductive health care over the next five years (Indicator 5.6.2). (Not Feasible)

  • To test this hypothesis, we will be creating a linear regression model to analyze the global trend of the percentage of women who are able to make informed decisions regarding sexual relations, contraceptive use and reproductive health care (Indicator 5.6.1). 

By examining the progress of these SDGs, we aim to identify gaps, successes, and opportunities for improvement in efforts to advance this goal. 

Data algorithms

Methods:

  • Linear regression

  • Multiple linear regression
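To make the difference between the two methods concrete, here is a minimal sketch using NumPy. The data is synthetic and the predictors `x1` and `x2` are hypothetical; the project's actual predictors and library calls are not stated, so this only illustrates the general technique.

```python
import numpy as np

# Synthetic, noise-free data for illustration only (NOT the SDG data).
# y depends on two hypothetical predictors, x1 and x2.
x1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([1.0, 0.0, 2.0, 1.0, 3.0, 2.0])
y = 2.0 + 0.5 * x1 + 1.5 * x2

# Simple linear regression: one predictor, y ~ b0 + b1 * x1.
simple_slope, simple_intercept = np.polyfit(x1, y, deg=1)

# Multiple linear regression: several predictors, y ~ b0 + b1 * x1 + b2 * x2.
# The column of ones lets least squares estimate the intercept b0.
design = np.column_stack([np.ones_like(x1), x1, x2])
coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)

# Fitted values, which can be compared against y in a predicted-vs-actual plot.
predicted = design @ coeffs

print("simple fit (intercept, slope):", simple_intercept, simple_slope)
print("multiple fit (b0, b1, b2):", coeffs)
```

Because the synthetic `y` is built from both predictors without noise, the multiple regression recovers the generating coefficients, while the one-predictor fit can only approximate the trend.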

Approach:

  1. Identify patterns in the data using pandas DataFrames

  2. Conduct regression analyses and calculations

  3. Visualize the results with the matplotlib and seaborn libraries

Steps:

  1. Load and read in data

  2. Sort by indicator and year

  3. Define functions to find global yearly averages and plot linear regressions for each indicator

  4. Perform multiple regression: graph predicted vs actual

  5. Look at specific countries, use models to predict data for next five years
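Steps 1 through 3, plus the five-year forecast from step 5, can be sketched roughly as follows. This is an illustration under assumptions: the column names (`indicator`, `country`, `year`, `value`), the helper names, and the tiny inline sample are all hypothetical stand-ins for the real 23-column file, and plotting is left out to keep the example short.

```python
import io

import numpy as np
import pandas as pd

# Tiny made-up sample standing in for the downloaded file. The column
# names are assumptions, not the real headers.
SAMPLE_CSV = """indicator,country,year,value
5.5.1,A,2019,20.0
5.5.1,B,2019,30.0
5.5.1,A,2020,22.0
5.5.1,B,2020,32.0
5.5.1,A,2021,24.0
5.5.1,B,2021,34.0
"""


def global_yearly_average(df, indicator):
    """Mean value across all countries for each year of one indicator."""
    subset = df[df["indicator"] == indicator]
    return subset.groupby("year")["value"].mean().sort_index()


def fit_and_forecast(yearly, n_years=5):
    """Fit a straight line to the yearly means and extend it n_years ahead."""
    x = yearly.index.to_numpy(dtype=float)
    slope, intercept = np.polyfit(x, yearly.to_numpy(dtype=float), deg=1)
    last = int(yearly.index.max())
    future = np.arange(last + 1, last + 1 + n_years)
    return pd.Series(slope * future + intercept, index=future, name="forecast")


# Step 1: load the data (read_csv also accepts a file path).
df = pd.read_csv(io.StringIO(SAMPLE_CSV), dtype={"indicator": str})
# Step 2: sort by indicator and year.
df = df.sort_values(["indicator", "year"]).reset_index(drop=True)
# Step 3: global yearly averages and a linear trend for one indicator.
yearly = global_yearly_average(df, "5.5.1")
forecast = fit_and_forecast(yearly)
print(yearly)
print(forecast)
```

Filtering `df` to a single country before calling `global_yearly_average` would give a per-country trend in the same way.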

Multiple regression results

Linear regression results - global trends

5.5.1

5.5.2

*Indicator 5.6.2 was not included because it only contained data for certain years (2019 and 2022)

5.6.1

Mexico predictions

South Africa predictions

Philippines predictions

Predictions + conclusions

Looking at our visualizations, we can accept our first hypothesis that we would see an increase in female leadership over the next five years, as our acceptance criteria are met.

  • 2024 and the next 5 years have higher results for 5.5.1 and 5.5.2

It is unclear whether we can accept our second hypothesis, as we were unable to graph indicator 5.6.2 due to insufficient data. The 5.6.1 plot shows that this indicator increases over time, which is encouraging for 5.6.2 because the two indicators are closely related.

  • Increase in 5.6.1 over the next 5 years

Future work

In order to achieve these increases in gender equality, countries must take an interdisciplinary approach. This could include:

  • Investing in education systems that provide equal access for all genders and motivate women to pursue leadership from a young age

  • Implementing economic measures to close the gender pay gap, provide access to financial resources for women, and promote women's workforce participation

  • Reforming legal barriers to gender equality, such as child marriage, gendered inheritance laws, divorce requirements, etc.

  • Providing widespread sexual and reproductive health services, including family planning, maternal healthcare, and contraceptives

Looking back at this project, I think that, as a team, we challenged ourselves to explore an extensive amount of data from a new data source. We were intentional about choosing data that was meaningful to us and creating a story to contextualize our findings, and we implemented a variety of data science approaches we learned about in class this semester. As a result, I believe that our final product is not only a representative demonstration of what we have learned in class from a technical perspective, but is also representative of our group’s “soft skills” (critical thinking, collaboration, contextualization, research capabilities, etc.). That said, I believe our group could have approached this work more effectively if we had defined individual responsibilities more strictly at the beginning of the project to ease the workflow later on.

I also believe that, had we done this better, our group would have been able to perform a more comprehensive analysis on our data, possibly by analyzing additional SDG 5 indicators and/or placing more of a focus on computing specific numerical calculations in addition to generating visualizations the way that we did. While our visualizations certainly enhanced our presentation and written report, I think it would have been helpful to calculate summary statistics for each country/year/indicator in order to analyze indicator progress from a different perspective. These were concepts that were definitely covered in class, and might have been beneficial to include in our final cumulative project as a reflection of what we learned. Despite this, I do believe that our final product reflects our entire group’s best effort and provides meaningful insights about our data.
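As a rough idea of what those summary statistics could look like, here is a minimal pandas sketch. The rows and column names are made up for illustration and do not come from the project's data.

```python
import pandas as pd

# Made-up rows for illustration; the column names are assumptions.
df = pd.DataFrame({
    "indicator": ["5.5.1"] * 4 + ["5.5.2"] * 4,
    "country": ["A", "A", "B", "B"] * 2,
    "year": [2020, 2021, 2020, 2021] * 2,
    "value": [20.0, 22.0, 30.0, 34.0, 25.0, 26.0, 35.0, 39.0],
})

# Summary statistics for each indicator and year, across countries.
by_year = df.groupby(["indicator", "year"])["value"].agg(
    ["mean", "median", "min", "max"]
)

# Change for each country between its first and last observed year.
ordered = df.sort_values("year")
change = ordered.groupby(["indicator", "country"])["value"].agg(
    lambda s: s.iloc[-1] - s.iloc[0]
)

print(by_year)
print(change)
```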

This project was a key stepping stone in building my programming skills with more complex functions and data analysis, and it has made me a more confident programmer.
