Software: Python
The goal of this project was to gain more experience analyzing and presenting a real-world dataset using Python. Because this was the final project for the intermediate programming class, the data and analyses were slightly more complex than those in the final project for the beginner programming course.
Similar to the Rolling Stone project, the final deliverables for the course included a project proposal, a written report, a slide deck presentation, source code, an individual reflection, and a peer review.
In our report, we placed specific emphasis on the ethical considerations surrounding how our data was collected and how bias may be reflected in it.
Sustainable Development Goal 5, Target 5.5
Ensure women’s full and effective participation and equal opportunities for leadership at all levels of decision-making in political, economic, and public life
SDG 5.5.1
Proportion of seats held by women in (a) national parliaments and (b) local governments
SDG 5.5.2
Proportion of women in managerial positions
Sustainable Development Goal 5, Target 5.6
Ensure universal access to sexual and reproductive health and reproductive rights
SDG 5.6.1
Proportion of women aged 15-49 years who make their own informed decisions regarding sexual relations, contraceptive use and reproductive health care
SDG 5.6.2
Extent of laws and regulations that guarantee full and equal access to women and men to sexual and reproductive health care, information and education
Data sources
The two data sets we used for our analysis came from the SDG Indicators Database on the United Nations website.
The data sets contain 23 columns in total, including the indicator number, country, year (2013-2021), indicator value, source, and unit of measurement for each indicator.
Screenshot of data
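The loading and sorting steps can be sketched with pandas as follows. This is a minimal sketch that assumes the export is a long-format CSV with one row per indicator, country, and year. The column names `Indicator`, `GeoAreaName`, `TimePeriod`, `Value`, and `Units` and the helper `load_sdg_data` are hypothetical stand-ins. The rows are invented toy values, not real SDG data.

```python
import io

import pandas as pd

# Hypothetical column names and invented values, for illustration only.
RAW_CSV = """Indicator,GeoAreaName,TimePeriod,Value,Units
5.5.1,Country A,2015,42.0,PERCENT
5.5.1,Country A,2013,37.0,PERCENT
5.5.2,Country A,2013,31.0,PERCENT
5.5.1,Country B,2014,27.0,PERCENT
5.5.2,Country B,2014,<5,PERCENT
"""


def load_sdg_data(source) -> pd.DataFrame:
    """Read an SDG export, keep numeric values, and sort by indicator and year."""
    # Read the indicator code as text so "5.5.1" is never parsed as a number.
    df = pd.read_csv(source, dtype={"Indicator": str})
    # Values such as "<5" cannot be converted; coerce them to NaN and drop them.
    df["Value"] = pd.to_numeric(df["Value"], errors="coerce")
    df = df.dropna(subset=["Value"])
    # Sort by indicator, then year; country breaks ties within a year.
    df = df.sort_values(["Indicator", "TimePeriod", "GeoAreaName"])
    return df.reset_index(drop=True)


df = load_sdg_data(io.StringIO(RAW_CSV))
```

With a real download, the `io.StringIO` wrapper would be replaced by the path to the CSV file.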
Ethical considerations + hypotheses
Ethical considerations:
Protection of individual privacy (ages, locations)
Misrepresentation (accuracy, countries/populations left out)
Hypothesis 1: We will see an increase in the proportion of government seats and managerial positions held by women in less developed countries in 2024, with continued increases over the next five years.
Acceptance criteria: Multiple and/or linear regression predicts a higher proportion of women in leadership in 2024 than in 2021, and that proportion continues to increase through 2029 (Indicators 5.5.1 and 5.5.2).
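The acceptance criteria can be written as a small check. This sketch assumes a single straight-line fit per indicator (via `numpy.polyfit`) and compares the 2024 prediction against the observed 2021 value. The function name `meets_hypothesis_1` is hypothetical and not taken from our project code.

```python
import numpy as np


def meets_hypothesis_1(years, values, base_year=2021, target_year=2024, horizon_end=2029):
    """Return True if a linear trend predicts a higher value in target_year than
    in base_year and keeps rising every year through horizon_end."""
    years = np.asarray(years, dtype=float)
    values = np.asarray(values, dtype=float)
    slope, intercept = np.polyfit(years, values, deg=1)

    # Use the observed base-year value if present, otherwise the fitted one.
    observed = values[years == base_year]
    baseline = observed.mean() if observed.size else slope * base_year + intercept

    future = np.arange(target_year, horizon_end + 1)
    predicted = slope * future + intercept

    # Criterion 1: the target-year prediction exceeds the base-year level.
    higher_than_base = predicted[0] > baseline
    # Criterion 2: predictions increase year over year through the horizon.
    keeps_rising = np.all(np.diff(predicted) > 0)
    return bool(higher_than_base and keeps_rising)
```

For a straight line, the "keeps rising" condition reduces to a positive slope; the explicit year-by-year comparison only matters if a nonlinear or multiple regression model is substituted.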
Hypothesis 2: We will see an increase in the number of laws that guarantee equal access to sexual and reproductive health care.
Acceptance criteria: Linear regression predicts an increase in the extent of laws that guarantee full and equal access to sexual and reproductive health care over the next five years (Indicator 5.6.2). (Not feasible with the available data.)
Instead, to test this hypothesis, we created a linear regression model to analyze the global trend in the percentage of women who make their own informed decisions regarding sexual relations, contraceptive use and reproductive health care (Indicator 5.6.1).
By examining progress on these SDG targets, we aim to identify gaps, successes, and opportunities for improvement in efforts to advance this goal.
Data algorithms
Methods:
Linear regression
Multiple linear regression
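For reference, the two methods can be written as below, where y_t is an indicator value (for example, the percentage of parliamentary seats held by women) in year t. The predictors x_i1 through x_ip in the multiple regression are generic placeholders, since this write-up does not list which variables the multiple regression used.

```latex
% Simple linear regression of an indicator value on year t
y_t = \beta_0 + \beta_1 t + \varepsilon_t

% Multiple linear regression with p predictors for observation i
y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \varepsilon_i

% Ordinary least squares estimate; X is the design matrix whose first column is all ones
\hat{\boldsymbol{\beta}} = (X^\top X)^{-1} X^\top \mathbf{y}
```

A five-year projection then substitutes future years into the fitted simple model, for example \hat{y}_{2026} = \hat{\beta}_0 + \hat{\beta}_1 \cdot 2026.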
Approach:
Identify patterns in the data using pandas DataFrames
Conduct regression analyses and calculations
Visualize results with the matplotlib and seaborn libraries
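A multiple regression and its predicted vs. actual comparison can be sketched with NumPy's least-squares solver. This is an illustration, not our project code: the choice of year and managerial share (5.5.2) as predictors of parliamentary seats (5.5.1) is a hypothetical example, the helper names are invented, and the numbers are made up.

```python
import numpy as np


def fit_multiple_regression(X, y):
    """Ordinary least squares with an intercept; returns (coefficients, fitted values)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept.
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef, design @ coef


def r_squared(actual, predicted):
    """Share of the variance in `actual` explained by `predicted`."""
    actual = np.asarray(actual, dtype=float)
    ss_res = np.sum((actual - predicted) ** 2)
    ss_tot = np.sum((actual - actual.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Hypothetical example with invented values.
years = np.array([2015, 2016, 2017, 2018, 2019, 2020])
managers = np.array([30.0, 31.0, 31.5, 33.0, 34.0, 35.5])  # stand-in for 5.5.2
seats = np.array([22.0, 23.0, 24.1, 25.0, 26.2, 27.0])     # stand-in for 5.5.1

coef, predicted = fit_multiple_regression(np.column_stack([years, managers]), seats)
fit_quality = r_squared(seats, predicted)
```

The fitted values can then be plotted against the actual values with matplotlib's `plt.scatter`, together with a y = x reference line; points close to that line indicate a good in-sample fit.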
Steps:
Load and read in data
Sort by indicator and year
Define functions to find global yearly averages and plot linear regressions for each indicator
Perform multiple regression and graph predicted vs. actual values
Select specific countries and use the models to predict values for the next five years
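The averaging, regression, and projection steps can be sketched as follows. This assumes the same long-format layout with hypothetical column names (`Indicator`, `GeoAreaName`, `TimePeriod`, `Value`), takes an unweighted mean across countries (not population-weighted), and fits a straight line with `numpy.polyfit`. The helper names, country labels, and values are invented for illustration.

```python
import numpy as np
import pandas as pd


def global_yearly_average(df: pd.DataFrame, indicator: str) -> pd.Series:
    """Unweighted mean across countries for each year of one indicator."""
    subset = df[df["Indicator"] == indicator]
    return subset.groupby("TimePeriod")["Value"].mean().sort_index()


def project_linear_trend(years, values, last_year, horizon=5):
    """Fit value = slope * year + intercept and extrapolate `horizon` years past last_year."""
    slope, intercept = np.polyfit(np.asarray(years, dtype=float),
                                  np.asarray(values, dtype=float), deg=1)
    future_years = np.arange(last_year + 1, last_year + horizon + 1)
    return pd.Series(slope * future_years + intercept, index=future_years)


# Invented toy rows; not real SDG values.
df = pd.DataFrame({
    "Indicator": ["5.5.1"] * 6,
    "GeoAreaName": ["Country A"] * 3 + ["Country B"] * 3,
    "TimePeriod": [2019, 2020, 2021] * 2,
    "Value": [10.0, 12.0, 14.0, 20.0, 20.0, 26.0],
})

trend = global_yearly_average(df, "5.5.1")
country_a = df[(df["Indicator"] == "5.5.1") & (df["GeoAreaName"] == "Country A")]
forecast = project_linear_trend(country_a["TimePeriod"], country_a["Value"], last_year=2021)
```

Passing `trend.index` and `trend.values` to `project_linear_trend` gives the corresponding global projection; seaborn's `regplot` is one way to draw the fitted line over the yearly averages.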
Multiple regression results
Linear regression results - global trends
5.5.1
5.5.2
*Indicator 5.6.2 was not included because it only contained data for two years (2019 and 2022).
5.6.1
Mexico predictions
South Africa predictions
Philippines predictions
Predictions + conclusions
Based on our visualizations, we can accept our first hypothesis that female leadership will increase over the next five years, as our acceptance criteria are met.
Predicted values for 2024 and the following five years are higher than in 2021 for both 5.5.1 and 5.5.2
It is unclear whether we can accept our second hypothesis, because insufficient data prevented us from graphing indicator 5.6.2. However, our regression shows that 5.6.1 increases over time, which is encouraging for 5.6.2 since the two indicators are closely related.
Increase in 5.6.1 over the next 5 years
Future work
In order to achieve these increases in gender equality, countries must take an interdisciplinary approach. This could include:
Investing in education systems that provide equal access for all genders and motivate women to pursue leadership from a young age
Implementing economic measures to close the gender pay gap, provide access to financial resources for women, and promote women's workforce participation
Reforming legal barriers to gender equality, such as child marriage, gendered inheritance laws, divorce requirements, etc.
Providing widespread sexual and reproductive health services, including family planning, maternal healthcare, and contraceptives
Looking back at this project, I think that, as a team, we challenged ourselves to explore an extensive amount of data from a new data source, we were intentional about choosing data that was meaningful to us and creating a story to contextualize our findings, and we implemented a variety of data science approaches we learned about in class this semester. As a result, I believe that our final product is not only a representative demonstration of what we have learned in class from a technical perspective, but also representative of our group’s “soft skills” (critical thinking, collaboration, contextualization, research capabilities, etc.). That said, our group could have worked more effectively if we had defined individual responsibilities more strictly at the beginning of the project, which would have eased the workflow later on.
I also believe that, with clearer responsibilities, our group would have been able to perform a more comprehensive analysis of our data, possibly by analyzing additional SDG 5 indicators and/or focusing more on specific numerical calculations in addition to the visualizations we generated. While our visualizations enhanced our presentation and written report, calculating summary statistics for each country, year, and indicator would have let us analyze indicator progress from a different perspective. These concepts were covered in class, and including them would have made our final cumulative project a fuller reflection of what we learned. Despite this, I believe that our final product reflects our entire group’s best effort and provides meaningful insights about our data.
This project served as a key stepping stone in building my programming skills with more complex functions and data analysis, and it has made me a more confident programmer.