Software: Python
The goal of this project is to gain hands-on experience with finding, importing, analyzing, visualizing, and presenting a dataset. The idea is to perform an end-to-end data science project on a realistic task, using Python.
This project utilizes numerous Python functions that operate on the data, each motivated by a question about the data. The code is supplemented with textual discussion and analysis as well as Python visualizations. The final deliverables for the course included a project proposal, a written report, a slide deck presentation, source code, an individual reflection, and a peer review.
Has the quality of music decreased over the last few years?
What genres of music have proven to be most popular over time?
What are the demographics of artists who have been most successful in the music industry?
Hypothesis 1: The quality of music has decreased over the last few decades.
Acceptance criteria: >50% of albums on 500 Greatest list were released before 1990.
Hypothesis 2: The pop genre has proven to be the most popular over time.
Acceptance criteria: “Pop” is the genre with the highest number of albums.
Hypothesis 3: Artists who have been most successful in the music industry are men.
Acceptance criteria: The top five artists with the most albums on the 500 Greatest list are men.
Data:
Rolling Stone’s 500 Greatest Albums of All Time (updated 2012)
CSV file format
Includes rank, album year, album name, artist, genre, subgenre
No apparent ethical issues
Rolling Stone is a reputable magazine
Rankings list relies on votes of selected musicians, critics, and industry figures
Offers reliable and diverse collection of albums
Analysis will reveal hidden trends in music taste and preference
Extrapolate to generational differences and extent of social progress
Individuals can better analyze music behaviors and be more receptive to new music
Combat “exposure effect” explored in music psychology
Impact music sales, streaming services demand, and new/old artist standing
Functions:
def create_keys_values_list(filename)
Isolates column 0 of the csv file (album rank), with each element in this column becoming a dictionary key. The key's values come from the rest of the row.
Returns: A 2D list of [key, value] "little lists"
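The reading step described above could look roughly like the sketch below. It is not the project's actual source. It assumes the CSV has a header row, is UTF-8 encoded, and that the rank should be converted to an integer; none of these details are stated in this write-up.

```python
import csv


def create_keys_values_list(filename):
    """Return a 2D list of [key, value] pairs, one pair per album row."""
    pairs = []
    with open(filename, newline="", encoding="utf-8") as csv_file:
        reader = csv.reader(csv_file)
        next(reader, None)  # assumption: the first row is a header
        for row in reader:
            if not row:  # skip blank lines
                continue
            rank = int(row[0])  # column 0: album rank, used as the key
            values = row[1:]    # rest of the row: year, album, artist, genre, subgenre
            pairs.append([rank, values])
    return pairs
```

Using the csv module rather than splitting on commas keeps quoted fields intact, which matters when a field such as the subgenre itself contains commas.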
def make_dictionary(list_name)
Creates a dictionary from a 2D list of values
Returns: A dictionary with album rank as keys and year, album name, artist, genre, and subgenre as values
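A minimal sketch of this conversion, assuming the input is the list of [key, value] pairs described for create_keys_values_list. The sample rows are placeholders, not entries from the dataset.

```python
def make_dictionary(list_name):
    """Build {rank: [year, album, artist, genre, subgenre]} from [key, value] pairs."""
    album_dict = {}
    for key, value in list_name:
        album_dict[key] = value
    return album_dict


# Placeholder pairs for illustration only
sample_pairs = [
    [1, ["1967", "Album A", "Artist A", "Rock", "Pop Rock"]],
    [2, ["1966", "Album B", "Artist B", "Rock", "Pop Rock"]],
]
albums = make_dictionary(sample_pairs)
```

Keying by rank means a lookup such as albums[1] returns the full list of values for the top-ranked album.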
def year_counts(dictionary_name)
Extracts album years and counts the total number of albums released before/after 1990
Returns: The percentage of albums released before 1990 and after 1990
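One way the percentage calculation might be written is sketched below. The cutoff parameter is a hypothetical addition for illustration, and the choice to count 1990 itself in the earlier group is an assumption based on the "in or before 1990" wording used in the results. The year is assumed to be the first item in each values list.

```python
def year_counts(dictionary_name, cutoff=1990):
    """Return (percent released in or before cutoff, percent released after)."""
    total = len(dictionary_name)
    if total == 0:
        return 0.0, 0.0
    # assumption: the year is stored as the first item of each values list
    early = sum(1 for values in dictionary_name.values() if int(values[0]) <= cutoff)
    pct_early = round(100 * early / total, 1)
    pct_late = round(100 * (total - early) / total, 1)
    return pct_early, pct_late


# Placeholder albums for illustration only
sample_albums = {
    1: ["1967", "Album A", "Artist A", "Rock", "Pop Rock"],
    2: ["1975", "Album B", "Artist B", "Rock", "Hard Rock"],
    3: ["1990", "Album C", "Artist C", "Hip Hop", "Conscious"],
    4: ["2004", "Album D", "Artist D", "Rock", "Indie Rock"],
}
pct_early, pct_late = year_counts(sample_albums)
```

The first returned value can then be compared directly against the 50% acceptance criterion for Hypothesis 1.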
def genre_counts(dictionary_name)
Extracts album genres and counts the total number of albums per genre
Returns: Dictionary containing album genre as keys and the number of albums in each genre as values
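The counting step can be sketched as follows. It assumes the genre is the fourth item in each values list and treats the whole genre field as a single label. The sort_counts helper is hypothetical; it shows one way the built-in sorted function could order genres from most to least frequent.

```python
def genre_counts(dictionary_name):
    """Return {genre: number of albums} for every genre in the dictionary."""
    counts = {}
    for values in dictionary_name.values():
        genre = values[3]  # assumption: genre is the fourth item in the values list
        counts[genre] = counts.get(genre, 0) + 1
    return counts


def sort_counts(counts):
    """Hypothetical helper: order a counts dictionary from highest to lowest count."""
    return dict(sorted(counts.items(), key=lambda item: item[1], reverse=True))


# Placeholder albums for illustration only
sample_albums = {
    1: ["1967", "Album A", "Artist A", "Rock", "Pop Rock"],
    2: ["1971", "Album B", "Artist B", "Funk / Soul", "Soul"],
    3: ["1975", "Album C", "Artist C", "Rock", "Hard Rock"],
    4: ["1988", "Album D", "Artist D", "Hip Hop", "Conscious"],
}
sorted_genres = sort_counts(genre_counts(sample_albums))
```

With the counts sorted, the first key is the most frequent genre, which is the value the Hypothesis 2 acceptance criterion checks.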
def count_genders(filename, dictionary_name)
Opens the given csv file, reads in the gender data, and appends each gender to the corresponding values list in the album dictionary
Returns: An updated album dictionary with gender in values list
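The layout of the gender file is not described in this write-up, so the sketch below makes a loud assumption: each row holds an album rank followed by a gender label, after a header row. Under that assumption, the merge step could look like this.

```python
import csv


def count_genders(filename, dictionary_name):
    """Append each album's gender label to its values list and return the dictionary."""
    with open(filename, newline="", encoding="utf-8") as csv_file:
        reader = csv.reader(csv_file)
        next(reader, None)  # assumption: the first row is a header
        for row in reader:
            if len(row) < 2:  # skip blank or incomplete rows
                continue
            rank = int(row[0])       # assumption: column 0 is the album rank
            gender = row[1].strip()  # assumption: column 1 is the gender label
            if rank in dictionary_name:
                dictionary_name[rank].append(gender)
    return dictionary_name
```

Note that this version updates the dictionary in place and also returns it, so ranks that appear in the gender file but not in the album dictionary are simply ignored.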
def top_genders(dictionary_name, n=5)
Extracts artist genders and counts the number of artists per gender for the top n artists.
Returns: The number of top n artists that are male, number that are female, number that are mixed (both male and female members)
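A rough sketch of this tally, assuming "top n artists" means the n artists with the most albums on the list (as in the Hypothesis 3 acceptance criteria), that the artist is the third item in each values list, and that the gender label appended by count_genders is the sixth. The labels "male", "female", and "mixed" are also assumptions.

```python
def top_genders(dictionary_name, n=5):
    """Return (male, female, mixed) counts among the n artists with the most albums."""
    album_totals = {}
    artist_gender = {}
    for values in dictionary_name.values():
        artist, gender = values[2], values[5]  # assumed positions in the values list
        album_totals[artist] = album_totals.get(artist, 0) + 1
        artist_gender[artist] = gender
    # sorted is stable, so tied artists keep their order of first appearance
    ranked = sorted(album_totals, key=album_totals.get, reverse=True)
    tally = {"male": 0, "female": 0, "mixed": 0}
    for artist in ranked[:n]:
        label = artist_gender[artist].strip().lower()
        if label in tally:
            tally[label] += 1
    return tally["male"], tally["female"], tally["mixed"]


# Placeholder albums for illustration only
sample_albums = {
    1: ["1967", "Album A", "Artist A", "Rock", "Pop Rock", "male"],
    2: ["1971", "Album B", "Artist B", "Funk / Soul", "Soul", "female"],
    3: ["1969", "Album C", "Artist A", "Rock", "Pop Rock", "male"],
    4: ["1977", "Album D", "Artist C", "Rock", "Soft Rock", "mixed"],
    5: ["1972", "Album E", "Artist B", "Funk / Soul", "Soul", "female"],
    6: ["1970", "Album F", "Artist A", "Rock", "Pop Rock", "male"],
    7: ["1975", "Album G", "Artist D", "Rock", "Hard Rock", "male"],
}
top_two = top_genders(sample_albums, n=2)
```

Ties at the cutoff are resolved by order of first appearance here; a different tie-breaking rule could change which artists count as the top n.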
def plot_year_frequencies(dictionary_name)
Extracts album years and counts the frequency of each year. The function then plots the year and its frequency in a barplot.
Returns: None
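A possible shape for this plotting function is sketched below, assuming matplotlib is the plotting library and the year is the first item in each values list. The year_frequencies helper is hypothetical and separates the counting from the drawing; the axis labels and title are illustrative.

```python
def year_frequencies(dictionary_name):
    """Hypothetical helper: return (years, counts) with years in ascending order."""
    counts = {}
    for values in dictionary_name.values():
        year = int(values[0])  # assumption: year is the first item in the values list
        counts[year] = counts.get(year, 0) + 1
    years = sorted(counts)
    return years, [counts[year] for year in years]


def plot_year_frequencies(dictionary_name):
    """Draw a bar plot of release year against number of albums; returns None."""
    import matplotlib.pyplot as plt  # imported here so the helper works without matplotlib

    years, frequencies = year_frequencies(dictionary_name)
    plt.bar(years, frequencies)
    plt.xlabel("Release year")
    plt.ylabel("Number of albums")
    plt.title("Albums on the 500 Greatest list by release year")
    plt.show()


# Placeholder albums for illustration only
sample_albums = {
    1: ["1967", "Album A", "Artist A", "Rock", "Pop Rock"],
    2: ["1971", "Album B", "Artist B", "Funk / Soul", "Soul"],
    3: ["1967", "Album C", "Artist C", "Rock", "Hard Rock"],
}
years, frequencies = year_frequencies(sample_albums)
```

Keeping the counting in its own helper makes the frequencies easy to check without opening a plot window, and the genre and gender plots could follow the same pattern.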
def plot_genre_frequencies(dictionary_name)
Plots album genres and their frequencies in a barplot.
Returns: None
def plot_gender_frequencies(dictionary_name, n=5)
Plots artist genders and their frequencies in a barplot
Returns: None
if __name__ == "__main__"
Uses predefined functions to conduct analysis on “albumlist.csv” and “genders.csv”
Graphs created by question:
Bar plot (years, year frequencies): trend of music quality/popularity over time
Bar plot (genres, number of albums): which music genres are most popular
Bar plot (gender, number of artists): demographics of the artists with the most albums on the list
Results by question:
1. Has the quality of music decreased over the last few years? Yes
78.2% of albums on the 500 Greatest list were released in or before 1990 and 21.8% of albums on the list were released after 1990.
Based on the graph, the release years that appear most frequently on the list fall roughly between 1967 and 1977.
This meets the acceptance criterion (>50%) and supports the hypothesis that the quality of music has decreased over the last few decades.
2. What genres of music have proven to be most popular over time? Rock
“Rock” was by far the most popular genre with 249 occurrences on the list, followed by “Funk/Soul” with 38 occurrences and “Hip Hop” with 29 occurrences.
The genre graph depicts a steep drop-off in frequency after the “Rock” genre.
This fails the acceptance criterion and contradicts the hypothesis that the pop genre has proven to be the most popular over time.
3. What are the demographics of artists who have been most successful in the music industry? Male
Figure 1
Figure 2
Figure 3
If I had more time…
Find dataset that includes racial and ethnic information to cross-reference (or manually compile racial/ethnic data)
Utilize different kinds of graph types and modify different graph settings, such as color, markers, labels, etc., to find best graph display
Conduct further research on the process behind the 500 Greatest list and explore other reputable data sources
The most important thing I learned in this project was the importance of managing my time well. Although the project was assigned at the beginning of the semester, I often wasn’t motivated to work on it well in advance because there were few deliverables before the final submission, which made the days leading up to the due date more stressful. Despite these initial difficulties, as the final deadline drew closer I became more motivated and successfully divided the project into parts, working on a different part each day. I found I was most productive when splitting my workload between the code and the report, as each task engaged a different part of my brain and kept me from feeling overwhelmed.
I also made an effort to challenge myself in my code: I not only incorporated topics covered in lectures and practicums, but also used concepts not covered in class to manipulate the data so that it was easier to present to the user (for example, the sorted function). Looking back, I wish I had done more research up front on what demographic information about artists is readily available. As discussed in my report, because there were no databases with ethnic/racial information for these artists, I excluded this data from my analysis to avoid the long process of compiling it manually. Overall, I feel that this project played a key role in introducing me to real-world data analysis and allowed me to get comfortable working on my own as a new programmer.
Arvidsson, J. (2023, October 4). Rolling Stone’s 500 Greatest Albums of All Time. Kaggle. https://www.kaggle.com/datasets/joebeachcapital/rolling-stones-500-greatest-albums-of-all-time/data
Chris, K. (2022, September 13). Sort Dictionary by Value in Python – How to Sort a Dict. freeCodeCamp.org. https://www.freecodecamp.org/news/sort-dictionary-by-value-in-python/
Meyers, C. (2012). Influences on Music Preference Formation. ResearchGate. https://www.researchgate.net/publication/239522710_Influences_on_Music_Preference_Formation
Wikimedia Foundation. (2023, November 13). Rolling Stone’s 500 Greatest Albums of All Time. Wikipedia. https://en.wikipedia.org/wiki/Rolling_Stone%27s_500_Greatest_Albums_of_All_Time
Zach. (2021, August 24). How to Generate Random Colors in Matplotlib Plots. Statology. https://www.statology.org/matplotlib-random-color/