Sourcing, Processing, & Presenting

The Three Levels of our DH Project

Sourcing

Our primary dataset is the “Music & Mental Health Survey,” available on Kaggle, which combines quantitative and qualitative measures that let us analyze the correlation between mental health symptoms and music-listening habits. We also used a second Kaggle dataset, “Global Trends in Mental Health Disorder,” which contains quantitative data on mental health disorders across different regions. Lastly, our secondary sources included several articles and readings that examine how music and mental health intertwine, including whether music improves or worsens mental health conditions.

To provide a holistic view of which aspects of music might influence mental health scores, we took a dual approach, combining the survey dataset with findings from published studies. For example, our analysis draws on a study by Kurniawan et al. (2025), “Utilizing Machine Learning to Identify Mental Health Issues Based on Music Genre Preferences,” which uses machine learning to identify mental health conditions such as anxiety, depression, insomnia, and OCD from people’s music preferences. The study examines the relationship between musical preferences and mental health outcomes using Logistic Regression, Decision Trees, and Gradient Boosting, and reports that computational models are capable of predicting an individual’s mental health status from their preferred genres and listening patterns (Kurniawan et al., 2025). However, the study has limitations: unbalanced data and information of limited relevance made it more difficult to improve the results. Nonetheless, this source is particularly important to our research because it uses data-driven methods to connect musical habits to psychological traits, and it provides valuable insight into the challenges associated with modelling techniques and data.

We were also interested in the relationship between music habits and mental health because music is present throughout the development and daily lives of many people, making it an integral part of the human experience. Music engagement not only shapes our cultural and personal identity but also plays a significant role in regulating complex moods. In an article published by the University Wire, Lauren Wisdom explores this relationship by emphasizing how music serves as a coping mechanism and a useful tool for emotional regulation. According to Wisdom (2020), around 40% of students with apparent mental health conditions avoid seeking help because of the stigma surrounding mental health. She draws on student testimonials to explain how music improves mood regulation, relieves stress, and benefits an individual’s mental health. The article stresses the importance of music as a therapeutic outlet for those who struggle with complex conditions, which inspired us to research in depth how these two important aspects of life intertwine.

Processing

The dataset we drew from, the “Music & Mental Health Survey,” was obtained from Kaggle, with survey findings compiled by Catherine Rasgaitis. It includes 736 observations and 33 variables covering a wide range of factors, including demographics and self-reported symptoms of anxiety, depression, insomnia, and OCD. While the raw data was already fairly clean, it required several stages of processing and re-coding to put the findings into a more functional format for statistical analysis and visualization. The heart of this process involved manually removing entries with incomplete responses or N/A values under key variables such as “Age,” “BPM,” and “Music Effects.”
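The removal of incomplete entries described above was done by hand. As a rough illustration only, the same rule could be expressed in a few lines of Python (not the R Markdown workflow used for the project). The exact column spellings, the set of strings treated as missing, and the sample rows are all assumptions made up for this sketch.

```python
# Illustrative sketch: drop survey rows that lack a value in any key column.
# Column spellings and the "missing" markers below are assumptions.
KEY_COLUMNS = ["Age", "BPM", "Music Effects"]
MISSING_MARKERS = {"", "NA", "N/A"}


def drop_incomplete(rows, key_columns=KEY_COLUMNS):
    """Return only the rows whose key columns all hold a real value."""
    kept = []
    for row in rows:
        values = [(row.get(col) or "").strip() for col in key_columns]
        if all(value not in MISSING_MARKERS for value in values):
            kept.append(row)
    return kept


# Made-up sample rows, not taken from the real dataset.
sample_rows = [
    {"Age": "18", "BPM": "132", "Music Effects": "Improve"},
    {"Age": "", "BPM": "120", "Music Effects": "No effect"},
    {"Age": "21", "BPM": "NA", "Music Effects": "Improve"},
    {"Age": "30", "BPM": "95", "Music Effects": "N/A"},
]

cleaned = drop_incomplete(sample_rows)
print(len(cleaned), "of", len(sample_rows), "rows kept")
```

With real data, the rows would typically come from reading the downloaded CSV with `csv.DictReader` instead of a hard-coded list.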

Once the raw data was cleaned, we created a smaller subset containing the variables most relevant to our research questions and the scope of our project. From there, we re-coded each frequency response, such as “Never,” “Sometimes,” and “Very Frequently,” onto a numerical scale ranging from 0 to 3 to facilitate quantitative analysis and comparison across genres. During this process, we shortened lengthy column names (e.g., “Frequency..Hip.hop.” to “hiphop”) to improve clarity and maintain consistency within our data. As we worked through the columns, we also adjusted our dataset’s variables to emphasize different levels of musical energy.
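The re-coding and renaming steps above can be sketched as follows. This is an illustrative Python version, not the project's R code. The text names three of the four response levels, so the intermediate label “Rarely” and the exact 0 to 3 assignment are assumptions, as are the helper names `short_name` and `recode_frequency`.

```python
# Illustrative sketch: map frequency labels to 0-3 and shorten column names.
# "rarely" and the exact value assigned to each label are assumptions.
FREQUENCY_SCALE = {
    "never": 0,
    "rarely": 1,
    "sometimes": 2,
    "very frequently": 3,
}


def short_name(column):
    """Turn a name like 'Frequency..Hip.hop.' into 'hiphop'."""
    name = column
    if name.startswith("Frequency"):
        name = name[len("Frequency"):]
    # Keep only letters and digits, then lowercase the result.
    return "".join(ch for ch in name if ch.isalnum()).lower()


def recode_frequency(row):
    """Recode every 'Frequency...' column; leave other columns unchanged."""
    recoded = {}
    for column, value in row.items():
        if column.startswith("Frequency"):
            recoded[short_name(column)] = FREQUENCY_SCALE[value.strip().lower()]
        else:
            recoded[column] = value
    return recoded


# Made-up example row.
example = {"Age": "18", "Frequency..Hip.hop.": "Very Frequently",
           "Frequency..Classical.": "Never"}
print(recode_frequency(example))
```

An unexpected label raises a `KeyError` here, which surfaces typos in the responses instead of silently coding them as a number.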

First, we constructed two composite variables, “High Energy” and “Low Energy,” which averaged the listening frequencies of high-energy genres, such as hip hop, EDM, and rock, and of low-energy genres, such as classical, lo-fi, and gospel, respectively. Similarly, we re-coded binary variables, such as “While.working,” “Instrumentalist,” and “Composer,” from “Yes/No” to “1/0” to allow for numerical processing. The final stage of this process involved standardizing each variable name to lowercase and verifying that each column was numeric.
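As a worked illustration of the composite and binary variables described above, the Python sketch below averages already re-coded genre scores and converts “Yes/No” answers to 1/0. It is not the project's R code. The genre lists contain only the genres named in the text (the full lists may have been longer), and the output names `high_energy` and `low_energy` are assumptions.

```python
# Illustrative sketch: composite energy scores and Yes/No -> 1/0 recoding.
# Genre lists cover only the genres named in the text; names are assumptions.
HIGH_ENERGY = ["hiphop", "edm", "rock"]
LOW_ENERGY = ["classical", "lofi", "gospel"]
YES_NO = {"yes": 1, "no": 0}


def yes_no_to_int(value):
    """Convert 'Yes'/'No' (any capitalization) to 1/0."""
    return YES_NO[value.strip().lower()]


def average(row, columns):
    """Mean of the numeric scores stored under the given columns."""
    return sum(row[col] for col in columns) / len(columns)


def add_energy_scores(row):
    """Return a copy of the row with the two composite scores added."""
    scored = dict(row)
    scored["high_energy"] = average(row, HIGH_ENERGY)
    scored["low_energy"] = average(row, LOW_ENERGY)
    return scored


# Made-up respondent with genre scores already on the 0-3 scale.
respondent = {"hiphop": 3, "edm": 2, "rock": 1,
              "classical": 0, "lofi": 1, "gospel": 0,
              "instrumentalist": yes_no_to_int("Yes")}
print(add_energy_scores(respondent))
```

Averaging rather than summing keeps both composites on the same 0 to 3 scale as the individual genre scores, even if the two genre lists differ in length.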

Before exporting the dataset, we reviewed its structure and contents to ensure it was cleaned and organized in the manner best suited to our research scope. By documenting each step of the data cleaning process in R Markdown, we ensured not only full transparency but also strong consistency among teammates. Ultimately, our final cleaned dataset, titled “music_clean.csv,” allowed us to create more interpretable visualizations that link various behaviors surrounding music and mental health.

However, we soon realized that our original dataset lacked the temporal and geographic data needed to meet the project’s requirement of creating a timeline and a map. We found a second Kaggle dataset, titled “Global Trends in Mental Health Disorder,” which covers countries across the world and reports the prevalence of mental health disorders such as anxiety, depression, alcohol use, schizophrenia, bipolar disorder, eating disorders, and drug use disorders from 1990 to 2017. To make the data functional for our project, we cleaned the dataset in R Markdown. Specifically, we filtered out any years that fell outside the 1990 to 2017 range, then used commands such as pivot_longer() to create new columns for country, country code, year, and the average across all mental disorders. This data cleaning process allowed us to easily produce a timeline that showcases the mean percentage of all mental disorders over the years in each region.
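The year filter, the wide-to-long reshape, and the per-country average described above can be sketched as follows. This is a Python stand-in for the idea behind pivot_longer(), not the R code the project used. The identifier column names (“Entity,” “Code,” “Year”), the disorder column names, and every value in the sample rows are assumptions invented for illustration.

```python
# Illustrative sketch: filter years, reshape wide -> long, then average.
# Column names and all sample values are invented for this example.
from collections import defaultdict

ID_COLUMNS = ("Entity", "Code", "Year")


def to_long(rows, first_year=1990, last_year=2017):
    """Produce one record per country, year, and disorder column."""
    long_rows = []
    for row in rows:
        year_text = str(row["Year"]).strip()
        if not year_text.isdigit():
            continue  # skip non-numeric year entries
        year = int(year_text)
        if not first_year <= year <= last_year:
            continue  # skip years outside the study range
        for column, value in row.items():
            if column not in ID_COLUMNS:
                long_rows.append({"country": row["Entity"], "code": row["Code"],
                                  "year": year, "disorder": column,
                                  "percent": float(value)})
    return long_rows


def mean_by_country_year(long_rows):
    """Average the disorder percentages for each (country, code, year)."""
    groups = defaultdict(list)
    for rec in long_rows:
        groups[(rec["country"], rec["code"], rec["year"])].append(rec["percent"])
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


wide = [
    {"Entity": "Exampleland", "Code": "EXL", "Year": "1985", "Anxiety": "9.0", "Depression": "9.0"},
    {"Entity": "Exampleland", "Code": "EXL", "Year": "1990", "Anxiety": "4.0", "Depression": "2.0"},
    {"Entity": "Exampleland", "Code": "EXL", "Year": "2017", "Anxiety": "4.5", "Depression": "2.5"},
    {"Entity": "Exampleland", "Code": "EXL", "Year": "n/a", "Anxiety": "1.0", "Depression": "1.0"},
]
timeline = mean_by_country_year(to_long(wide))
print(timeline)
```

Keeping the long records as an intermediate step means the same data can feed both a per-disorder view and the averaged timeline.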

Presenting

As we began creating our website, UCLA granted our team access to a HumSpace domain with WordPress for the website design. We set up weekly meetings to discuss website logistics and to explore visualization options that would improve the overall experience of our project. In addition to these weekly check-ins, which involved reviewing project milestones and setting clear goals for the week, we maintained consistent text-based communication in our group chat to raise questions and find solutions together. We delegated work to each team member based on their skill set and later combined everything into a cohesive whole.

Our team evaluated and revised our work as needed to ensure that our information and data were aligned with our project’s core objectives. In designing the structure and format of our website, we discussed the layout and presentation of each feature, as well as how best to organize the content. For example, we chose to include technical information about the curation of our website and our research process on our “About” page, along with brief descriptions of each team member and an acknowledgement section for project contributors. We followed a similar structure for each menu segment on our website, ensuring that the content on each page reflected its label, which promotes readability and ease of navigation.

Each member contributed to the evolution of our website, from sharing ideas for the official logo and cover to choosing a color scheme that best reflected our data. This collaborative approach allowed for a wide spectrum of artistic and analytical contributions, in which the diverse perspectives of each team member came to fruition in the final presentation. This stylistic blend culminated in a cohesive yet distinctive presentation that aligned well with our team’s chosen field of research: arts and entertainment.