Data Critique

Overview

Our team used two datasets in this project. Our primary dataset was the “Music & Mental Health Survey” by Catherine Rasgaitis on Kaggle. Our supporting dataset was the “Global Trends in Mental Health Disorder” by The Devastator on Kaggle.

Screenshots of the Music and Mental Health Survey (left) and Global Trends in Mental Health Disorder (right) from Kaggle.


Primary Dataset: Music & Mental Health Survey

Our primary dataset combines quantitative and qualitative measures with contextual factors about each participant, which lets us examine the correlation between music listening habits and self-reported mental health symptoms. The data come from Catherine Rasgaitis’s Google Form, “Music & Mental Health Survey,” and were published on Kaggle. The survey provides self-reported data from a broad pool of participants in the United States. It was distributed publicly through various unspecified online channels, such as social media platforms and forums, and invited voluntary participants to report their music listening habits and the emotional and psychological effects they perceived. The study is an independent academic project: Rasgaitis collected and anonymized the survey data, organized them into CSV format, and later published them on Kaggle. While the dataset offers a wide range of qualitative and quantitative components describing participants’ listening behaviors, our team focused on the variables that bear most directly on the association between music and mental health. These are “Hours per day,” “Age,” “Primary streaming service,” “While working,” “Foreign languages,” “Frequency by genre,” “Anxiety/Depression/OCD scores,” and “Music effects.”
Our dataset records participants’ demographics (age, gender, location). It also captures their music listening behaviors, such as preferred genres, the amount of time spent listening, and whether they multitask while listening to music. We examine whether music listening habits are associated with improvement or decline in self-reported mental health conditions, and whether different genres correlate with different levels of mental health symptoms. Beyond the direct relationship between music habits and mental health conditions, this information points to broader patterns in how individuals may cope with psychological stressors in their daily lives through the arts. It could also motivate further research into the clinical impact of music as a therapeutic tool for addressing this nation’s mental health crisis.
While our dataset provides comprehensive numeric and contextual data on the relationship between music listening habits and self-reported mental health conditions, it has fundamental limitations. Most importantly, correlation between variables does not imply causation. Because the data come from a self-reported survey, there are underlying third variables that have not been accounted or controlled for. For instance, various sociological, biological, and environmental factors may already have influenced an individual’s mental health. A particular participant may already have been more susceptible to anxiety, stress, or depression because of epigenetic factors. Because the survey was also conducted online and in English, it does not capture an international perspective, which makes the results less generalizable. Mental health conditions may also be defined and interpreted differently depending on the cultural context.
Thus, our dataset’s framework compares music and mental health variables through a quantitative lens, which can be both advantageous and limiting. If our dataset were the only source available, its main advantage would be that it provides specific measures for comparing variables such as hours of music listened to and the various mental health scores. This could reveal relationships and patterns that might not be apparent without a closer, quantitative look.
However, the structure of our dataset can oversimplify complex psychological conditions by categorizing real, lived experiences as numerical data. In this sense, the dataset promotes a data-driven ideology and encourages the assumption that nuanced emotional and psychological functions can be reduced to measurements. This calls into question how accurately this dataset alone can predict mental health outcomes, because it is unclear whether music habits and mental health variables are stable and universally measurable constructs. Prioritizing these variables also overlooks the broader demographic, social, and socioeconomic factors that could be critical context for how individuals experience both music and mental health. The dataset remains a strong starting point for analyzing the relationship between music and mental health, yet these reflections remind us that additional context and outside research are necessary before the findings can be generalized with confidence.
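To make the quantitative comparison described above concrete, the following is a minimal Python sketch of a Pearson correlation between listening hours and an anxiety score. The five data points are invented for illustration and are not taken from the survey, and the function name `pearson_r` is our own. A coefficient like this only describes how strongly two measures move together; as noted above, it says nothing about causation or unmeasured third variables.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical values, NOT survey data: hours of music per day and a 0-10 anxiety score.
hours_per_day = [1, 2, 3, 4, 5]
anxiety_score = [2, 4, 5, 4, 5]

r = pearson_r(hours_per_day, anxiety_score)
print(f"Pearson r = {r:.3f}")
```

Because the survey scores are self-reported on a bounded scale, a rank-based measure could be a reasonable alternative; the Pearson coefficient is used here only because it is the simplest to show.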

Supporting Dataset: Global Trends in Mental Health Disorder

Our final project required an additional dataset with time identifiers so that we could construct a global timeline and world map and broaden the representation of our research. Because our original “music_clean.csv” file contains no temporal data, we chose to incorporate an entirely new dataset that fits closely within our team’s scope while introducing broader variables to expand on our previous work. Our initial dataset follows a more qualitative approach, with an emphasis on the connection between music-related variables (e.g., BPM, instrumentalists vs. non-instrumentalists, listening duration) and listeners’ reported mental health outcomes, but it lacks the temporal and geographic information needed to relate its findings to broader trends across different regions. By adding a supplemental dataset that supplies these missing geographic and chronological variables, we ensured that our project met the required criteria and rested on a stronger contextual foundation.
As with our primary dataset, we found this dataset on Kaggle under “Global Trends” with the subfield “Mental Health Disorder” (Global Trends in Mental Health Disorder). It incorporates not elsewhere classified (NEC) categorical systems that track the prevalence of various mental health disorders across different countries from 1990 to 2017. It covers a wider range of mental health conditions than our primary dataset, providing yearly percentages for schizophrenia, bipolar disorder, eating disorders, anxiety, depression, and drug and alcohol use disorders. Each value represents the average percentage of a country’s or region’s population experiencing that condition in a given year, which gives us a more precise framework for visualizing long-term trends on a global scale.
To shape the dataset into a usable form, we cleaned it in R Markdown. We first selected the columns containing “(%)”, along with the country (i.e., “Entity”) and the corresponding year variable. We then used pivot_longer() to reshape the data so that each row represented one country-year-disorder combination, rather than storing each disorder in its own column. This reshaping was a crucial step that made it feasible to calculate the yearly averages. In addition to implementing multimodal measures within each array, we converted the “Year” and “Percent” columns into numeric values and filtered out observations falling outside the 1990–2017 range. To obtain meaningful results, we kept only valid percentages within a tighter range of 0 to 10, which gave us a more precise and orderly set of values. Lastly, we grouped the data by country and year and computed the mean percentage across all disorders in our set. This final cleaned dataset allowed us to create a timeline plot comparing the mental health trends of five countries: the United States, Brazil, Japan, India, and the United Kingdom.
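The cleaning steps above can be sketched in a few lines. This is not our R Markdown code; it is a plain-Python illustration of the same logic (select the “(%)” columns, reshape wide to long, convert to numbers, filter by year and by the 0 to 10 range, then average by country and year). The sample rows, the exact disorder column names, and the helper names `to_long`, `to_number`, `clean`, and `yearly_mean` are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical wide-format rows; values and exact column names are invented.
wide_rows = [
    {"Entity": "Japan", "Year": "1990", "Depression (%)": "3.2", "Anxiety disorders (%)": "2.8"},
    {"Entity": "Japan", "Year": "1991", "Depression (%)": "3.4", "Anxiety disorders (%)": "12.0"},
    {"Entity": "Brazil", "Year": "1990", "Depression (%)": "4.0", "Anxiety disorders (%)": ""},
    {"Entity": "Brazil", "Year": "1985", "Depression (%)": "3.9", "Anxiety disorders (%)": "6.0"},
]

def to_long(rows):
    """Reshape wide rows into one record per country-year-disorder combination."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if "(%)" in column:  # keep only the percentage columns
                long_rows.append({"Entity": row["Entity"], "Year": row["Year"],
                                  "Disorder": column, "Percent": value})
    return long_rows

def to_number(text):
    """Convert text to float; return None for blanks or malformed values."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None

def clean(long_rows, first_year=1990, last_year=2017, low=0.0, high=10.0):
    """Keep records with a numeric year in range and a percentage in [low, high]."""
    cleaned = []
    for rec in long_rows:
        year, percent = to_number(rec["Year"]), to_number(rec["Percent"])
        if year is None or percent is None:
            continue
        if first_year <= year <= last_year and low <= percent <= high:
            cleaned.append({**rec, "Year": int(year), "Percent": percent})
    return cleaned

def yearly_mean(cleaned):
    """Unweighted mean percentage across disorders for each (country, year)."""
    groups = defaultdict(list)
    for rec in cleaned:
        groups[(rec["Entity"], rec["Year"])].append(rec["Percent"])
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}

long_rows = to_long(wide_rows)
means = yearly_mean(clean(long_rows))
for (country, year), value in sorted(means.items()):
    print(country, year, round(value, 2))
```

In this toy input, the 12.0 value, the blank value, and the 1985 row are dropped by the filters, so each remaining country-year average is computed only from the values that survive cleaning.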
While this dataset was highly useful for presenting comprehensive measures and revealing trends in our timeline and world map, it has its own limitations. For instance, the percentages reflect disorder prevalence but do not explain underlying causes, and they do not differentiate by age, gender, or socioeconomic context. This leaves room for misrepresentation and potential bias in our visualizations, as key measures or demographics are unintentionally hidden or erased. In addition, some of the countries we included are missing values entirely or show inconsistent reporting across the timespan, which could affect the accuracy of our averages. Another limitation lies in our own calculation: it weights each disorder equally when computing the mean, even though some conditions naturally occur less frequently than others. Despite these drawbacks, the general patterns and expansive scope of this data are crucial for connecting it to our primary dataset and, therefore, for constructing a comprehensive framework for our research.
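A small worked example shows how the last two limitations can interact. The prevalence values below are hypothetical and the function name `mean_across_disorders` is our own; the point is only that an unweighted mean can shift when a low-prevalence disorder goes unreported, even if no individual disorder has changed.

```python
def mean_across_disorders(prevalence):
    """Unweighted mean over the disorders that have a reported value."""
    values = [v for v in prevalence.values() if v is not None]
    if not values:
        raise ValueError("no reported values to average")
    return sum(values) / len(values)

# Hypothetical prevalence (%) for one country in two years; NOT real data.
year_a = {"Depression": 4.0, "Anxiety": 4.0, "Schizophrenia": 0.2}
year_b = {"Depression": 4.0, "Anxiety": 4.0, "Schizophrenia": None}  # value missing

mean_a = mean_across_disorders(year_a)
mean_b = mean_across_disorders(year_b)
print(f"year A mean: {mean_a:.2f}, year B mean: {mean_b:.2f}")
```

Here the second year's average is higher purely because the rare condition is absent from the calculation, so an apparent rise on the timeline could reflect a reporting gap rather than a change in mental health burden.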
Ultimately, this additional dataset gave our team valuable context for understanding how mental health burdens shift over time on a global scale. In conjunction with our primary dataset, it serves as a critical tool for bringing both qualitative and quantitative elements into our project. Specifically, it provides the temporal and regional structure that our original music survey did not, expanding our analytical framework to include regional, thematically aligned findings. Incorporating it made it possible to create the geographic and chronological visualizations in our project, and it paints a larger picture of worldwide mental health trends and their fluctuations over time. Together, the two datasets present our findings in a way that reinforces our project’s purpose. They point toward a more comprehensive analytical framework that could support existing literature on mental health variables and potentially inform advances in clinical or therapeutic mental health interventions.