Symposium of Rising Scholars (Fall 2024)

Vincent will be presenting at the Symposium of Rising Scholars on Saturday, September 21st! To attend the event and see Vincent's presentation, go to the Polygence Scholars page.

Vincent Qin

Class of 2026 · San Jose, CA

About

Projects

  • "Sentiment Analysis for Youth Mental Welfare" with mentor Jason (Sept. 3, 2024)

Project Portfolio

Sentiment Analysis for Youth Mental Welfare

Started May 10, 2024

Abstract or project description

Mental health concerns among youth are increasingly prevalent, with 20% of United States adolescents experiencing mental health problems[1]. One potential indicator is when a person's texts express overwhelming sadness or hopelessness. I present a comparison of methods for determining the emotional polarity of text. The models are trained on the Stanford SST2[2] and IMDb[3] datasets, both of which consist of movie reviews, a genre that expresses emotion particularly clearly. The text is encoded with a Bag-of-Words (BoW) representation restricted to the 10,000 most common words. I tested five models: a decision tree, a random forest, and an AdaBoost classifier built with scikit-learn[4]; a feedforward neural network with two hidden layers built with PyTorch[5]; and a fine-tuned version of DistilBERT[6]. Results are cross-validated by dividing the data into 10 shards, training each model on nine shards, testing it on the remaining shard, and repeating until every shard has served as the test set. Finally, the models' accuracies were compared. The DistilBERT model achieved the highest overall accuracy (94.89%), making it the most suitable for large-scale classification tasks. However, its long training time (613 min) and inference time (38 ms) make it inefficient for smaller tasks. Instead, I recommend the slightly weaker neural network and AdaBoost models: although their accuracies are lower (88.79% and 80.21%, respectively), their shorter training times (~1-2 hours) and inference times (<10 ms) suit smaller tasks whose outputs can be manually verified.
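The evaluation protocol described in the abstract — Bag-of-Words encoding capped at the most common words, an AdaBoost classifier, and 10-fold cross-validation — can be sketched with scikit-learn. This is a minimal illustration, not the project's actual code: the toy review texts, the dataset size, and the `n_estimators` setting are assumptions standing in for the SST2/IMDb data and the real training configuration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

# Toy stand-in for the movie-review data (SST2/IMDb in the project).
positive = ["a wonderful heartfelt film", "brilliant and moving story",
            "an absolute joy to watch", "superb acting and direction",
            "uplifting charming delightful"]
negative = ["a dull lifeless film", "boring and tedious story",
            "an absolute chore to watch", "terrible acting and direction",
            "depressing clumsy forgettable"]
texts = (positive + negative) * 4      # 40 samples, enough for 10 folds
labels = ([1] * 5 + [0] * 5) * 4       # 1 = positive, 0 = negative

# Bag-of-Words encoding, keeping only the most frequent words
# (10,000 in the project; the cap is moot for this tiny vocabulary).
vectorizer = CountVectorizer(max_features=10_000)
X = vectorizer.fit_transform(texts)

# 10-fold cross-validation of an AdaBoost classifier, mirroring the
# shard-based evaluation: train on 9 folds, test on the held-out fold,
# and rotate until every fold has been the test set once.
clf = AdaBoostClassifier(n_estimators=50)
scores = cross_val_score(clf, X, labels, cv=10)
print(f"per-fold accuracies: {scores}")
print(f"mean accuracy: {scores.mean():.2f}")
```

On real data, the mean of the per-fold accuracies is the figure compared across models in the study.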