
Polygence Scholar 2024

Sai Surya Likith Gogula

Class of 2025, Tracy, California

About

Projects

"How does machine learning help with detection of suicidal symptoms on social media and identify the cause behind mental health problems in school and workplaces" with mentor Ishan (Nov. 22, 2024)

Project Portfolio

How does machine learning help with detection of suicidal symptoms on social media and identify the cause behind mental health problems in school and workplaces

Started June 3, 2024

Abstract or project description

I want to use online data to create a machine learning project that focuses on two things, depending on the type of data it is given. For the first part, if the model is given social media data such as tweets, messages, and other forms of online communication, it should be able to isolate the texts that show depressive and suicidal tendencies and separate them from those that are neutral, normal, or happy. This is the suicide detection part of the research question.

For the second part of the research question, I want to create a machine learning model that works from datasets describing each person's background. For a workplace, this could be their age, their salary, how many hours they work, and so on. For a school, it could be how old they are, how many hours they put in, how many AP classes they take, and whether they participate in any activities. Combined with survey data that confirms whether or not each person has depression or anxiety, this background information is used to build a predictive model that identifies the background factors that lead a person to have depression.

For project 1, I plan on creating a detection model that takes text from a dataset as input and classifies each text as either suicidal or non-suicidal. We plan on using an LLM for this classification. If the LLM we choose, such as ChatGPT, does not classify the texts well enough, we plan on applying LLM fine-tuning techniques to bring its classification accuracy as close to perfect as possible.

For project 2, we took on the task of detecting depression among tech workers. For this project we use the background information of people who report having depression to build a predictive model that estimates, from a person's background, whether they would have depression.

Project Dataset 1: https://www.kaggle.com/datasets/nikhileswarkomati/suicide-watch
This dataset contains social media messages, each labeled as either suicidal or normal. It includes a text column and a classification column.
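The classification step for this dataset could be checked with a small evaluation harness like the one below. This is only a sketch under stated assumptions: the label strings, the sample rows, and `placeholder_classifier` are hypothetical stand-ins (the real project would call the chosen LLM in place of the placeholder), and none of them come from the dataset itself.

```python
# Label strings are assumptions for this sketch; the real dataset may
# spell its two classes differently.
SUICIDAL = "suicidal"
NON_SUICIDAL = "non-suicidal"


def evaluate(rows, classify):
    """Compare classify(text) with the true label of every (text, label) row."""
    tp = fp = fn = tn = 0
    for text, label in rows:
        predicted = classify(text)
        if predicted == SUICIDAL and label == SUICIDAL:
            tp += 1
        elif predicted == SUICIDAL and label != SUICIDAL:
            fp += 1
        elif predicted != SUICIDAL and label == SUICIDAL:
            fn += 1
        else:
            tn += 1
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        # Recall: share of truly suicidal texts that the classifier caught.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        # Precision: share of flagged texts that were truly suicidal.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
    }


def placeholder_classifier(text):
    """Hypothetical stand-in for the LLM call: flags texts containing 'hopeless'."""
    return SUICIDAL if "hopeless" in text.lower() else NON_SUICIDAL


# Made-up rows for illustration only; they are not taken from the dataset.
sample_rows = [
    ("I feel hopeless and alone", SUICIDAL),
    ("Great day at the park", NON_SUICIDAL),
    ("Everything seems pointless", SUICIDAL),
    ("Looking forward to the weekend", NON_SUICIDAL),
]

metrics = evaluate(sample_rows, placeholder_classifier)
print(metrics)
```

Reporting recall alongside accuracy seems useful here, because a missed suicidal text (a false negative) is likely the most costly kind of error for this task.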

Project Dataset 2: https://www.kaggle.com/datasets/osmi/mental-health-in-tech-survey
Includes data from a tech workplace survey that covers each person's background and, at the end, shows whether or not they have depression.

Examples for project 1 (workplace data):

Sam: Computer Science major, 45 hours put into work, 23 years old, 300 employees in the office, he doesn't have depression
Alex: Medical major, 60 hours put into work, 45 years old, 100 employees in the company, he does have depression
Mercy: Engineering major, 30 hours put into work, 33 years old, 300 employees in the company, she doesn't have depression
Pause: Computer Science major, 50 hours put into work, 34 years old, 100 employees, he does have depression

Conclusion for dataset #1: The machine learning model should notice that, for the people with depression, the number of hours worked is relatively high and the number of employees is relatively low compared to the other data entries. It should treat these as detection factors and integrate them into a predictive model to determine whether a person has depression or not:

Number of hours worked > 46 = Depression
Number of employees <= 100 = Depression
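One way a model could arrive at cutoffs like these is to search for the single threshold on each feature that best separates the two groups (a "decision stump"). The sketch below applies that idea to the four workplace records above. The function name `learn_threshold` is hypothetical, and a stump is only an assumed, very simple stand-in for whatever predictive model the project ends up using.

```python
def learn_threshold(values, labels):
    """Find the cutoff and direction on one numeric feature that best match the labels.

    Returns (cutoff, direction, correct). Direction ">" means
    "value > cutoff predicts depression"; "<" means "value < cutoff" does.
    Returns None if all values are identical (no cutoff can be placed).
    """
    distinct = sorted(set(values))
    # Candidate cutoffs are midpoints between neighbouring distinct values.
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    best = None
    for cutoff in candidates:
        for direction in (">", "<"):
            correct = sum(
                ((v > cutoff) if direction == ">" else (v < cutoff)) == label
                for v, label in zip(values, labels)
            )
            if best is None or correct > best[2]:
                best = (cutoff, direction, correct)
    return best


# The four workplace records from the examples: Sam, Alex, Mercy, Pause.
hours = [45, 60, 30, 50]
employees = [300, 100, 300, 100]
has_depression = [False, True, False, True]

hours_rule = learn_threshold(hours, has_depression)
employees_rule = learn_threshold(employees, has_depression)
print(hours_rule)      # (47.5, '>', 4)
print(employees_rule)  # (200.0, '<', 4)
```

On these four records the search places the hours cutoff between 45 and 50 and the employee cutoff between 100 and 300. With so few records, many cutoffs fit equally well, so the exact numbers should not be read as general findings.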

Examples for project 2 (school data):

Alex: Grade 12, he took 4 APs, participates in 4 clubs, and he has depression
Paul: Grade 11, he took 3 APs, participates in 3 clubs, and he has depression
George: Grade 10, he took 1 AP, participates in 2 clubs, and he has no depression
Case: Grade 12, he took 3 APs, participates in 3 clubs, and he has depression

Conclusion for dataset #2: The machine learning model should notice that the grade, the number of APs, and the number of clubs are roughly the same for those with depression, so it should treat these as depression factors and integrate them into the predictive model:

Grade > 10 = Depression
# APs > 2 = Depression
# Clubs > 2 = Depression
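As a rough illustration of how several such rules could be combined into one prediction, the sketch below lets each rule cast a vote and predicts depression when at least two of the three agree. The majority vote, the function name `predict_depression`, and the default cutoffs are assumptions made for this example. In particular, the grade cutoff of 10 was chosen so that it matches the four school records above, and every cutoff is a parameter that can be changed.

```python
def predict_depression(grade, aps, clubs,
                       grade_cutoff=10, ap_cutoff=2, club_cutoff=2):
    """Predict depression when at least two of the three rules fire.

    The cutoffs are illustrative defaults, not learned values.
    """
    votes = [
        grade > grade_cutoff,  # rule on grade level
        aps > ap_cutoff,       # rule on number of AP classes
        clubs > club_cutoff,   # rule on number of clubs
    ]
    return sum(votes) >= 2


# The four school records from the examples:
# (name, grade, APs, clubs, has depression).
students = [
    ("Alex", 12, 4, 4, True),
    ("Paul", 11, 3, 3, True),
    ("George", 10, 1, 2, False),
    ("Case", 12, 3, 3, True),
]

predictions = {
    name: predict_depression(grade, aps, clubs)
    for name, grade, aps, clubs, _ in students
}
matches = sum(predictions[name] == actual for name, *_, actual in students)
print(predictions)
print(f"{matches} of {len(students)} records matched")
```

Voting means one badly chosen cutoff does not decide the outcome on its own. A trained model would instead learn how much weight each factor deserves from a much larger set of survey responses.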