
15 Data Science Passion Project Ideas For High School Students


Ever wondered what it really means to be a “data scientist”? Sure, all science uses data, but data science is a field all its own—one that’s focused on making sense of massive amounts of raw information. Think of it as a unique blend of statistics and computer science, where the goal is to uncover insights from data sources like business trends, social media, or even weather patterns. And because data science can impact fields like healthcare, urban planning, and disaster response, it’s a powerful tool for real-world change.

For high schoolers curious about data science, completing a hands-on project is one of the best ways to dive into this fast-growing field. Whether you work on your own, connect with a data scientist mentor, or explore data science summer research opportunities, you’ll gain valuable technical skills and experience in a field that’s transforming industries around the world.

What Makes a Good Data Science Project?

A good data science project, like any passion project, should center on a topic that genuinely interests you. When you’re passionate about the subject, the work becomes more enjoyable and rewarding, and you give yourself the best chance of seeing the project through.

An effective data science project uses relevant and reliable data that aligns with its goals. Without quality data to analyze, the project’s impact is limited. Similarly, a successful project involves careful attention to cleaning and preparing the data for analysis.

After preparing the data, a strong data science project presents a thorough analysis supported by relevant visualizations and statistical measures. The value of the project comes from the insights and takeaways that you can derive from your analysis.

Remember, data science passion projects don’t have to be perfect from the start. Small adjustments are natural as you progress, but as long as the topic excites you, you’ll find a way to make it work.

If you’re a high school student exploring data science, then joining our Polygence Pods can connect you with like-minded students and an expert data scientist mentor. Together, your Pod will embark on a six-week research journey designed to help you explore data science in a collaborative, flexible environment. We’re currently assembling Pods for various data science topics—let us know which ones interest you most!

Do your own research through Polygence!

Polygence pairs you with an expert mentor in your area of passion. Together, you work to create a high quality research project that is uniquely your own.

What are Some Data Science Project Ideas?

1. Investigating the Relationship between Air Pollution and Health Outcomes in Rural and Metropolitan Areas

This project would involve obtaining publicly available data on air pollution levels and health outcomes (e.g., hospital admissions for respiratory illnesses, mortality rates, lung cancer prevalence/incidence). You could then analyze the data to determine if there is a correlation between air pollution levels and negative health outcomes. You could also explore the potential impact of factors such as socioeconomic status, age, or sex/gender on the relationship between air pollution and health outcomes.

Possible data sets: air pollution levels, hospital admissions data
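
To make the correlation step concrete, here is a minimal sketch of a Pearson correlation computed with only the Python standard library. The function name `pearson_r` and the numbers are made-up placeholders for illustration, not real measurements; in a real project you would load pollution and admissions figures from a public data set.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up illustrative values, NOT real measurements:
# average pollution level and monthly respiratory admissions for six towns.
pollution = [8.0, 12.5, 15.0, 20.0, 25.5, 30.0]
admissions = [14, 18, 21, 25, 33, 36]

r = pearson_r(pollution, admissions)
print(f"Pearson r = {r:.2f}")
```

A value of r close to 1 or -1 signals a strong linear relationship, but it does not show that pollution causes the health outcome. That is why checking factors like age or socioeconomic status matters.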

2. Predictive Stock Market Analysis

This project aims to predict stock prices and identify market trends by analyzing historical financial data and sentiment from news and social media. By accurately forecasting stock movements, investors can make informed decisions about the stock market. Some data science methodologies for tackling this problem are time series analysis, Long Short-Term Memory (LSTM) networks for sequence prediction, and sentiment analysis using Natural Language Processing (NLP) and machine learning (ML).

Possible data sets: historical stock prices (e.g., Yahoo Finance API), financial news articles, social media data (e.g., Twitter API)
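
Before reaching for LSTMs, it can help to start with the simplest time series tool, a moving average, which smooths out day-to-day noise so that a trend is easier to see. The sketch below assumes a plain list of closing prices; the prices are made-up placeholders and `moving_average` is a hypothetical helper name.

```python
def moving_average(values, window):
    """Simple moving average: the mean of each run of `window` consecutive values."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]

# Made-up closing prices for illustration only.
closes = [100.0, 102.0, 101.0, 105.0, 107.0, 106.0, 110.0]

sma3 = moving_average(closes, 3)
print([round(v, 2) for v in sma3])
```

Comparing a short window against a long one is a common first look at momentum. Treat any forecast built this way as an exercise, not as investment advice.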

3. Social Justice Engagement Project

For example, if your community has passed a new crime law, you could use a dataset released by your local government to show whether the law has had a positive effect. This would require data visualizations that report your findings in an engaging, interpretable way, along with fundamental statistical tests to validate your results. If the findings are interesting, you can write them up in a blog post or report them to an elected official in your community.

Possible data sets: municipality records, studies and reports (e.g., United States Census Bureau surveys)
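
One fundamental test that works well for a small before-and-after comparison is a permutation test. The sketch below is one possible approach, not the only valid one: it assumes you have monthly incident counts from before and after the law took effect (the numbers here are made up), and it asks how often a random relabeling of the months would produce a gap in averages at least as large as the one observed.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(before, after):
    """Exact two-sided permutation test on the difference in group means."""
    pooled = list(before) + list(after)
    n_before, n_after = len(before), len(after)
    total = sum(pooled)
    observed = abs(mean(after) - mean(before))
    extreme = 0
    splits = 0
    # Try every way of relabeling which months count as "before".
    for idx in combinations(range(len(pooled)), n_before):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs((total - sum_a) / n_after - sum_a / n_before)
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
        splits += 1
    return extreme / splits

# Made-up monthly incident counts, for illustration only.
before_law = [42, 38, 45, 40, 44, 41]
after_law = [35, 33, 39, 36, 31, 37]

p_value = permutation_p_value(before_law, after_law)
print(f"p-value = {p_value:.4f}")
```

A small p-value suggests the change is unlikely to be random month-to-month variation. It does not prove that the law caused the change, since other things may have shifted over the same period.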

4. Recommendation System for Movies, Music, or Books

With this project, build a recommendation engine that suggests personalized content based on user preferences. This is a very relevant project to today’s world because it can help users discover new and relevant content, leading to increased user satisfaction and retention for streaming platforms and online retailers. Here’s a written resource to get you started.

Possible data sets: movie ratings (e.g., MovieLens dataset), music listening history, book ratings (e.g., Goodreads dataset)

5. COVID-19 Data Analysis

Analyzing COVID-19 data allows us to gain insights into the pandemic's progression, track the effectiveness of public health measures, and identify regions that require additional support. This data-driven approach is crucial for policymakers and healthcare professionals to make informed decisions in managing the pandemic and preparing for potential future pandemics. Methods for this project can include data visualization, time series analysis, geographical mapping, and epidemiological modeling. Here’s a resource from the CDC that goes more in-depth into epidemiological modeling and why it matters.

Possible data sets: COVID-19 case data (e.g., Johns Hopkins University dataset), vaccination data, mobility data (e.g., Google Mobility Reports)

6. Customer Churn Prediction

Predicting customer churn is essential for businesses to retain valuable customers. By identifying factors leading to churn, companies can proactively address issues, enhance customer satisfaction, and improve their services, ultimately increasing customer loyalty and profitability. Techniques you could learn for this project include logistic regression, decision trees, random forests, and gradient boosting. These are more advanced data science methods, so consider this project if you already have experience with data science projects.

Possible data sets: customer usage data for specific companies

7. Climate Change Data Analysis

Analyzing climate data helps us understand the impact of climate change, identify patterns, and assess potential risks. This knowledge is vital for policymakers, scientists, and communities to work towards a more sustainable future. You can conduct time series analyses and data visualizations to see how temperatures or sea levels have changed over time and identify patterns.

Possible data sets: climate data from government agencies (e.g., National Oceanic and Atmospheric Administration (NOAA), NASA Center for Climate Simulation), temperature records, and sea level data
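
A first time series analysis for this kind of data is fitting a straight-line trend with ordinary least squares. The sketch below assumes you have one value per year; the anomaly numbers are made-up placeholders, not NOAA or NASA data, and `linear_trend` is a hypothetical helper name.

```python
def linear_trend(xs, ys):
    """Least-squares line through the points; returns (slope, intercept)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Placeholder values for illustration only (not real climate records).
years = [2000, 2005, 2010, 2015, 2020]
anomaly_c = [0.40, 0.50, 0.60, 0.70, 0.80]

slope, intercept = linear_trend(years, anomaly_c)
print(f"Trend: {slope * 10:.2f} degrees C per decade")
```

The slope is the average change per year, so multiplying by 10 gives a per-decade figure that is easier to communicate in a chart caption.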


8. Predicting Air Quality

Predicting air quality is essential for public health and environmental protection. By forecasting air quality, authorities can implement measures to reduce pollution and minimize health risks. For this project, you can perform regressions and time series forecasting to analyze how air quality has changed over time and maybe even compare between specific regions or cities in the US.

Possible data sets: air quality data from environmental agencies (e.g., Environmental Protection Agency (EPA)), weather data, pollutant concentration records

9. Healthcare Fraud Detection

Healthcare fraud imposes significant financial burdens on healthcare systems and compromises patient care. Detecting fraudulent activities using data science methods helps save costs, preserve resources, and maintain the integrity of healthcare services.

Possible data sets: healthcare insurance claims data with fraud labels (e.g., Kaggle Healthcare Fraud dataset)
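
One simple way to begin, before training a classifier on labeled claims, is to flag claims whose amounts sit unusually far from the average. The sketch below uses z-scores for this; it is a swapped-in starting technique rather than a full fraud model, the claim amounts are made up, and the threshold of 2.0 is an assumption you would tune.

```python
from statistics import mean, pstdev

def flag_outliers(amounts, threshold=2.0):
    """Return indices of values more than `threshold` standard deviations from the mean."""
    mu = mean(amounts)
    sigma = pstdev(amounts)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [i for i, a in enumerate(amounts) if abs(a - mu) / sigma > threshold]

# Made-up claim amounts in dollars, for illustration only.
claims = [120, 135, 110, 128, 140, 125, 980, 132]

suspicious = flag_outliers(claims)
print("Claims to review:", [(i, claims[i]) for i in suspicious])
```

A flagged claim is only a candidate for review, not proof of fraud. Because a single extreme value inflates the standard deviation, median-based measures are often more robust on real data.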

10. Social Network Analysis

Social network analysis helps us understand the structure and dynamics of relationships in social media apps. This knowledge is valuable for marketers, policymakers, and sociologists to identify influencers, target audiences, and study the spread of information.

Possible data sets: social network data (e.g., Meta Graph API, Twitter network data)
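
A common first measure for identifying influencers is degree centrality: the share of other people in the network that each person is directly connected to. The sketch below assumes the network is a list of undirected connections; the names are invented and `degree_centrality` is a hypothetical helper written with the standard library only.

```python
from collections import defaultdict

def degree_centrality(edges):
    """Fraction of the other nodes each node is directly connected to."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-connections
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbors.items()}

# Invented friendship connections, for illustration only.
edges = [
    ("ana", "ben"),
    ("ana", "cam"),
    ("ana", "dee"),
    ("ben", "cam"),
    ("dee", "eli"),
]

centrality = degree_centrality(edges)
most_connected = max(centrality, key=centrality.get)
print(most_connected, centrality[most_connected])
```

Degree centrality only counts direct connections. Other measures, such as betweenness, capture who bridges separate groups, which matters when studying how information spreads.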

11. Web Scraping Projects

Knowing how to scrape data from the web is a very useful skill to have. Building a web scraper allows you to automatically retrieve large amounts of data from specific websites so that you don’t have to do it all manually. You can build a scraper for a ton of use cases, like analyzing real estate data, job market trends, and movie reviews. Be sure to check a website’s terms of service before you scrape.

Watch this Build a Web Scraper YouTube video to learn more!

Possible data sets: product information, customer reviews
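
The core of a scraper is pulling specific pieces out of a page's HTML. The sketch below uses Python's built-in `html.parser` on a small inline HTML string so it runs without a network connection; the `review` class name and the page content are invented for illustration, and a real site will use its own markup.

```python
from html.parser import HTMLParser

class ReviewParser(HTMLParser):
    """Collects the text of every <p class="review"> element."""

    def __init__(self):
        super().__init__()
        self._in_review = False
        self.reviews = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "p" and ("class", "review") in attrs:
            self._in_review = True
            self.reviews.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_review = False

    def handle_data(self, data):
        if self._in_review:
            self.reviews[-1] += data

# Invented page content, standing in for HTML you would download.
page = """
<html><body>
  <p class="review">Great plot, weak ending.</p>
  <p class="intro">Not a review.</p>
  <p class="review">Loved the soundtrack!</p>
</body></html>
"""

parser = ReviewParser()
parser.feed(page)
reviews = [text.strip() for text in parser.reviews]
print(reviews)
```

For a live page you would first download the HTML, for example with `urllib.request.urlopen`, and then feed the decoded text to the parser.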

12. Housing Predictions

Predicting house prices is crucial for homebuyers, sellers, and real estate investors. By understanding price trends and factors influencing housing costs in their area, buyers and sellers can both make well-informed decisions in the real estate market. This project will likely require regression techniques.

Possible data sets: housing price data, real estate listings

13. Transportation Traffic Congestion Analysis

Analyzing traffic congestion patterns helps to optimize urban transportation and reduce commuting time. For this project, you have the option of analyzing either your hometown or any town/city that’s of interest to you. You should be able to find local traffic databases for the specific town or region that you’ve chosen. For example, here are traffic data and statistics for the state of Texas.

Possible data sets: traffic count studies, traffic congestion trackers, Bureau of Transportation Statistics

14. Food Recommendation System

A food recommendation system helps people discover new recipes or restaurants that align with their preferences and dietary needs. A data science skill that is helpful for this project, and for recommendation systems in general, is collaborative filtering: a technique that predicts which items a user might like based on the ratings of users with similar tastes.

Possible data sets: recipe databases, restaurant reviews
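
Collaborative filtering can be sketched in a few lines. The version below is a minimal user-based variant under stated assumptions: ratings are on a 1 to 5 scale, similarity between users is cosine similarity with unrated items treated as 0, and all names and ratings are invented. Real systems use far more data and more careful similarity measures.

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine similarity between two rating dicts (unrated items count as 0)."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[item] * b[item] for item in shared)
    norm_a = sqrt(sum(r * r for r in a.values()))
    norm_b = sqrt(sum(r * r for r in b.values()))
    return dot / (norm_a * norm_b)

def recommend(target, ratings, top_n=2):
    """Rank items the target has not rated by similarity-weighted average rating."""
    scores, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == target:
            continue
        sim = cosine_similarity(ratings[target], other_ratings)
        if sim <= 0:
            continue
        for item, rating in other_ratings.items():
            if item in ratings[target]:
                continue  # only suggest dishes the target has not tried
            scores[item] = scores.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    predicted = {item: scores[item] / weights[item] for item in scores}
    return sorted(predicted, key=predicted.get, reverse=True)[:top_n]

# Invented users and ratings, for illustration only.
ratings = {
    "ava": {"ramen": 5, "tacos": 4, "sushi": 5},
    "bo": {"ramen": 5, "tacos": 4, "pho": 5, "salad": 2},
    "cy": {"ramen": 1, "tacos": 2, "salad": 5, "pho": 2},
}

print(recommend("ava", ratings))
```

Here the suggestions for "ava" lean on "bo", whose ratings of shared dishes are closest to hers, so dishes that "bo" rated highly rise to the top.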

15. Energy Consumption Forecasting

This could be an interesting project for you if you’re interested in climate change and sustainability. Forecasting energy consumption enables better energy resource planning and allows better optimization of energy production, leading to cost savings and environmental benefits. Again, this kind of project will use techniques like time series forecasting and regression.

Possible data sets: historical energy consumption data, weather data


How Do I Choose the Right Data Science Project?

The right data science passion project is one that excites you and feels meaningful! The ideas above are just a few examples across various fields and industries, and you may also want to check out our passion project ideas, including artificial intelligence, environmental science, and music research projects, for more inspiration. Remember to choose a project based on your genuine interests rather than its complexity or impressiveness. 

Certain projects may involve more advanced data science techniques than others. If you’re new to data science, starting with simpler analyses like simple linear regression might be best. However, if you’re ready to dive into more advanced methods, don’t hesitate to challenge yourself!

If you’d like to pursue a data science project with the guidance of an expert mentor in a group setting, Polygence Pods is a great option. These short, six-week online programs connect you with a data scientist mentor and a small group of peers who share your passion. Through a blend of lecture and discussion, your mentor will help guide the group and support you in developing your individual project. Past students have explored projects ranging from climate science to sports performance, all in a collaborative, fully online environment. Which data science topic are you most excited about?


Do Your Own Research Through Polygence

Your passion can be your college admissions edge! Polygence provides high schoolers a personalized, flexible research experience proven to boost your admission odds. Get matched to a mentor now!