Data science research can be applied to just about anything you’re interested in. If you’re into sports, data science can be applied to enhance player performance, optimize team strategies, and analyze fan engagement. Love music? You can use data science to analyze musical trends, create personalized playlists, or even develop algorithms for music composition. If you have a love for nature and the environment, you can analyze climate patterns, track deforestation, or even contribute to wildlife conservation efforts through data analysis.
As far as careers go, data scientists may be found at universities, working on cutting-edge research projects, or at tech companies developing innovative applications. In lesser-known settings, data scientists may contribute to social impact initiatives, such as using data analysis to address issues like poverty and inequality. In the non-profit sector, they might collaborate with organizations focused on humanitarian efforts, using data to optimize resource allocation and improve the effectiveness of interventions. In finance, data scientists help detect fraudulent activities and develop predictive models for market trends.
Here are a few data science terms that you’ll probably run into:
Predictive Modeling involves developing models that can predict things like future customer behavior, stock prices, or disease outbreaks. Nate Silver is a statistician and founder of the website FiveThirtyEight, known for his accurate predictions in politics and sports using statistical models.
Natural Language Processing (NLP) enables machines to understand, interpret, and generate human language. Applications include language translation, sentiment analysis, and chatbots. Yoshua Bengio is a computer scientist and one of the pioneers in the field of deep learning. His work has significantly influenced advancements in NLP and machine learning.
Clustering and Segmentation involves grouping similar data points together. This is useful in customer segmentation, identifying patterns in large datasets, and improving personalized recommendations.
Causal Inference research aims to understand cause-and-effect relationships within data. It helps determine the impact of specific variables on outcomes, which is crucial for making informed decisions in fields such as healthcare and policy analysis. Judea Pearl is a computer scientist and philosopher who pioneered Bayesian networks, graphical models used for representing probabilistic relationships among variables, and later developed a mathematical framework for reasoning about cause and effect.
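To make clustering more concrete, here is a minimal sketch of k-means, one widely used clustering algorithm. The text above doesn't name a specific method, so k-means is our choice for illustration. It repeatedly assigns each point to its nearest center, then moves each center to the average of its points. The customer data is made up, and the function name is our own, not from any library.

```python
import math
import random


def kmeans(points, k, max_iterations=100, seed=0):
    """Cluster 2-D points into k groups with Lloyd's algorithm (k-means)."""
    rng = random.Random(seed)
    # Use k distinct data points, chosen at random, as the starting centroids.
    centroids = rng.sample(points, k)
    labels = []
    for _ in range(max_iterations):
        # Assignment step: give each point the index of its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                xs, ys = zip(*members)
                new_centroids.append((sum(xs) / len(xs), sum(ys) / len(ys)))
            else:
                new_centroids.append(centroids[c])  # an empty cluster keeps its old centroid
        if new_centroids == centroids:
            break  # no centroid moved, so the assignments are stable
        centroids = new_centroids
    return labels, centroids


# Hypothetical customers described by (visits per month, average spend in dollars).
customers = [(1, 10), (2, 12), (1.5, 11), (20, 200), (22, 210), (21, 190)]
labels, centroids = kmeans(customers, k=2)
print(labels, centroids)
```

With well-separated groups like these, the three low-spending customers land in one cluster and the three high-spending customers in the other. In practice, libraries such as scikit-learn provide tested implementations of k-means.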
If you want to go into data science, be sure to build a strong foundation in math and programming. Math courses such as statistics and calculus, along with programming languages such as Python or R, form the backbone of data science. Since data science can be applied to so many different fields, take the time to see which topics and aspects interest you most. Check out some books that may surprise you. Spend a summer working on a cool project. Work one-on-one with an expert. Here are more ideas to get you going.
1. Take a Class in High School
If your school’s class options are limited, try some online resources. Platforms like Coursera, Khan Academy, and Codecademy offer great introductory data science classes. Get hands-on experience by working on small projects to apply what you've learned. Kaggle is an excellent platform for finding datasets and participating in data science competitions, offering a chance to learn from real-world problems.
Statistics - Understanding concepts like probability, hypothesis testing, and regression analysis is crucial for interpreting and analyzing data effectively. It provides the foundation for making informed decisions based on data patterns.
Calculus - Calculus helps you understand optimization, rates of change, and mathematical modeling—key concepts that come into play when developing and tweaking algorithms.
Computer Science - A solid understanding of computer science fundamentals, including algorithms and data structures, is vital for programming and implementing data science solutions. It provides the technical skills needed to work with large datasets and design efficient algorithms.
Classes In Your Favorite Subject - Taking courses in business, biology, finance, or any other domain you're interested in will help you develop domain-specific knowledge. This can make you a valuable asset when applying data science techniques to solve real-world problems in that field.
Writing and Public Speaking - Data scientists don't just crunch numbers; they need to communicate their findings effectively. Courses in writing, public speaking, or even graphic design can enhance your ability to convey complex ideas in a clear and compelling manner.
Ethics or Philosophy - As data science involves working with sensitive information, understanding the ethical implications of data analysis is crucial. Courses in ethics or philosophy can help you think critically about the ethical considerations surrounding data collection, privacy, and bias.
Remember, a well-rounded education is key. Combining technical skills with domain-specific knowledge and effective communication will set you on a path to success in data science.
2. Read a Book
Classics like The Elements of Statistical Learning lay the groundwork for understanding machine learning, while more recent titles such as The Art of Statistics help you understand statistics without drowning in math, and Weapons of Math Destruction gets into the ethics of data science. Check out more books that unlock the door to a captivating data-driven universe.
The Elements of Statistical Learning by Trevor Hastie, Robert Tibshirani, and Jerome Friedman (2009) - This classic introduces key statistical concepts and machine learning algorithms, providing a solid foundation for understanding the mathematical principles behind data science.
Big Data: A Revolution That Will Transform How We Live, Work, and Think by Viktor Mayer-Schönberger and Kenneth Cukier (2013) - This book delves into the implications and potential of big data, examining how it is transforming various aspects of our lives and society.
Python for Data Analysis by Wes McKinney (2012) - Focused on the practical side, this book teaches data manipulation and analysis using Python, one of the most widely used programming languages in data science.
The Art of Statistics: Learning from Data by David Spiegelhalter (2019) - This is a great read for understanding statistical concepts without diving too deep into mathematical intricacies.
Weapons of Math Destruction by Cathy O'Neil (2016) - O'Neil explores the societal impact of algorithms and data science, shedding light on issues of fairness, accountability, and transparency. It's an eye-opener into the ethical considerations within the field.
The Signal and the Noise: Why So Many Predictions Fail – but Some Don't by Nate Silver (2012) - Silver explores the challenges of making accurate predictions in various fields, ranging from sports and politics to economics and weather.
Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are by Seth Stephens-Davidowitz (2017) - Stephens-Davidowitz explores the power of big data and how online behavior can be analyzed to gain insights into human behavior and societal trends.
You can also check out data science podcasts like Data Science at Home, Data Skeptic, or Not So Standard Deviations to hear discussions of current trends, challenges, and innovations in the field. Towards Data Science on Medium is also a good blog for staying informed.
3. Extracurricular Study
When picking out an extracurricular, remember that quality is often more important than quantity. Stick to an activity that genuinely interests you and then dig in.
Hackathons and Data Competitions - Platforms like Kaggle offer a variety of competitions where you can collaborate with others and learn from experienced data scientists. These competitions help you hone your problem-solving skills, encourage teamwork, and allow you to apply your data science knowledge in a pre-built setting so you don’t have to start from scratch.
Journalism for School Paper - Combine data science with storytelling by engaging in data journalism. You can use tools like Tableau to create interactive data visualizations. This activity sharpens your communication skills, helping you convey complex findings in a compelling and accessible manner.
Collaborate with Existing Clubs - Bring your strengths and interest in data science to other existing clubs at your school such as Student Council, Debate, Environmental, or Economics clubs. Work on projects that involve collaboration with students from these other disciplines to broaden your perspective. You can demonstrate how data science can be applied across diverse domains and help solve real-world problems at your own school.
Online Certifications - Enroll in online courses and pursue certifications in data science to complement your high school coursework.
If you’ve exhausted all of your opportunities at school, you can always reach out to local universities, research institutions, or even professionals in your community who work in data science related fields. Express your interest, share any relevant coursework, and inquire about potential opportunities or mentorship. Additionally, explore online platforms like Kaggle, where competitions let you work on real-world problems. Stay curious and proactive; your enthusiasm can open doors. Here are some more specific suggestions for getting that research project going.
For a more comprehensive list of opportunities, check out 12 Data Science Research Opportunities for High School Students.
Find research programs close to home
We’ll go into summer data science programs in more depth in the next section, but if you want to find all types of established research opportunities close to home, our High School Student Research Opportunities Database is an excellent resource. Click on your state, then search based on your location, institution, event type (in-person or virtual), and tuition (paid or free).
Contribute to open-source projects
You can contribute to open-source projects related to data science on platforms like GitHub. This not only enhances your programming skills but also provides valuable experience in working on collaborative projects. You can learn from experienced developers, receive feedback on your code, and build a portfolio that showcases your contributions.
Work with a professor
If you have a clear idea of your passions, you can reach out to professors in your field to see if they are open to collaborating with you. Refer to our Guide to Cold-Emailing Professors (written by Polygence literature research mentor Daniel Hazard, a Ph.D. candidate at Princeton University).
Enter a competition
Participate in STEM fairs or science competitions where you can showcase your data science projects. These contests can be both fun and challenging.
Engage in your own research project
Students with initiative and focus can opt to tackle research independently. Carly Taylor, a Stanford University senior who has completed several research projects this way, wrote a guide on how to write a self-guided research paper. By reading it, you’ll get a better understanding of what to expect when taking on this type of project.
Here are some top picks for summer data science research programs. We chose them based on a combination of their affordability, name recognition, social opportunities, and academic rigor.
Data Science Summer Program
Hosting institution: Harvard University
Cost: Free, and stipends are available
Format: In-person (Boston, MA, and San Jose, CA)
Application deadline: Mid-May
In this 2-week introduction to machine learning, students build a self-driving toy car. The course starts with conceptual lectures on statistical learning, machine learning, and programming. Next, you’ll be introduced to various machine learning methods and algorithms and their applications in different fields, including biomedicine. You’ll also learn Python and use it to implement the concepts and classification algorithms you’re studying. Lunches include conversations with machine learning experts who share their views and experience with data science. Check the site for the most current application information.
Computer Science Scholars (CSS)
Hosting institution: Carnegie Mellon University
Cost: Free
Format: In-person (Pittsburgh, PA)
Application deadline: Ongoing
This excellent program for rising high school juniors gives students who have historically been excluded from STEM fields the chance to work with leaders in all areas of computer science, including data science. Over 4 weeks, students are exposed to the core elements of programming and problem-solving in Python, including algorithmic components, basic data structures, and problem-solving techniques. You’ll also get to meet industry leaders to learn about opportunities in the field. A nice bonus is that students who complete the program and want to continue may be invited to return as rising seniors to CMU’s AI Scholars program the following summer. Check the site for the most current application information.
Center for Talented Youth (CTY)
Hosting institution: Johns Hopkins University
Cost: $919 (varies)
Format: In-person (various sites across the US) and online
Application deadline: Mid-May
CTY offers courses in subjects ranging from Astrophysics and Electrical Engineering to Fundamentals of Computer Science, Probability, and Game Theory. For the in-person option, you can stay on campus or commute daily. In addition to coursework, in-person students who stay on site participate in various social activities, including sports, games, talent shows, movie nights, and much more. Check the site for the most current application information.
For all our picks, check out our 12 Data Science Research Opportunities for High School Students.
If you’re searching for a virtual data science research opportunity, consider doing a project through Polygence with one of our Data Science / Quantitative mentors.
A few of the summer programs we found were either paid or unpaid internships. You can also check with your local community college or local tech, digital marketing, IT, or software development businesses.
DSI Summer Lab
Hosting institution: The University of Chicago
Compensation: Amount not specified
Location: Chicago, IL
Application deadline: Mid-February
In this interdisciplinary 10-week paid summer research program, high school students are paired with a data science mentor to work on a research project. Topics include data science, social science, climate and energy policy, public policy, materials science, and biomedical research. As research assistants, students will engage with and hone their skills in research methodologies, practices, and teamwork. No prior research experience is needed to apply, and in fact, they encourage participation from a broad range of students. Check the site for the most current application information.
The Scripps Research Translational Institute (SRTI)
Hosting institution: Scripps Research
Compensation: Unpaid, but interns receive college credit
Location: San Diego, CA
Application deadline: Late March
SRTI promotes advanced personalized healthcare through cutting-edge research, including mHealth monitoring. Their Student Research Internship Program is for highly motivated students interested in health sciences, statistics, and computational/computer science. Interns work with and learn from internationally renowned scientists in genomics, bioinformatics, and digital medicine. The program aims to prepare future leaders in translational medical research. This program does a great job of combining lab work, research projects, and mentorship. Check the site for the most current application information.
A good data science project, as with any passion project idea, should be centered around a topic that genuinely interests you. When you’re passionate about the topic, everything about the work becomes more enjoyable and rewarding, and you’ll give yourself the best chance to fully complete the project. And because data science can be applied to just about any topic (all you need is access to lots of data about it), you have endless options! Sometimes this freedom to choose can be overwhelming, so you may want to sit down and brainstorm a bunch of ideas and then narrow them down based on the amount of time and resources you have.
Here are some ideas our mentors provided along with their relevant data sets.
Recommendation System for Movies, Music, or Books
With this project, build a recommendation engine that suggests personalized content based on user preferences. This is a very relevant project to today’s world because it can help users discover new and relevant content, leading to increased user satisfaction and retention for streaming platforms and online retailers. Here’s a written resource to get you started. Possible data sets: movie ratings (e.g., MovieLens dataset), music listening history, book ratings (e.g., Goodreads dataset)
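One common way to build a recommendation engine is user-based collaborative filtering: find users whose ratings resemble yours and suggest what they liked. The sketch below assumes ratings are stored as a dictionary of dictionaries. The users, titles, and ratings are hypothetical, and scoring unseen items by a similarity-weighted sum of ratings is just one simple design choice among many.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two users' ratings, computed over the items both have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[item] * b[item] for item in shared)
    norm_a = math.sqrt(sum(a[item] ** 2 for item in shared))
    norm_b = math.sqrt(sum(b[item] ** 2 for item in shared))
    return dot / (norm_a * norm_b)


def recommend(ratings, user, top_n=3):
    """Rank items the user hasn't rated by other users' ratings, weighted by similarity."""
    scores = {}
    for other, their_ratings in ratings.items():
        if other == user:
            continue
        similarity = cosine_similarity(ratings[user], their_ratings)
        for item, rating in their_ratings.items():
            if item not in ratings[user]:  # only suggest things the user hasn't rated yet
                scores[item] = scores.get(item, 0.0) + similarity * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical 1-5 star ratings; a real project might load these from MovieLens instead.
ratings = {
    "ana": {"Inception": 5, "Interstellar": 5, "Up": 1},
    "ben": {"Inception": 5, "Interstellar": 4, "Up": 2, "Arrival": 5},
    "cy": {"Inception": 1, "Up": 5, "Coco": 5, "Arrival": 1},
}
print(recommend(ratings, "ana"))
```

Here "Arrival" ranks first for ana because ben, whose tastes closely match hers, rated it highly. A weighted sum favors items that many similar users have rated; dividing by the total similarity instead gives a weighted average, which can over-reward an item rated by a single neighbor.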
Climate Change Data Analysis
Analyzing climate data helps us understand the impact of climate change, identify patterns, and assess potential risks. This knowledge is vital for policymakers, scientists, and communities to work towards a more sustainable future. You can conduct time series analyses and data visualizations to see how temperatures or sea levels have changed over time and identify patterns. Possible data sets: climate data from government agencies (e.g., National Oceanic and Atmospheric Administration (NOAA), NASA Center for Climate Simulation), temperature records, sea level data
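As a starting point for the time series analysis described above, the sketch below smooths a yearly series with a moving average and fits a least-squares trend line. The numbers in it are placeholders, not real climate measurements; you would replace them with a series downloaded from NOAA or NASA.

```python
def moving_average(values, window):
    """Average each run of `window` consecutive values to smooth out year-to-year noise."""
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


def linear_trend(xs, ys):
    """Least-squares slope and intercept of the best-fit line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


# Made-up placeholder values, NOT real measurements: swap in a NOAA or NASA series.
years = list(range(2010, 2020))
anomalies_c = [0.1, 0.3, 0.2, 0.4, 0.5, 0.4, 0.6, 0.7, 0.6, 0.8]
print(moving_average(anomalies_c, window=5))
slope, intercept = linear_trend(years, anomalies_c)
print(f"Trend: {slope * 10:.2f} degrees C per decade")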
Web Scraping Projects
Knowing how to scrape data from the web is a very useful skill to have. Building a web scraper allows you to automatically retrieve large amounts of data from specific websites so that you don’t have to do it all manually. You can build a scraper for a ton of use cases, like analyzing real estate data, job market trends, and movie reviews. Be sure to check a website’s terms of service before you scrape. Watch this Build a Web Scraper YouTube video to learn more!
Possible data sets: product information, customer reviews
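Here is a minimal scraper sketch using only Python's standard library (html.parser and urllib.request). It assumes, hypothetically, that each review on the target page sits in a `<p>` tag with the class "review"; real sites use their own markup, so you would inspect the page and adjust the parser. The example runs on a hard-coded HTML string so it works offline.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class ReviewParser(HTMLParser):
    """Collect the text inside every <p> tag whose class list includes "review"."""

    def __init__(self):
        super().__init__()
        self.reviews = []
        self._in_review = False

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "p" and "review" in classes:
            self._in_review = True
            self.reviews.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_review = False

    def handle_data(self, data):
        if self._in_review:
            self.reviews[-1] += data


def scrape_reviews(html):
    """Return the cleaned-up text of each review found in an HTML string."""
    parser = ReviewParser()
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.reviews]


def fetch_page(url):
    """Download a page as text (only for sites whose terms of service allow scraping)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


# Offline example; for a live page you would call scrape_reviews(fetch_page(url)).
sample = '<p class="review">Great movie!</p><p>Ad</p><p class="review short">Too <b>long</b>.</p>'
print(scrape_reviews(sample))
```

Many projects use third-party libraries such as Requests and Beautiful Soup instead, but the idea is the same: download the HTML, then pull out the elements you care about.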
Check out even more project ideas in this Data Science Passion Project Ideas For High School Students post.
To inspire you and give you a sense of what’s possible under the rubric of data science, we offer up a few examples from our Polygence Scholars.
Using Data Science to Identify What Drives Happiness
Rohaan researched global happiness, proposing regression models to predict happiness scores from factors in the World Happiness Report. He focused on the factors that significantly influenced scores, including economic conditions and education. These models help us understand what can boost our well-being and offer political leaders insights for decision-making, potentially helping to reduce societal problems. Watch the presentation to learn more.
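Rohaan's exact models aren't described here, but a minimal sketch of multiple linear regression shows the core idea: find one coefficient per factor so that a weighted sum of the factors best predicts the score. The sketch solves the least-squares normal equations by hand. The "economy" and "education" columns and their values are hypothetical and were generated from a known formula so you can check that the fit recovers it.

```python
def fit_linear_regression(X, y):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    X is a list of rows of predictor values; an intercept column of 1s is added here.
    Returns [intercept, coefficient_1, coefficient_2, ...].
    """
    rows = [[1.0] + list(row) for row in X]
    p = len(rows[0])
    # Augmented matrix [X^T X | X^T y] for the normal equations.
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * target for r, target in zip(rows, y))]
        for i in range(p)
    ]
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution, from the last coefficient to the first.
    coef = [0.0] * p
    for i in reversed(range(p)):
        known = sum(A[i][j] * coef[j] for j in range(i + 1, p))
        coef[i] = (A[i][p] - known) / A[i][i]
    return coef


def predict(coef, row):
    """Apply fitted coefficients to one row of predictor values."""
    return coef[0] + sum(c * x for c, x in zip(coef[1:], row))


# Hypothetical (economy, education) scores generated from happiness = 1 + 2*economy + 3*education.
X = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [1, 3, 4, 6, 8]
coef = fit_linear_regression(X, y)
print(coef, predict(coef, (1.5, 0.5)))
```

With real survey data the points won't fall exactly on a plane, and the fitted coefficients estimate how much each factor is associated with the score, holding the others fixed.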
Predicting Loan Defaults Using Logistic Regression
Selena came to Polygence interested in actuarial science (statistics, data modeling, and uncertainty). She had become passionate about this subject after enrolling in classes through FBLA such as Securities & Investments and Insurance & Risk Management. Having taken AP Calculus BC and AP Statistics, she came to Polygence with an eye toward building her own statistical model, allowing her to develop her data analysis and coding skills. For her project, Selena used known and unknown features of a loan candidate and the loan to predict the risk of defaulting on the loan through statistical modeling methods in R. Read the resource.
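Selena worked in R; the sketch below shows the same technique, logistic regression, in Python to stay consistent with the other examples here. It fits weights by plain gradient descent so every step is visible. The features (debt-to-income ratio and credit utilization) and the borrowers are hypothetical, not taken from Selena's project.

```python
import math


def sigmoid(z):
    """Map any real number into (0, 1), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, learning_rate=0.5, epochs=5000):
    """Fit weights and bias by batch gradient descent on the average log-loss."""
    n, n_features = len(X), len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for row, label in zip(X, y):
            # Per-example log-loss gradient: (predicted probability - label) * feature.
            error = sigmoid(bias + sum(w * x for w, x in zip(weights, row))) - label
            for j, x in enumerate(row):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def default_probability(weights, bias, row):
    """Predicted probability that a borrower with these features defaults."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, row)))


# Hypothetical borrowers: (debt-to-income ratio, credit utilization); label 1 = defaulted.
X = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.3), (0.3, 0.25), (0.7, 0.8), (0.8, 0.9), (0.65, 0.7), (0.9, 0.6)]
y = [0, 0, 0, 0, 1, 1, 1, 1]
weights, bias = train_logistic_regression(X, y)
print(default_probability(weights, bias, (0.5, 0.5)))
```

Real projects would typically use a library routine such as R's glm() or scikit-learn's LogisticRegression, and would evaluate the model on data held out from training.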
Statistical Model for Identifying Unclear and Doubtfully Restored Signs of the Indus Script
Varun wanted to help decipher a writing system from the Indus Valley civilization by predicting missing signs in texts dating from 2500-1800 BCE, making the most of the limited and damaged surviving artifacts. Using n-gram Markov chain models on the ICIT Indus text corpus, he analyzed sign patterns and built language models based on them. He achieved 63% accuracy in matching missing single signs, contributing considerably to decipherment efforts! Watch the symposium presentation.
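The simplest n-gram model is a bigram model, which estimates how likely each sign is to follow another. The sketch below uses one to rank candidates for a single missing sign, scoring each by how well it fits both its left and right neighbors. The sign IDs and tiny corpus are hypothetical placeholders, not the ICIT corpus, and add-one smoothing is our own assumption; Varun's actual models may differ.

```python
from collections import Counter, defaultdict


def train_bigrams(texts):
    """Count how often each sign follows another; <s> and </s> mark where a text starts and ends."""
    counts = defaultdict(Counter)
    for text in texts:
        padded = ["<s>"] + list(text) + ["</s>"]
        for prev, nxt in zip(padded, padded[1:]):
            counts[prev][nxt] += 1
    return counts


def bigram_prob(counts, prev, nxt, outcomes, alpha=1.0):
    """P(nxt | prev) with add-alpha smoothing, so unseen pairs get a small nonzero probability."""
    total = sum(counts[prev].values())
    return (counts[prev][nxt] + alpha) / (total + alpha * outcomes)


def rank_fillers(counts, text, gap_index, vocab):
    """Rank candidate signs for text[gap_index] by P(sign | left) * P(right | sign)."""
    padded = ["<s>"] + list(text) + ["</s>"]
    left, right = padded[gap_index], padded[gap_index + 2]  # the gap sits at gap_index + 1 in padded
    outcomes = len(vocab) + 1  # any sign can follow, or the text can end
    scores = {
        sign: bigram_prob(counts, left, sign, outcomes) * bigram_prob(counts, sign, right, outcomes)
        for sign in vocab
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical sign IDs standing in for a real corpus.
corpus = [["A", "B", "C"], ["A", "B", "D"], ["B", "C"], ["A", "B", "C", "D"]]
counts = train_bigrams(corpus)
print(rank_fillers(counts, ["A", "?", "C"], gap_index=1, vocab=["A", "B", "C", "D"]))
```

Longer n-grams (trigrams and beyond) use more context but need more data, which is a real constraint for a small corpus of short inscriptions.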
Check out more data science projects done by Polygence Scholars.
Polygence Mentor Ben, who is working on his PhD in data science at Stanford, teaches his students these four critical steps for a data science project: 1) locate an interesting data set, 2) home in on a research question, 3) analyze and model the data, and finally, 4) communicate the results. He says: “The thing to highlight here is that the data science workflow is incredibly iterative — it's not this sequence of steps that you do in order. Rather it's a bunch of different things that you have to say—Okay, well, I made this graph. Okay, now I understand the data better, and now I can ask a slightly different question, but now I might need to make a new graph. We iterate through these steps in a certain way.” You can learn more about his system and his own data science journey in this Polygence spotlight post.
On a related note, this incredible research primer written by Polygence mentor Ross Greer (a PhD candidate in Electrical & Computer Engineering studying Intelligent Systems, Robotics, and Control at the University of California, San Diego) can help you get a handle on your written project. Although this computer science resource is not devoted solely to data science, Ross helpfully breaks the stages of research into (1) scoping out your topic, (2) working on the project, and (3) completing the project. Dividing what can seem like an overwhelming beast into these three chunks definitely makes the endeavor more manageable. Ross is big on the idea of finding the best project for you, one that takes your skill set, your interests, and your goals into account.
Finally, if you have some ideas and want to conduct data science research with the guidance of a mentor, apply to be a part of our flagship research and mentorship program.
As our Head of Engineering Ádám Gyulavári noted, when it comes to programming, “a research paper or a blog post might not be enough to demonstrate the work you’ve done and the features of the application or program you’ve created.” That’s why he created the very useful post Showcasing on GitHub: The Complete Guide. Other ways to showcase your data science research include entering your project into a science competition, attending a conference such as Polygence’s very own Symposium of Rising Scholars, or publishing in science journals such as IJHSR, SFJ, NHSJS, the Curieux Academic Journal, or The Young Scientists Journal. For more showcasing ideas, check out 20 Journals and Conferences to Consider.
Feeling competitive? Be sure to check out our recommendations for competitions in Top 10 Data Science Competitions for High School Students.