Daniel G
Research Program Mentor
MSE at University of Pennsylvania (UPenn)
Expertise
Data Science, Machine Learning, Data Engineering, Analytics, Electroencephalography (neuroscience)
Bio
Hi! I'm Dan, and I'm currently finishing my master's degree in Data Science at the University of Pennsylvania. I'm interested in a wide array of applications of machine learning and data engineering, such as image classification and natural language processing. I'm a big proponent of using research as a means to teach yourself new skills, and if you have a cool idea, I'd be happy to review what skills you might need and how to go about learning them. I'm also quite into neuroscience (my expertise is in electroencephalography) and have dabbled in some amateur game development, so the rare few of you who have data-centric projects in those fields might be a particularly great fit. That being said, I am really looking to mentor anyone who wants to learn more about data, regardless of context. I look forward to meeting you!
Project ideas
Sentiment Analysis
Ah, words. Aren't they great? We live in an age where people can write about nearly anything they want, and in doing so express their opinions more freely than ever before. Turns out, a lot of businesses are actually quite interested in those opinions! The problem, of course, is that reading individual written reviews is not exactly scalable (try reading 100,000 reviews and tell me otherwise). Luckily, we might be able to get computers to read for us! Or at least, extract insights deeper than "this review has the word 'good' in it." In this project, you will use machine learning to predict whether a review of something (in the domain of your choice) is positive or negative. In doing so, you'll learn about the tricky world of natural language processing and how to overcome some basic problems in deciphering language. If you are particularly ambitious, we can even cover a bit of deep learning (neural networks). This project will likely rely mainly on Python and its associated data libraries (pandas, matplotlib, sklearn, etc.).
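To give a feel for what "predicting positive or negative" can look like, here is a minimal sketch of a bag-of-words Naive Bayes classifier. It uses only the Python standard library so every step is visible; in the actual project you would more likely reach for sklearn. The six training reviews and the names `TRAIN`, `tokenize`, `train`, and `predict` are all made up for illustration.

```python
import math
import re
from collections import Counter

# Tiny hand-written training set (hypothetical reviews, not real data).
TRAIN = [
    ("great food and friendly staff", "pos"),
    ("loved it, great value", "pos"),
    ("friendly service, will return", "pos"),
    ("terrible food and rude staff", "neg"),
    ("awful, never again", "neg"),
    ("rude service and terrible value", "neg"),
]


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(examples):
    """Count how often each word appears under each label."""
    word_counts = {"pos": Counter(), "neg": Counter()}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts["pos"]) | set(word_counts["neg"])
    return word_counts, doc_counts, vocab


def predict(text, model):
    """Return the label with the highest Naive Bayes log-score."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in word_counts:
        # Start from the log prior: how common this label is overall.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # skip words never seen during training
            # Add-one (Laplace) smoothing avoids log(0) for unseen pairs.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        scores[label] = score
    return max(scores, key=scores.get)


model = train(TRAIN)
print(predict("the staff was friendly and the food was great", model))
print(predict("rude and awful", model))
```

With this toy data, the first review is scored as positive and the second as negative. Notice that the model only counts words, so it has no idea what to do with something like "not great". That is exactly the kind of problem the project digs into.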
Combining datasets to extract insights
Data comes from many different places, and is often most powerful when combined. This project is simple and open-ended. Find two or more datasets on a topic of your choice that you think might add insight when taken together. Your goal will be to join those datasets and find out something cool! Depending on your ambition and comfort with JavaScript, HTML, and CSS, you can even try creating a basic dashboard that allows other people to find out information about your topic. For instance, I once created a dashboard that combined housing data from Zillow with US Census data and a dataset of business information from Yelp to create an app that would help prospective movers find areas that fit their culture on a certain budget. This project will likely make heavy use of SQL, as well as Python for preprocessing.
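As a rough illustration of the "join two datasets" step, here is a small sketch that uses SQL from Python through the built-in sqlite3 module. The tables, column names, ZIP codes, and numbers are invented placeholders, not the real Zillow, Census, or Yelp data, and `affordable_areas` is a hypothetical helper.

```python
import sqlite3

# In-memory database with two toy tables (all values are made up).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE housing (zip TEXT PRIMARY KEY, median_rent INTEGER);
    CREATE TABLE businesses (name TEXT, zip TEXT, category TEXT);

    INSERT INTO housing VALUES
        ('00001', 1200), ('00002', 2100), ('00003', 1500);
    INSERT INTO businesses VALUES
        ('Cafe A', '00001', 'coffee'),
        ('Cafe B', '00001', 'coffee'),
        ('Books C', '00002', 'bookstore'),
        ('Cafe D', '00003', 'coffee');
    """
)


def affordable_areas(conn, category, max_rent):
    """Join housing with businesses on ZIP code and return areas within
    budget, ranked by how many businesses of the given category they have."""
    return conn.execute(
        """
        SELECT h.zip, h.median_rent, COUNT(b.name) AS n_businesses
        FROM housing AS h
        JOIN businesses AS b ON b.zip = h.zip
        WHERE b.category = ? AND h.median_rent <= ?
        GROUP BY h.zip, h.median_rent
        ORDER BY n_businesses DESC, h.median_rent ASC
        """,
        (category, max_rent),
    ).fetchall()


results = affordable_areas(conn, "coffee", 1600)
for zip_code, rent, n in results:
    print(zip_code, rent, n)
```

The interesting part is the shared key: both tables have a `zip` column, which is what makes the join possible. With real datasets, much of the preprocessing work usually goes into finding or building a key like that.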