Julian S
- Research Program Mentor
PhD candidate at Massachusetts Institute of Technology
Expertise
Data science, statistics, bioinformatics, machine learning (logistic regression, SVM, feature engineering), genomics, projects with genetically-encoded fluorescent sensors, projects in C. elegans, projects in E. coli
Bio
Biologists are increasingly adept at generating a mind-boggling amount of data. So, there's a huge demand for biologists who are well-versed in computer science, math, and statistics. At MIT, I grow bacteria and collect precise measurements of their genes, RNAs, and proteins, which I analyze and use to build mathematical and physical models of how life works. I enjoy research, but my real interest is in teaching: I love helping others grow and do fantastic science. I live a 30-minute bike ride from lab, in a suburb called Jamaica Plain, with my spouse Marianne. I love to cook every night and take excursions into local wildlife parks. I recently bought an espresso machine and still can't do latte art, but I'm trying my best. I'm excited to help you pick a project and develop reasonable and attainable learning goals.
Project ideas
Basic Bioinformatics Programming with Rosalind
There are a ton of fantastic biology-focused problems on the free-to-access website Rosalind.info. Let's focus on building solid programming fundamentals and bioinformatics while working through Rosalind problems together!
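To give a flavor of the kind of exercise involved, here is a minimal Python sketch of two warm-up tasks in the spirit of the early Rosalind problems: counting nucleotides and taking a reverse complement. The function names are my own choices for illustration, and the sketch assumes the input contains only the letters A, C, G, and T.

```python
from collections import Counter


def count_nucleotides(dna: str) -> dict[str, int]:
    """Count how many times each of A, C, G, and T appears in a DNA string."""
    counts = Counter(dna.upper())
    # Report all four bases, even ones that never appear.
    return {base: counts.get(base, 0) for base in "ACGT"}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string (A<->T, C<->G)."""
    complement = str.maketrans("ACGT", "TGCA")
    return dna.upper().translate(complement)[::-1]


if __name__ == "__main__":
    seq = "AAAACCCGGT"
    print(count_nucleotides(seq))
    print(reverse_complement(seq))
```

Small functions like these are a good place to practice strings, dictionaries, and testing before moving on to harder problems.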
Assemble a genome!
One of the most fundamental and interesting problems in biology is assembling short DNA sequences, usually 20-100 bp long, into a full genome. There are lots of state-of-the-art programs to assemble bacterial genomes (which are the smallest, so the easiest to assemble on a laptop), including SPAdes. Work with me to pick out some bacterial genomes to assemble, and then use SPAdes (and/or other assembly programs) to assemble genomes and see whether you can generate better assemblies than the original papers.
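One way to compare two assemblies is a contiguity statistic such as N50: the contig length at which contigs of that length or longer contain at least half of the assembled bases. The sketch below is only an illustration. It assumes you have already extracted a list of contig lengths from an assembler's output, and N50 is just one of several metrics we might use (it says nothing about whether the contigs are correct).

```python
def n50(contig_lengths):
    """Return the N50 of an assembly, given its contig lengths in base pairs.

    Contigs are sorted from longest to shortest; N50 is the length of the
    contig at which the running total first reaches half of all bases.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0  # empty assembly


if __name__ == "__main__":
    # Hypothetical contig lengths, for illustration only.
    print(n50([100, 80, 50, 30, 20, 10]))
```

A higher N50 generally means fewer, longer contigs, so it is a reasonable first number to compare against a published assembly.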
Build a machine learning model to predict protein levels from gene sequence
If I give you the DNA sequence of a gene, can you predict how much protein that gene produces? So far, even state-of-the-art biological models cannot predict how much protein a given gene will produce. However, with modern machine learning methods, we can produce models that work decently well, most of the time. Machine learning is used across all of science, and can be relatively simple to understand and analyze! We can work together to generate data sets with gene sequences and collect existing data on gene expression, and then experiment with different machine learning models to see which ones best predict gene output from gene sequence.
Advanced Project: Codon Optimality
Some nucleotide triplets encode the same amino acid: for example, 'GCU' and 'GCC' (written here as they appear in messenger RNA) both encode alanine. Early on in the history of molecular biology, these triplets (called codons) were considered interchangeable. After all, they both make the same thing, so biologists called them 'synonymous'. However, over the last decade it's become clear that not all synonymous codons are treated equally. In fact, when we use genetic engineering to produce medically important proteins, like insulin, we 'optimize' the DNA sequence to include synonymous codons that tend to produce more protein. This observation is the basis of the study of 'codon optimality', the idea that different synonymous codons lead to different levels of protein production, with the most common codons usually being the 'optimal' ones. How much does codon optimality matter at different locations in different proteins? We can study this by analyzing all of the codons of dozens of genomes and asking how often each of those codons appears in the genome overall. Then, we can align homologous proteins in each of those genomes and ask whether the pattern of 'optimal' and 'non-optimal' codons in those proteins varies. In other words: if Protein A has an 'optimal' codon at position 1 and a 'non-optimal' codon at position 2 in one bacterium, does that pattern also hold in another bacterium? Codon optimization is a standard process in the expression of hundreds of different proteins that are important for medicine, so better understanding how patterns of codon optimality vary across bacteria may help us produce important medicines more efficiently.
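The first step, counting how often each codon appears and comparing it to its synonyms, might be sketched roughly as follows. This is a minimal illustration, not a finished analysis: it assumes the standard genetic code, assumes each input is an in-frame coding sequence, and uses function names I made up for this example. Treating "most frequent" as "optimal" is itself a simplifying assumption that we would want to examine in the project.

```python
from collections import Counter
from itertools import product

# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_codons(coding_sequences):
    """Count codons across in-frame coding sequences (DNA or RNA letters)."""
    counts = Counter()
    for seq in coding_sequences:
        seq = seq.upper().replace("U", "T")
        # Ignore any trailing bases that do not form a full codon.
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:  # skip codons with ambiguous bases
                counts[codon] += 1
    return counts


def relative_usage(counts):
    """For each codon, return its share among codons for the same amino acid."""
    totals = Counter()
    for codon, n in counts.items():
        totals[CODON_TABLE[codon]] += n
    return {codon: n / totals[CODON_TABLE[codon]] for codon, n in counts.items()}


if __name__ == "__main__":
    # Toy sequence for illustration: Met, Ala, Ala, Ala, stop.
    usage = relative_usage(count_codons(["ATGGCTGCCGCCTAA"]))
    for codon, share in sorted(usage.items()):
        print(codon, CODON_TABLE[codon], round(share, 2))
```

Running this on every annotated gene in a genome gives a per-genome usage table, which we could then map onto aligned homologous proteins to compare patterns position by position.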
Let's write a research paper
Is there a topic in biology or bioinformatics that you're interested in learning more about? Want to learn how to bring mammoths back from the dead? Engineer plants that can be grown in the desert? Make humans live forever? I have a broad range of experience in lots of different areas in biology, so let's talk and see what we can do!