Joshua P
- Research Program Mentor
PhD candidate at Harvard University
Expertise
Bioinformatics, computational biology, immunology, stem cell biology, cancer biology, robotics, and data science.
Bio
I am a PhD student at Harvard, where I combine computer science and wet lab biology techniques to develop new methods to manufacture cancer-killing immune cells for cancer therapies. I study how immune cells (e.g. T cells) form in the body by applying advanced statistical methods (e.g. machine learning) to genomic data of cells and tissues donated by patients. Inspired by what I learn from the patient data, I design and execute experiments that test new ways to create therapeutically effective immune cells from stem cells in the lab. If successful, these cells can be used to specifically target cancer cells in patients with difficult-to-treat cancers. Outside of my lab, I like to explore datasets related to public health, transportation, and housing to learn about how public policies affect our daily lives. I have been an Arduino and Raspberry Pi hobbyist and have completed many robotics projects over the past 10 years. When I need a break from coding or working in the lab, I enjoy visiting towns, cities, and nature spots in the Northeastern US.
Project ideas
Bioinformatics analysis to discover the molecular nature of a disease (e.g. cancer) and propose new drug targets
Recent years have seen an explosion of publicly available genomic datasets related to human health and disease. Among the most powerful datasets available today are single-cell RNA sequencing (scRNA-seq) datasets, which capture information about which genes are active in thousands of individual cells. By studying the differences in scRNA-seq data between healthy cells and those with a disease (e.g. a tumor vs normal tissue), we can learn about how the disease works on a molecular level and even identify candidate drug targets for treating it. This bioinformatics approach can be applied to any disease with publicly available scRNA-seq data, such as many forms of cancer, diabetes, Alzheimer's, and asthma. Datasets can be downloaded from the Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo/) and the European Nucleotide Archive (https://www.ebi.ac.uk/ena/). Students conducting a project of this type will gain skills in gathering publicly available sequencing data, genomics analysis and visualization in Python, and statistical tests used in biomedical research. Along the way, students will learn about the fundamental biology of the disease they choose to study.
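To give a feel for the core comparison in such a project, here is a minimal sketch of testing whether a gene is expressed differently in diseased and healthy cells. It uses only the Python standard library and a permutation test, a simple stand-in for the methods in dedicated single-cell toolkits. The gene names and counts are made up for illustration and do not come from any real dataset.

```python
import math
import random
from statistics import mean


def log2_fold_change(disease, healthy, pseudocount=1.0):
    """log2 ratio of mean expression (disease vs healthy).

    The pseudocount avoids taking the log of zero for unexpressed genes.
    """
    return math.log2((mean(disease) + pseudocount) / (mean(healthy) + pseudocount))


def permutation_p_value(disease, healthy, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference in mean expression."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(mean(disease) - mean(healthy))
    pooled = list(disease) + list(healthy)
    n_disease = len(disease)
    hits = 0
    for _ in range(n_permutations):
        # Shuffle the group labels and recompute the difference in means.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_disease]) - mean(pooled[n_disease:]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical toy counts: gene -> (counts in tumor cells, counts in normal cells).
toy_counts = {
    "GENE_A": ([9, 12, 10, 11, 14, 13], [1, 0, 2, 1, 0, 1]),
    "GENE_B": ([3, 4, 2, 5, 3, 4], [4, 3, 3, 4, 2, 5]),
}

results = {
    gene: (log2_fold_change(tumor, normal), permutation_p_value(tumor, normal))
    for gene, (tumor, normal) in toy_counts.items()
}

for gene, (lfc, p_value) in results.items():
    print(f"{gene}: log2FC={lfc:.2f}, p={p_value:.4f}")
```

In this toy example GENE_A is strongly elevated in the tumor cells while GENE_B shows no difference. A real analysis would cover thousands of genes and cells, normalize the counts first, and correct the p-values for multiple testing.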
Using machine learning to predict disease mortality rates from publicly available data and identify potential public health interventions
The Centers for Disease Control and Prevention (CDC) maintains a database of underlying causes of death for each state (CDC WONDER: https://wonder.cdc.gov/ucd-icd10.html). By gathering additional data about individual states (e.g. smoking rate, occupation distribution, education levels, sales of certain products, weather patterns), we can develop a machine learning model to predict death rates for a specific disease (e.g. lung cancer) or multiple diseases. The US government maintains many databases containing variables that can be used as predictors. Following the development of a machine learning model, the student researcher can perform statistical analyses to identify important predictors of the disease and propose potential public health interventions to reduce disease incidence. Students pursuing a project of this type will learn how to access publicly available public health databases, develop machine learning models in Python, and perform statistical analyses to identify significant predictors of disease.
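As a rough illustration of the modeling step, the sketch below fits a one-predictor linear regression and checks it with leave-one-out cross-validation, so each state is predicted by a model that never saw it. This is a deliberately simple stand-in for a fuller machine learning model with many predictors. The numbers are made up for illustration and are not CDC data; the variable names (`smoking_rate`, `death_rate`) are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def leave_one_out_mae(xs, ys):
    """Mean absolute error when each point is predicted by a model fit on the rest."""
    errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        intercept, slope = fit_line(train_x, train_y)
        errors.append(abs(ys[i] - (intercept + slope * xs[i])))
    return mean(errors)


# Hypothetical, made-up values for eight states (NOT real data):
# adult smoking rate in percent, and deaths per 100,000 people.
smoking_rate = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0, 24.0]
death_rate = [30.5, 33.1, 37.7, 40.3, 44.9, 47.5, 52.1, 54.7]

intercept, slope = fit_line(smoking_rate, death_rate)
mae = leave_one_out_mae(smoking_rate, death_rate)

print(f"slope={slope:.2f} deaths per 100,000 per percentage point")
print(f"leave-one-out mean absolute error={mae:.2f}")
```

The fitted slope estimates how the death rate changes with the predictor, and the held-out error indicates how well the model generalizes to states it was not trained on. A real project would combine many predictors and would likely use an established library rather than hand-written fitting code.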