STEM Program

Decoding the Human Blueprint: Advanced Machine Learning for Genomics and Medical Data

Faculty Advisor: Postdoctoral Researcher, Department of Biomedical Data Science, Dartmouth College

Research Program Introduction

In an era where AI powers everything from social media to medical breakthroughs, this Biomedical Machine Learning Practicum introduces students to the science behind intelligent algorithms. Using R/RStudio, students will explore publicly available biomedical datasets, learning to clean, visualize, and analyze complex data while applying machine learning techniques such as clustering, dimensionality reduction, and classification.

Through guided instruction and independent exploration, students will design their own data science project, applying a machine learning method to a biomedical problem—such as predicting disease outcomes or identifying genetic patterns. The program culminates in a professional research-style report that showcases both analytical and communication skills.

Ideal for 9th–12th grade students interested in data science, AI, or biomedical research, this program provides hands-on experience with the same tools used by scientists and researchers today, equipping students to think critically, code confidently, and understand how data drives modern discovery.

Possible Topics for Final Project:

  • How can gene expression data from The Cancer Genome Atlas (TCGA) be used to predict patient survival outcomes using machine learning models?

  • What do single-cell RNA-sequencing datasets reveal about immune cell diversity in tumor microenvironments?

  • Can spatial transcriptomics data help identify how cell location influences gene activity in brain tissues?

  • How accurately can population-level datasets like the UK Biobank predict cardiovascular disease risk using supervised learning methods?

  • What patterns emerge when comparing COVID-19 clinical datasets across demographic groups using clustering and dimensionality reduction techniques?

  • How can unsupervised ML algorithms uncover hidden subtypes of cancer or neurological disorders from large-scale genomic data?

  • Or another topic in this subject area that interests you and that your professor approves after discussing it with you.

Program Details

  • Cohort size: 3 to 6 students

  • Workload: Around 4 to 5 hours per week (including class and homework time)

  • Target students: 9th to 12th graders interested in Data Science, Artificial Intelligence, Biomedical Research, or other related areas.

  • Schedule: TBD. Meetings will run for around one hour per week; the weekly meeting day and time will be set a few weeks before the start date.