Stephanie Yan

Class Of 2024

I studied human genomics and evolution in Rajiv McCoy’s lab using computational methods. This was a pretty big change from my undergraduate research, where I worked entirely at the bench on yeast cell biology and didn’t know how to code at all. Part of the reason why I chose to come to CMDB was that I liked the wide range of research available in this department, and I also liked that students have to do four rotations in their first year, which really forces you to explore research outside of your comfort zone.

I definitely didn’t come into the program thinking that I would join a fully computational lab, but I learned enough basic coding through our first-year courses to rotate in Rajiv’s lab, and I ended up loving the research and deciding to join. It was a bit of an intimidating transition, and the learning curve in the first year or two was pretty steep — there were a lot of times that Rajiv would mention something that’s a core concept in human genetics and I would be like, “I’ve never heard this word in my life” and have to frantically google it. I also wrote a lot of really bad code for years before I figured out what best practices are (and I’m definitely still learning!).

My PhD Research

I spent my PhD studying the evolutionary impacts of genetic variants and regions of the genome that are difficult to analyze with traditional sequencing methods. In my main project, I focused on identifying structural variants (SVs), which are large insertions, deletions, and rearrangements of DNA. Because standard sequencing methods generate sequencing reads that are very short, these larger genetic variants can be hard to find, and most discovery methods based on short reads tend to miss about 50-80% of SVs. Around the early 2010s, new sequencing methods started being developed that generate much longer sequencing reads — these make it much easier to discover structural variants, but these methods hadn’t really been applied to more than about a dozen samples at a time. I was interested in understanding how structural variants have impacted human evolutionary history, but at the beginning of my PhD, the long-read sequencing datasets weren’t large enough to ask those questions effectively.

To get around this, I took advantage of graph genotyping, a computational method that allowed me to assemble datasets of structural variants (discovered with long-read sequencing) and identify those variants in existing short-read sequencing datasets. Using this approach, I identified SVs in the 1000 Genomes Project, a dataset of 2,504 individuals sampled from 26 populations around the world. With this data, I was able to search for structural variants that looked like they had undergone historical selection, specifically by looking for the signature of an SV that was very common in one ancestry but rare in others. We identified several candidates, but the most exciting one was a sequence that underwent positive selection around 8,000 years ago in Southeast Asia, and was originally inherited by humans from Neanderthals! Since this sequence is in a region of the genome that contains immunoglobulins (genes that code for antibodies), we think that populations in this geographical region experienced some selective pressure related to the adaptive immune response, which caused this Neanderthal-origin sequence to undergo such strong selection.

The CMDB Community

I liked that CMDB was a smaller department where everyone knew everyone, but not so small that it felt claustrophobic. CMDB is also a very student-driven program: grad students lead almost all the research in the labs here, which keeps the department very focused on grad student training. That also gave me a lot of opportunities to learn different skills throughout my PhD, including grant writing, giving talks, teaching, and leadership and management.

I also ended up in a very supportive and fun lab environment with an equally supportive mentor, which was by far the most important factor in my having a positive experience in grad school. It makes a huge difference if your work environment is filled with people who genuinely celebrate all of your successes, are happy to help whenever you have a question, and make you feel comfortable asking for support. symBIOsis, a CMDB student support organization that I was part of for several years, hosts an advising meeting for first years every spring on how to choose a thesis lab. Whenever I go to that meeting, I always tell the first years to try to find a lab where they feel comfortable making mistakes!

Outside of CMDB, Hopkins is generally a really great place for human genetics and genomics research. I had no idea about that when I decided to come here, but it has benefited me a lot throughout the years. Rajiv’s lab was adopted into the JHU genomics community pretty much right after he arrived, and we do all of our lab communications through a huge Slack workspace that encompasses dozens of genomics labs across all of the JHU campuses. It’s been really helpful to have immediate access to experts on structural variants, long-read sequencing, and any other topics that came up in my research, and everyone in the community is extremely generous with their time and willing to answer questions or collaborate.