MDJ Review of the Year: "Deep Genotyping" - What third-generation sequencing can teach us

November 03, 2025

Episode: 272

Series: Research Article Awards 2025

Dr. Sarah Camargos interviews Dr. Guillaume Cogan and Prof. Alexis Brice, the corresponding authors of the article selected as the Movement Disorders Review of the Year. Listen as their conversation dives into the new sequencing technologies, and the broader diagnosis they bring to the field.

Dr. Sarah Camargos: Welcome to the MDS Podcast, the official channel of the International Parkinson and Movement Disorder Society. I am Sarah Camargos, associate editor of the podcast. Today we are joined by Dr. Guillaume Cogan and Prof. Alexis Brice from the Paris Brain Institute. They are the corresponding authors of the article selected as the Movement Disorders Review of the Year.


The title of the article is Exploration of Neurodegenerative Diseases Using Long Read Sequencing and Optical Genome Mapping Technologies. Congratulations to both of you for this important nomination. Dr. Cogan, tell us a little bit about your background and what inspired you to write this paper.

Dr. Guillaume Cogan: Hi. Thanks for having us. I'm a medical geneticist and now [00:01:00] I'm doing a PhD at the Paris Brain Institute. And we had a project together from the Paris Brain Institute with the National Institutes of Health, with our colleagues that are also co-authors of the paper, Kensuke Daida, Cornelis Blauwendraat, and Kimberly Billingsley.

So basically we had a cohort of unsolved PD cases, which had blood exome sequencing. We had familial cases and also early onset cases, and we wanted to try to identify the mutation that is causing the disease. To do so, we used long read sequencing. I will go into the details of that later. We had an interesting case and wanted to report it to Movement Disorders, so we did. The editor asked us to do a review about long read sequencing, but also optical genome mapping, to get a review of the new genetic technologies that we can use in neurodegenerative disorders.

Dr. Sarah Camargos: Very nice. I think genetics, it can sometimes feel like a challenging topic for [00:02:00] neurologists. So let's take a step back and start with the basics. We are going to talk about the type of variants that cause neurodegenerative diseases, single nucleotide variants, structural variants, and repeat expansions.

Could you please explain the techniques used to study these kinds of variants?

Prof. Alexis Brice: Yes, I can start and Guillaume will complete. I think what's really important is the fact that long read sequencing means that you can analyze long fragments of DNA. Rather than having something like 150 to 300 base pairs read for each fragment, you have something like tens of kilobases, and this changes a lot.

Because you can detect a lot of the [00:03:00] rearrangements, which are not detected by usual techniques. And this means, for instance, when there is an inversion in the gene, you can see the junction points. You can sequence them. You can detect more easily duplications or deletions.

And that's a topic also for many neurodegenerative diseases. And as well repeat expansions. The small ones can be captured by classical techniques, but once they exceed the size of 300 base pairs, they cannot. So with long read sequencing, again, you can pick up this type of variant.

So those are the most important aspects. But there are other ones. You can, for instance, distinguish individuals who have two variants in a gene. You can say whether they are in cis or in trans. And for recessive disease, [00:04:00] they have to be in trans if you want to make sure that they're responsible for the disease.

You can also have applications for genes and pseudogenes: when they are highly homologous, long read sequencing again allows you to sequence the gene independently. So that's really the basics, and we can certainly provide a few examples if you want.
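The cis/trans distinction described above can be illustrated with a minimal Python sketch. It assumes a toy representation in which each long read spans both variant positions and is summarized as the set of variants it carries; the function name `phase_relationship` and the data layout are hypothetical and are not taken from the paper or from any real phasing tool.

```python
from typing import Dict, Set


def phase_relationship(read_alleles: Dict[str, Set[str]], var_a: str, var_b: str) -> str:
    """Classify two heterozygous variants as 'cis', 'trans', or 'undetermined'.

    read_alleles maps a read ID to the set of variant IDs seen on that read.
    Assumption: every read covers both variant positions.
    """
    # Reads carrying both variants support the same haplotype (cis).
    both = sum(1 for v in read_alleles.values() if var_a in v and var_b in v)
    # Reads carrying exactly one of the two support opposite haplotypes (trans).
    only_one = sum(1 for v in read_alleles.values() if (var_a in v) != (var_b in v))
    if both > 0 and only_one == 0:
        return "cis"
    if only_one > 0 and both == 0:
        return "trans"
    return "undetermined"  # no informative reads, or conflicting evidence


cis_reads = {"read1": {"varA", "varB"}, "read2": set(), "read3": {"varA", "varB"}}
trans_reads = {"read1": {"varA"}, "read2": {"varB"}, "read3": {"varA"}}

print(phase_relationship(cis_reads, "varA", "varB"))    # cis
print(phase_relationship(trans_reads, "varA", "varB"))  # trans
```

For a recessive disease, only the "trans" result would be consistent with both copies of the gene being affected.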

Dr. Sarah Camargos: Very nice. When I read your paper, I found it very interesting that you used these examples to show the advantages of long read sequencing, such as the pseudogenes and phasing you have mentioned. Could you please share with us the story about the siblings and twins with PRKN variants?

Dr. Guillaume Cogan: Yes. So our colleague from NIH, Kensuke Daida, first had these two siblings with a phenotype that was compatible with Parkinson's disease. [00:05:00] So the siblings had an early onset and a slow progression of the disease, and they were tested using, I think it was exome sequencing, so short read sequencing.

They had a pathogenic single nucleotide variant, but just one variant, so it's not sufficient to explain the disease. And then they used long read sequencing and they identified an inversion in the second allele, which is explaining the disease. And the inversion was very large. It's seven megabases.

So it explains why they didn't get it first. And then from our cohort, we also had two siblings with an autosomal recessive form of the disease compatible with PRKN, again, with the phenotype. And we first used the conventional sequencing methods: multiplex ligation-dependent probe amplification, targeted sequencing with exome sequencing.

And we had a deletion of exon 4, but only one mutation again, so that was not sufficient to explain the disease. We used, again, long read sequencing, and we found something very interesting. So on the first allele we [00:06:00] had a deletion of exon 3 and exon 4, and on the second allele we had a duplication of exon 3.

So overall we have two copies of exon 3, which is normal, right? I hope we all have two copies of exon 3 here.

And this is why the other sequencing tools were not able to see that. Let's say only long read sequencing could identify this.

And then Kensuke and colleagues extended this study to, I think it was 23 individuals with one PRKN variant and an early onset of PD, and they were able to solve a quarter of cases using long read sequencing. So this is something that is meaningful, I think. And neurologists could think about it when they have a patient with one mutation in PRKN and a phenotype that is compatible.
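The exon dosage arithmetic behind the PRKN sibling case can be written out as a small Python sketch. The per-allele copy numbers follow the description above (deletion of exons 3 and 4 on one allele, duplication of exon 3 on the other); the function `total_copies` and the choice to show only exons 2 to 5 are illustrative assumptions.

```python
from typing import Dict, List


def total_copies(allele_1: Dict[int, int], allele_2: Dict[int, int], exons: List[int]) -> Dict[int, int]:
    """Sum per-exon copy numbers over both alleles.

    Exons missing from an allele's dictionary are assumed to be present once.
    """
    return {exon: allele_1.get(exon, 1) + allele_2.get(exon, 1) for exon in exons}


allele_1 = {3: 0, 4: 0}  # deletion of exons 3 and 4
allele_2 = {3: 2}        # duplication of exon 3

dosage = total_copies(allele_1, allele_2, exons=[2, 3, 4, 5])
print(dosage)  # {2: 2, 3: 2, 4: 1, 5: 2}
```

A method that only measures total dosage sees a normal count of two for exon 3 and a single-copy loss of exon 4, which looks like one heterozygous deletion. Resolving each allele separately, as long reads allow, reveals that both alleles are rearranged.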

Dr. Sarah Camargos: And you were guided only by the phenotype. This is very interesting. So you dug a little bit just to see if there was a structural variant that could explain the other in trans [00:07:00] variation.

Is that correct?

Dr. Guillaume Cogan: Yeah.

Dr. Sarah Camargos: Amazing. Besides discovering the new genes, another interesting aspect to explore is the repeat expansion characterization.

How is this characterization relevant for understanding the phenotype or genetic counseling?

Prof. Alexis Brice: I think there are at least two aspects for the repeat expansions. First it's the size, and appropriate sizing is very important because there is usually a threshold above which a repeat can be said to be pathogenic. And this varies a lot according to the disorders. And for some of them, I have to say that the threshold value is still disputed, but at least above a certain value, you are sure that it's pathogenic.

So the first is the sizing, which is absolutely [00:08:00] crucial for diagnosis. And the second aspect is the sequence of the repeat. Because it turns out that at some loci it is not only the size, but also the composition of the repeat, which is important. And there are alternative compositions, some of which are pathogenic and others which are well tolerated and not associated with the disease.

With long read sequencing, you get both with a single stone. You have both the size of the repeat and the composition, and therefore you can say whether the repeat is pathogenic or not. And this is very important for many of the recently identified repeats, such as FGF14 or RFC1 for instance.

Dr. Guillaume Cogan: Yes, maybe I can give examples about it. FGF14 is a very good example. So it's responsible for [00:09:00] spinocerebellar ataxia 27B, which was identified quite recently, and we know that expansions of GAA that are lower than 200 repeats are not pathogenic. However, when the repeat is above 300, we know that it's pathogenic.

Using short read sequencing, because the size of the read is about 150 base pairs, we cannot know anything that is longer. We can just say that there is an expansion above 150, but we can't say if it's above 300 or not. So we can't say if it's pathogenic and we need another technology to say that. But using long read sequencing, we can tell more precisely the size of the expansion and we can say if the expansion is above the threshold or not.
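The sizing logic for FGF14 can be sketched in a few lines of Python. The thresholds are the ones quoted in the conversation (below 200 not pathogenic, above 300 pathogenic), taken here to be repeat units; the label "uncertain" for the range in between, the function names, and the toy read are assumptions for illustration and not a clinical rule.

```python
def longest_repeat_run(sequence: str, motif: str) -> int:
    """Return the largest number of consecutive copies of motif in sequence."""
    if not motif:
        raise ValueError("motif must not be empty")
    best = run = 0
    i = 0
    n = len(motif)
    while i + n <= len(sequence):
        if sequence[i:i + n] == motif:
            run += 1
            best = max(best, run)
            i += n   # stay in frame while the run continues
        else:
            run = 0  # the run is broken, restart the search one base later
            i += 1
    return best


def classify_fgf14_gaa(repeat_count: int) -> str:
    """Apply the thresholds quoted in the interview (assumed to be repeat units)."""
    if repeat_count < 200:
        return "not pathogenic"
    if repeat_count > 300:
        return "pathogenic"
    return "uncertain"  # range not characterized in the interview


# Toy long read: flanking sequence around 320 GAA units (960 bp of repeat).
long_read = "TTAC" + "GAA" * 320 + "CCGT"
count = longest_repeat_run(long_read, "GAA")
print(count, classify_fgf14_gaa(count))  # 320 pathogenic
```

A 150 bp short read can hold at most 50 GAA units, so it can never span a tract of this size, which is why only a read longer than the whole expansion can place it relative to the thresholds.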

Another example of the importance of the motif is RFC1. So it's a gene responsible for the well-known cerebellar ataxia, neuropathy, vestibular areflexia syndrome, so CANVAS. And we know that long expansions of [00:10:00] AAAAG, so it's five nucleotides, are not pathogenic. However, expansions of the motif AAGGG are pathogenic, and using long read sequencing, we have the motif, so we can know if it's pathogenic or not.

And I think something also important is the presence or absence of interruptions. And we discuss this in the paper. So I think a good example is spinocerebellar ataxia 2. So it's a CAG expansion. And if in this expansion of CAG we have an interruption, or several interruptions, of CAA instead of CAG, we know it's responsible not for spinocerebellar ataxia but for Parkinson's disease.

So the motif and the presence or absence of interruptions are important for the phenotype, but also for the age at onset, the penetrance, the inheritance of the disease, and the severity and the type of phenotype.
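Reading the motif and counting interruptions, as described for RFC1 and spinocerebellar ataxia 2, can be sketched as follows. The sketch assumes the repeat tract has already been extracted from a long read, is non-empty, and starts in frame; the helper names and the toy tracts are hypothetical, and the status table only restates what is said in the conversation.

```python
from collections import Counter
from typing import List


def repeat_units(tract: str, unit_len: int) -> List[str]:
    """Split a repeat tract into consecutive units of unit_len bases."""
    return [tract[i:i + unit_len] for i in range(0, len(tract) - unit_len + 1, unit_len)]


def dominant_motif(tract: str, unit_len: int) -> str:
    """Return the most frequent unit in the tract (tract must be non-empty)."""
    return Counter(repeat_units(tract, unit_len)).most_common(1)[0][0]


def count_interruptions(tract: str, motif: str) -> int:
    """Count units that differ from the expected motif."""
    return sum(1 for unit in repeat_units(tract, len(motif)) if unit != motif)


# Motif status for RFC1 as stated in the interview.
RFC1_MOTIF_STATUS = {"AAAAG": "not pathogenic", "AAGGG": "pathogenic"}

rfc1_tract = "AAGGG" * 12                    # toy expanded tract
sca2_tract = "CAG" * 8 + "CAA" + "CAG" * 4   # toy CAG tract with one CAA interruption

motif = dominant_motif(rfc1_tract, 5)
print(motif, RFC1_MOTIF_STATUS.get(motif, "unknown"))  # AAGGG pathogenic
print(count_interruptions(sca2_tract, "CAG"))          # 1
```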

Prof. Alexis Brice: So we certainly can expect to [00:11:00] identify more of these repeats in the future. And we know that in neurological disorders they are particularly frequent.

So I think that, by sequencing many more cases, we certainly will discover unknown mutations in the future.

Dr. Sarah Camargos: And also you were able to check the methylation. You can understand a little bit more about the gene expression.

Dr. Guillaume Cogan: Absolutely. Using the main methods of long read sequencing, we can identify the methylation, and this has, of course, several implications in diseases because usually, it's not always the case, but usually hypermethylated elements in the DNA have less expression compared to hypomethylated ones.

And in the context of neurodegenerative disorders, I think this is interesting. For example, a group studied this for NOTCH2NLC. When [00:12:00] you have a GGC expansion in the 5' untranslated region, it's responsible for a disease called neuronal intranuclear hyaline inclusion disease. And they found, it's very interesting, that unaffected parents of children affected by this disease had longer expansions compared to their offspring. So it's not expected, right? But still using long read sequencing, they found that this GGC expansion in the parents was hypermethylated. So the gene was less expressed compared to the shorter expansion. And this expansion is pathogenic because it leads to RNA foci that sequester RNA binding proteins.

So basically if you have a lower expression, you have fewer RNA foci and then you don't have the disease. It's very interesting to see that methylation allows us to understand why some mutations are not pathogenic compared to others.

Dr. Sarah Camargos: Amazing. Very interesting. [00:13:00] Let's talk a little bit about targeting in long read sequencing. In April 2024, we interviewed Prof. Houlden and Dr. Zhongbo Chen, who used targeted long read sequencing to describe SCA4. Could you remind us how this method works and if there is another methodology for targeting instead of sequencing everything?

Dr. Guillaume Cogan: Yeah. So it's a good question because you can use long read sequencing in several ways. So you can do whole genome sequencing, but you can also do targeted sequencing. And I would say, in the paper we talk about three methods to do this. So first you can use a Cas9-based method.

So you just catch the region of interest using CRISPR Cas9, and then you only sequence this region of interest. So this is the first one. Another one is also simple. It's long range PCR. So you amplify your region of interest using long [00:14:00] range PCR together. But of course if you use this, you can have the amplification bias because you use a polymerase chain reaction, right?

So this is the second one, and the last one is provided only by Oxford Nanopore Technologies. So the last one is adaptive sampling and it's quite interesting.

So basically you have your strand of DNA, which is going through the pore. And the pore will analyze the first hundreds of base pairs. So for example, 400 base pairs, and it'll see if these base pairs are in the region of interest that you told the sequencing machine to sequence. So if it sees that this fragment is not a fragment of interest, it ejects the fragment.

So it will only sequence fragments of interest all the way through the pore. So those are the three methods: Cas9-based, long range PCR, and ONT adaptive sampling.
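The keep-or-eject decision in adaptive sampling, as described above, can be mimicked with a toy Python sketch. It is only a conceptual illustration: the reference is a random string, mapping is done with an exact substring search, and the function name, the `prefix_len` parameter, and the target coordinates are all assumptions. It does not reflect how the sequencer maps reads in real time.

```python
import random
from typing import List, Tuple


def adaptive_sampling_decision(read: str, reference: str,
                               targets: List[Tuple[int, int]],
                               prefix_len: int = 400) -> str:
    """Return 'keep' if the start of the read maps inside a target region, else 'eject'.

    Toy assumption: the read prefix matches the reference exactly, so a plain
    substring search stands in for real-time alignment.
    """
    probe = read[:prefix_len]  # only the first bases are inspected
    if not probe:
        return "eject"
    position = reference.find(probe)
    if position == -1:
        return "eject"  # prefix cannot be placed on the reference
    if any(start <= position < end for start, end in targets):
        return "keep"   # fragment of interest: sequence it all the way through
    return "eject"      # off target: release the strand and free the pore


rng = random.Random(0)
reference = "".join(rng.choice("ACGT") for _ in range(5000))
targets = [(1000, 2000)]  # hypothetical region of interest

on_target_read = reference[1200:2000]
off_target_read = reference[3000:3800]

print(adaptive_sampling_decision(on_target_read, reference, targets))   # keep
print(adaptive_sampling_decision(off_target_read, reference, targets))  # eject
```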

Dr. Sarah Camargos: Very nice. Professor Brice, I get the sense that long read sequencing is almost like the genetic version [00:15:00] of deep phenotyping, kind of a deep genotyping. Do you agree?

Prof. Alexis Brice: I fully agree, with the deep phenotyping, you find things that you didn't see because you didn't look for them properly. And I think that's really important for neurologists to be able to perform this deep phenotyping, which can help diagnosing. And here it's exactly the same because the tools we used until long read sequencing were not able to detect some of these variants.

And now that we have this tool, we can pick up these variants and we can improve diagnosis. So it's really exactly it's very similar except that the cost of deep phenotyping might be less than long read sequencing at the moment, at least.

Dr. Sarah Camargos: Yes. Speaking about the challenges when it comes to long read sequencing what are the [00:16:00] big challenges for us? Besides the cost.

Dr. Guillaume Cogan: Beyond the cost, the challenges are first, the wet lab. So the wet lab protocol is not yet standardized. Also, the bioinformatic pipeline to call the variants is not standardized. And also you need a lot of data storage capacity because it generates hundreds of gigabases.

So you need good storage capacities. You also need good GPUs and CPUs, because calling the variants is very computationally intensive. And at the end of the chain, let's say, interpreting the variants is also harder because we have way more variants. At least we are confident in our variants because, for example, if you do short read sequencing and you try to analyze structural variants, you know that a lot of them are just false positives. But using long read sequencing, you know that most of them are true positives. However, when we analyze variants, if you want to identify the cause of the disease of a patient, you want to remove variants that have a high [00:17:00] frequency, that are common, right, in all of us, to select only rare variants.

The thing is that for long read sequencing, we don't have catalogs such as gnomAD for short read sequencing. So it's hard to filter long read variants. Nevertheless, there are some collaborative projects, for example, the 1000 Genomes Project, that are sequencing hundreds if not thousands of healthy controls to provide us population databases, allowing us to filter the variants according to their frequency.
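Frequency-based filtering, and why it depends on a population catalog, can be shown with a minimal Python sketch. The variant identifiers, the allele frequencies, and the 1% cutoff are invented for illustration; the function `filter_rare_variants` is hypothetical and does not query gnomAD or any real database.

```python
from typing import Dict, List


def filter_rare_variants(variants: List[str], population_af: Dict[str, float],
                         max_af: float = 0.01) -> List[str]:
    """Keep variants whose population allele frequency is at most max_af.

    Variants absent from the catalog are treated as never observed and are kept.
    """
    return [v for v in variants if population_af.get(v, 0.0) <= max_af]


patient_variants = ["SV_del_1", "SV_dup_2", "SV_inv_3"]  # made-up identifiers
catalog = {"SV_del_1": 0.35, "SV_dup_2": 0.002}          # made-up frequencies

print(filter_rare_variants(patient_variants, catalog))  # ['SV_dup_2', 'SV_inv_3']
print(filter_rare_variants(patient_variants, {}))       # ['SV_del_1', 'SV_dup_2', 'SV_inv_3']
```

With an empty catalog nothing can be removed, so every variant remains a candidate. That is the practical consequence of lacking long read population databases.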

And, finally, having a bit of experience with it, it's sometimes frustrating because you're like, I sequenced all the types of variants, I have the structural variants, SNVs, short tandem repeats, but I don't find the genetic mutation. But I know it's here somewhere. I can take maybe one example.

For example, we have intronic structural variants that may affect the splicing of a gene, but we don't have any tool to predict if it's affecting the splicing or not. So we hope that in the future bioinformaticians [00:18:00] will develop those tools so that we can finally identify the cause of the disease of all people with a genetic disorder, let's say.

Dr. Sarah Camargos: Or even sequence all the RNA too.

Dr. Brice or Cogan: Yeah, this is another. Using long read sequencing, yeah, we can identify new isoforms, as we talk about in the paper.

Dr. Sarah Camargos: And also in your paper you explored the possibilities of optical genome mapping. Explain to us how this technique works and what the main advantages are.

Dr. Guillaume Cogan: So it's quite different compared to long read sequencing because it's not sequencing. So basically you just label your DNA at some canonical sequence, short sequences, and then you have a microscope that is looking at the distance between the labeled tags of the DNA. So with that you can identify structural variants that are above 500 base pairs.

So you can't see anything below that. That's one thing. You [00:19:00] also can't see SNVs, single nucleotide variants, using optical genome mapping. However, it's quite interesting because the depth, the coverage, is better compared to long read sequencing. You can get 150x coverage depth using OGM compared to usually 20 to 30x with long read sequencing.

And yeah, that's the main thing of OGM, I would say. So some people compared OGM and long read sequencing. And for long read sequencing, it's hard to identify variants, structural variants that are above 50 kb. However, OGM is good to identify those kinds of mutations. So I would say if you have money, you can use both and that would be the best.

But you need money to do

Dr. Sarah Camargos: Of course.

Prof. Alexis Brice: No, I mean, you also need high molecular weight DNA and that's sometimes a limitation for these techniques. And classical biobanking sometimes uses extraction techniques [00:20:00] that do not provide such DNA. So it's something you have to take into account, at least for the prospective cohorts or samples that you are getting.

Dr. Guillaume Cogan: Yes. And something also for OGM, I think a limitation of OGM that is good to know for listeners, is that it's very hard to get the precise location of breakpoints using OGM. You can't get it usually. It's between 6 kb and 15 kb. So it can be important in medical genetics because using OGM, you can say that a structural variant is encompassing an exon, whereas it's not. So it's not a false positive, but it's just not as accurate for structural variants as long read sequencing.

So again, it's good to use both if you want to use OGM, yes.
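The effect of breakpoint uncertainty on calling an exon overlap can be illustrated with a short Python sketch. It assumes a simple model in which each reported breakpoint may be off by up to `uncertainty` bases in either direction; the function name, the three-way labels, and the coordinates are hypothetical, and only the 6 to 15 kb order of magnitude comes from the conversation.

```python
def exon_overlap_call(sv_start: int, sv_end: int,
                      exon_start: int, exon_end: int,
                      uncertainty: int) -> str:
    """Classify SV/exon overlap as 'certain', 'possible', or 'none'.

    Each breakpoint is assumed to lie within +/- uncertainty of its reported position.
    """
    def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
        return a_start < b_end and b_start < a_end

    # Smallest interval consistent with the report: overlap here holds for any true SV.
    inner_start, inner_end = sv_start + uncertainty, sv_end - uncertainty
    # Largest interval consistent with the report: overlap here is only possible.
    outer_start, outer_end = sv_start - uncertainty, sv_end + uncertainty

    if inner_start < inner_end and overlaps(inner_start, inner_end, exon_start, exon_end):
        return "certain"
    if overlaps(outer_start, outer_end, exon_start, exon_end):
        return "possible"
    return "none"


# Exon lying 5 kb outside the reported deletion (toy coordinates).
print(exon_overlap_call(100_000, 200_000, 95_000, 95_200, uncertainty=10_000))  # possible
print(exon_overlap_call(100_000, 200_000, 95_000, 95_200, uncertainty=0))       # none
```

With a 10 kb uncertainty the exon may or may not be included, whereas base-level breakpoints settle the question. This is the situation in which combining OGM with long read sequencing is useful.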

Dr. Sarah Camargos: So do you predict these two technologies will become the first-tier approach to diagnose hereditary neurodegenerative disorders in the future?

Prof. Alexis Brice: Provided that the [00:21:00] money is there to pay for them. Yeah, I think that they clearly allow us to solve a greater proportion of cases and they are therefore very useful, I think, if we want to have a first and last tier approach where a single technology can provide the results.

Dr. Guillaume Cogan: Yeah, there's still a long way to go to get these technologies as first tier, I would say. Yeah.

Dr. Sarah Camargos: Yeah. Very nice. Before we finish is there one key message you both would like our listeners to take away from this paper?

Dr. Guillaume Cogan: Yeah. So I think you need to know your project and really what you wanna see before using these technologies. And I think in our article, we tried our best to give examples so that the reader can understand what they can get from these technologies. And I think, yeah, reading this article is a way of better understanding this and using it in an [00:22:00] appropriate way for each project. Yeah.

Dr. Sarah Camargos: Yes. So doctors, thank you so much for joining us and sharing your insights today. Thank you to all our listeners for being with us on the MDS Podcast. Stay tuned for our next episode, and until then take care and goodbye.

Dr. Guillaume Cogan: Thanks for having us.

Special thank you to:

Guillaume Cogan, MD
Paris Brain Institute
Paris, France

Alexis Brice, MD
Paris Brain Institute
Paris, France

Host(s):

Sarah Camargos, MD, PhD

Movement Disorders Unit
Hospital das Clinicas, Universidade Federal de Minas Gerais

Belo Horizonte, Brazil
