Dr. Mata is the PI of the Latin American Research Consortium on the Genetics of Parkinson's Disease, or LARGE-PD. We are going to discuss their recent paper published in the journal Movement Disorders, entitled X-Chromosome Association Study in Latin American Cohorts Identifies New Loci in Parkinson's Disease.
Welcome, Dr. Mata, and many thanks for your time. [00:01:00] Before we go into the article, I would like to ask you to explain to us what exactly LARGE-PD is and what its goals are.
[00:01:10] Ignacio Mata : Thank you for inviting us to be here. So LARGE-PD, or the Latin American Research Consortium on the Genetics of Parkinson's Disease, as you mentioned, is a collaboration that has been going on since 2006, quite a long time already. The goal is really to understand the genetic component, although we also look at the environment and other factors, in neurological disorders in Latinos.
So Latinos are very underrepresented in genetic studies. They are a big population within the world, so the idea of this collaboration was to make sure that they were represented. Right now we have 40 centers in 13 different countries all across Latin America and the Caribbean, and we have over 6,000 individuals, both Parkinson's patients and the healthy controls that we need for our studies, from a lot of these different countries.
[00:01:59] Sarah Camargos : Perfect. [00:02:00] What have the LARGE-PD outcomes been so far?
[00:02:03] Ignacio Mata : We've been working for a long time, right? So as the numbers have been increasing, we've been able to do a lot more things. At the beginning, we looked at familial forms of Parkinson's disease, at the causal variants in these forms that were identified mostly in Europe and the United States, to see if they were present in Latinos. We saw that we could find synuclein mutations, we could find parkin mutations, and we also found several LRRK2 mutations. So the things that we all know are causing Parkinson's in families are also present in Latin America. And as Thiago is going to explain later, Latinos are very complex genetically. They have a mix of many different populations. So one of the things that we observed very early on is that a lot of these mutations that have been found in Europeans are actually present in Latinos through the conquistadors. A lot of them come from a common ancestor who lived in Europe a long time ago, especially the LRRK2 mutations. So that was one of the first things that [00:03:00] we did, and as the numbers started to increase, and people started paying more attention to diverse cohorts in research, we were able to do the first genome-wide association study. We did this with some funding from the Parkinson's Foundation in 2016. We were able to genotype 1,500 individuals, which is the same cohort that Thiago will explain we used in this study as well. Genome-wide association studies usually have tens of thousands of people when you do them in Europeans, for example. We didn't have those big numbers, so it was a little bit underpowered. But even with a very small cohort, we were able to see synuclein as the top risk gene for Parkinson's disease, which is something that we saw in Europeans. The Asian GWAS also showed synuclein as one of the top hits.
So it was very similar to what we have found in other populations. But in addition to that, we were also able to nominate a new gene that was actually associated with Native American ancestry. [00:04:00] This region in the genome is close to a gene called NRROS. We don't know much about this gene just yet, but it seemed like it could be important, and it could be population-specific to Parkinson's disease in individuals that have this Native American ancestry. In countries like Peru, Bolivia, and Ecuador, which have a lot of Native American ancestry, this might be a risk factor. We also saw that a polygenic risk score calculated with European data was actually effective at distinguishing between cases and controls in Latinos.
[00:04:34] Sarah Camargos : Perfect. So the PRS calculated for Europeans works for Latinos.
[00:04:41] Ignacio Mata : Yeah, this was a surprise, because modeling, so computer modeling, not real data sets, had shown that a polygenic risk score built this way should not transfer well. Just to explain for the audience, a polygenic risk score is really just the aggregate of all the risk factors that are in the genome. In Parkinson's, we have almost [00:05:00] 100 different regions of the genome that are involved in risk; some of them increase the risk, some of them reduce the risk. So the polygenic score takes, for each individual, how many of those variants they actually carry. And then it calculates whether you have a large genetic component or a very small genetic component, depending on what your PRS, or polygenic risk score, is. A lot of papers were saying that, because most of the data that is publicly available comes from Europeans, if we calculate polygenic risk scores for any disease, not only Parkinson's but also cardiovascular disease, using data from Europe, they don't translate very well to other populations, just because the genetic variants might be different or the effect might be different.
So these models might not be very good at predicting whether somebody is going to develop the disease in other populations. But when we took the results from the Nalls et al. paper from 2019, which is the largest GWAS that has been done in Parkinson's disease and which only included European individuals, and then [00:06:00] calculated the polygenic score for Latinos, it actually worked really well, surprisingly.
We went really in depth to analyze how that was possible. And we found out that there was one variant in synuclein that actually accounted for 73% of the effectiveness of that polygenic risk score. If you take that variant out, the polygenic risk score doesn't work at all. And this is a variant that is much more common in Asians and Latinos than it is in Africans or Europeans. The other thing that we found, which is very interesting, is that this same variant, which is in both populations, is actually surrounded by a completely different set of variants in Latinos compared to Europeans, which we think might actually modify the effect of this variant.
So that means that the variant itself is important, but also that knowing what all the other variants in the same genome are might actually explain why some people have a high risk or a low risk of developing the disease. It was quite interesting to do this study, and it hadn't been done before, so it was quite novel.
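As a rough illustration of the polygenic risk score described above, the sketch below computes a score as a weighted sum of risk-allele counts and then recomputes it with one variant left out. The variant names, weights, and genotype are hypothetical values made up for the example; they are not taken from Nalls et al. or from the LARGE-PD data.

```python
from typing import Dict


def polygenic_risk_score(dosages: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of risk-allele dosages (0, 1 or 2 copies) over the scored variants."""
    return sum(weight * dosages.get(variant, 0.0) for variant, weight in weights.items())


# Hypothetical per-variant weights (log odds ratios). Positive values increase
# risk, negative values reduce it. These are illustrative numbers only.
weights = {"snca_variant": 0.30, "variant_b": 0.05, "variant_c": -0.04}

# Hypothetical genotype of one individual: copies of each scored allele.
person = {"snca_variant": 2, "variant_b": 1, "variant_c": 1}

full_score = polygenic_risk_score(person, weights)

# Leave one variant out to see how much of the score depends on it.
weights_without_snca = {v: w for v, w in weights.items() if v != "snca_variant"}
score_without_snca = polygenic_risk_score(person, weights_without_snca)

print(f"full score: {full_score:.2f}, without the SNCA variant: {score_without_snca:.2f}")
```

In this toy example almost all of the score comes from a single variant, which is the kind of pattern the leave-one-out comparison is meant to expose.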
[00:06:58] Sarah Camargos : So it's great. [00:07:00] It's a new avenue, I think, for Parkinson's disease. And Professor Mata, why and how did the X chromosome catch your eye?
[00:07:10] Ignacio Mata : Yeah, so the X chromosome. A lot of us have X chromosome data, but we usually tend to exclude it. If you look at all the GWAS that have been done in other populations, or even in other diseases, you see that they only look from chromosome 1 to chromosome 22. Chromosome X and chromosome Y are completely ignored, and something that we like to do in our lab is to study those things that are underrepresented, right? So we study Latinos because nobody does. We decided to also study the X chromosome because nobody does, and also because Parkinson's, like other neurological disorders, affects one sex more than the other.
So males have a slightly increased risk of developing Parkinson's compared to women. We thought that maybe the X chromosome could carry some important information to explain those differences. The reason why we [00:08:00] don't study the X chromosome is really not because we don't want to do it; it's because it's quite hard.
The dosage is different between males and females. The recombination rate is also different between males and females. There's also X inactivation: when you genotype, you can't tell which of the variants might be inactivated. And then another thing that Thiago might mention is that in Latinos it's even more complicated because the ancestry is different.
The X chromosome ancestry is actually different from that of the autosomal chromosomes, so that increases the difficulty of analyzing it. But Thiago did a great job, and he had to develop a lot of new pipelines to be able to analyze this data.
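To make the dosage problem concrete, here is a minimal sketch of one common convention for coding X-chromosome genotypes outside the pseudoautosomal regions: females carry 0, 1 or 2 copies of an allele, while males carry 0 or 1 copy, which is recoded as 0 or 2 on the assumption that one female X is fully inactivated. The function name and the choice of this coding are assumptions for illustration; the interview does not say which coding the study used.

```python
def x_dosage(allele_copies: int, sex: str) -> int:
    """Return an analysis dosage for an X-linked variant.

    Assumes full X inactivation, so a male carrying one copy is treated as
    equivalent to a female carrying two copies (hypothetical convention).
    """
    if sex == "female":
        if allele_copies not in (0, 1, 2):
            raise ValueError("females carry 0, 1 or 2 copies")
        return allele_copies
    if sex == "male":
        if allele_copies not in (0, 1):
            raise ValueError("males carry 0 or 1 copy on the X chromosome")
        return 2 * allele_copies
    raise ValueError(f"unknown sex: {sex!r}")


for sex, copies in [("female", 1), ("female", 2), ("male", 1)]:
    print(sex, copies, "->", x_dosage(copies, sex))
```

If X inactivation is incomplete at a locus, coding males as 0 or 1 may be more appropriate, which is one reason the choice has to be made explicitly.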
[00:08:36] Sarah Camargos : Oh, very nice. See, very interesting. Dr. Mata, thank you. And Thiago welcome. And many thanks for your time. Can you tell us about the population background of your study in detail?
[00:08:49] Thiago Peixoto Leal : So first of all, thank you for the invitation, I'm really happy to be here. The LARGE-PD phase one dataset, which is the data that was generated in [00:09:00] 2016, is composed of 1,500 individuals from five different countries in Latin America: Brazil, Chile, Colombia, Peru, and Uruguay. The Latin American populations are well known for their genetic diversity, which is a consequence of centuries of interaction between populations from Africa, Europe, and also the Native Americans. So all the Latin American populations share these three parental populations, but the socioeconomic and geopolitical dynamics make each population unique in terms of genetic background. For example, the Brazilian population in our dataset has the biggest African contribution, while the Peruvian population has the biggest Native American ancestry. We also have more European ancestry in other cohorts, and there are some cohorts that have almost no African ancestry. So Latin America has this beautiful heterogeneity, and [00:10:00] while that is good, it is also bad, because you have to account for it in your analysis. You cannot assume that your population is homogeneous, that your population follows the rules that GWAS follows, because you always have this heterogeneity that can cause problems in some analyses.
[00:10:20] Sarah Camargos : Interesting. I wasn't aware that the background was important for GWAS, for the rules you were mentioning. Please tell us, what are the technical limitations of genome-wide association studies with chromosome X?
[00:10:34] Thiago Peixoto Leal : First, what is the premise of genome-wide association studies? In this kind of study, you assume that differences in allele frequency are caused by the disease. So the idea is, if this allele is more represented in my cases than in my controls, it is associated with my disease. But there are several evolutionary forces that can cause differences in allele frequencies that are not associated [00:11:00] with the disease, and ancestry is one of them.
So we have to account for this in the analyses that we do in LARGE-PD.
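The point about ancestry can be shown with a small, made-up example. The sketch below runs a plain chi-square test on allele counts in cases versus controls. Within each of two hypothetical populations the allele frequency is identical in cases and controls, yet pooling them produces a strong spurious association, because the populations differ both in allele frequency and in their case-to-control ratio. All counts are invented for illustration.

```python
def allelic_chi_square(case_alt: int, case_ref: int, ctrl_alt: int, ctrl_ref: int) -> float:
    """Pearson chi-square statistic (1 degree of freedom) for a 2x2 allele-count table."""
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    numerator = n * (case_alt * ctrl_ref - case_ref * ctrl_alt) ** 2
    denominator = (
        (case_alt + case_ref) * (ctrl_alt + ctrl_ref)
        * (case_alt + ctrl_alt) * (case_ref + ctrl_ref)
    )
    return numerator / denominator


# Hypothetical population A: allele frequency 0.5 in cases and in controls.
pop_a = dict(case_alt=80, case_ref=80, ctrl_alt=20, ctrl_ref=20)
# Hypothetical population B: allele frequency 0.1 in cases and in controls.
pop_b = dict(case_alt=4, case_ref=36, ctrl_alt=16, ctrl_ref=144)

chi2_a = allelic_chi_square(**pop_a)
chi2_b = allelic_chi_square(**pop_b)

# Pooling the two populations while ignoring ancestry.
pooled = {key: pop_a[key] + pop_b[key] for key in pop_a}
chi2_pooled = allelic_chi_square(**pooled)

print(f"population A: {chi2_a:.2f}, population B: {chi2_b:.2f}, pooled: {chi2_pooled:.2f}")
```

A chi-square value above roughly 3.84 corresponds to p < 0.05 with one degree of freedom, so the pooled result would look significant even though neither population shows any effect.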
So our XWAS is not the first. In 2021, Le Guen, from Stanford, published the first XWAS on Parkinson's disease, and they found two regions associated with Parkinson's disease risk in a European cohort. At that time we had here in our lab a doctor, she is now Dr. Shapiro, but at the time she was a PhD student, who was already studying the differences between PD males and females. This article came out and we said, oh, let's see if we can see the same variants, or if we can find new variants in the Latino population, because they used just European individuals, who are highly homogeneous.
They selected a very specific subset of the European population. So at this point we started our project, and we [00:12:00] started facing some problems. The first problem is that the Le Guen et al. pipeline was made for a very homogeneous population, while our populations are all about heterogeneity, and also our sample size is not so great.
It's not so big. And when you do an XWAS, you want to see if you find variants associated with females and with males, to see if there is a variant that is associated with a single sex, say one that increases the risk in males but not in females. So you have to segregate the dataset by sex a lot. Our sample size, which is already small for a GWAS, becomes even smaller for an XWAS. So what we did was make a lot of plans to adapt the Le Guen et al. model and to overcome the problems that came up.
[00:12:51] Sarah Camargos : How did you overcome these limitations for the XWAS?
[00:12:55] Thiago Peixoto Leal : So, our first step was to study the Le Guen et al. pipeline. [00:13:00] The pipeline was really good, but it had a step that removes individuals who are genetically heterogeneous. The first thing that we had to do was remove this step, because otherwise we would end up with no samples. After this, we needed to understand why he was doing this step, because nobody excludes samples without a reason, and we had to adapt our model to try to control for the problems that he avoided by excluding the heterogeneous samples. Our dataset uses the MEGA array platform, which has good coverage. So one of the problems that Nacho mentioned, that some genotyping array platforms are not so good, was solved for us. We tried to minimize the impact of a small sample size by running two different analyses. One we called LARGE-ALL: in a few words, we merged everyone that was in LARGE-PD and ran a single analysis. So we have our big sample size, but the big sample size has a problem: [00:14:00] the differences in ancestry could cause some false positives or false negatives.
So we ran another analysis to try to control for this. We ran the regression by country, and after this we meta-analyzed the data. That is what we call LARGE-Meta. So we have two different datasets for each analysis that we did. In the end we had to do eight different regressions, and we also ran several tests on methodological issues. If I talked about them with someone who is on the computational side, like me, they would be really excited, but our audience is probably not so interested in this. So if you are interested, go read my paper, because everything is there, clear.
[00:14:47] Sarah Camargos : Okay. Thank you so much. But I'm glad you are explaining everything in more understandable language for us. What associations did you analyze and what did you find in this [00:15:00] XWAS?
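The "regression by country, then meta-analysis" idea can be sketched with an inverse-variance fixed-effect meta-analysis, which is one standard way to combine per-cohort effect estimates. The interview does not say which meta-analysis method was used, so the method here is an assumption, and the per-country effect sizes and standard errors are invented for illustration.

```python
import math
from typing import List, Tuple


def fixed_effect_meta(estimates: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Combine (beta, standard_error) pairs with inverse-variance weights.

    Returns the pooled beta and its standard error.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total_weight = sum(weights)
    pooled_beta = sum(w * beta for w, (beta, _) in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled_beta, pooled_se


# Hypothetical per-country regression results for one variant: (beta, SE).
per_country = [(0.2, 0.1), (0.4, 0.2)]

beta, se = fixed_effect_meta(per_country)
print(f"pooled beta = {beta:.3f}, SE = {se:.4f}, z = {beta / se:.2f}")
```

Because each country is analyzed on its own before pooling, differences in ancestry between countries cannot create an association by themselves, at the cost of smaller per-country sample sizes.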
[00:15:01] Thiago Peixoto Leal : So we ran eight different analyses. We have two datasets: LARGE-ALL, which again has all the samples, and LARGE-Meta, where we ran a regression per country and then a meta-analysis. For each of these two datasets, we ran an analysis using just the females, to see if we could find variants associated with the disease in females. We ran one with just the males, to see if there are variants associated with the disease in males. We merged both sexes into a single dataset, which we call "both", and ran it. And we also performed what Le Guen et al. called male plus female, in which you take the female-only results and the male-only results and perform a meta-analysis that takes the sex into account, and also checks whether there is high heterogeneity, with all the statistical controls that you can have in this kind of analysis. So we have eight datasets, eight sets of results. And [00:16:00] with these eight sets of results, we found 86 newly associated variants in eight regions of linkage disequilibrium. We had variants in intronic regions, in intergenic regions, and some of them in exonic regions, but only two regions were replicated in our replication cohort.
On the other hand, we replicated one of the Le Guen et al. variants.
The dataset also replicated one of our newly associated variants.
[00:16:28] Sarah Camargos : And how did you replicate your findings? Which cohort did you use?
[00:16:32] Thiago Peixoto Leal : Replication is a crucial step, a very important step in genetic studies, because sometimes you can find some very good results by chance. So when you do a genetic study, you usually try to replicate in an independent cohort, to see that it's not by chance but that you found something that is really associated with the disease.
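The heterogeneity check mentioned for the male plus female meta-analysis can be illustrated with Cochran's Q and the I² statistic, two widely used measures of how much sex-specific effect estimates disagree. Whether the study used exactly these statistics is an assumption, and the male and female estimates below are hypothetical.

```python
from typing import List, Tuple


def cochran_q_and_i2(estimates: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Return Cochran's Q and I^2 for a list of (beta, standard_error) pairs."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    pooled = sum(w * beta for w, (beta, _) in zip(weights, estimates)) / sum(weights)
    q = sum(w * (beta - pooled) ** 2 for w, (beta, _) in zip(weights, estimates))
    degrees_of_freedom = len(estimates) - 1
    # I^2 is the share of variation beyond what chance alone would explain.
    i2 = max(0.0, (q - degrees_of_freedom) / q) if q > 0 else 0.0
    return q, i2


# Hypothetical sex-specific results for one variant: (beta, SE).
male_only = (0.5, 0.1)
female_only = (0.1, 0.1)

q, i2 = cochran_q_and_i2([male_only, female_only])
print(f"Q = {q:.2f}, I^2 = {i2:.1%}")
```

A high I² for a variant suggests that the effect differs between males and females, in which case the sex-specific results are more informative than the combined one.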
And this became a problem for us, because LARGE-PD has a historical problem with replication [00:17:00] cohorts. We don't have any good replication cohort because LARGE-PD is the biggest study on Latin Americans. So we tried to use the Latino dataset from the International Parkinson's Disease Genomics Consortium, the IPDGC, which was genotyped using the NeuroChip array. We have other data from the IPDGC, but they used a chip that has very, very few variants on the X chromosome. If I'm not wrong, it's 2,000. So we cannot use the dataset with more samples because it doesn't have enough variants. The dataset we used has 155 individuals, which is a very small sample size for any statistical analysis.
So at this moment we had to find another cohort to increase our sample size. I went to the EPIGEN-Brazil dataset, which was the data that I worked on at UFMG when I did my master's [00:18:00] and PhD programs, and it includes the Bambui aging cohort study. The Bambui dataset is composed of 1,422 individuals.
That represents 82% of the residents of the city in the baseline year, and everyone is older than 60 years. This cohort also contains PD cases, so by merging Bambui with the IPDGC data we also increased our cases. This inclusion was done thanks to Dr. Pell and Dr. Paolo Carelli, who helped us by giving us the IDs for all the ones that have the RPKs and other kinds of PD.
So using this IPDGC plus Bambui dataset, we were able to do the replication and, as I said before, we replicated two variants. One of the variants has a different effect direction between the discovery and the replication cohort, so we are trying to [00:19:00] understand why this happens. The other was what we call a full replication: the P value was statistically significant, and the effects also follow the same direction.
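The distinction between the two replicated variants can be written down as a simple rule: a full replication needs a significant P value in the replication cohort and an effect in the same direction as in the discovery cohort. The sketch below encodes that rule. The 0.05 threshold, the labels, and the example numbers are assumptions for illustration, not the study's actual criteria or results.

```python
def classify_replication(beta_discovery: float, beta_replication: float,
                         p_replication: float, alpha: float = 0.05) -> str:
    """Label a variant by significance and effect direction in the replication cohort."""
    if p_replication >= alpha:
        return "not replicated"
    same_direction = (beta_discovery > 0) == (beta_replication > 0)
    if same_direction:
        return "full replication"
    return "significant, opposite direction"


# Hypothetical variants: (discovery beta, replication beta, replication P value).
examples = {
    "variant_1": (0.40, 0.35, 0.01),
    "variant_2": (0.40, -0.30, 0.02),
    "variant_3": (0.40, 0.10, 0.60),
}

for name, (b_disc, b_rep, p_rep) in examples.items():
    print(name, "->", classify_replication(b_disc, b_rep, p_rep))
```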
[00:19:12] Sarah Camargos : It's interesting because Bambui is the largest epidemiological study run in Brazil about Parkinson's disease. So this cohort was useful for you because we know who has parkinsonism and who does not.
[00:19:27] Thiago Peixoto Leal : It was a really good surprise, because I was including the data and one of my previous supervisors told me, oh, they have PD cases in Bambui. I said, oh, nice, let me find the individuals, because then we can also improve our sample size. So it was really nice.
[00:19:45] Sarah Camargos : Yes. The study was conducted there in Bambui, and people from our university went there to examine the patients, all the older patients.
So let's go to another question. [00:20:00] How did you study the gene expression of those variants? What did you find?
[00:20:05] Thiago Peixoto Leal : After the replication, our first step was to go to the GTEx Portal, which has information about expression quantitative trait loci (eQTLs), whose goal is to identify genetic variants that affect the expression of one or more genes, that is, whether a variant increases or decreases the expression of a gene. After this, we built a list with the eQTL-associated genes, and we started two different analyses. The first analysis was a colocalization, which was done with the help of Dr. Sara Bandres-Ciga from NIH. The goal of this analysis is to investigate whether the variant mediates the risk through the expression of the gene, so whether this variant increases or decreases the expression of these genes, and whether this causes the disease. Unfortunately, our colocalization analysis [00:21:00] didn't show that this is happening, but it is a very good first step to see whether we can find a variant that is in linkage with our replicated variant or not, so we have future steps to take. The second analysis used the Parkinson's Progression Markers Initiative data, or PPMI. The PPMI has RNA counts for cases and controls, so we took our list of genes and extracted the information for them from the PPMI data. After this, we segregated our analysis by sex, and we ran t-tests on the RNA counts between cases and controls, also segregating by ethnicity. We got three statistically significant results, but more work is needed, because the difference in sample size between the European populations and the Hispanic or Latino populations is [00:22:00] pretty large. So we need to do more work to be able to see whether these differences are really associated with the disease or are just due to sampling problems or something like that.
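A sex-stratified comparison of RNA counts between cases and controls can be sketched as below, assuming a two-sample t-test (Welch's version, which does not assume equal variances). The transcript is not explicit about the exact test, and the expression values are invented; they are not PPMI data.

```python
import math
from statistics import mean, variance
from typing import Dict, List, Tuple


def welch_t(group_a: List[float], group_b: List[float]) -> Tuple[float, float]:
    """Return Welch's t statistic and its approximate degrees of freedom."""
    var_a = variance(group_a) / len(group_a)
    var_b = variance(group_b) / len(group_b)
    t = (mean(group_a) - mean(group_b)) / math.sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (len(group_a) - 1) + var_b ** 2 / (len(group_b) - 1)
    )
    return t, df


# Hypothetical normalized RNA counts for one gene, split by sex and status.
counts: Dict[str, Dict[str, List[float]]] = {
    "female": {"case": [1.0, 2.0, 3.0, 4.0, 5.0], "control": [2.0, 4.0, 6.0, 8.0, 10.0]},
    "male": {"case": [5.0, 6.0, 7.0, 8.0], "control": [5.5, 6.5, 7.5, 8.5]},
}

results = {sex: welch_t(groups["case"], groups["control"]) for sex, groups in counts.items()}

for sex, (t_stat, df) in results.items():
    print(f"{sex}: t = {t_stat:.3f}, df = {df:.2f}")
```

Turning the t statistic into a P value requires the t distribution (for example from a statistics library), and with many genes tested the results would also need correction for multiple testing.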
[00:22:13] Sarah Camargos : Now I would like to ask you both to share with us your conclusions and limitations of this study and the next steps.
[00:22:23] Ignacio Mata : Hopefully it was clear during this chat that we need to do more studies in other populations, right? It is great to study Europeans, but I think that there is a great deal that we can find by studying more diverse cohorts. And I think that this is a huge effort that the Parkinson's community is making through the genetics program called GP2, which is funded by the ASAP initiative, and I think we'll get there. There are more and more efforts to include individuals from all around the world, not only from Latin America, but from Africa, Southeast Asia, India, regions that are very underrepresented. I think the Michael J. [00:23:00] Fox Foundation as well has been putting a lot of funding towards those efforts. Again, even with small cohorts, as we showed, you can still find things, even if the cohorts are not as big as the European ones. I think there are a lot of ways that we can find interesting things, and also check how the things that we have found in Europeans apply to other populations, because there is a great deal of overlap. Especially because Latinos have some European ancestry, a lot of the things we find in Europe and the States can be translated to Latinos as well.
And I think the limitations are also very clear: a lot of the things that we don't have, and that Europeans have, are bigger datasets and replication datasets, right? As Thiago was mentioning, it is very hard to find samples to be able to replicate some of these results. And our cohort is currently not big enough to split into two and do a discovery phase and a replication phase. And then also a lot of the databases that we use, like PPMI or GTEx, are not following this inclusive approach, and they still [00:24:00] don't include a lot of Latinos. They don't include a lot of Black or African American individuals, and that means that for a lot of the next steps, right, so all the protein expression, gene expression, those things, we're using data that are perhaps not as good as they should be for the populations that we're actually studying.
So the next steps will be critical in trying to generate those data, as well as things like iPS cell models; a lot of the models that we use are also based on Europeans. So my main goal is to try to improve this, and again, I think we are on the way, through GP2, to being able to do a lot of these things and to run a lot of these analyses in the full GP2 cohort, which hopefully will have around 150,000 individuals. So it'll be a very large study.
[00:24:45] Sarah Camargos : Yes. Perfect. What about you, Thiago? What do you think your next steps will be? Will you replicate your own findings in these studies?
[00:24:55] Thiago Peixoto Leal : So about the conclusions, I agree with everything that Dr. [00:25:00] Mata said: we need to increase the diversity of our studies. And I have two different next steps. The first one is that all of this was done using the LARGE-PD phase one data.
Dr. Ignacio Mata was funded by an R01 grant and he's collecting more samples from Latin Americans. So in the future, the most basic thing is to do an XWAS on the LARGE-PD phase two data, which has more samples from more countries, so more diversity and all that kind of problem that we like to solve. Also, there are statistical models that take into account what we call local ancestry. When I talk about ancestry in an admixed individual, you can do it at two different levels. The first one is the individual level, where you say, I don't know, for example, if I took a test, that I'm 50% European, [00:26:00] 27% African and 23% Native American. But you can also do this at the chromosomal level. You can look at each piece of each chromosome in each admixed individual. And there are statistical models that take this into account when you do the analysis. So one of the things that I believe I will work on in the next months or years is using one of those methodologies to run this on chromosome X. Of course, we expect this won't be so easy, because all the methodologies, both for local ancestry inference and for including local ancestry in the models, were made for the autosomes. So we have to build the whole pipeline to bring this data, this methodology, these techniques from the autosomes to the X chromosome. So those are our next two steps: LARGE-PD phase two, and including local ancestry in the models [00:27:00] to be able to reduce the heterogeneity and extract the full potential that heterogeneous individuals have.
[00:27:06] Sarah Camargos : Wow. Perfect. Many thanks, Dr. Mata and Dr. Thiago. I think we are envisioning a new era for Parkinson's disease, and hopefully we'll get more and more samples and you'll replicate your findings. Thank you very much.
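The difference between individual-level (global) ancestry and local ancestry can be sketched as follows. Local ancestry labels each chromosome segment with its inferred source population; global ancestry is then the length-weighted share of each label. The segment lengths and labels are hypothetical, chosen so that the totals match the 50% / 27% / 23% example, and the data layout is an assumption rather than the format of any particular local ancestry tool.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Each segment: (start, end, ancestry label) on one hypothetical chromosome copy,
# in arbitrary length units. EUR = European, AFR = African, NAT = Native American.
Segment = Tuple[float, float, str]

segments: List[Segment] = [
    (0.0, 30.0, "EUR"),
    (30.0, 57.0, "AFR"),
    (57.0, 77.0, "EUR"),
    (77.0, 100.0, "NAT"),
]


def global_ancestry(segs: List[Segment]) -> Dict[str, float]:
    """Length-weighted proportion of each ancestry across all segments."""
    totals: Dict[str, float] = defaultdict(float)
    for start, end, label in segs:
        totals[label] += end - start
    genome_length = sum(totals.values())
    return {label: length / genome_length for label, length in totals.items()}


def local_ancestry_at(segs: List[Segment], position: float) -> str:
    """Ancestry label of the segment covering a given position."""
    for start, end, label in segs:
        if start <= position < end:
            return label
    raise ValueError(f"position {position} is not covered by any segment")


proportions = global_ancestry(segments)
print(proportions)
print("ancestry at position 40:", local_ancestry_at(segments, 40.0))
```

Two individuals with the same global proportions can carry different ancestries at a given variant, which is the information a local-ancestry-aware model uses and a global-ancestry covariate cannot capture.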
[00:27:22] Ignacio Mata : Thank you, Sarah.
[00:27:23] Sarah Camargos : Hey, I would like to make an addendum. The Bambui cohort was initiated in 1997 to investigate the incidence and predictors of health outcomes in an elderly population. They had 2000 participants aged 60 and over. Professor Francisco Cardoso and Professor Maira Barbosa conducted a two-phase, population-based approach.
They screened all individuals aged over 65 with a questionnaire, and then [00:28:00] patients scoring two or more on this questionnaire were examined, 66 7 patients. This cohort was used for replication of the data found in the XWAS from LARGE-PD. They used the PD patients' data and, importantly, the 1500 controls. And as they ruled out the parkinsonian ones, they created an accurate aged control cohort.
This was extremely important for this study, and the LARGE-PD authors are extremely grateful for having the opportunity to use this data. [00:29:00]