[00:00:38] Dr. Émeline Courtois: Hello Michele. It's a pleasure for me to be invited by the Movement Disorder Society to present my work today.
[00:00:45] Dr. Michele Matarazzo: Great. So let's discuss this. Nowadays machine learning is everywhere and we read about it in every field of our life, and it comes with a lot of expectations for the field of medicine. But can you explain in just a few words, when we say machine learning, what are [00:01:00] we talking about?
[00:01:02] Dr. Émeline Courtois: Right. We have heard a lot about machine learning recently, and it's important to define it clearly. As a biostatistician, in fact, I do prefer the equivalent term, statistical learning. The machine, a computer, in fact, is learning through statistics. What I mean by learning is either an estimation task, for example, when you want to estimate the effect of a given drug on a disease.
It could also be a prediction task, if you want to predict whether an individual is at risk of developing a disease, or it could be a feature selection task. That is, for example, when you want to select which covariates, which variables, are going to be interesting if we want to understand the biological mechanism behind a disease or something like that.
So with this definition, in fact, even a simple linear regression [00:02:00] is a machine learning method. As I understand it, in the recent literature, when people talk about machine learning, they sometimes want to refer to more complex approaches, which in particular remain relevant when you are working with large databases.
[00:02:15] Dr. Michele Matarazzo: Now machine learning can be applied to any set of data and you were kind of mentioning that before, but in this article you use a very large data set of subjects analyzing the drug intake in the years before the diagnosis of Parkinson's disease. And you compare that to controls. So keeping in mind the possibility of drug repurposing, you looked for candidate molecules that might have an impact on the development or not of Parkinson's disease. I think this is a fascinating approach. But before we dive into the technicalities of the paper, what do we mean when we say drug repurposing?
[00:02:48] Dr. Émeline Courtois: Okay. Thank you for the very nice summary of this work. Drug repurposing or drug repositioning is the application of a known and marketed drug to a new [00:03:00] indication. So the main advantage of drug repurposing is that it can lead to shorter and less costly drug development cycles with increased probability of success.
I just want to add that in the specific context of Parkinson's disease, the currently available treatment options are only partially or transiently effective and they fail to slow the disease progression. In addition, Parkinson's disease drugs are not available in many low-income countries.
So there is an urgent need to identify effective, safe, and inexpensive drugs. That's why we think that drug repurposing could be an accelerated route for drug discovery in the context of Parkinson's disease.
[00:03:48] Dr. Michele Matarazzo: All those are very good points indeed. I think everybody would agree with you that we do need some better drugs, especially for disease modification in Parkinson's disease and other neurodegenerative disorders.
Now let's talk about the paper. How [00:04:00] many subjects did you study and where did you get the data from?
[00:04:03] Dr. Émeline Courtois: We worked with the French National Health Data System and, if I have to keep it simple, it's an administrative database filled with claims from healthcare consumption. So this database includes exhaustive individual information on demographic characteristics, as well as information about healthcare consumption, benefits for long-term disease, and detailed information on hospital stays.
This database contains all this information for more than 90% of the French population since 2006. So first we identified Parkinson's disease patients in this database through an identification algorithm, previously validated with a neurologist.
Then we considered only incident Parkinson's disease patients identified by the algorithm in 2016, 2017, and [00:05:00] 2018, in order to have at least 10 years of follow-up for each individual. Three controls were randomly matched to each Parkinson's disease patient on sex, age at the incidence year, and place of residence.
Then we applied some exclusion criteria to both patients and controls. And in the end, in the main analysis, we considered more than 40,700 Parkinson's disease patients and more than 176,000 controls.
[00:05:33] Dr. Michele Matarazzo: That's a huge number.
To analyze such a huge amount of complex data is, I guess, very difficult, and you must take into account many different factors, variables, and possible confounders. Can you try to explain for someone more oriented to the clinical side, such as myself, how you planned and ran the analysis?
[00:05:53] Dr. Émeline Courtois: I will try to keep it simple. So as you said, we had to handle very complex data, and in particular we [00:06:00] had a lot of variables. So, just to be clear, our main objective was to screen, in a fully agnostic way, several hundred marketed molecules to figure out which ones could potentially be associated with Parkinson's disease status in a protective way.
But at the same time, we wanted to adjust for confounding for the screening to be relevant. Thus, we were looking for a procedure that could select interesting molecules and also interesting confounding factors among all the covariates that we had. So among several statistical approaches to perform feature selection such as the statistical workforce screening, we chose the LASSO logistic regression because it has nice properties when we have to handle such a large database.
So, in two words, this regression can shrink some regression coefficients to zero, and the selected covariates are the ones with non-zero [00:07:00] coefficients. But anyway, we performed this feature selection, so selection of molecules and selection of confounding factors, with the LASSO and a criterion named cross-validation.
Then in a second step, we estimated, through a classical logistic regression, the effect of the selected molecules, adjusted for the selected confounding factors, on Parkinson's disease status. So first step selection, second step estimation. Because in statistics it is problematic to perform these two tasks on the same dataset,
we split our dataset in two subsamples. And we repeated this procedure 500 times in order to obtain more stable results and to be more confident in the feature selection step of this analysis. So at the end, we considered of interest the drugs that were frequently selected in the first step and had a negative [00:08:00] average effect on Parkinson's disease status.
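The split-sample procedure described in this answer can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the data are simulated, the function names (`fit_logistic`, `simulate`, `split_select_estimate`) are hypothetical, the penalty `l1` is fixed by hand instead of being chosen by cross-validation, and 10 repetitions stand in for the 500 mentioned in the interview. The L1-penalized fit uses plain proximal gradient descent rather than a dedicated LASSO library.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, l1=0.0, steps=200, lr=1.0):
    """Logistic regression by proximal gradient descent.

    With l1 > 0 this is a LASSO-type fit: soft-thresholding sets some
    coefficients exactly to zero. With l1 = 0 it is an ordinary
    (unpenalized) fit. The intercept is never penalized.
    """
    n, p = len(X), len(X[0])
    b0, b = 0.0, [0.0] * p
    for _ in range(steps):
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            err = sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, xi))) - yi
            g0 += err
            for j in range(p):
                g[j] += err * xi[j]
        b0 -= lr * g0 / n
        for j in range(p):
            v = b[j] - lr * g[j] / n
            # Soft-thresholding step (a no-op when l1 == 0).
            b[j] = math.copysign(max(abs(v) - lr * l1, 0.0), v)
    return b0, b


def simulate(n=400, p=5, seed=1):
    # Assumed toy data: column 0 is a "protective" exposure, the rest are noise.
    rng = random.Random(seed)
    X = [[1.0 if rng.random() < 0.3 else 0.0 for _ in range(p)] for _ in range(n)]
    y = [1.0 if rng.random() < sigmoid(-2.0 * row[0]) else 0.0 for row in X]
    return X, y


def split_select_estimate(X, y, l1=0.03, repeats=10, seed=0):
    """Repeat: select on one random half, estimate on the other half."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts, sums = [0] * p, [0.0] * p
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        half_a, half_b = idx[: n // 2], idx[n // 2:]
        # Step 1 (selection): L1-penalized fit on the first half.
        _, b = fit_logistic([X[i] for i in half_a], [y[i] for i in half_a], l1=l1)
        selected = [j for j in range(p) if b[j] != 0.0]
        if not selected:
            continue
        # Step 2 (estimation): unpenalized fit on the second half,
        # restricted to the covariates selected in step 1.
        Xb = [[X[i][j] for j in selected] for i in half_b]
        _, est = fit_logistic(Xb, [y[i] for i in half_b])
        for j, e in zip(selected, est):
            counts[j] += 1
            sums[j] += e
    freq = [c / repeats for c in counts]
    mean_effect = [s / c if c else None for s, c in zip(sums, counts)]
    return freq, mean_effect


X, y = simulate()
freq, mean_effect = split_select_estimate(X, y)
for j in range(len(freq)):
    print("covariate", j, "selection frequency", freq[j], "mean coefficient", mean_effect[j])
```

In this sketch a covariate would be flagged as a candidate when its selection frequency is high and its mean coefficient is negative, which mirrors the rule stated above. Selecting and estimating on separate halves is what keeps the step-2 estimates from being biased by the selection itself.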
[00:08:03] Dr. Michele Matarazzo: Okay. And which drugs did you find to be associated with a lower risk of Parkinson's disease?
[00:08:07] Dr. Émeline Courtois: So the most promising signal, and when I talk about a signal, I mean a discovery, was the chemical subgroup of plain sulfonamide diuretics, and in particular furosemide. The main indication of furosemide is the treatment of edema. It was our strongest signal, and we found a pattern that could suggest a dose-effect relationship.
And this is very interesting. So we had a Parkinson's disease specialist in the team as well as a pharmacologist, and they listed all the biological properties of this molecule that make it a very credible candidate for drug repurposing. So this is well described in the paper, but what is very interesting about this drug is that another drug of the sulfonamide group, namely [00:09:00] zonisamide, which is not prescribed in France, is undergoing clinical trials for the treatment of Parkinson's disease in Japan.
So this is very promising. We also found that adrenergics in combination with anticholinergics, including triple combinations with corticosteroids, were associated with a lower risk of Parkinson's disease. There has been recent interest in the association between beta-2 adrenergic agonists and Parkinson's disease,
but with inconsistent findings among studies. Beta-2 adrenergic agonists are indicated for the treatment of medical conditions related to smoking, which is well known to be inversely associated with Parkinson's disease. In our analysis we indirectly and partially adjusted for smoking, since we also found that drugs used in nicotine dependence were inversely [00:10:00] associated with Parkinson's disease.
And furthermore, since we took into account the prodromal phase of Parkinson's disease, we are rather confident in the credibility of this signal about beta-2 adrenergic agonists.
[00:10:14] Dr. Michele Matarazzo: Even after adjusting for the smoking status?
[00:10:17] Dr. Émeline Courtois: Yes, the association stays. In our study, we also adjusted for several other confounding factors, such as hospitalization for conditions related to smoking. So indirectly, we took into account smoking status, and the association with beta-2 agonists remained in this analysis.
So this is very interesting. And lastly, we also identified a signal for insulin use. This signal, in fact, was not highlighted in the sensitivity analysis, but it could be interesting because all the recent studies support an increased risk of Parkinson's disease in patients with diabetes.
These studies only focused on type 2 diabetes, which is [00:11:00] characterized by insulin resistance. But nevertheless, the inverse association between insulin and Parkinson's disease could be explained by the properties of insulin, which in fact crosses the blood-brain barrier. So it's a hypothesis.
[00:11:16] Dr. Michele Matarazzo: When I was reading the paper, and actually, well, we've discussed this a little bit in the last few minutes, I thought that when drawing conclusions from all of these results, one of the main problems, as already happens in classical statistics, is the classic causation versus correlation issue.
Is it possible that the effects are rather related to the underlying condition for which these drugs were prescribed or maybe to the improvement of these underlying diseases that were treated with those drugs?
[00:11:46] Dr. Émeline Courtois: Of course. This is what we call prescription bias, and it's a very important issue that we have to consider when interpreting the results. So, just to be clear, our machine learning algorithm [00:12:00] was designed to raise hypotheses, but the credibility of our findings had to be evaluated by experts in order to avoid the bias that you mentioned and to determine if the results were reliable.
So yes, that is how we avoid this pitfall. I would also like to add that because Parkinson's disease has a very long prodromal phase, we had to face another kind of bias in the same situation: reverse causation bias. That means that when you are affected by Parkinson's disease, you can, in fact, be ill long before the diagnosis is official.
Therefore, drugs could be taken to treat early-stage symptoms, and from a statistical point of view you will find that these drugs are associated with a higher risk of Parkinson's disease. So the rationale is a little bit different, but when you are looking for potential protective [00:13:00] drugs, this kind of bias remains.
In this work, in order to minimize all these sources of bias, we designed our study to have a very long follow-up period and to use quite a long lag in order to account for the prodromal phase of Parkinson's disease.
So we worked with an eight-year lag before the index date, that is, before the incidence date.
[00:13:23] Dr. Michele Matarazzo: Perfect. We mentioned some of the bias problems, and I think what you said is important: those results are basically a hypothesis generator rather than a confirmed fact. And so these hypotheses should be confirmed in further studies, but maybe this should also be replicated in other cohorts.
Well, right now a lot of health systems are generating huge amounts of data. So I think it will probably be feasible to reproduce this study with other databases, and that would be great. When you find the same results, or they are consistent across different databases, obviously you can be more confident about them and then proceed to the [00:14:00] next steps and do maybe an experimental
[00:14:02] Dr. Émeline Courtois: Mhm.
[00:14:03] Dr. Michele Matarazzo: Now what do you plan to do with these results, to translate them into something that might be useful for the patients?
[00:14:09] Dr. Émeline Courtois: The very next step of this project is to focus specifically on our discoveries, in particular furosemide, and to refine those signals by addressing unmeasured confounding with external data. In fact, we do have, as you said, other data sources, such as administrative databases matched with cohort data.
So we have the same amount of information but with other confounding factors, such as physical activity or diet. So the next step is to refine our discoveries, not in an agnostic way as we did before; we really want to focus on our discoveries. But if I had to be very optimistic, in the future, of course, I hope that these kinds of results could be translated to clinical trials and lead to new strategies in the [00:15:00] treatment of Parkinson's disease.
[00:15:01] Dr. Michele Matarazzo: Well, let's be optimistic.
[00:15:03] Dr. Émeline Courtois: Yes, exactly.
[00:15:05] Dr. Michele Matarazzo: Perfect. Well, Émeline, thank you very much for your time. It has been a pleasure to have you on the MDS podcast.
[00:15:10] Dr. Émeline Courtois: Thank you.
[00:15:11] Dr. Michele Matarazzo: We have had Émeline Courtois, and we have discussed the article, Identifying Protective Drugs for Parkinson's Disease in Healthcare Databases Using Machine Learning from the Movement Disorders Journal.
Don't forget to download and read the article from the website of the journal, and thank you all for listening. [00:16:00]