
The scientific scandal of the decade? Researcher takes on med-tech giant

In recent years, associate professor Simon Tilma Vistisen has fought for his research integrity. In 2017-2018, he collaborated with researchers from the American med-tech giant Edwards Lifesciences, but subsequently became aware of a fatal error. That discovery marked the beginning of a long and tough battle.

Photo: Simon Fischel, AU Health. Generated by Adobe Firefly.

Simon Tilma Vistisen smiles wryly and straightens up in his chair. It's clear he's told this story many times before and has developed a routine, yet his passion is evident as he talks about the persistence that has led him through a world of misconceptions, lost relationships, and complicated legal matters.

Simon Tilma Vistisen is a researcher in cardiovascular physiology at the Department of Clinical Medicine, and by his own account he has more or less accidentally landed in one of the biggest scientific scandals of the past decade within his field.

"In my research, I use measurements from intensive care patients and patients undergoing surgery to say something about how the heart functions and how it works. The aim is to be able to tell doctors about the current state of the patient's heart, and how it should possibly be treated with fluid administration or types of medication, if the heart is not pumping as much as the body needs," explains Simon Tilma Vistisen.

This work led him to the Netherlands on a Sapere Aude postdoctoral grant at the end of 2016. There, he collected data that later turned out to be useful for validating a so-called machine learning algorithm being developed by the American med-tech company Edwards Lifesciences.

"They needed to validate their algorithm. My colleague in the Netherlands had an ongoing collaboration with the company, and since I had already collected relevant data, it could just as well be reused for their validation. When I returned home from my stay abroad in the Netherlands and later Boston in 2018, I was sent a manuscript describing how the algorithm was able to predict drops in blood pressure during surgery," says Simon Tilma Vistisen.

A Simple Algorithm

Developing an algorithm that can predict low blood pressure in patients seems like a good idea, explains the researcher. Blood pressure can be quite unstable during major surgeries and can drop unexpectedly. Large observational studies indicated that low blood pressure during an operation could be harmful to patients, for example because their kidneys might not receive enough blood.

The goal of the algorithm was, therefore, to predict if a drop in blood pressure was imminent – so it could be treated proactively and possibly even avoided. It was precisely the algorithm's ability to correctly predict blood pressure drops that Simon Tilma Vistisen's data was used to validate in the study.

"Specifically, the machine learning technology had allegedly learned to recognize what a blood pressure curve looks like when the blood pressure falls below a critical threshold five minutes into the future. As part of the validation, I asked the researchers from Edwards to demonstrate whether the algorithm was better than a mere mundane guess," he says.

The company's researchers added this analysis, and their algorithm apparently performed significantly better than the mundane guess.

"The mundane guess is somewhat akin to asking a person whether the stock price of Novo will soon exceed 750. The person wouldn't have the basis to know immediately, but if it's informed that the stock price is either 500 or 700, anyone would guess that in the latter case, the stock is more likely to cross 750 in the near future. Similarly, I asked the company to analyze how much better their algorithm is at predicting whether the blood pressure will fall below the critical level in five minutes or not, compared to a guess based on the current blood pressure," says Simon Tilma Vistisen.

The study was published in 2019 and is highly cited today. Shortly before this, the company had also published the first article about the overall development of the algorithm.

Trouble in the Machinery

The analyses by Edwards Lifesciences did not align with Simon Tilma Vistisen's gut feeling, but he assumed they must have conducted their analyses correctly. After all, Edwards Lifesciences is a huge company and was collaborating with one of the most important European researchers in the field.

"One day in autumn 2021, I was explaining a particular curve of the algorithm to a medical student, and I could hardly do it because I realized it had a strange shape. Shortly after, I showed the curve to my Ph.D. student Johannes Enevoldsen and said that it must have something to do with how the data was selected. Johannes came into my office 20 minutes later, almost with fire in his eyes: He had re-read the very first study and, being the genius he is, quickly realized what was described about the data selection. It was like a puzzle piece that rotated and then fit right into everything we were wondering about," says Simon Tilma Vistisen.

With the described data selection, the curves for the mundane guess and the algorithm's predictions should have the same shape – but they were not even close.

"I contacted my four co-authors, all of whom worked entirely or partially for the company. The company's developers wrote back that the data was selected in almost the same way – except for one small difference. There was the puzzle piece. The curves showing how well the algorithm and the mundane guess predicted low blood pressure were different in their form, solely because they had not been subjected to the same data selection. In concrete terms, this means that the study's original conclusions were completely wrong, as we were thus comparing apples and oranges," says Simon Tilma Vistisen.

However, the company insisted that there was no problem, even though Simon Tilma Vistisen maintained that the study had to be corrected. Together with his Ph.D. student, he therefore decided to describe the error in a detailed commentary on the very first study.

The commentary was peer-reviewed by a leading expert in how data behave over time.

He wrote in his review: "I stopped reviewing the paper in question on page 5, line 20, because I was genuinely shocked by what I read. I looked at the previous manuscript instead, and verified that they had, in fact, defined a non-event exactly as represented by Enevoldsen and Vistisen. The previous manuscript is based on a fundamentally (and fatally) flawed data definition. When the data are correctly analyzed, the findings should speak for themselves."

"When I read this, I had no doubt that we would succeed in correcting my own study," Simon Tilma Vistisen recounts.

A Prolonged Struggle

Since then, Simon Tilma Vistisen has stood firm on his research integrity, widely communicating the project's erroneous conclusions in scientific articles, on social media, and through direct contact with the company's developers. However, maintaining scientific integrity became increasingly difficult:

"During the discussions, my primary collaborator resigned his professorship in the Netherlands and began working full-time for Edwards Lifesciences, responsible for communicating about this very technology. He was my main collaborator and extremely important to me as a young researcher. He was also a good friend, so it was really unfortunate that I had to speak up about this. He certainly felt betrayed, and our contact is limited today," says Simon Tilma Vistisen.

Challenging a company with such clout has not been without concern, and the researcher has been mindful from the start to avoid legal entanglements.

"At one point, the company asked me to delete all data. They had shared them with me, and I have in writing that I could reanalyze them. But suddenly in 2022, I was no longer allowed to use the data. I considered it a lot, but today I have actually deleted everything. I don’t have the temperament to worry about a potential American lawsuit," says Simon Tilma Vistisen.

The researcher had hoped for much more attention from trade media and scientific journals, but in 2022 the case stalled for over six months.

"This standstill has been an incredibly frustrating process, but if I, as a researcher, don't have my scientific integrity, then everything else is irrelevant. I had many thoughts about what my persistence would mean for my career, my relationships, and how to balance it without appearing shrill," says Simon Tilma Vistisen and continues:

"At one point, I was about to lose my patience, as nothing was happening. It occupied me so much that I woke up a lot at night because I simply couldn’t let it go. Also, during work hours, I found it hard to focus on anything else. I don't know if one should call it stress or the consequence of my perseverance, but it has been too overwhelming and consuming at times," says Simon Tilma Vistisen, who also points out that at AU, there has been good support in this unique case.

"I would say that it has been valuable to communicate openly with management and colleagues about how the case has affected me and to make use of the resources available for counseling and even an industrial psychologist during the most pressing period," he says.

"The Emperor's New Clothes"

Today, the flawed study's conclusions have been retracted by the journal, and many in the scientific world have become aware of Simon Tilma Vistisen's fight. A growing number of independent research groups have supported his interpretation of the error with their own data. Moreover, the case has drawn Simon Tilma Vistisen deeper into the world of machine learning, and he now attends a variety of scientific conferences to talk about his experiences with the subject.

"I hope my story can help draw attention to this specific case, but also help correct the mechanisms that didn't work here," says Simon Tilma Vistisen, and continues:

"Machine learning is not a magic wand. Artificial intelligence is powerful and applicable in some contexts, like ChatGPT and medical imaging, but there are just so many applications in health where it doesn't work yet, and where projects are not scrutinized enough. To some extent, machine learning has become a kind of hocus-pocus term that makes people forget their otherwise healthy skepticism. In reality, in some cases, it becomes mere 'The Emperor's New Clothes'."

Furthermore, the researcher hopes for greater regulatory focus on how such technology is approved by authorities such as the American and European agencies, so that tools that could lead to incorrect treatment of patients do not slip through the system.

"In my view, this case is the biggest scandal of the decade so far for companies in our field. I believe that in 2024, it will become clear to the majority in the scientific world what went wrong. And then we'll see whether the company decides to continue its aggressive marketing or finally acknowledges the problem," concludes Simon Tilma Vistisen.

Contact

Associate Professor Simon Tilma Vistisen
Aarhus Universitet, Department of Clinical Medicine
Phone: 20 67 68 68
Email: vistisen@clin.au.dk