Machine learning enhanced virtual autopsy
Shane O’Sullivan; Andreas Holzinger; Kurt Zatloukal; Paulo Saldiva; Mohammed Imran Sajid; Dominic Wichmann
Abstract
We propose a study that compares and contrasts virtopsy with traditional autopsy, using large study groups and four different levels of information: Macroscopic; Microscopic; Medical imaging; and Molecular phenotypes. This proposal includes the development of a machine learning enhanced virtual biobank to understand disease processes and to evaluate clinical diagnosis and treatment. This virtual biobank, enhanced by machine learning and knowledge extraction approaches, will be composed of a unique collection of non-invasively generated autopsy images (e.g., X-ray, computerized tomography, magnetic resonance imaging, ultrasound) and digital pathology imaging data of the corresponding biological samples.
Based on this vast resource, we will experimentally design, develop, test and evaluate machine learning algorithms that can self-learn from, and make predictions based on, the virtual biobank data. These ambitious studies will help us go beyond the state of the art and demonstrate to what extent machine learning can enhance medical expertise, so as to understand disease processes and to evaluate both clinical diagnosis and treatment.
The algorithms will make data-driven predictions or decisions by building a model from sample inputs drawn from a variety of subject data. We will use the same machine learning approach developed by a research group at the Medical University of Graz.
This strategy will allow results to be achieved with fewer images than conventional machine learning (e.g., deep learning approaches) requires, which helps to alleviate some major issues one would expect for this type of study set-up. In virtopsy we do not have access to many cases (i.e., the sample size is small, not the thousands to millions of cases usually necessary), and consequently we need algorithms that are able to self-learn from only a few examples, in the same way humans do. The chosen approach of interactive machine learning (iML) (Holzinger et al., 2016) is an inventive way to overcome this problem. The intention is to use this method of smart interactive machine learning. It is not a black-box method that demands the input of a massive amount of data; instead, expert knowledge is fed directly into the algorithmic loop.
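The doctor-in-the-loop idea can be illustrated with a minimal sketch. The iML work by Holzinger et al. listed in the references applies ant colony algorithms with a human in the loop; the code below is not that method. It shows a generic active-learning loop instead, assuming a nearest-centroid classifier, uncertainty sampling, and a hypothetical `ask_expert` callback that stands in for the pathologist labelling a case. The two-dimensional feature vectors and labels are invented for illustration.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def centroids(labelled: List[Tuple[Vector, str]]) -> Dict[str, List[float]]:
    """Mean feature vector for each class label."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for x, y in labelled:
        if y not in sums:
            sums[y], counts[y] = [0.0] * len(x), 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(cents: Dict[str, List[float]], x: Vector) -> Tuple[str, float]:
    """Nearest-centroid label plus the margin to the runner-up (small margin = unsure)."""
    dists = sorted((math.dist(x, c), y) for y, c in cents.items())
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else math.inf
    return dists[0][1], margin


def interactive_loop(labelled: List[Tuple[Vector, str]], unlabelled: List[Vector],
                     ask_expert: Callable[[Vector], str], budget: int) -> Dict[str, List[float]]:
    """Ask the expert (hypothetical callback) to label the cases the model is least sure about."""
    labelled, pool = list(labelled), list(unlabelled)
    for _ in range(min(budget, len(pool))):
        cents = centroids(labelled)
        # Uncertainty sampling: query the case with the smallest margin.
        idx = min(range(len(pool)), key=lambda i: predict(cents, pool[i])[1])
        x = pool.pop(idx)
        labelled.append((x, ask_expert(x)))  # expert knowledge enters the loop here
    return centroids(labelled)


# Hypothetical usage: two seed cases, one expert query.
seed = [((0.1, 0.2), "autolysis"), ((0.9, 0.8), "infarction")]
pool = [(0.5, 0.5), (0.15, 0.1), (0.85, 0.9)]
model = interactive_loop(seed, pool,
                         ask_expert=lambda x: "infarction" if x[0] > 0.45 else "autolysis",
                         budget=1)
print(predict(model, (0.2, 0.2))[0])  # autolysis
```

Uncertainty sampling is only one way to decide which case the expert sees next; the point of the sketch is that each expert answer immediately changes the model used for the next query, so a handful of labels can go a long way.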
Regarding virtual microscopy, digital pathology now provides an entirely new way to analyze image data from tissues. We propose a study that systematically compares the morphology of tissue (as seen by digital pathology) with medical images such as magnetic resonance imaging (MRI).
In addition, pathologists will take images with a standard medical-grade digital camera for macroscopic documentation and analysis. These images will be compared with the corresponding medical images; combining these different levels of imaging information adds to the originality of this study.
Finally, we propose to include nuclear magnetic resonance (NMR) spectroscopy in the active research of this study. This adds to the originality and scientific value of the study, as it provides a new level of information regarding the metabolic status and morphology of tissue. Consequently, the focus is not only on explaining disease biology but also on developing machine learning algorithms for image analysis; currently there is a lack of literature on large-scale programs that compare MRI images with both NMR and microscopic images.
NMR will enable us to obtain the metabolomic profile, as the metabolome provides detailed information about the chemical composition of the tissue. NMR metabolomics relies on essentially the same physical principle as MRI, but the signal is read out differently: as a spectrum of chemical shifts rather than as a spatially encoded image. This is scientifically challenging but relevant, as we can then compare MRI data with the chemical composition of the tissue investigated. In addition, we will compare these data with histological and macroscopic data. No literature to date demonstrates that this has been undertaken in as coordinated an approach as the one set out in our proposal.
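A common first step in NMR metabolomics is to reduce each spectrum to a fixed-length metabolic profile by summing intensities over chemical-shift bins ("bucketing") and normalizing to total area, so that profiles from different tissue samples can be compared with each other and with MRI and histology features. The sketch below assumes the spectrum is given as parallel lists of chemical shifts (ppm) and intensities; the 0.04 ppm default bin width and the 0 to 10 ppm range are illustrative choices, not parameters from this proposal.

```python
from typing import List, Sequence


def bin_spectrum(ppm: Sequence[float], intensity: Sequence[float],
                 start: float = 0.0, stop: float = 10.0, width: float = 0.04) -> List[float]:
    """Sum intensities into equal-width chemical-shift bins, then normalize to total area."""
    n_bins = round((stop - start) / width)
    bins = [0.0] * n_bins
    for shift, value in zip(ppm, intensity):
        if start <= shift < stop:
            # min() guards against floating-point rounding pushing a point past the last bin.
            idx = min(int((shift - start) / width), n_bins - 1)
            bins[idx] += value
    total = sum(bins)
    # Total-area normalization removes differences in overall sample amount.
    return [b / total for b in bins] if total > 0 else bins


# Toy spectrum with made-up peaks, binned at 1 ppm for readability.
profile = bin_spectrum([1.33, 1.34, 3.21, 5.0], [2.0, 2.0, 4.0, 0.0], width=1.0)
print(profile[:6])  # [0.0, 0.5, 0.0, 0.5, 0.0, 0.0]
```

Real pipelines typically add baseline correction, alignment of shifted peaks and exclusion of the water region before binning; those steps are omitted here.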
Ideally, for this type of image analysis study, we intend to include many additional datasets from autopsies performed rapidly (~3 hours) after clinical death. This is important because virtopsy must be compared with traditional autopsy whilst avoiding autolysis. The proposed virtual biobank will be used to train algorithms to cope with the effects of autolysis and other post-mortem changes, which is also highly relevant to forensic medicine. By performing autopsies as early as 3 hours after clinical death, we face a more manageable situation, with fewer secondary phenomena due to autolysis. As a result, we will collect better material for molecular studies, enabling cutting-edge research and the generation of new knowledge.
Furthermore, accessing the affected organs and tissues by traditional autopsy provides an opportunity for molecular analyses, such as metabolomics using NMR spectroscopy. As discussed, the innovative aspect we will introduce to this study is to compare virtopsy with traditional autopsy using four different levels of data information: (1) Macroscopic; (2) Microscopic; (3) Medical imaging; and (4) Molecular phenotypes (NMR).
In brief, the main objective is to demonstrate that it is possible to create a self-learning system that will assist pathologists and physicians in diagnosing certain medical conditions from digital data. This information technology (IT) learning approach is a major objective of the proposal. We will create an IT-based methodology that provides an innovative technique for performing autopsies and addresses the related machine learning issues, so as to help physicians determine the correct diagnosis; the aim is to transform the way autopsies are performed.
To compare tissue alterations systematically across different organs, we generate digital slides and then compare histologic imaging features with medical imaging features (e.g., MRI); this is one of the techniques we use to link virtopsy to digital pathology.
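One simple way to quantify how well a histologic feature tracks an MRI feature is a rank correlation over matched regions. The sketch assumes each region of interest has already been co-registered between the digital slide and the MRI volume, and uses two hypothetical per-region features (a necrotic-area fraction from the slide and a mean T2 signal from MRI) with made-up values. Spearman's rho is chosen because it does not assume a linear relationship between the two modalities.

```python
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank for the tie group
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences with at least two values")
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for a constant feature")
    return cov / (sx * sy)


# Hypothetical matched regions: histology feature vs. MRI feature.
necrotic_fraction = [0.05, 0.20, 0.35, 0.60]
mean_t2_signal = [410.0, 455.0, 470.0, 520.0]
print(round(spearman(necrotic_fraction, mean_t2_signal), 6))  # 1.0
```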
This study provides a systematic understanding that can distinguish disease-related alterations such as inflammation and infarction from postmortem autolysis, and thereby generates a new database for further imaging programs and algorithm development. This is a “machine learning from images” study with a novel scientific approach, a practical setup, a strong focus on what to look for, and an innovative way to analyze the data.
In most countries, there are inadequacies with regard to conducting large-scale programs for this type of study. Firstly, for some subjects, pathologists do not receive approval from the relatives, and therefore fewer autopsies are performed. Secondly, obtaining consent takes approximately one day, so autopsies are performed with a marked delay after death. Thus, it is most advantageous to include additional cases that more closely reflect the state of the living body, which allows us to distinguish between disease-related alterations and secondary alterations caused by prolonged autolysis.
The virtual biobank will facilitate data fusion, data mapping, data integration and data sharing on four different levels: (1) Macroscopic; (2) Microscopic; (3) Medical imaging; and (4) NMR. The virtual biobank will consequently be the basis for a fully fledged and powerful machine learning pipeline. We propose to include analysis of in vivo and ex vivo brain structural changes in a large sample of subjects, which adds a novel parameter to the study and enables insight into previously unknown aspects.
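To make the four levels concrete, a single case in the virtual biobank could be represented as one record that links all levels under a shared case identifier, with the post-mortem interval stored alongside because the degree of autolysis depends on it. The class, field names and types below are hypothetical, chosen for illustration; the proposal does not specify a schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class BiobankCase:
    """Hypothetical record linking the four levels of information for one autopsy case."""
    case_id: str
    postmortem_interval_h: float                                  # hours from clinical death to autopsy
    macroscopic_photos: List[str] = field(default_factory=list)   # paths to digital camera images
    digital_slides: List[str] = field(default_factory=list)       # paths to whole-slide images
    imaging_series: Dict[str, str] = field(default_factory=dict)  # modality -> path, e.g. "MRI", "CT"
    nmr_profile: Optional[List[float]] = None                     # binned metabolic profile
    diagnosis: Optional[str] = None                               # reference diagnosis from traditional autopsy

    def levels_available(self) -> List[str]:
        """Which of the four levels are present for this case."""
        present = []
        if self.macroscopic_photos:
            present.append("macroscopic")
        if self.digital_slides:
            present.append("microscopic")
        if self.imaging_series:
            present.append("medical imaging")
        if self.nmr_profile is not None:
            present.append("NMR")
        return present

    def is_complete(self) -> bool:
        """True only if all four levels are available for data fusion."""
        return len(self.levels_available()) == 4


case = BiobankCase("C001", 2.5, digital_slides=["slides/C001_liver.svs"],
                   imaging_series={"MRI": "mri/C001"})
print(case.levels_available())  # ['microscopic', 'medical imaging']
```

Keeping every level keyed to one case identifier is what makes the later cross-level comparisons (histology vs. MRI, NMR vs. MRI) a simple join rather than a matching problem.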
A team with in-depth experience in machine learning and knowledge extraction will support the building of the machine learning pipeline and of novel knowledge extraction tools. In particular, it will provide much-needed visualization tools that enable experts to find, e.g., anomalies, similarities and dissimilarities in arbitrarily high-dimensional data sets. Such patterns would otherwise be inaccessible to the human end user; the tools follow an interactive machine learning approach with the doctor in the algorithmic loop.
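Such visualization tools usually rest on simple similarity computations in the high-dimensional feature space. As an illustration only (not the team's actual tooling), the sketch below scores each case by its mean distance to its k nearest neighbours, so that cases far from all others surface as candidate anomalies, and lists each case's most similar neighbours for the expert to inspect. Euclidean distance, k = 2 and the toy feature vectors are assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def knn_table(features: Dict[str, Sequence[float]], k: int = 2) -> Dict[str, List[Tuple[float, str]]]:
    """For every case, its k nearest other cases as (distance, case_id), closest first."""
    if len(features) <= k:
        raise ValueError("need more than k cases")
    table = {}
    for cid, x in features.items():
        others = sorted((math.dist(x, y), oid) for oid, y in features.items() if oid != cid)
        table[cid] = others[:k]
    return table


def anomaly_scores(features: Dict[str, Sequence[float]], k: int = 2) -> List[Tuple[str, float]]:
    """Mean distance to the k nearest neighbours; most isolated cases first."""
    table = knn_table(features, k)
    scores = [(cid, sum(d for d, _ in nbrs) / len(nbrs)) for cid, nbrs in table.items()]
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Toy 3-D feature vectors; "D" is deliberately far from the rest.
feats = {"A": (0.0, 0.0, 0.1), "B": (0.1, 0.0, 0.0), "C": (0.0, 0.1, 0.0), "D": (5.0, 5.0, 5.0)}
print(anomaly_scores(feats)[0][0])  # D
```

In an interactive setting, the expert would look at the top-scoring cases and their neighbours and decide whether an outlier is a genuine rare finding or an artefact, for example of advanced autolysis.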
The distinctive feature of our technique is that we propose to construct machine learning algorithms that self-learn from, and make predictions based on, the study group data stored in our virtual biobank. Imaging biomarkers depend on access to biomaterial, and such biomaterial is readily available at many medical universities. Some academic hospitals remain very open to both traditional and imaging autopsies for clinical and academic studies.
A study of this kind would usually require 1-2 years of data collection. However, some research groups hold previously collected virtopsy data, which enables them to perform machine learning image analysis from the outset. These data allow researchers to develop algorithms that they then validate, adapt, use and test on the autopsy cases collected during the course of their study.
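The develop-then-validate workflow described here amounts to a temporal split: algorithms are fitted on the retrospective virtopsy data and evaluated only on cases collected after the study starts, so that validation cases never leak into training. A minimal sketch, assuming each case carries a collection date; the `collected` field, the case identifiers and the study start date are hypothetical.

```python
from datetime import date
from typing import Dict, List, Tuple


def temporal_split(cases: List[Dict], study_start: date) -> Tuple[List[Dict], List[Dict]]:
    """Retrospective cases (before study_start) are for training; prospective ones for validation."""
    train = [c for c in cases if c["collected"] < study_start]
    validate = [c for c in cases if c["collected"] >= study_start]
    return train, validate


cases = [{"id": "R1", "collected": date(2015, 3, 1)},
         {"id": "P1", "collected": date(2018, 1, 9)}]
train, validate = temporal_split(cases, study_start=date(2017, 9, 1))
print([c["id"] for c in train], [c["id"] for c in validate])  # ['R1'] ['P1']
```

A split by date rather than at random also reflects how the algorithms would actually be used: trained on the past and applied to new cases.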
Initially, researchers can train the algorithms on existing datasets available from previous virtopsy studies, which improves the likelihood of achieving the expected results. However, it is critical to include many datasets from autopsies performed rapidly (~3 hours) after clinical death; otherwise there is a risk that the algorithms will ultimately fail to recognize autolysis, which is the most common phenomenon in these cases. This time-course advantage is invaluable for validating the study. Including an “early cohort” in the mix can address specific questions that cannot be answered with a “delayed cohort” alone. In particular, the “early cohort” allows us to observe alterations in the dead body as early as possible, to follow how they change over time, and to recognize how features are influenced by autolysis. What is observed is the pre-existing disease, be it inflammation, infarction or tumor, with the effects of autolysis superimposed on it. The intention is not to study autolysis itself in pathology, since the primary aim is to discover the underlying disease.
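The value of the early cohort becomes visible when performance is reported separately by post-mortem interval: a model that works only on early cases, or only on delayed ones, is then exposed. A minimal sketch, assuming each validation case records its post-mortem interval in hours, the reference diagnosis from traditional autopsy and the model's prediction; the field names and example cases are hypothetical, and the 3-hour default cut-off mirrors the ~3 hours mentioned in the text.

```python
from typing import Dict, List


def accuracy_by_cohort(cases: List[Dict], early_cutoff_h: float = 3.0) -> Dict[str, float]:
    """Accuracy for the early (PMI <= cutoff) and delayed (PMI > cutoff) cohorts separately."""
    tallies = {"early": [0, 0], "delayed": [0, 0]}  # cohort -> [correct, total]
    for c in cases:
        cohort = "early" if c["pmi_h"] <= early_cutoff_h else "delayed"
        tallies[cohort][0] += int(c["predicted"] == c["reference"])
        tallies[cohort][1] += 1
    # Cohorts with no cases are omitted rather than reported as 0.
    return {cohort: correct / total for cohort, (correct, total) in tallies.items() if total > 0}


cases = [
    {"pmi_h": 2.5, "reference": "infarction", "predicted": "infarction"},
    {"pmi_h": 2.0, "reference": "inflammation", "predicted": "inflammation"},
    {"pmi_h": 26.0, "reference": "infarction", "predicted": "autolysis"},
    {"pmi_h": 30.0, "reference": "tumor", "predicted": "tumor"},
]
print(accuracy_by_cohort(cases))  # {'early': 1.0, 'delayed': 0.5}
```

A large gap between the two cohorts could suggest that autolysis is confounding the model, as in the delayed case above where an infarction was mistaken for autolysis.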
The ability to compare radiology data with histology and digital pathology allows us to see how far the principles of machine learning image analysis can be applied across different technologies and imaging types. We are creating a biobanking library of digital images together with data on the corresponding biological material, which makes it a compelling resource. These are major factors of the core scientific approach that increase the likelihood of success.
This virtopsy proposal aims to use autopsies while incorporating new technologies, such as medical imaging, to improve how infectious diseases are examined. This research will help to understand disease processes and to evaluate both clinical diagnosis and treatment. It will enhance understanding of therapy resistance in cancer patients with minimal tumor burden, and even of the cause of death in such patients. Baseline data will quantify and compare tumor mass across diverse patients according to their metabolic conditions. We hypothesize that virtopsy is dramatically better than traditional autopsy at mapping cardiovascular alterations, hemorrhages, and misuse of medical devices.
In summary, we plan to combine all of the diverse autopsy and data analysis steps described above. We anticipate that the result will be a new method of analyzing digital data from virtopsy and digital pathology, which will identify improvements in diagnostic techniques and assist physicians in finding the correct diagnosis. We are confident that our multi-faceted approach as a whole will provide the desired results and advance the state of the art.
References
Holzinger A, Plass M, Holzinger K, Crişan GC, Pintea CM, Palade V. Towards interactive Machine Learning (iML): applying ant colony algorithms to solve the traveling salesman problem with the human-in-the-loop approach. In: Buccafurri F, Holzinger A, Kieseberg P, Tjoa A, Weippl E, editors. Availability, reliability, and security in information systems. CD-ARES 2016. Cham: Springer; 2016. p. 81-95. (Lecture Notes in Computer Science; vol. 9817). https://doi.org/10.1007/978-3-319-45507-5_6
Publication date: 12/08/2017