Feasibility analysis of machine learning in medical
diagnosis from aura images

 

Tatjana Zrimec and Igor Kononenko
University of Ljubljana
Faculty of Computer and Information Science
Tržaška 25, SI-1001 Ljubljana, Slovenia
tel: +386-61-1768390, fax: +386-61-1264647
e-mail: {tatjana.zrimec;igor.kononenko}@fri.uni-lj.si

 

Abstract

Machine learning technology is well suited to inducing diagnostic and prognostic rules and to solving small, specialized diagnostic and prognostic problems. Medical diagnostic knowledge can be derived automatically from descriptions of cases solved in the past. We have applied machine learning algorithms in several medical domains. Typically, the automatically generated diagnostic rules achieved the same or slightly better diagnostic accuracy than specialist physicians.

Despite the enormous development of biomedical technology, diagnostic accuracy is in many cases rather low. The reason is that medical instruments do not provide enough relevant information for a reliable diagnosis. Recently developed technology for recording the human aura provides completely new information about the biophysical and psychical state of the patient that could, in some cases, drastically improve the diagnostic process. The problem, however, is the interpretation of the aura images. Machine learning could alleviate that problem by automatically generating diagnostic rules from the aura images in the records of patients with known diagnoses.

 

1 Introduction

Machine learning technology is well suited to inducing diagnostic and prognostic rules and to solving small, specialized diagnostic and prognostic problems. Data about correct diagnoses and prognoses are often available from the archives of specialized hospitals and clinics, where the number of stored cases grows daily. All that has to be done is to enter the data, i.e. the records of patients with a known correct diagnosis, into the computer in the appropriate form and run the learning algorithm. This is of course an oversimplification, but in principle medical diagnostic knowledge can be derived automatically from descriptions of cases solved in the past. The derived classifier can then be used either to assist the physician in diagnosing new patients, in order to improve diagnostic speed, accuracy and/or reliability, or to train students and non-specialist physicians to diagnose patients in a particular diagnostic problem.

We have applied machine learning algorithms in several medical domains, e.g. in oncology and urology (Bratko and Kononenko, 1987; Roškar et al., 1986), thyrology (Hojker et al., 1990), rheumatology, and other medical areas (Kononenko, 1993; Kononenko et al., 1998; Kukar et al., 1996; Zelič et al., 1998). A machine learning system applied in medical diagnosis must meet several specific requirements, such as reliability and transparency of decisions. Typically, the automatically generated diagnostic rules achieved the same or slightly better diagnostic accuracy than specialist physicians.

Despite the enormous development of biomedical technology, diagnostic accuracy is in many cases rather low. The reason is that medical instruments do not provide enough relevant information for a reliable diagnosis. Recently developed technology for recording the human aura provides completely new information about the biophysical and psychical state of the patient that could, in some cases, drastically improve the diagnostic process. The problem, however, is the interpretation of the aura images. Machine learning could alleviate that problem by automatically generating diagnostic rules from the aura images in the records of patients with known diagnoses.

 

2 Machine learning in medical diagnosis

This section discusses several issues related to the use of machine learning in medical diagnostic and prognostic problems. When applying a machine learning system in medical diagnosis there are several specific requirements that the system must meet. We discuss advantages and disadvantages of several different machine learning algorithms when used in medical diagnosis. We illustrate the problematic issues within several different applications of machine learning in medical diagnostic problems that were developed in the past.

 

2.1 Machine learning

In recent years many different machine learning algorithms have been developed (Michalski et al., 1983; 1986; Rumelhart & McClelland, 1986; Dietterich & Schavlik, 1990; Weiss & Kulikowski, 1991). They can be classified into three major groups (Michie et al., 1994): statistical or pattern recognition methods (such as K-nearest neighbours, discriminant analysis, and Bayesian classifiers), inductive learning of symbolic rules (such as top-down induction of decision trees, decision rules, and induction of logic programs), and artificial neural networks (such as the multilayered feedforward neural network with backpropagation learning, Kohonen's self-organizing network, and Hopfield's associative memory). However, not all of these systems are equally appropriate. A machine learning system applied in medical diagnosis must meet several specific requirements, which are discussed below.

 

2.2 Medical diagnosis

A typical diagnostic process is as follows. During the interview with the patient the anamnestic data are obtained, and immediately afterwards, during the preliminary examination, the physician records the status data. Depending on the anamnestic and status data, the patient undergoes additional laboratory examinations. The diagnosis is then determined by the physician, who takes into account the whole available description of the patient's state of health. Depending on the diagnosis a treatment is prescribed, and after the treatment the whole process may be repeated. In each iteration the diagnosis may be confirmed, refined or rejected. The definition of the final diagnosis depends on the medical problem. In some problems the first diagnosis is also the final one, in others the final diagnosis is determined after the results of the treatment are available, and in some problems there is no way to obtain a 100% reliable final diagnosis. For example, in the problem of localizing the primary tumor the final diagnosis can always be obtained with an operation in which the location of the primary tumor is verified, although this "examination" is avoided and replaced with other laboratory tests unless a verified diagnosis is really necessary. In the problem of predicting the recurrence of breast cancer after surgical removal of the breast, the final verification of the prediction is impossible until five years after the operation. And in urology, in the problem of diagnosing incontinence, the final diagnosis is in practice never obtained, as there is no practical way to verify it.

Medical diagnosis is known to be subjective and depends not only on the available data but also on the experience of the physician, her intuition and biases, and even on her psycho-physiological condition. Several studies have shown that the diagnosis of one patient can differ significantly if the patient is examined by different physicians, or even by the same physician at different times (a different day of the week or a different hour of the day). Machine learning can be used to derive diagnostic rules automatically from the descriptions of patients treated in the past for whom the final diagnoses were verified. Automatically derived diagnostic knowledge may help physicians make the diagnostic process more objective and more reliable.

 

2.3 The performance

Typically, automatically generated diagnostic rules slightly outperform specialist physicians in diagnostic accuracy when the physicians have exactly the same information available as the machine. Table 1 compares the performance of two machine learning algorithms, the naive Bayesian classifier and Assistant (Bratko and Kononenko, 1987), with the average performance of four specialist physicians in each of four medical diagnostic problems: the localization of the primary tumor, the prediction of the recurrence of breast cancer, the diagnosis of thyroid diseases, and rheumatology. The data used in our experiments were collected at the University Medical Center in Ljubljana.

The characteristics of the data sets used in our experiments are summarized in Table 2. The entropy of the class distribution, together with the number of classes (diagnoses), indicates the difficulty of the diagnostic problem. The number of attributes indicates approximately how well the patients are described. The majority class column gives the prior probability of the most probable diagnosis, which is in fact the classification accuracy of a default classifier that selects the same, most probable, diagnosis for every patient.
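The two difficulty measures can be illustrated with a minimal sketch. The function names and the sample labels below are hypothetical and are not taken from the original data sets.

```python
import math
from collections import Counter


def class_entropy(labels):
    """Shannon entropy (in bits) of the empirical class distribution."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def majority_class_accuracy(labels):
    """Accuracy of a default classifier that always predicts the most frequent class."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


# Hypothetical two-class sample with an 80/20 split (not the original data).
labels = ["no recurrence"] * 80 + ["recurrence"] * 20
print(round(class_entropy(labels), 2))
print(majority_class_accuracy(labels))
```

For an exact 80/20 split the entropy is about 0.72 bit, which is close to the 0.73 bit listed for the breast cancer data set, whose majority class is given as a rounded 80%.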

In our experiments one run was performed by randomly selecting 70% of instances for learning and 30% for testing. Results (accuracy) are averages of 10 runs and are given in Table 1.
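The evaluation protocol can be sketched as follows. This is an illustration under stated assumptions rather than the code used in the experiments: the function names, the fixed random seed and the majority-class stand-in learner are hypothetical, and any learner with the same interface could be substituted.

```python
import random
from collections import Counter


def learn_majority(train):
    """Stand-in learner: always predicts the most frequent training class."""
    most_common = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda attributes: most_common


def repeated_holdout(instances, learn, runs=10, train_fraction=0.7, seed=0):
    """Average test accuracy over `runs` random train/test splits.

    `instances` is a list of (attributes, label) pairs; `learn` takes the
    training pairs and returns a function mapping attributes to a label.
    """
    rng = random.Random(seed)
    accuracies = []
    for _ in range(runs):
        shuffled = instances[:]
        rng.shuffle(shuffled)
        cut = int(round(train_fraction * len(shuffled)))
        train, test = shuffled[:cut], shuffled[cut:]
        classify = learn(train)
        correct = sum(classify(x) == y for x, y in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / len(accuracies)


# Hypothetical data: 100 cases described by one attribute, 80% in class "A".
data = [((i,), "A") for i in range(80)] + [((i,), "B") for i in range(20)]
print(repeated_holdout(data, learn_majority))
```

Averaging over several random splits reduces the dependence of the estimate on one particular division of a small data set.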

Four specialist physicians in each domain were tested to estimate their diagnostic accuracy. From the set of training data a subset of patients was randomly selected and their descriptions were printed on paper without the final diagnosis. The physicians were asked to select the most probable diagnosis for each patient. The performance of physicians in Table 1 is the average over the four specialist physicians in each domain. The physicians were tested at the University Medical Center in Ljubljana. Although in breast cancer and rheumatology diagnosing a patient on paper is somewhat unnatural, in the other two domains it often occurs in practice.

Both algorithms significantly outperform the physicians in terms of classification accuracy and the average information score of the classifier. However, during the examination the physician often perceives the patient's condition through intuitive impressions that cannot be formally described and therefore cannot be entered into the computer.

Table 1: The comparison of performance of different classifiers in four medical domains.

classifier     primary tumor   breast cancer   thyroid   rheumatology
naive Bayes    49%             78%             70%       67%
Assistant      44%             77%             73%       61%
physicians     42%             64%             64%       56%


Table 2: Basic description of medical data sets.

domain          #cls   #atts   #val/att   #inst   maj.cl (%)   entropy (bit)
primary tumor   22     17      2.2        339     25           3.89
breast cancer   2      10      2.7        288     80           0.73
thyroid         4      15      9.1        884     56           1.59
rheumatology    6      32      9.1        355     66           1.73


The lack of such information may in some cases be of crucial importance for the (in)ability to obtain a reliable diagnosis. The accuracy of physicians should therefore be regarded as a yardstick for how well the algorithms perform, not as a measure of how badly the physicians diagnose. Although machine learning may derive more reliable diagnostic algorithms from a limited description of the patient, such diagnostic tools cannot, and are not intended to, replace physicians; they should rather be considered helpful tools that can improve the physicians' performance.

 

2.4 Selecting the appropriate Machine Learning system

For a machine learning (ML) system to be useful in solving medical diagnostic tasks, the following features are desired: good performance, transparency of the diagnostic knowledge, the ability to explain decisions, the ability to reduce the number of tests necessary to obtain a reliable diagnosis, and the ability to deal appropriately with missing data. In our study (Kononenko et al., 1998) we used seven algorithms: Assistant-R and Assistant-I for building decision trees (Kononenko et al., 1997), the LFC system for Lookahead Feature Construction in decision trees (Ragavan and Rendell, 1993), the naive and the semi-naive Bayesian classifiers (Kononenko, 1991), backpropagation with weight elimination (Rumelhart and McClelland, 1986; Weigand et al., 1990), and the K-nearest neighbors algorithm. We compared the performance of the algorithms on several medical data sets. Table 3 summarizes the comparison of the algorithms with respect to their appropriateness for developing applications in medical diagnostic and prognostic problems.

Among the compared algorithms only the decision tree builders are able to select an appropriate subset of attributes. With respect to the criterion of reducing the number of tests, these algorithms have a clear advantage over the others. With respect to the performance criterion the algorithms are more similar. The best performance was achieved by the naive and semi-naive Bayesian classifiers. A likely explanation is that in medical data sets the attributes are typically relatively conditionally independent given the class.

With respect to the transparency and explanation ability criteria there are great differences between the algorithms. As K-NN performs no generalization, the transparency of its knowledge representation is poor. In the naive and semi-naive Bayesian classifiers the knowledge representation is a table of conditional probabilities, which seems to be of interest to physicians; this knowledge representation is therefore rated as good. In addition, the decisions of Bayesian classifiers can be naturally interpreted as a sum of information gains (Kononenko, 1993). Such information gains can be listed in a table to sum up the evidence for and against the decision.
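The additive explanation can be sketched as follows. The sketch assumes the naive Bayesian formula written as P(C|V) = P(C) * prod_i P(C|v_i) / P(C), so that each attribute value v_i contributes log2 P(C|v_i) - log2 P(C) bits for or against class C. The Laplace-smoothed probability estimates, the function names and the toy data are assumptions made for illustration; they are not taken from the systems compared in Table 3.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(instances):
    """Estimate P(class) and P(class | attribute=value) with Laplace smoothing."""
    class_counts = Counter(label for _, label in instances)
    classes = sorted(class_counts)
    total = len(instances)
    prior = {c: class_counts[c] / total for c in classes}
    # joint[(attribute index, value)][class] = number of training cases
    joint = defaultdict(Counter)
    for attributes, label in instances:
        for i, value in enumerate(attributes):
            joint[(i, value)][label] += 1

    def conditional(i, value, c):
        counts = joint[(i, value)]
        return (counts[c] + 1) / (sum(counts.values()) + len(classes))

    return classes, prior, conditional


def explain(attributes, target, classes, prior, conditional):
    """Return log2 of the prior and the per-attribute information gains (bits)."""
    gains = [
        math.log2(conditional(i, value, target)) - math.log2(prior[target])
        for i, value in enumerate(attributes)
    ]
    return math.log2(prior[target]), gains


def classify(attributes, classes, prior, conditional):
    """Pick the class maximizing log2 P(c) plus the sum of information gains."""
    def score(c):
        base, gains = explain(attributes, c, classes, prior, conditional)
        return base + sum(gains)
    return max(classes, key=score)


# Hypothetical toy data: (test result, symptom present) -> diagnosis.
instances = [
    (("high", "yes"), "ill"), (("high", "no"), "ill"), (("normal", "yes"), "ill"),
    (("normal", "no"), "healthy"), (("normal", "no"), "healthy"), (("high", "no"), "healthy"),
]
model = train_naive_bayes(instances)
patient = ("high", "yes")
decision = classify(patient, *model)
base, gains = explain(patient, decision, *model)
print(decision)
for value, gain in zip(patient, gains):
    print(f"{value:>8}: {gain:+.2f} bit")
```

A positive gain is evidence for the chosen diagnosis and a negative gain is evidence against it, so the printed list plays the role of the evidence table described above.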

Table 3: The appropriateness of various algorithms for medical diagnosis.

classifier         performance   transparency   explanation   reduction   miss.data
Assistant-R        good          very good      good          good        acceptable
Assistant-I        good          very good      good          good        acceptable
LFC                good          good           good          good        acceptable
naive Bayes        very good     good           very good     no          very good
semi-naive Bayes   very good     good           very good     no          very good
backpropagation    very good     poor           poor          no          acceptable
K-NN               very good     poor           acceptable    no          acceptable

One of the main advantages of such an explanation is that it uses all available attributes. Physicians found this kind of explanation very good, and they feel that Bayesian classifiers solve the task in a way similar to the way they diagnose: they too sum up the evidence for and against a given diagnosis. Backpropagation neural networks have a non-transparent knowledge representation and in general cannot easily explain their decisions. This is due to the large number of real-valued weights, all of which influence the result. Decision trees (Assistant-I and Assistant-R) can be used without a computer and are fairly easy to understand. The positions of attributes in the tree, especially the top ones, often correspond directly to the domain expert's knowledge. Lookahead feature construction (LFC) also generates decision trees. However, in each node a potentially complex logical expression is used instead of a simple attribute value. At the lower levels of the tree the expressions are often very specific and typically meaningless. Because of the complex logical expressions in the nodes, the number of attributes used to classify an instance can be higher than in ordinary decision trees.

 

3 Medical image processing

This section discusses various approaches to medical image processing. Many researchers have claimed that automatic processing of medical images can be successful only if model-based strategies are used, in much the same way as radiologists use them. We discuss the problems of medical imaging and clarify why learning from images is necessary. In the following sections we describe aura images and the procedure for applying machine learning to them.

Modern medical imaging technology offers a broad range of new ways to examine a patient in order to diagnose and to prescribe a proper treatment. In clinical practice today, physicians have access to the imaging devices they need and to the large amount of digital data they produce. The challenge of diagnosis and proper pre-therapeutic planning is to identify relevant information from the large collection of often disparate images.

Medical treatment is changing continuously toward less invasive therapeutic procedures. The evaluation of pathology relies on the inspection of various medical image data that provide a kind of geometric mapping of the patient's anatomy. Various imaging modalities for measuring morphological and functional anatomical structures are available, for example modalities based on X-ray technology that supply 2D projections of the 3D patient anatomy, and scanning devices based on tomographic techniques (CT, PET) or on MRI (Magnetic Resonance Imaging) that provide a stack of cross-sectional images or volume data.

Medical image analysis has to support the clinician's ability to identify, manipulate and quantify anatomical structures or other forms of anatomical information.

Ongoing research in computer vision aims to develop automatic image processing techniques for reconstructing complex structures from images and visualizing those structures. Medical image processing systems, however, are diverging from methodologies drawn only from computer vision, since many of the underlying assumptions of computer vision are not applicable in medical imaging (Wilson et al., 1995). New integrated techniques are being investigated and developed.

A number of studies have combined image processing with knowledge-based approaches in order to achieve better results in interpreting medical images (Hill et al., 1991; Delaere et al., 1991; Robinson et al., 1993; Suetens et al., 1989). These resulted in various expert systems for image processing and image understanding. In most expert systems, the knowledge mainly concerns how to use image processing operators effectively for image analysis (Matsuyama, 1989). Systems for image understanding, also known as model-based image processing, instead use object models to generate predictions during image analysis. These systems interpret images by finding instances of modeled objects and by predicting the existence of objects that have not been found but are known, according to the model, to be necessary for that scene. We use the same approach for analyzing medical images, which are well suited to this type of analysis (Zrimec et al., 1997). When a radiologist examines an image, he or she looks for known features, having in mind a model of the imaged structures. Radiologists know from experience what features to expect and how they are organized in the image. They use not only declarative knowledge about what to expect in the image but also procedural knowledge about where to look when there are obstructions or abnormalities. We have developed a system that can use high-level knowledge and knowledge about the imaged anatomy to guide low-level image processing operations (Zrimec et al., 1997).

This medical image understanding system is specialized for interpreting and labelling vessels in cerebral angiograms in order to construct a 3D model (Zrimec et al., 1994). The role of the system is to accept data from various sources, for example Magnetic Resonance (MR) images, MR angiograms and X-ray angiograms, and to produce a meaningful description of the imaged object for 3D visualization. We use anatomical and other knowledge as background knowledge, and we acquire domain-specific knowledge through learning. This last type of knowledge, which is gained through practice and experience, exists in the minds of radiologists and is very difficult or impossible to extract from them.

 

3.1 Why learning

In many meetings with radiologists, we asked them to interpret an image while providing a verbal commentary. We found that radiologists use different kinds of knowledge, which give them various clues during the investigation and interpretation of images. Radiologists can easily detect patterns in angiograms that indicate a particular disease. For example, a narrowing of a blood vessel indicates a stenosis. A stroke is manifested by blocked vessels. In an Arterio-Venous Malformation (AVM), enlarged vessels and early filling of some veins can be detected. In the case of a tumor a few patterns can be observed: vessel displacement, vessels pushed away, and a depressed Sylvian point. In the case of an AVM, radiologists also use signs for recognizing a feeding vessel, such as subtle enlargements, increased flow, changing contrast at the entrance, or a disappearing vessel. In all of these cases, it is much easier to ask radiologists to demonstrate the problem in an image and to say what to expect than to try to formalize their descriptions. Further, by using example cases they communicate more knowledge than when they are asked to provide general information. Such domain-specific knowledge can only be acquired by learning from images that have been processed by experts (Zrimec et al., 1997a).

One of the modern imaging techniques is able to record the bio-energy field distribution of the human aura, providing a very comprehensive image of the functioning of the entire mind-body system (Chalko, 1996). We are preparing experiments for investigating the human aura with the CrownTV equipment. The CrownTV system uses the technique of Gas Discharge Visualization.

 

3.2 CCD device and GDV images

A CCD (charge-coupled device) imaging sensor is very sensitive to direct exposure to X-rays as well as to light. CCDs have therefore found wide application in dental radiography, in vidicon systems, and recently in computerized Kirlian equipment for measuring the human bio-energy field. CCD images used in the GDV (Gas Discharge Visualization) technique, which is also known as the Kirlian effect, have several useful properties: a single frame can be stored in the computer and viewed continuously, or a sequence of images can be produced and stored for further examination.

In order to follow the blood vessels, X-ray devices use a protocol that produces temporal sequences of images. To produce a clear image showing only the blood vessels, a technique called Digital Subtraction Angiography (DSA) is used. In this technique the first image, called the mask, is stored before a contrast medium is injected into the blood vessel. The difference between the mask and the subsequent images taken after the injection outlines the blood vessels. A similar technique can be used in processing a sequence of images that captures the bio-energy state of a person. Paired images can be used to show differences in the bio-energy field, which changes with some mental activities.
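The subtraction step can be sketched as follows, assuming that the images are already registered (aligned) grey-scale arrays of equal size. The function name and the small synthetic images are hypothetical, and NumPy is used only for illustration.

```python
import numpy as np


def subtract_mask(mask, frames):
    """Subtract the first (mask) image from each later frame.

    `mask` is a 2-D grey-scale array and `frames` a sequence of arrays of the
    same shape. Signed arithmetic avoids wrap-around with unsigned 8-bit pixels.
    """
    mask = np.asarray(mask, dtype=np.int32)
    return [np.abs(np.asarray(frame, dtype=np.int32) - mask) for frame in frames]


# Hypothetical 4x4 images: the later frame differs from the mask in one region.
mask = np.full((4, 4), 100, dtype=np.uint8)
frame = mask.copy()
frame[1:3, 1:3] = 40  # region that changed between the two recordings
difference = subtract_mask(mask, [frame])[0]
print(difference)
```

Everything that is identical in both images cancels out, so only the region that changed between the two recordings remains non-zero in the difference image.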

An important part of image analysis is to formulate and describe visual information, such as what type of image features we can extract from the images, what properties those features are expected to have, and how they are related to each other. However, a lot of knowledge is required to read auras (Chalko, 1996).

The analysis of those images requires preprocessing, segmentation into meaningful image structures, and transformation of the structures into a suitable representation. It was found that the ability to read the bio-energy field distribution around the fingers provides an extremely efficient way of diagnosing problems and malfunctions of various organs and systems of the organism, long before physical symptoms become evident (Chalko, 1996). Through long practice a considerable amount of domain knowledge has been collected for interpreting those aura images. We propose to apply to aura image processing an approach similar to that of our medical image understanding system, i.e. to use domain (expert) knowledge to guide the image processing.

 

4 Machine learning of medical diagnostic rules from aura images

The procedure for using GDV images starts with image preprocessing, which prepares the images for segmentation and image analysis. The preprocessing includes applying different algorithms to improve the image quality or to enhance the image. Applying various filters and other algorithms for grey-scale images will make it possible to extract and display the information patterns. The feature extraction process yields higher-level image information such as shape or colour (different intensity) information. The pattern classification process uses this higher-level information to identify objects or patterns within the image. Next, a correspondence between the patterns in the GDV images and particular disorders has to be established. Pattern classification can be done manually or automatically by using the domain knowledge.
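The first stages of this procedure can be sketched on a synthetic grey-scale image. The 3x3 mean filter, the fixed threshold and the three features chosen here are assumptions made for illustration only; the filters and features actually suitable for GDV images remain to be determined.

```python
import numpy as np


def smooth(image):
    """3x3 mean filter with edge padding (a simple stand-in for enhancement)."""
    padded = np.pad(image.astype(float), 1, mode="edge")
    rows, cols = image.shape
    total = np.zeros((rows, cols))
    for dr in range(3):
        for dc in range(3):
            total += padded[dr:dr + rows, dc:dc + cols]
    return total / 9.0


def segment(image, threshold):
    """Binary segmentation: True where the intensity exceeds `threshold`."""
    return image > threshold


def extract_features(image, region):
    """Describe the segmented region by a few numeric attributes."""
    area = int(region.sum())
    if area == 0:
        return {"area": 0, "mean_intensity": 0.0, "centroid": None}
    rows, cols = np.nonzero(region)
    return {
        "area": area,
        "mean_intensity": float(image[region].mean()),
        "centroid": (float(rows.mean()), float(cols.mean())),
    }


# Hypothetical 8x8 grey-scale image with one bright 4x4 blob.
image = np.zeros((8, 8))
image[2:6, 2:6] = 200.0
smoothed = smooth(image)
features = extract_features(smoothed, segment(smoothed, threshold=100.0))
print(features)
```

The resulting dictionary is the kind of attribute description that a subsequent pattern classification step, manual or automatic, could work with.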

We will perform experiments on data obtained and verified by classical medical methods, as well as on data obtained and verified with the help of an extrasensory therapist who is himself able to see the aura.

In both cases an expert will classify the patterns by attaching symbolic labels to them. In some cases a set of successive patterns can indicate a disease, or some characteristic organisation of patterns in the image can be strongly related to a particular diagnosis.

In the medical imaging community, new tools that provide interactive communication between complex image data and the observer have recently been developed to help clinicians and other experts delineate the pathology better and distinguish more accurately between normal and abnormal patterns. A similar tool for interactive communication between complex multi-layer aura images and the expert can be developed. Using such a tool the expert can easily postprocess the images, attach labels to the recognised regions that correspond to or indicate a particular disease, and correct the segmented areas.

At the beginning of the experiments the classification will be done off-line on printed images. Each image will be described with a set of attributes and a class. The set of attributes will contain the values of the characteristic features (patterns) extracted from the image and some other attributes relevant to the image and the patient. The class value will be the diagnosis given by the expert.
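One possible representation of such a description is sketched below. The class name, the field names and the attribute names are hypothetical; the actual set of attributes will depend on the features that can be extracted from the images.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class AuraCase:
    """One learning example: attribute values plus the expert's diagnosis."""
    image_features: Dict[str, float]     # values extracted from the image
    patient_attributes: Dict[str, str]   # other attributes of image and patient
    diagnosis: Optional[str] = None      # class value; None until it is known


def to_learning_table(cases: List[AuraCase]) -> Tuple[List[str], List[list]]:
    """Flatten labelled cases into a header row and attribute-value rows."""
    labelled = [c for c in cases if c.diagnosis is not None]
    names = sorted({k for c in labelled for k in c.image_features}
                   | {k for c in labelled for k in c.patient_attributes})
    rows = []
    for c in labelled:
        merged = {**c.image_features, **c.patient_attributes}
        # A missing attribute value is stored as None.
        rows.append([merged.get(name) for name in names] + [c.diagnosis])
    return names + ["diagnosis"], rows


# Hypothetical cases; the last one has no diagnosis yet and is skipped.
cases = [
    AuraCase({"area": 12.0, "mean_intensity": 155.6}, {"age_group": "40-50"}, "disorder A"),
    AuraCase({"area": 30.0}, {"age_group": "20-30"}, "healthy"),
    AuraCase({"area": 18.0, "mean_intensity": 120.0}, {"age_group": "30-40"}),
]
header, rows = to_learning_table(cases)
print(header)
for row in rows:
    print(row)
```

Keeping missing values explicit matters here because the ability to deal with missing data is one of the criteria on which the learning algorithms differ.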

The patient's aura will be recorded on his or her first visit to the specialist physician. After the classical diagnostic process and treatment are finished, the verified diagnosis will be stored. The data from different patients, consisting of the extracted image features and the verified diagnoses, will serve as input to the machine learning algorithm. After learning, the automatically derived diagnostic rules will be verified on separate 'test' cases.

 

5 Conclusion

We propose to apply the techniques, knowledge and experience gained from the work in the domain of medical imaging to another kind of medical images, that of human aura images.

We have at our disposal various tools for classical image processing as well as tools specialized for processing medical images. We will investigate and compare the results of different approaches to aura image processing. The aim is to produce a good and meaningful image analysis and to reveal more of the information encoded in the human aura.

We will also try various machine learning algorithms in order to produce interpretations that are useful for manual diagnosing by non-experts and adequate for developing an automatic diagnostic system.

 

References