It is no secret that the capabilities of artificial intelligence (AI) are increasing rapidly, particularly in healthcare, where its ability to diagnose disease and predict disease risk is advancing at pace.

In recent weeks, researchers have revealed AI models that scan retinal images to predict the risk of eye and cardiovascular disease, and others that analyse mammograms to detect breast cancer. In these reports, an algorithm is typically deemed successful if it can identify a particular condition from such images as effectively as pathologists and radiologists can.

Even though some AI tools have already found their way into the clinic, that by no means implies the technology is ready for full implementation, reports Nature. Most reports are analogous to studies showing that a drug kills a pathogen in a Petri dish. There is no denying that such studies are exciting, but the scientific process demands that methods and materials be described in detail, that results be replicated, and that the drug be tested in a progression of studies culminating in large clinical trials. This doesn't seem to be happening enough in AI diagnostics. Many in the field complain that too many developers are not taking their studies through to these later stages, and so are failing to apply the evidence-based approaches established in mature fields such as drug development.

Worse, reports of many new AI diagnostic tools go no further than preprints or claims on websites. They have not undergone peer review, and might never do so.

For example, one investigation published last year found that an AI model detected breast cancer in mammograms better than 11 pathologists who were allowed assessment times of about one minute per image. But a pathologist given unlimited time performed as well as the AI, and identified difficult-to-detect cases more often than the computers did.

It is important to bear in mind that some issues might not arise until the tool is applied. For instance, a diagnostic algorithm might incorrectly associate images produced using a particular device with a disease – but only because, during the training process, the clinic using that device saw more people with the disease than did another clinic using a different device. 
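To make this failure mode concrete, here is a minimal sketch in Python using synthetic data and illustrative feature names (it does not reproduce any published model). It shows how a classifier can latch onto a device-specific artifact rather than the disease itself when, in the training data, one clinic's device happens to co-occur with a higher disease prevalence:

```python
# Minimal sketch of a device confound, on synthetic data.
# Assumes numpy and scikit-learn; all values are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000

# Clinic A uses device 0 and sees mostly diseased patients;
# clinic B uses device 1 and sees mostly healthy ones.
device = rng.integers(0, 2, size=n)
p_disease = np.where(device == 0, 0.7, 0.2)
disease = rng.random(n) < p_disease

# "Image" features: a weak true disease signal plus a strong
# device-specific artifact (e.g. a sensor noise profile).
true_signal = disease * 0.5 + rng.normal(0, 1, n)
device_artifact = device * 2.0 + rng.normal(0, 0.1, n)
X = np.column_stack([true_signal, device_artifact])

model = LogisticRegression().fit(X, disease)
print("coefficients:", model.coef_)  # the artifact weight rivals the disease signal
print("training accuracy:", model.score(X, disease))

# Evaluate on a population where the device-disease link is broken:
# same prevalence on both devices.
device_t = rng.integers(0, 2, size=n)
disease_t = rng.random(n) < 0.45
Xt = np.column_stack([disease_t * 0.5 + rng.normal(0, 1, n),
                      device_t * 2.0 + rng.normal(0, 0.1, n)])
print("accuracy when the confound breaks:", model.score(Xt, disease_t))
```

Once the device-disease link is broken at deployment, accuracy falls back to whatever the weak true signal alone can support, which is exactly the kind of deficiency that would not surface in a study run only on the original clinics' data.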

However, these problems can be overcome. One way is for doctors who deploy AI diagnostic tools in the clinic to track results and report them, so that retrospective studies expose any deficiencies. Such tools should also be developed rigorously: trained on extensive data sets and validated in controlled studies that undergo peer review. This process can be slow and difficult, partly because privacy concerns can make it hard for researchers to access the massive amounts of medical data needed.
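As a rough illustration, here is a minimal sketch, with hypothetical record fields and file names, of what such post-deployment tracking could look like: each prediction is logged, the confirmed diagnosis is attached once follow-up is complete, and periodic audits compute sensitivity and specificity so a retrospective review can spot deficiencies:

```python
# Minimal sketch of post-deployment result tracking (hypothetical names).
import csv
from dataclasses import dataclass
from typing import Optional

@dataclass
class CaseRecord:
    case_id: str
    model_score: float       # model's predicted probability of disease
    model_positive: bool     # prediction at the deployed threshold
    confirmed: Optional[bool]  # ground truth, filled in after follow-up

LOG = "ai_diagnostic_log.csv"  # illustrative file name

def log_case(rec: CaseRecord) -> None:
    """Append one case to the running log as it is seen in the clinic."""
    with open(LOG, "a", newline="") as f:
        csv.writer(f).writerow(
            [rec.case_id, rec.model_score, rec.model_positive, rec.confirmed])

def audit(records: list[CaseRecord]) -> dict:
    """Compute sensitivity/specificity over cases with confirmed outcomes."""
    done = [r for r in records if r.confirmed is not None]
    tp = sum(r.model_positive and r.confirmed for r in done)
    fn = sum(not r.model_positive and r.confirmed for r in done)
    tn = sum(not r.model_positive and not r.confirmed for r in done)
    fp = sum(r.model_positive and not r.confirmed for r in done)
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else None,
        "specificity": tn / (tn + fp) if tn + fp else None,
        "n_confirmed": len(done),
    }

# Example: two logged cases, one of which turned out to be a false positive.
records = [
    CaseRecord("case-001", 0.91, True, True),
    CaseRecord("case-002", 0.87, True, False),
]
print(audit(records))  # {'sensitivity': 1.0, 'specificity': 0.0, 'n_confirmed': 2}
```

The point of the sketch is not the bookkeeping itself but the feedback loop: without confirmed outcomes flowing back to developers, deficiencies like the device confound above can go unnoticed indefinitely.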

Cutting corners on rigour brings risks of its own. The hype-fail cycle could discourage others from investing in similar techniques that might, in fact, be better. And AI is an extremely competitive field: a well-publicised set of results can be enough to deter rivals from entering the same area.

A better approach is to move slowly and carefully. Backed by reliable data and robust methods, this path may take longer, and will not churn out as many crowd-pleasing announcements. But it could prevent deaths and change lives in the long term.