Using Generative AI to validate causal chain of death

Overview

The Cal Poly Digital Transformation Hub (DxHub), in collaboration with the State of Michigan, developed CODA, an AI-enabled coaching system designed to improve the accuracy of death certificates. When a person dies, a death certificate filed with the state documents the who, when, where, and why. Survivors use death certificates to settle the affairs of the deceased. Collectively, however, death certificates provide important statistical information about the mortality of the population—information used to assess the burden of disease, identify emerging mortality trends, respond to epidemics and pandemics, and to support biomedical research. And while the who, when, and where of death certificates can be objectively determined, why someone died—the cause of death on the death certificate—is the best medical opinion of the physician completing the death certificate.

The cause of death statement on the US Standard Death Certificate in use across the United States conforms to an international standard designed to elicit a sequence of diseases or events in two parts. Part I asks the physician to enter a causal sequence—a chain of events such as diseases or complications—that directly caused the death. Part II asks the physician to list other significant conditions that may have contributed to the death but not included in the chain of events in Part I. Medical certifiers write cause of death statements on the death certificate in free text. To be useful for statistics, that free text is coded to the International Classification of Disease, 10^th Revision (ICD10), an international standard used to code mortality information.

Problem

Many physicians receive little or no training in medical school on the death certificate process, and most medical specialists’ complete death certificates infrequently. As a result, the quality of cause of death information has been questioned. To understand and categorize the quality of cause of death information, officials from the National Center for Health Statistics (NCHS), the Federal agency that oversees vital statistics in the US, classified underlying cause of death (UCOD) codes according to their suitability and utility for statistical use. NCHS identified three main subtypes of unsuitable causes of death. Including unknown and ill-defined causes, immediate and intermediate causes, and nonspecific causes. (Flagg & Anderson, 2021)According to NCHS, in 2018 nearly 35% of all deaths had unsuitable underlying cause of death codes. Table 1 below shows the number of Michigan residence deaths in 2022, as well as the number of suitable and unsuitable UCODs by type.

Table 1. Michigan deaths to residents, and Unsuitable cause of death codes by type, 2022

	n	%
Total Deaths	110,148	100%

Suitable UCODs	75,422	68.5%

Unsuitable UCODs	34,726	31.5%

-nonspecific	21,468	19.5%

-unknown	995	0.9%

-intermediate	12,263	11.1%

To improve the quality of cause of death information, jurisdictions have developed training programs to educate physicians and other medical professionals involved in the process. Attempts at training have had limited success, likely because most physicians infrequently complete death certificates.

All US jurisdictions currently use electronic death registration systems (EDRS), web-based applications that allow physicians to certify death certificates online. These systems, combined with modern machine learning and artificial intelligence (AI) technology present an opportunity to provide real-time training or feedback to physicians through the EDRS user interface as they complete death certificates, essentially coaching them by analyzing input, flagging potentially unsuitable causes or improbable causal sequences, and prompting for further information when the context suggests it is needed.

Conceptually, the Certificate of Death Agent (CODA) is an AI-enabled coach that would be made available to EDRS systems through a web service. CODA would analyze the cause of death information entered by physicians into the EDRS and suggest improvements such as:

Flagging mechanisms of death or other ill-defined conditions
Prompting physicians when entering an incorrect causal sequence
Prompting physicians when an underlying cause of death is an intermediate or immediate cause

The ultimate goal of this project is not to replace the physician’s medical judgment in the death certification process, but rather to coach the physician through an unfamiliar task.

Innovation In Action

After executing a data sharing agreement, Michigan resident deaths from 2022 were flagged with immediate or intermediate UCODs were provided to Cal Poly (12623 records). This was limited to Immediate or Intermediate causes—scenarios where an intelligent agent might prompt a physician to provide more information or correct a causal sequence of events.

Leveraging a dataset of over 12,000 Michigan death certificates with immediate or intermediate underlying causes, the DxHub team rapidly prototyped and tested CODA. The system analyzes the free-text causes of death entered by physicians and provides real-time feedback. CODA identifies issues such as invalid causes, improbable causal sequences, or nonspecific conditions, and then guides physicians to provide more precise, medically valid entries without overriding their professional judgment.

Technical Solution

CODA is built on Amazon Web Services (AWS) and at its core, the system uses a REST API hosted on Amazon API Gateway as the entry point for communications. Each submission from a physician triggers an AWS Lambda function, which routes the cause-of-death chain to the processing components.

The first stage of processing focuses on mapping the free-text medical conditions entered by physicians to ICD-10 codes. This is achieved using a PubMedBERT embeddings model running on Amazon EC2, which performs vector similarity search against medical terminology. To further refine the accuracy of this mapping, CODA incorporates Cohere Rerank 3.5 on Amazon Bedrock, ensuring that ambiguous or complex entries are more reliably aligned with valid ICD-10 codes.

Once conditions are mapped, CODA validates the causal chain in two stages. The system first checks whether each condition is appropriately positioned within the chain, following established rules for what constitutes an underlying cause versus an intermediate or immediate cause. In the second stage, the entire sequence is evaluated against medical relationship data stored in an Amazon Neptune graph database. This ensures that each condition in the chain plausibly leads to the next, preventing impossible or invalid sequences from being accepted without review.

Finally, the results of these validations are transformed into clear, actionable feedback for physicians. Using Amazon Bedrocks Anthropic’s Claude 3.5 Sonnet model on Amazon Bedrock, CODA generates concise prompts that explain errors or suggest corrections. These messages are then delivered back to the physician through API Gateway.

Results

CODA generated 6230 prompts, including 383 errors due to invocation time. Excluding the errors there were 5847 (47.7%) meaningful prompts issued for the 12,263 death certificate causal chains submitted. Table 2 below shows the types of prompts issued by the system.

Table 2. Error Prompt types

Prompt	Description
Invalid Cause	A single condition was entered for cause of death and it was not a valid UCOD code
Suggested Swap	Two or more conditions are listed and a simple swap of two conditions will result in a valid causal chain
Attempted swap	The input chain is incorrect so a swap of conditions was attempted but resulted in an invalid causal chain, so the chain can’t be fixed with a simple swap and needs revision.
Invalid causal chain	The condition in position i of the chain couldn’t be matched to a valid medical cause of death.
Swap not possible	The two conditions in question don’t fit in either order, so the chain can’t be fixed with a simple swap.
No ICDs	The system couldn’t recognize any conditions in your chain to check.

Table 3 shows the number of prompts issued by type of swap.

Table 3. Number of CODA error prompts by type

Error Prompts	n	%
Swap not possible	926	15.8%
Invalid Causal Chain	3026	51.8%
Invalid Cause	1089	18.6%
Suggested swap	422	7.2%
Attempted swap	384	6.6%
Total	5847	100.0%

Significance

These results suggest that an AI-enabled agent or coach could provide real-time guidance to physicians completing death certificates by prompting them for more information.

The invalid cause prompt represents the simplest case where a physician enters a single cause, such as septic shock or cardiac arrest, that does not code to a valid, suitable UCOD code. This simple intervention may have resulted in 1089 additional suitable UCOD codes in 2022, nearly 1% of all births that year. Taken together, all of the error problems might have improved the quality of Michigan mortality data by adding up to 5847 additional suitable cause of death codes, improving the quality and utility of Michigan mortality data for public health, policy, and research. This number could be considered a floor, and not a ceiling, as additional development, combined with advances in machine learning and natural language processing techniques will increase the value of this approach.

As an initial proof-of-concept, this project demonstrated the ability to develop an AI-augmented coach that can be integrated with existing EDRS systems to provide real-time feedback to physicians as they perform the critical function of documenting why a person died, resulting in data that will allow us all to live longer, healthier, lives.

Student Spotlight

Noor Dhaliwal

Software Developer

Supporting Documents

Source Code

All of the code and assets developed during the course of creating the prototype.

About the DxHub

The Cal Poly Digital Transformation Hub (DxHub) is a strategic relationship with Amazon Web Services (AWS) and is the world’s first cloud innovation center supported by AWS on a University campus. The primary goal of the DxHub is to provide students with real-world problem-solving experiences by immersing them in the application of proven innovation methods in combination with the latest technologies to solve important challenges in the public sector. The challenges being addressed cover a wide variety of topics including homelessness, evidence-based policing, digital literacy, virtual cybersecurity laboratories and many others. The DxHub leverages the deep subject matter expertise of government, education, and non-profit organizations to clearly understand the customers affected by public sector challenges and develop solutions that meet the customer needs.