Insurance Part 3: The Data Foundation

02.23.2026


The "AI Adoption" Series: Where We Are

Part 1: We established the Business Strategy (Outcomes: Lower Combined Ratio, Profitable Growth).
Part 2: We defined the Subordinate Strategies (IT enables, HR upskills, Ops redesigns).

Now we arrive at the fuel. The most sophisticated AI engine in the world will seize up if you pour sand into the gas tank. For insurers, that "sand" is the vast, messy, siloed mass of legacy data.

To make AI work, we must move from the abstract buzzword of "Big Data" to the concrete reality of data Utility, Hygiene, and Accessibility.

The Industry Reality: Drowning in Documents

Insurance is unique among industries because its primary product is a contract—a document. As a result, the industry is sitting on a mountain of unstructured information that traditional databases cannot read.

The Unstructured Problem: Accenture estimates that 80% of insurance data is unstructured (PDFs, emails, images, adjuster notes).
The Utilization Gap: Because this data is hard to reach, it is often ignored. Research suggests that while insurers collect massive amounts of data, they analyze only a fraction of it for decision-making.
The Cost of "Bad Data": Poor data quality is not just annoying; it is expensive. Gartner estimates that poor data quality costs the average organization $12.9 million annually. In insurance, this manifests as "leakage"—paying claims you shouldn't because the data needed to deny them was trapped in a PDF attached to a different file.

The Strategic Imperative:

Your goal is not to "clean all the data." That is an impossible, multi-year money pit. Your goal is to clean the specific data required to execute the Business Strategy defined in Part 1.

The Strategy Template: Hygiene, Accessibility, Lineage

To build a data foundation that supports AI, you need to execute on three specific fronts.

1. Data Hygiene: Defining the Truth

"Cleaning data" is not just about fixing typos. It is about semantic consistency (Ontology).

The Problem: In System A, "Cyber" is a standalone policy type. In System B, "Cyber" is a rider attached to a General Liability policy. When you ask an AI, "What is our total Cyber exposure?", it will give you a wrong answer because it cannot reconcile the two definitions.
The Fix: You must establish a Data Dictionary for the critical fields related to your strategy. If your strategy is "Automate Small Commercial Quotes," you must rigorously define and standardize "Revenue," "Employee Count," and "SIC Code" across every system that touches that customer.
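To make the "Cyber" example concrete, here is a minimal sketch of what one Data Dictionary entry can look like once it is enforced in code. Everything in it is an assumption made for illustration: the row layouts for System A and System B, the field names, the "CYB" rider code, and the choice to count both standalone policies and riders as Cyber exposure. Your own dictionary will define these differently.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CyberExposure:
    """Canonical (hypothetical) definition shared by every source system."""
    policy_id: str
    source_system: str
    coverage_form: str  # "standalone" or "rider"
    limit_usd: float


def from_system_a(row: dict) -> Optional[CyberExposure]:
    """System A (assumed layout): Cyber is its own policy type."""
    if row["policy_type"] != "CYBER":
        return None
    return CyberExposure(row["policy_id"], "A", "standalone", float(row["limit"]))


def from_system_b(row: dict) -> List[CyberExposure]:
    """System B (assumed layout): Cyber is a rider on a General Liability policy."""
    return [
        CyberExposure(row["policy_no"], "B", "rider", float(rider["limit"]))
        for rider in row.get("riders", [])
        if rider["code"] == "CYB"
    ]


def total_cyber_exposure(system_a_rows: List[dict], system_b_rows: List[dict]) -> float:
    """Sum limits over the canonical records, regardless of where they originated."""
    records: List[CyberExposure] = []
    for row in system_a_rows:
        record = from_system_a(row)
        if record is not None:
            records.append(record)
    for row in system_b_rows:
        records.extend(from_system_b(row))
    return sum(record.limit_usd for record in records)
```

The point of the sketch is the shape, not the numbers: each system gets its own small translation function, and the question "What is our total Cyber exposure?" is only ever asked of the canonical records.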

2. Data Accessibility: The "Wrapper" Strategy

As discussed in the IT Subordinate Strategy (Part 2), you cannot wait until your mainframes are replaced. You must access the data where it lives.

The Problem: Data is often "trapped" in legacy systems that update only through overnight batch processes, while AI needs real-time data.
The Fix: Build an API Layer (or "Wrapper") that sits on top of the legacy systems. This layer pulls data out of the mainframe, standardizes it (applying the Hygiene rules), and presents it to the AI models in a modern format (JSON/REST). This allows you to modernize your data access without risking a "rip and replace" of the core system.
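The core of a wrapper is a translation step, which can be sketched in a few lines. The example below assumes a hypothetical fixed-width record layout (the field names, column positions, and the convention that revenue is stored in cents are all invented for illustration) and shows only the "pull, standardize, present as JSON" part. In practice this function would sit behind a REST endpoint in whatever framework your IT team already runs.

```python
import json

# Assumed fixed-width layout of one legacy policy record: (field, start, end).
LEGACY_LAYOUT = [
    ("policy_id", 0, 10),
    ("sic_code", 10, 14),
    ("employee_count", 14, 20),
    ("revenue_cents", 20, 32),
]


def parse_legacy_record(line: str) -> dict:
    """Slice a fixed-width record and apply simple hygiene rules to each field."""
    raw = {name: line[start:end].strip() for name, start, end in LEGACY_LAYOUT}
    return {
        "policyId": raw["policy_id"],
        "sicCode": raw["sic_code"].zfill(4),               # keep leading zeros
        "employeeCount": int(raw["employee_count"]),       # text to integer
        "revenueUsd": int(raw["revenue_cents"]) / 100,     # cents to dollars
    }


def to_json(line: str) -> str:
    """Present the standardized record in the format the AI models consume."""
    return json.dumps(parse_legacy_record(line), sort_keys=True)
```

Because the hygiene rules live in one place, every consumer of the wrapper sees the same definition of "Revenue," "Employee Count," and "SIC Code," and the core system itself is never modified.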

3. Data Lineage: The Governance Anchor

This is where the "Governance" underpinning becomes critical.

The Problem: An AI model predicts a high risk of fire for a specific property. The underwriter asks, "Why?"
The Fix: You must have Data Lineage—a clear audit trail that shows exactly where the data came from. Did the model see a permit for "wood stove installation" in a municipal database? Or did it hallucinate based on a similar address?
Practical Rule: If you cannot trace the data back to its source, you cannot use it for automated decision-making.
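The Practical Rule can be enforced mechanically rather than by policy memo. The sketch below is one possible shape, not a standard: the `Lineage` and `Feature` classes, their fields, and the source name "municipal_permits" are all hypothetical. The idea is simply that every input to a model carries its origin with it, and anything without an origin blocks automation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Lineage:
    """Where a value came from (all field names are illustrative)."""
    source_system: str   # e.g. "municipal_permits"
    record_id: str
    retrieved_at: str    # ISO 8601 timestamp


@dataclass(frozen=True)
class Feature:
    """One model input, optionally tagged with its lineage."""
    name: str
    value: object
    lineage: Optional[Lineage] = None


def untraceable(features: List[Feature]) -> List[str]:
    """Names of inputs that cannot be traced back to a source."""
    return [f.name for f in features if f.lineage is None]


def can_automate(features: List[Feature]) -> bool:
    """Apply the rule: no lineage, no automated decision."""
    return not untraceable(features)
```

When the underwriter asks "Why?", the answer is then a lookup: the wood stove permit points to a specific record in a specific source, and an input with no lineage is routed to a human instead of being decided automatically.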

The Direction: From Historical to Real-Time

The nature of the data itself is changing.

Current State: Insurers rely on Internal, Historical Data (loss runs, past claims, demographic tables). This tells you what happened.
Future State: Leading insurers are shifting to External, Real-Time Data.
- Property: Using satellite imagery to see if a roof is damaged before writing the policy.
- Workers' Comp: Using wearable IoT data to see if a worker is lifting with poor posture before an injury occurs.
The Trend: The most valuable data for your AI will likely not come from your own database. It will come from third-party APIs (weather, geospatial, economic).

Next Step: Making the Data Speak

You now have the Strategy (Part 1), the Team (Part 2), and the Fuel (Part 3). Now we need to ignite it.

In Insurance Part 4, we will discuss Analytics & Machine Learning. We will cover how to use this clean data to build predictive models that find patterns humans miss—moving from "What happened?" to "What will happen?"

Salvatore Magnone is a father, a veteran, a repeat co-founder (a repeat offender in the best way), and a long-time collaborator at DOOR3. Sal builds successful multinational technology companies and runs obstacle courses. He teaches business and military strategy at the university level and directly to entrepreneurs and military leaders.

https://www.linkedin.com/in/salmagnone/