How does Sprout.ai use Large Language Models (LLMs)?

Here at Sprout.ai, we use large language models (LLMs) in several innovative ways to automate insurance claims processing and detect fraud. They are a key technology for streamlining the insurance claim process, making it faster, more accurate, and less susceptible to errors.

This not only benefits insurers by reducing operational costs and improving efficiency, but also enhances the customer experience by speeding up claim resolutions.

What are Large Language Models (LLMs)?

Large Language Models (LLMs) are artificial intelligence systems trained on vast amounts of text to understand and generate human-like language.

Most LLMs are built on the transformer architecture, a deep learning design that processes each word in relation to all the other words in the input, rather than one word at a time.
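To make the idea of relating every word to every other word more concrete, here is a minimal sketch of scaled dot-product self-attention, the core operation in a transformer. It is a toy illustration only: the two-dimensional "embeddings" are made-up values, and it leaves out the learned projections, multiple heads, and stacked layers that real models use.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Scaled dot-product self-attention without learned projections.

    Each output vector is a weighted average of all input vectors, with
    weights derived from how similar each pair of word vectors is.
    """
    dim = len(vectors[0])
    outputs = []
    for query in vectors:
        # Compare this word with every word in the sentence, itself included.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
        )
    return outputs

# Toy embeddings for a three-word sentence (illustrative values only).
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = self_attention(sentence)
```

Every output vector mixes information from the whole sentence at once, which is what lets these models use context rather than reading strictly word by word.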

This allows LLMs to generate coherent, contextually relevant text based on the input they receive. They are highly effective in tasks such as language translation, content generation, summarisation, and, as in the case of Sprout.ai, processing unstructured data from documents in multiple languages.

Here’s a detailed look at how we’re using these models:

1. Natural language processing (NLP)

Unstructured data refers to information that does not follow a predefined model or format, which is common in the insurance industry where claims forms and supporting documents can vary widely.

We use NLP technologies to interpret unstructured data from insurance claim documents. This includes reading and understanding text from PDFs, images, and handwritten notes, including in languages with non-Latin scripts such as Japanese. Sprout.ai’s systems use NLP to identify and categorise key data points from this unstructured text, such as claim amounts, dates, policy numbers, and specific terms relevant to the insurance claim being processed.

The ability of LLMs to extract and process this data is how we identify relevant information quickly and accurately.
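As a simplified picture of what "identifying key data points" means, the sketch below pulls a policy number, a claim amount, and a date out of a free-text note. It deliberately swaps in regular expressions for a language model, so it only handles one rigid layout. The patterns, the `POL-` prefix, and the sample note are all hypothetical.

```python
import re

# Hypothetical patterns; real claim documents vary far more than this.
PATTERNS = {
    "policy_number": re.compile(r"\bPOL-\d{6}\b"),
    "claim_amount": re.compile(r"£\s?(\d{1,3}(?:,\d{3})*(?:\.\d{2})?)"),
    "incident_date": re.compile(r"\b(\d{4}-\d{2}-\d{2})\b"),
}

def extract_fields(text):
    """Return the first match for each field, or None when it is absent."""
    fields = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match is None:
            fields[name] = None
        else:
            # Use the captured group when the pattern defines one.
            fields[name] = match.group(1) if pattern.groups else match.group(0)
    return fields

note = "Policy POL-123456: water damage on 2024-03-14, repair quoted at £1,250.00."
fields = extract_fields(note)
```

The output is a small structured record that downstream steps can work with. A model-based extractor aims to produce the same kind of record from documents whose wording and layout are not known in advance.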

To handle multiple languages, our NLP models are trained on diverse multilingual datasets. This training enables the models to understand and process text in various languages with high accuracy. The system can identify the language of the text automatically and apply the appropriate NLP model tailored for that language, ensuring effective processing without manual intervention.
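The routing step described above can be sketched as follows. This is a rough illustration under strong assumptions: it guesses the script from Unicode character names rather than performing true language identification (kanji alone cannot distinguish Japanese from Chinese), and the pipeline names are hypothetical placeholders.

```python
import unicodedata

# Hypothetical pipeline names, used only to show the routing idea.
PIPELINES = {"latin": "latin-script-pipeline", "japanese": "japanese-pipeline"}

def detect_script(text):
    """Guess the dominant script by counting characters per script.

    Ties, including empty input, fall back to "latin" because it is
    listed first in the counts dictionary.
    """
    counts = {"latin": 0, "japanese": 0}
    for char in text:
        name = unicodedata.name(char, "")
        if name.startswith(("HIRAGANA", "KATAKANA", "CJK UNIFIED")):
            counts["japanese"] += 1
        elif name.startswith("LATIN"):
            counts["latin"] += 1
    return max(counts, key=counts.get)

def choose_pipeline(text):
    """Pick the processing pipeline for a document without manual input."""
    return PIPELINES[detect_script(text)]

english_route = choose_pipeline("Claim form for water damage")
japanese_route = choose_pipeline("保険金請求書")
```

A production system would more likely rely on a trained language-identification model, but the control flow is the same: detect first, then dispatch.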

2. Optical character recognition (OCR)

Next, we use advanced OCR technologies, which are integrated with LLMs, to convert different images and document formats into machine-readable text. This step is how we digitise all relevant documentation associated with insurance claims.

Sprout.ai’s OCR is optimised for claims processing and is capable of recognising text across various formats and from different document types. This includes everything from standardised forms to freeform notes and even handwritten text.

The OCR technology we use is enhanced with machine learning to improve its accuracy and efficiency over time, especially in handling documents in non-Latin scripts like Japanese, which are typically more challenging for standard OCR systems.
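One small, concrete example of why non-Latin scripts need extra care: text read from Japanese documents often contains full-width letters, digits, and spaces that look like their ASCII counterparts but are different characters. The sketch below shows a possible post-OCR clean-up step using Unicode NFKC normalisation from Python's standard library. It assumes the OCR engine has already returned a string. The sample text is invented, and this is not a description of Sprout.ai's own pipeline.

```python
import unicodedata

def normalise_ocr_text(raw):
    """Fold full-width and other compatibility characters to standard forms."""
    folded = unicodedata.normalize("NFKC", raw)
    # Collapse the runs of whitespace that OCR output often contains.
    return " ".join(folded.split())

# Invented OCR output mixing full-width characters with kanji.
raw = "ＰＯＬ－１２３４５６　　請求額　１２，５００円"
clean = normalise_ocr_text(raw)
```

After this step, identifiers and amounts can be matched against records in the same way as text from Latin-script documents, while the Japanese characters themselves are left untouched.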

3. Data verification and enrichment

Once the data is extracted, LLMs are used to enrich and verify this information. Our systems cross-reference the extracted data with external databases, in compliance with GDPR, to validate claims and detect potential fraud. This includes checking the data against weather reports, geolocation data, business directories, and medication databases to ensure the claims are consistent with real-world events and information.
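A cross-reference check of this kind might look like the sketch below, which compares a storm-damage claim with weather records for the same place and date. The in-memory dictionary is a hypothetical stand-in for an external weather source, and the locations and dates are invented.

```python
from datetime import date

# Hypothetical stand-in for an external weather data source.
WEATHER_RECORDS = {
    ("Leeds", date(2024, 3, 14)): {"storm": True},
    ("Leeds", date(2024, 3, 15)): {"storm": False},
}

def verify_storm_claim(location, incident_date, records=WEATHER_RECORDS):
    """Return 'consistent', 'inconsistent', or 'unverified' for a claim."""
    record = records.get((location, incident_date))
    if record is None:
        # No external evidence either way, so nothing can be concluded.
        return "unverified"
    return "consistent" if record["storm"] else "inconsistent"

result = verify_storm_claim("Leeds", date(2024, 3, 14))
```

Keeping "unverified" separate from "inconsistent" matters: missing external data is not evidence of fraud, only a reason to look elsewhere.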

4. Policy checking

We also use LLMs to interpret the specific terms and conditions of insurance policies. The models relate the extracted data to the relevant policy, checking coverage limits, inclusions, exclusions, and excesses, so that each claim is assessed according to the correct policy terms.
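Once policy terms have been interpreted into structured form, the check itself can be quite mechanical. The sketch below applies an exclusion list, a coverage limit, and an excess to a claim. The `Policy` fields and the order of operations (cap at the limit, then subtract the excess) are assumptions for illustration, since real policies differ on both.

```python
from dataclasses import dataclass, field

@dataclass
class Policy:
    """Hypothetical structured view of a policy's key terms."""
    coverage_limit: float
    excess: float
    exclusions: set = field(default_factory=set)

def assess(policy, cause, amount):
    """Return (covered, payable) for a claim under the given policy."""
    if cause in policy.exclusions:
        return False, 0.0
    # Assumed order: cap at the coverage limit, then deduct the excess.
    payable = max(0.0, min(amount, policy.coverage_limit) - policy.excess)
    return True, payable

policy = Policy(coverage_limit=5000.0, excess=250.0, exclusions={"flood"})
covered, payable = assess(policy, "fire", 1250.0)
```

The harder part, and the one where language models help, is getting from the wording of a policy document to a structured record like `Policy` in the first place.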

5. Predictive modelling

Deep learning models, the broader family of techniques to which LLMs belong, are used to predict the outcome of claims based on historical data and the insurer’s claims handling philosophy. These models are trained to recommend whether a claim should be approved, estimate the likely amount to be disbursed, and identify any claims that require more detailed human review.
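The decision logic around such a model can be sketched as a score plus a threshold: confident cases get a recommendation, and everything else goes to a person. In the example below, a hand-weighted logistic score stands in for a trained deep learning model. The feature names, weights, and threshold are all made up for illustration.

```python
import math

# Made-up weights standing in for a model trained on historical claims.
WEIGHTS = {"amount_thousands": -0.4, "prior_claims": -0.8, "documents_verified": 2.0}
BIAS = 1.0

def approval_probability(features):
    """Logistic score between 0 and 1 for a claim's feature values."""
    z = BIAS + sum(weight * features[name] for name, weight in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))

def recommend(features, approve_above=0.8):
    """Recommend approval only when the score is high; otherwise escalate."""
    probability = approval_probability(features)
    action = "approve" if probability >= approve_above else "human_review"
    return action, probability

simple_claim = {"amount_thousands": 1.25, "prior_claims": 0, "documents_verified": 1}
complex_claim = {"amount_thousands": 8.0, "prior_claims": 2, "documents_verified": 0}
simple_action, _ = recommend(simple_claim)
complex_action, _ = recommend(complex_claim)
```

The threshold is where an insurer's claims handling philosophy shows up: raising `approve_above` sends more claims to human review, while lowering it automates more decisions.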

6. Synthetic data

We use synthetic data to train our AI models. Typically, training an AI requires a comprehensive dataset that demonstrates what good outcomes look like. This process involves extensive data cleaning, labelling, and normalisation.

Accessing enough high-quality data can be difficult, as it is often stored in dispersed, unstructured, hard-to-access formats. To address this, we use smaller samples of synthetic data to train the AI more quickly.

For example, if Sprout’s AI needs to learn about a specific type of document, we can create multiple synthetic versions of this document to enhance training. Instead of the AI seeing the document format once, it can see it hundreds of times through these synthetic replicas. This accelerated learning process helps the AI to recognise and process similar documents more effectively in the future.

LLMs come in here in a couple of ways. They can analyse and interpret large volumes of text data, helping to clean, label, and normalise it more efficiently. They are also how we generate textual data that mimics real-world examples, such as forms.
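To illustrate how one document format can become hundreds of labelled training examples, the sketch below fills a fixed template with random values. It uses simple template filling rather than an LLM, and the template, value ranges, and `POL-` prefix are hypothetical. The point is that each synthetic note comes with its labels already attached, which avoids a separate labelling pass.

```python
import random

# Hypothetical note layout; an LLM could generate far more varied wording.
TEMPLATE = "Policy {policy}: {cause} on {date}, repair quoted at £{amount:,.2f}."
CAUSES = ["water damage", "storm damage", "theft"]

def synthetic_notes(count, seed=0):
    """Generate (text, labels) pairs; a fixed seed keeps runs repeatable."""
    rng = random.Random(seed)
    notes = []
    for _ in range(count):
        labels = {
            "policy": f"POL-{rng.randrange(10**6):06d}",
            "cause": rng.choice(CAUSES),
            "date": f"2024-{rng.randint(1, 12):02d}-{rng.randint(1, 28):02d}",
            "amount": round(rng.uniform(100, 5000), 2),
        }
        notes.append((TEMPLATE.format(**labels), labels))
    return notes

dataset = synthetic_notes(200)
```

Because the values are generated rather than copied from real claims, a dataset like this contains no customer data.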

Conclusion

By applying LLMs and related deep learning techniques to natural language processing (NLP), optical character recognition (OCR), data verification and enrichment, policy checking, predictive modelling, and synthetic data generation, we’ve created a highly efficient, accurate, and scalable solution for managing complex, multilingual claims.

This not only streamlines the insurance claim process, making it faster and reducing the likelihood of human error, but also delivers considerable cost savings by automating routine tasks and enhancing fraud detection.