NLP for efficiency: explained by Kornelia Papp

In a recent speech, Swiss Re’s Kornelia Papp describes how data scientists combine language, technology and insurance.

Take six words. How many sentences can you make out of these: you, than, I, insurance, love, more?

We can make four or more sentences from these six words.

I love you more than insurance.
You love insurance more than I.
I love insurance more than you.
I love insurance more than you.

Hang on...isn’t that just three sentences?

Add emphases, though, and now we have multiple meanings. Consider:

I love insurance more than you (do).
I love insurance more than (I love) you.

The differences are obvious to humans in conversation, but not to a computer. This example shows that language, while natural to us, is actually difficult and complex.
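A simple way to see the computer's problem is a bag-of-words representation, a common baseline that counts which words appear and throws away their order. The sketch below is an illustration only, not a method attributed to Papp: the three distinct sentences above produce identical word counts, and emphasis never reaches plain text at all.

```python
from collections import Counter


def bag_of_words(sentence: str) -> Counter:
    """Lowercase the sentence, drop the final full stop, and count each word."""
    return Counter(sentence.lower().rstrip(".").split())


sentences = [
    "I love you more than insurance.",
    "You love insurance more than I.",
    "I love insurance more than you.",
]
bags = [bag_of_words(s) for s in sentences]

# Every sentence uses the same six words exactly once, so the bags are identical...
print(all(bag == bags[0] for bag in bags))  # True
# ...even though there are three different sentences with different meanings.
print(len(set(sentences)))  # 3
```

Word order (and, in speech, emphasis) carries the meaning that this representation discards, which is why NLP systems need richer models than word counts.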

Natural language processing (NLP) is a branch of artificial intelligence that combines computer science and linguistics to allow computers to achieve humanlike levels of communication. Specialists such as Kornelia Papp, a Zurich-based cognitive data scientist at Swiss Re, are deploying NLP to help the industry become far more efficient.

This article is based on a presentation Papp delivered at an insurance conference in Hong Kong in October.

Reinventing human language
How is it that humans understand one another, given the complexities of language? Well, sometimes we don’t. But when we’re conversing, our brains process not just the words we hear, but also how we view the speaker, how they move and how their expressions change. We have a lot of shared knowledge that puts speech in context, and we are always learning new slang and idioms.

This human reality is why NLP is still often a poor tool for insurers seeking to efficiently rifle through and categorize zillions of documents. But it’s getting better, making it more realistic for insurers to use NLP to do useful things. These include, with varying degrees of success:

  • Identifying a document’s language (French, Japanese, etc.)
  • Classifying or categorizing a piece of text
  • Detecting fraud
  • Determining sentiment or opinions
  • Retrieving information
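The first task on the list can be sketched in a few lines. This is a toy, assuming tiny hypothetical stopword lists plus a check for Japanese kana; production language identifiers typically rely on trained character n-gram models instead.

```python
import re

# Hypothetical, deliberately tiny stopword lists; real systems use far more data.
STOPWORDS = {
    "english": {"the", "and", "of", "is", "to", "in"},
    "french": {"le", "la", "les", "et", "de", "est", "un", "une"},
    "german": {"der", "die", "das", "und", "ist", "ein", "eine"},
}

# Hiragana and katakana occupy U+3040 to U+30FF; their presence signals Japanese.
KANA = re.compile(r"[\u3040-\u30ff]")


def identify_language(text: str) -> str:
    """Guess a document's language from its script and stopword overlap."""
    if KANA.search(text):
        return "japanese"
    tokens = set(re.findall(r"\w+", text.lower()))
    scores = {lang: len(tokens & words) for lang, words in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    # If no stopword matched at all, admit that we do not know.
    return best if scores[best] > 0 else "unknown"


print(identify_language("Le contrat est signé et la prime est payée."))  # french
print(identify_language("The premium is due in March."))  # english
print(identify_language("保険の契約"))  # japanese
```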

The first basic method to make this happen is called optical character recognition (OCR). “It’s not sexy, but OCR is important,” Papp said. “Half of our data is in the form of documents and text, and we need to make these machine-readable.”

OCR is a machine’s ability to recognize characters, such as letters of the alphabet or Chinese characters, in an image of text. With deep learning, OCR systems can also identify logos, footers and special characters.
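Modern OCR relies on deep learning, as Papp notes. The sketch below instead uses the much older technique of template matching, purely to show the basic idea of mapping a pattern of pixels to a character; the 5x3 bitmaps are hypothetical.

```python
# Hypothetical 5x3 bitmaps: '#' is ink, '.' is blank paper.
TEMPLATES = {
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
}


def hamming(a: list[str], b: list[str]) -> int:
    """Count the pixels that differ between two bitmaps of the same size."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def recognize(glyph: list[str]) -> str:
    """Return the character whose template is closest to the scanned glyph."""
    return min(TEMPLATES, key=lambda ch: hamming(TEMPLATES[ch], glyph))


# A 'T' with one stray ink pixel, as a scanner might produce.
noisy_t = ["###", ".#.", ".#.", "##.", ".#."]
print(recognize(noisy_t))  # T
```

Template matching breaks down quickly with new fonts, sizes and handwriting, which is one reason learned models took over.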

The second method is table extraction. “Understanding a table is not the same as understanding free text,” Papp said. “I need to keep its structure.” This becomes difficult when a table contains merged cells, rows or columns.
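The merged-cell problem can be illustrated with a sketch that unfolds a table into a plain grid. It assumes cells arrive with HTML-style rowspan and colspan counts (a hypothetical input format; in practice these would come from an earlier table-detection step) and copies each merged value into every position it spans, so each row can still be read on its own.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    text: str
    rowspan: int = 1  # number of rows this cell is merged across
    colspan: int = 1  # number of columns this cell is merged across


def expand_table(rows: list[list[Cell]]) -> list[list[str]]:
    """Unfold merged cells into a rectangular grid, copying each merged value
    into every position it covers so row/column structure is preserved."""
    grid: dict[tuple[int, int], str] = {}
    for r, row in enumerate(rows):
        c = 0
        for cell in row:
            # Skip positions already occupied by a cell merged down from above.
            while (r, c) in grid:
                c += 1
            for dr in range(cell.rowspan):
                for dc in range(cell.colspan):
                    grid[(r + dr, c + dc)] = cell.text
            c += cell.colspan
    n_rows = max(r for r, _ in grid) + 1
    n_cols = max(c for _, c in grid) + 1
    return [[grid.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)]


# A header with merged cells; the figures are made up for illustration.
table = [
    [Cell("Line of business", rowspan=2), Cell("Premium", colspan=2)],
    [Cell("Year 1"), Cell("Year 2")],
    [Cell("Property"), Cell("100"), Cell("110")],
]
for row in expand_table(table):
    print(row)
```

After expansion, the value “100” sits in a column whose header reads both “Premium” and “Year 1”, which is the structure Papp says must be kept.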

Third: topic modeling, which determines the subject matter of a report or a news story.
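Topic models proper, such as latent Dirichlet allocation, learn topics as word distributions across a whole collection of documents. The sketch below swaps in a much cruder heuristic, purely to show the idea of inferring subject matter from word statistics: it ranks the most frequent non-stopwords in a single story. The stopword list and the news snippet are made up.

```python
import re
from collections import Counter

# Hypothetical minimal stopword list; real pipelines use much longer ones.
STOPWORDS = {"the", "and", "this", "were", "more", "across", "of", "to", "in", "is"}


def top_keywords(text: str, k: int = 3) -> list[str]:
    """Return the k most frequent content words, a crude proxy for the subject."""
    tokens = [
        t for t in re.findall(r"[a-z]+", text.lower())
        if t not in STOPWORDS and len(t) > 2
    ]
    # Counter.most_common keeps first-seen order among words with equal counts.
    return [word for word, _ in Counter(tokens).most_common(k)]


story = (
    "Flood losses rose sharply this year. Insurers paid flood claims across "
    "the region, and flood defences were tested. Claims adjusters expect more losses."
)
print(top_keywords(story))  # ['flood', 'losses', 'claims']
```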

Fourth: image processing, a form of unstructured data processing.

Tailoring NLP for insurance
One area where data scientists have done well is document classification; Papp says models are now more than 90% accurate, meaning only a small portion of documents need a person to verify what they’re about.
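Classification with a person checking the leftovers can be sketched as a classifier that reports a confidence, plus a threshold below which a document is sent for human review. The sketch assumes a multinomial naive Bayes model, made-up training snippets and a hypothetical 0.9 confidence threshold (a confidence cut-off, not the same thing as the 90% accuracy figure); the article does not say which models Swiss Re uses.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs: list[str], labels: list[str]) -> "NaiveBayes":
        self.doc_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict_proba(self, text: str) -> dict[str, float]:
        n_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label, counts in self.word_counts.items():
            total = sum(counts.values())
            score = math.log(self.doc_counts[label] / n_docs)  # log prior
            for w in tokenize(text):
                if w in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[w] + 1) / (total + len(self.vocab)))
            log_scores[label] = score
        # Turn log scores into probabilities without overflow.
        top = max(log_scores.values())
        exp = {label: math.exp(s - top) for label, s in log_scores.items()}
        z = sum(exp.values())
        return {label: v / z for label, v in exp.items()}


def route(model: NaiveBayes, text: str, threshold: float = 0.9) -> tuple[str, str]:
    """Accept confident predictions; send the rest to a person."""
    probs = model.predict_proba(text)
    label = max(probs, key=probs.get)
    return label, ("auto" if probs[label] >= threshold else "human review")


# Hypothetical training snippets.
docs = [
    "claim for water damage to kitchen after pipe burst",
    "claim notification storm damage to roof",
    "policy renewal premium schedule attached",
    "renewal of policy with updated premium",
]
labels = ["claim", "claim", "policy", "policy"]
model = NaiveBayes().fit(docs, labels)

print(route(model, "storm damage claim for roof"))  # ('claim', 'auto')
print(route(model, "premium for roof damage"))      # ('claim', 'human review')
```

Raising the threshold sends more documents to people but lets fewer errors through; where to set it is a business decision as much as a technical one.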

Work is now focusing on the next level up: ontologies, that is, sets of concepts and categories in a subject area together with the relationships among them. In other words: context.

For example, insurers want their computers to understand a legal document, say a contract to be negotiated. Does the computer understand the relationships among sentences and clauses? This requires building a huge library of all the potential phrases and synonyms. A contract, for instance, may also be referred to as an agreement or an article.
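A tiny slice of such a library might pair a synonym table with a few ontology relations. Everything below is hypothetical: the concept names, the triples and the helper functions illustrate the idea and are not Swiss Re’s ontology. Mapping different wordings to one concept lets a system treat “agreement” and “contract” alike, while the relations record how concepts connect.

```python
# Hypothetical synonym table: surface wording -> canonical concept.
SYNONYMS = {
    "contract": "contract",
    "agreement": "contract",
    "article": "contract",  # equivalence mentioned in the text
    "policy": "insurance_policy",
    "cover": "insurance_policy",
    "clause": "clause",
    "provision": "clause",
}

# Hypothetical ontology facts as (subject, relation, object) triples.
RELATIONS = {
    ("insurance_policy", "is_a", "contract"),
    ("clause", "part_of", "contract"),
}


def normalize(term: str) -> str:
    """Map a word to its canonical concept, or leave it unchanged if unknown."""
    return SYNONYMS.get(term.lower(), term.lower())


def is_a(child: str, parent: str) -> bool:
    """Follow is_a links transitively (assumes the hierarchy has no cycles)."""
    child, parent = normalize(child), normalize(parent)
    if child == parent:
        return True
    return any(
        is_a(obj, parent)
        for subj, rel, obj in RELATIONS
        if subj == child and rel == "is_a"
    )


print(normalize("Agreement"))         # contract
print(is_a("cover", "agreement"))     # True
print(is_a("provision", "contract"))  # False: a clause is part of a contract, not a kind of one
```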

Why not just use Google or Baidu for this? Because Google trains its algorithms on general sources such as news and Wikipedia, in which insurance jargon doesn’t appear or has a different meaning. Word meanings can also change over time: the language used in a 20-year-old life insurance contract may have evolved, and modern phenomena that could affect a contract’s value may not be reflected in its original wording (e.g., “Brexit” won’t show up in contracts or legal documents more than two years old).

“We need to train machines on our own data,” Papp said. “Words don’t have meaning on their own; they have meaning in context.”
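The gap between old training text and new language can be made concrete with a vocabulary check. A minimal sketch, assuming a model whose vocabulary is fixed by its training corpus (the contract snippets are made up): any word absent from that corpus, such as “Brexit” in an archive of older contracts, is simply unknown to the model. Modern systems soften this, for example with subword tokenization, but a model still only knows the language its training data contained.

```python
import re


def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))


def build_vocabulary(documents: list[str]) -> set[str]:
    """Collect every word that appears in a training corpus."""
    vocab: set[str] = set()
    for doc in documents:
        vocab |= words(doc)
    return vocab


def unknown_words(text: str, vocabulary: set[str]) -> list[str]:
    """List the words in a new text that the corpus never contained."""
    return sorted(words(text) - vocabulary)


# Hypothetical snippets standing in for an archive of older contracts.
old_contracts = [
    "The insured shall notify the insurer of any change in risk.",
    "Premiums are payable annually in advance.",
]
vocab = build_vocabulary(old_contracts)
print(unknown_words("The insurer shall notify the insured of Brexit", vocab))  # ['brexit']
```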

As data science improves, NLP is creating ways for insurers to save time, and saving time means saving money. Expensive lawyers don’t need to be paid for the basic task of sorting through documents. Digitizing speech and text can also save agents time by speeding up customer processing.

Copyright © 2017 Digital Finance Media Limited. All rights reserved.
