You are a bank or an insurer. You need to pool data to, say, build a credit model or develop speech-recognition software. Or maybe you need to update your customer app by pulling everyone’s personal data from their smartphone to the cloud.
If you’re a very large bank or insurer, you may have enough data to do all this by yourself. But mid-sized players don’t. And even the largest firms are finding it harder to access data that is meant to be private, secure, and perhaps subject to data-localization laws.
And even if you can harness all of that data, you may lack the sheer bandwidth to move it from one server or device to another and back again.
Google faced this issue with updating app software on all the Android phones out there. Two years ago, its engineers came up with an idea: instead of trying to get their hands on everyone’s data, why not leave the data in place?
Instead, they built a model to measure how the data would respond to a Google algorithm (in this case, an app update). Data points would be analyzed for how they respond to the algo’s parameters, but the data itself would remain private and untouched. To further protect the privacy of the underlying data, encryption was added to the transmission of the parameters back to Google.
Put these together, and you have federated artificial intelligence: insights into data based on federating disparate, protected data points, without disrupting privacy, security, or where the data is held.
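For readers who think in code, here is a minimal sketch of that idea, not Google’s or WeBank’s actual implementation. It assumes a toy one-parameter model and leaves out the encryption step; the function names `local_update` and `federated_round` and all the numbers are hypothetical. Each data holder adjusts the shared parameter against its own private records and sends back only the adjusted parameter, which a coordinator averages.

```python
# Illustrative sketch only: hypothetical names, toy data, no encryption.

def local_update(weight, private_data, learning_rate=0.1):
    """Take one training step on a holder's own data; return only the parameter."""
    # Toy model: y is approximated by weight * x, scored by mean squared error.
    grad = sum(2 * (weight * x - y) * x for x, y in private_data) / len(private_data)
    return weight - learning_rate * grad

def federated_round(weight, clients):
    """Average the holders' updated parameters, weighted by how much data each has."""
    total = sum(len(data) for data in clients)
    return sum(local_update(weight, data) * len(data) for data in clients) / total

# Three holders of private (x, y) records that all follow y = 3x.
clients = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(0.5, 1.5)],
    [(1.5, 4.5), (1.0, 3.0), (2.0, 6.0)],
]

weight = 0.0
for _ in range(50):
    # Only parameters cross the wire; the records never leave their holders.
    weight = federated_round(weight, clients)

print(round(weight, 4))
```

The coordinator ends up with a parameter close to 3, the pattern common to all three data sets, without ever seeing a single record.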
In an interview with DigFin, Yang Qiang, head of the A.I. team at WeBank in Shenzhen, says the trick to analyzing data you can’t touch involves two mathematical functions.
The first is the loss function. This is basically measuring errors, or the “cost” associated with a mathematical event. It’s a longstanding statistical tool used by insurers to model benefits versus premiums, or by banks to estimate risks of losing money on a transaction.
Loss functions can be powerful when applied to real-world situations (that is, empirical, measurable experience), but they are too often based on academic hunches, as the models of too many banks and investors were in the run-up to 2008.
The second tool is what computer scientists call gradient functions. A gradient is a derivative: it measures the rate of change of a function and, in this case, the direction in which a model’s parameters should move to reduce the loss.
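A small sketch may make the two functions concrete. It assumes mean squared error as the loss and a one-parameter model, both chosen purely for illustration; the names `loss` and `gradient` and the data are hypothetical, not anything Yang described.

```python
def loss(weight, data):
    """Mean squared error of the toy model y = weight * x: the 'cost' of being wrong."""
    return sum((weight * x - y) ** 2 for x, y in data) / len(data)

def gradient(weight, data):
    """Derivative of the loss with respect to weight: its rate and direction of change."""
    return sum(2 * (weight * x - y) * x for x, y in data) / len(data)

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # toy records following y = 2x

weight = 0.0
loss_before = loss(weight, data)

# Move the parameter against the gradient, the direction in which the loss falls.
weight -= 0.05 * gradient(weight, data)
loss_after = loss(weight, data)

print(loss_before > loss_after)
```

At the best-fitting parameter (here 2.0) the gradient is zero, which is how the procedure knows there is nothing left to correct.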
So to put this together in banker-speak: the math geniuses can test far-flung data in a Monte Carlo sequence to figure out which ones have the greatest tracking error against which scenario.
What it does
That way they can test an algorithm, like “will this program update everybody’s software”, or “will this program tell me whether this loan product is priced to make the bank money”.
What federated A.I. can’t tell you is anything about the data itself or the people or companies it involves. It can’t tell you that Lucy is a good bet for your new life insurance policy, or that Widgets’R’Us should be declined a loan.
Rather, it allows companies to run big-data analytics on data sets that are otherwise out of reach. All data comes with bias. Every institution has some kind of distortion or imbalance in its data sets. Federated A.I. broadens the data pool, which smooths out a lot of these biases and gives users a more accurate view of “the world”. That helps developers make algorithms – for trades, loans, premiums – that are more accurate than they could make if they relied on in-house data alone.
“In the digital economy, data is like money,” Yang said. “When you share data, it’s like a joint investment. Machine learning and data mining let you extract knowledge from the data. Raw data is not useful, but if you have enough of it, you can still extract knowledge.”
Data sets and standards
Google has been testing federated A.I. for the B2C world of phone apps. This year, WeBank, the digital bank under Tencent, has begun backing federated A.I. for the B2B world. Both of these initiatives are open-sourced (WeBank’s uses Linux). Anyone can see the code and contribute to it.
For smaller institutions, federated A.I. opens up access to data for testing algos that would otherwise be available only to the biggest corporations. It also allows them to tap “long-tail” data, from individuals or small businesses that would not be viewed as useful to banks serving a smaller number of large clients.
Because the concept is new, however, it has a long way to go.
The first challenge is how to set standards for exchanging data without breaching privacy and security. Google has designed one model and WeBank another, but there’s no definition yet for how to measure and exchange data.
Again in banker-speak: the world needs a SWIFT for messaging around data. Otherwise a company that wants to exchange data with WeBank may find itself in an apples-and-oranges dilemma.
The more the merrier
The second challenge, specific to B2B federated A.I., is the need to involve vast amounts of data. In other words, the federation needs lots of members and contributors willing to share their data with the model (that is, let the data be treated, without being revealed or moved).
Yang says it is possible to reap benefits with only a few contributors. But there are different outcomes, depending on what is to be measured.
Think of it as a grid. On one axis is the number of users or customers whose data is being reviewed. Google’s Android work is 100% on this axis: lots of users, each tested for a tiny amount of data (so that a mobile phone is enough to participate).
The other axis is features (a banker might call these “factors”): credit histories, income levels, doctor visits.
If a federation includes just one bank and one insurance company or healthcare provider, there will be some overlap. Some might share the same customers. Or they might all be interested in the same factor.
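The grid can be pictured with a toy example. The customer IDs, feature names and the two firms’ holdings below are invented for illustration; the point is only that two members of a federation can overlap along either axis.

```python
# Hypothetical holdings: which customers and which features each firm has.
bank = {
    "users": {"u1", "u2", "u3", "u4"},
    "features": {"income", "credit_history"},
}
insurer = {
    "users": {"u3", "u4", "u5"},
    "features": {"income", "doctor_visits"},
}

# Overlap on the user axis: customers both firms serve.
shared_users = bank["users"] & insurer["users"]

# Overlap on the feature axis: factors both firms record.
shared_features = bank["features"] & insurer["features"]

# What a federation could draw on for the shared customers.
combined_features = bank["features"] | insurer["features"]

print(sorted(shared_users), sorted(shared_features), sorted(combined_features))
```

Where the firms share customers, a federated model can draw on more factors per customer; where they share factors, it can draw on more customers per factor.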
Yang says user data begins to add value with just 2,000 sources, as can data from just a handful of seemingly unrelated corporations. But WeBank is keen to get as many companies as possible around the world to join, particularly in financial services and healthcare.