Statistical Fraud Detection
Our goal has been to design statistically principled fraud detection
algorithms that are able to keep up with the stream of transactions, use
all the information about each customer, and detect fraud reliably and
accurately with acceptable false alarm rates, even if the customer compromised
by fraud has made only one or two transactions in the past. The quickest
way to process the data transaction-by-transaction is to build the
algorithms on top of software that is designed to process transactions
for billing. So, instead of starting a bill for a new customer and then
updating it with each new transaction that the customer makes, we start
a signature of predicted usage behavior for each customer, update it with
each non-suspicious transaction that the customer makes, score transactions
for fraud using predicted behavior for the customer as the baseline, and
then accumulate transaction scores into account fraud scores that are updated
with each new transaction that the customer makes. All this is done with
each transaction, so it has to be fast enough to keep up with the flow
of transactions. Thus, not only do we have to cope with tremendous
variability across the customer base and high turnover among customers (so
that at any point in time there is little information on most customers), but
we also have to live with the limited numerical capabilities of the computers
that handle transaction processing for billing.
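To make that flow concrete, here is a minimal, self-contained sketch of the transaction-by-transaction loop. Everything in it (the Signature class, the single call-duration feature, the weights and threshold) is an illustrative assumption, not the production design, which runs inside billing software and tracks many more signature components.

    import math
    from collections import defaultdict

    WEIGHT = 0.05     # hypothetical EWMA weight for signature updates
    DECAY = 0.9       # hypothetical decay when accumulating scores
    THRESHOLD = 8.0   # hypothetical alarm level for the account score

    class Signature:
        """Predicted distribution of one categorical feature (here, a coarse
        call-duration bucket) for one customer."""

        def __init__(self, categories):
            p = 1.0 / len(categories)
            self.probs = {c: p for c in categories}   # start from a flat prior

        def score(self, category):
            # Surprise of the observed category under the predicted behavior.
            return -math.log(self.probs.get(category, 1e-6))

        def update(self, category):
            # Exponentially weighted moving average toward the new observation.
            for c in self.probs:
                target = 1.0 if c == category else 0.0
                self.probs[c] = (1 - WEIGHT) * self.probs[c] + WEIGHT * target

    signatures = {}                      # one signature per customer
    account_scores = defaultdict(float)  # running fraud score per account

    def process(customer_id, duration_bucket):
        sig = signatures.setdefault(customer_id,
                                    Signature(["short", "medium", "long"]))
        tx_score = sig.score(duration_bucket)
        # Accumulate transaction scores into the account-level fraud score.
        account_scores[customer_id] = DECAY * account_scores[customer_id] + tx_score
        # Only non-suspicious transactions update the signature, so fraud is
        # not folded into the estimate of legitimate behavior.
        if account_scores[customer_id] < THRESHOLD:
            sig.update(duration_bucket)
        return account_scores[customer_id]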
Every step of fraud detection is a fascinating, nonstandard statistical
problem. Some background on fraud from both a conventional perspective
and our statistical perspective is described in the paper
Detecting Fraud in the Real World[PDF][PostScript].
Here are
a few of the major statistical issues that arise in fraud detection, and
a sketch of our approach to coping with them.
- Massive Model Selection
Our algorithms are based on tracking the behavior of each customer
with a multivariate probability distribution that can be used to predict
the customer's next legitimate call. We call our estimate of this distribution
a signature because it captures (ideally, at least) all that is
known about the customer's current transaction behavior. But monitoring
all possible variables and all their possible interactions for all customers
in complete detail would require too much space and probably too much processing.
Monitoring the same set of marginal and conditional distributions for all
customers also simplifies processing. Thus, signature design involves choosing
a set of marginal and conditional distributions that is best for detecting
fraud subject to constraints on space. Statistically speaking, we have
to select the form of the model to represent each customer's current behavior,
choosing the one that is best for distinguishing fraudulent from legitimate
behavior. The criterion for "best" that we adopt is "best for many and good
for most customers" rather than "best averaged over all customers". The
details are described in the paper Reducing Transaction Databases,
Without Lagging Behind the Data or Losing Information[PDF][PostScript].
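One way to read the "best for many and good for most customers" criterion, sketched below under our own assumptions: summarize each candidate signature design by quantiles of a per-customer performance measure instead of its average, so a design cannot win by doing very well for a few customers while failing the rest. The performance numbers and the particular quantiles are placeholders, not the paper's actual metrics.

    def quantile(values, q):
        """Empirical q-th quantile of a list (simple order-statistic version)."""
        ordered = sorted(values)
        i = min(int(q * len(ordered)), len(ordered) - 1)
        return ordered[i]

    def many_and_most(per_customer_perf, many_q=0.5, most_q=0.1):
        """Summarize one candidate design by (median, lower decile) of its
        per-customer performance: a high median means it is good for many
        customers, a high lower decile means it is still acceptable for most."""
        return (quantile(per_customer_perf, many_q),
                quantile(per_customer_perf, most_q))

    def pick_design(candidates):
        """candidates maps design name -> list of per-customer performance
        scores (e.g. detection rates at a fixed false-alarm rate). Prefer the
        design with the best median, breaking ties by the lower decile."""
        return max(candidates, key=lambda name: many_and_most(candidates[name]))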
- Initialization of Signatures
To catch fraud quickly, we need to be able to assign a meaningful
initial signature or predictive distribution to new customers. We have
designed new initialization methods that are based on at most a few transactions
for the customer. These methods resemble clustering or customer segmentation,
but with two important differences. First, each marginal and conditional
distribution in the signature is initialized separately, which gives a
huge number of possible multivariate predictive distributions. This is
justifiable because the model selection stage retains the important dependencies
in signature components. Second, the metrics for optimizing initial distributions
are based on the idea of "best for many" and "good for most" rather than
on "best averaged over all customers". Our methods are described
in more detail in the paper Reducing Transaction Databases,
Without Lagging Behind the Data or Losing Information[PDF][PostScript].
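The sketch below illustrates only the component-wise idea: one signature component is initialized by matching a customer's first transaction or two to one of a handful of segment-level distributions. The segment distributions are made-up placeholders, and the matching rule here is plain likelihood rather than the "best for many, good for most" metrics of the paper.

    import math

    # Hypothetical segment-level distributions for one signature component
    # (a coarse call-duration bucket).
    SEGMENTS = {
        "mostly_short": {"short": 0.7, "medium": 0.2, "long": 0.1},
        "mostly_long":  {"short": 0.1, "medium": 0.3, "long": 0.6},
        "balanced":     {"short": 0.34, "medium": 0.33, "long": 0.33},
    }

    def init_component(first_observations):
        """Pick the segment distribution under which the customer's first few
        observed values are most likely, and copy it as the initial component."""
        def loglik(dist):
            return sum(math.log(dist.get(x, 1e-6)) for x in first_observations)
        best = max(SEGMENTS, key=lambda name: loglik(SEGMENTS[name]))
        return dict(SEGMENTS[best])   # copy: the signature then evolves on its own

    # Because every component (day-of-week, duration, destination, ...) is
    # initialized separately, a new customer can combine, say, a "mostly_long"
    # duration profile with a weekend-heavy day-of-week profile.
    duration_component = init_component(["long", "long"])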
- Incremental Updating Algorithms
The predictive model for each customer is updated with each transaction
that the customer makes. Thus, the customer eventually moves from a customer
segment defined by its initial distributions to a personalized segment
that reflects its individual behavior. Standard dynamic updating algorithms
(like exponentially weighted moving averaging) can often be used if the variable
being updated can be modeled as a random draw from its signature component.
There are some hard questions that updating raises, though, especially
when updating has to be fast and space efficient.
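A toy illustration of that drift under exponentially weighted moving averaging: after n updates with weight w, the initial segment-level value carries weight (1 - w)^n, so the signature gradually comes to reflect the customer's own behavior. The numbers below are arbitrary.

    w = 0.05                          # hypothetical updating weight
    mean = 4.0                        # segment-level average call length, minutes
    calls = [12.0, 9.5, 14.0, 11.0]   # this customer's own calls

    for x in calls:
        mean = (1 - w) * mean + w * x          # one EWMA update per transaction

    print(round(mean, 2))                      # drifts upward from 4.0
    print(round((1 - w) ** len(calls), 3))     # 0.815: weight left on the segment value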
- Chronological Variables
These are periodic timing variables, such as day-of-week or hour-of-day.
They are not observed at random but "in order", so all the Monday transactions
for the week have to be observed before all the Tuesday transactions for
the week, for example. We have used a dynamic Poisson process to derive
an approximation to a posterior mean that is almost as easy to compute
as a linear estimate and that predicts well on both real and simulated
data. This work is described in the paper Estimating Millions of
Dynamic Patterns in Real-Time[PDF][PostScript] in JASA, March 2001.
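The sketch below shows only the "observed in order" bookkeeping, with each day's rate updated once that day closes; it is not the posterior-mean approximation derived in the paper, and the weight and counts are arbitrary.

    W = 0.2   # hypothetical weight on the newest week

    # Expected number of calls on each day of the week for one customer.
    rates = {d: 3.0 for d in ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]}

    def close_day(day, calls_made):
        """Called once per calendar day, after all of that day's transactions,
        so Monday is fully observed before Tuesday is touched."""
        rates[day] = (1 - W) * rates[day] + W * calls_made

    # One observed week of daily call counts for this customer.
    for day, n in [("Mon", 5), ("Tue", 2), ("Wed", 0), ("Thu", 4),
                   ("Fri", 7), ("Sat", 1), ("Sun", 0)]:
        close_day(day, n)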
- Quantiles
Incremental estimation of quantiles is difficult because quantiles
are not linear estimates and the quantile of a small sample is unstable
or even undefined for a sample of size one (the relevant sample size when
updating occurs transaction-by-transaction). We have successfully
used a variant of stochastic approximation that we call exponentially weighted
stochastic approximation to track quantiles dynamically, though. This work
is described in the paper Incremental Quantile Estimation for
Massive Tracking[PDF][PostScript] that appears in the Proceedings
of KDD2000.
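For illustration, here is a bare-bones single-observation quantile tracker in the stochastic-approximation spirit. The exponentially weighted stochastic approximation of the paper is more refined (it works on small batches and also tracks a local density estimate), so treat this only as a sketch.

    import random

    def track_quantile(stream, p=0.9, step=0.5, q0=0.0):
        """Nudge the estimate up by step*p when an observation exceeds it and
        down by step*(1-p) otherwise; the two moves balance exactly at the p-th
        quantile, and the fixed step lets the estimate follow a drifting stream."""
        q = q0
        for x in stream:
            q = q + step * p if x > q else q - step * (1 - p)
            yield q

    # Example: track the 90th percentile of exponential "call durations".
    random.seed(0)
    durations = (random.expovariate(1 / 5.0) for _ in range(5000))
    for q in track_quantile(durations, p=0.9):
        pass
    print(round(q, 2))   # hovers near the true 90th percentile (about 11.5 here)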
- Updating Quantized Probabilities
With limited space, each reported probability can assume only a finite
number of values (say 256 if each probability is allowed one byte). To avoid
unstable estimates, though, the weight used for exponentially weighted moving
averaging often has to be smaller than the difference between adjacent
probability values. If so, standard exponential weighting can never move the
stored probabilities away from their initial values. To avoid stagnant
probabilities, we use a form of stochastic quantization that is unbiased.
This work is described in the paper Reducing Transaction Databases,
Without Lagging Behind the Data or Losing Information[PDF][PostScript].
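Here is a sketch of the unbiased stochastic quantization idea under our own simplifications: the exact exponentially weighted update is rounded up or down at random, with rounding probabilities chosen so that the stored one-byte value is right on average even when the weight is smaller than the spacing between adjacent byte values.

    import random

    LEVELS = 255   # one byte: stored probabilities are k / 255, k = 0..255
    W = 0.001      # hypothetical EWMA weight, smaller than 1 / 255

    def stochastic_round(p):
        """Round p * 255 up with probability equal to its fractional part,
        so the expected stored value equals p exactly."""
        scaled = p * LEVELS
        k = int(scaled)
        if random.random() < scaled - k:
            k += 1
        return k / LEVELS

    def update(stored_p, observed):
        """observed is 0 or 1: whether this transaction fell in the cell."""
        exact = (1 - W) * stored_p + W * observed
        return stochastic_round(exact)

    # Rounding to the nearest level would snap every update back to the stored
    # value; stochastic rounding moves it only occasionally, but with the right
    # long-run frequency, so the probability is not stuck at its initial value.
    p = 10 / 255
    for _ in range(10_000):
        p = update(p, 1)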
For further information about this work, contact Diane Lambert, José Pinheiro
or Don Sun.