Statistical Fraud Detection
Our goal has been to design statistically principled fraud detection
algorithms that are able to keep up with the stream of transactions, use
all the information about each customer, and detect fraud reliably and
accurately with acceptable false alarm rates, even if the customer compromised
by fraud has made only one or two transactions in the past. The quickest
way to process the data transaction-by-transaction is to build the
algorithms on top of software that is designed to process transactions
for billing. So, instead of starting a bill for a new customer and then
updating it with each new transaction that the customer makes, we start
a signature of predicted usage behavior for each customer, update it with
each non-suspicious transaction that the customer makes, score transactions
for fraud using predicted behavior for the customer as the baseline, and
then accumulate transaction scores into account fraud scores that are updated
with each new transaction that the customer makes. All this is done with
each transaction, so it has to be fast enough to keep up with the flow
of transactions. Thus, not only do we have to cope with tremendous
variability across the customer base and high turnover among customers (so
that at any point in time there is little information on most customers), but
we also have to live with the limited numerical capabilities of the computers
that handle transaction processing for billing.
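To make that flow concrete, here is a minimal, self-contained sketch of the transaction-by-transaction loop. Everything in it (the Signature class, the single call-duration feature, the weights and threshold) is an illustrative assumption, not the production design, which runs inside billing software and tracks many more signature components.

    import math
    from collections import defaultdict

    WEIGHT = 0.05     # hypothetical EWMA weight for signature updates
    DECAY = 0.9       # hypothetical decay when accumulating scores
    THRESHOLD = 8.0   # hypothetical alarm level for the account score

    class Signature:
        """Predicted distribution of one categorical feature (here, a coarse
        call-duration bucket) for one customer."""

        def __init__(self, categories):
            p = 1.0 / len(categories)
            self.probs = {c: p for c in categories}   # start from a flat prior

        def score(self, category):
            # Surprise of the observed category under the predicted behavior.
            return -math.log(self.probs.get(category, 1e-6))

        def update(self, category):
            # Exponentially weighted moving average toward the new observation.
            for c in self.probs:
                target = 1.0 if c == category else 0.0
                self.probs[c] = (1 - WEIGHT) * self.probs[c] + WEIGHT * target

    signatures = {}                      # one signature per customer
    account_scores = defaultdict(float)  # running fraud score per account

    def process(customer_id, duration_bucket):
        sig = signatures.setdefault(customer_id,
                                    Signature(["short", "medium", "long"]))
        tx_score = sig.score(duration_bucket)
        # Accumulate transaction scores into the account-level fraud score.
        account_scores[customer_id] = DECAY * account_scores[customer_id] + tx_score
        # Only non-suspicious transactions update the signature, so fraud is
        # not folded into the estimate of legitimate behavior.
        if account_scores[customer_id] < THRESHOLD:
            sig.update(duration_bucket)
        return account_scores[customer_id]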
Every step of fraud detection is a fascinating, nonstandard statistical
problem. Some background on fraud from both a conventional perspective
and our statistical perspective is described in the paper
Detecting Fraud in the Real World[PDF][PostScript].
Here are
a few of the major statistical issues that arise in fraud detection, and
a sketch of our approach to coping with them.
- Massive Model Selection
Our algorithms are based on tracking the behavior of each customer
with a multivariate probability distribution that can be used to predict
the customer's next legitimate call. We call our estimate of this distribution
a signature because it captures (ideally, at least) all that is
known about the customer's current transaction behavior. But monitoring
all possible variables and all their possible interactions for all customers
in complete detail would require too much space and probably too much processing.
Monitoring the same set of marginal and conditional distributions for all
customers also simplifies processing. Thus, signature design involves choosing
a set of marginal and conditional distributions that is best for detecting
fraud subject to constraints on space. Statistically speaking, we have
to select the form of the model to represent each customer's current behavior,
choosing the one that is best for distinguishing fraudulent from legitimate
behavior. The criterion for "best" that we adopt is "best for many and good
for most customers" rather than "best averaged over all customers". The
details are described in the paper Reducing Transaction Databases,
Without Lagging Behind the Data or Losing Information[PDF][PostScript].
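One way to read the "best for many and good for most customers" criterion, sketched below under our own assumptions: summarize each candidate signature design by quantiles of a per-customer performance measure instead of its average, so a design cannot win by doing very well for a few customers while failing the rest. The performance numbers and the particular quantiles are placeholders, not the paper's actual metrics.

    def quantile(values, q):
        """Empirical q-th quantile of a list (simple order-statistic version)."""
        ordered = sorted(values)
        i = min(int(q * len(ordered)), len(ordered) - 1)
        return ordered[i]

    def many_and_most(per_customer_perf, many_q=0.5, most_q=0.1):
        """Summarize one candidate design by (median, lower decile) of its
        per-customer performance: a high median means it is good for many
        customers, a high lower decile means it is still acceptable for most."""
        return (quantile(per_customer_perf, many_q),
                quantile(per_customer_perf, most_q))

    def pick_design(candidates):
        """candidates maps design name -> list of per-customer performance
        scores (e.g. detection rates at a fixed false-alarm rate). Prefer the
        design with the best median, breaking ties by the lower decile."""
        return max(candidates, key=lambda name: many_and_most(candidates[name]))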
- Initialization of Signatures
To catch fraud quickly, we need to be able to assign a meaningful
initial signature or predictive distribution to new customers. We have
designed new initialization methods that are based on at most a few transactions
for the customer. These methods resemble clustering or customer segmentation,
but with two important differences. First, each marginal and conditional
distribution in the signature is initialized separately, which gives a
huge number of possible multivariate predictive distributions. This is
justifiable because the model selection stage retains the important dependencies
in signature components. Second, the metrics for optimizing initial distributions
are based on the idea of "best for many" and "good for most" rather than
on "best averaged over all customers". Our methods are described
in more detail in the paper Reducing Transaction Databases,
Without Lagging Behind the Data or Losing Information[PDF][PostScript].
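The sketch below illustrates only the component-wise idea: one signature component is initialized by matching a customer's first transaction or two to one of a handful of segment-level distributions. The segment distributions are made-up placeholders, and the matching rule here is plain likelihood rather than the "best for many, good for most" metrics of the paper.

    import math

    # Hypothetical segment-level distributions for one signature component
    # (a coarse call-duration bucket).
    SEGMENTS = {
        "mostly_short": {"short": 0.7, "medium": 0.2, "long": 0.1},
        "mostly_long":  {"short": 0.1, "medium": 0.3, "long": 0.6},
        "balanced":     {"short": 0.34, "medium": 0.33, "long": 0.33},
    }

    def init_component(first_observations):
        """Pick the segment distribution under which the customer's first few
        observed values are most likely, and copy it as the initial component."""
        def loglik(dist):
            return sum(math.log(dist.get(x, 1e-6)) for x in first_observations)
        best = max(SEGMENTS, key=lambda name: loglik(SEGMENTS[name]))
        return dict(SEGMENTS[best])   # copy: the signature then evolves on its own

    # Because every component (day-of-week, duration, destination, ...) is
    # initialized separately, a new customer can combine, say, a "mostly_long"
    # duration profile with a weekend-heavy day-of-week profile.
    duration_component = init_component(["long", "long"])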
- Incremental Updating Algorithms
The predictive model for each customer is updated with each transaction
that the customer makes. Thus, the customer eventually moves from a customer
segment defined by its initial distributions to a personalized segment
that reflects its individual behavior. Standard dynamic updating algorithms
(like exponentially weighted moving averaging) can often be used if the variable
being updated can be modeled as a random draw from its signature component.
There are some hard questions that updating raises, though, especially
when updating has to be fast and space efficient.
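A toy illustration of that drift under exponentially weighted moving averaging: after n updates with weight w, the initial segment-level value carries weight (1 - w)^n, so the signature gradually comes to reflect the customer's own behavior. The numbers below are arbitrary.

    w = 0.05                          # hypothetical updating weight
    mean = 4.0                        # segment-level average call length, minutes
    calls = [12.0, 9.5, 14.0, 11.0]   # this customer's own calls

    for x in calls:
        mean = (1 - w) * mean + w * x          # one EWMA update per transaction

    print(round(mean, 2))                      # drifts upward from 4.0
    print(round((1 - w) ** len(calls), 3))     # 0.815: weight left on the segment value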
- Chronological Variables
These are periodic timing variables, such as day-of-week or hour-of-day.
They are not observed at random but "in order", so all the Monday transactions
for the week have to be observed before all the Tuesday transactions for
the week, for example. We have used a dynamic Poisson process to derive
an approximation to a posterior mean that is almost as easy to compute
as a linear estimate and that predicts well on both real and simulated
data. This work is described in the paper Estimating Millions of
Dynamic Patterns in Real-Time[PDF][PostScript] in JASA, March 2001.
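The sketch below shows only the "observed in order" bookkeeping, with each day's rate updated once that day closes; it is not the posterior-mean approximation derived in the paper, and the weight and counts are arbitrary.

    W = 0.2   # hypothetical weight on the newest week

    # Expected number of calls on each day of the week for one customer.
    rates = {d: 3.0 for d in ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]}

    def close_day(day, calls_made):
        """Called once per calendar day, after all of that day's transactions,
        so Monday is fully observed before Tuesday is touched."""
        rates[day] = (1 - W) * rates[day] + W * calls_made

    # One observed week of daily call counts for this customer.
    for day, n in [("Mon", 5), ("Tue", 2), ("Wed", 0), ("Thu", 4),
                   ("Fri", 7), ("Sat", 1), ("Sun", 0)]:
        close_day(day, n)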
- Quantiles
Incremental estimation of quantiles is difficult because quantiles
are not linear estimates and the quantile of a small sample is unstable
or even undefined for a sample of size one (the relevant sample size when
updating occurs transaction-by-transaction). We have successfully
used a variant of stochastic approximation that we call exponentially weighted
stochastic approximation to track quantiles dynamically, though. This work
is described in the paper Incremental Quantile Estimation for
Massive Tracking[PDF][PostScript] that appears in the Proceedings
of KDD2000.
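For illustration, here is a bare-bones single-observation quantile tracker in the stochastic-approximation spirit. The exponentially weighted stochastic approximation of the paper is more refined (it works on small batches and also tracks a local density estimate), so treat this only as a sketch.

    import random

    def track_quantile(stream, p=0.9, step=0.5, q0=0.0):
        """Nudge the estimate up by step*p when an observation exceeds it and
        down by step*(1-p) otherwise; the two moves balance exactly at the p-th
        quantile, and the fixed step lets the estimate follow a drifting stream."""
        q = q0
        for x in stream:
            q = q + step * p if x > q else q - step * (1 - p)
            yield q

    # Example: track the 90th percentile of exponential "call durations".
    random.seed(0)
    durations = (random.expovariate(1 / 5.0) for _ in range(5000))
    for q in track_quantile(durations, p=0.9):
        pass
    print(round(q, 2))   # hovers near the true 90th percentile (about 11.5 here)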
- Updating Quantized Probabilities
With limited space, each reported probability can assume only a finite
number of values (say 256 if each probability is allowed one byte). To avoid
unstable estimates, though, the weight used for exponentially weighted moving
averaging often has to be smaller than the difference between adjacent
probability values. If so, standard exponential weighting can never move the
stored probabilities away from their initial values. To avoid stagnant
probabilities, we use a form of stochastic quantization that is unbiased.
This work is described in the paper Reducing Transaction Databases,
Without Lagging Behind the Data or Losing Information[PDF][PostScript].
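Here is a sketch of the unbiased stochastic quantization idea under our own simplifications: the exact exponentially weighted update is rounded up or down at random, with rounding probabilities chosen so that the stored one-byte value is right on average even when the weight is smaller than the spacing between adjacent byte values.

    import random

    LEVELS = 255   # one byte: stored probabilities are k / 255, k = 0..255
    W = 0.001      # hypothetical EWMA weight, smaller than 1 / 255

    def stochastic_round(p):
        """Round p * 255 up with probability equal to its fractional part,
        so the expected stored value equals p exactly."""
        scaled = p * LEVELS
        k = int(scaled)
        if random.random() < scaled - k:
            k += 1
        return k / LEVELS

    def update(stored_p, observed):
        """observed is 0 or 1: whether this transaction fell in the cell."""
        exact = (1 - W) * stored_p + W * observed
        return stochastic_round(exact)

    # Rounding to the nearest level would snap every update back to the stored
    # value; stochastic rounding moves it only occasionally, but with the right
    # long-run frequency, so the probability is not stuck at its initial value.
    p = 10 / 255
    for _ in range(10_000):
        p = update(p, 1)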
For further information about this work, contact Diane Lambert, José Pinheiro
or Don Sun.