© 1992 by Institute of Mathematics and its Applications
Developing rule-based systems for credit-card applications from data with the genetic algorithm
Computer Studies and Mathematics Department, Bristol Polytechnic, Bristol BS16 1QY
Received on 1 July 1991.
Learning a set of rules from data is essentially a problem of classifying the available fields of data and combining them to give the best prediction of the goal variable. Information theory supplies a suitable cost function for this problem in the form of information entropy. Limiting each field to a relatively small number of classes, and allowing the user to define when the predictive value of a class can be considered irrelevant, helps to avoid generating a set of rules that embodies much of the irrelevant information contained in the data.
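As an illustration of an entropy-based cost, the following minimal sketch computes the entropy of the goal variable that remains once a field has been split into classes; a lower value indicates a more predictive classification. The function names and the toy data are assumptions made for this example and are not taken from the system described here, whose exact cost function is not given.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of goal-variable values."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def conditional_entropy(classes, goal):
    """Entropy of the goal that remains after grouping records by class.

    `classes` holds the class assigned to each record for one field and
    `goal` holds the corresponding goal-variable values (hypothetical names).
    """
    n = len(goal)
    groups = {}
    for c, g in zip(classes, goal):
        groups.setdefault(c, []).append(g)
    # Weight the entropy inside each class by the share of records it holds.
    return sum(len(gs) / n * entropy(gs) for gs in groups.values())


# Toy data: a classification that separates the goal values perfectly
# leaves no uncertainty, while an uninformative one leaves it unchanged.
goal = ["good", "good", "bad", "bad"]
informative = conditional_entropy([0, 0, 1, 1], goal)
uninformative = conditional_entropy([0, 1, 0, 1], goal)
```

Under these assumptions, a search procedure would prefer the first classification, since it leaves less uncertainty about the goal variable.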
The genetic algorithm uses a technique analogous to natural evolution to search a large space of possible solutions for a near-optimal one. The search begins by evaluating a number of randomly generated candidate solutions from the space. It then repeatedly selects pairs of these solutions with a probability proportional to their value, forms new solutions from the pairs using operators such as crossover and mutation, evaluates the new solutions, and replaces old solutions with them. Such a search requires thousands of evaluations in order to converge. To accelerate the process, a system has been built on a large multi-processing computer and will run on a Transputer-based parallel database engine.
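The search loop described above can be sketched as follows. This is a minimal, serial illustration and not the parallel system itself: the bit-string encoding, the parameter values, the choice to replace the current worst solution, and the toy fitness function (the number of 1 bits) are all assumptions made for the example.

```python
import random


def genetic_search(fitness, n_bits, pop_size=30, generations=200,
                   mutation_rate=0.01, seed=0):
    """Toy genetic algorithm over bit strings; fitness must be non-negative."""
    rng = random.Random(seed)
    # Evaluate a randomly generated initial population.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    scores = [fitness(ind) for ind in pop]
    for _ in range(generations):
        # Select a pair with probability proportional to value
        # (the small constant guards against all-zero weights).
        weights = [s + 1e-9 for s in scores]
        mum, dad = rng.choices(pop, weights=weights, k=2)
        # One-point crossover followed by bit-flip mutation.
        cut = rng.randrange(1, n_bits)
        child = mum[:cut] + dad[cut:]
        child = [1 - b if rng.random() < mutation_rate else b for b in child]
        # Evaluate the new solution and replace an old one (here: the worst).
        worst = min(range(pop_size), key=scores.__getitem__)
        pop[worst] = child
        scores[worst] = fitness(child)
    best = max(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


# Toy run: maximise the number of 1 bits in a 20-bit string.
best, best_score = genetic_search(sum, n_bits=20)
```

In a rule-learning setting the fitness function would instead score a candidate classification of the data fields, for example with an entropy-based measure. Because each evaluation is independent of the others, this is also the step that lends itself to parallel execution.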
The rule-based systems are being built in collaboration with TSB Trustcard, and are an application of current research to problems of credit control.