Abstraction and structure in knowledge discovery in databases.
Doctoral thesis, UCL (University College London).
Knowledge discovery in databases (KDD) is the field of computer science that combines computational, statistical and mathematical techniques to create systems that support the process of finding new knowledge in databases. This thesis investigates adaptability and reusability techniques, namely abstraction and structure tactics, applicable to software solutions in the field of KDD. The research is driven by two business problems in operational loss, specifically fraud and system failure.
Artificial Intelligence (AI) [1] techniques are increasingly being applied to complex and dynamic business processes. Business processes that require analytical processing of large volumes of data, data analysis driven by complex and rapidly changing domain knowledge, knowledge management and knowledge discovery are examples of problems that would typically be addressed using AI techniques. To control business, data and software complexity, the industry has responded with a wide variety of products that have specific software architectures and user environments and that include the latest AI trends. Specific fields of research such as KDD have been created to address the challenges of supporting the discovery of new knowledge from raw data using AI-related techniques (e.g. data mining). Despite all this academic and commercial effort, solutions for specific business processes suffer from adaptability, flexibility and reusability limitations. The solutions' software architectures and user interface environments are not adaptable and flexible enough to cope with business process changes or with the need to reuse accumulated knowledge (i.e. prior analyses). Consequently, the lifetime of some of these solutions is severely reduced, or increasing effort is required to keep them running. This research is driven by a specific business domain and is conducted in two phases.
The first phase focuses on a single intelligent and analytical system solution and aims to capture the specific problem domain requirements that drive the definition of a business-domain-specific KDD reference architecture. Through a case study, the semantics of fraud detection are analysed in detail and the elements, components and services of an intelligent and analytical fraud detection system are investigated.
The second phase carries the architectural observations from the first phase over to the more generic and wider KDD challenges, defines an operational loss domain model and a reference architecture, and tests the reuse of that architecture in a different type of operational loss business problem. Software-related KDD challenges are reviewed and addressed in the reference architecture. A second application is analysed through a second case study and is used to test and refine the architecture. This application is in the domain of detection and prevention of operational loss due to data-related system failure. The software architectures defined in the different phases of this research are analysed using the Architecture Tradeoff Analysis Method (ATAM) [2] in order to evaluate risks and compare their adaptability, flexibility and reusability properties.
This thesis makes the following contributions. It constitutes one of the first investigations of adaptability and reusability in business-domain-specific KDD software architecture from an abstraction and structure viewpoint. It defines the TRANSLATIONAL architectural style for high data volume and intensive data analysis systems, which supports the balancing of flexibility, reusability and performance. Using the TRANSLATIONAL architectural style, it defines and implements OL-KDA, a reference architecture that can be applied to problems in operational loss, namely fraud and data-related system failure, and that supports the complexity and dynamicity challenges.
It develops and implements a method for supporting data, dataflow and rules in KDD pre-processing and post-processing tasks. It defines a data manipulation and maintenance model that favours performance and adaptability in specific KDD tasks. Two substantial case studies were developed and analysed in order to understand, and subsequently test, the defined techniques and reference architecture in business domains.
[1] AI: Artificial Intelligence techniques are used in computer science to mimic or use aspects of human behaviour within information systems.
[2] ATAM: an architecture analysis method that focuses on analysing quality attributes and use cases.
Title: Abstraction and structure in knowledge discovery in databases
Additional information: Permission for digitisation not received
UCL classification: UCL > School of BEAMS > Faculty of Engineering Science > Computer Science