UCL Discovery

Data mining temporal and indefinite relations with numerical dependencies

Collopy, Ethan Richard (1999) Data mining temporal and indefinite relations with numerical dependencies. Doctoral thesis (Ph.D), UCL (University College London). Green open access

Abstract

We propose that data mining, the search for useful, non-trivial and previously unknown information within a database, can be successfully performed with Numerical Dependencies (NDs), a generalisation of Functional Dependencies (FDs), to model the data, together with resampling, a computationally intensive statistical sampling process, which allows us to make inferences from temporal and indefinite databases. We use NDs to model relations containing temporal and indefinite information. We extend the theory of NDs by presenting measures for data mining and generalise the chase procedure, a method for updating a relation to satisfy a constraint set, for NDs. We motivate NDs in real-world applications by introducing a database design tool. The consistency problem, that of attempting to find a relation satisfying a set of FDs within an indefinite relation, known to be NP-complete, is studied in the context of using NDs for approximation. We employ resampling, based on taking samples of definite relations from indefinite ones, on incremental sample sizes until an approximate fixpoint is reached, denoting an upper bound on the required sample size. Extensive simulations highlight that resampling to find upper bounds in conjunction with the chase for indefinite relations returns valid approximate solutions. We also study NDs in temporal sequences of relations for knowledge discovery purposes. Each relation within a sequence is mined for a set of NDs which evolve with updates in data. We introduce a temporal logic for the discovery of rules and properties within these sequences, or subsequences, which includes statistical functions within the temporal operators for time series analysis. We also show that time series data may be analysed using a restricted set of the logic. We apply discovery algorithms to both sequences and resampled sequences, allowing smoothing for trend detection. Investigations, presented herein, show these rules to provide interesting and practicable results.
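
As a reading aid for the abstract's central notion, the sketch below illustrates the usual definition of a numerical dependency: X determines Y "up to k" when every X-value occurs with at most k distinct Y-values, so a functional dependency is the case k = 1. It is not code from the thesis. The function names (max_fanout, satisfies_nd) and the toy lecturer/course relation are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def max_fanout(rows, lhs, rhs):
    """Largest number of distinct rhs-values paired with any single lhs-value."""
    seen = defaultdict(set)
    for row in rows:
        key = tuple(row[a] for a in lhs)
        seen[key].add(tuple(row[a] for a in rhs))
    return max((len(values) for values in seen.values()), default=0)


def satisfies_nd(rows, lhs, rhs, k):
    """True if the relation satisfies the ND lhs ->(k) rhs; k = 1 is an FD."""
    return max_fanout(rows, lhs, rhs) <= k


# Hypothetical toy relation, not data from the thesis.
rows = [
    {"lecturer": "ann", "course": "db"},
    {"lecturer": "ann", "course": "ai"},
    {"lecturer": "bob", "course": "db"},
]

print(satisfies_nd(rows, ["lecturer"], ["course"], 1))
print(satisfies_nd(rows, ["lecturer"], ["course"], 2))
```

In this toy relation, lecturer does not functionally determine course, but the weaker dependency with k = 2 holds. The smallest k for which a dependency holds is one simple quantity a mining procedure could report, which is the sense in which NDs generalise FDs.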

Type: Thesis (Doctoral)
Qualification: Ph.D
Title: Data mining temporal and indefinite relations with numerical dependencies
Open access status: An open access version is available from UCL Discovery
Language: English
Additional information: Thesis digitised by ProQuest.
Keywords: Applied sciences; Knowledge discovery
URI: https://discovery.ucl.ac.uk/id/eprint/10107566
Downloads since deposit: 61
