Cabral de Moura Borges, José Luis
(2000)
A data mining model to capture user web navigation patterns.
Doctoral thesis (Ph.D), UCL (University College London).
Abstract
This thesis proposes a formal data mining model to capture user web navigation patterns. Information characterising the user's interaction with the web is obtained from log files, which provide the data needed to infer navigation sessions. We model a collection of sessions as a hypertext probabilistic grammar (HPG) whose higher-probability strings correspond to the navigation trails preferred by the user. A breadth-first search (BFS) algorithm is provided to find the set of strings with probability above a given cut-point; we call this set of strings the maximal set. The number of iterations performed by the BFS algorithm is shown to vary, on average, linearly with the grammar's number of states. By making use of results in the field of probabilistic regular grammars and Markov chains, the model is given a sound foundation, which we use to study its properties. We also propose the use of entropy to measure the statistical properties of an HPG. Two heuristics are provided to enhance the model's analysis capabilities. The first heuristic implements an iterative deepening search in which the set of rules is incrementally augmented by first exploring the trails with higher probability. A stopping parameter measures the distance between the current rule-set and its corresponding maximal set, giving the analyst control over the number of induced rules. The second heuristic aims at finding a small set of longer rules composed of links with high probability on average. A dynamic threshold is provided whose value is kept proportional to the length of the trail being evaluated. Finally, a set of binary operations on HPGs is defined, giving us the ability to compare the structure of two grammars. The operations defined are intersection, difference, union, and sum.
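The search described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: it assumes a plain first-order Markov model estimated from session counts, with hypothetical artificial states `"S"` (start) and `"F"` (finish), and it treats a trail as a rule when its probability is at least the cut-point and no one-link extension stays at or above it. The function names `build_transition_probs` and `trails_above_cut_point` are illustrative only; the HPG defined in the thesis may weight start states and handle boundary cases differently.

```python
from collections import defaultdict, deque


def build_transition_probs(sessions):
    """Estimate first-order transition probabilities from navigation sessions.

    Assumption: each session is a list of page identifiers; "S" and "F" are
    hypothetical artificial start and finish states added around it.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        path = ["S"] + list(session) + ["F"]
        for src, dst in zip(path, path[1:]):
            counts[src][dst] += 1
    probs = {}
    for src, outgoing in counts.items():
        total = sum(outgoing.values())
        probs[src] = {dst: n / total for dst, n in outgoing.items()}
    return probs


def trails_above_cut_point(probs, cut_point):
    """Breadth-first search for trails with probability >= cut_point.

    A trail's probability is the product of the transition probabilities
    along it, starting from "S". A trail is reported only when no one-link
    extension keeps the probability at or above the cut-point.
    """
    if cut_point <= 0:
        raise ValueError("cut_point must be positive for the search to end")
    results = {}
    queue = deque()
    for page, p in probs.get("S", {}).items():
        if page != "F" and p >= cut_point:
            queue.append(((page,), p))
    while queue:
        trail, p = queue.popleft()
        extended = False
        for nxt, q in probs.get(trail[-1], {}).items():
            if nxt == "F":
                continue  # the finish state ends a session; it is not a page
            if p * q >= cut_point:
                queue.append((trail + (nxt,), p * q))
                extended = True
        if not extended:
            results[trail] = p
    return results
```

Calling `trails_above_cut_point(build_transition_probs(sessions), 0.5)` on a list of sessions returns a dictionary that maps each retained trail (a tuple of pages) to its probability. Lowering the cut-point generally yields more and longer trails, which is the trade-off the heuristics in the abstract are meant to help the analyst control.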
Type: | Thesis (Doctoral) |
---|---|
Qualification: | Ph.D |
Title: | A data mining model to capture user web navigation patterns |
Open access status: | An open access version is available from UCL Discovery |
Language: | English |
Additional information: | Thesis digitised by ProQuest. |
Keywords: | Applied sciences; User navigation |
URI: | https://discovery.ucl.ac.uk/id/eprint/10107582 |