UCL Discovery

A data mining model to capture user web navigation patterns

Cabral de Moura Borges, José Luis (2000) A data mining model to capture user web navigation patterns. Doctoral thesis (Ph.D), UCL (University College London). Green open access.

A_data_mining_model_to_capture.pdf (PDF, 7MB)

Abstract

This thesis proposes a formal data mining model to capture user web navigation patterns. Information characterising the user's interaction with the web is obtained from log files, which provide the data needed to infer navigation sessions. We model a collection of sessions as a hypertext probabilistic grammar (HPG) whose higher-probability strings correspond to the navigation trails preferred by the user. A breadth-first search (BFS) algorithm is provided to find the set of strings with probability above a given cut-point; we call this set of strings the maximal set. The BFS algorithm is shown to be linear on average: the number of iterations it performs varies linearly with the grammar's number of states. By drawing on results from the theory of probabilistic regular grammars and Markov chains, we give the model a sound foundation, which we use to study its properties. We also propose the use of entropy to measure the statistical properties of an HPG. Two heuristics are provided to enhance the model's analysis capabilities. The first heuristic implements an iterative deepening search in which the set of rules is incrementally augmented by exploring the trails with higher probability first. A stopping parameter measures the distance between the current rule-set and its corresponding maximal set, giving the analyst control over the number of induced rules. The second heuristic aims to find a small set of longer rules composed of links with high probability on average. It uses a dynamic threshold whose value is kept proportional to the length of the trail being evaluated. Finally, a set of binary operations on HPGs is defined, giving us the ability to compare the structure of two grammars. The operations defined are intersection, difference, union, and sum.
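As a rough illustration of the central idea (estimating a probabilistic model from sessions, then searching breadth-first for trails above a cut-point), the sketch below works under simplifying assumptions that are not taken from the thesis. Each page is a state. A page's start probability is the fraction of sessions that begin on it. The probability of a link a -> b is the fraction of traversals out of a that go to b, with "the session ends at a" counted as a move to a final marker. A "string" is read here as a trail from a session's first page through to that final marker. The names `build_hpg`, `maximal_set` and `END` are hypothetical. The thesis's formal HPG definition (its start and final states and production probabilities) may differ in detail, so treat this as a sketch, not the author's algorithm.

```python
from collections import defaultdict, deque

END = "<end>"  # hypothetical final marker: "the session ends here"


def build_hpg(sessions):
    """Estimate start and link probabilities from non-empty sessions.

    Simplifying assumption: start probability = fraction of sessions
    beginning on a page; P(a -> b) = fraction of traversals out of a that
    go to b, where ending the session on a counts as a traversal a -> END.
    """
    start_counts = defaultdict(int)
    link_counts = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        start_counts[session[0]] += 1
        for a, b in zip(session, session[1:]):
            link_counts[a][b] += 1
        link_counts[session[-1]][END] += 1

    start = {page: n / len(sessions) for page, n in start_counts.items()}
    links = {}
    for a, successors in link_counts.items():
        total = sum(successors.values())
        links[a] = {b: n / total for b, n in successors.items()}
    return start, links


def maximal_set(start, links, cut_point):
    """Breadth-first search for all trails with probability above cut_point.

    A trail's probability is its start probability times the probabilities
    of its links, including the final move to END. Every factor is at most 1,
    so a partial trail at or below the cut-point is pruned: no extension of
    it can climb back above the cut-point.
    """
    result = {}
    queue = deque(((page,), p) for page, p in start.items() if p > cut_point)
    while queue:
        trail, prob = queue.popleft()
        for nxt, p in links.get(trail[-1], {}).items():
            q = prob * p
            if q <= cut_point:
                continue  # prune: extensions can only be less probable
            if nxt == END:
                result[trail] = q  # complete trail above the cut-point
            else:
                queue.append((trail + (nxt,), q))
    return result


if __name__ == "__main__":
    sessions = [["A", "B", "C"], ["A", "B"], ["B", "C"]]
    start, links = build_hpg(sessions)
    found = maximal_set(start, links, 0.2)
    for trail, prob in sorted(found.items(), key=lambda kv: -kv[1]):
        print(" -> ".join(trail), round(prob, 3))
```

With these three toy sessions and a cut-point of 0.2, the search keeps A -> B -> C (probability 4/9), A -> B (2/9) and B -> C (2/9), and prunes the single-page trail B (1/9). With a cut-point of 0, the four possible trails have probabilities summing to 1. The pruning step is what keeps the breadth-first search bounded: raising the cut-point shrinks the set of trails explored.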

Type: Thesis (Doctoral)
Qualification: Ph.D
Title: A data mining model to capture user web navigation patterns
Open access status: An open access version is available from UCL Discovery
Language: English
Additional information: Thesis digitised by ProQuest.
Keywords: Applied sciences; User navigation
URI: https://discovery.ucl.ac.uk/id/eprint/10107582