Jawaheer, G; Kostkova, P; (2011) Web crawlers on a health related portal: Detection, characterisation and implications. In: 2011 Developments in E-systems Engineering. (pp. 24-29). IEEE: Dubai, United Arab Emirates.
Abstract
Web crawlers are automated computer programs that visit websites in order to download their content. They are employed for non-malicious (search engine crawlers indexing websites) and malicious purposes (those breaching privacy by harvesting email addresses for unsolicited email promotion and spam databases). Whatever their usage, web crawlers need to be accurately identified in an analysis of the overall traffic to a website. Visits from web crawlers as well as from genuine users are recorded in the web server logs. In this paper, we analyse the web server logs of NRIC, a health related portal. We present the techniques used to identify malicious and non-malicious web crawlers from these logs, using a blacklist database and analysis of the characteristics of the online behaviour of malicious crawlers. We use visualisation to carry out sanity checks along the crawler removal process. We illustrate the use of these techniques using 3 months of web server logs from NRIC. We use a combination of visualisation and baseline measures from Google Analytics to demonstrate the efficacy of our techniques. Finally, we discuss the implications of our work on the analysis of the web traffic to a website using web server logs and on the interpretation of the results from such analysis. © 2011 IEEE.
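The blacklist-based part of the detection described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions: the logs are taken to be in Apache Combined Log Format, the `AGENT_BLACKLIST` entries are hypothetical placeholders for a real blacklist database, and treating a request for `/robots.txt` as a behavioural signal is a common heuristic chosen for this example rather than a technique confirmed by this record. The function names are likewise illustrative.

```python
import re
from collections import defaultdict

# Assumed input: Apache Combined Log Format, i.e.
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

# Hypothetical blacklist of user-agent substrings (lowercase).
# A real analysis would load these from a maintained blacklist database.
AGENT_BLACKLIST = ("googlebot", "bingbot", "emailcollector")


def parse_line(line):
    """Return the named fields of one log line, or None if it does not parse."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None


def find_crawler_hosts(lines):
    """Map each suspected crawler host to the set of reasons it was flagged."""
    reasons = defaultdict(set)
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue  # skip lines that are not in the assumed format
        agent = entry["agent"].lower()
        if any(token in agent for token in AGENT_BLACKLIST):
            reasons[entry["host"]].add("blacklisted-agent")
        # Assumed heuristic: browsers driven by people rarely fetch robots.txt.
        if entry["path"] == "/robots.txt":
            reasons[entry["host"]].add("requested-robots.txt")
    return dict(reasons)


def remove_crawler_traffic(lines):
    """Keep only parseable lines whose host was never flagged as a crawler."""
    lines = list(lines)
    crawler_hosts = find_crawler_hosts(lines)
    kept = []
    for line in lines:
        entry = parse_line(line)
        if entry is not None and entry["host"] not in crawler_hosts:
            kept.append(line)
    return kept
```

Flagging by host removes every request from a host once any of its requests looks automated. This is a deliberately coarse choice for the sketch: it would also discard genuine users who share an IP address with a crawler, which is one reason a sanity check against an independent baseline, as the abstract describes, is useful.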
Type: | Proceedings paper |
---|---|
Title: | Web crawlers on a health related portal: Detection, characterisation and implications |
Event: | 2011 Developments in E-systems Engineering |
Open access status: | An open access version is available from UCL Discovery |
DOI: | 10.1109/DeSE.2011.83 |
Publisher version: | https://doi.org/10.1109/DeSE.2011.83 |
Language: | English |
Additional information: | This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions. |
UCL classification: | UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Maths and Physical Sciences > Inst for Risk and Disaster Reduction |
URI: | https://discovery.ucl.ac.uk/id/eprint/10088955 |