
Discovering and understanding community opinions of neighbourhoods expressed in question answering platforms

Saeidi, M. (2017) Discovering and understanding community opinions of neighbourhoods expressed in question answering platforms. Doctoral thesis, UCL (University College London). Green open access

Marzieh Saeidi Thesis.pdf - Accepted Version


Abstract

Humans value the opinions of others. In recent years, people have been using social media platforms both to voice and to gather opinions. Searching the huge volume of opinions expressed across several platforms for relevant information is an overwhelming task. This is why automatically extracting information from such sources has received a great deal of attention in both academia and industry. However, little work in this field has been dedicated to the domain of city neighbourhoods. One reason is that, unlike for many products and services, there are no dedicated review platforms for collecting opinions about neighbourhoods. In the absence of dedicated review sites, a large number of opinions on neighbourhoods and other domains can be found on community question answering (QA) platforms. So far, this data has not been used. This raises the question of what the strengths and limitations of QA data are, and what challenges it brings for extracting opinions expressed about neighbourhoods. In this thesis, we comprehensively investigate these questions, using data from Yahoo! Answers for neighbourhoods of London. First, we investigate how well QA discussions reflect the demographic attributes of neighbourhoods recorded in the census (e.g. age, religion). Our results show that significant, strong and meaningful correlations exist between text features from QA data and many demographic attributes. For instance, the terms poverty, drug, and rundown are among the terms most strongly correlated with the attribute deprivation. We further demonstrate that text features based on Yahoo! Answers discussions can predict a wide range of demographic attributes for neighbourhoods with very good accuracy. These predictions outperform predictions made using data from Twitter, a platform that has been used widely in the past for predicting many real-world attributes.
Demographic data provides objective statistics about the population of neighbourhoods, but many attributes of interest are not reflected in those statistics. For instance, census data does not record whether a neighbourhood is posh, quiet or good for nightlife. Knowing these aspects complements the demographic attributes in forming an understanding of neighbourhoods. We investigate whether text features from QA data can predict such aspects. To do this, we create a dataset of neighbourhoods labelled with these aspects. Our results show that, given these labels, QA data predicts such aspects with higher performance than Twitter data. Predicting a single value for a characteristic of a neighbourhood cannot provide a complete picture of people's opinions. To provide a fine-grained summary, a popular approach is to extract the sentiments towards different aspects of a given entity from each expressed opinion. Aspect-based sentiment analysis has been studied extensively, but research has always used text from dedicated review platforms, where a user usually writes opinions on a single specified entity. In the absence of a review platform for neighbourhoods, we extend the task to process text from QA platforms, where fewer assumptions can be made and the data is noisy. We construct a human-annotated dataset based on text from Yahoo! Answers discussions with high inter-annotator agreement of over 70%, a suitable level for this task. To address this task, we propose methods based on representations of text that are either learned sequentially using recurrent neural models or defined using traditional bag of n-grams features. Our proposed methods achieve prediction accuracies at levels similar to those of less challenging sentiment analysis tasks.
In summary, this thesis demonstrates the strengths of QA data for predicting the values of real-world attributes and for extracting information from opinions, specifically in the domain of city neighbourhoods.
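
The correlation analysis described in the abstract can be illustrated with a minimal sketch. It assumes a simple relative term frequency as the text feature and a Pearson correlation as the measure; the abstract does not state which feature weighting or correlation coefficient the thesis uses, so both are assumptions. The helper names, neighbourhood names, texts and deprivation scores below are invented for illustration and are not data from the thesis.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def term_rate(term, tokens):
    """Relative frequency of `term` in a tokenised discussion (assumed feature)."""
    return tokens.count(term) / len(tokens) if tokens else 0.0


# Hypothetical toy data: one pooled QA text and one attribute value per area.
discussions = {
    "area_a": "rundown streets and drug problems make it feel rundown".split(),
    "area_b": "quiet leafy streets with a few rundown corners".split(),
    "area_c": "posh shops quiet squares and good schools".split(),
}
deprivation = {"area_a": 0.8, "area_b": 0.4, "area_c": 0.1}  # invented scores

areas = sorted(discussions)
rates = [term_rate("rundown", discussions[a]) for a in areas]
scores = [deprivation[a] for a in areas]
print("correlation of 'rundown' with deprivation:", pearson(rates, scores))
```

Repeating this for every term in a vocabulary and ranking the terms by coefficient would give a list of terms most strongly correlated with an attribute, in the spirit of the poverty, drug and rundown example above.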

Type: Thesis (Doctoral)
Title: Discovering and understanding community opinions of neighbourhoods expressed in question answering platforms
Event: University of London
Open access status: An open access version is available from UCL Discovery
Language: English
UCL classification: UCL
UCL > Provost and Vice Provost Offices
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science
URI: https://discovery.ucl.ac.uk/id/eprint/1555648