ROAR: increasing the flexibility and performance of distributed search.
Doctoral thesis, UCL (University College London).
Search engines are a fundamental building block of the web. Be they general purpose web search engines, product search engines for online catalogues or people search in online networks, search engines provide easy access to a huge amount of information. To cope with large amounts of information, search engines use many distributed servers to perform their functionality. For instance, to search the web quickly, search engines partition the web index over many machines, and consult every partition when answering a query. To increase throughput, replicas are added for each of these machines. The key parameter of these search algorithms is the trade-off between replication and partitioning: increasing the partitioning level typically improves query completion time since more servers handle the query. However, partitioning too much also has drawbacks: startup costs for each sub-query are not negligible, and will decrease total throughput. Finding the right operating point and adapting to it can significantly improve performance and reduce costs. In this thesis we propose that the tradeoff between partitioning and replication should be easily configurable. To this end we introduce Rendezvous On a Ring (ROAR), a novel distributed algorithm that enables on-the-fly re-configuration of the partitioning level. ROAR can add and remove servers without stopping the system, cope with server failures, and provide good load-balancing even with a heterogeneous server pool. We experimentally show that it is possible to dynamically adjust the partitioning level to cope with different loads while meeting target query delays, and in doing so the system can reduce its power consumption significantly. To test ROAR we introduce Privacy Preserving Search: a particular search application that allows users to store encrypted data online while being able to easily search that data. Our contributions include novel protocols that allow PPS for numeric values, as well as a proof of concept implementation of PPS running on top of ROAR and allowing users to match as many as 5 million files in well under 1s.
|Title:||ROAR: increasing the flexibility and performance of distributed search|
|Open access status:||An open access version is available from UCL Discovery|
Archive Staff Only