UCL Discovery

Enabling Efficient and General Subpopulation Analytics in Multidimensional Data Streams

Manousis, Antonis; Cheng, Zhuo; Ben Basat, Ran; Liu, Zaoxing; Sekar, Vyas; (2022) Enabling Efficient and General Subpopulation Analytics in Multidimensional Data Streams. In: Özcan, Fatma and Freire, Juliana and Lin, Xuemin, (eds.) Proceedings of the VLDB Endowment. (pp. 3249-3262). Association for Computing Machinery (ACM): New York, NY, USA. Green open access.

Text: 3551793.3551867.pdf - Published Version (603kB)

Abstract

Today’s large-scale services (e.g., video streaming platforms, data centers, sensor grids) need diverse real-time summary statistics across multiple subpopulations of multidimensional datasets. However, state-of-the-art frameworks do not offer general and accurate analytics in real time at reasonable costs. The root cause is the combinatorial explosion of data subpopulations and the diversity of summary statistics we need to monitor simultaneously. We present Hydra, an efficient framework for multidimensional analytics that presents a novel combination of using a “sketch of sketches” to avoid the overhead of monitoring exponentially-many subpopulations and universal sketching to ensure accurate estimates for multiple statistics. We build Hydra as an Apache Spark plugin and address practical system challenges to minimize overheads at scale. Across multiple real-world and synthetic multidimensional datasets, we show that Hydra can achieve robust error bounds and is an order of magnitude more efficient in terms of operational cost and memory footprint than existing frameworks (e.g., Spark, Druid) while ensuring interactive estimation times.

Type: Proceedings paper
Title: Enabling Efficient and General Subpopulation Analytics in Multidimensional Data Streams
Event: VLDB Endowment
Open access status: An open access version is available from UCL Discovery
DOI: 10.14778/3551793.3551867
Publisher version: https://doi.org/10.14778/3551793.3551867
Language: English
Additional information: This work is licensed under the Creative Commons BY-NC-ND 4.0 International License. Visit https://creativecommons.org/licenses/by-nc-nd/4.0/ to view a copy of this license. For any use beyond those covered by this license, obtain permission by emailing info@vldb.org. Copyright is held by the owner/author(s). Publication rights licensed to the VLDB Endowment
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
URI: https://discovery.ucl.ac.uk/id/eprint/10183189