UCL Discovery

On Integrating the Number of Synthetic Data Sets m into the a priori Synthesis Approach

Jackson, J; Mitra, R; Francis, B; Dove, I; (2022) On Integrating the Number of Synthetic Data Sets m into the a priori Synthesis Approach. In: Privacy in Statistical Databases. PSD 2022. (pp. 205-219). Springer International Publishing: Cham, Switzerland. Green open access

Text: PSD_2022_Revised_Jackson_et_al.pdf (Accepted Version, 831kB)

Abstract

The synthesis mechanism given in [4] uses saturated models, along with overdispersed count distributions, to generate synthetic categorical data. The mechanism is controlled by tuning parameters, which can be set to meet a specific risk or utility metric. Thus, the expected properties of the synthetic data sets can be determined analytically a priori, that is, before they are generated. While [4] considered generating a single data set (m=1), this paper considers generating m>1 data sets. In effect, m becomes a tuning parameter, and its role in the risk-utility trade-off can be shown analytically. The paper introduces a pair of risk metrics, τ3(k,d) and τ4(k,d), that are suited to m>1 data sets. It also considers the more general question of how best to analyse m>1 categorical data sets: whether to average the data sets before analysis or to average the results after analysis. Finally, the methods are demonstrated empirically by synthesising a constructed data set designed to represent the English School Census.
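The abstract contrasts two ways of analysing m>1 synthetic categorical data sets. The Python sketch below illustrates that contrast under loudly stated assumptions. It stands in for the mechanism of [4] with a negative binomial draw per cell (mean n_i, variance n_i + σn_i², with σ acting as a tuning parameter), keeps zero cells at zero rather than adding pseudocounts, and uses a hypothetical five-cell table, not the School Census data. The function names (synthesize_table, pre_analysis_average, post_analysis_average) are hypothetical, and the sketch does not implement the paper's τ3(k,d) or τ4(k,d) metrics.

```python
import math
import random


def _poisson_knuth(lam, rng):
    """Knuth's multiplication method; only called with lam <= 30 so exp(-lam) cannot underflow."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def sample_poisson(lam, rng):
    """Poisson draw; large means are split into chunks because independent Poissons add."""
    total = 0
    while lam > 30.0:
        total += _poisson_knuth(30.0, rng)
        lam -= 30.0
    return total + _poisson_knuth(lam, rng)


def synthesize_table(counts, sigma, rng):
    """ASSUMED stand-in mechanism (not necessarily that of [4]): cell i is drawn from a
    negative binomial with mean n_i and variance n_i + sigma * n_i**2, generated as a
    gamma-Poisson mixture. Zero cells stay zero here (no pseudocounts are added)."""
    synthetic = []
    for n in counts:
        if n == 0:
            synthetic.append(0)
        elif sigma == 0:
            synthetic.append(sample_poisson(n, rng))  # sigma = 0 reduces to Poisson(n_i)
        else:
            # Gamma(shape=1/sigma, scale=sigma*n) has mean n and variance sigma * n**2
            lam = rng.gammavariate(1.0 / sigma, sigma * n)
            synthetic.append(sample_poisson(lam, rng))
    return synthetic


def proportion(table, cell):
    """Share of records falling in one cell; nan for an empty table."""
    total = sum(table)
    return table[cell] / total if total > 0 else math.nan


def pre_analysis_average(tables, cell):
    """Average the m tables cell by cell first, then analyse the averaged table once."""
    m = len(tables)
    averaged = [sum(col) / m for col in zip(*tables)]
    return proportion(averaged, cell)


def post_analysis_average(tables, cell):
    """Analyse each of the m tables separately, then average the m estimates."""
    return sum(proportion(t, cell) for t in tables) / len(tables)


if __name__ == "__main__":
    rng = random.Random(2022)
    original = [40, 25, 10, 0, 5]  # hypothetical 5-cell table
    m, sigma = 5, 0.1              # hypothetical tuning values
    tables = [synthesize_table(original, sigma, rng) for _ in range(m)]
    print("pre-analysis :", round(pre_analysis_average(tables, 0), 4))
    print("post-analysis:", round(post_analysis_average(tables, 0), 4))
```

Averaging before analysis yields a ratio of summed counts, whereas averaging after analysis yields a mean of per-table ratios, so for a nonlinear statistic such as a proportion the two generally differ. Under the assumed model, the cell-wise average of m independent synthetic tables has variance (n_i + σn_i²)/m, which offers one simple intuition for why m interacts with σ in the risk-utility trade-off.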

Type: Proceedings paper
Title: On Integrating the Number of Synthetic Data Sets m into the a priori Synthesis Approach
Event: International Conference on Privacy in Statistical Databases - PSD 2022
ISBN-13: 9783031139444
Open access status: An open access version is available from UCL Discovery
DOI: 10.1007/978-3-031-13945-1_15
Language: English
Additional information: This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions.
Keywords: Synthetic data, privacy, categorical data, risk metrics, contingency tables
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Maths and Physical Sciences
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Maths and Physical Sciences > Dept of Statistical Science
URI: https://discovery.ucl.ac.uk/id/eprint/10159958
