Towards Learning on Vertically Partitioned Data with Distributed Differential Privacy

Ergute Bao; Fei Wei; Yin Yang; Xiaokui Xiao; Tianyu Pang; Chao Du

doi:10.1109/ICDE65448.2025.00161

Abstract

Analysis of distributed data typically requires the collaboration of the data owners, as well as privacy protection. This paper focuses on the scenario where the database is vertically partitioned across the data owners (referred to as vertical federated learning or VFL), e.g., an e-commerce platform and an online payment service collaborate to build a model to predict user behavior. To avoid revealing their private data during model fitting, the data owners commonly participate in a cryptographic protocol such as secure multiparty computation. However, the resulting model may still leak sensitive information under sophisticated data extraction attacks. A rigorous solution to this issue is to compute the model with differential privacy (DP), which provides strong and well-accepted privacy guarantees. Enforcing DP on VFL turns out to be highly challenging, and there does not yet exist an effective solution that avoids reliance on any trusted party. Consequently, practitioners are left with rather basic approaches for ensuring DP, e.g., each data owner perturbs her local data with additive noise, leading to suboptimal model utility. Can we achieve privacy-utility trade-offs for VFL with DP comparable to the centralized setting, without trusting any party? In this paper, we make a significant step towards providing a positive answer to this question. We focus on a subset of data analysis and machine learning tasks: those where the sensitive information to release can be expressed as a polynomial function of the input. Following the distributed DP framework that does not require any trusted party, we propose a generic mechanism to solve this class of problems, called the Skellam Quantization Mechanism (SQM). We formally prove the privacy guarantee of our solution, and show that it is able to match the privacy-utility trade-offs in the centralized setting. We then instantiate SQM on two classical tasks, principal component analysis and logistic regression. Extensive experiments on real-world datasets confirm the strong performance of SQM.
