Marginal Value Decomposition of Overlapping Data Sources Under Disclosure Constraints
A cooperative game-theoretic framework for computing each data source's marginal contribution via Shapley values, with threshold cryptographic attestation and privacy-preserving comparison.
When multiple overlapping data sources contribute to a joint outcome, the marginal contribution of each source cannot be read off directly: their coverage overlaps, their signals are correlated, and the step that links records across sources is constrained by trust, regulation, or consent.
This paper uses the Shapley value to compute each source's marginal contribution net of cost, and decomposes unrealized value into four actionable components: coverage gaps, signal-quality gaps, signal decay, and cost inefficiency. A latent-variable generative model unifies the framework. Each source observes a noisy projection of a shared latent entity, so the value of a set of sources is its expected reduction in posterior variance about that entity, and each source's Shapley value is its weighted-average marginal share of that reduction. This connects the decomposition to Bayesian experimental design and yields closed-form expressions in the Gaussian case.
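A minimal sketch of the Gaussian case, under simplifying assumptions that are ours rather than the paper's exact model: a scalar latent z ~ N(0, prior_var), each source i observes x_i = a_i * z + e_i with independent noise e_i ~ N(0, s_i^2), and costs are additive. Gaussian precisions then add, so the posterior variance after observing a set S is 1 / (1/prior_var + sum over i in S of a_i^2 / s_i^2), and the coalition value v(S) is prior_var minus that. The helper names (posterior_variance, value, shapley) are hypothetical, and exact enumeration is exponential in the number of sources, so this form only suits small examples.

```python
from itertools import combinations
from math import factorial

# Assumed model (illustrative): z ~ N(0, prior_var); source i reports
# x_i = gain_i * z + noise_i with noise_i ~ N(0, noise_var_i), independent across sources.

def posterior_variance(prior_var, sources, subset):
    """Posterior variance of z after observing the sources in `subset`.
    Gaussian precisions add: 1/prior_var + sum(gain^2 / noise_var)."""
    precision = 1.0 / prior_var
    for i in subset:
        gain, noise_var = sources[i]
        precision += gain * gain / noise_var
    return 1.0 / precision

def value(prior_var, sources, subset):
    """Coalition value: expected reduction in posterior variance (0 for the empty set)."""
    return prior_var - posterior_variance(prior_var, sources, subset)

def shapley(n, v):
    """Exact Shapley values: weighted average of marginal contributions
    v(S + {i}) - v(S) over every coalition S that excludes i."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # k = |S|, from 0 to n - 1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                phi[i] += weight * (v(s + (i,)) - v(s))
    return phi

# Example: two identical sources and one noisier source; costs are additive.
prior_var = 1.0
sources = [(1.0, 1.0), (1.0, 1.0), (1.0, 4.0)]  # (gain, noise variance)
costs = [0.10, 0.10, 0.02]
gross = shapley(len(sources), lambda s: value(prior_var, sources, s))
# In an additive cost game every marginal contribution of source i equals its own
# cost, so its cost share is exactly that cost; by linearity, net = gross - cost.
net = [g - c for g, c in zip(gross, costs)]
```

Because value is concave in total precision, the noisier source adds less at every coalition and therefore receives a smaller gross share; whether its net share is also smaller depends on its cost.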
For environments where linking is further constrained by disclosure requirements, the paper provides two supporting mechanisms: a threshold cryptographic protocol (Contextual Identity Verification) for verifiable multi-party attestation, and a quantized random projection scheme (QJL) for comparing representations without exposing them.
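The QJL construction itself is not specified here. As an illustration only, the sketch below uses the classical sign random projection (Charikar's SimHash), a well-known way to quantize a random projection to one bit per coordinate. Two parties that regenerate the same Gaussian projection from a shared seed (an assumption) each exchange only their bit vectors. For Gaussian projection rows, the probability that a bit differs equals angle(x, y) / pi, so the fraction of differing bits estimates the cosine similarity. Function names are hypothetical, and sign bits on their own are not a formal privacy guarantee.

```python
import math
import random

def make_projection(dim, bits, seed):
    """Gaussian projection matrix that every party regenerates from a shared seed (assumed)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]

def sign_sketch(vec, proj):
    """Quantize each projection to one bit: 1 if <row, vec> >= 0, else 0."""
    return [1 if sum(g * x for g, x in zip(row, vec)) >= 0.0 else 0 for row in proj]

def estimate_cosine(sketch_a, sketch_b):
    """For Gaussian rows, P[bit differs] = angle / pi, so cos(pi * Hamming fraction)
    estimates the cosine similarity of the original vectors."""
    differ = sum(a != b for a, b in zip(sketch_a, sketch_b)) / len(sketch_a)
    return math.cos(math.pi * differ)

def cosine(u, w):
    """Exact cosine similarity, used here only to check the estimate."""
    dot = sum(a * b for a, b in zip(u, w))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in w)))

# Example: each party sketches its own vector; only the bit vectors are compared.
dim, bits = 64, 4096
proj = make_projection(dim, bits, seed=7)
rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
y = [xi + 0.5 * rng.gauss(0.0, 1.0) for xi in x]  # correlated with x
z = [rng.gauss(0.0, 1.0) for _ in range(dim)]     # independent of x
sx, sy, sz = (sign_sketch(vec, proj) for vec in (x, y, z))
```

The standard error of the angle estimate shrinks roughly as 1 / sqrt(bits), so the sketch length trades communication for accuracy.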
Experiments on synthetic networks (10^7 users, 50 parties) confirm that Shapley computation remains tractable at production scale and that the false discovery rate (FDR) bounds hold under correlated signals.
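Exact Shapley enumeration over 50 parties touches 2^50 coalitions, so some approximation or exploitable structure is needed at this scale; which one the paper uses is not stated here. One standard option, shown purely as an assumption, is Monte Carlo permutation sampling: average each party's marginal contribution over random orderings. Within each ordering the marginals telescope to v(N) - v(empty set), so the estimates satisfy efficiency exactly. The value function (variance reduction of a scalar Gaussian latent given independent-noise observations) and all names are illustrative.

```python
import random

def value_from_precision(prior_var, precision_added):
    """Variance reduction of a scalar Gaussian latent given added observation precision."""
    return prior_var - 1.0 / (1.0 / prior_var + precision_added)

def shapley_permutation_sampling(precisions, prior_var, num_perms, seed=0):
    """Monte Carlo Shapley estimate: average marginal contributions over random orderings.
    Here a coalition's value depends only on its summed precision, so each ordering is O(n)."""
    rng = random.Random(seed)
    n = len(precisions)
    phi = [0.0] * n
    order = list(range(n))
    for _ in range(num_perms):
        rng.shuffle(order)
        acc = 0.0
        prev = value_from_precision(prior_var, acc)  # value of the empty coalition (0)
        for i in order:
            acc += precisions[i]
            cur = value_from_precision(prior_var, acc)
            phi[i] += cur - prev  # marginal contribution of i given its predecessors
            prev = cur
    return [p / num_perms for p in phi]

# Example: 50 parties, 20 strong and 30 weak sources (precision = gain^2 / noise variance).
precisions = [1.0] * 20 + [0.05] * 30
prior_var = 1.0
phi = shapley_permutation_sampling(precisions, prior_var, num_perms=500, seed=42)
```

When evaluating a coalition requires refitting a model rather than summing precisions, each ordering costs n value evaluations, and that term dominates runtime.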