The Privacy-Enhancing Tech Stack for Enterprise AI — When You Need What
Differential privacy, homomorphic encryption, federated learning, secure enclaves, zero-knowledge proofs. Five tools, very different use cases. A practical buyer's guide for enterprise AI workloads.
The enterprise AI deals that don't happen — the workloads that never make it past legal — almost always die for the same reason. The data the model needs to be useful is the data the organization isn't allowed to expose.
Three years ago that was the end of the conversation. Today it's the beginning of one. The privacy-enhancing technology (PET) stack has matured enough that most "we can't ship this because of the data" objections have an answer. The problem is that the PET stack has five very different tools in it, and picking the wrong one wastes a year.
Here is how to think about which one fits which workload.
The five tools
| Technique | What it does | Typical cost | Maturity |
|---|---|---|---|
| Differential Privacy (DP) | Adds calibrated noise to outputs so individuals can't be re-identified | Low compute, accuracy cost | High — production at Apple, US Census |
| Federated Learning (FL) | Trains a model across distributed datasets without centralizing the data | Moderate compute, high coordination | Medium — production at Google, several banks |
| Secure Enclaves (TEEs) | Hardware-isolated execution; data is encrypted except inside the enclave | Low overhead inside, deployment cost | High — AWS Nitro, Azure Confidential Computing |
| Fully Homomorphic Encryption (FHE) | Computation directly on encrypted data | Very high (100–1000× slower) | Medium — narrow workloads only |
| Zero-Knowledge Proofs (ZKPs) | Prove a property without revealing the underlying data | Moderate, growing fast | Medium — production in blockchains, emerging in enterprise |
These are not interchangeable. They solve different problems with different cost structures. The first failure mode in any PET conversation is treating them as a menu of equivalent options.
When you want differential privacy
Differential privacy fits when you want to release statistics, aggregates, or model outputs from sensitive data, in a form that won't leak any individual's record — even to an adversary with arbitrary auxiliary information.
Use it for:
- Publishing model predictions or aggregate analytics derived from sensitive customer data
- Training models on user behavior where individual re-identification is the threat
- Synthetic data generation with provable individual-record protection
- Telemetry from customer-facing AI features (which prompts users send, which features they engage with)
Don't use it for: protecting data in transit or at rest (DP doesn't encrypt; it adds noise). Don't use it when you need exact answers — there is always an accuracy cost, parameterized by the privacy budget ε. Tuning ε is a real and ongoing exercise, not a one-time decision.
Practical note: differential privacy composes. Basic composition is additive: 10 DP queries against the same dataset at ε=1 consume roughly ε=10 of total privacy budget (tighter accountants can improve on this when the per-query ε is small, at the cost of a small δ). Track the budget across your whole organization, not per-query.
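To make both points concrete, here is a minimal sketch (names and numbers are illustrative, not a production library): a Laplace-noised count query with sensitivity 1, charged against a single organization-wide budget ledger under basic additive composition.

```python
# Minimal sketch: Laplace mechanism for a count query, plus a naive
# organization-wide budget ledger using basic (additive) composition.
# Illustrative only; real deployments use a vetted DP library and
# tighter composition accounting.
import numpy as np

class PrivacyLedger:
    """Tracks cumulative epsilon spent against one dataset."""
    def __init__(self, total_budget: float):
        self.total_budget = total_budget
        self.spent = 0.0

    def charge(self, epsilon: float) -> None:
        if self.spent + epsilon > self.total_budget:
            raise RuntimeError("privacy budget exhausted for this dataset")
        self.spent += epsilon

def dp_count(records, predicate, epsilon: float, ledger: PrivacyLedger) -> float:
    """Noisy count of records matching `predicate`.

    A count has sensitivity 1 (one person changes it by at most 1), so
    Laplace noise with scale 1/epsilon gives epsilon-DP for this query.
    """
    ledger.charge(epsilon)
    true_count = sum(1 for r in records if predicate(r))
    return true_count + np.random.laplace(loc=0.0, scale=1.0 / epsilon)

# Ten epsilon=1 queries against the same data spend roughly epsilon=10 in total.
ledger = PrivacyLedger(total_budget=10.0)
records = [{"age": a} for a in range(100)]
for _ in range(10):
    print(dp_count(records, lambda r: r["age"] > 40, epsilon=1.0, ledger=ledger))
```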
When you want federated learning
Federated learning fits when you have multiple data silos that can't be merged, and you want a single model that benefits from all of them.
Use it for:
- Cross-portfolio model training across PE-owned companies that can't share customer data with each other
- Multi-hospital collaboration on clinical models without moving patient records
- Cross-bank fraud detection without exchanging transaction records
- Multi-tenant enterprise SaaS where each tenant's data trains a shared model but cannot be exposed to other tenants
Don't use it as a privacy band-aid for centralized training that could just as well happen on a single server. The coordination overhead is real and the threat model is subtle — without additional protection, model updates can leak training data. Combine federated learning with differential privacy (DP-SGD on each node) and secure aggregation if the data is genuinely sensitive.
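A minimal sketch of that combination, under simplifying assumptions: each silo takes one local gradient step on a toy linear model, clips the update to a fixed L2 norm, and adds Gaussian noise before anything leaves the silo; the coordinator only averages. All names are illustrative.

```python
# Minimal sketch of one federated round with DP-protected client updates.
# Each silo clips its local update to an L2 norm bound and adds Gaussian
# noise before anything leaves the silo; the coordinator only averages.
# Illustrative only: a real system layers secure aggregation on top (so the
# coordinator never sees individual updates) and uses a DP accountant to
# calibrate the noise, rather than the hard-coded values here.
import numpy as np

rng = np.random.default_rng(0)

def local_update(weights, X, y, lr=0.1):
    """One gradient step of linear least squares on a silo's private data."""
    grad = 2 * X.T @ (X @ weights - y) / len(y)
    return -lr * grad  # proposed change to the global weights

def privatize(update, clip_norm=1.0, noise_std=0.5):
    """Clip the update and add Gaussian noise (DP-SGD style)."""
    scale = min(1.0, clip_norm / (np.linalg.norm(update) + 1e-12))
    return update * scale + rng.normal(0.0, noise_std * clip_norm, size=update.shape)

def federated_round(weights, silos):
    """Coordinator step: average the already-privatized silo updates."""
    updates = [privatize(local_update(weights, X, y)) for X, y in silos]
    return weights + np.mean(updates, axis=0)

# Toy run: three silos that share a feature space but cannot pool their data.
silos = [(rng.normal(size=(50, 3)), rng.normal(size=50)) for _ in range(3)]
weights = np.zeros(3)
for _ in range(20):
    weights = federated_round(weights, silos)
print(weights)
```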
Federated learning is also the wrong answer when the data silos are small or homogeneous. If three of your five sites have nearly-identical data distributions, the gain over centralized training (or pooled data with DP) is marginal.
When you want secure enclaves
Trusted Execution Environments (TEEs, aka secure enclaves) — AWS Nitro Enclaves, Azure Confidential Computing, Intel SGX, AMD SEV — fit when you need to run untrusted code on sensitive data, or expose data to a third party temporarily without giving them durable access.
Use it for:
- Running vendor-supplied AI models on your data where you don't want the vendor to ever see the data
- Multi-party computation where each party contributes data but no party should see the others' contributions
- Compliance-driven workloads where you need attestation that the data was processed by exactly the approved code, on approved hardware, with no exfiltration path
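The attestation point in the last bullet is easiest to see as a gate: the data owner (or a KMS policy acting for them) releases the data key only to an enclave whose measured code matches the approved build. The sketch below simulates the hardware-rooted signature with an HMAC stand-in so it runs anywhere; every name and the document layout are hypothetical, not a real TEE SDK.

```python
# Sketch of attestation gating: release the data-decryption key only when
# the enclave's attested code measurement matches an approved build.
# The vendor-signed attestation document of a real TEE (Nitro, SEV-SNP,
# SGX) is simulated here with an HMAC stand-in so the sketch is runnable.
import hashlib
import hmac

VENDOR_ROOT_KEY = b"stand-in for the hardware root of trust"
APPROVED_IMAGE = b"reviewed inference container build v1"
APPROVED_MEASUREMENT = hashlib.sha384(APPROVED_IMAGE).hexdigest()

def fake_enclave_attestation(image: bytes) -> dict:
    """What the enclave hands back: its measurement, signed by the 'hardware'."""
    measurement = hashlib.sha384(image).hexdigest()
    sig = hmac.new(VENDOR_ROOT_KEY, measurement.encode(), hashlib.sha256).hexdigest()
    return {"measurement": measurement, "signature": sig}

def release_data_key(attestation: dict) -> bytes:
    """Key broker: hand over the key only to the approved, attested build."""
    expected_sig = hmac.new(
        VENDOR_ROOT_KEY, attestation["measurement"].encode(), hashlib.sha256
    ).hexdigest()
    if not hmac.compare_digest(expected_sig, attestation["signature"]):
        raise PermissionError("attestation signature invalid")
    if attestation["measurement"] != APPROVED_MEASUREMENT:
        raise PermissionError("enclave is not running the approved code")
    return b"data-encryption-key"  # in practice, released via a KMS policy

print(release_data_key(fake_enclave_attestation(APPROVED_IMAGE)))
```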
Don't use enclaves as the primary privacy story for very large workloads — they have memory constraints (though Nitro and SEV are much better than SGX) and the deployment surface adds operational complexity. Don't trust enclaves alone for adversaries with physical hardware access; recent side-channel attacks on TEEs are a real (and active) research area.
For enterprise AI, the typical pattern is: TEE for the inference container, DP for the outputs that leave the TEE. The combination is much stronger than either alone.
When you want homomorphic encryption
Fully homomorphic encryption fits almost no enterprise AI workloads today, and it is worth saying so bluntly up front. FHE is genuinely magical: computation on encrypted data with no intermediate decryption. But the performance cost (100× to 1000× slowdown over plaintext) is prohibitive for most ML workloads.
Use it for:
- Single-shot inference on tiny models (logistic regression, small decision trees) over encrypted features
- Encrypted search on small datasets with rare, high-value queries
- Pilot/proof-of-concept work where the workload is small and the cryptographic story is the point
Don't use it for: foundation model inference, anything involving deep neural networks at meaningful scale, or any workload where latency matters. The state of the art is improving rapidly, but a five-year horizon is more realistic than a one-year one for general-purpose FHE in enterprise AI.
The exception worth watching: partially-homomorphic and somewhat-homomorphic schemes that support a fixed set of operations efficiently. For specific narrow workloads — encrypted dot products against an encrypted index, for example — these can be production-viable today.
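For a concrete version of that last case, here is a sketch of encrypted linear scoring with an additively homomorphic scheme. It assumes the open-source python-paillier package (`phe`); the server scales ciphertexts by its own plaintext weights and sums them, and only the key holder can decrypt the result.

```python
# Sketch: encrypted linear scoring with an additively homomorphic scheme.
# Assumes the python-paillier package (`pip install phe`); the server never
# sees plaintext features, only ciphertexts it can scale by plaintext
# weights and add together.
from phe import paillier

# Client side: keep the private key, encrypt the features.
public_key, private_key = paillier.generate_paillier_keypair()
features = [0.7, 1.3, -0.2]
encrypted_features = [public_key.encrypt(x) for x in features]

# Server side: plaintext model weights, homomorphic dot product plus bias.
weights = [0.4, -1.1, 2.0]
encrypted_score = public_key.encrypt(0.05)  # bias term
for w, x in zip(weights, encrypted_features):
    encrypted_score = encrypted_score + x * w  # ciphertext * plaintext scalar

# Back on the client: only the key holder can read the score.
print(private_key.decrypt(encrypted_score))  # approx. -1.50
```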
When you want zero-knowledge proofs
Zero-knowledge proofs fit when you want to prove a property about hidden data, without revealing the data itself.
Use it for:
- Proving a model satisfies fairness constraints without revealing the model weights or training data
- Proving compliance with a policy without exposing the underlying records
- Cross-organization audit trails: proving you ran an approved model on approved data without exposing either
- Capability attestation: an AI agent provider proves performance characteristics without exposing prompts, outputs, or proprietary system internals
Don't use them as a general privacy layer — ZKPs prove specific statements, not arbitrary computation (though zk-SNARKs over general circuits are improving fast). The proving cost can be significant; the verification cost is usually small. For workloads where the verifier is a regulator or counterparty and the prover is your system, that asymmetry works in your favor.
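The core mechanic is easier to see in a toy protocol than in the production systems. Below is a minimal non-interactive Schnorr proof (made non-interactive via Fiat-Shamir) in plain Python: the prover shows it knows the secret x behind a public value y = g^x mod p without revealing x. Toy group parameters for readability; the enterprise use cases above rely on far more general proof systems.

```python
# Minimal Schnorr proof of knowledge, made non-interactive with Fiat-Shamir.
# The prover convinces anyone that it knows x with y = g^x mod p, without
# revealing x. Toy subgroup (p = 23, order-11 generator) for readability;
# real systems use standardized groups and general-purpose circuits.
import hashlib
import secrets

p, q, g = 23, 11, 4          # g generates a subgroup of prime order q mod p

def challenge(*values) -> int:
    data = ",".join(str(v) for v in values).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % q

def prove(x: int):
    """Prover: knows secret x, publishes y and a proof (t, s)."""
    y = pow(g, x, p)
    r = secrets.randbelow(q)     # fresh randomness hides x
    t = pow(g, r, p)             # commitment
    c = challenge(g, y, t)       # Fiat-Shamir challenge
    s = (r + c * x) % q          # response; reveals nothing about x on its own
    return y, (t, s)

def verify(y: int, proof) -> bool:
    t, s = proof
    c = challenge(g, y, t)
    return pow(g, s, p) == (t * pow(y, c, p)) % p   # checks g^s == t * y^c

secret_x = 7
public_y, proof = prove(secret_x)
print(verify(public_y, proof))   # True, and the verifier never saw secret_x
```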
A decision tree for picking the right tool
When a workload is blocked by privacy or compliance concerns, ask in this order:
- Is the issue "individual re-identification in published outputs"? → Differential privacy.
- Is the issue "data lives in silos that can't be merged"? → Federated learning (with DP for the updates).
- Is the issue "we need to expose data to a vendor or untrusted code, briefly"? → Secure enclaves.
- Is the issue "we need to prove a property without exposing data"? → Zero-knowledge proofs.
- Is the issue "we need computation on data that's encrypted at every moment, including during inference"? → Maybe FHE, maybe not. Talk to someone before committing.
Often the right answer is two tools composed. DP outputs from inference running inside a TEE. FL with DP-SGD and secure aggregation. ZKP-attested compliance over FL-trained models. The PET stack is layered, and the layering matters.
The cost of getting it wrong
The cost of picking the wrong PET tool isn't just a project delay. It's the credibility cost of having promised privacy and then having to walk it back six months later, in front of legal, security, and the customer or regulator who's been holding you to the promise.
The good news is that all five tools are now production-grade for their respective use cases. The bad news is that there's no shortcut around understanding which is which. If you're considering a meaningful AI workload that touches sensitive data, the PET fit is part of the architecture decision — not a constraint to apply after.
If you want a tighter scoping conversation about which of these fits a specific workload you're holding back, that's exactly what the Value Thread Audit digs into. The PET fit analysis is a core deliverable, not a side note.