Hadal: A Framework for Homomorphic Encryption in TensorFlow

Computing on encrypted data has always been one of cryptography’s most tantalizing promises. Homomorphic Encryption (HE) makes this possible, but building practical systems with HE remains challenging. Traditional HE frameworks take a compiler-centric approach, translating high-level operations down to low-level circuits. While this works, it comes at a cost: you lose domain-level semantic information, making it difficult to debug, profile performance, or identify bottlenecks. Meanwhile, existing machine learning frameworks don’t integrate well with cryptographic primitives.

We built Hadal to bridge this gap. Named after the Hadal zone—the deepest region of the ocean where no light penetrates—Hadal enables computation in darkness, where data remains encrypted throughout processing. The framework consists of two components: hadal-flow, a general-purpose framework for encrypted computation built on TensorFlow, and hadal-ml, which implements specific protocols for privacy-preserving machine learning.

Hadal makes HE accessible to ML practitioners without requiring deep cryptography expertise. More importantly, it brings performance profiling and optimization tools from the ML world into the cryptography domain, enabling a new kind of co-design between systems and cryptographic protocols.

Hadal was published at the IEEE Symposium on Security and Privacy (S&P) 2026 (ePrint) and is available on GitHub under the Apache 2.0 license: https://github.com/google/hadal-flow

Framework-Cryptography Co-Design

Most HE frameworks follow a familiar pattern: take a high-level computation, compile it down to a circuit of boolean or arithmetic gates, and execute that circuit on encrypted data. This compiler-centric approach loses important information along the way. Once your computation is reduced to low-level gates, it’s hard to understand which parts of your protocol are slow, why they’re slow, or how to make them faster.

Hadal takes a different approach. Instead of building yet another compiler, we tightly integrate HE operations into TensorFlow’s dataflow model. This lets us reuse TensorFlow’s ecosystem of tools—profiling, optimization, distributed execution—for encrypted computation. The result is a framework where protocol design is informed by profiling data, and the framework design is shaped by cryptographic constraints.

This co-design philosophy turned out to be crucial. When we implemented the Postscale protocol for privacy-preserving training (which we’ll discuss later), profiling tools revealed that certain operations should be moved from the encrypted domain to plaintext. Graph optimizations reduced cryptographic overhead through arithmetic transformations. Automated parameter selection enabled systematic hyperparameter tuning. The framework wasn’t just convenient; it was a prerequisite for designing an efficient protocol.

Core Framework Features

Working with Encrypted Data

Let’s start with the basics. Here’s how you encrypt and compute on data with Hadal:

import hadal_flow
import tensorflow as tf

# Set up encryption parameters
context = hadal_flow.create_context64(
    log_n=10,  # Ring degree (2^10 slots per ciphertext)
    main_moduli=[8556589057, 8388812801],  # Ciphertext moduli
    plaintext_modulus=40961,
    scaling_factor=3,
)

# Generate a secret key
secret_key = hadal_flow.create_key64(context)

# Encrypt some data
data = tf.random.uniform([context.num_slots, 2], dtype=tf.float32, maxval=10)
encrypted = hadal_flow.to_encrypted(data, secret_key, context)

# Perform homomorphic operations
result = encrypted * 3 + encrypted  # Multiply by 3, then add original

# Decrypt to see the result
decrypted = hadal_flow.to_tensorflow(result, secret_key)

The API is deliberately simple and Pythonic. Encrypted data is represented as ShellTensor objects that work with standard TensorFlow operations. You can add, subtract, and multiply encrypted values with each other, with plaintexts, or with TensorFlow tensors. The framework automatically tracks metadata like scaling factors and modulus levels, so you don’t have to manage these details manually.

Automatic Parameter Selection

Those encryption parameters we specified above—the moduli, ring degree, scaling factor—are actually quite tricky to choose. They depend on each other in complex ways, and they depend on the depth of your computation, which you don’t know until you’ve built the computation graph. Get them wrong and your computation might overflow, lose precision, or be insecure.
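To make the overflow failure mode concrete, here is a toy sketch in plain modular arithmetic (not the actual scheme): BGV results are exact only while intermediate values stay below the plaintext modulus, here the t = 40961 from the example above.

```python
# Toy illustration, not the real scheme: arithmetic over the plaintext space
# is exact only while intermediate values stay below the plaintext modulus.
# Using t = 40961 from the example above:
t = 40961

def exact(x):
    """True if 3*x + x survives the computation without wrapping mod t."""
    return (3 * x + x) % t == 3 * x + x

print(exact(10_000))  # 40_000 < 40_961: the result is correct
print(exact(10_241))  # 40_964 >= 40_961: the result silently wraps around
```

If the chosen modulus is too small for the depth of your computation, the wrapped result decrypts without error but is simply wrong, which is exactly why parameter selection is hard to do by hand.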

Hadal solves this with automated parameter selection:

@tf.function
def compute(cleartext_a, cleartext_b):
    # Let Hadal choose parameters automatically
    context = hadal_flow.create_autocontext64(
        log2_cleartext_sz=4,  # Max size of inputs
        scaling_factor=1,
        noise_offset_log2=0,  # Extra noise budget
    )
    key = hadal_flow.create_key64(context)

    a = hadal_flow.to_encrypted(cleartext_a, key, context)
    b = hadal_flow.to_shell_plaintext(cleartext_b, context)

    result = a * b
    return hadal_flow.to_tensorflow(result, key)

# Enable graph optimizations
hadal_flow.enable_optimization()

# The first call will trace the function and select parameters
result = compute([1, 2, 3], [4, 5, 6])

When you use create_autocontext64, Hadal statically estimates the noise growth in your computation by analyzing the TensorFlow graph. It then selects ciphertext moduli and a polynomial degree that satisfy security requirements (verified against the Lattice Estimator) while minimizing overhead. This happens in a custom compiler pass we added to Grappler, TensorFlow’s graph compiler.

This automation closes the loop for hyperparameter tuning. You can use Keras Tuner to explore both traditional hyperparameters (learning rate, momentum) and HE-specific parameters (scaling factor, plaintext modulus) in a single unified search. The framework will ensure each configuration is secure and functional.
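As a sketch of what such a unified search looks like, here is a minimal grid search in plain Python. The `train_and_score` objective is a hypothetical stand-in for a real training run; in practice you would feed the HE parameters into create_autocontext64 and let Keras Tuner drive the loop.

```python
# Sketch of a unified search over traditional and HE-specific hyperparameters.
# `train_and_score` is a dummy objective so the loop is self-contained; a real
# version would build an autocontext, train, and return validation accuracy.
import itertools

search_space = {
    "learning_rate": [1e-3, 1e-2],   # traditional hyperparameter
    "scaling_factor": [1, 2, 4],     # HE-specific hyperparameter
    "noise_offset_log2": [0, 8],     # HE-specific hyperparameter
}

def train_and_score(learning_rate, scaling_factor, noise_offset_log2):
    # Placeholder objective for illustration only.
    return learning_rate * 10 + 1.0 / scaling_factor - noise_offset_log2 * 0.01

best = max(
    (dict(zip(search_space, values))
     for values in itertools.product(*search_space.values())),
    key=lambda cfg: train_and_score(**cfg),
)
print(best)
```

The point is that HE parameters become just another axis of the search: every candidate configuration is validated for security and correctness by the framework before it is scored.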

Dual Execution Modes: Eager and Deferred

ML frameworks evolved to support both eager and deferred execution for good reason. Eager mode is essential for prototyping, testing, and debugging—you write code that executes immediately, just like normal Python. Deferred mode (graph mode) is necessary for optimization and performance—your computation is first traced into a graph, then optimized before execution.

Hadal fully supports both modes through its TensorFlow integration. You can develop and debug with eager execution, then deploy with graph mode for production performance. The code above using @tf.function demonstrates deferred execution—the function is traced once to build a graph, which is then optimized and reused for subsequent calls.

The performance impact is significant. Graph mode enables compiler optimizations like constant folding, common subexpression elimination, and dead code elimination. For HE-based computation, where cryptographic operations are expensive, these optimizations can make a substantial difference. In our benchmarks, deferred execution often provides greater absolute speedup for encrypted computation than for plaintext, because there’s more overhead to optimize away.

HE-Specific Graph Optimizations

Beyond TensorFlow’s standard optimizations, Hadal adds HE-specific transformations. Consider this sequence of operations:

(ciphertext * plaintext1) * plaintext2

Since multiplication is associative, we can rewrite this as:

ciphertext * (plaintext1 * plaintext2)

This trades one expensive ciphertext-plaintext multiplication for one cheap plaintext-plaintext multiplication. It also eliminates a Number Theoretic Transform (NTT), which is implicitly required to encode values for ciphertext operations.

Hadal implements these arithmetic optimizations as custom compiler passes in Grappler. The passes analyze the dataflow graph, identify opportunities to hoist plaintext operations, and rewrite the graph before execution. This only works in deferred mode—another reason why supporting both execution modes matters.
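The rewrite itself can be sketched as a small pass over an expression tree. The Mul and Leaf node types below are illustrative stand-ins; the real pass operates on TensorFlow graph nodes inside Grappler.

```python
# Minimal sketch of the associativity rewrite:
#   (ct * pt1) * pt2  ->  ct * (pt1 * pt2)
# Node types are illustrative, not Hadal's actual graph representation.
from dataclasses import dataclass

@dataclass
class Mul:
    left: object
    right: object

@dataclass
class Leaf:
    name: str
    encrypted: bool

def is_plain(node):
    return isinstance(node, Leaf) and not node.encrypted

def hoist(node):
    """Rewrite (x * pt1) * pt2 into x * (pt1 * pt2) wherever it appears."""
    if not isinstance(node, Mul):
        return node
    node = Mul(hoist(node.left), hoist(node.right))
    if (isinstance(node.left, Mul) and is_plain(node.left.right)
            and is_plain(node.right)):
        # Fold the two plaintext factors together first, so only one
        # ciphertext-plaintext multiplication remains.
        return Mul(node.left.left, Mul(node.left.right, node.right))
    return node

ct = Leaf("ct", encrypted=True)
pt1, pt2 = Leaf("pt1", False), Leaf("pt2", False)
rewritten = hoist(Mul(Mul(ct, pt1), pt2))
print(rewritten)
```

After the pass, the ciphertext participates in exactly one multiplication; the remaining multiplication is plaintext-plaintext and costs orders of magnitude less.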

Distributed Execution

Cryptographic protocols often involve multiple parties. Alice might hold a secret key and encrypt data, Bob might compute on that encrypted data, and Alice might decrypt the result. TensorFlow’s device placement model makes expressing these protocols natural:

@tf.function
def secure_protocol(x):
    with tf.device(alice):
        context = hadal_flow.create_autocontext64(
            log2_cleartext_sz=6,
            scaling_factor=1,
        )
        key = hadal_flow.create_key64(context)
        encrypted = hadal_flow.to_encrypted(x, key, context)

    with tf.device(bob):
        # Bob computes on encrypted data
        squared = encrypted * encrypted

    with tf.device(alice):
        # Alice decrypts the result
        return hadal_flow.to_tensorflow(squared, key)

result = secure_protocol(tf.constant([5.0]))

The alice and bob device strings would normally point to different machines in a TensorFlow cluster. For security, you want to ensure operations are only placed on explicitly specified devices, which you can enforce with tf.config.set_soft_device_placement(False). This prevents accidental exposure of cryptographic material. TensorFlow’s runtime then determines which tensors each party must compute before they are sent, and handles the transfers between parties automatically.

Performance Profiling

Here’s where Hadal really shines. Because HE operations are implemented as TensorFlow ops, they integrate seamlessly with TensorFlow Profiler and TensorBoard. You can visualize your encrypted computation as a dataflow graph, see per-operation timing and memory usage, and identify bottlenecks—all using the same tools you’d use for plaintext ML.

This capability enabled systematic evaluation of computational trade-offs during protocol design. Instead of relying on theoretical estimates of cryptographic overhead, we could profile real implementations with encryption parameters tailored to the specific computation. This data-driven approach revealed opportunities for optimization that wouldn’t have been obvious otherwise.

The profiling tools are not just for performance tuning. They’re also invaluable for debugging. When something goes wrong in an HE computation, it’s often not obvious where or why. Being able to step through a graph, inspect intermediate values (in encrypted form), and see which operations are executing is essential for practical development.

Parallelization

Hadal supports two forms of parallelism. First, individual operations parallelize internally over tensor dimensions using TensorFlow’s thread pool. When you multiply two tensors of encrypted values, those multiplications happen in parallel.

Second, graph-level parallelism allows multiple independent operations to run concurrently. In deferred mode, TensorFlow can identify operations with no data dependencies and execute them simultaneously. This is particularly effective for cryptographic protocols, where you might be computing multiple independent ciphertexts or preparing data for different steps of a protocol.

The benchmarks included in the repository demonstrate these speedups. Computing on a vector of 8 ciphertexts doesn’t take 8 times longer than computing on a single ciphertext, thanks to parallelization.
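Graph-level parallelism can be sketched with a plain thread pool. Here `expensive_op` is a simulated stand-in for an HE operation, with its cost modeled by a sleep rather than real cryptographic work.

```python
# Sketch: eight independent "ciphertext" operations run concurrently, so the
# total latency is close to one operation's latency, not eight. The sleep is
# a stand-in for an expensive HE multiplication.
import time
from concurrent.futures import ThreadPoolExecutor

def expensive_op(x):
    time.sleep(0.1)  # simulated cost of one HE operation
    return x * x

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(expensive_op, range(8)))
parallel_time = time.perf_counter() - start

print(results)
print(f"elapsed: {parallel_time:.2f}s")  # roughly 0.1s, not 0.8s
```

In Hadal, this scheduling is done by TensorFlow's executor in deferred mode, which discovers the independence from the dataflow graph rather than requiring you to manage a pool yourself.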

Technical Architecture

Extension-Based Design

Hadal extends TensorFlow through its Custom Op interface. We implement HE operations as C++ kernels that call into Google’s SHELL library for the underlying BGV cryptographic primitives. These kernels are precompiled and distributed as a Python package, just like TensorFlow’s built-in ops that invoke Eigen for CPU or CUDA for GPU operations.

This design integrates deeply with TensorFlow’s ecosystem. The graph compiler sees HE ops as first-class citizens alongside other TensorFlow operations. Profiling tools can attribute costs to specific ops. The distributed execution system can schedule ops across machines. We’re not building on top of TensorFlow; we’re extending it.

The SHELL library implements the BGV (Brakerski-Gentry-Vaikuntanathan) homomorphic encryption scheme, which is particularly efficient for arithmetic operations. BGV works over polynomial rings and uses residue number system (RNS) representation for efficiency. For the kinds of affine transformations common in ML (and in our privacy-preserving training protocols), BGV is an excellent fit.

Why TensorFlow?

TensorFlow turns out to be remarkably well-suited for cryptographic protocols. The distributed device placement model naturally expresses multi-party computation. The profiling tools help optimize expensive cryptographic operations. The graph-based structure enables analysis and optimization before execution. The pre-compiled op kernel model provides the right level of abstraction—not so low-level that you’re dealing with individual gates, but not so high-level that you lose control.

More specifically, TensorFlow has mature C++ extension interfaces and, crucially, allows custom compiler passes through Grappler. This is essential for automated parameter selection, where we need to analyze the graph, estimate noise growth, and inject generated parameters before execution.

TensorFlow also supports shape inference with dynamic shapes in the first dimension, which is critical for batch-axis packing—an optimization where we pack multiple values into the slots of a single ciphertext. Due to how parameter selection works, the batch size isn’t known until after we’ve analyzed the graph and chosen parameters. JAX, in contrast, requires static shapes at compile time, creating a chicken-and-egg problem: you can’t compile the graph without knowing the shapes, but you can’t know the shapes without analyzing the graph.

Could Hadal be built on PyTorch or JAX? Possibly, but TensorFlow’s architecture makes it significantly easier. The graph compiler extension points, the profiling infrastructure, and the flexible shape handling all contribute to making TensorFlow the right foundation for this work.

Privacy-Preserving Machine Learning with Hadal

While hadal-flow is a general-purpose framework for encrypted computation, we built it alongside a specific application: privacy-preserving machine learning with label differential privacy. The hadal-ml library implements the Postscale protocol, which enables training ML models when features and labels are held by different parties.

The Setting

Imagine a medical lab that has developed sophisticated diagnostic tests (the features) and wants to train a model to predict patient outcomes (the labels). But the outcomes are private to patients. Or consider online advertising, where one party has user attributes and ad impressions (features) while another holds conversion data (labels). In recommendation systems, user preferences might be available to a service provider while individual ratings remain confidential.

This setting—features and model held by one party, labels held by another—is common enough to deserve its own name. We call it Features-and-Model-vs-Labels (FAML) partitioning. The goal is to train a model that is differentially private with respect to the labels, without requiring a trusted third party.

The Postscale Protocol

Traditional differential privacy (DP) for machine learning adds noise to gradients during training. This requires knowing the gradients, which in the FAML setting means computing them securely. The naive approach would use multi-party computation (MPC) to compute the entire backward pass, but this is expensive—one baseline MPC approach requires 1 TB of communication per batch and takes nearly an hour.

Postscale restructures the computation. The key observation is that for many models (those with softmax activations and cross-entropy loss, for instance), the gradient is an affine function of the labels. The feature-holding party can compute a gradient for every possible output class in plaintext, then homomorphically scale them using the encrypted label. A single ciphertext multiplication replaces an entire encrypted backward pass.
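The identity behind this can be checked in a few lines. For softmax with cross-entropy loss, the gradient with respect to the logits is softmax(z) - y, which is affine in the one-hot label y. The simulation below keeps the label in the clear purely to verify the arithmetic; in the protocol, the combination step would operate on the encrypted label.

```python
# Sketch of the Postscale observation: the logit gradient softmax(z) - y is
# affine in the one-hot label y, so per-class gradients can be precomputed in
# plaintext and combined with the (here simulated, unencrypted) label using a
# single multiply-and-sum.
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

logits = [2.0, 1.0, 0.1]
probs = softmax(logits)

# One plaintext gradient per candidate class c.
per_class_grads = [
    [p - (1.0 if i == c else 0.0) for i, p in enumerate(probs)]
    for c in range(len(logits))
]

# One-hot label (true class is 1); encrypted in the real protocol.
y = [0.0, 1.0, 0.0]

# A single multiply-and-sum against the label selects the true gradient.
grad = [sum(y[c] * per_class_grads[c][i] for c in range(len(y)))
        for i in range(len(logits))]

print(grad)  # equals [p0, p1 - 1, p2], i.e. softmax(z) - y
```

Because this combination is a single homomorphic multiplication per gradient entry, the entire encrypted backward pass collapses to one shallow operation, which is what keeps the circuit depth at 2.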

The protocol requires only depth-2 multiplication (shallow circuits are more efficient for HE) and avoids expensive ciphertext rotations. For binary classification, the overhead is minimal. DP noise is added before decryption to prevent label recovery.

Here’s where the framework mattered: profiling revealed that computing the per-class gradients (the Jacobian) took only 3% of training time when accelerated with a GPU, while the alternative—clipping gradients under encryption—would have been far more expensive. This data-driven insight shaped the protocol design. Without the profiling tools, we would have been guessing.

Performance

The hadal-ml implementation achieves dramatic improvements over MPC baselines:

  • 99% reduction in training time (from 54 minutes to 33 seconds per batch)
  • Over 99% reduction in communication (from 1 TB to 8 GB per batch)
  • Enables training larger models like SqueezeNet and BERT-tiny with practical performance

These aren’t just microbenchmarks. We trained real models on real datasets (MNIST, IMDB sentiment analysis, image classification) with meaningful privacy guarantees (ε ≤ 1). The framework’s automated parameter selection and hyperparameter tuning made it possible to explore the accuracy-performance trade-off systematically.

Getting Started

Installation is straightforward:

pip install hadal_flow

The repository includes several Jupyter notebooks that demonstrate different aspects of the framework:

  • intro.ipynb - Basic encryption, operations, scaling factors, modulus switching
  • automatic_parameters_demo.ipynb - Using autocontext for parameter selection
  • distributed_demo.ipynb - Multi-party protocols with device placement
  • parallelization_demo.ipynb - Performance impact of eager vs. deferred execution
  • benchmark.ipynb - Detailed performance measurements

These notebooks are the best way to get hands-on experience with the framework. They show not just how to use the API, but how to think about encrypted computation in the TensorFlow model.

For ML applications, hadal-ml provides Keras-compatible layers for common operations like dense layers, convolutions, and pooling. These are “slot-aware”—they can skip expensive reductions when intermediate sums can be deferred to the plaintext domain, a key optimization for the FAML setting.

Looking Forward

Hadal demonstrates that tight integration between cryptographic primitives and ML frameworks enables new kinds of systems. The profiling and optimization tools that ML practitioners take for granted turn out to be equally valuable for cryptographic protocol design.

There are clear directions for future work. Currently, HE operations run on CPU, but the architecture is designed to support accelerators. The techniques for automated parameter selection and graph optimization could apply to other HE schemes beyond BGV. As PyTorch and JAX mature their compiler infrastructure, similar approaches might become feasible there.

More broadly, we hope Hadal encourages more co-design between cryptography and systems. Cryptographic protocols shouldn’t be designed in isolation from the systems that will implement them. Systems for privacy-preserving computation shouldn’t treat cryptography as a black box. The best results come from designing them together.

Try It Yourself

Hadal is open source and available on GitHub under the Apache 2.0 license: https://github.com/google/hadal-flow

The repository includes:

  • Complete source code for hadal-flow and hadal-ml
  • Jupyter notebooks demonstrating all major features
  • Documentation and API reference
  • Example implementations of cryptographic protocols

Start with the intro notebook to get a feel for the framework. Experiment with different encryption parameters and see how they affect performance. Try implementing your own protocols using the distributed execution model. Profile your encrypted computations and see where the time goes.

Whether you’re an ML practitioner interested in privacy-preserving computation, a cryptographer looking for practical HE implementations, or a researcher exploring new protocols, Hadal provides the tools to turn ideas into working systems. Computing in darkness doesn’t have to mean working in the dark.