Support NBDE (Network-Bound Disk Encryption) eg: Clevis + Tang #9097

@nazarewk

Feature Request

This is a follow-up to the now-closed issues #2786 and #2785.

Description

I would love to see automated network-presence unlocking of drives using Clevis with a Tang server (see below).
It would allow the system to be unlocked automatically and securely as long as it stays connected to the desired network.

The whole concept is really simple:

  1. the Client (Clevis) uses Servers (Tang) to convert a secret into a JWE token that can be decrypted only when X out of Y servers are present
  2. the Client recovers the secret from the JWE token using the Servers
  3. the Client uses the secret to unlock the LUKS volume, either as a password or as a key file
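
The "X out of Y servers" property in the steps above is what Clevis's sss pin (Shamir's Secret Sharing) provides. The sketch below is a toy illustration of that threshold property only, assuming a small integer secret and a fixed prime field. It is not how Clevis or Tang implement it (they work with JWE tokens and a key exchange with each server), and the function names are made up for this example.

```python
import secrets

# Toy Shamir secret sharing over a prime field, for illustration only.
PRIME = 2**127 - 1  # Mersenne prime; secrets must be smaller than this


def split_secret(secret: int, threshold: int, shares: int) -> list[tuple[int, int]]:
    """Split `secret` into `shares` points; any `threshold` of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range")
    if not 1 <= threshold <= shares:
        raise ValueError("need 1 <= threshold <= shares")
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x: int) -> int:
        result = 0
        for coeff in reversed(coeffs):  # Horner's rule
            result = (result * x + coeff) % PRIME
        return result

    return [(x, evaluate(x)) for x in range(1, shares + 1)]


def recover_secret(points: list[tuple[int, int]]) -> int:
    """Recover the constant term by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

With `split_secret(key, 2, 4)`, any two of the four resulting points passed to `recover_secret` give back `key`, while a single point reveals nothing about it.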

It would allow me to set up the following:

  1. put a single secure Tang server into my network, all Talos boxes would use only this server
  2. deploy Tang into Kubernetes and use it to easily unlock each Talos box as long as any of those are running
  3. a mix of the above: 1x Tang and 3x Talos boxes, with disks unlocked using any 2 out of 4 servers
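
For reference, the "any 2 out of 4 servers" setup could be expressed as a Clevis sss pin configuration along these lines. This is a minimal sketch: the four URLs are placeholders, and `sss_tang_config` is a hypothetical helper that only builds the JSON document Clevis would be given.

```python
import json

# Placeholder endpoints: one standalone Tang server plus three in-cluster ones.
TANG_URLS = [
    "http://tang.example.internal",
    "http://tang-0.example.internal",
    "http://tang-1.example.internal",
    "http://tang-2.example.internal",
]


def sss_tang_config(urls: list[str], threshold: int) -> dict:
    """Build an sss pin config that needs `threshold` of the given Tang servers."""
    if not 1 <= threshold <= len(urls):
        raise ValueError("threshold must be between 1 and the number of servers")
    return {"t": threshold, "pins": {"tang": [{"url": url} for url in urls]}}


if __name__ == "__main__":
    print(json.dumps(sss_tang_config(TANG_URLS, 2), indent=2))
```

On a regular Linux host, the resulting JSON is what would be passed as the configuration argument to `clevis luks bind ... sss`. Talos has no such integration today, which is the point of this request.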

Activity

changed the title from "Support NBDE (Network-Bound Disk Encryption)" to "Support NBDE (Network-Bound Disk Encryption) eg: Clevis + Tang" on Aug 2, 2024

smira (Member) commented on Aug 2, 2024

Talos supports its own protocol for KMS encryption: https://www.talos.dev/v1.7/talos-guides/configuration/disk-encryption/

The gRPC proto: https://github.com/siderolabs/kms-client/blob/main/api/kms/kms.proto

So you can build your own proxy which talks to the KMS of your choice (e.g. Clevis/Tang).
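
As a rough idea of the shape such a proxy could take, here is a sketch that assumes the service boils down to two operations, seal and unseal, keyed by a node UUID. The gRPC wiring is left out entirely, all class and method names are hypothetical, and the in-memory backend is a toy escrow store standing in for whatever real KMS the proxy would talk to.

```python
import secrets
from typing import Protocol


class KeyBackend(Protocol):
    """What the proxy needs from the KMS behind it (hypothetical interface)."""

    def seal(self, node_uuid: str, data: bytes) -> bytes: ...

    def unseal(self, node_uuid: str, data: bytes) -> bytes: ...


class InMemoryEscrowBackend:
    """Toy backend: keeps the key in memory and hands back an opaque handle."""

    def __init__(self) -> None:
        self._store: dict[tuple[str, bytes], bytes] = {}

    def seal(self, node_uuid: str, data: bytes) -> bytes:
        handle = secrets.token_bytes(16)
        self._store[(node_uuid, handle)] = data
        return handle

    def unseal(self, node_uuid: str, data: bytes) -> bytes:
        try:
            return self._store[(node_uuid, data)]
        except KeyError:
            raise PermissionError("unknown node or sealed blob") from None


class KMSProxy:
    """Forwards seal/unseal requests from nodes to the chosen backend."""

    def __init__(self, backend: KeyBackend) -> None:
        self._backend = backend

    def seal(self, node_uuid: str, data: bytes) -> bytes:
        return self._backend.seal(node_uuid, data)

    def unseal(self, node_uuid: str, data: bytes) -> bytes:
        return self._backend.unseal(node_uuid, data)
```

A real proxy would swap `InMemoryEscrowBackend` for a class that calls the external KMS and would expose `KMSProxy` through the gRPC service defined in the linked proto.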

TrooperT commented on Aug 8, 2024

@smira, I haven't looked over the .proto you linked yet, but how much work do you estimate it would take to switch over to, or add support for, KMIP?

smira (Member) commented on Aug 8, 2024

We don't envision right now supporting tons of different disk encryption KMS APIs in the operating system itself. KMIP is one of the worst across the board in terms of added complexity.

TrooperT commented on Aug 9, 2024

Fair enough

stevefan1999-personal commented on Sep 13, 2024

Or better yet, we could leverage dropbear and define a static list of public keys that are allowed to unlock LUKS.
In the initramfs, only dropbear would listen; you SSH in to unlock the root filesystem, and the Talos boot process continues from there.

There is no further SSH interaction after cryptroot-unlock, because init is handed over to Talos's /init at that point and you can only interact with the Talos API.

It is also possible to HTTP GET a manifest of public keys and pin your own static CA so that the HTTPS connection stays within a private web of trust, but this is very complicated and not recommended.

This scheme is far simpler and does not rely on too many third-party components (I don't see any high-profile security review of the Clevis + Tang setup; I bet dropbear has had far more). It also offers the best compatibility for everyday programmers, since we all have to use SSH anyway. The downside is that it is not really an industry standard, and some people (especially upper management) may be skeptical of it. In that case I recommend russh if you want the safety of Rust, or staying with gliderlabs/ssh in Go.

everflux commented on Sep 22, 2024

I really like the dropbear proposal. While not suited to every use case, it would allow me to run Talos in less trusted environments (dedicated servers hosted by a third party) with a reasonable expectation of security.

github-actions commented on Mar 22, 2025

This issue is stale because it has been open 180 days with no activity. Remove stale label or comment or this will be closed in 7 days.

stevefan1999-personal commented on Mar 22, 2025

no-stale

vicaya commented on May 13, 2025

The current KMS protocol is flawed (see #10871), and the Tang server protocol solves this elegantly: no credential is ever stored in clear text per node. Clevis also supports additional pins such as "Shamir Secret Sharing" (sss), which can require multiple Tang servers to unlock the disk.

vicaya commented on May 13, 2025

NBDE is the main thing missing from Talos (the other would be cloudflared integration) that keeps it from replacing my Ubuntu Server autoinstall setup. I very much like the philosophy of Talos and would rather use it if it can support my use case: rented (virtual) servers, where you cannot trust your providers to handle data at rest properly.

BTW, TPM doesn't work in cloud environments, as a vTPM is just a plaintext file on the host.

nikogura commented on May 20, 2025

The big problem that I see in the KMS system is the whole setup hinges on 1 or 2 pieces of information: The UUID generated by Talos (I haven't found a way to set it), and the metadata around the gRPC call such as IP address, and maybe geolocation.

Presumably, the UUID is written to the META partition after generation (this is an assumption). If that's true, then stealing the server or stealing the disk means you have the UUID and can query the KMS server, whose address is also on the META partition.

I can somewhat restrict my KMS server to whitelist the IP, but even then, if you have my disk, and control of that IP, you can talk to my KMS server and decrypt my disk.

What we need is the ability for a Talos node, when STATE is encrypted, to hold in maintenance mode until an admin supplies a 'secret', which, combined with the UUID of the server and other metadata from the gRPC call (like IP address), would allow it to authenticate to the KMS server.

This same mechanism could be used to fulfill #10869 and solve the issues in #10871 as well as resolving this ticket.
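
To make the proposal above concrete, one possible way to bind an operator-supplied secret to the node UUID and the caller's IP is an HMAC over both values. This is only a sketch of the idea under that assumption; nothing like it exists in Talos or the KMS protocol, and the function names are hypothetical.

```python
import hashlib
import hmac


def node_auth_token(admin_secret: bytes, node_uuid: str, source_ip: str) -> str:
    """Derive a token that only someone holding `admin_secret` can produce."""
    parts = (node_uuid.encode(), source_ip.encode())
    # Length-prefix each field so different (uuid, ip) pairs cannot collide.
    message = b"".join(len(p).to_bytes(2, "big") + p for p in parts)
    return hmac.new(admin_secret, message, hashlib.sha256).hexdigest()


def verify_node(admin_secret: bytes, node_uuid: str, source_ip: str, token: str) -> bool:
    """KMS-side check: recompute the token and compare in constant time."""
    expected = node_auth_token(admin_secret, node_uuid, source_ip)
    return hmac.compare_digest(expected, token)
```

With a scheme like this, knowing the UUID from the META partition and controlling the IP would not be enough: the KMS would also require a token derived from a secret that never touches the disk.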

vicaya commented on May 21, 2025

> What we need is the ability for a Talos node, when STATE is encrypted, to hold in maintenance mode

You can already do that for any OS by supplying the secret via IPMI/remote console access. The entire point of having a KMS is that an admin doesn't have to access each node to supply a secret, and that you get a centralized audit trail of disk access. Data-at-rest encryption mostly solves the problem of recycling disks/volumes without having to meticulously shred/wipe them.

To shrink the attack window further, one can have node- or cluster-specific KMS/Tang URLs that are only active during reboots, and use counters to detect any unauthorized access.
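
A sketch of what "only active during reboots" plus counters might look like on the KMS/Tang side, assuming the operator opens a short window before a planned reboot and knows how many unlocks to expect. The class and its fields are hypothetical and not part of Tang or the Talos KMS.

```python
import time


class UnlockWindow:
    """Serves unlock requests only inside an operator-opened time window."""

    def __init__(self, expected_unlocks: int, duration_s: float, clock=time.monotonic):
        self._expected = expected_unlocks
        self._duration = duration_s
        self._clock = clock
        self._deadline = None  # window is closed until open() is called
        self.granted = 0
        self.denied = 0

    def open(self) -> None:
        """Operator action before a planned reboot."""
        self._deadline = self._clock() + self._duration

    def request_unlock(self) -> bool:
        """Count every request; grant it only while the window is open."""
        if self._deadline is not None and self._clock() <= self._deadline:
            self.granted += 1
            return True
        self.denied += 1
        return False

    def suspicious(self) -> bool:
        """True if anything happened that a planned reboot does not explain."""
        return self.denied > 0 or self.granted > self._expected
```

Any request outside the window, or more unlocks than the number of nodes being rebooted, would show up in the counters and could trigger an alert.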

With clevis luks bind (or equivalent), you can use the sss pin to require both TPM and NBDE, further reducing the attack surface.
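
As an illustration of that combination, an sss configuration with a threshold of 2 over one tpm2 pin and one tang pin requires both to succeed. The sketch below builds that configuration and the matching command line; the Tang URL, the device path, and the PCR choice are placeholders, and the helper names are hypothetical.

```python
import json
import shlex


def tpm_and_tang_config(tang_url: str, pcr_ids: str = "7") -> dict:
    """sss config with t=2 over two pins, so TPM and Tang are both required."""
    return {
        "t": 2,
        "pins": {
            "tpm2": {"pcr_ids": pcr_ids},
            "tang": {"url": tang_url},
        },
    }


def bind_command(device: str, config: dict) -> str:
    """Render the `clevis luks bind` invocation for the given device."""
    return "clevis luks bind -d {} sss {}".format(
        shlex.quote(device), shlex.quote(json.dumps(config))
    )


if __name__ == "__main__":
    cfg = tpm_and_tang_config("http://tang.example.internal")
    print(bind_command("/dev/sda2", cfg))
```

This applies to a regular Linux host with Clevis installed; an equivalent for Talos would have to be implemented in Talos itself.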

nikogura commented on May 21, 2025

> You can already do that via IPMI/remote console access for any OS to supply the secret.

Really? That's not in the docs anywhere that I've been able to find.

That pretty much removes the need for #10869.

Can you share an example or a screenshot?

nikogura commented on May 21, 2025

> To shrink the attack window further, one can have node/cluster specific KMS/tang URLs that are only active during reboots and you can use counters to detect any unauthorized access.

My KMS plan is to do more or less this: require an operator to lock and unlock the KMS before new nodes can join or decrypt. In my case, I'd rather have a node stay offline until an operator can get to it than risk allowing anyone with the Node ID to attempt to decrypt the key. Better safe than sorry in my case.

github-actions commented on Nov 17, 2025

This issue is stale because it has been open 180 days with no activity. Remove stale label or comment or this will be closed in 7 days.

stevefan1999-personal commented on Nov 17, 2025

no-stale

Support NBDE (Network-Bound Disk Encryption) eg: Clevis + Tang · Issue #9097 · siderolabs/talos