Open
Description
Feature Request
As a follow-up to the now-closed issues #2786 and #2785.
Description
I would love to see automated network-presence unlocking of drives using Clevis with Tang server (see below).
It would allow the system to be unlocked automatically and securely as long as it stays connected to the desired network.
The whole concept is really simple:
- the Client (clevis) uses Servers (Tang) to convert a `secret` into a JWE token decryptable by presence of X out of Y servers
- the Client retrieves `secret` out of the JWE token using Servers
- the Client uses `secret` to unlock the LUKS volume as either password or key file
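To illustrate the "X out of Y servers" property, here is a minimal sketch of Shamir secret sharing, the scheme behind the Clevis `sss` pin. It is not Clevis's implementation, and it leaves out the Tang key exchange entirely; the function names and the choice of prime field are assumptions made for the example.

```python
import secrets

# A Mersenne prime, comfortably larger than a 16-byte secret (illustrative choice).
PRIME = 2**521 - 1


def split_secret(secret: int, threshold: int, shares: int) -> list[tuple[int, int]]:
    """Split `secret` into `shares` points; any `threshold` of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range")
    if not 1 <= threshold <= shares:
        raise ValueError("need 1 <= threshold <= shares")
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x: int) -> int:
        result = 0
        for c in reversed(coeffs):  # Horner's rule
            result = (result * x + c) % PRIME
        return result

    return [(x, evaluate(x)) for x in range(1, shares + 1)]


def recover_secret(points: list[tuple[int, int]]) -> int:
    """Lagrange-interpolate the polynomial at x = 0 to get the secret back."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        total = (total + yi * num * pow(den, -1, PRIME)) % PRIME
    return total
```

With `split_secret(secret, 2, 4)`, any two of the four shares passed to `recover_secret` yield the original value, while a single share on its own reveals nothing about it.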
It would allow me to set up the following:
- put a single secure Tang server into my network, all Talos boxes would use only this server
- deploy Tang into Kubernetes and use it to easily unlock each Talos box as long as any of those are running
- mix of the above: 1x Tang and 3x Talos boxes unlock the disks using any 2 out of 4 servers
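On a regular Linux host, the 2-out-of-4 setup would be expressed as a JSON config for the Clevis `sss` pin. The sketch below only builds that JSON; the Tang URLs are hypothetical placeholders, and Talos does not accept such a config today.

```python
import json


def sss_config(threshold: int, tang_urls: list[str]) -> str:
    """Build a JSON config for the Clevis `sss` pin over several Tang servers."""
    if not 1 <= threshold <= len(tang_urls):
        raise ValueError("threshold must be between 1 and the number of servers")
    config = {"t": threshold, "pins": {"tang": [{"url": u} for u in tang_urls]}}
    return json.dumps(config)


# Hypothetical addresses: one dedicated Tang server plus three Tang pods on Talos boxes.
urls = [
    "http://tang.example.internal",
    "http://talos-1.example.internal:7500",
    "http://talos-2.example.internal:7500",
    "http://talos-3.example.internal:7500",
]
print(sss_config(2, urls))
```

Outside Talos, the printed string is what would be passed as the config argument to `clevis luks bind -d <device> sss '<config>'`.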
Activity
Title changed from "Support NBDE (Network-Bound Disk Encryption)" to "Support NBDE (Network-Bound Disk Encryption) eg: Clevis + Tang"
smira commented on Aug 2, 2024
Talos supports its own protocol for KMS encryption: https://www.talos.dev/v1.7/talos-guides/configuration/disk-encryption/
The gRPC proto: https://github.com/siderolabs/kms-client/blob/main/api/kms/kms.proto
So you can build your own proxy which talks to the KMS of your choice (e.g. Clevis/Tang).
TrooperT commented on Aug 8, 2024
@smira, I haven't looked over the .proto you linked to yet, but how much work would you estimate it would take to switch over to, or add support for, KMIP?
smira commented on Aug 8, 2024
We don't envision right now supporting tons of different disk encryption KMS APIs in the operating system itself. KMIP is one of the worst across the board in terms of added complexity.
TrooperT commented on Aug 9, 2024
Fair enough
stevefan1999-personal commented on Sep 13, 2024
Or better yet, we can leverage dropbear and define a static list of public keys that allows further LUKS unlocking.
In the initramfs, only dropbear will listen, and you can SSH in to unlock the root filesystem, after which the Talos boot process continues.
No further SSH interaction is possible after `cryptroot-unlock`, because the kernel init is transferred to `/init` of Talos from that point and you can only interact with the Talos API.
It is also possible to HTTP GET a manifest of public keys and set your own static CA to use your own HTTPS connection for that private web of trust, but this is very complicated and not recommended.
This scheme is far simpler, without having to rely on too many third-party components (I don't see any high-profile security review for the Clevis + Tang setup? I bet dropbear has way more), and it also provides the highest compatibility for everyday programmers, since we all have to use SSH anyway. The downside is that it is not really an industry standard and some people (especially upper management) may be skeptical of it. To that I recommend russh if you want to truly have the safety you want with Rust, or stay with gliderlabs/ssh in Golang.
everflux commented on Sep 22, 2024
I really like the dropbear proposal. While not for every use case, it would allow me to run Talos in less trusted environments (dedicated servers hosted by a 3rd party) with a reasonable expectation of security.
github-actions commented on Mar 22, 2025
This issue is stale because it has been open 180 days with no activity. Remove stale label or comment or this will be closed in 7 days.
stevefan1999-personal commented on Mar 22, 2025
no-stale
vicaya commented on May 13, 2025
The current KMS protocol is flawed (see #10871) in a way that the Tang server protocol solves elegantly, i.e. no credential is ever stored in clear text per node. Clevis also supports more pin schemes, such as "Shamir Secret Sharing", which could require multiple Tang servers to unlock the disk.
vicaya commented on May 13, 2025
NBDE is the main thing (the other would be cloudflared integration) that's missing from Talos to replace my Ubuntu server autoinstall. I very much like the philosophy of Talos and would rather use it if it can support my use case (rented (virtual) servers), where you cannot trust your providers to handle data at rest properly.
BTW, TPM doesn't work for cloud environments, as vTPM is just a plaintext file on the host.
nikogura commented on May 20, 2025
The big problem that I see in the KMS system is the whole setup hinges on 1 or 2 pieces of information: The UUID generated by Talos (I haven't found a way to set it), and the metadata around the gRPC call such as IP address, and maybe geolocation.
Presumably, the UUID is written in the META partition after generation (this is an assumption). If that's true, then stealing the server or stealing the disk means you have the UUID and can query the KMS server, whose address is also on the META partition.
I can somewhat restrict my KMS server to whitelist the IP, but even then, if you have my disk, and control of that IP, you can talk to my KMS server and decrypt my disk.
What we need is the ability for a Talos node, when STATE is encrypted, to hold in maintenance mode until an admin supplies a 'secret', which, combined with the UUID of the server and other metadata from the gRPC call (like IP address), would allow it to authenticate to the KMS server.
This same mechanism could be used to fulfill #10869 and solve the issues in #10871 as well as resolving this ticket.
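One way to picture the proposal above is an HMAC over the node UUID and caller metadata, keyed by the admin-supplied secret. This is only a sketch of the idea, not anything Talos or the KMS protocol implements; `node_token`, `kms_accepts`, and the message layout are hypothetical.

```python
import hashlib
import hmac


def node_token(admin_secret: bytes, node_uuid: str, source_ip: str) -> str:
    """Derive a token from the admin secret, the node UUID, and the caller's IP."""
    message = f"{node_uuid}|{source_ip}".encode()
    return hmac.new(admin_secret, message, hashlib.sha256).hexdigest()


def kms_accepts(admin_secret: bytes, node_uuid: str, source_ip: str, token: str) -> bool:
    """KMS-side check: recompute the token and compare in constant time."""
    expected = node_token(admin_secret, node_uuid, source_ip)
    return hmac.compare_digest(expected, token)
```

Because the admin secret never lives on the disk, knowing the UUID and controlling the whitelisted IP would not be enough to produce a valid token.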
vicaya commented on May 21, 2025
You can already do that via IPMI/remote console access for any OS to supply the secret. The entire point of having a KMS is that admin doesn't have to access each node to supply a secret, and that you would have a centralized audit trail of disk access. Data at rest encryption mostly solves the problem when you need to recycle disks/volumes without having to meticulously shred/wipe them.
To shrink the attack window further, one can have node/cluster specific KMS/tang URLs that are only active during reboots and you can use counters to detect any unauthorized access.
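A rough sketch of the window-plus-counter idea, assuming a hypothetical `UnlockAudit` helper sitting in front of the KMS/Tang endpoint (neither Tang nor Talos ships anything like this):

```python
from collections import Counter


class UnlockAudit:
    """Serve unlock requests only inside an open window and count every attempt."""

    def __init__(self) -> None:
        self.window_open: set[str] = set()
        self.served: Counter[str] = Counter()
        self.rejected: Counter[str] = Counter()

    def open_window(self, node: str) -> None:
        # Called by the operator or automation just before a planned reboot.
        self.window_open.add(node)

    def close_window(self, node: str) -> None:
        self.window_open.discard(node)

    def request_unlock(self, node: str) -> bool:
        if node in self.window_open:
            self.served[node] += 1
            return True
        self.rejected[node] += 1
        return False

    def suspicious(self, expected_reboots: dict[str, int]) -> list[str]:
        """Nodes with rejected attempts or more unlocks than planned reboots."""
        nodes = set(self.served) | set(self.rejected)
        return sorted(
            n for n in nodes
            if self.rejected[n] > 0 or self.served[n] > expected_reboots.get(n, 0)
        )
```

Comparing the counters against the number of planned reboots flags any unlock that happened outside a maintenance window.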
With `clevis luks bind` (or equivalent), you can use the sss pin to require both TPM and NBDE to further minimize the attack surface.
nikogura commented on May 21, 2025
Really? That's not in the docs anywhere that I've been able to find.
That pretty much removes the need for #10869.
Can you share an example or a screenshot?
nikogura commented on May 21, 2025
My KMS plan is to do more or less this - Require operator locking/unlocking of the KMS to allow new nodes to join or decrypt. In my case, I'd rather have a node stay offline until an operator can get to it than risk allowing anyone with the Node ID to be able to attempt to decrypt the key. Safer is better than sorry in my case.
github-actions commented on Nov 17, 2025
This issue is stale because it has been open 180 days with no activity. Remove stale label or comment or this will be closed in 7 days.
stevefan1999-personal commented on Nov 17, 2025
no-stale