In the previous post I looked at how you build an encrypted image that can maintain its confidentiality inside AMD SEV or Intel TDX. In this post I’ll discuss how you actually bring up a confidential VM from an encrypted image while preserving secrecy. However, first a warning: this post represents the state of the art and includes patches that are certainly not deployed in distributions and may not even be upstream, so if you want to follow along at home you’ll need to patch things like qemu, grub and OVMF. I should also add that, although I’m trying to make everything generic to confidential environments, this post is based on AMD SEV, which is the only confidential encrypted[1] environment currently shipping.
The Basics of a Confidential Computing VM
At their base, current confidential computing environments are about using encrypted memory to run the virtual machine and guarding the encryption key so that the owner of the host system (the cloud service provider) can’t get access to it. Both SEV and TDX have the encryption technology inside the main memory controller, meaning the L1 cache isn’t encrypted (so it is still vulnerable to cache side channels) and DMA to devices must also be done via unencrypted memory. This latter also means that both the BIOS and the Operating System of the guest VM must be enlightened to understand which pages must be encrypted and which must not. For this reason, all confidential VM systems use OVMF[2] to boot because this contains the necessary enlightening. To a guest, the VM encryption looks identical to full memory encryption on a physical system, so as long as you have a kernel which supports Intel or AMD full memory encryption, it should boot.
Each confidential computing system has a security element which sits between the encrypted VM and the host. In SEV this is an aarch64 processor called the Platform Security Processor (PSP) and in TDX it is an SGX enclave running Intel proprietary code. The job of the PSP is to bootstrap the VM, including encrypting the initial OVMF and inserting the encrypted pages. The security element also includes a validation certificate, which incorporates a Diffie-Hellman (DH) key. Once the guest owner obtains and validates the DH key it can use it to construct a one-time ECDH-encrypted bundle that can be passed to the security element on bring up. This bundle includes an encryption key which can be used to encrypt secrets for the security element and a validation key which can be used to verify measurements from the security element.
The way QEMU boots a Q35 machine is to set up all the configuration (including a disk device attached to the VM Image), load the OVMF into ROM memory and start the system running. OVMF pulls in the QEMU configuration and constructs the necessary ACPI configuration tables before executing grub and the kernel from the attached storage device. In a confidential VM, the first task is to establish a Guest Owner (the person whose encrypted VM it is), who is usually different from the Host Owner (the person running or controlling the Physical System). Ownership is established by transferring an encrypted bundle to the Secure Element before the VM is constructed.
The next step is for the VMM (QEMU in this case) to ask the secure element to provision the OVMF Firmware. Since the initial OVMF is untrusted, the Guest Owner should ask the Secure Element for an attestation of the memory contents before the VM is started. Since all paths lead through the Host Owner, who is also untrusted, the attestation contains a random nonce to prevent replay and is HMAC’d with a Guest Supplied key from the Launch Bundle. Once the Guest Owner is happy with the VM state, it supplies the Wrapped Key to the secure element (along with the nonce to prevent replay) and the Secure Element unwraps the key and provisions it to the VM where the Guest OS can use it for disk encryption. Finally, the enlightened guest reads the encrypted disk to unencrypted memory using DMA but uses the disk encryptor to decrypt it to encrypted memory, so the contents of the Encrypted VM Image are never visible to the Host Owner.
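To make the attestation step concrete in the SEV case (which the rest of this post uses as the running example): the launch measurement is an HMAC, keyed with the Transport Integrity Key (TIK) from the launch bundle, over the firmware digest, the guest policy and the PSP’s random nonce. A minimal sketch of the guest owner’s check, following the SEV API specification (the helper name and parameter plumbing are mine):

import hmac, hashlib, struct

# measurement: the 48-byte blob QMP returns for the launch measure,
#   i.e. a 32-byte HMAC followed by the PSP's 16-byte random nonce
# tik: the Transport Integrity Key from the launch bundle
# ovmf_hash: sha256 digest of the OVMF image that was loaded
# api_major/api_minor/build/policy: the platform's reported values
def check_measurement(measurement, tik, ovmf_hash,
                      api_major, api_minor, build, policy):
    measure, mnonce = measurement[:32], measurement[32:48]
    msg = (bytes([0x04, api_major, api_minor, build])
           + struct.pack('<I', policy) + ovmf_hash + mnonce)
    expected = hmac.new(tik, msg, hashlib.sha256).digest()
    return hmac.compare_digest(measure, expected)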
The Gaps in the System
The most obvious gap is that EFI booting systems don’t go straight from the OVMF firmware to the OS; they have to go via an EFI bootloader (grub, usually), which must be an EFI binary on an unencrypted vFAT partition. The second gap is that grub must be modified to pick the disk encryption key out of wherever the Secure Element has stashed it. The third is that the key is currently stashed in VM memory before OVMF starts, so OVMF must know not to use or corrupt the memory. A fourth problem is that the current recommended way of booting OVMF has a flash drive for persistent variable storage which is under the control of the host owner and which isn’t part of the initial measurement.
Plugging The Gaps: OVMF
To deal with the problems in reverse order: the variable issue can be solved simply by not having a persistent variable store, since any mutable configuration information could be used to subvert the boot and leak the secret. This is achieved by stripping all the mutable variable handling out of OVMF. Solving key stashing simply means getting OVMF to set aside a page for a secret area and having QEMU recognise where it is for the secret injection. It turns out AMD were already working on a QEMU configuration table at a known location by the Reset Vector in OVMF, so the secret area is added as one of these entries. Once this is done, QEMU can retrieve the injection location from the OVMF binary so it doesn’t have to be specified in the QEMU Machine Protocol (QMP) command. Finally OVMF can protect the secret and package it up as an EFI configuration table for later collection by the bootloader.
The final OVMF change (which is in the same patch set) is to pull grub inside a Firmware Volume and execute it directly. This certainly isn’t the only possible solution to the problem (adding secure boot or an encrypted filesystem were other possibilities) but it is the simplest solution that gives a verifiable component that can be invariant across arbitrary encrypted boots (so the same OVMF can be used to execute any encrypted VM securely). This latter is important because traditionally OVMF is supplied by the host owner rather than being part of the VM image supplied by the guest owner. The grub script that runs from the combined volume must still be trusted to either decrypt the root or reboot to avoid leaking the key. Although the host owner still supplies the combined OVMF, the measurement assures the guest owner of its correctness, which is why having a fairly invariant component is a good idea … so the guest owner doesn’t have potentially thousands of different measurements for approved firmware.
Plugging the Gaps: QEMU
The modifications to QEMU are fairly simple: it just needs to scan the OVMF file to determine the location for the injected secret and inject it correctly using a QMP command. Since secret injection is already upstream, this is a simple ‘find the location and make it optional’ patch set.
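For illustration, a rough Python equivalent of that scan (the secret-area entry GUID and its base+size payload are the ones introduced by the OVMF patch set, so treat both as assumptions to verify against the patches):

import struct, uuid

# entries in the GUIDed table at the end of OVMF.fd are laid out as
# data, then a 2-byte little-endian length (covering data+len+guid),
# then a 16-byte GUID
SEV_SECRET_GUID = uuid.UUID('4c2eb361-7d9b-4cc3-8081-127c90d3d294')

def find_secret_area(ovmf_path):
    tail = open(ovmf_path, 'rb').read()[-4096:]  # the table lives near the end
    idx = tail.find(SEV_SECRET_GUID.bytes_le)    # naive scan for the entry GUID
    if idx < 0:
        return None
    size, = struct.unpack('<H', tail[idx-2:idx])
    # assumes the payload is the 4-byte base and 4-byte size of the secret page
    base, length = struct.unpack('<II', tail[idx+16-size:idx-2])
    return base, length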
Plugging the Gaps: Grub
Grub today only allows for the manual input of the cryptodisk password. However, in the cloud we can’t do it this way because there’s no guarantee of a secure tty channel to the VM. The solution, therefore, is to modify grub so that the cryptodisk can use secrets from a provider, in addition to the manual input. We then add a provider that can read the efi configuration tables and extract the secret table if it exists. The current incarnation of the proposed patch set is here and it allows cryptodisk to extract a secret from an efisecret provider. Note this isn’t quite the same as the form expected by the upstream OVMF patch in its grub.cfg because now the provider has to be named on the cryptodisk command line thus
cryptodisk -s efisecret
but in all other aspects, Grub/grub.cfg works. I also discovered several other deviations from the initial grub.cfg (like Fedora using /boot/grub2 where everyone else uses /boot/grub) so the current incarnation of grub.cfg is here. I’ll update it as it changes.
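For orientation, a minimal sketch of what such an embedded grub.cfg looks like (the canonical, evolving version is the one linked above; the important property, as noted earlier, is that the script either decrypts the root or reboots rather than falling back to an interactive prompt):

set timeout=0
# pull the disk passphrase from the EFI configuration table and mount
if cryptodisk -s efisecret; then
    set root=(crypto0)
    configfile ($root)/boot/grub/grub.cfg    # /boot/grub2 on Fedora
else
    reboot
fi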
Putting it All Together
Once you have applied all the above patches and built your version of OVMF with grub inside, you’re ready to do a confidential computing encrypted boot. However, you still need to verify the measurement and inject the encrypted secret. As I said before, this isn’t easy because, due to replay defeat requirements, the secret bundle must be constructed on the fly for each VM boot. From this point on I’m going to be using only AMD SEV as the example because the Intel hardware doesn’t yet exist and AMD kindly gave IBM Research a box to play with (anyone with a new EPYC 7xx1 or 7xx2 based workstation can likely play along at home, but check here). The first thing you need to do is construct a launch bundle. AMD has a tool called sev-tool to do this for you, and the first step is to obtain the platform Diffie-Hellman certificate (pdh.cert). The tool will extract this for you
sevtool --pdh_cert_export
Or it can be given to you by the cloud service provider (in this latter case you’ll want to verify the provenance using sevtool --validate_cert_chain, which contacts the AMD site to verify all the details). Once you have a trusted pdh.cert, you can use this to generate your own guest owner DH cert (godh.cert) which should be used only one time to give a semblance of ECDHE. godh.cert is used with pdh.cert to derive an encryption key for the launch bundle. You can generate this with
sevtool --generate_launch_blob <policy>
The gory details of policy are in the SEV manual chapter 3, but most guests use 1, which sets the bit that disallows debugging of the guest. This command will generate the godh.cert, the launch_blob.bin and a tmp_tk.bin file which you must save and keep secure because it contains the Transport Encryption and Integrity Keys (TEK and TIK) which will be used to encrypt the secret. Figuring out the qemu command line options needed to launch and pause a SEV guest is a bit of a palaver, so here is mine. You’ll likely need to change things, like the QMP port and the location of your OVMF build and the launch secret.
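For orientation, the invocation has roughly this shape (a sketch in QEMU 5.x era syntax; cbitpos, file paths and the QMP socket are machine specific, and -S starts the VM paused so the measurement can be checked first):

qemu-system-x86_64 \
    -enable-kvm \
    -machine q35,memory-encryption=sev0 \
    -object sev-guest,id=sev0,policy=0x1,cbitpos=47,reduced-phys-bits=1,dh-cert-file=godh.cert,session-file=launch_blob.bin \
    -drive if=pflash,format=raw,unit=0,file=OVMF.fd,readonly=on \
    -drive file=encrypted-image.qcow2,if=virtio \
    -qmp unix:/tmp/qmp.sock,server,nowait \
    -S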
Finally you need to get the launch measure from QMP, verify it against the sha256sum of OVMF.fd and create the secret bundle with the correct GUID headers. Since this is really fiddly to do with sevtool, I wrote this python script[3] to do it all (note it requires qmp.py from the qemu git repository). You execute it as
sevsecret.py --passwd <disk passwd> --tiktek-file <location of tmp_tk.bin> --ovmf-hash <hash> --socket <qmp socket>
It will verify the launch measure, encrypt the secret for the VM if the measure is correct, and start the VM. If you got everything correct the VM will simply boot up without asking for a password (if you inject the wrong secret, it will still ask). And there you have it: you’ve booted up a confidential VM from an encrypted image file. If you’re like me, you’ll also want to fire up gdb on the qemu process just to show that the entire memory of the VM is encrypted …
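For the curious, the bundle the script constructs looks roughly like this (a sketch following the SEV API spec’s LAUNCH_SECRET packet and the GUIDed table the grub patch expects; pycryptodome’s AES is used for illustration, and details like the NUL terminator should be checked against the actual script):

import hmac, hashlib, os, struct, uuid
from Crypto.Cipher import AES   # pycryptodome, for illustration

SECRET_TABLE_GUID = uuid.UUID('1e74f542-71dd-4d66-963e-ef4287ff173b')
PASSPHRASE_GUID   = uuid.UUID('736869e5-84f0-4973-92ec-06879ce3da0b')

def build_secret_table(passwd):
    # entries are GUID + 32-bit length (covering guid+len+data) + data
    data = passwd.encode() + b'\0'
    entry = PASSPHRASE_GUID.bytes_le + struct.pack('<I', 20 + len(data)) + data
    table = SECRET_TABLE_GUID.bytes_le + struct.pack('<I', 20 + len(entry)) + entry
    return table + b'\0' * (-len(table) % 16)    # pad to an AES block multiple

def build_launch_secret(table, tek, tik, measure):
    # per the spec: the secret is AES-128-CTR encrypted with the TEK, and the
    # packet header carries FLAGS, the IV and an HMAC, keyed with the TIK,
    # that binds the ciphertext to the launch measurement (replay defeat)
    iv = os.urandom(16)
    data = AES.new(tek, AES.MODE_CTR, nonce=b'', initial_value=iv).encrypt(table)
    flags = struct.pack('<I', 0)
    lengths = struct.pack('<II', len(table), len(data))
    mac = hmac.new(tik, b'\x01' + flags + iv + lengths + data + measure,
                   hashlib.sha256).digest()
    return flags + iv + mac, data

The header and data then go, base64 encoded, to QMP’s sev-inject-launch-secret command.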
Conclusions and Caveats
The above script should allow you to boot an encrypted VM anywhere: locally or in the cloud, provided you can access the QMP port (most clouds use libvirt, which introduces yet another layer of pain). The biggest drawback, if you refer to the diagram, is the yellow box: you must trust the secure element, which in both Intel and AMD is proprietary[4], in order to get confidential computing to work. Although there is hope that in future the secure element could be fully open source, it isn’t today.
The next annoyance is that launching a confidential VM is high touch, requiring collaboration from both the guest owner and the host owner (due to the anti-replay nonce). For a single launch this is a minor annoyance, but for an autoscaling (launch VMs as needed) platform it becomes a major headache. The solution seems to be to have some Hardware Security Module (HSM), like the cloud uses today to store encryption keys securely, and have it understand how to measure and launch encrypted VMs on behalf of the guest owner.
The final conclusion to remember is that confidentiality is not security: your VM is as exploitable inside a confidential encrypted VM as it was outside. In many ways confidentiality and security are opposites, in that security in part requires reducing the amount of trusted code while confidentiality requires pulling as much as possible inside the trust boundary. Confidential VMs do have an answer to the Cloud trust problem since the enterprise can now deploy VMs without fear of tampering by the cloud provider, but those VMs are as insecure in the cloud as they were in the Enterprise Data Centre. All of this argues that Confidential Computing, while an important milestone, is only one step on the journey to cloud security.
Patch Status
The OVMF patches are upstream (including modifications requested by Intel for TDX). The QEMU and grub patch sets are still on the lists.
[1] Encrypted memory makes it a distinct class from Trusted Execution Environments like TXT and TrustZone
[2] Intel is currently calling its OVMF TDVF for some reason
[3] This was before I found out how excruciatingly difficult to use and mutable python cryptography is, but I’ve done it now, and I didn’t want to rewrite it all in perl
[4] In Intel it’s an SGX enclave containing Intel Proprietary code and in AMD it’s the firmware of the aarch64 platform security processor
Comments

Very, very interesting! I was led here by LWN’s related article: https://lwn.net/Articles/838488/
It appears that this scheme for encrypting a VM will allow VM/guest owners to depend very little on the trustworthiness of their cloud/hosting provider. At the same time, it will magnify and centralize the trust required of the CPU/SoC manufacturers.
The CPU manufacturers must be trusted to implement the secure elements without backdoors or bugs, and to store the validation certificates’ private keys securely, and in a way that’s immune to physical tampering.
This centralization seems ripe for coercion/co-opting/physical analysis by governments. Have AMD or Intel announced any precautions that they’re taking to limit this?
Nice article, James.
But one thing is not entirely clear to me when you write about the need to use a one-time godh cert.
Why not do it the following way:
1. The GO takes the OVMF and the public key of the AMD-PSP, then calculates the hash of the OVMF and encrypts the luks key of the encrypted disk using the public key of the AMD-PSP.
2. The GO sends the OVMF, the OVMF hash, the encrypted (using the AMD-PSP public key) luks key, and the luks-encrypted disk itself to the PO.
3. The PO starts the VM using the OVMF + the luks-encrypted disk, and also provides the luks-encrypted key (received from the GO and encrypted with the AMD public key) and the hash of the OVMF (also received from the GO) to qemu; this can be achieved by a qemu startup option or using qmp.
4. The PSP checks the OVMF hash against the hash that was specified when the VM was started (received from the GO), and if it matches, then the PSP checks whether debugging is enabled for this VM (or any other options/policies that allow memory encryption to be disabled), and if everything is fine, it decrypts the luks key using the private AMD-PSP key and injects it into memory, where it can be read by the OVMF.
5. Further, the OVMF/grub can decrypt the encrypted disk and run the VM.
This whole procedure seems safe to me.
As a result, the luks-encrypted key can be safely stored on the PO server and there is no need to generate a godh cert and provide it from the GO to the PO on every reboot.
Where is the mistake in my reasoning?
When you say “encrypts the luks key of encrypted disk using the public key of AMD-PSP” what actually happens is that the AMD-PSP has a Platform Diffie-Hellman certificate (PDH). This, on its own, isn’t an encryption key. The Guest Owner supplies their own public key, the GODH, and then the GO and the PSP derive an encryption key using ECDH for the launch bundle from both Diffie-Hellman certificates, so generating a GODH is a requirement of setting up the initial encrypted session with the PSP.
You can, of course, use the same GODH every time you launch a VM, but that would mean the same encryption key would be derived each time. It’s somewhat better security practice to use a different encryption key for each launch (so some accidental compromise of the key for one launch doesn’t compromise all other launches) which means a different GODH for each launch since the PDH remains the same.
In advance I want to note that I want to discuss only the issue of injecting the luks key, as secret data, into the VM. Any other type of key may require a more careful approach.
Please correct me if I’m wrong about something.
On the GO side, the encryption-key compromise problem can be solved by deleting the encryption key immediately after the blob, godh cert and launch secret header/data are generated.
If we talk about the problem of key leakage on the PSP side, then it doesn’t matter whether someone can decrypt the secret data of previous launches or not, because the secrets for both the current run and the previous ones contain the same luks key.
Anyway, as far as I understand, even if the godh is left the same, the launch secret should still be generated on every VM boot, since it uses the PSP’s nonce;
thus, currently, there is no way to inject the luks key into the VM without interacting with the GO side.
Ah, no this is the difference between RSA and EC keys: with RSA you choose a random key and encrypt it to the cert public key, so you can delete this random key. With EC you have to generate a new EC public/private key (the GODH) and then derive the encryption key from it and the public EC key of the other party using Diffie-Hellman. So while you have the GODH you can always re-derive the encryption key, you can’t delete it … that’s why you have to throw away the GODH instead.
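To put the same thing in code (a minimal sketch with the python cryptography package; SEV uses NIST P-384 for the ECDH, and the KDF that turns the shared secret into the actual wrapping keys for the TEK/TIK is elided, see the SEV API spec):

from cryptography.hazmat.primitives.asymmetric import ec

# the guest owner generates a fresh GODH key pair for this launch ...
godh = ec.generate_private_key(ec.SECP384R1())
# ... the platform's PDH public key really comes from the validated
# pdh.cert; one is generated here purely so the sketch runs
pdh_pub = ec.generate_private_key(ec.SECP384R1()).public_key()

# both sides derive the same shared secret; anyone holding the GODH
# private key can re-derive it, which is why it's the GODH itself
# (not some separable session key) that has to be thrown away
shared = godh.exchange(ec.ECDH(), pdh_pub)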
Well, okay, in this case we can delete the secret key + godh private key after generating the blob, godh cert and launch secret header/data.
Is this a suitable solution?
1. If OVMF is provided by the host, this may not be FIPS certifiable since the key is traversing the module boundary. This is a problem I have yet to see addressed in any proposal.
2. OVMF cannot pass the key to any code whose measurement isn’t validated. This includes both grub and the kernel. In current systems, both grub and the kernel reside on /boot which has neither encryption nor integrity protection and are thus unmeasured.
3. Assuming that somehow we can validate grub’s measurement, grub is modular and can load additional functionality from the disk which, you guessed it, has no integrity protection. These modules execute with the same privileges to the key as core grub.
4. Grub can theoretically load the kernel from an encrypted and integrity-protected volume. However, LUKS has no integrity protection and LUKS2 has only optional integrity protection. Therefore, grub must refuse to hand over the key to any kernel which is loaded from a volume without integrity protection.
5. All of this is just reimplementing kernel functionality earlier in the boot process. Why not focus instead on having the VMM load the guest kernel directly? The guest kernel is thus included in the measurement provided to the VM owner. Further, the key never needs to be passed anywhere else.
6. This design seems, to me, highly optimized for AMD SEV and SEV-ES. In a post-attestation world (SEV-SNP, TDX, etc) it is likely that attestation will be desirable from userspace as a syscall (or ioctl, etc). If you booted Linux directly and had attestation available from userspace, all of the disk unlocking could be implemented in initramfs in userspace.
Thoughts?
1. FIPS isn’t really what I was aiming for and NIST is still mulling trusted execution environments, so the whole thing is a bit in flux. The aim of this proposal was to produce an OVMF that could be delivered by a distribution. Since it would then have a standard measurement, that could alleviate the ‘who provides it’ problem.
2. I think you missed the part of the proposal about combining OVMF and Grub? OVMF passes the key only to grub.efi which is measured because it’s inside OVMF in a Firmware Volume. In this proposal grub is partly inside OVMF but /boot is on the encrypted image so the grub modules and the kernel are encrypted but not necessarily integrity protected
4. This is a case of using what’s currently there: all distributions are shipping grub 2.04 which only has luks1. When they ship grub 2.06 which has luks2 then you’ll get integrity protected as well as encrypted images
5. It is possible for OVMF to do direct execution of a signed kernel from an unencrypted location but then the public key has to be built into OVMF making for a delivery problem since OVMF is currently host supplied.
6. Well this is sort of the difference between secure boot and measured boot. The former is a guarantee up front and the latter is detection after the fact. In the end, both will be desirable but for now the former is easier.
1. I don’t disagree. But in general, OVMF provided by the host is a backdoor. And all the major cloud vendors won’t use the distro OVMF. It is a problem that we have to consider.
2. I think I glossed over your meaning. Embedding grub just increases the size of the host-provided code; which IMHO, should be zero. It also won’t work with modules, since I don’t believe grub has a stable module ABI.
4. I think this misses my point. Let me argue it from a different angle.
Crypto libraries used to provide a bundle of ciphers that consumers could combine however they wanted. Certain combinations of these were secure while others weren’t. Nobody builds things like that anymore. New implementations focus only on combinations that are known to not only work but provide high degrees of security. So while you’ll get AES-GCM and ChaCha20-Poly1305, you’ll never see AES-Poly1305.
My point is that if grub is operating in a mode where it is handling this key material, it MUST NOT allow boot on luks1 or luks2 if there is no integrity protection. We know that mode of operation is not secure and we must preclude its use.
5. That’s not what I’m suggesting. I’m suggesting ditching OVMF and grub entirely. Put the kernel/initramfs on /boot without encryption or integrity protection. The VMM can read the kernel/initramfs from the image directly and boot it without OVMF or grub. Nearly all the VMMs, including QEMU, can do something close to this already.
Under this mode, the kernel/initramfs is part of the initial measurement. No signature is necessary. All code for handling the secret lives inside the Linux kernel. It is never passed anywhere else. Everything else lives inside luks2 w/ integrity protection.
Importantly, this solves the “no code in the guest not provided by the VM owner” problem.
6. I agree with part of this. I think that, in the end, only the latter is desirable. The former is only tempting because there is some hardware available today.
7. I forgot to mention. Would you be willing to make this process work with https://github.com/enarx/sevctl instead of sevtool? I think we provide most of what you need and I’d be interested to know what we’re missing.
Heh, I wish reply inline worked on blogs …
1. OK
2. the ABI seems stable enough that I can boot RH with an ubuntu grub, so it does seem to work in spite of the lack of a stability guarantee
4. I do understand why integrity is necessary. It’s just that in the current short term we’re hampered by only having encrypted boot with luks1. When grub moves to 2.06 we get integrity with this method.
5. I understand, but the problem is the -kernel/-cmdline/-initrd options of qemu don’t work with encrypted memory today, so OVMF pulls the binaries through unencrypted and unattested. I think it might be possible to modify the qemu elf loader and OVMF to have them passed via encryption and attestation, but it would be quite a large engineering effort. The other issue is the pebkac one: if builders of encrypted images have to put everything inside the encrypted partition, it’s an easy rule. If we start doing the image partially encrypted, partially unencrypted, mistakes get easier to make … particularly as we can’t have the grub modules outside the encryption envelope, so it would just be the kernel, initrd and command line in the unencrypted partition, which is a deviation from how images work today.
7. Sure … although you understand I only use AMDESE/sev-tool for the launch blob and keys. I did my own python script for measurement and secret injection because using the tool is so cumbersome.
2. Works != supportable. But I don’t think you’re arguing otherwise. So I think we probably agree.
4. It is important to note that we don’t get integrity for free. It has to be enabled. And we need to validate its enablement. My major worry here isn’t avoiding incremental improvement. It is giving people a false sense of security. It is *very common* for people to hear “encrypted” and think “integrity protected.”
Perhaps a good way to approach this is to enforce the same protection profile as the underlying technology. That is, SEV and SEV-ES would allow encrypted disks without integrity but SEV-SNP would only allow luks2 with integrity. Thoughts?
5. I care a lot about pebkac problems in security. However, in this case the common expectation is that / is encrypted and /boot is not. I think, if anything, my proposal reinforces the status quo more than anything else. Regardless, it might be a lot of engineering, but we have to evaluate the host/guest interface as part of a transition to confidential VMs. We simply can’t avoid it except to our peril.
7. I’d love to have both sevtool and your python script integrated into sevctl. IMHO, sevctl should be the go to usable tool for SEV. I’m happy to work with you on that.
2. Yes let’s agree here: at worst there’d be one OVMF per distro
4. I can support that. Integrity primarily protects against substitution attacks, so there doesn’t seem to be a lot of point in having it in the image file if the underlying memory encryption doesn’t support it, because you simply delay your attack until the image is realised in memory.
5. The main problem I see is simply getting qemu to use /boot properly. Using an image file with -kernel/-cmdline/-initrd isn’t supported by libvirt, which means it’s not done in most clouds, so booting directly from an unencrypted /boot via these options is highly non-standard.
7. Sure. What I need is something that builds the launch bundle, which sevctl doesn’t currently seem to do. After that the python script in the article can take over, but there’s no reason it can’t be integrated into your sevctl
8. There looks to be a SEV specific problem with adding encryption to -kernel/-cmdline/-initrd. I’ll take that one to email
I think we’ve narrowed down to the core question: how to boot this thing.
QEMU doesn’t need support for extracting the kernel from the image directly. The host already can do a loopback mount, find /boot and mount it. The point is that we’re using the host’s filesystem drivers rather than having filesystem drivers in both OVMF and grub. Because, at its core, that is all a bootloader is: a collection of filesystem drivers to find and load the kernel.
Since the host can read the kernel directly from the image file with no changes to qemu and since the kernel is measured and everything else is encrypted, this shouldn’t be a huge engineering effort if -kernel/-cmdline/-initrd work and are measured.
Thank you for this article! There is not much information to be found online about running encrypted VMs using the AMD SEV technology, let alone making use of remote attestation and its secret injection functionality.
I was looking at your python script to inject the launch secret which cleared up some open questions, but I could not figure out where the hardcoded GUIDs which encase the secret come from and where/when they come into play during VM startup…?
Cheers,
Jo
GUIDs (and UUIDs) are just things you make up yourself (on linux we even have a utility: uuidgen, to do this for you which taps into our strongest source of randomness). They’re designed so if generated randomly there’s pretty much no chance of someone else coming up with the same GUID this side of the end of the universe. This allows elimination of the usual registry in favour of simply publishing the use cases.
So the configuration table passed up to grub is done with the guid
adf956ad-e98c-484c-ae11-b51c7d336447
As defined here:
https://github.com/tianocore/edk2/commit/01726b6d23d4c8a870dbd5b96c0b9e3caf38ef3c
The contents of the secret area are described in the patch sent to the grub list:
https://lists.gnu.org/archive/html/grub-devel/2020-12/msg00260.html
Which introduces the structure of the whole table and the table header guid:
1e74f542-71dd-4d66-963e-ef4287ff173b
And a GUID which identifies the disk passphrase:
736869e5-84f0-4973-92ec-06879ce3da0b
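So, schematically, the consumer’s walk over the secret area goes like this (a sketch assuming the little-endian GUIDs and 32-bit lengths from the grub patch above):

import struct, uuid

TABLE_GUID = uuid.UUID('1e74f542-71dd-4d66-963e-ef4287ff173b')
PASSPHRASE_GUID = uuid.UUID('736869e5-84f0-4973-92ec-06879ce3da0b')

def find_passphrase(area):
    # the area opens with the table header GUID and a total length ...
    if area[:16] != TABLE_GUID.bytes_le:
        return None
    tot_len, = struct.unpack_from('<I', area, 16)
    off = 20
    # ... followed by entries, each a GUID + length (guid+len+data) + data
    while off + 20 <= tot_len:
        guid = uuid.UUID(bytes_le=bytes(area[off:off+16]))
        length, = struct.unpack_from('<I', area, off + 16)
        if guid == PASSPHRASE_GUID:
            return bytes(area[off+20:off+length]).rstrip(b'\0')
        off += length
    return None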
And thank you for the detailed explanation! You beat me to the answer by seconds.
I just found the link to the grub patches which define the respective GUIDs (they got a bit lost in edk / OVMF patches and I accidentally overlooked them), so I am sorry for the unnecessary question. Thanks again for the detailed article!
Cheers,
Jo
Thanks for the awesome article. Very insightful!
Would SEV-SNP finally allow on-demand attestation, so that no access to the QMP port is required? I still do not see how access to the QMP port could be secured from the host (say the host is malicious and you want to resume + decrypt the disk of a VM you let the host run).