
Securing a Rooted Android Phone

This article will discuss securing your phone after you’ve rooted it and installed your preferred OS (it will not discuss how to root your phone or change the OS). Re-securing your phone requires the installation of a custom AVB key, which not all phones support, so I’ll only be discussing Google Pixel phones (which support this) and the LineageOS distribution (which is the one I use). The reason for wanting to do this is that by default LineageOS runs with the debug keys (i.e. known to everyone) with an unlocked bootloader, meaning OS updates and even binary changes to the system partition are easy to do. Since most Android phones are fully locked, this isn’t a standard attack vector for malicious apps, but if someone is targeting you directly it may become one.

This article will cover how Android Verified Boot (AVB) works, how to install your own custom AVB key and get a stock LineageOS distribution to use it, and how to turn dm-verity back on to make /system immutable.

A Brief Tour of Android Verified Boot (AVB)

We’ll actually be covering the 2.0 version of AVB, but that’s what most phones use today. The proprietary bootloader of a Pixel (the fastboot-capable one you get to with adb reboot bootloader) uses a vbmeta partition to find the boot/recovery system and from there either enter recovery or boot the standard OS. vbmeta contains hashes for both this boot partition and the system partition. If your phone is unlocked, the bootloader will simply boot the partitions vbmeta points to without any verification. If it is locked, the vbmeta partition must be signed by a key the phone knows. Pixel phones have two keyslots: a built-in one, which houses either the Google key or an OEM one, and the custom slot, which is blank. In the unlocked mode, you can flash your own key into the custom slot using the fastboot flash avb_custom_key command.

The vbmeta partition also contains a boot flags region which tells the bootloader how to boot the OS. The two flags the OS knows about are in external/avb/libavb/avb_vbmeta_image.h:

/* Flags for the vbmeta image.
 *
 * AVB_VBMETA_IMAGE_FLAGS_HASHTREE_DISABLED: If this flag is set,
 * hashtree image verification will be disabled.
 *
 * AVB_VBMETA_IMAGE_FLAGS_VERIFICATION_DISABLED: If this flag is set,
 * verification will be disabled and descriptors will not be parsed.
 */
typedef enum {
  AVB_VBMETA_IMAGE_FLAGS_HASHTREE_DISABLED = (1 << 0),
  AVB_VBMETA_IMAGE_FLAGS_VERIFICATION_DISABLED = (1 << 1)
} AvbVBMetaImageFlags;

If the first flag is set then dm-verity is disabled, and if the second one is set the bootloader doesn’t pass the hash of the vbmeta partition on the kernel command line. In a standard LineageOS build, both these flags are set.

The reason for passing the vbmeta hash on the command line is so the Android init process can load the vbmeta partition, hash it and verify against what the bootloader passed in, thus confirming there hasn’t been a time of check to time of use (TOCTOU) security breach. The init process cannot verify the signature for itself because the vbmeta signing public key isn’t built into the OS (which allows the OS to be signed after the images are built).

The description of the AVB_VBMETA_IMAGE_FLAGS_HASHTREE_DISABLED flag is slightly wrong. A standard Android build always seems to calculate the dm-verity hash tree and insert it into the vbmeta partition (where it is verified by the vbmeta signature); it’s just that if this flag is set, the Android init process won’t load the dm-verity hash tree and the system partition will thus be mutable.
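To make the four possible flag values concrete, here is a minimal C sketch that decodes a flags word using the two bit definitions quoted above. The SKETCH_ macros and both helper names are hypothetical, and the behaviour they report is simply the one described in this article, not code taken from libavb.

```c
#include <stdint.h>

/* Bit values as quoted from avb_vbmeta_image.h above (local stand-ins). */
#define SKETCH_HASHTREE_DISABLED	(1u << 0)
#define SKETCH_VERIFICATION_DISABLED	(1u << 1)

/* Hypothetical helper: non-zero if init will load the dm-verity hash tree. */
int vbmeta_hashtree_enabled(uint32_t flags)
{
	return (flags & SKETCH_HASHTREE_DISABLED) == 0;
}

/*
 * Hypothetical helper: non-zero if the bootloader passes the vbmeta hash
 * on the kernel command line so init can check it.
 */
int vbmeta_hash_passed_to_init(uint32_t flags)
{
	return (flags & SKETCH_VERIFICATION_DISABLED) == 0;
}
```

With a flags value of 3 (the standard LineageOS build) both helpers return 0; with 1 the hash is passed but /system stays mutable; with 0 both protections are on.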

Creating and Using your own custom Boot Key

Obviously Android doesn’t use any standard format for its keys, so you have to create your own RSA 2048 AVB custom key (the literature implies it will work with 4096 as well but I haven’t tried it) using, say, openssl, then use avbtool (found in external/avb as a Python script) to convert your RSA public key to a form that can be flashed to the phone:

avbtool extract_public_key --key pubkey.pem --output pkmd.bin

This can then be flashed to the unlocked phone (in the bootloader fastboot) with

fastboot flash avb_custom_key pkmd.bin

And you’re all set up to boot a custom signed OS.

Signing your LineageOS

There is a wrinkle to this: to re-sign the OS, you need the target-files.zip intermediate build, not the ROM install file. Unfortunately, this is pretty big (38GB for lineage-19.1) and doesn’t seem to be available for download any more. If you can find it, you can re-sign the stock LineageOS, but if not you have to build it yourself. Instructions for both building and re-signing can be found here. You need to follow this but in addition you must add two extra flags to the sign_target_files_apks command:

--avb_vbmeta_key=/path/to/private/avb.key
--avb_vbmeta_algorithm=SHA256_RSA2048

Which will ensure the vbmeta partition is signed with the key you created above.

Optionally Enabling dm-verity

If you want to enable dm-verity, you have to change the vbmeta flags to 0 (enable both hashtree and vbmeta verification) before you execute the signing command above. These flags are stored in the META/misc_info.txt file which you can extract from target-files.zip with

unzip target-files.zip META/misc_info.txt

And once done you can vi this file to find the line

avb_vbmeta_args=--flags 3 --padding_size 4096 --rollback_index 1804212800

If you update the 3 to 0 this will unset the two disable flags and allow you to do a dm-verity verified boot. Then use zip to replace this updated file

zip -u target-files.zip META/misc_info.txt

And then proceed with signing the updated target-files.zip
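If you would rather script the edit than do it by hand in vi, the change is a single substitution on the avb_vbmeta_args line. The following is only a sketch: set_vbmeta_flags is a hypothetical helper, and it assumes the flags value is written in decimal as in the line shown above.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: copy an avb_vbmeta_args line into out, replacing
 * the decimal number after "--flags " with new_flags.  Returns 0 on
 * success, -1 if the option is missing or out is too small.
 */
int set_vbmeta_flags(const char *line, unsigned int new_flags,
		     char *out, size_t outlen)
{
	static const char opt[] = "--flags ";
	const char *p = strstr(line, opt);
	const char *rest;
	int n;

	if (p == NULL)
		return -1;
	p += strlen(opt);
	/* skip over the old numeric value */
	rest = p + strspn(p, "0123456789");
	/* prefix up to and including "--flags ", new value, untouched tail */
	n = snprintf(out, outlen, "%.*s%u%s", (int)(p - line), line,
		     new_flags, rest);
	return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

Only the number changes; the padding size and rollback index are copied through untouched.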

Wrinkle for Android-12 (lineage-19.1) and above

For all these versions, this patch ensures that if the vbmeta was signed then the vbmeta hash must be verified, otherwise the system will crash in early init. You therefore have no choice and must alter the avb_vbmeta_args above to either --flags 1 or --flags 0 so the vbmeta hash is passed in to init. Since you have to alter the flags anyway, you might as well enable dm-verity (set to 0) at the same time.

Re-Lock the Bootloader

Once you have installed both your custom keys and your custom signed boot image, you are ready to re-lock the bootloader. Beware that some phones will erase your data partition when you do this (the Google advice says they shouldn’t, but not all manufacturers bother to follow it), so make sure you have a backup (or are playing with a newly rooted phone).

fastboot flashing lock

Check with a reboot (the phone should now display a yellow warning triangle saying it is booting a custom OS rather than the orange unsigned OS one). If everything goes according to plan, you can enter the developer settings and set the “OEM Unlocking” option to disabled, and your phone can no longer be unlocked without your say so.

Conclusions and Caveats

Following the above instructions, you can update your phone so it will verify images you signed with your AVB key, turn on dm-verity if you wish and generally make your phone much more secure. However, remember that you haven’t erased the original AVB key, so the phone can still be updated to an image signed with that key and, worse, the recovery partition of LineageOS is modified to allow rollback, so it will allow the flashing of any signed image without triggering an erase of the data partition. There are also a few more problems: for example, thanks to a bug in AOSP, the recovery version of fastboot will actually allow commands that are usually forbidden if the phone is locked.

Securing the Google SIP Stack

A while ago I mentioned I use Android-10 with the built-in SIP stack and that the Google stack was pretty buggy and I had to fix it simply to get it to function without disconnecting all the time. Since then I’ve up-ported my fixes to Android-11 (the jejb-11 branch in the repositories) by using LineageOS-18.1. However, another major deficiency in the Google SIP stack is its complete lack of security: both the SIP signalling and the media streams are unencrypted, meaning they can be intercepted and tapped by pretty much anyone in the network path running tcpdump. Why this is so, particularly for a company that keeps touting its security credentials, is anyone’s guess. I personally suspect they added SIP in Android-4 with a view to basing Google Voice on it, decided later that proprietary VoIP protocols were the way to go, but got stuck with people actually using the SIP stack for other calling services, so they couldn’t rip it out and instead simply neglected it, hoping it would die quietly due to lack of features and updates.

This blog post is a guide to how I took the fully unsecured Google SIP stack and added security to it. It also gives a brief overview of some of the security protocols you need to understand to get secure VoIP working.

What is SIP

What I’m calling SIP (but really a VoIP system using SIP) is a protocol consisting of several pieces. SIP (Session Initiation Protocol), RFC 3261, is really only one piece: it is the “signalling” layer, meaning that call initiation, response and parameters are all communicated this way. However, simple SIP isn’t enough for a complete VoIP stack; once a call moves to in progress, there must be an agreement on where the media streams are and how they’re encoded. This piece is called an SDP (Session Description Protocol) agreement and is usually negotiated in the body of the SIP INVITE and response messages. Finally, once agreement is reached, the actual media stream for call audio goes over a different protocol called RTP (Real-time Transport Protocol).

How Google did SIP

The trick to adding protocols fast is to take them from someone else (if you’re open source, this is encouraged) so Google actually chose the NIST-SIP Java stack (which later became the JAIN-SIP stack) as the basis for SIP in Android. However, that only covered signalling and they had to plumb it in to the Android Phone model. One essential glue piece is frameworks/opt/net/voip, which supplies the SDP negotiating layer and interfaces the network codec to the phone audio. This isn’t quite enough because the telephony service and the Dialer also need to be involved to do the account setup and call routing. It always interested me that SIP was essentially special cased inside these services and apps instead of being a plug in, but that’s due to the fact that some of the classes that need extending to add phone protocols are internal only; presumably so only manufacturers can add phone features.

Securing SIP

This is pretty easy following the time honoured path of sending messages over TLS instead of in the clear simply by using a TLS wrappering technique of secure sockets and, indeed, this is how RFC 3261 says to do it. However, even this minor re-engineering proved unnecessary because the nist-sip stack was already TLS capable, it simply wasn’t allowed to be activated that way by the configuration options Google presented. A simple 10 line patch in a couple of repositories (external/nist_sip, packages/services/Telephony and frameworks/opt/net/voip) fixed this and the SIP stack messaging was secured leaving only the voice stream insecure.

SDP

As I said above, the Google frameworks/opt/net/voip does all the SDP negotiation. This isn’t actually part of SIP. The SDP negotiation is conducted over SIP messages (which means it’s secured thanks to the above) but how this should be done isn’t part of the SIP RFC. Instead SDP has its own RFC 4566, which is what the class follows (mainly for codec and port negotiation). You’d think that if it’s already secured by SIP, there’s no additional problem, but, unfortunately, using SRTP as the audio stream requires the exchange of additional security parameters, which were added to SDP by RFC 4568. To incorporate this into the Google SIP stack, it has to be integrated into the voip class. The essential additions in this RFC are a separate media description protocol (RTP/SAVP) for the secure stream and the addition of a set of tagged a=crypto: lines for key negotiation.

As will be a common theme: not all of RFC 4568 has to be implemented to get a secure RTP stream. It goes into great detail about key lifetime and master key indexes, neither of which are used by the asterisk SIP stack (which is the one my phone communicates with) so they’re not implemented. Briefly, it is best practice in TLS to rekey the transport periodically, so part of key negotiation should be key lifetime (actually, this isn’t as important to SRTP as it is to TLS, see below, which is why asterisk ignores it) and the people writing the spec thought it would be great to have a set of keys to choose from instead of just a single one (The Master Key Identifier) but realistically that simply adds a load of complexity for not much security benefit and, again, is ignored by asterisk which uses a single key.

In the end, it was a case of adding a new class for parsing the a=crypto: lines of SDP and doing a loop in the audio protocol for RTP/SAVP if TLS were set as the transport. This ended up being a ~400 line patch.
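To give a flavour of what parsing an a=crypto: line involves when lifetime and MKI are ignored, here is a minimal sketch. It is written in C for illustration, not taken from the actual patch; struct sdp_crypto and sdp_parse_crypto are hypothetical names, and only the tag, the crypto suite and the inline base64 key of the RFC 4568 attribute are extracted.

```c
#include <stdio.h>
#include <string.h>

/* Fields of one a=crypto: attribute that a single-key peer needs. */
struct sdp_crypto {
	int tag;		/* identifies the line in the offer/answer */
	char suite[64];		/* e.g. AES_CM_128_HMAC_SHA1_80 */
	char key_b64[128];	/* base64 of master key followed by salt */
};

/*
 * Hypothetical parser: returns 0 on success, -1 if the line is not an
 * a=crypto: attribute using the inline key method.  Anything after the
 * key (|lifetime, |MKI:length, session parameters) is ignored.
 */
int sdp_parse_crypto(const char *line, struct sdp_crypto *c)
{
	/* the key stops at '|', whitespace or end of line */
	int n = sscanf(line, "a=crypto:%d %63s inline:%127[^| \r\n]",
		       &c->tag, c->suite, c->key_b64);

	return n == 3 ? 0 : -1;
}
```

For the AES_CM_128_HMAC_SHA1_80 suite the decoded key material is 30 bytes (a 16 byte master key plus a 14 byte salt), so a valid base64 key field is 40 characters long.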

Secure RTP

RTP itself is governed by RFC 3550 which actually contains two separate stream descriptions: the actual media over RTP and a control protocol over RTCP. RTCP is mostly used for multi-party and video calls (where you want reports on reception quality to up/downshift the video resolution) and really serves no purpose for audio, so it isn’t implemented in the Google SIP stack (and isn’t really used by asterisk for audio only either).

When it comes to securing RTP (and RTCP) you’d think the time honoured mechanism (using secure sockets) would have applied although, since RTP is transmitted over UDP, one would have to use DTLS instead of TLS. Apparently the IETF did consider this, but elected to define a new protocol instead (or actually two: SRTP and SRTCP) in RFC 3711. One of the problems with this new protocol is that it also defines a new ciphersuite (AES_CM_…) which isn’t found in any of the standard SSL implementations. Although the AES_CM ciphers are very similar in operation to the AES_GCM ciphers of TLS (Indeed AES_GCM was adopted for SRTP in a later RFC 7714) they were never incorporated into the TLS ciphersuite definition.

So now there are two problems: adding code for the new protocol and performing the new encryption/decryption scheme. Fortunately, there already exists a library (libsrtp) that can do this and even more fortunately it’s shipped in Android (external/libsrtp2), although it looks to be one of those throwaway additions where the library hasn’t really been updated since it was added (for cuttlefish gcastv2) in 2019 and so is still at a pre 2.3.0 version (I did check and there don’t look to be any essential bug fixes missing vs upstream, so it seems usable as is).

One of the really great things about libsrtp is that it has srtp_protect and srtp_unprotect functions which transform RTP to SRTP and vice versa, so it’s easily possible to layer this library directly into an existing RTP implementation. When doing this you have to remember that the encryption also includes authentication, so the size of the packet expands, which is why the initial allocation size of the buffers has to be increased. One of the not so great things is that it implements all its own crypto primitives including AES and SHA1 (which most cryptographers think is always a bad idea) but on the plus side, it’s the same library asterisk uses so how much of a real problem could this be …
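The packet growth is easy to quantify for the two HMAC-SHA1 suites of RFC 3711: AES counter mode leaves the payload length unchanged, so the only addition is the authentication tag (80 or 32 bits) when no MKI is appended. A minimal sketch, with hypothetical macro and function names:

```c
#include <stddef.h>

/* Authentication tag lengths in bytes for the two HMAC-SHA1 suites. */
#define SKETCH_TAG_LEN_SHA1_80	10	/* AES_CM_128_HMAC_SHA1_80 */
#define SKETCH_TAG_LEN_SHA1_32	4	/* AES_CM_128_HMAC_SHA1_32 */

/*
 * Hypothetical helper: size of the SRTP packet produced from an RTP
 * packet of rtp_len bytes, assuming no MKI is in use.  The buffer
 * holding the RTP packet must be at least this big before protecting.
 */
size_t srtp_packet_len(size_t rtp_len, size_t tag_len)
{
	return rtp_len + tag_len;
}
```

So a 172 byte RTP packet (12 byte header plus 160 bytes of audio) becomes 182 bytes on the wire with the 80 bit tag.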

Following the simple layering approach, I constructed a patch to do the RTP<->SRTP transform in the JNI code if a key is passed in, so now everything just works and setting asterisk to SRTP only confirms the phone is able to send and receive encrypted audio streams. This ends up being a ~140 line patch.

So where does DTLS come in?

Anyone spending any time at all looking at protocols which use RTP, like webRTC, sees RTP and DTLS always mentioned in the same breath. Even asterisk has support for DTLS, so why is this? The answer is that if you use RTP outside the SIP framework, you still need a way of agreeing on the keys using SDP. That key agreement must be protected (and can’t go over RTCP because that would cause a chicken and egg problem) so implementations like webRTC use DTLS to exchange the initial SDP offer and answer negotiation. This is actually referred to as DTLS-SRTP even though it’s an initial DTLS handshake followed by SRTP (with no further DTLS in sight). However, this DTLS handshake is completely unnecessary for SIP, since the SDP handshake can be done over TLS protected SIP messaging instead (although I’ve yet to find anyone who can convincingly explain why this initial handshake has to go over DTLS and not TLS like SIP … I suspect it has something to do with wanting the protocol to be all UDP and go over the same initial port).

Conclusion

This whole exercise ended up producing less than 1000 lines in patches and taking a couple of days over Christmas to complete. That’s actually much simpler and way less time than I expected (given the complexity in the RFCs involved), which is why I didn’t look at doing this until now. I suppose the next thing I need to look at is reinserting the SIP stack into Android-12, but I’ll save that for when Android-11 falls out of support.

Converting Engines to OpenSSL-3 Providers

Engines in OpenSSL have a long history of providing new algorithms (Russian GOST hash/signature etc) but they can also be used to interface external crypto tokens (pkcs#11) or even key managers like my own TPM engine. I’ve actually been using my TPM2 engine for nearly a decade so that I no longer have to have any unprotected private keys anywhere on my laptops (including for ssh). The purpose of this post is to look at the differences between Providers and Engines and give advice on the minimum necessary Provider implementation to give back all the Engine functionality. So this post is aimed at Engine developers who wish to convert to Providers rather than giving user advice for either.

TPMs and Engines

TPM2 actually has a remarkable number of algorithms: hashing, symmetric encryption, asymmetric signatures, key derivation, etc. However, most TPMs are connected to the host over very slow busses (usually serial), which means that no-one in their right mind would use a TPM for bulk data operations (like hashing or symmetric encryption) since it will take orders of magnitude longer than if the native CPU did it. Thus from an Engine point of view, the TPM is really only good for guarding private asymmetric keys and doing sign or decrypt operations on them, which are the only capabilities the TPM engine has.

Hashes and Signatures

Although I said above we don’t use the TPM for doing hashes, the TPM2_Sign() routines insist on knowing which hash they’re signing. For ECDSA signatures, this is irrelevant, since the hash type plays no part in the signature (it’s always truncated to key length and converted to a bignum) but for RSA the ASN.1 form of the hash description is part of the toBeSigned data. The problem now is that early TPM2s only had two hash algorithms (sha1 and sha256) and the engine wanted to be able to use larger hash sizes. The solution was actually easy: lie about the hash size for ECDSA, so always give the hash that’s the width of the key (sha256 for NIST P-256 and sha384 for NIST P-384) and left truncate the passed in hash if larger or left zero pad if smaller.
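The truncate or pad step can be sketched in a few lines of C. fit_digest_to_key is a hypothetical name, and the sketch assumes that “left truncate” means keeping the leftmost bytes of the digest, which is what ECDSA itself does with an over-long hash.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: make a digest exactly key_len bytes wide.
 * A longer digest keeps its leftmost key_len bytes; a shorter one is
 * padded on the left with zeros.  out must hold key_len bytes.
 */
void fit_digest_to_key(const unsigned char *digest, size_t digest_len,
		       unsigned char *out, size_t key_len)
{
	if (digest_len >= key_len) {
		memcpy(out, digest, key_len);
	} else {
		size_t pad = key_len - digest_len;

		memset(out, 0, pad);
		memcpy(out + pad, digest, digest_len);
	}
}
```

For a NIST P-256 key, key_len would be 32, so a sha512 digest is cut to its first 32 bytes and a sha1 digest gains 12 leading zero bytes.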

For RSA, the problem is more acute, since TPM2_Sign() actually takes a raw digest and adds the hash description but the engine code sends down the fully described hash which merely needs to be padded if PKCS1 (PSS data is fully padded when sent down) and encrypted with the private key. The solution to this taken years ago was not to bother with TPM2_Sign() at all for RSA keys but instead to do a Decrypt operation[1]. This also means that TPM RSA engine keys are marked as decryption keys, not signing keys.
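For reference, the PKCS1 signature padding mentioned here is the EMSA-PKCS1-v1_5 block from the PKCS#1 standard: 0x00, 0x01, a run of at least eight 0xff bytes, 0x00, then the fully described hash. A minimal sketch of building that block before handing it to a raw private key operation follows; pkcs1_type1_pad is a hypothetical name and this is not the engine’s own code.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: build 00 01 ff..ff 00 || t in em, where t is the
 * fully described hash (DigestInfo) and em_len is the RSA modulus size
 * in bytes.  Returns 0 on success, -1 if t does not fit with the
 * mandatory minimum of eight 0xff bytes.
 */
int pkcs1_type1_pad(const unsigned char *t, size_t t_len,
		    unsigned char *em, size_t em_len)
{
	size_t ps_len;

	if (em_len < t_len + 11)
		return -1;
	ps_len = em_len - t_len - 3;	/* number of 0xff bytes */
	em[0] = 0x00;
	em[1] = 0x01;
	memset(em + 2, 0xff, ps_len);
	em[2 + ps_len] = 0x00;
	memcpy(em + 3 + ps_len, t, t_len);
	return 0;
}
```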

The Engine Itself

Given that the TPM is really only guarding the private keys, it only makes sense to substitute engine functions for the private key operations. Although the TPM can do public key operations, the core OpenSSL routines do them much faster and no information is leaked about the private key by doing them through OpenSSL, so Engine keys were constructed from standard OpenSSL keys by substituting a couple of private key methods from the underlying key types. One thing Engines were really bad at was passing additional parameters at key creation time and doing key wrapping. The result is that most Engines already have a separate tool to create engine keys (create_tpm2_key for the TPM2 engine) because complex arguments are needed for TPM specific things like key policy.

TPM keys are really both public and private keys combined and the public part of the key can be accessed without a password (unlike OpenSSL keys) or even access to the TPM that created the key. However, the engine code doesn’t usually know when only the public part of the key will be required and password prompting is done in OpenSSL at key loading (the TPM doesn’t need a password until key use), so usually after a TPM key is created, the public key is also separately derived using a pkey operation and used as a normal public key.

The final, and most problematic Engine feature, is key loading. Engine keys must be loaded using a special API (ENGINE_load_private_key). OpenSSL built in applications require you to specify the key type (-keyform option) but most well written OpenSSL applications simply try loading the PEM key first, then the DER key then the Engine key (since they all have different APIs), but frequently the Engine key is forgotten leading to the application having to be patched if you want to use them with any engine.

Converting Engines to Providers

The provider API has several pieces which apply to asymmetric key handling: Store, Encode/Decode, Key Management, Signing and Decryption (plus many more if you provide hashes or symmetric algorithms). One thing to remember about the store API is that if you only have file based keys, you should use the generic file store instead. Implementing your own store is only necessary if you also have a URI based input (like PKCS#11). In fact the TPM Engine has a URI for persistent keys, so the TPM store implementation will be dealt with later.

Provider Basics

If a provider is specified on the OpenSSL command line, it will become the sole provider of every algorithm. That means that providers like the TPM2 one, which only fill in a subset of functions, cannot operate on their own and must always be used with another provider (usually the default one). After initialization (see below) all provider actions are governed by algorithm tables. One of the key questions for any provider is what to do about algorithm names and properties. Because the TPM2 provider relies on external providers for other algorithms, it must use consistent key names (so “EC” for Elliptic curve and “RSA” for RSA), even though it has only a single key type. There are also elements of the provider key management, like the way Elliptic Curve keys change name to “ECDSA” for signing and “ECDH” for derivation, which is driven by the key management query operation function. As far as I can tell, this provides no benefit and merely serves to add complexity to the provider, so my provider doesn’t implement these functions and uses the same key names throughout.

The most mysterious string of all is the algorithm property one. The manual gives very little clue as to what should be in it besides “provider=<provider name>”. Empirically it seems to have input, output and structure elements, which are primarily used by encoders and decoders: input can be either der or pem and structure must be the same as the OSSL_OBJECT_PARAM_DATA_STRUCTURE string produced by the der decoder (although you are free to choose any name for this). output is even more varied and the best current list is provided by the source; however the only encoder the TPM2 provider actually provides is the text one.

One of the really nice things about providers is that when OpenSSL is presented with a key to load, every provider will be tried (usually in the order they’re specified on the command line) to decode and load the key. This completely fixes the problem with missing ENGINE_load_private_key() functions in applications because now all applications can use any provider key. This benefit alone is enough to outweigh all the problems of doing the actual conversion to a provider.

Replacing Engine Controls

Engine controls were key/value pairs passed into engines. The TPM2 engine has two: “PIN” for the parent authority and “NVPREFIX” for the prefix which identifies a non-volatile key. Although these can be passed in with the ENGINE_ctrl() functions, they were mostly set in the configuration file. This latter mechanism can be replaced with the provider base callback core_get_params(). Most engine controls actually set global variables and with the provider, they could be placed into the provider context. However, for code sharing it’s easier simply to keep the current globals mechanism.

Initialization and Contexts

Every provider has to have an OSSL_provider_init() routine which fills in a dispatch table and allocates a core context, which is passed in to every other context routine. For a provider, there’s really only one instance, so storing variables in the provider context is really no different (except error handling and actually getting destructors) from using static variables and since the engine used static variables, that’s what we’ll stick with. However, pretty much every routine will need an allocated library context, so it’s easiest to allocate at provider init time and pass it through as the provider context. The dispatch table must contain a query_operation function, and probably needs a teardown function if you need to use a destructor, but nothing else.

All provider function groups require a newctx() and freectx() call. This is not optional because the current OpenSSL code calls them without checking so they cannot be NULL. Thus for function groups (like encoders and key management) where new contexts aren’t really required it makes sense to use pass through context functions that simply pass through the provider context for newctx() and do nothing for freectx().
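A pass through pair can be as small as the sketch below. The function names match the ones used in the dispatch tables in this post, but the bodies are an assumption about the simplest implementation that fits the description, not a copy of the provider source.

```c
#include <stddef.h>

/* Hand the provider context back as the "new" operation context. */
void *tpm2_passthrough_newctx(void *provctx)
{
	return provctx;
}

/*
 * Nothing was allocated in newctx, so there is nothing to free; the
 * provider context itself is only released at provider teardown.
 */
void tpm2_passthrough_freectx(void *ctx)
{
	(void)ctx;
}
```

Because the operation context is the provider context, any routine in the function group can recover the library context directly from the pointer it is handed.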

The man page implies it is necessary to pick a load of functions from the in argument, but it seems unnecessary for those which the OpenSSL library already provides. I assume it’s something to do with a provider not requiring OpenSSL symbols, but it’s impossible to implement a provider today without relying on other OpenSSL functions than those which can be picked out of the in argument.

Decoders

Decoders are used to convert a read file from PEM to DER (this is essentially the same conversion for every provider, so it is strange you have to do this rather than it being done in the core routines) and then DER to an internal key structure. The remaining decoders take DER in and output a labelled key structure (which is used as a component of the EVP_PKEY). If you do both RSA and EC keys, you need one for each key type and, unfortunately, they must be provided and may not cross decode (the RSA decoder must reject EC keys and vice versa). This is actually required so the OpenSSL core can tell what type of key it has, but is a royal pain for things like the TPM where the key DER is identical regardless of key type:

const OSSL_ALGORITHM decoders[] = {
	{ "DER", "provider=tpm2,input=pem", decode_pem_fns },
	{ "RSA", "provider=tpm2,input=der,structure=TPM2", decode_rsa_fns },
	{ "EC", "provider=tpm2,input=der,structure=TPM2", decode_ec_fns },
	{ NULL, NULL, NULL }
};

The decode_pem_fns can be cut and pasted from any provider with the sole exception that you probably have a different PEM guard string that you need to check for.

Then a sample decoder function set looks like:

static const OSSL_DISPATCH decode_rsa_fns[] = {
	{ OSSL_FUNC_DECODER_NEWCTX, (void (*)(void))tpm2_passthrough_newctx },
	{ OSSL_FUNC_DECODER_FREECTX, (void (*)(void))tpm2_passthrough_freectx },
	{ OSSL_FUNC_DECODER_DECODE, (void (*)(void))tpm2_rsa_decode },
	{ 0, NULL }
};

The main job of the DECODER_DECODE function is to take the DER form of the key and convert it to an internal PKEY and send that PKEY up by reference so it can be consumed by a key management load.

Encoders

By and large, engines all come with creation tools for key files, which means that while you could now use the encoder routines to create key files, you’re probably better off sticking with what you have (especially for things like the TPM that can have complex policy statements attached to keys), so you can omit providing any encoder functions at all. The only possible exception is if you want the keys pretty printed, in which case you might consider a text output encoder:

const OSSL_ALGORITHM encoders[] = {
	{ "RSA", "provider=tpm2,output=text", encode_text_fns },
	{ "EC", "provider=tpm2,output=text", encode_text_fns },
	{ NULL, NULL, NULL }
};

Which largely follows the format for decoders:

static const OSSL_DISPATCH encode_text_fns[] = {
	{ OSSL_FUNC_ENCODER_NEWCTX, (void (*)(void))tpm2_passthrough_newctx },
	{ OSSL_FUNC_ENCODER_FREECTX, (void (*)(void))tpm2_passthrough_freectx },
	{ OSSL_FUNC_ENCODER_ENCODE, (void (*)(void))tpm2_encode_text },
	{ 0, NULL }
};

Note: there are many more encode/decode function types you could supply, but the above are the essential ones.

Key Management

Nothing in the key management functions requires the underlying key object to be reference counted since it belongs to an already reference counted EVP_PKEY structure in the OpenSSL generic routines. However, the signature operations can’t be implemented without context duplication and the signature context must contain a reference to the provider key so, depending on how the engine implements keys, duplicating via reference might be easier than duplicating via copy. The minimum functionality to implement is LOAD, FREE and HAS. If you are doing Elliptic Curve derive or reference counting your engine keys, you will also need NEW. You also have to provide both GET_PARAMS and GETTABLE_PARAMS (many key management functions have to implement pairs like this) for at least the BITS, SECURITY_BITS and SIZE properties[2].

You must also implement the EXPORT (and EXPORT_TYPES, which must be provided but has no callers) so that you can convert your engine key to an external public key. Note the EXPORT function must fail if asked to export the private key otherwise the default provider will try to do the private key operations via the exported key as well.

If you need to do Elliptic Curve key derivation you must also implement IMPORT (and IMPORT_TYPES) because the creation of the peer key (even though it’s a public one) will necessarily go through your provider key management functions.

The HAS function can be problematic because OpenSSL doesn’t assume the interchangeability of public and private keys, even if it is true of the engine. Thus the engine must remember in the decode routines what key selector was used (public, private or both) and make sure to condition HAS on that value.
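A minimal sketch of a HAS function conditioned on the remembered selector might look like this. The structure and function names are hypothetical, and the SKETCH_SELECT_ values are local stand-ins for the OSSL_KEYMGMT_SELECT_ bits from the OpenSSL headers so that the example compiles on its own.

```c
#include <stddef.h>

/* Local stand-ins for the OpenSSL key selection bits. */
#define SKETCH_SELECT_PRIVATE_KEY	0x01
#define SKETCH_SELECT_PUBLIC_KEY	0x02
#define SKETCH_SELECT_KEYPAIR \
	(SKETCH_SELECT_PRIVATE_KEY | SKETCH_SELECT_PUBLIC_KEY)

/* Hypothetical provider key: records how the decoder was asked to load it. */
struct sketch_key {
	int decoded_selection;	/* selector seen by the decode routine */
};

/*
 * Report a key part as present only if the decoder was asked for it,
 * even though the underlying token key always holds both halves.
 * Non-key selection bits (parameters) are treated as always present.
 */
int sketch_keymgmt_has(const void *keydata, int selection)
{
	const struct sketch_key *key = keydata;
	int wanted = selection & SKETCH_SELECT_KEYPAIR;

	if (key == NULL)
		return 0;
	return (key->decoded_selection & wanted) == wanted;
}
```

A key decoded with only the public selector therefore answers yes to a public key query and no to a private key or key pair query.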

Signatures

This is one of the most confusing areas for simple signing devices (which don’t do hashing) because you’d assume you can implement NEWCTX, FREECTX, SIGN_INIT and SIGN and be done. Unfortunately, in spite of the fact that all the DIGEST_SIGN_… functions can be implemented in terms of the previous functions and generic hashing, they aren’t, so all providers are required to duplicate hashing and signing functions including constructing the binary ASN.1 for the certificate signature function (via GET_CTX_PARAMS and its pair GETTABLE_CTX_PARAMS). Another issue a sign only token will get into is padding: OpenSSL supports a variety of padding schemes (for RSA) but is deprecating their export, so if your token doesn’t do an expected form of padding, you’ll need to implement that in your provider as well. Recalling that the TPM2 provider uses RSA Decryption for signatures means that the TPM2 provider implementation is entirely responsible for padding all signatures. In order to try to come up with a common solution, I added an opensslmissing directory to my provider under the MIT licence that anyone is free to incorporate into their provider if they end up having the same digest and padding problems I did.

Decryption and Derivation

The final thing a private key provider needs to do is decryption. This is a very different operation between Elliptic Curve and RSA keys, so you need a different operation for each (OSSL_OP_ASYM_CIPHER for RSA and OSSL_OP_KEYEXCH for EC). Each ends up being a slightly special snowflake: RSA because it may need OAEP padding (which the TPM does) but with the most usual cipher being md5, which the TPM doesn’t do (so OAEP padding with arbitrary mask and hash function is also in opensslmissing); and EC because it requires derivation from another public key. The problem with this latter operation is that because of the way OpenSSL works, the public key must be imported into the provider before it can be used, so you must provide NEW, IMPORT and IMPORT_TYPES routines for key management for this to happen.

Store

The store functions only need to be used if you have to load keys that aren’t file based (for file based keys the default provider file store will load them). For the TPM there are a set of NV Keys with 0x81 MSO prefix that aren’t file based. We load these in the engine with //nvkey:<hex> as the designator (and the //nvkey: prefix is overridable in the config file). To get this to work in the Provider is slightly problematic because the scheme (the //nvkey: prefix) must be specified as the provider algorithm_name which is usually a constant in a static array. This means that the stores actually can’t be static and must have the configuration defined name poked into them before the store is used, but this is relatively easy to arrange in the OSSL_provider_init() function. Once this is done, it’s relatively easy to create a store. The only really problematic function is the STORE_EOF one, which is designed around files but means you have to keep an eof indicator in the context and update it to be 1 once the load function has completed.
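
The eof bookkeeping can be sketched as follows. This is an illustration only: struct hyp_store_ctx and the three functions are hypothetical stand-ins shaped like the store OPEN, LOAD and EOF callbacks, the prefix is passed in to reflect that it is configurable, and the real callbacks take more arguments (an object callback and a passphrase callback among them).

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical store context: one NV key handle plus the eof indicator. */
struct hyp_store_ctx {
    uint32_t handle;
    int eof;
};

/* Parse "<prefix><hex>" and accept only handles in the 0x81 range. */
int hyp_store_open(struct hyp_store_ctx *ctx, const char *uri, const char *prefix)
{
    size_t plen = strlen(prefix);
    unsigned long h;
    char *end;

    if (strncmp(uri, prefix, plen) != 0 || uri[plen] == '\0')
        return 0;
    errno = 0;
    h = strtoul(uri + plen, &end, 16);
    if (errno != 0 || *end != '\0' || h > UINT32_MAX)
        return 0;
    if ((h >> 24) != 0x81)
        return 0;
    ctx->handle = (uint32_t)h;
    ctx->eof = 0;
    return 1;
}

/* There is exactly one object per designator, so the first load ends the "file". */
int hyp_store_load(struct hyp_store_ctx *ctx)
{
    if (ctx->eof)
        return 0;
    /* a real provider would hand the key object to OpenSSL here */
    ctx->eof = 1;
    return 1;
}

int hyp_store_eof(const struct hyp_store_ctx *ctx)
{
    return ctx->eof;
}
```

Without the flag flipping to 1 after the first load, a caller that loops until eof would keep asking the store for more objects.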

The Provider Recursion Problem

This doesn’t seem to be discussed anywhere else, but it can become a huge issue if your provider depends on another library which also uses OpenSSL. The TPM2 provider depends on either the Intel or IBM TSS libraries and both of those use OpenSSL for cryptographic operations around TPM transport security since both of them use ECDH to derive a seed for session encryption and HMAC. The problem is that ordinarily the providers are called in the order they’re listed, so you always have to specify -provider default -provider tpm2 to make up for the missing public key operations in the TPM2 provider. However, the OpenSSL core operates a cache for the provider operations it has previously found and searches the cache first before doing any other lookups, so if the EC key management routines are cached (as they are if you input a TPM format key) and the default ones aren’t (because inputting TPM format keys requires no public key operations), the next attempt to generate an ephemeral EC key for the ECDH security derivation will find the TPM2 provider first. So say you are doing a signature which requires HMAC security to guard against interposer tampering. The use of ECDH in the HMAC seed derivation will then call back into the provider to do an ECDH operation which also requires session security and will thus call back again into the provider ad infinitum (or at least until stack overflow). The only way to break out of this infinite recursion is to try to prime the cache with the default provider as well as the TPM2 provider, so the tss library functions can find the default provider first. The (absolutely dirty) hack I have to do this is inside the pkey decode function as

	if (alg == TPM_ALG_ECC) {
		/* create and discard a generic EC context purely to prime the method cache */
		EVP_PKEY_CTX *ctx = EVP_PKEY_CTX_new_id(EVP_PKEY_EC, NULL);
		EVP_PKEY_CTX_free(ctx);
	}

Which currently works to break the recursion loop. However, it is an unreliable hack because it only works thanks to an internal detail: the OpenSSL hash bucket implementation orders the method cache by provider address and, since the TPM2 provider is dynamically loaded, it has a higher address than the OpenSSL default one. This ordering will not survive security techniques like Address Space Layout Randomization.

Conclusions

Hopefully I’ve given a rapid (and possibly useful) overview of converting an engine to a provider which will give some pointers about provider conversion to all the engine token implementations out there. Please feel free to repurpose my opensslmissing routines under the MIT licence without any obligations to get them back upstream (although I would be interested in hearing about bugs and feature enhancements). In the end, it was only 1152 lines of C to implement the TPM2 provider (additive on top of the common shared code base with the existing Engine) and 681 lines in opensslmissing, showing firstly that there is still a need for OpenSSL itself to do the missing routines as a provider export and secondly that it really takes a fairly small amount of provider code to wrap an existing engine implementation provided you’re discriminating about what functions you actually provide. As a final remark I should note that the openssl_tpm2_engine has a fairly extensive test suite, all of which now passes with the provider implementation as well.

Papering Over our TPM 2.0 TSS Divisions

For years I’ve been hoping that the Trusted Computing Group (TCG) based IBM and Intel TSS (TCG Software Stack) would simply integrate with one another into a single package. The rationale is pretty simple: the Intel TSS is already quite a large collection of libraries so adding one more (the IBM TSS has a single library) wouldn’t be too much of a burden. Both TSSs are based on TCG specifications, except that the IBM TSS is based on the TPM 2.0 Library Specification and the Intel TSS is based on the TPM Software Stack (also, not at all confusingly, abbreviated TSS). There’s actually very little overlap between these specifications so co-existence seems very reasonable. Before we get into the stories of these two stacks and what they do, I should confess my biases: while I’ve worked with the TCG over the years, I’ve always harboured the view that the complete lack of adoption of TPM 2.0’s predecessor (TPM 1.2) was because of the hugely complicated nature of the TCG mandated software stack which was implemented in Linux by trousers. It is my firm belief that the complexity of the API led to the lack of uptake, even though I made several efforts over the years to make use of it.

My primary interest in the TPM has been as a secure laptop keystore (since I already paid for a TPM, I didn’t see the need to fork out again for one of the new security dongles; plus the TPM is infinitely scalable in the number of keys, unlike most dongles). The key to making the TPM usable in this form is integration with existing Cryptographic systems (via plugins if they do them). Since openssl has an engine plugin, I’ve already produced an openssl TPM2 engine, patches for gnupg and engine integration patches for openvpn (upstream in 2.5) and openssh as well as a PKCS11 exporter (to make file based engine keys exportable as PKCS11 tokens). Note a lot of the patches aren’t strictly TPM patches, they’re actually making openssl engines work in places they previously didn’t. However, the one thing most of the patches that actually touch the TPM have in common is that they have to pick one or other of the available TSSs to operate with. Before describing the TSS agnostic solution, let’s look at why these two TSSs exist, what the difference is between them and why you might choose one over the other.

Schizophrenia at the TCG

As I said in the introduction, both TSSs are based on TCG specifications. These standards aren’t ambiguous: they lay out in excruciating detail what the header files are called and what the prototypes and structures have to be. Both TSS implementations are the way they are because they wouldn’t be following the standards if they deviated even slightly. The problem is the standards don’t agree with each other in meaningful ways. For instance the TPM Library standards define every structure in terms of the fundamental unit of TPM data: the TPM2B structure, which defines a 16 bit big endian length followed by a data unit of that length. The TPM Library standards (in Part 4 section 9.10.6) lay out that every TPM2B_X structure shall be a union of a ‘b’ element which is a TPM2B and a ‘t’ element which is the actual structure. However the TPM Software Stack specification eliminates the plain TPM2B, so a TPM2B_X structure in the latter specification is not a union; it is simply the ‘t’ form of the structure. This means that although TPM2B_X structures in each specification are byte for byte the same, they are definitionally different when written as C code and can’t be assigned to each other … oops. The TPM Library standard lays out additional structures for an elaborate calling convention for the TPM2_Command interfaces which are completely different from the ESYS_Command interfaces in the TPM Software Stack.
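
The mismatch is easy to reproduce in a few lines of C. The type names below are hypothetical (the real headers use the same TPM2B_DIGEST name in both stacks, which is exactly why they cannot be mixed) and the 64 byte buffer size is an assumption for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Plain TPM2B in the Library specification style: a size and the bytes. */
typedef struct {
    uint16_t size;
    uint8_t buffer[1];
} HYP_LIB_TPM2B;

/* Library style TPM2B_X: a union of the typed 't' form and the generic 'b' form. */
typedef union {
    struct {
        uint16_t size;
        uint8_t buffer[64];
    } t;
    HYP_LIB_TPM2B b;
} HYP_LIB_TPM2B_DIGEST;

/* Software Stack style TPM2B_X: just the typed form, no union. */
typedef struct {
    uint16_t size;
    uint8_t buffer[64];
} HYP_SS_TPM2B_DIGEST;

/* Byte for byte the same size ... */
_Static_assert(sizeof(HYP_LIB_TPM2B_DIGEST) == sizeof(HYP_SS_TPM2B_DIGEST),
               "layouts must match");

/* ... but "*dst = *src" would not compile because the types differ,
 * so a conversion has to go through memcpy (or a cast). */
void hyp_lib_to_ss(HYP_SS_TPM2B_DIGEST *dst, const HYP_LIB_TPM2B_DIGEST *src)
{
    memcpy(dst, src, sizeof(*dst));
}
```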

The reason it’s all done this way? Well, the specifications were built by completely different committees for what the committees saw as separate use cases, so they didn’t see a need to reconcile the differences. As long as the definitions were byte for byte compatible, everything would work out correctly on the wire. The problem was the TPM Library specification was released nearly a decade ahead of the TPM Software Stack specification, so the first TSS created had to follow the former because the latter didn’t exist.

Sessions, HMAC and Encryption

One of the perennial problems of a TPM is that integrity and security of the information going over the wire is the responsibility of the user. However, the encryption and integrity computations involved, particularly the key derivations, are incredibly involved (even though well documented in the TPM Library specification), so naturally everyone would like the TSS to do this. The problem the TPM Software Stack had is that all the way up to its ESAPI specification, the security and integrity computations were still the responsibility of the user, so it didn’t begin to be useful until ESAPI was finalized a couple of years ago.

The Resource Manager Problem

TPM 2.0 was designed to be far leaner in terms of resources than TPM 1.2, which meant there was a very small limit to the number of sessions and volatile objects it could contain at any one time. This necessitated the use of a “resource manager” to control access, otherwise applications would get unexpected out of resource errors. The Intel TSS has its own resource manager. However, the Linux Kernel itself incorporated a resource manager in the TPM device in 4.12 and the IBM TSS avoids the need for its own resource manager by using this, and will, therefore, not work correctly on earlier kernel versions.

Inside the IBM TSS

Even though the IBM TSS is based on a solid and easily comprehensible and detailed specification, that specification itself suffers from a couple of defects. The first is that it assumes you’re submitting to a physical TPM, so the specification has no functional (library based) submission API for TPM commands, and the IBM TSS had to invent an API it called TSS_Execute(), which is a way of sending TPM commands directly to the physical TPM over the kernel’s device interfaces. Secondly, the standard contains no routing interfaces (telling it what destination the TPM is on: should it open the /dev/tpmrm0 device or send the commands to the TPM over an IP socket), so this is controlled in the IBM TSS by several environment variables: TPM_INTERFACE_TYPE, which can be either “dev” or “socsim” for either a physical device or a network socket, with the endpoints being controlled by TPM_DEVICE for the “dev” type (which specifies which device to use, defaulting to /dev/tpmrm0) or TPM_SERVER_NAME and TPM_PLATFORM_PORT for “socsim”.
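
As a rough model of that routing decision (not the IBM TSS code; the types and function names are hypothetical, and the port variable and the library's built-in default interface type are deliberately left out):

```c
#include <stdlib.h>
#include <string.h>

enum hyp_route { HYP_ROUTE_UNKNOWN, HYP_ROUTE_DEVICE, HYP_ROUTE_SOCKET };

struct hyp_target {
    enum hyp_route route;
    const char *endpoint;   /* device path or server name */
};

/* Decide where commands go from the three variable values (NULL = unset). */
struct hyp_target hyp_pick_target(const char *type, const char *device,
                                  const char *server)
{
    struct hyp_target t = { HYP_ROUTE_UNKNOWN, NULL };

    if (type == NULL)
        return t;           /* the library's own default is not modelled */
    if (strcmp(type, "dev") == 0) {
        t.route = HYP_ROUTE_DEVICE;
        t.endpoint = device != NULL ? device : "/dev/tpmrm0";
    } else if (strcmp(type, "socsim") == 0) {
        t.route = HYP_ROUTE_SOCKET;
        t.endpoint = server;    /* stays NULL if TPM_SERVER_NAME is unset */
    }
    return t;
}

/* Convenience wrapper reading the variables named in the text. */
struct hyp_target hyp_target_from_env(void)
{
    return hyp_pick_target(getenv("TPM_INTERFACE_TYPE"),
                           getenv("TPM_DEVICE"),
                           getenv("TPM_SERVER_NAME"));
}
```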

The invented TSS_Execute() API also does all the encryption and HMAC parts necessary for secure and integrity verified communication with the TPM, so it acts as a fully functional TSS. The main drawback of the IBM TSS is that it stores essential information about the sessions and handles in files which will, by default, be dropped into the local directory. Most users of the IBM TSS have to set TPM_DATA_DIR to be a specially created directory under /tmp to avoid leaving messy artifacts in users’ home directories.

Inside the Intel TSS

The TPM Software Stack consists of a large number of different specifications, including the resource manager (which is now unnecessary for kernels from 4.12 onwards) and the TCTI, which specifies the routing information for the TPM. It turns out that even in the Intel TSS, environment variables are the most convenient form to specify this information but, unfortunately, the name of the environment variable has been left up to each use case instead of being standardised in the library, meaning you’ll have to consult the man page to figure out what it is. The next set of standards, SAPI and ESAPI, define functional interfaces to the TPM with one submission API for each command and additionally a corresponding ..._Async()/..._Finish() pair for asynchronous programming. The only real difference between SAPI and ESAPI is that the latter also does the necessary session cryptography for security and integrity, so it’s pretty much the only usable interface for TPM commands. Unfortunately, the ESAPI interface, as constructed by the TCG, has several cases of premature abstraction, the worst of which is a separate abstraction for the TPM handle interface which lives only as long as the lifetime of the connection object and which necessitates multiple conversions to and from internal handle objects if your session or object lives longer than the connection (which can be the case).

One final wrinkle is that in the handle abstraction, ESAPI has no API for retrieving the real TPM handle. I’d always wondered why the Intel TSS tpm2 tools always saved the objects they create to a context instead of simply returning the handle to them, but this is the reason: without the ability to transform an internal handle to an external one, you either save the context or let the object die when the connection terminates. This problem is one forced by the ESAPI standard, but eventually it became enough of a problem that the Intel TSS introduced its own additional API to remedy it.

The other major difference between the Intel and IBM TSSs is memory handling for returned results: The IBM TSS requires pre-allocated structures whereas the Intel TSS insists on allocation on return. It looks like the Intel TSS should be able to tell if the return pointer is allocated or NULL, but right at the moment it always allocates and overwrites the pointer.
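
The two conventions, and one way to bridge them, can be sketched like this. Everything here is hypothetical: hyp_result stands in for any returned TPM structure and hyp_alloc_style_read() fakes a call that allocates its result, which is the convention described for the Intel TSS.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    uint16_t size;
    uint8_t buffer[32];
} hyp_result;

/* Allocate-on-return convention: the callee allocates, the caller frees. */
int hyp_alloc_style_read(hyp_result **out)
{
    hyp_result *r = calloc(1, sizeof(*r));

    if (r == NULL)
        return -1;
    r->size = 4;
    memcpy(r->buffer, "\x01\x02\x03\x04", 4);
    *out = r;
    return 0;
}

/* Pre-allocated convention layered on top: copy into the caller's
 * structure and release the temporary allocation. */
int hyp_prealloc_style_read(hyp_result *out)
{
    hyp_result *tmp = NULL;
    int rc = hyp_alloc_style_read(&tmp);

    if (rc != 0)
        return rc;
    *out = *tmp;
    free(tmp);
    return 0;
}
```

A wrapper of this shape keeps calling code written for pre-allocated structures unchanged, at the cost of one extra copy per call.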

Constructing a unifying Interface for both the IBM and Intel TSSs

In essence the process for converting something that runs with the IBM TSS to being TSS Agnostic is a fairly simple three step process which I’ll illustrate by reference to the openssl tpm2 engine which has already been converted:

  1. Hide the structural differences by inserting a set of macros: VAL() and VAL_2B() which hide most of the TCG induced structure schizophrenia.
  2. Convert the API call structure to be functional instead of via a single TSS_Execute() call. This is quite involved so I did it by adding tpm2_Function() wrappers for each specific invocation.
  3. Introduce the correct premature abstraction for internal and external representation of handles. This was the nastiest step for me because handles are stored in long lived engine structures, and the internal and external representations are both forms of uint32_t even in ESAPI (meaning the compiler won’t complain if you assign one to the other) so it was incredibly painful to get this conversion correct.
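
Step 1 might look something like the following. This is a guess at the general shape rather than a copy of the engine's headers: only the VAL_2B() name comes from the text, while the HYP_ prefixed names, the build switch and the helper function are assumptions.

```c
#include <stdint.h>
#include <string.h>

#ifdef HYP_SOFTWARE_STACK_LAYOUT
/* Software Stack style: members are reached directly. */
typedef struct {
    uint16_t size;
    uint8_t buffer[64];
} HYP_TPM2B_DIGEST;
#define VAL_2B(X, MEMBER) X.MEMBER
#else
/* Library style: members live under the 't' arm of the union. */
typedef union {
    struct {
        uint16_t size;
        uint8_t buffer[64];
    } t;
    struct {
        uint16_t size;
        uint8_t buffer[1];
    } b;
} HYP_TPM2B_DIGEST;
#define VAL_2B(X, MEMBER) X.t.MEMBER
#endif

/* Calling code is written once against the macro and compiles either way.
 * Returns the number of bytes stored, or 0 if the data does not fit. */
uint16_t hyp_set_digest(HYP_TPM2B_DIGEST *d, const uint8_t *data, uint16_t len)
{
    if (len > sizeof(VAL_2B((*d), buffer)))
        return 0;
    VAL_2B((*d), size) = len;
    memcpy(VAL_2B((*d), buffer), data, len);
    return len;
}
```

Building with -DHYP_SOFTWARE_STACK_LAYOUT switches the structure definition and the macro together, which is the job the per-TSS header does in the real build.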

Once this is done, the remaining step was to introduce a header which did the impedance matching between the Intel and IBM TSSs and an autoconf macro to detect which TSS is installed, and the resulting configure and compile just works. The resulting code will now build and run under either TSS. I should point out that the Intel TSS is missing several helper routines, but these are added into the intel-tss.h header file by copying them from the original IBM TSS. Finally an autoconf check is added to look for the missing internal to external handle transform, and everything is ready to go.

It does seem like it would be easier to port an existing Intel TSS application to the IBM TSS, since points 2 and 3 will already be sorted out. However, all the major TSS library using applications are IBM TSS based, so I haven’t actually been able to verify this.

Remaining Problems and Anomalies

The biggest remaining issue was the test scripts. The openssl TPM2 engine has 27 of them all told, all designed to check the engine function by invoking it via openssl when connected to a software TPM. These scripts are all highly dependent on the IBM TSS command line binaries and the Intel TSS versions seem to be very unstable in terms of argument structure making it pretty much impossible to convert, so I elected finally to have the tests run only if the IBM TSS CLI is installed. The next problem was that the Intel TSS version of the engine didn’t actually pass all the tests. However this was quickly narrowed down to a bug in the Intel TSS when using bound sessions on the NULL seed.

The sole remaining issue is a curious performance anomaly. When running time make check with the IBM TSS, the result is:

real	0m6.100s
user	0m2.827s
sys	0m0.822s

and the same command with the Intel TSS (running one fewer test and skipping the NULL seed) is:

real	0m10.948s
user	0m6.822s
sys	0m0.859s

Showing that the Intel TSS is nearly twice as slow as the IBM one (about 1.8 times the elapsed time), with most of the time differential (roughly 4.0 of the 4.8 extra seconds) being user time. Since the tests use a software TPM which can perform the cryptographic operations at the speed of the main CPU, this is showing some type of issue with the command transmission system of the Intel TSS, likely having to do with the fact that most applications use synchronous TPM operations (the engine certainly does) but in the Intel TSS, the synchronous operations are implemented as the corresponding asynchronous pair. Regardless of the root cause, this is unlikely to be a problem with real world TPM crypto where the time taken for any operation will be dominated by the slowness of the physical TPM.

Conclusion

The TSS agnostic scheme adopted by the openssl TPM2 engine should be easily adaptable for all the other non-engine TPM code bases, and thus should pave the way for users not having to choose between applications which only support the Intel or IBM TSSs; instead they can simply install the best supported one on their distribution. The next steps are to investigate adapting this infrastructure to the existing gnupg patches (done and upstream) and also see if it can be used to solve the gnutls conundrum over supporting TPM based keys.

Deploying Encrypted Images for Confidential Computing

In the previous post I looked at how you build an encrypted image that can maintain its confidentiality inside AMD SEV or Intel TDX. In this post I’ll discuss how you actually bring up a confidential VM from an encrypted image while preserving secrecy. However, first a warning: This post represents the state of the art and includes patches that are certainly not deployed in distributions and may not even be upstream, so if you want to follow along at home you’ll need to patch things like qemu, grub and OVMF. I should also add that, although I’m trying to make everything generic to confidential environments, this post is based on AMD SEV, which is the only confidential encrypted[1] environment currently shipping.

The Basics of a Confidential Computing VM

At its base, current confidential computing environments are about using encrypted memory to run the virtual machine and guarding the encryption key so that the owner of the host system (the cloud service provider) can’t get access to it. Both SEV and TDX have the encryption technology inside the main memory controller meaning the L1 cache isn’t encrypted (still vulnerable to cache side channels) and DMA to devices must also be done via unencrypted memory. This latter also means that both the BIOS and the Operating System of the guest VM must be enlightened to understand which pages to encrypt and which must not be. For this reason, all confidential VM systems use OVMF[2] to boot because this contains the necessary enlightening. To a guest, the VM encryption looks identical to full memory encryption on a physical system, so as long as you have a kernel which supports Intel or AMD full memory encryption, it should boot.

Each confidential computing system has a security element which sits between the encrypted VM and the host. In SEV this is an ARM processor called the Platform Security Processor (PSP) and in TDX it is an SGX enclave running Intel proprietary code. The job of the security element is to bootstrap the VM, including encrypting the initial OVMF and inserting the encrypted pages. The security element also includes a validation certificate, which incorporates a Diffie-Hellman (DH) key. Once the guest owner obtains and validates the DH key it can use it to construct a one time ECDH encrypted bundle that can be passed to the security element on bring up. This bundle includes an encryption key which can be used to encrypt secrets for the security element and a validation key which can be used to verify measurements from the security element.

The way QEMU boots a Q35 machine is to set up all the configuration (including a disk device attached to the VM Image), load up the OVMF into rom memory and start the system running. OVMF pulls in the QEMU configuration and constructs the necessary ACPI configuration tables before executing grub and the kernel from the attached storage device. In a confidential VM, the first task is to establish a Guest Owner (the person whose encrypted VM it is) which is usually different from the Host Owner (the person running or controlling the Physical System). Ownership is established by transferring an encrypted bundle to the Secure Element before the VM is constructed.

The next step is for the VMM (QEMU in this case) to ask the secure element to provision the OVMF Firmware. Since the initial OVMF is untrusted, the Guest Owner should ask the Secure Element for an attestation of the memory contents before the VM is started. Since all paths lead through the Host Owner, who is also untrusted, the attestation contains a random nonce to prevent replay and is HMAC’d with a Guest Supplied key from the Launch Bundle. Once the Guest Owner is happy with the VM state, it supplies the Wrapped Key to the secure element (along with the nonce to prevent replay) and the Secure Element unwraps the key and provisions it to the VM where the Guest OS can use it for disk encryption. Finally, the enlightened guest reads the encrypted disk to unencrypted memory using DMA but uses the disk encryptor to decrypt it to encrypted memory, so the contents of the Encrypted VM Image are never visible to the Host Owner.

The Gaps in the System

The most obvious gap is that EFI booting systems don’t go straight from the OVMF firmware to the OS, they have to go via an EFI bootloader (grub, usually) which must be an efi binary on an unencrypted vFAT partition. The second gap is that grub must be modified to pick the disk encryption key out of wherever the Secure Element has stashed it. The third is that the key is currently stashed in VM memory before OVMF starts, so OVMF must know not to use or corrupt the memory. A fourth problem is that the current recommended way of booting OVMF has a flash drive for persistent variable storage which is under the control of the host owner and which isn’t part of the initial measurement.

Plugging The Gaps: OVMF

To deal with the problems in reverse order: the variable issue can be solved simply by not having a persistent variable store, since any mutable configuration information could be used to subvert the boot and leak the secret. This is achieved by stripping all the mutable variable handling out of OVMF. Solving key stashing simply means getting OVMF to set aside a page for a secret area and having QEMU recognise where it is for the secret injection. It turns out AMD were already working on a QEMU configuration table at a known location by the Reset Vector in OVMF, so the secret area is added as one of these entries. Once this is done, QEMU can retrieve the injection location from the OVMF binary so it doesn’t have to be specified in the QEMU Machine Protocol (QMP) command. Finally OVMF can protect the secret and package it up as an EFI configuration table for later collection by the bootloader.

The final OVMF change (which is in the same patch set) is to pull grub inside a Firmware Volume and execute it directly. This certainly isn’t the only possible solution to the problem (adding secure boot or an encrypted filesystem were other possibilities) but it is the simplest solution that gives a verifiable component that can be invariant across arbitrary encrypted boots (so the same OVMF can be used to execute any encrypted VM securely). This latter is important because traditionally OVMF is supplied by the host owner rather than being part of the VM image supplied by the guest owner. The grub script that runs from the combined volume must still be trusted to either decrypt the root or reboot to avoid leaking the key. Although the host owner still supplies the combined OVMF, the measurement assures the guest owner of its correctness, which is why having a fairly invariant component is a good idea … so the guest owner doesn’t have potentially thousands of different measurements for approved firmware.

Plugging the Gaps: QEMU

The modifications to QEMU are fairly simple: it just needs to scan the OVMF file to determine the location for the injected secret and inject it correctly using a QMP command. Since secret injection is already upstream, this is a simple find and make the location optional patch set.

Plugging the Gaps: Grub

Grub today only allows for the manual input of the cryptodisk password. However, in the cloud we can’t do it this way because there’s no guarantee of a secure tty channel to the VM. The solution, therefore, is to modify grub so that the cryptodisk can use secrets from a provider, in addition to the manual input. We then add a provider that can read the efi configuration tables and extract the secret table if it exists. The current incarnation of the proposed patch set is here and it allows cryptodisk to extract a secret from an efisecret provider. Note this isn’t quite the same as the form expected by the upstream OVMF patch in its grub.cfg because now the provider has to be named on the cryptodisk command line thus

cryptodisk -s efisecret

but in all other aspects, Grub/grub.cfg works. I also discovered several other deviations from the initial grub.cfg (like Fedora uses /boot/grub2 instead of /boot/grub like everyone else) so the current incarnation of grub.cfg is here. I’ll update it as it changes.

Putting it All Together

Once you have applied all the above patches and built your version of OVMF with grub inside, you’re ready to do a confidential computing encrypted boot. However, you still need to verify the measurement and inject the encrypted secret. As I said before, this isn’t easy because, due to replay defeat requirements, the secret bundle must be constructed on the fly for each VM boot. From this point on I’m going to be using only AMD SEV as the example because the Intel hardware doesn’t yet exist and AMD kindly gave IBM research a box to play with (Anyone with a new EPYC 7xx1 or 7xx2 based workstation can likely play along at home, but check here). The first thing you need to do is construct a launch bundle. AMD has a tool called sev-tool to do this for you, and its first input is the platform Diffie Hellman certificate (pdh.cert). The tool will extract this for you

sevtool --pdh_cert_export

Or it can be given to you by the cloud service provider (in this latter case you’ll want to verify the provenance using sevtool --validate_cert_chain, which contacts the AMD site to verify all the details). Once you have a trusted pdh.cert, you can use this to generate your own guest owner DH cert (godh.cert) which should be used only one time to give a semblance of ECDHE. godh.cert is used with pdh.cert to derive an encryption key for the launch bundle. You can generate this with

sevtool --generate_launch_blob <policy>

The gory details of policy are in the SEV manual chapter 3, but most guests use 1 which means no debugging. This command will generate the godh.cert, the launch_blob.bin and a tmp_tk.bin file which you must save and keep secure because it contains the Transport Encryption and Integrity Keys (TEK and TIK) which will be used to encrypt the secret. Figuring out the qemu command line options needed to launch and pause a SEV guest is a bit of a palaver, so here is mine. You’ll likely need to change things, like the QMP port and the location of your OVMF build and the launch secret.

Finally you need to get the launch measure from QMP, verify it against the sha256sum of OVMF.fd and create the secret bundle with the correct GUID headers. Since this is really fiddly to do with sevtool, I wrote this python script[3] to do it all (note it requires qmp.py from the qemu git repository). You execute it as

sevsecret.py --passwd <disk passwd> --tiktek-file <location of tmp_tk.bin> --ovmf-hash <hash> --socket <qmp socket>

And it will verify the launch measure and encrypt the secret for the VM if the measure is correct and start the VM. If you got everything correct the VM will simply boot up without asking for a password (if you inject the wrong secret, it will still ask). And there you have it: you’ve booted up a confidential VM from an encrypted image file. If you’re like me, you’ll also want to fire up gdb on the qemu process just to show that the entire memory of the VM is encrypted …

Conclusions and Caveats

The above script should allow you to boot an encrypted VM anywhere: locally or in the cloud, provided you can access the QMP port (most clouds use libvirt which introduces yet another additional layering pain). The biggest drawback, if you refer to the diagram, is the yellow box: you must trust the secure element, which in both Intel and AMD is proprietary[4], in order to get confidential computing to work. Although there is hope that in future the secure element could be fully open source, it isn’t today.

The next annoyance is that launching a confidential VM is high touch requiring collaboration from both the guest owner and the host owner (due to the anti-replay nonce). For a single launch, this is a minor annoyance but for an autoscaling (launch VMs as needed) platform it becomes a major headache. The solution seems to be to have some Hardware Security Module (HSM), like the cloud uses today to store encryption keys securely, and have it understand how to measure and launch encrypted VMs on behalf of the guest owner.

The final conclusion to remember is that confidentiality is not security: your VM is as exploitable inside a confidential encrypted VM as it was outside. In many ways confidentiality and security are opposites, in that security in part requires reducing the trusted code and confidentiality requires pulling as much as possible inside. Confidential VMs do have an answer to the Cloud trust problem since the enterprise can now deploy VMs without fear of tampering by the cloud provider, but those VMs are as insecure in the cloud as they were in the Enterprise Data Centre. All of this argues that Confidential Computing, while an important milestone, is only one step on the journey to cloud security.

Patch Status

The OVMF patches are upstream (including modifications requested by Intel for TDX). The QEMU and grub patch sets are still on the lists.

Building Encrypted Images for Confidential Computing

With both Intel and AMD announcing confidential computing features to run encrypted virtual machines, IBM research has been looking into a new format for encrypted VM images. The first question is why a new format? After all, qcow2 only recently deprecated its old encrypted image format in favour of luks. The problem is that in confidential computing, the guest VM runs inside the secure envelope but the host hypervisor (including the QEMU process) is untrusted and thus runs outside it. Unfortunately, even for the new luks format, the encryption of the image is handled by QEMU and so the encryption key would be outside the secure envelope. Thus, a new format is needed to keep the encryption key (and, indeed, the encryption mechanism) within the guest VM itself. Fortunately, encrypted boot of Linux systems has been around for a while, and this can be used as a practical template for constructing a fully confidential encrypted image format and maintaining that confidentiality within a hostile cloud environment. In this article, I’ll explore the state of the art in encrypted boot, constructing EFI encrypted boot images, and finally, in the follow on article, look at deploying an encrypted image into a confidential environment and maintaining key secrecy in the cloud.

Encrypted Boot State of the Art

Luks and the cryptsetup toolkit have been around for a while and recently (in 2018), the luks format was updated to version 2. However, actually booting a linux kernel from an encrypted partition has always been a bit of a systems problem, primarily because the bootloader (grub) must decrypt the partition to actually load the kernel. Fortunately, grub can do this, but unfortunately the current grub in most distributions (2.04) can only read the version 1 luks format. Secondly, the user must type the decryption passphrase into grub (so it can pull the kernel and initial ramdisk out of the encrypted partition to boot them), but grub currently has no mechanism to pass it on to the initial ramdisk for mounting root, meaning that either the user has to type their passphrase twice (annoying) or the initial ramdisk itself has to contain a file with the disk passphrase. This latter is the most commonly used approach and only has minor security implications when the system is in motion (the ramdisk and the key file must be root read only) and the password is protected at rest by the fact that the initial ramdisk is also on the encrypted volume. Even more annoying is the fact that there is no distribution standard way of creating the initial ramdisk. Debian (and Ubuntu) have the most comprehensive documentation on how to do this, so the next section will look at the much less well documented systemd/dracut mechanism.

Encrypted Boot for Systemd/Dracut

Part of the problem here seems to be less than stellar systems co-ordination between the two components. Additionally, the way systemd supports passphraseless encrypted volumes has been evolving for a while but changed again in v246 to mirror the Debian method. Since cloud images are usually pretty up to date, I’ll describe this new way. Each encrypted volume is referred to by UUID (which will be the UUID of the containing partition returned by blkid). To get dracut to boot from an encrypted partition, you must pass in

rd.luks.uuid=<UUID>

but you must also have a key file named

/etc/cryptsetup-keys.d/luks-<UUID>.key

And, since dracut hasn’t yet caught up with this, you usually need a cryptodisk.conf file in /etc/dracut.conf.d/ which contains

install_items+=" /etc/cryptsetup-keys.d/* "
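As a rough sketch of how these three pieces hang together, the following shell function prints each setting for a given UUID. The function name and the sample UUID are made up purely for illustration, and it only prints: nothing on the system is changed.

```shell
#!/bin/sh
# Hypothetical helper: print the settings described above for one
# LUKS partition UUID (as returned by blkid). Nothing is written.
luks_boot_settings() {
    uuid=$1
    echo "rd.luks.uuid=${uuid}"                          # kernel command line
    echo "/etc/cryptsetup-keys.d/luks-${uuid}.key"       # key file name
    echo 'install_items+=" /etc/cryptsetup-keys.d/* "'   # dracut.conf.d entry
}

# Made-up UUID, for illustration only
luks_boot_settings 6a1f0c3e-1111-2222-3333-444455556666
```

The point of the sketch is simply that the same UUID has to appear in both the kernel argument and the key file name.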

Grub and EFI Booting Encrypted Images

Traditionally grub is actually installed into the disk master boot record, but for EFI boot that changed and the disk (or VM image) must have an EFI System partition which is where the grub.efi binary is installed. Part of the job of the grub.efi binary is to find the root partition and source the /boot/grub/grub.cfg. When you install grub on an EFI partition a search for the root by UUID is actually embedded into the grub binary. Another problem is likely that your distribution customizes the location of grub and updates the boot variables to tell the system where it is. However, a cloud image can’t rely on the boot variables and must be installed in the default location (\EFI\BOOT\bootx64.efi). This default location can be achieved by adding the --removable flag to grub-install.

For encrypted boot, this becomes harder because the grub in the EFI partition must set up the cryptographic location by UUID. However, if you add

GRUB_ENABLE_CRYPTODISK=y

to /etc/default/grub, it will do the necessary in grub-install and grub-mkconfig. Note that on Fedora, where every other GRUB_ENABLE parameter is true/false, this must be ‘y’: unfortunately grub-install will look for =y, not =true.

Putting it all together: Encrypted VM Images

Start by extracting the root of an existing VM image to a tar file. Make sure it has all the tools you will need, like cryptodisk and grub-efi. Create a two partition raw image file and loopback mount it (I usually like 4GB) with a small efi partition (p1) and an encrypted root (p2):

truncate -s 4GB disk.img
parted disk.img mklabel gpt
parted disk.img mkpart primary 1Mib 100Mib
parted disk.img mkpart primary 100Mib 100%
parted disk.img set 1 esp on
parted disk.img set 1 boot on

Now setup the efi and cryptosystem (I use ext4, but it’s not required). Note at this time luks will require a password. Use a simple one and change it later. Also note that most encrypted boot documents advise filling the encrypted partition with random numbers. I don’t do this because the additional security afforded is small compared with the advantage of converting the raw image to a smaller qcow2 one.

losetup -P -f disk.img          # assuming here it uses loop0
l=($(losetup -l|grep disk.img)) # verify with losetup -l
mkfs.vfat ${l}p1
blkid ${l}p1       # remember the EFI partition UUID
cryptsetup --type luks1 luksFormat ${l}p2 # choose temp password
blkid ${l}p2       # remember this as <UUID> you'll need it later 
cryptsetup luksOpen ${l}p2 cr_root
mkfs.ext4 /dev/mapper/cr_root
mount /dev/mapper/cr_root /mnt
tar -C /mnt -xpf <vm root tar file>
for m in run sys proc dev; do mount --bind /$m /mnt/$m; done
chroot /mnt

Create or modify /etc/fstab to have root as /dev/mapper/cr_root and the EFI partition by label under /boot/efi. Now set up grub for encrypted boot

echo "GRUB_ENABLE_CRYPTODISK=y" >> /etc/default/grub
mount /boot/efi
grub-install --removable --target=x86_64-efi
grub-mkconfig -o /boot/grub/grub.cfg

For Debian, you’ll need to add an /etc/crypttab entry for the encrypted disk:

cr_root UUID=<UUID> none luks

And then re-create the initial ramdisk. For dracut systems, you’ll have to modify /etc/default/grub so the GRUB_CMDLINE_LINUX has a rd.luks.uuid=<UUID> entry. If this is a selinux based distribution, you may also have to trigger a relabel.
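One way to make the dracut change is with sed. This is only a sketch: it works on a scratch copy rather than the real /etc/default/grub, the UUID is a placeholder, and it assumes GRUB_CMDLINE_LINUX sits on a single double-quoted line, which may not hold on your distribution.

```shell
#!/bin/sh
# Placeholder UUID; use the one blkid reported for the encrypted partition.
uuid="1234-abcd"

# Scratch copy standing in for /etc/default/grub
grubdef=$(mktemp)
cat > "$grubdef" <<'EOF'
GRUB_CMDLINE_LINUX="quiet"
EOF

# Append rd.luks.uuid=<UUID> inside the existing quoted value
sed -i "s|^\(GRUB_CMDLINE_LINUX=\"[^\"]*\)\"|\1 rd.luks.uuid=${uuid}\"|" "$grubdef"
cat "$grubdef"
```

On a real image you would apply the same edit to /etc/default/grub inside the chroot and then re-run grub-mkconfig so the new command line lands in grub.cfg.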

Now would also be a good time to make sure you have a root password you know or to install /root/.ssh/authorized_keys. You should unmount all the binds and /mnt and try EFI booting the image. You’ll still have to type the password a couple of times, but once the image boots you’re operating inside the encrypted envelope. All that remains is to create a fast boot high entropy low iteration password and replace the existing one with it and set the initial ramdisk to use it. This example assumes your image is mounted as SCSI disk sda, but it may be a virtual disk or some other device.

dd if=/dev/urandom bs=1 count=33|base64 -w 0 > /etc/cryptsetup-keys.d/luks-<UUID>.key
chmod 600 /etc/cryptsetup-keys.d/luks-<UUID>.key
cryptsetup --key-slot 1 luksAddKey /dev/sda2 # permanent recovery key
cryptsetup --key-slot 0 luksRemoveKey /dev/sda2 # remove temporary
cryptsetup --key-slot 0 --iter-time 1 luksAddKey /dev/sda2 /etc/cryptsetup-keys.d/luks-<UUID>.key

Note the “-w 0” is necessary to prevent the password from having a trailing newline which will make it difficult to use. For mkinitramfs systems, you’ll now need to modify the /etc/crypttab entry

cr_root UUID=<UUID> /etc/cryptsetup-keys.d/luks-<UUID>.key luks

For dracut you need the key install hook in /etc/dracut.conf.d as described above and for Debian you need the keyfile pattern:

echo "KEYFILE_PATTERN=\"/etc/cryptsetup-keys.d/*\"" >>/etc/cryptsetup-initramfs/conf-hook

You now rebuild the initial ramdisk and you should now be able to boot the cryptosystem using either the high entropy password or your rescue one and it should only prompt in grub and shouldn’t prompt again. This image file is now ready to be used for confidential computing.
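If you want to convince yourself that the generated key file has the right shape, here is a small sketch that makes a key the same way into a scratch file (the path is a throwaway, not the real key location) and checks its size and the absence of a newline.

```shell
#!/bin/sh
# Generate a key as above, but into a scratch file for inspection.
keyfile=$(mktemp)
head -c 33 /dev/urandom | base64 -w 0 > "$keyfile"

# 33 bytes is a multiple of 3, so base64 yields exactly 44 characters
# with no '=' padding; -w 0 means there is no trailing newline either.
size=$(wc -c < "$keyfile" | tr -d ' ')
newlines=$(tr -cd '\n' < "$keyfile" | wc -c | tr -d ' ')
echo "size=$size newlines=$newlines"
rm -f "$keyfile"
```

A key file that fails this check (for instance one created without -w 0) will still unlock the volume from the file, but is awkward to type at a prompt.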

Webauthn in Linux with a TPM via the HID gadget

Account security on the modern web is a bit of a nightmare. Everyone understands the need for strong passwords which are different for each account, but managing them is problematic because the human mind just can’t remember hundreds of complete gibberish words so everyone uses a password manager (which, let’s admit it, for a lot of people is to write it down). A solution to this problem has long been something called two factor authentication (2FA) which authenticates you by something you know (like a password) and something you possess (like a TPM or a USB token). The problem has always been that you ideally need a different 2FA for each website, so that a compromise of one website doesn’t lead to the compromise of all your accounts.

Enter webauthn. This is designed as a 2FA protocol that uses public key cryptography instead of shared secrets and also uses a different public/private key pair for each website, thus aspiring to be a passwordless secure scalable 2FA system for the web. However, the webauthn standard only specifies how the protocol works when the browser communicates with the remote website; there’s a different standard called FIDO or U2F that specifies how the browser communicates with the second factor (called an authenticator in FIDO speak) and how that second factor works.

It turns out that the FIDO standards do specify a TPM as one possible backend, so what, you might ask, does this have to do with the Linux Gadget subsystem? The answer, it turns out, is that although the standards do recommend a TPM as the second factor, they don’t specify how to connect to one. The only connection protocols in the Client To Authenticator Protocol (CTAP) specifications are USB, BLE and NFC. And, in fact, the only one that’s really widely implemented in browsers is USB, so if you want to connect your laptop’s TPM to a browser it’s going to have to go over USB, meaning you need a Linux USB gadget. Conspiracy theorists will obviously notice that if the main current connector is USB and FIDO requires new USB tokens because it’s a new standard then webauthn is a boon to token manufacturers.

How does Webauthn Work?

The protocol comes in two flavours, version 1 and version 2. Version 1 is fixed cryptography and version 2 is agile cryptography. However, version 1 is simpler so that’s the one I’ll explain.

Webauthn essentially consists of two phases: a registration phase where the authenticator is tied to the account, which often happens when the remote account is created, and authentication where the authenticator is used to log in again to the website after registration. Obviously accounts often outlive the second factor, especially if it’s tied to a machine like the TPM, so the standard contemplates a single account having multiple registered authenticators.

The registration request consists of a random challenge supplied by the remote website to prevent replay and an application id which is constructed by the browser from the website supplied ID and the web origin of the site. The design is that the application ID should be unique for each remote account and not subject to being faked by the remote site to trick you into giving up some other application’s credentials.

The authenticator’s response consists of a unique public key, an opaque key handle, an attestation X.509 certificate containing a public key and a signature over the challenge, the application ID, the public key and the key handle using the private key of the certificate. The remote website can verify this signature against the certificate to verify registration. Additionally, Google recommends that the website also verifies the attestation certificate against a list of known device master certificates to prove it is talking to a genuine U2F authenticator. Since no-one is currently maintaining a database of “genuine” second factor master certificates, this last step mostly isn’t done today.

In version 1, the only key scheme allowed is Elliptic Curve over the NIST P-256 curve. This means that the public key is always 65 bytes long and an encrypted (or wrapped) form of the private key can be stashed inside the opaque key handle, which may be a maximum of 255 bytes. Since the key handle must be presented for each authentication attempt, it relieves the second factor from having to remember an ever increasing list of public/private key pairs because all it needs to do is unwrap the private key from the opaque handle and perform the signature and then forget the unwrapped private key. Note that this means per user account authenticator, the remote website must store the public key and the key handle, meaning about 300 bytes extra, but that’s peanuts compared to the amount of information remote websites usually store per registered account.

To perform an authentication the remote website presents a unique challenge, the raw ID from which the browser should construct the same application ID and the key handle. Ideally the authenticator should verify that the application ID matches the one used for registration (so it should be part of the wrapped key handle) and then perform a signature over the application ID, the challenge and a unique monotonically increasing counter number which is sent back in the response. To validly authenticate, the remote website verifies the signature is genuine and that the count has increased from the last time authentication was done (so it has to store the per authenticator 4 byte count as well). Any increase is fine, so each second factor only needs to maintain a single monotonically increasing counter to use for every registered site.
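The two non-cryptographic checks in this flow can be sketched in a few lines of shell. This assumes, as in the U2F raw message format, that the value actually signed is the SHA-256 hash of the application ID; the function names and the example origin are invented for illustration, and the signature verification itself is left out.

```shell
#!/bin/sh
# app_param ID: SHA-256 of the application ID, as 64 hex characters.
app_param() {
    printf '%s' "$1" | sha256sum | cut -d' ' -f1
}

# counter_ok STORED NEW: succeeds only if NEW is strictly greater than
# the count the site stored at the last successful authentication.
counter_ok() {
    [ "$2" -gt "$1" ]
}

app_param "https://example.com"
if counter_ok 41 42; then echo "counter accepted"; fi
if ! counter_ok 42 42; then echo "replayed counter rejected"; fi
```

Because any increase is acceptable, the site only needs to remember the last value it saw for each registered authenticator.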

Problems with Webauthn and the TPM

The primary problem is the attestation certificate, which is actually an issue for the whole protocol. TPMs are actually designed to do attestation correctly, which means providing proof of being a genuine TPM without compromising the user’s privacy. The way they do this is via a somewhat complex attestation protocol involving a privacy CA. The problem they’re seeking to avoid is that if you present the same certificate every time you use the device for registration you can be tracked via that certificate and your privacy is compromised. The way the TPM gets around this is that you can use a privacy CA to produce an arbitrary number of different certificates for the same TPM and you could present a new one every time, thus leaving nothing to be tracked by.

The ability to track users by certificate has been a criticism levelled at FIDO and the best the alliance can come up with is the idea that perhaps you batch the attestation certificates, so the same certificate is used in hundreds of new keys.

The problem for TPMs though is that until FIDO devices use proper privacy CA based attestation, the best you can do is generate a separate self signed attestation certificate. The reason is that the TPM does contain its own certificate, but it’s encryption only, not signing because of the way the TPM privacy CA based attestation works. Thus, even if you were willing to give up your privacy you can’t use the TPM EK certificate as the FIDO attestation certificate. Plus, if Google carries out its threat to verify attestation certificates, this scheme is no longer going to work.

Aside about Browsers and CTAP

The crypto aware among you will recognise that there is already a library based standard that can be used to talk to a variety of USB tokens and even the TPM called PKCS#11. Mozilla Firefox, for instance, already supports using this as I demonstrated in a previous blog post. One might think, based on what I said about the one token per key problem in the introduction, that PKCS#11 can’t support the new key wrapping based aspect of FIDO but, in fact, it can via the C_WrapKey/C_UnwrapKey API. The only thing PKCS#11 can’t do is the new monotonic counter.

Even if PKCS#11 can’t perform all the necessary functions, what about a new or extended library based protocol? This is a good question to which I’ve been unable to get a satisfactory answer. Certainly doing CTAP correctly requires that your browser be able to speak directly to the USB, Bluetooth and NFC subsystems. Perhaps not too hard for a single platform browser like Internet Explorer, but fraught with platform complexity for generic browsers like Firefox where the only solution is to have a rust based accessor for every supported platform.

Certainly the lack of a library interface is where the TPM issues come from, because without that we have to plug the TPM based FIDO layer into a browser over an existing CTAP protocol it supports, i.e. USB. Fortunately Linux has the USB Gadget subsystem which fits the bill precisely.

Building Synthetic HID Devices with USB Gadget

Before you try this at home, I should point out that the Linux HID Gadget has a longstanding bug that will cause your machine to hang unless you have this patch applied. You have been warned!

The HID subsystem is for driving Human Interface Devices, meaning keyboards and mice. However, it has a simple packet (called report in USB speak) based protocol which is easy for most things to use. In order to facilitate this, Linux actually provides hidraw devices which allow you to send and receive these reports using read and write system calls (which, in fact, is how Firefox on Linux speaks CTAP). What the hid gadget does when set up is provide all the static emulation of HID device protocols (like discovery pages) while allowing you to send and receive the hidraw packets over the /dev/hidgX device tap, also via read and write (essentially operating like a tty/pty pair). To get the whole thing running, the final piece of the puzzle is that the browser (most likely running as you) needs to be able to speak to the hidraw device, so you need a udev rule to make it accessible because by default they’re 0600. Since the same goes for every other USB security token, you’ll find the template in the same rpm that installs the PKCS#11 library for the token.

The way CTAP works is that every transaction is split into 64 byte reports and sent over the hidraw interface. All you need to do to get this setup is initialise a report descriptor for this type of device. Since it’s somewhat cumbersome to do, I’ve created this script to do it (run it as root). Once you have this, the hidraw and hidg devices will appear (make them both user accessible with chmod 666) and then all you need is a programme to drive the hidg device and you’re done.
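To get a feel for what splitting into 64 byte reports means in practice, here is a sketch that computes how many reports a message needs. The header sizes are an assumption taken from the U2F HID framing (a 7 byte header on the first report and a 5 byte header on each continuation), not something stated above, and the function name is invented.

```shell
#!/bin/sh
# Assumed framing: each report is 64 bytes; the first carries a 7 byte
# header (channel ID, command, length) and each continuation carries a
# 5 byte header (channel ID, sequence number).
REPORT=64
INIT_DATA=$((REPORT - 7))   # payload bytes in the first report
CONT_DATA=$((REPORT - 5))   # payload bytes in each continuation report

# reports_needed LEN: number of reports for a LEN byte message
reports_needed() {
    len=$1
    if [ "$len" -le "$INIT_DATA" ]; then
        echo 1
    else
        # one initial report plus enough continuations, rounding up
        echo $(( 1 + (len - INIT_DATA + CONT_DATA - 1) / CONT_DATA ))
    fi
}

reports_needed 57
reports_needed 300
```

A driver on the hidg side has to reassemble these pieces before handing the message to the TPM layer and split the response the same way on the way back.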

A TPM Based Hid Gadget Driver

Note: this section is written describing TPM 2.0.

The first thing we need out of the TPM is a monotonic counter, but all TPMs have NV counter indexes which can be created (all TPM counters are 8 byte, whereas the CTAP protocol requires 4 bytes, but we simply chop off the top 4 bytes). By convention I create the counter at NV index 01000101. Once created, this counter will be persistent and monotonic for the lifetime of the TPM.
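Chopping off the top 4 bytes is just a mask on the low 32 bits. A minimal sketch in shell arithmetic, with a made-up counter value:

```shell
#!/bin/sh
# Illustrative 8 byte TPM counter value: top 4 bytes are 1, bottom 4 are 5.
tpm_counter=$((0x0000000100000005))

# Keep only the low 32 bits, which is what goes into the CTAP response.
ctap_counter=$(( tpm_counter & 0xFFFFFFFF ))
echo "$ctap_counter"
```

One consequence worth keeping in mind is that the truncated value wraps back to zero once the TPM counter passes 2^32, at which point a website would see the count go backwards.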

The next thing you need is an attestation certificate and key. These must be NIST P-256 based, but it’s easy to get openssl to create them

openssl genpkey -algorithm EC -pkeyopt ec_paramgen_curve:prime256v1 -pkeyopt ec_param_enc:named_curve -out reg_key.key

openssl req -new -x509 -subj '/CN=My Fido Token/' -key reg_key.key -out reg_key.der -outform DER

This creates a self signed certificate, but you could also create a certificate chain this way.

Finally, we need the TPM to generate one NIST P-256 key pair per registration. Here we use the TPM2_Create() call which gets the TPM to create a random asymmetric key pair and return the public and wrapped private pieces. We can simply bundle these up and return them as the key handle (fortunately, what the TPM spits back for a NIST P-256 key is about 190 bytes when properly marshalled). When the remote end requests an authentication, we extract the TPM key from the key handle and use a TPM2_Load to place it in the TPM and sign the hash and then unload it from the TPM. Putting this all together this project (which is highly experimental) provides the script to create the devices and a hidg driver that interfaces to the TPM. All you need to do is run it as

hidgd /dev/hidg0 reg_key.der reg_key.key

And you’re good to go. If you want to test it there are plenty of public domain webauthn test sites; webauthn.org and webauthn.io are two I’ve tested as working.

TODO Items

The webauthn standard specifies the USB authenticator should ask for permission before performing either registration or authentication. Currently the TPM hid gadget doesn’t have any external verification, but in future I’ll add a configurable pinentry to add confirmation and possibly also a single password for verification.

The current code also does nothing to verify the application ID on a per authorization basis. This is a security problem because you are currently vulnerable to being spoofed by malicious websites who could hand you a snooped key handle and then use the signature to fake your login to a different site. To avoid this, I’m planning to use the policy area of the TPM key to hold the application ID. This should work because the generated keys have no authorization, either policy or password, so the policy area is effectively redundant. It is in the unwrapped public key, but if any part of the public key is tampered with the TPM will detect this via a hash in the wrapped private area and give a binding error on load.

The current code really only does version 1 of the FIDO protocol. Ideally it needs upgrading to version 2. However, there’s not really much point because for all the crypto agility, most TPMs on the market today can only do NIST P-256 curves, so you wouldn’t gain that much.

Conclusions

Using this scheme you’re ready to play with FIDO/U2F as long as you have a laptop with a functional TPM 2.0 and a working USB gadget subsystem. If you want to play, please remember to make sure you have the gadget patch applied.

Using TPM Based Client Certificates on Firefox and Apache

One of the useful features of Apache (or indeed any competent web server) is the ability to use client side certificates. All this means is that a certificate from each end of the TLS transaction is verified: the browser verifies the website certificate, but the website requires the client also to present one and verifies it. Using client certificates, when linked to your own client certificate CA gives web transactions the strength of two factor authentication if you do it on the login page. I use this feature quite a lot for all the admin features my own website does. With apache it’s really simple to turn on with the

SSLCACertificateFile

directive, which allows you to specify the CA for the accepted certificates. In my own setup I have my own self signed certificate as CA and then all the authority certificates use it as the issuer. You can turn Client Certificate verification on a per location basis simply by doing

<Location /some/web/location>
SSLVerifyClient require
</Location>

And Apache will take care of requesting the client certificate and verifying it against the CA. The only caveat here is that TLSv1.3 currently fails to work for this, so you have to disable it with

SSLProtocol -TLSv1.3

Client Certificates in Firefox

Firefox is somewhat hard to handle for SSL because it includes its own hand written mozilla secure sockets code, which has a toolkit quite unlike any other ssl toolkit. In order to import a client certificate and key into firefox, you need to create a pkcs12 file containing them and import that into the “Your Certificates” box which is under Preferences > Privacy & Security > View Certificates.

Obviously, simply supplying a key file to firefox presents security issues because you’d like to prevent a clever hacker from gaining access to it and thus running off with your client certificate. Firefox achieves a modicum of security by doing all key operations over the PKCS#11 API via a software token, which should mean that even malicious javascript cannot gain access to your key but merely the signing API.

However, assuming you don’t quite trust this software separation, you need to store your client signing key in a secure vault like a TPM to make sure no web hacker can gain access to it. Various crypto system connectors, like the OpenSSL TPM2 and TPM2 engine, already exist but because Firefox uses its own cryptographic code it can’t take advantage of them. In fact, the only external object the Firefox crypto code can use is a PKCS#11 module.

Aside about TPM2 and PKCS#11

The design of PKCS#11 is that it is a loadable library which can find and enumerate keys and certificates in some type of hardware device like a USB Key or a PCI attached HSM. However, since the connector is simply a library, nothing requires it connect to something physical and the OpenDNSSEC project actually produces a purely software based cryptographic token. In theory, then, it should be easy.

The problems come with the PKCS#11 expectation of key residency: the library allows the consuming program to enumerate a list of slots each of which may, or may not, be occupied by a single token. Each token may contain one or more keys and certificates. Now the TPM does have a concept of a key resident in NV memory, which is directly analogous to the PKCS#11 concept of a token based key. The problems start with the TPM2 PC Client Profile which recommends this NV area be about 512 bytes, which is big enough for all of one key and thus not very scalable. In fact, the imagined use case of the TPM is with volatile keys which are demand loaded.

Demand loaded keys map very nicely to the OpenSSL idea of a key file, which is why OpenSSL TPM engines are very easy to understand and use, but they don’t map at all into the concept of token resident keys. The closest interface PKCS#11 has for handling key files is the provisioning calls, but even there they’re designed for placing keys inside tokens and, once provisioned, the keys are expected to be non-volatile. Worse still, very few PKCS#11 module consumers actually do provisioning, they mostly leave it up to a separate binary they expect the token producer to supply.

Even if the demand loading problem could be solved, the PKCS#11 API requires quite a bit of additional information about keys, like ids, serial numbers and labels that aren’t present in the standard OpenSSL key files and have to be supplied somehow.

Solving the Key File to PKCS#11 Mismatch

The solution seems reasonably simple: build a standard PKCS#11 library that is driven by a known configuration file. This configuration file can map keys to slots, as required by PKCS#11, and also supply all the missing information. The C_Login() operation is expected to supply a passphrase (or PIN in PKCS#11 speak) so that would be the point at which the private key could be loaded.

One of the interesting features of the above is that, while it could be implemented for the TPM engine only, it can also be implemented as a generic OpenSSL key exporter to PKCS#11 that happens also to take engine keys. That would mean it would work for non-engine keys as well as any engine that exists for OpenSSL … a nice little win.

Building an OpenSSL PKCS#11 Key Exporter

A Token can be built from a very simple ini like configuration file, with the global section setting global properties, like manufacturer id and library description, and each individual section being used to instantiate a slot containing one key. We can make the slot name, the id and the label the same if not overridden and use key file directives to load the public and private keys. The serial number seems best constructed from a hash of the public key parameters (again, if not overridden). In order to support engine keys, the token library needs to know which engine to invoke, so I added an engine keyword to tell it.

With that, the mechanics of making the token library work with any OpenSSL key are set, the only thing is to plumb in the PKCS#11 glue API. At this point, I should add that the goal is simply to get keys and tokens working, not to replicate a full featured PKCS#11 API, so you shouldn’t use this as something to test against for a reference implementation (the softhsm2 token is much better for that). However, it should be functional enough to use for storing keys in Firefox (as well as other things, see below).

The current reasonably full featured source code is here, with a reference build using the OpenSUSE Build Service here. I should add that some of the build failures are due to problems with p11-kit and others due to the way Debian gets the wrong engine path for libp11.

At Last: Getting TPM Keys working with Firefox

A final problem with Firefox is that there seems to be no way to import a certificate file for which the private key is located on a token. The only way Firefox seems to support this is if the token contains both the private key and the certificate. At least this is my own project, so some coding later, the token now supports certificates as well.

The next problem is more mundane: generating the certificate and key. Obviously, the safest key is one which has never left the TPM, which means the certificate request needs to be built from it. I chose a CSR type that also includes my name and my machine name for later easy discrimination (and revocation if I ever lose my laptop). This is the sequence of commands for my machine called jarvis.

create_tpm2_key -a key.tpm
openssl req -subj "/CN=James Bottomley/UID=jarvis/" -new -engine tpm2 -keyform engine -key key.tpm -nodes -out jarvis.csr
openssl x509 -in jarvis.csr -req -CA my-ca.crt -engine tpm2 -CAkeyform engine -CAkey my-ca.key -days 3650 -out jarvis.crt

As you can see from the above, the key is first created by the TPM, then that key is used to create a certificate request where the common name is my name and the UID is the machine name (this is just my convention, feel free to use your own) and then finally it’s signed by my own CA, which you’ll notice is also based on a TPM key. Once I have this, I’m free to create an ini file to export it as a token to Firefox

manufacturer id = Firefox Client Cert
library description = Cert for hansen partnership
[mozilla-key]
certificate = /home/jejb/jarvis.crt
private key = /home/jejb/key.tpm
engine = tpm2

All I now need to do is load the PKCS#11 shared object library into Firefox using Settings > Privacy & Security > Security Devices > Load and I have a TPM based client certificate ready for use.
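Before loading the token it can be worth checking that the certificate really chains to your CA. The following is a software-key analogue of the key, CSR and signing sequence (no TPM and no engine, so it runs anywhere openssl is installed) followed by an openssl verify. All file names and subject names are throwaway examples, and the -CAcreateserial flag is my addition so that the sketch also works on older openssl versions.

```shell
#!/bin/sh
# Scratch directory for throwaway keys and certificates
d=$(mktemp -d)

# A throwaway self signed CA (stands in for my-ca.crt / my-ca.key)
openssl req -x509 -newkey rsa:2048 -nodes -subj "/CN=Test CA/" \
    -keyout "$d/ca.key" -out "$d/ca.crt" -days 30 2>/dev/null

# Client key and certificate request (stands in for key.tpm / jarvis.csr)
openssl req -new -newkey rsa:2048 -nodes -subj "/CN=Some User/UID=machine/" \
    -keyout "$d/client.key" -out "$d/client.csr" 2>/dev/null

# Sign the request with the CA
openssl x509 -req -in "$d/client.csr" -CA "$d/ca.crt" -CAkey "$d/ca.key" \
    -CAcreateserial -days 30 -out "$d/client.crt" 2>/dev/null

# Check that the client certificate verifies against the CA
result=$(openssl verify -CAfile "$d/ca.crt" "$d/client.crt")
echo "$result"
```

With the TPM based files, the equivalent check would be openssl verify -CAfile my-ca.crt jarvis.crt, which needs no engine because only public material is involved.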

Additional Uses

It turns out once you have a generic PKCS#11 exporter for engine keys, there’s no end of uses for them. One of the most convenient has been using TPM2 keys with gnutls. Although gnutls was quick to adopt TPM 1.2 based keys, it’s been much slower with TPM2 but because gnutls already has a PKCS#11 interface using the p11 kit URI format, you can easily build a config file of all the TPM2 keys you want it to use and simply use them by URI in gnutls.

Unfortunately, this has also led to some problems, the biggest one being Firefox: Firefox assumes, once you load a PKCS#11 module library, that you want it to use every single key it can find, which is fine until it pops up 10 dialogue boxes each time you start it, one for each key password, particularly if there’s only one key you actually care about it using. This problem doesn’t seem solvable in the Firefox token interface, so the eventual way I did it was to add the ability to specify the config file in the environment (variable OPENSSL_PKCS11_CONF) and modify my xfce Firefox action to set this in the environment pointing at a special configuration file with only Firefox’s key in it.
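The environment trick amounts to a one line wrapper. As a sketch, with an invented function name and an invented config file path:

```shell
#!/bin/sh
# run_with_conf CONF CMD [ARGS...]: run CMD with OPENSSL_PKCS11_CONF
# pointing at CONF, leaving the caller's environment untouched.
run_with_conf() {
    conf=$1
    shift
    OPENSSL_PKCS11_CONF="$conf" "$@"
}

# Example launcher line (hypothetical path):
#   run_with_conf "$HOME/.config/firefox-pkcs11.conf" firefox
run_with_conf /tmp/example.conf sh -c 'echo "conf is $OPENSSL_PKCS11_CONF"'
```

Everything else that uses the token library keeps seeing the full configuration, since the variable is only set for the one process.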

Conclusions and Future Work

Hopefully I’ve demonstrated that this simple PKCS#11 converter can be useful both for keeping Firefox keys safe and for other things like gnutls. Unfortunately, it turns out that the world wide web is turning against PKCS#11 tokens as having usability problems and is moving on to something called FIDO2 tokens, which have the web browser talking directly to the USB token. In my next technical post I hope to explain how you can use the Linux Kernel USB gadget system to connect a TPM up easily as a FIDO2 token so you can use the new passwordless webauthn protocol seamlessly.

Measuring the Horizontal Attack Profile of Nabla Containers

One of the biggest problems with the current debate about Container vs Hypervisor security is that no-one has actually developed a way of measuring security, so the debate is all in qualitative terms (hypervisors “feel” more secure than containers because of the interface breadth) but no-one actually has done a quantitative comparison.  The purpose of this blog post is to move the debate forwards by suggesting a quantitative methodology for measuring the Horizontal Attack Profile (HAP).  For more details about Attack Profiles, see this blog post.  I don’t expect this will be the final word in the debate, but by describing how we did it I hope others can develop quantitative measurements as well.

We’ll begin by looking at the Nabla technology through the relatively uncontroversial metric of performance.  In most security debates, it’s accepted that some performance is lost by securing the application.  As a rule of thumb, placing an application in a hypervisor loses anywhere between 10-30% of the native performance.  Our goal here is to show that, for a variety of web tasks, the Nabla containers mechanism has an acceptable performance penalty.

Performance Measurements

We took some standard benchmarks: redis-bench-set, redis-bench-get, python-tornado and node-express and in the latter two we loaded up the web servers with simple external transactional clients.  We then performed the same test for docker, gVisor, Kata Containers (as our benchmark for hypervisor containment) and nabla.  In all the figures, higher is better (meaning more throughput):

The red Docker measure is included to show the benchmark.  As expected, the Kata Containers measure is around 10-30% down on the docker one in each case because of the hypervisor penalty.  However, in each case the Nabla performance is the same or higher than the Kata one, showing we pay less performance overhead for our security.  A final note is that since the benchmarks are network ones, there’s somewhat of a penalty paid by userspace networking stacks (which nabla necessarily has) for plugging into docker network, so we show two values, one for the bridging plug in (nabla-containers) required to orchestrate nabla with kubernetes and one as a direct connection (nabla-raw) showing where the performance would be without the network penalty.

One final note is that, as expected, gVisor sucks because ptrace is a really inefficient way of connecting the syscalls to the sandbox.  However, it is more surprising that gVisor-kvm (where the sandbox connects to the system calls of the container using hypercalls instead) is also pretty lacking in performance.  I speculate this is likely because hypercalls exact their own penalty and hypervisors usually try to minimise them, which using them to replace system calls really doesn’t do.

HAP Measurement Methodology

The quantitative approach to measuring the Horizontal Attack Profile (HAP) says that we take the bug density of the Linux Kernel code and multiply it by the amount of unique code traversed by the running system after it has reached a steady state (meaning that it doesn’t appear to be traversing any new kernel paths).  For the sake of this method, we assume the bug density to be uniform and thus the HAP is approximated by the amount of code traversed in the steady state.  Measuring this for a running system is another matter entirely but, fortunately, the kernel has a mechanism called ftrace which can be used to provide a trace of all of the functions called by a given userspace process and thus gives a reasonable approximation of the number of lines of code traversed (note this is an approximation because we measure the total number of lines in each function, taking no account of internal code flow, primarily because ftrace doesn’t give that much detail).  Additionally, this methodology works very well for containers where all of the control flow emanates from a well known group of processes via the system call information, but it works less well for hypervisors where, in addition to the direct hypercall interface, you also have to add traces from the back end daemons (like the kvm vhost kernel threads or dom0 in the case of Xen).
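The calculation described above can be sketched as follows. This is an illustration under stated assumptions rather than the tooling we actually used: it assumes the trace is in the default text format of the ftrace function tracer ("task-pid [cpu] flags timestamp: function <-caller"), and the lines_per_function mapping (kernel function name to source line count) is a hypothetical input you would have to build from the kernel source yourself.

```python
import re

# Matches the default ftrace "function" tracer line format, e.g.
#   bash-1234  [000] ....  100.000001: vfs_read <-ksys_read
TRACE_LINE = re.compile(r":\s+(?P<func>[\w.]+)\s+<-")


def traversed_functions(trace_lines):
    """Return the set of unique kernel functions seen in the trace."""
    funcs = set()
    for line in trace_lines:
        if line.startswith("#"):  # skip the ftrace header comments
            continue
        match = TRACE_LINE.search(line)
        if match:
            funcs.add(match.group("func"))
    return funcs


def hap_estimate(trace_lines, lines_per_function, bug_density=1.0):
    """Approximate the HAP as bug density times unique lines traversed.

    Each function counts once, by its total line count, because ftrace
    gives no detail about control flow inside a function. Functions
    missing from lines_per_function (hypothetical mapping) count as 0.
    """
    funcs = traversed_functions(trace_lines)
    unique_lines = sum(lines_per_function.get(f, 0) for f in funcs)
    return bug_density * unique_lines
```

With a uniform bug density the constant factor drops out, so comparing two runtimes reduces to comparing the unique line counts.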

HAP Results

The results are for the same set of tests as the performance ones except that this time we measure the amount of code traversed in the host kernel:

As stated in our methodology, the height of the bar should be directly proportional to the HAP where lower is obviously better.  On these results we can say that in all cases the Nabla runtime tender actually has a better HAP than the hypervisor contained Kata technology, meaning that we’ve achieved a container system with better HAP (i.e. more secure) than hypervisors.

Some of the other results in this set also bear discussing.  For instance the Docker result certainly isn’t 10x the Kata result as a naive analysis would suggest.  In fact, the containment provided by docker looks to be only marginally worse than that provided by the hypervisor.  Given all the hoopla about hypervisors being much more secure than containers this result looks surprising but you have to consider what’s going on: what we’re measuring in the docker case is the system call penetration of normal execution of the systems.  Clearly anything malicious could explode this result by exercising all sorts of system calls that the application doesn’t normally use.  However, this does show clearly that a docker container with a well crafted seccomp profile (which blocks unexpected system calls) provides roughly equivalent security to a hypervisor.
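To make the seccomp point concrete, here is a minimal sketch that turns a list of system calls observed during normal execution into an allow-list profile in Docker’s JSON seccomp format. The syscall list is something you would have to collect yourself, and the function name is hypothetical; a production profile would usually also need architecture entries and argument filters, which this sketch omits.

```python
import json


def seccomp_profile(allowed_syscalls):
    """Build a Docker-style seccomp profile that allows only the given
    system calls and fails everything else with an error.
    """
    return {
        "defaultAction": "SCMP_ACT_ERRNO",  # deny anything not listed
        "syscalls": [
            {
                "names": sorted(set(allowed_syscalls)),
                "action": "SCMP_ACT_ALLOW",
            }
        ],
    }


def write_profile(path, allowed_syscalls):
    """Write the profile as JSON so it can be handed to the runtime."""
    with open(path, "w") as handle:
        json.dump(seccomp_profile(allowed_syscalls), handle, indent=2)
```

The resulting file can then be passed to docker with --security-opt seccomp=profile.json, so that any system call outside the observed set is refused.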

The other surprising result is that, in spite of its claims to reduce the exposure to Linux System Calls, gVisor actually is either equivalent to the docker use case or, for the python tornado test, significantly worse than the docker case.  This too is explicable in terms of what’s going on under the covers: gVisor tries to improve containment by rewriting the Linux system call interface in Go.  However, no-one has paid any attention to the number of system calls the Go runtime is actually using, which is what these results are really showing.  Thus, while gVisor doesn’t currently achieve any containment improvement on this methodology, it’s not impossible to write a future version of the Go runtime that is much less profligate in the way it uses system calls by developing a Secure Go using the same methodology we used to develop Nabla.

Conclusions

On both tests, Nabla is far and away the best containment technology for secure workloads given that it sacrifices the least performance over docker to achieve the containment and, on the published results, is 2x more secure even than using hypervisor based containment.

Hopefully these results show that it is perfectly possible to have containers that are more secure than hypervisors and lay to rest, finally, the arguments about which is the more secure technology.  The next step, of course, is establishing the full extent of exposure to a malicious application and to do that, some type of fuzz testing needs to be employed.  Unfortunately, right at the moment, gVisor is simply crashing when subjected to fuzz testing, so it needs to become more robust before realistic measurements can be taken.

A New Method of Containment: IBM Nabla Containers

In the previous post about Containers and Cloud Security, I noted that most of the tenants of a Cloud Service Provider (CSP) could safely not worry about the Horizontal Attack Profile (HAP) and leave the CSP to manage the risk.  However, there is a small category of jobs (mostly in the financial and allied industries) where the damage done by a Horizontal Breach of the container cannot be adequately compensated by contractual remedies.  For these cases, a team at IBM research has been looking at ways of reducing the HAP with a view to making containers more secure than hypervisors.  For the impatient, the full open source release of the Nabla Containers technology is here and here, but for the more patient, let me explain what we did and why.  We’ll have a follow on post about the measurement methodology for the HAP and how we proved better containment than even hypervisor solutions.

The essence of the quest is a sandbox that emulates the interface between the runtime and the kernel (usually dubbed the syscall interface) with as little code as possible and a very narrow interface into the kernel itself.

The Basics: Looking for Better Containment

The HAP attack worry with standard containers is shown on the left: that a malicious application can breach the containment wall and attack an innocent application.  This attack is thought to be facilitated by the breadth of the syscall interface in standard containers, so the guiding star in developing Nabla Containers was a methodology for measuring the reduction in the HAP (and hence the improvement in containment).  The initial impetus, however, came from the observation that unikernel systems are nicely modular in the libOS approach, can be used to emulate system calls and, thanks to rumprun, have wide support for modern web friendly languages (like python, node.js and go) with a fairly thin glue layer.  Additionally they have a fairly narrow set of hypercalls that are actually used in practice (meaning they can be made more secure than conventional hypervisors).  Code coverage measurements of standard unikernel based kvm images confirmed that they did indeed use a far narrower interface.

Replacing the Hypervisor Interface

One of the main elements of the hypervisor interface is the transition from a less privileged guest kernel to a more privileged host one via hypercalls and vmexits.  These CPU mediated events are actually quite expensive, certainly a lot more expensive than a simple system call, which merely involves changing address space and privilege level.  It turns out that the unikernel based kvm interface is really only nine hypercalls, all of which are capable of being rewritten as syscalls, so the approach to running this new sandbox as a container is to do this rewrite and seccomp restrict the interface to being only what the rewritten unikernel runtime actually needs (meaning that the seccomp profile is now CSP enforced).  This vision, by the way, of a broad runtime above being mediated to a narrow interface is where the name Nabla comes from: The symbol for Nabla is an inverted triangle (∇) which is broad at the top and narrows to a point at the base.

Using this formulation means that the nabla runtime (or nabla tender) can be run as a single process within a standard container and the narrowness of the interface to the host kernel prevents most of the attacks that a malicious application would be able to perform.

DevOps and the ParaVirt conundrum

Back at the dawn of virtualization, there were arguments between Xen and VMware over whether a hypervisor should be fully virtual (capable of running any system supported by the virtual hardware description) or paravirtual (the system had to be modified to run on the virtualization system and thus would be incapable of running on physical hardware).  Today, thanks in large part to CPU support for virtualization primitives, fully paravirtual systems have long since gone the way of the dodo and everyone nowadays expects any OS running on a hypervisor to be capable of running on physical hardware[1].  The death of paravirt also left the industry with an aversion to ever reviving it, which explains why most sandbox containment systems (gVisor, Kata) try to require no modifications to the image.

With DevOps, the requirement is that images be immutable and that to change an image you must take it through the full develop, build, test, deploy cycle.  This development centric view means that, provided there’s no impact to the images you use as the basis for your development, you can easily craft your final image to suit the deployment environment, which means a step like linking with the nabla tender is very easy.  Essentially, this comes down to whether you take the Dev (we can rebuild to suit the environment) or the Ops (the deployment environment needs to accept arbitrary images) view.  However, most solutions take the Ops view because of the anti-paravirt bias.  For the Nabla tender, we take the Dev view, which is borne out by the performance figures.

Conclusion

Like most sandbox models, the Nabla containers approach is an alternative to namespacing for containment, but it still requires cgroups for resource management.  The figures show that the containment HAP is actually better than that achieved with a hypervisor and the performance, while being marginally less than a namespaced container, is greater than that obtained by running a container inside a hypervisor.  Thus we conclude that for tenants who have a real need for HAP reduction, this is a viable technology.