Category Archives: Security

Webauthn in Linux with a TPM via the HID gadget

Account security on the modern web is a bit of a nightmare. Everyone understands the need for strong passwords which are different for each account, but managing them is problematic because the human mind just can’t remember hundreds of complete gibberish words so everyone uses a password manager (which, lets admit it, for a lot of people is to write it down). A solution to this problem has long been something called two factor authentication (2FA) which authenticates you by something you know (like a password) and something you posses (like a TPM or a USB token). The problem has always been that you ideally need a different 2FA for each website, so that a compromise of one website doesn’t lead to the compromise of all your accounts.

Enter webauthn. This is designed as a 2FA protocol that uses public key cryptography instead of shared secrets and also uses a different public/private key pair for each website. Thus aspiring to be a passwordless secure scalable 2FA system for the web. However, the webauthn standard only specifies how the protocol works when the browser communicates with the remote website, there’s a different standard called FIDO or U2F that specifies how the browser communicates with the second factor (called an authenticator in FIDO speak) and how that second factor works.

It turns out that the FIDO standards do specify a TPM as one possible backend, so what, you might ask does this have to do with the Linux Gadget subsystem? The answer, it turns out, is that although the standards do recommend a TPM as the second factor, they don’t specify how to connect to one. The only connection protocols in the Client To Authenticator Protocol (CTAP) specifications are USB, BLE and NFC. And, in fact, the only one that’s really widely implemented in browsers is USB, so if you want to connect your laptop’s TPM to a browser it’s going to have to go over USB meaning you need a Linux USB gadget. Conspiracy theorists will obviously notice that if the main current connector is USB and FIDO requires new USB tokens because it’s a new standard then webauthn is a boon to token manufacturers.

How does Webauthn Work?

The protocol comes in two flavours, version 1 and version 2. Version 1 is fixed cryptography and version 2 is agile cryptography. However, version1 is simpler so that’s the one I’ll explain.

Webauthn essentially consists of two phases: a registration phase where the authenticator is tied to the account, which often happens when the remote account is created, and authentication where the authenticator is used to log in again to the website after registration. Obviously accounts often outlive the second factor, especially if it’s tied to a machine like the TPM, so the standard contemplates a single account having multiple registered authenticators.

The registration request consists of a random challenge supplied by the remote website to prevent replay and an application id which is constructed by the browser from the website supplied ID and the web origin of the site. The design is that the application ID should be unique for each remote account and not subject to being faked by the remote site to trick you into giving up some other application’s credentials.

The authenticator’s response consists of a unique public key, an opaque key handle, an attestation X.509 certificate containing a public key and a signature over the challenge, the application ID, the public key and the key handle using the private key of the certificate. The remote website can verify this signature against the certificate to verify registration. Additionally, Google recommends that the website also verifies the attestation certificate against a list of know device master certificates to prove it is talking to a genuine U2F authenticator. Since no-one is currently maintaining a database of “genuine” second factor master certificates, this last step mostly isn’t done today.

In version 1, the only key scheme allowed is Elliptic Curve over the NIST P-256 curve. This means that the public key is always 65 bytes long and an encrypted (or wrapped) form of the private key can be stashed inside the opaque key handle, which may be a maximum of 255 bytes. Since the key handle must be presented for each authentication attempt, it relieves the second factor from having to remember an ever increasing list of public/private key pairs because all it needs to do is unwrap the private key from the opaque handle and perform the signature and then forget the unwrapped private key. Note that this means per user account authenticator, the remote website must store the public key and the key handle, meaning about 300 bytes extra, but that’s peanuts compared to the amount of information remote websites usually store per registered account.

To perform an authentication the remote website presents a unique challenge, the raw ID from which the browser should construct the same application ID and the key handle. Ideally the authenticator should verify that the application ID matches the one used for registration (so it should be part of the wrapped key handle) and then perform a signature over the application ID, the challenge and a unique monotonically increasing counter number which is sent back in the response. To validly authenticate, the remote website verifies the signature is genuine and that the count has increased from the last time authentication has done (so it has to store the per authenticator 4 byte count as well). Any increase is fine, so each second factor only needs to maintain a single monotonically increasing counter to use for every registered site.

Problems with Webauthn and the TPM

The primary problem is the attestation certificate, which is actually an issue for the whole protocol. TPMs are actually designed to do attestation correctly, which means providing proof of being a genuine TPM without compromising the user’s privacy. The way they do this is via a somewhat complex attestation protocol involving a privacy CA. The problem they’re seeking to avoid is that if you present the same certificate every time you use the device for registration you can be tracked via that certificate and your privacy is compromised. The way the TPM gets around this is that you can use a privacy CA to produce an arbitrary number of different certificates for the same TPM and you could present a new one every time, thus leaving nothing to be tracked by.

The ability to track users by certificate has been a criticism levelled at FIDO and the best the alliance can come up with is the idea that perhaps you batch the attestation certificates, so the same certificate is used in hundreds of new keys.

The problem for TPMs though is that until FIDO devices use proper privacy CA based attestation, the best you can do is generate a separate self signed attestation certificate. The reason is that the TPM does contain its own certificate, but it’s encryption only, not signing because of the way the TPM privacy CA based attestation works. Thus, even if you were willing to give up your privacy you can’t use the TPM EK certificate as the FIDO attestation certificate. Plus, if Google carries out its threat to verify attestation certificates, this scheme is no longer going to work.

Aside about Browsers and CTAP

The crypto aware among you will recognise that there is already a library based standard that can be used to talk to a variety of USB tokens and even the TPM called PKCS#11. Mozilla Firefox, for instance, already supports using this as I demonstrated in a previous blog post. One might think, based on what I said about the one token per key problem in the introduction, that PKCS#11 can’t support the new key wrapping based aspect of FIDO but, in fact, it can via the C_WrapKey/C_UnwrapKey API. The only thing PKCS#11 can’t do is the new monotonic counter.

Even if PKCS#11 can’t perform all the necessary functions, what about a new or extended library based protocol? This is a good question to which I’ve been unable to get a satisfactory answer. Certainly doing CTAP correctly requires that your browser be able to speak directly to the USB, Bluetooth and NFC subsystems. Perhaps not too hard for a single platform browser like Internet Explorer, but fraught with platform complexity for generic browsers like FireFox where the only solution is to have a rust based accessor for every supported platform.

Certainly the lack of a library interface are where the TPM issues come from, because without that we have to plug the TPM based FIDO layer into a browser over an existing CTAP protocol it supports, i.e. USB. Fortunately Linux has the USB Gadget subsystem which fits the bill precisely.

Building Synthetic HID Devices with USB Gadget

Before you try this at home, I should point out that the Linux HID Gadget has a longstanding bug that will cause your machine to hang unless you have this patch applied. You have been warned!

The HID subsystem is for driving Human Interaction Devices meaning keyboard and mice. However, it has a simple packet (called report in USB speak) based protocol which is easy for most things to use. In order to facilitate this, Linux actually provides hidraw devices which allow you to send and receive these reports using read and write system calls (which, in fact, is how Firefox on Linux speaks CTAP). What the hid gadget does when set up is provide all the static emulation of HID device protocols (like discovery pages) while allowing you to send and receive the hidraw packets over the /dev/hidgX device tap, also via read and write (essentially operating like a tty/pty pair1). To get the whole thing running, the final piece of the puzzle is that the browser (most likely running as you) needs to be able to speak to the hidraw device, so you need a udev rule to make it accessible because by default they’re 0600. Since the same goes for every other USB security token, you’ll find the template in the same rpm that installs the PKCS#11 library for the token.

The way CTAP works is that every transaction is split into 64 byte reports and sent over the hidraw interface. All you need to do to get this setup is initialise a report descriptor for this type of device. Since it’s somewhat cumbersome to do, I’ve created this script to do it (run it as root). Once you have this, the hidraw and hidg devices will appear (make them both user accessible with chmod 666) and then all you need is a programme to drive the hidg device and you’re done.

A TPM Based Hid Gadget Driver

Note: this section is written describing TPM 2.0.

The first thing we need out of the TPM is a monotonic counter, but all TPMs have NV counter indexes which can be created (all TPM counters are 8 byte, whereas the CTAP protocol requires 4 bytes, but we simply chop off the top 4 bytes). By convention I create the counter at NV index 01000101. Once created, this counter will be persistent and monotonic for the lifetime of the TPM.

The next thing you need is an attestation certificate and key. These must be NIST P-256 based, but it’s easy to get openssl to create them

openssl genpkey -algorithm EC -pkeyopt ec_paramgen_curve:prime256v1 -pkeyopt ec_param_enc:named_curve -out reg_key.key

openssl req -new -x509 -subj '/CN=My Fido Token/' -key reg_key.key -out reg_key.der -outform DER

This creates a self signed certificate, but you could also create a certificate chain this way.

Finally, we need the TPM to generate one NIST P-256 key pair per registration. Here we use the TPM2_Create() call which gets the TPM to create a random asymmetric key pair and return the public and wrapped private pieces. We can simply bundle these up and return them as the key handle (fortunately, what the TPM spits back for a NIST P-256 key is about 190 bytes when properly marshalled). When the remote end requests an authentication, we extract the TPM key from the key handle and use a TPM2_Load to place it in the TPM and sign the hash and then unload it from the TPM. Putting this all together this project (which is highly experimental) provides the script to create the devices and a hidg driver that interfaces to the TPM. All you need to do is run it as

hidgd /dev/hidg0 reg_key.der reg_key.key

And you’re good to go. If you want to test it there are plenty of public domain webauthn test sites, webauthn.org and webauthn.io2 are two I’ve tested as working.

TODO Items

The webauthn standard specifies the USB authenticator should ask for permission before performing either registration or authentication. Currently the TPM hid gadget doesn’t have any external verification, but in future I’ll add a configurable pinentry to add confirmation and possibly also a single password for verification.

The current code also does nothing to verify the application ID on a per authorization basis. This is a security problem because you are currently vulnerable to being spoofed by malicious websites who could hand you a snooped key handle and then use the signature to fake your login to a different site. To avoid this, I’m planning to use the policy area of the TPM key to hold the application ID. This should work because the generated keys have no authorization, either policy or password, so the policy area is effectively redundant. It is in the unwrapped public key, but if any part of the public key is tampered with the TPM will detect this via a hash in the wrapped private error and give a binding error on load.

The current code really only does version 1 of the FIDO protocol. Ideally it needs upgrading to version 2. However, there’s not really much point because for all the crypto agility, most TPMs on the market today can only do NIST P-256 curves, so you wouldn’t gain that much.

Conclusions

Using this scheme you’re ready to play with FIDO/U2F as long as you have a laptop with a functional TPM 2.0 and a working USB gadget subsystem. If you want to play, please remember to make sure you have the gadget patch applied.

Using TPM Based Client Certificates on Firefox and Apache

One of the useful features of Apache (or indeed any competent web server) is the ability to use client side certificates. All this means is that a certificate from each end of the TLS transaction is verified: the browser verifies the website certificate, but the website requires the client also to present one and verifies it. Using client certificates, when linked to your own client certificate CA gives web transactions the strength of two factor authentication if you do it on the login page. I use this feature quite a lot for all the admin features my own website does. With apache it’s really simple to turn on with the

SSLCACertificateFile

Directive which allows you to specify the CA for the accepted certificates. In my own setup I have my own self signed certificate as CA and then all the authority certificates use it as the issuer. You can turn Client Certificate verification on per location basis simply by doing

<Location /some/web/location>
SSLVerifyClient require
</Location

And Apache will take care of requesting the client certificate and verifying it against the CA. The only caveat here is that TLSv1.3 currently fails to work for this, so you have to disable it with

SSLProtocol -TLSv1.3

Client Certificates in Firefox

Firefox is somewhat hard to handle for SSL because it includes its own hand written mozilla secure sockets code, which has a toolkit quite unlike any other ssl toolkit1. In order to import a client certificate and key into firefox, you need to create a pkcs12 file containing them and import that into the “Your Certificates” box which is under Preferences > Privacy & Security > View Certificates

Obviously, simply supplying a key file to firefox presents security issues because you’d like to prevent a clever hacker from gaining access to it and thus running off with your client certificate. Firefox achieves a modicum of security by doing all key operations over the PKCS#11 API via a software token, which should mean that even malicious javascript cannot gain access to your key but merely the signing API

However, assuming you don’t quite trust this software separation, you need to store your client signing key in a secure vault like a TPM to make sure no web hacker can gain access to it. Various crypto system connectors, like the OpenSSL TPM2 and TPM2 engine, already exist but because Firefox uses its own crytographic code it can’t take advantage of them. In fact, the only external object the Firefox crypto code can use is a PKCS#11 module.

Aside about TPM2 and PKCS#11

The design of PKCS#11 is that it is a loadable library which can find and enumerate keys and certificates in some type of hardware device like a USB Key or a PCI attached HSM. However, since the connector is simply a library, nothing requires it connect to something physical and the OpenDNSSEC project actually produces a purely software based cryptographic token. In theory, then, it should be easy

The problems come with the PKCS#11 expectation of key residency: The library allows the consuming program to enumerate a list of slots each of which may, or may not, be occupied by a single token. Each token may contain one or more keys and certificates. Now the TPM does have a concept of a key resident in NV memory, which is directly analagous to the PKCS#11 concept of a token based key. The problems start with the TPM2 PC Client Profile which recommends this NV area be about 512 bytes, which is big enough for all of one key and thus not very scalable. In fact, the imagined use case of the TPM is with volatile keys which are demand loaded.

Demand loaded keys map very nicely to the OpenSSL idea of a key file, which is why OpenSSL TPM engines are very easy to understand and use, but they don’t map at all into the concept of token resident keys. The closest interface PKCS#11 has for handling key files is the provisioning calls, but even there they’re designed for placing keys inside tokens and, once provisioned, the keys are expected to be non-volatile. Worse still, very few PKCS#11 module consumers actually do provisioning, they mostly leave it up to a separate binary they expect the token producer to supply.

Even if the demand loading problem could be solved, the PKCS#11 API requires quite a bit of additional information about keys, like ids, serial numbers and labels that aren’t present in the standard OpenSSL key files and have to be supplied somehow.

Solving the Key File to PKCS#11 Mismatch

The solution seems reasonably simple: build a standard PKCS#11 library that is driven by a known configuration file. This configuration file can map keys to slots, as required by PKCS#11, and also supply all the missing information. the C_Login() operation is expected to supply a passphrase (or PIN in PKCS#11 speak) so that would be the point at which the private key could be loaded.

One of the interesting features of the above is that, while it could be implemented for the TPM engine only, it can also be implemented as a generic OpenSSL key exporter to PKCS#11 that happens also to take engine keys. That would mean it would work for non-engine keys as well as any engine that exists for OpenSSL … a nice little win.

Building an OpenSSL PKCS#11 Key Exporter

A Token can be built from a very simple ini like configuration file, with the global section setting global properties, like manufacurer id and library description and each individual section being used to instantiate a slot containing one key. We can make the slot name, the id and the label the same if not overridden and use key file directives to load the public and private keys. The serial number seems best constructed from a hash of the public key parameters (again, if not overridden). In order to support engine keys, the token library needs to know which engine to invoke, so I added an engine keyword to tell it.

With that, the mechanics of making the token library work with any OpenSSL key are set, the only thing is to plumb in the PKCS#11 glue API. At this point, I should add that the goal is simply to get keys and tokens working, not to replicate a full featured PKCS#11 API, so you shouldn’t use this as something to test against for a reference implementation (the softhsm2 token is much better for that). However, it should be functional enough to use for storing keys in Firefox (as well as other things, see below).

The current reasonably full featured source code is here, with a reference build using the OpenSUSE Build Service here. I should add that some of the build failures are due to problems with p11-kit and others due to the way Debian gets the wrong engine path for libp11.

At Last: Getting TPM Keys working with Firefox

A final problem with Firefox is that there seems to be no way to import a certificate file for which the private key is located on a token. The only way Firefox seems to support this is if the token contains both the private key and the certificate. At least this is my own project, so some coding later, the token now supports certificates as well.

The next problem is more mundane: generating the certificate and key. Obviously, the safest key is one which has never left the TPM, which means the certificate request needs to be built from it. I chose a CSR type that also includes my name and my machine name for later easy discrimination (and revocation if I ever lose my laptop). This is the sequence of commands for my machine called jarvis.

create_tpm2_key -a key.tpm
openssl req -subj "/CN=James Bottomley/UID=jarivs/" -new -engine tpm2 -keyform engine -key key.tpm -nodes -out jarvis.csr
openssl x509 -in jarvis.csr -req -CA my-ca.crt -engine tpm2 -CAkeyform engine -CAkey my-ca.key -days 3650 -out jarvis.crt

As you can see from the above, the key is first created by the TPM, then that key is used to create a certificate request where the common name is my name and the UID is the machine name (this is just my convention, feel free to use your own) and then finally it’s signed by my own CA, which you’ll notice is also based on a TPM key. Once I have this, I’m free to create an ini file to export it as a token to Firefox

manufacturer id = Firefox Client Cert
library description = Cert for hansen partnership
[mozilla-key]
certificate = /home/jejb/jarvis.crt
private key = /home/jejb/key.tpm
engine = tpm2

All I now need to do is load the PKCS#11 shared object library into Firefox using Settings > Privacy & Security > Security Devices > Load and I have a TPM based client certificate ready for use.

Additional Uses

It turns out once you have a generic PKCS#11 exporter for engine keys, there’s no end of uses for them. One of the most convenient has been using TPM2 keys with gnutls. Although gnutls was quick to adopt TPM 1.2 based keys, it’s been much slower with TPM2 but because gnutls already has a PKCS#11 interface using the p11 kit URI format, you can easily build a config file of all the TPM2 keys you want it to use and simply use them by URI in gnutls.

Unfortunately, this has also lead to some problems, the biggest one being Firefox: Firefox assumes, once you load a PKCS#11 module library, that you want it to use every single key it can find, which is fine until it pops up 10 dialogue boxes each time you start it, one for each key password, particularly if there’s only one key you actually care about it using. This problem doesn’t seem solvable in the Firefox token interface, so the eventual way I did it was to add the ability to specify the config file in the environment (variable OPENSSL_PKCS11_CONF) and modify my xfce Firefox action to set this in the environment pointing at a special configuration file with only Firefox’s key in it.

Conclusions and Future Work

Hopefully I’ve demonstrated this simple PKCS#11 converter can be useful both to keeping Firefox keys safe as well as uses in other things like gnutls. Unfortunately, it turns out that the world wide web is turning against PKCS#11 tokens as having usability problems and is moving on to something called FIDO2 tokens which have the web browser talking directly to the USB token. In my next technical post I hope to explain how you can use the Linux Kernel USB gadget system to connect a TPM up easily as a FIDO2 token so you can use the new passwordless webauthn protocol seamlessly.

Measuring the Horizontal Attack Profile of Nabla Containers

One of the biggest problems with the current debate about Container vs Hypervisor security is that no-one has actually developed a way of measuring security, so the debate is all in qualitative terms (hypervisors “feel” more secure than containers because of the interface breadth) but no-one actually has done a quantitative comparison.  The purpose of this blog post is to move the debate forwards by suggesting a quantitative methodology for measuring the Horizontal Attack Profile (HAP).  For more details about Attack Profiles, see this blog post.  I don’t expect this will be the final word in the debate, but by describing how we did it I hope others can develop quantitative measurements as well.

Well begin by looking at the Nabla technology through the relatively uncontroversial metric of performance.  In most security debates, it’s acceptable that some performance is lost by securing the application.  As a rule of thumb, placing an application in a hypervisor loses anywhere between 10-30% of the native performance.  Our goal here is to show that, for a variety of web tasks, the Nabla containers mechanism has an acceptable performance penalty.

Performance Measurements

We took some standard benchmarks: redis-bench-set, redis-bench-get, python-tornado and node-express and in the latter two we loaded up the web servers with simple external transactional clients.  We then performed the same test for docker, gVisor, Kata Containers (as our benchmark for hypervisor containment) and nabla.  In all the figures, higher is better (meaning more throughput):

The red Docker measure is included to show the benchmark.  As expected, the Kata Containers measure is around 10-30% down on the docker one in each case because of the hypervisor penalty.  However, in each case the Nabla performance is the same or higher than the Kata one, showing we pay less performance overhead for our security.  A final note is that since the benchmarks are network ones, there’s somewhat of a penalty paid by userspace networking stacks (which nabla necessarily has) for plugging into docker network, so we show two values, one for the bridging plug in (nabla-containers) required to orchestrate nabla with kubernetes and one as a direct connection (nabla-raw) showing where the performance would be without the network penalty.

One final note is that, as expected, gVisor sucks because ptrace is a really inefficient way of connecting the syscalls to the sandbox.  However, it is more surprising that gVisor-kvm (where the sandbox connects to the system calls of the container using hypercalls instead) is also pretty lacking in performance.  I speculate this is likely because hypercalls exact their own penalty and hypervisors usually try to minimise them, which using them to replace system calls really doesn’t do.

HAP Measurement Methodology

The Quantitative approach to measuring the Horizontal Attack Profile (HAP) says that we take the bug density of the Linux Kernel code  and multiply it by the amount of unique code traversed by the running system after it has reached a steady state (meaning that it doesn’t appear to be traversing any new kernel paths). For the sake of this method, we assume the bug density to be uniform and thus the HAP is approximated by the amount of code traversed in the steady state.  Measuring this for a running system is another matter entirely, but, fortunately, the kernel has a mechanism called ftrace which can be used to provide a trace of all of the functions called by a given userspace process and thus gives a reasonable approximation of the number of lines of code traversed (note this is an approximation because we measure the total number of lines in the function taking no account of internal code flow, primarily because ftrace doesn’t give that much detail).  Additionally, this methodology works very well for containers where all of the control flow emanates from a well known group of processes via the system call information, but it works less well for hypervisors where, in addition to the direct hypercall interface, you also have to add traces from the back end daemons (like the kvm vhost kernel threads or dom0 in the case of Xen).

HAP Results

The results are for the same set of tests as the performance ones except that this time we measure the amount of code traversed in the host kernel:

As stated in our methodology, the height of the bar should be directly proportional to the HAP where lower is obviously better.  On these results we can say that in all cases the Nabla runtime tender actually has a better HAP than the hypervisor contained Kata technology, meaning that we’ve achieved a container system with better HAP (i.e. more secure) than hypervisors.

Some of the other results in this set also bear discussing.  For instance the Docker result certainly isn’t 10x the Kata result as a naive analysis would suggest.  In fact, the containment provided by docker looks to be only marginally worse than that provided by the hypervisor.  Given all the hoopla about hypervisors being much more secure than containers this result looks surprising but you have to consider what’s going on: what we’re measuring in the docker case is the system call penetration of normal execution of the systems.  Clearly anything malicious could explode this result by exercising all sorts of system calls that the application doesn’t normally use.  However, this does show clearly that a docker container with a well crafted seccomp profile (which blocks unexpected system calls) provides roughly equivalent security to a hypervisor.

The other surprising result is that, in spite of their claims to reduce the exposure to Linux System Calls, gVisor actually is either equivalent to the docker use case or, for the python tornado test, significantly worse than the docker case.  This too is explicable in terms of what’s going on under the covers: gVisor tries to improve containment by rewriting the Linux system call interface in Go.  However, no-one has paid any attention to the amount of system calls the Go runtime is actually using, which is what these results are really showing.  Thus, while current gVisor doesn’t currently achieve any containment improvement on this methodology, it’s not impossible to write a future version of the Go runtime that is much less profligate in the way it uses system calls by developing a Secure Go using the same methodology we used to develop Nabla.

Conclusions

On both tests, Nabla is far and away the best containment technology for secure workloads given that it sacrifices the least performance over docker to achieve the containment and, on the published results, is 2x more secure even than using hypervisor based containment.

Hopefully these results show that it is perfectly possible to have containers that are more secure than hypervisors and lays to rest, finally, the arguments about which is the more secure technology.  The next step, of course, is establishing the full extent of exposure to a malicious application and to do that, some type of fuzz testing needs to be employed.  Unfortunately, right at the moment, gVisor is simply crashing when subjected to fuzz testing, so it needs to become more robust before realistic measurements can be taken.

A New Method of Containment: IBM Nabla Containers

In the previous post about Containers and Cloud Security, I noted that most of the tenants of a Cloud Service Provider (CSP) could safely not worry about the Horizontal Attack Profile (HAP) and leave the CSP to manage the risk.  However, there is a small category of jobs (mostly in the financial and allied industries) where the damage done by a Horizontal Breach of the container cannot be adequately compensated by contractual remedies.  For these cases, a team at IBM research has been looking at ways of reducing the HAP with a view to making containers more secure than hypervisors.  For the impatient, the full open source release of the Nabla Containers technology is here and here, but for the more patient, let me explain what we did and why.  We’ll have a follow on post about the measurement methodology for the HAP and how we proved better containment than even hypervisor solutions.

The essence of the quest is a sandbox that emulates the interface between the runtime and the kernel (usually dubbed the syscall interface) with as little code as possible and a very narrow interface into the kernel itself.

The Basics: Looking for Better Containment

The HAP attack worry with standard containers is shown on the left: that a malicious application can breach the containment wall and attack an innocent application.  This attack is thought to be facilitated by the breadth of the syscall interface in standard containers so the guiding star in developing Nabla Containers was a methodology for measuring the reduction in the HAP (and hence the improvement in containment), but the initial impetus came from the observation that unikernel systems are nicely modular in the libOS approach, can be used to emulate systemcalls and, thanks to rumprun, have a wide set of support for modern web friendly languages (like python, node.js and go) with a fairly thin glue layer.  Additionally they have a fairly narrow set of hypercalls that are actually used in practice (meaning they can be made more secure than conventional hypervisors).  Code coverage measurements of standard unikernel based kvm images confirmed that they did indeed use a far narrower interface.

Replacing the Hypervisor Interface

One of the main elements of the hypervisor interface is the transition from a less privileged guest kernel to a more privileged host one via hypercalls and vmexits.  These CPU mediated events are actually quite expensive, certainly a lot more expensive than a simple system call, which merely involves changing address space and privilege level.  It turns out that the unikernel based kvm interface is really only nine hypercalls, all of which are capable of being rewritten as syscalls, so the approach to running this new sandbox as a container is to do this rewrite and seccomp restrict the interface to being only what the rewritten unikernel runtime actually needs (meaning that the seccomp profile is now CSP enforced).  This vision, by the way, of a broad runtime above being mediated to a narrow interface is where the name Nabla comes from: The symbol for Nabla is an inverted triangle (∇) which is broad at the top and narrows to a point at the base.

Using this formulation means that the nabla runtime (or nabla tender) can be run as a single process within a standard container and the narrowness of the interface to the host kernel prevents most of the attacks that a malicious application would be able to perform.

DevOps and the ParaVirt conundrum

Back at the dawn of virtualization, there were arguments between Xen and VMware over whether a hypervisor should be fully virtual (capable of running any system supported by the virtual hardware description) or paravirtual (the system had to be modified to run on the virtualization system and thus would be incapable of running on physical hardware).  Today, thanks in a large part to CPU support for virtualization primtives, fully paravirtual systems have long since gone the way of the dodo and everyone nowadays expects any OS running on a hypervisor to be capable of running on physical hardware1.  The death of paravirt also left the industry with an aversion to ever reviving it, which explains why most sandbox containment systems (gVisor, Kata) try to require no modifications to the image.

With DevOps, the requirement is that images be immutable and that to change an image you must take it through the full develop build, test, deploy cycle.  This development centric view means that, provided there’s no impact to the images you use as the basis for your development, you can easily craft your final image to suit the deployment environment, which means a step like linking with the nabla tender is very easy.  Essentially, this comes down to whether you take the Dev (we can rebuild to suit the environment) or the Ops (the deployment environment needs to accept arbitrary images) view.  However, most solutions take the Ops view because of the anti-paravirt bias.  For the Nabla tender, we take the Dev view, which is born out by the performance figures.

Conclusion

Like most sandbox models, the Nabla containers approach is an alternative to namespacing for containment, but it still requires cgroups for resource management.  The figures show that the containment HAP is actually better than that achieved with a hypervisor and the performance, while being marginally less than a namespaced container, is greater than that obtained by running a container inside a hypervisor.  Thus we conclude that for tenants who have a real need for HAP reduction, this is a viable technology.

Containers and Cloud Security

Introduction

The idea behind this blog post is to take a new look at how cloud security is measured and what its impact is on the various actors in the cloud ecosystem.  From the measurement point of view, we look at the vertical stack: all code that is traversed to provide a service all the way from input web request to database update to output response potentially contains bugs; the bug density is variable for the different components but the more code you traverse the higher your chance of exposure to exploitable vulnerabilities.  We’ll call this the Vertical Attack Profile (VAP) of the stack.  However, even this axis is too narrow because the primary actors are the cloud tenant and the cloud service provider (CSP).  In an IaaS cloud, part of the vertical profile belongs to the tenant (The guest kernel, guest OS and application) and part (the hypervisor and host OS) belong to the CSP.  However, the CSP vertical has the additional problem that any exploit in this piece of the stack can be used to jump into either the host itself or any of the other tenant virtual machines running on the host.  We’ll call this exploit causing a failure of containment the Horizontal Attack Profile (HAP).  We should also note that any Horizontal Security failure is a potentially business destroying event for the CSP, so they care deeply about preventing them.  Conversely any exploit occurring in the VAP owned by the Tenant can be seen by the CSP as a tenant only problem and one which the Tenant is responsible for locating and fixing.  We correlate size of profile with attack risk, so the large the profile the greater the probability of being exploited.

From the Tenant point of view, improving security can be done in one of two ways, the first (and mostly aspirational) is to improve the security and monitoring of the part of the Vertical the Tenant is responsible for and the second is to shift responsibility to the CSP, so make the CSP responsible for more of the Vertical.  Additionally, for most Tenants, a Horizontal failure mostly just means they lose trust in the CSP, unless the Tenant is trusting the CSP with sensitive data which can be exfiltrated by the Horizontal exploit.  In this latter case, the Tenant still cannot do anything to protect the CSP part of the Security Profile, so it’s mostly a contractual problem: SLAs and penalties for SLA failures.

Examples

To see how these interpretations apply to the various cloud environments, lets look at some of the Cloud (and pre-Cloud) models:

Physical Infrastructure

The left hand diagram shows a standard IaaS rented physical system.  Since the Tenant rents the hardware it is shown as red indicating CSP ownership and the the two Tenants are shown in green and yellow.  In this model, barring attacks from the actual hardware, the Tenant owns the entirety of the VAP.  The nice thing for the CSP is that hardware provides air gap security, so there is no HAP which means it is incredibly secure.

However, there is another (much older) model shown on the right, called the shared login model,  where the Tenant only rents a login on the physical system.  In this model, only the application belongs to the Tenant, so the CSP is responsible for much of the VAP (the expanded red area).  Here the total VAP is the same, but the Tenant’s VAP is much smaller: the CSP is responsible for maintaining and securing everything apart from the application.  From the Tenant point of view this is a much more secure system since they’re responsible for much less of the security.  From the CSP point of view there is now a  because a tenant compromising the kernel can control the entire system and jump to other tenant processes.  This actually has the worst HAP of all the systems considered in this blog.

Hypervisor based Virtual Infrastructure

In this model, the total VAP is unquestionably larger (worse) than the physical system above because there’s simply more code to traverse (a guest and a host kernel).  However, from the Tenant’s point of view, the VAP should be identical to that of unshared physical hardware because the CSP owns all the additional parts.  However, there is the possibility that the Tenant may be compromised by vulnerabilities in the Virtual Hardware Emulation.  This can be a worry because an exploit here doesn’t lead to a Horizontal security problem, so the CSP is apt to pay less attention to vulnerabilities in the Virtual Hardware simply because each guest has its own copy (even though that copy is wholly under the control of the CSP).

The HAP is definitely larger (worse) than the physical host because of the shared code in the Host Kernel/Hypervisor, but it has often been argued that because this is so deep in the Vertical stack that the chances of exploit are practically zero (although venom gave the lie to this hope: stack depth represents obscurity, not security).

However, there is another way of improving the VAP and that’s to reduce the number of vulnerabilities that can be hit.  One way that this can be done is to reduce the bug density (the argument for rewriting code in safer languages) but another is to restrict the amount of code which can be traversed by narrowing the interface (for example, see arguments in this hotcloud paper).  On this latter argument, the host kernel or hypervisor does have a much lower VAP than the guest kernel because the hypercall interface used for emulating the virtual hardware is very narrow (much narrower than the syscall interface).

The important takeaways here are firstly that simply transferring ownership of elements in the VAP doesn’t necessarily improve the Tenant VAP unless you have some assurance that the CSP is actively monitoring and fixing them.  Conversely, when the threat is great enough (Horizontal Exploit), you can trust to the natural preservation instincts of the CSP to ensure correct monitoring and remediation because a successful Horizontal attack can be a business destroying event for the CSP.

Container Based Virtual Infrastructure

The total VAP here is identical to that of physical infrastructure.  However, the Tenant component is much smaller (the kernel accounting for around 50% of all vulnerabilities).  It is this reduction in the Tenant VAP that makes containers so appealing: the CSP is now responsible for monitoring and remediating about half of the physical system VAP which is a great improvement for the Tenant.  Plus when the CSP remediates on the host, every container benefits at once, which is much better than having to crack open every virtual machine image to do it.  Best of all, the Tenant images don’t have to be modified to benefit from these fixes, simply running on an updated CSP host is enough.  However, the cost for this is that the HAP is the entire linux kernel syscall interface meaning the HAP is much larger than then hypervisor virtual infrastructure case because the latter benefits from interface narrowing to only the hypercalls (qualitatively, assuming the hypercall interface is ~30 calls and the syscall interface is ~300 calls, then the HAP is 10x larger in the container case than the hypervisor case); however, thanks to protections from the kernel namespace code, the HAP is less than the shared login server case.  Best of all, from the Tenant point of view, this entire HAP cost is borne by the CSP, which makes this an incredible deal: not only does the Tenant get a significant reduction in their VAP but the CSP is hugely motivated to keep on top of all vulnerabilities in their part of the VAP and remediate very fast because of the business implications of a successful horizontal attack.  The flip side of this is that a large number of the world’s CSPs are very unhappy about this potential risks and costs and actually try to shift responsibility (and risk) back to the Tenant by advocating nested virtualization solutions like running containers in hypervisors. So remember, you’re only benefiting from the CSP motivation to actively maintain their share of the VAP if your CSP runs bare metal containers because otherwise they’ve quietly palmed the problem back off on you.

Other Avenues for Controlling Attack Profiles

The assumption above was that defect density per component is roughly constant, so effectively the more code the more defects.  However, it is definitely true that different code bases have different defect densities, so one way of minimizing your VAP is to choose the code you rely on carefully and, of course, follow bug reduction techniques in the code you write.

Density Reduction

The simplest way of reducing defects is to find and fix the ones in the existing code base (while additionally being careful about introducing new ones).  This means it is important to know how actively defects are being searched for and how quickly they are being remediated.  In general, the greater the user base for the component, the greater the size of the defect searchers and the faster the speed of their remediation, which means that although the Linux Kernel is a big component in the VAP and HAP, a diligent patch routine is a reasonable line of defence because a fixed bug is not an exploitable bug.

Another way of reducing defect density is to write (or rewrite) the component in a language which is less prone to exploitable defects.  While this approach has many advocates, particularly among language partisans, it suffers from the defect decay issue: the idea that the maximum number of defects occurs in freshly minted code and the number goes down over time because the more time from release the more chance they’ve been found.  This means that a newly rewritten component, even in a shiny bug reducing language, can still contain more bugs than an older component written in a more exploitable language, simply because a significant number of bugs introduced on creation have been found in the latter.

Code Reduction (Minimization Techniques)

It also stands to reason that, for a complex component, simply reducing the amount of code that is accessible to the upper components reduces the VAP because it directly reduces the number of defects.  However, reducing the amount of code isn’t as simple as it sounds: it can only really be done by components that are configurable and then only if you’re not using the actual features you eliminate.  Elimination may be done in two ways, either physically, by actually removing the code from the component or virtually by blocking access using a guard (see below).

Guarding and Sandboxing

Guarding is mostly used to do virtual code elimination by blocking access to certain code paths that the upper layers do not use.  For instance, seccomp  in the Linux Kernel can be used to block access to system calls you know the application doesn’t use, meaning it also blocks any attempt to exploit code that would be in those system calls, thus reducing the VAP (and also reducing the HAP if the kernel is shared).

The deficiencies in the above are obvious: if the application needs to use a system call, you cannot block it although you can filter it, which leads to huge and ever more complex seccomp policies.  The solution for the system call an application has to use problem can sometimes be guarding emulation.  In this mode the guard code actually emulates all the effects of the system call without actually making the actual system call into the kernel.  This approach, often called sandboxing, is certainly effective at reducing the HAP since the guards usually run in their own address space which cannot be used to launch a horizontal attack.  However, the sandbox may or may not reduce the VAP depending on the bugs in the emulation code vs the bugs in the original.  One of the biggest potential disadvantages to watch out for with sandboxing is the fact that the address space the sandbox runs in is often that of the tenant, often meaning the CSP has quietly switched ownership of that component back to the tenant as well.

Conclusions

First and foremost: security is hard.  As a cloud Tenant, you really want to offload as much of it as possible to people who are much more motivated to actually do it than you are (i.e. the Cloud Service Provider).

The complete Vertical Attack Profile of a container bare metal system in the cloud is identical to a physical system and better than a Hypervisor based system; plus the tenant owned portion is roughly 50% of the total VAP meaning that Containers are by far the most secure virtualization technology available today from the Tenant perspective.

The increased Horizontal Attack profile that containers bring should all rightly belong to the Cloud Service Provider.  However, CSPs are apt to shirk this responsibility and try to find creative ways to shift responsibility back to the tenant including spreading misinformation about the container Attack profiles to try to make Tenants demand nested solutions.

Before you, as a Tenant, start worrying about the CSP owned Horizontal Attack Profile, make sure that contractual remedies (like SLAs or Reputational damage to the CSP) would be insufficient to cover the consequences of any data loss that might result from a containment breach.  Also remember that unless you, as the tenant, are under external compliance obligations like HIPPA or PCI, contractual remedies for a containment failure are likely sufficient and you should keep responsibility for the HAP where it belongs: with the CSP.

Using Elliptic Curve Cryptography with TPM2

One of the most significant advances going from TPM1.2 to TPM2 was the addition of algorithm agility: The ability of TPM2 to work with arbitrary symmetric and asymmetric encryption schemes.  In practice, in spite of this much vaunted agile encryption capability, most actual TPM2 chips I’ve seen only support a small number of asymmetric encryption schemes, usually RSA2048 and a couple of Elliptic Curves.  However, the ability to support any Elliptic Curve at all is a step up from TPM1.2.  This blog post will detail how elliptic curve schemes can be integrated into existing cryptographic systems using TPM2.  However, before we start on the practice, we need at least a tiny swing through the theory of Elliptic Curves.

What is an Elliptic Curve?

An Elliptic Curve (EC) is simply the set of points that lie on the curve in the two dimensional plane (x,y) defined by the equation

y2 = x3 + ax + b

which means that every elliptic curve can be parametrised by two constants a and b.  The set of all points lying on the curve plus a point at infinity is combined with an addition operation to produce an abelian (commutative) group.  The addition property is defined by drawing straight lines between two points and seeing where they intersect the curve (or picking the infinity point if they don’t intersect).  Wikipedia has a nice diagrammatic description of this here.  The infinity point acts as the identity of the addition rule and the whole group is denoted E.

The utility for cryptography is that you can define an integer multiplier operation which is simply the element added to itself n times, so for P ∈ E, you can always find Q ∈ E such that

Q = P + P + P … = n × P

And, since it’s a simple multiplication like operation, it’s very easy to compute Q.  However, given P and Q it is computationally very difficult to get back to n.  In fact, it can be demonstrated mathematically that trying to compute n is equivalent to the discrete logarithm problem which is the mathematical basis for the cryptographic security of RSA.  This also means that EC keys suffer the same (actually more so) problems as RSA keys: they’re not Quantum Computing secure (vulnerable to the Quantum Shor’s algorithm) and they would be instantly compromised if the discrete logarithm problem were ever solved.

Therefore, for any elliptic curve, E, you can choose a known point G ∈ E, select a large integer d and you can compute a point P = d × G.  You can then publish (P, G, E) as your public key knowing it’s computationally infeasible for anyone to derive your private key d.

For instance, Diffie-Hellman key exchange can be done by agreeing (E, G) and getting Alice and Bob to select private keys dA, dB.  Then knowing Bob’s public key PB, Alice can select a random integer r, which she publishes, and compute a key agreement as a secret point on the Elliptic Curve (r dA) × PB.  Bob can derive the same Elliptic Curve point because

(r dA) × PB = (r dA)dB × G = (r dB) dA × G = (r dB) × PA

The agreement is a point on the curve, but you can use an agreed hashing or other mechanism to get from the point to a symmetric key.

Seems simple, but the problem for computing is that we really want to use integers and right at the moment the elliptic curve is defined over all the real numbers, meaning E is of infinite size and involves floating point computations (of rather large precision).

Elliptic Curves over Finite Fields

Fortunately there is a mathematical theory of finite fields, called Galois Theory, which allows us to take the Galois Field over prime number p, which is denoted GF(p), and compute Elliptic Curve points over this field.  This derivation, which is mathematically rather complicated, is denoted E(GF(p)), where every point (x,y) is represented by a pair of integers between 0 and p-1.  There is another theory that says the number of elements in E(GF(p))

n = |E(GF(p))|

is roughly the same size as p, meaning if you choose a 32 bit prime p, you likely have a field over roughly 2^32 elements.  For every point P in E(GF(p)) it is also mathematically proveable that n × P = 0. where 0 is the zero point (which was the infinity point in the real elliptic curve).

This means that you can take any point, G,  in E(GF(p)) and compute a subgroup based on it:

EG = { ∀m ∈ Zn : m × G }

If you’re lucky |EG| = |E(GF(p))| and G is the generator of the entire group.  However, G may only generate a subgroup and you will find |EG| = h|E(GF(p))| where integer h is called the cofactor.  In general you want the cofactor to be small (preferably less than four) for EG to be cryptographically useful.

For a computer’s purposes, EG is the elliptic curve group used for integer arithmetic in the cryptographic algorithms.  The Curve and Generator is then defined by (p, a, b, Gx, Gy, n, h) which are the published parameters of the key (Gx, Gy represent the x and y elements of point G).  You select a random number d as your private key and your public key P = d × G exactly as above, except now P is easy to compute with integer operations.

Problems with Elliptic Curves

Although I stated above that solving P = d × G is equivalent in difficulty to the discrete logarithm problem, that’s not generally true.  If the discrete logarithm problem were solved, then we’d easily be able to compute d for every generator and curve, but it is possible to pick curves for which d can be easily computed without solving the discrete logarithm problem. This is the reason why you should never pick your own curve parameters (even if you think you know what you’re doing) because it’s very easy to choose a compromised curve.  As a demonstration of the difficulty of the problem: each of the major nation state actors, Russia, China and the US, publishes their own curve parameters for use in their own cryptographic EC implementations and each of them thinks the parameters published by the others is compromised in a way that allows the respective national security agencies to derive private keys.  So if nation state actors can’t tell if a curve is compromised or not, you surely won’t be able to either.

Therefore, to be secure in EC cryptography, you pick and existing curve which has been vetted and select some random Generator Point on it.  Of course, if you’re paranoid, that means you won’t be using any of the nation state supplied curves …

Using the TPM2 with Elliptic Curves in Cryptosystems

The initial target for this work was the openssl cryptosystem whose libraries are widely used for deriving other uses (like https in apache or openssh). Originally, when I did the initial TPM2 enabling of openssl as described in this blog post, I added TPM2 as a patch to the existing TPM 1.2 openssl_tpm_engine.  Unfortunately, openssl_tpm_engine seems to be pretty much defunct at this point, so I started my own openssl_tpm2_engine as a separate git tree to begin experimenting with Elliptic Curve keys (if you don’t use git, you can download the tar file here). One of the benefits to running my own source tree is that I can now add a testing infrastructure that makes use of the IBM TPM emulator to make sure that the basic cryptographic operations all work which means that make check functions even when a TPM2 isn’t available.  The current key creation and import algorithms use secured connections to the TPM (to avoid eavesdropping) which means it’s only really possible to construct them using the IBM TSS. To make all of this easier, I’ve set up an openSUSE Build Service repository which is building for all major architectures and the openSUSE and Fedora distributions (ignore the failures, they’re currently induced because the TPM emulator only currently works on 64 bit little endian systems, so make check is failing, but the TPM people at IBM are working on this, so eventually the builds should be complete).

TPM2 itself also has some annoying restrictions.  The biggest of which is that it doesn’t allow you to pass in arbitrary elliptic curve parameters; you may only use elliptic curves which the TPM itself knows.  This will be annoying if you have an existing EC key you’re trying to import because the TPM may reject it as an unknown algorithm.  For instance, openssl can actually compute with arbitrary EC parameters, but has 39 current elliptic curves parametrised by name. By contrast, my Nuvoton TPM2 inside my Dell XPS 13 knows precisely two curves:

jejb@jarvis:~> create_tpm2_key --list-curves
prime256v1
bnp256

However, assuming you’ve picked a compatible curve for your EC private key (and you’ve defined a parent key for the storage hierarchy) you can simply import it to a TPM bound key:

create_tpm2_key -p 81000001 -w key.priv key.tpm

The tool will report an error if it can’t convert the curve parameters to a named elliptic curve known to the TPM

jejb@jarvis:~> openssl genpkey -algorithm EC -pkeyopt ec_paramgen_curve:brainpoolP256r1 > key.priv
jejb@jarvis:~> create_tpm2_key -p 81000001 -w key.priv key.tpm
TPM does not support the curve in this EC key
openssl_to_tpm_public failed with 166
TPM_RC_CURVE - curve not supported Handle number unspecified

You can also create TPM resident private keys simply by specifying the algorithm

create_tpm2_key -p 81000001 --ecc bnp256 key.tpm

Once you have your TPM based EC keys, you can use them to create public keys and certificates.  For instance, you create a self-signed X509 certificate based on the tpm key by

openssl req -new -x509 -sha256  -key key.tpm -engine tpm2 -keyform engine -out my.crt

Why you should use EC keys with the TPM

The initial attraction is the same as for RSA keys: making it impossible to extract your private key from the system.  However, the mathematical calculations for EC keys are much simpler than for RSA keys and don’t involve finding strong primes, so it’s much simpler for the TPM (being a fairly weak calculation machine) to derive private and public EC keys.  For instance the times taken to derive a RSA key from the primary seed and an EC key differ dramatically

jejb@jarvis:~> time tsscreateprimary -hi o -ecc bnp256 -st
Handle 80ffffff

real 0m0.111s
user 0m0.000s
sys 0m0.014s

jejb@jarvis:~> time tsscreateprimary -hi o -rsa -st
Handle 80ffffff

real 0m20.473s
user 0m0.015s
sys 0m0.084s

so for a slow system like the TPM, using EC keys is a significant speed advantage.  Additionally, there are other advantages.  The standard EC Key signature algorithm is a modification of the NIST Digital Signature Algorithm called ECDSA.  However DSA and ECDSA require a cryptographically strong (and secret) random number as Sony found out to their cost in the EC Key compromise of Playstation 3.  The TPM is a good source of cryptographically strong random numbers and if it generates the signature internally, you can be absolutely sure of keeping the input random number secret.

Why you might want to avoid EC keys altogether

In spite of the many advantages described above, EC keys suffer one additional disadvantage over RSA keys in that Elliptic Curves in general are very hot fields of mathematical research so even if the curve you use today is genuinely not compromised, it’s not impossible that a mathematical advance tomorrow will make the curve you chose (and thus all the private keys you generated) vulnerable.  Of course, the same goes for RSA if anyone ever cracks the discrete logarithm problem, but solving that problem would likely be fully published to world acclaim and recognition as a significant contribution to the advancement of number theory.  Discovering an attack on a currently used elliptic curve on the other hand might be better remunerated by offering to sell it privately to one of the national security agencies …

Using letsencrypt certificates with DANE

If, like me, you run your own cloud server, at some point you need TLS certificates to export secure services.  Like a lot of people I object to paying a so called X.509 authority a yearly fee just to get a certificate, so I’ve been using a free startcom one for a while.  With web browsers delisting startcom, I’m unable to get a new usable certificate from them, so I’ve been investigating letsencrypt instead (if you’re in to fun ironies, by the way, you can observe that currently the letsencrypt site isn’t using a letsencrypt certificate, perhaps indicating the administrative difficulty of doing so).

The problem with letsencrypt certificates is that they currently have a 90 day expiry, which means you really need to use an automated tool to keep your TLS certificate current.  Fortunately the EFF has developed such a tool: certbot (they use a letsencrypt certificate for their site, indicating you can have trust that they do know what they’re doing).  However, one of the problems with certbot is that, by default, it generates a new key each time the certificate is renewed.  This isn’t a problem for most people, but if you use DANE records, it causes significant issues.

Why use both DANE and letsencrypt?

The beauty of DANE, as I’ve written before, is that it gives you a much more secure way of identifying your TLS certificate (provided you run DNSSEC).  People verifying your certificate may use DANE as the only verification mechanism (perhaps because they also distrust the X.509 authorities) which means the best practice is to publish a DANE TLSA record for each service and also use an X.509 authority rooted certificate.  That way your site just works for everyone.

The problem here is that being DNS based, DANE records can be cached for a while, so it can take a few days for DANE certificate updates to propagate through the DNS infrastructure. DANE records have an answer for this: they have a mode where the record identifies only the hash of the public key used by the website, not the certificate itself, which means you can change your certificate as much as you want provided you keep the same public/private key pair.  And here’s the rub: if certbot is going to try to give you a new key on each renewal, this isn’t going to work.

The internet society also has posts about this.

Making certbot work with DANE

Fortunately, there is a solution: the certbot manual mode (certonly) takes a –csr flag which allows you to construct your own certificate request to send to letsencrypt, meaning you can keep a fixed key … at the cost of not using most of the certbot automation.  So, how do you construct a correct csr for letsencrypt?  Like most free certificates, letsencrypt will only allow you to specify the certificate commonName, which must be a DNS pointer to the actual website.  If you need a certificate that covers multiple sites, all the other sites must be enumerated in the x509 v3 extensions field subjectAltName.  Let’s look at how openssl can generate such a certificate request.  One of the slight problems is that openssl, being a cranky tool, does not allow you to specify a subjectAltName on the command line, so you have to construct a special configuration file for it.  I called mine letsencrypt.conf

[req]
prompt = no
distinguished_name = req_dn
req_extensions = req_ext

[req_dn]
commonName = bedivere.hansenpartnership.com

[req_ext]
subjectAltName=@alt_names

[alt_names]
DNS.1=bedivere.hansenpartnership.com
DNS.2=www.hansenpartnership.com
DNS.3=hansenpartnership.com
DNS.4=blog.hansenpartnership.com

As you can see, I’ve given my canonical server (bedivere) as the common name and then four other subject alt names.  Once you have this config file tailored to your needs, you merely run

openssl req -new -key <mykey.key> -config letsencrypt.conf -out letsencrypt.csr

Where mykey.key is the path to your private key (you need a private key because even though the CSR only contains the public key, it is also signed).  However, once you’ve produced this letsencrypt.csr, you no longer need the private key and, because it’s undated, it will now work forever, meaning the infrastructure you put into place with certbot doesn’t need to be privileged enough to access your private key.  Once this is done, you make sure you have TLSA 3 1 1 records pointing to the hash of your public key (here’s a handy website to generate them for you) and you never need to alter your DANE record again.  Note, by the way that letsencrypt certificates can be used for non-web purposes (I use mine for encrypted SMTP as well), so you’ll need one DANE record for every service you use them for.

Putting it all together

Now that you have your certificate request, depending on what version of certbot you have, you may need it in DER format

openssl req -in letsencrypt.csr -out letsencrypt.der -outform DER

And you’re ready to run the following script from cron

#!/bin/bash
date=$(date +%Y-%m-%d)
dir=/etc/ssl/certs
cert="${dir}/letsencrypt-${date}.crt"
fullchain="${dir}/letsencrypt-${date}.pem"
chain="${dir}/letsencrypt-chain-${date}.pem"
csr=/etc/ssl/letsencrypt.der
out=/tmp/certbot.out

##
# certbot handling
#
# first it cannot replace certs, so ensure new locations (date suffix)
# each time mean the certificate is unique each time.  Next, it's
# really chatty, so the only way to tell if there was a failure is to
# check whether the certificates got updated and then get cron to
# email the log
##

certbot certonly --webroot --csr ${csr} --preferred-challenges http-01 -w /var/www --fullchain-path ${fullchain} --chain-path ${chain} --cert-path ${cert} > ${out} 2>&1

if [ ! -f ${fullchain} -o ! -f ${chain} -o ! -f ${cert} ]; then
    cat ${out}
    exit 1;
fi

# link into place

# cert only (apache needs)
ln -sf ${cert} ${dir}/letsencrypt.crt
# cert with chain (stunnel needs)
ln -sf ${fullchain} ${dir}/letsencrypt.pem
# chain only (apache needs)
ln -sf ${chain} ${dir}/letsencrypt-chain.pem

# reload the services
sudo systemctl reload apache2
sudo systemctl restart stunnel4
sudo systemctl reload postfix

Note that this script needs the ability to write files and create links in /etc/ssl/certs (can be done by group permission) and the systemctl reloads need the following in /etc/sudoers

%LimitedAdmins ALL=NOPASSWD: /bin/systemctl reload apache2
%LimitedAdmins ALL=NOPASSWD: /bin/systemctl reload postfix
%LimitedAdmins ALL=NOPASSWD: /bin/systemctl restart stunnel4

And finally you can run this as a cron script under whichever user you’ve chosen to have sufficient privilege to write the certificates.  I run this every month, so I know I if anything goes wrong I have at least two months to fix it.

Oh, and just in case you need proof that I got this all working, here you are!

TPM2 and Linux

Recently Microsoft started mandating TPM2 as a hardware requirement for all platforms running recent versions of windows.  This means that eventually all shipping systems (starting with laptops first) will have a TPM2 chip.  The reason this impacts Linux is that TPM2 is radically different from its predecessor TPM1.2; so different, in fact, that none of the existing TPM1.2 software on Linux (trousers, the libtpm.so plug in for openssl, even my gnome keyring enhancements) will work with TPM2.  The purpose of this blog is to explore the differences and how we can make ready for the transition.

What are the Main 1.2 vs 2.0 Differences?

The big one is termed Algorithm Agility.  TPM1.2 had SHA1 and RSA2048 only.  TPM2 is designed to have many possible algorithms, including support for elliptic curve and a host of government mandated (Russian and Chinese) crypto systems.  There’s no requirement for any shipping TPM2 to support any particular algorithms, so you actually have to ask your TPM what it supports.  The bedrock for TPM2 in the West seems to be RSA1024-2048, ECC and AES for crypto and SHA1 and SHA256 for hashes1.

What algorithm agility means is that you can no longer have root keys (EK and SRK see here for details) like TPM1.2 did, because a key requires a specific crypto algorithm.  Instead TPM2 has primary “seeds” and a Key Derivation Function (KDF).  The way this works is that a seed is simply a long string of random numbers, but it is used as input to the KDF along with the key parameters and the algorithm and out pops a real key based on the seed.  The KDF is deterministic, so if you input the same algorithm and the same parameters you get the same key again.  There are four primary seeds in the TPM2: Three permanent ones which only change when the TPM2 is cleared: endorsement (EPS), Platform (PPS) and Storage (SPS).  There’s also a Null seed, which is used for ephemeral keys and changes every reboot.  A key derived from the SPS can be regarded as the SRK and a key derived from the EPS can be regarded as the EK. Objects descending from these keys are called members of hierarchies2. One of the interesting aspects of the TPM is that the root of a hierarchy is a key not a seed (because you need to exchange secret information with the TPM), and that there can be multiple of these roots with different key algorithms and parameters.

Additionally, the mechanism for making use of keys has changed slightly.  In TPM 1.2 to import a secret key you wrapped it asymmetrically to the SRK and then called LoadKeyByBlob to get a use handle.  In TPM2 this is a two stage operation, firstly you import a wrapped (or otherwise protected) private key with TPM2_Import, but that returns a private key structure encrypted with the parent key’s internal symmetric key.  This symmetrically encrypted key is then loaded (using TPM2_Load) to obtain a use handle whenever needed.  The philosophical change is from online keys in TPM 1.2 (keys which were resident inside the TPM) to offline keys in TPM2 (keys which now can be loaded when needed).  This philosophy has been reinforced by reducing the space available to keep keys loaded in TPM2 (see later).

Playing with TPM2

If you have a recent laptop, chances are you either have or can software upgrade to a TPM2.  I have a dell XPS13 the skylake version which comes with a software upgradeable Nuvoton TPM.  Dell kindly provides a 1.2->2 switching program here, which seems to work under Freedos (odin boot) so I have a physical TPM2 based system.  For those of you who aren’t so lucky, you can still play along, but you need a TPM2 emulator.  The best one is here; simply download and untar it then type make in the src directory and run it as ./tpm_server.  It listens on two TCP ports, 2321 and 2322, for TPM commands so there’s no need to install it anywhere, it can be run directly from the source directory.

After that, you need the interface software called tss2.  The source is here, but Fedora 25 and recent Ubuntu already package it.  I’ve also built openSUSE packages here.  The configuration of tss2 is controlled by environment variables.  The most important one is TPM_INTERFACE_TYPE which tells it how to connect to the TPM2.  If you’re using a simulator, you set this to “socsim” and if you have a real TPM2 device you set it to “dev”.  One final thing about direct device connection: in tss2 there’s no daemon like trousers had to broker the connection, all your users connect directly to the TPM2 device /dev/tpm0.  To do this, the device has to support read and write by arbitrary users, so its permissions need to be 0666.  I’ve got a udev script to achieve this

# tpm 2 devices need to be world readable
SUBSYSTEM=="tpm", ACTION=="add", MODE="0666"

Which goes in /etc/udev/rules.d/80-tpm-2.rules on openSUSE.  The next thing you need to do, if you’re running the simulator, is power it on and start it up (for a real device, this is done by the bios):

tsspowerup
tssstartup

The simulator will now create a NVChip file wherever you started it to store NV ram based objects, which it will read on next start up.  The first thing you need to do is create an SRK and store it in NV memory.  Microsoft uses the well known key handle 81000001 for this, so we’ll do the same.  The reason for doing this is that a real TPM takes ages to run the KDF for RSA keys because it has to look for prime numbers:

jejb@jarvis:~> TPM_INTERFACE_TYPE=socsim time tsscreateprimary -hi o -st -rsa
Handle 80000000
0.03 user 0.00 system 0:00.06 elapsed

jejb@jarvis:~> TPM_INTERFACE_TYPE=dev time tsscreateprimary -hi o -st -rsa
Handle 80000000
0.04 user 0.00 system 0:20.51 elapsed

As you can see: the simulator created a primary storage key (the SRK) in a few milliseconds, but it took my real TPM2 20 seconds to do it3 … not something you want to wait for, hence the need to store this permanently under a well known key handle and get rid of the temporary copy

tssevictcontrol -hi o -ho 80000000 -hp 81000001
tssflushcontext -ha 80000000

tssevictcontrol tells the TPM to copy the key at transient handle 800000004  to permanent NV handle 81000001 and tssflushcontext erases the transient key.  Flushing transient objects is very important, because TPM2 has a lot less transient storage space than TPM1.2 did; usually only about three handles worth.  You can tell how much you have by doing

tssgetcapability -cap 6|grep -i transient
TPM_PT 0000010e value 00000003 TPM_PT_HR_TRANSIENT_MIN - the minimum number of transient objects that can be held in TPM RAM

Where the value (00000003) tells me that the TPM can store at least 3 transient objects.  After that you’ll start getting out of space errors from it.

The final step in taking ownership of a TPM2 is to set the authorization passwords.  Each of the four hierarchies (Null, Owner, Endorsement, Platform) and the Lockout has a possible authority password.  The Platform authority is cleared on startup, so there’s not much point setting it (it’s used by the BIOS or Firmware to perform TPM functions).  Of the other four, you really only need to set Owner, Endorsement and Lockout (I use the same password for all of them).

tsshierarchychangeauth -hi l -pwdn <your password>
tsshierarchychangeauth -hi e -pwdn <your password>
tsshierarchychangeauth -hi o -pwdn <your password>

After this is done, you’re all set.  Note that as well as these authorizations, each object can have its own authorization (or even policy), so the SRK you created earlier still has no password, allowing it to be used by anyone.  Note also that the owner authorization controls access to the NV memory, so you’ll need to supply it now to make other objects persistent.

An Aside about Real TPM2 devices and the Resource Manager

Although I’m using the code below to store my keys in the TPM2, there’s a couple of practical limitations which means it won’t work for you if you have multiple TPM2 using applications without a kernel update.  The two problems are

  1. The Linux Kernel TPM2 device /dev/tpm0 only allows one user at once.  If a second application tries to open the device it will get an EBUSY which causes TSS_Create() to fail.
  2. Because most applications make use of transient key slots and most TPM2s have only a couple of these, simultaneous users can end up running out of these and getting unexpected out of space errors.

The solution to both of these is something called a Resource Manager (RM).  What the RM does is effectively swap transient objects in and out of the TPM as needed to prevent it from running out of space.  Linux has an issue in that both the kernel and userspace are potential users of TPM keys so the resource manager has to live inside the kernel.  Jarkko Sakkinen has preliminary resource manager patches here, and they will likely make it into kernel 4.11 or 4.12.  I’m currently running my laptop with the RM patches applied, so multiple applications work for me, but since these are preliminary patches, I wouldn’t currently advise others to do this.  The way these patches work is that once you declare to the kernel via an ioctl that you want to use the RM, every time you send a command to the TPM, your context gets swapped in, the command is executed, the context is swapped out and the response sent meaning that no other user of the TPM sees your transient objects.  The moment you send the ioctl, the TPM device allows another user to open it as well.

Using TPM2 as a keystore

Once the implementation is sorted out, openssl and gnome-keyring patches can be produced for TPM2.  The only slight wrinkle is that for create_tpm2_key you require a parent key to exist in the NV storage (at the 81000001 handle we talked about previously).  So to convert from a password protected openssh RSA key to a TPM2 based one, you do

create_tpm2_key -a -p 81000001 -w id_rsa id_rsa.tpm
mv id_rsa.tpm id_rsa

And then gnome keyring manager will work nicely (make sure you keep a copy of your original private key in case you want to move to a new laptop or reseed the TPM2).  If you use the same TPM2 password as your original key password, you won’t even need to update the gnome loginkeyring for the new password.

Conclusions

Because of the lack of an in-kernel Resource Manager, TPM2 is ready for experimentation in Linux but definitely not ready for prime time yet (unless you’re willing to patch your kernel).  Hopefully this will change in the 4.11 or 4.12 kernel when the Resource Manager finally goes upstream5.

Looking forwards to the new stack, the lack of a central daemon is really a nice feature: tcsd crashing used to kill all of my TPM key based applications, but with tss2 having no central daemon, everything has just worked(tm) so far.  A kernel based RM also means that the kernel can happily use the TPM (for its trusted keys and disk encryption) without interfering with whatever userspace is doing.

TPM enabling gnome-keyring

One of the questions about the previous post on using your TPM as a secure key store was “could the TPM be used to protect ssh keys?”  The answer is yes, because openssh uses openssl (so you can simply convert an openssh private key to a TPM private key) but the ssh-agent wouldn’t work because ssh-add passes in the private keys by their component primes, which is not possible when the private key is guarded by the TPM.  However,  that made me actually look at gnome-keyring to figure out how it worked and whether it could be used with the TPM.  The answer is yes, but there are also some interesting side effects of TPM enabling gnome-keyring.

Gnome-keyring Architecture

Gnome-keyring consists of essentially three components: a pluggable store backend, a secure passphrase holder (which is implemented as a backend store) and an agent frontend.  The frontend and backend talk to each other using the pkcs11 protocol.  This also means that the backend can serve anything that also speaks pkcs11, which most encryption systems do.  The stores consist of a variety of file or directory backed keys (for instance the ssh-store simply loads all the ssh keys in your $HOME/.ssh; the secret-store uses the gnome default keyring store, $HOME/.local/shared/keyring/,  to store collections of passwords)  The frontends act as bridges between a variety of external protocols and the keyring daemon.  They take whatever external protocol they’re speaking in, convert the request to pkcs11 and query the backends for the information.  The most important frontend is the login one which is called by gnome-keyring-daemon at start of day to unlock the secret-store which contains all your other key passwords using your login password as the key.  The ssh-agent frontend speaks the ssh agent protocol and converts key and signing requests to pkcs11 which is mostly served by the ssh-store.  The gpg-agent store speaks a very cut down version of the gpg agent protocol: basically all it does is allow you to store gpg key passwords in the secret-store; it doesn’t do any cryptographic operations.

Pkcs11 Essentials

Pkcs11 is a highly complex protocol designed for opening sessions which query or operate on tokens, which roughly speaking represent a bundle of objects (If you have a USB crypto key, that’s usually represented as a token and your stored keys as objects).  There is some intermediate stuff about slots, but that mostly applies to tokens which may be offline and need insertion, which isn’t relevant to the gnome keyring.  The objects may have several levels of visibility, but the most common is public (always visible) and private (must be logged in to the token to see them).  Objects themselves have a variety of attributes, some of which depend on what type of object they are and some of which are universal (like id and label).  For instance, an object representing an RSA public key would have the public exponent and the public modulus as queryable attributes.

The pkcs11 protocol also has a reasonably comprehensive object finding protocol.  An arbitrary list of attributes and values can be passed to the query and it will return all objects that fully match.  The token identifier is a query attribute which may be present but doesn’t have to be, so if you omit it, you end up searching over every token the pkcs11 subsystem knows about.

The other operation that pkcs11 understands is logging into a token with a pin (which is pkcs11 speak for a passphrase).  The pin doesn’t have to be supplied by the entity performing the login, it may be supplied to the token via an out of band mechanism (for instance the little button on the yukikey, or even a real keypad).  The important thing for gnome keyring here is that logging into a token may be as simple as sending the login command and letting the token sort out the authorization, which is the way gnome keyring operates.

You also need to understand is conventions about searching for keys.  The pkcs11 standard recommends (but does not require) that public and private key pairs, which exist as two separate objects, should have the same id attribute.  This means that if you want to find an rsa private key, the way you do this is by searching the public objects for the exponent and modulus.  Once this is returned, you log into the token and retrieve the private key object by the public key id.

The final thing to understand is that once you have the private key object, it merely acts as an entitlement to have the token perform certain private key operations for you (like encryption, decryption or signing); it doesn’t mean you have access to the private key itself.

Gnome Keyring handling of ssh keys

Once the ssh-agent side of gnome keyring receives a challenge, it must respond by returning the private key signature of the challenge.  To do this, it searches the pkcs11 for the key used for the challenge (for RSA keys, it searches by modulus and exponent, for DSA keys it searches by signature primes, etc).  One interesting point to note is the search isn’t limited to the gnome keyring ssh token store, so if the key is found anywhere, in any pkcs11 token, it will be used.  The expectation is that they key id attribute will be the ssh fingerprint and the key label attribute will be the ssh comment, but these aren’t used in the actual search.  Once the public key is found, the agent logs into the token, retrieves the private key by label and id and proceeds to get the private key to sign the challenge.

Adding TPM key handling to the ssh store

Gnome keyring is based on GNU libgcrypt which handles all cryptographic objects via s-expressions (which are basically lisp like strings).  Gcrypt itself doesn’t seem to have an s-expression for a token, so the actual signing will be done inside the keyring code.  The s-expression I chose to represent a TPM based private key is

(private-key
  (rsa
    (tpm
      (blob <binary key blob>)
      (auth <authoriztion>))))

The rsa is necessary for it to be recognised for RSA signatures.  Now the ssh-store is modified to recognise the PEM guards for TPM keys, load the key blob up into the s-expression and ask the secret-store for the authorization passphrase which is also loaded.  In theory, it might be safer to ask the secret store for the authorization at key use time, but this method mirrors what happens to private keys, which are decrypted at load time and stored as s-expressions containing the component primes.

Now the RSA signing code is hooked to check for TPM s-expressions and divert the signature to the TPM if they’re found.  Once this is done, gnome keyring is fully enabled for TPM based ssh keys.  The initial email thread about this is here, and an openSUSE build repository of the modified code is here.

One important design constraint for this is that people (well, OK me) have a lot of ssh keys, sometimes more than could be effectively stored by the TPM in its internal shielded memory (plus if you’re like me, you’re using the TPM for other things like VPN), so the design of the keyring TPM additions is not to burden that memory further, thus TPM keys are only loaded into the TPM whenever they’re needed for an operation and are unloaded otherwise.  This means the TPM can scale to hundreds of keys, but at the expense of taking longer: Instead of simply asking the TPM to sign something, you have to first ask the TPM to load and unwrap (which is an RSA operation) the key, then use it to sign.  Effectively it’s two expensive TPM operations per real cryptographic function.

Using TPM based ssh keys

Once you’ve installed the modified gnome-keyring package, you’re ready actually to make use of it.  Since we’re going to replace all your ssh private keys with TPM equivalents, which are keyed to a given storage root key (SRK)1 which changes every time the TPM is cleared, it is prudent to take a backup of all your ssh keys on some offline encrypted USB stick, just in case you ever need to restore them (or transfer them to a new laptop2).

cd ~/.ssh/
for pub in *rsa*.pub; do
    priv=$(basename $pub .pub)
    echo $priv
    create_tpm_key -m -a -w $priv ${priv}.tpm
    mv ${priv}.tpm $priv
done

You’ll be prompted first to create a TPM authorization password for the key, then to verify it, then to give the PEM password for the ssh key (for each key).  Note, this only transfers RSA keys (the only algorithm the TPM can handle) and also note the script above is overwriting the PEM private key, so make sure you have a backup.  The create_tpm_key command comes from the openssl_tpm_engine package, which I’ve patched here to support random migration authority and well known SRK authority.

All you have to do now is log out and back in (to restart the gnome-keyring daemon with the new ssh keystore) and you’re using TPM based keys for all your ssh operations.  I’ve noticed that this adds a couple of hundred milliseconds per login, so if you batch stuff over ssh, this is why your scripts are slower.

Using Your TPM as a Secure Key Store

One of the new features of Linux Plumbers Conference this year was the TPM Microconference, which facilitated great discussions both in the session itself and in the hallways.  Quite a bit of discussion was generated by the Beginner’s Guide to the TPM talk I gave, mostly because I blamed the Trusted Computing Group for the abject failure to adopt TPMs for anything citing the incredible complexity of their stack.

The main thing that came out of this discussion was that a lot of this stack complexity can be hidden from users and we should concentrate on making the TPM “just work” for all cryptographic functions where we have parallels in the existing security layers (like the keystore).  One of the great advantages of the TPM, instead of messing about with USB pkcs11 tokens, is that it has a file format for TPM keys (I’ll explain this later) which can be used directly in place of standard private key files.  However, before we get there, lets discuss some of the basics of how your TPM works and how to make use of it.

TPM Basics

Note that all of what I’m saying below applies to a 1.2 TPM (the type most people have in their laptops) 2.0 TPMs are now appearing on the market, but chances are you have a 1.2.

A TPM is traditionally delivered in your laptop in an uninitialised state.  In older laptops, the TPM is traditionally disabled and you usually have to find an entry in the BIOS menu to enable it.  In more modern laptops (thanks to Windows 10) the TPM is enabled in the bios and ready for the OS install to make use of it.  All TPMs are delivered with one manufacturer set key called the Endorsement Key (EK).  This key is unique to your TPM (like an identifying label) and is used as part of the attestation protocol.  Because the EK is a unique label, the attestation protocol is rather complex involving a so called privacy CA to protect your identity, but because it isn’t necessary to use the TPM as a secure keystore, I won’t cover it further.

The other important key, which you have to generate, is called the Storage Root Key.  This key is generated internally within the TPM once somebody takes ownership of it.  The package you need to begin using the tpm is tpm-tools, which is packaged by most distros. You must also have the Linux TSS stack trousers installed (just installing tpm-tools will often pull this in) and have the tcsd part of trousers running (usually systemctl start tcsd; systemctl enable tcsd). I tend to configure my TPM with an owner password (for things like resetting dictionary attacks) but a well known storage root key authority.  To do this from a fully cleared and enabled TPM, execute

tpm_takeownership -z

And give your chosen owner password when prompted.  If you get an error, chances are you need to go back to the BIOS menu and actively clear and reset the TPM (usually under the security options).

Aside about Authority and the Trusted Security Stack

To the TPM, an “authority” is a 20 byte number you use to prove you’re allowed to manipulate whatever object you’re trying to use.  The TPM typically has a well known way of converting typed passwords into these 20 byte codes.  The way you prove you know the authority is to add a Hashed Message Authentication Code (HMAC) to your TPM command.  This means that the hash can only be generated by someone who knows the authority for the object, but anyone seeing the hash cannot derive the authority from it.  The utility of this is that the trousers library (tspi) generates the HMAC before the TPM command is passed to the central daemon (tcsd) meaning that nothing except you and the TPM know the authority

trousersThe final thing about authority you need to know is that the TPM has a concept of “well known authority” which simply means supply 20 bytes of zeros.  It’s kind of paradoxical to have a secret everyone knows, however, there are reasons for this:  For most objects in the TPM whether you require authority to use them is optional, but for some it is mandatory.  For objects (like the SRK) where authority is mandatory, using the well known authority is equivalent to saying actually I don’t need authorization for this object.

The Storage Root Key (SRK)

Once you’ve generated this above, the TPM keeps the secret part permanently hidden, but can be persuaded to give anyone the public part.  In TPM 1.2, the SRK is a RSA 2048 key.  On most modern TPMs, you have to tell the tpm you want anyone to be able to read the public part of the storage root key, which you do with this command

tpm_restrictsrk -a

You’ll get prompted for the owner password.  Once you execute this command, anyone who knows the SRK authority (which you’ve set to be well known) is allowed to read the public part.

Why all this fuss about SRK authorization?  Well, traditionally, the TPM is designed for use in a hostile multi-user environment.  In the relaxed, no authorization, environment I’ve advised you to set up, anyone who knows the SRK can upload any storage object (like a key or protected blob) into the TPM.  This means, since the TPM has very limited storage, that they could in theory do a DoS attack against the TPM simply by filling it with objects.  On a laptop where there’s only one user (you) this is not usually a concern, hence the advice to use a well known authority, which makes the TPM much easier to use.

The way external objects (like keys or data blobs) are uploaded into the TPM is that they all have a parent (which must be a storage key) and they are encrypted to the public part of this key (in TPM parlance, this is called wrapping).  The TPM can have deep key hierarchies (all eventually parented to the SRK), but for a laptop, it makes sense simply to use the SRK as the only storage key and wrap everything for it as the parent.  Now here’s the reason for the well known authority: to upload an object into the TPM, it not only needs to be wrapped to the parent key, you also need to use the parent key authority to perform the upload.  The object you’re using also has a separate authority.  This means that when you upload and use a key, if you’ve set a SRK password, you’ll end up having to type both the SRK password and the key password pretty much every time you use it, which is a bit of a pain.

The tools used to create wrapped keys are found in the openssl_tpm_engine package.  I’ve done a few patches to make it easier to use (mostly by trying well known authority first before asking for the SRK password), so you can see my patched version here.  The first thing you can do is take any PEM key file you have and wrap it for your tpm

create_tpm_key -m -w test.key test.tpm.key

This creates a TPM key file test.tpm.key containing a wrapped key for your TPM with no authority (to add an authority password, use the -a option).  If you cat the test.tpm.key file, you’ll see it looks like a standard PEM file, except the guards are now

-----BEGIN TSS KEY BLOB-----
-----END TSS KEY BLOB-----

This key is now wrapped for your TPM’s SRK and would only be usable on your laptop.  If you’re fortunate enough to be using an application linked with gnutls, you can simply use this key with the URI  tpmkey:file=<path to test.tpm.key>.  If you’re using openssl, you need to patch it to get it to use TPM keys easily (see below).

The ideal, however, would be since these are PEM files with unique guards, any ssl provider should simply recognise the guards  and load the key into the TPM  This means that in order to use a TPM key, you take a standard PEM private key file, transform it into a TPM key file and then simply copy it back to where the original key file was being used from and voila! you’re using a TPM based key.  This is what the openssl patches below do.

Getting TPM keys to “just work” with openssl

In openssl, external encryption processors, like the TPM or USB keys are used by things called engines.  The engine you need for the TPM is also in the openssl_tpm_engine package, so once you’ve installed that package, the engine is available.  Unfortunately, openssl doesn’t naturally use a particular engine unless told to do so (most of the openssl tools have a -engine option for this).  However, having to specify the engine in every application somewhat spoils the “just works” aspect we’re looking for, so the openssl patches here allow an engine to specify that it knows how to parse a PEM file and can load a key from it.  This allows you simply to replace the original key file with a TPM protected key file and have your application continue working with it.

As a demo of the usefulness, I’m using it on my current laptop with all my VPN keys.  It is also possible to use it with openssh keys, since they’re standard PEM files.  However, the way openssh works with agents means that the agent cannot handle the keys and you have to type the password (if you set one one the key) each time you use it.

It should be noted that the idea of having PEM based TPM keys just work in openssl is encountering resistance.  However, it does just work in gnutls (provided you change the file name to be a tpmkey:file= URL).

Conclusions (or How Well is it Working?)

As I said above, I’m currently using this scheme for my openvpn and ssh keys.  I have to confess, since I use openssh a lot, I got very tired of having to type the password on every ssh operation, so I’ve gone back to using non-TPM based keys which can be handled by the agent.  Fixing this is on my list of things to look at.  However, I still am using TPM based keys for my openvpn.

Even for openvpn, though there are hiccoughs: the trousers daemon, tcsd, crashes periodically on my platform.  When it does, the vpn goes down (because the VPN needs a key based authentication transaction every hour to rotate the symmetric encryption keys).  Unfortunately, just restarting tcsd isn’t enough because the design of trousers doesn’t seem to be robust to this failure (even though the tspi part linked with the application could recreate all the keys), so the VPN itself must be restarted when this happens, which makes it rather user unfriendly.  Fixing trousers to cope with tcsd failure is also on my list of things to fix …