Tag Archives: openssl engine

Papering Over our TPM 2.0 TSS Divisions

For years I’ve been hoping that the Trusted Computing Group (TCG) based IBM and Intel TSS (TCG Software Stack) would simply integrate with one another into a single package. The rationale is pretty simple: the Intel TSS is already quite a large collection of libraries so adding one more (the IBM TSS has a single library) wouldn’t be too much of a burden. Both TSSs are based on TCG specifications, except that the IBM TSS is based on the TPM 2.0 Library Specification and the Intel TSS is based on the TPM Software Stack (also, not at all confusingly, abbreviated TSS). There’s actually very little overlap between these specifications so co-existence seems very reasonable. Before we get into the stories of these two stacks and what they do, I should confess my biases: while I’ve worked with the TCG over the years, I’ve always harboured the view that the complete lack of adoption of TPM 2.0’s predecessor (TPM 1.2) was because of the hugely complicated nature of the TCG mandated software stack which was implemented in Linux by trousers. It is my firm belief that the complexity of the API lead to the lack of uptake, even though I made several efforts over the years to make use of it.

My primary interest in the TPM has been as a secure laptop keystore (since I already paid for a TPM, I didn’t see the need to fork out again for one of the new security dongles; plus the TPM is infinitely scalable in the number of keys, unlike most dongles). The key to making the TPM usable in this form is integration with existing Cryptographic systems (via plugins if they do them). Since openssl has an engine plugin, I’ve already produced an openssl TPM2 engine, patches for gnupg and engine integration patches for openvpn (upstream in 2.5) and openssh as well as a PKC11 exporter (to make file based engine keys exportable as PKCS11 tokens). Note a lot of the patches aren’t strictly TPM patches, they’re actually making openssl engines work in places they previously didn’t. However, the one thing most of the patches that actually touch the TPM have in common is that they have to pick one or other of the available TSSs to operate with. Before describing the TSS agnostic solution, lets look at why these two TSSs exist and what the difference is between them and why you might choose one over the other.

Schizophrenia at the TCG

As I said in the introduction, both TSSs are based on TCG specifications. These standards aren’t ambiguous: they lay out in excruciating detail what the header files are called and what the prototypes and structures have to be. Both TSS implementations are the way they are because they wouldn’t be following the standards if they deviated even slightly. The problem is the standards don’t agree with each other in meaningful ways. For instance the TPM Library standards define every structure in terms of the fundamental unit of TPM data: the TPM2B structure, which defines a 16 bit big endian length followed by a data unit of that length. The TPM Library standards (in Part 4 section 9.10.6) lay out that every TPM2B_X structure shall be a union of a ‘b’ element which is a TPM2B and a ‘t’ element which is the actual structure. However the TPM Software Stack specification eliminates the plain TPM2B so every TPM2B_X structure in the latter specification are not unions, they are simply the ‘t’ form of the structure. This means that although TPM2B_X structures in each specification are byte for byte the same, they are definitionally different when written as C code and can’t be assigned to each other … oops. The TPM Library standard lays out additional structures for an elaborate calling convention for the TPM2_Command interfaces which are completely different from the ESYS_Command interfaces in the TPM Software Stack.

The reason it’s all done this way? well the specifications were built by completely different committees for what the committees saw as separate use cases, so they didn’t see a need to reconcile the differences. As long as the definitions were byte for byte compatible, everything would work out correctly on the wire. The problem was the TPM Library specification was released nearly a decade ahead of the TPM Software Stack specification, so the first TSS created had to follow the former because the latter didn’t exist.

Sessions, HMAC and Encryption

One of the perennial problems of a TPM is that integrity and security of the information going over the wire is the responsibility of the user. However, the encryption and integrity computations involved, particularly the key derivations, are incredibly involved (even though well documented in the TPM Library specification, so naturally everyone would like the TSS to do this. The problem the TPM Secure Stack had is that all the way up to its ESAPI specification, the security and integrity computations were still the responsibility of the user, so it didn’t begin to be useful until ESAPI was finalized a couple of years ago.

The Resource Manager Problem

TPM 2.0 was designed to be far leaner in terms of resources than TPM 1.2, which meant there was a very small limit to the number of sessions and volatile objects it could contain at any one time. This necessitated the use of a “resource manager” to control access otherwise applications would get unexpected out of resource errors. The Intel TSS has its own resource manager. However, the Linux Kernel itself incorporated a resource manager in the TPM device in 4.12 and the IBM TSS avoids the need for its own resource manager by using this, and will, therefore not work correctly on earlier kernel versions.

Inside the IBM TSS

Even though the IBM TSS is based on a solid and easily comprehensible and detailed specification, that specification itself suffers from a couple of defects. The first being it assumes you’re submitting to a physical TPM, so the specification has no functional (library based) submission API for TPM commands, so the IBM TSS had to invent API it called TSS_Execute() which is a way of sending TPM commands directly to the physical TPM over the kernel’s device interfaces. Secondly, the standard contains no routing interfaces (telling it what destination the TPM is on: should it open the /dev/tpmrm0 device or send the commands to the TPM over an IP socket), so this is controlled in the IBM TSS by several environment variables (TPM_INTERFACE_TYPE, which can be either “dev” or “socsim” for either a physical device or a network socket. The endpoints being controlled by TPM_DEVICE for “dev” type, which specifies which device to use, defaulting to /dev/tpmrm0 or TPM_SERVER_NAME and TPM_PLAFORM_PORT for “socsim”).

The invented TSS_Execute() API also does all the encryption and HMAC parts necessary for secure and integrity verified communication with the TPM, so it acts as a fully functional TSS. The main drawback of the IBM TSS is that it stores essential information about the sessions and handles in files which will, by default, be dropped into the local directory. Most users of the IBM TSS have to set TPM_DATA_DIR to be a specially created directory under /tmp to avoid leaving messy artifacts in users home directories.

Inside the Intel TSS

The TPM Software Stack consists of a large number of different specifications, including the resource manager (which is now unnecessary for kernels above 4.12) the TCTI which specifies the routing information for the TPM. It turns out that even in the Intel TSS, environment variables are the most convenient form to specify this information but, unfortunately, the name of the environment variable has been left up to each use case instead of being standardised in the library meaning you’ll have to consult the man page to figure out what it is. The next set of standards: SAPI and ESAPI define functional interfaces to the TPM with one submission API for each command and additionally a corresponding ..._Async()/..._Finish() pair for asynchronous programming. The only real difference between SAPI and ESAPI is that the latter also does the necessary session cryptography for security and integrity, so it’s pretty much the only usable interface for TPM commands. Unfortunately, the ESAPI interface, as constructed by the TCG, has several cases of premature abstraction the worst of which is a separate abstraction for the TPM handle interface which lives only as long as the lifetime of the connection object and which necessitates multiple conversions to and from internal handle objects if your session or object lives longer than the connection (which can be the case).

There is one final wrinkle is that in the handle abstraction, ESAPI has no API for retrieving the real TPM handle. I’d always wondered why the Intel TSS tpm2 tools always saved the objects they create to a context instead of simply returning the handle to them, but this is the reason: without the ability to transform an internal handle to an external one, you either save the context or let the object die when the connection terminates. This problem is one forced by the ESAPI standard, but eventually it became enough of a problem that the Intel TSS introduced its own additional API to remedy.

The other major difference between the Intel and IBM TSSs is memory handling for returned results: The IBM TSS requires pre-allocated structures whereas the Intel TSS insists on allocation on return. It looks like the Intel TSS should be able to tell if the return pointer is allocated or NULL, but right at the moment it always allocates and overwrites the pointer.

Constructing a unifying Interface for both the IBM and Intel TSSs

In essence the process for converting something that runs with the IBM TSS to being TSS Agnostic is a fairly simple three step process which I’ll illustrate by reference to the openssl tpm2 engine which has already been converted:

  1. Hide the structural differences by inserting a set of macros: VAL() and VAL_2B() which hide most of the TCG induced structure schizophrenia.
  2. Convert the API call structure to be functional instead of via a single TSS_Execute() call. This is quite involved so I did it by adding tpm2_Function() wrappers for each specific invocation.
  3. Introduce the correct premature abstraction for internal and external representation of handles. This was the nastiest step for me because handles are stored in long lived engine structures, and the internal and external representations are both forms of uint32_t even in ESAPI (meaning the compiler won’t complain if you assign one to the other) so it was incredibly painful to get this conversion correct.

Once this is done, the remaining step was to introduce a header which did the impedance matching between the Intel and IBM TSSs and an autoconf macro to detect which TSS is installed and the resulting configure and compile just works. The resulting code will now build and run under either TSS. I should point out that the Intel TSS is missing several helper routines, but these are added into the intel-tss.h header file by copying the from the original IBM TSS. Finally an autoconf check is added to look for the missing internal to external handle transform, and everything is ready to go.

It does seem like it would be easier to port an existing Intel TSS application to the IBM TSS, since points 2 and 3 will already be sorted out. However, all the major TSS library using applications are IBM TSS based, so I haven’t actually been able to verify this.

Remaining Problems and Anomalies

The biggest remaining issue was the test scripts. The openssl TPM2 engine has 27 of them all told, all designed to check the engine function by invoking it via openssl when connected to a software TPM. These scripts are all highly dependent on the IBM TSS command line binaries and the Intel TSS versions seem to be very unstable in terms of argument structure making it pretty much impossible to convert, so I elected finally to have the tests run only if the IBM TSS CLI is installed. The next problem was that the Intel TSS version of the engine didn’t actually pass all the tests. However this was quickly narrowed down to a bug in the Intel TSS when using bound sessions on the NULL seed.

The sole remaining issue is a curious performance anomaly. When running time make check with the IBM TSS, the result is:

real 0m6.100s
user 0m2.827s
sys 0m0.822s

and the same command with the Intel TSS (running one fewer test and skipping the NULL seed) is:

real	0m10.948s
user	0m6.822s
sys	0m0.859s

Showing that the Intel TSS is nearly twice as slow as the IBM one with most of the time differential being user time. Since the tests use a software TPM which can perform the cryptographic operations at the speed of the main CPU, this is showing some type of issue with the command transmission system of the Intel TSS, likely having to do with the fact that most applications use synchronous TPM operations (the engine certainly does) but in the Intel TSS, the synchronous operations are implemented as the corresponding asynchronous pair. Regardless of the root cause, this is unlikely to be a problem with real world TPM crypto where the time taken for any operation will be dominated by the slowness of the physical TPM.

Conclusion

The TSS agnostic scheme adopted by the openssl TPM2 engine should be easily adaptable for all the other non-engine TPM code bases, and thus should pave the way for users not having to choose between applications which only support the Intel or IBM TSSs and can choose to install the best supported one on their distribution. The next steps are to investigate adapting this infrastructure to the existing gnupg patches (done and upstream) and also see if it can be used to solve the gnutls conundrum over supporting TPM based keys.