Papering Over our TPM 2.0 TSS Divisions

For years I’ve been hoping that the Trusted Computing Group (TCG) based IBM and Intel TSS (TCG Software Stack) would simply integrate with one another into a single package. The rationale is pretty simple: the Intel TSS is already quite a large collection of libraries so adding one more (the IBM TSS has a single library) wouldn’t be too much of a burden. Both TSSs are based on TCG specifications, except that the IBM TSS is based on the TPM 2.0 Library Specification and the Intel TSS is based on the TPM Software Stack (also, not at all confusingly, abbreviated TSS). There’s actually very little overlap between these specifications so co-existence seems very reasonable. Before we get into the stories of these two stacks and what they do, I should confess my biases: while I’ve worked with the TCG over the years, I’ve always harboured the view that the complete lack of adoption of TPM 2.0’s predecessor (TPM 1.2) was because of the hugely complicated nature of the TCG mandated software stack, which was implemented in Linux by trousers. It is my firm belief that the complexity of the API led to the lack of uptake, even though I made several efforts over the years to make use of it.

My primary interest in the TPM has been as a secure laptop keystore (since I already paid for a TPM, I didn’t see the need to fork out again for one of the new security dongles; plus the TPM is infinitely scalable in the number of keys, unlike most dongles). The key to making the TPM usable in this form is integration with existing cryptographic systems (via plugins if they do them). Since openssl has an engine plugin, I’ve already produced an openssl TPM2 engine, patches for gnupg and engine integration patches for openvpn (upstream in 2.5) and openssh as well as a PKCS11 exporter (to make file based engine keys exportable as PKCS11 tokens). Note a lot of the patches aren’t strictly TPM patches, they’re actually making openssl engines work in places they previously didn’t. However, the one thing most of the patches that actually touch the TPM have in common is that they have to pick one or other of the available TSSs to operate with. Before describing the TSS agnostic solution, let’s look at why these two TSSs exist, what the difference is between them and why you might choose one over the other.

Schizophrenia at the TCG

As I said in the introduction, both TSSs are based on TCG specifications. These standards aren’t ambiguous: they lay out in excruciating detail what the header files are called and what the prototypes and structures have to be. Both TSS implementations are the way they are because they wouldn’t be following the standards if they deviated even slightly. The problem is the standards don’t agree with each other in meaningful ways. For instance the TPM Library standards define every structure in terms of the fundamental unit of TPM data: the TPM2B structure, which defines a 16 bit big endian length followed by a data unit of that length. The TPM Library standards (in Part 4 section 9.10.6) lay out that every TPM2B_X structure shall be a union of a ‘b’ element which is a TPM2B and a ‘t’ element which is the actual structure. However the TPM Software Stack specification eliminates the plain TPM2B, so the TPM2B_X structures in the latter specification are not unions; they are simply the ‘t’ form of the structure. This means that although TPM2B_X structures in each specification are byte for byte the same, they are definitionally different when written as C code and can’t be assigned to each other … oops. The TPM Library standard also lays out additional structures for an elaborate calling convention for the TPM2_Command interfaces which are completely different from the ESYS_Command interfaces in the TPM Software Stack.
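
To make the mismatch concrete, here is a minimal sketch of the two shapes for a single digest type. The type names (LIB_..., TSS_...) and the buffer size are made up for illustration and are not the names either specification uses; only the union-of-‘b’-and-‘t’ versus plain-‘t’ shape comes from the text above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DIGEST_MAX 64 /* illustrative size, not taken from either spec */

/* The generic TPM2B: a 16 bit length followed by that many bytes. */
typedef struct {
    uint16_t size;
    uint8_t  buffer[1];
} LIB_TPM2B;

/* TPM Library style: a union of the generic 'b' form and the real 't' form. */
typedef union {
    struct {
        uint16_t size;
        uint8_t  buffer[DIGEST_MAX];
    } t;
    LIB_TPM2B b;
} LIB_TPM2B_DIGEST;

/* TPM Software Stack style: just the 't' form, with no union around it. */
typedef struct {
    uint16_t size;
    uint8_t  buffer[DIGEST_MAX];
} TSS_TPM2B_DIGEST;

/* Same bytes in memory, so the two sizes must agree... */
static_assert(sizeof(LIB_TPM2B_DIGEST) == sizeof(TSS_TPM2B_DIGEST),
              "layouts differ");

/* ...but the types are unrelated in C, so `*dst = *src` would not compile.
 * Copying the bytes is the only way across. */
void lib_to_tss(TSS_TPM2B_DIGEST *dst, const LIB_TPM2B_DIGEST *src)
{
    memcpy(dst, src, sizeof(*dst));
}
```

Every access in the first style goes through .t or .b, while the second style uses the members directly, which is why source written for one will not compile against the other.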

The reason it’s all done this way? Well, the specifications were built by completely different committees for what the committees saw as separate use cases, so they didn’t see a need to reconcile the differences. As long as the definitions were byte for byte compatible, everything would work out correctly on the wire. The problem was the TPM Library specification was released nearly a decade ahead of the TPM Software Stack specification, so the first TSS created had to follow the former because the latter didn’t exist.

Sessions, HMAC and Encryption

One of the perennial problems of a TPM is that integrity and security of the information going over the wire is the responsibility of the user. However, the encryption and integrity computations involved, particularly the key derivations, are incredibly involved (even though well documented in the TPM Library specification), so naturally everyone would like the TSS to do this. The problem the TPM Software Stack had is that all the way up to its ESAPI specification, the security and integrity computations were still the responsibility of the user, so it didn’t begin to be useful until ESAPI was finalized a couple of years ago.

The Resource Manager Problem

TPM 2.0 was designed to be far leaner in terms of resources than TPM 1.2, which meant there was a very small limit to the number of sessions and volatile objects it could contain at any one time. This necessitated the use of a “resource manager” to control access, otherwise applications would get unexpected out of resource errors. The Intel TSS has its own resource manager. However, the Linux Kernel itself incorporated a resource manager in the TPM device in 4.12 and the IBM TSS avoids the need for its own resource manager by using this, and will, therefore, not work correctly on earlier kernel versions.

Inside the IBM TSS

Even though the IBM TSS is based on a solid and easily comprehensible and detailed specification, that specification itself suffers from a couple of defects. The first is that it assumes you’re submitting to a physical TPM, so the specification has no functional (library based) submission API for TPM commands, and the IBM TSS had to invent an API it called TSS_Execute(), which is a way of sending TPM commands directly to the physical TPM over the kernel’s device interfaces. Secondly, the standard contains no routing interfaces (telling it what destination the TPM is on: should it open the /dev/tpmrm0 device or send the commands to the TPM over an IP socket), so this is controlled in the IBM TSS by several environment variables. TPM_INTERFACE_TYPE can be either “dev” or “socsim”, for a physical device or a network socket respectively. The endpoint is controlled by TPM_DEVICE for the “dev” type, which specifies which device to use and defaults to /dev/tpmrm0, or by TPM_SERVER_NAME and TPM_PLATFORM_PORT for “socsim”.
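
The routing decision described above can be sketched as a small selection function. This is an illustration of the idea only, not the IBM TSS code: the function and type names are hypothetical, and two of the defaults (treating an unset interface type as “dev”, and “localhost” as the server name) are assumptions. Only the /dev/tpmrm0 default comes from the text.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum tpm_route_type { ROUTE_DEV, ROUTE_SOCSIM, ROUTE_UNKNOWN };

struct tpm_route {
    enum tpm_route_type type;
    const char *endpoint;   /* device path or server name */
};

/* Use `v` if it is set and non-empty, otherwise the fallback. */
static const char *or_default(const char *v, const char *fallback)
{
    assert(fallback != NULL);
    return (v && *v) ? v : fallback;
}

/* Pick a route from the three settings; NULL means "not set". */
struct tpm_route tpm_route_select(const char *type, const char *device,
                                  const char *server)
{
    struct tpm_route r = { ROUTE_UNKNOWN, NULL };

    /* Assumption: an unset interface type is treated as "dev". */
    type = or_default(type, "dev");

    if (strcmp(type, "dev") == 0) {
        r.type = ROUTE_DEV;
        r.endpoint = or_default(device, "/dev/tpmrm0");
    } else if (strcmp(type, "socsim") == 0) {
        r.type = ROUTE_SOCSIM;
        /* Assumption: "localhost" as the default server name. */
        r.endpoint = or_default(server, "localhost");
    }
    return r;
}

/* Convenience wrapper reading the settings from the environment. */
struct tpm_route tpm_route_from_env(void)
{
    return tpm_route_select(getenv("TPM_INTERFACE_TYPE"),
                            getenv("TPM_DEVICE"),
                            getenv("TPM_SERVER_NAME"));
}
```

Keeping the selection separate from the getenv() calls makes the logic easy to check without touching the process environment.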

The invented TSS_Execute() API also does all the encryption and HMAC parts necessary for secure and integrity verified communication with the TPM, so it acts as a fully functional TSS. The main drawback of the IBM TSS is that it stores essential information about the sessions and handles in files which will, by default, be dropped into the local directory. Most users of the IBM TSS have to set TPM_DATA_DIR to be a specially created directory under /tmp to avoid leaving messy artifacts in users’ home directories.
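
One way an application might do this before making any TSS calls is sketched below. The helper name and the directory template are hypothetical choices; mkdtemp() and setenv() are the standard POSIX calls.

```c
#define _POSIX_C_SOURCE 200809L /* for mkdtemp() and setenv() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Create a private scratch directory under /tmp and point TPM_DATA_DIR at it.
 * Returns 0 on success, -1 on failure. `buf` receives the directory name. */
int tpm_use_private_data_dir(char *buf, size_t len)
{
    assert(buf != NULL);

    /* The template name is an illustrative choice; mkdtemp() replaces
     * the trailing XXXXXX with a unique suffix. */
    int n = snprintf(buf, len, "/tmp/tpm-data-XXXXXX");
    if (n < 0 || (size_t)n >= len)
        return -1;              /* buffer too small for the template */

    if (mkdtemp(buf) == NULL)   /* creates the directory with mode 0700 */
        return -1;

    return setenv("TPM_DATA_DIR", buf, 1);
}
```

The caller is still responsible for removing the directory and its contents when it is finished with the TPM.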

Inside the Intel TSS

The TPM Software Stack consists of a large number of different specifications, including the resource manager (which is now unnecessary for kernels above 4.12) and the TCTI, which specifies the routing information for the TPM. It turns out that even in the Intel TSS, environment variables are the most convenient form to specify this information but, unfortunately, the name of the environment variable has been left up to each use case instead of being standardised in the library, meaning you’ll have to consult the man page to figure out what it is. The next set of standards, SAPI and ESAPI, define functional interfaces to the TPM with one submission API for each command and additionally a corresponding ..._Async()/..._Finish() pair for asynchronous programming. The only real difference between SAPI and ESAPI is that the latter also does the necessary session cryptography for security and integrity, so it’s pretty much the only usable interface for TPM commands. Unfortunately, the ESAPI interface, as constructed by the TCG, has several cases of premature abstraction, the worst of which is a separate abstraction for the TPM handle interface which lives only as long as the lifetime of the connection object and which necessitates multiple conversions to and from internal handle objects if your session or object lives longer than the connection (which can be the case).
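
The shape of a command with a matching ..._Async()/..._Finish() pair can be sketched with a mock. Everything here (mock_ctx, the mock_cmd functions and the MOCK_ return codes) is hypothetical and stands in for the real connection object and commands; it only shows how a blocking call relates to the asynchronous pair.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a connection; not the real ESAPI context. */
struct mock_ctx {
    int  polls_left;  /* how many times finish reports "not ready yet" */
    int  result;      /* the value the mock "TPM" will eventually return */
    bool pending;     /* true between async and a successful finish */
};

enum { MOCK_OK = 0, MOCK_TRY_AGAIN = 1, MOCK_BAD_SEQUENCE = 2 };

/* Start a command: returns immediately without waiting for the response. */
int mock_cmd_async(struct mock_ctx *ctx)
{
    assert(ctx != NULL);
    if (ctx->pending)
        return MOCK_BAD_SEQUENCE;
    ctx->pending = true;
    return MOCK_OK;
}

/* Collect the response, or report that it has not arrived yet. */
int mock_cmd_finish(struct mock_ctx *ctx, int *out)
{
    assert(ctx != NULL && out != NULL);
    if (!ctx->pending)
        return MOCK_BAD_SEQUENCE;
    if (ctx->polls_left > 0) {
        ctx->polls_left--;
        return MOCK_TRY_AGAIN;
    }
    ctx->pending = false;
    *out = ctx->result;
    return MOCK_OK;
}

/* A blocking form built from the pair: start, then retry until done. */
int mock_cmd(struct mock_ctx *ctx, int *out)
{
    int rc = mock_cmd_async(ctx);
    if (rc != MOCK_OK)
        return rc;
    do {
        rc = mock_cmd_finish(ctx, out);
    } while (rc == MOCK_TRY_AGAIN);
    return rc;
}
```

An application that wants to overlap TPM work with something else calls the two halves itself; one that doesn’t simply calls the blocking form.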

One final wrinkle is that, in the handle abstraction, ESAPI has no API for retrieving the real TPM handle. I’d always wondered why the Intel TSS tpm2 tools always saved the objects they create to a context instead of simply returning the handle to them, but this is the reason: without the ability to transform an internal handle to an external one, you either save the context or let the object die when the connection terminates. This problem is one forced by the ESAPI standard, but eventually it became enough of a problem that the Intel TSS introduced its own additional API to remedy it.
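
To see why a missing internal to external transform hurts, consider a mock in which each connection owns its own table of handles. The names below are hypothetical and the table is far simpler than anything in a real TSS; the distinct wrapper structs are an illustrative device, not something taken from either TSS or the engine.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_OBJECTS 8

/* Distinct wrapper types: assigning one to the other is a compile error,
 * which two plain uint32_t typedefs would not give you. */
typedef struct { uint32_t v; } internal_handle;  /* valid for one connection */
typedef struct { uint32_t v; } tpm_handle;       /* the value the TPM uses */

/* Hypothetical connection object owning the internal -> TPM mapping. */
struct mock_conn {
    uint32_t tpm_of[MAX_OBJECTS];
    uint32_t used;
};

/* Register a TPM handle with this connection and get an internal one back. */
internal_handle conn_from_tpm(struct mock_conn *c, tpm_handle h)
{
    assert(c->used < MAX_OBJECTS);
    c->tpm_of[c->used] = h.v;
    return (internal_handle){ c->used++ };
}

/* The reverse transform: only meaningful while `c` is still alive. */
tpm_handle conn_to_tpm(const struct mock_conn *c, internal_handle h)
{
    assert(h.v < c->used);
    return (tpm_handle){ c->tpm_of[h.v] };
}
```

In this mock the same internal value names different objects on different connections, so anything that outlives a connection has to be stored in its external form and converted back on the next one.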

The other major difference between the Intel and IBM TSSs is memory handling for returned results: The IBM TSS requires pre-allocated structures whereas the Intel TSS insists on allocation on return. It looks like the Intel TSS should be able to tell if the return pointer is allocated or NULL, but right at the moment it always allocates and overwrites the pointer.
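
The two conventions look like this from the caller’s side. The functions and the mock_digest type are invented for the illustration and are not calls from either TSS; the second function deliberately mirrors the “always allocates and overwrites the pointer” behaviour described above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    uint16_t size;
    uint8_t  buffer[32];
} mock_digest;

/* Caller-allocated style: the caller owns `out` and must supply storage. */
int get_digest_prealloc(mock_digest *out)
{
    assert(out != NULL);
    out->size = 4;
    memcpy(out->buffer, "\x01\x02\x03\x04", 4);
    return 0;
}

/* Callee-allocated style: `*out` is always overwritten with fresh storage,
 * whatever it held before, and the caller must free() it afterwards. */
int get_digest_alloc(mock_digest **out)
{
    assert(out != NULL);
    mock_digest *d = malloc(sizeof(*d));
    if (d == NULL)
        return -1;
    get_digest_prealloc(d);
    *out = d;
    return 0;
}
```

Code ported from the first style to the second has to add a release call on every path, and must not pass in a pointer to storage it still cares about.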

Constructing a unifying Interface for both the IBM and Intel TSSs

In essence the process for converting something that runs with the IBM TSS to being TSS Agnostic is a fairly simple three step process which I’ll illustrate by reference to the openssl tpm2 engine which has already been converted:

  1. Hide the structural differences by inserting a set of macros: VAL() and VAL_2B() which hide most of the TCG induced structure schizophrenia.
  2. Convert the API call structure to be functional instead of via a single TSS_Execute() call. This is quite involved so I did it by adding tpm2_Function() wrappers for each specific invocation.
  3. Introduce the correct premature abstraction for internal and external representation of handles. This was the nastiest step for me because handles are stored in long lived engine structures, and the internal and external representations are both forms of uint32_t even in ESAPI (meaning the compiler won’t complain if you assign one to the other) so it was incredibly painful to get this conversion correct.
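
Step 1 can be sketched as follows. This is a guess at the general shape of such a macro using mock types, not a copy of the engine’s headers: MOCK_UNION_LAYOUT is a hypothetical compile time switch standing in for TSS detection, and VAL() is left out because its exact role isn’t spelled out above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Mock digest type in each of the two layouts; pick one at compile time.
 * MOCK_UNION_LAYOUT is a hypothetical switch standing in for TSS detection. */
#ifdef MOCK_UNION_LAYOUT
typedef struct { uint16_t size; uint8_t buffer[1]; } MOCK_TPM2B;
typedef union {
    struct { uint16_t size; uint8_t buffer[32]; } t;
    MOCK_TPM2B b;
} MOCK_TPM2B_DIGEST;
#define VAL_2B(X, MEMBER) ((X).t.MEMBER)   /* reach through the 't' form */
#else
typedef struct { uint16_t size; uint8_t buffer[32]; } MOCK_TPM2B_DIGEST;
#define VAL_2B(X, MEMBER) ((X).MEMBER)     /* nothing to reach through */
#endif

/* Code written against the macro compiles unchanged with either layout. */
void digest_set(MOCK_TPM2B_DIGEST *d, const uint8_t *data, uint16_t len)
{
    assert(len <= sizeof(VAL_2B(*d, buffer)));
    VAL_2B(*d, size) = len;
    memcpy(VAL_2B(*d, buffer), data, len);
}
```

Building once with -DMOCK_UNION_LAYOUT and once without exercises both layouts with the same digest_set() source.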

Once this was done, the remaining step was to introduce a header which did the impedance matching between the Intel and IBM TSSs and an autoconf macro to detect which TSS is installed, after which the resulting configure and compile just works. The resulting code will now build and run under either TSS. I should point out that the Intel TSS is missing several helper routines, but these are added into the intel-tss.h header file by copying them from the original IBM TSS. Finally an autoconf check is added to look for the missing internal to external handle transform, and everything is ready to go.

It does seem like it would be easier to port an existing Intel TSS application to the IBM TSS, since points 2 and 3 will already be sorted out. However, all the major TSS library using applications are IBM TSS based, so I haven’t actually been able to verify this.

Remaining Problems and Anomalies

The biggest remaining issue was the test scripts. The openssl TPM2 engine has 27 of them all told, all designed to check the engine function by invoking it via openssl when connected to a software TPM. These scripts are all highly dependent on the IBM TSS command line binaries and the Intel TSS versions seem to be very unstable in terms of argument structure making it pretty much impossible to convert, so I elected finally to have the tests run only if the IBM TSS CLI is installed. The next problem was that the Intel TSS version of the engine didn’t actually pass all the tests. However this was quickly narrowed down to a bug in the Intel TSS when using bound sessions on the NULL seed.

The sole remaining issue is a curious performance anomaly. When running time make check with the IBM TSS, the result is:

real 0m6.100s
user 0m2.827s
sys 0m0.822s

and the same command with the Intel TSS (running one fewer test and skipping the NULL seed) is:

real	0m10.948s
user	0m6.822s
sys	0m0.859s

Showing that the Intel TSS is nearly twice as slow as the IBM one with most of the time differential being user time. Since the tests use a software TPM which can perform the cryptographic operations at the speed of the main CPU, this is showing some type of issue with the command transmission system of the Intel TSS, likely having to do with the fact that most applications use synchronous TPM operations (the engine certainly does) but in the Intel TSS, the synchronous operations are implemented as the corresponding asynchronous pair. Regardless of the root cause, this is unlikely to be a problem with real world TPM crypto where the time taken for any operation will be dominated by the slowness of the physical TPM.
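
For the record, the arithmetic behind “nearly twice as slow” and “most of the time differential being user time” works out to a slowdown of roughly 1.8x, with about 82% of the extra wall clock time appearing as extra user time. A small sketch of the calculation (the struct and function names are just for illustration):

```c
#include <assert.h>

struct timing { double real, user, sys; };

/* Ratio of wall clock times: how many times slower `b` is than `a`. */
double slowdown(struct timing a, struct timing b)
{
    assert(a.real > 0.0);
    return b.real / a.real;
}

/* Fraction of the extra wall clock time that shows up as extra user time. */
double user_share_of_gap(struct timing a, struct timing b)
{
    assert(b.real > a.real);
    return (b.user - a.user) / (b.real - a.real);
}
```

Feeding in the two sets of figures above (6.100/2.827/0.822 for the IBM TSS and 10.948/6.822/0.859 for the Intel TSS) gives those two results; the system time barely moves between the runs.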

Conclusion

The TSS agnostic scheme adopted by the openssl TPM2 engine should be easily adaptable for all the other non-engine TPM code bases, and thus should pave the way for users not having to choose between applications which only support the Intel or IBM TSSs and can choose to install the best supported one on their distribution. The next steps are to investigate adapting this infrastructure to the existing gnupg patches (done and upstream) and also see if it can be used to solve the gnutls conundrum over supporting TPM based keys.

13 thoughts on “Papering Over our TPM 2.0 TSS Divisions”

  1. Pingback: Activitypub announce from James Bottomley

  2. foofoo

    Hmm. One question though: which of the two libraries is the better one to use in new programs? Any opinion on that? I.e. which one is more commonly used, easier to handle, better tested and so on?

    1. jejb Post author

      Anecdotal evidence says the IBM one, which I’ve been using for the past five years. The problem is that neither is in wide use for cryptographic applications, so a population of 1 (me) is hardly a good study. As I said in the blog, as soon as I ran the existing crypto unit tests (which the IBM TSS passes) over the Intel TSS, it failed. However, the Intel TSS has now fixed all the issues, so I’d say if you’re building this for an older distro, you need to use the IBM one; if it’s for a future release which has the latest Intel TSS (the regression test failure issue should make it into version 3.0.4), then it should work fine.

      1. foofoo

        Hmm, but it appears the IBM one is not packaged for Fedora/Red Hat distros at all, and popcon on Debian suggests it’s almost unused compared to the Intel one.

        1. jejb Post author

          It’s packaged in openSUSE too, which is the desktop distribution I use. Pretty much all of the TPM libraries are unused even if they are installed. I think the Intel TSS gets some workout from the attestation servers, like Keylime, but they’re experimental too. As far as I can tell I’m pretty much the only person using TPM key files for my ssh/gnupg keys on a regular basis; I’m sure there will be plenty of teething troubles to be found once it starts being used for key protection on a regular basis. The only data point I can give you is that the IBM TSS is the one I use and it’s been stable for five years now.

  3. Erik

    Hi,

    “There is one final wrinkle is that in the handle abstraction, ESAPI has no API for retrieving the real TPM handle. I’d always wondered why the Intel TSS tpm2 tools always saved the objects they create to a context instead of simply returning the handle to them, but this is the reason: without the ability to transform an internal handle to an external one, you either save the context or let the object die when the connection terminates. This problem is one forced by the ESAPI standard, but eventually it became enough of a problem that the Intel TSS introduced its own additional API to remedy.”

    What is the alternative? or who should be responsible for flushing unused handles?
    For example, the TPM has limits on how many live sessions there can be (at least for authorization) which can be reached if the handles aren’t flushed (this includes saved contexts of the sessions as well).
    With that said I’m a bit curious in which cases you need to go from an ESYS_TR to a numeric TPM2 handle?

    ” However, all the major TSS library using applications are IBM TSS based, so I haven’t actually been able to verify this.”
    What applications are you thinking of in this case? I’m asking because my experience has very much been the reverse and it seems I missed some cool/useful applications out there.

    You also mentioned patches for openssh to use an openssl engine for keys. I have seen that being discussed by others as well, but I’m interested to understand the use of openssl engines instead of using an external ssh agent (using HostKeyAgent as I guess it’s mostly related to servers). What kind of pros and cons do you see with those two solutions?

    1. jejb Post author

      What is the alternative? or who should be responsible for flushing unused handles?

      The handle abstraction doesn’t seem to be about flushing … even using ESAPI you’re responsible for managing the loaded handles.

      With that said I’m a bit curious in which cases you need to go from an ESYS_TR to a numeric TPM2 handle?

      As the article said: any time the use lives beyond the context or any time the handle is exposed externally or might be used by a different context.

      What applications are you thinking of in this case? I’m asking because my experience has very much been the reverse and it seems I missed some cool/useful applications out there.

      Well, I only know about what I’ve done … I haven’t found any other code I could use based on either the IBM or Intel TSS. What I use mostly is the engine, for openssh and openvpn. The openvpn has my keys tied to secure boot policy not password. I use openssh via the posted patches and the gnome ssh-agent based keyring. All my gpg keys are TPM based and, since I have my own secure boot keys, my kernel signing process is all TPM based. I also run a couple of CAs which have TPM based roots. And I have a couple of client web certificates exported via the pkcs#11 exporter.

      interested to understand the use of openssl engines instead of using an external ssh agent (using HostKeyAgent as I guess it’s mostly related to servers)

      HostKeyAgent is used by an ssh server to serve the host key over an agent. I actually don’t use this at all since my main cloud system doesn’t have a TPM, so it just has a root only unprotected host key as is the norm. I have a ton of client side logins, so I use the engine patches to openssh to use about five client keys via the TPM.

  4. William C Roberts

    There are quite a few misconceptions here; I’ll attempt to address them.

    The statement, “including the resource manager (which is now unnecessary for kernels above 4.12”, is not true. The in-kernel RM is still missing key features, like ungapping.

    In regards to the ESAPI ESYS_TR abstraction, the WHY here is important. It’s meant to prevent the pitfall of users not checking the name of an object. One cannot simply trust the handle value.
    If you want to use a raw handle, it’s best to know the object’s name so you can ensure you’re getting the object you expect. Just using a raw handle number can be dangerous. For instance, if you’re starting an auth session with an existing persistent primary key, you need both the handle and the name to verify you’re starting an auth session with the proper key. Rather than requiring users to pass both bits of data around, the ESYS_TR API handles this for the user via ESYS_TR_Serialize and Deserialize calls.

    Regarding the paragraph around, “There is one final wrinkle is that in the handle abstraction, ESAPI has no API for retrieving the real TPM handle”. The ESAPI spec since r08 (May 2021) provides the API Esys_TR_GetTpmHandle. See: https://trustedcomputinggroup.org/wp-content/uploads/TSS_ESAPI_v1p0_r08_pub.pdf. So it is a standard interface defined by ESAPI.

    However, out of all of the tpm2-software projects, the only consumer was tpm2-pkcs11, and that appears that it can be dropped: https://github.com/tpm2-software/tpm2-pkcs11/pull/669
    If you need that function, it’s likely a red herring that you should do something else.

    To correct this point, “I’d always wondered why the Intel TSS tpm2 tools always saved the objects they create to a context instead of simply returning the handle to them”. The reason is due to resource managers not the ESYS_TR abstraction. This problem existed even when the tools were built with SAPI. When a process exits, the RM will flush objects. So if you want to access your transient object from say a TPM2_Create call to a TPM2_Sign call, when using an RM, you need to either:
    1. Do all commands in one process completely.
    2. Provide a shim between the RM and the tool to keep it open
    3. Context Save/Context Load between command invocations and pass the context file.

    We’ve explored all the options, and item 3 is the approach used by tpm2-tools. It just works, doesn’t need any other additional pieces and allows the tools to stay decomposed into commandlets.
    So the file passed via -c, is the context file and exists because of RM’s.

    Clarifying, “The other major difference between the Intel and IBM TSSs is memory handling for returned results: The IBM TSS requires pre-allocated structures whereas the Intel TSS insists on allocation on return”. Depends on the API. The System API is all pre-allocated by caller, the ESAPI is callee allocated. So you can’t just say TSS, because it depends on the library itself.

    Clarifying, “Showing that the Intel TSS is nearly twice as slow as the IBM one with most of the time differential being user time.”

    Perhaps tpm2-software/tpm2-tss runs more tests? The tpm2-software/tpm2-tss has a *published* 85% code coverage; what’s the IBM TSS’s coverage?

    However, one simply cannot run make check as the performance test. This assumes that the test code between the two projects is identical, which is false. Test codes have different setup and teardown routines as well as differing coverages. This is not comparing apples to apples. To get actual metrics, the performance test code would need to be the same or as close as possible given the API differences.

    Also, it’s not the Intel TSS. Although it may have been initially started by Intel, since that time numerous external companies and individuals have collaborated, with the initial and substantial ESAPI and FAPI contributions coming from Fraunhofer SIT. To celebrate this Open Source community, the organization name is now tpm2-software. Please refer to it as such.

    1. jejb Post author

      The statement, “including the resource manager (which is now unnecessary for kernels above 4.12”, is not true. The in-kernel RM is still missing key features, like ungapping.

      I think you’ll find regapping is the only missing feature. There is a patch for it but the Intel guy running the project thought it was unnecessary, which is why it’s not upstream. We can certainly incorporate the patch if you make a case for it on the list.

      Clarifying, “The other major difference between the Intel and IBM TSSs is memory handling for returned results: The IBM TSS requires pre-allocated structures whereas the Intel TSS insists on allocation on return”. Depends on the API. The System API is all pre-allocated by caller, the ESAPI is callee allocated. So you can’t just say TSS, because it depends on the library itself.

      Because this is cryptography and requires encryption and HMAC authentication sessions, SAPI isn’t an option, so in all points I’m just talking about the ESAPI API.

      Clarifying, “Showing that the Intel TSS is nearly twice as slow as the IBM one with most of the time differential being user time.”

      Perhaps tpm2-software/tpm2-tss runs more tests? The tpm2-software/tpm2-tss has a *published* 85% code coverage; what’s the IBM TSS’s coverage?

      Actually, I think there’s a misunderstanding here: I’m not running the regression tests that come with the TSS, I’m running the regression tests that come with my engine. Now it’s TSS agnostic it can theoretically run the same test suite for both, which gives a direct comparison, although the Intel TSS has several fewer tests than the IBM TSS because of various failures, which have been reported upstream but which led to some tests being disabled. However, the figures of 2x slower are on my tests even with the IBM TSS running more of them.

      1. William C Roberts

        The one feature I need, is that when a session is context saved, the process exits, that session is not flushed and can be context loaded later. Not sure if that is ungapping or not, but it’s the single feature that keeps me from EOL’ing tpm2-abrmd.

        For the esapi, that was a feature I considered adding: allowing caller allocated buffers by checking the pointer value, if null allocate, else use. But I was concerned existing application code could perhaps not be initializing ptr values, so I didn’t want to make them update for that.

        When you say, “I’m running the regression tests that come with my engine”. Are you referring to your openssl engine?

        1. jejb Post author

          The one feature I need, is that when a session is context saved, the process exits, that session is not flushed and can be context loaded later. Not sure if that is ungapping or not, but it’s the single feature that keeps me from EOL’ing tpm2-abrmd.

          Yes, that’s what causes gapping problems … or in the kernel rm case simply keeping a long lived session while others use the TPM. However, I was forced to admit when questioned that in all my current TPM code none of the sessions is long lived: they all serve a single TPM command, so I couldn’t point to any use case I had for a long lived session and Jarkko wanted to keep the rm code simple.

          When you say, “I’m running the regression tests that come with my engine”. Are you referring to your openssl engine?

          Yes, the suite of tests that comes with the newly TSS agnostic openssl engine.

          1. William C Roberts

            I don’t know why I missed what you were doing with make check, it’s pretty clear.

            But we definitely need that feature for robust command line tooling to pass sessions along between process lifetimes. It seems tpm2-tools and bash have become the programming language of choice for quite a few projects, so this support is crucial in supporting them.

            FYI tpm2-tools is also CLI API stable at 4.0+. Don’t use anything before 4.0, which is in bold on the project page.

  5. William C Roberts

    Additionally, the nested structures introduce:
    1. An unneeded level of nesting for common users. They have to initialize the .t or access via yet another named element. Really the only spot where .b makes sense is for marshaling libraries, which is easy enough to do without it.
    2. Discards compile time bounds checking (-Warray-bounds). accessing through .b over .t.

    From an API perspective, it seemed best to not:
    1. complicate the definitions
    2. make users have to type .t or .b
    3. provide users a foot cannon around -Warray-bounds, see: https://gist.github.com/williamcroberts/0006f4990e19f9a47da897e9aadfc83c

    In the tpm2-software/tpm2-tss code base, it ended up also reducing the LoC (-30) as well as simplifying everything by not needing .t and .b everywhere. The marshalling code didn’t even have to change, see this commit:
    https://github.com/tpm2-software/tpm2-tss/commit/a64c33eb58ff289e9cfb83448ec9e9cbb53dfe3e
