The Mythical Economic Model of Open Source

1 Reply

It has become fashionable today to study open source through the lens of economic benefits to developers and sometimes draw rather alarming conclusions. It has also become fashionable to assume a business model tie and then berate the open source community, or their licences, for lack of leadership when the business model fails. The purpose of this article is to explain, in the first part, the fallacy of assuming any economic tie in open source at all and, in the second part, go on to explain how economics in open source is situational and give an overview of some of the more successful models.

Open Source is a Creative Intellectual Endeavour

All the creative endeavours of humanity, like art, science or even writing code, are often viewed as activities that produce societal benefit. Logically, therefore, the people who engage in them are seen as benefactors of society, but assuming people engage in these endeavours purely to benefit society is mostly wrong. People engage in creative endeavours because it satisfies some deep need within themselves to exercise creativity and solve problems often with little regard to the societal benefit. The other problem is that the more directed and regimented a creative endeavour is, the less productive its output becomes. Essentially to be truly creative, the individual has to be free to pursue their own ideas. The conundrum for society therefore is how do you harness this creativity for societal good if you can’t direct it without stifling the very creativity you want to harness? Obviously society has evolved many models that answer this (universities, benefactors, art incubation programmes, museums, galleries and the like) with particular inducements like funding, collaboration, infrastructure and so on.

Why Open Source development is better than Proprietary

Simply put, the Open Source model, involving huge freedoms to developers to decide direction and great opportunities for collaboration stimulates the intellectual creativity of those developers to a far greater extent than when you have a regimented project plan and a specific task within it. The most creatively deadening job for any engineer is to find themselves strictly bound within the confines of a project plan for everything. This, by the way, is why simply allowing a percentage of paid time for participating in Open Source seems to enhance input to proprietary projects: the liberated creativity has a knock on effect even in regimented development. However, obviously, the goal for any Corporation dependent on code development should be to go beyond the knock on effect and actually employ open source methodologies everywhere high creativity is needed.

What is Open Source?

Open Source has it’s origin in code sharing models, permissive from BSD and reciprocal from GNU. However, one of its great values is the reasons why people do open source aren’t the same reasons why the framework was created in the first place. Today Open Source is a framework which stimulates creativity among developers and helps them create communities, provides economic benefits to corportations (provided they understand how to harness them) and produces a great societal good in general in terms of published reusable code.

Economics and Open Source

As I said earlier, the framework of Open Source has no tie to economics, in the same way things like artistic endeavour don’t. It is possible for a great artist to make money (as Picasso did), but it’s equally possible for a great artist to live all their lives in penury (as van Gough did). The demonstration of the analogy is that trying to measure the greatness of the art by the income of the artist is completely wrong and shortsighted. Developing the ability to exploit your art for commercial gain is an additional skill an artist can develop (or not, as they choose) it’s also an ability they could fail in and in all cases it bears no relation to the societal good their art produces. In precisely the same way, finding an economic model that allows you to exploit open source (either individually or commercially) is firstly a matter of choice (if you have other reasons for doing Open Source, there’s no need to bother) and secondly not a guarantee of success because not all models succeed. Perhaps the easiest way to appreciate this is through the lens of personal history.

Why I got into Open Source

As a physics PhD student, I’d always been interested in how operating systems functioned, but thanks to the BSD lawsuit and being in the UK I had no access to the actual source code. When Linux came along as a distribution in 1992, it was a revelation: not only could I read the source code but I could have a fully functional UNIX like system at home instead of having to queue for time to write up my thesis in TeX on the limited number of department terminals.

After completing my PhD I was offered a job looking after computer systems in the department and my first success was shaving a factor of ten off the computing budget by buying cheap pentium systems running Linux instead of proprietary UNIX workstations. This success was nearly derailed by an NFS bug in Linux but finding and fixing the bug (and getting it upstream into the 1.0.2 kernel) cemented the budget savings and proved to the department that we could handle this new technology for a fraction of the cost of the old. It also confirmed my desire to poke around in the Operating System which I continued to do, even as I moved to America to work on Proprietary software.

In 2000 I got my first Open Source break when the product I’d been working on got sold to a silicon valley startup, SteelEye, whose business plan was to bring High Availability to Linux. As the only person on the team with an Open Source track record, I became first the Architect and later CTO of the company, with my first job being to make the somewhat eccentric Linux SCSI subsystem work for the shared SCSI clusters LifeKeeper then used. Getting SCSI working lead to fund interactions with the Linux community, an Invitation to present on fixing SCSI to the Kernel Summit in 2002 and the maintainership of SCSI in 2003. From that point, working on upstream open source became a fixture of my Job requirements but progressing through Novell, Parallels and now IBM it also became a quality sought by employers.

I have definitely made some money consulting on Open Source, but it’s been dwarfed by my salary which does get a boost from my being an Open Source developer with an external track record.

The Primary Contributor Economic Models

Looking at the active contributors to Open Source, the primary model is that either your job description includes working on designated open source projects so you’re paid to contribute as your day job
or you were hired because of what you’ve already done in open source and contributing more is a tolerated use of your employer’s time, a third, and by far smaller group is people who work full-time on Open Source but fund themselves either by shared contributions like patreon or tidelift or by actively consulting on their projects. However, these models cover existing contributors and they’re not really a route to becoming a contributor because employers like certainty so they’re unlikely to hire someone with no track record to work on open source, and are probably not going to tolerate use of their time for developing random open source projects. This means that the route to becoming a contributor, like the route to becoming an artist, is to begin in your own time.

Users versus Developers

Open Source, by its nature, is built by developers for developers. This means that although the primary consumers of open source are end users, they get pretty much no say in how the project evolves. This lack of user involvement has been lamented over the years, especially in projects like the Linux Desktop, but no real community solution has ever been found. The bottom line is that users often don’t know what they want and even if they do they can’t put it in technical terms, meaning that all user driven product development involves extensive and expensive product research which is far beyond any open source project. However, this type of product research is well within the ability of most corporations, who can also afford to hire developers to provide input and influence into Open Source projects.

Business Model One: Reflecting the Needs of Users

In many ways, this has become the primary business model of open source. The theory is simple: develop a traditional customer focussed business strategy and execute it by connecting the gathered opinions of customers to the open source project in exchange for revenue for subscription, support or even early shipped product. The business value to the end user is simple: it’s the business value of the product tuned to their needs and the fact that they wouldn’t be prepared to develop the skills to interact with the open source developer community themselves. This business model starts to break down if the end users acquire developer sophistication, as happens with Red Hat and Enterprise users. However, this can still be combatted by making sure its economically unfeasible for a single end user to match the breadth of the offering (the entire distribution). In this case, the ability of the end user to become involved in individual open source projects which matter to them is actually a better and cheaper way of doing product research and feeds back into the synergy of this business model.

This business model entirely breaks down when, as in the case of the cloud service provider, the end user becomes big enough and technically sophisticated enough to run their own distributions and sees doing this as a necessary adjunct to their service business. This means that you can no-longer escape the technical sophistication of the end user by pursuing a breadth of offerings strategy.

Business Model Two: Drive Innovation and Standardization

Although venture capitalists (VCs) pay lip service to the idea of constant innovation, this isn’t actually what they do as a business model: they tend to take an innovation and then monetize it. The problem is this model doesn’t work for open source: retaining control of an open source project requires a constant stream of innovation within the source tree itself. Single innovations get attention but unless they’re followed up with another innovation, they tend to give the impression your source tree is stagnating, encouraging forks. However, the most useful property of open source is that by sharing a project and encouraging contributions, you can obtain a constant stream of innovation from a well managed community. Once you have a constant stream of innovation to show, forking the project becomes much harder, even for a cloud service provider with hundreds of developers, because they must show they can match the innovation stream in the public tree. Add to that Standardization which in open source simply means getting your project adopted for use by multiple consumers (say two different clouds, or a range of industry). Further, if the project is largely run by a single entity and properly managed, seeing the incoming innovations allows you to recruit the best innovators, thus giving you direct ownership of most of the innovation stream. In the early days, you make money simply by offering user connection services as in Business Model One, but the ultimate goal is likely acquisition for the talent possesed, which is a standard VC exit strategy.

All of this points to the hypothesis that the current VC model is wrong. Instead of investing in people with the ideas, you should be investing in people who can attract and lead others with ideas

Other Business Models

Although the models listed above have proven successful over time, they’re by no means the only possible ones. As the space of potential business models gets explored, it could turn out they’re not even the best ones, meaning the potential innovation a savvy business executive might bring to open source is newer and better business models.

Conclusions

Business models are optional extras with open source and just because you have a successful open source project does not mean you’ll have an equally successful business model unless you put sufficient thought into constructing and maintaining it. Thus a successful open source start up requires three elements: A sound business model, or someone who can evolve one, a solid community leader and manager and someone with technical ability in the problem space.

If you like working in Open Source as a contributor, you don’t necessarily have to have a business model at all and you can often simply rely on recognition leading to opportunities that provide sufficient remuneration.

Although there are several well known business models for exploiting open source, there’s no reason you can’t create your own different one but remember: a successful open source project in no way guarantees a successful business model.

A Roadmap for Eliminating Patents in Open Source

9 Replies

The realm of Software Patents is often considered to be a fairly new field which isn’t really influenced by anything else that goes on in the legal lansdcape. In particular there’s a very old field of patent law called exhaustion which had, up until a few years ago, never been applied to software patents. This lack of application means that exhaustion is rarely raised as a defence against infringement and thus it is regarded as an untested strategy. Van Lindberg recently did a FOSDEM presentation containing interesting ideas about how exhaustion might apply to software patents in the light of recent court decisions. The intriguing possibility this offers us is that we may be close to an enforceable court decision (at least in the US) that would render all patents in open source owned by community members exhausted and thus unenforceable. The purpose of this blog post is to explain the current landscape and how we might be able to get the necessary missing court decisions to make this hope a reality.

What is Patent Exhaustion?

Patent law is ancient, going back to Greece in around 500BC. However, every legal system has been concerned that patent holders, being an effective monopoly with the legal right to exclude others, did not abuse that monopoly position. This lead to the concept that if you used your monopoly power to profit, you should only be able to do it once for the same item so that absolute property rights couldn’t be clouded by patents. This leads to something called the exhaustion doctrine: so if Alice holds a patent on some item which she sells to Bob and Bob later sells the same item to Charlie, Alice can’t force Bob or Charlie to give her a part of their sale proceeds in exchange for her allowing Charlie to practise the patent on the item. The patent rights are said to be exhausted with the sale from Alice to Bob, so there are no patent rights left to enforce on Charlie. The exhaustion doctrine has since been expanded to any authorized transfer, even if no money changes hands (so if Alice simply gave Bob the item instead of selling it, the patent still exhausts at that transaction and Bob is still free to give or sell the item to Charlie without interference from Alice).

Of course, modern US patent rights have been around now for two centuries and in that time manufacturers have tried many ingenious schemes to get around the exhaustion doctrine profitably, all of which have so far failed in the courts, leading to quite a wealth of case law on the subject. The most interesting recent example (Lexmark v Impression) was over whether a patent holder could use their patent power to enforce any onward conditions at all for which the US Supreme Court came to the conclusive finding: they can’t and goes on to say that all patent rights in the item terminate in the first authorized transfer. That doesn’t mean no post sale conditions can be imposed, they can by contract or licence or other means, it just means post sale conditions can’t be enforced by patent actions. This is the bind for Lexmark: their sales contracts did specify that empty cartridges couldn’t be resold, so their customers violated that contract by selling the cartridges to Impression to refill and resell. However, that contract was between Lexmark and the customer not Lexmark and Impression, so absent patent remedies Lexmark has no contractual case against Impression, only against its own customers.

Can Exhaustion apply if Software isn’t actually sold?

The exhaustion doctrine actually has an almost identical equivalent for copyright called the First Sale doctrine. Back when software was being commercialized, no software distributor liked the idea that copyright in software exhausts after it is sold, so the idea of licensing instead of selling software was born, which is why you always get that end user licence agreement for software you think you bought. However, this makes all software (including open source) a very tricky for patent exhaustion because there’s no first sale to exhaust the rights.

The idea that Exhaustion didn’t have to involve an exchange of something (so became authorized transfer instead of first sale) in US law is comparatively recent, dating to a 2013 decision LifeScan v Shasta where one point won on appeal was that giving away devices did exhaust the patent. The idea that authorized transfer could extend to software downloads really dates to Cascades v Samsung in 2014.

The bottom line is that exhaustion does apply to software and downloading is an authorized transfer within the meaning of the Exhaustion Doctrine.

The Implications of Lexmark v Impression for Open Source

The precedent for Open Source is quite clear: Patents cannot be used to impose onward conditions that the copyright licence doesn’t. For instance the Open Air Interface 5G alliance public licence attempts just such a restriction in clause 3 “Grant of Patent License” where it tries to restrict the grant to being only if you use the source for “study and research” otherwise you need an additional patent licence from OAI. Lexmark v Impressions makes that clause invalid in the licence: once you obtain open source under the OAI licence, the OAI patents exhaust at that point and there are no onward patent rights left to enforce. This means that source distributed under OAI can be reused under the terms of the copyright licence (which is permissive) without any fear of patent restrictions. Now OAI can still amend its copyright licence to impose the field of use restrictions and enforce them via copyright means, it just can’t use patents to do so.

FRAND and Open Source

There have recently been several attempts to claim that FRAND patent enforcement and Open Source licensing can be compatible, or more specifically a FRAND patent pool holder like a Standards Development Organization can both produce an Open Source reference implementation and still collect patent Royalties. This looks to be wrong, however; the Supreme Court decision is clear: once a FRAND Patent pool holder distributes any code, that distribution is an authorized transfer within the meaning of the first sale doctrine and all FRAND pool patents exhaust at that point. The only way to enforce the FRAND royalty payments after this would be in the copyright licence of the code and obviously such a copyright licence, while legal, would not be remotely an Open Source licence.

Exhausting Patents By Distribution

The next question to address is could patents become exhausted simply because the holder distributed Open Source code in any form? As I said before, there is actually a case on point for this as well: Cascades v Samsung. In this case, Cascades tried to sue Samsung for violating a patent on the Dalvik JIT engine in AOSP. Cascades claimed they had licensed the patent to Google for a payment only for use in Google products. Samsung claimed exhaustion because Cascades had licensed the patent to Google and Samsung downloaded AOSP from Google. The court agreed with this and dismissed the infringement action. Case closed, right? Not so fast: it turns out Cascades raised a rather silly defence to Samsung’s claim of exhaustion, namely that the authorized transfer under the exhaustion doctrine didn’t happen until Samsung did the download from Google, so they were still entitled to enforce the Google products only restriction. As I said in the beginning courts have centuries of history with manufacturers trying to get around the exhaustion doctrine and this one crashed and burned just like all the others. However, the question remains: if Cascades had raised a better defence to the exhaustion claim, would they have prevailed?

The defence Cascades could have raised is that Samsung didn’t just download code from Google, they also copied the code they downloaded and those copies should be covered under the patent right to exclude manufacture, which didn’t exhaust with the download. To illustrate this in the Alice, Bob, Charlie chain: Alice sells an item to Bob and thus exhausts the patent so Bob can sell it on to Charlie unencumbered. However that exhaustion does not give either Bob or Charlie the right to manufacture a new copy of the item and sell it to Denise because exhaustion only applies to the same item Alice sold, not to a newly manufactured copy of that item.

The copy as new manufacture defence still seems rather vulnerable on two grounds: first because Samsung could download any number of exhausted copies from Google, so what’s the difference between them downloading ten copies and them downloading one copy and then copying it themselves nine times. Secondly, and more importantly, Cascades already had a remedy in copyright law: their patent licence to Google could have required that the AOSP copyright licence be amended not to allow copying of the source code by non-Google entities except on payment of royalties to Cascades. The fact that Cascades did not avail themselves of this remedy at the time means they’re barred from reclaiming it now via patent action.

The bottom line is that distribution exhausts all patents reading on the code you distribute is a very reasonable defence to maintain in a patent infringement lawsuit and it’s one we should be using much more often.

Exhaustion by Contribution

This is much more controversial and currently has no supporting case law. The idea is that Distribution can occur even with only incremental updates on the existing base (git pull to update code, say), so if delta updates constitute an authorized transfer under the exhaustion doctrine, then so must a patch based contribution, being a delta update from a contributor to the project, be an authorized transfer. In which case all patents which read on the project at the time of contribution must also exhaust when the contribution is made.

Even if the above doesn’t fly, it’s undeniable most contributions today are made by cloning a git tree and republishing it plus your own updates (essentially a github fork) which makes you a bona fide distributor of the whole project because it can all be downloaded from your cloned tree. Thus I think it’s reasonable to hold that all patents owned by distributors and contributors in an open source project have exhausted in that project. In other words all the arguments about the scope and extent of patent grants and patent capture in open source licences is entirely unnecessary.

Therefore, all active participants in an Open Source community ipso facto exhaust any patents on the community code as that code is redistributed.

Implications for Proprietary Software

Firstly, it’s important to note that the exhaustion arguments above have no impact on the patentability of software or the validity of software patents in general, just on their enforcement. Secondly, exhaustion is triggered by the unencumbered right to redistribute which is present in all Open Source licences. However, proprietary software doesn’t come with a right to redistribute in the copyright licence, meaning exhaustion likely doesn’t trigger for them. Thus the exhaustion arguments above have no real impact on the ability to enforce software patents in proprietary code except that one possible defence that could be raised is that the code practising the patent in the proprietary software was, in fact, legitimately obtained from an open source project under a permissive licence and thus the patent has exhausted. The solution, obviously, is that if you worry about enforceability of patents in proprietary software, always use a copyleft licence for your open source.

What about the Patent Troll Problem?

Trolls, by their nature, are not IP producing entities, thus they are not ecosystem participants. Therefore trolls, being outside the community, can pursue infringement cases unburdened by exhaustion problems. In theory, this is partially true but Trolls don’t produce anything, therefore they have to acquire their patents from someone who does. That means that if the producer from whom the troll acquired the patent was active in the community, the patent has still likely exhausted. Since the life of a patent is roughly 20 years and mass adoption of open source throughout the software industry is only really 10 years old¹ there still may exist patents owned by Trolls that came from corporations before they began to be Open Source players and thus might not be exhausted.

The hope this offers for the Troll problem is that in 10 years time, all these unexhausted patents will have expired and thanks to the onward and upward adoption of open source there really will be no place for Trolls to acquire unexhausted patents to use against the software industry, so the Troll threat is time limited.

A Call to Arms: Realising the Elimination of Patents in Open Source

Your mission, should you choose to be part of this project, is to help advance the legal doctrines on patent exhaustion. In particular, if the company you work for is sued for patent infringement in any Open Source project, even by a troll, suggest they look into asserting an exhaustion based defence. Even if your company isn’t currently under threat of litigation, simply raising awareness of the option of exhaustion can help enormously.

The first case an exhaustion defence could potentially be tried is this one: Sequoia Technology is asserting a patent against LVM in the Linux kernel. However it turns out that patent 6,718,436 is actually assigned to ETRI, who merely licensed it to Sequoia for the purposes of litigation. ETRI, by the way, is a Linux Foundation member but, more importantly, in 2007 ETRI launched their own distribution of Linux called Booyo which would appear to be evidence that their own actions as a distributor of the Linux Kernel have exhaused patent 6,718,436 in Linux long before they ever licensed it to Sequoia.

If we get this right, in 10 years the Patent threat in Open Source could be history, which would be a nice little legacy to leave our children.

Webauthn in Linux with a TPM via the HID gadget

10 Replies

Account security on the modern web is a bit of a nightmare. Everyone understands the need for strong passwords which are different for each account, but managing them is problematic because the human mind just can’t remember hundreds of complete gibberish words so everyone uses a password manager (which, lets admit it, for a lot of people is to write it down). A solution to this problem has long been something called two factor authentication (2FA) which authenticates you by something you know (like a password) and something you posses (like a TPM or a USB token). The problem has always been that you ideally need a different 2FA for each website, so that a compromise of one website doesn’t lead to the compromise of all your accounts.

Enter webauthn. This is designed as a 2FA protocol that uses public key cryptography instead of shared secrets and also uses a different public/private key pair for each website. Thus aspiring to be a passwordless secure scalable 2FA system for the web. However, the webauthn standard only specifies how the protocol works when the browser communicates with the remote website, there’s a different standard called FIDO or U2F that specifies how the browser communicates with the second factor (called an authenticator in FIDO speak) and how that second factor works.

It turns out that the FIDO standards do specify a TPM as one possible backend, so what, you might ask does this have to do with the Linux Gadget subsystem? The answer, it turns out, is that although the standards do recommend a TPM as the second factor, they don’t specify how to connect to one. The only connection protocols in the Client To Authenticator Protocol (CTAP) specifications are USB, BLE and NFC. And, in fact, the only one that’s really widely implemented in browsers is USB, so if you want to connect your laptop’s TPM to a browser it’s going to have to go over USB meaning you need a Linux USB gadget. Conspiracy theorists will obviously notice that if the main current connector is USB and FIDO requires new USB tokens because it’s a new standard then webauthn is a boon to token manufacturers.

How does Webauthn Work?

The protocol comes in two flavours, version 1 and version 2. Version 1 is fixed cryptography and version 2 is agile cryptography. However, version1 is simpler so that’s the one I’ll explain.

Webauthn essentially consists of two phases: a registration phase where the authenticator is tied to the account, which often happens when the remote account is created, and authentication where the authenticator is used to log in again to the website after registration. Obviously accounts often outlive the second factor, especially if it’s tied to a machine like the TPM, so the standard contemplates a single account having multiple registered authenticators.

The registration request consists of a random challenge supplied by the remote website to prevent replay and an application id which is constructed by the browser from the website supplied ID and the web origin of the site. The design is that the application ID should be unique for each remote account and not subject to being faked by the remote site to trick you into giving up some other application’s credentials.

The authenticator’s response consists of a unique public key, an opaque key handle, an attestation X.509 certificate containing a public key and a signature over the challenge, the application ID, the public key and the key handle using the private key of the certificate. The remote website can verify this signature against the certificate to verify registration. Additionally, Google recommends that the website also verifies the attestation certificate against a list of know device master certificates to prove it is talking to a genuine U2F authenticator. Since no-one is currently maintaining a database of “genuine” second factor master certificates, this last step mostly isn’t done today.

In version 1, the only key scheme allowed is Elliptic Curve over the NIST P-256 curve. This means that the public key is always 65 bytes long and an encrypted (or wrapped) form of the private key can be stashed inside the opaque key handle, which may be a maximum of 255 bytes. Since the key handle must be presented for each authentication attempt, it relieves the second factor from having to remember an ever increasing list of public/private key pairs because all it needs to do is unwrap the private key from the opaque handle and perform the signature and then forget the unwrapped private key. Note that this means per user account authenticator, the remote website must store the public key and the key handle, meaning about 300 bytes extra, but that’s peanuts compared to the amount of information remote websites usually store per registered account.

To perform an authentication the remote website presents a unique challenge, the raw ID from which the browser should construct the same application ID and the key handle. Ideally the authenticator should verify that the application ID matches the one used for registration (so it should be part of the wrapped key handle) and then perform a signature over the application ID, the challenge and a unique monotonically increasing counter number which is sent back in the response. To validly authenticate, the remote website verifies the signature is genuine and that the count has increased from the last time authentication has done (so it has to store the per authenticator 4 byte count as well). Any increase is fine, so each second factor only needs to maintain a single monotonically increasing counter to use for every registered site.

Problems with Webauthn and the TPM

The primary problem is the attestation certificate, which is actually an issue for the whole protocol. TPMs are actually designed to do attestation correctly, which means providing proof of being a genuine TPM without compromising the user’s privacy. The way they do this is via a somewhat complex attestation protocol involving a privacy CA. The problem they’re seeking to avoid is that if you present the same certificate every time you use the device for registration you can be tracked via that certificate and your privacy is compromised. The way the TPM gets around this is that you can use a privacy CA to produce an arbitrary number of different certificates for the same TPM and you could present a new one every time, thus leaving nothing to be tracked by.

The ability to track users by certificate has been a criticism levelled at FIDO and the best the alliance can come up with is the idea that perhaps you batch the attestation certificates, so the same certificate is used in hundreds of new keys.

The problem for TPMs though is that until FIDO devices use proper privacy CA based attestation, the best you can do is generate a separate self signed attestation certificate. The reason is that the TPM does contain its own certificate, but it’s encryption only, not signing because of the way the TPM privacy CA based attestation works. Thus, even if you were willing to give up your privacy you can’t use the TPM EK certificate as the FIDO attestation certificate. Plus, if Google carries out its threat to verify attestation certificates, this scheme is no longer going to work.

Aside about Browsers and CTAP

The crypto aware among you will recognise that there is already a library based standard that can be used to talk to a variety of USB tokens and even the TPM called PKCS#11. Mozilla Firefox, for instance, already supports using this as I demonstrated in a previous blog post. One might think, based on what I said about the one token per key problem in the introduction, that PKCS#11 can’t support the new key wrapping based aspect of FIDO but, in fact, it can via the C_WrapKey/C_UnwrapKey API. The only thing PKCS#11 can’t do is the new monotonic counter.

Even if PKCS#11 can’t perform all the necessary functions, what about a new or extended library based protocol? This is a good question to which I’ve been unable to get a satisfactory answer. Certainly doing CTAP correctly requires that your browser be able to speak directly to the USB, Bluetooth and NFC subsystems. Perhaps not too hard for a single platform browser like Internet Explorer, but fraught with platform complexity for generic browsers like FireFox where the only solution is to have a rust based accessor for every supported platform.

Certainly the lack of a library interface are where the TPM issues come from, because without that we have to plug the TPM based FIDO layer into a browser over an existing CTAP protocol it supports, i.e. USB. Fortunately Linux has the USB Gadget subsystem which fits the bill precisely.

Building Synthetic HID Devices with USB Gadget

Before you try this at home, I should point out that the Linux HID Gadget has a longstanding bug that will cause your machine to hang unless you have this patch applied. You have been warned!

The HID subsystem is for driving Human Interaction Devices meaning keyboard and mice. However, it has a simple packet (called report in USB speak) based protocol which is easy for most things to use. In order to facilitate this, Linux actually provides hidraw devices which allow you to send and receive these reports using read and write system calls (which, in fact, is how Firefox on Linux speaks CTAP). What the hid gadget does when set up is provide all the static emulation of HID device protocols (like discovery pages) while allowing you to send and receive the hidraw packets over the /dev/hidgX device tap, also via read and write (essentially operating like a tty/pty pair²). To get the whole thing running, the final piece of the puzzle is that the browser (most likely running as you) needs to be able to speak to the hidraw device, so you need a udev rule to make it accessible because by default they’re 0600. Since the same goes for every other USB security token, you’ll find the template in the same rpm that installs the PKCS#11 library for the token.

The way CTAP works is that every transaction is split into 64 byte reports and sent over the hidraw interface. All you need to do to get this setup is initialise a report descriptor for this type of device. Since it’s somewhat cumbersome to do, I’ve created this script to do it (run it as root). Once you have this, the hidraw and hidg devices will appear (make them both user accessible with chmod 666) and then all you need is a programme to drive the hidg device and you’re done.

A TPM Based Hid Gadget Driver

Note: this section is written describing TPM 2.0.

The first thing we need out of the TPM is a monotonic counter, but all TPMs have NV counter indexes which can be created (all TPM counters are 8 byte, whereas the CTAP protocol requires 4 bytes, but we simply chop off the top 4 bytes). By convention I create the counter at NV index 01000101. Once created, this counter will be persistent and monotonic for the lifetime of the TPM.

The next thing you need is an attestation certificate and key. These must be NIST P-256 based, but it’s easy to get openssl to create them

openssl genpkey -algorithm EC -pkeyopt ec_paramgen_curve:prime256v1 -pkeyopt ec_param_enc:named_curve -out reg_key.key

openssl req -new -x509 -subj '/CN=My Fido Token/' -key reg_key.key -out reg_key.der -outform DER

This creates a self signed certificate, but you could also create a certificate chain this way.

Finally, we need the TPM to generate one NIST P-256 key pair per registration. Here we use the TPM2_Create() call which gets the TPM to create a random asymmetric key pair and return the public and wrapped private pieces. We can simply bundle these up and return them as the key handle (fortunately, what the TPM spits back for a NIST P-256 key is about 190 bytes when properly marshalled). When the remote end requests an authentication, we extract the TPM key from the key handle and use a TPM2_Load to place it in the TPM and sign the hash and then unload it from the TPM. Putting this all together this project (which is highly experimental) provides the script to create the devices and a hidg driver that interfaces to the TPM. All you need to do is run it as

hidgd /dev/hidg0 reg_key.der reg_key.key

And you’re good to go. If you want to test it there are plenty of public domain webauthn test sites, webauthn.org and webauthn.io³ are two I’ve tested as working.

TODO Items

The webauthn standard specifies the USB authenticator should ask for permission before performing either registration or authentication. Currently the TPM hid gadget doesn’t have any external verification, but in future I’ll add a configurable pinentry to add confirmation and possibly also a single password for verification.

The current code also does nothing to verify the application ID on a per authorization basis. This is a security problem because you are currently vulnerable to being spoofed by malicious websites who could hand you a snooped key handle and then use the signature to fake your login to a different site. To avoid this, I’m planning to use the policy area of the TPM key to hold the application ID. This should work because the generated keys have no authorization, either policy or password, so the policy area is effectively redundant. It is in the unwrapped public key, but if any part of the public key is tampered with the TPM will detect this via a hash in the wrapped private error and give a binding error on load.

The current code really only does version 1 of the FIDO protocol. Ideally it needs upgrading to version 2. However, there’s not really much point because for all the crypto agility, most TPMs on the market today can only do NIST P-256 curves, so you wouldn’t gain that much.

Conclusions

Using this scheme you’re ready to play with FIDO/U2F as long as you have a laptop with a functional TPM 2.0 and a working USB gadget subsystem. If you want to play, please remember to make sure you have the gadget patch applied.

Using TPM Based Client Certificates on Firefox and Apache

2 Replies

One of the useful features of Apache (or indeed any competent web server) is the ability to use client side certificates. All this means is that a certificate from each end of the TLS transaction is verified: the browser verifies the website certificate, but the website requires the client also to present one and verifies it. Using client certificates, when linked to your own client certificate CA gives web transactions the strength of two factor authentication if you do it on the login page. I use this feature quite a lot for all the admin features my own website does. With apache it’s really simple to turn on with the

SSLCACertificateFile

Directive which allows you to specify the CA for the accepted certificates. In my own setup I have my own self signed certificate as CA and then all the authority certificates use it as the issuer. You can turn Client Certificate verification on per location basis simply by doing

<Location /some/web/location>
      SSLVerifyClient require
</Location

And Apache will take care of requesting the client certificate and verifying it against the CA. The only caveat here is that TLSv1.3 currently fails to work for this, so you have to disable it with

SSLProtocol -TLSv1.3

Client Certificates in Firefox

Firefox is somewhat hard to handle for SSL because it includes its own hand written mozilla secure sockets code, which has a toolkit quite unlike any other ssl toolkit⁴. In order to import a client certificate and key into firefox, you need to create a pkcs12 file containing them and import that into the “Your Certificates” box which is under Preferences > Privacy & Security > View Certificates

Obviously, simply supplying a key file to firefox presents security issues because you’d like to prevent a clever hacker from gaining access to it and thus running off with your client certificate. Firefox achieves a modicum of security by doing all key operations over the PKCS#11 API via a software token, which should mean that even malicious javascript cannot gain access to your key but merely the signing API

However, assuming you don’t quite trust this software separation, you need to store your client signing key in a secure vault like a TPM to make sure no web hacker can gain access to it. Various crypto system connectors, like the OpenSSL TPM⁵ and TPM2 engine, already exist but because Firefox uses its own crytographic code it can’t take advantage of them. In fact, the only external object the Firefox crypto code can use is a PKCS#11 module.

Aside about TPM2 and PKCS#11

The design of PKCS#11 is that it is a loadable library which can find and enumerate keys and certificates in some type of hardware device like a USB Key or a PCI attached HSM. However, since the connector is simply a library, nothing requires it connect to something physical and the OpenDNSSEC project actually produces a purely software based cryptographic token. In theory, then, it should be easy

The problems come with the PKCS#11 expectation of key residency: The library allows the consuming program to enumerate a list of slots each of which may, or may not, be occupied by a single token. Each token may contain one or more keys and certificates. Now the TPM does have a concept of a key resident in NV memory, which is directly analagous to the PKCS#11 concept of a token based key. The problems start with the TPM2 PC Client Profile which recommends this NV area be about 512 bytes, which is big enough for all of one key and thus not very scalable. In fact, the imagined use case of the TPM is with volatile keys which are demand loaded.

Demand loaded keys map very nicely to the OpenSSL idea of a key file, which is why OpenSSL TPM engines are very easy to understand and use, but they don’t map at all into the concept of token resident keys. The closest interface PKCS#11 has for handling key files is the provisioning calls, but even there they’re designed for placing keys inside tokens and, once provisioned, the keys are expected to be non-volatile. Worse still, very few PKCS#11 module consumers actually do provisioning, they mostly leave it up to a separate binary they expect the token producer to supply.

Even if the demand loading problem could be solved, the PKCS#11 API requires quite a bit of additional information about keys, like ids, serial numbers and labels that aren’t present in the standard OpenSSL key files and have to be supplied somehow.

Solving the Key File to PKCS#11 Mismatch

The solution seems reasonably simple: build a standard PKCS#11 library that is driven by a known configuration file. This configuration file can map keys to slots, as required by PKCS#11, and also supply all the missing information. the C_Login() operation is expected to supply a passphrase (or PIN in PKCS#11 speak) so that would be the point at which the private key could be loaded.

One of the interesting features of the above is that, while it could be implemented for the TPM engine only, it can also be implemented as a generic OpenSSL key exporter to PKCS#11 that happens also to take engine keys. That would mean it would work for non-engine keys as well as any engine that exists for OpenSSL … a nice little win.

Building an OpenSSL PKCS#11 Key Exporter

A Token can be built from a very simple ini like configuration file, with the global section setting global properties, like manufacurer id and library description and each individual section being used to instantiate a slot containing one key. We can make the slot name, the id and the label the same if not overridden and use key file directives to load the public and private keys. The serial number seems best constructed from a hash of the public key parameters (again, if not overridden). In order to support engine keys, the token library needs to know which engine to invoke, so I added an engine keyword to tell it.

With that, the mechanics of making the token library work with any OpenSSL key are set, the only thing is to plumb in the PKCS#11 glue API. At this point, I should add that the goal is simply to get keys and tokens working, not to replicate a full featured PKCS#11 API, so you shouldn’t use this as something to test against for a reference implementation (the softhsm2 token is much better for that). However, it should be functional enough to use for storing keys in Firefox (as well as other things, see below).

The current reasonably full featured source code is here, with a reference build using the OpenSUSE Build Service here. I should add that some of the build failures are due to problems with p11-kit and others due to the way Debian gets the wrong engine path for libp11.

At Last: Getting TPM Keys working with Firefox

A final problem with Firefox is that there seems to be no way to import a certificate file for which the private key is located on a token. The only way Firefox seems to support this is if the token contains both the private key and the certificate. At least this is my own project, so some coding later, the token now supports certificates as well.

The next problem is more mundane: generating the certificate and key. Obviously, the safest key is one which has never left the TPM, which means the certificate request needs to be built from it. I chose a CSR type that also includes my name and my machine name for later easy discrimination (and revocation if I ever lose my laptop). This is the sequence of commands for my machine called jarvis.

create_tpm2_key -a key.tpm
openssl req -subj "/CN=James Bottomley/UID=jarivs/" -new -engine tpm2 -keyform engine -key key.tpm -nodes -out jarvis.csr
openssl x509 -in jarvis.csr -req -CA  my-ca.crt -engine tpm2 -CAkeyform engine -CAkey my-ca.key -days 3650 -out jarvis.crt

As you can see from the above, the key is first created by the TPM, then that key is used to create a certificate request where the common name is my name and the UID is the machine name (this is just my convention, feel free to use your own) and then finally it’s signed by my own CA, which you’ll notice is also based on a TPM key. Once I have this, I’m free to create an ini file to export it as a token to Firefox

manufacturer id = Firefox Client Cert
library description = Cert for hansen partnership
[mozilla-key]
certificate = /home/jejb/jarvis.crt
private key = /home/jejb/key.tpm
engine = tpm2

All I now need to do is load the PKCS#11 shared object library into Firefox using Settings > Privacy & Security > Security Devices > Load and I have a TPM based client certificate ready for use.

Additional Uses

It turns out once you have a generic PKCS#11 exporter for engine keys, there’s no end of uses for them. One of the most convenient has been using TPM2 keys with gnutls. Although gnutls was quick to adopt TPM 1.2 based keys, it’s been much slower with TPM2 but because gnutls already has a PKCS#11 interface using the p11 kit URI format, you can easily build a config file of all the TPM2 keys you want it to use and simply use them by URI in gnutls.

Unfortunately, this has also lead to some problems, the biggest one being Firefox: Firefox assumes, once you load a PKCS#11 module library, that you want it to use every single key it can find, which is fine until it pops up 10 dialogue boxes each time you start it, one for each key password, particularly if there’s only one key you actually care about it using. This problem doesn’t seem solvable in the Firefox token interface, so the eventual way I did it was to add the ability to specify the config file in the environment (variable OPENSSL_PKCS11_CONF) and modify my xfce Firefox action to set this in the environment pointing at a special configuration file with only Firefox’s key in it.

Conclusions and Future Work

Hopefully I’ve demonstrated this simple PKCS#11 converter can be useful both to keeping Firefox keys safe as well as uses in other things like gnutls. Unfortunately, it turns out that the world wide web is turning against PKCS#11 tokens as having usability problems and is moving on to something called FIDO2 tokens which have the web browser talking directly to the USB token. In my next technical post I hope to explain how you can use the Linux Kernel USB gadget system to connect a TPM up easily as a FIDO2 token so you can use the new passwordless webauthn protocol seamlessly.

Measuring the Horizontal Attack Profile of Nabla Containers

6 Replies

One of the biggest problems with the current debate about Container vs Hypervisor security is that no-one has actually developed a way of measuring security, so the debate is all in qualitative terms (hypervisors “feel” more secure than containers because of the interface breadth) but no-one actually has done a quantitative comparison. The purpose of this blog post is to move the debate forwards by suggesting a quantitative methodology for measuring the Horizontal Attack Profile (HAP). For more details about Attack Profiles, see this blog post. I don’t expect this will be the final word in the debate, but by describing how we did it I hope others can develop quantitative measurements as well.

Well begin by looking at the Nabla technology through the relatively uncontroversial metric of performance. In most security debates, it’s acceptable that some performance is lost by securing the application. As a rule of thumb, placing an application in a hypervisor loses anywhere between 10-30% of the native performance. Our goal here is to show that, for a variety of web tasks, the Nabla containers mechanism has an acceptable performance penalty.

Performance Measurements

We took some standard benchmarks: redis-bench-set, redis-bench-get, python-tornado and node-express and in the latter two we loaded up the web servers with simple external transactional clients. We then performed the same test for docker, gVisor, Kata Containers (as our benchmark for hypervisor containment) and nabla. In all the figures, higher is better (meaning more throughput):

The red Docker measure is included to show the benchmark. As expected, the Kata Containers measure is around 10-30% down on the docker one in each case because of the hypervisor penalty. However, in each case the Nabla performance is the same or higher than the Kata one, showing we pay less performance overhead for our security. A final note is that since the benchmarks are network ones, there’s somewhat of a penalty paid by userspace networking stacks (which nabla necessarily has) for plugging into docker network, so we show two values, one for the bridging plug in (nabla-containers) required to orchestrate nabla with kubernetes and one as a direct connection (nabla-raw) showing where the performance would be without the network penalty.

One final note is that, as expected, gVisor sucks because ptrace is a really inefficient way of connecting the syscalls to the sandbox. However, it is more surprising that gVisor-kvm (where the sandbox connects to the system calls of the container using hypercalls instead) is also pretty lacking in performance. I speculate this is likely because hypercalls exact their own penalty and hypervisors usually try to minimise them, which using them to replace system calls really doesn’t do.

HAP Measurement Methodology

The Quantitative approach to measuring the Horizontal Attack Profile (HAP) says that we take the bug density of the Linux Kernel code and multiply it by the amount of unique code traversed by the running system after it has reached a steady state (meaning that it doesn’t appear to be traversing any new kernel paths). For the sake of this method, we assume the bug density to be uniform and thus the HAP is approximated by the amount of code traversed in the steady state. Measuring this for a running system is another matter entirely, but, fortunately, the kernel has a mechanism called ftrace which can be used to provide a trace of all of the functions called by a given userspace process and thus gives a reasonable approximation of the number of lines of code traversed (note this is an approximation because we measure the total number of lines in the function taking no account of internal code flow, primarily because ftrace doesn’t give that much detail). Additionally, this methodology works very well for containers where all of the control flow emanates from a well known group of processes via the system call information, but it works less well for hypervisors where, in addition to the direct hypercall interface, you also have to add traces from the back end daemons (like the kvm vhost kernel threads or dom0 in the case of Xen).

HAP Results

The results are for the same set of tests as the performance ones except that this time we measure the amount of code traversed in the host kernel:

As stated in our methodology, the height of the bar should be directly proportional to the HAP where lower is obviously better. On these results we can say that in all cases the Nabla runtime tender actually has a better HAP than the hypervisor contained Kata technology, meaning that we’ve achieved a container system with better HAP (i.e. more secure) than hypervisors.

Some of the other results in this set also bear discussing. For instance the Docker result certainly isn’t 10x the Kata result as a naive analysis would suggest. In fact, the containment provided by docker looks to be only marginally worse than that provided by the hypervisor. Given all the hoopla about hypervisors being much more secure than containers this result looks surprising but you have to consider what’s going on: what we’re measuring in the docker case is the system call penetration of normal execution of the systems. Clearly anything malicious could explode this result by exercising all sorts of system calls that the application doesn’t normally use. However, this does show clearly that a docker container with a well crafted seccomp profile (which blocks unexpected system calls) provides roughly equivalent security to a hypervisor.

The other surprising result is that, in spite of their claims to reduce the exposure to Linux System Calls, gVisor actually is either equivalent to the docker use case or, for the python tornado test, significantly worse than the docker case. This too is explicable in terms of what’s going on under the covers: gVisor tries to improve containment by rewriting the Linux system call interface in Go. However, no-one has paid any attention to the amount of system calls the Go runtime is actually using, which is what these results are really showing. Thus, while current gVisor doesn’t currently achieve any containment improvement on this methodology, it’s not impossible to write a future version of the Go runtime that is much less profligate in the way it uses system calls by developing a Secure Go using the same methodology we used to develop Nabla.

Conclusions

On both tests, Nabla is far and away the best containment technology for secure workloads given that it sacrifices the least performance over docker to achieve the containment and, on the published results, is 2x more secure even than using hypervisor based containment.

Hopefully these results show that it is perfectly possible to have containers that are more secure than hypervisors and lays to rest, finally, the arguments about which is the more secure technology. The next step, of course, is establishing the full extent of exposure to a malicious application and to do that, some type of fuzz testing needs to be employed. Unfortunately, right at the moment, gVisor is simply crashing when subjected to fuzz testing, so it needs to become more robust before realistic measurements can be taken.

A New Method of Containment: IBM Nabla Containers

The Basics: Looking for Better Containment

The HAP attack worry with standard containers is shown on the left: that a malicious application can breach the containment wall and attack an innocent application. This attack is thought to be facilitated by the breadth of the syscall interface in standard containers so the guiding star in developing Nabla Containers was a methodology for measuring the reduction in the HAP (and hence the improvement in containment), but the initial impetus came from the observation that unikernel systems are nicely modular in the libOS approach, can be used to emulate systemcalls and, thanks to rumprun, have a wide set of support for modern web friendly languages (like python, node.js and go) with a fairly thin glue layer. Additionally they have a fairly narrow set of hypercalls that are actually used in practice (meaning they can be made more secure than conventional hypervisors). Code coverage measurements of standard unikernel based kvm images confirmed that they did indeed use a far narrower interface.

Replacing the Hypervisor Interface

One of the main elements of the hypervisor interface is the transition from a less privileged guest kernel to a more privileged host one via hypercalls and vmexits. These CPU mediated events are actually quite expensive, certainly a lot more expensive than a simple system call, which merely involves changing address space and privilege level. It turns out that the unikernel based kvm interface is really only nine hypercalls, all of which are capable of being rewritten as syscalls, so the approach to running this new sandbox as a container is to do this rewrite and seccomp restrict the interface to being only what the rewritten unikernel runtime actually needs (meaning that the seccomp profile is now CSP enforced). This vision, by the way, of a broad runtime above being mediated to a narrow interface is where the name Nabla comes from: The symbol for Nabla is an inverted triangle (∇) which is broad at the top and narrows to a point at the base.

Using this formulation means that the nabla runtime (or nabla tender) can be run as a single process within a standard container and the narrowness of the interface to the host kernel prevents most of the attacks that a malicious application would be able to perform.

DevOps and the ParaVirt conundrum

Back at the dawn of virtualization, there were arguments between Xen and VMware over whether a hypervisor should be fully virtual (capable of running any system supported by the virtual hardware description) or paravirtual (the system had to be modified to run on the virtualization system and thus would be incapable of running on physical hardware). Today, thanks in a large part to CPU support for virtualization primtives, fully paravirtual systems have long since gone the way of the dodo and everyone nowadays expects any OS running on a hypervisor to be capable of running on physical hardware⁶. The death of paravirt also left the industry with an aversion to ever reviving it, which explains why most sandbox containment systems (gVisor, Kata) try to require no modifications to the image.

With DevOps, the requirement is that images be immutable and that to change an image you must take it through the full develop build, test, deploy cycle. This development centric view means that, provided there’s no impact to the images you use as the basis for your development, you can easily craft your final image to suit the deployment environment, which means a step like linking with the nabla tender is very easy. Essentially, this comes down to whether you take the Dev (we can rebuild to suit the environment) or the Ops (the deployment environment needs to accept arbitrary images) view. However, most solutions take the Ops view because of the anti-paravirt bias. For the Nabla tender, we take the Dev view, which is born out by the performance figures.

Conclusion

Like most sandbox models, the Nabla containers approach is an alternative to namespacing for containment, but it still requires cgroups for resource management. The figures show that the containment HAP is actually better than that achieved with a hypervisor and the performance, while being marginally less than a namespaced container, is greater than that obtained by running a container inside a hypervisor. Thus we conclude that for tenants who have a real need for HAP reduction, this is a viable technology.

Containers and Cloud Security

9 Replies

Introduction

The idea behind this blog post is to take a new look at how cloud security is measured and what its impact is on the various actors in the cloud ecosystem. From the measurement point of view, we look at the vertical stack: all code that is traversed to provide a service all the way from input web request to database update to output response potentially contains bugs; the bug density is variable for the different components but the more code you traverse the higher your chance of exposure to exploitable vulnerabilities. We’ll call this the Vertical Attack Profile (VAP) of the stack. However, even this axis is too narrow because the primary actors are the cloud tenant and the cloud service provider (CSP). In an IaaS cloud, part of the vertical profile belongs to the tenant (The guest kernel, guest OS and application) and part (the hypervisor and host OS) belong to the CSP. However, the CSP vertical has the additional problem that any exploit in this piece of the stack can be used to jump into either the host itself or any of the other tenant virtual machines running on the host. We’ll call this exploit causing a failure of containment the Horizontal Attack Profile (HAP). We should also note that any Horizontal Security failure is a potentially business destroying event for the CSP, so they care deeply about preventing them. Conversely any exploit occurring in the VAP owned by the Tenant can be seen by the CSP as a tenant only problem and one which the Tenant is responsible for locating and fixing. We correlate size of profile with attack risk, so the large the profile the greater the probability of being exploited.

From the Tenant point of view, improving security can be done in one of two ways, the first (and mostly aspirational) is to improve the security and monitoring of the part of the Vertical the Tenant is responsible for and the second is to shift responsibility to the CSP, so make the CSP responsible for more of the Vertical. Additionally, for most Tenants, a Horizontal failure mostly just means they lose trust in the CSP, unless the Tenant is trusting the CSP with sensitive data which can be exfiltrated by the Horizontal exploit. In this latter case, the Tenant still cannot do anything to protect the CSP part of the Security Profile, so it’s mostly a contractual problem: SLAs and penalties for SLA failures.

Examples

To see how these interpretations apply to the various cloud environments, lets look at some of the Cloud (and pre-Cloud) models:

Physical Infrastructure

The left hand diagram shows a standard IaaS rented physical system. Since the Tenant rents the hardware it is shown as red indicating CSP ownership and the the two Tenants are shown in green and yellow. In this model, barring attacks from the actual hardware, the Tenant owns the entirety of the VAP. The nice thing for the CSP is that hardware provides air gap security, so there is no HAP which means it is incredibly secure.

However, there is another (much older) model shown on the right, called the shared login model, where the Tenant only rents a login on the physical system. In this model, only the application belongs to the Tenant, so the CSP is responsible for much of the VAP (the expanded red area). Here the total VAP is the same, but the Tenant’s VAP is much smaller: the CSP is responsible for maintaining and securing everything apart from the application. From the Tenant point of view this is a much more secure system since they’re responsible for much less of the security. From the CSP point of view there is now a because a tenant compromising the kernel can control the entire system and jump to other tenant processes. This actually has the worst HAP of all the systems considered in this blog.

Hypervisor based Virtual Infrastructure

In this model, the total VAP is unquestionably larger (worse) than the physical system above because there’s simply more code to traverse (a guest and a host kernel). However, from the Tenant’s point of view, the VAP should be identical to that of unshared physical hardware because the CSP owns all the additional parts. However, there is the possibility that the Tenant may be compromised by vulnerabilities in the Virtual Hardware Emulation. This can be a worry because an exploit here doesn’t lead to a Horizontal security problem, so the CSP is apt to pay less attention to vulnerabilities in the Virtual Hardware simply because each guest has its own copy (even though that copy is wholly under the control of the CSP).

The HAP is definitely larger (worse) than the physical host because of the shared code in the Host Kernel/Hypervisor, but it has often been argued that because this is so deep in the Vertical stack that the chances of exploit are practically zero (although venom gave the lie to this hope: stack depth represents obscurity, not security).

However, there is another way of improving the VAP and that’s to reduce the number of vulnerabilities that can be hit. One way that this can be done is to reduce the bug density (the argument for rewriting code in safer languages) but another is to restrict the amount of code which can be traversed by narrowing the interface (for example, see arguments in this hotcloud paper). On this latter argument, the host kernel or hypervisor does have a much lower VAP than the guest kernel because the hypercall interface used for emulating the virtual hardware is very narrow (much narrower than the syscall interface).

The important takeaways here are firstly that simply transferring ownership of elements in the VAP doesn’t necessarily improve the Tenant VAP unless you have some assurance that the CSP is actively monitoring and fixing them. Conversely, when the threat is great enough (Horizontal Exploit), you can trust to the natural preservation instincts of the CSP to ensure correct monitoring and remediation because a successful Horizontal attack can be a business destroying event for the CSP.

Container Based Virtual Infrastructure

The total VAP here is identical to that of physical infrastructure. However, the Tenant component is much smaller (the kernel accounting for around 50% of all vulnerabilities). It is this reduction in the Tenant VAP that makes containers so appealing: the CSP is now responsible for monitoring and remediating about half of the physical system VAP which is a great improvement for the Tenant. Plus when the CSP remediates on the host, every container benefits at once, which is much better than having to crack open every virtual machine image to do it. Best of all, the Tenant images don’t have to be modified to benefit from these fixes, simply running on an updated CSP host is enough. However, the cost for this is that the HAP is the entire linux kernel syscall interface meaning the HAP is much larger than then hypervisor virtual infrastructure case because the latter benefits from interface narrowing to only the hypercalls (qualitatively, assuming the hypercall interface is ~30 calls and the syscall interface is ~300 calls, then the HAP is 10x larger in the container case than the hypervisor case); however, thanks to protections from the kernel namespace code, the HAP is less than the shared login server case. Best of all, from the Tenant point of view, this entire HAP cost is borne by the CSP, which makes this an incredible deal: not only does the Tenant get a significant reduction in their VAP but the CSP is hugely motivated to keep on top of all vulnerabilities in their part of the VAP and remediate very fast because of the business implications of a successful horizontal attack. The flip side of this is that a large number of the world’s CSPs are very unhappy about this potential risks and costs and actually try to shift responsibility (and risk) back to the Tenant by advocating nested virtualization solutions like running containers in hypervisors. So remember, you’re only benefiting from the CSP motivation to actively maintain their share of the VAP if your CSP runs bare metal containers because otherwise they’ve quietly palmed the problem back off on you.

Other Avenues for Controlling Attack Profiles

The assumption above was that defect density per component is roughly constant, so effectively the more code the more defects. However, it is definitely true that different code bases have different defect densities, so one way of minimizing your VAP is to choose the code you rely on carefully and, of course, follow bug reduction techniques in the code you write.

Density Reduction

The simplest way of reducing defects is to find and fix the ones in the existing code base (while additionally being careful about introducing new ones). This means it is important to know how actively defects are being searched for and how quickly they are being remediated. In general, the greater the user base for the component, the greater the size of the defect searchers and the faster the speed of their remediation, which means that although the Linux Kernel is a big component in the VAP and HAP, a diligent patch routine is a reasonable line of defence because a fixed bug is not an exploitable bug.

Another way of reducing defect density is to write (or rewrite) the component in a language which is less prone to exploitable defects. While this approach has many advocates, particularly among language partisans, it suffers from the defect decay issue: the idea that the maximum number of defects occurs in freshly minted code and the number goes down over time because the more time from release the more chance they’ve been found. This means that a newly rewritten component, even in a shiny bug reducing language, can still contain more bugs than an older component written in a more exploitable language, simply because a significant number of bugs introduced on creation have been found in the latter.

Code Reduction (Minimization Techniques)

It also stands to reason that, for a complex component, simply reducing the amount of code that is accessible to the upper components reduces the VAP because it directly reduces the number of defects. However, reducing the amount of code isn’t as simple as it sounds: it can only really be done by components that are configurable and then only if you’re not using the actual features you eliminate. Elimination may be done in two ways, either physically, by actually removing the code from the component or virtually by blocking access using a guard (see below).

Guarding and Sandboxing

Guarding is mostly used to do virtual code elimination by blocking access to certain code paths that the upper layers do not use. For instance, seccomp in the Linux Kernel can be used to block access to system calls you know the application doesn’t use, meaning it also blocks any attempt to exploit code that would be in those system calls, thus reducing the VAP (and also reducing the HAP if the kernel is shared).

The deficiencies in the above are obvious: if the application needs to use a system call, you cannot block it although you can filter it, which leads to huge and ever more complex seccomp policies. The solution for the system call an application has to use problem can sometimes be guarding emulation. In this mode the guard code actually emulates all the effects of the system call without actually making the actual system call into the kernel. This approach, often called sandboxing, is certainly effective at reducing the HAP since the guards usually run in their own address space which cannot be used to launch a horizontal attack. However, the sandbox may or may not reduce the VAP depending on the bugs in the emulation code vs the bugs in the original. One of the biggest potential disadvantages to watch out for with sandboxing is the fact that the address space the sandbox runs in is often that of the tenant, often meaning the CSP has quietly switched ownership of that component back to the tenant as well.

Conclusions

First and foremost: security is hard. As a cloud Tenant, you really want to offload as much of it as possible to people who are much more motivated to actually do it than you are (i.e. the Cloud Service Provider).

The complete Vertical Attack Profile of a container bare metal system in the cloud is identical to a physical system and better than a Hypervisor based system; plus the tenant owned portion is roughly 50% of the total VAP meaning that Containers are by far the most secure virtualization technology available today from the Tenant perspective.

The increased Horizontal Attack profile that containers bring should all rightly belong to the Cloud Service Provider. However, CSPs are apt to shirk this responsibility and try to find creative ways to shift responsibility back to the tenant including spreading misinformation about the container Attack profiles to try to make Tenants demand nested solutions.

Before you, as a Tenant, start worrying about the CSP owned Horizontal Attack Profile, make sure that contractual remedies (like SLAs or Reputational damage to the CSP) would be insufficient to cover the consequences of any data loss that might result from a containment breach. Also remember that unless you, as the tenant, are under external compliance obligations like HIPPA or PCI, contractual remedies for a containment failure are likely sufficient and you should keep responsibility for the HAP where it belongs: with the CSP.

Why Microsoft is a good steward for GitHub

4 Replies

There seems to be a lot of hysteria going on in various communities that depend on GitHub for their project hosting around the Microsoft acquisition (just look in the comments here and here). Obviously a lot of social media ink will be expended on this, so I’d just like to explain why as a committed open source developer, I think this will actually be a good thing.

Firstly, it’s very important to remember that git may be open source, but GitHub isn’t: none of the scripts that run the service have much published source code at all. It may be a closed source hosting infrastructure that a lot of open source projects rely on but that doesn’t make it open source itself. So why is GitHub not open source? Well, it all goes back to the business model. Notwithstanding fantastic market valuations there are lots of companies that play in the open source ecosystem, like GitHub, which struggle to find a sustainable business model (or even revenue). This leads to a lot of open closed/open type models like GitHub (the reason GitHub keeps the code closed is so they can sell it to other companies for internal source management) or Docker Enterprise.

Secondly, even if GitHub were fully open source, as I’ve argued in my essays about the GPL, to trust a corporate player in the ecosystem, you need to be able to understand fully its business motivation for being there and verify the business goals align with the community ones. As long as the business motivation is transparent and aligned with the community, you know you can trust it. However, most of the new supposedly “open source” companies don’t have clear business models at all, which means their business motivation is anything but transparent. Paradoxically this means that most of the new corporate idols in the open source ecosystem are remarkably untrustworthy because their business model changes from week to week as they struggle to please their venture capitalist overlords. There’s no way you can get the transparency necessary for open source trust if the company itself doesn’t know what its business model will be next week.

Finally, this means that companies with well established open source business models and motivations that don’t depend on the whims of VCs are much more trustworthy in open source in the long term. Although it’s a fairly recent convert, Microsoft is now among these because it’s clearly visible how its conversion from desktop to cloud both requires open source and requires Microsoft to play nicely with open source. The fact that it has a trust deficit from past actions is a bonus because from the corporate point of view it has to be extra vigilant in maintaining its open source credentials. The clinching factor is that GitHub is now ancillary to Microsoft’s open source strategy, not its sole means of revenue, so lots of previous less community oriented decisions, like keeping the GitHub code closed source, can be revisited in time as Microsoft seeks to gain community trust.

For the record, I should point out that although I have a github account, I host all my code on kernel.org mostly because the GitHub workflow really annoys me, having spent a lot of time trying to deduce commit motivations in a sparse git commit messages which then require delving into github issues and pull requests only to work out that most of the necessary details are in some private slack back channel well away from public view. Regardless of who owns GitHub, I don’t see this workflow problem changing any time soon, so I’ll be sticking to my current hosting setup.

GPL as the Best Licence – Governance and Philosophy

2 Replies

In the first part I discussed the balancing mechanisms the GPL provides for enabling corporate contributions, giving users a voice and the right one for mutually competing corporations to collaborate on equal terms. In this part I’ll look at how the legal elements of the GPL licences make it pretty much the perfect one for supporting a community of developers co-operating with corporations and users.

As far as a summary of my talk goes, this series is complete. However, I’ve been asked to add some elaboration on the legal structure of GPL+DCO contrasted to other CLAs and also include AGPL, so I’ll likely do some other one off posts in the Legal category about this.

Free Software vs Open Source

There have been many definitions of both of these. Rather than review them, in the spirit of Humpty Dumpty, I’ll give you mine: Free Software, to me, means espousing a set of underlying beliefs about the code (for instance the four freedoms of the FSF). While this isn’t problematic for many developers (code freedom, of course, is what enables developer driven communities) it is an anathema to most corporations and in particular their lawyers because, generally applied, it would require the release of all software based intellectual property. Open Source on the other hand, to me, means that you follow all the rules of the project (usually licensing and contribution requirements) but don’t necessarily sign up to the philosophy underlying the project (if there is one; most Open Source projects won’t have one).

Open Source projects are compatible with Corporations because, provided they have some commonality in goals, even a corporation seeking to exploit a market can march a long way with a developer driven community before the goals diverge. This period of marching together can be extremely beneficial for both the project and the corporation and if corporate priorities change, the corporation can simply stop contributing. As I have stated before, Community Managers serve an essential purpose in keeping this goal alignment by making the necessary internal business adjustments within a corporation and by explaining the alignment externally.

The irony of the above is that collaborating within the framework of the project, as Open Source encourages, could work just as well for a Free Software project, provided the philosophical differences could be overcome (or overlooked). In fact, one could go a stage further and theorize that the four freedoms as well as being input axioms to Free Software are, in fact, the generated end points of corporate pursuit of Open Source, so if the Open Source model wins in business, there won’t actually be a discernible difference between Open Source and Free Software.

Licences and Philosophy

It has often been said that the licence embodies the philosophy of the project (I’ve said it myself on more than one occasion, for which I’d now like to apologize). However, it is an extremely reckless statement because it’s manifestly untrue in the case of GPL. Neither v2 nor v3 does anything to require that adopters also espouse the four freedoms, although it could be said that the Tivoization Clause of v3, to which the kernel developers objected, goes slightly further down the road of trying to embed philosophy in the licence. The reason for avoiding this statement is that it’s very easy for an inexperienced corporation (or pretty much any corporate legal counsel with lack of Open Source familiarity) to take this statement at face value and assume adopting the code or the licence will force some sort of viral adoption of a philosophy which is incompatible with their current business model and thus reject the use of reciprocal licences altogether. Whenever any corporation debates using or contributing to Open Source, there’s inevitably an internal debate and this licence embeds philosophy argument is a powerful weapon for the Open Source opponents.

Equity in Contribution Models

Some licensing models, like those pioneered by Apache, are thought to require a foundation to pass the rights through under the licence: developers (or their corporations) sign a Contributor Licence Agreement (CLA) which basically grants the foundation redistributable licences to both copyrights and patents in the code and then the the Foundation licenses the contribution to the Project under Apache-2. The net result is the outbound rights (what consumers of the project gets) are Apache-2 but the inbound rights (what contributors are required to give away) are considerably more. The danger in this model is that control of the foundation gives control of the inbound rights, so who controls the foundation and how control may be transferred forms an important part of the analysis of what happens to contributor rights. Note that this model is also the basis of open core, with a corporation taking the place of the foundation.

Inequity in the inbound versus the outbound rights creates an imbalance of power within the project between those who possess the inbound rights and everyone else (who only possess the outbound rights) and can damage developer driven communities by creating an alternate power structure (the one which controls the IP rights). Further, the IP rights tend to be a focus for corporations, so simply joining the controlling entity (or taking a licence from it) instead of actually contributing to the project can become an end goal, thus weakening the technical contributions to the project and breaking the link with end users.

Creating equity in the licensing framework is thus a key to preserving the developer driven nature of a community. This equity can be preserved by using the Inbound = Outbound principle, first pioneered by Richard Fontana, the essential element being that contributors should only give away exactly the rights that downstream recipients require under the licence. This structure means there is no need for a formal CLA and instead a model like the Developer Certificate of Origin (DCO) can be used whereby the contributor simply places a statement in the source control of the project itself attesting to giving away exactly the rights required by the licence. In this model, there’s no requirement to store non-electronic copies of the the contribution attestation (which inevitably seem to get lost), because the source control system used by the project does this. Additionally, the source browsing functions of the source control system can trace a single line of code back exactly to all the contributor attestations thus allowing fully transparent inspection and independent verification of all the inbound contribution grants.

The Dangers of Foundations

Foundations which have no special inbound contribution rights can still present a threat to the project by becoming an alternate power structure. In the worst case, the alternate power structure is cemented by the Foundation having a direct control link with the project, usually via some Technical Oversight Committee (TOC). In this case, the natural Developer Driven nature of the project is sapped by the TOC creating uncertainty over whether a contribution should be accepted or not, so now the object isn’t to enthuse fellow developers, it’s to please the TOC. The control confusion created by this type of foundation directly atrophies the project.

Even if a Foundation specifically doesn’t create any form of control link with the project, there’s still the danger that a corporation’s marketing department sees joining the Foundation as a way of linking itself with the project without having to engage the engineering department, and thus still causing a weakening in both potential contributions and the link between the project and its end users.

There are specific reasons why projects need foundations (anything requiring financial resources like conferences or grants requires some entity to hold the cash) but they should be driven by the need of the community for a service and not by the need of corporations for an entity.

GPL+DCO as the Perfect Licence and Contribution Framework

Reciprocity is the key to this: the requirement to give back the modifications levels the playing field for corporations by ensuring that they each see what the others are doing. Since there’s little benefit (and often considerable down side) to hiding modifications and doing a dump at release time, it actively encourages collaboration between competitors on shared features. Reciprocity also contains patent leakage as we saw in Part 1. Coupled with the DCO using the Inbound = Outbound principle, means that the Licence and DCO process are everything you need to form an effective and equal community.

Equality enforced by licensing coupled with reciprocity also provides a level playing field for corporate contributors as we saw in part 1, so equality before the community ensures equity among all participants. Since this is analogous to the equity principles that underlie a lot of the world’s legal systems, it should be no real surprise that it generates the best contribution framework for the project. Best of all, the model works simply and effectively for a group of contributors without necessity for any more formal body.

Contributions and Commits

Although GPL+DCO can ensure equity in contribution, some human agency is still required to go from contribution to commit. The application of this agency is one of the most important aspects to the vibrancy of the project and the community. The agency can be exercised by an individual or a group; however, the composition of the agency doesn’t much matter, what does is that the commit decisions of the agency essentially (and impartially) judge the technical merit of the contribution in relation to the project.

A bad commit agency can be even more atrophying to a community than a Foundation because it directly saps the confidence the community has in the ability of good (or interesting) code to get into the tree. Conversely, a good agency is simply required to make sound technical decisions about the contribution, which directly preserves the confidence of the community that good code gets into the tree. As such, the role requires leadership, impartiality and sound judgment rather than any particular structure.

Governance and Enforcement

Governance seems to have many meanings depending on context, so lets narrow it to the rules by which the project is run (this necessarily includes gathering the IP contribution rights) and how they get followed. In a GPL+DCO framework, the only additional governance component required is the commit agency.

However, having rules isn’t sufficient unless you also follow them; in other words you need some sort of enforcement mechanism. In a non-GPL+DCO system, this usually involves having an elaborate set of sanctions and some sort of adjudication system, which, if not set up correctly, can also be a source of inequity and project atrophy. In a GPL+DCO system, most of the adjudication system and sanctions can be replaced by copyright law (this was the design of the licence, after all), which means licence enforcement (or at least the threat of it) becomes the enforcement mechanism. The only aspect of governance this doesn’t cover is the commit agency. However, with no other formal mechanisms to support its authority, the commit agency depends on the trust of the community to function and could easily be replaced by that community simply forking the tree and trusting a new commit agency.

The two essential corollaries of the above is that enforcement does serve an essential governance purpose in a GPL+DCO ecosystem and lack of a formal power structure keeps the commit agency honest because the community could replace it.

The final thing worth noting is that too many formal rules can also seriously weaken a project by encouraging infighting over rule interpretations, how exactly they should be followed and who did or did not dot the i’s and cross the t’s. This makes the very lack of formality and lack of a formalised power structure which the GPL+DCO encourages a key strength of the model.

Conclusions

In the first part I concluded that the GPL fostered the best ecosystem between developers, corporations and users by virtue of the essential ecosystem fairness it engenders. In this part I conclude that formal control structures are actually detrimental to a developer driven community and thus the best structural mechanism is pure GPL+DCO with no additional formality. Finally I conclude that this lack of ecosystem control is no bar to strong governance, since that can be enforced by any contributor through the copyright mechanism, and the very lack of control is what keeps the commit agency correctly serving the community.

GPL as the best licence – Community, Code and Licensing

4 Replies

This article is the first of a set supporting the conclusion that the GPL family of copy left licences are the best ones for maintaining a healthy development pace while providing a framework for corporations and users to influence the code base. It is based on an expansion of the thoughts behind the presentation GPL: The Best Business Licence for Corporate Code at the Compliance Summit 2017 in Yokohama.

A Community of Developers

The standard definition of any group of people building some form of open source software is usually that they’re developers (people with the necessary technical skills to create or contribute to the project). In pretty much every developer driven community, they’re doing it because they get something out of the project itself (this is the scratch your own itch idea in the Cathedral and the Bazaar): usually because they use the project in some form, but sometimes because they’re fascinated by the ideas it embodies (this latter is really how the Linux Kernel got started because ordinarily a kernel on its own isn’t particularly useful but, for a lot of the developers, the ideas that went into creating unix were enormously fascinating and implementations were completely inaccessible in Europe thanks to the USL vs BSDi lawsuit).

The reason for discussing developer driven communities is very simple: they’re the predominant type of community in open source (Think Linux Kernel, Gnome, KDE etc) which implies that they’re the natural type of community that forms around shared code collaboration. In this model of interaction, community and code are interlinked: Caring for the code means you also care for the community. The health of this type of developer community is very easily checked: ask how many contributors would still contribute to the project if they weren’t paid to do it (reduction in patch volume doesn’t matter, just the desire to continue sending patches). If fewer than 50% of the core contributors would cease contributing if they weren’t paid then the community is unhealthy.

Developer driven communities suffer from three specific drawbacks:

They’re fractious: people who care about stuff tend to spend a lot of time arguing about it. Usually some form of self organising leadership fixes a significant part of this, but it’s not guaranteed.
Since the code is built by developers for developers (which is why they care about it) there’s no room for users who aren’t also developers in this model.
The community is informal so there’s no organisation for corporations to have a peer relationship with, plus developers don’t tend to trust corporate motives anyway making it very difficult for corporations to join the community.

Trusting Corporations and Involving Users

Developer communities often distrust the motives of corporations because they think corporations don’t care about the code in the same way as developers do. This is actually completely true: developers care about code for its own sake but corporations care about code only as far as it furthers their business interests. However, this business interest motivation does provide the basis for trust within the community: as long as the developer community can see and understand the business motivation, they can trust the Corporation to do the right thing; within limits, of course, for instance code quality requirements of developers often conflict with time to release requirements for market opportunity. This shared interest in the code base becomes the framework for limited trust.

Enter the community manager: A community manager’s job, when executed properly, is twofold: one is to take corporate business plans and realign them so that some of the corporate goals align with those of useful open source communities and the second is to explain this goal alignment to the relevant communities. This means that a good community manager never touts corporate “community credentials” but instead explains in terms developers can understand the business reasons why the community and the corporation should work together. Once the goals are visible and aligned, the developer community will usually welcome the addition of paid corporate developers to work on the code. Paying for contributions is the most effective path for Corporations to exert significant influence on the community and assurance of goal alignment is how the community understands how this influence benefits the community.

Involving users is another benefit corporations can add to the developer ecosystem. Users who aren’t developers don’t have the technical skills necessary to make their voices and opinions heard within the developer driven community but corporations, which usually have paying users in some form consuming the packaged code, can respond to user input and could act as a proxy between the user base and the developer community. For some corporations responding to user feedback which enhances uptake of the product is a natural business goal. For others, it could be a goal the community manager pushes for within the corporation as a useful goal that would help business and which could be aligned with the developer community. In either case, as long as the motives and goals are clearly understood, the corporation can exert influence in the community directly on behalf of users.

Corporate Fear around Community Code

All corporations have a significant worry about investing in something which they don’t control. However, these worries become definite fears around community code because not only might it take a significant investment to exert the needed influence, there’s also the possibility that the investment might enable a competitor to secure market advantage.

Another big potential fear is loss of intellectual property in the form of patent grants. Specifically, permissive licences with patent grants allow any other entity to take the code on which the patent reads, incorporate it into a proprietary code base and then claim the benefit of the patent grant under the licence. This problem, essentially, means that, unless it doesn’t care about IP leakage (or the benefit gained outweighs the problem caused), no corporation should contribute code to which they own patents under a permissive licence with a patent grant.

Both of these fears are significant drivers of “privatisation”, the behaviour whereby a corporation takes community code but does all of its enhancements and modifications in secret and never contributes them back to the community, under the assumption that bearing the forking cost of doing this as less onerous than the problems above.

GPL is the Key to Allaying these Fears

The IP leak fear is easily allayed: whether the version of GPL that includes an explicit or implicit patent licence, the IP can only leak as far as the code can go and the code cannot be included in a proprietary product because of the reciprocal code release requirements, thus the Corporation always has visibility into how far the IP rights might leak by following licence mandated code releases.

GPL cannot entirely allay the fear of being out competed with your own code but it can, at least, ensure that if a competitor is using a modification of your code, you know about it (as do your competition), so everyone has a level playing field. Most customers tend to prefer active participants in open code bases, so to be competitive in the market places, corporations using the same code base tend to be trying to contribute actively. The reciprocal requirements of GPL provide assurance that no-one can go to market with a secret modification of the code base that they haven’t shared with others. Therefore, although corporations would prefer dominance and control, they’re prepared to settle for a fully level playing field, which the GPL provides.

Finally, from the community’s point of view, reciprocal licences prevent code privatisation (you can still work from a fork, but you must publish it) and thus encourage code sharing which tends to be a key community requirement.

Conclusions

In this first part, I conclude that the GPL, by ensuring fairness between mutually distrustful contributors and stemming IP leaks, can act as a guarantor of a workable code ecosystem for both developers and corporations and, by using the natural desire of corporations to appeal to customers, can use corporations to bridge the gap between user requirements and the developer community.

In the second part of this series, I’ll look at philosophy and governance and why GPL creates self regulating ecosystems which give corporations and users a useful function while not constraining the natural desire of developers to contribute and contrast this with other possible ecosystem models.