Figuring out how ipsec transforms work in Linux

I’ve had a couple of reasons recently to wonder about ipsec: one was doing private overlay networks in confidential VMs and the other was trying to be more efficient than my IPv4 openVPN when I’m remote on an IPv6 capable network. Usually ipsec descriptions begin with tools like raccoon or strong/open/libreswan; however, I’m going to try to explain how you do ipsec at a very basic level within Linux networking stack without using an ipsec toolkit. I’m going to concentrate on my latter use case, so this post is going to be ipsec over IPv6 (although most of the concepts should be applicable to IPv4). To attempt to do this, I’ll be delving into the ip xfrm commands extensively and trying to explain how the transform filters and policy work with the rest of the Linux networking stack.

The basics of ipsec

ipsec has two “protocols”: Authentication Header (AH) which means the packet is fully authenticated (or integrity protected) by an HMAC but not encrypted; and Encapsulating Security Payload (ESP), where the packet is encrypted but not necessarily integrity protected (ipsec was invented before AEAD ciphers, so previously you used both protocols to ensure confidentiality and integrity but, in the modern world, you can use ESP with an AEAD cipher and dispense entirely with AH). Once you have the protocol set, you encapsulate either in transport or tunnel mode. Transport mode means that the protocol headers are simply added to an existing IP packet (so the source and destination address remain the same) in the case of AH and an added header plus an encryption transformed payload with ESP and tunnel mode means that the entire IP packet is encapsulated and a new outer source and destination address is added (this is sometimes referred to as an ipsec VPN).

Understanding ipsec flows

This diagram should help understand how ipsec transforms work. There are two aspects to this: policy which basically does accept/reject and tagging and state, which does the encode/decode

The square boxes are the firewall filters and the ellipses are the ip xfrm policy and state transforms. xfrm decode is unconditionally activated whenever an ipsec packet reaches the input flow (provided there’s a matching state rule), output encoding only occurs if a matching output policy says it should (otherwise the packet is passed unencoded) and the xfrm policy fwd has no matching encode/decode, so it’s not possible to ipsec transform forwarded packets.

ip xfrm policy

To decapsulate, there is no requirement for a policy: every ipsec packet coming into the INPUT flow will be checked for a state match and decapsulated if one is found. However, for the packet to progress further you may need a policy. For transport mode, the decapsulated packet will go back around the input loop, so a dir in policy likely isn’t required but for tunnel mode, the decapsulated packet will likely traverse the FORWARD table and a dir fwd policy plus a firewall rule is likely required to permit the packet. Forwarded packets also hit the dir out policy as well.

A policy is specified with two parts: a selector which contains a set of matches (the only mandatory part of which is the direction dir) and which may match on partial addresses an action (block or allow, with allow being default) and a template (tmpl) which can specify the encoding (for dir out) or additional rules based on encapsulation.

Encoding Templates

Encoding only applies to dir out policies. Transport mode simply requires a statement of which encapsulation to use (proto ah or esp) and doesn’t require IP addresses in the ID section. In tunnel mode, the template must also have the source and destination outer IP addresses (the current source and destination become the inner addresses).

Every packet matching an encoding policy must also have a corresponding ip xfrm state match to specify the encapsulation parameters. Note that if a state transform is missing, the kernel will signal this on a netlink socket (which you can monitor with ip xfrm monitor). This socket is mostly used by ipsec toolkits to add state transforms just in time.

Other Policy Templates

For the in and fwd directions, the template acts as an additional filter on what packets to allow and what to block. For instance, if only decapsulated packets should be forwarded, then there should be a policy like

ip xfrm policy add dst net/netmask dir fwd tmpl proto ah mode tunnel level required

Which says the only allowed packets are those which were encapsulated in tunnel mode. Note that for this policy to be reached at all, the FORWARD table must allow the packets to pass. For most network security people, having a blanket forward permit rule is an anathema, so they often achieve the same thing by applying a firewall mark in decapsulation (the output-mark option of ip xfrm state) and only allowing market packets to pass the FORWARD chain (which dispenses with the need for a xfrm dir fwd policy). The two levers for controlling filter policies are the action (allow or block) the default is allow which is why this statement usually doesn’t appear) and the template level (required or use). The default level is required, which means for the allow rule to match the packet must be decapsulate (level use means pass regardless of decapsulation status)

Security Parameters Index (SPI) and reqid

For ipsec to work, every encapsulated packet must have a SPI value. You can specify this in the state transformation. The standards (RFC2409, RFC4303, etc) specify that SPI values 1-255 are “reserved”. Additionally the standards allow SPI value 0 to be used internally, which the Linux Kernel takes advantage of. SPI is mostly used to distinguish packet streams from the same host for complex ipsec policy, and don’t have much use in a simple policy situation. However, you must provide a value that isn’t 0-255 otherwise strange things can happen. In particular 0, the value you’ll get if you don’t specify spi, often causes the packet to get lost after decapsulation, so always specify a large number for spi. requid is a label which is attached to an unencapsulated packet that effectively remembers what the SPI value was; it’s mostly used as a label based discriminator in policy template to state transforms.

For the purposes of the following example I’ll simply use the randomly chosen 4321 for the spi value (but you could choose anything outside the 0-255 range).

ip xfrm state

Unlike policy, which attaches to a particular location (in, out or fwd), state is location agnostic and the same state match could theoretically be used both to encapsulate or decapsulate. State matching also isn’t subnet based: the address matching is either exact or fully wild card (match everything). However, a state encapsulation transform rule must match on dst and a decapsulation one may match on either src or dst, but must have an exact match on one or other. The main thing the state specifies is the algorithm to encapsulate (see man ip-xfrm for a full list). Remember you also must specify spi. The only other thing you might want to specify is the sel parameter. The selector applies to the inner address of decapsulated packets and is to ensure that a mode tunnel packet is going to an address you approve of.

Simple Example: HMAC authentication between two nodes

Assume a [4321::]/64 subnet with two nodes [4321::1] and [4321::2]. To set up authenticated headers one way (from 1->2) you need a policy specifying AH (can be specific or subnet based, so this is subnet)

ip xfrm policy add dst 4321::/64 dir out tmpl proto ah mode tunnel

Followed by a state that’s specific to the destination (using random spi 4321 and short key 1234):

ip xfrm state add dst 4321::2 proto ah spi 4321 auth "hmac(sha1)" 1234 mode transport

If you ping from [4321::1] and do a tcp dump from [4321::2] you’ll see

IP6 4321::1 > 4321::2: AH(spi=0x000010e1,seq=0x1b,icv=0x530cdd96149288da7a35fc6d): ICMP6, echo request, id 4, seq 104, length 64

But nothing will come back until you add on [4321::2]

ip xfrm state add dst 4321::2 proto ah spi 4321 auth "hmac(sha1)" 1234 mode transport

Which will cause a ping response to be seen. Note the ping packet has an authentication header, but the response is a simple icmp6 response packet (no AH) demonstrating ipsec can be set up asymmetrically.

Example: Private network for Cloud Nodes

Assume we have N nodes with public IP addresses [4321::1]…[4321::N] (which could be provided by the cloud overlay or simply by virtue of the physical network the nodes are on) and we want to connect them in a private mesh network using encryption. There are two ways of doing this: the first is to encrypt all traffic between the nodes on the public network using transport mode and the second would be to set up overlay tunnels between the nodes (this latter can be used even if the public addresses aren’t on a single network segment).

Simple Transport Mode Encryption

Firstly each node needs a policy to require encryption both to and from the private network

ip xfrm policy add dst 4321::0/64 dir out tmpl proto esp
ip xfrm policy add scr 4321::/64 dir in tmpl proto esp

And then for each node on the star and encrypt and a decrypt policy (the aes cipher type is taken from the key length, so I’ve chosen a 128 bit key “1234567890123456”)

ip xfrm state add dst 4321::1 proto esp spi 4321 enc "cbc(aes)" 1234567890123456 mode transport
ip xfrm state add src 4321::1 proto esp sip 4321 enc "cbc(aes)" 1234567890123456 mode transport
...
ip xfrm state add dst 4321::N proto esp spi 4321 enc "cbc(aes)" 1234567890123456 mode transport
ip xfrm state add src 4321::N proto esp sip 4321 enc "cbc(aes)" 1234567890123456 mode transport

This encryption scheme has one key for the entire network, but you could use 1 key per node if you wished (although this wouldn’t necessarily increase security that much). Note that what’s described above is not an overlay network because it relies on using the characteristics of the underlying network (in this case that all nodes are on an IPv6 /64 segment) to do opportunistic transport encryption. To get a true single subnet overlay on top of a disjoint network there must be some sort of tunnel. One way to get the tunnel is simply to use ipsec in tunnel mode, but another is to set up gre tunnels (or another, not necessarily trusted, network overlay which the cloud can likely provide) for the virtual overlay and then use ipsec in transport mode to ensure the packets are always encrypted.

Tunnel Mode Overlay Network

For this example, we’ll allow unencrypted packets to flow over a routed network [4321::1] and [4322::2] (assume network device eth0 on each node) but set up an overlay network on [6666::N]/64 which is fully encrypted. Firstly, each node requires a local address addition for the [6666::N] address. So on node [4321::1] do

ip addr add 6666::1/128 dev eth0

Now add policies and state transforms in both directions (in this case a dir in policy is required otherwise the decapsulated packets won’t get sent up the input flow):

# required policy for encapsulation
ip xfrm policy add dst 6666::2 dir out tmpl src 4321::1 dst 4322::2 proto esp mode tunnel
# state transform for encapsulation
ip xfrm state add src 4321::1 dst 4322::2 proto esp spi 4321 enc "cbc(aes)" 1234567890123456 mode tunnel
# policy to allow passing of decapsulated packets
ip xfrm policy add dst 6666::1 dir in tmpl proto esp mode tunnel level required
# automatic decapsulation.  sel ensures addresses after decapsulation
ip xfrm state add src 4322::2 dst 4321::1 proto esp sip 4321 enc "cbc(aes)" 1234567890123456 mode tunnel sel src 6666::2 dst 6666::1

And on the other node [4322::2] do the same in reverse

ip addr add 6666::2/128 dev eth0
ip xfrm policy add dst 6666::1 dir out tmpl src 4321::1 dst 4322::2 proto esp mode tunnel
ip xfrm state add src 4322::2 dst 4321::1 proto esp spi 4321 enc "cbc(aes)" 1234567890123456 mode tunnel
ip xfrm policy add dst 6666::2 dir in tmpl proto esp mode tunnel level required
ip xfrm state add src 4321::1 dst 4322::2 proto esp sip 4321 enc "cbc(aes)" 1234567890123456 mode tunnel sel src 6666::1 dst 6666::2

Note there’s no need to add any routing entries because the decapsulation is point to point (incoming decapsulated packets always end up in the input flow destined for the local [6666::N] address). With the above rules you should be able to ping from [6666::1] to [6666::2] and tcpdump should show fully encrypted packets going over the wire.

Obviously, you can add more nodes but each time you have to add rules for all the other nodes, making this an N(N-1) scaling problem. The need for a specific source and destination template in the out policy means you must have one for each connection. The in policy can be subnet based. The reason why people use ipsec toolkits is that they can add the transforms just in time for the subset of nodes you’re actually communicating with rather than having to add all the rules up front.

Final Example: correcly keyed AH Inbound Packet Acceptance

The final example is me trying to penetrate my router firewall when on an external IPv6 connection. I have a class /60 set of IPv6 space, so each of my systems has its own IPv6 address but, as is usual, inbound packets in the NEW state are blocked. It occurred to me that I should be able to use ipsec AH (no real need for encryption since most of the protocols I use are encrypted anyway) to accept packets to an internal destination with the NEW state. This would be an asymmetric use of ipsec because inbound would have AH but return wouldn’t.

My initial thought was to use AH in transport mode, but as you can see from the diagram above that won’t work because the router merely forwards the packets and to get to a state decapsulation on the router they have to go up the INPUT flow.

The next attempt used tunnel mode, so the packet was aimed at the router and then the inner destination was the real node. The next problem was the fwd policy to permit this: the xfrm policy has no connection tracker and return packets from connections originating in the interior node also have to pass this filter. The solution to this conundrum is to install a level use policy and rely on the FORWARD table firewall rules to allow RELATED,ESTABLISHED and MARKed packets (so the decapsulation can add the MARK to pass this rule).

IPSEC on the Router

Assume my external router address is [4321::1] and my internal network is [4444::]/60. On the router, I install a catch all state transform

ip xfrm state add add dst 4321::1 proto ah spi 4321 auth "hmac(sha1)" 1234 mode tunnel sel dst 4444::/60 output-mark 0x1

But I also need a policy to permit the decapsulated packets (and the state RELATED,ESTABLISHED unencapsulated ones) to pass:

ip xfrm policy add dst 4444::/60 dir fwd tmpl proto ah spi 4321 mode tunnel level use

And finally I need an addition to the firewall rules to allow packets in state NEW but with mark 0x1 to pass

ip6tables -A FORWARD -m conntrack --ctstate NEW -m mark --mark 0x1/0x1 -j ACCEPT

Which should be placed directly after the RELATED,ESTABLISHED state check.

Since there’s no encapsulation on outbound, the return packets simply pass through the firewall as normal. This means that any external entity wishing to use this AH packet acceptance simply needs a policy and state to tunnel

ip xfrm policy dst 4444::/60 dir out tmpl proto ah dst 4321::1 mode tunnel
ip xfrm state add dst 4321::1 spi 4321 proto ah auth "hmac(sha1)" 1234 mode tunnel

And with that, any machine known by inner IPv6 address can be reached (for an IPv6 connected remote machine).

Conclusion

Hopefully this post has demystified some of the ip xfrm rules for you. I’m afraid the commands have a huge range of options, so I’ve only covered the essential ones above and there are still loads of interesting but not at all well documented ones remaining, but thanks to the examples you should have some scope now for playing with them.

James Bottomley's random Pages

A collection of information