Tag Archives: sip

Securing the Google SIP Stack

A while ago I mentioned I use Android-10 with the built in SIP stack and that the Google stack was pretty buggy and I had to fix it simply to get it to function without disconnecting all the time. Since then I’ve upported my fixes to Android-11 (the jejb-11 branch in the repositories) by using LineageOS-19.1. However, another major deficiency in the Google SIP stack is its complete lack of security: both the SIP signalling and the media streams are all unencrypted meaning they can be intercepted and tapped by pretty much anyone in the network path running tcpdump. Why this is so, particularly for a company that keeps touting its security credentials is anyone’s guess. I personally suspect they added SIP in Android-4 with a view to basing Google Voice on it, decided later that proprietary VoIP protocols was the way to go but got stuck with people actually using the SIP stack for other calling services so they couldn’t rip it out and instead simply neglected it hoping it would die quietly due to lack of features and updates.

This blog post is a guide to how I took the fully unsecured Google SIP stack and added security to it. It also gives a brief overview of some of the security protocols you need to understand to get secure VoIP working.

What is SIP

What I’m calling SIP (but really a VoIP system using SIP) is a protocol consisting of several pieces. SIP (Session Initiation Protocol), RFC 3261, is really only one piece: it is the “signalling” layer meaning that call initiation, response and parameters are all communicated this way. However, simple SIP isn’t enough for a complete VoIP stack; once a call moves to in progress, there must be an agreement on where the media streams are and how they’re encoded. This piece is called a SDP (Session Description Protocol) agreement and is usually negotiated in the body of the SIP INVITE and response messages and finally once agreement is reached, the actual media stream for call audio goes over a different protocol called RTP (Real-time Transport Protocol).

How Google did SIP

The trick to adding protocols fast is to take them from someone else (if you’re open source, this is encouraged) so google actually chose the NIST-SIP java stack (which later became the JAIN-SIP stack) as the basis for SIP in android. However, that only covered signalling and they had to plumb it in to the android Phone model. One essential glue piece is frameworks/opt/net/voip which supplies the SDP negotiating layer and interfaces the network codec to the phone audio. This isn’t quite enough because the telephony service and the Dialer also need to be involved to do the account setup and call routing. It always interested me that SIP was essentially special cased inside these services and apps instead of being a plug in, but that’s due to the fact that some of the classes that need extending to add phone protocols are internal only; presumably so only manufacturers can add phone features.

Securing SIP

This is pretty easy following the time honoured path of sending messages over TLS instead of in the clear simply by using a TLS wrappering technique of secure sockets and, indeed, this is how RFC 3261 says to do it. However, even this minor re-engineering proved unnecessary because the nist-sip stack was already TLS capable, it simply wasn’t allowed to be activated that way by the configuration options Google presented. A simple 10 line patch in a couple of repositories (external/nist_sip, packages/services/Telephony and frameworks/opt/net/voip) fixed this and the SIP stack messaging was secured leaving only the voice stream insecure.


As I said above, the google frameworks/opt/net/voip does all the SDP negotiation. This isn’t actually part of SIP. The SDP negotiation is conducted over SIP messages (which means it’s secured thanks to the above) but how this should be done isn’t part of the SIP RFC. Instead SDP has its own RFC 4566 which is what the class follows (mainly for codec and port negotiation). You’d think that if it’s already secured by SIP, there’s no additional problem, but, unfortunately, using SRTP as the audio stream requires the exchange of additional security parameters which added to SDP by RFC 4568. To incorporate this into the Google SIP stack, it has to be integrated into the voip class. The essential additions in this RFC are a separate media description protocol (RTP/SAVP) for the secure stream and the addition of a set of tagged a=crypto: lines for key negotiation.

As will be a common theme: not all of RFC 4568 has to be implemented to get a secure RTP stream. It goes into great detail about key lifetime and master key indexes, neither of which are used by the asterisk SIP stack (which is the one my phone communicates with) so they’re not implemented. Briefly, it is best practice in TLS to rekey the transport periodically, so part of key negotiation should be key lifetime (actually, this isn’t as important to SRTP as it is to TLS, see below, which is why asterisk ignores it) and the people writing the spec thought it would be great to have a set of keys to choose from instead of just a single one (The Master Key Identifier) but realistically that simply adds a load of complexity for not much security benefit and, again, is ignored by asterisk which uses a single key.

In the end, it was a case of adding a new class for parsing the a=crypto: lines of SDP and doing a loop in the audio protocol for RTP/SAVP if TLS were set as the transport. This ended up being a ~400 line patch.

Secure RTP

RTP itself is governed by RFC 3550 which actually contains two separate stream descriptions: the actual media over RTP and a control protocol over RTCP. RTCP is mostly used for multi-party and video calls (where you want reports on reception quality to up/downshift the video resolution) and really serves no purpose for audio, so it isn’t implemented in the Google SIP stack (and isn’t really used by asterisk for audio only either).

When it comes to securing RTP (and RTCP) you’d think the time honoured mechanism (using secure sockets) would have applied although, since RTP is transmitted over UDP, one would have to use DTLS instead of TLS. Apparently the IETF did consider this, but elected to define a new protocol instead (or actually two: SRTP and SRTCP) in RFC 3711. One of the problems with this new protocol is that it also defines a new ciphersuite (AES_CM_…) which isn’t found in any of the standard SSL implementations. Although the AES_CM ciphers are very similar in operation to the AES_GCM ciphers of TLS (Indeed AES_GCM was adopted for SRTP in a later RFC 7714) they were never incorporated into the TLS ciphersuite definition.

So now there are two problems: adding code for the new protocol and performing the new encyrption/decryption scheme. Fortunately, there already exists a library (libsrtp) that can do this and even more fortunately it’s shipped in android (external/libsrtp2) although it looks to be one of those throwaway additions where the library hasn’t really been updated since it was added (for cuttlefish gcastv2) in 2019 and so is still at a pre 2.3.0 version (I did check and there doesn’t look to be any essential bug fixes missing vs upstream, so it seems usable as is).

One of the really great things about libsrtp is that it has srtp_protect and srtp_unprotect functions which transform SRTP to RTP and vice versa, so it’s easily possible to layer this library directly into an existing RTP implementation. When doing this you have to remember that the encryption also includes authentication, so the size of the packet expands which is why the initial allocation size of the buffers has to be increased. One of the not so great things is that it implements all its own crypto primitives including AES and SHA1 (which most cryptographers think is always a bad idea) but on the plus side, it’s the same library asterisk uses so how much of a real problem could this be …

Following the simple layering approach, I constructed a patch to do the RTP<->SRTP transform in the JNI code if a key is passed in, so now everything just works and setting asterisk to SRTP only confirms the phone is able to send and receive encrypted audio streams. This ends up being a ~140 line patch.

So where does DTLS come in?

Anyone spending any time at all looking at protocols which use RTP, like webRTC, sees RTP and DTLS always mentioned in the same breath. Even asterisk has support for DTLS, so why is this? The answer is that if you use RTP outside the SIP framework, you still need a way of agreeing on the keys using SDP. That key agreement must be protected (and can’t go over RTCP because that would cause a chicken and egg problem) so implementations like webRTC use DTLS to exchange the initial SDP offer and answer negotiation. This is actually referred to as DTLS-SRTP even though it’s an initial DTLS handshake followed by SRTP (with no further DTLS in sight). However, this DTLS handshake is completely unnecessary for SIP, since the SDP handshake can be done over TLS protected SIP messaging instead (although I’ve yet to find anyone who can convincingly explain why this initial handshake has to go over DTLS and not TLS like SIP … I suspect it has something to do with wanting the protocol to be all UDP and go over the same initial port).


This whole exercise ended up producing less than 1000 lines in patches and taking a couple of days over Christmas to complete. That’s actually much simpler and way less time than I expected (given the complexity in the RFCs involved), which is why I didn’t look at doing this until now. I suppose the next thing I need to look at is reinserting the SIP stack into Android-12, but I’ll save that for when Android-11 falls out of support.

Using SIP to Replace Mobile and Land Lines

If you read more than a few articles in my blog you’ve probably figured out that I’m pretty much a public cloud Luddite: I run my own cloud (including my own email server) and don’t really have much of my data in any public cloud. I still have public cloud logins: everyone wants to share documents with Google nowadays, but Google regards people who don’t use its services “properly” with extreme prejudice and I get my account flagged with a security alert quite often when I try to log in.

However, this isn’t about my public cloud phobia, it’s about the evolution of a single one of my services: a cloud based PBX. It will probably come as no surprise that the PBX I run is Asterisk on Linux but it may be a surprise that I’ve been running it since the early days (since 1999 to be exact). This is the story of why.

I should also add that the motivation for this article is that I’m unable to get a discord account: discord apparently has a verification system that requires a phone number and explicitly excludes any VOIP system, which is all I have nowadays. This got me to thinking that my choices must be pretty unusual if they’re so pejoratively excluded by a company whose mission is to “Create Space for Everyone to find Belonging”. I’m sure the suspicion that this is because Discord the company also offers VoIP services and doesn’t like the competition is unworthy.

Early Days

I’ve pretty much worked remotely in the US all my career. In the 90s this meant having three phone lines (These were actually physical lines into the house): one for the family, one for work and one for the modem. When DSL finally became a thing and we were running a business, the modem was replaced by a fax machine. The minor annoyance was knowing which line was occupied but if line 1 is the house and line 2 the office, it’s not hard. The big change was unbundling. This meant initially the call costs to the UK through the line provider skyrocketed and US out of state rates followed. The way around this was to use unbundled providers via dial-around (a prefix number), but finding the cheapest was hard and the rates changed almost monthly. I needed a system that could add the current dial-around prefix for the relevant provider automatically. The solution: asterisk running on a server in the basement with two digium FX cards for the POTS lines (fax facility now being handled by asterisk) and Aastra 9113i SIP phones wired over the house ethernet with PoE injectors. Some fun jiggery pokery with asterisk busy lamp feature allowed the lights on the SIP phone to indicate busy lines and extensions.conf could be programmed to keep the correct dial-around prefix. For a bonus, asterisk can be programmed to do call screening, so now if the phone system doesn’t recognize your number you get told we don’t accept solicitation calls and to hang up now, otherwise press 0 to ring the house phone … and we’ve had peaceful dinner times ever after. It was also somewhat useful to have each phone in the house on its own PBX extension so people could call from the living room to my office without having to yell.

Enter SIP Trunking

While dial-arounds worked successfully for a few years, they always ended with problems (usually signalled by a massive phone bill) and a new dial-around was needed. However by 2007 several companies were offering SIP trunking over the internet. The one I chose (Localphone, a UK based company) was actually a successful ring back provider before moving into SIP. They offered a pay as you go service with phone termination in whatever country you were calling. The UK and US rates were really good, so suddenly the phone bills went down and as a bonus they gave me a free UK incoming number (called a DID – Direct Inward Dialing) which family and friends in the UK could call us on at local UK rates. Pretty much every call apart from local ones was now being routed over the internet, although most incoming calls, apart for those from the UK, were still over the POTS lines.

The beginning of Mobile (For Me)

I was never really a big consumer of mobile phones, but that all changed in 2009 when google presented all kernel developers with a Nexus One. Of course, they didn’t give us SIM cards to go with it, so my initial experiments were all over wifi. I soon had CyanogenMod installed and found a SIP client called Sipdroid. This allowed me to install my Nexus One as a SIP extension on the house network. SIP calls over 2G data were not very usable (the bandwidth was too low), but implementing multiple codecs and speex support got it to at least work (and is actually made me an android developer … scratching my own itch again). The bandwidth problems on 2G evaporated on 3G and SIP became really usable (although I didn’t have a mobile “plan”, I did use pay-as-you-go SIMs while travelling). It already struck me that all you really needed the mobile network for was data and then all calls could simply travel to a SIP provider. When LTE came along it seemed to be confirming this view because IP became the main communication layer.

I suppose I should add that I used the Nexus One long beyond its design life, even updating its protocol stack so it kept working. I did this partly because it annoyed certain people to see me with an old phone (I have a set of friends who were very amused by this and kept me supplied with a stock of Nexus One phones in case my old one broke) but mostly because of inertia and liking small phones.

SIP Becomes My Only Phone Service

In 2012, thanks to a work assignment, we relocated from the US to London. Since these moves take a while, I relocated the in-house PBX machine to a dedicated server in Los Angeles (my nascent private cloud), ditched the POTS connections and used the UK incoming number as our primary line that could be delivered to us while we were in temporary accommodation as well as when we were in our final residence in London. This did have the somewhat inefficient result that when you called from the downstairs living room to the upstairs office, the call was routed over an 8,000 mile round trip from London to Los Angeles and back, but thanks to internet latency improvements, you couldn’t really tell. The other problem was that the area code I’d chosen back in 2007 was in Whitby, some 200 Miles north of London but fortunately this didn’t seem to be much of an issue except for London Pizza delivery places who steadfastly refused to believe we lived locally.

When the time came in 2013 to move back to Seattle in the USA, the adjustment was simply made by purchasing a 206 area code DID and plugging it into the asterisk system and continued using a fully VoIP system based in Los Angeles. Although I got my incoming UK number for free, being an early service consumer, renting DIDs now costs around $1 per month depending on your provider.

SIP and the Home Office

I’ve worked remotely all my career (even when in London). However, I’ve usually worked for a company with a physical office setup and that means a phone system. Most corporate PBX’s use SIP under the covers or offer a SIP connector. So, by dint of finding the PBX administrator I’ve usually managed to get a SIP extension that will simply plug into my asterisk PBX. Using correct dial plan routing (and a prefix for outbound calling), the office number usually routes to my mobile and desk phone, meaning I can make and receive calls from my office number wherever in the world I happen to be. For those who want to try this at home, the trick is to find the phone system administrator; if you just ask the IT department, chances are you’ll simply get a blanket “no” because they don’t understand it might be easy to do and definitely don’t want to find out.

Evolution to Fully SIP (Data Only) Mobile

Although I said above that I maintained a range of in-country Mobile SIMs, this became less true as the difficulty in running in-country SIMs increased (most started to insist you add cash or use them fairly regularly). When COVID hit in 2020, and I had no ability to travel, my list of in-country SIMs was reduced to one from 3 UK largely because they allowed you to keep your number provided you maintained a balance (and they had a nice internet roaming agreement which meant you paid UK data rates in a nice range of countries). The big problem giving up a mobile number was no text messaging when travelling (SMS). For years I’ve been running a xmpp server, but the subset of my friends who have xmpp accounts has always been under 10% so it wasn’t very practical (actually, this is somewhat untrue because I wrote an xmpp to google chat bridge but the interface became very impedance mismatched as Google moved to rich media).

The major events that got me to move away from xmpp and the Nexus One were the shutdown of the 3G network in the US and the viability of the Matrix federated chat service (the Matrix android client relied on too many modern APIs ever to be backported to the version of android that ran on the Nexus One). Of the available LTE phones, I chose the Pixel-3 as the smallest and most open one with the best price/performance (and rapidly became acquainted with the fact that only some of them can actually be rooted) and LineageOS 17.1 (Android 10). The integration of SIP with the Dialer is great (I can now use SIP on the car’s bluetooth, yay!) but I rapidly ran into severe bugs in the Google SIP implementation (which hasn’t been updated for years). I managed to find and fix all the bugs (or at least those that affected me most, repositories here; all beginning with android_ and having the jejb-10 branch) but that does now mean I’m stuck on Android 10 since Google ripped SIP out in Android 12.

For messaging I adopted matrix (Apart from the Plumbers Matrix problem, I haven’t really written about it since matrix on debian testing just works out of the box) and set up bridges to Signal, Google Chat, Slack and WhatsApp (The WhatsApp one requires you be running WhatsApp on your phone, but I run mine on an Android VM in my cloud) all using the 3 UK Sim number where they require a mobile number confirmation. The final thing I did was to get a universal roaming data SIM and put it in my phone, meaning I now rely on matrix for messaging and SIP for voice when I travel because the data SIM has no working mobile number at all (either for voice or SMS). In many ways, this is no hardship: I never really had a permanent SMS number when travelling because of the use of in-country SIMs, so no-one has a number for me they rely on for SMS.

Conclusion and Problems

Although I implied above I can’t receive SMS, that’s not quite true: one of my VOIP numbers does accept SMS inbound and is able to send outbound, the problem is that it doesn’t come over the SIP MESSAGE protocol, but instead goes to a web page in the provider backend, making it inconvenient to use and meaning I have to know the message is coming (although I do use it for things like Delta Boarding passes, which only send the location of the web page to receive pkpasses over SMS). However, this isn’t usually a problem because most people I know have moved on from SMS to rich messaging over one of the protocols I have (and if one came along with a new protocol, well I can install a bridge for that).

In terms of SIP over an IP substrate giving rise to unbundled services, I could claim to be half right, since most of the modern phone like services have a SIP signalling core. However, the unbundling never really came about: the silo provider just moved from landline to mobile (or a mobile resale service like Google Fi). Indeed, today, if you give anyone your US phone number they invariably assume it is a mobile (and then wonder why you don’t reply to their SMS messages). This mobile assumption problem can be worked around by emphasizing “it’s a landline” every time you give out your VOIP number, but people don’t always retain the information.

So what about the future? I definitely still like the way my phone system works … having a single number for the house which any household member can answer from anywhere and side numbers for travelling really suits me, and I have the technical skills to maintain it indefinitely (provided the SIP trunking providers sill exist), but I can see the day coming where the Discord intolerance of non-siloed numbers is going to spread and most silos will require non-VOIP phone numbers with the same prejudice, thus locking people who don’t comply out in much the same way as it’s happening with email now; however, hopefully that day for VoIP is somewhat further off.