The recently completed Linux Plumbers Conference (LPC) 2021 again used the BigBlueButton (BBB) project as its audio/video online conferencing platform, with Matrix for IM and chat. Why we chose BBB has been discussed previously. However, this year we replaced RocketChat with Matrix to achieve federation, allowing non-registered conference attendees to join the chat. Also, based on feedback from our attendees, we endeavored to replace the BBB chat window with a Matrix one so that anyone could see and participate in one contemporaneous chat stream within BBB and beyond. This made chat available before, during and after each session.
One thing that emerged from our initial disaster with Matrix on the first day is that we failed to learn from the experiences of other open source conferences (notably FOSDEM, which used Matrix and ran into the same problems). So one object of this post is to document for posterity what we did and how to repeat it.
Integrating Matrix Chat into BBB
Most of this integration was done by Guy Lunardi.
It turns out that chat is fairly deeply embedded in BBB, so replacing the existing chat module is hard. Fortunately, BBB also contains an embedded etherpad, which is produced via a simple iFrame redirection. So what we did was disable the BBB chat panel and replace it with a new iFrame-based component that opened an embedded Matrix chat client. The client we chose was riot-embedded, which is a relatively recent project but seemed to work reasonably well. The final problem was passing through user credentials. Up until three days before the conference, we had been happy with the embedded Matrix client simply creating a one-time numbered guest account every time it was opened, but we worried about this being a security risk and so implemented login credential pass-through at the last minute (life’s no fun unless you live dangerously).
Our custom front end for BBB (lpcfe) was created last year by Jon Corbet. It uses a fairly simple email address/registration confirmation code pair as username/password, authenticated via LDAP. The lpcfe front end Jon created lives at git://git.lwn.net/lpcfe.git; it manages the whole conference login process and presents the current and future sessions (with join buttons) according to the timezone of the browser viewing it.
The credentials are passed through directly using extra parameters to BBB (see commit fc3976e “Pass email and regcode through to BBB”). We eventually passed these through using a GET request. Obviously, if we were using a secret password this would be a problem, but since the password was a registration code handed out by a third party, it’s acceptable. I imagine that if anyone wishes to take this work forward, adding native Matrix device/session support to riot-embedded would be better.
The main change to get this working in riot-embedded is here, and the supporting patch to BBB is here.
Note that the Matrix room ID used by the client was added as an extra parameter to the flat text file that drives the conference track layout of lpcfe. All Matrix rooms were created as public (and published) so anyone going to our :lpc.events matrix domain could see and join them.
Setting up Matrix for the Conference
We used the matrix-synapse server, installed on Ubuntu via a standard python venv pip install of the latest tag. We created around 30 public rooms: one for each Microconference and track of the conference, plus some admin and hallway rooms. We used LDAP to feed the authentication portion of lpcfe/Matrix, but we had a problem using email addresses since the standard matrix user name cannot contain an ‘@’ symbol. Eventually we opted to transform everyone’s email to a matrix-compatible form simply by replacing the ‘@’ with a ‘.’, which is why everyone in our conference appeared with ridiculously long matrix user names like @jejb.ibm.com:lpc.events.
This ‘@’ to ‘.’ transformation was a huge source of problems due to the unwillingness of engineers to read instructions, so if we do this over again, we’ll do the transformation silently in the login javascript of our Matrix web client. (We did this in riot-embedded but ran out of time to do it in Element Web as well.)
Because we used LDAP, the actual matrix account for each user was created the first time they logged into our server, so we chose at this point to use auto-join to add everyone to the 30+ LPC Matrix rooms we’d already created. This turned out to be a huge problem.
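For the curious, the mapping is a single parameter substitution; a minimal shell sketch, using the address behind the example above (the same substitution reappears in the bulk-login script later in this post):
email=jejb@ibm.com
matrixlocalpart=${email/@/.}              # jejb.ibm.com
echo "@${matrixlocalpart}:lpc.events"     # @jejb.ibm.com:lpc.events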
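The auto-join itself is just standard synapse configuration; a minimal homeserver.yaml sketch (the room aliases here are illustrative, not our actual room list):
# homeserver.yaml: every newly created account is joined to these rooms
auto_join_rooms:
  - "#lpc-hallway:lpc.events"
  - "#lpc-refereed-track:lpc.events"
  # ... one entry per Microconference/track/admin room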
Testing our Matrix and BBB integration
We organized a “Town Hall” event to which we invited lots of people to test out the infrastructure we’d be using for the conference. Because we wanted this to be open, we couldn’t use the pre-registration/LDAP authentication infrastructure, so Jon quickly implemented a guest mode (and we didn’t auto-join anyone to any Matrix rooms other than the townhall chat).
In the end, about 220 users tested the system, during which time the Matrix and BBB infrastructure behaved quite well. Based on this test, we chose a 2 vCPU Linode VM for our Matrix server.
What happened on the Day
Come the Monday of the conference, the first problem we ran into was procrastination: the conference registered about 1,000 attendees, of whom about 500 tried to log on in the five minutes prior to the first session. Since accounts were created and rooms joined upon first login, this was clearly a huge thundering herd problem of our own making … oops. The Matrix server itself shot up to 100% CPU on the python synapse process and simply stayed there, adding new users at a rate of about one every 30 seconds. All the chat tabs froze because logins were taking ages as well. The first thing we did was scale the server up to a 16 CPU bare metal system, but that didn’t help because synapse is single threaded … all we got was the matrix synapse python process running at 100% on one of the CPUs, still taking 30 seconds per first login.
Fixing the First Day problems
The first thing we realized was that we had to multi-thread the synapse server. This is well known, but the details are quite well hidden deep in the Matrix documentation, which also happens to be slightly incomplete. Our first scaling attempt, simply adding 16 generic worker apps to spread load across all our physical CPUs, failed because the Matrix server stopped federating and then the database crashed with “FATAL: remaining connection slots are reserved for non-replication superuser connections”.
Fixing the connection problem (alter system set max_connections = 1000;) triggered a shared-memory-too-small issue, which was eventually fixed by bumping the shared buffer segment to 8GB (alter system set shared_buffers=1024000;). I suspect these parameters are way too large, but the Linode we were on had 32GB of main memory, so fine-tuning in this emergency didn’t seem a good use of time.
Fixing the worker problem was way more complex. The way Matrix works, you have to use a proxy (such as haproxy) to redirect incoming connections to individual workers, and you have to ensure that the same worker always services the same transaction (which you achieve by hashing on IP address). We got a lot of advice from FOSDEM on this aspect, but in the end, instead of using an external haproxy, we went for the built-in reverse proxy load balancing in nginx. The federation problem seems to be that Matrix simply doesn’t work without a federation sender. In the end, we created 15 generic workers and one each of media server, frontend server and federation sender.
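For reference, the two PostgreSQL commands in one place (shared_buffers is measured in 8kB pages, so 1024000 corresponds to roughly 8GB; both settings only take effect after a PostgreSQL restart):
-- run as the postgres superuser; values are persisted in postgresql.auto.conf
alter system set max_connections = 1000;
alter system set shared_buffers = 1024000;   -- 1024000 * 8kB pages = ~8GB
-- restart postgresql afterwards for both settings to apply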
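A stripped-down sketch of the nginx side (ports, worker count and server_name are illustrative; the real file is linked below). The key point is ip_hash, which pins every client IP to the same backend worker:
upstream synapse_workers {
    ip_hash;                     # same client IP always lands on the same worker
    server 127.0.0.1:8083;
    server 127.0.0.1:8084;
    # ... one entry per generic worker
}

server {
    listen 443 ssl;              # ssl certificate directives omitted from this sketch
    server_name lpc.events;

    location /_matrix {
        proxy_pass http://synapse_workers;
        proxy_set_header X-Forwarded-For $remote_addr;
    }
    # in the real config some endpoints are routed separately to the main
    # process, the media worker and the federation sender
}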
Our configuration files are:
- systemd target file
- systemd generic worker (yaml)
- systemd federation sender (yaml)
- systemd frontend (yaml)
- systemd media server (yaml)
- nginx file
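For readers who don’t want to chase the links, the shape of a generic worker’s yaml configuration in the synapse version we ran looked roughly like this (names, ports and paths are illustrative):
# generic_worker1.yaml: one of the 15 generic workers
worker_app: synapse.app.generic_worker
worker_name: generic_worker1

# replication connection back to the main synapse process
worker_replication_host: 127.0.0.1
worker_replication_http_port: 9093

worker_listeners:
  - type: http
    port: 8083
    resources:
      - names: [client, federation]

worker_log_config: /etc/matrix-synapse/generic_worker1_log_config.yaml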
Once you have all the units enabled in systemd, you can then simply do systemctl start/stop matrix-synapse.target.
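For what it’s worth, the target itself is tiny; the heavy lifting is done by ordinary systemd dependencies. A minimal sketch (unit and file names are illustrative):
# /etc/systemd/system/matrix-synapse.target
[Unit]
Description=Synapse main process and all workers

[Install]
WantedBy=multi-user.target

# Each worker .service (and the main synapse service) then carries
#   PartOf=matrix-synapse.target     in its [Unit] section, and
#   WantedBy=matrix-synapse.target   in its [Install] section,
# so that starting or stopping the target starts or stops the whole set.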
Finally, to fix the thundering herd problem (for people who hadn’t already logged in), we ran through the entire spreadsheet of email/confirmation codes, doing an automatic login against the login API on the server itself. At this point about half the accounts had already been auto-created, so this script created the rest.
# Tab-separated attendee list: first name, last name, confirmation code, email
emaillist=lpc2021-all-attendees.txt
IFS=$'\t'
while read first last confirmation email; do
    # strip any +suffix from the email to get the plain login address
    bbblogin=${email/+*@/@}
    # map '@' to '.' to form the matrix localpart (see above)
    matrixlogin=${bbblogin/@/.}
    # logging in creates the LDAP-backed matrix account and triggers auto-join
    curl -XPOST -d '{"type":"m.login.password", "user":"'${matrixlogin}'", "password":"'${confirmation}'"}' "http://localhost:8008/_matrix/client/r0/login"
    sleep 1
done < ${emaillist}
The lpc2021-all-attendees.txt file is a tab-separated text file used to drive the mass mailings to Plumbers attendees, but we adapted it to log everyone in to the matrix server.
Conclusion
With the above modifications, the matrix server on a Dedicated 32GB (16 core) Linode ran smoothly for the rest of the conference. The peak load average reached 17 and the peak total CPU usage never got above 70%. Finally, the peak memory usage was around 16GB including cache (so the server was a bit over-provisioned).
In the end, 878 of the 944 registered attendees logged into our BBB servers at one time or another and we got a further 100 external matrix users (who may or may not also have had a conference account).
Thanks for this write-up James – it is definitely helping fill in some of the questions I was left with after reading the Synapse “workers” docs.
I have one question regarding your systemd setup. You’ve kindly provided the files defining the new matrix-synapse.target and the .service files for the workers. I was wondering if you changed anything about the out-of-the-box matrix-synapse.service file for the main process in order to link it to the new matrix-synapse.target?