At around 00:08 UTC, all of our application servers appeared to fall out of service at the same time. We believed this was related to TCP socket exhaustion and made some changes to our kernel configuration to mitigate the problem. We considered making a more difficult change, but to make the best use of our volunteer time, we decided to wait and see whether the kernel changes would help. However, because of another incident on 2024-09-17, we did end up changing our firewall configuration so that the servers involved can communicate with the frontend servers over additional internal IP addresses. This increases the number of sockets available. (This was the previously mentioned “difficult change”, although we ended up applying it to a different set of servers than we were originally planning to.) Since making this change, we have not seen any similar disruption.
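
As a rough illustration of why additional internal IP addresses help: a TCP connection is identified by its source IP, source port, destination IP and destination port. With one address on each side and a fixed destination port, the number of simultaneous outbound connections is capped by the ephemeral port range. The sketch below is a back-of-the-envelope estimate, not a description of our actual configuration. The port range used is the common Linux default for `net.ipv4.ip_local_port_range` and the count of four address pairs is purely hypothetical; the function name is also made up for this example.

```python
def max_outbound_connections(port_low: int, port_high: int, address_pairs: int) -> int:
    """Upper bound on simultaneous TCP connections to one destination port.

    Each distinct (source IP, destination IP) pair can reuse the whole
    ephemeral port range, so the ceiling scales linearly with the number
    of address pairs.
    """
    ephemeral_ports = port_high - port_low + 1
    return ephemeral_ports * address_pairs


# Assumption: the common Linux default range, 32768-60999 (not taken from our servers).
PORT_LOW, PORT_HIGH = 32768, 60999

# One address pair: the ceiling is the size of the port range itself.
print(max_outbound_connections(PORT_LOW, PORT_HIGH, 1))  # 28232

# Hypothetical: four address pairs give four times the ceiling.
print(max_outbound_connections(PORT_LOW, PORT_HIGH, 4))  # 112928
```

In practice the usable number is lower than this bound, since sockets lingering in states such as `TIME_WAIT` also occupy ports for a while after a connection closes.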