On Wednesday, September 16, at 13:57 CEST, we deployed a configuration change that altered how request volume limits are evaluated. As a result, a small number of users were throttled while working in Commander; in total, 3% of all requests were throttled.
On Thursday, September 17, at 04:55 CEST, the Customer Care team notified us of the issue. We identified the configuration change as the cause and, at 07:03 CEST, deployed a fix that reverted it.
The following factors contributed to the incident:
1. We only estimated the impact of the configuration change instead of analyzing the actual request volume patterns.
2. We did not have alerts in place to notify us about excessive throttling of web requests.
3. The application did not clearly tell users that their requests were being throttled; instead, it served a generic error message.
To prevent similar incidents, we will:
1. Follow a more rigorous, data-driven procedure when deploying configuration changes that could significantly affect our customers.
2. Set up alerts on excessive throttling.
3. Clearly inform users when their requests have been throttled.