First of all, we would like to apologize for the unresponsiveness of Mews on the 24th of October 2018 during night hours (21:40 to 22:05 UTC). We take such incidents seriously and always work hard to ensure that this and similar problems do not occur in the future.
As we continuously update our system to deliver new features, we also update the database model during deployments. The last deployment introduced changes that consumed an unexpectedly large amount of database resources, and after a while the operation timed out. However, one of the application instances kept retrying the operation because of a built-in recovery procedure. That kept the database unresponsive for a longer period.
We have reverted the change and deployed our application without it.
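For readers interested in the failure mode, retrying a timed-out operation without a limit can keep a database under load long after the first failure. The sketch below is illustrative only and is not Mews code: `MigrationTimeout`, `run_with_bounded_retries`, and their parameters are hypothetical names. It shows one common way to bound such retries, by capping the number of attempts and waiting longer between each one.

```python
import time


class MigrationTimeout(Exception):
    """Hypothetical error raised when a database operation times out."""


def run_with_bounded_retries(operation, max_attempts=3, base_delay=1.0,
                             sleep=time.sleep):
    """Run `operation`, retrying on timeout at most `max_attempts` times.

    Between attempts, wait base_delay * 2**(attempt - 1) seconds.
    After the final failed attempt, re-raise instead of retrying forever.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except MigrationTimeout:
            if attempt == max_attempts:
                # Give up so the database gets a chance to recover.
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

With a cap like this, a persistently failing operation stops after a known number of attempts and surfaces the error to operators, rather than repeating the expensive work indefinitely.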
To properly resolve the issue and prevent anything similar from recurring, we have introduced several improvements to our processes: