Eric Adams - Director of Operations
From 20:02 UTC to 20:21 UTC (19 minutes), the US Kimono Platform was unavailable due to an issue with one of our databases.
17 scheduled collections in the 19:00 UTC scheduled hour failed to complete and were left in a non-terminal state. These had to be manually restarted by Kimono personnel. After restart, they completed successfully. The Kimono Dashboard and all Integrations were unavailable for the entirety of the outage. Due to the durable, asynchronous messaging architecture of the Kimono Platform, most integrations simply picked up where they left off and were not impacted other than experiencing a 19-minute delay.
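The "picked up where they left off" behavior can be illustrated with a minimal sketch. It assumes a durable log with a persisted, committed offset. The `DurableQueue` class, its methods, and the record names are hypothetical stand-ins for illustration only and do not represent the Kimono Platform's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class DurableQueue:
    """Toy stand-in for a durable message log (hypothetical, not Kimono's API)."""
    messages: list = field(default_factory=list)
    committed_offset: int = 0  # position of the next unacknowledged message

    def publish(self, message):
        self.messages.append(message)

    def consume(self, handler):
        """Deliver every unacknowledged message, committing after each success."""
        while self.committed_offset < len(self.messages):
            handler(self.messages[self.committed_offset])
            # Acknowledge only after the handler succeeds, so a failure
            # leaves the offset pointing at the message that failed.
            self.committed_offset += 1


def run_example():
    queue = DurableQueue()
    for i in range(5):
        queue.publish(f"record-{i}")

    processed = []
    outage = {"active": True}

    def handler(message):
        # Simulate the backing database becoming unavailable part-way through.
        if outage["active"] and message == "record-2":
            raise ConnectionError("database unavailable")
        processed.append(message)

    try:
        queue.consume(handler)
    except ConnectionError:
        pass  # the consumer stops; the offset stays at the failed message

    outage["active"] = False  # service restored
    queue.consume(handler)    # resumes from the committed offset
    return processed, queue.committed_offset
```

In this sketch no message is lost or skipped: the consumer that failed on `record-2` retries it after the outage, so the only visible effect is a delay.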
The write-ahead logs for our high-availability failover database grew well beyond the allowed threshold, causing the database to enter recovery mode to prevent any data loss.
Our provider increased the limits for the write-ahead logs of the affected database and restarted the database.
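As an illustration of the failure condition described above, the following sketch classifies write-ahead log usage against a configured limit. The function name `wal_status`, the 80% warning ratio, and the byte values are assumptions made for this example. This report does not state the actual limits or the monitoring used by our provider.

```python
def wal_status(wal_bytes: int, limit_bytes: int, warn_ratio: float = 0.8) -> str:
    """Classify write-ahead log usage relative to a limit.

    All thresholds here are hypothetical; real limits depend on the
    database and provider configuration.
    """
    if limit_bytes <= 0:
        raise ValueError("limit_bytes must be positive")

    usage = wal_bytes / limit_bytes
    if usage >= 1.0:
        return "critical"  # at or above the limit
    if usage >= warn_ratio:
        return "warning"   # approaching the limit; investigate before it is hit
    return "ok"
```

A check of this kind, run periodically, would flag unusual log growth before the limit is reached rather than after the database has entered recovery mode.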
At 20:02 UTC, Kimono received an alert from our provider that one of our databases was no longer communicating with their monitoring tools. Immediately afterward, our own Kimono alerts notified us that our Dashboard and Integrations were unavailable.
|Time (UTC)|Event|
|---|---|
|20:02|Received notification from provider that they had been unable to communicate with one of our databases|
|20:02|Kimono Alerts came in that our database was unable to accept communications|
|20:03|Began working with Provider to remedy situation|
|20:10|Provider had identified the issue and instructed Kimono Ops team on next steps.|
|20:21|All Kimono processes were restored. The outage was over.|