Resolved
The issue has been resolved and we are back to normal.
The incident is now fully resolved, and we won't need to schedule a maintenance window for the DB scale-up.
Impact of downtime:
* a period of 2 hours with higher latencies
* on average, 10% of requests timed out
* /classify was hit hardest, with 80% of requests failing
Monitoring
We are monitoring to make sure the incident has been fully resolved.
A fix has been implemented. Error rates and response latencies have been back to normal since 2:10 PM.
Identified
We have identified the root cause of the incident and are working on a fix.
We have identified an issue with the database related to increased pressure on the system. A subset of requests experienced high latency starting at 12:05 PM. We have identified the root cause and are deploying mitigations until we can schedule a larger maintenance window for the fix.