Resolved -
We’re now closing this incident, several hours after restoring full system stability. Over the past 4 hours, we’ve continued to monitor key requests and queries closely. During that time, we identified a number of previously long-running queries that we’ve either:
- Optimized immediately, based on new platform characteristics; or
- Added to a short-term optimization backlog for tuning over the next few days.
These efforts are part of our ongoing work to adapt all app queries to the updated infrastructure context.
Apr 24, 16:06 PDT
Update -
The site remains fully operational, and performance for all new builds is normal. We’re continuing to monitor request and query times closely to identify any long-running queries that may have contributed to recent job processing delays or latency spikes.
Apr 24, 13:57 PDT
Update -
Performance has returned to normal and the site is fully operational. We will continue to clear any previously blocked (or retry) jobs we discover in the background job queues, and to monitor performance stats as they clear.
Apr 24, 12:35 PDT
Update -
We are monitoring for further issues. Performance for new builds is normal. We are waiting for dequeue metrics to fall below 50% normal before we lift the "degraded performance" rating.
Apr 24, 11:01 PDT
Update -
We believe the issue is resolved. We are scaling infrastructure to clear any delayed background jobs and monitoring to ensure latency stays within the normal range.
Apr 24, 10:25 PDT
Update -
We are continuing to monitor for any further issues.
Apr 24, 09:46 PDT
Monitoring -
A fix has been implemented and we are monitoring the results.
Apr 24, 09:15 PDT
Identified -
The issue has been identified and a fix is being implemented.
Apr 24, 08:54 PDT
Investigating -
We are investigating elevated latency in our background jobs system. Some users have also reported receiving Timeout errors while trying to load web pages.
Apr 24, 08:35 PDT