We are closing the incident after normal performance throughout the day. If you are experiencing issues, including performance issues, let us know at email@example.com.
Posted Sep 21, 2021 - 22:12 PDT
We will keep this incident open for at least another hour as we monitor performance, which has remained steady for the past ~45 minutes.
Apologies for the inconvenience.
We will share our findings in our postmortem here. In the meantime, here is a summary for those interested:
AWS/RDS support notified us midweek last week that we were a few days away from transaction ID wraparound, so we decided to put the site into unplanned maintenance mode on Fri, Sep 17 at 5pm US PDT, expecting to be out before Mon, Sep 20. However, as we worked through the suggested steps to avoid wraparound, we encountered error after error, as well as extremely long-running steps: vacuum/auto-vacuum and reindexing tasks took days instead of minutes or hours, even in non-concurrent mode. (Several attempts at concurrent mode proved fatal, forcing us to put the site back into maintenance mode several times over the weekend and into Mon.) To speed up some of these long-running tasks, we tried upgrading Postgres to 13 (not available) and increasing our instance size (not available from RDS); both attempts only added time to our efforts.
Posted Sep 21, 2021 - 09:29 PDT
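For readers unfamiliar with the transaction ID wraparound mentioned in the summary above, the following is a minimal sketch of how headroom could be estimated. It is not the tooling we used. It assumes the oldest transaction age has already been read from the database with the query shown in the comment; the function names and the warning/critical fractions are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: function names and thresholds are assumptions,
# not values taken from this incident or from AWS/RDS guidance.

# PostgreSQL transaction IDs are 32-bit, so roughly 2**31 (about 2.1 billion)
# transactions can elapse before the oldest unfrozen rows would wrap around.
XID_WRAPAROUND_LIMIT = 2**31

# The input can be obtained with a query such as:
#   SELECT datname, age(datfrozenxid) FROM pg_database ORDER BY 2 DESC;
XID_AGE_QUERY = (
    "SELECT datname, age(datfrozenxid) FROM pg_database ORDER BY 2 DESC;"
)


def wraparound_headroom(xid_age: int) -> int:
    """Return how many transactions remain before the wraparound limit."""
    return max(XID_WRAPAROUND_LIMIT - xid_age, 0)


def classify(xid_age: int,
             warn_fraction: float = 0.5,
             critical_fraction: float = 0.8) -> str:
    """Bucket an XID age into 'ok', 'warn', or 'critical'.

    The two fractions are hypothetical alerting thresholds.
    """
    used = xid_age / XID_WRAPAROUND_LIMIT
    if used >= critical_fraction:
        return "critical"
    if used >= warn_fraction:
        return "warn"
    return "ok"


if __name__ == "__main__":
    # Made-up ages, for demonstration only.
    for age in (200_000_000, 1_200_000_000, 2_000_000_000):
        print(age, classify(age), wraparound_headroom(age))
```

Alerting on a fraction of the limit, rather than on a fixed count, is just one possible design choice; the point is to leave enough time for long-running vacuum work to finish before the limit is reached.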
We believe this incident is resolved. We have taken the site out of maintenance mode and are monitoring performance, which has been normal and steady for 15 minutes.
Posted Sep 21, 2021 - 08:46 PDT
The database index has finished rebuilding, but queries to the table are failing. We are investigating the new errors.
Posted Sep 21, 2021 - 06:48 PDT
We are still monitoring progress on the rebuild of a database index.
Posted Sep 20, 2021 - 20:13 PDT
Continuing to monitor progress of database reindex.
Posted Sep 20, 2021 - 11:16 PDT
We are continuing to monitor progress on database reindex.
Posted Sep 20, 2021 - 11:14 PDT
We are continuing to monitor for any further issues.
Posted Sep 20, 2021 - 09:48 PDT
We are currently performing emergency database maintenance. Maintenance was 95% complete by 3am, and we closed the previous maintenance window this morning, but one index had failed to rebuild, which eventually caused transactions to slow and fail. We have put the site back into maintenance mode to allow the remaining index to rebuild, and will take it out of maintenance mode as soon as possible. We will provide progress updates when available.
Posted Sep 20, 2021 - 09:13 PDT
This incident affected: Coveralls.io Web and Coveralls.io API.