Coveralls is in Read Only mode while we work on updating the system. Sorry for the inconvenience.

VIGILANCE! Check this page any time you notice a problem with Coveralls.

Overnight issue with SOURCE FILES table

Incident Report for Coveralls

Postmortem

The root cause of the backed-up Web and API requests was slow reads against the source_files table in our database (our largest table), which were themselves caused by a long-running database maintenance task.

While that task (“repacking” the source_files table) had been planned for, and started, over the previous weekend, it unexpectedly ran well into the week. After seeing normal site behavior on Monday and Tuesday, we decided to let the procedure continue because of its importance to general database performance. We believe that when we hit our weekly usage peak (Wednesday to Thursday), even though the maintenance task was nearly complete, the database was overwhelmed with read requests against the table while the maintenance activity held transaction locks on the relevant rows.

In addition to temporarily restricting reads from our Web app, we have curtailed all maintenance activity against the table until we can guarantee that tasks will complete within normal maintenance windows (late evenings and weekends, Pacific time).

We’ve also identified a longer-term solution, a different approach to partitioning tables, which will take 1-2 weeks to implement and is planned for later this month.

Posted Dec 08, 2022 - 08:21 PST

Resolved

We had an issue between late afternoon yesterday (WED, DEC 7 PST) and this morning (THU, DEC 8 PST) that affected all builds.

For some time, both the Web app and Coveralls API returned 504 Timeouts or 503 "This website is under heavy load" errors, which originated from our load balancers.

The root cause of the backed-up requests was slow reads against the source_files table in our database, which were themselves caused by a long-running database maintenance task.

To reduce the number of reads against that table, so that we could continue processing incoming Web requests and clear the background jobs that rely on such reads, we made the SOURCE FILES table unavailable (on all build and job pages) overnight, showing the message "WE'RE SORRY, THIS FUNCTIONALITY IS TEMPORARILY UNAVAILABLE."

The 504 Timeouts and 503 Heavy Load errors were mostly resolved by 6 pm PST, and the backed-up background jobs completed early this morning. The restriction on the SOURCE FILES table was lifted around 7 am PST.

We apologize for the inconvenience to any users who tried to access SOURCE FILES overnight.

UPDATE: We've received reports that the Coveralls API was also affected during the early periods of this incident. In parallel with Web requests, many API requests were rejected with the 503 Heavy Load error.

Posted Dec 08, 2022 - 07:36 PST

This incident affected: Coveralls.io Web.