Reason for the incident:
We failed to rotate to the new RDS CA and update the corresponding SSL certificates on all database clients before the previous CA expired. We underestimated the impact of missing that deadline and scheduled the work as low-priority housekeeping. As a result, the ticket was never prioritized and the changes were not made in time to avoid this incident.
Reason for the response time:
While implementing the fix, we had to work through verbose documentation that made it difficult to identify the correct procedure for our setup, especially under pressure. Once we had identified the procedure and deployed a fix, our database clients in production still could not establish connections using the freshly downloaded certificate, even though the same certificate worked in tests from local machines. The connections only succeeded after we manually copied the certificate contents into an existing file that the application already referenced. We still do not know why this was necessary, and the confusion added at least an hour to our response time as we cycled through other applicable certificates and recovered from failed deployments.
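For reference, below is a minimal sketch of how a client can be pointed at a new CA bundle and checked end to end. It assumes a PostgreSQL RDS instance and the psycopg2 driver; the hostname, user, and bundle path are placeholders for illustration, not values from this incident, and other engines or drivers use their own SSL options.

    # Minimal sketch: verify a database connection against a new RDS CA bundle.
    # Assumes PostgreSQL and psycopg2; hostname, user, and bundle path are
    # placeholders, not values from this incident.
    import os
    import psycopg2

    conn = psycopg2.connect(
        host="example-db.xxxxxxxx.us-east-1.rds.amazonaws.com",   # placeholder endpoint
        dbname="postgres",
        user="app_user",                                          # placeholder user
        password=os.environ["DB_PASSWORD"],                       # supplied via environment/secrets
        sslmode="verify-full",                                    # reject untrusted server certificates
        sslrootcert="/etc/ssl/rds/global-bundle.pem",             # path to the downloaded CA bundle
    )
    with conn.cursor() as cur:
        cur.execute("SELECT 1")   # simple round trip to confirm the TLS handshake succeeded
        print(cur.fetchone())
    conn.close()

Running a check like this from the production environment as well as from local machines can surface environment-specific certificate-path issues like the one described above.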
How to avoid the incident in the future:
We will treat all notices from infrastructure providers as requiring review by multiple stakeholders at different levels, and will handle priority infrastructure upgrades through our established procedure: as scheduled events, completed in a timely manner, with review and sign-off.