

Performance slowdowns
Incident Report for Coveralls
Resolved
This incident has been resolved. The associated infrastructure upgrade is complete. Performance should return to normal immediately for most projects, though projects with jobs in backed-up queues may take several hours to fully clear. Please email us at support@coveralls.io if you're still experiencing performance issues with a particular project.
Posted Aug 14, 2020 - 16:03 PDT
Update
The infrastructure upgrade process has begun. It should be complete in several hours. No planned downtime.
Posted Aug 14, 2020 - 13:15 PDT
Update
We are monitoring and addressing performance slowdowns as they occur.
Posted Aug 13, 2020 - 15:54 PDT
Update
We are still working to complete an infrastructure upgrade (without downtime) but remain blocked by an issue at AWS. We are continuing the attempt and hope to resolve it as soon as possible. In the meantime, we are monitoring and addressing performance slowdowns on a case-by-case basis as they occur.
Posted Aug 12, 2020 - 15:50 PDT
Update
Apologies to those customers still experiencing slow builds. Even after coordinating with AWS, we were again unable to complete the infrastructure upgrade last night due to limited inventory for our instance type in our regions. We are working with Amazon again today and plan another attempt at the upgrade tonight, 12-3AM PDT. Check back here for updates.
Posted Aug 11, 2020 - 12:30 PDT
Update
We are monitoring and addressing performance slowdowns as they occur.
Posted Aug 10, 2020 - 17:53 PDT
Monitoring
For the last week to ten days, we've had reports of slow builds and lags in our data processing infrastructure that caused temporary inaccuracies in our reporting tools, sometimes taking hours to resolve. For affected projects, no data was lost, and both reports and notifications should have eventually caught up.

On initial investigation, we came to believe the effect was limited to larger projects, which use a dedicated segment of our infrastructure. While we could not identify a root cause, we began monitoring the issue to address any slowdowns and clear them as quickly as possible.

In recent days, however, we've seen performance slowdowns affect general web use and appear in metrics related to RDS CPU. New highs in builds per day and new builds per minute convince us that we're experiencing another wave of growth that requires us to increase our infrastructure resources.
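For anyone curious what "metrics related to RDS CPU" means in practice, the following is a minimal boto3 sketch of the kind of CloudWatch query involved. It is not our actual monitoring tooling, and the region and DB instance identifier are placeholders.

```python
# Minimal sketch (not our actual monitoring tooling): pulling hourly RDS CPU
# utilization from CloudWatch with boto3. Region and instance identifier are
# placeholders.
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")  # placeholder region

now = datetime.now(timezone.utc)
resp = cloudwatch.get_metric_statistics(
    Namespace="AWS/RDS",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "DBInstanceIdentifier", "Value": "example-db-instance"}],
    StartTime=now - timedelta(days=7),
    EndTime=now,
    Period=3600,                       # one datapoint per hour
    Statistics=["Average", "Maximum"],
)

for point in sorted(resp["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], f"avg={point['Average']:.1f}%  max={point['Maximum']:.1f}%")
```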

We planned an infrastructure upgrade for this past weekend (with no downtime), but were unable to complete it due to an unexpected capacity limit on our instance class at Amazon RDS:

> Service: AmazonRDS;
> Status Code: 400;
> Error Code: InsufficientDBInstanceCapacity;
> Request ID: b5935b39-1cff-4f02-8dc0-c0a9b9cfe470
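
For context, the error above comes from the RDS instance-class modification API. Below is a minimal boto3 sketch of that call and of handling this error; it is not our actual upgrade procedure, and the identifiers and target class are placeholders.

```python
# Minimal sketch, not our actual upgrade procedure: requesting a larger RDS
# instance class with boto3 and handling the InsufficientDBInstanceCapacity
# error quoted above. Identifiers and the target class are placeholders.
import boto3
from botocore.exceptions import ClientError

rds = boto3.client("rds", region_name="us-east-1")  # placeholder region

try:
    rds.modify_db_instance(
        DBInstanceIdentifier="example-db-instance",
        DBInstanceClass="db.r5.4xlarge",   # hypothetical target instance class
        ApplyImmediately=True,
    )
except ClientError as err:
    if err.response["Error"]["Code"] == "InsufficientDBInstanceCapacity":
        # AWS has no spare capacity for that class in the region/AZ right now;
        # the options are to retry later, pick a different class/AZ, or
        # coordinate with AWS support.
        print("Capacity unavailable, will retry:", err)
    else:
        raise
```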

We have since resolved the issue with Amazon and are planning another upgrade overnight tonight, between 12AM and 3AM PDT, again with no planned downtime for users.

Note:
It's come to our attention during this time that the current SYSTEM METRICS reports on our status page, which measure general performance, are not sufficient for users with larger projects, whose performance may diverge greatly from the mean. Therefore, we're committed to adding additional reports for this class of project soon after we complete this planned upgrade.
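
To illustrate why a mean-based metric can hide this, here is a toy example with made-up numbers: a handful of slow, large-project builds barely moves the mean build time but dominates a high percentile.

```python
# Toy illustration (made-up numbers): a few slow, large-project builds barely
# move the mean build time, but show up clearly in a high percentile.
import statistics

build_seconds = [30] * 95 + [1800] * 5                 # 95 typical builds, 5 slow ones

mean = statistics.mean(build_seconds)                  # ~118.5s -- looks near normal
p99 = statistics.quantiles(build_seconds, n=100)[98]   # 1800s -- the large-project experience

print(f"mean build time: {mean:.1f}s")
print(f"p99 build time:  {p99:.0f}s")
```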
Posted Aug 10, 2020 - 17:52 PDT
This incident affected: Coveralls.io Web and Coveralls.io API.