We experienced multiple intermittent errors over the past several days before we were able to identify the true root cause and resolve the issue.
Root Cause
The errors were caused by a single outlier repository generating extremely high-volume requests (750–1,800+ coverage report uploads per build). This traffic, combined with the default “sticky request” behavior in Passenger Enterprise (which routes repeat requests from the same IP to the same HTTP server), overwhelmed individual servers. Once a server’s request queue was full, subsequent requests were rejected with a 503 error and the message: “This website is under heavy load.”
Although each server processed individual requests within normal response times, the concentrated volume from a single repository and source IP could not be distributed evenly across servers. As a result, request queues repeatedly saturated, producing customer-visible errors.
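The toy simulation below illustrates why sticky routing concentrates load. It is a simplified sketch, not Passenger Enterprise's actual routing algorithm. It assumes a burst of 1,800 uploads from one IP, a hypothetical pool of 20 servers, and an illustrative per-server queue limit of 100, and it assumes no request completes during the burst. The function name `simulate` and the example IP are hypothetical.

```python
import zlib
from collections import Counter
from itertools import cycle


def simulate(num_requests, num_servers, queue_size, sticky, client_ip="203.0.113.7"):
    """Toy model: return (accepted-per-server Counter, rejected count).

    Assumption: every request arrives in one burst and none finishes,
    so each server can hold at most `queue_size` queued requests.
    """
    round_robin = cycle(range(num_servers))
    accepted = Counter()
    rejected = 0
    for _ in range(num_requests):
        if sticky:
            # Sticky routing: a stable hash of the client IP always picks the same server.
            server = zlib.crc32(client_ip.encode()) % num_servers
        else:
            # Non-sticky routing: spread requests across servers in turn.
            server = next(round_robin)
        if accepted[server] < queue_size:
            accepted[server] += 1
        else:
            rejected += 1  # queue full: the real system answers with HTTP 503
    return accepted, rejected


sticky_acc, sticky_rej = simulate(1800, 20, 100, sticky=True)
spread_acc, spread_rej = simulate(1800, 20, 100, sticky=False)
print("sticky:", sticky_rej, "rejected on", len(sticky_acc), "server")
print("round-robin:", spread_rej, "rejected; max per server", max(spread_acc.values()))
```

Under these assumptions, sticky routing sends the whole burst to one server, which accepts 100 requests and rejects the remaining 1,700. Round-robin places 90 requests on each server and rejects none. This is the uneven distribution described above.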
Solutions Implemented
Next Steps
Closing
We appreciate your patience as we worked through this issue. These changes are intended to guard against similar incidents in the future. If you encounter unexpected errors, please contact us at support@coveralls.io.