We believe this issue has been resolved for now.
The underlying cause still appears to be large spikes in incoming web traffic from other outlier repositories that we have not yet identified or paused.
Interim Solution:
To reduce the risk of recurrence, we have applied temporary load balancer adjustments that change how requests are distributed. These should reduce, if not eliminate, the frequency of 503 “This website is under heavy load” errors.
Permanent Solution:
We are also designing a permanent solution to rate-limit abnormal request patterns. This will require coordination at the policy/SLA level before it can be fully implemented.
In the meantime, we will continue to closely monitor traffic and use targeted load balancer and web server configurations to mitigate the impact of outlier traffic spikes.
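For readers unfamiliar with rate limiting, the general idea can be illustrated with a token bucket. This is only a minimal sketch and not the design we are working on: the class names, the per-client keying, and the capacity and refill values are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """One bucket per client. All names and values here are hypothetical."""
    capacity: float           # maximum burst size, in requests
    refill_per_second: float  # sustained request rate allowed
    tokens: float             # tokens currently available
    last_refill: float        # timestamp of the last refill, in seconds

    def allow(self, now: float) -> bool:
        # Add tokens for the time elapsed since the last call, up to capacity.
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last_refill = now
        # Spend one token per request; reject when the bucket is empty.
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class RateLimiter:
    """Tracks a separate bucket for each client identifier (an assumption)."""

    def __init__(self, capacity: float, refill_per_second: float) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._buckets: dict[str, TokenBucket] = {}

    def allow(self, client_id: str, now: float) -> bool:
        bucket = self._buckets.get(client_id)
        if bucket is None:
            # New clients start with a full bucket.
            bucket = TokenBucket(self.capacity, self.refill_per_second,
                                 tokens=self.capacity, last_refill=now)
            self._buckets[client_id] = bucket
        return bucket.allow(now)


# Example: a burst of 5 requests from one client against a burst capacity of 3.
limiter = RateLimiter(capacity=3, refill_per_second=1.0)
burst = [limiter.allow("client-a", now=0.0) for _ in range(5)]
```

In this sketch, a client that exceeds its burst allowance is rejected until its bucket refills, while other clients are unaffected. A real deployment would more likely apply limits at the load balancer or web server than in application code.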
More details:
For a more detailed assessment and root cause analysis (RCA) of this incident and other recent, related incidents, see this postmortem.
Update (Thu, Aug 21):
We have identified a different permanent solution that does not require changes to SLA-level details for “outlier repos.” We may still implement the rate-limiting solution described above, but the alternative should prevent further 503 errors, and we expect to implement it within the next 48-72 hours.