Admin Portal Service Interruption
Incident Report for Ping Identity

Incident Summary

On April 5th, 2018, beginning at 01:50 UTC, the PingOne administrative web portal went offline for 48 minutes. The outage was caused by three independent defects in internal tools; all three had to occur together to cause it. These tools are responsible for deploying new application servers to production and for routing traffic using a 'canary release' strategy, in which new code is automatically monitored for errors. The failure left the load balancers configured to route traffic to servers that did not exist.
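
The failure mode described above, traffic routed to a pool with no servers behind it, can be illustrated with a minimal sketch of a guard that runs before traffic is switched. This is not Ping Identity's tooling: the `Server`, `Pool`, `RoutingError`, and `switch_traffic` names and the `min_healthy` threshold are all hypothetical, chosen only to show the idea.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Server:
    name: str
    healthy: bool


@dataclass
class Pool:
    """Hypothetical group of application servers behind a load balancer."""
    name: str
    servers: List[Server] = field(default_factory=list)

    def healthy_count(self) -> int:
        return sum(1 for s in self.servers if s.healthy)


class RoutingError(Exception):
    """Raised when a traffic switch would leave no server to answer requests."""


def switch_traffic(current: Pool, target: Pool, min_healthy: int = 1) -> Pool:
    """Return the pool that should receive traffic after the switch.

    Assumption: the caller keeps routing to `current` if this raises.
    """
    if target.healthy_count() < min_healthy:
        raise RoutingError(
            f"refusing to route to {target.name}: "
            f"{target.healthy_count()} healthy servers, need {min_healthy}"
        )
    return target
```

With a guard of this kind, a switch to an empty or fully unhealthy pool fails loudly and the existing pool keeps serving traffic.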

Customer Impacts

On April 5th, 2018, beginning at 01:50 UTC, customers were unable to log in to the PingOne administrative web portal. Full service and performance were restored for all customers at 02:38 UTC.

Incident Timeline

April 5th, 2018 (all times in UTC)

  • 01:50 - Monitoring systems detect issues with the administrative web portal. On-call SRE notified.

  • 01:57 - On-call SRE escalates to Incident Commander.

  • 02:05 - Investigation shows no servers capable of carrying web portal traffic.

  • 02:30 - Services begin recovering after redeploy of the application.

  • 02:38 - Services fully recovered.

  • 02:40 - Automated deployment process blocked as root cause investigation continues.

April 6th, 2018 (all times in UTC)

  • 19:38 - Automated deployment process re-opened.

Affected Services

PingOne Admin Portal (North America)


Services restored after application was re-deployed to production.

Ping Action Items

  • Address defect in build pipeline allowing future releases to be deactivated. RESOLVED

  • Address defect in build pipeline allowing releases to be stuck activating. RESOLVED
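
The report does not describe how these two defects were fixed. As a rough illustration only, the sketch below shows one way a release lifecycle could reject deactivation of a release that is not yet active and time out an activation that never completes. The `State`, `Release`, `deactivate`, `start_activation`, and `expire_stuck_activation` names and the 600-second timeout are assumptions, not part of the actual pipeline.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class State(Enum):
    PENDING = "pending"          # built, not yet receiving traffic
    ACTIVATING = "activating"    # canary in progress
    ACTIVE = "active"
    DEACTIVATED = "deactivated"
    FAILED = "failed"


@dataclass
class Release:
    version: int
    state: State = State.PENDING
    activation_started: Optional[float] = None  # seconds, caller-supplied clock


def deactivate(release: Release) -> None:
    """Only an ACTIVE release may be deactivated; a future release may not."""
    if release.state is not State.ACTIVE:
        raise ValueError(
            f"release {release.version} is {release.state.value}, not active"
        )
    release.state = State.DEACTIVATED


def start_activation(release: Release, now: float) -> None:
    release.state = State.ACTIVATING
    release.activation_started = now


def expire_stuck_activation(release: Release, now: float,
                            timeout_s: float = 600.0) -> bool:
    """Mark an activation FAILED once it exceeds the (hypothetical) timeout."""
    if (release.state is State.ACTIVATING
            and release.activation_started is not None
            and now - release.activation_started > timeout_s):
        release.state = State.FAILED
        return True
    return False
```

Passing the clock value in as `now` keeps the sketch deterministic; a real pipeline would more likely read a monotonic clock.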

Posted 12 months ago. Apr 10, 2018 - 16:03 UTC

This incident has been resolved.
Posted 12 months ago. Apr 05, 2018 - 02:38 UTC
Our engineers have identified the issue and are working to restore service. Next update in 15 minutes.
Posted 12 months ago. Apr 05, 2018 - 02:26 UTC
Monitoring systems have detected an issue with PingOne's global administration portal. The Site Reliability Engineering team has been notified and is working to resolve the issue. Site Reliability will update this message when the cause has been identified. Automated monitoring systems will update affected components and will restore operational status as systems recover.

For additional questions please contact, or follow this incident on for real-time service updates.
Posted 12 months ago. Apr 05, 2018 - 01:50 UTC