PingOne Dock Service Interruption
Incident Report for Ping Identity
Postmortem

Incident Summary

A change in a customer environment caused a significant, sustained increase in traffic to the PingOne Dock (.com). Systems became slow to respond and eventually became unavailable at 13:32 UTC. Additional server capacity was added, and services began recovering at 14:06 UTC. Service was fully restored for all users by 14:14 UTC.

Customer Impacts

Between 13:32 and 14:14 UTC, access to the PingOne Dock (.com) was either slow or unavailable.

Incident Timeline

August 6, 2018 (all times in UTC)

  • 13:32 – Automated monitoring alerts the on-call Site Reliability Engineer.
  • 13:39 – Issue is escalated to the Incident Commander.
  • 13:44 – The Site Reliability Engineer restarts the application servers.
  • 13:54 – With no improvement, the Site Reliability Engineer deploys additional server capacity.
  • 14:06 – Services begin recovering.
  • 14:14 – Services return to normal.

Affected Services

PingOne Dock (.com)

Resolution

Adding server capacity restored service and kept it available until traffic returned to normal levels. The issue was initially attributed to a recent feature deployment, which led to an unnecessary restart of the application servers.

Ping Action Items

  • Improve the PingOne Dock implementation to reduce its reliance on the database.
  • Research architectural changes to reduce response time and increase resilience.
  • Improve the process for determining root cause quickly and accurately.
Posted Aug 09, 2018 - 21:03 UTC

Resolved
This incident has been resolved.
Posted Aug 06, 2018 - 14:28 UTC
Monitoring
We have implemented a fix and are monitoring the systems.
Posted Aug 06, 2018 - 14:21 UTC
Update
Engineering is still investigating the issue. Users are experiencing slow response times or timeouts when accessing the PingOne Dock.
Posted Aug 06, 2018 - 14:12 UTC
Investigating
Monitoring systems have detected an issue with the PingOne Dock service. The Site Reliability Engineering team has been notified and is currently working on the issue. We will update this message once the cause has been identified.

For additional questions please contact support@pingidentity.com, or follow this incident on https://status.pingidentity.com for real-time service updates.
Posted Aug 06, 2018 - 13:38 UTC
This incident affected: PingOne Services (PingOne Dock - North America (.com)).