PingID Service Interruption
Incident Report for Ping Identity
Postmortem

Incident Summary

A scheduled PingID database repair encountered an issue on the database cluster that rendered the database unavailable to the application. Service was restored after a rolling restart of the database cluster was completed and the application nodes were restarted.

Customer Impact

MFA with PingID for customers served by our North American data center was slow or unavailable.

Incident Timeline

Aug 15, 2018 (all times in UTC)

  • 08:16 Scheduled maintenance on the PingID database begins.
  • 11:32 Automated monitoring alerts that the NA PingID IDP is down. The on-call Site Reliability Engineer is notified.
  • 11:36 The database maintenance is paused.
  • 11:55 The load balancer of the PingID Authenticator cluster is restarted.
  • 12:05 A rolling restart of the database cluster instances is initiated.
  • 12:50 Automated monitoring alerts that the NA PingID IDP is up. Service is restored.

Affected Services

  • PingID Service (North America)

Resolution

Service restoration occurred after all database instances were restarted.

Ping Action Items

  • Improve the procedure for restarting services and triggering the PingID bypass mode.
  • Improve the repair procedure and the runbook to limit execution to non-peak traffic hours.
Posted Aug 17, 2018 - 15:36 UTC

Resolved
This incident has been resolved.
Posted Aug 15, 2018 - 13:52 UTC
Monitoring
A fix has been implemented and we are closely monitoring the systems for errors.
Posted Aug 15, 2018 - 12:54 UTC
Update
The Site Reliability Engineering team is still working on the issue. We will provide another update in 30 minutes.
Posted Aug 15, 2018 - 12:30 UTC
Investigating
Monitoring systems have detected an issue with Ping Identity's PingID Service. The Site Reliability Engineering team has been notified and is currently working on the issue. We will update this message when the cause of the incident has been identified. Automated monitoring systems will update the affected components and restore their operational status as systems recover.
Posted Aug 15, 2018 - 11:57 UTC
This incident affected: PingID Services (PingID Authenticator - North America (.com), PingID Server - North America (.com)).