A scheduled PingID database repair encountered an issue on the database cluster that rendered the database unavailable to the application. Service was restored after a rolling restart of the database cluster was completed and the application nodes were restarted.
MFA with PingID for customers serviced by our North American data center was slow or unavailable.
Aug 15, 2018 (all times in UTC)
- 08:16 Scheduled maintenance on the PingID database begins.
- 11:32 Automated monitoring alerts that the NA PingID IDP is down. The on-call Site Reliability Engineer is notified.
- 11:36 The database maintenance is paused.
- 11:55 The Load Balancer of the PingID Authenticator cluster is restarted.
- 12:05 Rolling restart of database cluster instances initiated.
- 12:50 Automated monitoring reports that the NA PingID IDP is up. Service is restored.
- PingID Service (North America)
Service restoration occurred after all database instances were restarted.
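
The intervals implied by the timeline above can be worked out with a short sketch. The timestamps are copied from the timeline; the event names and the `minutes_between` helper are illustrative labels, not terms from the report. The window between the "down" and "up" alerts is only what monitoring detected, so the actual customer impact may have started earlier.

```python
from datetime import datetime

# Timestamps copied from the timeline above (Aug 15, 2018, UTC).
# The dictionary keys are illustrative names, not terms from the report.
DAY = "2018-08-15"
EVENTS = {
    "maintenance_start": "08:16",
    "alert_down": "11:32",
    "maintenance_paused": "11:36",
    "rolling_restart_start": "12:05",
    "alert_up": "12:50",
}


def minutes_between(start: str, end: str) -> int:
    """Return whole minutes elapsed between two named timeline events."""
    fmt = "%Y-%m-%d %H:%M"
    t0 = datetime.strptime(f"{DAY} {EVENTS[start]}", fmt)
    t1 = datetime.strptime(f"{DAY} {EVENTS[end]}", fmt)
    return int((t1 - t0).total_seconds() // 60)


# Detected outage: "down" alert to "up" alert.
outage_minutes = minutes_between("alert_down", "alert_up")
# Time from the start of the rolling restart to restoration.
restart_minutes = minutes_between("rolling_restart_start", "alert_up")
# Time the maintenance had been running before the first alert.
lead_minutes = minutes_between("maintenance_start", "alert_down")

print(outage_minutes, restart_minutes, lead_minutes)  # 78 45 196
```

By this measure the detected outage lasted 78 minutes, of which the final 45 were spent on the rolling restart of the database cluster.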
Ping Action Items
- Improve the procedure for restarting services and triggering the PingID bypass mode.
- Improve the repair procedure and the run book to limit execution to non-peak traffic hours.
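
One way the second action item could be enforced is a guard that refuses to start a repair outside an approved window. The following is a minimal sketch only: the window hours, the constant names, and the `in_off_peak_window` function are hypothetical, since the report does not state which hours count as non-peak or how the repair is launched.

```python
from datetime import datetime, time, timezone

# Hypothetical off-peak window in UTC. The report does not state the real
# non-peak hours, so these values are placeholders.
OFF_PEAK_START = time(2, 0)
OFF_PEAK_END = time(6, 0)


def in_off_peak_window(now: datetime,
                       start: time = OFF_PEAK_START,
                       end: time = OFF_PEAK_END) -> bool:
    """Return True if `now` (timezone-aware) falls inside [start, end) UTC."""
    t = now.astimezone(timezone.utc).time()
    if start <= end:
        return start <= t < end
    # Window wraps past midnight, e.g. 22:00 to 04:00.
    return t >= start or t < end


if __name__ == "__main__":
    if in_off_peak_window(datetime.now(timezone.utc)):
        print("Inside the off-peak window: repair may proceed.")
    else:
        print("Outside the off-peak window: repair should be deferred.")
```

A run book step could call such a check before starting the repair and abort if it returns `False`.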