Welcome to Ping Identity's system status site.

PingID Service Interruption

Incident Report for Ping Identity

Postmortem

Incident Summary

On December 8, 2017, beginning at 04:45 UTC, an underlying database node in the multi-node database cluster became unresponsive. Once the database node was recovered, the application began responding, although very slowly. Once all nodes were responsive, services were restored.

This incident exposed an issue in the configuration between the application servers and the database cluster. When the database node failed, the application assumed the database was not in a consistent state and stopped responding to requests.

Customer Impacts

On December 8, 2017, beginning at 04:45 UTC, customers were unable to authenticate with PingID MFA from our North American data centers (authenticator.pingone.com). Services began recovering at 05:39 UTC, at which point some authentication sessions succeeded but with longer-than-normal delays. Full service and performance were restored for all customers at 06:25 UTC.

During this incident, the PingID local bypass feature was not triggered because the infrastructure-level health check continued to pass even though the application was not responding to requests.
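The gap described above, an infrastructure-level check passing while the application cannot serve requests, can be illustrated with a minimal sketch. This is not Ping Identity's implementation: the names `HealthReport`, `check_health`, and `should_bypass`, the two probe callables, and the bypass rule itself are all hypothetical and exist only to show a shallow check combined with an application-level one.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class HealthReport:
    infrastructure_ok: bool  # shallow check: host or load balancer answers
    application_ok: bool     # deep check: a test request completes end to end

    @property
    def healthy(self) -> bool:
        return self.infrastructure_ok and self.application_ok


def _run_probe(probe: Callable[[], bool]) -> bool:
    """Run a probe, treating any exception (e.g. a timeout) as a failure."""
    try:
        return bool(probe())
    except Exception:
        return False


def check_health(infra_probe: Callable[[], bool],
                 app_probe: Callable[[], bool]) -> HealthReport:
    infra_ok = _run_probe(infra_probe)
    # Skip the deep probe when the shallow one has already failed.
    app_ok = _run_probe(app_probe) if infra_ok else False
    return HealthReport(infra_ok, app_ok)


def should_bypass(report: HealthReport) -> bool:
    """Hypothetical trigger: fall back whenever either layer is unhealthy."""
    return not report.healthy


def stalled_app_probe() -> bool:
    # Simulates an application that accepts connections but never answers.
    raise TimeoutError("no response from application")


# Infrastructure answers, application does not: the combined check fails.
report = check_health(lambda: True, stalled_app_probe)
```

Under these assumptions, a check that looked only at `infrastructure_ok` would report healthy in this scenario, while the combined check would not.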

Incident Timeline

December 08, 2017 (all times in UTC)

04:45 - Monitoring systems detect issues with PingID services. On-call SRE notified.
04:55 - Investigation shows a database node is not responsive.
04:59 - On-call SRE escalates to Incident Commander. Database SME engaged.
05:15 - Database node successfully recovered and brought back into the cluster.
05:22 - Testing confirms that the application is still not responsive. Rolling restart of application servers started.
05:39 - Services begin recovering. Some authentication requests are successful, but users are experiencing longer-than-normal delays.
06:25 - Services fully recovered.
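The durations implied by the timeline above can be worked out with a short sketch. The timestamps come from the timeline; the dictionary keys and the function name `minutes_between` are illustrative labels, not terms from the report.

```python
from datetime import datetime

TIME_FORMAT = "%H:%M"

# Timestamps from the incident timeline (December 8, 2017, UTC).
timeline = {
    "detected": "04:45",
    "node_recovered": "05:15",
    "restart_started": "05:22",
    "recovery_began": "05:39",
    "fully_recovered": "06:25",
}


def minutes_between(start: str, end: str) -> int:
    """Whole minutes between two same-day HH:MM timestamps."""
    delta = datetime.strptime(end, TIME_FORMAT) - datetime.strptime(start, TIME_FORMAT)
    return int(delta.total_seconds() // 60)


# Time from detection until the failed database node rejoined the cluster.
time_to_node_recovery = minutes_between(timeline["detected"], timeline["node_recovered"])

# Time from detection until authentications started succeeding again.
time_to_first_recovery = minutes_between(timeline["detected"], timeline["recovery_began"])

# Period of slow but partially working service.
degraded_period = minutes_between(timeline["recovery_began"], timeline["fully_recovered"])

# Total incident duration.
total_duration = minutes_between(timeline["detected"], timeline["fully_recovered"])
```

This gives 30 minutes to node recovery, 54 minutes until services began recovering, 46 minutes of degraded service, and 100 minutes in total.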

Affected Services

PingID Service (North America)

Resolution

Partial restoration of the PingID services occurred when the failed database node was added back into the multi-node cluster. Full service restoration occurred after all database nodes had fully replicated data sets.

Ping Action Items

All database and application configurations will be audited to ensure they contain the proper database cluster information. To be completed in December 2017.
Additional nodes will be added to the PingID database cluster to ensure proper availability and data consistency in the event of an availability zone failure. To be scheduled in December 2017.
PingID database cluster software will be upgraded and tuned. To be scheduled in December 2017.
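To illustrate why node count and placement matter for the availability zone item above, here is a minimal sketch. It assumes a cluster that needs a strict majority of its nodes to stay available; the report does not say how the PingID database cluster decides availability, and the function name and zone layouts are hypothetical.

```python
from typing import Mapping


def survives_zone_failure(nodes_by_zone: Mapping[str, int]) -> bool:
    """Return True if losing any single zone still leaves a majority of nodes.

    Assumes a majority-quorum cluster, which is an assumption of this sketch
    and not a detail taken from the incident report.
    """
    total = sum(nodes_by_zone.values())
    if total == 0:
        return False
    quorum = total // 2 + 1
    # Check the worst case for each zone going away entirely.
    return all(total - count >= quorum for count in nodes_by_zone.values())


# Hypothetical layouts: three nodes packed into two zones vs. spread over three.
two_zone_layout = {"zone-a": 2, "zone-b": 1}
three_zone_layout = {"zone-a": 1, "zone-b": 1, "zone-c": 1}
five_node_layout = {"zone-a": 2, "zone-b": 2, "zone-c": 1}

two_zone_ok = survives_zone_failure(two_zone_layout)
three_zone_ok = survives_zone_failure(three_zone_layout)
five_node_ok = survives_zone_failure(five_node_layout)
```

Under the majority assumption, the two-zone layout loses quorum if `zone-a` fails, while both three-zone layouts keep a majority after any single zone failure.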

Posted Dec 13, 2017 - 12:40 UTC

Resolved

This incident has been resolved.

Posted Dec 08, 2017 - 06:36 UTC

Monitoring

PingID services have recovered and authentications are successful. The Site Reliability Engineering team is monitoring to ensure the system is stable.

Posted Dec 08, 2017 - 06:25 UTC

Update

PingID services are still in the process of recovering. Successful authentication requests are increasing although the push notifications are significantly delayed.

Posted Dec 08, 2017 - 06:07 UTC

Update

PingID services are in the process of recovering. We are continuing to monitor the recovery process.

Posted Dec 08, 2017 - 05:39 UTC

Identified

The Site Reliability Engineering team has identified the issue and is working on recovering services now. Next status update in 15 minutes.

Posted Dec 08, 2017 - 05:22 UTC

Investigating

Monitoring systems have detected an issue with Ping Identity's PingID Service. The Site Reliability Engineering team has been notified and is currently working on the issue. We will update this message when the issue has been identified.

For additional questions please contact support@pingidentity.com, or follow this incident on https://status.pingidentity.com for real-time service updates.

Posted Dec 08, 2017 - 05:06 UTC

This incident affected: PingID - United States (.com services) (PingID Authenticator, PingID Server).
