System Status

Welcome to Ping Identity's system status site.

System Uptime

System uptime in the past 90 days.

Past Incidents

PingID Service Interruption
Incident Report for Ping Identity
Postmortem

Incident Summary

PingID stopped functioning correctly, which prevented users from performing second-factor authentication. The root cause of the outage was a data replication failure in the session management system. In an unusual sequence of events, a failed node was replaced, and the subsequent rebalancing of session data stalled the entire system. Mitigation actions were taken and system functionality was restored (see below).

Customer Impact

North American customers were unable to use PingID for the duration of the outage. Depending on their specific configuration, some customers may have experienced a broader outage that also affected users outside North America.

Incident Timeline - May 15, 2017 (MDT)

  • 1500 - Intermittent PingID errors reported
  • 1510 - Operations Team begins investigation
  • 1600 - Internal escalation process initiated
  • 1602 - Status monitoring page updated
  • 1615 - Development Team engaged for troubleshooting
  • 1630 - PingID service fully down
  • 1700 - Restarting web services reduces error rate
  • 1715 - Status monitoring page updated
  • 1730 - Service is restored and internal validation started
  • 1756 - Status monitoring page updated

Affected Services

  • PingID Services NA
  • PingID App NA
  • PingID Authenticator NA
  • PingID Server NA

Resolution

Restarting the web services allowed the stateless session management system to fully recover.

Ping Action Items

  • Improve error monitoring of synthetic tests to detect this type of failure sooner.
  • Improve status update process and method.
  • Implement changes to the PingID session management system to make it more resilient. ETA end of May.
Posted May 18, 2017 - 20:43 UTC

Resolved
This incident has been resolved. PingID service in North America is back to normal.
Posted May 15, 2017 - 23:56 UTC
Identified
Our Site Reliability Engineer has identified the issue and is working on a fix.
Posted May 15, 2017 - 23:15 UTC
Investigating
Monitoring systems have detected an issue with the PingID service. The Site Reliability Engineering team has been notified and is currently working on the issue. We will update this message when the cause of the incident has been identified. Automated monitoring systems will update affected components and will resolve operational status as systems recover.

For additional questions please contact support@pingidentity.com, or follow this incident on https://status.pingidentity.com for real-time service updates.
Posted May 15, 2017 - 22:02 UTC
This incident affected: PingID - United States (.com services) (PingID Authenticator, PingID Server) and PingID Global (PingID App).