System Status

Welcome to Ping Identity's system status site.

System Uptime

System uptime in the past 90 days.

Past Incidents

PingID Service Interruption
Incident Report for Ping Identity
Postmortem

Incident Summary

PingID stopped functioning correctly, which prevented users from performing second-factor authentication. The root cause of the outage was a data replication failure in the session management system. In an unusual sequence of events, a failed node was replaced and the subsequent rebalancing of session data stalled the entire system. Mitigation actions were taken and system functionality was restored (see below).

Customer Impact

Customers were unable to use PingID for the duration of the outage.

Incident Timeline - Apr 17, 2017 (MDT)

  • 1745 - PingID errors reported
  • 1750 - Operations Team begins investigation
  • 1751 - System monitoring indicates spike in HttpServerError 500
  • 1753 - Web server stack trace shows problem connecting to the session management system
  • 1756 - Internal escalation process initiated
  • 1806 - Synthetic testing validates problem
  • 1815 - Restarting web services reduces error rate
  • 1817 - Load balancing mechanism marks all nodes as down
  • 1822 - Heartbeats for all nodes return to normal
  • 1832 - Status monitoring page updated
  • 1836 - Service is restored
  • 1851 - Status monitoring page updated
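
The timeline above is in MDT, while the status page posts further down are stamped in UTC. As a minimal sketch, assuming MDT is a fixed UTC-6 offset and that the first error report approximates the start of the outage, the times can be converted and the outage length estimated as follows. The helper name `to_utc` and the variable names are illustrative and not part of any Ping Identity tooling.

```python
from datetime import datetime, timedelta, timezone

# Assumption: MDT is treated as a fixed UTC-6 offset for Apr 17, 2017.
MDT = timezone(timedelta(hours=-6), "MDT")


def to_utc(hhmm: str) -> datetime:
    """Convert a 24-hour 'HHMM' timeline entry (Apr 17, 2017, MDT) to UTC."""
    local = datetime(2017, 4, 17, int(hhmm[:2]), int(hhmm[2:]), tzinfo=MDT)
    return local.astimezone(timezone.utc)


first_report = to_utc("1745")  # PingID errors reported
restored = to_utc("1836")      # Service is restored

# Approximate outage length, measured from the first report to restoration.
outage_minutes = int((restored - first_report).total_seconds() // 60)

print(first_report.isoformat())  # 2017-04-17T23:45:00+00:00
print(restored.isoformat())      # 2017-04-18T00:36:00+00:00
print(outage_minutes)            # 51
```

Under the same assumption, the two "Status monitoring page updated" entries (1832 and 1851 MDT) convert to 00:32 and 00:51 UTC on Apr 18, which line up with the timestamps on the "Investigating" and "Resolved" posts.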

Affected Services

  • PingID Services
  • PingID App
  • PingID Authenticator
  • PingID Server

Resolution

Restarting the web services allowed the stateless session management system to fully recover.

Ping Action Items

  • Improve error monitoring of synthetic tests to detect this type of failure sooner.
  • Improve status update process and method.
  • Change the PingID session management system implementation to be more resilient.
Posted Apr 20, 2017 - 22:23 UTC

Resolved
This incident has been resolved. PingID services in all regions are back to normal.
Posted Apr 18, 2017 - 00:51 UTC
Investigating
Monitoring systems have detected an issue with the PingID service. The Site Reliability Engineering team has been notified and is currently working on the issue. We will update this message when the cause of the incident has been identified. Automated monitoring systems will update affected components and will resolve operational status as systems recover.

For additional questions please contact support@pingidentity.com, or follow this incident on https://status.pingidentity.com for real-time service updates.
Posted Apr 18, 2017 - 00:32 UTC
This incident affected:

  • PingOne for Enterprise - Global: Administration API, AD Connect & Routing Service, Administration Portal, OAuth Configuration Service, Single Sign-on
  • PingOne for Enterprise - United States (.com services): Directory API, Directory Login, Office365 Service, PingOne Dock, SCIM Provisioning
  • PingOne for Enterprise - Europe (.eu services): Directory API, Directory Login, Office365 Service, PingOne Dock, SCIM Provisioning
  • PingOne for Enterprise - Australia (.com.au services): Directory API, Directory Login, Office365 Service, PingOne Dock, SCIM Provisioning
  • PingID - Europe (.eu services): PingID Authenticator, PingID Server
  • PingID - Australia (.com.au services): PingID Authenticator, PingID Server
  • PingID - United States (.com services): PingID Authenticator, PingID Server
  • PingID Global: PingID App
  • Twilio: SMS