PingID Service Interruption (.com)
Incident Report for Ping Identity
Postmortem

Incident Summary

On May 25th, 2018, beginning at 09:27 UTC, customers were unable to authenticate with PingID MFA (authenticator.pingone.com) because of an issue with the deployment pipeline. Previous versions of the PingID application servers were automatically removed from service while they were still incorrectly carrying traffic. A restart of the load balancers forced traffic to the new servers.

An edge case was discovered in which the deployment pipeline and the load balancers held different state information. The load balancers marked the current version as active but directed traffic to a previous deployment. The deployment pipeline aged out the previous version (as designed), but in this case that deployment was incorrectly carrying live traffic. A restart of the load balancer resolved the issue and corrected the system state.
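The state mismatch described above can be illustrated with a minimal sketch of a pre-retirement guard. This is not Ping Identity's pipeline code; the `LoadBalancerView` type, the `safe_to_retire` function, and the version labels are all hypothetical, and the sketch assumes the pipeline can query which versions the load balancer is actually routing to.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class LoadBalancerView:
    """Hypothetical snapshot of what the load balancer reports."""
    active_version: str              # version the load balancer marks as active
    routed_versions: FrozenSet[str]  # versions actually receiving traffic


def safe_to_retire(version: str, pipeline_current: str, lb: LoadBalancerView) -> bool:
    """Return True only if retiring `version` cannot drop live traffic."""
    # Never retire the version the pipeline considers current.
    if version == pipeline_current:
        return False
    # Never retire the version the load balancer marks as active.
    if version == lb.active_version:
        return False
    # The edge case: the active marker and the real routing can disagree,
    # so also check where traffic is actually going.
    return version not in lb.routed_versions


# State resembling the incident: "v2" is marked active, but traffic still goes to "v1".
incident_state = LoadBalancerView(active_version="v2", routed_versions=frozenset({"v1"}))
# State after the load balancer restart: marker and routing agree.
healthy_state = LoadBalancerView(active_version="v2", routed_versions=frozenset({"v2"}))
```

Under these assumptions, `safe_to_retire("v1", "v2", incident_state)` returns `False`, so the old deployment would not be aged out while it still carries traffic, while the same call against `healthy_state` returns `True`.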

Customer Impacts

On May 25th, 2018, beginning at 09:27 UTC, some customers were unable to authenticate with PingID MFA (authenticator.pingone.com) hosted in our North American data centers. Services began recovering at 09:38 UTC, with full restoration at 09:55 UTC.

Customers that had implemented MFA bypass should not have been affected by this incident.

Incident Timeline

May 25, 2018 (all times in UTC)

  • 09:27 - Monitoring systems detect issues with PingID services. SRE and Development notified.
  • 09:38 - Investigation shows problem with load balancer state. Partial service restored. Incident Command process initiated.
  • 09:54 - SRE restarts primary load balancer.
  • 09:55 - PingID IDP services fully restored.

Affected Services

  • PingID Authenticator (.com)
  • PingID Server (.com)

Resolution

  • Service restoration occurred after a restart of the primary load balancer.

Ping Action Items

  • Deploy fix to ensure load balancer properly routes traffic to current servers for this edge case (Completed - 5/25)
  • Add additional alerting for multiple “live” versions for this edge case.
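As a rough illustration of the alerting action item, the sketch below flags cases where more than one version is receiving traffic, or where traffic lands on a version other than the one marked active. The function name, the per-version request counts, and the alert wording are assumptions made for this example and do not describe Ping Identity's monitoring.

```python
from typing import Dict, List


def live_version_alerts(active_version: str, requests_by_version: Dict[str, int]) -> List[str]:
    """Build alert messages from hypothetical per-version request counts."""
    # A version counts as "live" if it served at least one request in the window.
    live = sorted(v for v, count in requests_by_version.items() if count > 0)
    alerts: List[str] = []
    if len(live) > 1:
        alerts.append("multiple live versions: " + ", ".join(live))
    # Traffic on anything other than the active version is suspicious by itself.
    stale = [v for v in live if v != active_version]
    if stale:
        alerts.append("traffic on non-active version(s): " + ", ".join(stale))
    return alerts
```

With counts resembling the incident, `live_version_alerts("v2", {"v1": 120, "v2": 0})` yields a single alert about traffic on the non-active version `v1`, and an empty list is returned when only the active version is serving requests.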

Posted May 29, 2018 - 15:35 UTC

Resolved
This incident has been resolved.
Posted May 25, 2018 - 09:55 UTC

Investigating
Monitoring systems have detected an issue with Ping Identity's PingID Service. The Site Reliability Engineering team has been notified and is currently working on the issue. We will update this message when the cause of the incident has been identified. Automated monitoring systems will update affected components and will resolve operational status as systems recover.

For additional questions please contact support@pingidentity.com, or follow this incident on https://status.pingidentity.com for real-time service updates.
Posted May 25, 2018 - 09:31 UTC

This incident affected: PingID Services (PingID Authenticator - North America (.com), PingID Server - North America (.com)).