PingOne For Customers Service Interruption
Incident Report for Ping Identity
Postmortem

Incident Summary

On November 12, 2019, beginning at 08:45 UTC, customers were unable to authenticate, manage, or register new users with PingOne For Customers in our Frankfurt region. The cause was an AWS network issue within an Availability Zone (AZ) in the eu-central-1 region.
This caused a variety of issues with our underlying infrastructure, which in turn caused failures in our User Directory services.

For details of the AWS incident on November 12, see: https://status.aws.amazon.com/#EU_block (EC2 (Frankfurt))

Customer Impacts

On November 12, 2019, beginning at 08:45 UTC, customers were unable to authenticate, manage, or register new users with PingOne For Customers in our Frankfurt region. Services were fully recovered at 12:50 UTC.

Incident Timeline

On November 12, 2019 (all times in UTC)

  • 08:08 - AWS began experiencing increased network connectivity errors.
  • 08:45 - Monitoring systems detected increased failure rates. Customers started experiencing authentication failures.
  • 10:13 - AWS recovered from most of the network connectivity issues, but some degraded performance remained.
  • 11:02 - Directory issues were identified, and SRE started to investigate.
  • 12:15 - AWS declared all issues resolved.
  • 12:45 - Root cause was identified as DNS ENI routing issues. A restart was initiated.
  • 12:50 - Error rates returned to normal. All services recovered.
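
The intervals implied by the timeline can be worked out with a short calculation. The sketch below is illustrative only: the timestamps are copied from the timeline above, while the dictionary keys and function names are labels invented for this example.

```python
from datetime import datetime

# Timestamps (UTC, November 12, 2019) copied from the incident timeline.
# The key names are illustrative labels, not terms from the report.
TIMELINE = {
    "aws_errors_begin": "08:08",
    "impact_detected": "08:45",
    "aws_mostly_recovered": "10:13",
    "directory_issues_identified": "11:02",
    "aws_resolved": "12:15",
    "root_cause_identified": "12:45",
    "services_recovered": "12:50",
}


def minutes_between(start_key, end_key):
    """Return whole minutes elapsed between two timeline entries."""
    fmt = "%H:%M"
    start = datetime.strptime(TIMELINE[start_key], fmt)
    end = datetime.strptime(TIMELINE[end_key], fmt)
    return int((end - start).total_seconds() // 60)


def summarize():
    """Compute the main intervals for the incident."""
    return {
        # From the first AWS errors to our monitoring detecting failures.
        "aws_onset_to_detection_min": minutes_between(
            "aws_errors_begin", "impact_detected"
        ),
        # Total customer-facing impact window.
        "customer_impact_min": minutes_between(
            "impact_detected", "services_recovered"
        ),
        # Impact that persisted after AWS declared the issue resolved.
        "impact_after_aws_resolved_min": minutes_between(
            "aws_resolved", "services_recovered"
        ),
    }


if __name__ == "__main__":
    for name, minutes in summarize().items():
        print(f"{name}: {minutes}")
```

By this calculation, customer impact lasted 245 minutes (4 hours 5 minutes), of which 35 minutes fell after AWS declared its own incident resolved.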

Affected Services

  • User Login API - Europe (.eu)
  • User Login - Europe (.eu)

Resolution

Following the AWS recovery, most of the issues cleared on their own. Resolving the DNS server ENI issue required terminating the node and letting automated recovery restore it.

Ping Action Items

  • Improve monitoring of the infrastructure underlying Directory.
  • Improve Directory logging infrastructure.
  • Systems running Directory Server had their DNS misconfigured, so DNS did not fail over properly. This has already been corrected.
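
As a general illustration of the failover behavior referred to in the last item, the sketch below tries an ordered list of resolvers and falls through to the next one when a lookup fails. It is a minimal sketch under stated assumptions, not a description of the actual Directory Server configuration: the resolver names, the stub lookup functions, the hostname, and the address are all hypothetical.

```python
def resolve_with_failover(hostname, resolvers):
    """Try each (name, lookup) pair in order; return the first success.

    `resolvers` is an ordered list of (name, callable) pairs. Each callable
    takes a hostname and returns an address, or raises OSError on failure.
    """
    errors = []
    for name, lookup in resolvers:
        try:
            return name, lookup(hostname)
        except OSError as exc:
            # Record the failure and fall through to the next resolver.
            errors.append(f"{name}: {exc}")
    raise OSError("all resolvers failed: " + "; ".join(errors))


# Hypothetical stand-ins for real resolvers, used only for illustration.
def primary_down(hostname):
    raise OSError("network unreachable")


def secondary_ok(hostname):
    return "192.0.2.10"  # address from the documentation range (RFC 5737)


if __name__ == "__main__":
    used, address = resolve_with_failover(
        "directory.example",
        [("primary", primary_down), ("secondary", secondary_ok)],
    )
    print(used, address)
```

If only one resolver is configured, or the fallback entry is wrong, a single failure surfaces as a hard error instead of a transparent retry.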
Posted Nov 14, 2019 - 15:41 UTC

Resolved
This incident has been resolved.
Posted Nov 12, 2019 - 13:41 UTC
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Nov 12, 2019 - 13:07 UTC
Update
We are continuing to investigate this issue.
Posted Nov 12, 2019 - 12:44 UTC
Update
We are continuing to investigate this issue.
Posted Nov 12, 2019 - 12:02 UTC
Update
We are continuing to investigate this issue.
Posted Nov 12, 2019 - 11:11 UTC
Update
We are continuing to investigate this issue.
Posted Nov 12, 2019 - 10:01 UTC
Investigating
Monitoring systems have detected an issue with the PingOne For Customers Service. The Engineering team has been notified and is currently working on the issue. We will update this message when the cause of the incident has been identified.
Posted Nov 12, 2019 - 08:59 UTC
This incident affected: PingOne For Customers (User Login API - Europe (.eu), User Login - Europe (.eu)).