On November 12, 2019, beginning at 08:45 UTC, customers were unable to authenticate, manage, or register new users with PingOne For Customers in our Frankfurt region because of AWS network issues within one Availability Zone (AZ) of the eu-central-1 region.
The network issues caused a variety of problems in our underlying infrastructure, which in turn caused failures in our User Directory services.
For details of the AWS incident on November 12, visit: https://status.aws.amazon.com/#EU_block (EC2 (Frankfurt))
On November 12, 2019, beginning at 08:45 UTC, customers were unable to authenticate, manage, or register new users with PingOne For Customers in our Frankfurt region. Services were fully recovered at 12:50 UTC.
On November 12, 2019 (all times in UTC)
- 08:08 - AWS started to have increased network connectivity errors.
- 08:45 - Monitoring systems detected increased failure rates. Customers started experiencing authentication failures.
- 10:13 - AWS recovered from most of the network connectivity issues but still had some degraded performance.
- 11:02 - Directory issues were identified; SRE started to investigate.
- 12:15 - AWS declared all issues resolved.
- 12:45 - Root cause was identified as DNS ENI routing issues. A restart was initiated.
- 12:50 - Error rates returned to normal. All services recovered.
- User Login API - Europe (.eu)
- User Login - Europe (.eu)
Following the AWS recovery, most of the issues cleared on their own. The DNS server ENI issue did not; resolving it required terminating the node and letting automated recovery bring it back.
Ping Action Items
- Improve Directory underlying infrastructure monitoring.
- Improve Directory logging infrastructure.
- Systems running Directory Server had their DNS misconfigured, so they did not fail over properly. This has already been corrected.
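
This report does not describe the exact DNS misconfiguration. Purely as an illustration, the sketch below shows the kind of check that could flag a resolver setup with no usable failover path. It assumes Linux hosts configured through a resolv.conf-style file, which the report does not confirm. The function names, the 2-second threshold, and the sample addresses are all hypothetical.

```python
# Hypothetical lint for resolv.conf-style text. Assumes Linux hosts using
# the glibc resolver; none of these names come from the incident report.

DEFAULT_TIMEOUT_SECONDS = 5  # glibc default when no "timeout:" option is set


def parse_resolv_conf(text):
    """Return (nameservers, options) parsed from resolv.conf-style text."""
    nameservers = []
    options = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines (starting with '#' or ';').
        if not line or line[0] in "#;":
            continue
        fields = line.split()
        if fields[0] == "nameserver" and len(fields) > 1:
            nameservers.append(fields[1])
        elif fields[0] == "options":
            for option in fields[1:]:
                name, _, value = option.partition(":")
                options[name] = value
    return nameservers, options


def failover_warnings(text, max_timeout_seconds=2):
    """List reasons why DNS failover may be missing or slow."""
    nameservers, options = parse_resolv_conf(text)
    warnings = []
    if len(set(nameservers)) < 2:
        warnings.append("fewer than two distinct nameservers: nothing to fail over to")
    timeout_value = options.get("timeout", "")
    timeout = int(timeout_value) if timeout_value.isdigit() else DEFAULT_TIMEOUT_SECONDS
    if timeout > max_timeout_seconds:
        warnings.append(f"timeout of {timeout}s per nameserver delays failover")
    return warnings


if __name__ == "__main__":
    # Illustrative config with a single resolver and default options.
    sample = "nameserver 10.0.0.2\n"
    for warning in failover_warnings(sample):
        print(warning)
```

A check like this could run as part of host provisioning or monitoring, so that a host with a single resolver, or with a long per-server timeout, is flagged before a network event exposes it.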