Directory API Service Interruption
Incident Report for Ping Identity
Postmortem

Incident Summary

On June 18th, 2019, beginning at 17:37 UTC, customers were unable to log in to the PingOne Directory services (login.pingone.com and directory-api.pingone.com). Services were fully recovered by 18:12 UTC after a database restoration.

At the time of the incident, planned maintenance was under way on a downstream database cluster. This maintenance triggered an edge case in which AWS terminated every instance of the cluster in one AWS region.

Incident Timeline

June 18, 2019 (all times in UTC)

  • 17:37 - Underlying branding database nodes are terminated by AWS.
  • 17:40 - New database instances are created and begin replicating data from other regions.
  • 18:00 - SRE performs a rolling restart of the application servers to point them to the new database nodes.
  • 18:12 - Services recovered.
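
For readers who want the elapsed time at each step, the timeline above can be worked out with a minimal Python sketch. The timestamps are copied from the timeline; the helper names (minutes_between, elapsed_from_start) and the short event labels are illustrative only and are not part of any Ping Identity tooling.

```python
from datetime import datetime

# Timestamps (UTC, June 18, 2019) copied from the incident timeline.
# The labels are shortened paraphrases used only for this illustration.
TIMELINE = [
    ("17:37", "database nodes terminated"),
    ("17:40", "new database instances created"),
    ("18:00", "rolling restart of application servers"),
    ("18:12", "services recovered"),
]


def minutes_between(start: str, end: str) -> int:
    """Return whole minutes between two same-day HH:MM timestamps."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


def elapsed_from_start(timeline):
    """Attach minutes elapsed since the first event to every entry."""
    start = timeline[0][0]
    return [(t, label, minutes_between(start, t)) for t, label in timeline]


# Customer-facing impact: from node termination to full recovery.
total_outage_minutes = minutes_between(TIMELINE[0][0], TIMELINE[-1][0])
```

Under these timestamps, total_outage_minutes evaluates to 35, with replacement instances created 3 minutes after the termination and the rolling restart starting 23 minutes after it.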

Affected Services

  • Directory Login (.com)
  • Directory API (.com)

Resolution

Service was restored once the new database nodes had been created and their data recovered. No data was lost during this incident.

Ping Action Items

  • Remove branding services as a critical dependency of Directory login services.
  • Ensure instance deletion protection is enabled on all branding database instances to guard against this AWS termination edge case.
Posted Jun 20, 2019 - 13:10 UTC

Resolved
This incident has been resolved.
Posted Jun 18, 2019 - 18:13 UTC
Identified
The issue has been identified and a fix is being deployed. ETA for recovery is 15 minutes.
Posted Jun 18, 2019 - 17:46 UTC
Update
We are continuing to investigate this issue.
Posted Jun 18, 2019 - 17:44 UTC
Investigating
Monitoring systems have detected an issue with our Directory API. The Site Reliability Engineering team has been notified and is currently working on the issue. Site Reliability will update this message when the issue has been identified. Automated monitoring systems will update affected components and will resolve operational status as systems recover.

For additional questions, please contact Ping Identity Technical Support by opening a case through The Community/Support site (https://www.pingidentity.com/en/account/sign-on.html), or follow this incident on https://status.pingidentity.com for real-time service updates.
Posted Jun 18, 2019 - 17:41 UTC
This incident affected: PingOne Services (Directory API - North America (.com), Directory API - Europe (.eu), Directory API - Australia (.com.au)).