Delays in User Provisioning
Incident Report for Ping Identity

Incident Summary

The provisioning service was impacted by an extremely high rate of Directory API usage, which caused the PingOne Directory cloud service to queue requests and delay their processing.

Customer Impact

Customers who used PingOne Directory to perform administrative functions during this period, such as creating or modifying users and groups, would have experienced long delays before their changes took effect.

Incident Timeline - March 20, 2017 (MDT)

  • 1610 - First internal reports of increased load for provisioning.

  • 1630 - Operations and Directory teams started investigating.

  • 1713 - First reports of customers seeing updates fail to appear for extended periods.

  • 1731 - Internal teams determined that there were several large concurrent updates from multiple customers.

  • 1805 - Status page updated to note the degraded service.

  • 1830 - Added 4 servers to attempt faster processing.

  • 1858 - Issue determined to be large queue sizes. Operations team saw the system start to recover.

  • 1900 - Removed the 4 extra servers, as they had not increased processing speed.

  • 1903 - Operations team monitoring to ensure system behavior is returning to normal and not degrading.

  • 2005 - Monitoring indicates everything has returned to normal, no further delays detected.

  • 2032 - Status posting updated to indicate degraded status has cleared.

Affected Services

PingOne Services

  • Provisioning of PingOne Directory users and groups.

Resolution

  • The issue resolved itself once the queues caught up on their own, without further intervention.

Ping Action Items

  • Prioritize non-batch traffic over batch jobs, which are less sensitive to turnaround time, so that batch jobs no longer delay it.

  • Adjust alerting thresholds to allow earlier detection of queue growth.
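
As a rough illustration of the two action items above, the following Python sketch shows one way a queue could serve interactive (non-batch) requests ahead of batch requests, together with a simple check that flags queue growth early. It is not Ping Identity's implementation: the names ProvisioningQueue, INTERACTIVE, BATCH and should_alert, as well as the threshold parameters, are hypothetical and chosen only for this example.

```python
import heapq
import itertools

# Hypothetical priority levels for this sketch; the lower number is served first.
INTERACTIVE = 0
BATCH = 1


class ProvisioningQueue:
    """Toy queue that serves interactive requests before batch requests."""

    def __init__(self):
        self._heap = []
        # Monotonic counter used as a tie-breaker so that requests with the
        # same priority are served in arrival (FIFO) order.
        self._order = itertools.count()

    def put(self, request, priority):
        heapq.heappush(self._heap, (priority, next(self._order), request))

    def get(self):
        _priority, _seq, request = heapq.heappop(self._heap)
        return request

    def depth(self):
        return len(self._heap)


def should_alert(depth_samples, max_depth, max_growth):
    """Return True if the latest queue depth exceeds max_depth, or if the
    depth grew by more than max_growth between the first and last sample."""
    if not depth_samples:
        return False
    latest = depth_samples[-1]
    growth = latest - depth_samples[0]
    return latest > max_depth or growth > max_growth


if __name__ == "__main__":
    queue = ProvisioningQueue()
    queue.put("bulk-import-1", BATCH)
    queue.put("bulk-import-2", BATCH)
    queue.put("admin-create-user", INTERACTIVE)
    # The interactive request is served first even though it arrived last.
    print(queue.get())
    # Depth is still under max_depth, but growth of 490 exceeds max_growth.
    print(should_alert([10, 20, 500], max_depth=1000, max_growth=100))
```

Strict prioritization like this can starve batch jobs while interactive traffic stays high, so a production design would likely add aging or a weighted share for batch work. Alerting on growth as well as absolute depth is what allows a rising queue to be caught before it reaches the hard limit.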

Posted Mar 30, 2017 - 11:13 MDT

Resolved
Delays with provisioning of cloud directory users and groups have been resolved. The Directory service is back to normal.
Posted Mar 20, 2017 - 20:33 MDT
Investigating
SRE has detected an issue causing delays with provisioning of cloud directory users and groups. SRE is currently investigating and will post an update when more is known.
Posted Mar 20, 2017 - 18:06 MDT
This incident affected: PingOne Services (Directory API (North America)).