The provisioning service was impacted by an extremely high rate of Directory API usage, which caused the PingOne Directory cloud service to queue and hold up requests.
Customers using PingOne Directory and performing administrative functions such as creating or modifying users and groups during this period would have seen a long wait for changes to take effect.
1610 - First internal reports of increased load for provisioning.
1630 - Operations and Directory teams started investigation.
1713 - First reports of customers seeing updates not showing up for extended periods.
1731 - Internal teams determined that there were several large concurrent updates from multiple customers.
1805 - Status page updated to note the degraded service.
1830 - Added 4 servers in an attempt to speed up processing.
1858 - Issue determined to be large queue sizes. Operations team saw the system start to recover.
1900 - Removed the extra nodes, as adding them had not increased processing.
1903 - Operations team monitoring to ensure system behavior was returning to normal and not degrading.
2005 - Monitoring indicated everything had returned to normal; no further delays detected.
2032 - Status posting updated to indicate degraded status has cleared.
Prioritize non-batch traffic over batch jobs, which are less sensitive to turnaround time, so that batch jobs no longer delay it.
Adjust alerting thresholds to allow earlier detection of a queue increase.
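The two follow-up actions above can be illustrated with a minimal sketch. It is not the PingOne implementation: the class name ProvisioningQueue, the INTERACTIVE and BATCH priority levels, and the alert_depth threshold are all hypothetical names chosen for illustration. It assumes a single in-memory queue in which non-batch requests are always dequeued before batch requests, plus a simple depth check that an alerting job could poll.

```python
import heapq
import itertools
from dataclasses import dataclass, field

# Hypothetical priority levels: a lower number is served first.
INTERACTIVE = 0  # e.g. a single admin create/modify of a user or group
BATCH = 1        # e.g. one item from a large bulk update


@dataclass(order=True)
class _Entry:
    priority: int
    seq: int                            # arrival order, breaks ties (FIFO)
    request: str = field(compare=False)


class ProvisioningQueue:
    """Illustrative two-level priority queue with a depth alert threshold."""

    def __init__(self, alert_depth):
        self._heap = []
        self._seq = itertools.count()
        self.alert_depth = alert_depth  # assumed threshold, tuned per system

    def submit(self, request, priority):
        heapq.heappush(self._heap, _Entry(priority, next(self._seq), request))

    def next_request(self):
        """Return the next request to process, or None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap).request

    def depth(self):
        return len(self._heap)

    def should_alert(self):
        """True once the backlog reaches the alert threshold."""
        return self.depth() >= self.alert_depth


if __name__ == "__main__":
    q = ProvisioningQueue(alert_depth=3)
    q.submit("bulk-import-1", BATCH)
    q.submit("bulk-import-2", BATCH)
    q.submit("create-user", INTERACTIVE)
    print(q.should_alert())   # backlog of 3 has reached the threshold
    print(q.next_request())   # the interactive request is served first
```

Strict prioritization like this can starve batch jobs under sustained interactive load, so a real system would likely add aging or a weighted share for batch traffic. Lowering alert_depth trades earlier detection for more frequent alerts.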