Incident Summary
The PingOne SCIM Provisioning Service became unavailable, which delayed Directory user and group provisioning.
Customer Impacts
SCIM Provisioning Impact:
- The Provisioning Service was unavailable during the event. Users may have seen slower response times before or after the event.
- To the best of our knowledge, no customer reports were logged and no customer issues were raised in the support Slack channel.
Directory Impact:
- While PingOne Directory production was degraded, account admins who tried to provision users or groups between Sept 17 14:58 MDT and Sept 18 10:15 MDT would have had to wait 8-10 hours for those users and groups to be provisioned properly.
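The impact window above is stated in MDT, while the timeline that follows uses UTC. As a minimal sketch for lining the two up, the snippet below converts the window to UTC, assuming MDT means a fixed UTC-6 offset. The helper name `mdt_to_utc` is illustrative and not part of any Ping tooling.

```python
from datetime import datetime, timedelta, timezone

# Assumption: MDT (Mountain Daylight Time) is a fixed UTC-6 offset.
MDT = timezone(timedelta(hours=-6), "MDT")


def mdt_to_utc(year, month, day, hour, minute):
    """Interpret a wall-clock time as MDT and return it in UTC."""
    local = datetime(year, month, day, hour, minute, tzinfo=MDT)
    return local.astimezone(timezone.utc)


window_start = mdt_to_utc(2017, 9, 17, 14, 58)
window_end = mdt_to_utc(2017, 9, 18, 10, 15)

print(window_start.isoformat())   # 2017-09-17T20:58:00+00:00
print(window_end.isoformat())     # 2017-09-18T16:15:00+00:00
print(window_end - window_start)  # 19:17:00
```

The converted start, 20:58 UTC, matches the Pingdom alert for SCIM down in the timeline.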
Incident Timeline
September 17th, 2017 (all times UTC)
- 20:08 - Low severity alert for queued messages for Directory API Back End Provisioning.
- 20:26 - High severity alert for queued messages and SRE on-call was paged.
- 20:58 - Pingdom alert for SCIM down.
- 21:43 - Initial Status Page posted.
- 23:43 - Secondary MongoDB node flipped to Primary.
- 23:43 - Monitoring systems reported issue resolved.
- 23:43 - Issue identified and Status Page updated.
September 18th, 2017 (all times UTC)
- 01:20 - Monitoring shows queued messages slowly decreasing.
- 01:51 - Status Page posted indicating issue resolved. Increased the number of message consumers on the Directory Back End provisioning service.
Affected Services
- SCIM Provisioning (North America)
- Directory Provisioning
Resolution
- Site Reliability Engineering promoted the remaining MongoDB node from Secondary to Primary, which re-enabled write access.
- Additional nodes and disk space were added to the cluster to handle increased traffic.
- An additional Directory API Backend cluster was deployed to help consume queued messages.
- Engineering deployed a new version of Directory API Backend to handle new traffic, while the original cluster handled the backlog of messages.
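To illustrate why adding consumers shortens recovery, here is a rough sketch of a backlog drain-time estimate. This report does not state queue depths or message rates, so every number below is hypothetical, and `estimate_drain_hours` is an illustrative helper rather than part of the Directory API Backend.

```python
import math


def estimate_drain_hours(backlog, arrival_per_sec, consumers, per_consumer_per_sec):
    """Estimate hours needed to empty a queue.

    Returns math.inf when consumers cannot outpace new arrivals,
    in which case the backlog never shrinks.
    """
    net_rate = consumers * per_consumer_per_sec - arrival_per_sec
    if net_rate <= 0:
        return math.inf
    return backlog / net_rate / 3600


# Hypothetical figures: 360,000 queued messages, 5 new messages per second,
# and each consumer processing 5 messages per second.
for consumers in (1, 2, 4):
    hours = estimate_drain_hours(360_000, 5, consumers, 5)
    print(consumers, "consumers ->", hours, "hours")
```

Under these assumed rates, one consumer only keeps pace with new traffic, so the backlog never drains, while two consumers clear it in 20 hours. This is the same reasoning behind routing new traffic to a separate cluster so the original cluster can work only on the backlog.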
Ping Action Items
- Update MongoDB AMI to use a larger instance with more disk space.
- Update SCIM Provisioning Service monitoring to return additional MongoDB heartbeat information.
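As a sketch of what additional MongoDB heartbeat information might look like, the function below summarizes a document shaped like the output of MongoDB's `replSetGetStatus` command, using its `members`, `name`, `health`, and `stateStr` fields. The host names and the `summarize_replica_set` helper are hypothetical. This is not the SCIM Provisioning Service's actual monitoring code.

```python
def summarize_replica_set(status):
    """Summarize a replSetGetStatus-style document.

    Reports whether exactly one member is PRIMARY (writes are possible)
    and which members report a health value other than 1.
    """
    members = status.get("members", [])
    primaries = [m["name"] for m in members if m.get("stateStr") == "PRIMARY"]
    unhealthy = [m["name"] for m in members if m.get("health") != 1]
    return {
        "writable": len(primaries) == 1,
        "primary": primaries[0] if len(primaries) == 1 else None,
        "unhealthy": unhealthy,
    }


# Hypothetical sample: one reachable Secondary and no Primary,
# so the replica set cannot accept writes.
sample = {
    "set": "rs0",
    "members": [
        {"name": "mongo-a.example:27017", "health": 0,
         "stateStr": "(not reachable/healthy)"},
        {"name": "mongo-b.example:27017", "health": 1,
         "stateStr": "SECONDARY"},
    ],
}

print(summarize_replica_set(sample))
```

A health check that exposes a `writable` flag like this could alert as soon as the replica set has no Primary, without waiting for queued messages to build up.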