System Status

Welcome to Ping Identity's system status site.

Past Incidents

SCIM Provisioning Service Interruption
Incident Report for Ping Identity
Postmortem

Incident Summary

The PingOne SCIM Provisioning Service became unavailable, which delayed Directory user and group provisioning.

Customer Impacts

SCIM Provisioning Impact:

  • The SCIM Provisioning Service was unavailable during the event. Users may have seen slower response times before or after the event.
  • To the best of our knowledge, no customer reports were logged and no customer issues were raised on the support Slack channel.

Directory Impact:

  • Because of the PingOne Directory production degradation, account admins who tried to provision users and groups between Sept 17 14:58 MDT and Sept 18 10:15 MDT may have had to wait 8 to 10 hours for those users and groups to be provisioned.

Incident Timeline

September 17th, 2017 (all times UTC)

  • 20:08 - Low-severity alert for queued messages in Directory API Back End Provisioning.
  • 20:26 - High-severity alert for queued messages; on-call SRE paged.
  • 20:58 - Pingdom alert: SCIM down.
  • 21:43 - Initial Status Page posted.
  • 23:43 - Secondary MongoDB node flipped to Primary.
  • 23:43 - Monitoring systems reported issue resolved.
  • 23:43 - Issue identified and Status Page updated.

September 18th, 2017 (all times UTC)

  • 01:20 - Monitoring showed queued messages slowly decreasing.
  • 01:51 - Status Page updated to indicate the issue was resolved. Number of message consumers increased on the Directory Back End provisioning service.

Affected Services

  • SCIM Provisioning (North America)
  • Directory Provisioning

Resolution

  • Site Reliability Engineering promoted the remaining MongoDB node from Secondary to Primary, which re-enabled write access.
  • Additional nodes and disk space were added to the cluster to handle increased traffic.
  • An additional Directory API Backend cluster was deployed to help consume queued messages.
  • Engineering deployed a new version of Directory API Backend to handle new traffic while the original cluster worked through the message backlog.

Ping Action Items

  • Update MongoDB AMI to use a larger instance with more disk space.
  • Update SCIM Provisioning Service monitoring to return additional MongoDB heartbeat information.
Posted Sep 22, 2017 - 21:33 UTC

Resolved
This issue has been resolved.
Posted Sep 18, 2017 - 01:51 UTC
Monitoring
The service is restored and the Site Reliability team is monitoring the system.
Posted Sep 18, 2017 - 01:20 UTC
Update
The service restoration is still in progress. Additional updates will be provided in 30 minutes.
Posted Sep 18, 2017 - 00:46 UTC
Update
We are still working on restoring the service. Additional updates will be provided in 30 minutes.
Posted Sep 18, 2017 - 00:15 UTC
Identified
A solution has been identified and teams are working on restoring the service. No ETA at this time. Additional updates will be provided in 30 minutes.
Posted Sep 17, 2017 - 23:43 UTC
Update
We are conducting additional analysis on the issue to determine the best solution. Another update will be provided in 30 minutes.
Posted Sep 17, 2017 - 23:14 UTC
Update
The Site Reliability Engineering team is investigating a solution to restore the service. An update will be provided in 30 minutes.
Posted Sep 17, 2017 - 22:39 UTC
Investigating
Monitoring systems have detected an issue with SCIM Provisioning Service. The Site Reliability Engineering team has been notified and is currently working on the issue. We will update this message when the cause has been identified. Automated monitoring systems will update the status of affected components and restore their operational status as systems recover.

For additional questions, please contact support@pingidentity.com, or follow this incident on https://status.pingidentity.com for real-time service updates.
Posted Sep 17, 2017 - 21:43 UTC
This incident affected: PingOne for Enterprise - United States (.com services) (SCIM Provisioning).