Degraded SSO performance
Incident Report for Ping Identity

Incident Summary

On Friday, December 16, 2016, customers of PingOne SSO for SaaS Apps (formerly Application Provider Services (APS)) and some PingOne SSO (formerly CAS Lite) customers experienced an outage in the afternoon following an update to PingOne's Token Processor Nodes (TPN).

Customer Impact

Following the update to the TPN, attributes that were previously optional became required, which caused SSO206 error messages to appear for end users. Accounts that did not have the required attributes set experienced SSO206 error messages for four hours.
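The failure mode described above can be illustrated with a minimal sketch. This is not PingOne's implementation: the function name, the attribute names ("subject", "email"), and the mapping of a missing attribute to SSO206 are all assumptions made for illustration. It only shows how promoting an optional attribute to required causes previously valid accounts to fail.

```python
# Hypothetical illustration only. Attribute names and the link between a
# missing attribute and the SSO206 code are assumptions, not PingOne code.

class SSOError(Exception):
    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code


def process_token(attributes, required):
    """Reject the sign-on if any required attribute is missing or empty."""
    missing = sorted(name for name in required if not attributes.get(name))
    if missing:
        raise SSOError("SSO206", "missing required attributes: " + ", ".join(missing))
    return {"subject": attributes["subject"], "attributes": dict(attributes)}


# An account that never set the (hypothetical) "email" attribute.
account = {"subject": "user@example.com"}

# Before the update: only "subject" is required, so sign-on succeeds.
before = process_token(account, required={"subject"})

# After the update: "email" is now required, so the same account fails.
try:
    process_token(account, required={"subject", "email"})
    after = "ok"
except SSOError as err:
    after = err.code
```

In this sketch the account itself never changes; only the validation rule does, which is why accounts with complete attribute sets were unaffected.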

Incident Timeline - December 16, 2016 (MT)

  • 1219 - Token Processing changes committed to production Fastlane.
  • 1402 - Token Processing changes staged (10% live) in production for canary monitoring (20 minutes long).
  • 1403 - First SSO206 events appear in the PingOne log files. Only 30 errors occur during the full 20 minutes of monitoring, which is not sufficient to trigger a failure in the build deployment pipeline.
  • 1425 - Token Processing changes activated (100% live) in production after passing all tests and canary monitoring.
  • 1426 - Significantly more SSO206 events appear in the PingOne log files.
  • 1622 - Support receives initial customer calls reporting SSO206 events.
  • 1722 - Following triage, Support escalates to SRE and DEV teams.
  • 1745 - Token Processing code change is rolled back. SRE and DEV teams observe an immediate drop in SSO206 errors through all production services.
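The timeline shows the canary stage passing even though 30 SSO206 errors were logged at 10% traffic. The sketch below contrasts two ways a canary gate could evaluate that window. It is a hedged illustration, not the actual pipeline: the function names, the threshold of 100, the `min_count` of 5, and the assumption that SSO206 was absent from the pre-deployment baseline are all hypothetical. Only the figure of 30 errors comes from the timeline.

```python
from collections import Counter

# Hypothetical canary gates. Thresholds and the empty baseline are assumed
# values for illustration; only the count of 30 comes from the timeline.

def absolute_gate(canary_errors, max_errors):
    """Pass the canary if the total error count stays at or under a fixed limit."""
    return sum(canary_errors.values()) <= max_errors


def novelty_gate(canary_errors, baseline_errors, min_count=5):
    """Pass only if no error code that was absent from the baseline
    appears min_count or more times during the canary window."""
    new_codes = [
        code
        for code, count in canary_errors.items()
        if count >= min_count and baseline_errors.get(code, 0) == 0
    ]
    return not new_codes


canary = Counter({"SSO206": 30})  # errors seen during the 20-minute canary
baseline = Counter()              # assumed: no SSO206 before the deployment

passes_absolute = absolute_gate(canary, max_errors=100)
passes_novelty = novelty_gate(canary, baseline)
```

Under these assumed values the fixed-count gate passes the build while the novelty check fails it, which shows why a small absolute count at 10% traffic can hide a regression that becomes much larger at 100%.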

Affected Services

  • PingOne Services - Single Sign-On (PingOne SSO for SaaS Apps/APS)

Resolution

Token Processing update was rolled back.

Ping Action Items

  • Add PingOne “Invited SSO” to system tests - [SSD-3712]
  • Make all system attributes optional - [SSD-3711]
  • Improve response code reporting for improved monitoring capability - [SSD-1904]
Posted Dec 21, 2016 - 14:57 MST

Resolved
PingOne APS customers affected by this issue should no longer be seeing SSO206 errors during Single Sign-on events. This incident is resolved.
Posted Dec 16, 2016 - 18:13 MST
Monitoring
SRE has identified a code push that introduced the errors and has rolled it back. We are no longer seeing errors and are continuing to monitor the situation.
Posted Dec 16, 2016 - 18:00 MST
Investigating
SRE has been alerted to an issue with processing SSO for a number of customers and is investigating the root cause.
Posted Dec 16, 2016 - 17:54 MST
This incident affected: PingOne Services (Single Sign-on).