On December 8, 2017, beginning at 04:45 UTC, an underlying database node in the multi-node database cluster became unresponsive. Once the node was recovered, the application began responding again, although very slowly. Services were fully restored once all nodes were responsive.
This incident exposed an issue in the configuration between the application servers and the database cluster. When the database node failed, the application assumed the database was not in a consistent state and stopped responding to requests.
On December 8, 2017, beginning at 04:45 UTC, customers were unable to authenticate with PingID MFA through our North American data centers (authenticator.pingone.com). Services began recovering at 05:39 UTC, at which point some authentication sessions succeeded but with longer-than-normal delays. Full service and performance were restored for all customers at 06:25 UTC.
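For reference, the length of each phase can be derived from the three timestamps above. The following is a small sketch of that arithmetic; the variable and function names are illustrative only and are not part of any PingID tooling.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"

# Timestamps taken from the incident summary (all UTC).
outage_start = datetime.strptime("2017-12-08 04:45", FMT)
recovery_start = datetime.strptime("2017-12-08 05:39", FMT)
full_restore = datetime.strptime("2017-12-08 06:25", FMT)


def minutes_between(start: datetime, end: datetime) -> int:
    """Return the whole number of minutes elapsed between two timestamps."""
    return int((end - start).total_seconds() // 60)


full_outage_min = minutes_between(outage_start, recovery_start)  # no authentications
degraded_min = minutes_between(recovery_start, full_restore)     # slow authentications
total_min = minutes_between(outage_start, full_restore)

print(f"Full outage: {full_outage_min} min")
print(f"Degraded:    {degraded_min} min")
print(f"Total:       {total_min} min")
```

This gives 54 minutes of full outage followed by 46 minutes of degraded service, for a total impact window of 100 minutes.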
During this incident, the PingID local bypass feature was not triggered because the infrastructure-level health check continued to pass.
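The gap described here, where a host-level check passes while the application cannot serve requests, is commonly addressed with a health check that also probes the service's dependencies. The sketch below illustrates that general idea only. It does not reflect how PingID or its local bypass feature is implemented, and every name in it (`HealthReport`, `check_health`, `should_bypass`, the probe names) is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class HealthReport:
    infrastructure_ok: bool  # host or process is reachable
    dependencies_ok: bool    # every dependency probe succeeded

    @property
    def healthy(self) -> bool:
        # The service counts as healthy only if both layers are healthy.
        return self.infrastructure_ok and self.dependencies_ok


def check_health(process_alive: bool,
                 probes: Dict[str, Callable[[], bool]]) -> HealthReport:
    """Combine an infrastructure-level signal with dependency probes.

    Each probe is a hypothetical callable that returns True when its
    dependency (for example, one database node) answers correctly.
    A probe that raises is treated as a failure.
    """
    results = []
    for probe in probes.values():
        try:
            results.append(bool(probe()))
        except Exception:
            results.append(False)
    return HealthReport(process_alive, all(results))


def should_bypass(report: HealthReport) -> bool:
    """Hypothetical policy: fall back whenever the service is not healthy."""
    return not report.healthy


# Scenario resembling the incident: the process is up, one node is down.
report = check_health(True, {
    "db-node-1": lambda: True,
    "db-node-2": lambda: False,
})
print(report.infrastructure_ok, report.healthy, should_bypass(report))
```

In this scenario the infrastructure-level signal alone would report success, while the combined check reports the service as unhealthy and the hypothetical fallback policy would apply.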
December 08, 2017 (all times in UTC)
PingID Service (North America)
Partial restoration of the PingID services occurred when the failed database node was added back into the multi-node cluster. Full service restoration occurred after all database nodes had fully replicated data sets.