Beginning on May 29th, 2018, recently added users may have experienced a delay between account creation and the ability to authenticate to the PingOne dock. A batch migration process increased replication lag between our master and replica database nodes. New accounts were provisioned in the master database, but until each account had replicated to the read-only replica, lookups of that user failed.
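The failure mode described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the actual PingOne architecture: the `Master` and `Replica` classes, the in-memory replication log, and the figures of 1,000 migration rows and 100 rows applied per tick are all hypothetical and chosen only to make the lag visible.

```python
class Master:
    """Hypothetical master node: accepts writes and records them in a log."""

    def __init__(self):
        self.users = set()
        self.log = []  # ordered replication log

    def write(self, user_id):
        self.users.add(user_id)
        self.log.append(user_id)


class Replica:
    """Hypothetical read-only replica: applies the master's log in order."""

    def __init__(self):
        self.users = set()
        self.applied = 0  # number of log entries applied so far

    def apply(self, log, max_entries):
        # Apply at most max_entries pending entries, oldest first.
        end = min(len(log), self.applied + max_entries)
        self.users.update(log[self.applied:end])
        self.applied = end

    def lookup(self, user_id):
        return user_id in self.users

    def lag(self, log):
        # Lag expressed as the number of entries not yet applied.
        return len(log) - self.applied


master, replica = Master(), Replica()

# The batch migration writes many rows ahead of the new account (assumed size).
for i in range(1000):
    master.write(f"migrated-{i}")
master.write("new-user")

# The replica applies 100 entries per tick (assumed rate).
ticks_until_visible = 0
while not replica.lookup("new-user"):
    replica.apply(master.log, max_entries=100)
    ticks_until_visible += 1
```

In this sketch the new account exists on the master immediately, but a lookup against the replica fails until the backlog of migration rows queued ahead of it has been applied.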
May 29, 2018
May 31, 2018
13:42 - Issue escalated to Incident Commander. SRE identified replica lag as likely cause of the issue.
14:29 - SRE begins repointing applications to master database.
15:55 - Services repointed. Manual verification confirms issue resolved.
19:54 - Investigation determines batch migration process is responsible for lag. Process stopped.
June 1, 2018
02:21 - Read-only database in sync with master. No lag reported.
16:25 - SRE repoints applications back to read-only database.
Service was restored after the applications were repointed to the master database node.
Increase the severity of the replica lag alert so that it pages the on-call SRE.
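As a rough illustration of this alert change, the sketch below maps a measured replica lag to an alert severity. It is an assumption-laden example rather than our monitoring configuration: the function name `alert_severity`, the severity labels, and the 30-second and 300-second thresholds are hypothetical placeholders.

```python
def alert_severity(lag_seconds, warn_after=30.0, page_after=300.0):
    """Classify replica lag into a severity level.

    The thresholds are hypothetical; real values would be tuned to how
    long a newly created user can tolerate being invisible to the replica.
    """
    if lag_seconds < 0:
        raise ValueError("lag_seconds must be non-negative")
    if lag_seconds >= page_after:
        return "page"  # wake the on-call SRE
    if lag_seconds >= warn_after:
        return "warn"  # notify without paging
    return "ok"
```

The design point is that sustained lag beyond the upper threshold reaches a person directly, rather than waiting for an escalation driven by user reports.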