Ping Identity Service Interruption (DNS DDoS)
Incident Report for Ping Identity

Incident Summary

Ping Identity's main DNS provider, Dyn, came under a large DDoS attack, effectively preventing DNS resolution for Ping customers in multiple geographic regions.

Customer Impact

Customers in regions where Dyn was under attack were unable to resolve Ping's DNS domains. This resulted in an outage of all Ping services for customers in the impacted regions.

Incident Timeline:

October 21st, 2016 (all times MDT)

05:16 - Pingdom pages on the EU PingOne Dock being down; SRE begins investigating
05:25 - Ping Identity Administration Portal moves to Degraded Performance
05:28 - Synthetic transaction failures in the US East Coast region page the on-call engineer
      - licensing.pingidentity.com moves to Degraded
      - documentation.pingidentity.com moves to Degraded
06:05 - First public reports of Dyn DNS being targeted by a DDoS attack
06:07 - First status posting made by SRE
06:42 - SRE begins preparing a move to a new DNS provider
07:00 - SRE decides not to migrate to a secondary DNS provider; given the limited scope of the attack, SRE judges that the risks of migrating outweigh the benefits
07:20 - DDoS attack stops and services return to normal
07:37 - Status page updated to Resolved
09:53 - Second DDoS attack launched against Dyn
10:19 - SRE begins configuring zones on a secondary DNS provider
10:22 - New status page posting is made
10:31 - Status page updated to reflect the intermittent nature of the outage and attack
10:42 - SRE begins increasing DNS TTLs to 1 day in an attempt to mitigate failed lookups
11:06 - Status page updated to reflect mitigation efforts
11:36 - TTL increase complete
12:04 - Status page updated to reflect the TTL mitigation; the update erroneously reports that the attack is subsiding
12:54 - Status page updated to reflect the continued attack. Impact is now considered to include all services
13:10 - SRE decides to migrate off Dyn DNS
13:33 - Migration of pingone.com, pingone.eu and pingidentity.eu complete
14:28 - Migration of pingidentity.com to alternate DNS provider complete
14:34 - Pingdom reports that all endpoints are back up
14:50 - Migration of pingone.com.au and pingidentity.com.au complete
16:47 - Dyn reports all DDoS attacks are mitigated
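
The 10:42 TTL increase relies on recursive resolvers answering from cache until a record's TTL expires, without contacting the authoritative provider. The sketch below is a simplified model, not Ping's tooling: the `CachedRecord`, `can_resolve`, and `seconds_of_failure` names are hypothetical, the 300-second "before" TTL and the 3-hour outage are assumed values, and it assumes the best case where a resolver cached the record just as the provider became unreachable. Real resolvers may cap long TTLs and only pick up a new TTL once their existing cached copy expires, so actual benefit is smaller.

```python
from dataclasses import dataclass

ONE_DAY = 24 * 60 * 60  # the 1-day TTL from the timeline, in seconds


@dataclass
class CachedRecord:
    """A DNS answer held in a recursive resolver's cache (simplified model)."""

    fetched_at: int  # time the answer was cached, in seconds
    ttl: int  # TTL in seconds that came with the answer

    def expires_at(self) -> int:
        return self.fetched_at + self.ttl


def can_resolve(record: CachedRecord, now: int, authoritative_up: bool) -> bool:
    """True if a lookup at `now` gets an answer.

    A fresh cached answer is served without contacting the authoritative
    servers; once it expires, the lookup depends on those servers being reachable.
    """
    if now < record.expires_at():
        return True
    return authoritative_up


def seconds_of_failure(ttl: int, outage_start: int, outage_end: int) -> int:
    """Seconds of failed lookups during [outage_start, outage_end).

    Best case: assumes the resolver cached the record at the moment the
    authoritative provider became unreachable.
    """
    record = CachedRecord(fetched_at=outage_start, ttl=ttl)
    first_failure = max(outage_start, record.expires_at())
    return max(0, outage_end - first_failure)


if __name__ == "__main__":
    three_hours = 3 * 60 * 60  # hypothetical outage length
    for ttl in (300, ONE_DAY):  # 300 s is a hypothetical "before" TTL
        print(ttl, seconds_of_failure(ttl, 0, three_hours))
```

Run as a script, this prints `300 10500` and `86400 0`: with the short TTL, lookups fail for almost the whole window, while the 1-day TTL covers it. A resolver with no cached copy at all still has to reach the attacked provider, which is why longer TTLs reduce, but do not remove, the impact.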

Affected Services

PingOne Services

North America Critical Path
Europe Critical Path
Australia Critical Path
Admin Portal Monitor
Directory API
Single Sign-on
Single Sign-On (PingOne SSO for SaaS Apps/APS)
Administration Portal
OAuth Service
Administration API
AD Connect & Routing Service
PingOne dock (North America)
PingOne dock (Europe)
PingOne dock (Australia)
Directory Login (North America)
Directory Login (Europe)
Directory Login (Australia)
Directory API (North America)
Directory API (Europe)
Directory API (Australia)
Office365 Service (North America)
Office365 Service (Europe)
Office365 Service (Australia)
SCIM Provisioning (North America)
SCIM Provisioning (Europe)
SCIM Provisioning (Australia)

PingID Services

PingID App
PingID Authenticator (North America)
PingID Authenticator (Europe)
PingID Authenticator (Australia)
PingID Server (North America)
PingID Server (Europe)
PingID Server (Australia)

Pingidentity.com Website

Pingidentity.com Critical Path
www.pingidentity.com
docs.pingidentity.com
documentation.pingidentity.com
licensing.pingidentity.com
updates.pingidentity.com

Resolution

  • We resolved the incident by migrating our DNS zones to a secondary provider, which restored successful name resolution.

Detection

  • The issue was detected by our external monitoring, which first reported failed DNS resolution from Europe (05:16), followed shortly by resolution failures on the US East Coast (05:28).
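
Detection of this kind comes from probes that resolve Ping hostnames from several regions; comparing regions is what distinguishes a regional failure, like the one first seen in Europe, from a global one. As a minimal sketch (not Ping's actual monitoring), one region's check might look like the following; the `probe` and `region_status` helpers and the "Operational" / "Degraded" / "Outage" labels are hypothetical, and the resolver is injectable so the logic can be exercised without network access.

```python
import socket
from typing import Callable, Dict, Iterable

Resolver = Callable[[str], str]


def probe(hostnames: Iterable[str], resolve: Resolver = socket.gethostbyname) -> Dict[str, bool]:
    """Try to resolve each hostname; True means an address came back."""
    results: Dict[str, bool] = {}
    for name in hostnames:
        try:
            resolve(name)
            results[name] = True
        except OSError:  # socket.gaierror is a subclass of OSError
            results[name] = False
    return results


def region_status(results: Dict[str, bool]) -> str:
    """Map one region's probe results to a hypothetical status label."""
    failures = sum(1 for ok in results.values() if not ok)
    if failures == 0:
        return "Operational"
    if failures < len(results):
        return "Degraded"
    return "Outage"


if __name__ == "__main__":
    hosts = ["admin.pingone.com", "www.pingidentity.com"]
    print(region_status(probe(hosts)))  # performs real DNS lookups
```

Running the same probe from each monitoring region and feeding the per-region labels to the status page is one way to approach the status-automation action item, though the thresholds here are purely illustrative.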

Action Items

  • Engineer our DNS infrastructure to make use of at least two providers (Resolved)
  • Review Status Page posting automation to ensure customers receive accurate and timely updates during similar (DNS-related) interruptions.
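
The first action item can be checked mechanically: a zone survives a single provider's outage only if its delegated NS records point at nameservers run by more than one provider, and each provider serves the same zone data. The sketch below is a hypothetical helper, not part of Ping's tooling; it assumes the NS hostnames have already been fetched (for example with `dig NS pingone.com +short`) and uses placeholder provider suffixes in place of real ones.

```python
from typing import Dict, Iterable, Optional, Set

# Hypothetical suffix-to-provider mapping; real providers assign their own
# nameserver domains, which would be listed here instead.
PROVIDER_SUFFIXES: Dict[str, str] = {
    "provider-a.example": "Provider A",
    "provider-b.example": "Provider B",
}


def provider_of(ns_host: str, suffixes: Dict[str, str]) -> Optional[str]:
    """Return the provider that operates `ns_host`, or None if unrecognized."""
    host = ns_host.rstrip(".").lower()  # NS targets are often written fully qualified
    for suffix, provider in suffixes.items():
        # Match on a label boundary so "xprovider-a.example" does not match.
        if host == suffix or host.endswith("." + suffix):
            return provider
    return None


def providers_for(ns_hosts: Iterable[str], suffixes: Dict[str, str] = PROVIDER_SUFFIXES) -> Set[str]:
    """Distinct recognized providers among a zone's NS hostnames."""
    return {p for p in (provider_of(h, suffixes) for h in ns_hosts) if p is not None}


def has_provider_redundancy(
    ns_hosts: Iterable[str],
    suffixes: Dict[str, str] = PROVIDER_SUFFIXES,
    minimum: int = 2,
) -> bool:
    """True if the NS set spans at least `minimum` different providers."""
    return len(providers_for(ns_hosts, suffixes)) >= minimum


if __name__ == "__main__":
    print(has_provider_redundancy(["ns1.provider-a.example.", "ns1.provider-b.example."]))
    print(has_provider_redundancy(["ns1.provider-a.example.", "ns2.provider-a.example."]))
```

The script prints `True` then `False`. This only confirms the delegation; keeping both providers' copies of each zone in sync is a separate concern.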

Supporting Information

Dyn's RCA:

https://www.dynstatus.com/incidents/5r9mppc1kb77

Report Team: SRE-Production-Services

Posted Oct 26, 2016 - 15:57 MDT

Resolved
Our external monitoring systems are showing no issues resolving Ping endpoints hosted on our new provider. We will continue to monitor the DNS attack situation but no longer see any impact to Ping services.
Posted Oct 21, 2016 - 15:55 MDT
Monitoring
SRE has also migrated the pingidentity.com zone to the new DNS provider. We are seeing successful name resolution, and our monitoring systems are showing recovery. Customers who are still experiencing issues resolving Ping endpoints can try clearing their DNS caches.

SRE will continue to monitor the situation.
Posted Oct 21, 2016 - 14:34 MDT
Identified
We've migrated the pingone.com, pingone.eu, and pingidentity.eu DNS zones to an alternate DNS provider. Resolution of these zones should propagate through DNS infrastructure in the next hour. SRE is continuing to work through the other impacted zones.
Posted Oct 21, 2016 - 13:37 MDT
Investigating
At this time, Ping's DNS provider continues to be under attack. Ping's SRE team is continuing to monitor the situation and investigate our own mitigations.
Posted Oct 21, 2016 - 12:54 MDT
Monitoring
The attack on DNS seems to have subsided for the time being, but SRE will continue to monitor the situation. We've increased our DNS TTLs, so a recurrence of the attack should have less impact on Ping service endpoints.
Posted Oct 21, 2016 - 12:02 MDT
Identified
We are updating the Time To Live (TTL) of many of our service endpoints in an attempt to mitigate the impact of the attack on our DNS provider.
Posted Oct 21, 2016 - 11:06 MDT
Update
Our Site Reliability team has been made aware that the earlier DNS resolution failures have returned, affecting several geographic locations in the United States, Europe, and Australia. While this issue does not impact all customers, many may have trouble loading PingOne services, including the Dock, SSO, and PingID. This issue also impacts our website, pingidentity.com.

The issue is due to a DDoS attack on an upstream managed DNS provider. We are continuing to monitor the status of their outage and will provide updates as we are able.
Posted Oct 21, 2016 - 10:31 MDT
Update
We are seeing a return of issues related to our DNS provider impacting multiple services.
Posted Oct 21, 2016 - 10:31 MDT
Investigating
Monitoring systems have detected an issue with PingOne's global administration portal (https://admin.pingone.com). The Site Reliability Engineering team has been notified and is currently working the issue to resolution. Site Reliability will update this message when the incident has been identified. Automated monitoring systems will update affected components and will resolve operational status as systems recover.

For additional questions please contact support@pingidentity.com, or follow this incident on https://status.pingidentity.com for real-time service updates.
Posted Oct 21, 2016 - 10:22 MDT
This incident affected: PingOne Services (North America Critical Path, Europe Critical Path, Australia Critical Path, Admin Portal Monitor, Directory API, Administration Portal) and Pingidentity.com Website (www.pingidentity.com, documentation.pingidentity.com, licensing.pingidentity.com).