PingOne Admin Portal Amazon S3 issues.
Incident Report for Ping Identity

Incident Summary

A service outage in the AWS US-EAST-1 region affected Amazon Simple Email Service (SES) and Simple Storage Service (S3), which our internal applications depend on for certain functions. Ping Site Reliability Engineering (SRE) responded by posting notifications of impact to the services that rely on S3 and SES, and by communicating the expected client impact.

Customer Impact

Following an unintended update to the Amazon S3 infrastructure by engineers at Amazon, dependent systems began to return errors and fail. Because the PingOne and PingID infrastructure is distributed, the disruption did not halt all components. During the disruption, customers may have experienced the following abnormal application behavior:

  • Dock icons not loading
  • Applications in the admin portal could not be loaded or edited
  • Admin portal intermittently returning 504 gateway timeouts
  • PingID emails not sent due to the impact on Amazon SES
  • PingID authenticator returning 504 gateway timeouts

The root cause and preventive actions from our provider, Amazon Web Services, can be found at this URL: https://aws.amazon.com/message/41926/.

Incident Timeline - February 28, 2017 (MT)

  • 1132 - First reports of issue - Customer cannot access dashboard
  • 1135 - Other support team members confirm issue
  • 1209 - SRE creates incident with status Identified
  • 1249 - Support team reports PingID email is affected
  • 1342 - Support team reports some improvement, but still issues with icons
  • 1352 - AWS reports S3 service operations beginning to recover
  • 1412 - AWS reports object retrieval, listing, and deletion functions are fully recovered. Adding new objects still impacted
  • 1508 - AWS reports functionality to S3 fully recovered
  • 1530 - After confirming functionality restored, SRE updates incident to resolved
  • 1610 - eBay reports continued issues with PingID emails due to SES impact
  • 1745 - AWS reports SES functionality fully recovered
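
To express the timeline as elapsed time, the following minimal Python sketch computes the minutes from the first report (1132) to selected later entries. The helper name `minutes_between` is hypothetical, and the sketch assumes every timestamp falls on the same day, as they do in this timeline.

```python
from datetime import datetime


def minutes_between(start: str, end: str) -> int:
    """Return whole minutes between two same-day HHMM timestamps."""
    fmt = "%H%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


first_report = "1132"
print(minutes_between(first_report, "1209"))  # incident created: 37
print(minutes_between(first_report, "1530"))  # incident resolved: 238
print(minutes_between(first_report, "1745"))  # SES fully recovered: 373
```

By this calculation, the incident was created 37 minutes after the first report, marked resolved after 3 hours 58 minutes, and SES was reported fully recovered after 6 hours 13 minutes.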

Affected Services

PingOne Services

  • Admin Portal Monitor
  • Administration Portal
  • PingOne dock (North America)

PingID Services

  • PingID App
  • PingID Authenticator (North America)

Resolution

Resolution depended on AWS restoring availability of the S3 and SES services.

Ping Action Items

  • Improve the PingOne Admin Portal to allow creation and editing of applications while S3 is unavailable.
  • Enable cross-region replication for critical S3 resources.
  • Support multiple regions for SES, or another redundant email service, including MX records in multiple regions.
  • Add detection of SES outages.
  • When AWS is detected to be down, switch to a redundant email service or to SES in a different AWS region.
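
As an illustration of the last two action items, outage detection and failover to another region or email service might be sketched as follows. This is a minimal sketch under stated assumptions, not Ping's implementation: the class names `RegionHealth` and `FailoverMailer`, the window size, and the error-rate threshold are all hypothetical, and each sender is a plain callable so the logic stays independent of any particular email SDK.

```python
from collections import deque
from typing import Callable, Deque, Dict, List

# Hypothetical sender signature: (recipient, body) -> None, raising on failure.
Sender = Callable[[str, str], None]


class RegionHealth:
    """Tracks recent send outcomes for one backend in a fixed-size window."""

    def __init__(self, window: int = 20, max_error_rate: float = 0.5) -> None:
        self.outcomes: Deque[bool] = deque(maxlen=window)
        self.max_error_rate = max_error_rate  # assumed threshold

    def record(self, ok: bool) -> None:
        self.outcomes.append(ok)

    def is_healthy(self) -> bool:
        if not self.outcomes:
            return True  # no data yet: assume healthy
        failures = self.outcomes.count(False)
        return failures / len(self.outcomes) < self.max_error_rate


class FailoverMailer:
    """Sends through the first healthy backend, falling back on failure."""

    def __init__(self, senders: Dict[str, Sender]) -> None:
        # Dict insertion order is the preference order (primary first).
        self.senders = senders
        self.health = {name: RegionHealth() for name in senders}

    def send(self, to: str, body: str) -> str:
        # Healthy backends first; the sort is stable, so preference
        # order is kept within each group.
        order = sorted(self.senders, key=lambda n: not self.health[n].is_healthy())
        errors: List[str] = []
        for name in order:
            try:
                self.senders[name](to, body)
            except Exception as exc:
                self.health[name].record(False)
                errors.append(f"{name}: {exc}")
            else:
                self.health[name].record(True)
                return name  # name of the backend that delivered
        raise RuntimeError("all email backends failed: " + "; ".join(errors))
```

In practice each sender could wrap an SES client bound to one region, or a third-party email service. Note that this sketch only retries a backend marked unhealthy when every healthier backend has failed, so a production version would also need a periodic probe to detect recovery of the primary.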
Posted Mar 08, 2017 - 08:54 MST

Resolved
Amazon has fully recovered operations for the S3 service. The previously reported impacts to PingOne Admin Portal, PingID, and PingOne Dock are resolved.
Posted Feb 28, 2017 - 15:30 MST
Update
Amazon is reporting that S3 has started to recover from the service impact. Users should expect to see lower error rates within the hour.
Posted Feb 28, 2017 - 14:11 MST
Update
Due to Amazon S3 issues in the US-East region, we are seeing issues with PingOne Admin Portal, PingID, and PingOne Dock. PingOne SSO is not impacted at this time. SRE is aware of the issue, is monitoring it, and will post an update when further information is available.

Amazon S3 service status can be tracked at https://status.aws.amazon.com/

For additional questions, please contact support@pingidentity.com, or follow this incident on https://status.pingidentity.com for real-time service updates.
Posted Feb 28, 2017 - 13:44 MST
Identified
Due to Amazon S3 issues in the US-East region, we are seeing issues with PingOne Admin Portal. SSO is not impacted. SRE is aware of the issue, is monitoring it, and will post an update when further information is available.
For additional questions, please contact support@pingidentity.com, or follow this incident on https://status.pingidentity.com for real-time service updates.
Posted Feb 28, 2017 - 12:09 MST
This incident affected: PingID Services (PingID App) and PingOne Services (Administration Portal, PingOne dock (North America)).