Issues with Email notification in PingOne Services
Incident Report for Ping Identity
Postmortem

Incident Summary

While removing dedicated IPs used for email, we placed the IPs into a “standby” mode on Amazon’s recommendation. During this process, an issue in Amazon SES (Simple Email Service) left zero active IP addresses in the dedicated pool, and our services failed to send any email from 20:25 UTC to 23:19 UTC on June 11, 2020. The issue was resolved after Amazon removed the IPs from “standby” mode.

Customer Impact

All workflows that depend on email, including one-time passcode (OTP) emails, were disrupted for the duration of the outage.

Incident Timeline

June 11, 2020 (all times in UTC)

  • Around 20:19 - AWS put our IPs into "standby"
  • 20:25 - All emails stopped going out
  • 21:19 - SRE was notified by Support that customers’ OTP email was down
  • 21:29 - SRE opened a case about emails being down (case # 7089253821)
  • 22:31 - Errors returned by SES and troubleshooting by AWS support indicated that emails were being throttled, so SRE opened another ticket to increase the throttling limits (case # 7089347421)
  • 23:19 - Additional troubleshooting with AWS indicated that the “standby” IPs might be the issue. AWS removed the IPs from "standby" mode, and emails started flowing again from this point on.

Affected Services

All workflows sending email through AWS SES in NA and EU (Ireland only).

Resolution

Removing the IPs from the “standby” mode resolved the issue.

Ping Action Items

  • Improve alerting around SES so that an alert fires when the number of emails being sent drops below a threshold.
  • Improve the process to limit future AWS provisioning events to a single region where possible.
  • Improve email delivery resiliency by leveraging multiple IP pools in multiple regions.
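
As an illustration of the first action item, the following is a minimal sketch of a low-volume check, not a description of the alerting Ping actually deployed. It assumes per-interval send counts are already available from a monitoring system; the class name, function name, threshold, and sample counts are all hypothetical. The alert fires only after the count stays below the threshold for several consecutive intervals, so that a single quiet interval does not trigger a page.

```python
from collections import deque
from typing import Deque, Iterable, Optional


class LowVolumeAlert:
    """Hypothetical check: fire when sends stay below a minimum
    for a given number of consecutive intervals."""

    def __init__(self, min_sends_per_interval: int, consecutive_intervals: int) -> None:
        if consecutive_intervals < 1:
            raise ValueError("consecutive_intervals must be at least 1")
        self.min_sends = min_sends_per_interval
        # Keep only the most recent N interval counts.
        self.window: Deque[int] = deque(maxlen=consecutive_intervals)

    def observe(self, sends_in_interval: int) -> bool:
        """Record one interval's send count; return True if the alert should fire."""
        self.window.append(sends_in_interval)
        window_full = len(self.window) == self.window.maxlen
        return window_full and all(count < self.min_sends for count in self.window)


def first_alert_index(counts: Iterable[int], min_sends: int, intervals: int) -> Optional[int]:
    """Return the index of the first interval at which the alert fires, or None."""
    alert = LowVolumeAlert(min_sends, intervals)
    for index, count in enumerate(counts):
        if alert.observe(count):
            return index
    return None


if __name__ == "__main__":
    # Made-up counts: email volume falls to zero from the third interval onward.
    sample_counts = [120, 95, 0, 0, 0, 0]
    print(first_alert_index(sample_counts, min_sends=10, intervals=3))
```

The threshold and the number of intervals would need tuning against normal traffic patterns; a threshold set too high would page during ordinary quiet periods, while one set too low would delay detection.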
Posted Jun 16, 2020 - 21:36 UTC

Resolved
This incident has been resolved.
Posted Jun 12, 2020 - 05:59 UTC
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Jun 11, 2020 - 23:29 UTC
Identified
The issue has been identified and a fix is being implemented.
Posted Jun 11, 2020 - 23:17 UTC
Investigating
We are experiencing issues with Email notifications in PingOne Services for NA. The Site Reliability Engineering team has been notified and is currently working on the issue. We will update this message when the issue has been identified.

For additional questions please contact Ping Identity Technical Support by opening a case through The Community/Support site (https://www.pingidentity.com/en/account/sign-on.html), or follow this incident on https://status.pingidentity.com for real-time service updates.
Posted Jun 11, 2020 - 22:01 UTC
This incident affected: PingID Services (PingID Authenticator - North America (.com), Notification Services).