On September 16, 2024, from 5:00am to 1:37pm Singapore time, AppKit Embedded Wallet email login and email delivery for cloud.walletconnect.com were broken because of an outage at Postmark, our email delivery service.
We don’t know the exact number of affected customers, but we assume it was at least several dozen.
Timeline (all times Singapore time):
The issue started at 5:00am. An internal user reported it at 9:46am. An operator started investigating at 11:07am and reproduced the issue.
The operator suspected that Magic, the key management and authentication layer backing the AppKit Wallet, was at fault. The operator paged Magic in Slack with evidence suggesting that the problem was not Postmark.
At 11:32am, Magic provided evidence that the issue appeared to be limited to Postmark.
The operator created an account with Sendgrid, an alternative email provider, but the account was blocked by Sendgrid’s fraud detection for unknown reasons, so the operator could not proceed.
At 1:38pm, the operator noticed that they could disable the custom SMTP provider setting and rely on Magic’s own email provider, which fails over to Sendgrid.
Around the same time, another operator switched Cloud’s email delivery from Postmark to Supabase’s built-in mailing.
Because Cloud was then being rate limited by Supabase, this second operator also created a Sendgrid account and switched Cloud to Sendgrid.
At ~4:30pm, the second Sendgrid account was also blocked.
At 6:40pm Singapore time, the Magic configuration was switched back to Postmark so that emails would once again be sent from a @walletconnect.com address.
The root cause was that Postmark’s SSL certificate expired at 5:00am Singapore time.
Why did AppKit email login and Cloud email delivery fail? Because emails were not delivered.
Why were emails not delivered? Because Postmark, the outgoing email service we use for both platforms, had an outage.
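We don’t run email login on either Cloud or AppKit as a Canary flow: the Canary flows we have don’t exercise sign-up (Cloud) or email login (AppKit). As a result, the outage was only noticed when an internal user reported it, nearly five hours after it started.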
The operator was not aware that disabling the custom SMTP provider setting in Magic was an option, even though it would have restored delivery through Magic’s own provider.
The operator should have asked Magic, who were already helping with remediation, whether they had ideas for resolving the issue more quickly.
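The detection gap described above could be closed with a canary that exercises the email flow end to end. It would trigger a login email to a dedicated test mailbox and report failure if the email doesn’t arrive before a deadline. The following is a minimal Python sketch. The `trigger` and `read_inbox` hooks, the nonce check, and the timeout values are hypothetical placeholders. They are not an existing WalletConnect, Magic, or Postmark API, and a real canary would wire them to the actual login endpoint and a test mailbox.

```python
import time
from typing import Callable, Iterable

# Hypothetical hooks (assumptions, not real APIs):
#   trigger(address)    -> starts an email login for `address` and returns a unique
#                          nonce that should appear in the resulting email
#   read_inbox(address) -> returns the bodies of emails currently in the test mailbox
TriggerFn = Callable[[str], str]
InboxFn = Callable[[str], Iterable[str]]


def email_login_canary(
    address: str,
    trigger: TriggerFn,
    read_inbox: InboxFn,
    timeout_s: float = 120.0,
    poll_interval_s: float = 5.0,
) -> bool:
    """Return True if an email containing the nonce arrives before the deadline."""
    nonce = trigger(address)
    deadline = time.monotonic() + timeout_s
    while True:
        # Look for the nonce in any received email body.
        if any(nonce in body for body in read_inbox(address)):
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False  # not delivered in time; the caller should alert on-call
        time.sleep(min(poll_interval_s, remaining))


if __name__ == "__main__":
    # In-memory stand-ins for the login endpoint and the test mailbox.
    mailbox: dict[str, list[str]] = {}

    def fake_trigger(addr: str) -> str:
        nonce = "canary-123"
        mailbox.setdefault(addr, []).append(f"Your login code: {nonce}")
        return nonce

    ok = email_login_canary(
        "canary@example.com",
        fake_trigger,
        lambda addr: mailbox.get(addr, []),
        timeout_s=1.0,
        poll_interval_s=0.1,
    )
    print("email login canary passed" if ok else "ALERT: login email not delivered")
```

If a check like this ran on a schedule for both Cloud sign-up and AppKit email login, it would likely have started failing shortly after 5:00am instead of waiting for a user report. The timeout should sit comfortably above normal delivery latency to avoid false alarms.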
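Short-term: have a verified Sendgrid account ready in advance for redundancy (both accounts created during the incident were blocked by fraud detection), or investigate automatic failover.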
Mid-term: consider covering email flows (Cloud sign-up and AppKit email login) in Canaries.
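Where we control the sending path, the automatic-failover action item could take the shape of a sender that tries providers in priority order and falls through to the next one on error. The following is a minimal Python sketch under that assumption. `EmailMessage`, `ProviderError`, `send_with_failover`, and `FakeProvider` are hypothetical names. The "postmark" and "sendgrid" strings are only labels on in-memory fakes, and the code calls no real Postmark or Sendgrid API.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence


@dataclass
class EmailMessage:
    to: str
    subject: str
    body: str


class ProviderError(Exception):
    """Raised by a provider when it cannot accept a message."""


class EmailProvider(Protocol):
    name: str

    def send(self, message: EmailMessage) -> None: ...


class AllProvidersFailed(Exception):
    def __init__(self, errors: dict[str, Exception]):
        super().__init__(f"all email providers failed: {sorted(errors)}")
        self.errors = errors


def send_with_failover(providers: Sequence[EmailProvider], message: EmailMessage) -> str:
    """Try each provider in priority order; return the name of the one that succeeded."""
    errors: dict[str, Exception] = {}
    for provider in providers:
        try:
            provider.send(message)
            return provider.name
        except ProviderError as exc:
            errors[provider.name] = exc  # record the failure and try the next provider
    raise AllProvidersFailed(errors)


class FakeProvider:
    """In-memory stand-in for a real email provider client (hypothetical)."""

    def __init__(self, name: str, healthy: bool):
        self.name = name
        self.healthy = healthy
        self.sent: list[EmailMessage] = []

    def send(self, message: EmailMessage) -> None:
        if not self.healthy:
            raise ProviderError(f"{self.name}: outbound SMTP unavailable")
        self.sent.append(message)


if __name__ == "__main__":
    primary = FakeProvider("postmark", healthy=False)  # simulates the outage
    backup = FakeProvider("sendgrid", healthy=True)
    msg = EmailMessage("user@example.com", "Your login code", "123456")
    print(send_with_failover([primary, backup], msg))  # prints "sendgrid"
```

This sketch only helps if the backup account is provisioned and approved before an incident. A production version would probably also add health checks or a circuit breaker so a failing primary isn’t retried on every message. It would also keep the @walletconnect.com sender consistent across providers.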
Postmark status notice: https://status.postmarkapp.com/notices/5jmmv4cyfqboak2v-service-issue-outbound-smtp-sending-issues