Final Post Mortem Summary
At approximately 12:30 UTC on Monday 12th December, a breaking change was released to the Yapily production instance, causing payments to fail for some financial institutions. Yapily teams were first alerted at 13:01, and our major incident management process was launched at 13:41. Service was impacted for 1 hour and 58 minutes.

Root Cause:
A refactoring and simplification of the registration logic for constructing an Open Banking Statement introduced an error in Key ID registration in one of our bank-facing services. The error affected only a subset of institutions: the impact was limited to 1) the specific banks integrated via this service and 2) those requiring Key IDs.

Resolution and Recovery:
After investigation, our engineering team began rolling back the breaking change at 14:01, and the rollback was completed at 14:27. The rollback mitigated the impact for customers.

Corrective and Preventative Measures:
In response to this incident, we have identified the following improvements:
- Increased testing on legacy code changes, to prevent recurrence.
- Increased ownership and accountability on code changes, to prevent recurrence.
- Review of our incident escalation plan, to decrease our overall impact duration.
- Review of our current alerting, to decrease our overall impact duration.

Timeline (UTC):
12:30 - A breaking change was released to production, causing some financial institutions to become unavailable.
13:01 - First alert of the ongoing issue.
13:41 - Major incident was raised by Yapily staff.
14:01 - Our engineering team began rolling back the breaking change identified.
14:27 - The rollback was completed and all services were restored.
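
For readers who want to see how long each phase of the incident took, the following is a minimal sketch that derives the intervals from the timeline timestamps. The function names and event labels are illustrative assumptions, not part of any Yapily tooling. Because the release time is given as approximate, the total computed from these rounded timestamps may differ by a minute from the stated impact duration.

```python
from datetime import datetime

# Timestamps copied from the timeline (UTC, same day).
# The labels are shorthand chosen for this sketch.
TIMELINE = [
    ("12:30", "breaking change released"),
    ("13:01", "first alert"),
    ("13:41", "major incident raised"),
    ("14:01", "rollback started"),
    ("14:27", "rollback completed"),
]


def minutes_between(start: str, end: str) -> int:
    """Whole minutes between two HH:MM timestamps on the same day."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


def phase_durations(timeline):
    """Return (from_event, to_event, minutes) for each consecutive pair."""
    return [
        (a_label, b_label, minutes_between(a_time, b_time))
        for (a_time, a_label), (b_time, b_label) in zip(timeline, timeline[1:])
    ]


if __name__ == "__main__":
    for start, end, mins in phase_durations(TIMELINE):
        print(f"{start} -> {end}: {mins} min")
    total = minutes_between(TIMELINE[0][0], TIMELINE[-1][0])
    print(f"release -> recovery: {total} min")
```

Splitting the duration this way separates time to detect, time to escalate, and time to roll back, which are the intervals the escalation and alerting reviews aim to shorten.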