Report on downtime – 25 April 2020
Over the past weekend, Bitstamp experienced a period of downtime, during which our web page and trading services were temporarily unavailable. After we restored all systems, we began carrying out a detailed internal investigation into the event and taking steps to prevent similar situations in the future. The following is a summary of the information we have available at this time.
What happened?
Technical problems affecting core parts of our infrastructure, including our matching engine, started on Saturday 25 April 2020 at 20:00 UTC because the servers of one of our infrastructure providers were experiencing issues. The resulting downtime lasted about four and a half hours, until 00:30 UTC on Sunday 26 April.
Trading, withdrawals and deposits were offline during this period and our web page, mobile app and APIs were unavailable.
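For readers who want to check the timeline, the durations can be worked out from the timestamps given in this report: the start of the downtime at 20:00 UTC, the return in cancel-only mode at 00:30 UTC, and full restoration at 00:45 UTC. The short sketch below is purely illustrative, and the variable names are our own.

```python
from datetime import datetime, timezone

# Timestamps as stated in this report (all UTC).
outage_start = datetime(2020, 4, 25, 20, 0, tzinfo=timezone.utc)
cancel_only_start = datetime(2020, 4, 26, 0, 30, tzinfo=timezone.utc)
full_restore = datetime(2020, 4, 26, 0, 45, tzinfo=timezone.utc)

# Time during which the platform was completely unavailable.
downtime = cancel_only_start - outage_start
# Time spent in cancel-only mode before trading fully resumed.
cancel_only_window = full_restore - cancel_only_start
# Total time from the first problems to full restoration.
total = full_restore - outage_start

print(downtime)            # 4:30:00
print(cancel_only_window)  # 0:15:00
print(total)               # 4:45:00
```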
How did Bitstamp respond?
Our systems immediately detected the issue, automatically put our trading services on hold and alerted our technical team. Our engineers started investigating and fixing the issue as soon as they were alerted, working on multiple parts of the platform in parallel.
We discovered that the issues with our infrastructure provider’s servers had also affected our active failovers. We therefore switched to our off-site backup, which worked as intended and allowed us to recover our systems completely.
Due to the unprecedented nature of this event, our engineers conducted extensive reviews in order to make sure the issues were resolved and all systems were functioning as expected before re-launching our services. This cost us some time but allowed us to ensure a 100% successful recovery with no lingering issues.
Our matching engine was fully operational by 00:30 UTC, at which point our platform came back online in cancel-only mode, allowing clients to cancel previously placed orders. All trading services were fully restored and operational by 00:45 UTC.
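As a general illustration of what cancel-only mode means, the sketch below shows one way an order gateway could gate requests by operating mode. It is a minimal example under stated assumptions and does not describe the production implementation: the `Mode` enum, the `OrderGateway` class and its methods are hypothetical names invented for this example.

```python
from enum import Enum


class Mode(Enum):
    HALTED = "halted"            # no order activity allowed
    CANCEL_ONLY = "cancel_only"  # existing orders may be cancelled only
    OPEN = "open"                # normal trading


class OrderGateway:
    """Hypothetical gateway that accepts or rejects requests by mode."""

    def __init__(self, restored_orders=None):
        # Start halted; open orders would be restored during recovery.
        self.mode = Mode.HALTED
        self.open_orders = dict(restored_orders or {})

    def place(self, order_id, details):
        # New orders are accepted only during normal trading.
        if self.mode is not Mode.OPEN:
            raise PermissionError(f"cannot place orders in mode {self.mode.value}")
        self.open_orders[order_id] = details

    def cancel(self, order_id):
        # Cancellations are accepted in cancel-only and open modes.
        if self.mode is Mode.HALTED:
            raise PermissionError("cannot cancel orders while halted")
        if order_id in self.open_orders:
            del self.open_orders[order_id]
            return True
        return False
```

In this sketch, a gateway switched to `Mode.CANCEL_ONLY` lets clients withdraw orders that were open before the outage while rejecting new ones, so nobody is matched against stale orders when `Mode.OPEN` is set afterwards.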
Next steps
Our backups worked as intended and recovery was 100% successful. Apart from the downtime itself, this event had no consequences for our services or our customers’ accounts.
We are now implementing a remediation plan that will improve our systems’ ability to handle critical failures at infrastructure providers, and we are developing robust protocols that will help us respond even more efficiently in the future.
Thank you for your continued support during this time.