
Reliance on cloud services reached new heights in 2023, and with it came a series of high-profile disruptions. This article reviews 10 of the year’s most significant cloud outages and the lessons they offer wealth and asset managers. Beyond recounting the incidents, it ties them to broader considerations such as supplier standards, data security, and strategies for future-proofing.

The year kicked off with Microsoft Teams and Microsoft 365 users facing a substantial outage in North America. As thousands grappled with server, application, and login issues, the incident underscored the far-reaching impact of disruptions in Microsoft’s ecosystem. This outage, coupled with a networking issue later in January, set the tone for the challenges ahead.

Lesson Learned: Relying exclusively on a single provider creates a single point of failure.

In the same month, IT Glue, a documentation software vendor, underwent emergency maintenance, disrupting services for users globally. While the platform restored functionality, the incident highlighted the susceptibility of even niche services to unexpected interruptions.

Lesson Learned: Even seemingly niche providers should be scrutinized for their resilience and recovery capabilities.

Despite Oracle’s bold claims that its cloud infrastructure “doesn’t go down,” February witnessed a multi-day outage affecting users globally. The issue, rooted in backend infrastructure challenges, dispelled the myth of invincibility surrounding major cloud players.

Lesson Learned: Assurances from providers should be validated, and contingency plans should be in place.

March brought an outage in Microsoft Exchange Online that prevented users from accessing their mailboxes. Resolving the incident involved addressing directory-based edge blocking, revealing the intricate web of dependencies within Microsoft’s services.

Lesson Learned: Understanding the interplay of services is crucial for mitigating the impact of disruptions.

Datadog’s almost two-day outage in March prompted concerns about revenue and raised questions about the resilience of cloud monitoring and security tools. The incident, attributed partly to an operating system update, emphasised the need for effective communication during crises.

Lesson Learned: Communicate with users early and regularly about incidents and known risks to maintain trust.

April saw hundreds of AWS users grappling with an outage that lasted over three hours. The disruption affected services from account sign-ups to voice assistant Alexa, highlighting the broad impact a cloud outage can have on various applications.

Lesson Learned: Diversification across cloud services can mitigate the impact of a single provider’s outage.

April also brought Microsoft a series of outages affecting Microsoft 365 online applications, Teams, SharePoint Online, and Outlook. The recurrence of disruptions underscored the need for comprehensive contingency plans.

Lesson Learned: Regularly review and update contingency plans to adapt to evolving challenges.

A fire in a Paris data centre wreaked havoc on Google Cloud services, affecting more than 90 cloud services for European users. The incident shed light on the physical risks that can impact cloud infrastructure.

Lesson Learned: Consider physical risks and geographic diversity when choosing cloud providers.

April brought disruptions to the Oracle-Cerner Electronic Health Record system, impacting critical healthcare services. The incidents highlighted the potential consequences of outages in essential systems.

Lesson Learned: Mission-critical services should have robust backup and recovery mechanisms.

As June unfolded, Microsoft faced multiple outages, with Microsoft 365 users and the Azure portal experiencing disruptions. The incidents, including a claimed DDoS attack, showcased the evolving nature of cyber threats and their potential to cause widespread outages.

Lesson Learned: Cybersecurity measures should be dynamic and adaptive to emerging threats.

  1. Multi-Cloud Strategy: Embrace a multi-cloud strategy to distribute dependencies across different providers, mitigating the impact of outages from a single source.
  2. Data Backup: Prioritise regular data backups to ensure quick recovery in the event of a cloud outage. This practice safeguards essential data and minimises potential losses.
  3. Service Level Agreements (SLAs): Familiarise yourself with service level agreements, enabling you to claim credits and refunds in case of service disruptions. Understanding SLAs empowers users to hold providers accountable.
  4. Continuous Monitoring: Implement continuous monitoring of cloud services to detect abnormalities and potential issues early on. Proactive monitoring allows for swift responses to emerging threats.
  5. Communication and Transparency: Learn from incidents like Datadog’s outage and prioritise clear communication with users during disruptions. Transparent communication builds trust and helps manage user expectations.
  6. Diversification of Suppliers: Extend the evaluation of cloud service reliability to your suppliers and to the providers they depend on in turn. A comprehensive assessment of the entire supply chain enhances overall resilience.
  7. Evaluating Disaster Recovery Plans: Assess and refine disaster recovery plans, considering both virtual and physical threats. The incident involving Google’s Paris data centre underscores the importance of preparing for unforeseen events.
  8. Understanding SLA Credits: Familiarise yourself with the terms of SLA credits offered by cloud service providers. This knowledge can be crucial in negotiating compensation for downtime.
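To make the SLA points above concrete, the short Python sketch below converts downtime into a monthly availability figure and looks up a service credit. It is an illustration only: the function names and the credit tiers in `EXAMPLE_CREDIT_TIERS` are hypothetical and not taken from any provider's agreement, so substitute the schedule from your own contract.

```python
MINUTES_PER_DAY = 24 * 60


def availability_pct(downtime_minutes: float, days_in_month: int = 30) -> float:
    """Return monthly availability as a percentage of total minutes."""
    total_minutes = days_in_month * MINUTES_PER_DAY
    if not 0 <= downtime_minutes <= total_minutes:
        raise ValueError("downtime must be between 0 and the minutes in the month")
    return 100.0 * (total_minutes - downtime_minutes) / total_minutes


# Hypothetical schedule: (availability floor in %, credit as % of monthly fee).
# Real schedules differ by provider and service; read the actual SLA.
EXAMPLE_CREDIT_TIERS = [
    (99.9, 0.0),
    (99.0, 10.0),
    (95.0, 25.0),
    (0.0, 100.0),
]


def credit_pct(availability: float, tiers=EXAMPLE_CREDIT_TIERS) -> float:
    """Return the credit for the first tier whose availability floor is met."""
    for floor, credit in tiers:
        if availability >= floor:
            return credit
    return tiers[-1][1]


if __name__ == "__main__":
    # A three-hour outage in a 30-day month.
    availability = availability_pct(downtime_minutes=180)
    print(f"Availability: {availability:.3f}%")
    print(f"Credit under the example tiers: {credit_pct(availability)}%")
```

Under these assumed tiers, a single three-hour outage in a 30-day month leaves availability at roughly 99.58%, which is below a 99.9% commitment. Note that many agreements only pay credits when the customer files a claim within a set window, which is another reason to know the terms in advance.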

The 10 cloud outages of 2023 serve as an intricate tapestry of challenges and lessons. Wealth and asset managers must not only learn from these specific incidents but also weave these lessons into the broader fabric of their digital strategies. From supplier standards to data security and diversification, the key to navigating the storm lies in a comprehensive and adaptive approach. As the digital landscape evolves, so must the strategies that underpin it, ensuring a resilient and secure future for wealth and asset management in the cloud era.

VENDOR iQ by Graphene
