On 19 July 2024, a faulty content update for CrowdStrike's Falcon sensor caused Windows machines worldwide to crash with the infamous 'blue screen of death' (BSoD). The failure knocked out an estimated 8.5 million Windows devices, leaving banks, airports, emergency services and businesses in other industries unable to operate.
CrowdStrike issued an official statement clarifying that this was not a cyber attack or security incident, and published a temporary workaround. CEO George Kurtz also reiterated on the social network X that 'there was no security incident or hacker attack'. The company pledged to publish a full analysis of the causes once its investigation was complete.
What did we learn from this blackout?
1. Only trust official sources: CrowdStrike made it clear that this was not a cyber attack, so it is crucial to rely only on official company releases to avoid misinformation.
2. Beware of fake domains: during crises like this, cyber criminals exploit the situation to run scams and genuine cyber attacks. For example, cyber criminals were reported to be creating fake websites that imitate the official CrowdStrike websites.
3. Always have a plan B: the outage hit the Western world particularly hard, while the East suffered fewer service disruptions because companies there relied on alternative security providers. This highlights the need for a contingency plan and a diversified supplier base, especially for companies with a global presence.
4. Crisis management and corporate communication: companies must be prepared to react quickly, provide accurate information and manage public expectations to safeguard their reputation and customer trust. Emergency preparedness is crucial to minimise damage and restore trust quickly.
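Lesson 2 (fake domains) has a simple technical counterpart: before trusting a link that claims to come from a vendor, check that its host is the vendor's real domain or one of its subdomains, and treat near-miss names with suspicion. The sketch below assumes a hypothetical allowlist containing only `crowdstrike.com` and uses a crude, hypothetical similarity threshold to flag lookalikes; it illustrates the idea and is not a substitute for proper phishing protection or domain monitoring.

```python
from difflib import SequenceMatcher
from urllib.parse import urlsplit

# Hypothetical allowlist of domains the organisation treats as official for a vendor.
OFFICIAL_DOMAINS = {"crowdstrike.com"}


def _host(url: str) -> str:
    """Lower-cased host name of a URL without a trailing dot ('' if there is none)."""
    return (urlsplit(url).hostname or "").rstrip(".")


def is_official(url: str) -> bool:
    """True only for an allowlisted domain or one of its subdomains.

    The '.' prefix in the suffix check stops 'notcrowdstrike.com' from passing.
    """
    host = _host(url)
    return any(host == d or host.endswith("." + d) for d in OFFICIAL_DOMAINS)


def looks_like_official(url: str, threshold: float = 0.8) -> bool:
    """Heuristic: flag hosts that are NOT official but imitate an official domain."""
    if is_official(url):
        return False
    host = _host(url)
    for domain in OFFICIAL_DOMAINS:
        brand = domain.split(".")[0]  # e.g. "crowdstrike"
        if brand in host:  # brand name embedded in someone else's domain
            return True
        if SequenceMatcher(None, host, domain).ratio() >= threshold:
            return True  # near-miss spelling, e.g. a swapped letter (typosquatting)
    return False
```

With this sketch, `https://crowdstrike.com.example.net/fix` fails `is_official` (its real domain is `example.net`) and is flagged by `looks_like_official`, as is a typo such as `crowdstrlke.com`.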
How can these accidents be prevented?
Let's look at the measures CrowdStrike outlined in its preliminary Post Incident Review to prevent this type of incident and avoid downtime.
1. Software resilience and testing:
- Test rapid response content more thoroughly, using local developer testing, content update and rollback testing, stress testing, fuzzing, fault injection, stability testing and content interface testing.
- Add further validation checks to the Content Validator to prevent problematic content from being released in the future.
- Enhance the existing error handling in the Content Interpreter, so that faulty content is handled gracefully rather than crashing the system.
2. Rapid response content distribution:
- Implement a phased deployment strategy for rapid response content, in which updates reach progressively larger portions of the installed base, starting with a canary deployment.
- Improve monitoring of sensor and system performance, collecting feedback during the deployment of rapid response content to guide the phased rollout.
- Give customers greater control over the delivery of rapid response content updates, with granular selection of when and where these updates are deployed.
- Provide details of content updates through release notes that customers can subscribe to.
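The first group of measures combines a pre-release check with defensive handling at runtime, plus fuzzing to confirm that bad input cannot crash the interpreter. The sketch below rests on loudly hypothetical assumptions. It uses a made-up content format in which every rule must carry exactly `EXPECTED_FIELDS` string fields. `validate_update` stands in for the Content Validator and `interpret` for the Content Interpreter. None of these names reflect CrowdStrike's actual code, which is not written in Python.

```python
import random
from dataclasses import dataclass

# Hypothetical content format: the code consuming the rules supplies exactly
# EXPECTED_FIELDS inputs, so every rule must have exactly that many fields.
EXPECTED_FIELDS = 3


@dataclass
class Rule:
    fields: list


def validate_update(rules: list) -> list:
    """Pre-release check (Content Validator role): return a list of problems."""
    problems = []
    for i, rule in enumerate(rules):
        if not isinstance(rule, Rule) or not isinstance(rule.fields, list):
            problems.append(f"rule {i}: malformed")
        elif len(rule.fields) != EXPECTED_FIELDS:
            problems.append(
                f"rule {i}: expected {EXPECTED_FIELDS} fields, got {len(rule.fields)}"
            )
        elif not all(isinstance(f, str) for f in rule.fields):
            problems.append(f"rule {i}: non-string field")
    return problems


def interpret(rule, inputs: list) -> bool:
    """Runtime matcher (Content Interpreter role) that never raises on bad content.

    A rule matches when every field equals the corresponding input or is '*'.
    Malformed rules are treated as non-matching instead of crashing the host.
    """
    try:
        fields = rule.fields
        if len(fields) > len(inputs):
            return False  # bounds check: would otherwise read past the inputs
        return all(f == "*" or f == inputs[i] for i, f in enumerate(fields))
    except (AttributeError, TypeError):
        return False


def fuzz_interpreter(iterations: int = 1000, seed: int = 0) -> int:
    """Tiny fuzz loop: feed random rules to interpret() and count exceptions."""
    rng = random.Random(seed)
    crashes = 0
    for _ in range(iterations):
        n = rng.randint(0, 6)
        fields = [rng.choice(["a", "b", "*", None, 7]) for _ in range(n)]
        try:
            interpret(Rule(fields), ["a", "b", "c"])
        except Exception:
            crashes += 1
    return crashes
```

The validator and the interpreter deliberately overlap. The validator should stop a malformed update from ever shipping, and the bounds check in `interpret` is a second line of defence if one slips through. In Python an out-of-range index would only raise `IndexError`. In native kernel-mode code, the same mistake can bring down the whole machine.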
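A phased rollout that starts with a canary and is gated by monitoring feedback can be sketched as a loop. The loop widens the audience only while health metrics stay within a limit. The stage sizes, the 0.1% crash-rate limit and the `deploy`, `crash_rate` and `rollback` hooks are hypothetical placeholders for whatever deployment and telemetry systems a vendor actually uses.

```python
from typing import Callable

# Hypothetical plan: fraction of the fleet that has the update after each stage.
# The first stage is the canary.
STAGES = [0.01, 0.10, 0.50, 1.00]


def phased_rollout(
    deploy: Callable[[float], None],
    crash_rate: Callable[[], float],
    rollback: Callable[[], None],
    max_crash_rate: float = 0.001,
) -> bool:
    """Widen the rollout stage by stage; stop and roll back if monitoring degrades.

    deploy(fraction): push the update to that fraction of hosts (hypothetical hook)
    crash_rate():     crash rate reported by monitoring for the current stage (hypothetical hook)
    rollback():       revert every host that received the update (hypothetical hook)
    Returns True only if the update reached the whole fleet.
    """
    for fraction in STAGES:
        deploy(fraction)
        if crash_rate() > max_crash_rate:
            rollback()
            return False
    return True
```

If the `crash_rate` hook reports a spike right after the canary stage, the loop stops at 1% of the fleet and rolls back instead of reaching every host at once. A real pipeline would also wait for a soak period between stages and honour each customer's choice of when and where updates are deployed.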
Conclusion
Companies should carefully assess the risks and opportunities associated with each supplier, and develop an action plan for switching to another supplier if needed. Future blackouts may not merely ground flights or halt financial activity. They could strike critical infrastructure such as health services and the electricity grid, which at crucial times of the year could put thousands of lives at risk.
Analysis by Vasily Kononov – Threat Intelligence Lead, CYBEROO