5th October 2021
Mark Edwards, Director of Cyber Security and Network Services, gives his view on the recent Facebook outage
Facebook have confirmed that the outage was caused by a scheduled change to BGP peering that went wrong, leaving users unable to connect. Technicians were unable to back out the change because it also cut off their remote access to the routers they had updated. This is a common risk when making routing updates: if you get it wrong, you may also lose your connection to the device you changed. Recovering then requires a physical visit to a router that might be in a different country. Local 'hands-on' engineers do not usually have the specialist skills needed to resolve such problems, which can extend resolution time.
Although this outage was caused by a non-malicious, failed change, it proves how a simple routing change can have a catastrophic impact on service availability. A malicious actor with access to routing infrastructure can easily cause more damage than one who accesses the servers or applications themselves. Network infrastructure security is often neglected and misunderstood, with many organisations failing to patch or secure routing devices, or to replace obsolete hardware. In my experience, network infrastructure such as routers and switches is often not even considered during security audits, with penetration testers preferring to focus all their efforts on servers or the applications themselves. Expert networking skills are becoming increasingly scarce, with very few people having an in-depth understanding of core Internet routing protocols such as BGP, which has been used on the Internet since 1994.
This major incident, potentially affecting billions of users, illustrates the importance of robust change management, including back-out plans, as recommended by standards such as ISO 27001 and IASME Governance. It also highlights how network infrastructure devices can be used to cause major service outages and therefore must be secured against malicious intent. Network infrastructure is often left out of common 'penetration tests', but standards such as the UK Government IT Health Check (ITHC), carried out by a competent partner, will address this.