BGP leak causing Internet outages in Japan and beyond.

Posted by Andree Toonk - August 26, 2017 - BGP instability
Yesterday some Internet users would have seen issues with their Internet connectivity, experiencing slowness or finding parts of the Internet unreachable. This incident hit users in Japan particularly hard, and it caused the Internal Affairs and Communications Ministry of Japan to start an investigation into what caused the large-scale Internet disruption that slowed or blocked access to websites and online services for dozens of Japanese companies.

In this blog post we will take a look at the root cause of these outages, who was affected and what networks were involved.

Starting at 03:22 UTC yesterday (Aug 25), followers of @BGPstream would have seen an increase in alerts involving Google. The BGPstream alerts informed us that Google was announcing the peering LAN prefixes of a few well-known Internet exchanges. This in itself is a fairly common type of incident and typically indicates that something isn’t quite right within the network hijacking those prefixes, so these alerts were the first clue that something was wrong with Google’s BGP advertisements.

A closer look at our data shows not only BGP hijack incidents but also a high number of BGP leak events. A random example is this one: 171.5.0.0/17 announced by AS45629 (Jastel out of Thailand), which all of a sudden became reachable via Google acting as a provider for Jastel. To demonstrate this let’s look at some example paths (not an exhaustive list):

 
1103 286 701 15169 45629
13335 9498 5511 701 15169 45629
202140 29075 5511 701 15169 45629
52342 20299 262206 701 15169 45629
If we take a closer look at the AS paths involved, starting at the right side, we see the prefix was announced by 45629 (Jastel) as expected. Since Jastel peers with Google (15169), that’s the next AS we see. The next AS in the path is 701 (Verizon), and this is where it gets interesting, as Verizon has now started to provide transit for Jastel via Google. Verizon (701) then announced that to several of its customers, some of them very large, such as KPN (286) and Orange (5511). So by just looking at 4 example paths we can see it hit large networks in Europe, Latin America, the US, and India (9498 Airtel).

In the example above we can see how Google accidentally became a transit provider for Jastel by announcing peer prefixes to Verizon. Since Verizon would select this path to Jastel, it would have sent traffic for this network towards Google. This happened not only for Jastel, but for thousands of other networks as well.
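To make the path analysis above concrete, here is a minimal Python sketch that flags the leak pattern in the four example paths. The helper names (parse_path, leaked_via) are our own for illustration and are not part of any BGPmon tooling. The check is deliberately simple: a path looks leaked when Verizon (701) sits directly to the left of Google (15169) while Google is not the origin.

```python
GOOGLE = 15169
VERIZON = 701

# Example paths copied from the post. The leftmost AS is the observer,
# the rightmost AS is the origin of the prefix.
PATHS = [
    "1103 286 701 15169 45629",
    "13335 9498 5511 701 15169 45629",
    "202140 29075 5511 701 15169 45629",
    "52342 20299 262206 701 15169 45629",
]


def parse_path(text):
    """Turn a space-separated AS path string into a list of integers."""
    return [int(asn) for asn in text.split()]


def leaked_via(path, leaker, receiver):
    """Return True if `receiver` learned the route from `leaker`
    although `leaker` is not the origin, i.e. `leaker` acts as transit."""
    if path[-1] == leaker:
        return False  # leaker originates the prefix itself, nothing unusual
    # Walk adjacent pairs: left is closer to the observer, right to the origin.
    return any(
        left == receiver and right == leaker
        for left, right in zip(path, path[1:])
    )


if __name__ == "__main__":
    for text in PATHS:
        print(text, "->", leaked_via(parse_path(text), GOOGLE, VERIZON))
```

A path that ends in 15169 is Google originating its own prefix towards Verizon, which is normal, so the sketch does not flag it.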

Google is not a transit provider, and traffic for 3rd party networks should never go through the Google network. Jastel has a few upstream providers, and with the addition of Google and Verizon the path has become longer, so it’s likely only Verizon customers (which is still significant) would have chosen this path, and only those that had no other alternative or specifically preferred Verizon over shorter paths. However, this is just the start.

 

A word about traffic engineering

Google is one of the largest (CDN) networks in the world. It has an open peering policy and is extremely well connected with many peers. It’s also the source of a large amount of traffic, with popular services such as YouTube, Google search, Google Drive, Google Compute, etc. As a result many networks exchange a significant volume of traffic with just Google, and those with direct peering with Google will want to make sure Google picks the right peering link with them. Large networks therefore deploy traffic engineering tricks to make sure traffic flows over the correct peering links with Google. The most powerful trick in the book is to de-aggregate and announce more specifics. Because routers forward on the longest matching prefix, the more specific prefixes are always preferred, no matter the AS path length or whatever local-pref Google sets locally.

A unique insight into Google’s network

Since Google essentially leaked a full table towards Verizon, we get to peek into what Google’s peering relationships look like and how their peers traffic engineer towards Google. Analyzing this data set we find many more specific prefixes, meaning prefixes that are not normally seen in the global Internet routing table (DFZ) and are only made visible to Google for traffic engineering purposes. Let’s take a look at an example. The prefix 114.154.133.0/24 is not normally seen on the Internet; instead it is announced as the larger aggregate 114.144.0.0/12 by AS4713 NTT OCN, the largest service provider in Japan. During the time of the incident we saw over 20,000 new OCN prefixes, all more specifics of their larger aggregate blocks (mainly their /11s, /12s, /13s, /14s and /15s). In this case OCN announced these more specific prefixes primarily to control how traffic comes in from Google. Once Google leaked these prefixes to Verizon as well, everyone seeing announcements for these prefixes would have sent traffic for them towards Verizon and Google, essentially turning a local traffic engineering trick into a much more global traffic engineering setup. Verizon customers and peers that saw these announcements would have preferred them over any other path, since more specifics always win.
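The "more specifics always win" rule can be illustrated with a small longest-prefix-match sketch using Python's standard ipaddress module. The two prefixes are the ones from the OCN example above; the route labels and the longest_prefix_match helper are our own illustrative assumptions, and a real router does this lookup in its forwarding table rather than in a dictionary.

```python
import ipaddress


def longest_prefix_match(routes, destination):
    """Return the (network, label) pair with the longest prefix that
    contains `destination`, or None if no route covers it."""
    addr = ipaddress.ip_address(destination)
    candidates = [(net, label) for net, label in routes.items() if addr in net]
    if not candidates:
        return None
    # The route with the largest prefix length is the most specific one.
    return max(candidates, key=lambda item: item[0].prefixlen)


# Illustrative routing table: the normal OCN aggregate plus the leaked /24.
routes = {
    ipaddress.ip_network("114.144.0.0/12"): "aggregate, normal path to AS4713",
    ipaddress.ip_network("114.154.133.0/24"): "more specific, via Verizon and Google",
}

if __name__ == "__main__":
    for dest in ("114.154.133.10", "114.150.1.1"):
        net, label = longest_prefix_match(routes, dest)
        print(dest, "->", net, "(" + label + ")")
```

An address inside the leaked /24 follows the leaked route even though the aggregate also covers it, while other addresses in the /12 keep using the aggregate. AS path length and local-pref never enter the comparison, because they only break ties between routes for the same prefix.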

   

Size and impact of this incident

If we look at which networks were impacted the most, we can see that AS4713 NTT OCN, the largest service provider in Japan, was impacted most severely. Our data shows over 24,000 new more specific prefixes for OCN were visible via Google and Verizon during the time of the incident.

We also saw over 7,000 new more specifics for AS7029 (Windstream). The total number of new prefixes (mostly more specifics) is around 50,000. For those interested, the top 30 affected networks can be found below.

All of these leaks were visible between 03:22 UTC and 04:01 UTC, or 12:22 PM to 1:01 PM Japan local time.

     
Number of new prefixes visible via Google and Verizon, per origin ASN (top 30):

Prefixes  ASN       ASN name
24834     AS4713    OCN - NTT Communications Corporation
7715      AS7029    Windstream Communications Inc
4650      AS8151    Uninet S.A. de C.V.
2852      AS1659    Taiwan Academic Network (TANet) Information Center
1746      AS3209    Vodafone GmbH
1315      AS2519    ARTERIA Networks Corporation
1218      AS28573   CLARO S.A.
614       AS9394    China TieTong Telecommunications Corporation
560       AS12715   Orange Espagne S.A.U.
506       AS27747   Telecentro S.A.
463       AS16814   NSS S.A.
430       AS12066   TRICOM
428       AS45510   TELCOINABOX PTY LTD
404       AS11830   Instituto Costarricense de Electricidad y Telecom.
369       AS39651   Com Hem AB
357       AS6400    Compañía Dominicana de Teléfonos, C. por A. - CODETEL
316       AS10318   CABLEVISION S.A.
280       AS5615    KPN B.V.
225       AS4181    TDS TELECOM
224       AS43205   Bulsatcom EAD
221       AS17908   Tata Communications
183       AS395105  HYTEC-7779
179       AS45194   Syscon Infoway Pvt. Ltd.
166       AS9676    SaveCom Internation Inc.
164       AS4764    Wideband Networks Pty Ltd, Transit AS
152       AS18106   Viewqwest Pte Ltd
140       AS45069   china tietong Shandong net
131       AS10481   Prima S.A.
128       AS13445   Cisco Webex LLC
126       AS13156   Cabovisao, televisao por cabovisao, sa
 

Closing thoughts

In total we saw over 135,000 prefixes visible via the Google - Verizon path. The widespread outages, particularly in Japan (OCN), were caused by the more specifics: they led many networks to reroute traffic toward Verizon and Google, which likely congested that path or perhaps hit some kind of ACL, resulting in the outages. Many BGPmon users would have seen an alert similar to the one below, informing them that new prefixes were being originated and were globally visible.

====================================================================
New prefix for AS14061 (Code: 60)
====================================================================
Detected new prefix:  178.62.96.0/19
Update time:          2017-08-25 03:25 (UTC)
Detected by #peers:   18
Announced by:         AS14061 (Digital Ocean, Inc.)
Upstream AS:          AS15169 (Google Inc.)
ASpath:               18356 38794 45796 2516 701 15169 14061
Monitoring is one simple thing operators can do to quickly detect this and take action. In this case the recommended course of action would have been to shut down the peering sessions with Google.
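As a rough illustration of the kind of check behind such an alert, the Python sketch below derives the upstream AS from an AS path and compares it with a list of upstreams the operator expects. This is not BGPmon's actual detection logic. The function names are our own, and the expected upstream set uses placeholder ASNs from the documentation range (64500, 64501), since the post does not list the real upstreams of AS14061.

```python
def upstream_and_origin(aspath):
    """Return (upstream, origin) for a space-separated AS path.
    Consecutive duplicates (AS path prepending) are collapsed first."""
    hops = []
    for hop in (int(h) for h in aspath.split()):
        if not hops or hops[-1] != hop:
            hops.append(hop)
    if len(hops) < 2:
        raise ValueError("need at least an upstream and an origin AS")
    return hops[-2], hops[-1]


def check_announcement(aspath, expected_origin, expected_upstreams):
    """Return an alert string, or None if the path looks as expected."""
    upstream, origin = upstream_and_origin(aspath)
    if origin != expected_origin:
        return "unexpected origin AS{}".format(origin)
    if upstream not in expected_upstreams:
        return "unexpected upstream AS{}".format(upstream)
    return None


# Hypothetical policy: placeholder upstream ASNs, not Digital Ocean's real ones.
EXPECTED_UPSTREAMS = {64500, 64501}

if __name__ == "__main__":
    # AS path taken from the alert above.
    path = "18356 38794 45796 2516 701 15169 14061"
    print(check_announcement(path, 14061, EXPECTED_UPSTREAMS))
```

Run against the path from the alert, the check reports AS15169 as an unexpected upstream, which is the signal an operator would act on.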

BGP leaks continue to be a great risk to the Internet's stability. It’s easy to make configuration mistakes that can lead to incidents like this. In this case it appears a configuration error or software problem in Google's network led to inadvertently announcing thousands of prefixes to Verizon, who in turn propagated the leak to many of its peers.

Since it is easy to make configuration errors, it is clearly necessary to have filters on both sides of an eBGP session. In this case it appears Verizon had few or no filters and accepted most if not all BGP announcements from Google, which led to widespread service disruptions. At a minimum Verizon should probably have a maximum-prefix limit on their side, and perhaps some as-path filters, which would have prevented the widespread impact.
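The two safeguards mentioned above can be sketched in a few lines of Python. This is a toy model under our own assumptions, not router configuration: the InboundSession class is hypothetical, the as-path filter is reduced to "the origin must be in an allowed set", and the maximum-prefix limit counts accepted prefixes and tears the session down when exceeded. Real implementations differ per vendor, for example in whether the limit is applied before or after policy.

```python
class InboundSession:
    """Toy model of the receiving side of an eBGP session."""

    def __init__(self, peer_asn, max_prefixes, allowed_origins=None):
        self.peer_asn = peer_asn
        self.max_prefixes = max_prefixes
        self.allowed_origins = allowed_origins  # None means no as-path filter
        self.accepted = set()
        self.shutdown = False

    def receive(self, prefix, aspath):
        """Return True if the announcement is accepted."""
        if self.shutdown:
            return False
        # as-path filter: the route must come from the peer and, if a filter
        # is configured, must originate in an allowed AS.
        if not aspath or aspath[0] != self.peer_asn:
            return False
        if self.allowed_origins is not None and aspath[-1] not in self.allowed_origins:
            return False
        # maximum-prefix limit: exceeding it shuts the session down and
        # withdraws everything learned over it.
        if prefix not in self.accepted and len(self.accepted) >= self.max_prefixes:
            self.shutdown = True
            self.accepted.clear()
            return False
        self.accepted.add(prefix)
        return True


# Announcements as Verizon would receive them from Google. The last two
# paths are taken from the post; the first is a Google-originated prefix.
ROUTES = [
    ("8.8.8.0/24", [15169]),
    ("171.5.0.0/17", [15169, 45629]),
    ("178.62.96.0/19", [15169, 14061]),
]

if __name__ == "__main__":
    filtered = InboundSession(15169, max_prefixes=1000, allowed_origins={15169})
    print([filtered.receive(p, path) for p, path in ROUTES])

    limited = InboundSession(15169, max_prefixes=2)
    print([limited.receive(p, path) for p, path in ROUTES], limited.shutdown)
```

Either mechanism alone contains the leak in this model: the as-path filter drops the third-party routes individually, while the maximum-prefix limit acts as a coarse safety net that trips once the peer sends far more routes than expected.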
