BGP flaps often occur due to packet loss along the path between underlying GRE endpoints. Gathering the right data will help to identify where this loss is occurring.
- This article will show you the data you can gather to help identify the cause of BGP flaps
- After reading, you will be able to identify the most likely source of BGP flaps while they are happening
- Note: Short BGP flaps are very common and are not necessarily an indication of a traffic-impacting event. Review: What Is BGP Flapping?
- Silverline DDoS
- GRE Tunnels
1. First, determine whether the issue can be found in local configurations and BGP logs
- Cisco: https://www.cisco.com/c/en/us/support/docs/ip/border-gateway-protocol-bgp/22166-bgp-trouble-main.html
- Juniper: https://www.juniper.net/documentation/en_US/junos/topics/topic-map/troubleshooting-bgp-sessions.html#id-checklist-for-verifying-the-bgp-protocol-and-peers
If the reason for the flapped sessions is still unclear after this point, continue to step 2.
2. Then, gather path health data for submitting to Silverline SOC
- While the session is flapping, take a traceroute from your local GRE endpoint to the Silverline GRE endpoint that the BGP session resides on.
- How To Take Traceroutes
- To find the remote Silverline GRE endpoint, check your router or look in the Portal under Config > Routed Configuration > GRE Tunnel Management > Deployed
- While the session is flapping, run a ping from your local GRE endpoint to the Silverline GRE endpoint that the BGP session resides on.
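The two captures in step 2 can be scripted so that they run back to back with timestamps. The following is a minimal sketch, not a Silverline tool. It assumes a Linux host that owns the local GRE endpoint address and has the iputils `ping` and the standard `traceroute` installed. The addresses are documentation placeholders, and the function names `build_commands`, `capture`, and `main` are hypothetical. If your GRE endpoint is a router, use the router's own ping and traceroute commands with the GRE endpoint as the source address instead.

```python
import datetime
import subprocess

# Placeholder addresses from the RFC 5737 documentation ranges (assumptions).
# Replace them with your local GRE endpoint and the Silverline GRE endpoint.
LOCAL_GRE_ENDPOINT = "192.0.2.10"
SILVERLINE_GRE_ENDPOINT = "198.51.100.20"


def build_commands(local_ip, remote_ip, ping_count=100):
    """Return ping and traceroute commands sourced from the local GRE endpoint."""
    return {
        # -c: number of echo requests, -I: source address (iputils ping)
        "ping": ["ping", "-c", str(ping_count), "-I", local_ip, remote_ip],
        # -n: skip DNS lookups, -s: source address (Linux traceroute)
        "traceroute": ["traceroute", "-n", "-s", local_ip, remote_ip],
    }


def capture(name, command, timeout=300):
    """Run one command and return its output under a UTC timestamp header."""
    started = datetime.datetime.now(datetime.timezone.utc).isoformat()
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout
        )
        body = result.stdout + result.stderr
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Covers a missing binary as well as a command that never finishes.
        body = f"command failed: {exc}\n"
    return f"=== {name} started {started} ===\n$ {' '.join(command)}\n{body}"


def main():
    """Print both captures; redirect the output to a file for the SOC ticket."""
    commands = build_commands(LOCAL_GRE_ENDPOINT, SILVERLINE_GRE_ENDPOINT)
    for name, command in commands.items():
        print(capture(name, command))
```

Calling `main()` while the session is flapping prints both captures with their start times, which makes it easier to line the results up against the BGP log entries.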
- Identifying whether the flapped BGP sessions share a common endpoint can quickly narrow the area of focus
- In a scenario where flapped BGP sessions share a single GRE endpoint locally, but terminate at two different Silverline regions, the issue is more likely to reside at the common endpoint or the upstream carrier
- In a scenario where flapped BGP sessions are on multiple local endpoints over different carriers and/or regions and share a common Silverline scrubbing center, the issue is more likely to reside at Silverline or a Silverline production carrier
- Traceroute data can then be used to continue this investigation path
- The most useful data is captured in the moment.
- Traceroutes help to show an issue along the path while it is ongoing. Once the BGP sessions re-establish, any impacting packet loss on the path will likely have recovered.
- Short BGP flaps are very common and are not necessarily an indication of a traffic-impacting event
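The common-endpoint reasoning above can be expressed as a small decision helper. This is an illustrative sketch only: the function name `likely_focus`, the tuple layout, and the site labels are assumptions made for the example, and the result is a starting point for investigation rather than a diagnosis.

```python
def likely_focus(flapped_sessions):
    """Suggest where to look first, given the sessions that flapped.

    flapped_sessions is a list of (local_gre_endpoint, silverline_site)
    tuples, one per flapped BGP session.
    """
    local_endpoints = {local for local, _ in flapped_sessions}
    silverline_sites = {site for _, site in flapped_sessions}

    if len(flapped_sessions) < 2:
        return "only one session flapped; rely on traceroute and ping data"
    if len(local_endpoints) == 1 and len(silverline_sites) > 1:
        # One local endpoint, several Silverline regions affected.
        return "local endpoint or its upstream carrier"
    if len(silverline_sites) == 1 and len(local_endpoints) > 1:
        # Several local endpoints, one Silverline scrubbing center affected.
        return "Silverline or a Silverline production carrier"
    return "no single common endpoint; rely on traceroute and ping data"


# Example: one local endpoint flapping toward two hypothetical Silverline sites.
print(likely_focus([("192.0.2.10", "site-a"), ("192.0.2.10", "site-b")]))
```

In either case, traceroute data taken during the flap is still needed to confirm where along the path the loss occurs.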