Imagine you’re a provider routing a PI space prefix for one of your customers. Now imagine that one of your IX peers starts to advertise a more specific subnet of that customer network to you. How would, and how should, you forward traffic destined for that prefix? This quirk looks at just such a scenario from the point of view of an ISP that adheres to BCP38 best practice filtering policies…
So here’s the scenario:
In this setup Xellent IT Ltd is both a customer and a provider. It provides transit for ACME Consulting but it is a customer of Provider A. ACME owns PI space and chooses to implement some traffic engineering. It advertises a /23 to Xellent IT and a /24 to Provider B.
Now Provider B just happens to peer with Provider A over a public internet exchange. The quirk appears when traffic from the internet, destined to 18.104.22.168/32, enters Provider A’s network, especially when you consider that Provider A implements routing policies that adhere to BCP38.
But first, what is BCP38?
You can read it yourself here, but in short, it is a Best Current Practice document that advocates for source address filtering to minimise threats like DDoS attacks. It does this by proposing inbound PE filtering on customer connections that blocks traffic whose source address does not match that of a known downstream customer network. Many DDoS attacks rely on spoofed source addresses. So if every provider filtered traffic from their customers, to make sure that the source address was from the right subnet (and not spoofed), then these kinds of spoofing-based attacks would all but disappear.
To quote the BCP directly:
In other words, if an ISP is aggregating routing announcements for multiple downstream networks, strict traffic filtering should be used to prohibit traffic which claims to have originated from outside of these aggregated announcements.
BCP38 – P. Ferguson, D. Senie
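The ingress check that BCP38 describes can be sketched in a few lines of Python. This is only an illustration of the logic, not router configuration: the prefix below is a placeholder I've made up, and `permit_inbound` is a hypothetical helper name.

```python
from ipaddress import ip_address, ip_network

# Placeholder prefix standing in for the space assigned to the customer
# behind this PE interface (an assumption for illustration only).
CUSTOMER_PREFIXES = [ip_network("203.0.112.0/23")]

def permit_inbound(source: str) -> bool:
    """Return True if the packet's source address belongs to the customer."""
    addr = ip_address(source)
    return any(addr in prefix for prefix in CUSTOMER_PREFIXES)

# A packet sourced from the customer's own space is allowed in...
legit = permit_inbound("203.0.113.10")
# ...while one claiming an address from somewhere else is dropped.
spoofed = permit_inbound("198.51.100.7")
```

On a real PE this would typically be an interface ACL or a unicast RPF check rather than code, but the decision being made is the same.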
To put it in diagram form, the basic idea is as follows:
A provider can also implement outbound filtering to achieve the same result. That is to say, outbound filters can be applied at peering and transit points to ensure that the source addresses of any packets sent out come from within the customer cone of the provider (a customer cone is the set of prefixes sourced by a provider, either as PI or PA space, that makes up the address space for its customer base). This can be done in conjunction with, or instead of, the inbound filtering approach.
There are multiple ways a provider can build their network to adhere to BCP38. As an example, an automated tool could be built that references an RIR database like RIPE. This tool could perform recursive route object lookups on all autonomous systems listed in the provider’s AS-SET and build an ACL that blocks all outbound border traffic whose source address is not in that list.
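To make the idea of such a tool a little more concrete, here is a minimal sketch. It assumes the registry data has already been fetched into two dictionaries (a real tool would query the RIR instead), and every AS-SET name, AS number and prefix in it is invented. The ACL lines it emits are vendor-neutral pseudo-syntax, not any particular platform's format.

```python
from ipaddress import ip_network, collapse_addresses

# Hypothetical, pre-fetched registry data (all names and prefixes invented).
AS_SETS = {
    "AS-PROVIDER-A": ["AS64500", "AS-XELLENT"],
    "AS-XELLENT": ["AS64501", "AS64502"],
}
ROUTE_OBJECTS = {
    "AS64500": ["192.0.2.0/24"],
    "AS64501": ["198.51.100.0/24"],
    "AS64502": ["203.0.112.0/24", "203.0.113.0/24"],
}

def expand(as_set, seen=None):
    """Recursively resolve an AS-SET into the prefixes of its member ASes."""
    seen = {as_set} if seen is None else seen
    prefixes = []
    for member in AS_SETS.get(as_set, []):
        if member in seen:  # guard against AS-SETs that reference each other
            continue
        seen.add(member)
        if member in AS_SETS:
            prefixes += expand(member, seen)
        else:
            prefixes += [ip_network(p) for p in ROUTE_OBJECTS.get(member, [])]
    return prefixes

def build_acl(as_set):
    """Build an outbound ACL permitting only sources inside the customer cone."""
    cone = collapse_addresses(expand(as_set))  # merge adjacent prefixes
    lines = [f"permit ip {net} any" for net in cone]
    lines.append("deny ip any any")
    return lines

acl = build_acl("AS-PROVIDER-A")
```

Collapsing the prefixes first keeps the ACL short: the two adjacent /24s registered under AS64502 end up as a single /23 entry.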
Regardless of the method used, this quirk assumes that Provider A is using both inbound and outbound filtering. But as we’ll see, it is the outbound filtering that causes all the trouble… here’s the traffic flow:
Now you might ask why the packet would follow this particular path. Isn’t Provider B advertising the more specific /24 it receives from ACME? How come the router that sent the packet to Provider A over the transit link can’t see the /24?
There are a number of reasons for this, and it depends on how the network of each Autonomous System along the way is designed. However, one common reason could be a traffic engineering service offered by internet providers called prefix scoping.
Prefix scoping allows a customer to essentially tell its provider how to advertise its prefix to the rest of the internet. This is done by including predetermined BGP communities in the prefix advertisements. The provider will recognise these communities and alter how they advertise that prefix to the wider internet. This could be done through something like route-map filtering on these communities.
In this scenario, perhaps Provider B is offering such a service. ACME may have chosen to attach the ‘do not advertise this prefix to your transit provider x’ community to its BGP advertisement to Provider B. As a result, the /24 prefix doesn’t reach the router connecting to Provider A over its transit link, so it forwards according to the /23.
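As a rough illustration of how a provider might act on scoping communities, the sketch below decides which neighbours still receive a prefix. The community values and neighbour names are invented for the example; real providers publish their own community lists, and the logic would normally live in route-maps or policy statements rather than Python.

```python
# Hypothetical scoping communities: each one suppresses export to one neighbour.
NO_EXPORT_TO = {
    "64510:3001": "Transit X",
    "64510:3002": "Transit Y",
}

def neighbours_to_advertise(communities, neighbours):
    """Return the neighbours that should still receive the prefix."""
    suppressed = {NO_EXPORT_TO[c] for c in communities if c in NO_EXPORT_TO}
    return [n for n in neighbours if n not in suppressed]

neighbours = ["Transit X", "Transit Y", "Provider A (IX peer)"]

# ACME tags its /24 with the "do not advertise to Transit X" community.
scoped = neighbours_to_advertise({"64510:3001"}, neighbours)
# An untagged prefix is advertised to everyone.
unscoped = neighbours_to_advertise(set(), neighbours)
```

With the tag attached, Transit X never hears the /24 and keeps forwarding on the /23, while the IX peer still learns the more specific route.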
This is just one example of how traffic can end up at Provider A. For now, let’s get back to the life of this packet as it enters Provider A.
Upon receipt of the packet destined for 22.214.171.124/32, Provider A’s border router will look in its routing table to determine the next hop. Because it is more specific, the 126.96.36.199/24 learned over peering will be seen in the RIB as the best path, not the /23 from the Xellent IT link. The packet is placed in an LSP (assuming an MPLS core) with a next hop of the border router that peers with Provider B at the Internet Exchange.
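The longest-prefix-match behaviour behind that decision can be sketched as follows. The two prefixes are placeholders standing in for ACME's /23 and /24, and the next-hop labels are just descriptive strings.

```python
from ipaddress import ip_address, ip_network

# Placeholder routes standing in for the two advertisements Provider A holds.
RIB = {
    ip_network("203.0.112.0/23"): "Xellent IT (customer)",
    ip_network("203.0.113.0/24"): "Provider B (IX peer)",
}

def lookup(destination: str):
    """Return the next hop of the longest matching prefix, or None."""
    addr = ip_address(destination)
    matches = [net for net in RIB if addr in net]
    if not matches:
        return None
    return RIB[max(matches, key=lambda net: net.prefixlen)]

# Inside the /24: the more specific route wins, so traffic heads to the peer.
via_peer = lookup("203.0.113.10")
# In the other half of the /23: only the customer route matches.
via_customer = lookup("203.0.112.10")
```

Note that prefix length is compared before any BGP attribute comes into play, which is why the customer relationship alone does not keep this traffic on the Xellent IT link.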
You can probably see what’s going to happen. When Provider A’s border router at the Internet Exchange tries to forward the packet to Provider B, it has to pass through an outbound ACL. This ACL has been built in accordance with BCP38. The ACL simply checks the source address to make sure it is from within the customer cone of Provider A. Since the source address is an unknown public address sourced from off-net, the packet is dropped.
Now this is inherently a good thing, isn’t it? Without this filtering, Provider A would be providing transit for free! However, it does pose a problem after all, since traffic for one of its customers’ subnets is being blackholed.
From here, ACME Consulting gets complaints from its customers that they can’t access their webserver. ACME contacts its transit providers and before you know it, an engineer at Provider B has done a traceroute and calls Provider A to ask why the final hop in the failed trace ends in Provider A’s network.
So where to from here? What should Provider A do? It doesn’t want to provide transit for free, and its policy states that BCP38 filtering must be in place. Let’s explore the options.
Before I look at the options available, it is worth pausing here to reference an excellent paper by Pierre Francois of the Université catholique de Louvain entitled Exploiting BGP Scoping Services to Violate Internet Transit Policies. It can be read here and describes the principles underlying this quirk at a higher level, in a way that sheds light on why it happens. I won’t go into exhaustive detail, and I highly recommend reading the paper yourself, but to summarise, there are 3 conditions that come together to cause this problem.
This is certainly what is happening here. Provider A sees a /24 from its peer (condition 1), a /23 from its customer (condition 2) and the Transit router that forwards the packet to Provider A cannot see the /24 (condition 3). The result of these conditions is that the packet is being forwarded from AS to AS based on a combination of the more specific route and the less specific route. To quote directly from Francois’ paper:
The scoping being performed on a more specific prefix might no longer let routing information for the specific prefix be spread to all ASes of the routing system. In such cases, some ASes will route traffic falling to the range of the more specific prefix, p, according to the routing information obtained for the larger range covering it, P.
Exploiting BGP Scoping Services to Violate Internet Transit Policies – Pierre Francois
So what options does Provider A have? How can it ensure that traffic isn’t dropped, but at the same time, make sure it can’t be abused into providing free transit for off-net traffic? Well, there’s no easy answer, but there are three solutions that I’ll consider: blocking the more specific prefix from the peer, asking Xellent IT to advertise the more specific prefix, and allowing the off-net transit traffic.
I’ll try to argue that allowing the transit traffic, but only as an exception, is the best course of action. But before that, let’s look at the first two options.
Let’s say Provider A applies an inbound route-map on its peering with Provider B (and all other peers and transits for that matter) to block any advertised prefixes that come from its own customer cone (basically, stopping its own prefixes being advertised towards itself from a non-customer). So Provider A would see Provider B advertising 188.8.131.52/24, recognise it as part of Xellent IT’s supernet and block it.
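The decision that such an inbound filter makes can be sketched like this, again with a placeholder prefix standing in for the customer cone and a hypothetical `accept_from_peer` helper. It is only meant to show the logic, not how a route-map would be written.

```python
from ipaddress import ip_network

# Placeholder customer cone (an assumption for illustration only).
CUSTOMER_CONE = [ip_network("203.0.112.0/23")]

def accept_from_peer(prefix: str) -> bool:
    """Reject any peer-learned prefix that falls inside our own customer cone."""
    net = ip_network(prefix)
    return not any(net.subnet_of(cone) for cone in CUSTOMER_CONE)

# The /24 from Provider B sits inside the customer's /23, so it is filtered.
more_specific = accept_from_peer("203.0.113.0/24")
# An unrelated prefix from the same peer is accepted as normal.
unrelated = accept_from_peer("198.51.100.0/24")
```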
This would certainly solve the problem of attempting to forward the traffic out of the Internet Exchange. Unfortunately, there are two crushing flaws with this approach.
Firstly, it undermines the intended traffic engineering employed by ACME and comes with all the inherent problems that asymmetric routing holds. For example, traffic ingressing back into ACME via Xellent IT could get dropped by a session-based firewall that it didn’t go through on its way out. Asymmetric routing is a perfect example of the problems that can result from some ASes forwarding on the more specific route and others forwarding on the less specific route.
Second, consider what happens if the link to Xellent IT goes down, or if Xellent IT stops advertising the /23. Suddenly Provider A has no access to the /24 network. Provider A is, in essence, relying on a customer to access part of the internet (this is of course assuming Provider A is not relying on any default routing). This would not only undermine the dual homing of ACME, but would also stop Provider A’s other customers reaching ACME’s services.
Clearly, forwarding the traffic based on the less specific by blocking the more specific from the peer doesn’t solve anything. It might get through Provider A, but traffic is still being forwarded on a combination of prefix lengths, and Provider A could end up denying traffic from its other customers reaching a part of the internet. Not a good look for an internet provider.
What about asking Xellent IT to advertise the more specific? Provider A could then simply prefer the /24 from Xellent IT using local preference. This approach has problems too. ACME isn’t actually advertising the /24 to Xellent IT. Xellent IT would need to ask ACME to do so. However, they may not wish to impose such a restriction on their customer. The question then becomes, does Provider A have the right to make such a request? They certainly can’t enforce it.
There is perhaps a legal argument to be made that, by not receiving the more specific from its customer, Provider A is losing revenue. This will be illustrated when we look at the third option of allowing off-net traffic. I won’t broach the topic of whether or not Provider A could approach Xellent IT and ask for advertisement of the more specific due to revenue loss, but it is certainly food for thought. For now though, asking Xellent IT to advertise the more specific is perhaps not the preferred approach.
Let’s turn to the third option, which sees Provider A adjust its border policies by adding to its BCP38 ACL. Not only should this ACL permit traffic with source addresses from its customer cone, it should also permit traffic that is destined to prefixes in its customer cone. The idea looks like this:
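Expressed as a rough Python sketch, the difference between the original check and the relaxed one comes down to a single extra condition. The customer cone prefix is a placeholder and the function names are my own.

```python
from ipaddress import ip_address, ip_network

# Placeholder customer cone (an assumption for illustration only).
CUSTOMER_CONE = [ip_network("203.0.112.0/23")]

def in_cone(address: str) -> bool:
    """True if the address falls inside any customer cone prefix."""
    addr = ip_address(address)
    return any(addr in prefix for prefix in CUSTOMER_CONE)

def strict_permit(src: str, dst: str) -> bool:
    """Original BCP38-style outbound check: source must be in the cone."""
    return in_cone(src)

def relaxed_permit(src: str, dst: str) -> bool:
    """Relaxed check: also permit traffic destined to the customer cone."""
    return in_cone(src) or in_cone(dst)

# Off-net source heading for ACME's /24 via the IX.
dropped_before = strict_permit("198.51.100.7", "203.0.113.10")
allowed_after = relaxed_permit("198.51.100.7", "203.0.113.10")
# Off-net source to an off-net destination is still refused.
still_blocked = relaxed_permit("198.51.100.7", "192.0.2.1")
```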
Now this might look ok. Off-net transit traffic to random public addresses (outside of Provider A’s customer cone) is still blocked, and ACME’s traffic isn’t. But this special case of off-net transit opens the door for abuse in a way that could cause Provider A to lose money.
Here’s how it works. For the sake of this explanation, I’ve removed Xellent IT and made ACME a direct customer of Provider A. I’ve also introduced a third service provider.
As we’ve already discussed, this violates BCP38 principles and turns Provider A into free transit for off-net traffic. But of perhaps greater importance is the loss of revenue that Provider A experiences. No one is paying for the increased traffic volume across Provider A’s core and Provider A gains no revenue from the increase – since it only crosses free peering boundaries. Provider B benefits as it sees more chargeable bandwidth used on its downstream link to ACME. ACME Ltd benefits since it can use the cheaper connection and utilize Provider A’s peering and transit relationships for free. If ACME had a remote site connecting to Provider C, GRE tunnels across Provider A’s core could further complicate things.
If ACME was clever enough and used looking glasses and other tools to discover the forwarding path, then there clearly is potential for abuse.
Having said all of that, I would argue that if this is done on a case by case basis, in a reactive way, it would be an acceptable solution.
For example, in this scenario, as long as traffic flows don’t reach too high a volume (something that can be monitored using something like netflow) and only this single subnet is permitted, then for the sake of maintaining network reachability, this is a reasonable exception. It is not likely that ACME is being deliberately malicious, and as long as this exception is monitored, the revenue loss would be minuscule and allowing a one-off policy violation would seem to be acceptable.
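A minimal sketch of that kind of monitoring is shown below. It assumes flow records have already been exported and reduced to (source, destination, bytes) tuples; the prefixes, the threshold and the record format are all invented for the example and don't reflect any particular collector's output.

```python
from ipaddress import ip_address, ip_network

# All values below are placeholders chosen for illustration.
EXCEPTION_PREFIX = ip_network("203.0.113.0/24")  # the single permitted subnet
CUSTOMER_CONE = [ip_network("203.0.112.0/23"), ip_network("192.0.2.0/24")]
THRESHOLD_BYTES = 1_000_000  # arbitrary alert level per reporting interval

def off_net_exception_bytes(flows):
    """Sum bytes from off-net sources destined to the exception prefix."""
    total = 0
    for src, dst, nbytes in flows:
        src_on_net = any(ip_address(src) in cone for cone in CUSTOMER_CONE)
        if not src_on_net and ip_address(dst) in EXCEPTION_PREFIX:
            total += nbytes
    return total

def exceeds_threshold(flows) -> bool:
    """True if the exception is carrying more off-net traffic than tolerated."""
    return off_net_exception_bytes(flows) > THRESHOLD_BYTES

flows = [
    ("198.51.100.7", "203.0.113.10", 400_000),  # off-net, counted
    ("192.0.2.20", "203.0.113.10", 900_000),    # on-net source, ignored
    ("198.51.100.9", "203.0.113.25", 300_000),  # off-net, counted
]
volume = off_net_exception_bytes(flows)
alert = exceeds_threshold(flows)
```

Only traffic that actually relies on the exception is counted, so ordinary customer-sourced traffic to the same subnet doesn't trigger a false alarm.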
Rather than try to account for these scenarios beforehand, the goal would be to add exceptions and monitor them as they crop up. There are a number of ways to detect when these policy violations occur. In this case, the phone call and traceroute from Provider B is a good way to spot the problem. Regrettably, that does require something to go wrong for it to be found and fixed (meaning a disrupted service for the customer). There are ways to detect these violations a priori, but I won’t detail them here. Francois’ paper presents the option of using an open-source IP accounting tool like pmacct, which is worth reading about.
If off-net transit traffic levels increase, or more policy violations start to appear, more aggressive tactics might need to be looked at. Though for this particular quirk, allowing the transit traffic as an exception and monitoring its throughput seems to me to be a prudent approach.
Because I’ve spoken about this at a very high level, I won’t include a work section with CLI output. I could show an ACL permitting 184.108.40.206/24 outbound but this quirk doesn’t need that level of detail to understand the concepts.
So that’s it! A really fascinating conundrum that is as interesting to figure out as it is to troubleshoot. I’d love to hear if anyone has any thoughts or possible alternatives. I toyed with the idea of using static routing at the PE facing the customer, or assigning a community to routes received from peering that are in your customer cone and reacting to that somehow, but both those ideas ran into similar problems to the ones I’ve outlined above. Let me know if you have any other ideas. Thanks for reading.