From MPLS L3VPN to PBB-EVPN

This blog introduces PBB-EVPN over an MPLS network. But rather than just describe the technology from scratch, I have tried to structure the explanation assuming the reader is familiar with plain old MPLS L3VPN and is new to PBB and/or EVPN. This was certainly the case with me when I first studied this topic and I’m hoping others in a similar position will find this approach insightful.

I won’t be exploring a specific quirk or scenario – rather I will look at EVPN followed by PBB, giving analogies and comparisons to MPLS L3VPN as I go, before combining them into PBB-EVPN. I will focus on how traffic is identified, learned and forwarded in each section.

So what is PBB-EVPN? Well, besides being hard to say 3 times fast, it is essentially an L2VPN technology. It enables a Layer 2 bridge domain to be stretched across a Service Provider core while utilizing MAC aggregation to deal with scaling issues.

Let’s look at EVPN first.

EVPN

EVPN, or Ethernet VPN, over an MPLS network works on a similar principle to MPLS L3VPN. The best way to conceptualize the difference is to draw an analogy (colour coded to highlight points of comparison)…

MPLS L3VPN assigns PE interfaces to VRFs. It then uses MP-BGP (with the vpnv4 unicast address family) to advertise customer IP Subnets as VPNv4 routes to Route Reflectors or other PEs. Remote PEs that have a VRF configured to import the correct route targets, accept the MP-BGP update and install an ipv4 route into the routing table for that VRF.

EVPN uses PE interfaces linked to bridge-domains with an EVI. It then uses MP-BGP (with the l2vpn evpn address family) to advertise customer MAC addresses as EVPN routes to Route Reflectors or other PEs. Remote PEs that have an EVI configured to import the correct route target, accept the MP-BGP update and install a MAC address into the bridge domain for that EVI.

This analogy is a little crude, but in both cases packets or frames destined for a given subnet or MAC will be imposed with two labels – an inner VPN label and an outer Transport label. The Transport label is typically communicated via something like LDP and will correspond to the next hop loopback of the egress PE. The VPN label is communicated in the MP-BGP updates.

These diagrams illustrate the comparison:

Blog6_image1a_and_b
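
As a rough illustration of the analogy, the import step can be modelled the same way for both technologies. The sketch below is Python rather than router config, and every name in it (BgpRoute, Instance, the route targets and the label values) is hypothetical. It only aims to show that a VRF and an EVI both filter MP-BGP updates on route target, with the key being an IP prefix in one case and a MAC address in the other.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class BgpRoute:
    """A simplified MP-BGP route (hypothetical model, not a real NLRI)."""
    key: str                  # an IP prefix (L3VPN) or a MAC address (EVPN)
    next_hop: str             # loopback of the advertising PE
    vpn_label: int            # inner label carried in the MP-BGP update
    route_targets: frozenset  # route targets attached to the update


@dataclass
class Instance:
    """Either a VRF (L3VPN) or an EVI (EVPN); both import by route target."""
    name: str
    import_rts: frozenset
    table: dict = field(default_factory=dict)

    def receive(self, route: BgpRoute) -> bool:
        # Install the route only if a route target matches the import policy.
        if self.import_rts & route.route_targets:
            self.table[route.key] = (route.next_hop, route.vpn_label)
            return True
        return False


vrf = Instance("CUST_1", frozenset({"100:1"}))
evi = Instance("EVI_10", frozenset({"100:10"}))

vpnv4 = BgpRoute("172.16.1.0/24", "2.2.2.2", 24001, frozenset({"100:1"}))
evpn_mac = BgpRoute("c208.0d06.0000", "2.2.2.2", 24010, frozenset({"100:10"}))

# Each instance sees both updates but keeps only the one it imports.
for route in (vpnv4, evpn_mac):
    vrf.receive(route)
    evi.receive(route)

print(vrf.table)
print(evi.table)
```

The transport label is deliberately left out here, since in both cases it comes from LDP (or similar) rather than from the MP-BGP update.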

In EVPN, customer devices tend to be switches rather than routers. PE-CE routing protocols, like eBGP, aren’t used since it operates over layer 2. The Service Provider appears as one big switch. In this sense, it accomplishes the same as VPLS but (among other differences) uses BGP to distribute MAC address information, rather than using a full mesh of pseudowires.

EVPN uses an EVI, or Ethernet Virtual Identifier, to identify a specific instance of EVPN as it maps to a bridge domain. For the purposes of this overview, you can think of an EVI as being quasi-equivalent to a VRF. A customer facing interface will be put into a bridge domain (layer 2 broadcast domain), which will have an EVI identifier associated with it.

The MAC address learning that EVPN utilizes is called control-plane learning, since it is BGP (a control-plane routing protocol) that distributes the MAC address information. This is in contrast to data-plane learning, which is how a standard switch learns MAC addresses – by associating the source MAC address of a frame to the receiving interface.

The following Cisco IOS-XR config shows an EVPN bridge domain and edge interface setup, side by side with a MPLS L3VPN setup for comparison:

Blog6_output1a_and_b

NB. For the MPLS L3VPN config, the RD config (which is usually configured under the CE-PE eBGP config) is not shown. PBB config is shown in the EVPN bridge domain; this will be explained further into the blog.

EVPN seems simple enough at first glance, but it has a scaling problem, which PBB can ultimately help with…

Any given customer site can have hundreds or even thousands of MAC addresses, as opposed to just one subnet (as in an MPLS L3VPN environment). The number of updates and withdrawals that BGP would have to send could be overwhelming if it needed to make adjustments for MAC addresses appearing and disappearing – not to mention the memory requirements. And you can’t summarise MAC addresses like you can IP ranges. It would be like an MPLS L3VPN environment advertising /32 prefixes for every host rather than just one prefix for the subnet. We need a way to summarise or aggregate the MAC addresses.
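
To make the summarisation point concrete, here is a small sketch using Python's standard ipaddress module. The subnet and MAC values are made up for illustration. It shows 256 host routes collapsing into a single prefix, while the same number of MAC addresses stays as 256 separate entries because there is no hierarchy to collapse on.

```python
import ipaddress

# 256 hypothetical /32 host routes that all sit in one customer subnet.
host_routes = [ipaddress.ip_network(f"172.16.1.{i}/32") for i in range(256)]

# IP prefixes are hierarchical, so adjacent routes collapse into one summary.
summary = list(ipaddress.collapse_addresses(host_routes))
print(len(host_routes), "host routes ->", len(summary), "summary:", summary[0])

# MAC addresses are flat identifiers: 256 hosts means 256 separate routes.
macs = {f"c208.0d06.{i:04x}" for i in range(256)}
print(len(macs), "MAC routes, with nothing to collapse them into")
```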

Here’s where PBB comes in…

PBB – Provider Backbone Bridging (802.1ah)

PBB can help solve the EVPN scaling issue by performing one key function – it maps each customer MAC address to the MAC address of the attaching PE. Customer MAC addresses are called C-MACs. The PE MAC addresses are called B-MACs (or Backbone MACs).

This works by adding an extra layer 2 header to the frame as it is forwarded from one site to another across the provider core. The outer layer 2 header has a destination B-MAC address of the PE device that the inner frame’s destination C-MAC is associated with. As a result, PBB is often called MAC-in-MAC. This diagram illustrates the concept:

Blog6_image2_pbb
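
The MAC-in-MAC idea can also be sketched in a few lines of Python. This is a simplified model with hypothetical class names and addresses, and it ignores the real 802.1ah header layout. The point is only that the customer frame travels untouched inside an outer header that carries B-MACs and a Service ID.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerFrame:
    dst_cmac: str
    src_cmac: str
    payload: bytes


@dataclass(frozen=True)
class PbbFrame:
    dst_bmac: str         # B-MAC of the egress PE
    src_bmac: str         # B-MAC of the ingress PE
    sid: int              # Service ID for the bridge domain
    inner: CustomerFrame  # original customer frame, carried unchanged


def encapsulate(frame, sid, local_bmac, cmac_to_bmac):
    """Wrap a customer frame in an outer header addressed to the egress PE."""
    return PbbFrame(cmac_to_bmac[frame.dst_cmac], local_bmac, sid, frame)


def decapsulate(pbb_frame):
    """Strip the outer header and hand back the original customer frame."""
    return pbb_frame.inner


# Hypothetical addresses: one remote C-MAC that lives behind PE2's B-MAC.
cmac_to_bmac = {"c206.0d04.0000": "00aa.00bb.0002"}
original = CustomerFrame("c206.0d04.0000", "c208.0d06.0000", b"hello")

in_core = encapsulate(original, sid=1000, local_bmac="00aa.00bb.0001",
                      cmac_to_bmac=cmac_to_bmac)
print(in_core.dst_bmac, in_core.src_bmac, in_core.sid)
print(decapsulate(in_core) == original)
```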

NB. In PBB terminology the provider devices are called Bridges. So a BEB (Backbone Edge Bridge) is a PE and a BCB (Backbone Core Bridge) is a P. For sake of simplicity, I will continue to use PE/P terminology. Also worth noting is that PBB diagrams often show service provider devices as switches, to illustrate the layer 2 nature of the technology – which I’ve done above.

In the above diagram the SID (or Service ID) represents a layer 2 broadcast domain similar to what an EVI represents in EVPN.

Frames arriving on a PE interface will be inspected and, based on certain characteristics, mapped or assigned to a particular Service ID (SID).

The characteristics that determine what SID a frame belongs to can be a number of things:

  • The customer assigned VLAN
  • The Service Provider assigned VLAN
  • Existing SID identifiers
  • The interface it arrives on
  • A combination of the above or other factors
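
As a minimal sketch of that classification step, assume the PE only looks at the receiving interface and the customer VLAN. The interface names, VLANs and SID values below are invented for the example, and a real platform supports many more match criteria than this.

```python
# Hypothetical classification table: (interface, customer VLAN) -> SID.
# A VLAN of None means "any frame arriving on this interface".
SID_MAP = {
    ("Gi0/0/0/1", 10): 1000,
    ("Gi0/0/0/1", 20): 2000,
    ("Gi0/0/0/2", None): 3000,
}


def classify(interface, vlan=None):
    """Return the SID for a frame, trying the most specific match first."""
    if (interface, vlan) in SID_MAP:
        return SID_MAP[(interface, vlan)]
    # Fall back to a port-based mapping; None means no service matched.
    return SID_MAP.get((interface, None))


print(classify("Gi0/0/0/1", 10))  # VLAN-based match
print(classify("Gi0/0/0/2", 55))  # port-based match, VLAN ignored
print(classify("Gi0/0/0/1", 30))  # no match
```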

To draw an analogy to MPLS L3VPN – the VRF that an incoming packet is assigned to is determined by whatever VRF is configured on the receiving interface (using ip vrf forwarding CUST_1 in Cisco IOS interface CLI).

Once the SID has been allocated, the entire frame is then encapsulated in the outer layer 2 header with destination MAC of the egress PE.

In this way C-MACs are mapped to either B-MACs or local attachment circuits. Most importantly, however, the core P routers do not need to learn all of the MAC addresses of the customers. They only deal with the MAC addresses of the PEs. This allows a PE to aggregate all of the attached C-MACs for a given customer behind its own B-MAC.

But how does a remote PE learn which C-MAC maps to which B-MAC?

In PBB, learning is done in the data-plane, much like a regular layer 2 switch. When a PE receives a frame from the PBB core, it will strip off the outer layer 2 header and make a note of the source B-MAC (the ingress PE). It will map this source B-MAC to the source C-MAC found on the inner layer 2 header. When a frame arrives on a local attachment circuit, the PE will map the source C-MAC to the attachment circuit in the usual way.
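
The learning behaviour described above can be sketched as a single table with two kinds of entry. The class, method names and addresses are hypothetical, and ageing and per-SID tables are left out to keep it short.

```python
class PbbPe:
    """Toy model of the C-MAC table on a PBB PE (data-plane learning only)."""

    def __init__(self, bmac):
        self.bmac = bmac
        self.cmac_table = {}  # C-MAC -> ("ac", name) or ("bmac", remote B-MAC)

    def learn_from_core(self, src_bmac, src_cmac):
        # Frame came from the core: the source C-MAC lives behind the
        # ingress PE identified by the outer source B-MAC.
        self.cmac_table[src_cmac] = ("bmac", src_bmac)

    def learn_from_ac(self, attachment_circuit, src_cmac):
        # Frame came from a customer: learn it like a regular switch would.
        self.cmac_table[src_cmac] = ("ac", attachment_circuit)

    def lookup(self, dst_cmac):
        # None means the destination is unknown, i.e. BUM handling applies.
        return self.cmac_table.get(dst_cmac)


pe1 = PbbPe("00aa.00bb.0001")
pe1.learn_from_ac("Gi0/0/0/1.6", "c208.0d06.0000")
pe1.learn_from_core("00aa.00bb.0002", "c206.0d04.0000")

print(pe1.lookup("c208.0d06.0000"))
print(pe1.lookup("c206.0d04.0000"))
print(pe1.lookup("ffff.ffff.ffff"))
```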

PBB must deal with BUM traffic too. BUM traffic is Broadcast, Unknown Unicast or Multicast traffic. An example of BUM traffic is the arrival of a frame for which the destination MAC address is unknown. Rather than broadcast like a regular layer 2 switch would, a PBB PE will set the destination MAC address of the outer layer 2 header to a special multicast MAC address that is built based on the SID and includes all the egress PEs that are part of the same bridge domain. EVPN uses a different method of handling BUM traffic but I will go into that later in the blog.
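
For a feel of how a multicast MAC can be "built based on the SID", here is a hedged sketch. It assumes a fixed 24-bit prefix of 01-1E-83 followed by the 24-bit SID, which is my understanding of the default group address in 802.1ah. Treat the prefix as an assumption and check it against the standard or your platform's documentation before relying on it.

```python
def group_bmac(sid: int) -> str:
    """Build a per-service multicast B-MAC from a 24-bit SID."""
    if not 0 <= sid < 2**24:
        raise ValueError("SID must fit in 24 bits")
    # Assumed prefix 01-1E-83; the low three octets carry the SID itself.
    octets = bytes([0x01, 0x1E, 0x83]) + sid.to_bytes(3, "big")
    return "-".join(f"{octet:02x}" for octet in octets)


def is_multicast(mac: str) -> bool:
    # The least significant bit of the first octet marks a group address.
    return (int(mac.split("-")[0], 16) & 1) == 1


mac = group_bmac(1000)
print(mac, is_multicast(mac))
```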

Overall, PBB is more complicated than the explanation given here, but this is the general principle (if you’re interested, see section 3 of my VPLS, PBB, EVPN and VxLAN Diagrams document that details how PBB can be combined with 802.1ad to add an aggregation layer to a provider network).

Now that we have the MAC-in-MAC features of PBB at our disposal, we can use it to solve the EVPN scaling problem and combine the two…

PBB-EVPN

With the help of PBB, EVPN can be adapted so that it deals with only the B-MACs.

To accomplish this, each EVPN EVI is linked to two bridge domains. One bridge domain is dedicated to customer MAC addresses and connected to the local attachment circuits. The other is dedicated to the PE routers’ B-MAC addresses. Both of these bridge domains are combined under the same bridge group.

Blog6_image3_bridge_domains

The PE devices will use data-plane learning to build a MAC database, mapping each C-MAC to either an attachment circuit or the B-MAC of an egress PE. Source C-MAC addresses are learned and associated as traffic flows through the network just like PBB does.

The overall setup would look like this:

Blog6_image4_pbb_evpn_overview

The only thing EVPN needs to concern itself with is advertising the B-MACs of the PE devices. EVPN uses control-plane learning and includes the B-MACs in the MP-BGP l2vpn evpn updates. For example, if you were to look at the MAC addresses known to a particular EVI on a route-reflector, you would only see MAC addresses for PE routers.
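
A tiny sketch shows what this does to the number of MAC advertisements a PE originates. The helper name and all of the addresses are hypothetical. It simply contrasts plain EVPN, where every local C-MAC is advertised, with PBB-EVPN, where only the PE's own B-MAC is.

```python
def advertised_macs(pe_bmac, local_cmacs, pbb_enabled):
    """MACs a PE would place in its MP-BGP MAC advertisements."""
    if pbb_enabled:
        # PBB-EVPN: C-MACs stay in the data-plane; only the B-MAC is advertised.
        return [pe_bmac]
    # Plain EVPN: every locally learned C-MAC becomes its own route.
    return sorted(local_cmacs)


# 1000 hypothetical customer MACs behind a single PE.
local_cmacs = {f"c208.0d06.{i:04x}" for i in range(1000)}

plain_evpn = advertised_macs("00aa.00bb.0001", local_cmacs, pbb_enabled=False)
pbb_evpn = advertised_macs("00aa.00bb.0001", local_cmacs, pbb_enabled=True)

print(len(plain_evpn), "routes without PBB,", len(pbb_evpn), "route with PBB")
```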

Looking again at the configuration output that we saw above, we can get a better idea of how PBB-EVPN works:

Blog6_output2_pbb_evpn_detail

NB. I have added the concept of a BVI, or Bridged Virtual Interface, to the above output. This can be used to provide a layer 3 breakout or gateway similar to how an SVI works on a L3 switch.

You can view the MAC address information using the following command:

Blog6_output3_macs

Now let’s look at how PBB-EVPN handles BUM traffic. Unlike PBB on its own, which just sends to a multicast MAC address, PBB-EVPN will use unicast replication and send copies of the frame to all of the remote PEs that are in the same EVI. This is an EVPN method and the PE knows which remote PEs belong to the same EVI by looking in what is called a flood list.

But how does it build this flood list? To learn that, we need to look at EVPN route-types…

MPLS L3VPN sends VPNv4 routes in its updates. But EVPN sends more than one “type” of update. The type of update, or route-type as it is called, will denote what kind of information is carried in the update. The route-type is part of the EVPN NLRI.

For the purposes of this blog we will only look at two route-types.

  • Route-Type 2s, which carry MAC addresses (analogous to VPNv4 updates)
  • Route-Type 3s, which carry information on the egress PEs that belong to an EVI.

It is these Route-Type 3s (or RT-3s for short) that are used to build the flood list.

When BUM traffic is received by a PE, it will send copies of the frame to all of its attachment circuits (except the one it received the frame on) and all of the PEs for which it has received a Route-Type 3 update. In other words, it will send to everything in its flood-list.
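
The flood-list behaviour can be sketched roughly as follows. The class and method names are hypothetical, and the model only covers a BUM frame arriving on a local attachment circuit. Frames arriving from the core, label handling and multi-homing are all out of scope here.

```python
class EviFloodList:
    """Toy flood list for one EVI, built from received Route-Type 3 updates."""

    def __init__(self, local_acs):
        self.local_acs = set(local_acs)
        self.remote_pes = set()

    def receive_rt3(self, originating_pe):
        # Each RT-3 tells us another PE takes part in this EVI.
        self.remote_pes.add(originating_pe)

    def withdraw_rt3(self, originating_pe):
        self.remote_pes.discard(originating_pe)

    def replicate(self, ingress_ac):
        """Return (local ACs, remote PEs) that should get a copy of the frame."""
        acs = sorted(self.local_acs - {ingress_ac})  # never back out the ingress
        pes = sorted(self.remote_pes)                # one unicast copy per PE
        return acs, pes


flood = EviFloodList(local_acs=["Gi0/0/0/1.6", "Gi0/0/0/2.6"])
flood.receive_rt3("2.2.2.2")
flood.receive_rt3("3.3.3.3")

print(flood.replicate("Gi0/0/0/1.6"))

flood.withdraw_rt3("3.3.3.3")
print(flood.replicate("Gi0/0/0/1.6"))
```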

So the overall process for a BUM packet being forwarded across a PBB-EVPN backbone will look as follows:

Blog6_image5_bum_traffic

So that’s it, in a nutshell. In this way PBB and EVPN can work together to create an L2VPN network across a Service Provider.

There are other aspects of both PBB and EVPN, such as EVPN multi-homing using Ethernet Segment Identifiers or PBB MAC clearing with MIRP to name just a couple, but the purpose of this blog was to provide an introductory overview – specifically for those used to dealing with MPLS L3VPN. Thoughts are welcome, and as always, thank you for reading.

Bridging Layer 2 Across the Core

Welcome to netquirks

As this is my first blog I thought I should write a bit of an introduction. This site is dedicated to looking at interesting and, as the name suggests, quirky scenarios in the world of Network Engineering.

I’ve also added some of my study notes, GNS3 labs and other bits and pieces, so feel free to have a look around. Details of the site, including the layout and a bit about myself can be found on the About page.

Generally blogs will be divided up into three sections: the quirk, the search and the work. The quirk describes the scenario, the search describes how a solution was arrived at and the work shows the technical and command line details. I’ll try and add a new blog once a month.

I will add to this site as time goes by. Any feedback is more than welcome…

 

Bridging Layer 2 Across the Core

This first scenario looks at a case where two remote sites needed to be connected through layer 2 across a Service Provider Core and a single xconnect or changing of a BGP session type was not possible. We ended up having to combine bridging, pseudowires and trunking to provide access…

 

The quirk

The picture below shows the basic setup. We needed to combine two layer 2 domains across an MPLS core. A new connection was brought into a switch on VLAN 6 at Site A. It needed to connect over to Site B. Under normal circumstances we would build a layer 2 xconnect/pseudowire between the sites, however in this circumstance we were not able to…

For a layer 2 xconnect to be configured the terminating device must be able to determine the next-hop label to push on the top of the frame. However the gateway of the Site B Layer 2 domain was a Cisco 7200 router which ran an Option A eBGP session to our PE. This meant it wasn’t getting labels over BGP. In addition, there was no LDP between the 7200 and the PE.

We couldn’t simply configure an Option B session (and consequently move the xconnect onto the 7200) because this would involve potential downtime for the site which was unacceptable.

To make it worse, there were no cable runs between the two locations to bring up a simple layer 2 point-to-point.

It should also be noted that router-on-a-stick was used at Site B meaning there were other VLANs, all terminating on their own sub-interface, connected to the 7200.

In summary it looked as follows:

blog1_image1_setup

 

The search

Even though an xconnect could not go the full length, the decision was made to push one as far as was viable. So we began by creating an xconnect from PE1 to PE2. VLAN 6 was added to S1’s uplink trunk and the sub-interface that was created for it on PE1 was added to the xconnect (CLI to follow).

The problem we had to face was how to get the layer 2 connectivity around or through the 7200, with minimal disruption. A solution was found in bridging….

We configured a bridge domain on the 7200 and put two new sub-interfaces into the bridge-domain – one for the LAN interface and one for the WAN interface.

The gateway for this subnet was previously a layer 3 sub-interface on Gi0/0 (standard router-on-a-stick setup). This was changed to a BVI.

In a similar fashion a sub-interface was set up on the connecting interface on PE2. This was added to the other end of the xconnect.

What we ultimately ended up with was something that looked like this:

blog1_image2_solution

Once this was set up we could see MAC learning and L2 connectivity across the core.

 

The work

The below GNS3 topology was put together to test and demonstrate the solution before putting it into practice. This can be downloaded from the GNS3 page.

GNS3_bridging_mpls_and_xconnects_Lab_5

LDP is running between the service provider routers and loopbacks are distributed via IS-IS. IPv4 and VPNv4 relationships exist between the PEs. This config is not shown but is available on the lab download.

Host 4 represents the new incoming connection to VLAN 6. Host 2 represents a Site B device on VLAN 6. The other hosts are simply representative of other devices on other VLANs for the sake of variation.

If we look at the configuration of CE1 we can see the config behind the basic bridging setup:

hostname CE1
!
!enable irb and bridging
bridge irb
bridge 1 protocol ieee
bridge 1 route ip
!

!Configure the WAN sub-interface beneath the main interface, 
!assign it to the bridge domain and set the encapsulation to 
!vlan 6
interface FastEthernet0/0
 description link to PE2
 ip address 10.1.1.1 255.255.255.252
 duplex full
!
interface FastEthernet0/0.6
 description Bridged link to PE2
 encapsulation dot1Q 6
 !Technically the WAN interface need not have the same 
 !encapsulation as the LAN interface. But the sub-interface on 
 ! the PE must have the same encapsulation as this WAN interface.
 bridge-group 1
!
!The key here is that VLAN 6's sub-interface (FastEthernet1/0.6 
! below) is added to the bridge group using the bridge-group command
interface FastEthernet1/0.5
 description VLAN 5 GATEWAY
 encapsulation dot1Q 5
 ip address 172.16.1.1 255.255.255.0
!
interface FastEthernet1/0.6
 description L2 INTERFACE FOR VLAN 6
 encapsulation dot1Q 6
 bridge-group 1
!
!Configure the bridged virtual interface that will act as the 
!gateway for VLAN 6.
interface BVI1
 ip address 192.168.1.1 255.255.255.0
!
!Very basic configuration of an IPv4 eBGP neighborship with the 
!PE for the purposes of making the LAB go.
router bgp 100
 bgp log-neighbor-changes
 neighbor 10.1.1.2 remote-as 500
 !
 address-family ipv4
  no synchronization
  redistribute connected
  neighbor 10.1.1.2 activate
  no auto-summary
 exit-address-family

Then, turning to PE2, we can see that the sub-interface for vlan 6 is pushed into an xconnect.

hostname PE2
!
!Pseudowire class used to set the encapsulation for the xconnect
pseudowire-class CLASS_ONE
 encapsulation mpls
!
interface Loopback0
 ip address 2.2.2.2 255.255.255.255
!
!A standard /30 IP address is configured on the main interface. 
!The sub-interface, however, listens for a VLAN 6 tag and pushes 
!traffic into an xconnect.
interface FastEthernet1/0
 description link to Site B
 ip address 10.1.1.2 255.255.255.252
 duplex full
 speed 100
!
interface FastEthernet1/0.2
 description VLAN 6 link to CE1
 encapsulation dot1Q 6
 xconnect 1.1.1.1 100 pw-class CLASS_ONE

Likewise on the PE1 side the configuration of the xconnect is very similar:

hostname PE1
!
pseudowire-class CLASS_ONE
 encapsulation mpls
!
interface FastEthernet1/1.6
 description VLAN 6 link to S1
 encapsulation dot1Q 6
 xconnect 2.2.2.2 100 pw-class CLASS_ONE
!

We can verify the successful connection of the xconnect using the show xconnect peer <ip> vcid <id> command.

PE1#sh xconnect peer 2.2.2.2 vcid 100
Legend:    XC ST=Xconnect State  S1=Segment1 State  
S2=Segment2 State  UP=Up DN=Down  AD=Admin Down      
IA=Inactive  SB=Standby  RV=Recovering  NH=No Hardware

XC ST  Segment 1                   S1 Segment 2          S2
------+---------------------------+--+-------------------+--
UP     ac   Fa1/1.6:6(Eth VLAN)    UP mpls 2.2.2.2:100    UP
PE1#

Additionally we can see MAC learning on the bridge group of router CE1 (c208.0d06.0000 is the MAC address for Host 4):

CE1#show bridge 1

Total of 300 station blocks, 298 free
Codes: P - permanent, S - self

Bridge Group 1:

    Address       Action   Interface       Age   RX count   TX count
c206.0d04.0000   forward   Fa1/0.6           0          5          4
c208.0d06.0000   forward   Fa0/0.6           0          5          5
CE1#
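
If you want to check this kind of output in a script rather than by eye, a small parser is enough. The sketch below is plain Python run against a pasted copy of the table above. The regular expression is my own and assumes the column layout shown here, so it may need adjusting for other IOS versions.

```python
import re

# Table rows copied from the "show bridge 1" output above.
SHOW_BRIDGE = """\
    Address       Action   Interface       Age   RX count   TX count
c206.0d04.0000   forward   Fa1/0.6           0          5          4
c208.0d06.0000   forward   Fa0/0.6           0          5          5
"""

# MAC in dotted-hex form, then action, interface and three counters.
ENTRY = re.compile(
    r"^([0-9a-f]{4}\.[0-9a-f]{4}\.[0-9a-f]{4})\s+(\w+)\s+(\S+)"
    r"\s+(\d+)\s+(\d+)\s+(\d+)$"
)


def parse_bridge_table(text):
    """Return {mac: (action, interface)} for every entry line in the output."""
    table = {}
    for line in text.splitlines():
        match = ENTRY.match(line.strip())
        if match:
            mac, action, interface = match.group(1, 2, 3)
            table[mac] = (action, interface)
    return table


table = parse_bridge_table(SHOW_BRIDGE)
# Host 4's MAC should be learned on the WAN side, towards PE2.
print(table["c208.0d06.0000"])
```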

And finally we can see that we are able to run a ping from Host4 to Host2:

Host4#ping 192.168.1.50

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.1.50, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), 
round-trip min/avg/max = 128/167/224 ms
Host4#