Multihoming without a PE-to-CE Dynamic Routing Protocol

This quirk looks at how a multihomed site without a CE-to-PE routing protocol, like eBGP, can run into failover problems when using a first hop redundancy protocol.

The setup is as follows:

blog5_image1_base_setup

The CE routers in this case are Cisco 887 routers. The WAN connections are ADSL lines. From the CE routers, PPP sessions connect to the provider LNS/BNG routers (PE1 and PE2). These PPP sessions run over L2TP tunnels between the LAC and LNS. RADIUS is used by the LNS routers to authenticate the PPP sessions and to obtain IP and routing attributes.

CE1 and CE2 are running HSRP. CE1 is Active. The CE LAN interfaces are switchports and the IP/HSRP configurations are on SVIs for the access VLAN. Both CEs have a static default route pointing to the dialer interface for their respective WAN connections. CE1 tracks its dialer interface so that it can lower its HSRP priority if the WAN connection fails (allowing CE2 to take over).

Outbound traffic is routed via the HSRP Active router.

Inbound traffic works as follows:

When an LNS router authenticates a PPP session, it sends an Access-Request to the RADIUS server. The RADIUS server, when sending its Access-Accept to confirm the user is valid, also returns RADIUS attributes that the LNS router parses and applies to its configuration. For example, the attributes can indicate which IP address to assign to the user – a Framed-IP that will show on the dialer interface of the CE. The standard Framed-Route attribute can also be used to include static routes.

In this scenario Framed-IP and Framed-Route RADIUS attributes (among others not detailed here) are returned, which gives a WAN IP to the CE and installs a static route onto the LNS router. Each PPP session has one or more LAN ranges associated with it. The static route points traffic for these LAN ranges to the Framed-IP assigned for the PPP session.

The site in this scenario has a /28 network assigned to it. The primary PPP session from CE1 receives two static routes – one for each of the two /29s that the /28 is made up of. The secondary PPP session from CE2 receives a single /28 static route.

These static routes are redistributed into the iBGP running in the service provider network. In the event that a PPP session drops, the associated static routes will be removed from the LNS routers.

Under normal circumstances, incoming traffic will follow either of the two more specific /29s down the primary WAN connection.
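To make the longest-prefix-match behaviour concrete, here is a minimal Python sketch. It is only an illustration, not anything that runs on the LNS routers. It assumes the site's /28 is 123.123.123.0/28 (the range used in the HSRP configuration later in this post), and the labels "CE1" and "CE2" stand in for the Framed-IP next hops of the two PPP sessions.

```python
import ipaddress

def best_route(routes, destination):
    """Return the (prefix, next_hop) pair with the longest matching prefix, or None."""
    dest = ipaddress.ip_address(destination)
    matches = [(net, nh) for net, nh in routes if dest in net]
    if not matches:
        return None
    return max(matches, key=lambda item: item[0].prefixlen)

# Assumed provider-side view of the site's /28 (illustrative addressing).
routes = [
    (ipaddress.ip_network("123.123.123.0/29"), "CE1"),  # primary session, first /29
    (ipaddress.ip_network("123.123.123.8/29"), "CE1"),  # primary session, second /29
    (ipaddress.ip_network("123.123.123.0/28"), "CE2"),  # secondary session, whole /28
]

# Both sessions up: the more specific /29 wins.
normal = best_route(routes, "123.123.123.10")

# Primary PPP session drops: both /29 static routes are withdrawn.
after_failure = best_route([r for r in routes if r[1] != "CE1"], "123.123.123.10")

print(normal[1], after_failure[1])
```

With both sessions up the lookup returns the /29 via CE1. Once the /29s are withdrawn, the same destination falls back to the /28 via CE2.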

There are other ways to prefer one WAN connection over another (using BGP attributes when redistributing or similar) but I’ve used this subnet splitting approach for simplicity.

In the event that the primary WAN connection fails, the following occurs:

For outbound traffic: CE1 lowers its HSRP priority allowing CE2 to take over. Outgoing traffic now goes via CE2.

For inbound traffic: The PPP session on PE1 will drop and both of the /29 static routes will be removed. This leaves only the /28 route, so inbound traffic is forwarded down the secondary WAN connection.

blog5_image2_wan_failover

But what happens if the FastEthernet0 LAN interface on CE1 fails?

HSRP will fail over, meaning outbound traffic will leave the site via the secondary WAN connection as expected.

However, because the PPP session does not drop, the two /29 static routes to CE1 remain in place. Return traffic will traverse this WAN link and end up at CE1. With its LAN down, CE1 has no route to the destination and will send the traffic back over its default route. Traffic will then loop between CE1 and PE1 until the TTL decrements to zero. The site has lost connectivity.
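The loop can be pictured with a toy Python model. This is a deliberately simplified sketch: it only tracks which router holds the packet, and the starting TTL of 64 used in the example is an assumption chosen for illustration.

```python
def forward(ttl, lan_up):
    """Follow an inbound packet from PE1 and return (outcome, hops taken)."""
    node, hops = "PE1", 0
    while ttl > 0:
        ttl -= 1
        hops += 1
        if node == "PE1":
            node = "CE1"              # the /29 static routes still point at CE1
        elif lan_up:
            return "delivered", hops  # CE1 has a connected route to the LAN
        else:
            node = "PE1"              # no LAN route: CE1 follows its default back
    return "ttl expired", hops

print(forward(64, lan_up=True))
print(forward(64, lan_up=False))
```

With the LAN up the packet is delivered after two hops. With the LAN down it bounces between PE1 and CE1 until the TTL runs out.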

blog5_image3_lan_failover_problem

A reconfiguration is needed in order to allow for this situation, which is sometimes called “LAN-side failover”.

The Search

The first and most obvious question might be, why not run a routing protocol, like eBGP, between the PEs and CEs? The PE router would learn about the LAN range over this protocol rather than having static routes. The CEs would use redistribute connected and in the event that the LAN failed, this advertisement would cease.

There are a couple of reasons why you might not want to run a dynamic PE-to-CE routing protocol. Firstly, there could be a lot of incoming subscriber sessions on the LNS routers. The overhead involved in running so many eBGP sessions might be too much compared to simply using RADIUS attributes. Secondly, not all CPEs can support BGP, or whatever PE-to-CE protocol you want to run. Granted, an 887 can, but not all devices have this capability.

So with that said, let’s look at how to deal with this issue. There are several options to resolve this quirk. I’ll explore two of them here, each of which takes a different approach.

The first option is to ensure that in the event that the LAN interface goes down, the CE router automatically brings down the WAN connection.

Depending on the CPE used, there can be multiple ways to do this. In the case of a Cisco 887, a good way to do this is with EEM scripting. The EEM script can be made to trigger based on a tracking object for the LAN interface. You will also need to make sure that a second EEM script is configured to bring the WAN link back up if the LAN link is restored. I will show an example of such a script below.

An alternative approach is to ensure that there is a direct link between the Active and Standby routers in addition to the regular LAN link. Both LAN connections into each CE router would be in the same VLAN, allowing connection to the SVI. This would mean that if Fa0 dropped, HSRP would not fail over. Traffic leaving the site would still go via CE1, but it would pass through CE2 first and use the direct link between them.

blog5_image4_lan_ce_to_ce_link

As a side note, it is worth mentioning that one might mistakenly think that CE2, upon receiving outbound traffic, would forward it directly out of its WAN interface in accordance with its default route (causing asymmetric routing when the traffic returns via CE1). But this doesn’t happen. What needs to be remembered is that the routers’ interfaces are switchports and the destination MAC address will still be 0000.0c07.acxx (where xx is the HSRP group number in hexadecimal). CE1 still holds this MAC, meaning CE2 will switch the frame onwards through its switchport rather than routing the traffic.
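As a quick illustration of that virtual MAC format, here is a small Python helper. It assumes HSRP version 1, which is what the 0000.0c07.acxx format corresponds to; the function name is just something made up for this example.

```python
def hsrp_v1_virtual_mac(group):
    """Build the HSRP version 1 virtual MAC for a group number (0-255)."""
    if not 0 <= group <= 255:
        raise ValueError("HSRP version 1 group numbers range from 0 to 255")
    # The last byte of the MAC is the group number in hexadecimal.
    return "0000.0c07.ac{:02x}".format(group)

print(hsrp_v1_virtual_mac(10))
```

For HSRP group 10 this gives 0000.0c07.ac0a, the MAC that hosts on the LAN resolve for the virtual gateway address.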

In my experience this option is preferable. A single cable run and access port configuration is all that is needed. EEM Scripts can be unreliable at times and might not trigger when they should. Having said that, if this needs to be done on the CPE after deployment and remote hands are not possible, the EEM script might be the best approach.

The Work

The general HSRP setup could be as follows:

hostname CE1
!
interface Vlan10
 description SVI for LAN
 ip address 123.123.123.2 255.255.255.240
 standby 10 ip 123.123.123.1
 standby 10 priority 200
 standby 10 preempt
 standby 10 track 1 decrement 150
!
track 1 interface Dialer0 ip routing
!

The EEM script described above will need to trigger when Fa0 goes down. For that, the following tracker is used:

track 2 interface FastEthernet0 line-protocol

These EEM applets will shut down the WAN connection if the tracker goes down and restore it if the tracker comes back up:

event manager applet LAN_FAILOVER_DOWN
 event track 2 state down
 action 1.0 syslog msg "Fa0 down. Shutting down controller interface"
 action 2.0 cli command "enable"
 action 3.0 cli command "configure terminal"
 action 4.0 cli command "controller vdsl 0"
 action 5.0 cli command "shutdown"
 action 6.0 cli command "end"
 action 7.0 syslog msg "Controller interface shutdown complete"
!
event manager applet LAN_FAILOVER_UP
 event track 2 state up
 action 1.0 syslog msg "Fa0 up. Enabling controller interface."
 action 2.0 cli command "enable"
 action 3.0 cli command "configure terminal"
 action 4.0 cli command "controller vdsl 0"
 action 5.0 cli command "no shutdown"
 action 6.0 cli command "end"
 action 7.0 syslog msg "Controller interface enabled."

When Fa0 drops, the syslog entries look like this:

Feb 27 14:42:18 GMT: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet0, changed state to down
Feb 27 14:42:19 GMT: %TRACKING-5-STATE: 2 interface Fa0 line-protocol Up->Down
Feb 27 14:42:19 GMT: %HA_EM-6-LOG: LAN_FAILOVER_DOWN: Fa0 down. Shutting down controller interface
Feb 27 14:42:19 GMT: %CONTROLLER-5-UPDOWN: Controller VDSL 0, changed state to administratively down
Feb 27 14:42:19 GMT: %SYS-5-CONFIG_I: Configured from console by on vty1 (EEM:LAN_FAILOVER_DOWN)
Feb 27 14:42:19 GMT: %HA_EM-6-LOG: LAN_FAILOVER_DOWN: Controller interface shutdown complete

And when it is restored…

Feb 27 14:43:53 GMT: %LINK-3-UPDOWN: Interface FastEthernet0, changed state to up
Feb 27 14:43:53 GMT: %HA_EM-6-LOG: LAN_FAILOVER_UP: Fa0 up. Enabling controller interface.
Feb 27 14:43:54 GMT: %SYS-5-CONFIG_I: Configured from console by on vty1 (EEM:LAN_FAILOVER_UP)
Feb 27 14:43:54 GMT: %HA_EM-6-LOG: LAN_FAILOVER_UP: Controller interface enabled.
Feb 27 14:44:54 GMT: %CONTROLLER-5-UPDOWN: Controller VDSL 0, changed state to up

The second option is simpler and does not require much configuration at all. All we’d need to do is run a cable from Fa1 on CE1 to Fa1 on CE2 and put the following configuration under Fa1:

interface fa1
 description link to other CE for LAN failover
 switchport
 switchport mode access
 switchport access vlan 10

There isn’t much else to show for this solution other than to reiterate that with this in place, HSRP would not fail over and traffic in both directions would flow via CE2’s switchports.

There are other ways to tackle this problem that I have not detailed here (using EtherChannel on the LAN perhaps, or something involving floating static routes) and any alternative ideas would be good to hear about and interesting to discuss. Thanks for reading.

 

MPLS Management misconfiguration

There are many different ways for ISPs to manage MPLS devices like routers and firewalls that are deployed to customer sites. This quirk explores one such solution and looks at a scenario where a misconfiguration results in VRF route leaking between customers.

The quirk

When an ISP deploys Customer Edge (CE) devices to customers’ sites they might, and often do, want to maintain management access. For customers with a simple public internet connection this is usually straightforward – the device is reachable over the internet and an ACL or similar policy will be configured, allowing access from only a list of approved ISP IP addresses (for extra security VPNs could be used).

However, when Peer-to-Peer L3VPN MPLS is used, it is more complicated. The customer network is not directly accessible from the internet without going through some kind of breakout site. The ISP will either need a link into each customer’s MPLS network or must configure access through the breakout. This can become complicated as the number of customers, and the number of sites per customer, increases.

One option, presented in this quirk, is to have all MPLS customers’ PE-CE WAN subnets come from a common supernet range. These WAN subnets can then be exported into a common management VRF using a specific RT. The network that will be used to demonstrate this looks as follows:

blog4_image1_base_setup

This is available for download as a GNS3 lab from here. It includes the solution to the quirk as detailed below.

The ISP’s ASN is 500. The two customers have ASNs 100 and 200 (depending on the setup these would typically be private ASNs, but they have been shown here as 100 and 200 for simplicity). A management router (MGMT) in ASN 64512 has access to the PE-CE WAN ranges for all of the customers, all of which come from the supernet 172.30.0.0/16. A special subnet within this range, 172.30.254.0/24, is reserved for the Management network itself. The MGMT router, or MPLS jump box as it may also be called, is connected to this range – as would any other devices requiring access to the MPLS customers’ devices (backup or monitoring systems for instance… not shown).

The basic idea is that each customer VRF exports its PE-CE WAN ranges with an RT of 500:501. The MGMT VRF then imports this RT.

Alongside this, the MGMT VRF exports its own routes (from the 172.30.254.0/24 subnet) with an RT of 500:500. All of the customer VRFs import 500:500.

This has two key features:

  • Customer WAN ranges will all be from the 172.30.0.0/16 supernet and must not overlap between customers.
  • WAN ranges and site subnets are not, at any point, leaked between customer VRFs.
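The import and export behaviour can be modelled in a few lines of Python. This is only a sketch of the RT matching idea, not how BGP implements it. The prefixes are taken from the lab addressing, the 500:2 RT for CUST_2 follows the lab output, and the data structures and function name are made up for the example.

```python
# Import RTs per VRF, following the design described above.
VRF_IMPORTS = {
    "CUST_1": {"500:1", "500:500"},
    "CUST_2": {"500:2", "500:500"},
    "MGMT_VRF": {"500:500", "500:501"},
}

# (prefix, RTs attached on export). WAN ranges also carry 500:501.
ROUTES = [
    ("172.30.1.0/30", {"500:1", "500:501"}),   # CUST_1 PE-CE WAN
    ("192.168.50.0/24", {"500:1"}),            # CUST_1 LAN
    ("172.30.1.8/30", {"500:2", "500:501"}),   # CUST_2 PE-CE WAN
    ("192.168.60.0/24", {"500:2"}),            # CUST_2 LAN
    ("172.30.254.0/24", {"500:500"}),          # management subnet
]

def imported(vrf_name):
    """Return the prefixes a VRF imports: those sharing at least one RT with it."""
    wanted = VRF_IMPORTS[vrf_name]
    return {prefix for prefix, rts in ROUTES if rts & wanted}

for name in VRF_IMPORTS:
    print(name, sorted(imported(name)))
```

In this model the management VRF ends up with every WAN range plus its own subnet, while each customer VRF sees only its own routes and the management subnet.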

To get a better idea of how it works, take a look at the following diagram:

blog4_image2_mpls_mgmt_concept

The CLI for each customer VRF setup looks as follows:

ip vrf CUST_1
 description Customer_1_VRF
 rd 500:1
 vpn id 500:1
 export map VRF_EXPORT_MAP
 route-target export 500:1
 route-target import 500:1
 route-target import 500:500
!
route-map VRF_EXPORT_MAP permit 10
 match ip address prefix-list VRF_WANS_EXCEPT_MGMT
 set extcommunity rt 500:501 additive
route-map VRF_EXPORT_MAP permit 20
!
ip prefix-list VRF_WANS_EXCEPT_MGMT seq 10 deny 172.30.254.0/24 le 32
ip prefix-list VRF_WANS_EXCEPT_MGMT seq 20 permit 172.30.0.0/16 le 32

Note that the export map used on customer VRFs makes a point of excluding routes in the Management subnet (172.30.254.0/24). This is done on the off chance that the range exists within the customer’s VRF table.
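For anyone less familiar with prefix-list evaluation, the following Python sketch mimics the first-match logic of VRF_WANS_EXCEPT_MGMT. It is a simplified model that only handles the "le" keyword used here, and the function name is an assumption for this example.

```python
import ipaddress

# (action, network, le) entries in sequence order, mirroring VRF_WANS_EXCEPT_MGMT.
ENTRIES = [
    ("deny", "172.30.254.0/24", 32),
    ("permit", "172.30.0.0/16", 32),
]

def prefix_list_permits(entries, prefix):
    """First matching entry decides; no match means the implicit deny applies."""
    candidate = ipaddress.ip_network(prefix)
    for action, network, le in entries:
        net = ipaddress.ip_network(network)
        # Match when the candidate sits inside the entry's network
        # and its prefix length does not exceed "le".
        if candidate.subnet_of(net) and candidate.prefixlen <= le:
            return action == "permit"
    return False

print(prefix_list_permits(ENTRIES, "172.30.1.0/30"))
print(prefix_list_permits(ENTRIES, "172.30.254.0/24"))
print(prefix_list_permits(ENTRIES, "192.168.50.0/24"))
```

Only the first prefix is permitted, so only it would be tagged with 500:501 by sequence 10 of the export map. The other two fall through to sequence 20 and are exported without the extra RT.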

The VRF for the Management network is configured as follows (note this is only configured on PE3 in the above lab):

ip vrf MGMT_VRF
 description VRF for Management of Customer CEs
 rd 500:500
 vpn id 500:500
 route-target export 500:500
 route-target import 500:500
 route-target import 500:501

This results in the WAN ranges for customers being tagged with the 500:501 RT but not the LAN ranges.

PE1#sh bgp vpnv4 unicast vrf CUST_1 172.30.1.0/30
BGP routing table entry for 500:1:172.30.1.0/30, version 9
Paths: (1 available, best #1, table CUST_1)
  Advertised to update-groups:
    1         3

  Local
    0.0.0.0 from 0.0.0.0 (1.1.1.1)
      Origin incomplete, metric 0, localpref 100, weight 32768, valid, sourced, best
      Extended Community: RT:500:1 RT:500:501
      mpls labels in/out 23/aggregate(CUST_1)

PE1#sh bgp vpnv4 unicast vrf CUST_1 192.168.50.0/24
BGP routing table entry for 500:1:192.168.50.0/24, version 3
Paths: (1 available, best #1, table CUST_1)
  Advertised to update-groups:
    3

  100
    172.30.1.2 from 172.30.1.2 (192.168.50.1)
      Origin incomplete, metric 0, localpref 100, valid, external, best
      Extended Community: RT:500:1
      mpls labels in/out 24/nolabel
PE1#

192.168.50.0/24, above, is one of the LAN ranges and does not have the 500:501 RT.

Every VRF can see the management network and the management network can see all the PE-CE WAN ranges for every customer:

PE1#sh ip route vrf CUST_2

Routing Table: CUST_2
Codes: C - connected, S - static, R - RIP, M - mobile, B - BGP
       D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
       N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
       E1 - OSPF external type 1, E2 - OSPF external type 2
       i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1
       L2 - IS-IS level-2, ia - IS-IS inter area, * - candidate default
       U - per-user static route, o - ODR
       P - periodic downloaded static route

Gateway of last resort is not set

B       192.168.60.0/24 [20/0] via 172.30.1.10, 01:32:17
        172.30.0.0/30 is subnetted, 3 subnets
B         172.30.254.0 [200/0] via 3.3.3.3, 01:32:09
B         172.30.1.4 [200/0] via 2.2.2.2, 01:32:09
C         172.30.1.8 is directly connected, FastEthernet1/0
B       192.168.50.0/24 [200/0] via 2.2.2.2, 01:32:09

PE1#
PE3#sh ip route vrf MGMT_VRF

Routing Table: MGMT_VRF
Codes: C - connected, S - static, R - RIP, M - mobile, B - BGP
       D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
       N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
       E1 - OSPF external type 1, E2 - OSPF external type 2
       i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1
       L2 - IS-IS level-2, ia - IS-IS inter area, * - candidate default
       U - per-user static route, o - ODR
       P - periodic downloaded static route

Gateway of last resort is not set

        172.30.0.0/30 is subnetted, 4 subnets
C         172.30.254.0 is directly connected, FastEthernet0/0
B         172.30.1.0 [200/0] via 1.1.1.1, 01:32:24
B         172.30.1.4 [200/0] via 2.2.2.2, 01:32:24
B         172.30.1.8 [200/0] via 1.1.1.1, 01:32:24

PE3#

Also, note that the routing table for Customer 2 (vrf CUST_2) cannot see the 172.30.1.0/30 WAN range for Customer 1 (vrf CUST_1).

Given the proper config, the MGMT router can access the WAN ranges for customers:

MGMT#telnet 172.30.1.2
Trying 172.30.1.2 ... Open

User Access Verification
Password:
CE1-1>

NB. I’m not advocating using telnet in such an environment. Use SSH as a minimum when you can.

The quirk comes in when a simple misconfiguration introduces route leaking between customer VRFs.

Consider an engineer accidentally configuring a VRF that exports all its vpnv4 prefixes with RT 500:500 (rather than only exporting its PE-CE WAN routes with RT 500:501 as described above). The mistake is easy enough to make and will cause routes from the newly configured VRF to be imported by all other customer VRFs. This will have a severe impact for any customers with the same route within their VRF.

To demonstrate this, imagine that the CUST_1 VRF is not yet configured. A traceroute from Customer 2 Site 2 (CE2-2 on the lower left side of the diagram) with a source of 192.168.60.1 to Customer 2 Site 1 (CE1-2) with a destination of 192.168.50.1 works fine:

CE2-2#trace 192.168.50.1 source lo1

Type escape sequence to abort.
Tracing the route to 192.168.50.1
 1 172.30.1.9 12 msec 24 msec 24 msec
 2 10.10.14.4 [AS 500] [MPLS: Labels 16/24 Exp 0] 92 msec 64 msec 44 msec
 3 172.30.1.5 [AS 500] [MPLS: Label 24 Exp 0] 48 msec 68 msec 52 msec
 4 172.30.1.6 [AS 500] 116 msec 88 msec 104 msec

CE2-2#

If the CUST_1 VRF is now set up with the aforementioned misconfiguration, route leaking between CUST_1 and CUST_2 will result:

PE1(config)#ip vrf CUST_1
PE1(config-vrf)# description Customer_1_VRF
PE1(config-vrf)# rd 500:1
PE1(config-vrf)# vpn id 500:1
PE1(config-vrf)# route-target export 500:1
PE1(config-vrf)# route-target import 500:1
PE1(config-vrf)# route-target export 500:500
PE1(config-vrf)#
PE1(config-vrf)# interface FastEthernet0/1
PE1(config-if)# description Link to CE 1 for Customer 1
PE1(config-if)# ip vrf forwarding CUST_1
PE1(config-if)# ip address 172.30.1.1 255.255.255.252
PE1(config-if)# duplex auto
PE1(config-if)# speed auto
PE1(config-if)# no shut
PE1(config-if)#exit
PE1(config)#router bgp 500
PE1(config-router)# address-family ipv4 vrf CUST_1
PE1(config-router-af)# redistribute connected
PE1(config-router-af)# redistribute static
PE1(config-router-af)# neighbor 172.30.1.2 remote-as 100
PE1(config-router-af)# neighbor 172.30.1.2 description Customer 1 Site 1
PE1(config-router-af)# neighbor 172.30.1.2 activate
PE1(config-router-af)# neighbor 172.30.1.2 default-originate
PE1(config-router-af)# neighbor 172.30.1.2 as-override
PE1(config-router-af)# neighbor 172.30.1.2 route-map CUST_1_SITE_1_IN in
PE1(config-router-af)# no synchronization
PE1(config-router-af)# exit-address-family
PE1(config-router)#

VRF CUST_1 will export its routes (including 192.168.50.0/24 from Customer 1 Site 1 – CE1-1) and the VRF CUST_2 will import these routes due to the RT of 500:500.

Looking at the BGP and routing tables for the CUST_2 VRF shows that the next hop for 192.168.50.0/24 is now the CE1-1 router.

PE1#sh ip route vrf CUST_2 192.168.50.0
Routing entry for 192.168.50.0/24
  Known via "bgp 500", distance 20, metric 0
  Tag 100, type external
  Last update from 172.30.1.2 00:02:45 ago
  Routing Descriptor Blocks:
  * 172.30.1.2 (CUST_1), from 172.30.1.2, 00:02:45 ago
      Route metric is 0, traffic share count is 1
      AS Hops 1
      Route tag 100

PE1#sh bgp vpnv4 unicast vrf CUST_2 192.168.50.0
BGP routing table entry for 500:2:192.168.50.0/24, version 21
Paths: (2 available, best #1, table CUST_2)
  Advertised to update-groups:
    2

  100, imported path from 500:1:192.168.50.0/24
    172.30.1.2 from 172.30.1.2 (192.168.50.1)
      Origin incomplete, metric 0, localpref 100, valid, external, best
      Extended Community: RT:500:1 RT:500:500

  200
    2.2.2.2 (metric 20) from 5.5.5.5 (5.5.5.5)
      Origin incomplete, metric 0, localpref 100, valid, internal
      Extended Community: RT:500:2
      Originator: 2.2.2.2, Cluster list: 5.5.5.5
      mpls labels in/out nolabel/24

PE1#

There are now two possible paths to reach 192.168.50.0/24: one imported from the VRF for CUST_1 and one from its own VRF (coming from CE1-2). The path via AS 100 is preferred because it is an external (eBGP) path, which BGP best-path selection favours over the internal (iBGP) path when the earlier attributes are equal. Note the 500:500 RT in this path.

Once this is done, CE2-2 cannot reach its 192.168.50.0/24 subnet on CE1-2.

CE2-2#trace 192.168.50.1 source lo1
Type escape sequence to abort.

Tracing the route to 192.168.50.1
1 172.30.1.9 8 msec 12 msec 12 msec
2 * * *
3 * * *
4 * * *
...output omitted for brevity

Granted, this issue is caused by a mistake, but the difference between the correct and incorrect commands is minimal. An engineer under pressure or working quickly could potentially disrupt a massive MPLS infrastructure resulting in outages for multiple customers.

The search

As mentioned at the beginning of this blog, there are multiple ways to manage an MPLS network.

One possibility is to have a single router that, rather than import and export WAN routes based on RTs, has a single loopback address in each VRF. It is from this loopback that the router will source SSH or telnet sessions to the customer CE devices. For example:

interface loopback 1
 description Loopback source for Customer 1
 ip vrf forwarding CUST_1
 ip address 100.100.100.100 255.255.255.255
!
interface loopback 2
 description Loopback source for Customer 2
 ip vrf forwarding CUST_2
 ip address 100.100.100.100 255.255.255.255

MGMT# telnet 172.30.1.2 /vrf CUST_1

This has a number of advantages:

  • This router acts as a single jump host (rather than a subnet), which could be considered more secure
  • There is no restriction on the WAN addresses for each customer. They can be any WAN range at all and can overlap between customers.
  • The same IP address can be used for each VRF’s loopback (as long as it doesn’t clash with any existing IPs already in the customer’s VRF).

However there are a number of disadvantages:

  • Each VRF must be configured on this jump router
  • This jump router is a single point of failure
  • The command to log on is more complex and requires the users to know the VRF’s exact name rather than just the router IP.
  • Migrating to this solution, from the aforementioned RT import/export solution, would be a cumbersome and long process.
  • Centralised MPLS backups could be complicated if there is not a common subnet (like 172.30.254.0/24) reachable by all CE devices.

For these reasons it was decided not to use this solution. Rather, it was decided to use import filtering to prevent this issue from taking place even if the misconfiguration occurred. The import filtering uses a route-map that makes the following sequential checks:

    1. If a route has the RT 500:500 and is from the management range (172.30.254.0/24) allow it.
    2. If any other route has the RT 500:500, deny it.
    3. Allow the import of all other routes.

Essentially, rather than just importing 500:500, this route-map checks to make sure that a vpnv4 prefix comes from the management range of 172.30.254.0/24. The biggest issue in this scenario was the deployment of this route-map to all VRFs on all PEs. But with a little bit of scripting (I won’t go into the details here), this was far more plausible than the option of deploying a multi-VRF jump router.

The work

The route map described in the above section looks as follows:

ip extcommunity-list standard VRF_MGMT_COMMUNITY permit rt 500:500
ip prefix-list VRF_MGMT_LAN seq 5 permit 172.30.254.0/24 le 32
!
route-map VRF_IMPORT_MAP permit 10
 match ip address prefix-list VRF_MGMT_LAN
 match extcommunity VRF_MGMT_COMMUNITY
!
route-map VRF_IMPORT_MAP deny 20
 match extcommunity VRF_MGMT_COMMUNITY
!
route-map VRF_IMPORT_MAP permit 30

NB. This is a good example of AND/OR operation in a route map. Match statements of different types in the same sequence (in this case a prefix list and an extcommunity list) are treated as a conjunction (AND). Multiple values within a match statement of the same type are treated as a disjunction (OR).

This will prevent the issue from occurring as it will stop the import of any vpnv4 prefix that has an RT of 500:500 unless it is from the management range.
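The decision logic of VRF_IMPORT_MAP can be sketched in Python as follows. This is an illustrative model of the three sequences only, with the function and constant names invented for the example; it is not a general route-map evaluator.

```python
import ipaddress

MGMT_RANGE = ipaddress.ip_network("172.30.254.0/24")
MGMT_RT = "500:500"

def import_allowed(prefix, rts):
    """Model the three sequences of VRF_IMPORT_MAP for one vpnv4 prefix."""
    net = ipaddress.ip_network(prefix)
    has_mgmt_rt = MGMT_RT in rts
    in_mgmt_range = net.subnet_of(MGMT_RANGE)
    if has_mgmt_rt and in_mgmt_range:  # sequence 10: both matches must hold (AND)
        return True
    if has_mgmt_rt:                    # sequence 20: any other 500:500 route
        return False
    return True                        # sequence 30: permit everything else

# Genuine management route.
print(import_allowed("172.30.254.0/24", {"500:500"}))
# Customer LAN exported with 500:500 by mistake.
print(import_allowed("192.168.50.0/24", {"500:1", "500:500"}))
# The customer's own route, untouched by the filter.
print(import_allowed("192.168.50.0/24", {"500:2"}))
```

The second case is the misconfiguration from earlier: the prefix carries 500:500 but sits outside the management range, so it is rejected at sequence 20.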

Here is the configuration of this import map on PE1 (the other PEs are not shown but it should be configured on them too):

PE1#conf t
Enter configuration commands, one per line. End with CNTL/Z.
PE1(config)# ip extcommunity-list standard VRF_MGMT_COMMUNITY permit rt 500:500
PE1(config)#ip prefix-list VRF_MGMT_LAN seq 5 permit 172.30.254.0/24 le 32
PE1(config)#!
PE1(config)#route-map VRF_IMPORT_MAP permit 10
PE1(config-route-map)# match ip address prefix-list VRF_MGMT_LAN
PE1(config-route-map)# match extcommunity VRF_MGMT_COMMUNITY
PE1(config-route-map)#!
PE1(config-route-map)#route-map VRF_IMPORT_MAP deny 20
PE1(config-route-map)# match extcommunity VRF_MGMT_COMMUNITY
PE1(config-route-map)#!
PE1(config-route-map)#route-map VRF_IMPORT_MAP permit 30
PE1(config-route-map)#
PE1(config-route-map)#ip vrf CUST_2
PE1(config-vrf)#import map VRF_IMPORT_MAP

After this addition, in the event that the misconfiguration takes place when creating the CUST_1 VRF, the import map will block the 192.168.50.0/24 subnet. The only path that the CUST_2 VRF has to 192.168.50.0/24 is from CE1-2, which is correct. Here is the configuration and resulting verification:

PE1(config)#ip vrf CUST_1
PE1(config-vrf)# description Customer_1_VRF
PE1(config-vrf)# rd 500:1
PE1(config-vrf)# vpn id 500:1
PE1(config-vrf)# route-target export 500:1
PE1(config-vrf)# route-target import 500:1
PE1(config-vrf)# route-target export 500:500
PE1#sh ip route vrf CUST_2 192.168.50.0
Routing entry for 192.168.50.0/24
  Known via "bgp 500", distance 200, metric 0
  Tag 200, type internal
  Last update from 2.2.2.2 00:22:12 ago
  Routing Descriptor Blocks:
  * 2.2.2.2 (Default-IP-Routing-Table), from 5.5.5.5, 00:22:12 ago
    Route metric is 0, traffic share count is 1
    AS Hops 1
    Route tag 200

PE1#sh bgp vpnv4 unicast vrf CUST_2 192.168.50.0
BGP routing table entry for 500:2:192.168.50.0/24, version 12
Paths: (1 available, best #1, table CUST_2)
Advertised to update-groups:
    2
  200
    2.2.2.2 (metric 20) from 5.5.5.5 (5.5.5.5)
      Origin incomplete, metric 0, localpref 100, valid, internal, best
      Extended Community: RT:500:2
      Originator: 2.2.2.2, Cluster list: 5.5.5.5
      mpls labels in/out nolabel/24
PE1#
CE2-2#trace 192.168.50.1 source lo1

Type escape sequence to abort.
Tracing the route to 192.168.50.1

 1 172.30.1.9 12 msec 24 msec 8 msec
 2 10.10.14.4 [AS 500] [MPLS: Labels 18/24 Exp 0] 60 msec 68 msec 64 msec
 3 172.30.1.5 [AS 500] [MPLS: Label 24 Exp 0] 52 msec 68 msec 44 msec
 4 172.30.1.6 [AS 500] 84 msec 56 msec 56 msec

CE2-2#

Management of the correct WAN device is still working as well…

MGMT#telnet 172.30.1.10
Trying 172.30.1.10 ... Open

User Access Verification

Password:
CE2-2>

Just for good measure, and to double check that our route-map is making a difference, let’s see what happens if we remove the import map from the CUST_2 VRF.

PE1#conf t
Enter configuration commands, one per line. End with CNTL/Z.
PE1(config)#ip vrf CUST_2
PE1(config-vrf)#no import map VRF_IMPORT_MAP
PE1(config-vrf)#^Z
PE1#
*Mar 1 00:27:45.259: %SYS-5-CONFIG_I: Configured from console by console
PE1#sh bgp vpnv4 unicast vrf CUST_2 192.168.50.0
BGP routing table entry for 500:2:192.168.50.0/24, version 22
Paths: (2 available, best #1, table CUST_2)
Flag: 0x820
  Advertised to update-groups:
    2
  100, imported path from 500:1:192.168.50.0/24
    172.30.1.2 from 172.30.1.2 (192.168.50.1)
      Origin incomplete, metric 0, localpref 100, valid, external, best
      Extended Community: RT:500:1 RT:500:500
  200
    2.2.2.2 (metric 20) from 5.5.5.5 (5.5.5.5)
      Origin incomplete, metric 0, localpref 100, valid, internal
      Extended Community: RT:500:2
      Originator: 2.2.2.2, Cluster list: 5.5.5.5
      mpls labels in/out nolabel/24
PE1#

The offending route is imported into the CUST_2 VRF pretty quickly, proving that our route-map works. If the route map is put back in place, and we wait for the BGP Scanner to run (after 30 seconds or less) the vpnv4 prefix is blocked again:

PE1#conf t
Enter configuration commands, one per line. End with CNTL/Z.
PE1(config)#ip vrf CUST_2
PE1(config-vrf)#import map VRF_IMPORT_MAP
PE1(config-vrf)#^Z
PE1#
*Mar 1 00:29:51.443: %SYS-5-CONFIG_I: Configured from console by console
PE1#sh bgp vpnv4 unicast vrf CUST_2 192.168.50.0
BGP routing table entry for 500:2:192.168.50.0/24, version 24
Paths: (1 available, best #1, table CUST_2)
Flag: 0x820
  Advertised to update-groups:
    2
  200
    2.2.2.2 (metric 20) from 5.5.5.5 (5.5.5.5)
      Origin incomplete, metric 0, localpref 100, valid, internal, best
      Extended Community: RT:500:2
      Originator: 2.2.2.2, Cluster list: 5.5.5.5
      mpls labels in/out nolabel/24
PE1#

This quirk shows just one way to successfully configure MPLS management and protect against misconfiguration. Give me a shout if anything was unclear or if you have any thoughts. As mentioned earlier, the GNS3 lab is available for download so have a tinker and see what you think.

Site by Site MPLS Breakout Migration

This month’s quirk is a bit late. I have been studying furiously and managed to pass my Deploying Cisco Service Provider Advanced Network Routing exam last week. Only two to go before I get CCNP SP. 🙂

Another plus side is that I have a tonne of study notes that I will be uploading over the next few weeks. So anyone interested in Multicast, BGP or IPv6 watch this space.

Anyways, this quirk looks at a design solution whereby a 100+ site MPLS customer needed to change the Service Provider for their primary internet breakout one site at a time…

 

The quirk     

The customer had an L3VPN MPLS cloud with a new ISP, but still had their primary internet breakout with their old ISP.

The below diagram shows a stripped down version of such a network, illustrating the basic idea:

blog3_image1_base_setup

So whilst all of the MPLS sites connected to the new ISP’s core, the link to the internet was still going out through a site that connected to the old provider.

The customer needed to move the default route and primary breakout over but did not want to do a single “big bang” migration and move all of the sites at once. Rather, they wanted to migrate one site at a time.

The search

The first step in looking at how to accomplish this was to break down the requirements. The following conditions needed to be met:

  • Each site must still be able to access all other sites and the file/application servers at the primary breakout site. These servers would be moved to the new ISP connection and breakout site 2 last of all.
  • As each site moves over to the new breakout, they only need PAT to gain access to the internet – no public services are run at the remote sites.
  • The PI space held by the customer, used for public facing services on the application servers, would be moved to the new provider once all sites were migrated.
  • Sites must be able to be moved one at a time without affecting any other sites.
  • The majority of MPLS sites were single homed with a static default.

Looking at these requirements gave us a good idea of what we needed to achieve.

Policy based routing was considered first: adjusting either the next hop or the VRF based on the source address. However, this would require too much overhead in identifying the sites that had been moved, either by community value or by source prefix, combined with setting the next hop or VRF to use.

Ultimately, the use of a second VRF with “all but default” route leaking was decided upon. This involved creating a second VRF with a default route pointing to the new ISP breakout. All routes except the defaults were to be leaked between these VRFs.

This meant that all we needed to do to migrate a site was change the VRF to which the attachment circuit belonged.
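
As a rough illustration of the idea, the sketch below models two VRFs in Python. Every non-default route carries a shared "leak" route target that both VRFs import, while each default route carries only its own VRF's route target. The class, the function names and the route-target values are placeholders chosen for the example, not taken from any router API.

```python
from dataclasses import dataclass, field

LEAK_RT = "100:100"  # assumed shared route target carried by every non-default route


@dataclass
class Vrf:
    name: str
    own_rt: str
    default_next_hop: str
    sites: set = field(default_factory=set)  # site prefixes attached to this VRF

    def exported(self):
        """Yield (prefix, route_targets) for everything this VRF advertises."""
        yield "0.0.0.0/0", {self.own_rt}  # the default never receives LEAK_RT
        for prefix in self.sites:
            yield prefix, {self.own_rt, LEAK_RT}

    def table(self, all_vrfs):
        """Prefixes visible in this VRF after importing own_rt and LEAK_RT."""
        wanted = {self.own_rt, LEAK_RT}
        routes = {}
        for vrf in all_vrfs:
            for prefix, rts in vrf.exported():
                if rts & wanted:
                    is_default = prefix == "0.0.0.0/0"
                    routes[prefix] = vrf.default_next_hop if is_default else vrf.name
        return routes


def migrate(prefix, src, dst):
    """Moving a site is only a change of VRF membership."""
    src.sites.remove(prefix)
    dst.sites.add(prefix)


old = Vrf("OLD", "100:1", "old-breakout", {"192.168.1.0/24", "192.168.2.0/24"})
new = Vrf("NEW", "100:2", "new-breakout")
migrate("192.168.1.0/24", old, new)
print(new.table([old, new]))
```

After the migration, the moved site still sees the remaining site in the old VRF (and vice versa), but each VRF only ever sees its own default route.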

It is worth highlighting that had there been a significant number of multihomed sites implementing BGP, using policy based routing may have been preferred. This is because a large number of BGP neighborships would need to be reconfigured to the correct VRF.

The work

The below output has been taken from a simulation. The MPLS sites have been represented using Loopbacks 1-3 on PE_RTR.

First we will take a look at a traceroute to the internet (to IP 50.50.50.50) and the routing table for the original VRF before any changes were made: 

PE_RTR#sh ip route vrf CUST-A-OLD-ISP

Routing Table: CUST-A-OLD-ISP
Codes: C - connected, S - static, R - RIP, M - mobile, B - BGP
 D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
 N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
 E1 - OSPF external type 1, E2 - OSPF external type 2
 i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
 ia - IS-IS inter area, * - candidate default, U - per-user static route
 o - ODR, P - periodic downloaded static route

Gateway of last resort is 1.1.1.1 to network 0.0.0.0

 10.0.0.0/30 is subnetted, 3 subnets
B 10.10.11.0/30 [200/0] via 2.2.2.2, 00:15:34
B 10.10.10.0/30 [200/0] via 1.1.1.1, 00:15:34
B 10.20.20.0/30 [200/0] via 2.2.2.2, 00:15:34
C 192.168.1.0/24 is directly connected, Loopback1
C 192.168.2.0/24 is directly connected, Loopback2
C 192.168.3.0/24 is directly connected, Loopback3
B 192.168.50.0/24 [200/0] via 1.1.1.1, 00:15:34
B 192.168.51.0/24 [200/0] via 2.2.2.2, 00:15:34
B 192.168.101.0/24 [200/0] via 1.1.1.1, 00:16:19
B* 0.0.0.0/0 [200/5] via 1.1.1.1, 00:15:43
PE_RTR#

PE_RTR#trace vrf CUST-A-OLD-ISP 50.50.50.50 source lo1

Type escape sequence to abort.
Tracing the route to 50.50.50.50

 1 10.1.3.2 [MPLS: Labels 17/20 Exp 0] 116 msec 72 msec 48 msec
 2 10.10.10.1 [MPLS: Label 20 Exp 0] 24 msec 44 msec 24 msec
 3 10.10.10.2 20 msec 20 msec 36 msec
 4 192.168.50.1 28 msec 56 msec 24 msec
 5 100.100.100.1 116 msec 52 msec 72 msec
 6 100.111.111.1 64 msec 140 msec 60 msec
PE_RTR#

So the WAN range of the breakout in this simulation is 100.100.100.0/29. This is their PI space. Notice the range 192.168.101.0/24, which is the subnet that the file/application servers are on.

The VRF configuration on the PEs is straightforward.

ip vrf CUST-A-OLD-ISP
 description VRF for Old ISP Breakout
 rd 100:1
 route-target export 100:1
 route-target import 100:1

Before we created the new VRF, we needed a way to differentiate what can and cannot be leaked. For this we used filtering when exporting RTs. We designated the RT 100:100 for routes that should be leaked.

First we started by making a prefix list that catches the default route:

ip prefix-list defaultRoute seq 5 permit 0.0.0.0/0
ip prefix-list defaultRoute seq 50 deny 0.0.0.0/0 le 32

Then we specified a route-map that attached the RT 100:100 to prefixes that are not the default route:

route-map ALL-EXCEPT-DEFAULT permit 10
 match ip address prefix-list defaultRoute
!
route-map ALL-EXCEPT-DEFAULT permit 20
 set extcommunity rt 100:100 additive

Note the use of the additive keyword so as not to overwrite any existing communities.
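
To make the logic of the prefix-list and route-map easier to follow, here is a small Python model of how they are evaluated. It is only a sketch: the function names are made up for the example, and the prefix-list matching is simplified to first-match with an optional le value and an implicit deny at the end.

```python
import ipaddress


def prefix_list_permits(entries, prefix):
    """First-match evaluation of (action, network, le) entries; implicit deny at the end."""
    net = ipaddress.ip_network(prefix)
    for action, base, le in entries:
        base_net = ipaddress.ip_network(base)
        # Without 'le' the prefix length must match exactly.
        max_len = le if le is not None else base_net.prefixlen
        if net.subnet_of(base_net) and base_net.prefixlen <= net.prefixlen <= max_len:
            return action == "permit"
    return False


# Mirrors the two 'defaultRoute' entries shown above.
DEFAULT_ROUTE = [("permit", "0.0.0.0/0", None), ("deny", "0.0.0.0/0", 32)]


def export_map(prefix, route_targets):
    """Clause 10 passes the default untouched; clause 20 adds RT 100:100 to the rest."""
    if prefix_list_permits(DEFAULT_ROUTE, prefix):
        return set(route_targets)
    # 'additive': keep whatever route targets are already attached.
    return set(route_targets) | {"100:100"}
```

Calling export_map("192.168.1.0/24", {"100:1"}) returns both 100:1 and 100:100, whereas the default route keeps only 100:1 and so is never imported by the other VRF.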

Once we had these set up, we created the new VRF and applied this route-map in the form of an export-map to set the correct RTs. We made sure to import 100:100 and then applied the same to the original VRF.

ip vrf CUST-A-NEW-ISP
 description VRF for New ISP Breakout
 rd 100:2
 export map ALL-EXCEPT-DEFAULT
 route-target export 100:2
 route-target import 100:100
 route-target import 100:2
!
ip vrf CUST-A-OLD-ISP
 description VRF for Old ISP Breakout
 rd 100:1
 export map ALL-EXCEPT-DEFAULT
 route-target export 100:1
 route-target import 100:100
 route-target import 100:1

From here, after deploying this to all the relevant PEs and injecting a new default route, the migration from one VRF to another was fairly straightforward. Below shows an example using a simulated loopback (the principle would be the same for the incoming attachment circuit to a customer site):

PE_RTR(config)#interface Loopback1
PE_RTR(config-if)# ip vrf forwarding CUST-A-NEW-ISP
% Interface Loopback1 IP address 192.168.1.1 removed due to enabling VRF CUST-A-NEW-ISP
PE_RTR(config-if)# ip address 192.168.1.1 255.255.255.0

If we look at the routing table for this new VRF we see the following:

PE_RTR#sh ip route vrf CUST-A-NEW-ISP

Routing Table: CUST-A-NEW-ISP
Codes: C - connected, S - static, R - RIP, M - mobile, B - BGP
 D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
 N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
 E1 - OSPF external type 1, E2 - OSPF external type 2
 i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
 ia - IS-IS inter area, * - candidate default, U - per-user static route
 o - ODR, P - periodic downloaded static route

Gateway of last resort is 2.2.2.2 to network 0.0.0.0

 10.0.0.0/30 is subnetted, 3 subnets
B 10.10.11.0/30 [200/0] via 2.2.2.2, 00:16:16
B 10.10.10.0/30 [200/0] via 1.1.1.1, 00:16:16
B 10.20.20.0/30 [200/0] via 2.2.2.2, 00:16:16
C 192.168.1.0/24 is directly connected, Loopback1
B 192.168.2.0/24 is directly connected, 00:16:17, Loopback2
B 192.168.3.0/24 is directly connected, 00:16:23, Loopback3
B 192.168.50.0/24 [200/0] via 1.1.1.1, 00:16:16
B 192.168.51.0/24 [200/0] via 2.2.2.2, 00:16:18
B 192.168.101.0/24 [200/0] via 1.1.1.1, 00:16:20
B* 0.0.0.0/0 [200/0] via 2.2.2.2, 00:16:18
PE_RTR#

An interesting side note here is that even though Loopback2 and 3 are directly connected, they are shown as having been learned through BGP. This is the result of the import from the original VRF. Indeed upon closer inspection of one of the prefixes we see the 100:100 community:

PE_RTR#sh bgp vpnv4 unicast vrf CUST-A-NEW-ISP 192.168.3.0/24
BGP routing table entry for 100:2:192.168.3.0/24, version 47
Paths: (1 available, best #1, table CUST-A-NEW-ISP)
 Not advertised to any peer
 Local, imported path from 100:1:192.168.3.0/24
 0.0.0.0 from 0.0.0.0 (3.3.3.3)
 Origin incomplete, metric 0, localpref 100, weight 32768, valid, external, best
 Extended Community: RT:100:1 RT:100:100
 mpls labels in/out nolabel/aggregate(CUST-A-OLD-ISP)

And looking at the default route we see no such community and a different next hop from the original table:

PE_RTR#sh bgp vpnv4 unicast vrf CUST-A-NEW-ISP 0.0.0.0
BGP routing table entry for 100:2:0.0.0.0/0, version 40
Paths: (1 available, best #1, table CUST-A-NEW-ISP)
 Not advertised to any peer
 65489
 2.2.2.2 (metric 3) from 2.2.2.2 (2.2.2.2)
 Origin incomplete, metric 5, localpref 200, valid, internal, best
 Extended Community: RT:100:2
 mpls labels in/out nolabel/23

The old VRF’s table still shows a route for the newly migrated site (although now learned via BGP) and the default route is still as it was originally:

PE_RTR#sh ip route vrf CUST-A-OLD-ISP

Routing Table: CUST-A-OLD-ISP
Codes: C - connected, S - static, R - RIP, M - mobile, B - BGP
 D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
 N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
 E1 - OSPF external type 1, E2 - OSPF external type 2
 i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
 ia - IS-IS inter area, * - candidate default, U - per-user static route
 o - ODR, P - periodic downloaded static route

Gateway of last resort is 1.1.1.1 to network 0.0.0.0

 10.0.0.0/30 is subnetted, 3 subnets
B 10.10.11.0/30 [200/0] via 2.2.2.2, 00:15:34
B 10.10.10.0/30 [200/0] via 1.1.1.1, 00:15:34
B 10.20.20.0/30 [200/0] via 2.2.2.2, 00:15:34
B 192.168.1.0/24 is directly connected, 00:15:36, Loopback1
C 192.168.2.0/24 is directly connected, Loopback2
C 192.168.3.0/24 is directly connected, Loopback3
B 192.168.50.0/24 [200/0] via 1.1.1.1, 00:15:34
B 192.168.51.0/24 [200/0] via 2.2.2.2, 00:15:34
B 192.168.101.0/24 [200/0] via 1.1.1.1, 00:16:19
B* 0.0.0.0/0 [200/5] via 1.1.1.1, 00:15:43
PE_RTR#
PE_RTR#sh bgp vpnv4 unicast vrf CUST-A-OLD-ISP 0.0.0.0
BGP routing table entry for 100:1:0.0.0.0/0, version 15
Paths: (1 available, best #1, table CUST-A-OLD-ISP)
 Not advertised to any peer
 65489
 1.1.1.1 (metric 3) from 1.1.1.1 (1.1.1.1)
 Origin incomplete, metric 0, localpref 100, valid, internal, best
 Extended Community: RT:100:1
 mpls labels in/out nolabel/26

Finally, a traceroute test shows that the newly migrated site accesses the internet via a different site and can still access the application server subnet:

PE_RTR#trace vrf CUST-A-NEW-ISP 50.50.50.50 source lo1

Type escape sequence to abort.
Tracing the route to 50.50.50.50

 1 10.1.3.2 [MPLS: Labels 16/20 Exp 0] 44 msec 40 msec 52 msec
 2 10.20.20.1 [MPLS: Label 20 Exp 0] 32 msec 36 msec 52 msec
 3 10.20.20.2 52 msec 40 msec 32 msec
 4 192.168.51.1 54 msec 39 msec 31 msec
 5 200.200.200.2 68 msec 60 msec 32 msec
 6 200.222.222.2 65 msec 143 msec 62 msec

PE_RTR#
PE_RTR#trace vrf CUST-A-NEW-ISP 192.168.101.1 source lo1

Type escape sequence to abort.
Tracing the route to 192.168.101.1

 1 10.1.3.2 [MPLS: Labels 16/22 Exp 0] 56 msec 52 msec 44 msec
 2 10.10.10.1 [MPLS: Label 22 Exp 0] 36 msec 24 msec 24 msec
 3 10.10.10.2 40 msec 40 msec 36 msec
 4 192.168.50.1 26 msec 57 msec 23 msec
 5 192.168.101.1 32 msec 48 msec 36 msec
PE_RTR#

One final point to make is that advertising the PI space to both providers for backup purposes was a possibility. AS-path prepending could have been used from breakout site 2 to make it less preferred. But complications come into play depending on how each provider advertises the PI space and whether they honour any adjustments that the customer makes. Should return traffic not follow the same path, stateful firewall sessions would also encounter difficulty.

So a pretty straightforward solution in the end, but interesting from a migration standpoint. I am interested to hear thoughts on whether anyone would have taken a different approach. Perhaps we should have done policy based routing or maybe another solution? As usual, thoughts are always welcome.

Asymmetric routing caused by unfiltered redistribution

This quirk demonstrates how the different administrative distances of BGP, combined with the Best Path Selection algorithm, can cause asymmetric routing if redistribution isn’t done carefully.

As a reminder, each blog will follow 3 sections: The quirk, the search and the work. The quirk describes the problem, the search shows how a solution was reached and the work shows the technical and CLI aspects.

The quirk

The scenario we will be looking at is as follows:

blog2_image1_base_setup

The network consists of an MPLS core with multiple remote sites (only one is shown here). There is a dual homed breakout site, which passes through a firewall (performing security and address translation services as normal) and onwards to an internet facing WAN connection.

A default route is learned over eBGP from the Provider Edge router (PE4) connected to the internet facing Customer Edge router (CE4). This is redistributed into OSPF. The MPLS facing Customer Edge routers (CE1 and CE2) redistribute OSPF into BGP using the redistribute ospf 1 match internal external 2 command. The default and local 10.200.0.0/24 routes are advertised to the Provider Edge Routers (PE1 and PE2) and into the MPLS core. PE1 gives the routes received from CE1 a local preference of 200 making this WAN link preferred.

So that the breakout firewall has a path back to the MPLS sites, every MPLS site’s range is advertised through eBGP into the MPLS core before being sent to CE1 and CE2 and redistributed into OSPF.

The quirk comes into play when you consider that, at this stage, no filtering of any kind is applied to the redistribution. Combine that with the order in which the BGP sessions of CE1 and CE2 establish and we quickly see problems with return traffic from the internet headed back to an MPLS site.

Consider the following sequence of events:

  1. CE2 establishes its eBGP neighborship to PE2 before CE1 establishes its session to PE1. CE2 learns about the MPLS LAN ranges from PE2. These eBGP learned routes have an AD of 20.
  2. CE2 redistributes these eBGP prefixes into the OSPF link state database (LSDB).
  3. CE1 receives the Type 5 LSAs and installs these prefixes into its RIB. These OSPF prefixes have an AD of 110.
  4. Without filtering, CE1 will redistribute these into BGP. BGP will give them a weight of 32,768 (because they are redistributed and thus locally sourced). Another, and sometimes overlooked, aspect is that these locally generated routes will be given an AD of 200.
  5. CE1 now establishes its neighborship to PE1 and receives the prefixes for the MPLS sites over eBGP (just as CE2 did). These eBGP prefixes are installed in the BGP RIB and have an AD of 20. They have a weight of 0 since they are learned from a neighbor.
  6. Now CE1 has to choose the best path back to any given MPLS site. One might think that the decision is easy, by comparing Administrative Distances. CE1 knows about the MPLS sites through eBGP and OSPF. eBGP’s AD is 20. OSPF’s AD is 110. Therefore eBGP should win, right? Not quite. When a router receives paths to a given destination from multiple routing sources, it uses the Administrative Distance to judge the trustworthiness of the protocol – with the lowest one being most trusted. But what needs to be considered here is that each routing protocol will put its best route forward to be considered… and in the case of BGP this could result in routes with different ADs. Let’s follow what happens:

OSPF has only one E2 route, which has an AD of 110. So OSPF puts this forward.

However the BGP Router process has two options to choose from. It runs through the BGP Best Path Selection Algorithm to decide (for a reminder of its steps take a look at this document).

It doesn’t get very far before a decision is made. In fact, it is on the first step! The route redistributed from OSPF has a weight of 32,768 whereas the one learned from its eBGP neighbor has a weight of 0. Higher weight wins, so BGP selects the prefix that was learned through redistribution and puts it forward. Remember this route has an AD of 200…

CE1 looks at its options and chooses the routing source with the lowest AD, which in this case is OSPF. As a result the OSPF route is installed in the IP RIB.

  7. CE1 does not even redistribute its eBGP learned prefixes into OSPF. Redistribution takes place from the IP RIB and there are no BGP routes in there.
  8. Because of this, the breakout firewall only sees routes for the MPLS sites from CE2 and sets CE2 as the next hop.
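
The sequence above can be condensed into a small Python model. This is a deliberately simplified sketch: only the first two steps of the best path algorithm are modelled, the AD values are the Cisco IOS defaults mentioned in the text, and all class and function names are invented for the example.

```python
from dataclasses import dataclass

# Default administrative distances referred to in the steps above.
AD_EBGP, AD_OSPF, AD_BGP_LOCAL = 20, 110, 200


@dataclass
class BgpPath:
    source: str          # "ebgp" or "local" (redistributed on this router)
    weight: int
    local_pref: int = 100

    @property
    def admin_distance(self):
        return AD_EBGP if self.source == "ebgp" else AD_BGP_LOCAL


def bgp_best(paths):
    """Only the first two best-path steps: highest weight, then highest local preference."""
    return max(paths, key=lambda p: (p.weight, p.local_pref))


def rib_winner(bgp_paths, ospf_present):
    """Each protocol offers one candidate; the lowest administrative distance is installed."""
    candidates = []
    if bgp_paths:
        candidates.append(("bgp", bgp_best(bgp_paths).admin_distance))
    if ospf_present:
        candidates.append(("ospf", AD_OSPF))
    return min(candidates, key=lambda c: c[1])[0]


ebgp = BgpPath("ebgp", weight=0)
redistributed = BgpPath("local", weight=32768)
print(rib_winner([ebgp, redistributed], ospf_present=True))
```

With both BGP paths present, BGP offers the AD 200 path and OSPF is installed. With only the eBGP path in the BGP table, BGP offers AD 20 and wins.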

From here, we can see that traffic leaving a remote MPLS site destined for the internet will go out via the primary CE1-PE1 link. However return traffic will go back via the CE2-PE2 link.

blog2_image2_traffic_path

Of course if CE1 establishes its BGP session first this is not an issue, but relying on that ordering is far from ideal. We needed to look at a way to either make sure CE1 brings up its BGP session first, prevent CE1 from learning routes from CE2, or prevent the redistribution back into BGP from OSPF.

 

The search

There are a number of ways to tackle this issue. Some better than others.

One possible approach would be to try to make sure that CE1 was always the first to bring its BGP peering up… or rather, to make sure that CE2 clears its BGP session if it detects CE1 bringing its BGP neighborship up. The following EEM script, configured on CE2, was used to test this idea:

event manager applet lanprimarywan
 event track 123 state up
 action 1.0 syslog msg "START_EEM_SCRIPT1: Clears BGP relationship when Primary Router's WAN link comes up"
 action 2.0 cli command "event timer countdown time 60"
 action 3.0 cli command "enable"
 action 4.0 cli command "clear ip bgp 10.10.1.6"
 action 5.0 cli command "end"
 action 6.0 syslog msg "BGP clear by EEM"
!
ip route 10.10.1.1 255.255.255.255 10.200.0.252
!
track 123 ip sla 123
!
ip sla 123
 icmp-echo 10.10.1.1 source-ip 10.200.0.253
 frequency 10
ip sla schedule 123 life forever start-time now
!

In short, CE2 would track the PE1 WAN interface. A static route has been included to make sure that it tracks it by going through CE1 (rather than its WAN connection). If this tracking object came up, CE2 would clear its BGP session. There is a delay timer put into the script to allow a minute for CE1 to bring up its BGP session.

There is a major problem with this approach however. Just because the WAN link is up doesn’t mean the PE1-CE1 BGP neighborship is up. The neighborship could drop for some other reason, without the link failing. If this happened CE2 would never clear its BGP session.

Plus, even if the tracking worked as expected, it might be deemed too disruptive to hard clear a BGP session for such an important site. As we will see, there are better options available.

A second possible approach involves preventing CE1 from learning any OSPF routes from CE2. This can be accomplished using a distribute-list. A distribute-list sits between the Shortest Path First calculation and the IP routing table. It doesn’t stop prefixes from entering the LSDB or affect the best route OSPF chooses. But it will prevent routes moving from the LSDB to the IP routing table. If a distribute-list is applied inbound and allows only the local LAN ranges and the default route, then the MPLS site prefixes will never enter CE1’s IP RIB from OSPF. Since redistribution is performed from the IP RIB, they will never show up in the BGP table.
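
As a toy model of where the filter sits, the Python sketch below keeps the LSDB intact and only filters what is installed into the RIB. The permitted prefixes are the default route and the local LAN range used in this scenario. The function name is made up for the example, and the matching is reduced to exact prefixes.

```python
# Exact-match permits; everything else is denied.
PERMITTED = {"0.0.0.0/0", "10.200.0.0/24"}


def install_from_ospf(lsdb_prefixes):
    """Return (lsdb, rib): the LSDB is untouched, only RIB installation is filtered."""
    lsdb = list(lsdb_prefixes)
    rib = [p for p in lsdb if p in PERMITTED]
    return lsdb, rib


lsdb, rib = install_from_ospf(["0.0.0.0/0", "10.200.0.0/24", "192.168.1.0/24"])
print(lsdb, rib)
```

The MPLS site prefix stays in the LSDB but never reaches the RIB, so there is nothing for the OSPF-to-BGP redistribution to pick up.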

The configuration would look as follows:

router ospf 1
 redistribute bgp 65489 metric 10 subnets
 network 10.200.0.0 0.0.0.255 area 0
 distribute-list prefix LOCALS_AND_DEFAULT in
!
ip prefix-list LOCALS_AND_DEFAULT seq 5 permit 0.0.0.0/0
ip prefix-list LOCALS_AND_DEFAULT seq 10 permit 10.200.0.0/24
ip prefix-list LOCALS_AND_DEFAULT seq 100 deny 0.0.0.0/0 le 32

This configuration works just fine, but there is a third option, outlined in the work section below, that allows for a cleaner approach. It makes use of tagging and filtering with route-maps.

When prefixes are advertised to CE2 over eBGP and redistributed into OSPF, we can tag the prefixes. We can then configure a route-map on CE1 that only allows prefixes that do not have this tag to be redistributed into BGP.
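
In rough terms the tag acts as a marker meaning "this prefix originally came from BGP". The Python sketch below shows the two halves of the idea. The function names and the route dictionaries are illustrative only, and the tag value is a placeholder that simply has to agree on both routers.

```python
MPLS_TAG = 10  # assumed tag value; any value works as long as both routers agree


def ce2_redistribute_bgp_into_ospf(bgp_prefixes):
    """CE2 stamps every eBGP-learned prefix with MPLS_TAG as it enters OSPF."""
    return [{"prefix": p, "tag": MPLS_TAG} for p in bgp_prefixes]


def ce1_redistribute_ospf_into_bgp(ospf_routes):
    """CE1 drops tagged routes and redistributes everything else into BGP."""
    return [r["prefix"] for r in ospf_routes if r.get("tag") != MPLS_TAG]


from_ce2 = ce2_redistribute_bgp_into_ospf(["192.168.1.0/24", "192.168.2.0/24"])
from_firewall = [{"prefix": "0.0.0.0/0", "tag": 0}]
print(ce1_redistribute_ospf_into_bgp(from_ce2 + from_firewall))
```

Only the untagged default route makes it back into BGP on CE1; the MPLS site prefixes are filtered out before they can compete with the eBGP-learned paths.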

Let’s explore the configuration of how this would be achieved.

 

The work

For this scenario I have built a GNS3 lab that looks as follows (this is available for download from the GNS3 page):

gns3_mpls_breakout_bgp_and_ospf_lab_7

Three MPLS sites are represented by loopbacks on the router named LNS (representing an L2TP Network Server in name only. It is simply a 3725 running BGP and MPLS). The ranges for these MPLS sites are 192.168.1-3.0/24. A loopback with IP 50.50.50.50/32 on the INTERNET router (the cloud image) is used to simulate a public IP.

Here is the base configuration for CE1 and CE2 as far as OSPF and BGP are concerned:

hostname CE1
!
router ospf 1
 router-id 11.11.11.11
 log-adjacency-changes
 redistribute bgp 65489 metric 5 subnets
 network 10.200.0.0 0.0.0.255 area 0
!
router bgp 65489
 bgp log-neighbor-changes
 neighbor 10.10.1.1 remote-as 100
!
 address-family ipv4
  redistribute connected
  redistribute static
  redistribute ospf 1 match internal external 2
  neighbor 10.10.1.1 activate
  neighbor 10.10.1.1 allowas-in
  neighbor 10.10.1.1 soft-reconfiguration inbound
  neighbor 10.10.1.1 route-map BLOCK_LOCALS_AND_DEFAULT in
  neighbor 10.10.1.1 route-map ALLOW_LOCALS_AND_DEFAULT out
  default-information originate
  no auto-summary
  no synchronization
 exit-address-family
!
ip prefix-list LOCALS_AND_DEFAULT seq 5 permit 0.0.0.0/0
ip prefix-list LOCALS_AND_DEFAULT seq 10 permit 10.200.0.0/24
ip prefix-list LOCALS_AND_DEFAULT seq 100 deny 0.0.0.0/0 le 32
!
route-map BLOCK_LOCALS_AND_DEFAULT deny 10
 match ip address prefix-list LOCALS_AND_DEFAULT
!
route-map BLOCK_LOCALS_AND_DEFAULT permit 20
!
route-map ALLOW_LOCALS_AND_DEFAULT permit 10
 match ip address prefix-list LOCALS_AND_DEFAULT
!

hostname CE2
!
router ospf 1
 router-id 22.22.22.22
 log-adjacency-changes
 redistribute bgp 65489 metric 10 subnets
 network 10.200.0.0 0.0.0.255 area 0
!
router bgp 65489
 bgp log-neighbor-changes
 neighbor 10.10.1.5 remote-as 100
!
 address-family ipv4
  redistribute connected
  redistribute static
  redistribute ospf 1 match internal external 2
  neighbor 10.10.1.5 activate
  neighbor 10.10.1.5 allowas-in
  neighbor 10.10.1.5 soft-reconfiguration inbound
  neighbor 10.10.1.5 route-map BLOCK_LOCALS_AND_DEFAULT in
  neighbor 10.10.1.5 route-map ALLOW_LOCALS_AND_DEFAULT out
  default-information originate
  no auto-summary
  no synchronization
 exit-address-family
!
ip prefix-list LOCALS_AND_DEFAULT seq 5 permit 0.0.0.0/0
ip prefix-list LOCALS_AND_DEFAULT seq 10 permit 10.200.0.0/24
ip prefix-list LOCALS_AND_DEFAULT seq 100 deny 0.0.0.0/0 le 32
!
route-map BLOCK_LOCALS_AND_DEFAULT deny 10
 match ip address prefix-list LOCALS_AND_DEFAULT
!
route-map BLOCK_LOCALS_AND_DEFAULT permit 20
!
route-map ALLOW_LOCALS_AND_DEFAULT permit 10
 match ip address prefix-list LOCALS_AND_DEFAULT
!

Note that CE1 has a lower cost when redistributing routes. This is to ensure the breakout firewall will prefer going via CE1 given the option.

Let’s clear the BGP neighborship of CE1 and see what routes it selects:

CE1#clear ip bgp *
CE1#
*Mar 1 00:06:52.471: %BGP-5-ADJCHANGE: neighbor 10.10.1.1 Down User reset
CE1#
*Mar 1 00:06:53.759: %BGP-5-ADJCHANGE: neighbor 10.10.1.1 Up
CE1#
CE1#sh ip route
Codes: C - connected, S - static, R - RIP, M - mobile, B - BGP
 D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
 N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
 E1 - OSPF external type 1, E2 - OSPF external type 2
 i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
 ia - IS-IS inter area, * - candidate default, U - per-user static route
 o - ODR, P - periodic downloaded static route

Gateway of last resort is 10.200.0.1 to network 0.0.0.0

 100.0.0.0/30 is subnetted, 1 subnets
O E2 100.100.100.0 [110/20] via 10.200.0.1, 00:05:46, FastEthernet0/0
 99.0.0.0/29 is subnetted, 1 subnets
O 99.99.99.0 [110/2] via 10.200.0.1, 00:05:46, FastEthernet0/0
 10.0.0.0/8 is variably subnetted, 3 subnets, 2 masks
C 10.10.1.0/30 is directly connected, Serial0/0
B 10.10.1.4/30 [20/0] via 10.10.1.1, 00:00:21
C 10.200.0.0/24 is directly connected, FastEthernet0/0
O E2 192.168.1.0/24 [110/10] via 10.200.0.253, 00:00:24, FastEthernet0/0
O E2 192.168.2.0/24 [110/10] via 10.200.0.253, 00:00:24, FastEthernet0/0
O E2 192.168.3.0/24 [110/10] via 10.200.0.253, 00:00:24, FastEthernet0/0
O*E2 0.0.0.0/0 [110/5] via 10.200.0.1, 00:05:47, FastEthernet0/0
CE1#

So currently CE1 is preferring its E2 OSPF routes to reach the MPLS sites. When tracing from a remote MPLS site we see that it takes an outbound path across the PE1-CE1 link:

LNS#trace vrf CUST_A 50.50.50.50 source lo1

Type escape sequence to abort.
Tracing the route to 50.50.50.50

 1 10.1.3.2 [MPLS: Labels 18/21 Exp 0] 44 msec 48 msec 44 msec
 2 10.10.1.1 [MPLS: Label 21 Exp 0] 36 msec 32 msec 36 msec
 3 10.10.1.2 36 msec 32 msec 36 msec
 4 10.200.0.1 36 msec 84 msec 36 msec
 5 99.99.99.2 100 msec 96 msec 56 msec
 6 100.100.100.2 32 msec 80 msec 80 msec
LNS#

However the path back from the firewall traverses the CE2-PE2 link:

FW#trace 192.168.1.1

Type escape sequence to abort.
Tracing the route to 192.168.1.1

 1 10.200.0.253 32 msec 28 msec 8 msec
 2 10.10.1.5 36 msec 36 msec 36 msec
 3 10.1.2.2 [MPLS: Labels 16/20 Exp 0] 56 msec 40 msec 44 msec
 4 192.168.1.1 [MPLS: Label 20 Exp 0] 84 msec 60 msec 88 msec
FW#

The BGP table of CE1 helps to show what is happening:

CE1#sh bgp ipv4 unicast
BGP table version is 12, local router ID is 11.11.11.11
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal, r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

 Network Next Hop Metric LocPrf Weight Path
*> 0.0.0.0 10.200.0.1 5 32768 ?
*> 10.10.1.0/30 0.0.0.0 0 32768 ?
* 10.10.1.1 0 0 100 ?
*> 10.10.1.4/30 10.10.1.1 0 100 ?
*> 10.200.0.0/24 0.0.0.0 0 32768 ?
*> 99.99.99.0/29 10.200.0.1 2 32768 ?
*> 100.100.100.0/30 10.200.0.1 20 32768 ?
* 192.168.1.0 10.10.1.1 0 100 ?
*> 10.200.0.253 10 32768 ?
* 192.168.2.0 10.10.1.1 0 100 ?
*> 10.200.0.253 10 32768 ?
* 192.168.3.0 10.10.1.1 0 100 ?
*> 10.200.0.253 10 32768 ?
CE1#

We can see that there are two paths to each MPLS site. The reason for its best path selection becomes clear when taking a closer look at one of the prefixes:

CE1#sh bgp ipv4 unicast 192.168.1.0/24
BGP routing table entry for 192.168.1.0/24, version 4
Paths: (2 available, best #2, table Default-IP-Routing-Table)
 Not advertised to any peer
 100, (received & used)
 10.10.1.1 from 10.10.1.1 (1.1.1.1)
 Origin incomplete, localpref 100, valid, external
 Local
 10.200.0.253 from 0.0.0.0 (11.11.11.11)
 Origin incomplete, metric 10, localpref 100, weight 32768, valid, sourced, best
CE1#

The path via 10.10.1.1 has a weight of 0 since it is learned from an eBGP neighbor (as the word external implies). The path via CE2 is locally sourced (as the word sourced and the Local AS path imply) and has a weight of 32,768. Because of this, the second path, which has AD 200, is chosen as the best path and ultimately loses out to OSPF.

Now let’s look at fixing this using route-maps and tagging. The first step is to configure CE2 to tag any eBGP routes that it redistributes into OSPF with tag 10.

CE2#conf t
Enter configuration commands, one per line. End with CNTL/Z.
CE2(config)#route-map SET_TAG permit 10
CE2(config-route-map)#set tag 10
CE2(config-route-map)#exit
CE2(config)#router ospf 1
CE2(config-router)#redistribute bgp 65489 metric 10 subnets route-map SET_TAG

The OSPF LSDB now reflects this change:

CE2#sh ip ospf database external 192.168.1.0

 OSPF Router with ID (22.22.22.22) (Process ID 1)

 Type-5 AS External Link States

 LS age: 57
 Options: (No TOS-capability, DC)
 LS Type: AS External Link
 Link State ID: 192.168.1.0 (External Network Number )
 Advertising Router: 22.22.22.22
 LS Seq Number: 8000000C
 Checksum: 0xCE04
 Length: 36
 Network Mask: /24
 Metric Type: 2 (Larger than any link state path)
 TOS: 0
 Metric: 10
 Forward Address: 0.0.0.0
 External Route Tag: 10

CE2#

The next task is to configure CE1 to block redistribution for anything that has tag 10:

CE1#conf t
Enter configuration commands, one per line. End with CNTL/Z.
CE1(config)#route-map BLOCK_TAG deny 10
CE1(config-route-map)#match tag 10
CE1(config-route-map)#route-map BLOCK_TAG permit 20
CE1(config-route-map)#exit
CE1(config)#router bgp 65489
CE1(config-router)# redistribute ospf 1 match internal external 2 route-map BLOCK_TAG

The effect is immediate. CE1 now prefers the path over eBGP:

CE1#sh ip route
Codes: C - connected, S - static, R - RIP, M - mobile, B - BGP
 D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
 N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
 E1 - OSPF external type 1, E2 - OSPF external type 2
 i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
 ia - IS-IS inter area, * - candidate default, U - per-user static route
 o - ODR, P - periodic downloaded static route

Gateway of last resort is 10.200.0.1 to network 0.0.0.0

 100.0.0.0/30 is subnetted, 1 subnets
O E2 100.100.100.0 [110/20] via 10.200.0.1, 00:31:44, FastEthernet0/0
 99.0.0.0/29 is subnetted, 1 subnets
O 99.99.99.0 [110/2] via 10.200.0.1, 00:31:44, FastEthernet0/0
 10.0.0.0/8 is variably subnetted, 3 subnets, 2 masks
C 10.10.1.0/30 is directly connected, Serial0/0
B 10.10.1.4/30 [20/0] via 10.10.1.1, 00:26:19
C 10.200.0.0/24 is directly connected, FastEthernet0/0
B 192.168.1.0/24 [20/0] via 10.10.1.1, 00:00:26
B 192.168.2.0/24 [20/0] via 10.10.1.1, 00:00:26
B 192.168.3.0/24 [20/0] via 10.10.1.1, 00:00:26
O*E2 0.0.0.0/0 [110/5] via 10.200.0.1, 00:31:45, FastEthernet0/0
CE1#

In addition to this, there is now only one route for the MPLS sites in the BGP RIB:

CE1#show bgp ipv4 unicast
BGP table version is 15, local router ID is 11.11.11.11
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal, r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

 Network Next Hop Metric LocPrf Weight Path
*> 0.0.0.0 10.200.0.1 5 32768 ?
*> 10.10.1.0/30 0.0.0.0 0 32768 ?
* 10.10.1.1 0 0 100 ?
*> 10.10.1.4/30 10.10.1.1 0 100 ?
*> 10.200.0.0/24 0.0.0.0 0 32768 ?
*> 99.99.99.0/29 10.200.0.1 2 32768 ?
*> 100.100.100.0/30 10.200.0.1 20 32768 ?
*> 192.168.1.0 10.10.1.1 0 100 ?
*> 192.168.2.0 10.10.1.1 0 100 ?
*> 192.168.3.0 10.10.1.1 0 100 ?
CE1#

Just to double check, we can clear the CE1 BGP session again to make sure that the change sticks:

CE1#
CE1#clear ip bgp *
CE1#
*Mar 1 00:36:04.463: %BGP-5-ADJCHANGE: neighbor 10.10.1.1 Down User reset
*Mar 1 00:36:05.283: %BGP-5-ADJCHANGE: neighbor 10.10.1.1 Up
CE1#show ip route
Codes: C - connected, S - static, R - RIP, M - mobile, B - BGP
 D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
 N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
 E1 - OSPF external type 1, E2 - OSPF external type 2
 i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
 ia - IS-IS inter area, * - candidate default, U - per-user static route
 o - ODR, P - periodic downloaded static route

Gateway of last resort is 10.200.0.1 to network 0.0.0.0

 100.0.0.0/30 is subnetted, 1 subnets
O E2 100.100.100.0 [110/20] via 10.200.0.1, 00:35:32, FastEthernet0/0
 99.0.0.0/29 is subnetted, 1 subnets
O 99.99.99.0 [110/2] via 10.200.0.1, 00:35:32, FastEthernet0/0
 10.0.0.0/8 is variably subnetted, 3 subnets, 2 masks
C 10.10.1.0/30 is directly connected, Serial0/0
B 10.10.1.4/30 [20/0] via 10.10.1.1, 00:00:56
C 10.200.0.0/24 is directly connected, FastEthernet0/0
B 192.168.1.0/24 [20/0] via 10.10.1.1, 00:00:57
B 192.168.2.0/24 [20/0] via 10.10.1.1, 00:00:57
B 192.168.3.0/24 [20/0] via 10.10.1.1, 00:00:57
O*E2 0.0.0.0/0 [110/5] via 10.200.0.1, 00:35:33, FastEthernet0/0
CE1#

Success. The route-map has successfully blocked the tagged OSPF routes from being redistributed into the BGP table. As a result the route that the BGP Router process puts forth is the eBGP route, which wins over OSPF with an AD of 20.

A couple of side points to note: An alternative to this approach is to adjust the redistributed routes to make the BGP Best Path Algorithm select the eBGP route over the locally redistributed route. We could have done this using a route-map that resets the weight of the redistributed routes to zero and sets the local preference to 95 (below the default of 100). The config would look as follows:

router bgp 65489
 address-family ipv4
  redistribute ospf 1 match internal external 2 route-map LOWER_WEIGHT_AND_PREF
!
route-map LOWER_WEIGHT_AND_PREF permit 10
 set local-preference 95
 set weight 0

However in this network scenario, there is no real reason to redistribute the MPLS sites back into BGP. It is safer to block them entirely.
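
For completeness, here is a small Python sketch of why the weight and local preference adjustment would change the outcome. It compares paths on weight first and local preference second, which are only the first two steps of the real algorithm. The dictionaries and the function name are illustrative, with next hops taken from the lab output above.

```python
def best_path(paths):
    """Compare on weight first, then local preference (higher wins in both cases)."""
    return max(paths, key=lambda p: (p["weight"], p["local_pref"]))


ebgp = {"via": "10.10.1.1", "weight": 0, "local_pref": 100}
redistributed_default = {"via": "10.200.0.253", "weight": 32768, "local_pref": 100}
redistributed_adjusted = {"via": "10.200.0.253", "weight": 0, "local_pref": 95}

print(best_path([ebgp, redistributed_default])["via"])
print(best_path([ebgp, redistributed_adjusted])["via"])
```

With the default attributes the redistributed path wins on weight. Once the weight is reset to 0, the comparison falls through to local preference, where 95 loses to the eBGP path's 100.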

It’s also prudent to apply this configuration in the opposite direction as well (tag redistributed routes on CE1 and block them on CE2).

And finally, you might have noticed the route-maps applied inbound and outbound on the eBGP sessions in the base config shown above. These are done to avoid routes looping from BGP to OSPF and back into BGP. MPLS solutions often have multiple sites with the same private AS number meaning allowas-in or as-override must be used to bypass BGP loop prevention (whereby a router running BGP will ignore updates for prefixes that have its own AS number in the AS_PATH attribute). This tagging could easily be used on the outbound advertisements, instead of the prefix-lists shown above. Tagging is more dynamic than manually defining the local ranges using prefix-lists.
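
The loop prevention check and the two ways around it can be sketched in a few lines of Python. This is a simplified model with invented function names: allowas_in here is just the number of times the local AS may appear in the AS_PATH before the update is rejected, and the override function mimics what a PE would do on the customer's behalf.

```python
def accept_update(as_path, local_as, allowas_in=0):
    """Reject an update whose AS_PATH contains local_as more than allowas_in times."""
    return as_path.count(local_as) <= allowas_in


def as_override(as_path, customer_as, provider_as):
    """PE-side alternative: replace the customer AS with the provider AS before sending."""
    return [provider_as if asn == customer_as else asn for asn in as_path]


path_from_other_site = [100, 65489]  # provider AS 100, then the shared customer AS

print(accept_update(path_from_other_site, 65489))                # default behaviour
print(accept_update(path_from_other_site, 65489, allowas_in=1))  # CE-side relaxation
print(accept_update(as_override(path_from_other_site, 65489, 100), 65489))
```

Either relaxation lets sites sharing one private AS learn each other's prefixes, which is exactly why extra filtering is needed to stop routes looping between BGP and OSPF.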

Finally let’s confirm routing is following the same path inbound and outbound:

LNS#trace vrf CUST_A 50.50.50.50 source lo1

Type escape sequence to abort.
Tracing the route to 50.50.50.50

 1 10.1.3.2 [MPLS: Labels 18/21 Exp 0] 28 msec 36 msec 40 msec
 2 10.10.1.1 [MPLS: Label 21 Exp 0] 36 msec 36 msec 32 msec
 3 10.10.1.2 40 msec 24 msec 32 msec
 4 10.200.0.1 40 msec 48 msec 40 msec
 5 99.99.99.2 88 msec 56 msec 52 msec
 6 100.100.100.2 92 msec 84 msec 72 msec
LNS#
FW#trace 192.168.1.1 source fa0/0

Type escape sequence to abort.
Tracing the route to 192.168.1.1

 1 10.200.0.253 20 msec
 2 10.10.1.1 24 msec
 3 10.1.2.2 [MPLS: Labels 16/20 Exp 0] 60 msec
 4 192.168.1.1 [MPLS: Label 20 Exp 0] 44 msec 20 msec 44 msec
FW#

Looks good. Routing is symmetric and as expected.

There are more ways to solve this problem than I have shown here. Feel free to play around with the lab to see what you can come up with.

Feedback is more than welcome. Let me know if you found this blog useful or interesting. Thanks for reading.

Bridging Layer 2 Across the Core

Welcome to netquirks

As this is my first blog I thought I should write a bit of an introduction. This site is dedicated to looking at interesting and, as the name suggests, quirky scenarios in the world of Network Engineering.

I’ve also added some of my study notes, GNS3 labs and other bits and pieces, so feel free to have a look around. Details of the site, including the layout and a bit about myself can be found on the About page.

Generally blogs will be divided up into three sections: the quirk, the search and the work. The quirk describes the scenario, the search describes how a solution was arrived at and the work shows the technical and command line details. I’ll try and add a new blog once a month.

I will add to this site as time goes by. Any feedback is more than welcome…

 

Bridging Layer 2 Across the Core

This first scenario looks at a case where two remote sites needed to be connected at layer 2 across a Service Provider Core, but neither a single xconnect nor a change of BGP session type was possible. We ended up having to combine bridging, pseudowires and trunking to provide access…

 

The quirk

The picture below shows the basic setup. We needed to combine two layer 2 domains across an MPLS core. A new connection was brought into a switch on VLAN 6 at Site A. It needed to connect over to Site B. Under normal circumstances we would build a layer 2 xconnect/pseudowire between the sites, however in this circumstance we were not able to…

For a layer 2 xconnect to be configured the terminating device must be able to determine the next-hop label to push on the top of the frame. However the gateway of the Site B Layer 2 domain was a Cisco 7200 router which ran an Option A eBGP session to our PE. This meant it wasn’t getting labels over BGP. In addition, there was no LDP between the 7200 and the PE.

We couldn’t simply configure an Option B session (and consequently move the xconnect onto the 7200) because this would involve potential downtime for the site, which was unacceptable.

To make it worse, there were no cable runs between the two locations to bring up a simple layer 2 point-to-point.

It should also be noted that router-on-a-stick was used at Site B, meaning there were other VLANs connected to the 7200, each terminating on its own sub-interface.

In summary it looked as follows:

blog1_image1_setup

 

The search

Even though an xconnect could not go the full length, the decision was made to push one as far as was viable. So we began by creating an xconnect from PE1 to PE2. VLAN 6 was added to S1’s uplink trunk and the sub-interface that was created for it on PE1 was added to the xconnect (CLI to follow).

The problem we had to face was how to get the layer 2 connectivity around or through the 7200 with minimal disruption. A solution was found in bridging…

We configured a bridge domain on the 7200 and put two new sub-interfaces into the bridge-domain – one for the LAN interface and one for the WAN interface.

The gateway for this subnet was previously a layer 3 sub-interface on Gi0/0 (standard router-on-a-stick setup). This was changed to a BVI.

In a similar fashion, a sub-interface was set up on the connecting interface on PE2. This was added to the other end of the xconnect.

What we ultimately ended up with was something that looked like this:

blog1_image2_solution

Once this was set up we could see MAC learning and L2 connectivity across the core.

 

The work

The below GNS3 topology was put together to test and demonstrate the solution before putting it into practice. This can be downloaded from the GNS3 page.

GNS3_bridging_mpls_and_xconnects_Lab_5

LDP is running between the service provider routers and loopbacks are distributed via IS-IS. IPv4 and VPNv4 BGP sessions exist between the PEs. This config is not shown but is available in the lab download.

Host 4 represents the new incoming connection to VLAN 6. Host 2 represents a Site B device on VLAN 6. The other hosts are simply representative of other devices on other VLANs for the sake of variation.

If we look at the configuration of CE1 we can see the config behind the basic bridging setup:

hostname CE1
!
!enable irb and bridging
bridge irb
bridge 1 protocol ieee
bridge 1 route ip
!

!Configure the WAN sub-interface beneath the main interface, 
!assign it to the bridge domain and set the encapsulation to 
!vlan 6
interface FastEthernet0/0
 description link to PE2
 ip address 10.1.1.1 255.255.255.252
 duplex full
!
interface FastEthernet0/0.6
 description Bridged link to PE2
 encapsulation dot1Q 6
 !Technically the WAN interface need not have the same 
 !encapsulation as the LAN interface. But the sub-interface on 
 ! the PE must have the same encapsulation as this WAN interface.
 bridge-group 1
!
!The key here is that VLAN 6's sub-interface is added to the 
! bridge group using the bridge-group command
interface FastEthernet1/0.5
 description VLAN 5 GATEWAY
 encapsulation dot1Q 5
 ip address 172.16.1.1 255.255.255.0
!
interface FastEthernet1/0.6
 description L2 INTERFACE FOR VLAN 6
 encapsulation dot1Q 6
 bridge-group 1
!
!Configure the Bridge-Group Virtual Interface (BVI) that will 
!act as the gateway for VLAN 6.
interface BVI1
 ip address 192.168.1.1 255.255.255.0
!
!Very basic configuration of an IPv4 eBGP neighborship with the 
!PE for the purposes of making the LAB go.
router bgp 100
 bgp log-neighbor-changes
 neighbor 10.1.1.2 remote-as 500
 !
 address-family ipv4
  no synchronization
  redistribute connected
  neighbor 10.1.1.2 activate
  no auto-summary
 exit-address-family

Then, turning to PE2, we can see that the sub-interface for vlan 6 is pushed into an xconnect.

hostname PE2
!
!Pseudowire class used to set the encapsulation for the xconnect
pseudowire-class CLASS_ONE
 encapsulation mpls
!
interface Loopback0
 ip address 2.2.2.2 255.255.255.255
!
!A standard /30 IP address is configured on the main interfaces. 
!The sub-interface, however, listens for a VLAN 6 tag and pushes 
!traffic into an xconnect.
interface FastEthernet1/0
 description link to Site B
 ip address 10.1.1.2 255.255.255.252
 duplex full
 speed 100
!
interface FastEthernet1/0.2
 description VLAN 6 link to CE1
 encapsulation dot1Q 6
 xconnect 1.1.1.1 100 pw-class CLASS_ONE

Likewise on the PE1 side the configuration of the xconnect is very similar:

hostname PE1
!
pseudowire-class CLASS_ONE
 encapsulation mpls
!
interface FastEthernet1/1.6
 description VLAN 6 link to S1
 encapsulation dot1Q 6
 xconnect 2.2.2.2 100 pw-class CLASS_ONE
!

We can verify the successful connection of the xconnect using the show xconnect peer <ip> vcid <id> command.

PE1#sh xconnect peer 2.2.2.2 vcid 100
Legend:    XC ST=Xconnect State  S1=Segment1 State  
S2=Segment2 State  UP=Up DN=Down  AD=Admin Down      
IA=Inactive  SB=Standby  RV=Recovering  NH=No Hardware

XC ST  Segment 1                   S1 Segment 2          S2
------+---------------------------+--+-------------------+--
UP     ac   Fa1/1.6:6(Eth VLAN)    UP mpls 2.2.2.2:100    UP
PE1#
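If the summary view is not enough, a couple of other standard IOS show commands give more detail on the pseudowire. This is just a sketch of what could be run on PE1 in this lab. The output is left out because it was not captured as part of this write-up.

```
! Detailed VC state, label values and packet counters for VC ID 100
PE1#show mpls l2transport vc 100 detail
! Confirm there is an LDP session to the remote PE's loopback
PE1#show mpls ldp neighbor 2.2.2.2
```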

Additionally we can see MAC learning on the bridge group of router CE1 (c208.0d06.0000 is the MAC address for Host 4):

CE1#show bridge 1

Total of 300 station blocks, 298 free
Codes: P - permanent, S - self

Bridge Group 1:

    Address       Action   Interface       Age   RX count   TX count
c206.0d04.0000   forward   Fa1/0.6           0          5          4
c208.0d06.0000   forward   Fa0/0.6           0          5          5
CE1#

And finally we can see that we are able to run a ping from Host4 to Host2:

Host4#ping 192.168.1.50

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.1.50, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 128/167/224 ms
Host4#