IVE ARP’d on for too long

The purpose of this blog is to highlight how different platforms respond to ARP requests and to explore some strange default operations on Juniper IVE VPN platforms. This quirk came to light during a datacentre migration in which the top-of-rack/first-hop device changed from a Cisco 6500 (IOS) environment to a Cisco Nexus switching environment. The general setup looks like this and follows an example customer with a Shared IVS setup:

blog10_diagram1_setup

In order to understand this scenario, it's important to know what the Juniper IVE platform is and how it provides its VPN services. To that end, I'll give a brief overview of the platform before looking at the quirk.

IVE Platform

The Juniper 6500 IVE (Instant Virtual Extranet) platform is a physical appliance that offers customers a VPN solution linking into their MPLS network. Once connected, a home worker is attached to their corporate MPLS network just as if they were at a branch office.

(To avoid confusion between the Juniper 6500 IVE and the Cisco 6500 L3 switch – which also plays an important role in this setup but is a very different kind of device – I will just use the term IVE to refer to the Juniper platform.)

IVE Ports

As you can see from the diagram above, an IVE appliance has an external port and an internal port.

The external port, as its name implies, is typically assigned a public IP address. It also has virtual ports, which are analogous to sub-interfaces, each with their own IPs. Each of these virtual ports links to an individual customer's VPN platform, or to a shared VPN platform that holds multiple customer solutions. A common design involves placing a firewall between the external interface and the internet. This allows the virtual interfaces to share the same subnet as the main external interface. Customer public IPs are destination NAT'd inbound (or MIP'd if you're using a Juniper firewall) to their corresponding virtual IPs.
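
To make the inbound translation concrete, here is a minimal IOS-style sketch with invented addresses (in practice this would live on whatever firewall fronts the IVE – on a Juniper firewall a MIP would achieve the same result):

ip nat inside source static 172.16.100.11 203.0.113.11

With a static mapping like this, internet traffic arriving for the customer's public IP (203.0.113.11) is translated to the corresponding virtual port IP on the IVE's external side (172.16.100.11).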

The internal port similarly services multiple customers. This port can be thought of as a trunk port, whereby each VLAN links to an individual customer's VRF, typically with an SVI as the gateway – sometimes paired with HSRP or another FHRP.
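
As a rough illustration of the 6500-era setup (VLAN number and addresses invented), a per-customer gateway SVI might look like this:

interface Vlan2301
 description Customer_A gateway for IVE internal trunk
 ip vrf forwarding CUST_A
 ip address 172.16.20.34 255.255.255.248
 standby 1 ip 172.16.20.33
 standby 1 priority 110
 standby 1 preempt

The SVI sits in the customer's VRF, and the HSRP virtual IP acts as the gateway for that customer's VLAN on the internal trunk.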

Shared or Dedicated

Customers can have either a Shared or a Dedicated VPN solution. These solutions are called IVSs (Instant Virtual Systems), and you can have multiple IVSs on a single IVE appliance.

A Shared IVS solution is a single multi-tenant IVS. Basically, multiple customers connect to the same IVS and are segmented by allocating them different sign-in pages and connection policies. Options are more limited than with a Dedicated IVS, but it can be more cost effective.

Dedicated IVS solutions give customers more flexibility. They can have more connected users and added customisation such as 2FA and multiple realms.

When an IVS is created it needs to link to the internal port. To do this, one or more VLANs can be assigned. If the platform is Dedicated, only a single VLAN needs to be assigned – namely that of the customer. This VLAN will link to an SVI in the customer's VRF. If the platform is Shared, multiple VLANs are assigned – one per customer. In this case, however, a default VLAN will also need to be assigned for when the IVS needs to communicate on a network that is independent of any of its individual customers. Typically the Shared Authentication VLAN is used for this.

But what is the Shared Authentication VLAN? This leads to the next part of the setup… how users authenticate.

Authentication

When a VPN user logs in from home and authenticates, the credentials they enter on the sign-in page will need to be… well… authenticated. Much like the IVS solutions themselves, there are both Shared and Dedicated options.

Customers can have their own LDAP or RADIUS servers within their MPLS networks. In this case the IVE will make a request to this server when a user connects. This is called Dedicated Authentication.

Alternatively, the Service Provider can offer a Shared Authentication solution. This saves the customer from having to build and maintain their own LDAP servers by utilising a multi-tenant platform managed by the Provider. The customer supplies the user details, and the Service Provider handles the rest.

Shared Authentication is typically used for Shared IVSs. In order to connect to the Shared Authentication Server, a Shared IVS will allocate a VLAN – alongside all of its customer VLANs – on the internal trunk port. This links to the Provider's network (for example an internal VRF or VLAN) where the Shared Authentication servers reside. It is this VLAN that is assigned as the default VLAN for the Shared IVS.

The below screenshot is taken from the Web UI of the IVE platform. It shows some of the configuration for a Shared IVS (namely IVS123). It uses a default VLAN called Shared_Auth_Network, as noted by the asterisk in the bottom-right table:

blog10_image1_default_vlan

We're nearly ready to look at the quirk. There is just one last thing to note regarding how a Shared IVS Platform, like IVS123, communicates with one of its customers' Authentication Servers.

Here is the key sentence to remember: When a Shared IVS platform communicates with any authentication server (shared or dedicated), it will use its Shared Auth VLAN IP as the source address in the IP packet.

This behaviour seems very counterintuitive, and I'm not sure why the IVS wouldn't use the source IP of that customer's VLAN instead.

Whatever the reason for this behaviour, the result is that a Shared IVS Platform communicating with one of its customers' Dedicated authentication servers will send packets with a source IP from the Shared Auth VLAN. But such a customer isn't using Shared Auth. Their network doesn't know or care about the Shared Auth environment. So when their Dedicated LDAP server receives an authentication request from the IVE, it sees the source IP address as being from this Shared Auth VLAN.

The solution, however, is easy enough (barring any IP overlaps): the customer simply places a redistributed static route in their VRF, pointing traffic for the Shared Auth subnet back at the internal port of the IVE.
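
As a sketch of the idea (addresses and ASN invented – assume the Shared Auth subnet is 10.10.10.0/24 and the IVE's internal port has IP 172.16.20.36 in the customer's VLAN), the IOS-style configuration in the customer's VRF would be along these lines:

ip route vrf CUST_A 10.10.10.0 255.255.255.0 172.16.20.36
!
router bgp 65000
 address-family ipv4 vrf CUST_A
  redistribute static

The redistribution ensures that replies from the Dedicated LDAP server, destined for the Shared Auth source IP, are carried back across the customer's VRF to the PE facing the IVE and handed to its internal port.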

To understand this better, let’s take a look at a diagram of the setup as a user attempts to connect:

blog10_diagram2_authentication

Now we are equipped to investigate the quirk, which looks at a customer on a Shared IVS platform, but with Dedicated LDAP Authentication Servers.

The quirk

As mentioned earlier, this quirk follows a migration of an IVE platform from an environment using Cisco 6500 switches running IOS to one using Cisco Nexus switches.

In both environments, trunk ports connect to the internal IVE ports with SVIs acting as gateways. The difference comes in the control plane and data plane being used. The original IOS environment was a standard MPLS L3VPN network. The Nexus environment was part of a hierarchical VXLAN DC fabric: leaf switches connected directly to the IVEs and implemented anycast gateway on the SVIs, prefix and MAC information was communicated over the BGP EVPN address family, and ASR9k DCIs acted as border leaves terminating the VTEPs, which were then stitched into the MPLS core.
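
The fabric itself isn't the focus here, but for orientation, the leaf-side pieces look roughly like this heavily trimmed, hypothetical NX-OS sketch (values invented, though the anycast MAC and VNI echo ones that appear in outputs later in this post; the BGP EVPN peering and L3 VNI plumbing are omitted):

feature fabric forwarding
feature vn-segment-vlan-based
feature nv overlay
nv overlay evpn
fabric forwarding anycast-gateway-mac 1111.2222.3333
!
vlan 2301
  vn-segment 12345
!
interface nve1
  no shutdown
  host-reachability protocol bgp
  source-interface loopback1
  member vni 12345
    ingress-replication protocol bgp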

The key difference however, isn’t in the overlays or dataplane protocols being used. The key is how each ToR device responds to ARP…

Once the move was completed and the IVE was connected to the Nexus switches, everything seemed fine at first glance. Users with Dedicated IVSs worked. Users on Shared IVSs who utilised the Shared Auth server could also log in and authenticate correctly. However, a problem was found with any customer who had a VPN solution configured on a Shared IVS platform with Dedicated Authentication. Despite the customer login page showing up (implying that the public-facing external side was working), authentication requests to their Dedicated Auth Servers were failing.

The screenshot below shows the Web UI output of a test connection to our example customer's LDAP server at 192.168.10.10.

blog10_image2_ldap_failure

As we searched for a solution to this problem, we had to keep in mind how a Shared IVS Platform makes Auth Server requests…

The search

Focusing on just one of the customers on the Shared platform, we first checked how far a trace would get from the IVE to the Dedicated Auth Server. We found pretty quickly that the trace would not even reach the first hop – that is, the anycast gateway IP on the SVI of the Nexus leaf switch.

blog10_image3_trace_fail

However, when checking from the Nexus – both the routing table and a traceroute – we could reach the Dedicated Auth Server fine, as long as we sourced from the right VRF.

nexus1# sh ip route vrf CUST_A | b 192.168.10.0 | head lines 5
192.168.10.0/24, ubest/mbest: 2/0
    *via 172.16.24.34%default, [20/0], 7w2d, bgp-65000, external, tag 500
       (evpn) segid: 12345 tunnelid: 0xc39dfe04 encap: VXLAN
    *via 172.16.24.33%default, [20/0], 7w2d, bgp-65000, external, tag 500
       (evpn) segid: 12345 tunnelid: 0xc39dfe05 encap: VXLAN

nexus1# traceroute 192.168.10.10 vrf CUST_A
traceroute to 192.168.10.10 (192.168.10.10), 30 hops max, 40 byte packets
1 172.16.24.33 (172.16.24.33) 1.455 ms 1.129 ms 1.022 ms
2 172.16.20.54 (172.16.20.54) 6.967 ms 6.928 ms 6.64 ms
3 10.11.2.3 (10.11.2.3) 8.002 ms 7.437 ms 7.92 ms
4 10.24.4.1 (10.24.4.1) 6.789 ms 6.683 ms 6.764 ms
5 * * *
6 192.168.10.10 (192.168.10.10) 12.374 ms 0.704 ms 0.62 ms

This led us to check the Layer 2 between the switch and the IVE by examining the ARP table entries on the IVE. We immediately found that there were no ARP entries for the ToR SVI for any customer on a Shared Platform with a Dedicated Authentication setup.

The output below shows the ARP table as seen from the console of the IVE. Note the incomplete ARP entry for 172.16.20.33, the SVI on the Nexus for our example customer.

(As a quick aside, you may notice that the HWAddress of the Nexus is showing as 11:11:22:22:33:33. This is due to the fabric forwarding anycast-gateway-mac 1111.2222.3333 command being configured.)

Please choose from among the following options:
1. View/Set IP/Netmask/Gateway/DNS/WINS Settings
2. Print Routing Table
3. Print ARP Cache
4. Clear ARP Cache
5. Ping to a Server
6. Trace route to a Server
7. Remove Routes
8. Add ARP entry
9. View cluster status
10. Configure Management port (Enabled)

Choice: 3
Address       HWtype  HWaddress          Flags Mask  Iface
172.16.31.1   ether   11:11:22:22:33:33   C          int0.2387
10.101.23.4   ether   11:11:22:22:33:33   C          int0.1298
192.168.77.1  ether   11:11:22:22:33:33   C          int0.2347
172.16.20.33          (incomplete)                   int0.

So there is no ARP entry. But logically this appears to be more or less the same layer 2 segment as when it was connected to the 6500. So what gives?

It turns out that 6500s and Nexus switches respond to ARP requests in different ways. The process on the 6500 is fairly standard and works as follows:

blog10_diagram3_6500_arp

But a Nexus will not respond to an ARP request if the source IP is from a subnet that it doesn’t recognise:

blog10_diagram4_nexus_arp

In our example case, the Nexus switch does not recognise 10.10.10.10 as a valid source IP for the receiving interface (which has IP 172.16.20.33). It sees it as off-net. We could also see the ARP check failing by using debug ip arp packet on the switch.

So what's the solution? There are a couple of ways to tackle this. We could add a static ARP entry on the IVE, but this could become cumbersome if we needed to add one for each Shared IVS. Alternatively, we could add a secondary IP address from the offending subnet to the SVI…

The Work

Adding a secondary IP is fairly straightforward. The config would be as follows:

nexus1# sh run interface vlan 2301
!
interface Vlan2301
description Customer_A
no shutdown
bandwidth 2000
vrf member CUST_A
no ip redirects
ip address 172.16.20.33/29
ip address 10.10.10.11/31 secondary
fabric forwarding mode anycast-gateway

A /31 works well in this case, encompassing only the IPs that are needed (namely 10.10.10.10 and 10.10.10.11). This allows the ARP request to pass the aforementioned check that the Nexus performs. From here the ARP entries began to show up and connectivity to the customers' Dedicated Auth Servers began to work.

Please choose from among the following options:
1. View/Set IP/Netmask/Gateway/DNS/WINS Settings
2. Print Routing Table
3. Print ARP Cache
4. Clear ARP Cache
5. Ping to a Server
6. Trace route to a Server
7. Remove Routes
8. Add ARP entry
9. View cluster status
10. Configure Management port (Enabled)

Choice: 3
Address       HWtype  HWaddress          Flags Mask   Iface
172.16.31.1   ether   11:11:22:22:33:33   C           int0.2387
10.101.23.4   ether   11:11:22:22:33:33   C           int0.1298
192.168.77.1  ether   11:11:22:22:33:33   C           int0.2347
172.16.20.33  ether   11:11:22:22:33:33   C           int0.2301

blog10_image4_ldap_success

So this raises the question of whether or not this behaviour is desired. Should a device check the source IP before responding to an ARP request? I'd tend to lean in favour of this type of behaviour: it adds extra security, and besides, it's actually the behaviour of the IVE that is strange in this case. One would think that the IVS would source from the connecting customer's subnet instead of the Shared Auth VLAN. The behaviour certainly is unorthodox, but finding a solution to this problem highlights some of the interesting scenarios that can arise when working with different vendors and operating systems.

I hope you’ve enjoyed the read. I’m always open to alternate ideas or general discussion so if you have any thoughts, let me know.

MPLS Management misconfiguration

There are many different ways for ISPs to manage MPLS devices like routers and firewalls that are deployed to customer sites. This quirk explores one such solution and looks at a scenario where a misconfiguration results in VRF route leaking between customers.

The quirk

When an ISP deploys Customer Edge (CE) devices to customers' sites they might, and often do, want to maintain management of them. For customers with a simple public internet connection this is usually straightforward – the device is reachable over the internet and an ACL or similar policy is configured, allowing access from only a list of approved ISP IP addresses (for extra security, VPNs could be used).

However, when peer-to-peer L3VPN MPLS is used, things are more complicated. The customer network is not directly accessible from the internet without going through some kind of breakout site. The ISP will either need a link into their customer's MPLS network or must configure access through the breakout. This can become complicated as the number of customers, and the number of sites per customer, increases.

One option, presented in this quirk, is to have all MPLS customers' PE-CE WAN subnets come from a common supernet range. These WAN subnets can then be exported into a common management VRF using a specific RT. The network that will be used to demonstrate this looks as follows:

blog4_image1_base_setup

This is available for download as a GNS3 lab from here. It includes the solution to the quirk as detailed below.

The ISP's ASN is 500. The two customers have ASNs 100 and 200 (depending on the setup these would typically be private ASNs, but 100 and 200 are used here for simplicity). A management router (MGMT) in ASN 64512 has access to the PE-CE WAN ranges for all of the customers, all of which come from the supernet 172.30.0.0/16. A special subnet within this range, 172.30.254.0/24, is reserved for the Management network itself. The MGMT router, or MPLS jump box as it may also be called, is connected to this range – as would be any other devices requiring access to the MPLS customers' devices (backup or monitoring systems, for instance… not shown).

The basic idea is that each customer VRF exports its PE-CE WAN ranges with an RT of 500:501. The MGMT VRF then imports this RT.

Alongside this, the MGMT VRF exports its own routes (from the 172.30.254.0/24 range) with an RT of 500:500. All of the customer VRFs import 500:500.

This has two key features:

  • Customer WAN ranges will all be from 172.30.0.0/16 and must not overlap between customers.
  • WAN ranges and site subnets are not, at any point, leaked between customer VRFs.

To get a better idea of how it works, take a look at the following diagram:

blog4_image2_mpls_mgmt_concept

The CLI for each customer VRF setup looks as follows:

ip vrf CUST_1
 description Customer_1_VRF
 rd 500:1
 vpn id 500:1
 export map VRF_EXPORT_MAP
 route-target export 500:1
 route-target import 500:1
 route-target import 500:500
!
route-map VRF_EXPORT_MAP permit 10
 match ip address prefix-list VRF_WANS_EXCEPT_MGMT
 set extcommunity rt 500:501 additive
route-map VRF_EXPORT_MAP permit 20
!
ip prefix-list VRF_WANS_EXCEPT_MGMT seq 10 deny 172.30.254.0/24 le 32
ip prefix-list VRF_WANS_EXCEPT_MGMT seq 20 permit 172.30.0.0/16 le 32

Note that the export map used on customer VRFs makes a point of excluding routes from the Management range (172.30.254.0/24). This is done on the off chance that the range exists within the customer's VRF table.

The VRF for the Management network is configured as follows (note this is only configured on CE3 in the above lab):

ip vrf MGMT_VRF
 description VRF for Management of Customer CEs
 rd 500:500
 vpn id 500:500
 route-target export 500:500
 route-target import 500:500
 route-target import 500:501

This results in the customers' WAN ranges being tagged with the 500:501 RT, but not the LAN ranges.

PE1#sh bgp vpnv4 unicast vrf CUST_1 172.30.1.0/30
BGP routing table entry for 500:1:172.30.1.0/30, version 9
Paths: (1 available, best #1, table CUST_1)
  Advertised to update-groups:
    1         3

  Local
    0.0.0.0 from 0.0.0.0 (1.1.1.1)
      Origin incomplete, metric 0, localpref 100, weight 32768, valid, 
       sourced, best
      Extended Community: RT:500:1 RT:500:501
      mpls labels in/out 23/aggregate(CUST_1)

PE1#sh bgp vpnv4 unicast vrf CUST_1 192.168.50.0/24
BGP routing table entry for 500:1:192.168.50.0/24, version 3
Paths: (1 available, best #1, table CUST_1)
  Advertised to update-groups:
    3

  100
    172.30.1.2 from 172.30.1.2 (192.168.50.1)
      Origin incomplete, metric 0, localpref 100, valid, external, best
      Extended Community: RT:500:1
      mpls labels in/out 24/nolabel
PE1#

192.168.50.0/24, above, is one of the LAN ranges and does not have the 500:501 RT.

Every VRF can see the management network and the management network can see all the PE-CE WAN ranges for every customer:

PE1#sh ip route vrf CUST_2

Routing Table: CUST_2
Codes: C - connected, S - static, R - RIP, M - mobile, B - BGP
       D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
       N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
       E1 - OSPF external type 1, E2 - OSPF external type 2
       i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1
       L2 - IS-IS level-2, ia - IS-IS inter area, * - candidate default
       U - per-user static route, o - ODR
       P - periodic downloaded static route

Gateway of last resort is not set

B       192.168.60.0/24 [20/0] via 172.30.1.10, 01:32:17
        172.30.0.0/30 is subnetted, 3 subnets
B         172.30.254.0 [200/0] via 3.3.3.3, 01:32:09
B         172.30.1.4 [200/0] via 2.2.2.2, 01:32:09
C         172.30.1.8 is directly connected, FastEthernet1/0
B       192.168.50.0/24 [200/0] via 2.2.2.2, 01:32:09

PE1#
PE3#sh ip route vrf MGMT_VRF

Routing Table: MGMT_VRF
Codes: C - connected, S - static, R - RIP, M - mobile, B - BGP
       D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
       N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
       E1 - OSPF external type 1, E2 - OSPF external type 2
       i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1
       L2 - IS-IS level-2, ia - IS-IS inter area, * - candidate default
       U - per-user static route, o - ODR
       P - periodic downloaded static route

Gateway of last resort is not set

        172.30.0.0/30 is subnetted, 4 subnets
C         172.30.254.0 is directly connected, FastEthernet0/0
B         172.30.1.0 [200/0] via 1.1.1.1, 01:32:24
B         172.30.1.4 [200/0] via 2.2.2.2, 01:32:24
B         172.30.1.8 [200/0] via 1.1.1.1, 01:32:24

PE3#

Also, note that the routing table for Customer 2 (vrf CUST_2) cannot see the 172.30.1.0/30 WAN range for Customer 1 (vrf CUST_1).

Given the proper config, the MGMT router can access the WAN ranges for customers:

MGMT#telnet 172.30.1.2
Trying 172.30.1.2 ... Open

User Access Verification
Password:
CE1-1>

NB. I’m not advocating using telnet in such an environment. Use SSH as a minimum when you can.

The quirk comes in when a simple misconfiguration introduces route leaking between customer VRFs.

Consider an engineer accidentally configuring a VRF that exports all of its vpnv4 prefixes with RT 500:500 (rather than only exporting its PE-CE WAN routes with RT 500:501 as described above). The mistake is easy enough to make and will cause routes from the newly configured VRF to be imported by all other customer VRFs. This will have a severe impact on any customers with the same route within their VRF.
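
To make the difference concrete, here is the intended export configuration next to the mistaken one (a sketch based on the VRF config shown earlier):

! intended: the export map tags only the PE-CE WAN prefixes with 500:501
ip vrf CUST_1
 export map VRF_EXPORT_MAP
 route-target export 500:1
!
! the mistake: one extra line that exports every prefix with the RT
! that all customer VRFs import
ip vrf CUST_1
 route-target export 500:500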

To demonstrate this, imagine that the CUST_1 VRF is not yet configured. Tracing from Customer 2 Site 2 (CE2-2, on the lower left side of the diagram) with a source of 192.168.60.1 to Customer 2 Site 1 (CE1-2) at 192.168.50.1 works fine:

CE2-2#trace 192.168.50.1 source lo1

Type escape sequence to abort.
Tracing the route to 192.168.50.1
 1 172.30.1.9 12 msec 24 msec 24 msec
 2 10.10.14.4 [AS 500] [MPLS: Labels 16/24 Exp 0] 92 msec 64 msec 44 msec
 3 172.30.1.5 [AS 500] [MPLS: Label 24 Exp 0] 48 msec 68 msec 52 msec
 4 172.30.1.6 [AS 500] 116 msec 88 msec 104 msec

CE2-2#

If the CUST_1 VRF is now setup with the aforementioned misconfiguration, route leaking between CUST_1 and CUST_2 will result:

PE1(config)#ip vrf CUST_1
PE1(config-vrf)# description Customer_1_VRF
PE1(config-vrf)# rd 500:1
PE1(config-vrf)# vpn id 500:1
PE1(config-vrf)# route-target export 500:1
PE1(config-vrf)# route-target import 500:1
PE1(config-vrf)# route-target export 500:500
PE1(config-vrf)#
PE1(config-vrf)# interface FastEthernet0/1
PE1(config-if)# description Link to CE 1 for Customer 1
PE1(config-if)# ip vrf forwarding CUST_1
PE1(config-if)# ip address 172.30.1.1 255.255.255.252
PE1(config-if)# duplex auto
PE1(config-if)# speed auto
PE1(config-if)# no shut
PE1(config-if)#exit
PE1(config)#router bgp 500
PE1(config-router)# address-family ipv4 vrf CUST_1
PE1(config-router-af)# redistribute connected
PE1(config-router-af)# redistribute static
PE1(config-router-af)# neighbor 172.30.1.2 remote-as 100
PE1(config-router-af)# neighbor 172.30.1.2 description Customer 1 Site 1
PE1(config-router-af)# neighbor 172.30.1.2 activate
PE1(config-router-af)# neighbor 172.30.1.2 default-originate
PE1(config-router-af)# neighbor 172.30.1.2 as-override
PE1(config-router-af)# neighbor 172.30.1.2 route-map CUST_1_SITE_1_IN in
PE1(config-router-af)# no synchronization
PE1(config-router-af)# exit-address-family
PE1(config-router)#

VRF CUST_1 will export its routes (including 192.168.50.0/24 from Customer 1 Site 1 – CE1-1) and the VRF CUST_2 will import these routes due to the RT of 500:500.

Looking at the BGP and routing tables for the CUST_2 VRF shows that the next hop for 192.168.50.0/24 is now the CE1-1 router.

PE1#sh ip route vrf CUST_2 192.168.50.0
Routing entry for 192.168.50.0/24
  Known via "bgp 500", distance 20, metric 0
  Tag 100, type external
  Last update from 172.30.1.2 00:02:45 ago
  Routing Descriptor Blocks:
  * 172.30.1.2 (CUST_1), from 172.30.1.2, 00:02:45 ago
      Route metric is 0, traffic share count is 1
      AS Hops 1
      Route tag 100

PE1#sh bgp vpnv4 unicast vrf CUST_2 192.168.50.0
BGP routing table entry for 500:2:192.168.50.0/24, version 21
Paths: (2 available, best #1, table CUST_2)
  Advertised to update-groups:
    2

  100, imported path from 500:1:192.168.50.0/24
    172.30.1.2 from 172.30.1.2 (192.168.50.1)
      Origin incomplete, metric 0, localpref 100, valid, external, best
      Extended Community: RT:500:1 RT:500:500

  200
    2.2.2.2 (metric 20) from 5.5.5.5 (5.5.5.5)
      Origin incomplete, metric 0, localpref 100, valid, internal
      Extended Community: RT:500:2
      Originator: 2.2.2.2, Cluster list: 5.5.5.5
      mpls labels in/out nolabel/24

PE1#

There are now two possible paths to reach 192.168.50.0/24: one imported from the CUST_1 VRF and one from CUST_2's own table (coming from CE1-2). The path via AS 100 is preferred because it is an eBGP (external) path, which beats the iBGP path in BGP best-path selection. Note the 500:500 RT in this path.

Once this is done, CE2-2 cannot reach the 192.168.50.0/24 subnet on CE1-2.

CE2-2#trace 192.168.50.1 source lo1
Type escape sequence to abort.

Tracing the route to 192.168.50.1
1 172.30.1.9 8 msec 12 msec 12 msec
2 * * *
3 * * *
4 * * *
...output omitted for brevity

Granted, this issue is caused by a mistake, but the difference between the correct and incorrect commands is minimal. An engineer under pressure or working quickly could potentially disrupt a massive MPLS infrastructure, resulting in outages for multiple customers.

The search

As mentioned at the beginning of this blog, there are multiple ways to manage an MPLS network.

One possibility is to have a single router that, rather than importing and exporting WAN routes based on RTs, has a single loopback address in each VRF. It is from these loopbacks that the router sources SSH or telnet sessions to the customer CE devices. For example:

interface loopback 1
 description Loopback source for Customer 1
 ip vrf forwarding CUST_1
 ip address 100.100.100.100 255.255.255.255
!
interface loopback 2
 description Loopback source for Customer 2
 ip vrf forwarding CUST_2
 ip address 100.100.100.100 255.255.255.255

MGMT# telnet 172.30.1.2 /vrf CUST_1

This has a number of advantages:

  • This router acts as a single jump host (rather than a subnet), which could be considered more secure.
  • There is no restriction on the WAN addresses for each customer. They can be any WAN range at all and can overlap between customers.
  • The same IP address can be used for each VRF's loopback (as long as it doesn't clash with any existing IPs in the customer's VRF).

However there are a number of disadvantages:

  • Each VRF must be configured on this jump router.
  • This jump router is a single point of failure.
  • The command to log on is more complex and requires users to know the VRF's exact name rather than just the router IP.
  • Migrating to this solution from the aforementioned RT import/export solution would be a cumbersome and lengthy process.
  • Centralised MPLS backups could be complicated if there is not a common subnet (like 172.30.254.0/24) reachable by all CE devices.

For these reasons it was decided not to use this solution. Instead, import filtering was used to prevent the issue from taking place even if the misconfiguration occurred. The import filtering uses a route-map that performs the following sequential checks:

    1. If a route has the RT 500:500 and is from the management range (172.30.254.0/24), allow it.
    2. If any other route has the RT 500:500, deny it.
    3. Allow the import of all other routes.

Essentially, rather than just importing everything tagged 500:500, this route-map checks that a vpnv4 prefix carrying that RT also comes from the management range of 172.30.254.0/24. The biggest issue in this scenario was deploying the route-map to all VRFs on all PEs. But with a little bit of scripting (I won't go into the details here), this was far more practical than deploying a multi-VRF jump router.

The work

The route map described in the above section looks as follows:

ip extcommunity-list standard VRF_MGMT_COMMUNITY permit rt 500:500
ip prefix-list VRF_MGMT_LAN seq 5 permit 172.30.254.0/24 le 32
!
route-map VRF_IMPORT_MAP permit 10
 match ip address prefix-list VRF_MGMT_LAN
 match extcommunity VRF_MGMT_COMMUNITY
!
route-map VRF_IMPORT_MAP deny 20
 match extcommunity VRF_MGMT_COMMUNITY
!
route-map VRF_IMPORT_MAP permit 30

NB. This is a good example of AND/OR operations in a route-map. Match statements of different types (in this case a prefix list and an extcommunity list) are combined as a conjunction (AND). Multiple values within a single match statement of the same type are treated as a disjunction (OR).

This will prevent the issue from occurring as it will stop the import of any vpnv4 prefix that has an RT of 500:500 unless it is from the management range.

Here is the configuration of this import map on PE1 (the other PEs are not shown but it should be configured on them too):

PE1#conf t
Enter configuration commands, one per line. End with CNTL/Z.
PE1(config)# ip extcommunity-list standard VRF_MGMT_COMMUNITY permit rt 500:500
PE1(config)#ip prefix-list VRF_MGMT_LAN seq 5 permit 172.30.254.0/24 le 32
PE1(config)#!
PE1(config)#route-map VRF_IMPORT_MAP permit 10
PE1(config-route-map)# match ip address prefix-list VRF_MGMT_LAN
PE1(config-route-map)# match extcommunity VRF_MGMT_COMMUNITY
PE1(config-route-map)#!
PE1(config-route-map)#route-map VRF_IMPORT_MAP deny 20
PE1(config-route-map)# match extcommunity VRF_MGMT_COMMUNITY
PE1(config-route-map)#!
PE1(config-route-map)#route-map VRF_IMPORT_MAP permit 30
PE1(config-route-map)#
PE1(config-route-map)#ip vrf CUST_2
PE1(config-vrf)#import map VRF_IMPORT_MAP

After this addition, in the event that the misconfiguration takes place when creating the CUST_1 VRF, the import map will block the 192.168.50.0/24 subnet. The only path that the CUST_2 VRF has to 192.168.50.0/24 is from CE1-2, which is correct. Here is the configuration and resulting verification:

PE1(config)#ip vrf CUST_1
PE1(config-vrf)# description Customer_1_VRF
PE1(config-vrf)# rd 500:1
PE1(config-vrf)# vpn id 500:1
PE1(config-vrf)# route-target export 500:1
PE1(config-vrf)# route-target import 500:1
PE1(config-vrf)# route-target export 500:500
PE1#sh ip route vrf CUST_2 192.168.50.0
Routing entry for 192.168.50.0/24
  Known via "bgp 500", distance 200, metric 0
  Tag 200, type internal
  Last update from 2.2.2.2 00:22:12 ago
  Routing Descriptor Blocks:
  * 2.2.2.2 (Default-IP-Routing-Table), from 5.5.5.5, 00:22:12 ago
    Route metric is 0, traffic share count is 1
    AS Hops 1
    Route tag 200

PE1#sh bgp vpnv4 unicast vrf CUST_2 192.168.50.0
BGP routing table entry for 500:2:192.168.50.0/24, version 12
Paths: (1 available, best #1, table CUST_2)
Advertised to update-groups:
    2
  200
    2.2.2.2 (metric 20) from 5.5.5.5 (5.5.5.5)
      Origin incomplete, metric 0, localpref 100, valid, internal, best
      Extended Community: RT:500:2
      Originator: 2.2.2.2, Cluster list: 5.5.5.5
      mpls labels in/out nolabel/24
PE1#
CE2-2#trace 192.168.50.1 source lo1

Type escape sequence to abort.
Tracing the route to 192.168.50.1

 1 172.30.1.9 12 msec 24 msec 8 msec
 2 10.10.14.4 [AS 500] [MPLS: Labels 18/24 Exp 0] 60 msec 68 msec 64 msec
 3 172.30.1.5 [AS 500] [MPLS: Label 24 Exp 0] 52 msec 68 msec 44 msec
 4 172.30.1.6 [AS 500] 84 msec 56 msec 56 msec

CE2-2#

Management of the correct WAN device is still working as well…

MGMT#telnet 172.30.1.10
Trying 172.30.1.10 ... Open

User Access Verification

Password:
CE2-2>

Just for good measure, and to double check that our route-map is making a difference, let’s see what happens if we remove the import map from the CUST_2 VRF.

PE1#conf t
Enter configuration commands, one per line. End with CNTL/Z.
PE1(config)#ip vrf CUST_2
PE1(config-vrf)#no import map VRF_IMPORT_MAP
PE1(config-vrf)#^Z
PE1#
*Mar 1 00:27:45.259: %SYS-5-CONFIG_I: Configured from console by console
PE1#sh bgp vpnv4 unicast vrf CUST_2 192.168.50.0
BGP routing table entry for 500:2:192.168.50.0/24, version 22
Paths: (2 available, best #1, table CUST_2)
Flag: 0x820
  Advertised to update-groups:
    2
  100, imported path from 500:1:192.168.50.0/24
    172.30.1.2 from 172.30.1.2 (192.168.50.1)
      Origin incomplete, metric 0, localpref 100, valid, external, best
      Extended Community: RT:500:1 RT:500:500
  200
    2.2.2.2 (metric 20) from 5.5.5.5 (5.5.5.5)
      Origin incomplete, metric 0, localpref 100, valid, internal
      Extended Community: RT:500:2
      Originator: 2.2.2.2, Cluster list: 5.5.5.5
      mpls labels in/out nolabel/24
PE1#

The offending route is imported into the CUST_2 VRF pretty quickly, proving that our route-map was doing its job. If the route-map is put back in place and we wait for the BGP scanner to run (after 30 seconds or less), the vpnv4 prefix is blocked again:

PE1#conf t
Enter configuration commands, one per line. End with CNTL/Z.
PE1(config)#ip vrf CUST_2
PE1(config-vrf)#import map VRF_IMPORT_MAP
PE1(config-vrf)#^Z
PE1#
*Mar 1 00:29:51.443: %SYS-5-CONFIG_I: Configured from console by console
PE1#sh bgp vpnv4 unicast vrf CUST_2 192.168.50.0
BGP routing table entry for 500:2:192.168.50.0/24, version 24
Paths: (1 available, best #1, table CUST_2)
Flag: 0x820
  Advertised to update-groups:
    2
  200
    2.2.2.2 (metric 20) from 5.5.5.5 (5.5.5.5)
      Origin incomplete, metric 0, localpref 100, valid, internal, best
      Extended Community: RT:500:2
      Originator: 2.2.2.2, Cluster list: 5.5.5.5
      mpls labels in/out nolabel/24
PE1#

This quirk shows just one way to successfully configure MPLS management and protect against misconfiguration. Give me a shout if anything was unclear or if you have any thoughts. As mentioned earlier, the GNS3 lab is available for download so have a tinker and see what you think.

Site by Site MPLS Breakout Migration

This month's quirk is a bit late. I have been studying furiously and managed to pass my Deploying Cisco Service Provider Advanced Network Routing exam last week. Only two to go before I get my CCNP SP. 🙂

Another plus side is that I have a tonne of study notes that I will be uploading over the next few weeks. So anyone interested in Multicast, BGP or IPv6, watch this space.

Anyway, this quirk looks at a design solution whereby a 100+ site MPLS customer needed to change the Service Provider for their primary internet breakout, one site at a time…

 

The quirk     

The customer had an L3VPN MPLS cloud with a new ISP, but still had their primary internet breakout with their old ISP.

The below diagram shows a stripped down version of such a network, illustrating the basic idea:

blog3_image1_base_setup

So whilst all of the MPLS sites connected to the new ISP's core, the link to the internet was still going out through a site that connected to the old provider.

The customer needed to move the default route and primary breakout over but did not want to do a single “big bang” migration and move all of the sites at once. Rather, they wanted to migrate one site at a time.

The search

The first step in looking at how to accomplish this was to break down the requirements. The following conditions needed to be met:

  • Each site must still be able to access all other sites and the file/application servers at the primary breakout site. These servers would be moved to the new ISP connection and breakout site 2 last of all.
  • As each site moves over to the new breakout, they only need PAT to gain access to the internet – no public services are run at the remote sites.
  • The PI space held by the customer, used for public facing services on the application servers, would be moved to the new provider once all sites were migrated.
  • Sites must be able to be moved one at a time without affecting any other sites.
  • The majority of MPLS sites were single homed with a static default.

Looking at these requirements gave us a good idea of what we needed to achieve.

Policy based routing was considered first: adjusting either the next hop or the VRF based on the source address. However, this would require too much overhead in identifying each site that had been moved, whether by community value or source prefix, combined with setting the next hop or VRF to use.
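
As a purely hypothetical illustration (names and addresses invented), each migrated site would have needed something along these lines on the relevant PE:

ip access-list extended SITE1_MIGRATED
 permit ip 192.168.1.0 0.0.0.255 any
!
route-map TO_NEW_BREAKOUT permit 10
 match ip address SITE1_MIGRATED
 set ip next-hop 10.20.20.1
!
interface GigabitEthernet0/0
 ip policy route-map TO_NEW_BREAKOUT

Multiplied across 100+ sites, each needing its own ACL and route-map entries to be maintained, the overhead quickly adds up.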

Ultimately, the use of a second VRF with “all but default” route leaking was decided upon. This involved creating a second VRF with a default route pointing to the new ISP breakout. All routes except the defaults were then leaked between these two VRFs.

This meant that all we needed to do to migrate a site was change the VRF to which the attachment circuit belonged.

It is worth highlighting that had there been a significant number of multihomed sites running BGP, policy based routing might have been preferred, as a large number of BGP neighbourships would otherwise need to be reconfigured into the correct VRF.

The work

The below output has been taken from a simulation. The MPLS sites are represented using Loopbacks 1-3 on PE_RTR.

First we will take a look at a traceroute to the internet (to IP 50.50.50.50) and the routing table for the original VRF before any changes were made: 

PE_RTR#sh ip route vrf CUST-A-OLD-ISP

Routing Table: CUST-A-OLD-ISP
Codes: C - connected, S - static, R - RIP, M - mobile, B - BGP
 D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
 N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
 E1 - OSPF external type 1, E2 - OSPF external type 2
 i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
 ia - IS-IS inter area, * - candidate default, U - per-user static route
 o - ODR, P - periodic downloaded static route

Gateway of last resort is 1.1.1.1 to network 0.0.0.0

 10.0.0.0/30 is subnetted, 3 subnets
B 10.10.11.0/30 [200/0] via 2.2.2.2, 00:15:34
B 10.10.10.0/30 [200/0] via 1.1.1.1, 00:15:34
B 10.20.20.0/30 [200/0] via 2.2.2.2, 00:15:34
C 192.168.1.0/24 is directly connected, Loopback1
C 192.168.2.0/24 is directly connected, Loopback2
C 192.168.3.0/24 is directly connected, Loopback3
B 192.168.50.0/24 [200/0] via 1.1.1.1, 00:15:34
B 192.168.51.0/24 [200/0] via 2.2.2.2, 00:15:34
B 192.168.101.0/24 [200/0] via 1.1.1.1, 00:16:19
B* 0.0.0.0/0 [200/5] via 1.1.1.1, 00:15:43
PE_RTR#

PE_RTR#trace vrf CUST-A-OLD-ISP 50.50.50.50 source lo1

Type escape sequence to abort.
Tracing the route to 50.50.50.50

 1 10.1.3.2 [MPLS: Labels 17/20 Exp 0] 116 msec 72 msec 48 msec
 2 10.10.10.1 [MPLS: Label 20 Exp 0] 24 msec 44 msec 24 msec
 3 10.10.10.2 20 msec 20 msec 36 msec
 4 192.168.50.1 28 msec 56 msec 24 msec
 5 100.100.100.1 116 msec 52 msec 72 msec
 6 100.111.111.1 64 msec 140 msec 60 msec
PE_RTR#

So the WAN range of the breakout in this simulation is 100.100.100.0/29. This is their PI space. Notice the range 192.168.101.0/24, which is the subnet that the file/application servers are on.

The VRF configuration on the PEs is straightforward.

ip vrf CUST-A-OLD-ISP
 description VRF for Old ISP Breakout
 rd 100:1
 route-target export 100:1
 route-target import 100:1

Before we created the new VRF, we needed a way to differentiate what could and could not be leaked. For this we used filtering when exporting RTs, designating the RT 100:100 for routes that should be leaked.

First we started by making a prefix list that catches the default route:

ip prefix-list defaultRoute seq 5 permit 0.0.0.0/0
ip prefix-list defaultRoute seq 50 deny 0.0.0.0/0 le 32

Then we specified a route-map that attaches the RT 100:100 to prefixes that are not the default route:

route-map ALL-EXCEPT-DEFAULT permit 10
 match ip address prefix-list defaultRoute
!
route-map ALL-EXCEPT-DEFAULT permit 20
 set extcommunity rt 100:100 additive

Note the use of the additive keyword so as not to overwrite any existing communities.

Once we had these set up, we created the new VRF and applied this route-map in the form of an export map to set the correct RTs. We made sure to import 100:100 and then applied the same to the original VRF.

ip vrf CUST-A-NEW-ISP
 description VRF for New ISP Breakout
 rd 100:2
 export map ALL-EXCEPT-DEFAULT
 route-target export 100:2
 route-target import 100:100
 route-target import 100:2
!
ip vrf CUST-A-OLD-ISP
 description VRF for Old ISP Breakout
 rd 100:1
 export map ALL-EXCEPT-DEFAULT
 route-target export 100:1
 route-target import 100:100
 route-target import 100:1

From here, after deploying this to all the relevant PEs and injecting a new default route, the migration from one VRF to the other was fairly straightforward. Below is an example using a simulated loopback (the principle would be the same for the incoming attachment circuit to a customer site):

PE_RTR(config)#interface Loopback1
PE_RTR(config-if)# ip vrf forwarding CUST-A-NEW-ISP
% Interface Loopback1 IP address 192.168.1.1 removed due to enabling 
VRF CUST-A-NEW-ISP
PE_RTR(config-if)# ip address 192.168.1.1 255.255.255.0

If we look at the routing table for this new VRF we see the following:

PE_RTR#sh ip route vrf CUST-A-NEW-ISP

Routing Table: CUST-A-NEW-ISP
Codes: C - connected, S - static, R - RIP, M - mobile, B - BGP
 D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
 N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
 E1 - OSPF external type 1, E2 - OSPF external type 2
 i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
 ia - IS-IS inter area, * - candidate default, U - per-user static route
 o - ODR, P - periodic downloaded static route

Gateway of last resort is 2.2.2.2 to network 0.0.0.0

 10.0.0.0/30 is subnetted, 3 subnets
B 10.10.11.0/30 [200/0] via 2.2.2.2, 00:16:16
B 10.10.10.0/30 [200/0] via 1.1.1.1, 00:16:16
B 10.20.20.0/30 [200/0] via 2.2.2.2, 00:16:16
C 192.168.1.0/24 is directly connected, Loopback1
B 192.168.2.0/24 is directly connected, 00:16:17, Loopback2
B 192.168.3.0/24 is directly connected, 00:16:23, Loopback3
B 192.168.50.0/24 [200/0] via 1.1.1.1, 00:16:16
B 192.168.51.0/24 [200/0] via 2.2.2.2, 00:16:18
B 192.168.101.0/24 [200/0] via 1.1.1.1, 00:16:20
B* 0.0.0.0/0 [200/0] via 2.2.2.2, 00:16:18
PE_RTR#

An interesting side note here is that even though Loopback2 and Loopback3 are directly connected, they are shown as having been learned through BGP. This is the result of the import from the original VRF. Indeed, upon closer inspection of one of the prefixes we see the 100:100 RT:

PE_RTR#sh bgp vpnv4 unicast vrf CUST-A-NEW-ISP 192.168.3.0/24
BGP routing table entry for 100:2:192.168.3.0/24, version 47
Paths: (1 available, best #1, table CUST-A-NEW-ISP)
 Not advertised to any peer
 Local, imported path from 100:1:192.168.3.0/24
 0.0.0.0 from 0.0.0.0 (3.3.3.3)
 Origin incomplete, metric 0, localpref 100, weight 32768, valid, 
external, best
 Extended Community: RT:100:1 RT:100:100
 mpls labels in/out nolabel/aggregate(CUST-A-OLD-ISP)

And looking at the default route we see no such community and a different next hop from the original table.

PE_RTR#sh bgp vpnv4 unicast vrf CUST-A-NEW-ISP 0.0.0.0
BGP routing table entry for 100:2:0.0.0.0/0, version 40
Paths: (1 available, best #1, table CUST-A-NEW-ISP)
 Not advertised to any peer
 65489
 2.2.2.2 (metric 3) from 2.2.2.2 (2.2.2.2)
 Origin incomplete, metric 5, localpref 200, valid, internal, best
 Extended Community: RT:100:2
 mpls labels in/out nolabel/23

The old VRF's table still shows a route for the newly migrated site (although now learned via BGP) and the default route is still as it was originally:

PE_RTR#sh ip route vrf CUST-A-OLD-ISP

Routing Table: CUST-A-OLD-ISP
Codes: C - connected, S - static, R - RIP, M - mobile, B - BGP
 D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
 N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
 E1 - OSPF external type 1, E2 - OSPF external type 2
 i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
 ia - IS-IS inter area, * - candidate default, U - per-user static route
 o - ODR, P - periodic downloaded static route

Gateway of last resort is 1.1.1.1 to network 0.0.0.0

 10.0.0.0/30 is subnetted, 3 subnets
B 10.10.11.0/30 [200/0] via 2.2.2.2, 00:15:34
B 10.10.10.0/30 [200/0] via 1.1.1.1, 00:15:34
B 10.20.20.0/30 [200/0] via 2.2.2.2, 00:15:34
B 192.168.1.0/24 is directly connected, 00:15:36, Loopback1
C 192.168.2.0/24 is directly connected, Loopback2
C 192.168.3.0/24 is directly connected, Loopback3
B 192.168.50.0/24 [200/0] via 1.1.1.1, 00:15:34
B 192.168.51.0/24 [200/0] via 2.2.2.2, 00:15:34
B 192.168.101.0/24 [200/0] via 1.1.1.1, 00:16:19
B* 0.0.0.0/0 [200/5] via 1.1.1.1, 00:15:43
PE_RTR#
PE_RTR#sh bgp vpnv4 unicast vrf CUST-A-OLD-ISP 0.0.0.0
BGP routing table entry for 100:1:0.0.0.0/0, version 15
Paths: (1 available, best #1, table CUST-A-OLD-ISP)
 Not advertised to any peer
 65489
 1.1.1.1 (metric 3) from 1.1.1.1 (1.1.1.1)
 Origin incomplete, metric 0, localpref 100, valid, internal, best
 Extended Community: RT:100:1
 mpls labels in/out nolabel/26

Finally, a traceroute test shows that the newly migrated site accesses the internet via a different site and can still access the application server subnet:

PE_RTR#trace vrf CUST-A-NEW-ISP 50.50.50.50 source lo1

Type escape sequence to abort.
Tracing the route to 50.50.50.50

 1 10.1.3.2 [MPLS: Labels 16/20 Exp 0] 44 msec 40 msec 52 msec
 2 10.20.20.1 [MPLS: Label 20 Exp 0] 32 msec 36 msec 52 msec
 3 10.20.20.2 52 msec 40 msec 32 msec
 4 192.168.51.1 54 msec 39 msec 31 msec
 5 200.200.200.2 68 msec 60 msec 32 msec
 6 200.222.222.2 65 msec 143 msec 62 msec

PE_RTR#
PE_RTR#trace vrf CUST-A-NEW-ISP 192.168.101.1 source lo1

Type escape sequence to abort.
Tracing the route to 192.168.101.1

 1 10.1.3.2 [MPLS: Labels 16/22 Exp 0] 56 msec 52 msec 44 msec
 2 10.10.10.1 [MPLS: Label 22 Exp 0] 36 msec 24 msec 24 msec
 3 10.10.10.2 40 msec 40 msec 36 msec
 4 192.168.50.1 26 msec 57 msec 23 msec
 5 192.168.101.1 32 msec 48 msec 36 msec
PE_RTR#

One final point to make is that advertising the PI space to both providers for backup purposes was a possibility; as-path prepending could have been used from breakout site 2 to make that path less preferred. But complications come into play depending on how each provider advertises the PI space and whether they honour any adjustments the customer makes. And should return traffic not follow the same path, stateful firewall sessions would also encounter difficulty.
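
For illustration, such a prepend at the breakout acting as backup might have looked something like this (a sketch only – the customer ASN, neighbour address and object names are all invented):

ip prefix-list PI_SPACE seq 5 permit 100.100.100.0/29
!
route-map BACKUP_ISP_OUT permit 10
 match ip address prefix-list PI_SPACE
 set as-path prepend 65001 65001 65001
!
route-map BACKUP_ISP_OUT permit 20
!
router bgp 65001
 neighbor 200.200.200.2 route-map BACKUP_ISP_OUT out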

So, a pretty straightforward solution in the end, but an interesting one from a migration standpoint. I am interested to hear whether anyone would have taken a different approach – perhaps policy based routing, or maybe another solution? As usual, thoughts are always welcome.