Multihoming without a PE-to-CE Dynamic Routing Protocol

Posted on 03/05/2017 by stevencnz

This quirk looks at how a multihomed site without a CE-to-PE routing protocol, like eBGP, can run into failover problems when using a first hop redundancy protocol.

The setup is as follows:

The CE routers in this case are Cisco 887 routers. The WAN connections are ADSL lines. From the CE routers, PPP sessions connect to the provider LNS/BNG routers (PE1 and PE2). These PPP sessions run over L2TP tunnels between the LAC and LNS. RADIUS is used by the LNS routers to authenticate the PPP sessions and to obtain IP and routing attributes.

CE1 and CE2 are running HSRP. CE1 is Active. The CE LAN interfaces are switchports and the IP/HSRP configurations are on SVIs for the access VLAN. Both CEs have a static default route pointing to the dialer interface for their respective WAN connections. CE1 tracks its dialer interface so that it can lower its HSRP priority if the WAN connection fails (allowing CE2 to take over).

Outbound traffic is routed via the HSRP Active router.

Inbound traffic works as follows:

When an LNS router authenticates a PPP session, it sends an Access-Request to the RADIUS server. The RADIUS server, when sending its Access-Accept to confirm the user is valid, also returns RADIUS attributes that the LNS router parses and applies to the session. For example, the attributes can indicate what IP to assign to the user: a Framed-IP that will show on the dialer interface of the CE. The Framed-Route attribute, or a Cisco AVP (Attribute Value Pair) carrying a route, can also be used to include static routes.

In this scenario Framed-IP and Framed-Route RADIUS attributes (among others not detailed here) are returned, which gives a WAN IP to the CE and installs a static route onto the LNS router. Each PPP session has one or more LAN ranges associated with it. The static route points traffic for these LAN ranges to the Framed-IP assigned for the PPP session.

The site in this scenario has a /28 network assigned to it. The primary PPP session from CE1 receives two static routes – one for each of the two /29s that the /28 is made up of. The secondary PPP session from CE2 receives a single /28 static route.

These static routes are redistributed into the iBGP running in the service provider network. In the event that a PPP session drops, the associated static routes will be removed from the LNS routers.

Under normal circumstances, incoming traffic will follow either of the two more specific /29s down the primary WAN connection.

There are other ways to prefer one WAN connection over another (using BGP attributes when redistributing or similar) but I’ve used this subnet splitting approach for simplicity.

In the event that the primary WAN connection fails, the following occurs:

For outbound traffic: CE1 lowers its HSRP priority allowing CE2 to take over. Outgoing traffic now goes via CE2.

For inbound traffic: The PPP session on PE1 will drop and both of the /29 static routes will be removed. This leaves only the /28 via the secondary WAN connection, so inbound traffic is forwarded that way.
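
The inbound behaviour is plain longest-prefix matching, and it can be sketched in a few lines of Python. This is only an illustration: the prefixes assume the site's /28 is the 123.123.123.0/28 range used in the HSRP configuration later in this post, and the next-hop labels and the best_route helper are made up for the example rather than taken from any router.

```python
import ipaddress


def best_route(routes, destination):
    """Return the (network, next_hop) pair with the longest matching prefix, or None."""
    dest = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in routes if dest in net]
    if not matches:
        return None
    return max(matches, key=lambda entry: entry[0].prefixlen)


# Static routes as the provider network would see them (labels are illustrative).
primary = [
    (ipaddress.ip_network("123.123.123.0/29"), "PE1 to CE1"),
    (ipaddress.ip_network("123.123.123.8/29"), "PE1 to CE1"),
]
secondary = [(ipaddress.ip_network("123.123.123.0/28"), "PE2 to CE2")]

# Normal operation: all three routes are present.
normal = primary + secondary
# Primary PPP session down: PE1 has withdrawn both /29s, only the /28 is left.
after_primary_wan_failure = secondary

host = "123.123.123.10"
print(best_route(normal, host)[1])                      # PE1 to CE1
print(best_route(after_primary_wan_failure, host)[1])   # PE2 to CE2
```

The /28 is always a valid match, but it only wins once the more specific /29s have been withdrawn.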

But what happens if the FastEthernet0 LAN interface on CE1 fails?

HSRP will fail over, meaning outbound traffic will leave the site via the secondary WAN connection as expected.

However, because the PPP session does not drop, the two /29 static routes to CE1 remain in place. Return traffic will traverse this WAN link and end up at CE1. With its LAN interface down, CE1 has no route to the destination and will send the traffic back over its default route. Traffic will then loop between CE1 and PE1 until the TTL decrements to zero. The site has lost connectivity.

A reconfiguration is needed in order to allow for this situation, which is sometimes called “LAN-side failover”.

The Search

The first and most obvious question might be, why not run a routing protocol, like eBGP, between the PEs and CEs? The PE router would learn about the LAN range over this protocol rather than having static routes. The CEs would use redistribute connected and in the event that the LAN failed, this advertisement would cease.

There are a couple of reasons why you might not want to run a dynamic PE-to-CE routing protocol. Firstly, there could be a lot of incoming subscriber sessions on the LNS routers. The overhead involved in running so many eBGP sessions might be too much compared to simply using RADIUS attributes. Secondly, not all CPEs can support BGP, or whatever PE-to-CE protocol you want to run. Granted, an 887 can, but not all devices have this capability.

So with that said, let’s look at some options for how to deal with this issue…

There are several options to resolve this quirk. I’ll explore two of them here, each of which takes a different approach.

The first option is to ensure that in the event that the LAN interface goes down, the CE router automatically brings down the WAN connection.

Depending on the CPE used, there can be multiple ways to do this. In the case of a Cisco 887, a good way to do this is with EEM scripting. The EEM script can be made to trigger based on a tracking object for the LAN interface. You will also need to make sure that a second EEM script is configured to bring the WAN link back up if the LAN link is restored. I will show an example of such a script below.

An alternative approach is to ensure that there is a direct link between the Active and Standby routers in addition to the regular LAN link. Both LAN connections into each CE router would be in the same VLAN, allowing connection to the SVI. This would mean that if Fa0 dropped, HSRP would not fail over. Traffic leaving the site would still go via CE1, but it would pass through CE2 first and use the direct link between them.

As a side note, it is worth mentioning that one might mistakenly think that CE2, upon receiving outbound traffic, would forward it directly out of its WAN interface in accordance with its default route (causing asymmetric routing when the traffic returns via CE1). But this doesn’t happen. What needs to be remembered is that the routers’ LAN interfaces are switchports and the destination MAC address will still be 0000.0c07.acxx (where xx is the HSRP group number in hexadecimal). CE1 still holds this MAC, meaning CE2 will switch the frame onwards through its switchports rather than routing the traffic.
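
To make the MAC address concrete, here is a small Python sketch that derives the virtual MAC for a given group. It assumes HSRP version 1 (the IOS default), and the function name is just something chosen for this example.

```python
def hsrp_v1_virtual_mac(group):
    """Return the HSRP version 1 virtual MAC for a group number, in Cisco dotted notation."""
    if not 0 <= group <= 255:
        raise ValueError("HSRPv1 group numbers range from 0 to 255")
    # The last byte of the MAC is the group number written as two hex digits.
    return "0000.0c07.ac{:02x}".format(group)


# Group 10 is the group used in the HSRP configuration in this post.
print(hsrp_v1_virtual_mac(10))  # 0000.0c07.ac0a
```

So for "standby 10", hosts on the LAN send their outbound frames to 0000.0c07.ac0a, and whichever CE is Active answers for that address.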

In my experience this option is preferable. A single cable run and access port configuration is all that is needed. EEM scripts can be unreliable at times and might not trigger when they should. Having said that, if this needs to be done on the CPE after deployment and remote hands are not available, the EEM script might be the best approach.

The Work

The general HSRP setup could be as follows:

hostname CE1
!
interface Vlan10
 description SVI for LAN
 ip address 123.123.123.2 255.255.255.240
 standby 10 ip 123.123.123.1
 standby 10 priority 200
 standby 10 preempt
 standby 10 track 1 decrement 150
!
track 1 interface Dialer0 ip routing
!
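
The priority arithmetic behind this configuration can be checked with a short Python sketch. CE2's configuration is not shown in this post, so the sketch assumes CE2 is left at the HSRP default priority of 100 and has preempt configured so that it can take over. The function names are invented for the example, and HSRP timers and tie-breaking are ignored.

```python
CE1_PRIORITY = 200       # standby 10 priority 200
TRACK_DECREMENT = 150    # standby 10 track 1 decrement 150
CE2_PRIORITY = 100       # assumption: CE2 uses the HSRP default priority


def effective_priority(configured, decrement, tracked_object_up):
    """Priority a router advertises, given the state of its tracked object."""
    return configured if tracked_object_up else configured - decrement


def active_router(dialer0_up):
    """Name of the router with the highest priority (assumes preempt on both CEs)."""
    priorities = {
        "CE1": effective_priority(CE1_PRIORITY, TRACK_DECREMENT, dialer0_up),
        "CE2": CE2_PRIORITY,
    }
    return max(priorities, key=priorities.get)


print(active_router(dialer0_up=True))   # CE1 (200 beats 100)
print(active_router(dialer0_up=False))  # CE2 (CE1 drops to 50)
```

The decrement has to be large enough to take CE1 below CE2's priority. With these values, anything over 100 would do.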

The EEM script described above will need to trigger when Fa0 goes down. For that, the following tracker is used:

track 2 interface FastEthernet0 line-protocol

These two EEM applets shut down the WAN connection if the tracker goes down and restore it if the tracker comes back up:

event manager applet LAN_FAILOVER_DOWN
 event track 2 state down
 action 1.0 syslog msg "Fa0 down. Shutting down controller interface"
 action 2.0 cli command "enable"
 action 3.0 cli command "configure terminal"
 action 4.0 cli command "controller vdsl 0"
 action 5.0 cli command "shutdown"
 action 6.0 cli command "end"
 action 7.0 syslog msg "Controller interface shutdown complete"
!
event manager applet LAN_FAILOVER_UP
 event track 2 state up
 action 1.0 syslog msg "Fa0 up. Enabling controller interface."
 action 2.0 cli command "enable"
 action 3.0 cli command "configure terminal"
 action 4.0 cli command "controller vdsl 0"
 action 5.0 cli command "no shutdown"
 action 6.0 cli command "end"
 action 7.0 syslog msg "Controller interface enabled."

When Fa0 drops, the syslog entries look like this:

Feb 27 14:42:18 GMT: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet0, changed state to down
Feb 27 14:42:19 GMT: %TRACKING-5-STATE: 2 interface Fa0 line-protocol Up->Down
Feb 27 14:42:19 GMT: %HA_EM-6-LOG: LAN_FAILOVER_DOWN: Fa0 down. Shutting down controller interface
Feb 27 14:42:19 GMT: %CONTROLLER-5-UPDOWN: Controller VDSL 0, changed state to administratively down
Feb 27 14:42:19 GMT: %SYS-5-CONFIG_I: Configured from console by on vty1 (EEM:LAN_FAILOVER_DOWN)
Feb 27 14:42:19 GMT: %HA_EM-6-LOG: LAN_FAILOVER_DOWN: Controller interface shutdown complete

And when it is restored…

Feb 27 14:43:53 GMT: %LINK-3-UPDOWN: Interface FastEthernet0, changed state to up
Feb 27 14:43:53 GMT: %HA_EM-6-LOG: LAN_FAILOVER_UP: Fa0 up. Enabling controller interface.
Feb 27 14:43:54 GMT: %SYS-5-CONFIG_I: Configured from console by on vty1 (EEM:LAN_FAILOVER_UP)
Feb 27 14:43:54 GMT: %HA_EM-6-LOG: LAN_FAILOVER_UP: Controller interface enabled.
Feb 27 14:44:54 GMT: %CONTROLLER-5-UPDOWN: Controller VDSL 0, changed state to up

The second option is simpler and does not require much configuration at all. All we’d need to do is run a cable from Fa1 on CE1 to Fa1 on CE2 and put the following configuration under Fa1:

interface fa1
 description link to other CE for LAN failover
 switchport
 switchport mode access
 switchport access vlan 10

There isn’t much else to show for this solution other than to reiterate that with this in place, HSRP would not fail over and traffic in both directions would flow via CE2’s switchports.

There are other ways to tackle this problem that I have not detailed here (using EtherChannel on the LAN perhaps, or something involving floating static routes). Any alternative ideas would be good to hear about and interesting to discuss. Thanks for reading.

Category: LNS, Multihoming, RADIUS
Tags: LNS, Multihoming, RADIUS
