OTV for Secured / Firewalled Networks – A Design Consideration

My colleague at (ccie-or-null.net) and I recently came across a design limitation, or “opportunity,” with OTV and firewalls.  The plan was to take an existing environment with Layer 3 gateways on a firewall and extend those networks across multiple Data Centers with OTV.  A simplified topology would look like this:

The Blue Layer 3 gateways are on the core devices in each network (10.50.15.0/24 in this example).

The Red Layer 3 gateways are on firewalls at each site (10.50.37.0/24 in this example).

otv1

OTV works beautifully for unsecured (non-firewalled) networks.  However, we quickly discovered an issue with the firewall-secured networks.

The problem

Firewalls are stateful, and extending a secured network with a default gateway at each site creates asymmetric traffic flows.  Communication with the secured networks fails because each firewall sees only half of the conversation.  Here is a drawing illustrating this issue:

otv2

  1. Source 10.50.15.20 initiates a TCP connection to 10.50.37.8 (SYN)
  2. Default Gateway for 10.50.15.20 is 10.50.15.1
  3. Traffic is routed to the local firewall, which is advertising 10.50.37.0/24
  4. Firewall performs an ARP lookup, sees the MAC for 10.50.37.8 and forwards the packet
  5. SYN packet makes it to the client
  6. Client needs to route back to 10.50.15.20, so it sends the SYN-ACK to its default gateway, the firewall local to its own site
  7. That firewall receives a SYN-ACK, checks its state table, but never received the SYN.  Packet is dropped.
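
The steps above can be sketched as a toy model in Python. This is only an illustration of the state-table logic, not how an ASA is implemented; the `Packet` and `StatefulFirewall` classes and the firewall names are hypothetical, and the only check modeled is "did this firewall see the SYN?".

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    flags: str  # "SYN" or "SYN-ACK"


@dataclass
class StatefulFirewall:
    name: str
    # Connections this firewall has seen a SYN for, keyed by (initiator, responder).
    state: set = field(default_factory=set)

    def process(self, pkt: Packet) -> bool:
        """Return True if the packet is forwarded, False if it is dropped."""
        if pkt.flags == "SYN":
            self.state.add((pkt.src, pkt.dst))
            return True
        if pkt.flags == "SYN-ACK":
            # A SYN-ACK is only valid if this firewall saw the matching SYN.
            return (pkt.dst, pkt.src) in self.state
        return False


def handshake(syn_firewall: StatefulFirewall, synack_firewall: StatefulFirewall) -> bool:
    """Send the SYN through one firewall and the SYN-ACK through another."""
    syn = Packet("10.50.15.20", "10.50.37.8", "SYN")
    syn_ack = Packet("10.50.37.8", "10.50.15.20", "SYN-ACK")
    return syn_firewall.process(syn) and synack_firewall.process(syn_ack)


fw_a = StatefulFirewall("site-a-fw")  # hypothetical name
fw_b = StatefulFirewall("site-b-fw")  # hypothetical name

asymmetric_ok = handshake(fw_a, fw_b)  # SYN via one firewall, SYN-ACK via the other
symmetric_ok = handshake(fw_a, fw_a)   # both directions through the same firewall
print(asymmetric_ok, symmetric_ok)     # prints: False True
```

The asymmetric handshake fails for exactly the reason in step 7: the second firewall has no entry for the flow.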

This is not an issue with unsecured networks because no firewalls are in place: there is no stateful inspection, and gateway localization works at each site.

Possible Solutions:

We put our heads together and came up with several ways to solve this problem.

  1. Configure a default gateway in only 1 site.
  2. Disable stateful inspection on firewalls
  3. Utilize Transparent Firewalls
  4. ASA clustering between Data Centers
  5. Utilize ASRs with Zone-based firewalls and LISP
    1. (I won’t be going into this, but could be a potential solution)
  6. Run VSGs / ASA 1000Vs
    1. (Great for East-West communications in virtual-only environments, currently limited to 400Mbps)

Let’s take a deeper look at some of the possible solutions and the inherent challenges with each:

Option 1 – Default Gateway in only a single site

 Pros:

  • Quickest and easiest to implement

 Cons:

  • Doubles latency and link utilization for particular traffic flows due to traffic tromboning.  This could have a detrimental effect on application performance between remote sites.

See examples below, using only a single default gateway for 10.50.37.0/24 (located in Site B):

Example 1: Traffic from unsecured asset in Site A to secured asset in Site B (Optimal traffic flow):

otv3

Example 2: Traffic from unsecured asset in Site A to secured asset in Site A (Sub-optimal traffic flow):

otv4

Example 3: Traffic from Internet client to secured asset in Site A (Sub-optimal traffic flow):

otv5

Example 4: Assuming we Source-NAT traffic to force return traffic back (Optimal traffic flow, however, could have severe adverse effects on various applications):

otv9

We ran some performance tests to determine the impact, and as expected, latency doubled (sometimes tripled) for particular traffic flows.  The extra trips also consume inter-site bandwidth – a recipe for disaster.
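
To make the tromboning concrete, here is a rough Python sketch that counts how many times one direction of a flow crosses the link between Data Centers. The hop lists are my simplified reading of the examples above, and the 5 ms figure is a made-up placeholder, not one of our test results.

```python
def dci_crossings(path):
    """Count how many times a path moves between sites (each move crosses the DCI)."""
    return sum(1 for here, there in zip(path, path[1:]) if here != there)


# Each path lists the site of every hop, in order, for one direction of a flow:
# source -> gateway firewall for 10.50.37.0/24 -> destination.
flows = {
    "A client -> B secured (gateway in B)": ["A", "B", "B"],
    "A client -> A secured (gateway in B)": ["A", "B", "A"],
    "A client -> A secured (local gateway)": ["A", "A", "A"],
}

ASSUMED_DCI_ONE_WAY_MS = 5.0  # hypothetical value, purely for illustration

for name, path in flows.items():
    crossings = dci_crossings(path)
    added_ms = crossings * ASSUMED_DCI_ONE_WAY_MS
    print(f"{name}: {crossings} crossing(s), about {added_ms} ms added one way")
```

A flow that stays inside Site A but has to hairpin through the Site B gateway crosses the inter-site link twice in each direction, which is where the extra latency and link utilization come from.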

Option 2 – Disable Stateful inspection on firewall (stateless)

On the ASAs we could use TCP state bypass or ASR groups to essentially “ignore” the asymmetric routing issue by bypassing the state table.

Depending on your environment, this could violate PCI DSS 3.0 (or a similar compliance standard):

otv-l3-11
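
A small Python sketch of why bypassing state is a trade-off. This is a deliberately simplified model, not ASA behavior in detail: the `forwards` function is hypothetical, and it assumes the access rules already permit the traffic.

```python
def forwards(flags, saw_syn, tcp_state_bypass):
    """Decide whether a firewall forwards a TCP segment.

    flags            -- "SYN" or "SYN-ACK"
    saw_syn          -- True if this firewall saw the SYN that opened the flow
    tcp_state_bypass -- True if state checking is disabled for this traffic
    """
    if flags == "SYN":
        return True
    if tcp_state_bypass:
        # No state lookup: the segment is judged on the access rules alone
        # (assumed here to permit it).
        return True
    return saw_syn


# A legitimate reply arriving on the "wrong" firewall and an unsolicited SYN-ACK
# look identical to a firewall that never saw the SYN.
reply_with_state = forwards("SYN-ACK", saw_syn=False, tcp_state_bypass=False)
reply_with_bypass = forwards("SYN-ACK", saw_syn=False, tcp_state_bypass=True)
unsolicited_with_bypass = forwards("SYN-ACK", saw_syn=False, tcp_state_bypass=True)

print(reply_with_state, reply_with_bypass, unsolicited_with_bypass)
```

With bypass enabled the asymmetric reply gets through, but so does any segment the access rules permit, whether or not a connection was ever opened. That loss of stateful inspection is what raises the compliance question.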

Option 3 – Utilize Transparent Firewalls

 Pros:

  • Ideal for workload mobility between Data Centers
  • Resolves asymmetric state table issue
  • Compliant!

Cons:

  • Time consuming
  • Requires redesign of firewall architecture
  • Depending on the environment – change could be too impactful

In the examples below, I added a third network (10.50.48.0/24), which is also firewall secured.  The purpose of this additional network is to show flow between two separate, secured, OTV’d networks.

Transparent Firewall Topology:

otv7

Flow from an OTV’d secured device in Site A to a secured device on a separate OTV’d network in Site B:

otv11

One caveat with this design is the need to create access-list entries for same-VLAN traffic between Data Centers.  See example:

Flow from an OTV’d secured device in Site A to a secured device on the same OTV’d network in Site B:

otv12

Option 4 – ASA Clustering between Data Centers

This is actually a pretty sweet solution if your environment is capable of it (recommended by our Cisco SE).  Clustering ASAs eliminates the asymmetric routing issue because all firewalls in the cluster share the state table, so any firewall can forward any traffic flow.

One large caveat with this design is the geographic limitation.  Unless your Data Centers are less than 100km apart, and latency between your sites is less than 10ms, this design is not currently supported by Cisco.  So, if your Data Centers are connected via 1Gbps across multiple states, this may not be a possible solution for you.
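
Here is a minimal Python sketch of both points: a cluster where every member consults the same connection table, and a check against the distance and latency limits quoted above. The `FirewallCluster` class and `within_stated_limits` function are hypothetical, and a single shared set is a big simplification of how real cluster members synchronize state.

```python
class FirewallCluster:
    """Toy model: every cluster member consults one shared connection table."""

    def __init__(self, members):
        self.members = set(members)
        self.connections = set()  # (initiator, responder) pairs seen by any member

    def process(self, member, src, dst, flags):
        """Return True if `member` forwards the segment, False if it drops it."""
        if member not in self.members:
            raise ValueError(f"{member} is not part of this cluster")
        if flags == "SYN":
            self.connections.add((src, dst))
            return True
        if flags == "SYN-ACK":
            # Any member can validate the reply because the table is shared.
            return (dst, src) in self.connections
        return False


def within_stated_limits(distance_km, latency_ms):
    """Check a site pair against the limits quoted in this post (100km, 10ms)."""
    return distance_km < 100 and latency_ms < 10


cluster = FirewallCluster(["site-a-fw", "site-b-fw"])  # hypothetical member names
syn_ok = cluster.process("site-a-fw", "10.50.15.20", "10.50.37.8", "SYN")
synack_ok = cluster.process("site-b-fw", "10.50.37.8", "10.50.15.20", "SYN-ACK")
print(syn_ok, synack_ok)              # prints: True True
print(within_stated_limits(40, 2))    # prints: True
print(within_stated_limits(800, 25))  # prints: False
```

The SYN-ACK is accepted on a different member than the one that saw the SYN, which is the whole appeal of clustering for this problem.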

More information here:

http://www.cisco.com/c/en/us/td/docs/solutions/Enterprise/Data_Center/VMDC/ASA_Cluster/ASA_Cluster/ASA_Cluster.html#wp1417746

Conclusion

Ultimately, your design will depend on your environment.  However, if you plan to OTV firewall-secured networks, keep these design recommendations in mind:

  1. ASA Clustering if your Data Centers are nearby (less than 10ms latency between sites)
  2. Alternatively, use Transparent firewalls
  3. Don’t OTV firewall-secured networks (kidding)

2 comments

  1. Hi,

    Did you try to use the LISP Multihop extension to get it working? It seems to be the best solution in your use case, because you don’t need to configure the same security policies on the firewalls at both sites.

    1. Thank you for the comment. I considered LISP, but given that all the platforms involved in the communications did not support LISP, and the fact that no Cisco customer at the time was running this in production, I decided to advise against it. I do agree with you, however, that LISP is the way to go as long as it fits in your network. Thanks!
