Dealing with seamless NSX DFW migrations (1/2)

Introduction

One of the use cases provided by VMware NSX is the Distributed Firewall (DFW), which secures application workloads within the datacenter. It is fully decoupled from the underlying network infrastructure, which lets it focus on application-based security. Many customers have already implemented NSX to secure their workloads.

With the upcoming end of life (EOL) of NSX-v there is an urgent need to migrate to NSX-T. VMware offers the Migration Coordinator tool, which enables customers to migrate from NSX-v to NSX-T. VMware invests heavily in this tool, but it only offers “big-bang” migration methods for DFW-related scenarios between different vCenters. This is due to a software limitation of VMware NSX itself: one of the tasks of the NSX Manager is to translate Security Tags/VMs to IP addresses (VM2IP), and this mapping is programmed on the firewall of every vNIC of each VM.
NSX currently does not offer a method of sharing this VM2IP information between NSX Managers (except for NSX Federation, which is out of scope for this blog, although it can be part of your migration strategy).

In addition: VMware HCX (VMware’s Hybrid Cloud Extension tool) offers a method of migrating NSX tags along with the VM migration between vCenters/NSX Managers, but be aware that it still doesn’t sync the firewall rules, nor does it share the VM2IP information between NSX Managers. This can result in communication errors during and after migrations.

VMware continues to develop this part of the migration, but sadly there is no standardized solution available at this moment. With this blog I want to focus on “seamless” migrations from NSX-v to NSX-T and provide you with some guidelines which can help you during these types of migration.

Black- vs whitelisting

The NSX DFW is configured for either blacklisting or whitelisting, and this choice can pose a risk to the migration.

When using a blacklisting model, network traffic is allowed by default and only explicitly denied when needed. So when VM2IP information is not available during the migration, only the explicit deny rules are affected and the traffic is allowed by default. This leaves a security hole when something goes wrong during migration, but it usually doesn’t affect production workloads: they can continue to operate normally.

So when there is no possibility to share the VM2IP information between the NSX Managers and you use a blacklisting model, this security risk must be accepted for a migration and you are good to go.

A whitelisting model, on the other hand, can pose a great risk: it can be disruptive when correct VM2IP information isn’t available, because traffic that no longer matches an allow rule falls through to the default deny. Most customers will not accept that.
It is possible to switch to a blacklisting model prior to a migration and switch back to a whitelisting model after the migration has finished. This is a workable solution, but it has an impact on project timelines as additional tasks need to be executed.
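
To make the difference between the two models concrete, here is a minimal toy model in Python. It is not how the DFW is implemented: the Rule class, the evaluate function and the IP addresses are all hypothetical, and the only thing it illustrates is what happens to a first-match rulebase when a group loses its VM2IP information.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Rule:
    """A hypothetical, simplified firewall rule between two named groups."""
    source_group: str
    destination_group: str
    action: str  # "allow" or "deny"


def evaluate(rules: List[Rule], groups: Dict[str, Set[str]],
             default_action: str, src_ip: str, dst_ip: str) -> str:
    """Return the action of the first matching rule, else the default action."""
    for rule in rules:
        src_members = groups.get(rule.source_group, set())
        dst_members = groups.get(rule.destination_group, set())
        if src_ip in src_members and dst_ip in dst_members:
            return rule.action
    return default_action


# Hypothetical addresses for VM_A and VM_B.
VM_A_IP, VM_B_IP = "10.0.0.10", "10.0.0.20"
complete = {"VM_A": {VM_A_IP}, "VM_B": {VM_B_IP}}
# VM_B has moved and the local manager no longer knows its address.
incomplete = {"VM_A": {VM_A_IP}, "VM_B": set()}

whitelist = [Rule("VM_A", "VM_B", "allow")]  # default action: deny
blacklist = [Rule("VM_A", "VM_B", "deny")]   # default action: allow

whitelist_ok = evaluate(whitelist, complete, "deny", VM_A_IP, VM_B_IP)
whitelist_broken = evaluate(whitelist, incomplete, "deny", VM_A_IP, VM_B_IP)
blacklist_ok = evaluate(blacklist, complete, "allow", VM_A_IP, VM_B_IP)
blacklist_broken = evaluate(blacklist, incomplete, "allow", VM_A_IP, VM_B_IP)
```

With missing VM2IP information the whitelisted flow is dropped (an outage), while the blacklisted flow is allowed although it should have been denied (a security hole).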

When there is no possibility to use a blacklisting model and the security risk is not accepted, an alternative migration strategy must be executed:
a strategy whereby the VM2IP information is shared between the NSX Managers during the migration as a fallback method. This can be daunting in environments with thousands of VMs.

Sharing VM2IP information between managers

By default there is no functionality available which enables sharing VM2IP information between NSX Managers. The vCenter servers hold this information, and the NSX Manager retrieves it from them.

With NSX-T you have the option to register additional compute managers (a.k.a. vCenter servers), which allows you to access the VM2IP information directly from other vCenters (which are connected to an NSX-v instance, for example). Keep in mind that you can only register one (1) NSX-T Manager per vCenter server.
With NSX-v you don’t have the option to register to multiple vCenters: there is only a 1-to-1 relationship between the NSX Manager and a vCenter (also in a universal setup).
Either way, NSX does not offer any topology where you can configure bi-directional VM2IP information synchronization: you need to implement a method of sharing this information outside of the standard NSX capabilities (in other words, scripted).

NSX offers IP Set objects which can contain one or more IP addresses, subnets or IP ranges. These IP Set objects can act as a reference for the VM2IP information needed on the destination NSX Manager. For example, if you use a VM in a firewall rule on the source NSX Manager, it can be represented by an IP Set object which is used in the matching firewall rule on the destination NSX Manager. This will allow a seamless transition when migrating between vCenter servers/NSX Managers.
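
As a rough illustration of this idea, the sketch below turns the known addresses of one VM into a plain Python dictionary describing an IP Set. Nothing here talks to NSX: build_ipsro, the "IPSRO_" name prefix, the description format and the object id "vm-1042" are assumptions made up for this example, and the payload your NSX version expects will look different.

```python
import ipaddress
from typing import Dict, Iterable


def build_ipsro(vm_name: str, vm_id: str, addresses: Iterable[str]) -> Dict[str, object]:
    """Describe an IP Set that stands in for one VM (hypothetical layout)."""
    # ip_address() raises ValueError on malformed input, so bad data
    # never ends up in a firewall object.
    parsed = {ipaddress.ip_address(a) for a in addresses}
    # Sort IPv4 before IPv6, then numerically, for stable and readable output.
    ordered = sorted(parsed, key=lambda ip: (ip.version, int(ip)))
    return {
        "name": f"IPSRO_{vm_name}",
        "description": f"source-object-id={vm_id}",
        "addresses": [str(ip) for ip in ordered],
    }


# Example with made-up values; the duplicate address is removed.
ipsro = build_ipsro("VM_B", "vm-1042", ["10.0.0.20", "10.0.0.3", "10.0.0.20"])
```

The resulting dictionary is what a script would push to the destination NSX Manager, using whatever API call applies to your environment.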

There are some important points to keep in mind:

Because VMs are migrated between vCenters/NSX Managers, bi-directional synchronization needs to be implemented in order to allow a seamless migration.
There are limitations on IP Set objects (10k objects/4k addresses per object), which must be taken into account when creating your migration strategy.
For each IP Set object which references another object, make it traceable to the source object by adding the name or object ID of the source to the name or description.
When a VM is migrated, NSX security tags are not migrated automatically. When using security tags, your migration strategy needs to address this issue (or use HCX as mentioned).
NSX-T has limitations regarding the objects which can be used in firewall rules: for example, it’s forbidden to add vCenter objects directly to firewall rules. Your migration strategy needs to address this issue. It’s highly recommended to use ONLY NSX Security Groups (or NSGroups) inside firewall rules.
Use homogeneous objects as source objects as much as possible. Using mixed objects (VMs, Security Groups, etc.) as source objects increases synchronization complexity and with it the migration risk. Keep It Simple, Stupid (KISS).
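
The size limit and the naming advice from the list above can be combined in one small helper. This is only a sketch: I’m assuming that “4k” means 4000 addresses (verify the exact limits for your NSX version), and the function name, the numbered suffix and the description format are hypothetical choices of mine.

```python
from typing import Dict, List

# Assumption: "4k addresses per object" read as 4000. Check your NSX version.
MAX_ADDRESSES_PER_IPSET = 4000


def split_into_ipsets(base_name: str, source_id: str, addresses: List[str],
                      max_per_set: int = MAX_ADDRESSES_PER_IPSET) -> List[Dict[str, object]]:
    """Split a long address list over several traceable IP Set descriptions."""
    if max_per_set < 1:
        raise ValueError("max_per_set must be at least 1")
    # Remove duplicates while keeping the original order.
    unique = list(dict.fromkeys(addresses))
    chunks = [unique[i:i + max_per_set] for i in range(0, len(unique), max_per_set)]
    return [
        {
            # The numbered suffix keeps the parts of one source object together.
            "name": f"{base_name}_{number:03d}",
            # The source object id makes every part traceable to its origin.
            "description": f"source-object-id={source_id}",
            "addresses": chunk,
        }
        for number, chunk in enumerate(chunks, start=1)
    ]


# Example with 9000 made-up addresses for one hypothetical Security Group.
many = [f"10.0.{i // 256}.{i % 256}" for i in range(9000)]
parts = split_into_ipsets("IPSRO_SG_Web", "securitygroup-12", many)
```

Counting the parts per source object up front also tells you early whether you are heading towards the limit on the total number of IP Set objects.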

Conceptual migration steps

The steps below describe a phased migration, as an alternative to a “big-bang” migration strategy.

In this example I have two NSX environments, each connected to its own vCenter server. In NSX_A I’ve got two VMs which need to be migrated seamlessly to NSX_B in a whitelisted environment. The default rules are set to any-any-deny and an additional rule is configured which enables communication between VM_A and VM_B.

I’m using the VM objects as source objects for the IP Set Reference Objects (IPSROs), so no NSX Security Groups are configured. It’s perfectly possible to replace the VM-objects by NSX Security Groups in this hypothetical example.

The first step is to set up a bi-directional synchronization between NSX_A and NSX_B. During this step the IPSROs and firewall rules in both NSX environments are configured. This step prepares the firewall rulebase for VM migrations. The IPSROs are synced periodically (for example every 5 minutes) to pick up any modifications which could otherwise result in outages during migrations.
This periodic sync enables you to migrate seamlessly over a longer period of time, but be aware that an IP change is not reflected in the IPSRO instantly: it can take up to one sync interval.
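
One way to think about a single sync cycle is as a comparison between what the IPSRO currently contains and what the source site reports. The sketch below shows only that comparison in plain Python; the function names and addresses are hypothetical, and reading from and writing to the NSX Managers is deliberately left out.

```python
from typing import Iterable, Set, Tuple


def diff_ipsro(current: Iterable[str], desired: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (addresses to add, addresses to remove) for one IPSRO."""
    current_set, desired_set = set(current), set(desired)
    return desired_set - current_set, current_set - desired_set


def apply_diff(current: Iterable[str], to_add: Set[str], to_remove: Set[str]) -> Set[str]:
    """Return the new IPSRO content after applying a diff."""
    return (set(current) | to_add) - to_remove


# Made-up example: VM_B got a second address and lost its old one.
on_destination = {"10.0.0.20"}
from_source = {"10.0.0.21", "10.0.0.22"}

to_add, to_remove = diff_ipsro(on_destination, from_source)
updated = apply_diff(on_destination, to_add, to_remove)
in_sync = diff_ipsro(updated, from_source) == (set(), set())
```

When both sets come back empty there is nothing to write, which keeps a cycle that runs every few minutes cheap for the NSX Manager.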

Because the VM objects are not present in the secondary site, the IPSROs replace them within the firewall rule, so communication keeps working during the migration.

enable synchronization between NSX_A and NSX_B

The next step is to migrate a VM to NSX_B; in this example I’m migrating VM_B. You can use the cross-vCenter vMotion functionality or the cross-vCenter vMotion fling for this migration.
The IPSROs are still present in the DFW, which keeps the existing firewall rules effective during the migration.

The real problem during this step is not the migration itself. The real problem is that the source VM doesn’t reside in the NSX_A environment anymore, which can disrupt the IPSRO synchronization as it still points to the source site (NSX_A). The synchronization for VM_B must be modified so that it is sourced from the NSX_B environment, essentially re-enabling the bi-directional synchronization.
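
A simple way to keep track of this is to derive the sync direction from the current location of each VM instead of configuring it by hand. The sketch below assumes exactly two sites, named as in this example; sync_plan, migrate and the dictionary layout are hypothetical helpers for illustration, not part of any NSX tooling.

```python
from typing import Dict

SITES = ("NSX_A", "NSX_B")


def sync_plan(vm_location: Dict[str, str]) -> Dict[str, Dict[str, str]]:
    """Read VM2IP from the site hosting the VM, write the IPSRO to the other site."""
    plan = {}
    for vm, site in vm_location.items():
        if site not in SITES:
            raise ValueError(f"unknown site for {vm}: {site}")
        other = SITES[1] if site == SITES[0] else SITES[0]
        plan[vm] = {"read_from": site, "write_to": other}
    return plan


def migrate(vm_location: Dict[str, str], vm: str, new_site: str) -> Dict[str, str]:
    """Return a copy of the location table with one VM moved."""
    if vm not in vm_location:
        raise KeyError(vm)
    updated = dict(vm_location)
    updated[vm] = new_site
    return updated


# Both VMs start in NSX_A, then VM_B is migrated to NSX_B.
before = {"VM_A": "NSX_A", "VM_B": "NSX_A"}
after = migrate(before, "VM_B", "NSX_B")
plan_before = sync_plan(before)
plan_after = sync_plan(after)
```

After the migration the plan for VM_B reads from NSX_B and writes to NSX_A, while the plan for VM_A is unchanged. In a real script the location table would be refreshed from the vCenter inventories at the start of every sync cycle.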

The next step is to migrate VM_A and modify the sync along with it (just as in the previous step).

In a real environment this will become an ongoing process, until all VMs have been migrated.

After all virtual machines have been migrated, we can move forward and remove all IPSROs and decommission the NSX_A environment if needed.

Conclusion

These steps don’t look daunting, but in reality they are.
In my next blog I will provide some PowerShell scripts which can help you during these migrations.

To be continued ..