NSX-v: understanding and overcoming DFW firewall rule maximums

In this post I dig into the DFW firewall rule maximums of VMware NSX for vSphere. The maximums stated on the configmax website are soft limits, not hard limits, so let’s discuss what the hard limit on the number of DFW rules actually is. Let’s start by talking about the Distributed Firewall (DFW).

The Distributed Firewall

The DFW is a firewall that operates at the vNIC level of each VM, and its rules are provisioned from the NSX Manager. The NSX Manager holds a single DFW rulebase for ALL virtual machines.

a.k.a. one rulebase to rule them all!

Several components enable the rules from the (global) NSX Manager rulebase to be applied at the vNIC level:

  • On the NSX Manager runs an Advanced Message Queuing Protocol (AMQP) message broker, provided by RabbitMQ.
    It allows the ESXi hosts to retrieve the necessary information through a secure channel.
  • On the ESXi hosts runs the vShield Firewall Daemon (vsfwd), which provides a communication path on behalf of other components running inside the host. It sets up a secure (SSL/TLS) connection to the message broker (AMQP) on the NSX Manager, which listens on TCP port 5671.
  • Also on the ESXi hosts runs the VMware Internetworking Service Insertion Platform (vsip), which is an essential component. It is the kernel-space core component of the DFW: it receives the DFW rulebase (in protobuf format) from the NSX Manager (through vsfwd), converts it to the VSIPIOCTL format and places it (as a filter) on the vNIC of a VM.
    The corresponding vsipioctl binary can be used to view the available rule information for a given filter from the CLI.
    The downloaded firewall rules are stored in /etc/vmware/vShield-Stateful-Firewall/vsipfw_ruleset.dat .
  • As stated, a corresponding DFW filter is created for each vNIC, which can be viewed with the “vsipioctl” command or the more approachable “show dfw” command.
  • At the vNIC level the IOChain technology is used, which provides 15 “slots” that can be used to manipulate network traffic traversing the vNIC. The DFW uses one of these slots (slot 2) to filter the traffic at the kernel level of the ESXi host by means of the DvFilter.
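
The per-vNIC filters described above can be inspected from the ESXi shell. The following is a minimal sketch, assuming it runs on an NSX-prepared host and that `vsipioctl getfilters` prints one “Filter Name : <name>” line per filter. The function names `filter_names` and `dump_dfw_filters` are made up for this example; only `vsipioctl getfilters` and `vsipioctl getrules -f` are host commands.

```shell
#!/bin/sh
# Sketch: print the rules of every DFW filter on this ESXi host.
# Assumption: `vsipioctl getfilters` emits lines like
#   "Filter Name              : nic-38549-eth0-vmware-sfw.2"

# Extract the filter names from `vsipioctl getfilters` output on stdin.
filter_names() {
    awk -F':' '/Filter Name/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# Walk all filters and dump the rules programmed on each of them.
dump_dfw_filters() {
    if ! command -v vsipioctl >/dev/null 2>&1; then
        echo "vsipioctl not found; run this on an NSX-prepared ESXi host" >&2
        return 1
    fi
    vsipioctl getfilters | filter_names | while read -r f; do
        echo "== $f"
        vsipioctl getrules -f "$f"
    done
}
```

Calling `dump_dfw_filters` on a host then shows, per vNIC, the rules that were actually pushed down, which is useful for comparing against the rulebase in the NSX Manager.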

Within the NSX Manager DFW rulebase you can include vSphere-related objects, but the rules implemented at the vNIC level are IP based. The NSX Manager therefore needs to translate these vSphere objects into IP addresses. To do so it uses information gathered from the vCenter Server (through VMware Tools running on the VM) or from ARP/DHCP snooping. Without this translation the DFW cannot operate correctly. The VPXA agent is used for retrieving the IP addresses of a VM (but that is outside the scope of this post).

The hard limit

The hard limit is the amount of heap memory available on the ESXi hosts to store DFW rules and IP address sets. As stated, each vNIC has its own filter, including the rules and address sets. When you have a large number of DFW firewall rules in the NSX Manager, they are all published to each vNIC filter, which consumes a large amount of the available memory.

VMware recommends that at least 20 percent of the heap memory remains free.
Since VMware NSX version 6.2.4 the amount of heap memory for hosts with more than 128 GB of memory has been increased from 1.5 GB to 3 GB, as stated in KB article 2146298.

When the heap runs out of free memory, new DFW rules fail to be published to the ESXi hosts.

Monitoring

You can monitor the amount of available heap memory by running the following command on an ESXi host.

for n in $(vsish -e ls /system/heaps|grep vsip); do echo $n; vsish -e get /system/heaps/$n'stats'| grep -e "Name\|current\ bytes\ allocated\|maximum\ heap\ size\|percent free of max\|failed"; done

It shows the percentage of free heap memory per vsip heap, which should not drop below the recommended 20%.
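
To turn that output into a simple check, the lines can be filtered against the 20% threshold. This is a minimal sketch, assuming the stats lines look like “Name:vsip-rules” and “percent free of max size:87 pct”; the function name `low_heaps` is made up for this example and the patterns may need adjusting to the output of your ESXi build.

```shell
#!/bin/sh
# Sketch: flag vsip heaps whose free percentage is below a threshold.
# Assumption: the input contains lines shaped like
#   "Name:vsip-rules" and "percent free of max size:87 pct".

# Usage: <monitoring command> | low_heaps [threshold-percent]
low_heaps() {
    threshold="${1:-20}"
    awk -F':' -v min="$threshold" '
        # Remember the heap the following stats lines belong to.
        /Name/                { gsub(/[ \t]/, "", $2); name = $2 }
        # "$2 + 0" turns "87 pct" into the number 87.
        /percent free of max/ { pct = $2 + 0
                                if (pct < min) print name, pct "%" }
    '
}
```

Piping the monitoring loop above into `low_heaps 20` prints only the heaps that have fallen below the recommended free percentage, and prints nothing when all heaps are healthy.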

Mitigation

You can mitigate DFW problems by building the DFW firewall rulebase in a smart manner.
Each DFW rule has an “Applied To” parameter which, by default, is set to the “Distributed Firewall” level. As a result the rule is implemented on all vNICs of all VMs. This can have unexpected results: for example, when you have one rule applied at the Distributed Firewall level and you have 1000 VMs, the rule is implemented at least 1000 times (once per vNIC). These numbers grow quickly, because the number of rule instances is roughly the number of rules multiplied by the number of vNICs.

The “Applied To” parameter can be set to one or more of the following objects:

  • One or more vSphere clusters
  • vSphere Datacenter
  • Distributed virtual port group
  • NSX Edge
  • Network
  • Virtual Machine
  • vNIC
  • NSX logical switch

By using the “Applied To” parameter you narrow the scope of a DFW firewall rule, which helps to reduce the amount of memory used by the DFW filters.
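
The effect of scoping can be made concrete with some back-of-the-envelope arithmetic. This is a rough sketch only: the rule and vNIC counts are made-up example numbers, the function name `rule_instances` is hypothetical, and it counts rule instances rather than actual heap bytes.

```shell
#!/bin/sh
# Sketch: rough count of rule instances pushed to the vNIC filters.
# instances = rules x vNICs that fall inside the rules' "Applied To" scope.
rule_instances() {
    rules="$1"
    vnics="$2"
    echo $(( rules * vnics ))
}

# 500 rules applied at the Distributed Firewall level, 1000 vNICs in total:
rule_instances 500 1000    # prints 500000
# The same 500 rules scoped to an application tier of 20 vNICs:
rule_instances 500 20      # prints 10000
```

In this example, scoping the rules reduces the number of rule instances by a factor of 50, and every instance that is not created is heap memory that stays free on the hosts.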

Sources, links (and thanks to):

 https://www.sneaku.com/2017/06/06/monitoring-dfw-heap-usage/

http://www.routetocloud.com/tag/vsip/
