This article provides general performance guidelines for working with VPN in Security Gateway R77 and above.
Table of Contents:
VPN and SecureXL
Choosing an encryption algorithm and AES-NI
Insights into SSL VPN Gateway Performance
(1) Interface Affinity
As an example, we will use a 12600 appliance with a default CoreXL configuration of "2:10"
2 CPU cores are used for Secure Network Distributor (SND), and
10 CPU cores are used for CoreXL FW instances, and
2 NICs - 1 LAN, 1 WAN.
In R76 and lower, it is recommended to manually configure static interfaces affinity (refer to "Related documentation" section). For example:
eth0 to CPU 0
eth1 to CPU 1
In R77 and above, configuring interfaces affinity manually is usually not needed.
In R76 and lower, for interfaces using igb driver and ixgbe driver, or for high performance setups, static interfaces affinity should be configured.
Enabling Multi-Queue is recommended when a single interface receives more traffic than a single CoreXL SND can handle (refer to "Related documentation" section). If the CPU utilization on the CPU cores running as CoreXL SND is high while the CPU cores running as CoreXL FW instances are relatively idle, then the administrator should consider reducing the number of CoreXL FW instances from 10 to 8, releasing more CPU cores to run as CoreXL SND.
This is relevant only if you have 2 NICs with Multi-Queue enabled, or 4 NICs.
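The rebalancing rule described above can be sketched as a small decision helper. This is only an illustration: the function name, the 80% "busy" and 30% "idle" thresholds, and the step of 2 cores are assumptions chosen for the example, not values from Check Point documentation. The per-core utilization figures would have to be collected separately (for example, from the gateway's own CPU monitoring tools).

```python
from dataclasses import dataclass


@dataclass
class CoreXLSplit:
    snd_cores: int      # CPU cores running as CoreXL SND
    fw_instances: int   # CPU cores running as CoreXL FW instances


def suggest_split(current, snd_util, fw_util,
                  snd_busy=80.0, fw_idle=30.0, step=2):
    """Suggest moving `step` cores from FW instances to SND.

    snd_util / fw_util are lists of per-core CPU utilization in percent.
    The thresholds and the step size are hypothetical defaults.
    """
    avg_snd = sum(snd_util) / len(snd_util)
    avg_fw = sum(fw_util) / len(fw_util)
    # Rebalance only when SND cores are saturated and FW cores are mostly idle.
    if avg_snd >= snd_busy and avg_fw <= fw_idle and current.fw_instances > step:
        return CoreXLSplit(current.snd_cores + step, current.fw_instances - step)
    return current


if __name__ == "__main__":
    # 12600 appliance with the default "2:10" split from the example above.
    print(suggest_split(CoreXLSplit(2, 10), [95.0, 92.0], [10.0] * 10))
```

With the sample figures in the `__main__` block, the helper suggests the "4:8" split mentioned in the text; with busy FW instances it returns the current split unchanged.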
(2) VPN and SecureXL (relevant to Site-to-Site and IPSec Remote Access)
When SecureXL is enabled, Encrypt-Decrypt actions usually take place on SecureXL level (on CPU cores running as CoreXL SND). All VPN traffic will be handled on the CPU cores running as CoreXL SND under the following conditions:
Only "Firewall" and "IPSec VPN" software blades are enabled
There are no fragmented packets
SecureXL acceleration is not disabled by any of the security rules (refer to sk32578)
VPN features that are disqualified from SecureXL (see below) are disabled
If all the above conditions are met, all VPN traffic will be handled on CPU cores running as CoreXL SND with minimum traffic being forwarded to the CoreXL FW instances, resulting in multi-core processing of VPN traffic (depending on the number of CPU cores running as CoreXL SND).
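As a rough aid for reviewing a configuration against the conditions above, the checklist can be written as a function that returns the reasons why VPN traffic would not stay entirely on the SND cores. The `VpnProfile` class and its field names are hypothetical and exist only for this sketch; they do not correspond to any Check Point API or command output.

```python
from dataclasses import dataclass, field


@dataclass
class VpnProfile:
    # All field names are illustrative, not Check Point object attributes.
    enabled_blades: set
    has_fragmented_packets: bool = False
    acceleration_disabled_by_rule: bool = False
    disqualifying_features: set = field(default_factory=set)


def snd_only_blockers(profile):
    """Return reasons why some VPN traffic would reach CoreXL FW instances."""
    reasons = []
    extra = profile.enabled_blades - {"Firewall", "IPSec VPN"}
    if extra:
        reasons.append("additional blades enabled: " + ", ".join(sorted(extra)))
    if profile.has_fragmented_packets:
        reasons.append("fragmented packets present")
    if profile.acceleration_disabled_by_rule:
        reasons.append("SecureXL acceleration disabled by a security rule")
    if profile.disqualifying_features:
        reasons.append("disqualified VPN features in use: "
                       + ", ".join(sorted(profile.disqualifying_features)))
    return reasons


if __name__ == "__main__":
    profile = VpnProfile({"Firewall", "IPSec VPN", "Application Control"},
                         disqualifying_features={"compression"})
    for reason in snd_only_blockers(profile):
        print(reason)
```

An empty result means all four conditions hold for the profile as described.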
The following VPN features are handled by CPU cores running as CoreXL FW instances:
Fragmented VPN packets
Any compression algorithms (go to IPSec VPN Community properties - "Advanced Settings" pane - "Advanced VPN Properties")
Using HMAC-SHA384 for data integrity and authentication (refer to sk104578)
Any transport mode SA (used in L2TP clients and GRE tunnels)
Multicast IPsec (GDOI)
Monitoring Software Blade - if in addition to "System Counters", also "Traffic" counters are enabled in Security Gateway object (in such a case, connections are flagged with "Accounting" flag in the output of "fwaccel conns" command)
Any Software Blades other than "Firewall" are used
When Software blades (other than "Firewall") are enabled on VPN traffic (for example, Application Control), Encrypt-Decrypt will still take place on SecureXL level (on CPU cores running as CoreXL SND), but the clear packets will be forwarded to a CoreXL FW instance for the blade's processing. All forwarded traffic related to a VPN tunnel will be handled by a single CoreXL FW instance #0, causing a bottleneck on a single CPU core on R77.30 and earlier Security Gateways. (The issue was resolved in R80.10 and later; for more information, see sk118097.)
To mitigate this, two Security Gateways can be used:
The external Security Gateway (participating in VPN Community) performing the VPN Encrypt-Decrypt
The internal Security Gateway (protecting the internal network) with blades activated
Alternatively, a VSX Gateway can be used with the following internal topology:
(VPN) --- [Virtual System dedicated to VPN traffic] --- [Virtual Switch] --- [Virtual System with activated software blades] --- internal network
(3) Choosing an encryption algorithm and AES-NI
AES-NI is Intel's dedicated instruction set, which significantly improves the speed of Encrypt-Decrypt actions and allows one to increase VPN throughput (Site-to-Site, Remote Access and Mobile Access). The general speed of the system depends on additional parameters.
Check Point supports AES-NI on the following appliances (only when running Gaia OS with 64-bit kernel):
AES-GCM (128-bit and 256-bit), which shows the most significant improvement - with AES-NI, it is faster than AES-CBC, when both sides support AES-NI. Without AES-NI support, it is slightly slower than AES-CBC + HMAC-SHA1.
AES-GCM is not recommended in the following scenarios:
Communities with Security Gateway R75.30 and lower - GCM support is partial.
Communities with Security Gateway R75.47 and lower - You should still consider this, if those Security Gateways handle very little VPN traffic.
Check Point Appliances, which do not support AES-NI - 12200 model, all 4000 series, all 2000 series (in addition, Gaia OS in 32-bit mode does not support AES-NI).
Communities with Check Point 600 / 1100 / Security Gateway 80 Appliances - best throughput can be achieved with AES-128.
Note: AES-GCM is not supported by SAM card - best throughput can be achieved with AES-128.
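The scenarios in which AES-GCM is not recommended can be collected into a simple checklist. The sketch below uses plain boolean flags with hypothetical names and only lists the reasons against AES-GCM. It deliberately leaves out the R75.47-and-lower case, which the text treats as a judgment call that depends on how much VPN traffic those gateways handle.

```python
def reasons_against_aes_gcm(peer_is_r75_30_or_lower=False,
                            appliance_lacks_aes_ni=False,
                            gaia_32_bit=False,
                            small_office_appliance=False,
                            sam_card=False):
    """Return the reasons (from the list above) not to choose AES-GCM.

    All parameter names are illustrative flags, not product settings.
    """
    reasons = []
    if peer_is_r75_30_or_lower:
        reasons.append("community includes R75.30 or lower (partial GCM support)")
    if appliance_lacks_aes_ni:
        reasons.append("appliance does not support AES-NI")
    if gaia_32_bit:
        reasons.append("Gaia OS in 32-bit mode does not support AES-NI")
    if small_office_appliance:
        reasons.append("600 / 1100 / Security Gateway 80: AES-128 gives best throughput")
    if sam_card:
        reasons.append("SAM card does not support AES-GCM: AES-128 gives best throughput")
    return reasons


if __name__ == "__main__":
    for reason in reasons_against_aes_gcm(gaia_32_bit=True, sam_card=True):
        print(reason)
```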
Visitor Mode is supported by the legacy SecureClient and by Endpoint Connect (Endpoint Security) Client.
Each packet in Visitor Mode is processed in user space, which causes a load on CPU on Security Gateway (only several hundred Visitor Mode clients can be handled by the Security Gateway).
In SecureClient, if enabled by the user, Visitor Mode is never automatically turned off. It is recommended that users only enable Visitor Mode when essential (typical to Airport and Hotel Wi-Fi spots), and disable it afterwards.
(4) Insights into SSL VPN Gateway Performance
It is recommended to use a dedicated Check Point appliance as the SSL VPN Gateway.
A Load Sharing cluster is preferable to a stronger appliance in most cases.
In Load Sharing mode, Sticky Decision Function (SDF) is enabled automatically. By design, SDF disables SecureXL, which decreases the performance of IPSec clients. You may disable Sticky Decision Function (SDF) through SmartDashboard. When Mobile Access blade is enabled, Sticky Decision Function (SDF) is forced and cannot be disabled.
Simultaneous Logins - lowering the number of simultaneous logins increases capacity.
Web compression - saves bandwidth, but increases load on Security Gateway's CPU.
Logging in security rules - increases load on Security Gateway's CPU.
The appliance sizing tool (sk88160) should only be used for 5000 users or less.
It is important to understand the web application complexity when using sizing tools. Tunneled applications, such as Remote Desktop users, consume considerably more bandwidth than Outlook Web Access (OWA). In turn, OWA consumes more bandwidth than simple web applications.
IPS - disabling some Web Intelligence protections may decrease CPU utilization on Security Gateway.
Traditional Anti-Virus causes throughput and concurrent connections degradations in SSL VPN.
Client Authentication - using client certificate for authentication increases CPU utilization the most.
To improve latency, verify short response time of internal servers, DNS servers and Proxy server.
Starting in R76, increasing the number of CoreXL FW instances linearly increases the SSL session rate.
HTTP requests are served by Apache Web Server processes. Because these processes are not multi-threaded, each Apache process serves one HTTP request at a time. Each Apache process consumes approximately 2 MB of RAM per HTTP request. In order not to exhaust the machine's RAM, there is a cap of ~15% of machine's RAM for Apache processes.
For example, a machine with 2GB RAM will have up to 148 Apache processes. The 15% cap is for Mobile Access integrated software blade.
It is possible to configure Mobile Access as a single software blade running on a dedicated machine (refer to sk53003). However, in R75.30 and lower, the total number of Apache processes could not exceed 990 - even if there was enough RAM.
Usually, an Apache process can serve up to 4 users. Therefore, if [number of users] < [4 x number of Apache processes], then consider reducing the cap to free memory for other uses.
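The arithmetic behind the Apache process cap and the "4 users per process" rule can be sketched as follows. The function names are hypothetical. Both inputs (~15% of RAM, approximately 2 MB per process) are approximations taken from the text, so the estimate for a 2 GB machine lands near, but not exactly on, the figure of 148 processes quoted above.

```python
def estimate_apache_cap(ram_mb, ram_fraction=0.15, mb_per_process=2.0):
    """Rough upper bound on Apache processes from the ~15% RAM cap."""
    return int(ram_mb * ram_fraction / mb_per_process)


def users_served(processes, users_per_process=4):
    """Approximate number of users the given process count can serve."""
    return processes * users_per_process


def cap_is_oversized(expected_users, processes, users_per_process=4):
    """True if [number of users] < [4 x number of Apache processes]."""
    return expected_users < users_served(processes, users_per_process)


if __name__ == "__main__":
    cap = estimate_apache_cap(2048)   # machine with 2 GB of RAM
    print("estimated cap:", cap)
    print("users served at the quoted cap of 148:", users_served(148))
    print("cap oversized for 300 users:", cap_is_oversized(300, 148))
```

When `cap_is_oversized` returns True, the text's advice applies: consider reducing the cap to free memory for other uses.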
When an HTTP request is served, the connection may not be immediately closed. Apache can keep the connection open for a configurable period of time through keep-alives. The keep-alive configuration is used to reduce web browser request latency by allowing subsequent requests on the same connection to be served by this Apache process instead of spawning a new instance.
Because the number of Apache processes is capped, this configuration could cause HTTP requests to be dropped while Apache processes are "waiting" for further requests on idle connections instead of handling other connections.
By default, starting in R75.40, the keep-alive is set to 2 seconds.
For maximum capacity, disable Apache keep-alive.
For minimum latency, increase the Apache keep-alive.
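The capacity side of this trade-off can be illustrated with a deliberately simplified model. Assume that each connection occupies one Apache process for the request service time plus the full keep-alive window (a worst case in which no further request arrives on that connection). Under that assumption, the sustainable rate of new connections is the process cap divided by the hold time. The model, the function name and the 0.5 second service time are assumptions for illustration, not measured values.

```python
def max_connection_rate(processes, service_time_s, keep_alive_s):
    """New connections per second a capped process pool can sustain.

    Simplified worst-case model: every connection holds its process for
    service_time_s + keep_alive_s seconds.
    """
    hold_time = service_time_s + keep_alive_s
    if hold_time <= 0:
        raise ValueError("hold time must be positive")
    return processes / hold_time


if __name__ == "__main__":
    processes = 148  # cap quoted above for a machine with 2 GB of RAM
    for keep_alive in (0.0, 2.0, 5.0):
        rate = max_connection_rate(processes, 0.5, keep_alive)
        print(f"keep-alive {keep_alive:>3} s -> about {rate:.1f} new connections/s")
```

In this model, disabling keep-alive maximizes the connection rate the pool can absorb, while a longer keep-alive lowers it in exchange for lower latency on follow-up requests.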
Avoid SSL on internal communications
Prefer Kerberos over NTLM for authentication, because NTLM authentication is re-done for every TCP connection. Using Kerberos reduces bandwidth between Security Gateway and the internal server, and reduces latency to the end user.
Hostname Translation is preferred over Link Translation because Hostname Translation configuration reduces CPU and RAM utilization, improves throughput and latency and reduces bandwidth (refer to R77 Mobile Access Administration Guide).
Link Translation Domain: remove external websites from the list of websites that will be translated for improved performance and capacity.
Use R75.40 or above to support more than 700 concurrent ActiveSync clients.
SNX Application Mode
There can be up to 512 simultaneous SNX Application Mode connections (file descriptors limit).
Throughput is limited by one CPU core for SSL processing (refer to the "SNX Network Mode" section below).