'FW-1: State synchronization is in risk. Please examine your synchronization network to avoid further problems!' appears in /var/log/messages file
Symptoms
  • In Security Gateway versions R80.10 and below: 'FW-1: State synchronization is in risk. Please examine your synchronization network to avoid further problems!' appears in /var/log/messages file.

  • In R80.20 Security Gateway and above, one or both of the following messages appear:
    "State synchronization is in risk - received 'Reject notification' from remote member, might need to enlarge 'Delta Sync Sending Queue' on that member."
    Or
    "State synchronization is in risk - sent 'Reject notification' to remote member, might need to enlarge 'Delta Sync Sending Queue' on local member."

Cause

This message could appear under extremely high load, when a synchronization update was permanently lost. A synchronization update is considered to be permanently lost when it cannot be retransmitted because it is no longer in the transmit queue of the update originator.


Solution

Table of Contents:

  • Background
  • Recommendations for ClusterXL
  • Recommendations for Nokia IPSO-based cluster

 

Background

This message could appear under extremely high load, when a synchronization update was permanently lost. A synchronization update is considered to be permanently lost when it cannot be retransmitted because it is no longer in the transmit queue of the update originator.

This scenario does not mean that the Security Gateway will malfunction, but rather that there is a potential problem. The potential problem is harmless if the lost sync update belonged to a connection that runs on only a single member, as is the case with unencrypted (clear) connections. The exception is a failover, when the other member needs this update.

The potential problem can be harmful when the lost sync update refers to a connection that is non-sticky, as is the case with encrypted connections. In this case the other cluster member(s) may start dropping packets relating to this connection, usually with a TCP out of state error message.

 

Recommendations for ClusterXL

If this message occurs only once immediately following Full Sync, it can be safely ignored. If this message appears erratically, then refer to these recommendations:

  • Configure a dedicated State Synchronization network (only the State Synchronization traffic should pass through those interfaces).

  • Reduce the amount of State Synchronization traffic by disabling the Synchronization for non-critical services (e.g., short HTTP connections, DNS UDP, ICMP, etc).

    Disable the Synchronization for a service if ALL of the following conditions are true:
    1. A significant portion of the traffic crossing the cluster uses a particular service. If you do not synchronize this service, then the amount of synchronization traffic is reduced and cluster performance is enhanced.

    2. The service usually opens short connections, whose loss may not be noticed. DNS (over UDP) and HTTP are typically responsible for most connections, and generally have a very short life and are inherently recoverable at the application level. However, services that typically open long connections, such as FTP, should always be synchronized.

    3. Configurations that ensure bi-directional stickiness for all connections do not require synchronization to operate (only to maintain High Availability). Such configurations include:
      • Any cluster in High Availability mode (for example, ClusterXL New HA or Nokia VRRP)
      • ClusterXL in a Load Sharing mode with clear connections (no VPN or static NAT)
      • OPSEC clusters that guarantee full stickiness (refer to the OPSEC cluster's documentation)
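The three conditions above can be read as a checklist in which every item must hold. The following Python sketch illustrates that logic only. The `ServiceProfile` fields and the 20% threshold for "a significant portion of the traffic" are assumptions introduced for this example, not values from Check Point documentation.

```python
from dataclasses import dataclass


@dataclass
class ServiceProfile:
    """Hypothetical summary of one service crossing the cluster."""
    name: str
    traffic_share: float        # fraction of cluster connections, 0.0 to 1.0
    short_lived: bool           # short connections, recoverable by the application
    bidirectional_sticky: bool  # configuration guarantees bi-directional stickiness


# Assumed cut-off for "a significant portion of the traffic"; tune per site.
SIGNIFICANT_SHARE = 0.20


def may_disable_sync(svc: ServiceProfile) -> bool:
    """Return True only if ALL three conditions from the list hold."""
    return (
        svc.traffic_share >= SIGNIFICANT_SHARE
        and svc.short_lived
        and svc.bidirectional_sticky
    )
```

With these assumptions, a DNS-over-UDP profile carrying 35% of the connections on a High Availability cluster passes the check, while FTP fails it because its connections are long-lived.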


  • If disabling the synchronization is not possible, then consider enabling the Delayed Synchronization for those services (requires SecureXL).

  • Analyze the performance of each cluster member per sk33781 - Performance analysis for Security Gateway NGX R65 / R7x and sk98348 - Best Practices - Security Gateway Performance. Correlate the appearance of this message to the CPU and memory utilization on the cluster members.

    • For versions R80.10 and below:

      • Increase the Sync "Sending Queue Size" and the Sync "Receiving Queue Size" on all cluster members per sk82080 - /var/log/messages are filled with 'kernel: FW-1: fwldbcast_update_block_new_conns: sync in risk: did not receive ack for the last 410 packets'.

    • In R80.20 Security Gateway and above:

      Depending on which message appears in the log, different actions are required:

      • "State synchronization is in risk - received 'Reject notification' from remote member, might need to enlarge 'Delta Sync Sending Queue' on that member." - This message means that the local cluster member sent a Delta Sync retransmission request which the remote member rejected. This happens when the remote member no longer has the requested Delta Sync in its Sending Queue. In this case it's recommended to increase the "Sending Queue Size" on the remote member.

      • "State synchronization is in risk - sent 'Reject notification' to remote member, might need to enlarge 'Delta Sync Sending Queue' on local member." - This message means that the local member rejected the remote member's retransmission request. This happens when the requested Delta Sync is no longer in the local member's Sending Queue. In this case it's recommended to increase the Sync "Sending Queue Size" on the local member.
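The mapping from the two R80.20+ messages to the member whose Sending Queue should be enlarged can be sketched as a small classifier. This is an illustrative Python helper, not a Check Point tool. The function name `member_to_enlarge` is hypothetical, and the matched substrings are taken from the message texts quoted above.

```python
from typing import Optional

# Substrings copied from the two messages quoted in this article.
RECEIVED = "received 'Reject notification' from remote member"
SENT = "sent 'Reject notification' to remote member"


def member_to_enlarge(log_line: str) -> Optional[str]:
    """Return 'remote' or 'local' for the member whose Sending Queue
    should be enlarged, or None if the line is not one of the two messages."""
    if "State synchronization is in risk" not in log_line:
        return None
    if RECEIVED in log_line:
        return "remote"   # remote member no longer held the requested Delta Sync
    if SENT in log_line:
        return "local"    # local member no longer held the requested Delta Sync
    return None
```

A line that matches neither message returns `None`, so the helper can be applied to every line of a log file without pre-filtering.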

      Another way to recognize when to increase the size of the Sending Queue and Receiving Queue is to use the 'cphaprob syncstat' command. The output of this command has a "Sync at risk" section which includes "Sent reject notifications" and "Received reject notifications" counters. These counters show how many times a cluster member has sent or received a reject notification in response to a Delta Sync retransmission request.

      • If the "Received reject notification" counter is increasing, increase the size of the Sending Queue.

      • If the "Received reject notification" counter is not increasing, increase the size of the Receiving Queue.
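The counter rule above amounts to comparing two readings of the "Received reject notification" counter taken some time apart while the message is appearing. A minimal Python sketch follows. It assumes the two values were read manually from the 'cphaprob syncstat' output, it does not parse that output, and the function name and the reset check are illustrative additions.

```python
def queue_to_increase(received_before: int, received_after: int) -> str:
    """Apply the rule above to two readings of the
    "Received reject notification" counter."""
    if received_after < received_before:
        # Assumption: a lower later reading means the statistics were reset.
        raise ValueError("counter decreased; take two fresh readings")
    if received_after > received_before:
        return "Sending Queue"
    return "Receiving Queue"
```

Whichever queue the rule points to, the note that follows still applies: both queues should be increased on all members.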

      Note: If an increase in one of the queues is required, it's recommended to increase Sending and Receiving Queue sizes on all members, to avoid the following scenarios:

      • If the local member's Sending Queue size is increased without increasing the remote member's Receiving Queue size, the local member may send more Delta Sync updates than the remote member can handle. As a result, the remote member loses Delta Sync updates.

      • If the local member's Receiving Queue size is increased without increasing the remote member's Sending Queue size, the local member may send Delta Sync retransmission requests for updates which the remote member no longer has. As a result, the local member loses Delta Sync updates.


    • Best Practice is to maintain the default ratio between the Sending Queue size and the Receiving Queue size, meaning the Sending Queue Size is double the Receiving Queue size.

      The way to increase queue sizes is explained in sk82080 - /var/log/messages are filled with 'kernel: FW-1: fwldbcast_update_block_new_conns: sync in risk: did not receive ack for the last 410 packets'.
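The ratio guidance can be checked mechanically once the configured sizes are known. The Python sketch below is illustrative only. The input format (member name mapped to a pair of sizes) is hypothetical, and treating differing sizes between members as a finding is an assumption derived from the note about increasing the queues on all members.

```python
from typing import Dict, List, Tuple


def check_queue_sizes(members: Dict[str, Tuple[int, int]]) -> List[str]:
    """members maps a member name to (sending_queue_size, receiving_queue_size).
    Returns a list of human-readable findings; an empty list means no findings."""
    findings = []
    for name, (sending, receiving) in sorted(members.items()):
        # Best practice from the text: Sending Queue is double the Receiving Queue.
        if sending != 2 * receiving:
            findings.append(
                f"{name}: Sending Queue {sending} is not double Receiving Queue {receiving}"
            )
    # Assumption: all members should carry the same pair of sizes.
    if len(set(members.values())) > 1:
        findings.append("queue sizes differ between cluster members")
    return findings
```

The sizes used to call this function are whatever is configured per sk82080; the sketch does not assume any default values.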

If the error message persists, then consider blocking new connections under high load, as explained in:

 

Recommendations for Nokia IPSO-based cluster

To reduce or eliminate these messages in Nokia IPSO-based cluster, verify that you followed these recommendations:

  • Sync network needs to be dedicated to Check Point State Synchronization only. It is not recommended to run VRRP or IPSO Cluster on Sync interfaces (refer to sk39179).

  • Sync interfaces should be configured as non-ADP interfaces.

  • Sync interfaces should be the same speed or faster than the fastest interface on the VRRP or IPSO Cluster. However, this recommendation is impractical when 10 gigabit interfaces are employed. In practice, no more than 2 gigabits of throughput is needed for sync traffic on IP Series Appliances. This means a pair of non-ADP gigabit Ethernet interfaces will be sufficient, though in many cases a single gigabit Ethernet interface will suffice.

  • It is recommended to use a dedicated VLAN on a switch for Check Point State Synchronization traffic from a single cluster only, i.e., you should not mix Check Point State Synchronization traffic from other clusters on this dedicated VLAN. A cross-over cable is also supported in a 2-node cluster. Whether to use a switch or a cross-over cable depends on the environment.

  • For VRRP only - disable synchronization for certain services. This will help stabilize the systems because:

    • Less memory will be demanded by the sync process.
    • Less CPU time will be demanded by the sync process.


    The short HTTP and DNS UDP services are good candidates to be taken out of synchronization. Because of their short life, they are not affected in a fail-over scenario without synchronization.

  • The following limitations are applicable for State Synchronization over wide area network:

    • The synchronization network must guarantee no more than ~20-30 ms latency and no more than ~2-3% packet loss.
    • The synchronization network may only include switches and hubs. No routers are allowed on the synchronization network, because routers will drop Cluster Control Protocol (CCP) packets, which are sent in either Multicast or Broadcast mode and are therefore non-routable.
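The latency and packet-loss limits above are stated as approximate ranges, so a measured link can fall clearly inside, clearly outside, or in between. The Python sketch below classifies a measurement on that basis. Treating the lower bounds (20 ms, 2%) as "ok" and the upper bounds (30 ms, 3%) as the hard limit is an interpretation made for this example, and the function name is hypothetical.

```python
def classify_sync_link(latency_ms: float, loss_pct: float) -> str:
    """Classify a measured sync link against the ~20-30 ms latency
    and ~2-3% packet loss limits quoted above."""
    if latency_ms > 30 or loss_pct > 3:
        return "unsuitable"   # beyond the upper end of either range
    if latency_ms > 20 or loss_pct > 2:
        return "marginal"     # inside the approximate range
    return "ok"               # below the lower end of both ranges
```

The measurements themselves (for example, from repeated pings across the sync network) are outside the scope of this sketch.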


  • If an IP cluster is configured between two IPSO Appliances located in two different cities, CCP and VRRP advertisements need to be updated quite frequently. Any latency can cause both cluster members to behave abnormally. Also, if there is a break in the WAN link, both members will become master, which can also cause problems.

    The solution is to ensure minimal latency between the cluster members and a highly reliable link.

  • Increase the Sync "Sending Queue Size" and the Sync "Receiving Queue Size" on all cluster members per sk82080 - /var/log/messages are filled with 'kernel: FW-1: fwldbcast_update_block_new_conns: sync in risk: did not receive ack for the last 410 packets'.

If the error messages persist, then consider blocking new connections under high load, as explained in:

Applies To:
  • This solution integrates sk40519
