VRRP cluster member stops responding when queried over SNMP, or when SNMP Traps should be sent
Symptoms
  • Failover in a VRRP cluster on Gaia OS might take several minutes (until the new VRRP Master has taken over all VRRP addresses), which causes traffic and connectivity loss.
    Sometimes the VRRP failover does not occur at all, and a reboot is needed.

  • VRRP cluster member stops responding and a reboot is needed in the following scenario:
    1. Continuously send SNMP queries to a VRRP interface
    2. Shut down that VRRP interface
    3. Bring up that VRRP interface
    4. VRRP cluster member stops responding even on the console connection
  • Disabling SNMP on the VRRP cluster members resolves the failover issue.

  • Attaching the strace utility to the involved daemons while the issue occurs shows:

    • Attached to RouteD daemon:
      connect(25, {sa_family=AF_UNIX, path="/tmp/snmptrap"}, 16 <unfinished ...>
    • Attached to ConfD daemon:
      connect(16, {sa_family=AF_UNIX, path="/tmp/iclid"}, 13 <unfinished ...>
    • Attached to SnmpD daemon:
      connect(10, {sa_family=AF_UNIX, path="/tmp/iclid"}, 13 <unfinished ...>
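
The strace symptom above can be checked mechanically when a long capture has to be reviewed. The following is a minimal sketch, not a vendor tool: it assumes the strace output has already been saved as text lines, and the function name blocked_unix_connects is hypothetical. The pattern also accepts the sun_path= spelling that newer strace versions print instead of path=.

```python
import re

# Matches an strace line for connect() on a Unix-domain socket that never
# returned. Accepts both 'path=' (as quoted in this article) and 'sun_path='
# (printed by newer strace versions), with or without a space before '<'.
UNFINISHED_CONNECT = re.compile(
    r'connect\((?P<fd>\d+),\s*\{sa_family=AF_UNIX,\s*(?:sun_)?path="(?P<path>[^"]+)"\},'
    r'\s*\d+\s*<unfinished \.\.\.>'
)


def blocked_unix_connects(strace_lines):
    """Return (fd, socket_path) for every connect() reported as unfinished."""
    hits = []
    for line in strace_lines:
        match = UNFINISHED_CONNECT.search(line)
        if match:
            hits.append((int(match.group("fd")), match.group("path")))
    return hits


# Sample lines copied from the symptoms listed in this article.
sample = [
    'connect(25, {sa_family=AF_UNIX, path="/tmp/snmptrap"}, 16 <unfinished ...>',
    'connect(16, {sa_family=AF_UNIX, path="/tmp/iclid"}, 13 <unfinished ...>',
    'connect(10, {sa_family=AF_UNIX, path="/tmp/iclid"}, 13<unfinished ...>',
]

for fd, path in blocked_unix_connects(sample):
    print(f"fd {fd} is blocked connecting to {path}")
```

If all three daemons report an unfinished connect() at the same time, that matches the symptom described above.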
Cause

Deadlock between RouteD, SnmpD and ConfD daemons:
The RouteD daemon uses the SNMP library, and its I/O blocks while it waits for a response from the SnmpD daemon. When the system is very busy, the RouteD daemon cannot process requests from the ConfD daemon because it is still waiting for the SnmpD daemon, which is not running.
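
The circular wait can be illustrated with a small wait-for graph. This is a sketch under stated assumptions, not a description of the daemons' internals: the article does not say which daemon serves each socket, so the edges below assume that /tmp/snmptrap is served by SnmpD and /tmp/iclid by RouteD. The function name find_cycle is hypothetical.

```python
def find_cycle(waits_for):
    """Return one cycle in a wait-for graph as a list of names, or None."""
    visiting, done = set(), set()
    stack = []

    def visit(node):
        visiting.add(node)
        stack.append(node)
        for nxt in waits_for.get(node, ()):
            if nxt in visiting:
                # Back edge: everything from nxt to the top of the stack is a cycle.
                return stack[stack.index(nxt):] + [nxt]
            if nxt not in done:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        visiting.discard(node)
        done.add(node)
        stack.pop()
        return None

    for node in list(waits_for):
        if node not in done:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# ASSUMPTION: socket ownership is inferred from the strace output, not stated
# in the article. /tmp/snmptrap -> SnmpD, /tmp/iclid -> RouteD.
waits_for = {
    "RouteD": ["SnmpD"],   # connect() to /tmp/snmptrap
    "ConfD": ["RouteD"],   # connect() to /tmp/iclid
    "SnmpD": ["RouteD"],   # connect() to /tmp/iclid
}

print(find_cycle(waits_for))
```

Under these assumptions RouteD and SnmpD wait on each other, and ConfD is stuck behind RouteD, which is consistent with all three connect() calls remaining unfinished.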

