60000 / 40000 Appliances - Traffic outage can occur on SSMs during large traffic volume on the Mgmt interfaces of the SSMs
Symptoms
  • Connectivity to the SSMs is lost intermittently during heavy traffic load on the Mgmt interfaces of the SSMs.

  • The /var/log/messages file shows the following:

    ch01-01 cmd: SSM 1 is not pingable
    ch01-01 cmd: SSM 2 is not pingable
    ch01-01 cmd: Running SSM check alive test from blade 1_2
    ... ...
    ch01-01 cmd: SSM pingable test from another blade not succeeded, setting SSM1 to down
    ch01-01 cmd: Failed to communicate with SSM, run check alive test
    ch01-01 cmd: SSM communication retry failed
    ch01-01 cmd: Chassis 1 SSM2 is down, 0 SSM(s) currently active
  • Kernel debug ('fw ctl debug -m fw + drop') shows that traffic is dropped:

    ... dropped by fwkdrv_enqueue_packet_user_ex Reason: Instance is currently fully utilized, vsid 0, instance 0
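To check a saved copy of these logs for the symptoms above, the message patterns can be matched mechanically. The sketch below is a minimal, hypothetical helper (scan_symptoms is not a Check Point tool), and it assumes the message text matches the excerpts quoted in this article exactly; wording may differ between versions.

```python
import re
from collections import Counter

# Patterns copied from the log excerpts in this article (assumed stable).
NOT_PINGABLE_RE = re.compile(r"SSM (\d+) is not pingable")
SSM_DOWN_RE = re.compile(r"Chassis (\d+) SSM(\d+) is down, (\d+) SSM\(s\) currently active")
QUEUE_FULL_RE = re.compile(
    r"dropped by (\S+) Reason: Instance is currently fully utilized, vsid (\d+), instance (\d+)"
)


def scan_symptoms(lines):
    """Summarize symptom lines (hypothetical helper for illustration only)."""
    summary = {"not_pingable": Counter(), "ssm_down": [], "queue_full_drops": Counter()}
    for line in lines:
        if m := NOT_PINGABLE_RE.search(line):
            # Count "not pingable" reports per SSM number
            summary["not_pingable"][int(m.group(1))] += 1
        elif m := SSM_DOWN_RE.search(line):
            # Record (chassis, SSM, active SSM count) for each "is down" event
            chassis, ssm, active = map(int, m.groups())
            summary["ssm_down"].append((chassis, ssm, active))
        elif m := QUEUE_FULL_RE.search(line):
            # Count queue-full drops per (vsid, instance)
            vsid, instance = int(m.group(2)), int(m.group(3))
            summary["queue_full_drops"][(vsid, instance)] += 1
    return summary


sample = [
    "ch01-01 cmd: SSM 1 is not pingable",
    "ch01-01 cmd: SSM 2 is not pingable",
    "ch01-01 cmd: Chassis 1 SSM2 is down, 0 SSM(s) currently active",
    "... dropped by fwkdrv_enqueue_packet_user_ex Reason: Instance is currently "
    "fully utilized, vsid 0, instance 0",
]
result = scan_symptoms(sample)
print(result["ssm_down"])  # [(1, 2, 0)]
```

Because the function accepts any iterable of strings, an open file object (for example, a copy of /var/log/messages concatenated with the kernel debug output) can be passed directly. Drops reported for vsid 0 together with SSM "is down" events point to the scenario described in this article.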
Cause

A large traffic volume on the Mgmt interfaces of the SSMs can cause the CMD (chassis monitor daemon) to consider SSM1, SSM2, and the CMMs to be DOWN. Communication between the different hardware components (including SSH to the SMO) can also break.
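The log sequence under Symptoms suggests an escalation: the SSM is first reported as not pingable, a check-alive test is then run from another blade, and only when that also fails is the SSM set to down. The toy model below sketches that escalation. The class, its fields, and the rule of leaving the state unchanged when only the local probe fails are hypothetical assumptions for illustration, not the actual CMD implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class ChassisMonitor:
    """Toy model of the escalation seen in the logs (names are hypothetical)."""

    local_ping: Callable[[int], bool]        # assumed: keep-alive probe from the SMO
    peer_check_alive: Callable[[int], bool]  # assumed: check-alive test from another blade
    state: Dict[int, str] = field(default_factory=lambda: {1: "UP", 2: "UP"})

    def check(self, ssm: int) -> str:
        if self.local_ping(ssm):
            self.state[ssm] = "UP"
        elif not self.peer_check_alive(ssm):
            # Both probes failed, as in "setting SSM1 to down"
            self.state[ssm] = "DOWN"
        # If only the local probe failed, this sketch keeps the previous state
        return self.state[ssm]

    def active_count(self) -> int:
        return sum(1 for s in self.state.values() if s == "UP")


# Flood scenario: keep-alive replies are lost, so every probe fails.
mon = ChassisMonitor(local_ping=lambda ssm: False, peer_check_alive=lambda ssm: False)
for ssm in (1, 2):
    mon.check(ssm)
print(mon.active_count())  # 0
```

The final count mirrors the "0 SSM(s) currently active" message: once the keep-alive path is congested, a healthy SSM is indistinguishable from a failed one.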

Chain of events (in VSX mode):

  1. The Mgmt interfaces on each SSM (per chassis) send incoming traffic to the SMO.
  2. By design, each packet is handled by VS0.
  3. When the Mgmt interface is flooded with traffic, the VS0 inbound queue can become fully utilized.
    As a result, the dispatcher cannot pass new packets.
  4. Packets arriving on the eth1-CIN and eth2-CIN interfaces are the keep-alive packets from the different hardware components on the chassis: SSM1, SSM2, CMM1, and CMM2.
  5. The SMO sends keep-alive messages (via the CIN interfaces) to the SSMs/CMMs and waits for their responses.
  6. Because the packet queue is fully utilized, the dispatcher drops these keep-alive packets (and SSH packets).
  7. The CMD (chassis monitor daemon) considers the SSMs and CMMs unavailable, which causes a traffic outage.
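The chain of events above comes down to one bounded queue shared by Mgmt traffic and CIN keep-alives. The simulation below is a minimal sketch of that effect. The queue depth, drain rate, and arrival order (Mgmt packets before keep-alives in each tick) are hypothetical values chosen for illustration; they do not describe real VS0 internals.

```python
from collections import deque

QUEUE_CAPACITY = 8    # hypothetical VS0 inbound queue depth
PACKETS_PER_TICK = 4  # hypothetical number of packets VS0 processes per tick
KEEPALIVE_PEERS = 4   # SSM1, SSM2, CMM1, CMM2


def simulate(mgmt_pkts_per_tick, ticks=5):
    """Return (keepalives_sent, keepalives_dropped) for a toy dispatcher."""
    queue = deque()
    sent = dropped = 0
    for _ in range(ticks):
        # Assumed order: Mgmt traffic arrives first, then one keep-alive per peer.
        arrivals = ["mgmt"] * mgmt_pkts_per_tick + ["keepalive"] * KEEPALIVE_PEERS
        for pkt in arrivals:
            if pkt == "keepalive":
                sent += 1
            if len(queue) >= QUEUE_CAPACITY:
                # Queue is full: the dispatcher drops the packet, whatever its type
                if pkt == "keepalive":
                    dropped += 1
                continue
            queue.append(pkt)
        # VS0 drains a fixed number of queued packets per tick
        for _ in range(min(PACKETS_PER_TICK, len(queue))):
            queue.popleft()
    return sent, dropped


print(simulate(0))   # (20, 0)
print(simulate(20))  # (20, 20)
```

With no Mgmt load, every keep-alive fits in the queue. Under a flood, the Mgmt packets keep the queue full and every keep-alive is dropped, even though the SSMs and CMMs are healthy. This is why a flood on a data-path interface can surface as a chassis component failure.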

Note: In VSX VSLS mode, the SMO is responsible for the chassis-monitor task.

