Check Point Security Gateway on SecurePlatform / Gaia freezes, crashes, or reboots randomly, core dump files are not created
Symptoms
  • Security Gateway running SecurePlatform / Gaia OS freezes, crashes, or reboots randomly, and core dump files are not created.
Solution

Table of Contents:

  • Background
  • Procedure
  • Related solutions

 

Background

Due to various circumstances, the Security Gateway might freeze or crash. In such cases, no information can be written to any system logs.

To understand what might have caused such a failure, we have to extract the necessary information (the memory stack) from the operating system.
SecurePlatform / Gaia OS can be configured to dump core files. However, the failure might be so severe that the core files cannot be written.
In these cases, we need to "seize" the moment when SecurePlatform / Gaia OS freezes or crashes, and extract the memory stack directly from the kernel.

The following procedure explains how to prepare the problematic machine for the next occurrence of a freeze or crash.

 

Procedure

Note: This procedure is a simple change to the configuration of the Linux kernel, and has no impact on performance.

A demonstration of the procedure on R77.30 Security Gateway running Gaia OS:

We need to configure the SecurePlatform / Gaia OS kernel on the problematic machine so that, when the problem occurs, it produces output on the console and accepts input from the keyboard.

The procedure involves the following basic steps:

  1. Configuring the SecurePlatform / Gaia OS kernel to produce output on the console and accept input from the keyboard during the freeze or crash
  2. Connecting another machine to the problematic Security Gateway using a Serial (RS232) cable
  3. Rebooting the problematic Security Gateway into "Debug Mode"
  4. Testing that we are able to communicate with the kernel directly
  5. Waiting for the next occurrence of the freeze or crash
  6. Extracting the necessary information from the kernel (memory stack)
  7. Restoring the original configuration

Detailed instructions:

  1. Back up the /boot/grub/grub.conf file:

    [Expert@HostName]# cp /boot/grub/grub.conf /boot/grub/grub.conf_backup
  2. Connect over the serial console and run the 'w' command. Note the name of the Serial Terminal: ttyS0 or ttyS1.
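
As a rough illustration of Step 2, the serial terminal can be picked out of the 'w' output with a small filter. This is only a sketch: the function name serial_ttys is hypothetical, and it assumes that the TTY name is the second column of the 'w' output, as in the common Linux layout.

```shell
#!/bin/bash
# Hypothetical helper: read 'w' output on stdin and print each serial
# terminal (ttyS<N>) found in the TTY column (assumed to be column 2).
serial_ttys() {
    awk '$2 ~ /^ttyS[0-9]+$/ { print $2 }' | sort -u
}

# Usage on the gateway (Expert mode):
#   w | serial_ttys
```

If the filter prints nothing, the current session is probably not on the serial console (for example, it is an SSH session).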

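  3. Edit the /boot/grub/grub.conf file. In the 'Debugging Mode' entry that matches your version (see the list below), change the value of the 'console=' parameter to the Serial Terminal noted in Step 2: 'console=ttyS0' or 'console=ttyS1'.

    In the entries below, CURRENT denotes the value to replace: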

    • For Gaia OS 64-bit (R75.40 and above)

      title Start in 64bit online debug mode
          root (hd0,0)
          kernel /vmlinuz-x86_64 ro  root=/dev/vg_splat/lv_current vmalloc=256M  panic=15 console=CURRENT kdb=on crashkernel=128M@16M 3
          initrd /initrd-x86_64
      
    • For Gaia OS 32-bit (R75.40 and above)

      title Start in online debug mode 
          root (hd0,0) 
          kernel /vmlinuz ro  root=LABEL=/ vmalloc=256M  panic=15 console=CURRENT kdb=on 3 
          initrd /initrd
    • For NGX SecurePlatform 2.6 (R70 and above)

      title Start in online debug mode 
          root (hd0,0) 
          kernel /vmlinuz ro  root=LABEL=/ vmalloc=256M  panic=15 console=CURRENT kdb=on 3 
          initrd /initrd
    • For NGX SecurePlatform 2.6 (R65 ENFv26)

      title Start in debug mode 
          root (hd0,0) 
          kernel /vmlinuz ro  root=LABEL=/  panic=15 console=CURRENT kdb=on 3 
          initrd /initrd 
    • For NGX SecurePlatform 2.4 (R60 - R65)

      title Start in debug mode 
          root (hd0,0) 
          kernel /vmlinuz ro  root=LABEL=/ console=CURRENT kdb=on 3 
          initrd /initrd 
    • For VSX NGX R67 SecurePlatform 2.6

      title Start in online debug mode 
          root (hd0,0) 
          kernel /vmlinuz ro  root=LABEL=/ vmalloc=128M thash_entries=4096 panic=15 console=CURRENT kdb=on quiet 3
          initrd /initrd
    • For VSX NGX R65 SecurePlatform 2.4

      title Start in debug mode 
          root (hd0,0) 
          kernel /vmlinuz ro  root=LABEL=/ console=CURRENT kdb=on 3 
          initrd /initrd 
    • For VSX NGX Scalability Pack SecurePlatform 2.4

      title SecurePlatform VSX NGX Scalability Pack [Debugging] 
          root (hd0,0) 
          kernel /vmlinuz-2.4.18-10cp41_kdb ro root=/dev/sda2 console=CURRENT 3 
          initrd /initrd-2.4.18-10cp41_kdb.img 
    • For VSX NGX SecurePlatform 2.4

      title SecurePlatform VSX NGX [Debugging] 
          root (hd0,0) 
          kernel /vmlinuz-2.4.18-10cp27_kdb ro root=/dev/sda2 console=CURRENT 3 
          initrd /initrd-2.4.18-10cp27_kdb.img 
    • For NG AI SecurePlatform 2.4

      title SecurePlatform NG with Application Intelligence (R55) [Debugging] 
          root (hd0,0) 
          kernel /vmlinuz-smp_kdb ro root=/dev/sda3 console=CURRENT 3 
          initrd /initrd-smp_kdb 
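
The edit in Step 3 could be scripted roughly as follows. This is a sketch under stated assumptions, not a Check Point tool: the function name set_console is hypothetical, it relies on GNU sed ('-i'), and it only touches kernel lines that carry 'kdb=on'. The entries above that use a *_kdb kernel image without 'kdb=on' are not matched and would still need a manual edit.

```shell
#!/bin/bash
# Hypothetical helper: set console=<tty> on every kernel line that
# contains kdb=on in the given GRUB configuration file.
set_console() {
    local conf="$1" tty="$2"
    # Accept only serial terminal names such as ttyS0 or ttyS1.
    case "$tty" in
        ttyS[0-9]*) ;;
        *) echo "unexpected terminal name: $tty" >&2; return 1 ;;
    esac
    sed -i "/^[[:space:]]*kernel[[:space:]].*kdb=on/ s/console=[^[:space:]]*/console=${tty}/" "$conf"
}

# Usage (after the backup in Step 1):
#   set_console /boot/grub/grub.conf ttyS0
```

Review the file after the change, because a mistake in grub.conf can prevent the machine from booting.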
  4. Note: In rare cases, the USB drivers have been found to conflict with KDB.

    If KDB generates an 'Oops' message when you attempt to switch into KDB mode, add the 'nousb' parameter at the end of the kernel line in the /boot/grub/grub.conf file
    (e.g., '...console=ttyS0 kdb=on 3 nousb').
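
For illustration, the 'nousb' workaround from Step 4 might be applied with a one-off helper like the one below. The name add_nousb is hypothetical, GNU sed is assumed, and only kernel lines that contain 'kdb=on' are changed. Lines that already mention 'nousb' are left alone, so running it twice is harmless.

```shell
#!/bin/bash
# Hypothetical helper: append 'nousb' to kernel lines that contain kdb=on,
# unless the line already has it.
add_nousb() {
    local conf="$1"
    # First expression: skip any line that already mentions nousb.
    # Second expression: replace trailing whitespace with ' nousb'.
    sed -i -e '/nousb/b' \
           -e '/^[[:space:]]*kernel[[:space:]].*kdb=on/ s/[[:space:]]*$/ nousb/' "$conf"
}

# Usage:
#   add_nousb /boot/grub/grub.conf
```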

  5. Using a serial cable, connect a separate machine to the COM port that corresponds to the Serial Terminal noted in Step 2.

  6. From this separate machine, connect to the Security Gateway using a terminal program such as PuTTY / SecureCRT / TeraTerm Web / HyperTerminal.
    The connection parameters for this are the RS232 default (9600, NONE, 8, 1, NONE).

    Related solution: sk108095 - Serial console connection configuration for Check Point appliances.
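
If the separate machine runs Linux, the same line settings could be expressed with 'stty'. The sketch below only prints the command so that it can be reviewed before it is run. It assumes that the two NONE values above are parity and flow control, and the function name serial_stty_cmd and the default device /dev/ttyS0 are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print an stty command for 9600 baud, 8 data bits,
# no parity, 1 stop bit, and no hardware or software flow control.
serial_stty_cmd() {
    local dev="${1:-/dev/ttyS0}"
    printf 'stty -F %s 9600 cs8 -parenb -cstopb -crtscts -ixon -ixoff\n' "$dev"
}

# Usage on the separate Linux machine (a USB-to-serial adapter is assumed):
#   serial_stty_cmd /dev/ttyUSB0
```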

  7. You should be able to communicate with the crashing machine through the COM port.
    Make sure you have a serial connection with the machine - you should see the usual prompt [Expert@HostName] and you should be able to run commands.

  8. Reboot the crashing machine. When you see the prompt "Press any key to see the boot menu..." - press any key and start the machine in Debugging Mode:

    1. Wait for the login prompt (login:) - log in to the system.

    2. In NGX R60 and above, you have to enable the on-line kernel debugger:

      [Expert@HostName]# echo 1 > /proc/sys/kernel/kdb
      [Expert@HostName]# cat /proc/sys/kernel/kdb

      Output should show 1

    3. Try switching to the kernel prompt by one of the following methods:

      • pressing CTRL+A (you will see ^A) and then pressing Enter
      • pressing CTRL+AA (you will see ^A^A) and then pressing Enter
      • pressing CTRL+C (you will see ^C) and then pressing Enter
      • pressing Esc+K+D+B and then pressing Enter
      • using the Send a Break Signal option in the Terminal program

      Important Note: The keyboard layout must be English, otherwise the machine will freeze.

    4. The prompt should change:

      from the usual Bash prompt:
      [Expert@HostName]#

      to the kernel prompt:

      kdb>
      Entering kdb (current=0xXXXXXXXX, pid 0) due to Keyboard Entry
    5. Type ? or help to see all available commands (in sub-menu more>, type q or Q and then press Enter to quit).

  9. From the kernel prompt run:

    kdb> bt

    For each function, the following information is given:

    • The pointer to the stack frame (ESP).
    • The current address within this frame (EIP).
    • The return address from the previous frame converted to a function name and an offset of the address within the function.
      If this is the first frame, the address is of the current location (EIP Register).
    • The function arguments.

    You should get a stack output on the screen, multiple lines that look similar to these:

    EBP        EIP         Function(args)
    0x8194beac 0x8011eca1 context_switch+0x81 (0x803e3980, 0x8194a000, 0x803b4000, 0x43fc, 0x8194bef4)
                                   kernel .text 0x80100000 0x8011ec20 0x8011ecf0
    

    or that look similar to these:

    EBP        EIP         Function(args)
    0x807adf80 0x80405deb apic_timer_interrupt+0x1f (0x0, 0x807ac000, 0x80403e10)
    
  10. From the kernel prompt run:

    kdb> go

    You should exit the on-line kernel debugger and enter a regular prompt [Expert@HostName]#

    Note:
    Occasionally, the 'go' command might not work, and this error is displayed: 'Catastrophic error detected'.
    In such a case, instead of returning to a regular prompt, the machine might be stuck in the KDB prompt, or even crash.
    This kernel behavior can have numerous causes.
    If the machine is stuck in the KDB prompt, reboot the machine manually.
    Otherwise, let the machine crash and reboot itself.

    Example:
    [0]kdb> go
    Catastrophic error detected
    kdb_continue_catastrophic=0, type go a second time if you really want to continue
    [0]kdb> go
    Catastrophic error detected
    kdb_continue_catastrophic=0, attempting to continue
    Kernel panic - not syncing

  11. Make sure the serial connection is still working and that you can run commands.

  12. Leave this serial machine working and connected.
    Wait until the freeze/crash occurs again.

  13. When the freeze/crash occurs, on the machine that is still connected over the serial console, repeat sub-step 3 of Step 8 to switch to the kernel debug prompt kdb>

  14. See what processes are running and copy their complete list (to match the PID and the stacks):

    kdb> ps

    You should get an output on the screen - multiple lines that look like these:

    Task Addr       Pid   Parent [*] cpu State Thread     Command
    0x9fdcaaa0        1        0  0    0   S  0x9fdcac50  init
    0x9ffe2550     1263        1  0    0   S  0x9ffe2700  syslogd
    0x9fe95550     1272        1  0    0   S  0x9fe95700  klogd
    0x9fc0b000     1518        1  0    0   S  0x9fc0b1b0  sshd
    0x9fc0baa0     1563        1  0    0   S  0x9fc0bc50  crond
    ................................................
    more>
    ................................................
    0x97325000     1935     1922  0    0   S  0x973251b0  login
    0x9fc5caa0     1936     1935  0    0   S  0x9fc5cc50  bash
    

    Then press Enter until you see the kernel prompt again kdb> - copy the complete list of the processes.

  15. In NGX R60 and above, display syslog buffer and copy the complete output:

     

    kdb> dmesg

    text
    text
    more>

     

    Then press Enter until you see the kernel prompt again kdb> - copy the complete output.

    Note: The output will be long.
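
If the terminal program logs the session to a file, the pager prompts can be removed from the captured 'ps' and 'dmesg' output before it is sent. A minimal sketch, assuming that each prompt sits on its own line exactly as 'more>' (the function name strip_more_prompts is hypothetical):

```shell
#!/bin/bash
# Hypothetical helper: copy stdin to stdout and drop lines that contain
# only the KDB pager prompt 'more>'.
strip_more_prompts() {
    grep -v '^[[:space:]]*more>[[:space:]]*$'
}

# Usage on the separate machine (file names are examples only):
#   strip_more_prompts < session.log > dmesg.txt
```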

  16. On Gaia OS / SecurePlatform 2.6, display the summary:

    kdb> summary

    You should get an output on the screen - multiple lines that look like these:

    summary
    sysname    Linux
    release    2.6.18-92cp
    version    #1 SMP Wed Feb 4 17:35:21 IST 2009
    machine    i686
    nodename   Member_A
    domainname 
    date       2009-06-13 19:35:58 tz_minuteswest -180
    uptime     00:03
    load avg   0.81 0.77 0.33
    
    MemTotal:       502364 kB
    MemFree:        122796 kB
    Buffers:         17176 kB
    Cached:         132716 kB
    SwapCached:          0 kB
    Active:         195680 kB
    HighTotal:           0 kB
    HighFree:            0 kB
    LowTotal:       502364 kB
    LowFree:        122796 kB
    SwapTotal:     2096440 kB
    SwapFree:      2096440 kB
    Dirty:             368 kB
    AnonPages:      122604 kB
    Mapped:          24636 kB
    Slab:            13328 kB
    PageTables:       1572 kB
    NFS_Unstable:        0 kB
    Bounce:              0 kB
    CommitLimit:   2347620 kB
    Committed_AS:   385804 kB
    VmallocTotal:  1566712 kB
    VmallocUsed:     94100 kB
    VmallocChunk:  1470332 kB
    HugePages_Total:     0
    HugePages_Free:      0
    HugePages_Rsvd:      0
    Hugepagesize:     2048 kB
    kdb>
    
  17. If you have a single CPU, perform the following steps
    (if you have multiple CPUs (cpsmp), go to the next Step):

    1. Start the stack traceback from the CPU:

      [0]kdb> bt

      You should get a stack output on the screen, multiple lines that look similar to these:

      EBP        EIP         Function(args)
      0x8194beac 0x8011eca1 context_switch+0x81 (0x803e3980, 0x8194a000, 0x803b4000, 0x43fc, 0x8194bef4)
                                     kernel .text 0x80100000 0x8011ec20 0x8011ecf0
      0x8194bec8 0x8011d725 schedule+0x135
                                     kernel .text 0x80100000 0x8011d5f0 0x8011d860
      

      or that look similar to these:

      EBP        EIP         Function(args)
      0x807adf48 0x80431802 __do_softirq+0x62
      0x807adf6c 0x804318cb do_softirq+0x3b
      0x807adf74 0x80431b96 irq_exit+0x36
      0x807adf78 0x8041d392 smp_apic_timer_interrupt+0x62 (0x807adf84)
      0x807adf80 0x80405deb apic_timer_interrupt+0x1f (0x0, 0x807ac000, 0x80403e10)
      0x807adfac 0x80403e41 default_idle+0x31 
      
    2. Copy the Stack output from the screen.

    3. Important for SecurePlatform 2.6:
      In SecurePlatform 2.6, the kernel panic stack (the output of bt command) can miss some information.
      Therefore, run an additional command ("Display Memory Symbolically") to collect this missing information:

      kdb> mds %esp

      You should get a Stack output on the screen - multiple lines that look like these:

      0x885d1d1c 00000000   ....
      0x885d1d20 91215695 [fwmod]fw_ktd_vprintf+0xd5
      0x885d1d2c 91727fa0 [fwmod]fwkdebug_panic_on_str
      
    4. Copy the Stack output from the screen.

  18. If you have multiple CPUs (cpsmp), then perform the following steps in the context of each CPU core:

    1. Check the available CPU contexts
    2. For the context of the 1st CPU core:
      1. Go to the context of the CPU core
      2. Verify the CPU context
      3. Start the stack traceback from the 1st CPU core
      4. Copy the Stack output from the screen
    3. For the context of each additional CPU core:
      Repeat the above steps to collect the Stack in the context of each CPU core

    Example for the machine with 2 CPU cores:

    1. Check the available CPU contexts. In the kernel prompt, run:

      [0]kdb> cpu
    2. Note the number of CPU contexts ("Available cpus" in the following output):

      Currently on cpu 0
      Available cpus: 0, 1

    3. Go to the context of the 1st CPU core. In the kernel prompt, run:

      [0]kdb> cpu 0

    4. The prompt should change to:

      Entering kdb (current=0xXXXXXXXX, pid 0) on processor 0 due to cpu switch.

    5. Verify the CPU context ("Currently on cpu" in the following output):

      [0]kdb> cpu
      Currently on cpu 0
      Available cpus: 0, 1

    6. Start the stack traceback from the 1st CPU core:

      [0]kdb> bt

      You should get a stack output on the screen, multiple lines that look similar to these:

      EBP        EIP         Function(args)
      0x8194beac 0x8011eca1 context_switch+0x81 (0x803e3980, 0x8194a000, 0x803b4000, 0x43fc, 0x8194bef4)
                                     kernel .text 0x80100000 0x8011ec20 0x8011ecf0
      0x8194bec8 0x8011d725 schedule+0x135
                                     kernel .text 0x80100000 0x8011d5f0 0x8011d860
      

      or that look similar to these:

      EBP        EIP         Function(args)
      0x807adf48 0x80431802 __do_softirq+0x62
      0x807adf6c 0x804318cb do_softirq+0x3b
      0x807adf74 0x80431b96 irq_exit+0x36
      0x807adf78 0x8041d392 smp_apic_timer_interrupt+0x62 (0x807adf84)
      0x807adf80 0x80405deb apic_timer_interrupt+0x1f (0x0, 0x807ac000, 0x80403e10)
      0x807adfac 0x80403e41 default_idle+0x31 
      
    7. Copy the Stack output from the screen.

    8. Important for SecurePlatform 2.6:
      In SecurePlatform 2.6, the kernel panic stack (the output of bt command) can miss some information.
      Therefore, run an additional command ("Display Memory Symbolically") to collect this missing information:

      kdb> mds %esp

      You should get a Stack output on the screen - multiple lines that look like these:

      0x885d1d1c 00000000   ....
      0x885d1d20 91215695 [fwmod]fw_ktd_vprintf+0xd5
      0x885d1d2c 91727fa0 [fwmod]fwkdebug_panic_on_str
      
    9. Copy the Stack output from the screen.

    10. Go to the context of the 2nd CPU core. In the kernel prompt, run:

      [0]kdb> cpu 1
    11. The prompt should change to:

      Entering kdb (current=0xXXXXXXXX, pid 0) on processor 1 due to cpu switch.
    12. Verify the CPU context ("Currently on cpu" in the following output):

      [1]kdb> cpu
      Currently on cpu 1
      Available cpus: 0, 1
    13. Start the stack traceback from the 2nd CPU core:

      [1]kdb> bt

      You should get a stack output on the screen, multiple lines that look similar to these:

      EBP        EIP         Function(args)
      0x8194beac 0x8011eca1 context_switch+0x81 (0x803e3980, 0x8194a000, 0x803b4000, 0x43fc, 0x8194bef4)
                                     kernel .text 0x80100000 0x8011ec20 0x8011ecf0
      0x8194bec8 0x8011d725 schedule+0x135
                                     kernel .text 0x80100000 0x8011d5f0 0x8011d860
      

      or that look similar to these:

      EBP        EIP         Function(args)
      0x807adf48 0x80431802 __do_softirq+0x62
      0x807adf6c 0x804318cb do_softirq+0x3b
      0x807adf74 0x80431b96 irq_exit+0x36
      0x807adf78 0x8041d392 smp_apic_timer_interrupt+0x62 (0x807adf84)
      0x807adf80 0x80405deb apic_timer_interrupt+0x1f (0x0, 0x807ac000, 0x80403e10)
      0x807adfac 0x80403e41 default_idle+0x31
      
    14. Copy the Stack output from the screen.

    15. Important for SecurePlatform 2.6:
      In SecurePlatform 2.6, the kernel panic stack (the output of bt command) can miss some information.
      Therefore, run an additional command ("Display Memory Symbolically") to collect this missing information:

      kdb> mds %esp

      You should get a Stack output on the screen - multiple lines that look like these:

      0x885d1d1c 00000000   ....
      0x885d1d20 91215695 [fwmod]fw_ktd_vprintf+0xd5
      0x885d1d2c 91727fa0 [fwmod]fwkdebug_panic_on_str
      
    16. Copy the Stack output from the screen.
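
On machines with many CPU cores, it can help to prepare the list of KDB commands in advance. The sketch below turns the "Available cpus" line into a per-core checklist to type at the kdb> prompt. It runs on the separate machine, not in KDB. The function name kdb_cpu_checklist is hypothetical, and only the comma-separated form shown in the example above is handled (a kernel that prints ranges would need extra parsing).

```shell
#!/bin/bash
# Hypothetical helper: given the 'Available cpus: ...' line copied from KDB,
# print the commands to run for each CPU core.
kdb_cpu_checklist() {
    local line="$1" list cpu
    # Keep only the part after the 'Available cpus:' label.
    list="${line#*Available cpus:}"
    for cpu in $(printf '%s' "$list" | tr ',' ' '); do
        printf 'cpu %s\nbt\nmds %%esp\n' "$cpu"
    done
}

# Usage:
#   kdb_cpu_checklist "Available cpus: 0, 1"
```

The 'mds %esp' line matters only on SecurePlatform 2.6, as noted in the steps above.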

  19. Return to the normal shell. In the kernel prompt, run:

    kdb> go

    You should exit the on-line kernel debugger and enter a regular prompt [Expert@HostName]#

    Note:
    Occasionally, the 'go' command might not work, and this error is displayed: 'Catastrophic error detected'.
    In such a case, instead of returning to a regular prompt, the machine might be stuck in the KDB prompt, or even crash.
    This kernel behavior can have numerous causes.
    If the machine is stuck in the KDB prompt, reboot the machine manually.
    Otherwise, let the machine crash and reboot itself.

    Example:
    [0]kdb> go
    Catastrophic error detected
    kdb_continue_catastrophic=0, type go a second time if you really want to continue
    [0]kdb> go
    Catastrophic error detected
    kdb_continue_catastrophic=0, attempting to continue
    Kernel panic - not syncing

  20. If the keyboard does not respond, then press CTRL+A or CTRL+AA or CTRL+C, or use the Send a Break Signal option in the Terminal program, and then try the "bt" command again.

  21. In NGX R60 and above, disable the on-line kernel debugger:

    [Expert@HostName]# echo 0 > /proc/sys/kernel/kdb
    [Expert@HostName]# cat /proc/sys/kernel/kdb

    Output should show 0
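
The enable and disable commands used in this procedure follow the same write-then-verify pattern, which could be wrapped as shown below. This is a sketch only: the function name set_kdb is hypothetical, and the optional second argument exists so that the function can be tried against an ordinary file.

```shell
#!/bin/bash
# Hypothetical helper: write 0 or 1 to the KDB control file and confirm
# that the value reads back as written.
set_kdb() {
    local value="$1" ctl="${2:-/proc/sys/kernel/kdb}"
    case "$value" in
        0|1) ;;
        *) echo "value must be 0 or 1" >&2; return 2 ;;
    esac
    echo "$value" > "$ctl" || return 1
    [ "$(cat "$ctl")" = "$value" ]
}

# Usage (Expert mode):
#   set_kdb 1    # enable the on-line kernel debugger
#   set_kdb 0    # disable it again
```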

  22. Optionally, restore the original /boot/grub/grub.conf file from the backup created in Step 1.
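
Restoring the backup from Step 1 might look like the sketch below. The function name restore_grub is hypothetical. The default paths are the ones used in Step 1, and the 'cmp' check simply confirms that the restored file is identical to the backup.

```shell
#!/bin/bash
# Hypothetical helper: copy the backup over grub.conf and verify the result.
restore_grub() {
    local conf="${1:-/boot/grub/grub.conf}"
    local backup="${2:-/boot/grub/grub.conf_backup}"
    [ -f "$backup" ] || { echo "backup not found: $backup" >&2; return 1; }
    cp -p "$backup" "$conf" && cmp -s "$backup" "$conf"
}

# Usage (Expert mode):
#   restore_grub
```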

  23. Reboot the machine again. When you see the prompt "Press any key to see the boot menu..." - press any key and start the machine in Normal Mode.

  24. Send the following to Check Point Support for analysis:

    • stacks (outputs of 'bt' and 'mds %esp' commands)
    • list of processes (output of 'ps' command)
    • syslog buffer (output of 'dmesg' command)
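
Before contacting Support, the collected outputs can be checked and packed into one archive. The sketch below assumes that the outputs were saved on the separate machine as bt.txt, ps.txt and dmesg.txt in one directory. These file names and the function name bundle_kdb_outputs are hypothetical. Any other files in that directory (for example, a saved 'mds %esp' output) are packed as well.

```shell
#!/bin/bash
# Hypothetical helper: verify that the expected output files exist and are
# not empty, then pack the whole directory into a gzip-compressed tar file.
bundle_kdb_outputs() {
    local dir="$1" archive="$2" f
    for f in bt.txt ps.txt dmesg.txt; do
        [ -s "$dir/$f" ] || { echo "missing or empty: $dir/$f" >&2; return 1; }
    done
    tar -czf "$archive" -C "$dir" .
}

# Usage (paths are examples only):
#   bundle_kdb_outputs ./kdb_outputs ./kdb_outputs.tgz
```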

 
