Check Point Security Gateway on SecurePlatform / Gaia freezes, crashes, or reboots randomly, core dump files are not created
Symptoms
  • Security Gateway running SecurePlatform / Gaia OS freezes, crashes, or reboots randomly, core dump files are not created.
Solution

Table of Contents:

  • Background
  • Procedure
  • Related solutions

 

Background

Due to various circumstances, the Security Gateway might freeze or crash. In such cases, no information can be written to the system logs.

To understand what caused the failure, we have to extract the necessary information (the memory stack) from the operating system.
The SecurePlatform / Gaia operating system can be configured to dump core files. However, the failure might be so severe that the core files cannot be dumped.
In these cases, we need to "seize" the moment when SecurePlatform / Gaia freezes or crashes and extract the memory stack directly from the kernel.

The following procedure explains how to prepare the problematic machine for the next occurrence of a freeze or crash.

 

Procedure

Note: This procedure is a simple change in the configuration of the Linux kernel, which has no impact on performance.

A demonstration of the procedure on R77.30 Security Gateway running Gaia OS:

We need to configure the SecurePlatform / Gaia OS kernel on the problematic machine so that, when the problem occurs, it produces output on the console and accepts input from the keyboard.

The procedure involves the following basic steps:

  1. Configuring the SecurePlatform / Gaia OS kernel to produce output on the console and accept input from the keyboard during the freeze or crash
  2. Connecting another machine to the problematic Security Gateway using a Serial (RS232) cable
  3. Rebooting the problematic Security Gateway into "Debug Mode"
  4. Testing that we are able to communicate with the kernel directly
  5. Waiting for the next occurrence of the freeze or crash
  6. Extracting the necessary information from the kernel (memory stack)
  7. Restoring the original configuration

Detailed instructions:

  1. Back up the /boot/grub/grub.conf file:

    [Expert@HostName]# cp /boot/grub/grub.conf /boot/grub/grub.conf_backup
  2. Connect over the console and run the 'w' command. Note the name of the serial terminal: ttyS0 or ttyS1.
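
As a rough illustration of the step above, the following sketch filters the output of 'w' for serial terminals. The function name list_serial_ttys is hypothetical, and the sketch assumes the usual 'w' layout in which the TTY is the second column.

```shell
# List the serial terminals (ttyS*) that appear in the output of 'w'.
# Reads the 'w' output on stdin; assumes the TTY name is in the second column.
list_serial_ttys() {
    awk '$2 ~ /^ttyS[0-9]+$/ { print $2 }' | sort -u
}

# Usage on the gateway (Expert mode):
#   w | list_serial_ttys
```

If nothing is printed, the current session is probably not on the serial console (for example, it is an SSH session).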

  3. Edit the /boot/grub/grub.conf file. In the entry that matches your OS version, set the value of the 'console=' parameter (shown as CURRENT in the entries below) to the name of the serial terminal: 'console=ttyS0' or 'console=ttyS1'.

    Modify the entry for 'Debugging Mode':

    • For Gaia OS with 3.10 kernel:

      title Start in 64bit online debug mode
          kernel /vmlinuz-x86_64 ro  root=/dev/vg_splat/lv_current  noht panic=15 console=CURRENT kgdboc=kbd,ttyS0 crashkernel=auto 3
      
      The kgdboc parameter takes kbd (keyboard input) and the serial terminal that will be used to interact with the debugger (ttyS0 in this example).
    • For Gaia OS 64-bit (R75.40 and above)

      title Start in 64bit online debug mode
          root (hd0,0)
          kernel /vmlinuz-x86_64 ro  root=/dev/vg_splat/lv_current vmalloc=256M  panic=15 console=CURRENT kdb=on crashkernel=128M@16M 3
          initrd /initrd-x86_64
      
    • For Gaia OS 32-bit (R75.40 and above)

      title Start in online debug mode 
          root (hd0,0) 
          kernel /vmlinuz ro  root=LABEL=/ vmalloc=256M  panic=15 console=CURRENT kdb=on 3 
          initrd /initrd
    • For NGX SecurePlatform 2.6 (R70 and above)

      title Start in online debug mode 
          root (hd0,0) 
          kernel /vmlinuz ro  root=LABEL=/ vmalloc=256M  panic=15 console=CURRENT kdb=on 3 
          initrd /initrd
    • For NGX SecurePlatform 2.6 (R65 ENFv26)

      title Start in debug mode 
          root (hd0,0) 
          kernel /vmlinuz ro  root=LABEL=/  panic=15 console=CURRENT kdb=on 3 
          initrd /initrd 
    • For NGX SecurePlatform 2.4 (R60 - R65)

      title Start in debug mode 
          root (hd0,0) 
          kernel /vmlinuz ro  root=LABEL=/ console=CURRENT kdb=on 3 
          initrd /initrd 
    • For VSX NGX R67 SecurePlatform 2.6

      title Start in online debug mode 
          root (hd0,0) 
          kernel /vmlinuz ro  root=LABEL=/ vmalloc=128M thash_entries=4096 panic=15 console=CURRENT kdb=on quiet 3
          initrd /initrd
    • For VSX NGX R65 SecurePlatform 2.4

      title Start in debug mode 
          root (hd0,0) 
          kernel /vmlinuz ro  root=LABEL=/ console=CURRENT kdb=on 3 
          initrd /initrd 
    • For VSX NGX Scalability Pack SecurePlatform 2.4

      title SecurePlatform VSX NGX Scalability Pack [Debugging] 
          root (hd0,0) 
          kernel /vmlinuz-2.4.18-10cp41_kdb ro root=/dev/sda2 console=CURRENT 3 
          initrd /initrd-2.4.18-10cp41_kdb.img 
    • For VSX NGX SecurePlatform 2.4

      title SecurePlatform VSX NGX [Debugging] 
          root (hd0,0) 
          kernel /vmlinuz-2.4.18-10cp27_kdb ro root=/dev/sda2 console=CURRENT 3 
          initrd /initrd-2.4.18-10cp27_kdb.img 
    • For NG AI SecurePlatform 2.4

      title SecurePlatform NG with Application Intelligence (R55) [Debugging] 
          root (hd0,0) 
          kernel /vmlinuz-smp_kdb ro root=/dev/sda3 console=CURRENT 3 
          initrd /initrd-smp_kdb 
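
The edit in Step 3 could be scripted roughly as follows. This is a minimal sketch, not part of the official procedure: the function name set_console is hypothetical, it only touches lines that start with 'kernel', and it changes only the 'console=' parameter (a 'kgdboc=kbd,ttyS0' value on a 3.10 kernel line is left as-is and must be adjusted by hand if needed).

```shell
# Set the 'console=' parameter on the kernel lines of a GRUB config file.
# $1 = serial terminal noted in Step 2 (ttyS0 or ttyS1)
# $2 = path to the file to edit (try it on a copy first)
set_console() {
    tty=$1
    file=$2
    case "$tty" in
        ttyS[0-9]) ;;
        *) echo "unexpected serial terminal: $tty" >&2; return 1 ;;
    esac
    # Rewrite through a temporary file, then copy the content back so the
    # original file keeps its permissions.
    sed "/^[[:space:]]*kernel[[:space:]]/ s/console=[^[:space:]]*/console=${tty}/" \
        "$file" > "${file}.tmp" &&
        cat "${file}.tmp" > "$file" &&
        rm -f "${file}.tmp"
}

# Usage on the gateway (Expert mode), after the backup from Step 1:
#   set_console ttyS0 /boot/grub/grub.conf
```

Note that the sketch changes every kernel line that has a 'console=' parameter, not only the debug entry, so review the file afterwards.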
  4. Note: In rare cases, the USB drivers can conflict with KDB.

    If KDB generates an 'Oops' message when you attempt to switch into KDB mode, add the 'nousb' parameter at the end of the kernel line in the /boot/grub/grub.conf file
    (e.g., '...console=ttyS0 kdb=on 3 nousb').

  5. Using a serial cable, connect a separate machine to the COM port of the Security Gateway that matches the serial terminal noted in Step 2 (ttyS0 is COM1, ttyS1 is COM2).

  6. From this separate machine, connect to the Security Gateway using a terminal program such as PuTTY / SecureCRT / TeraTerm Web / HyperTerminal.
    The connection parameters are the RS232 defaults (9600, NONE, 8, 1, NONE), that is: 9600 baud, no parity, 8 data bits, 1 stop bit, no flow control.

    Related solution: sk108095 - Serial console connection configuration for Check Point appliances.
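
If the separate machine runs Linux, the same RS232 defaults can be expressed as stty settings. The sketch below only prints the arguments. The function name serial_stty_args and the device name in the usage comment are assumptions, not part of the original procedure.

```shell
# Print stty arguments equivalent to the RS232 defaults (9600, NONE, 8, 1, NONE).
serial_stty_args() {
    # 9600 baud, 8 data bits (cs8), no parity (-parenb), 1 stop bit (-cstopb),
    # no software flow control (-ixon -ixoff), no hardware flow control (-crtscts)
    echo "9600 cs8 -parenb -cstopb -ixon -ixoff -crtscts"
}

# Usage (the device name is an assumption, adjust it to your serial adapter):
#   stty -F /dev/ttyUSB0 $(serial_stty_args)
```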

  7. You should be able to communicate with the crashing machine through the COM port.
    Make sure you have a serial connection with the machine - you should see the usual prompt [Expert@HostName] and you should be able to run commands.

  8. Reboot the crashing machine. When you see the prompt "Press any key to see the boot menu..." - press any key and start the machine in Debugging Mode:

    1. Wait for the login prompt (login:) - log in to the system.

    2. In NGX R60 and above, you have to enable the on-line kernel debugger:

      [Expert@HostName]# echo 1 > /proc/sys/kernel/kdb
      [Expert@HostName]# cat /proc/sys/kernel/kdb

      Output should show 1
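
The write-then-verify pattern above can be wrapped in a small helper. This is only a sketch under the assumption that the switch file exists on your version. The function name set_kernel_switch is hypothetical.

```shell
# Write a value to a kernel switch file and confirm that it reads back.
# $1 = path (on the gateway: /proc/sys/kernel/kdb)
# $2 = value (1 to enable, 0 to disable)
set_kernel_switch() {
    echo "$2" > "$1" || return 1
    [ "$(cat "$1")" = "$2" ]
}

# Usage on the gateway (Expert mode):
#   set_kernel_switch /proc/sys/kernel/kdb 1 && echo "KDB enabled"
```

The same helper can be used later with the value 0 to disable the on-line kernel debugger.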

    3. Try switching to the kernel prompt by one of the following methods:

      • For Gaia with kernel 3.10:

        To enter KDB, you need to send the SysRq-G command.

        Remember that entering KDB stops all processes. To resume normal operation, type go at the kdb> prompt.

        There are several ways to send the command, depending on how you are connected to the device.

        • Run the following commands:

          [Expert@gw:0]# echo "1" > /proc/sys/kernel/sysrq
          [Expert@gw:0]# echo g > /proc/sysrq-trigger

          Output:

          SysRq : DEBUG
          Entering kdb (current=0xffff880851436ba0, pid 10683) on processor 0 due to Keyboard Entry

        • Using minicom terminal emulator:

          Press CTRL-A f g

        • Using telnet via PortServer/Digi:

          Press CTRL-]

          Type "send break", press Enter, and then press "g"

      • For Gaia with kernel 2.6.18, use one of the following methods:

        • Press CTRL+A (you will see ^A) and then press Enter
        • Press CTRL+AA (you will see ^A^A) and then press Enter
        • Press CTRL+C (you will see ^C) and then press Enter
        • Press Esc+K+D+B and then press Enter
        • Use the Send a Break Signal option in the terminal program
        • Press CTRL+q on an AZERTY keyboard

        Important Note: The keyboard layout must be English, otherwise the machine will freeze.

      • The prompt should change:

        from the usual Bash prompt:
        [Expert@HostName]#

        to the kernel prompt:

        kdb>
        Entering kdb (current=0xXXXXXXXX, pid 0) due to Keyboard Entry
      • Type ? or help to see all available commands (in sub-menu more>, type q or Q and then press Enter to quit).

    4. From the kernel prompt run:

      kdb> bt

      For each function, the following information is given:

      • The pointer to the stack frame (EBP).
      • The current address within this frame (EIP).
      • The return address from the previous frame converted to a function name and an offset of the address within the function.
        If this is the first frame, the address is of the current location (EIP Register).
      • The function arguments.

      You should get a stack output on the screen, multiple lines that look similar to these:

      EBP        EIP         Function(args)
      0x8194beac 0x8011eca1 context_switch+0x81 (0x803e3980, 0x8194a000, 0x803b4000, 0x43fc, 0x8194bef4)
                                     kernel .text 0x80100000 0x8011ec20 0x8011ecf0
      

      or that look similar to these:

      EBP        EIP         Function(args)
      0x807adf80 0x80405deb apic_timer_interrupt+0x1f (0x0, 0x807ac000, 0x80403e10)
      
    5. From the kernel prompt run:

      kdb> go

      You should exit the on-line kernel debugger and return to the regular prompt [Expert@HostName]#

      Note:
      Occasionally, the 'go' command might not work, and this error is displayed: 'Catastrophic error detected'.
      In such a case, instead of returning to the regular prompt, the machine might get stuck at the KDB prompt, or even crash.
      This kernel behavior can have numerous causes.
      If the machine is stuck at the KDB prompt, reboot the machine manually.
      Otherwise, let the machine crash and reboot itself.

      Example:
      [0]kdb> go
      Catastrophic error detected
      kdb_continue_catastrophic=0, type go a second time if you really want to continue
      [0]kdb> go
      Catastrophic error detected
      kdb_continue_catastrophic=0, attempting to continue
      Kernel panic - not syncing

    6. Make sure the serial connection is still working and that you can run commands.

    7. Leave this serial machine working and connected.
      Wait until the freeze/crash occurs again.

    8. When the freeze/crash occurs, on the machine that is still connected over the serial console, repeat sub-step 3 above (switching to the kernel prompt) to get the kernel debug prompt kdb>

    9. See what processes are running and copy their complete list (to match the PID and the stacks):

      kdb> ps

      You should get an output on the screen - multiple lines that look like these:

      Task Addr       Pid   Parent [*] cpu State Thread     Command
      0x9fdcaaa0        1        0  0    0   S  0x9fdcac50  init
      0x9ffe2550     1263        1  0    0   S  0x9ffe2700  syslogd
      0x9fe95550     1272        1  0    0   S  0x9fe95700  klogd
      0x9fc0b000     1518        1  0    0   S  0x9fc0b1b0  sshd
      0x9fc0baa0     1563        1  0    0   S  0x9fc0bc50  crond
      ................................................
      more>
      ................................................
      0x97325000     1935     1922  0    0   S  0x973251b0  login
      0x9fc5caa0     1936     1935  0    0   S  0x9fc5cc50  bash
      

      Then press Enter until you see the kernel prompt again kdb> - copy the complete list of the processes.

    10. In NGX R60 and above, display the syslog buffer and copy the complete output:

       

      kdb> dmesg

      text
      text
      more>

       

      Then press Enter until you see the kernel prompt again kdb> - copy the complete output.

      Note: The output will be long.

    11. On Gaia OS / SecurePlatform 2.6, display the summary:

      kdb> summary

      You should get an output on the screen - multiple lines that look like these:

      summary
      sysname    Linux
      release    2.6.18-92cp
      version    #1 SMP Wed Feb 4 17:35:21 IST 2009
      machine    i686
      nodename   Member_A
      domainname 
      date       2009-06-13 19:35:58 tz_minuteswest -180
      uptime     00:03
      load avg   0.81 0.77 0.33
      
      MemTotal:       502364 kB
      MemFree:        122796 kB
      Buffers:         17176 kB
      Cached:         132716 kB
      SwapCached:          0 kB
      Active:         195680 kB
      HighTotal:           0 kB
      HighFree:            0 kB
      LowTotal:       502364 kB
      LowFree:        122796 kB
      SwapTotal:     2096440 kB
      SwapFree:      2096440 kB
      Dirty:             368 kB
      AnonPages:      122604 kB
      Mapped:          24636 kB
      Slab:            13328 kB
      PageTables:       1572 kB
      NFS_Unstable:        0 kB
      Bounce:              0 kB
      CommitLimit:   2347620 kB
      Committed_AS:   385804 kB
      VmallocTotal:  1566712 kB
      VmallocUsed:     94100 kB
      VmallocChunk:  1470332 kB
      HugePages_Total:     0
      HugePages_Free:      0
      HugePages_Rsvd:      0
      Hugepagesize:     2048 kB
      kdb>
      
    12. If you have a single CPU, perform the following steps
      (if you have multiple CPUs (cpsmp), go to the next step):

      1. Start the stack traceback from the CPU:

        [0]kdb> bt

        You should get a stack output on the screen, multiple lines that look similar to these:

        EBP        EIP         Function(args)
        0x8194beac 0x8011eca1 context_switch+0x81 (0x803e3980, 0x8194a000, 0x803b4000, 0x43fc, 0x8194bef4)
                                       kernel .text 0x80100000 0x8011ec20 0x8011ecf0
        0x8194bec8 0x8011d725 schedule+0x135
                                       kernel .text 0x80100000 0x8011d5f0 0x8011d860
        

        or that look similar to these:

        EBP        EIP         Function(args)
        0x807adf48 0x80431802 __do_softirq+0x62
        0x807adf6c 0x804318cb do_softirq+0x3b
        0x807adf74 0x80431b96 irq_exit+0x36
        0x807adf78 0x8041d392 smp_apic_timer_interrupt+0x62 (0x807adf84)
        0x807adf80 0x80405deb apic_timer_interrupt+0x1f (0x0, 0x807ac000, 0x80403e10)
        0x807adfac 0x80403e41 default_idle+0x31 
        
      2. Copy the Stack output from the screen.

      3. Important for SecurePlatform 2.6:
        In SecurePlatform 2.6, the kernel panic stack (the output of the bt command) might be missing some information.
        Therefore, run an additional command ("Display Memory Symbolically") to collect the missing information:

        kdb> mds %esp

        You should get a Stack output on the screen - multiple lines that look like these:

        0x885d1d1c 00000000   ....
        0x885d1d20 91215695 [fwmod]fw_ktd_vprintf+0xd5
        0x885d1d2c 91727fa0 [fwmod]fwkdebug_panic_on_str
        
      4. Copy the Stack output from the screen.

    13. If you have multiple CPUs (cpsmp), then perform the following steps in the context of each CPU core:

      1. Check the available CPU contexts
      2. For the context of the 1st CPU core:
        1. Go to the context of the CPU core
        2. Verify the CPU context
        3. Start the stack traceback from the 1st CPU core
        4. Copy the Stack output from the screen
      3. For the context of each additional CPU core:
        Repeat the above steps to collect the Stack in the context of each CPU core

      Example for a machine with 2 CPU cores:

      1. Check the available CPU contexts. In the kernel prompt, run:

        [0]kdb> cpu
      2. Note the number of CPU contexts ("Available cpus" in the following output):

        Currently on cpu 0
        Available cpus: 0, 1

      3. Go to the context of the 1st CPU core. In the kernel prompt, run:

        [0]kdb> cpu 0

      4. The prompt should change to:

        Entering kdb (current=0xXXXXXXXX, pid 0) on processor 0 due to cpu switch.

      5. Verify the CPU context ("Currently on cpu" in the following output):

        [0]kdb> cpu
        Currently on cpu 0
        Available cpus: 0, 1

      6. Start the stack traceback from the 1st CPU core:

        [0]kdb> bt

        You should get a stack output on the screen, multiple lines that look similar to these:

        EBP        EIP         Function(args)
        0x8194beac 0x8011eca1 context_switch+0x81 (0x803e3980, 0x8194a000, 0x803b4000, 0x43fc, 0x8194bef4)
                                       kernel .text 0x80100000 0x8011ec20 0x8011ecf0
        0x8194bec8 0x8011d725 schedule+0x135
                                       kernel .text 0x80100000 0x8011d5f0 0x8011d860
        

        or that look similar to these:

        EBP        EIP         Function(args)
        0x807adf48 0x80431802 __do_softirq+0x62
        0x807adf6c 0x804318cb do_softirq+0x3b
        0x807adf74 0x80431b96 irq_exit+0x36
        0x807adf78 0x8041d392 smp_apic_timer_interrupt+0x62 (0x807adf84)
        0x807adf80 0x80405deb apic_timer_interrupt+0x1f (0x0, 0x807ac000, 0x80403e10)
        0x807adfac 0x80403e41 default_idle+0x31 
        
      7. Copy the Stack output from the screen.

      8. Important for SecurePlatform 2.6:
        In SecurePlatform 2.6, the kernel panic stack (the output of the bt command) might be missing some information.
        Therefore, run an additional command ("Display Memory Symbolically") to collect the missing information:

        kdb> mds %esp

        You should get a Stack output on the screen - multiple lines that look like these:

        0x885d1d1c 00000000   ....
        0x885d1d20 91215695 [fwmod]fw_ktd_vprintf+0xd5
        0x885d1d2c 91727fa0 [fwmod]fwkdebug_panic_on_str
        
      9. Copy the Stack output from the screen.

      10. Go to the context of the 2nd CPU core. In the kernel prompt, run:

        [0]kdb> cpu 1
      11. The prompt should change to:

        Entering kdb (current=0xXXXXXXXX, pid 0) on processor 1 due to cpu switch.
      12. Verify the CPU context ("Currently on cpu" in the following output):

        [1]kdb> cpu
        Currently on cpu 1
        Available cpus: 0, 1
      13. Start the stack traceback from the 2nd CPU core:

        [1]kdb> bt

        You should get a stack output on the screen, multiple lines that look similar to these:

        EBP        EIP         Function(args)
        0x8194beac 0x8011eca1 context_switch+0x81 (0x803e3980, 0x8194a000, 0x803b4000, 0x43fc, 0x8194bef4)
                                       kernel .text 0x80100000 0x8011ec20 0x8011ecf0
        0x8194bec8 0x8011d725 schedule+0x135
                                       kernel .text 0x80100000 0x8011d5f0 0x8011d860
        

        or that look similar to these:

        EBP        EIP         Function(args)
        0x807adf48 0x80431802 __do_softirq+0x62
        0x807adf6c 0x804318cb do_softirq+0x3b
        0x807adf74 0x80431b96 irq_exit+0x36
        0x807adf78 0x8041d392 smp_apic_timer_interrupt+0x62 (0x807adf84)
        0x807adf80 0x80405deb apic_timer_interrupt+0x1f (0x0, 0x807ac000, 0x80403e10)
        0x807adfac 0x80403e41 default_idle+0x31
        
      14. Copy the Stack output from the screen.

      15. Important for SecurePlatform 2.6:
        In SecurePlatform 2.6, the kernel panic stack (the output of the bt command) might be missing some information.
        Therefore, run an additional command ("Display Memory Symbolically") to collect the missing information:

        kdb> mds %esp

        You should get a Stack output on the screen - multiple lines that look like these:

        0x885d1d1c 00000000   ....
        0x885d1d20 91215695 [fwmod]fw_ktd_vprintf+0xd5
        0x885d1d2c 91727fa0 [fwmod]fwkdebug_panic_on_str
        
      16. Copy the Stack output from the screen.

    14. Return to the normal shell. In the kernel prompt, run:

      kdb> go

      You should exit the on-line kernel debugger and return to the regular prompt [Expert@HostName]#

      Note:
      Occasionally, the 'go' command might not work, and this error is displayed: 'Catastrophic error detected'.
      In such a case, instead of returning to the regular prompt, the machine might get stuck at the KDB prompt, or even crash.
      This kernel behavior can have numerous causes.
      If the machine is stuck at the KDB prompt, reboot the machine manually.
      Otherwise, let the machine crash and reboot itself.

      Example:
      [0]kdb> go
      Catastrophic error detected
      kdb_continue_catastrophic=0, type go a second time if you really want to continue
      [0]kdb> go
      Catastrophic error detected
      kdb_continue_catastrophic=0, attempting to continue
      Kernel panic - not syncing

    15. If the keyboard does not respond, then press CTRL+A or CTRL+AA or CTRL+C or send a Break Signal from the terminal program, and then try the "bt" command again.

    16. In NGX R60 and above, disable the on-line kernel debugger:

      [Expert@HostName]# echo 0 > /proc/sys/kernel/kdb
      [Expert@HostName]# cat /proc/sys/kernel/kdb

      Output should show 0

    17. Restore the original /boot/grub/grub.conf file (for example, from the /boot/grub/grub.conf_backup copy created in Step 1).

    18. Reboot the machine again. When you see the prompt "Press any key to see the boot menu..." - press any key and start the machine in Normal Mode.

    19. Send the following to Check Point Support for analysis:

      • stacks (outputs of 'bt' and 'mds %esp' commands)
      • list of processes (output of 'ps' command)
      • syslog buffer (output of 'dmesg' command)
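
Before sending the material, it can help to confirm that the saved serial session actually contains the commands listed above. The following is a minimal sketch that assumes the terminal program logged the whole session to a text file. The function name check_kdb_capture is hypothetical, and 'mds %esp' is not checked because it applies only to SecurePlatform 2.6.

```shell
# Report which expected KDB commands are missing from a saved serial-session log.
# $1 = path to the captured log file
# Prints one "missing: <cmd>" line per absent command; returns 0 only if all are present.
check_kdb_capture() {
    status=0
    for cmd in bt ps dmesg; do
        # Matches both "kdb> bt" and "[0]kdb> bt"; trailing carriage returns are tolerated.
        if ! grep -Eq "kdb> ${cmd}[[:space:]]*\$" "$1"; then
            echo "missing: ${cmd}"
            status=1
        fi
    done
    return "$status"
}

# Usage on the machine that captured the session (the file name is an example):
#   check_kdb_capture serial_session.log && echo "capture looks complete"
```

The check only looks for the command lines themselves. It does not verify that each output was paged through to the end.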