CPU Spike Detective
Solution

The CPU Spike Detective is a tool available in versions R80.40 and higher. It monitors the system's CPU usage and checks for spikes (sudden increases) in CPU utilization.

Important Notes:

  • This tool does not impact performance.
  • This tool is integrated into R81 and higher versions.
    In these versions, the tool is enabled by default.
  • To get this tool on the R80.40 version, you must install R80.40 Jumbo Hotfix Accumulator - Take 69 or higher.
  • Versions R80.30 and lower are not supported (there are no plans to support these versions).
  • On Scalable Platforms, Spike Detective is supported only in R81.10 and higher.

Table of Contents

  • How is the spike detected?
  • What happens when a spike is detected?
  • Spikes in CPView
  • Profiling Data
  • Usage

Detecting a spike

How is the spike detected?

A CPU core is considered "spiked" when these conditions are met:

  • CPU utilization is over 80% (this threshold is configurable)
  • CPU utilization of the specific CPU core is at least 1.5 times higher than the entire system average usage (this threshold is configurable).

    This ensures that on a highly utilized system (for example, during performance testing), the tool does not flag all CPU cores as "spiked".

A thread/process is considered "spiked" if it meets these conditions:

  • Running on a "spiked" CPU core
  • Utilization is over 70% (this threshold is configurable)
  • Utilization is at least 1.5 times higher than the system average (this threshold is configurable)
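
The two sets of conditions above can be sketched as shell functions. This is only an illustration with the default thresholds, not the tool's actual implementation. The function names are hypothetical, and the sketch assumes that "at least 1.5 times higher" means usage >= 1.5 x the system average.

```shell
# Default thresholds as described above (all are configurable in the tool).
HIGH_THRESHOLD=80            # CPU core usage threshold, in percent
SYSTEM_MULTIPLIER=1.5        # CPU core usage relative to the system average
THREAD_HIGH_THRESHOLD=70     # thread usage threshold, in percent
THREAD_SYSTEM_MULTIPLIER=1.5 # thread usage relative to the system average

# is_core_spiked CORE_USAGE SYSTEM_AVG  (hypothetical helper)
# Succeeds (exit status 0) if the core meets both conditions.
is_core_spiked() {
    awk -v u="$1" -v avg="$2" -v t="$HIGH_THRESHOLD" -v m="$SYSTEM_MULTIPLIER" \
        'BEGIN { exit !((u > t) && (u >= m * avg)) }'
}

# is_thread_spiked THREAD_USAGE SYSTEM_AVG ON_SPIKED_CORE  (hypothetical helper)
# ON_SPIKED_CORE is 1 if the thread runs on a "spiked" CPU core, otherwise 0.
is_thread_spiked() {
    awk -v u="$1" -v avg="$2" -v on="$3" \
        -v t="$THREAD_HIGH_THRESHOLD" -v m="$THREAD_SYSTEM_MULTIPLIER" \
        'BEGIN { exit !((on == 1) && (u > t) && (u >= m * avg)) }'
}
```

For example, "is_core_spiked 95 90" fails: the core is over 80%, but it is not 1.5 times the 90% system average, which is the "highly utilized system" case described above.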

What happens when a spike is detected?

Upon detecting a CPU spike, the tool:

  • Reports the spike to:
    • /var/log/spike_detective/spike_detective.log
    • cpview
    • cpview_services
    • /var/log/messages (in R81 and higher)
  • Takes profiling information (see the Profiling Data section below)

Spike log

When a spike ends, it is recorded in the /var/log/spike_detective/spike_detective.log file.

CPU spike information:

  • Type (CPU/thread)
  • Start time
  • Identifier
    • CPU: Core Number
    • Thread: Thread ID, Thread name
  • Duration in seconds
  • Utilization when detected
  • Average utilization during the spike's lifetime
  • Whether performance profiling was taken for this spike (true/false)

Log example:

[Expert@Firewall]# cat /var/log/spike_detective/spike_detective.log
spike info: type: cpu, cpu core: 3, top consumer: fw_full, start time: 06/08/20 10:17:56, spike duration (sec): 5, initial cpu usage: 84, average cpu usage: 84, perf taken: 1
spike info: type: thread, thread id: 13094, thread name: fw_full, start time: 06/08/20 10:17:50, spike duration (sec): 11, initial cpu usage: 99, average cpu usage: 99, perf taken: 1
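
If you want a quick summary of the log, the sketch below parses lines in the format shown in the example above. It assumes that all log lines follow that exact "key: value" layout, separated by ", ". The function name summarize_spikes is hypothetical and is not part of the tool.

```shell
# summarize_spikes TYPE   (TYPE is "cpu" or "thread"; log lines are read from stdin)
# Prints the number of spikes of that type, the average duration,
# and the average of the "average cpu usage" field.
summarize_spikes() {
    awk -v want="$1" -F', ' '
        /^spike info:/ {
            # The first field looks like "spike info: type: cpu"
            type = $1
            sub(/^spike info: type: /, "", type)
            if (type != want) next
            for (i = 2; i <= NF; i++) {
                val = $i
                sub(/^[^:]*: /, "", val)      # keep only the value part
                if ($i ~ /^spike duration/)    dur_sum += val
                if ($i ~ /^average cpu usage/) use_sum += val
            }
            n++
        }
        END {
            if (n == 0) { print "total=0"; exit }
            printf "total=%d avg_duration=%.1f avg_usage=%.1f\n", n, dur_sum / n, use_sum / n
        }'
}
```

Usage on the Security Gateway (Expert mode): summarize_spikes cpu < /var/log/spike_detective/spike_detective.log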

Spikes in CPView

Spike information can be reviewed in CPView > CPU > Spikes.

Notes:

  • The information is similar to the CPU spike information in the log file.
  • In both CPView and CPView history, the information covers the last minute.
  • Sections:
    • Overview (last minute)
      • Summary of all CPU spikes during the last minute:
        Total Spikes, Average Spike Duration, Average Usage
    • Top 5 spikes (last minute)
      • For each CPU spike:
        Start Time, CPU Core, Spike Duration, Average Usage, Top Consumer

CPView examples

[Figure: CPView screenshot, CPU Spikes]
[Figure: CPView screenshot, Thread Spikes]

Profiling Data

By default, when a CPU spike is detected, the Spike Detective collects several predetermined statistics. Some statistics are collected for a specific spike (CPU spike or thread spike). Others are general statistics, which are collected during spikes but are not tied to a specific spike.

Current data collected

  • Perf
    • On by default
    • Taken on a specific core/thread
  • Perf Callgraph
    • Off by default - Can be enabled using the configuration parameters (see the section Configurable Variables below)
    • Taken on a specific core/thread
  • Flofiler
    • Off by default - Can be enabled using the configuration parameters (see the section Configurable Variables below)
    • Only taken when a FireWall worker thread is highly utilized and is taken on the relevant CoreXL Firewall instance
  • Top connections
    • Available starting from:
      • R81 JHF Take 42
      • R80.40 JHF Take 150
    • On by default
    • Only taken if relevant (highly utilized worker)
    • Info taken for the relevant instance
      • Not supported on VSX
    • For more information on the tool, see sk172229
  • Heavy connections
    • Available starting from:
      • R81 JHF Take 42
      • R80.40 JHF Take 150
    • On by default
    • Only taken if relevant (highly utilized worker)
    • Info taken for the relevant instance
      • Not supported on VSX
    • For more information on the tool, see sk164215

Data location

  • Spike related data is saved in these files:
    • /var/log/spike_detective/data_spike_cpu_<Core_Number>_<Date>_<Time>
      Example:
      data_spike_cpu_0_2020-08-29_15-54-41
    • /var/log/spike_detective/data_spike_thread_<Thread_ID>_<Date>_<Time>
      Example:
      data_spike_thread_25615_2020-08-27_13-01-47
  • General data is saved in this directory:
    • /var/log/spike_detective/data_spike_general_<Date>_<Time>/
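
When you sort through many spike data files, it can help to split a name into its parts. The sketch below follows the naming pattern shown above. The function name parse_spike_name is hypothetical, and the date and time layout is assumed to match the two examples.

```shell
# parse_spike_name NAME_OR_PATH   (hypothetical helper)
# Prints "TYPE ID DATE TIME" for a per-spike data name.
# Prints nothing for names that do not match (for example, general data).
parse_spike_name() {
    basename "$1" | sed -nE \
        's/^data_spike_(cpu|thread)_([0-9]+)_([0-9]{4}-[0-9]{2}-[0-9]{2})_([0-9]{2}-[0-9]{2}-[0-9]{2})$/\1 \2 \3 \4/p'
}
```

For the thread example above, this prints "thread 25615 2020-08-27 13-01-47".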


External Stats Collector (use with caution!)

In addition, this tool provides the option of adding an external script/tool that runs and collects statistics during a CPU spike.

Add the path to your script using the 'external_collector_path' configuration option (see the section Configurable Variables below).

The output of this script is saved to the 'external_collector_data.log' file in the spike's data directory (see 'Data location' above).

Some additional information is passed to the script and can be used during its run:

  • Variable $1 = spike type: cpu, thread
  • Variable $2 = cpu core or thread id (depends on the spike's type)
  • Variable $3 = thread name (if the spike's type is "thread")

Example of the $FWDIR/conf/spike_detective_conf.xml file:

<?xml version="1.0" encoding="UTF-8"?>
  <cponfig_file>
    <profiler_config>
      <stat name="external_collector_path" type="STRING" value="/home/admin/my_collector_script.sh" /> <!-- Example path -->
    </profiler_config>
  </cponfig_file>

Example of an external script:

#!/bin/bash

echo "spike's data:"
echo "type = $1"
echo "id = $2"
echo "thread name (if exists) = $3"
echo "collecting stats:"
# Take two snapshots of the acceleration statistics, 2 seconds apart
fwaccel stats -s
sleep 2
fwaccel stats -s

Important notes before using the optional 'external_collector_path' option:

  • Make sure to assign the execution permissions to your external script before adding it to the CPU Spike Detective.
  • This script might run frequently (depending on the frequency of CPU spikes in the system). Therefore, consider your script's performance impact before you add it.
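
One possible way to limit the impact of a frequently triggered external script is to skip the heavy work if the script already ran recently. The sketch below is an assumption about how you might do this in your own script. It is not a feature of the tool, and the throttle function and the stamp file path are hypothetical.

```shell
# throttle STAMP_FILE MIN_INTERVAL_SEC [NOW]   (hypothetical helper)
# Succeeds if at least MIN_INTERVAL_SEC seconds passed since the last
# successful call, and records the current time in STAMP_FILE.
# NOW (epoch seconds) defaults to the current time.
throttle() {
    _stamp=$1
    _min=$2
    _now=${3:-$(date +%s)}
    _last=$(cat "$_stamp" 2>/dev/null)
    _last=${_last:-0}                  # no stamp yet: treat as "never ran"
    if [ $(( _now - _last )) -lt "$_min" ]; then
        return 1                       # ran too recently, skip this time
    fi
    echo "$_now" > "$_stamp"
}

# Example wiring at the top of an external collector script:
#   throttle /var/tmp/my_collector.stamp 60 || exit 0
#   ...collect statistics here...
```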


Usage

The Spike Detective daemon is enabled by default and runs under the Check Point WatchDog (cpwd_admin) under the name 'SPIKE_DETECTIVE'. It is activated automatically during boot.

Example:

[Expert@Gaia:0]# cpwd_admin list | egrep "PID|SPIKE"
APP              PID    STAT  #START  START_TIME            MON  COMMAND
SPIKE_DETECTIVE  21146  E     1       [14:04:40] 27/8/2020  N    spike_detective
[Expert@Gaia:0]#
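
To check the daemon's state from a script, you can read the STAT column of the output shown above. This is a minimal sketch that assumes the column order in the example. The function name spike_detective_state is hypothetical.

```shell
# spike_detective_state   (reads "cpwd_admin list" output from stdin)
# Prints the STAT column of the SPIKE_DETECTIVE line, or nothing if
# the process is not listed.
spike_detective_state() {
    awk '$1 == "SPIKE_DETECTIVE" { print $3; exit }'
}
```

Usage in Expert mode: cpwd_admin list | spike_detective_state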


Start and stop

  1. Start (on by default):
    set spike-detective start

  2. Stop:
    set spike-detective stop

Enable and disable (survives reboot)

  1. Enable (Enabled by default)
    1. set spike-detective state on
    2. reboot

  2. Disable
    1. set spike-detective state off
    2. reboot

Configurable Variables

The CPU Spike Detective parameters are configured in the $FWDIR/conf/spike_detective_conf.xml file.

How to configure:

  1. Stop the tool:
    set spike-detective stop

  2. Set the parameter:
    set spike-detective config_value section SECTION name NAME type TYPE value VALUE

  3. Start the tool:
    set spike-detective start
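
The three steps above can be generated for any parameter with a small helper. This sketch only prints the Clish commands so that you can review them before you run them. The function name spike_config_commands is hypothetical, and the list of accepted types is an assumption based on the types that appear in this article.

```shell
# spike_config_commands SECTION NAME TYPE VALUE   (hypothetical helper)
# Prints the stop / set / start command sequence for one parameter.
spike_config_commands() {
    case "$3" in
        INT|FLOAT|BOOLEAN|STRING) ;;
        *) echo "unsupported type: $3" >&2; return 1 ;;
    esac
    printf '%s\n' \
        "set spike-detective stop" \
        "set spike-detective config_value section $1 name $2 type $3 value $4" \
        "set spike-detective start"
}
```

For example, "spike_config_commands profiler_config perf_callgraph_enable BOOLEAN true" prints the sequence that enables the Perf Callgraph.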

Example of the $FWDIR/conf/spike_detective_conf.xml file:

<?xml version="1.0" encoding="UTF-8"?>
  <cponfig_file>
    <spike_config>
      <stat name="sampling_freq_sec" type="INT" value="5" />
    </spike_config>
    <profiler_config>
      <stat name="profiling_enable" type="BOOLEAN" value="true" />
    </profiler_config>
    <cleaner_config>
      <stat name="cleaner_enable" type="BOOLEAN" value="true" />
    </cleaner_config>
  </cponfig_file>

Useful tip

In some scenarios, sampling at a higher frequency can help catch very short spikes.

You can configure it in Clish as follows:

set spike-detective config_value section spike_config name sampling_freq_sec type INT value 1

The corresponding entry in the $FWDIR/conf/spike_detective_conf.xml file:

<?xml version="1.0" encoding="UTF-8"?>
  <config_file>
    <spike_config>
      <stat name="sampling_freq_sec" type="INT" value="1" />
    </spike_config>
  </config_file>
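
To confirm that a value was written, you can read it back from the configuration file. The sketch below is a simple line-based match, not a real XML parser. It assumes that each <stat ... /> element is on a single line, as in the examples above. The function name get_stat_value is hypothetical.

```shell
# get_stat_value NAME   (reads the XML configuration from stdin)
# Prints the "value" attribute of the first <stat> element with that name.
get_stat_value() {
    sed -nE 's/.*<stat name="'"$1"'"[^>]* value="([^"]*)".*/\1/p' | head -n 1
}
```

Usage in Expert mode: get_stat_value sampling_freq_sec < $FWDIR/conf/spike_detective_conf.xml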

All configuration variables


Section | Name | Purpose | Unit | Type | Default
spike_config | sampling_freq_sec | CPU usage sampling frequency. | Second | INT | 5
spike_config | max_checked_threads | Only the top X threads are checked to locate spiked threads and top consumers. | N/A | INT | 10
spike_config | high_threshold | Minimum usage threshold to determine the start of a CPU core spike. | % | INT | 80
spike_config | low_threshold | Minimum usage threshold to determine the end of a CPU core spike. | % | INT | 40
spike_config | avg_threshold | Minimum usage threshold to determine the start of a CPU core spike, when compared to the CPU core's average utilization. | % | INT | 80
spike_config | system_multiplier | Minimum gap (multiplier) between a spiked CPU core and the system average (for example, 1.5 means that the current utilization must be at least 1.5 times the average utilization). | N/A | FLOAT | 1.5
spike_config | avg_multiplier | Minimum gap (multiplier) between a spiked CPU core and the CPU core's average utilization (for example, 1.5 means that the current utilization must be at least 1.5 times the average utilization). | N/A | FLOAT | 1.5
spike_config | thread_high_threshold | Minimum usage threshold to determine the start of a thread spike. | % | INT | 70
spike_config | thread_low_threshold | Minimum usage threshold to determine the end of a thread spike. | % | INT | 40
spike_config | thread_system_multiplier | Minimum gap (multiplier) between a spiked thread and the system average (for example, 1.5 means that the current utilization must be at least 1.5 times the average utilization). | N/A | FLOAT | 1.5
profiler_config | profiling_enable | Enable/disable the collection of statistics and profiling data when a CPU spike is detected (true = Enabled, false = Disabled). | N/A | BOOLEAN | true
profiler_config | profiling_freq_sec | Waiting period between consecutive profiling runs (prevents collecting statistics too many times in a short period). | Second | INT | 30
profiler_config | perf_enable | Enable/disable running perf on a spiked CPU/thread; only relevant when profiling_enable is enabled (true = Enabled, false = Disabled). | N/A | BOOLEAN | true
profiler_config | perf_sample_sleep_sec | How long perf samples the core/thread. | Second | FLOAT | 1
profiler_config | perf_sample_freq | How frequently perf samples the core/thread. | Second | INT | 400
profiler_config | perf_samples_limit | How many perf instances can run in parallel (relevant when more than one core/thread is in a spike). | N/A | INT | 2
profiler_config | perf_callgraph_enable | Enable/disable running the perf callgraph on a spiked CPU/thread; only relevant when profiling_enable is enabled (true = Enabled, false = Disabled). | N/A | BOOLEAN | false
profiler_config | perf_delete_if_spike_ended | Enable/disable the deletion of the perf sample if the spike ended before the sample was completed (true = Enabled, false = Disabled). | N/A | BOOLEAN | false
profiler_config | flofiler_enable | Enable/disable running the flofiler during a spike; only relevant when profiling_enable is enabled (true = Enabled, false = Disabled). | N/A | BOOLEAN | true
profiler_config | flofiler_sample_sleep_sec | How long the flofiler samples the system. | Second | FLOAT | 3
profiler_config | external_collector_path | External script/tool to run during a spike to collect profiling statistics. Enter the full path to the external script/tool. | N/A | STRING | N/A
profiler_config | top_conns_enable | Enable/disable collecting top connections data during a FW worker spike (true = Enabled, false = Disabled). | N/A | BOOLEAN | true
profiler_config | heavy_conns_enable | Enable/disable collecting heavy connections data during a FW worker spike (true = Enabled, false = Disabled). | N/A | BOOLEAN | true
cleaner_config | cleaner_enable | Enable/disable the periodic cleanup of old spike directories (true = Enabled, false = Disabled). | N/A | BOOLEAN | true
cleaner_config | cleaner_freq_sec | How frequently old files are removed from the disk. | Second | INT | 86400
cleaner_config | cleaner_file_max_age_days | Files older than this limit are deleted from the disk. | Day | INT | 7
cleaner_config | cleaner_dir_max_size_bytes | Size limit for the entire spike directory. If exceeded, old files are removed until the size is back under the limit. | Byte | INT | 1000000000
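
The high_threshold and low_threshold parameters describe a start/end pair. The sketch below illustrates one possible reading of them: a spike starts when a sample rises above the high threshold and ends when a sample drops below the low threshold. This is an assumption for illustration only. It ignores the multiplier conditions and does not reflect the tool's actual implementation. The function name track_spikes is hypothetical.

```shell
# track_spikes HIGH LOW   (reads one usage sample per line from stdin)
# Prints the range of sample numbers covered by each spike.
track_spikes() {
    awk -v high="$1" -v low="$2" '
        # Not in a spike: a sample above HIGH starts one
        !in_spike && $1 > high { in_spike = 1; start = NR; next }
        # In a spike: a sample below LOW ends it (the spike covers start..NR-1)
        in_spike && $1 < low   { in_spike = 0; printf "spike: samples %d-%d\n", start, NR - 1 }
        END { if (in_spike) printf "spike: samples %d-%d (ongoing)\n", start, NR }'
}
```

With the default sampling_freq_sec of 5, each sample represents 5 seconds, so a spike that covers samples 2-4 lasted roughly 15 seconds under these assumptions.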
