The CPU Spike Detective is a tool in versions R80.40 and higher. This tool monitors the system CPU usage and checks for spikes (sudden increases) in CPU utilization.
Important Notes:
This tool does not cause an impact on performance.
This tool is integrated into R81 and higher versions. In these versions, the tool is enabled by default.
Versions R80.30 and lower are not supported (there are no plans to support these versions).
On Scalable Platforms, Spike Detective is supported only in R81.10 and higher.
Table of Contents
How is the spike detected?
What happens when a spike is detected?
Spikes in CPView
Profiling Data
Usage
Detecting a spike
How is the spike detected?
A CPU core is considered "spiked" when all of these conditions are met:
CPU utilization is over 80% (this threshold is configurable)
CPU utilization of the specific CPU core is at least 1.5 times higher than the entire system average usage (this threshold is configurable).
This ensures that a highly utilized system (for example, during a performance testing) will not detect all CPU cores as "spiked".
A thread/process is considered "spiked" if it meets all of these conditions:
Running on a "spiked" CPU core
Utilization is over 70% (this threshold is configurable)
Utilization is at least 1.5 times higher than the system average (this threshold is configurable)
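The two sets of conditions above can be sketched as a pair of predicates. This is only an illustration of the documented rules, not the tool's actual implementation. The function names are hypothetical, the default values mirror the thresholds listed above, and the way the system average is calculated is not described in this article, so it is passed in as a plain number.

```python
def is_core_spiked(core_util, system_avg,
                   high_threshold=80.0, system_multiplier=1.5):
    """Return True when a CPU core meets both documented spike conditions.

    core_util and system_avg are utilization percentages (0-100).
    """
    over_threshold = core_util > high_threshold                 # "over 80%"
    above_system = core_util >= system_multiplier * system_avg  # "at least 1.5 times"
    return over_threshold and above_system


def is_thread_spiked(thread_util, system_avg, on_spiked_core,
                     thread_high_threshold=70.0,
                     thread_system_multiplier=1.5):
    """Return True when a thread meets all three documented conditions."""
    return (on_spiked_core
            and thread_util > thread_high_threshold
            and thread_util >= thread_system_multiplier * system_avg)
```

For example, a core at 90% on a system that averages 85% is not reported, because 90 is below 1.5 x 85. This is the behavior that keeps a uniformly loaded system from reporting every core as "spiked".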
What happens when a spike is detected?
Upon detecting a CPU spike, the tool:
Reports the spike to:
/var/log/spike_detective/spike_detective.log
cpview
cpview_services
/var/log/messages (in R81 and higher)
Takes profiling information (see the Profiling Data section below)
Spike log
When a spike ends, it is recorded in the /var/log/spike_detective/spike_detective.log file.
CPU spike information:
Type (CPU/thread)
Start time
Identifier
CPU: Core Number
Thread: Thread ID, Thread name
Duration in seconds
Utilization when detected
Average utilization during the spike's lifetime
True/False if Performance profiling was taken for this spike
Log example:
[Expert@Firewall]# cat /var/log/spike_detective/spike_detective.log
spike info: type: cpu, cpu core: 3, top consumer: fw_full, start time: 06/08/20 10:17:56, spike duration (sec): 5, initial cpu usage: 84, average cpu usage: 84, perf taken: 1
spike info: type: thread, thread id: 13094, thread name: fw_full, start time: 06/08/20 10:17:50, spike duration (sec): 11, initial cpu usage: 99, average cpu usage: 99, perf taken: 1
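To process these entries in a script, a minimal parser could look like the sketch below. It assumes that every entry starts with "spike info:" and consists of comma-separated "key: value" pairs, exactly as in the log example. The function name parse_spike_log is hypothetical, and the real log may contain other line types that this sketch does not handle.

```python
def parse_spike_log(text):
    """Split spike log text into one dict of string fields per entry."""
    entries = []
    # Everything before the first "spike info:" marker (for example, a shell
    # prompt) is ignored.
    for chunk in text.split("spike info:")[1:]:
        fields = {}
        for part in chunk.strip().split(", "):
            # Split on the first ": " only, so timestamps such as
            # "10:17:56" stay intact.
            key, sep, value = part.partition(": ")
            if sep:
                fields[key.strip()] = value.strip()
        entries.append(fields)
    return entries
```

All values are returned as strings. A caller that needs numbers can convert fields such as "spike duration (sec)" with int().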
Spikes in CPView
Spike information can be reviewed in CPView > CPU > Spikes.
Notes:
The information is similar to the CPU spike information in the log file
For both CPView and CPView history, the information is gathered during the last minute.
Sections:
Overview (last minute)
Summary of all CPU spikes during the last minute: Total Spikes, Average Spike Duration, Average Usage
Top 5 spikes (last minute)
For each CPU spike: Start Time, CPU Core, Spike Duration, Average Usage, Top Consumer
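The Overview fields can be reproduced offline from a list of spikes, for example when reviewing log entries from a given minute. The sketch below assumes plain arithmetic means, which this article does not confirm. The function name and the dictionary keys are hypothetical.

```python
def summarize_spikes(spikes):
    """Summarize spikes given as (duration_sec, average_usage) pairs."""
    total = len(spikes)
    if total == 0:
        return {"total_spikes": 0, "average_duration": 0.0, "average_usage": 0.0}
    return {
        "total_spikes": total,
        "average_duration": sum(duration for duration, _ in spikes) / total,
        "average_usage": sum(usage for _, usage in spikes) / total,
    }
```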
CPView examples: screenshots of the CPU Spikes view and the Thread Spikes view.
Profiling Data
By default, when a CPU spike is detected, the Spike Detective collects several predetermined statistics. Some statistics are taken for a specific spike (CPU spike or thread spike). Others are general statistics that are taken during spikes but are not associated with a specific spike.
Current data collected
Perf
On by default
Taken on a specific core/thread
Perf Callgraph
Off by default - Can be enabled using the configuration parameters (see the section Configurable Variables below)
Taken on a specific core/thread
Flofiler
Off by default - Can be enabled using the configuration parameters (see the section Configurable Variables below)
Only taken when a FireWall worker thread is highly utilized and is taken on the relevant CoreXL Firewall instance
In addition, this tool provides the option of adding an external script/tool that runs and collects statistics during a CPU spike.
Add the path to your script using the 'external_collector_path' configuration option (see the section Configurable Variables below).
The output of this script is saved to the 'external_collector_data.log' file in the spike's data directory (see 'data location' above).
Some additional information is passed to the script and can be used during its run:
Variable $1 = spike type: cpu, thread
Variable $2 = cpu core or thread id (depends on the spike's type)
Variable $3 = thread name (if the spike's type is "thread")
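A minimal external collector that uses these three arguments might look like the sketch below. It is written in Python, so the positional parameters $1, $2, and $3 appear as sys.argv[1], sys.argv[2], and sys.argv[3]. The sketch assumes that a Python 3 interpreter is available on the machine and that the "output" the tool captures is the script's standard output. The describe_spike helper and its message format are hypothetical.

```python
#!/usr/bin/env python3
import sys


def describe_spike(args):
    """Turn the positional arguments into a one-line description."""
    spike_type = args[0] if len(args) > 0 else "unknown"
    identifier = args[1] if len(args) > 1 else ""
    thread_name = args[2] if len(args) > 2 else ""
    if spike_type == "thread":
        return f"thread spike: thread id={identifier}, thread name={thread_name}"
    if spike_type == "cpu":
        return f"cpu spike: core={identifier}"
    return f"unrecognized spike type: {spike_type}"


# Runs only when the script is executed directly with arguments.
if __name__ == "__main__" and len(sys.argv) > 1:
    # Assumption: whatever is printed here ends up in
    # external_collector_data.log.
    print(describe_spike(sys.argv[1:]))
```

A real collector would replace the print call with whatever statistics are useful for the investigation, and should finish quickly because it may run on every spike.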
Example of the $FWDIR/conf/spike_detective_conf.xml file:
Important note before using the optional 'external_collector_path' option:
Make sure to assign the execution permissions to your external script before adding it to the CPU Spike Detective.
This script might run frequently (depending on the frequency of CPU spikes in the system). Therefore, consider your tool's performance impact before you add it.
Usage
The Spike Detective daemon is enabled by default and runs under the Check Point WatchDog (cpwd_admin) under the name 'SPIKE_DETECTIVE'. It is activated automatically during boot.
Example:
[Expert@Gaia:0]# cpwd_admin list | egrep "PID|SPIKE"
APP PID STAT #START START_TIME MON COMMAND
SPIKE_DETECTIVE 21146 E 1 [14:04:40] 27/8/2020 N spike_detective
[Expert@Gaia:0]#
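For automated health checks, the WatchDog listing can be parsed with a few lines of Python. The sketch below relies only on the column order shown in the example (APP, PID, STAT, #START). The function name is hypothetical, and the interpretation of STAT "E" as "executing" is an assumption that should be verified against the cpwd_admin documentation.

```python
def find_watchdog_entry(listing, app_name="SPIKE_DETECTIVE"):
    """Return the APP, PID, STAT and #START columns for app_name, or None."""
    for line in listing.splitlines():
        tokens = line.split()
        if len(tokens) >= 4 and tokens[0] == app_name:
            return {
                "app": tokens[0],
                "pid": int(tokens[1]),
                "stat": tokens[2],   # "E" is assumed to mean executing
                "starts": int(tokens[3]),
            }
    return None
```

The listing argument is the text output of the cpwd_admin list command, captured in any convenient way.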
Start and stop
Start (on by default): set spike-detective start
Stop: set spike-detective stop
Enable and disable (survive reboot)
Enable (Enabled by default)
set spike-detective state on
reboot
Disable
set spike-detective state off
reboot
Configurable Variables
The CPU Spike Detective parameters are configured in the $FWDIR/conf/spike_detective_conf.xml file.
How to configure:
Stop the tool: set spike-detective stop
Set the parameter: set spike-detective config_value section SECTION name NAME type TYPE value VALUE
Start the tool: set spike-detective start
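When several parameters must be changed, it can help to generate the three-step sequence instead of typing it. The sketch below only builds the command strings from the syntax shown above and does not execute anything. The function name is hypothetical, and the value 90 in the usage note is purely illustrative.

```python
def build_config_commands(section, name, value_type, value):
    """Return the stop / set / start command sequence for one parameter."""
    return [
        "set spike-detective stop",
        (f"set spike-detective config_value section {section} "
         f"name {name} type {value_type} value {value}"),
        "set spike-detective start",
    ]
```

For example, build_config_commands("spike_config", "high_threshold", "INT", 90) returns the stop command, then "set spike-detective config_value section spike_config name high_threshold type INT value 90", then the start command.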
Examples of the $FWDIR/conf/spike_detective_conf.xml file:
| Section | Name | Description | Units | Type | Default |
|---|---|---|---|---|---|
| | | Only check the top X threads to locate spiked threads and top consumers. | N/A | INT | 10 |
| spike_config | high_threshold | Minimum usage threshold to determine the start of a CPU core spike. | % | INT | 80 |
| spike_config | low_threshold | Minimum usage threshold to determine the end of a CPU core spike. | % | INT | 40 |
| spike_config | avg_threshold | Minimum usage threshold to determine the start of a CPU core spike, when compared to the CPU core average utilization. | % | INT | 80 |
| spike_config | system_multiplier | Minimum gap (multiplier) between a spiked CPU core and the system average (for example, 1.5 means that the current utilization is at least 150% of the average utilization). | N/A | FLOAT | 1.5 |
| spike_config | avg_multiplier | Minimum gap (multiplier) between a spiked CPU core and the CPU core average utilization (for example, 1.5 means that the current utilization is at least 150% of the average utilization). | N/A | FLOAT | 1.5 |
| spike_config | thread_high_threshold | Minimum usage threshold to determine the start of a thread spike. | % | INT | 70 |
| spike_config | thread_low_threshold | Minimum usage threshold to determine the end of a thread spike. | % | INT | 40 |
| spike_config | thread_system_multiplier | Minimum gap (multiplier) between a spiked thread and the system average (for example, 1.5 means that the current utilization is at least 150% of the average utilization). | N/A | FLOAT | 1.5 |
| profiler_config | profiling_enable | Enable/Disable the collection of statistics and profiling data when a CPU spike is detected. true = Enabled, false = Disabled. | N/A | INT | true |
| profiler_config | profiling_freq_sec | Minimum wait between profiling runs (prevents collecting statistics too many times in a short period). | Second | INT | 30 |
| profiler_config | perf_enable | Enable/Disable running perf on a spiked CPU/thread (only relevant when profiling_enable is enabled). true = Enabled, false = Disabled. | N/A | BOOLEAN | true |
| profiler_config | perf_sample_sleep_sec | For how long perf samples the core/thread. | Second | FLOAT | 1 |
| profiler_config | perf_sample_freq | How frequently perf samples the core/thread. | Second | INT | 400 |
| profiler_config | perf_samples_limit | How many perf instances can run in parallel (relevant when more than one core/thread is in a spike). | N/A | INT | 2 |
| profiler_config | perf_callgraph_enable | Enable/Disable running the perf callgraph on a spiked CPU/thread (only relevant when profiling_enable is enabled). true = Enabled, false = Disabled. | N/A | BOOLEAN | false |
| profiler_config | perf_delete_if_spike_ended | Enable/Disable the deletion of the perf sample if the spike ended before the sample was completed. true = Enabled, false = Disabled. | N/A | BOOLEAN | false |
| profiler_config | flofiler_enable | Enable/Disable running the flofiler during a spike (only relevant when profiling_enable is enabled). true = Enabled, false = Disabled. | N/A | BOOLEAN | true |
| profiler_config | flofiler_sample_sleep_sec | For how long the flofiler samples the system. | Second | FLOAT | 3 |
| profiler_config | external_collector_path | Adds an external script/tool that runs during a spike and collects profiling statistics. Enter the full path to the external script/tool. | N/A | STRING | N/A |
| profiler_config | top_conns_enable | Enable/Disable collecting top connections data during a FW worker spike. true = Enabled, false = Disabled. | N/A | BOOLEAN | true |
| profiler_config | heavy_conns_enable | Enable/Disable collecting heavy connections data during a FW worker spike. true = Enabled, false = Disabled. | N/A | BOOLEAN | true |
| cleaner_config | cleaner_enable | Enable/Disable the periodic cleanup of old spike directories. true = Enabled, false = Disabled. | N/A | BOOLEAN | true |
| cleaner_config | cleaner_freq_sec | How frequently old files are removed from the disk. | Second | INT | 86400 |
| cleaner_config | cleaner_file_max_age_days | Files older than this limit are deleted from the disk. | Day | INT | 7 |
| cleaner_config | cleaner_dir_max_size_bytes | Size limit for the entire spike directory. If exceeded, old files are removed until the directory is within the limit. | Byte | INT | 1000000000 |
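The two cleanup limits (cleaner_file_max_age_days and cleaner_dir_max_size_bytes) can be illustrated with a short selection routine. This is a sketch of the policy as described for those parameters, not the tool's actual cleaner. The function name and the (path, modification time, size) input format are hypothetical, and removing the oldest files first when the size limit is exceeded is an assumption.

```python
def select_files_to_delete(files, now, max_age_days=7,
                           max_size_bytes=1_000_000_000):
    """Pick the paths to delete.

    files: list of (path, mtime_epoch_sec, size_bytes) tuples.
    now:   current time in epoch seconds.
    """
    max_age_sec = max_age_days * 86400
    to_delete = []
    kept = []
    # Step 1: every file older than the age limit is deleted.
    for path, mtime, size in sorted(files, key=lambda f: f[1]):  # oldest first
        if now - mtime > max_age_sec:
            to_delete.append(path)
        else:
            kept.append((path, mtime, size))
    # Step 2: if the remaining files still exceed the size limit,
    # delete the oldest ones until the total fits.
    total = sum(size for _, _, size in kept)
    for path, _, size in kept:
        if total <= max_size_bytes:
            break
        to_delete.append(path)
        total -= size
    return to_delete
```

The routine only returns the list of paths. Deleting them is left to the caller, which keeps the selection logic easy to test.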