[Nagios] Changing check_load Settings for Load Average Monitoring
Load average spikes have become more frequent, so I changed the settings of the Nagios service that runs the check_load command.
Default service definition for check_load
define service{
    use                     generic-service
    host_name               hoge
    service_description     LOAD
    is_volatile             0
    check_period            24x7
    max_check_attempts      3
    normal_check_interval   5
    retry_check_interval    1
    notification_interval   240
    notification_period     24x7
    notification_options    c,r
    check_command           check_load!1,1,1!2,2,2
    contact_groups          linux-admins
}
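As a rough illustration of what check_load!1,1,1!2,2,2 means, here is a minimal Python sketch. It assumes the check_load command is defined so that the first argument becomes the warning thresholds and the second the critical thresholds, each for the 1-, 5-, and 15-minute load averages; the actual command definition on a given server may differ. The function names are hypothetical, and the strict "greater than" comparison is also an assumption about the plugin's behavior.

```python
def parse_thresholds(arg):
    """Split a '1,1,1'-style argument into three floats (1, 5, 15 min)."""
    values = [float(v) for v in arg.split(",")]
    if len(values) != 3:
        raise ValueError("expected three comma-separated values")
    return values


def classify_load(loads, warn_arg, crit_arg):
    """Return 'OK', 'WARNING' or 'CRITICAL' for three load averages.

    Assumption: a state is raised as soon as any one of the three
    averages exceeds its threshold.
    """
    warn = parse_thresholds(warn_arg)
    crit = parse_thresholds(crit_arg)
    if any(load > limit for load, limit in zip(loads, crit)):
        return "CRITICAL"
    if any(load > limit for load, limit in zip(loads, warn)):
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    # Thresholds taken from the service definition above.
    print(classify_load([0.5, 0.4, 0.3], "1,1,1", "2,2,2"))
    print(classify_load([1.5, 0.4, 0.3], "1,1,1", "2,2,2"))
    print(classify_load([0.5, 2.5, 0.3], "1,1,1", "2,2,2"))
```

Since notification_options is c,r, only the CRITICAL state (and the recovery from it) would lead to a notification with this definition.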
I changed max_check_attempts from 3 to 2, and normal_check_interval from 5 minutes to 3 minutes.
check_load service definition after the change
define service{
    use                     generic-service
    host_name               hoge
    service_description     LOAD
    max_check_attempts      2
    normal_check_interval   3
    retry_check_interval    1
    check_command           check_load!1,1,1!2,2,2
}
With a shorter check interval and fewer check attempts before a notification goes out, we should be able to respond before the server starts crying for help.
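To put a number on the improvement, the following Python sketch estimates the worst-case delay between the load going critical and Nagios sending a notification. It is a simplified model under stated assumptions: interval_length is the default 60 seconds, max_check_attempts counts the first failed check, retries run every retry_check_interval, and check execution time and scheduling latency are ignored. The function name is hypothetical.

```python
def worst_case_minutes(normal_check_interval, retry_check_interval,
                       max_check_attempts, interval_length_s=60):
    """Estimate the worst-case minutes from problem onset to notification.

    Worst case: the problem starts right after a regular check, so the
    first failing check comes one full normal_check_interval later, and
    the remaining (max_check_attempts - 1) checks are retries.
    """
    wait_for_first_check = normal_check_interval
    retries = (max_check_attempts - 1) * retry_check_interval
    return (wait_for_first_check + retries) * interval_length_s / 60


if __name__ == "__main__":
    before = worst_case_minutes(5, 1, 3)  # original settings
    after = worst_case_minutes(3, 1, 2)   # settings after the change
    print(f"before: {before} min, after: {after} min")
    # prints "before: 7.0 min, after: 4.0 min"
```

Under this model the change cuts the worst-case delay from about 7 minutes to about 4 minutes.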
That’s all from the Gemba, where load average is a concern.