[Nagios] check_load での load average (ロードアベレージ) のチェックの設定変更

load average が跳ね上がることが増えてきたので、Nagios の check_load コマンドの設定を変更しました。

check_load のデフォルト値

define service{
        use                             generic-service
        host_name                       hoge
        service_description             LOAD
        is_volatile                     0
        check_period                    24x7
        max_check_attempts              3
        normal_check_interval           5
        retry_check_interval            1
        notification_interval           240
        notification_period             24x7
        notification_options            c,r
        check_command                   check_load!1,1,1!2,2,2
        contact_groups                  linux-admins
}

max_check_attempts を 3 回から 2 回に、normal_check_interval を 5 分から 3 分に変更しました。

check_load の設定変更後の値

define service{
        use                             generic-service
        host_name                       hoge
        service_description             LOAD
        max_check_attempts              2
        normal_check_interval           3
        retry_check_interval            1
        check_command                   check_load!1,1,1!2,2,2
}

チェック間隔を短くしたので、サーバが悲鳴をあげる前に対応できるようになるはずです。

以上、load average　が気になる、現場からお送りしました。

参考情報