HOWTO HDD temperature check

From Gentoo Linux Wiki


This HOWTO describes how to install a service that checks the temperature of your hard disks at regular intervals and sends an email (or takes another action) if the temperature gets too high.

I use this setup to get notified when the disks in my server get too warm. Hard disks that run too hot may fail earlier, so I wanted a notifier that lets me take action in time.

The installation is quite straightforward. All we need is

  • a program to check the disk: hddtemp
  • a program to call hddtemp every 5 minutes
  • a boot script to start and stop the checker program
  • an action program to send mail: mailto (from net-mail/metamail)

You don't need to install mailto if you don't like it. It is just one way to take action when your disks get too warm.
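
Since mailing is only one possible action, the notification step can be kept in a small function of its own. The following is an illustrative sketch, not part of the checker script given later: the function name notify and the variable ACTION are made up for this example, and the log branch assumes that the standard logger utility is available to pass messages to syslog.

```shell
#!/bin/sh
# ACTION selects what happens with a warning (hypothetical variable):
#   print - write the warning to standard output (default)
#   log   - hand the warning to syslog via logger(1)
ACTION=${ACTION:-print}

# Deliver one warning message using the selected action.
notify() {
        case "$ACTION" in
                log) logger -t check_hdd_temp "$1" ;;
                *)   echo "$1" ;;
        esac
}

notify "Device /dev/hda is too warm"
```

Any other reaction, such as shutting the machine down, could be added as a further branch of the case statement.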



Install hddtemp

First install hddtemp with

$ emerge hddtemp

Then test whether hddtemp works properly

$ hddtemp /dev/hd{a,b,c}

If you get something like

WARNING: Drive /dev/hda doesn't appear in the database of supported drives
WARNING: But using a common value, it reports something.
WARNING: Note that the temperature shown could be wrong.
WARNING: See --help, --debug and --drivebase options.
WARNING: And don't forget you can add your drive to hddtemp.db

then your database file may be too old (or the drive too new :-). You can add your drive manually by editing the database file, which is currently located at /usr/share/hddtemp/hddtemp.db. Get the model description of your drive with

$ hddtemp -D /dev/hda | grep Model
Model: HDS722525VLAT80

Take that model descriptor and add it to the database

"HDS722525VLAT80"              194  C  "Hitachi 250GB"

Make sure that your drive reports its temperature in field 194 of the S.M.A.R.T. interface and in degrees Celsius; otherwise correct the two fields. The hddtemp man page explains how to check this.
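
The two manual steps above (picking out the model string and writing the database line) can be sketched as a pair of small shell functions. This is only an illustration: the function names model_from_debug and db_entry are made up, the sample input is the Model line shown earlier, and the sketch assumes the database fields are separated by plain whitespace as in the example entry.

```shell
#!/bin/sh
# Extract the model string from "hddtemp -D" output read on standard input.
# Trailing whitespace is stripped, then the "Model:" label is removed.
model_from_debug() {
        sed -n -e 's/[[:space:]]*$//' -e 's/^[[:space:]]*Model:[[:space:]]*//p'
}

# Print a database line: model, S.M.A.R.T. field, unit, description.
db_entry() {
        printf '"%s" %s %s "%s"\n' "$1" "$2" "$3" "$4"
}

# Sample input; on a real system pipe in the output of: hddtemp -D /dev/hda
MODEL=$(echo "Model: HDS722525VLAT80" | model_from_debug)
db_entry "$MODEL" 194 C "Hitachi 250GB"
```

Check the printed line by eye before appending it to /usr/share/hddtemp/hddtemp.db.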

Install mailto

Just do

$ emerge metamail


Install the checker script

Once hddtemp reports the correct temperature, you need a program that calls it periodically. Any shell script will do; here is one for reference

Code: check_hdd_temp.sh
#!/bin/sh
# Check hard disk temperatures and warn when a drive exceeds TEMPLIMIT.

DEVICES="/dev/hda /dev/sda"
TEMPLIMIT=42            # degrees Celsius

EMAIL_NOTIFIER=root
CHECK_PERIOD=300        # seconds between two checks in daemon mode

# Print one warning line for every device that is hotter than TEMPLIMIT.
check_devices() {
        for A in $DEVICES ; do
                TEMPERATURE=$(hddtemp -n "$A")

                # Skip the device if hddtemp did not return a plain number
                case "$TEMPERATURE" in
                        ''|*[!0-9]*) continue ;;
                esac

                if [ "$TEMPERATURE" -gt "$TEMPLIMIT" ] ; then
                        echo "Device $A has temperature of ${TEMPERATURE}°C (limit ${TEMPLIMIT}°C)"
                fi
        done
}

if [ "$1" = "-d" ] || [ "$1" = "--daemon" ] ; then

        # Daemon mode: check forever and mail any warnings
        while : ; do
                MAILTXT=$(check_devices)

                if [ -n "$MAILTXT" ] ; then
                        echo "$MAILTXT" | MM_CHARSET=ISO-8859-15 mailto -s "HD temperature warning" "$EMAIL_NOTIFIER" > /dev/null
                fi
                sleep "$CHECK_PERIOD"
        done

else

        # One-shot mode: print any warnings to standard output
        check_devices

fi

This script can be called from the command line to check the temperature of the hard disks. Make it executable and test it. To test whether warnings appear, lower the temperature limit TEMPLIMIT to 1.

$ chmod a+x check_hdd_temp.sh
$ ./check_hdd_temp.sh
$ ./check_hdd_temp.sh -d
<CTRL-C>

With the "-d" option the script loops forever, but it does not detach from its controlling terminal. Detaching is the job of the boot script.
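
The warning path can also be exercised without touching TEMPLIMIT or a real disk, by putting a stand-in in place of hddtemp. The sketch below is an illustration only: the stub function hddtemp, the variable FAKE_TEMP and the helper check_one are made up for this example, and the comparison mirrors the one used in the checker script.

```shell
#!/bin/sh
# Stand-in for the real hddtemp (hypothetical stub): prints FAKE_TEMP
hddtemp() {
        echo "${FAKE_TEMP:-45}"
}

TEMPLIMIT=42

# Print a warning if the given device is hotter than TEMPLIMIT.
check_one() {
        TEMPERATURE=$(hddtemp -n "$1")
        if [ "$TEMPERATURE" -gt "$TEMPLIMIT" ] ; then
                echo "Device $1 has temperature of ${TEMPERATURE} (limit ${TEMPLIMIT})"
        fi
}

FAKE_TEMP=45
check_one /dev/hda      # above the limit: a warning is printed

FAKE_TEMP=40
check_one /dev/hda      # below the limit: nothing is printed
```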


Install the boot script

This script runs the checker script in the background and writes a PID file. With it, we can easily start and stop the checker script and add it to the boot process. Check that the path to your checker script is correct.


Code: /etc/init.d/check_hdd_temp
#!/sbin/runscript

depend() {
        use mta
}

start() {
        ebegin "Starting check_hdd_temp"
        start-stop-daemon --start --quiet --background \
                --pidfile /var/run/check_hdd_temp.pid --make-pidfile \
                --exec /root/bin/check_hdd_temp.sh \
                -- -d
        eend ${?}
}

stop() {
        ebegin "Stopping check_hdd_temp"
        start-stop-daemon --stop --quiet --pidfile /var/run/check_hdd_temp.pid \
                && rm /var/run/check_hdd_temp.pid
        eend ${?}
}

Start the checker program

$ /etc/init.d/check_hdd_temp start

and all should be well. Then add it to the boot sequence

$ rc-update add check_hdd_temp default
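
To confirm that the checker is really running, the PID file written by start-stop-daemon can be inspected. This is a minimal sketch, assuming the PID file path used in the boot script; the function name is_running is made up for this example. Run it as root, because kill -0 on a process owned by another user fails for lack of permission.

```shell
#!/bin/sh
# PID file path as used in the boot script (can be overridden for testing)
PIDFILE=${PIDFILE:-/var/run/check_hdd_temp.pid}

# Succeed if the PID file is readable and names a process we can signal.
is_running() {
        [ -r "$1" ] || return 1
        kill -0 "$(cat "$1")" 2>/dev/null
}

if is_running "$PIDFILE" ; then
        echo "check_hdd_temp is running"
else
        echo "check_hdd_temp is not running"
fi
```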

That's it!
