HOWTO HDD temperature check
From Gentoo Linux Wiki
This HOWTO describes how to install a service that checks the temperature of your hard disks at regular intervals and sends an email (or takes some other action) if the temperature gets too high.
I use this setup to get notified when the disks of my server get too warm. Hard disks that run too hot may fail earlier, so I needed a notifier that lets me take action in time.
The installation is quite straightforward. All we need is
- a program to check the disk: hddtemp
- a program to call hddtemp every 5 minutes
- a boot script to start and stop the checker program
- an action program to send mails: mailto (from net-mail/metamail)
You don't need to install mailto if you don't like it. It is just one way to take action when your disks get too warm.
Install hddtemp
First get hddtemp by
$ emerge hddtemp
After that, test whether hddtemp works properly
$ hddtemp /dev/hd{a,b,c}
If you get something like
WARNING: Drive /dev/hda doesn't appear in the database of supported drives
WARNING: But using a common value, it reports something.
WARNING: Note that the temperature shown could be wrong.
WARNING: See --help, --debug and --drivebase options.
WARNING: And don't forget you can add your drive to hddtemp.db
then your database file may be too old (or the drive too new :-). You can add your drive manually by editing the database file, which is currently /usr/share/hddtemp/hddtemp.db. Get the model description of your drive by
$ hddtemp -D /dev/hda | grep Model
Model: HDS722525VLAT80
Take that model descriptor and add it to the database
"HDS722525VLAT80" 194 C "Hitachi 250GB"
Make sure that the temperature is reported in field 194 of the S.M.A.R.T. interface and is given in degrees Celsius; otherwise correct these two fields. See the hddtemp man page for how to check this.
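To avoid quoting mistakes, the database line can be generated from the model string instead of being typed by hand. This is only a sketch: the function name hddtemp_db_entry is made up for this example and is not part of hddtemp, and the defaults (field 194, unit C) are simply the values used above, which you still have to verify for your drive.

```shell
#!/bin/sh
# Hypothetical helper, not part of hddtemp: print one hddtemp.db line.
# Arguments: model string, description,
#            S.M.A.R.T. field (default 194), unit (default C).
hddtemp_db_entry() {
    model=$1
    description=$2
    field=${3:-194}
    unit=${4:-C}
    printf '"%s" %s %s "%s"\n' "$model" "$field" "$unit" "$description"
}

# Example: print the entry for the Hitachi drive used above.
hddtemp_db_entry "HDS722525VLAT80" "Hitachi 250GB"
```

Check the printed line first; once it looks right, it can be appended (as root) to /usr/share/hddtemp/hddtemp.db.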
Install mailto
Just do
$ emerge metamail
Install the checker script
Once hddtemp reports the correct temperature, you need a program that calls it periodically. Any shell script will do; here is one for reference
| Code: check_hdd_temp.sh |
#!/bin/sh
# Check hard disk temperatures and warn when a limit is exceeded.
DEVICES="/dev/hda /dev/sda"
TEMPLIMIT=42          # degrees Celsius
EMAIL_NOTIFIER=root
CHECK_PERIOD=300      # seconds between checks in daemon mode

# Print one warning line for every device that is hotter than TEMPLIMIT.
check_disks() {
    for A in $DEVICES ; do
        TEMPERATURE=$(hddtemp -n "$A")
        # Skip devices for which hddtemp did not return a plain number.
        case "$TEMPERATURE" in
            ''|*[!0-9]*) continue ;;
        esac
        if [ "$TEMPERATURE" -gt "$TEMPLIMIT" ] ; then
            echo "Device $A has temperature of ${TEMPERATURE}°C (limit ${TEMPLIMIT}°C)"
        fi
    done
}

if [ "$1" = "-d" ] || [ "$1" = "--daemon" ] ; then
    # Daemon mode: check forever and mail any warnings.
    while : ; do
        MAILTXT=$(check_disks)
        if [ -n "$MAILTXT" ] ; then
            echo "$MAILTXT" | MM_CHARSET=ISO-8859-15 mailto -s "HD temperature warning" "$EMAIL_NOTIFIER" > /dev/null
        fi
        sleep "$CHECK_PERIOD"
    done
else
    # One-shot mode: print any warnings to the terminal.
    check_disks
fi
|
This script can be called from the command line to check the temperature of the hard disks. Make it executable and test it. To test whether warnings appear, lower the temperature limit TEMPLIMIT to 1.
$ chmod a+x check_hdd_temp.sh
$ ./check_hdd_temp.sh
$ ./check_hdd_temp.sh -d
<CTRL-C>
With the "-d" option the script loops forever, but it does not detach from its controlling terminal. Detaching is handled by the boot script.
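The limit check can also be tried in isolation, without hddtemp and without a warm disk. The following sketch repeats the comparison from the checker script but takes the reading as an argument; the function name warn_if_hot is made up for this example, and ignoring non-numeric readings is an assumption about how you want hddtemp error output to be handled.

```shell
#!/bin/sh
# Hypothetical helper for illustration: the same comparison the checker
# script performs, but with the reading passed in as an argument.
TEMPLIMIT=42

warn_if_hot() {
    device=$1
    reading=$2
    # Ignore anything that is not a plain non-negative integer.
    case "$reading" in
        ''|*[!0-9]*) return 0 ;;
    esac
    if [ "$reading" -gt "$TEMPLIMIT" ] ; then
        echo "Device $device has temperature of ${reading}°C (limit ${TEMPLIMIT}°C)"
    fi
}

warn_if_hot /dev/hda 38    # below the limit: prints nothing
warn_if_hot /dev/hda 45    # above the limit: prints a warning
```

Passing simulated readings like this exercises both sides of the limit without editing TEMPLIMIT in the real script.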
Install the boot script
This script runs the checker script in the background and writes a PID file. With it, we can easily start and stop the checker script and add it to the boot process. Check that the path to your checker script is correct.
| Code: /etc/init.d/check_hdd_temp |
#!/sbin/runscript

depend() {
    use mta
}

start() {
    ebegin "Starting check_hdd_temp"
    start-stop-daemon --start --quiet --background \
        --pidfile /var/run/check_hdd_temp.pid --make-pidfile \
        --exec /root/bin/check_hdd_temp.sh \
        -- -d
    eend ${?}
}

stop() {
    ebegin "Stopping check_hdd_temp"
    start-stop-daemon --stop --quiet --pidfile /var/run/check_hdd_temp.pid \
        && rm /var/run/check_hdd_temp.pid
    eend ${?}
}
|
Start the checker program
$ /etc/init.d/check_hdd_temp start
If it starts without errors, add it to the default runlevel so that it is started at boot
$ rc-update add check_hdd_temp default
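To confirm that the checker is really running, you can test whether the process recorded in the PID file is still alive. This is a sketch, not part of the boot script: the function name checker_is_running is made up for this example, and the default path is the PID file used above. Run it as root, because kill -0 fails for processes owned by another user.

```shell
#!/bin/sh
# Sketch: report whether the process recorded in a PID file is still alive.
checker_is_running() {
    pidfile=${1:-/var/run/check_hdd_temp.pid}
    [ -r "$pidfile" ] || return 1
    pid=$(cat "$pidfile")
    # Refuse anything that is not a plain number.
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;
    esac
    # kill -0 sends no signal; it only tests that the process exists.
    kill -0 "$pid" 2>/dev/null
}

if checker_is_running ; then
    echo "check_hdd_temp is running"
else
    echo "check_hdd_temp is not running"
fi
```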
That's it!
