Red Hat Bugzilla – Bug 159929
dhclient spews requests on disk error
Last modified: 2010-04-06 09:15:47 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.8) Gecko/20050513
Description of problem:
This is partly conjecture, since the computers this happens on can't be used and need to be reinstalled with fresh hard drives. It has happened twice so far.
The issue is that dhclient sends a DHCP request, gets it acknowledged and accepts the given IP. It will then presumably try to write the data to disk, but fail (on the console, disk I/O errors scroll by at a terrifying speed). It will then IMMEDIATELY send another request.
On our system this means 300+ completed DHCP transactions per second per such host. Since each transaction results in six lines of output, our log on the server gets swamped and fills the disk rather quickly (half a gigabyte per hour, which is more than our total log volume for a whole day). When the log disk fills up, we lose logging for other services as well, so this is a real problem for us.
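As a quick consistency check on the figures above, the following Python snippet multiplies them out. It assumes exactly 300 transactions per second from one host, six log lines per transaction, and "half a gigabyte" taken as 0.5 × 10^9 bytes; none of these are measured values.

```python
# Back-of-envelope check of the reported log volume.
# Assumptions (not measured): exactly 300 transactions/s from one host,
# 6 log lines per transaction, and "half a gigabyte" = 0.5 * 10**9 bytes.
TRANSACTIONS_PER_SEC = 300
LINES_PER_TRANSACTION = 6
BYTES_PER_HOUR = 0.5 * 10**9

lines_per_hour = TRANSACTIONS_PER_SEC * LINES_PER_TRANSACTION * 3600
bytes_per_line = BYTES_PER_HOUR / lines_per_hour

print(f"{lines_per_hour:,} lines/hour, ~{bytes_per_line:.0f} bytes per line")
# prints: 6,480,000 lines/hour, ~77 bytes per line
```

Roughly 77 bytes per line is a plausible length for a dhcpd syslog entry carrying a timestamp, hostname, IP and MAC address, so the reported rate and volume appear to hang together.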
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. You need: one computer, one hammer.
2. Open case. Whack hard disk with hammer. Be careful not to whack too hard.
3. Boot computer.
Actual Results: the DHCP server is flooded by requests
Expected Results: the computer should wait a minimum of a second between requests.
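A minimal Python sketch of the expected behaviour: a retry delay that never drops below one second and doubles after each consecutive failure. This is not dhclient's actual code; `next_retry_delay`, `MIN_INTERVAL` and the 64-second `MAX_INTERVAL` cap are hypothetical names and values chosen for illustration. The doubling is similar in spirit to the retransmission backoff RFC 2131 recommends (doubling up to 64 seconds).

```python
import random

MIN_INTERVAL = 1.0   # seconds; the floor asked for under "Expected Results"
MAX_INTERVAL = 64.0  # hypothetical cap, chosen for illustration

def next_retry_delay(failures: int, jitter: bool = False) -> float:
    """Seconds to wait before the next DHCPREQUEST after `failures`
    consecutive failed bind attempts (0 = first retry)."""
    if failures < 0:
        raise ValueError("failures must be non-negative")
    exponent = min(failures, 16)  # bound the exponent; the cap below wins anyway
    delay = min(MIN_INTERVAL * 2 ** exponent, MAX_INTERVAL)
    if jitter:
        # Randomize so many broken hosts do not retry in lockstep,
        # but never go below the 1-second floor.
        delay = max(MIN_INTERVAL, random.uniform(delay / 2, delay))
    return delay

print([next_retry_delay(n) for n in range(8)])
# prints: [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 64.0]
```

Even the bare 1-second floor would cut a failing host from 300+ transactions per second to at most one, and the doubling brings it down to about one per minute after a few failures.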
Oh dear! Sorry to hear about these problems you are having with dhclient.
From a quick look at the code, it seems dhclient does not actually test
if the write of a bound/renewed lease succeeds or not.
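For illustration, here is a Python sketch of a lease write that actually checks whether the data reached the disk, returning a flag instead of silently ignoring errors. The function name `write_lease`, the lease text and the write-to-temporary-then-rename approach are assumptions made for this example; they do not describe dhclient's real lease-file handling.

```python
import os

def write_lease(path: str, lease_text: str) -> bool:
    """Write a lease record and report whether it reached the disk.

    Returns False instead of raising, so the caller can decide whether to
    retry later, keep the lease in memory only, or give up.
    """
    tmp_path = path + ".tmp"  # hypothetical temporary file next to the target
    try:
        with open(tmp_path, "w") as f:
            f.write(lease_text)
            f.flush()             # push Python's buffer to the OS
            os.fsync(f.fileno())  # ask the OS to push it to the device
        os.replace(tmp_path, path)  # atomic rename on POSIX filesystems
        return True
    except OSError:
        # An I/O error on a failing disk lands here; clean up if possible.
        try:
            os.unlink(tmp_path)
        except OSError:
            pass
        return False
```

The point is only that a failed write becomes visible to the caller, which can then back off rather than start a new DHCP cycle immediately.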
What can make it cycle from the bound/renew state back to the initial state is
a failure to execute dhclient-script, so the "I/O errors" observed could
have been for reads of /sbin/dhclient-script. Do you have any record of these
I/O error messages? If so, it would be useful to append an example to this bug.
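The distinction drawn above, between the script not being executable at all (for example because it cannot be read from a failing disk) and the script running but reporting failure, can be sketched in Python as follows. `run_client_script`, `ScriptResult`, the 60-second timeout and the minimal environment are hypothetical; the only detail borrowed from dhclient is that the script receives a `reason` variable such as `BOUND` or `RENEW`.

```python
import subprocess
from enum import Enum

class ScriptResult(Enum):
    OK = "ok"
    SCRIPT_FAILED = "script ran but exited non-zero (or timed out)"
    EXEC_FAILED = "script could not be executed at all"

def run_client_script(script_path: str, reason: str) -> ScriptResult:
    """Run a dhclient-script-like hook and classify the outcome."""
    # Minimal, assumed environment; the real dhclient passes many more variables.
    env = {"reason": reason, "PATH": "/usr/sbin:/usr/bin:/sbin:/bin"}
    try:
        proc = subprocess.run([script_path], env=env, timeout=60)
    except OSError:
        # Missing file, permission problem, or an I/O error while loading it.
        return ScriptResult.EXEC_FAILED
    except subprocess.TimeoutExpired:
        return ScriptResult.SCRIPT_FAILED
    return ScriptResult.OK if proc.returncode == 0 else ScriptResult.SCRIPT_FAILED
```

An `EXEC_FAILED` result is the kind of failure that retrying immediately cannot fix, which is why treating it like an ordinary rejected lease leads to a tight loop.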
If the hard disk failed after dhclient had started but before it
bound/renewed a lease, it would be unable to run /sbin/dhclient-script
and could cycle infinitely. I agree this behaviour should be corrected.
Did you actually verify that the hard disk had failed?
I'd just like to make sure that this problem was actually due to a hard disk
failure and not problems with dhclient-script.
Presumably the hard disk holding the /var/log partition was not the
one that failed? Otherwise the log could not have filled up. Or were
you using remote logging? Some examples of the log messages generated by
dhclient would be useful to append to this bug.
If dhclient detects an unrecoverable dhclient-script failure, it should
probably just exit. I'll work on implementing this, but the further
information described above would be much appreciated in resolving this bug.
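One way to express "just exit" on an unrecoverable script failure is a counter of consecutive failures that stops the client once a threshold is reached, with a pause between attempts in the meantime. The Python sketch below is an illustration under assumed names (`ScriptFailureGuard`, `acquire_lease`, a default limit of 5, a fixed 1-second pause); it is not the fix that was actually shipped.

```python
import time
from typing import Callable

class ScriptFailureGuard:
    """Track consecutive dhclient-script failures and decide when to give up."""

    def __init__(self, limit: int = 5):  # hypothetical threshold
        self.limit = limit
        self.consecutive = 0

    def record(self, succeeded: bool) -> bool:
        """Record one attempt; return True if the client should keep going."""
        if succeeded:
            self.consecutive = 0
            return True
        self.consecutive += 1
        return self.consecutive < self.limit

def acquire_lease(attempt: Callable[[], bool],
                  guard: ScriptFailureGuard,
                  sleep: Callable[[float], None] = time.sleep) -> None:
    """Repeat lease acquisition until the script succeeds or the guard trips."""
    while True:
        ok = attempt()  # e.g. request a lease, then run the script
        if not guard.record(ok):
            raise SystemExit(
                "dhclient-script failed %d times in a row; exiting" % guard.consecutive)
        if ok:
            return
        sleep(1.0)  # never retry immediately
```

Injecting `sleep` as a parameter keeps the loop testable without real delays; a daemon would simply use the default `time.sleep`.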
The log disk in question is on our DHCP server (which also happens to be our
central logging host); the messages are from dhcpd running there, not the client
machine. It's just the normal sequence of DHCPREQUEST, DHCPOFFER, DHCPACK.
I'm afraid we don't have any record of the messages on the failing computers.
The console messages zoom by so fast it's impossible to read them, and after
powering down, the hard drive won't even spin up anymore.
(I'm impressed by the speedy response :)
This bug is now fixed with dhcp-3.0.1-40_EL3, which should be in
RHEL-3-U6, but which meanwhile can be downloaded from:
Please try it out and let me know of any issues - thank you.
ISC bug 14894 raised and patch submitted; it has been accepted for inclusion
in the next upstream DHCP release.
FWIW, the test RPM has been running in one of our labs (40 student workstations)
for almost a month without any issues. I haven't brought out the hammer, though :-)
Closing as WONTFIX. dhcp-3.0.1-10_EL3 is the latest version shipped for RHEL-3.