Bug 170764 - dhclient spews requests on disk error
Summary: dhclient spews requests on disk error
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: dhcp
Version: 4.0
Hardware: i386
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Jason Vas Dias
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 168429
 
Reported: 2005-10-14 14:43 UTC by Jason Vas Dias
Modified: 2007-11-30 22:07 UTC (History)

Fixed In Version: RHBA-2006-0114
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-03-07 18:14:06 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2006:0114 0 qe-ready SHIPPED_LIVE dhcp bug fix update 2006-03-06 05:00:00 UTC

Description Jason Vas Dias 2005-10-14 14:43:21 UTC
+++ This bug was initially created as a clone of Bug #159929 +++

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.8) Gecko/20050513

Description of problem:
this is partly conjecture, since the computers this happens on can't be used and
need to be reinstalled with fresh hard drives.  it has happened twice so far.

the issue is that dhclient sends a DHCP request, gets it acknowledged and
accepts the given IP.  it will then presumably try to write the data to disk,
but fail (on the console, disk I/O errors scroll by at a terrifying speed).  it
will then IMMEDIATELY send another request.

on our system this means 300+ completed DHCP transactions per second per such
host.  since each transaction results in six lines of output, our log on the
server gets swamped and fills the disk rather quickly (half a gigabyte per hour,
which is more than our total log volume for a whole day).  when the log disk
goes full, we lose logging for other services as well, so this is a real problem
for us.
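
As a rough consistency check on the figures above, the arithmetic can be sketched as follows. The transaction rate, lines per transaction, and hourly volume come from the report; reading "half a gigabyte" as 0.5 * 10**9 bytes is an assumption, and the bytes-per-line value is only what those figures imply, not a measurement.

```python
# Figures taken from the report above. Treating "half a gigabyte" as
# 0.5 * 10**9 bytes is an assumption made for this estimate.
TRANSACTIONS_PER_SECOND = 300
LOG_LINES_PER_TRANSACTION = 6
LOG_BYTES_PER_HOUR = 0.5 * 10**9

# Log lines written by dhcpd for one misbehaving client.
lines_per_second = TRANSACTIONS_PER_SECOND * LOG_LINES_PER_TRANSACTION
lines_per_hour = lines_per_second * 3600

# Average line length implied by the reported hourly volume.
implied_bytes_per_line = LOG_BYTES_PER_HOUR / lines_per_hour

print(lines_per_second)               # 1800
print(lines_per_hour)                 # 6480000
print(round(implied_bytes_per_line))  # 77
```

An implied average of roughly 77 bytes per line is in the range of an ordinary syslog line, so the reported numbers are consistent with a single failing host.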


Version-Release number of selected component (if applicable):
dhclient-3.0.1-10_EL3

How reproducible:
Always

Steps to Reproduce:
1. you need: one computer, one hammer.
2. open case.  whack harddisk with hammer.  be careful not to whack too hard.
3. boot computer.

  

Actual Results:  the DHCP server is flooded by requests

Expected Results:  the computer should wait a minimum of a second between requests.


Additional info:

-- Additional comment from jvdias on 2005-06-09 10:27 EST --
Oh dear! Sorry to hear about these problems you are having with dhclient.

From a quick look at the code, it seems dhclient does not actually test
if the write of a bound/renewed lease succeeds or not. 

What can make it cycle from the bound/renew state to the initial state is
if execution of dhclient-script fails - so the "I/O errors" observed could
have been for reads of /sbin/dhclient-script . Do you have any record of these
I/O error messages? If so, it would be useful to append an example to this bug
report.

If the hard disk failure occurred after dhclient has run and before it 
binds/renews a lease, it would be unable to run /sbin/dhclient-script,
and could cycle infinitely - I agree this behaviour should be corrected.

Did you actually verify that the hard disk had failed ?

I'd just like to make sure that this problem was actually due to a hard disk
failure and not problems with dhclient-script. 

Presumably the hard disk on which the /var/log partition resides was not the
one which failed ?  Else you would not have had the log fill up - or were
you using remote logging ?  Some examples of the log messages generated by
dhclient would be useful to append to this bug.

If dhclient detects an unrecoverable dhclient-script failure, it should
probably just exit . I'll work on implementing this. But the further
information described above would be much appreciated in resolving this bug.

Thank you!
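
The two behaviours discussed above (waiting at least a second between requests, and giving up when dhclient-script keeps failing) could be sketched roughly as below. This is an illustration only, not the patch that went into the dhcp package: the function name, the doubling backoff, the 60-second ceiling, and the failure limit are all hypothetical, and `run_script` stands in for whatever executes /sbin/dhclient-script.

```python
import time
from typing import Callable

MIN_RETRY_DELAY = 1.0   # seconds; the floor asked for under "Expected Results"
MAX_RETRY_DELAY = 60.0  # hypothetical ceiling for the backoff
MAX_FAILURES = 5        # hypothetical limit before giving up


def acquire_lease(run_script: Callable[[], bool],
                  sleep: Callable[[float], None] = time.sleep) -> int:
    """Call run_script until it succeeds and return the number of attempts.

    run_script is a stand-in for one request/ack cycle followed by running
    the client script; it returns True on success. After MAX_FAILURES
    consecutive failures the function raises instead of cycling forever.
    """
    delay = MIN_RETRY_DELAY
    for attempt in range(1, MAX_FAILURES + 1):
        if run_script():
            return attempt
        if attempt < MAX_FAILURES:
            # Never retry immediately: wait, then double the wait.
            sleep(delay)
            delay = min(delay * 2, MAX_RETRY_DELAY)
    raise RuntimeError("client script failed %d times; giving up" % MAX_FAILURES)
```

The `sleep` parameter exists only so the delay can be replaced in a test; a real client would use its own timer. Whether to exit or to keep retrying slowly is a design choice, and this sketch does both: it backs off first, then stops.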

 


-- Additional comment from kjetilho.no on 2005-06-09 11:00 EST --
the log disk in question is on our DHCP server (which also happens to be our
central logging host), the messages are from dhcpd running there, not the client
machine.  it's just the normal sequence of DHCPREQUEST DHCPOFFER DHCPACK.

I'm afraid we don't have any record of the messages on the failing computers. 
the console messages zoom by so fast it's impossible to read them, and after
powering down, the hard drive won't even spin up anymore.

(I'm impressed by the speedy response :)


-- Additional comment from jvdias on 2005-06-09 20:32 EST --
This bug is now fixed with dhcp-3.0.1-40_EL3, which should be in 
RHEL-3-U6, but which meanwhile can be downloaded from:
  http://people.redhat.com/~jvdias/dhcp/RHEL-3 
Please try it out and let me know of any issues - thank you.
ISC bug 14894 raised and patch submitted - accepted for inclusion
in next upstream DHCP release.

-- Additional comment from kjetilho.no on 2005-07-05 11:33 EST --
FWIW, the test RPM has been running in one of our labs (40 student workstations)
for almost a month without any issues.  I haven't brought out the hammer, though :-)

Comment 1 Jason Vas Dias 2005-10-14 14:44:28 UTC
fixed with dhcp-3.0.1-40_EL4+

Comment 2 Gilles Detillieux 2005-10-14 16:01:05 UTC
It occurs to me that the problem with this bug, and the seemingly related bug
162080 I reported in June, is that fixing the client side is only half a
solution.  Sure, the dhclient has to be fixed not to misbehave in this way. 
However, the dhcp server should similarly be fixed to detect a misbehaving
client and stop talking to it for a while.  Otherwise, there's a potential
denial of service attack problem, in which a misbehaving client can tie up a
server or fill its /var/log partition with junk messages.  Any chance of getting
the server side fixed to handle this better as well?
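
One way a server could "stop talking to" a misbehaving client for a while is a per-client sliding-window limit, sketched below. This is not an existing dhcpd feature or option as far as this report shows; the class name, the thresholds, and the use of a client identifier string (for example a MAC address) are assumptions for illustration.

```python
from collections import defaultdict, deque


class ClientThrottle:
    """Ignore a client for `penalty` seconds once it exceeds
    `max_requests` requests within `window` seconds (hypothetical policy)."""

    def __init__(self, max_requests: int = 5, window: float = 10.0,
                 penalty: float = 60.0) -> None:
        self.max_requests = max_requests
        self.window = window
        self.penalty = penalty
        self._recent = defaultdict(deque)  # client id -> request timestamps
        self._blocked_until = {}           # client id -> time the block ends

    def allow(self, client_id: str, now: float) -> bool:
        """Return True if a request arriving at time `now` should be served."""
        if now < self._blocked_until.get(client_id, 0.0):
            return False  # still in the penalty period: drop silently
        recent = self._recent[client_id]
        # Forget requests that have fallen out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        recent.append(now)
        if len(recent) > self.max_requests:
            # Too many requests in the window: start a penalty period.
            self._blocked_until[client_id] = now + self.penalty
            recent.clear()
            return False
        return True
```

Dropping requests without logging each one would also address the log-volume half of the problem; a server taking this approach would probably log a single line when the penalty period starts.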

Comment 8 Red Hat Bugzilla 2006-03-07 18:14:06 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2006-0114.html


