Bug 162080 - dhclient running amock and filling /var/log/messages
dhclient running amock and filling /var/log/messages
Status: CLOSED ERRATA
Product: Fedora
Classification: Fedora
Component: dhcp (Show other bugs)
3
i686 Linux
medium Severity high
: ---
: ---
Assigned To: Jason Vas Dias
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2005-06-29 15:08 EDT by Gilles Detillieux
Modified: 2007-11-30 17:11 EST (History)
1 user (show)

See Also:
Fixed In Version: 3.0.1-44_FC3
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2005-09-05 04:18:01 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)

  None (edit)
Description Gilles Detillieux 2005-06-29 15:08:53 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.7) Gecko/20050502 Red Hat/1.7.7-0.90.1.legacy

Description of problem:
I believe this bug is closely related to the one reported in bug 159929.  The cause of the error is different, but the effect seems to be the same, with dhclient immediately issuing another DHCP request, with the same results, causing it to spew out requests at a phenomenal rate and fill up the /var/log volume with log entries like these in the messages file:

Jun 27 05:03:45 phil dhclient: DHCPREQUEST on eth1 to 140.193.80.125 port 67
Jun 27 05:03:45 phil dhclient: DHCPACK from 140.193.80.125
Jun 27 05:03:45 phil dhclient: dhclient.c(2143): null pointer
Jun 27 05:03:45 phil dhclient: DHCPDECLINE on eth1 to 255.255.255.255 port 67
Jun 27 05:03:45 phil dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 3
Jun 27 05:03:45 phil kernel: audit(1119866625.040:0): avc:  denied  { execmod }
for  pid=4561 comm=dhclient-script path=/lib/libdl-2.3.5.so dev=hda2 ino=512021
scontext=user_u:system_r:unconfined_t tcontext=system_u:object_r:lib_t tclass=file
Jun 27 05:03:45 phil dhclient: DHCPOFFER from 140.193.80.125

Of course, it was also filling up /var/log/messages on the DHCP server with corresponding entries, till it chewed through 2 GB of space on both systems in about 7 hours, filling up the disk on the client side.  A reboot seemed to put an end to the errors (after cleaning up the log file to make space), but I think the error was initially triggered by the update to selinux-policy-targeted about 5 minutes before the errors started.

Whatever the cause, there absolutely must be some sort of limits put in place in this program so it doesn't run amock like this when it doesn't get its way.  It should have a progressively longer delay between retries, or something like that, to bring it under control regardless of what error may come its way.

In bug 159929, Jason Vas Dias (jvdias@redhat.com) wrote on 2005-06-09 that the problem is fixed in dhcp-3.0.1-40_EL3.  I don't know if his patch would fix the problem I encountered as well, but if so I'd like to see this fix for FC3 as well.  Thanks.

Version-Release number of selected component (if applicable):
dhclient-3.0.1-42_FC3

How reproducible:
Didn't try

Steps to Reproduce:
1. I'm not sure how to reproduce the exact error.  It happened on only one system, and it's not happening now.  This system was running kernel-2.6.11-1.27_FC3 at the time that yum updated selinux-policy-targeted and selinux-policy-targeted-sources to 1.17.30-3.13.  I'm running a custom policy in /etc/selinux/targeted/src/policy/domains/misc/local.te, so I don't know if the update actually resulted in a change of policy or not (my understanding was it wouldn't), but the update did trigger a run of /usr/sbin/load_policy.  Just under 5 minutes later, dhclient went haywire.  Later in the day, we booted up kernel-2.6.11-1.35_FC3, and the errors didn't come back.  I also touched the local.te file and did a "make reload" to make sure the policy was completely up to date.  I could try rebooting the previous kernel, but I'm doubtful I can recreate all the circumstances that led to the errors.

Someone familiar with the code could probably see what things can trigger the "null pointer" error above and find some way of tripping up dhclient.

Actual Results:  dhclient retried over 100 times per second, generating 7 log entries each time.  /var/log/messages filled up its partition (which was the root partition on this system) and the system became unusable until manual intervention, i.e. cleaning up /var/log/messages, took place.

Expected Results:  dhclient retries a few times, then gives up for a good while (a few minutes) before retrying again.  The more it fails, the longer it should wait.

Additional info:

These are the corresponding log entries from the DHCP server:

Jun 27 05:03:44 cliff dhcpd: DHCPREQUEST for 140.193.80.134 from 00:30:84:94:88:73 via eth0
Jun 27 05:03:44 cliff dhcpd: DHCPACK on 140.193.80.134 to 00:30:84:94:88:73 via
eth0
Jun 27 05:03:44 cliff dhcpd: DHCPDECLINE of 140.193.80.134 from 00:30:84:94:88:73 via eth0: not found
Jun 27 05:03:44 cliff dhcpd: DHCPDISCOVER from 00:30:84:94:88:73 via eth0
Jun 27 05:03:44 cliff dhcpd: DHCPOFFER on 140.193.80.134 to 00:30:84:94:88:73 via eth0

Why would dhclient decline an address it requested and got an acknowledgement for, and immediately ask for its address again?
Comment 1 Jason Vas Dias 2005-07-05 10:14:56 EDT
The problem is caused by an incorrect SELinux security context for
/sbin/dhclient-script - it somehow ended up with a context that did
not allow it to link to the dynamic loader library, and the execve of 
dhclient-script therefore exited with an error. This does not occur
with the default selinux-policy-targeted RPM.

When dhclient-script exits with an error, then dhclient assumes the
new configuration parameters were rejected due to local policy,
and sends a DHCPDECLINE to the server, and then starts off a new
configuration session from the initial state, sending a DHCPDISCOVER.

With the latest dhcp-3.0.1-40_EL3 version in RHEL-3, fixing 
bug 159929, dhclient will wait a random time between 1 and 15 
seconds before entering the initial state after a DHCPDECLINE is sent.  

The "null pointer" message is harmless - when a DHCPDECLINE occurs,
the code would attempt free-ing the lease structure memory twice,
which is detected by the allocation code and causes no problems -
just the error message. This multiple free attempt is also removed
from the code with dhcp-3.0.1-40_EL3.

I am now applying the changes to the FC-3 branch and will be issuing
an FC-3 update for dhcp-3.0.1-44_FC3 today . 
Comment 2 Jason Vas Dias 2005-07-12 14:24:48 EDT
This bug is now fixed with the dhcp-3.0.1-44_FC3, today released as FC3 update 
546, and available from
 ftp://download.fedora.redhat.com:/pub/fedora/linux/core/updates/3/i386
Please try it out and let me know of any issues - thank you.

Note You need to log in before you can comment on or make changes to this bug.