Bug 170769 - Dhcpd will fail to start up, exits with glibc error
Summary: Dhcpd will fail to start up, exits with glibc error
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: dhcp
Version: 4.0
Hardware: i386
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Jason Vas Dias
Depends On: 147375
Blocks: 168429
Reported: 2005-10-14 14:58 UTC by Jason Vas Dias
Modified: 2007-11-30 22:07 UTC

Fixed In Version: RHBA-2006-0114
Doc Type: Bug Fix
Last Closed: 2006-03-07 18:14:35 UTC



Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2006:0114 0 qe-ready SHIPPED_LIVE dhcp bug fix update 2006-03-06 05:00:00 UTC

Description Jason Vas Dias 2005-10-14 14:58:39 UTC
+++ This bug was initially created as a clone of Bug #147375 +++

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.5)
Gecko/20041111 Firefox/1.0

Description of problem:
On reboot or on service start/restart the dhcpd startup will emit an
error message like the one below:

Starting dhcpd: *** glibc detected *** free(): invalid pointer:
0x0867da10 ***
Internet Systems Consortium DHCP Server V3.0.1
Copyright 2004 Internet Systems Consortium.
All rights reserved.
For info, please visit http://www.isc.org/sw/dhcp/
                                                           [FAILED]

It does not happen every time, but it may take several invocations
of the startup script before dhcpd starts.


Version-Release number of selected component (if applicable):
dhcp-3.0.1-30_FC3

How reproducible:
Sometimes

Steps to Reproduce:
1. Run 'service dhcpd restart'. It may work, or it may take several
attempts to get the service started. The failure also occurs if dhcpd
is invoked directly.
    

Actual Results:  # service dhcpd start
Starting dhcpd: *** glibc detected *** free(): invalid pointer:
0x083bca10 ***
Internet Systems Consortium DHCP Server V3.0.1
Copyright 2004 Internet Systems Consortium.
All rights reserved.
For info, please visit http://www.isc.org/sw/dhcp/
                                                           [FAILED]


Expected Results:  # service dhcpd start
Starting dhcpd:                                            [  OK  ]


Additional info:

Fedora Core 3 - All current updates for installed software

AMD 2700+ - 1GB Ram 240GB Disk (160GB SATA - 80 GB IDE)
1 DVD writer, 1 CD/RW Writer

Running as NAT firewall/workstation to home network.

glibc-2.3.4-2.fc3
glibc-kernheaders-2.4-9.1.87
kernel-2.6.10-1.760_FC3

I have not yet tried to rebuild SRPM on this system to see if it makes
any difference.

-- Additional comment from jvdias on 2005-02-07 13:56 EST --
Is this an AMD 64-bit or a 32-bit machine?

I am not able to reproduce this problem on an i386 platform.

Please append the complete output of the dhcpd run attempt when
it fails. Try this:

# cd /var/lib/dhcp
# script /tmp/dhcpd.log
# ulimit -c unlimited
# dhcpd -d -f -tf /tmp/dhcpd.trace.log
  ( wait for problem, press CTRL-C)
# exit
# ls -l core*
# gzip core*

and then append the /tmp/dhcpd.*.log and any core.*.gz files to this
bug or send them to me - thanks.










-- Additional comment from wrsturm on 2005-02-07 14:18 EST --
Created an attachment (id=110739)
dhcpd core file


-- Additional comment from wrsturm on 2005-02-07 14:19 EST --
Created an attachment (id=110740)
dhcp script log


-- Additional comment from wrsturm on 2005-02-07 14:20 EST --
Created an attachment (id=110741)
dhcp trace log


-- Additional comment from jvdias on 2005-02-07 14:37 EST --
Thanks! I am now investigating as top priority. 
Does the system have an AMD 64-bit or AMD 32-bit CPU ?
Does it have more than one CPU / hyperthreading enabled ?
 

-- Additional comment from jvdias on 2005-02-07 15:28 EST --
Please can you append the output of these commands to this bug:

# uname -a
# rpm -q dhcp --queryformat '%{ARCH} %{BUILDHOST}\n'

Thank you!


-- Additional comment from jvdias on 2005-02-07 15:53 EST --
The core file you sent is a 32-bit core file, but it is from an
executable which was linked to the glibc '32-bit compatibility mode'
/lib/tls/libc.so.6 from  glibc32-2.3.3-68, which is only installed 
on 64 bit systems.  
I've searched the AMD website for '2700+' but can find no data as
to whether the processor is 64-bit or 32-bit .
If you have a 64-bit machine ('uname -m' outputs x86_64), then you
should install the dhcp-3.0.1-30_FC3.x86_64.rpm, not the 32-bit
dhcp-3.0.1-30_FC3.i386.rpm - this incompatibility could be the source
of your problem. 
Please also do an 'rpm -qf `readlink /lib/tls/libc.so.6`' and tell me
which package is output - if  glibc32-2.3.3-68, this could also be 
a problem because glibc was upgraded to 2.3.4-2 while the
compatibility library is at glibc-2.3.3, and dhcp-3.0.1-30 was
compiled for  glibc-2.3.4. 
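
The architecture comparison requested above could be sketched as a small
shell helper. This is only an illustration: the function names arch_bits
and same_word_size are hypothetical and are not part of rpm, glibc or dhcp,
and the list of 32-bit architecture names is an assumption that covers the
values seen in this report.

```shell
# Hypothetical helper: map an architecture string to its word size.
arch_bits() {
    case "$1" in
        i386|i486|i586|i686|athlon) echo 32 ;;
        x86_64)                     echo 64 ;;
        *)                          echo unknown ;;
    esac
}

# Succeeds only when both architecture strings are recognised
# and share the same word size.
same_word_size() {
    m=$(arch_bits "$1")
    p=$(arch_bits "$2")
    [ "$m" != unknown ] && [ "$m" = "$p" ]
}
```

It could be called as
  same_word_size "$(uname -m)" "$(rpm -q dhcp --queryformat '%{ARCH}')"
which fails when a 32-bit dhcp package is installed on an x86_64 machine.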

-- Additional comment from wrsturm on 2005-02-07 16:42 EST --
It is a 32-bit, single-CPU machine with no hyperthreading.

uname -a
2.6.10-1.760_FC3 #1 Wed Feb 2 00:14:23 EST 2005 i686 athlon i386 GNU/Linux
It is a 32-bit AMD Athlon chip that, in marketing terms, performs as
fast as an equivalent Intel chip running at a clock speed of 2700 MHz.
The actual clock speed is 2170.352 MHz.

rpm -q dhcp --queryformat '%{ARCH} %{BUILDHOST}\n'
i386 tweety.build.redhat.com

uname -m
i686

A slightly different variant of the command you sent:
rpm -qf /lib/`readlink /lib/tls/libc.so.6`
glibc-2.3.4-2.fc3



-- Additional comment from wrsturm on 2005-02-07 17:12 EST --
Here is /proc/cpuinfo

processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 6
model           : 8
model name      : AMD Athlon(tm) XP 2700+
stepping        : 1
cpu MHz         : 2170.352
cache size      : 256 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca
cmov pat pse36 mmx fxsr sse pni syscall mmxext 3dnowext 3dnow
bogomips        : 4292.60



-- Additional comment from jvdias on 2005-02-07 19:51 EST --
I've tried running dhcpd in playback mode with the trace file and 
using your exact configuration and lease file hundreds of times in 
a loop with no exit / core generated, on an Intel i686 system .
I'm looking for an Athlon system on which to test, so far without
success - so it looks like yours is the only system on which this
problem can be reproduced at the moment.
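
Because the failure is intermittent, running the start command repeatedly
and counting the failures gives a rough idea of how often it occurs on a
given machine. A minimal sketch follows; count_failures is a hypothetical
helper, not something shipped with dhcp, and it assumes the command under
test returns a non-zero exit status when it fails.

```shell
# Hypothetical helper: run a command N times and print how many
# of the runs exited with a non-zero status.
count_failures() {
    runs=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Discard the output; only the exit status matters here.
        if ! "$@" >/dev/null 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures"
}
```

For example, 'count_failures 20 service dhcpd restart' would print the
number of failed restarts out of 20, assuming the init script returns a
non-zero status whenever it prints [FAILED].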

Please try the following:
1. Replace the glibc-2.3.4-2.fc3.i686.rpm with 
   glibc-2.3.4-2.fc3.i386.rpm :
   
   # rpm -Uvh --force glibc-2.3.4-2.fc3.i386.rpm 
   
   (both the i386 and i686 RPMs can be downloaded from:
    ftp://download.fedora.redhat.com/pub/fedora/linux/core/updates/3 )
  
   Reboot and see if the problem still occurs, repeating steps in
   Comment #1 .
  
   If it does not, it would seem there is a problem with the i686
   optimized glibc on Athlon and I will take this up with the 
   glibc / anaconda developers .
  
   If it does, then there is a dhcpd problem - you can go ahead and
   put back the i686 glibc:

   # rpm -Uvh --force glibc-2.3.4-2.fc3.i686.rpm 
   
  
   The core file appears to be corrupt - I cannot obtain any 
   useful data from it on our systems here:
"
$ gdb /usr/sbin/dhcpd core.31535
GNU gdb Red Hat Linux (6.1post-1.20040607.43rh) 
...
Core was generated by `dhcpd -d -f -tf /tmp/dhcpd.trace.log'.
Program terminated with signal 6, Aborted.
Loaded symbols for /usr/sbin/dhcpd
...
#0  0x00ee27a2 in ?? ()
(gdb) where
#0  0x00ee27a2 in ?? ()
#1  0x00991955 in ?? ()
#2  0x00000000 in ?? ()
(gdb) quit
"
    If the gdb "where" command for the core file shows anything
    different on your system, please append it to this bug.
 
    The core was generated by an abort in glibc, which is part
    of new memory validation routines with which there may be 
    a problem on the Athlon .

2.  If you still get an exit with core dump, please download this
    source RPM :
 http://people.redhat.com/~jvdias/DHCP/FC3/dhcp-3.0.1-30_FC3.src.rpm
    and build it with:
    # rpmbuild --rebuild dhcp-3.0.1-30_FC3.src.rpm
    This will build an unstripped debugging version of DHCP .
    Install the RPMS produced in /usr/src/redhat/RPMS/i386 
    and reproduce the problem. 
    The core file should then not be corrupt and doing a 'gdb where'
    will tell us what is causing the problem - please append the
    core file or output of gdb 'where' command generated as above.
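
A quick way to tell whether a backtrace is usable is to count the frames
that gdb could not resolve to a symbol, which it prints as "in ?? ()".
The sketch below does this for gdb "where" output fed on standard input;
count_unresolved_frames is a hypothetical name, and the pattern assumes
the frame format shown in the gdb session above.

```shell
# Hypothetical helper: read gdb "where" output on stdin and print the
# number of frames that have no symbol name ("in ?? ()").
count_unresolved_frames() {
    # grep -c prints the count but exits 1 when it is zero;
    # "|| true" keeps the function's exit status at 0 in that case.
    grep -c -E '^#[0-9]+[[:space:]]+0x[0-9a-fA-F]+ in \?\? \(\)' || true
}
```

For the session quoted above it would print 3, since every frame is
unresolved. With a gdb that supports the -batch and -ex options, the
output of 'gdb -batch -ex where /usr/sbin/dhcpd core.31535' could be
piped straight into the function.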

Thank You!
 
 


-- Additional comment from jvdias on 2005-02-07 20:26 EST --
I've finally found a dual-processor Athlon on which I can reproduce
the problem. It would appear to be a glibc bug. You needn't gather
the information requested in the above comment. Installing the
i386 glibc may prevent the problem from occurring. I'm continuing
to investigate - thanks.


-- Additional comment from jvdias on 2005-02-08 18:48 EST --
I've found the problem. It was a memory corruption issue latent in
all previous dhcp versions, which just happened to trigger the new
glibc / gcc 'FORTIFY_SOURCE' runtime memory validation checks ONLY
on the Athlon FC3 platform - weird! But genuine problems were found
and are fixed in dhcp-3.0.1-32_FC3, which can be downloaded from:
  http://people.redhat.com/~jvdias/DHCP/FC3/3.0.1-32_FC3/i386
Please test this version and let me know if it fixes the problem -
it certainly does on the machine on which I was able to reproduce it.
Thank you!


-- Additional comment from wrsturm on 2005-02-08 19:58 EST --
That seems to have done it.  Tried a few restarts (4), a reboot then a
bunch more restarts(10) without an issue.  Thanks.

-- Additional comment from jvdias on 2005-02-10 10:27 EST --
 I contacted the upstream ISC DHCP maintainer on this issue, and ISC
 have agreed to fix this in the next release .

 But they pointed out that the subnet declaration:
   subnet 68.145.239.64 netmask 255.255.255.255 {}
 is what causes the problem, as a 32-bit netmask was never 
 envisioned to be used here (but it is not forbidden in the
 documentation - it just doesn't make any sense) .
 
 I think what you are trying to achieve is to get DHCP to ignore
 the interface with address 68.145.239.64 ?  This would be 
 achieved by omitting the 68.145.239.64 subnet declaration altogether.
 Yes, dhcp will emit a message about 
  "No subnet declaration for xxxx (68.145.239.64)" 
 but this message is harmless - that interface will still be ignored.
 By default, dhcpd will bind to address 0.0.0.0 (the "ANY" address)
 on the interface for which it has a subnet declaration. Using the
 'local-address' option makes it bind to a specific address - so
 you could specify
  'local-address 10.0.0.1;'
 and dhcpd would bind ONLY to address 10.0.0.1 on the 10.0.0.0/24
 interface.
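
 To check an existing configuration for the kind of declaration described
 above, the file can be searched for subnet statements with a
 255.255.255.255 netmask. This is a rough sketch only:
 find_host_netmask_subnets is a hypothetical helper, and it assumes each
 subnet declaration starts on its own line and is not commented out.

```shell
# Hypothetical helper: print every "subnet ... netmask 255.255.255.255"
# declaration found in the given dhcpd configuration file.
find_host_netmask_subnets() {
    # -E selects extended regular expressions; [[:space:]] matches
    # both blanks and tabs between the keywords.
    grep -E '^[[:space:]]*subnet[[:space:]]+[0-9.]+[[:space:]]+netmask[[:space:]]+255\.255\.255\.255([[:space:]]|[{]|$)' "$1"
}
```

 Running 'find_host_netmask_subnets /etc/dhcpd.conf' (the path is an
 assumption; use wherever the configuration lives) prints the matching
 lines, or nothing if there are none.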
 
  

-- Additional comment from wrsturm on 2005-02-10 21:15 EST --
Yep.  That's what I was trying to do.  I have made the changes here and
will 'live' with the error message (until I forget why I did this).

Any day now.  :-)  

-- Additional comment from marius.andreiana on 2005-08-20 03:20 EST --
Closing as errata

Comment 1 Jason Vas Dias 2005-10-14 14:59:59 UTC
Fixed in dhcp-3.0.1-40_EL4 and later.

Comment 7 Red Hat Bugzilla 2006-03-07 18:14:35 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2006-0114.html


