Bug 147375

Summary: Dhcpd will fail to start up, exits with glibc error
Product: [Fedora] Fedora
Reporter: Warren Sturm <warren.sturm>
Component: dhcp
Assignee: Jason Vas Dias <jvdias>
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Version: 3
CC: marius.andreiana
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2005-08-20 03:20:08 EDT
Bug Depends On:    
Bug Blocks: 170767, 170769    
Attachments:
Description       Flags
dhcpd core file   none
dhcp script log   none
dhcp trace log    none

Description Warren Sturm 2005-02-07 13:32:03 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.5)
Gecko/20041111 Firefox/1.0

Description of problem:
On reboot or on service start/restart the dhcpd startup will emit an
error message like the one below:

Starting dhcpd: *** glibc detected *** free(): invalid pointer:
0x0867da10 ***
Internet Systems Consortium DHCP Server V3.0.1
Copyright 2004 Internet Systems Consortium.
All rights reserved.
For info, please visit http://www.isc.org/sw/dhcp/
                                                           [FAILED]

It does not happen on every attempt, but it can take several
invocations of the startup before dhcpd actually starts.
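
Because the failure is intermittent, it can help to measure how often it occurs. The following is only a sketch: count_failures is a hypothetical helper (not part of any package), and it assumes the command being tested returns a non-zero exit status when startup fails.

```shell
#!/bin/sh
# count_failures ATTEMPTS CMD [ARGS...]
# Runs CMD ATTEMPTS times and prints how many of the runs exited non-zero.
# Hypothetical helper for illustration; output of CMD is discarded.
count_failures() {
    attempts=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if ! "$@" >/dev/null 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures"
}

# Example (as root), assuming the init script reports failure via its exit status:
#   count_failures 20 service dhcpd restart
```

Passing the command as arguments keeps the helper usable both for the init script and for a direct dhcpd invocation.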


Version-Release number of selected component (if applicable):
dhcp-3.0.1-30_FC3

How reproducible:
Sometimes

Steps to Reproduce:
1. service dhcpd restart  - it may work or it may take several attempts
to get the service started.  It will also happen if dhcpd is invoked
directly.
    

Actual Results:  # service dhcpd start
Starting dhcpd: *** glibc detected *** free(): invalid pointer:
0x083bca10 ***
Internet Systems Consortium DHCP Server V3.0.1
Copyright 2004 Internet Systems Consortium.
All rights reserved.
For info, please visit http://www.isc.org/sw/dhcp/
                                                           [FAILED]


Expected Results:  # service dhcpd start
Starting dhcpd:                                            [  OK  ]


Additional info:

Fedora Core 3 - All current updates for installed software

AMD 2700+ - 1GB Ram 240GB Disk (160GB SATA - 80 GB IDE)
1 DVD writer, 1 CD/RW Writer

Running as NAT firewall/workstation to home network.

glibc-2.3.4-2.fc3
glibc-kernheaders-2.4-9.1.87
kernel-2.6.10-1.760_FC3

I have not yet tried to rebuild SRPM on this system to see if it makes
any difference.
Comment 1 Jason Vas Dias 2005-02-07 13:56:25 EST
Is this an AMD 64-bit or a 32-bit machine ?

I am not able to reproduce this problem on an i386 platform .

Please append the complete output of the dhcpd run attempt when 
it fails - try this: 

# cd /var/lib/dhcp
# script /tmp/dhcpd.log
# ulimit -c unlimited
# dhcpd -d -f -tf /tmp/dhcpd.trace.log
  ( wait for problem, press CTRL-C)
# exit
# ls -l core*
# gzip core*

and then append the /tmp/dhcpd.*.log and any core.*.gz files to this
bug or send them to me - thanks.








Comment 2 Warren Sturm 2005-02-07 14:18:09 EST
Created attachment 110739 [details]
dhcpd core file
Comment 3 Warren Sturm 2005-02-07 14:19:27 EST
Created attachment 110740 [details]
dhcp script log
Comment 4 Warren Sturm 2005-02-07 14:20:01 EST
Created attachment 110741 [details]
dhcp trace log
Comment 5 Jason Vas Dias 2005-02-07 14:37:23 EST
Thanks! I am now investigating as top priority. 
Does the system have an AMD 64-bit or AMD 32-bit CPU ?
Does it have more than one CPU / hyperthreading enabled ?
 
Comment 6 Jason Vas Dias 2005-02-07 15:28:37 EST
Please can you append the output of these commands to this bug:

# uname -a
# rpm -q dhcp --queryformat '%{ARCH} %{BUILDHOST}\n'

Thank you!
Comment 7 Jason Vas Dias 2005-02-07 15:53:24 EST
The core file you sent is a 32-bit core file, but it is from an
executable which was linked to the glibc '32-bit compatibility mode'
/lib/tls/libc.so.6 from glibc32-2.3.3-68, which is only installed
on 64-bit systems.

I've searched the AMD website for '2700+' but can find no data as
to whether the processor is 64-bit or 32-bit.

If you have a 64-bit machine ('uname -m' outputs x86_64), then you
should install the dhcp-3.0.1-30_FC3.x86_64.rpm, not the 32-bit
dhcp-3.0.1-30_FC3.i386.rpm - this incompatibility could be the source
of your problem.

Please also do an 'rpm -qf `readlink /lib/tls/libc.so.6`' and tell me
which package is output - if it is glibc32-2.3.3-68, this could also be
a problem because glibc was upgraded to 2.3.4-2 while the
compatibility library is at glibc-2.3.3, and dhcp-3.0.1-30 was
compiled for glibc-2.3.4.
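
The architecture comparison described above could be scripted roughly as follows. This is a sketch under assumptions: arch_family and same_arch_family are hypothetical helper names, and the mapping covers only the architecture strings that appear in this bug.

```shell
#!/bin/sh
# arch_family ARCH
# Maps an architecture name (as printed by `uname -m` or by rpm's %{ARCH})
# to a coarse family, so that i386 and i686 compare as equal.
# Hypothetical helper; unknown names are passed through unchanged.
arch_family() {
    case "$1" in
        i[3-6]86|athlon) echo x86_32 ;;
        x86_64)          echo x86_64 ;;
        *)               echo "$1" ;;
    esac
}

# same_arch_family KERNEL_ARCH PACKAGE_ARCH
# Succeeds when both names fall into the same family.
same_arch_family() {
    [ "$(arch_family "$1")" = "$(arch_family "$2")" ]
}

# Example:
#   same_arch_family "$(uname -m)" "$(rpm -q dhcp --queryformat '%{ARCH}')" \
#       || echo "dhcp package architecture does not match the kernel"
```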
Comment 8 Warren Sturm 2005-02-07 16:42:25 EST
It is a 32-bit, single-CPU machine with no hyperthreading.

uname -a
2.6.10-1.760_FC3 #1 Wed Feb 2 00:14:23 EST 2005 i686 athlon i386 GNU/Linux
It is an AMD Athlon 32-bit chip that, in marketese, performs as fast as
an equivalent Intel chip running at a clock speed of 2700 MHz.  The
actual clock speed is 2170.352 MHz.

rpm -q dhcp --queryformat '%{ARCH} %{BUILDHOST}\n'
i386 tweety.build.redhat.com

uname -m
i686

A slightly different variant of the command you sent:
rpm -qf /lib/`readlink /lib/tls/libc.so.6`
glibc-2.3.4-2.fc3
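
The directory prefix in the variant above is presumably needed because readlink prints the link target exactly as stored, which may be a relative path. A minimal sketch, assuming GNU readlink (whose -f option canonicalizes to an absolute path); resolve_link is a hypothetical helper name.

```shell
#!/bin/sh
# resolve_link PATH
# Prints the absolute, fully resolved path that PATH points to.
# Assumes GNU readlink, which provides the -f option.
resolve_link() {
    readlink -f "$1"
}

# Example on an RPM-based system:
#   rpm -qf "$(resolve_link /lib/tls/libc.so.6)"
```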

Comment 9 Warren Sturm 2005-02-07 17:12:08 EST
Here is /proc/cpuinfo

processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 6
model           : 8
model name      : AMD Athlon(tm) XP 2700+
stepping        : 1
cpu MHz         : 2170.352
cache size      : 256 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca
cmov pat pse36 mmx fxsr sse pni syscall mmxext 3dnowext 3dnow
bogomips        : 4292.60

Comment 10 Jason Vas Dias 2005-02-07 19:51:07 EST
I've tried running dhcpd in playback mode with the trace file and 
using your exact configuration and lease file hundreds of times in 
a loop with no exit / core generated, on an Intel i686 system .
I'm looking for an Athlon system on which to test, so far without
success - so it looks like yours is the only system on which this
problem can be reproduced at the moment.

Please try the following:
1. Replace the glibc-2.3.4-2.fc3.i686.rpm with 
   glibc-2.3.4-2.fc3.i386.rpm :
   
   # rpm -Uvh --force glibc-2.3.4-2.fc3.i386.rpm 
   
   Both the i386 and i686 RPMs can be downloaded from:
    ftp://download.fedora.redhat.com/pub/fedora/linux/core/updates/3
  
   Reboot and see if the problem still occurs, repeating steps in
   Comment #1 .
  
   If it does not, it would seem there is a problem with the i686
   optimized glibc on Athlon and I will take this up with the 
   glibc / anaconda developers .
  
   If it does, then there is a dhcpd problem - you can go ahead and
   put back the i686 glibc:

   # rpm -Uvh --force glibc-2.3.4-2.fc3.i686.rpm 
   
  
   The core file appears to be corrupt - I cannot obtain any 
   useful data from it on our systems here:
"
$ gdb /usr/sbin/dhcpd core.31535
GNU gdb Red Hat Linux (6.1post-1.20040607.43rh) 
...
Core was generated by `dhcpd -d -f -tf /tmp/dhcpd.trace.log'.
Program terminated with signal 6, Aborted.
Loaded symbols for /usr/sbin/dhcpd
...
#0  0x00ee27a2 in ?? ()
(gdb) where
#0  0x00ee27a2 in ?? ()
#1  0x00991955 in ?? ()
#2  0x00000000 in ?? ()
(gdb) quit
"
    If the gdb "where" command for the core file shows anything
    different on your system, please append it to this bug.
 
    The core was generated by an abort in glibc, which is part
    of new memory validation routines with which there may be 
    a problem on the Athlon .

2.  If you still get an exit with core dump, please download this
    source RPM :
 http://people.redhat.com/~jvdias/DHCP/FC3/dhcp-3.0.1-30_FC3.src.rpm
    and build it with:
    # rpmbuild --rebuild dhcp-3.0.1-30_FC3.src.rpm
    This will build an unstripped debugging version of DHCP .
    Install the RPMS produced in /usr/src/redhat/RPMS/i386 
    and reproduce the problem. 
    The core file should then not be corrupt and doing a 'gdb where'
    will tell us what is causing the problem - please append the
    core file or output of gdb 'where' command generated as above.

Thank You!
 
 
Comment 11 Jason Vas Dias 2005-02-07 20:26:46 EST
I've finally found a dual processor athlon on which I can reproduce
the problem. It would appear to be a glibc bug . You needn't gather
the information requested in the above comment . Installing the 
 i386 glibc may prevent the problem from occurring. I'm continuing 
to investigate - thanks. 
Comment 12 Jason Vas Dias 2005-02-08 18:48:13 EST
I've found the problem. It was a memory corruption issue latent in
all previous dhcp versions, which just happened to trigger the new
glibc / gcc 'FORTIFY_SOURCE' runtime memory validation checks ONLY
on the Athlon FC3 platform - weird! But genuine problems were found
and are fixed with dhcp-3.0.1-32_FC3, which can be downloaded from :
  http://people.redhat.com/~jvdias/DHCP/FC3/3.0.1-32_FC3/i386
Please test this version and let me know if it fixes the problem -
it certainly does on the machine on which I was able to reproduce it.
Thank you!
Comment 13 Warren Sturm 2005-02-08 19:58:25 EST
That seems to have done it.  Tried a few restarts (4), a reboot then a
bunch more restarts(10) without an issue.  Thanks.
Comment 14 Jason Vas Dias 2005-02-10 10:27:48 EST
 I contacted the upstream ISC DHCP maintainer on this issue, and ISC
 have agreed to fix this in the next release .

 But they pointed out that the subnet declaration:
   subnet 68.145.239.64 netmask 255.255.255.255 {}
 is what causes the problem, as a 32-bit netmask was never 
 envisioned to be used here (but it is not forbidden in the
 documentation - it just doesn't make any sense) .
 
 I think what you are trying to achieve is to get DHCP to ignore
 the interface with address 68.145.239.64 ?  This would be 
 achieved by omitting the 68.145.239.64 subnet declaration altogether.
 Yes, dhcp will emit a message about 
  "No subnet declaration for xxxx (68.145.239.64)" 
 but this message is harmless - that interface will still be ignored.
 By default, dhcpd will bind to address 0.0.0.0 (the "ANY" address)
 on the interface for which it has a subnet declaration. Using the
 'local-address' option makes it bind to a specific address - so
 you could specify
  'local-address 10.0.0.1;'
 and dhcpd would bind ONLY to address 10.0.0.1 on the 10.0.0.0/24
 interface.
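
 To check a configuration for the kind of subnet declaration identified above, something like the following could be used. It is only a sketch: find_host_netmask_subnets is a hypothetical helper name, the pattern assumes each subnet declaration starts on its own line, and the /etc/dhcpd.conf path in the example is an assumption about where the configuration lives.

```shell
#!/bin/sh
# find_host_netmask_subnets CONFIG_FILE
# Prints "LINE:TEXT" for every subnet declaration whose netmask is
# 255.255.255.255. Returns non-zero when no such declaration is found.
# Hypothetical helper; it does not parse dhcpd.conf, it only pattern-matches.
find_host_netmask_subnets() {
    grep -nE '^[[:space:]]*subnet[[:space:]]+[0-9.]+[[:space:]]+netmask[[:space:]]+255\.255\.255\.255([^0-9.]|$)' "$1"
}

# Example (path is an assumption):
#   find_host_netmask_subnets /etc/dhcpd.conf
```

 Any line it reports is a candidate for removal, as suggested above.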
 
  
Comment 15 Warren Sturm 2005-02-10 21:15:42 EST
Yep.  That's what I was trying to do.  I have made the changes here and
will 'live' with the error message (until I forget why I did this).

Any day now.  :-)  
Comment 16 Marius Andreiana 2005-08-20 03:20:08 EDT
Closing as errata