Bug 143728

Summary: dhcp6s: segfault and wrong report on changing /etc/resolv.conf
Product: [Fedora] Fedora Reporter: Peter Bieringer <pb>
Component: dhcpv6Assignee: Jason Vas Dias <jvdias>
Status: CLOSED ERRATA QA Contact:
Severity: medium Docs Contact:
Priority: medium    
Version: 3CC: sundaram
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: 0.10-9 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2005-09-05 08:30:40 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 166507    

Description Peter Bieringer 2004-12-25 22:41:52 UTC
Description of problem:

dhcp6c cannot be used, because it won't work as expected.


Version-Release number of selected component (if applicable):

# rpm -q dhcpv6_client
dhcpv6_client-0.10-8

How reproducible:

Always

Steps to Reproduce:
1. Setup dhcp6s on a server
2. run dhcp6c on a client, e.g.
# dhcp6c -D -f eth0

Actual results:

# dhcp6c -D -f eth0
Dec/25/2004 23:15:50 <3>[interface] (9)
Dec/25/2004 23:15:50 <5>[eth0] (4)
Dec/25/2004 23:15:50 <3>begin of closure [{] (1)
Dec/25/2004 23:15:50 <3>comment [#      information-only;] (19)
Dec/25/2004 23:15:50 <3>[send] (4)
Dec/25/2004 23:15:50 <3>[rapid-commit] (12)
Dec/25/2004 23:15:50 <3>end of sentence [;] (1)
Dec/25/2004 23:15:50 <3>[request] (7)
Dec/25/2004 23:15:50 <3>[prefix-delegation] (17)
Dec/25/2004 23:15:50 <3>end of sentence [;] (1)
Dec/25/2004 23:15:50 <3>comment [#information-only;] (18)
Dec/25/2004 23:15:50 <3>comment [#      request temp-address;] (23)
Dec/25/2004 23:15:50 <3>end of closure [}] (1)
Dec/25/2004 23:15:50 <3>end of sentence [;] (1)
Dec/25/2004 23:15:50 extracted an existing DUID from
/var/lib/dhcpv6/dhcp6c_duid: 00:01:00:01:09:60:73:93:00:00:39:ea:77:81
Dec/25/2004 23:15:50 link local addr is fe80::200:39ff:feea:7781
Dec/25/2004 23:15:50 res addr is fe80::200:39ff:feea:7781%eth0/28
Dec/25/2004 23:15:50 found an interface eth0 harware
00:00:39:ffffffea:77:ffffff81
Dec/25/2004 23:15:50  create iaid 3929604096 for interface eth0
Dec/25/2004 23:15:50 found an interface eth0 harware
00:00:39:ffffffea:77:ffffff81
Dec/25/2004 23:15:50  found interface eth0 iaid 3929604096
Dec/25/2004 23:15:50 interface eth0 iaid is 3929604096
Dec/25/2004 23:15:50 open_netlink_socket called
Dec/25/2004 23:15:50 netlink_send_rtmsg called
Dec/25/2004 23:15:50 netlink_recv_rtgenmsg called
Dec/25/2004 23:15:50 netlink_recv_rtgenmsg error
Dec/25/2004 23:15:50 netlink_send_rtgenmsg called
Dec/25/2004 23:15:50 netlink_recv_rtgenmsg called
Dec/25/2004 23:15:50 get_if_flags called
Dec/25/2004 23:15:50 get_if_flags called
Dec/25/2004 23:15:50 netlink_recv_rtgenmsg error
Dec/25/2004 23:15:50 create an event 0x89aa2d0 xid 0 for state 0
Dec/25/2004 23:15:50 reset a timer on eth0, state=INIT, timeo=0,
retrans=851
Dec/25/2004 23:15:51 ifp 0x89a6008 event 0x89aa2d0 a new XID (7314bf)
is generated
Dec/25/2004 23:15:51 set client ID
Dec/25/2004 23:15:51 set opt_8
Dec/25/2004 23:15:51 set rapid commit
Dec/25/2004 23:15:51 set IA_PD iaidinfo: iaid 3929604096 renewtime 0
rebindtime 0
Dec/25/2004 23:15:51 set opt_30
Dec/25/2004 23:15:51 send dst if eth0 addr is ff02::1:2%eth0 scope id is 2
Dec/25/2004 23:15:51 send solicit to ff02::1:2%eth0
Dec/25/2004 23:15:51 DUID is
00:01:00:01:09:60:73:93:00:00:39:ea:77:81, DUID_LEN is 14
Dec/25/2004 23:15:51 removing ID (ID:
00:01:00:01:09:60:73:93:00:00:39:ea:77:81)
Dec/25/2004 23:15:51 DUID is , DUID_LEN is 0
Dec/25/2004 23:15:51 reset a timer on eth0, state=SOLICIT, timeo=0,
retrans=1016
Dec/25/2004 23:15:51 receive packet info ifname eth0, addr is
fe80::200:39ff:feea:7781 scope id is 2
Dec/25/2004 23:15:51 receive reply from fe80::250:4ff:febd:b485%eth0
scope id 2 eth0
Dec/25/2004 23:15:51 get DHCP option client ID, len 14
Dec/25/2004 23:15:51   DUID: 00:01:00:01:09:60:73:93:00:00:39:ea:77:81
Dec/25/2004 23:15:51 get DHCP option server ID, len 14
Dec/25/2004 23:15:51   DUID: 00:01:00:01:09:60:5e:bc:00:50:04:bd:b4:85
Dec/25/2004 23:15:51 get DHCP option rapid commit, len 0
Dec/25/2004 23:15:51 get DHCP option opt_30, len 47
Dec/25/2004 23:15:51 get option iaid is 3929604096, renewtime 60,
rebindtime 90
Dec/25/2004 23:15:51   IA address option: opt_31, len 31
Dec/25/2004 23:15:51   get IAPREFIX prefix information:
fec0:0:0:1::/64 preferlifetime 130 validlifetime 200
Dec/25/2004 23:15:51 status code for this prefix is: success
Dec/25/2004 23:15:51 get DHCP option preference, len 1
Dec/25/2004 23:15:51 get option preferrence is ff
Dec/25/2004 23:15:51 get DHCP option status code, len 2
Dec/25/2004 23:15:51   this message status code: success
Dec/25/2004 23:15:51 get DHCP option DNS_RESOLVERS, len 16
Dec/25/2004 23:15:51 get DHCP option opt_26, len 18
Dec/25/2004 23:15:51 expand domain name pbg.bieringer.de, size 16
Dec/25/2004 23:15:51 reply message XID is (7314bf)
Dec/25/2004 23:15:51 ifp 0x89a6008 event 0x89aa2d0 id is 7314bf
Dec/25/2004 23:15:51 serverID is
00:01:00:01:09:60:5e:bc:00:50:04:bd:b4:85 len is 14
Dec/25/2004 23:15:51 new server DUID
00:01:00:01:09:60:5e:bc:00:50:04:bd:b4:85, len 14
Dec/25/2004 23:15:51 parse domain name dyn.pbg.bieringer.de
Dec/25/2004 23:15:51 parse domain name pbg.bieringer.de
Dec/25/2004 23:15:51 parse domain name bieringer.de
Dec/25/2004 23:15:51 parse name server fec0:0:0:1::1
Dec/25/2004 23:15:51  received nameserver[0] fec0:0:0:1::1
Dec/25/2004 23:15:51 nameserver fec0:0:0:1::1 found in resolv.conf
Dec/25/2004 23:15:51  received domainname[0] pbg.bieringer.de
Dec/25/2004 23:15:51 domain name pbg.bieringer.de found in resolv.conf
Dec/25/2004 23:15:51  rename failed for resolv.conf file
Segmentation fault


Expected results:

Working


Additional info:

For me it looks that at least the additional patch is wrong:
dhcpv6-0.10-change_resolv_conf.patch

Function "int change_resolv_conf(" either returns 1 or -1, but below
return value is compared against 0:

-       if (rename(resolv_dhcpv6_file, RESOLV_CONF_FILE)) {
+       if (change_resolv_conf(resolv_dhcpv6_file)!=0) {
                dprintf(LOG_ERR, "%s" " rename failed for resolv.conf
file", FNAME);
                return(-1);


Therefore it will never be counted as succesful, but as far as I see
the external call did the change in /etc/resolv.conf.


The reason, why it crashed afterwards can be perhaps seen here:

Last lines of
# strace -f dhcp6c -D -f eth0

[pid  5555] rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
[pid  5555] stat64("/var/lock/subsys/nscd", 0xfeeca1b0) = -1 ENOENT
(No such file or directory)
[pid  5555] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
[pid  5555] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
[pid  5555] exit_group(1)               = ?
Process 5554 resumed
Process 5555 detached
<... waitpid resumed> [{WIFEXITED(s) && WEXITSTATUS(s) == 1}], 0) = 5555
--- SIGCHLD (Child exited) @ 0 (0) ---
time(NULL)                              = 1104013342
write(2, "Dec/25/2004 23:22:22  rename fai"..., 57Dec/25/2004 23:22:22
 rename failed for resolv.conf file
) = 57
close(10)                               = 0
munmap(0xf6ffe000, 4096)                = 0
close(10)                               = -1 EBADF (Bad file descriptor)
ioctl(-1, SNDCTL_TMR_TIMEBASE or TCGETS, 0xfee9a6f8) = -1 EBADF (Bad
file descriptor)
mmap2(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
0) = 0xf6ffd000
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV (core dumped) +++

Backtrace in core:

#0  0x002630cc in memcpy () from /lib/tls/libc.so.6
(gdb) bt
#0  0x002630cc in memcpy () from /lib/tls/libc.so.6
#1  0x00256d74 in _IO_file_xsgetn_internal () from /lib/tls/libc.so.6
#2  0x00257bd8 in _IO_sgetn_internal () from /lib/tls/libc.so.6
#3  0x0024c152 in fread () from /lib/tls/libc.so.6
#4  0x08059259 in dprintf ()
#5  0x08059589 in dprintf ()
#6  0x0804ba40 in ?? ()
#7  0xfee9aa00 in ?? ()
#8  0x08068e08 in optind ()
#9  0x0805ffcd in _IO_stdin_used ()
#10 0x08068e40 in optind ()
#11 0x0000000e in ?? ()
#12 0x00000002 in ?? ()
#13 0x083d8098 in ?? ()
#14 0x00000000 in ?? ()

Comment 1 Peter Bieringer 2004-12-25 22:42:48 UTC
Grmml, mean dhcp6c in subject, not dhcp6s

Comment 2 Jason Vas Dias 2005-01-03 14:26:38 UTC
I am now investigating and will remedy ASAP.

Comment 3 Jason Vas Dias 2005-01-05 18:20:28 UTC
I've been trying to reproduce this problem, so far without success .

Firstly, this line is correct :
+       if (change_resolv_conf(resolv_dhcpv6_file)!=0) {

The change_resolv_conf file will invoke the
/etc/sysconfig/network-scripts/network-functions  
'change_resolv_conf' function which restarts nscd if it is running -
ONLY if the new resolv.conf file failed to be written will this 
function return a non-zero (failure) value, which is returned by 
waitpid() to the change_resolv_conf() function and thence to the 
caller : update_resolver() . 
update_resolver() then prints the "rename failed for resolv.conf file"
message and returns to resolv_parse(), which returns to dhcp6c.c, 
which ignores the status. 
So it is what dhcp6c then goes on to do, after the change_resolv_conf
file returns !=0, that is causing the core dump. 

I've been able to set up a very simple server and client, get a lease,
and the resolv.conf file is changed successfully, with no core dumps,
so I'm not able to reproduce this problem - perhaps there is something
in your dhcpv6 config files that differs to mine ?

Please could you attach your dhcp6c + dhcp6s config files to this bug
(or send them to jvdias) .

If you can still reproduce the problem, please download and install
the dhcpv6-debuginfo-0.10-8.$arch.rpm for your architecture, from:
  http://people.redhat.com/~jvdias/dhcpv6/0.10-8
and then reproduce the problem - gdb will then be able to resolve
symbol information from the core file which should tell us the 
source code file and line number where the core occurred .






Comment 4 Peter Bieringer 2005-01-07 10:40:41 UTC
Here the new debug output, note that the segfault was caused by libc,
so I installed the glibc-debuginfo* rpm:

# LANG="C" gdb /sbin/dhcp6c core.7603
GNU gdb Red Hat Linux (6.1post-1.20040607.43rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and
you are
welcome to change it and/or distribute copies of it under certain
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for
details.
This GDB was configured as "i386-redhat-linux-gnu"...(no debugging
symbols found)...Using host libthread_db library
"/lib/tls/libthread_db.so.1".

Core was generated by `dhcp6c -D -f eth0'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib/libresolv.so.2...Reading symbols from
/usr/lib/debug/lib/libresolv-2.3.4.so.debug...done.
done.
Loaded symbols for /lib/libresolv.so.2
Reading symbols from /lib/libcrypto.so.4...done.
Loaded symbols for /lib/libcrypto.so.4
Reading symbols from /lib/tls/libc.so.6...Reading symbols from
/usr/lib/debug/lib/tls/libc-2.3.4.so.debug...done.
done.
Loaded symbols for /lib/tls/libc.so.6
Reading symbols from /usr/lib/libgssapi_krb5.so.2...done.
Loaded symbols for /usr/lib/libgssapi_krb5.so.2
Reading symbols from /usr/lib/libkrb5.so.3...done.
Loaded symbols for /usr/lib/libkrb5.so.3
Reading symbols from /lib/libcom_err.so.2...done.
Loaded symbols for /lib/libcom_err.so.2
Reading symbols from /usr/lib/libk5crypto.so.3...done.
Loaded symbols for /usr/lib/libk5crypto.so.3
Reading symbols from /lib/libdl.so.2...Reading symbols from
/usr/lib/debug/lib/libdl-2.3.4.so.debug...done.
done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /usr/lib/libz.so.1...done.
Loaded symbols for /usr/lib/libz.so.1
Reading symbols from /lib/ld-linux.so.2...Reading symbols from
/usr/lib/debug/lib/ld-2.3.4.so.debug...done.
done.
Loaded symbols for /lib/ld-linux.so.2
#0  ?? () at ../sysdeps/i386/i686/memcpy.S:59 from /lib/tls/libc.so.6
59      ../sysdeps/i386/i686/memcpy.S: No such file or directory.
        in ../sysdeps/i386/i686/memcpy.S
(gdb) bt
#0  ?? () at ../sysdeps/i386/i686/memcpy.S:59 from /lib/tls/libc.so.6
#1  0x00349d74 in _IO_file_xsgetn (fp=0x875a7a0, data=0x875aaa1,
n=8192) at fileops.c:1394
#2  0x0034abd8 in _IO_sgetn (fp=0xffbea7c8, data=0x875aaa1,
n=141929121) at genops.c:490
#3  0x0033f152 in _IO_fread (buf=0x875aaa1, size=1, count=8192,
fp=0x875a7a0) at iofread.c:44
#4  0x08059259 in dprintf ()
#5  0x08059589 in dprintf ()
#6  0x0804ba40 in ?? ()
#7  0xfef90bd0 in ?? ()
#8  0x08068e08 in optind ()
#9  0x0805ffcd in _IO_stdin_used ()
#10 0x08068e40 in optind ()
#11 0x0000000e in ?? ()
#12 0x00000002 in ?? ()
#13 0x08750098 in ?? ()
#14 0x00000000 in ?? ()


My configs:
# cat /etc/dhcp6s.conf
interface eth0 {
        server-preference 255;
        renew-time 60;
        rebind-time 90;
        prefer-life-time 130;
        valid-life-time 200;
        allow rapid-commit;
        option dns_servers fec0:0:0:1::1 muc.bieringer.de;

        link AAA {
                pool{
                        range fec0:0:0:1::10 to fec0:0:0:1::19/64;
                        prefix fec0:0:0:1::/64;
                };
        };
};


# cat /etc/dhcp6c.conf
interface eth0 {
        send rapid-commit;
        request prefix-delegation;
};

# cat /etc/resolv.conf
; generated by /sbin/dhclient-script
search muc.bieringer.de
nameserver 192.168.1.1
nameserver fec0:0:0:1::1

BTW: is it a good idea to store the temporary files in /etc? Each
crash leaves one like /etc/resolv.conf.dhcpv67603






Comment 5 Jason Vas Dias 2005-01-07 13:59:53 UTC
Thank you! With the configs you sent I should be able to duplicate
this problem and have it fixed by the end of today.
BTW, even though the core is in glibc, it is the stack frame of 
dhcp6c that was of interest, which is not shown . But I should have
enough info to fix now - thanks.

Comment 6 Jason Vas Dias 2005-01-08 01:19:14 UTC
 I've fixed the problem causing the core - it was new behaviour
 of flex-2.5.4 making the conf file parsers fail.
 
 dhcpv6-0.10-9 has the fix, and will be in FC3 updates shortly .
 Meanwhile, you can download it for the i386 from :
     http://people.redhat.com/~jvdias/dhcpv6/FC3 .
 
 The configuration you gave is for "Prefix Delegation" to a radvd
 router - dhcpv6 will write the resolv.conf and radvd.conf on the
 client, which is able to get the lease for the prefix .

 I would have expected it to allocate client addresses from the pool,
 like DHCP does - but no. Documentation for dhcpv6 is awful and 
 I will try to improve it .  I've only been able to allocate static
 leases with 'host' declarations that supply both the DUID and 
 IAIDINFO for each host . If you get dynamic address lease pools
 working, please let me know the config - thanks .  
 


Comment 7 Peter Bieringer 2005-01-12 16:39:06 UTC
I have the same problems with address delegation, but the crash issue
is fixed, that's ok for now.