Bug 722581

Summary: Fails to start if iface not up | segfault on IgnoreMissing
Product: [Fedora] Fedora
Component: radvd
Version: 15
Hardware: Unspecified
OS: Unspecified
Status: CLOSED CURRENTRELEASE
Severity: unspecified
Priority: unspecified
Reporter: Pete Zaitcev <zaitcev>
Assignee: Petr Pisar <ppisar>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: jskala
URL: http://lists.litech.org/pipermail/radvd-devel-l/2010-September/000491.html
Doc Type: Bug Fix
Last Closed: 2012-03-29 17:03:07 UTC
Attachments: radvd.conf (flags: none)

Description Pete Zaitcev 2011-07-15 17:46:33 UTC
Description of problem:

On a home router with 1 ethernet (over USB) and 1 wifi interface,
radvd fails to start with:

Jul 15 11:37:53 elanor radvd[1357]: version 1.7 started
Jul 15 11:37:53 elanor radvd[1357]: interface wlanhome is not RUNNING
Jul 15 11:37:53 elanor radvd[1357]: interface wlanhome does not exist
Jul 15 11:37:53 elanor radvd[1357]: error parsing or activating the config file: /etc/radvd.conf
Jul 15 11:37:53 elanor radvd[1357]: Exiting, failed to read config file.

It appears that some dependencies are not satisfied under systemd.
Something permits radvd to proceed before interfaces are up, which
was not a problem before. This issue appeared after an upgrade from
Fedora 14.
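
For readers unfamiliar with the "is not RUNNING" message: on Linux, the state of an interface can be queried with the SIOCGIFFLAGS ioctl. The sketch below shows one way to do that; it is an illustration only, not radvd's actual code, and the helper name iface_is_running is hypothetical.

```c
#define _DEFAULT_SOURCE 1   /* expose struct ifreq on glibc under strict -std modes */
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <net/if.h>
#include <sys/ioctl.h>
#include <sys/socket.h>

/* Hypothetical helper (not taken from radvd).
 * Returns  1 if the interface is both UP and RUNNING,
 *          0 if it exists but is not,
 *         -1 if it cannot be queried (for example, it does not exist). */
int iface_is_running(const char *name)
{
    const int want = IFF_UP | IFF_RUNNING;
    struct ifreq ifr;
    int fd, rc;

    assert(name != NULL);

    /* Any socket will do; it only serves as a handle for the ioctl. */
    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, name, IFNAMSIZ - 1);

    rc = ioctl(fd, SIOCGIFFLAGS, &ifr);
    close(fd);
    if (rc < 0)
        return -1;

    return (ifr.ifr_flags & want) == want;
}
```

A caller would pass the interface name from the configuration, for example iface_is_running("wlanhome"). At boot, before hostapd has brought the interface up, such a check would be expected to return 0 or -1.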

Version-Release number of selected component (if applicable):

radvd-1.7-3.fc15.i686
systemd-26-8.fc15.i686

How reproducible:

Every time on reboot.

Steps to Reproduce:
1. Configure a system with hostapd controlling an interface
2. Configure radvd
3. reboot
  
Actual results:

radvd fails to start

Expected results:

radvd starts

Additional info:

radvd.conf to be attached

Comment 1 Pete Zaitcev 2011-07-15 17:52:57 UTC
Created attachment 513422
radvd.conf

Comment 2 Pete Zaitcev 2011-07-15 17:54:08 UTC
Also, this may be obvious, but just to be clear: if I log in over ssh
and say "systemctl start radvd.service", everything starts right up,
because by that time hostapd brought up wlanhome.

Comment 3 Pete Zaitcev 2011-07-15 18:02:41 UTC
Before anyone asks: if I uncomment IgnoreIfMissing, the following happens:

Jul 15 11:58:44 elanor radvd[1367]: interface wlanhome seems to have come back up, trying to reinitialize
Jul 15 11:58:44 elanor radvd[1367]: attempting to reread config file
Jul 15 11:58:44 elanor radvd[1367]: resuming normal operation
Jul 15 11:58:44 elanor kernel: [   40.608051] radvd[1367]: segfault at 50 ip 0080b14a sp bff5b3a0 error 6 in radvd[807000+63000]
Jul 15 11:58:44 elanor systemd[1]: radvd.service: main process exited, code=killed, status=11

Comment 4 Jiri Skala 2011-08-12 10:56:28 UTC
(In reply to comment #3)
> Jul 15 11:58:44 elanor radvd[1367]: interface wlanhome seems to have come back
> up, trying to reinitialize

Hmm, this looks like radvd does check whether the interface is available. I am not able to reproduce it. Could you provide a backtrace?

Please also keep an eye on the following bug:

https://bugzilla.redhat.com/show_bug.cgi?id=729183

Comment 5 Jiri Skala 2011-10-06 06:10:28 UTC
Do you have any response to comment #4?

Comment 6 Pete Zaitcev 2011-10-12 05:09:39 UTC
Core seems to be impossible to catch, even with a shell wrapper, dunno
what's up with that. But gdb -p says:

Program received signal SIGSEGV, Segmentation fault.
0x002c114a in alarm_handler (sig=14) at timer.c:152
152                     tm->prev->next = tm->next;
(gdb) where
#0  0x002c114a in alarm_handler (sig=14) at timer.c:152
#1  <signal handler called>
#2  0x00447416 in __kernel_vsyscall ()
#3  0x008b7ccd in ___newselect_nocancel () from /lib/libc.so.6
#4  0x002bfc5e in recv_rs_ra (sock=4, msg=0xbfbecdc4 "\206", addr=0xbfbed3a0, 
    pkt_info=0xbfbecdc0, hoplimit=0xbfbecdbc) at recv.c:46
#5  0x002bf087 in main (argc=3, argv=0xbfbed484) at radvd.c:341
(gdb)

Comment 7 Pete Zaitcev 2011-10-19 03:56:07 UTC
radvd-1.8.2-2.fc15 may be ok. I jerked the interface around, it survived.
Waiting for a reboot to confirm.

Comment 8 Fedora Admin XMLRPC Client 2012-03-29 14:21:46 UTC
This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.

Comment 9 Petr Pisar 2012-03-29 15:06:42 UTC
I experienced this problem too (on Gentoo), and upgrading to 1.8.1 fixed it. I will dig through the sources to confirm that it is indeed fixed.

Comment 10 Petr Pisar 2012-03-29 17:03:07 UTC
The 1.7 code is:
    while (tm->next && tm->prev && tm->expires.tv_sec != LONG_MAX && check_time_diff(tm, tv))
    {
        tm->prev->next = tm->next;

The last line segfaults. It could segfault only if tm or tm->prev were NULL, but tm->prev is checked in the while condition, and a NULL tm would already segfault in the condition. The code is called from the SIGALRM handler, and tm is taken from a linked list. I guess there was a race between the check in the condition and the assignment.
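
One common way such a race arises is a signal handler walking a list that the main program was in the middle of updating. The sketch below shows a generic guard against that: block SIGALRM while the pointers are being changed. It is an assumption-laden illustration, not the fix that went into radvd 1.8; struct timer_node and timer_unlink are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L   /* for sigprocmask() under strict -std modes */
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical stand-in for a timer list node; not radvd's real structure. */
struct timer_node {
    struct timer_node *next;
    struct timer_node *prev;
};

/* Unlink a node from a doubly linked list with SIGALRM blocked, so that a
 * SIGALRM handler walking the same list cannot run while the neighbours'
 * pointers are only half updated. */
void timer_unlink(struct timer_node *tm)
{
    sigset_t block, saved;

    assert(tm != NULL);

    sigemptyset(&block);
    sigaddset(&block, SIGALRM);
    sigprocmask(SIG_BLOCK, &block, &saved);   /* enter critical section */

    if (tm->prev)
        tm->prev->next = tm->next;
    if (tm->next)
        tm->next->prev = tm->prev;
    tm->next = NULL;
    tm->prev = NULL;

    sigprocmask(SIG_SETMASK, &saved, NULL);   /* restore the previous mask */
}
```

Every code path outside the handler that inserts into or removes from the list would need the same guard; the handler itself runs with SIGALRM blocked by default unless it was installed with SA_NODEFER.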

Studying radvd-1.8 shows the code has changed significantly. Both the handler and recv_rs_ra() have changed, and radvd-1.8 now uses netlink in more cases.

I ran some tests with the version you reported and with the latest F15 version, and neither of them crashed (I played with two pairs of veth devices).

I believe the crash is fixed in 1.8.2.

Regarding the premature exit when IgnoreIfMissing is off: I think you need to enable this option or modify the systemd unit file yourself. The distribution cannot wait for all devices because they can appear and disappear dynamically. Also, IgnoreIfMissing has been on by default since radvd-1.8.
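
For illustration, enabling the option in radvd.conf might look like the fragment below. This is a minimal sketch, not the reporter's attached file: the interface name is taken from the logs above, and the prefix is a documentation placeholder.

```
# Hypothetical fragment; 2001:db8:0:1::/64 is a placeholder prefix.
interface wlanhome
{
    IgnoreIfMissing on;
    AdvSendAdvert on;
    prefix 2001:db8:0:1::/64
    {
    };
};
```

With the option on, radvd is expected to keep running when the interface is absent at startup and to start advertising once the interface appears.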