Bug 1058698
Summary: radvd segfaulting at clear_timer
Product: Red Hat Enterprise Linux 6
Component: radvd
Version: 6.5
Hardware: All
OS: Linux
Status: CLOSED WONTFIX
Severity: high
Priority: unspecified
Target Milestone: rc
Target Release: ---
Reporter: Michal Bruncko <michal.bruncko>
Assignee: Pavel Šimerda (pavlix) <psimerda>
QA Contact: qe-baseos-daemons
Docs Contact: Filip Hanzelka <fhanzelk>
CC: herrold, psimerda, psklenar, thozza
Doc Type: Known Issue
Doc Text: The *radvd* daemon occasionally terminates unexpectedly due to a race condition. In the *Router Advertisement Daemon* (radvd), there is a race condition in *radvd* timer handling. Consequently, *radvd* occasionally terminates unexpectedly.
Last Closed: 2016-11-08 15:08:37 UTC
Type: Bug
Bug Blocks: 1159820, 1356054, 1359261
Description
Michal Bruncko
2014-01-28 11:41:12 UTC
Created attachment 856539
icmpv6 radvd, rasol capture when radvd segfaults
I am not sure if this helps, but I tried to capture the traffic responsible for the radvd segfault.
Jan 28 12:53:52 gateway1 kernel: : radvd[21090]: segfault at 20 ip 00007f1546ae15b5 sp 00007fff16884c70 error 6 in radvd[7f1546add000+16000]
You can find the corresponding packet (the one radvd tried to respond to) in the capture.
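
For readers unfamiliar with the kernel "segfault" line above, here is a minimal sketch of how its fields can be decoded. It assumes the usual x86 layout of that message (faulting address, instruction pointer, page-fault error code, then mapping base and size in hex); the function name fault_offset and the PF_* macro names are chosen here for illustration and are not taken from radvd.

```c
#include <assert.h>
#include <stdint.h>

/* x86 page-fault error code bits, as printed after "error" in the log line. */
#define PF_PROT  0x1u /* 0: page not present, 1: protection violation */
#define PF_WRITE 0x2u /* 0: read access,      1: write access         */
#define PF_USER  0x4u /* 0: kernel mode,      1: user mode            */

/* Offset of the faulting instruction inside the mapped object:
 * "ip" minus the mapping base from "radvd[base+size]".
 * (Hypothetical helper, named for this example only.) */
uint64_t fault_offset(uint64_t ip, uint64_t map_base, uint64_t map_size)
{
    /* The instruction pointer must lie inside the reported mapping. */
    assert(ip >= map_base && ip - map_base < map_size);
    return ip - map_base;
}
```

For the line above, the offset works out to 0x7f1546ae15b5 - 0x7f1546add000 = 0x45b5, and error code 6 has the write and user bits set with the protection bit clear, i.e. a user-mode write to a page that is not mapped. A fault address of 0x20 is consistent with writing a struct field through a NULL pointer, although the log line alone does not prove that.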

Just FYI: I tried running radvd with SELinux in permissive mode, with the same results.

Looks similar to: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=611297

Yes, it looks like that. Is it possible to build a new version from upstream? Currently radvd is completely unusable for us, as it segfaults regularly.

Thank you for taking the time to report this issue to us. We appreciate the feedback and use reports such as this one to guide our efforts at improving our products. That being said, this bug tracking system is not a mechanism for requesting support, and we are not able to guarantee the timeliness or suitability of a resolution. If this issue is critical or in any way time sensitive, please raise a ticket through the regular Red Hat support channels to ensure it receives the proper attention and prioritization to assure a timely resolution. For information on how to contact the Red Hat production support team, please visit: https://www.redhat.com/support/process/production/#howto

I understand such a reply template in cases where I am requesting a new feature or a fix for a minor bug (with specific circumstances), but not for segmentation faults with 100% reproducibility (and where a patch is already available). All my other segfault reports against various RHEL packages were handled without the need for such formulaic responses.
I have not asked for any ETA for fixing this issue; I have only asked whether it will be fixed in the current RHEL version at all, nothing more, nothing less.
This bug reporting system is great in that I can give package maintainers feedback about their packages and help improve the distribution as a whole, as I have done many times before.
> Thank you for taking the time to report this issue to us. We appreciate the
> feedback and use reports such as this one to guide our efforts at improving our
> products.
Yes, maybe next time I will simply decide whether to report any further issues with this specific package. I have built radvd myself.

(In reply to Michal Bruncko from comment #7)
> I understand such a reply template in cases where I am requesting a new
> feature or a fix for a minor bug (with specific circumstances), but not for
> segmentation faults with 100% reproducibility (and where a patch is already
> available).
You requested information regarding RHEL update planning with respect to this specific bug, and such requests are handled by Red Hat support.
> All my other segfault reports against various RHEL packages were handled
> without the need for such formulaic responses.
> I have not asked for any ETA for fixing this issue; I have only asked whether
> it will be fixed in the current RHEL version at all, nothing more, nothing less.
The purpose of the template is to direct any interested customers to the right person to discuss updates as quickly as possible. If it doesn't apply to you, you can ignore it.
> This bug reporting system is great in that I can give package maintainers
> feedback about their packages and help improve the distribution as a whole,
> as I have done many times before.
I, as a maintainer, appreciate the report.
> Yes, maybe next time I will simply decide whether to report any further
> issues with this specific package.
While it is a valid decision, please note that I'm giving you as much feedback as possible, including the link to a similar bug report in another distribution for reference.
> I have built radvd myself.
With a patch from the Debian bug link or a newer upstream version of radvd, if I may ask?

> While it is a valid decision, please note that I'm giving you as much feedback as possible, including the link to a similar bug report in another distribution for reference.
I appreciate this as well. The first thing I do when I find any such failure is to report it to the maintainer as soon as possible. In this case I had not even searched Google for possible resolutions. On the other hand, I tried to collect as much information about the failure itself as I could.
The functionality of the radvd component is not so critical to us, as the IPv6 stack is only supplementary (but desirable) to the existing, parallel IPv4 infrastructure.
> With a patch from the Debian bug link or a newer upstream version of radvd, if I may ask?
It was the patch provided in the Debian bug report [1], as I wanted to stay as compatible as possible with the existing distro packages and versions. I rebuilt it yesterday, but today I found that radvd had segfaulted again, this time in an external component:
radvd[19295]: segfault at 0 ip (null) sp 00007fffab242598 error 4 in libnss_files-2.12.so[7f7584725000+c000]
It looks to me as if setting tm->prev and tm->next to NULL (as that patch does) is definitely not a good idea, but of course this is only my layman's opinion, as I don't know the purpose of tm->prev/next. I am now trying to trace this SIGSEGV back. Judging by the fix in the Debian bug report, though, they decided to move to the new upstream version instead. Maybe tomorrow morning I will have output from a backtrace, as there are no people on the network in the meantime.
For me the issue started when I moved (with a clean install) from a physical machine (i386 arch) to a virtual machine (x86_64 arch) on top of a XenServer virtualization host (the system version remains the same).
[1] - http://bugs.debian.org/cgi-bin/bugreport.cgi?msg=5;filename=timer.c.patch;att=1;bug=611297

Got it:
# gdb --args /usr/sbin/radvd -u radvd -d 5 -s -m stderr
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-60.el6_4.1)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/sbin/radvd...Reading symbols from /usr/lib/debug/usr/sbin/radvd.debug...done.
done.
(gdb) run
Starting program: /usr/sbin/radvd -u radvd -d 5 -s -m stderr
[Feb 05 00:29:58] radvd: version 1.6 started
[Feb 05 00:29:58] radvd: mtu for eth0 is 1500
[Feb 05 00:29:58] radvd: hardware type for eth0 is 1
[Feb 05 00:29:58] radvd: link layer token length for eth0 is 48
[Feb 05 00:29:58] radvd: prefix length for eth0 is 64
[Feb 05 00:29:58] radvd: interface definition for eth0 is ok
[Feb 05 00:29:58] radvd: mtu for eth2.10 is 1500
[Feb 05 00:29:58] radvd: hardware type for eth2.10 is 1
[Feb 05 00:29:58] radvd: link layer token length for eth2.10 is 48
[Feb 05 00:29:58] radvd: prefix length for eth2.10 is 64
[Feb 05 00:29:58] radvd: interface definition for eth2.10 is ok
[Feb 05 00:29:58] radvd: mtu for eth2.11 is 1500
[Feb 05 00:29:58] radvd: hardware type for eth2.11 is 1
[Feb 05 00:29:58] radvd: link layer token length for eth2.11 is 48
[Feb 05 00:29:58] radvd: prefix length for eth2.11 is 64
[Feb 05 00:29:58] radvd: interface definition for eth2.11 is ok
[Feb 05 00:29:58] radvd: mtu for eth2.12 is 1500
[Feb 05 00:29:58] radvd: hardware type for eth2.12 is 1
[Feb 05 00:29:58] radvd: link layer token length for eth2.12 is 48
[Feb 05 00:29:58] radvd: prefix length for eth2.12 is 64
[Feb 05 00:29:58] radvd: interface definition for eth2.12 is ok
[Feb 05 00:29:58] radvd: mtu for eth3.100 is 1500
[Feb 05 00:29:58] radvd: hardware type for eth3.100 is 1
[Feb 05 00:29:58] radvd: link layer token length for eth3.100 is 48
[Feb 05 00:29:58] radvd: prefix length for eth3.100 is 64
[Feb 05 00:29:58] radvd: interface definition for eth3.100 is ok
[Feb 05 00:29:58] radvd: mtu for eth3.101 is 1500
[Feb 05 00:29:58] radvd: hardware type for eth3.101 is 1
[Feb 05 00:29:58] radvd: link layer token length for eth3.101 is 48
[Feb 05 00:29:58] radvd: prefix length for eth3.101 is 64
[Feb 05 00:29:58] radvd: interface definition for eth3.101 is ok
[Feb 05 00:29:58] radvd: mtu for eth3.102 is 1500
[Feb 05 00:29:58] radvd: hardware type for eth3.102 is 1
[Feb 05 00:29:58] radvd: link layer token length for eth3.102 is 48
[Feb 05 00:29:58] radvd: prefix length for eth3.102 is 64
[Feb 05 00:29:58] radvd: interface definition for eth3.102 is ok
[Feb 05 00:29:58] radvd: mtu for eth3.103 is 1500
[Feb 05 00:29:58] radvd: hardware type for eth3.103 is 1
[Feb 05 00:29:58] radvd: link layer token length for eth3.103 is 48
[Feb 05 00:29:58] radvd: prefix length for eth3.103 is 64
[Feb 05 00:29:58] radvd: interface definition for eth3.103 is ok
[Feb 05 00:29:58] radvd: mtu for eth3.104 is 1500
[Feb 05 00:29:58] radvd: hardware type for eth3.104 is 1
[Feb 05 00:29:58] radvd: link layer token length for eth3.104 is 48
[Feb 05 00:29:58] radvd: prefix length for eth3.104 is 64
[Feb 05 00:29:58] radvd: interface definition for eth3.104 is ok
[Feb 05 00:29:58] radvd: mtu for eth3.120 is 1500
[Feb 05 00:29:58] radvd: hardware type for eth3.120 is 1
[Feb 05 00:29:58] radvd: link layer token length for eth3.120 is 48
[Feb 05 00:29:58] radvd: prefix length for eth3.120 is 64
[Feb 05 00:29:58] radvd: interface definition for eth3.120 is ok
[Feb 05 00:29:58] radvd: mtu for eth4 is 1500
[Feb 05 00:29:58] radvd: hardware type for eth4 is 1
[Feb 05 00:29:58] radvd: link layer token length for eth4 is 48
[Feb 05 00:29:58] radvd: prefix length for eth4 is 64
[Feb 05 00:29:58] radvd: interface definition for eth4 is ok
[Feb 05 00:29:58] radvd: mtu for eth5 is 1500
[Feb 05 00:29:58] radvd: hardware type for eth5 is 1
[Feb 05 00:29:58] radvd: link layer token length for eth5 is 48
[Feb 05 00:29:58] radvd: prefix length for eth5 is 64
[Feb 05 00:29:58] radvd: interface definition for eth5 is ok
[Feb 05 00:29:58] radvd: failed to set CurHopLimit (64) for eth5: Permission denied
[Feb 05 00:29:58] radvd: failed to set CurHopLimit (64) for eth4: Permission denied
[Feb 05 00:29:58] radvd: failed to set CurHopLimit (64) for eth3.120: Permission denied
[Feb 05 00:29:58] radvd: failed to set CurHopLimit (64) for eth3.104: Permission denied
[Feb 05 00:29:58] radvd: failed to set CurHopLimit (64) for eth3.103: Permission denied
[Feb 05 00:29:58] radvd: failed to set CurHopLimit (64) for eth3.102: Permission denied
[Feb 05 00:29:58] radvd: failed to set CurHopLimit (64) for eth3.101: Permission denied
[Feb 05 00:29:58] radvd: failed to set CurHopLimit (64) for eth3.100: Permission denied
[Feb 05 00:29:58] radvd: failed to set CurHopLimit (64) for eth2.12: Permission denied
[Feb 05 00:29:58] radvd: failed to set CurHopLimit (64) for eth2.11: Permission denied
[Feb 05 00:29:58] radvd: failed to set CurHopLimit (64) for eth2.10: Permission denied
[Feb 05 00:29:58] radvd: failed to set CurHopLimit (64) for eth0: Permission denied
^C
Program received signal SIGINT, Interrupt.
0x00007ffff7b145c3 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:82
82 T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
(gdb) b abort
Breakpoint 1 at 0x7ffff7a66f90: file abort.c, line 54.
(gdb) cont
Continuing.
[Feb 05 00:31:16] radvd: recvmsg len=64
[Feb 05 00:31:16] radvd: if_index 8
[Feb 05 00:31:16] radvd: received packet from unknown interface: 8
[Feb 05 00:32:56] radvd: recvmsg len=16
[Feb 05 00:32:56] radvd: if_index 15
[Feb 05 00:32:56] radvd: found Interface: eth3.100
[Feb 05 00:32:56] radvd: random mdelay for eth3.100: 95.81
[Feb 05 00:32:56] radvd: calling schedule_timer from clear_timer context
[Feb 05 00:32:56] radvd: sending RA on eth3.100
[Feb 05 00:32:56] radvd: setting timer: 81.95 secs
[Feb 05 00:32:56] radvd: setting timer: 81 secs 952257 usecs
[Feb 05 00:32:56] radvd: calling schedule_timer from set_timer context
[Feb 05 00:32:56] radvd: calling alarm: 81 secs, 952211 usecs
[Feb 05 00:33:47] radvd: recvmsg len=64
[Feb 05 00:33:47] radvd: if_index 8
[Feb 05 00:33:47] radvd: received packet from unknown interface: 8
[Feb 05 00:34:18] radvd: check_time_diff, difference: -1 sec + 998010 usec
Program received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
(gdb) bt
#0 0x0000000000000000 in ?? ()
#1 0x00007ffff7fed8bb in alarm_handler (sig=<value optimized out>) at timer.c:164
#2 <signal handler called>
#3 0x00007ffff7b145c3 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:82
#4 0x00007ffff7febcb5 in recv_rs_ra (sock=7, msg=0x7fffffffdd40 "\206", addr=0x7fffffffe320, pkt_info=0x7fffffffdd30, hoplimit=0x7fffffffdd38) at recv.c:46
#5 0x00007ffff7fed1eb in main (argc=<value optimized out>, argv=<value optimized out>) at radvd.c:313

So is it because of a specific incoming packet or a specific condition not directly related to the packet?

I have no idea :) The first segfault was caused by the packet that I've already attached here. I think the same packet is the source of the second SIGSEGV. The situation looks the same, because the SIGSEGV occurs just as it did before applying patch [1]; it just segfaults in another part of the code. Maybe using tcpreplay I can test whether every such packet triggers it or not.

Just an update regarding this case: I moved to the latest upstream version, 1.9.8, two weeks ago and have had no problems since. No segfaults have occurred, and the build succeeded using the spec file from the distribution version of radvd (the only change is that libdaemon-devel is a new required build dependency).

Thanks for the information. I analyzed the code carefully, and radvd 1.6 apparently uses SIGALRM to handle timer events and manipulates internal structures from the signal handler, which results in race conditions. Since at least version 1.8, radvd uses `ppoll()` to handle the timer in a safer way.
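
To illustrate the difference described above, here is a minimal sketch of a ppoll()-based wait, in which timer expiry is handled in the main loop instead of in a SIGALRM handler. This is not radvd's actual code: struct iface_timer, time_left and wait_once are hypothetical names invented for this example, and a single timer stands in for whatever per-interface bookkeeping the real daemon keeps.

```c
#define _GNU_SOURCE /* ppoll() is a Linux/GNU extension */
#include <assert.h>
#include <poll.h>
#include <signal.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical timer: an absolute CLOCK_MONOTONIC deadline plus a counter. */
struct iface_timer {
    struct timespec next_expiry; /* when the next advertisement is due */
    int fired;                   /* how many times the timer has expired */
};

/* Time remaining until the deadline, clamped to zero if already passed. */
struct timespec time_left(const struct timespec *now,
                          const struct timespec *deadline)
{
    struct timespec d = { 0, 0 };

    if (now->tv_sec > deadline->tv_sec ||
        (now->tv_sec == deadline->tv_sec && now->tv_nsec >= deadline->tv_nsec))
        return d;
    d.tv_sec = deadline->tv_sec - now->tv_sec;
    d.tv_nsec = deadline->tv_nsec - now->tv_nsec;
    if (d.tv_nsec < 0) { /* borrow one second */
        d.tv_sec -= 1;
        d.tv_nsec += 1000000000L;
    }
    return d;
}

/* One loop iteration: block until fd has an event or the timer is due.
 * Returns 1 if fd has an event, 0 if the timer expired, -1 on error. */
int wait_once(int fd, struct iface_timer *t)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    struct timespec now, timeout;
    int rc;

    assert(t != NULL);
    clock_gettime(CLOCK_MONOTONIC, &now);
    timeout = time_left(&now, &t->next_expiry);

    /* NULL sigmask: the signal mask is left unchanged during the wait. */
    rc = ppoll(&pfd, 1, &timeout, NULL);
    if (rc < 0)
        return -1;
    if (rc == 0) {
        /* Expiry is processed here, in normal program flow, so timer
         * state is never modified while other code is halfway through
         * updating it, which is the hazard of doing this in a handler. */
        t->fired++;
        return 0;
    }
    return 1;
}
```

In a real daemon, the "event" branch would go on to read the packet and the "expired" branch would send the advertisement and compute the next deadline; the point of the sketch is only that both happen sequentially in one loop, so no signal handler touches shared timer structures.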