Bug 1166878 - unbound-1.5.0-1.fc22 crashing often
Summary: unbound-1.5.0-1.fc22 crashing often
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: unbound
Version: rawhide
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Tomáš Hozza
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2014-11-21 20:42 UTC by Kevin Fenzi
Modified: 2014-12-20 08:36 UTC (History)
7 users (show)

Fixed In Version: unbound-1.5.1-2.fc21
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-12-04 14:53:03 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)

Description Kevin Fenzi 2014-11-21 20:42:12 UTC
Unbound 1.5.0-1.fc22 seems to be crashing a lot here. 

I can't seem to see any pattern to the crashes, and sadly, abrt doesn't seem to pick up on them. :( 


Nov 21 13:35:09 voldemort.scrye.com systemd[1]: unbound.service: main process exited, code=killed, status=11/SEGV
Nov 21 13:35:09 voldemort.scrye.com systemd[1]: Unit unbound.service entered failed state.
Nov 21 13:35:09 voldemort.scrye.com systemd[1]: unbound.service failed.
Nov 21 13:38:34 voldemort.scrye.com runuser[4855]: pam_unix(runuser:session): session opened for user unbound by (uid=0)
Nov 21 13:38:34 voldemort.scrye.com runuser[4855]: pam_unix(runuser:session): session closed for user unbound
Nov 21 13:38:34 voldemort.scrye.com unbound-checkconf[4858]: unbound-checkconf: no errors in /etc/unbound/unbound.conf
Nov 21 13:38:34 voldemort.scrye.com unbound[4861]: Nov 21 13:38:34 unbound[4861:0] warning: increased limit(open files) from 1024 to 8290
Nov 21 13:38:34 voldemort.scrye.com unbound[4861]: [4861:0] warning: did not exit gracefully last time (30859)
Nov 21 13:38:34 voldemort.scrye.com unbound[4861]: [4861:0] notice: init module 0: validator
Nov 21 13:38:34 voldemort.scrye.com unbound[4861]: [4861:0] notice: init module 1: iterator
Nov 21 13:38:34 voldemort.scrye.com unbound[4861]: [4861:0] info: start of service (unbound 1.5.0).

[334209.703923] unbound[26854]: segfault at 7f7ace385444 ip 00007f79ce3ddf23 sp 00007f79cacc3550 error 4 in unbound[7f79ce38b000+d0000]
[338071.162154] unbound[30866]: segfault at 7f3e4472d444 ip 00007f3d44785f23 sp 00007f3d4106b4b0 error 4 in unbound[7f3d44733000+d0000]

I can try and run it in gdb and see if I can get a stack trace...

Comment 1 Tomáš Hozza 2014-11-24 08:26:07 UTC
Hi.

Please install the debuginfo package to get some stack trace. There seems to be a race condition in the new arc4random fallback code. However I would like to check if this is really the cause.

Thanks!

Comment 2 Paul Wouters 2014-11-24 11:37:16 UTC
A patch or link to svn was posted on the unbound mailing list. Currently in transit but will look at building updated packages later today if someone else hasn't picked this up by then.

Comment 3 Tomáš Hozza 2014-11-24 11:45:29 UTC
(In reply to Paul Wouters from comment #2)
> A patch or link to svn was posted on the unbound mailing list. Currently in
> transit but will look at building updated packages later today if someone
> else hasn't picked this up by then.

That's where I got the information posed in comment #1. But I want to have a backtrace to be sure that it is really the issue!

Comment 4 Valdis Kletnieks 2014-11-24 20:17:00 UTC
Am seeing this in -2 as well.   dmesg reports:

Nov 23 18:58:17 turing-police kernel: [159160.906004] unbound[63601]: segfault at 7f71130fc444 ip 00007f7013154f23 sp 00007fffcc4c1af0 error 4 in unbound[7f7013102000+d0000]
Nov 24 01:54:11 turing-police kernel: [184148.239571] unbound[119936]: segfault at 7fad3cb59444 ip 00007fac3cbb1f23 sp 00007fac39497550 error 4 in unbound[7fac3cb5f000+d0000]
Nov 24 08:18:02 turing-police kernel: [207210.400281] unbound[170769]: segfault at 7f1d74d93444 ip 00007f1c74debf23 sp 00007f1c716d1650 error 4
Nov 24 08:18:02 turing-police kernel: [207210.400287] unbound[170767]: segfault at 7f1d74d93444 ip 00007f1c74debf23 sp 00007fff1e1e5e90 error 4 in unbound[7f1c74d99000+d0000] in unbound[7f1c74d99000+d0000]
Nov 24 11:12:12 turing-police kernel: [  265.725841] unbound[1362]: segfault at 7fb212aab444 ip 00007fb112b03f23 sp 00007fb10f3e9550 error 4
Nov 24 11:12:12 turing-police kernel: [  265.725848] unbound[1348]: segfault at 7fb212aab444 ip 00007fb112b03f23 sp 00007ffffff31da0 error 4 in unbound[7fb112ab1000+d0000]
Nov 24 11:16:04 turing-police kernel: [  497.657898] unbound[2334]: segfault at 7fbc1d2b5444 ip 00007fbb1d30df23 sp 00007fbb19bf3480 error 4 in unbound[7fbb1d2bb000+d0000]
Nov 24 11:16:04 turing-police kernel: [  497.657920] unbound[2330]: segfault at 7fbc1d2b5444 ip 00007fbb1d30df23 sp 00007fff6ea88ae0 error 4 in unbound[7fbb1d2bb000+d0000]
Nov 24 11:18:43 turing-police kernel: [  656.454591] unbound[2618]: segfault at 7f3eedf50444 ip 00007f3dedfa8f23 sp 00007f3dea88e480 error 4 in unbound[7f3dedf56000+d0000]
Nov 24 11:18:43 turing-police kernel: [  656.454668] unbound[2617]: segfault at 7f3eedf50444 ip 00007f3dedfa8f23 sp 00007fffe5449620 error 4 in unbound[7f3dedf56000+d0000]

The '444' and 'f23' being constant pretty much tells me there's exactly one crash involved, with ASLR in effect...

Comment 5 Valdis Kletnieks 2014-11-24 20:18:07 UTC
(In reply to Tomas Hozza from comment #3)
> (In reply to Paul Wouters from comment #2)
> > A patch or link to svn was posted on the unbound mailing list. Currently in
> > transit but will look at building updated packages later today if someone
> > else hasn't picked this up by then.
> 
> That's where I got the information posed in comment #1. But I want to have a
> backtrace to be sure that it is really the issue!

I'll install the .debuginfo and attach a gdb to it and see if it crashes again...

Comment 6 Valdis Kletnieks 2014-11-25 15:59:29 UTC
Caught it, finally....

(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fbb293c3700 (LWP 1372)]
arc4random () at compat/arc4random.c:220
220             _rs_random_u32(&val);
(gdb) where
#0  arc4random () at compat/arc4random.c:220
#1  0x00007fbb2cb0eb05 in arc4random_uniform (upper_bound=upper_bound@entry=10)
    at compat/arc4random_uniform.c:51
#2  0x00007fbb2cb007fc in ub_random_max (x=10, state=<optimized out>)
    at util/random.c:110
#3  iter_ns_probability (m=10, n=2, rnd=<optimized out>)
    at iterator/iter_utils.c:443
#4  query_for_targets.part.7.lto_priv.627 (qstate=qstate@entry=0x7fbb2490cdc0, 
    iq=iq@entry=0x7fbb2490cff0, ie=ie@entry=0x7fbb2d9a27a0, id=id@entry=1, 
    maxtargets=maxtargets@entry=3, num=num@entry=0x7fbb293c2800)
    at iterator/iterator.c:1377
#5  0x00007fbb2cad6b4f in query_for_targets () at iterator/iterator.c:1351
#6  processQueryTargets (id=1, ie=0x7fbb2d9a27a0, iq=0x7fbb2490cff0, 
    qstate=0x7fbb2490cdc0) at iterator/iterator.c:1764
#7  iter_handle (qstate=qstate@entry=0x7fbb2490cdc0, 
    iq=iq@entry=0x7fbb2490cff0, ie=ie@entry=0x7fbb2d9a27a0, id=id@entry=1)
    at iterator/iterator.c:2708
#8  0x00007fbb2cad92fa in process_response (event=<optimized out>, 
    outbound=<optimized out>, id=<optimized out>, ie=<optimized out>, 
    iq=<optimized out>, qstate=<optimized out>) at iterator/iterator.c:2869
#9  iter_operate (qstate=<optimized out>, event=<optimized out>, 
    id=<optimized out>, outbound=<optimized out>) at iterator/iterator.c:2902
#10 0x00007fbb2cac9df1 in mesh_run (mesh=mesh@entry=0x7fbb245e6b90, 
---Type <return> to continue, or q <return> to quit---
    mstate=mstate@entry=0x7fbb2490cd70, ev=module_event_pass, 
    ev@entry=module_event_new, e=<optimized out>, e@entry=0x0)
    at services/mesh.c:1076
#11 0x00007fbb2caaf211 in mesh_new_client (qid=52086, rep=0x7fbb293c2be0, 
    edns=0x7fbb293c2a80, qflags=256, qinfo=0x7fbb293c2a90, mesh=0x7fbb245e6b90)
    at services/mesh.c:370
#12 worker_handle_request.part.17.lto_priv.656 (c=<optimized out>, 
    arg=0x7fbb2d9c88e0, repinfo=0x7fbb293c2be0) at daemon/worker.c:986
#13 0x00007fbb2cabf362 in comm_point_udp_callback (fd=5, 
    event=<optimized out>, arg=<optimized out>) at util/netevent.c:700
#14 0x00007fbb2c3be3cc in event_base_loop () from /lib64/libevent-2.0.so.5
#15 0x00007fbb2cabf4dc in comm_base_dispatch (b=<optimized out>)
    at util/netevent.c:305
#16 0x00007fbb2cad255c in worker_work (worker=0x7fbb2d9c88e0)
    at daemon/worker.c:1285
#17 thread_start (arg=0x7fbb2d9c88e0) at daemon/daemon.c:475
#18 0x00007fbb2b70f53a in start_thread (arg=0x7fbb293c3700)
    at pthread_create.c:312
#19 0x00007fbb2b4462fd in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
(gdb) quit


Yep, died in arc4random().  Hope that helps....

Comment 7 Kevin Fenzi 2014-11-25 17:30:01 UTC
Sorry for the delay from my side. I had to get things done, so I just set systemd to restart unbound when it crashed. 

I attached gdb, but forgot to disable systemd restarting, but I got: 

Program received signal SIGSEGV, Segmentation fault.
arc4random () at compat/arc4random.c:220
220             _rs_random_u32(&val);

so, it's definitely the same place here at least.

Comment 8 Tomáš Hozza 2014-11-26 12:13:02 UTC
Thanks for the confirmation. I'll backport the fix and push new builds soon.

Comment 9 Paul Wouters 2014-11-26 14:00:03 UTC
if you create new builds, please also address the "main" issue from bug https://bugzilla.redhat.com/show_bug.cgi?id=1167291

A simple test is to restart the libreswan service

Comment 10 Tomáš Hozza 2014-11-26 14:17:27 UTC
(In reply to Paul Wouters from comment #9)
> if you create new builds, please also address the "main" issue from bug
> https://bugzilla.redhat.com/show_bug.cgi?id=1167291
> 
> A simple test is to restart the libreswan service

Too late. The build are already completed.
unbound-1.5.0-2.fc22
unbound-1.5.0-2.fc21

Could just libreswan rebuild against the new unbound help?

Comment 11 Fedora Update System 2014-11-26 14:20:21 UTC
unbound-1.5.0-1.fc21 has been submitted as an update for Fedora 21.
https://admin.fedoraproject.org/updates/FEDORA-2014-15408/unbound-1.5.0-1.fc21

Comment 12 Paul Wouters 2014-11-26 14:27:00 UTC
I'm not sure if rebuilding helps. The new arc4random wants to do some initialization for the randomness in the library.

I think this affects everything linked against libunbound

Comment 13 Fedora Update System 2014-11-27 19:26:35 UTC
Package unbound-1.5.0-2.fc21:
* should fix your issue,
* was pushed to the Fedora 21 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing unbound-1.5.0-2.fc21'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2014-15408/unbound-1.5.0-2.fc21
then log in and leave karma (feedback).

Comment 14 Valdis Kletnieks 2014-11-28 02:15:38 UTC
Grabbed -2 off koji, installed, 24 hours uptime now, no problems seen.  I left a note on the admin page, forgot to login first. :)

Comment 15 Fedora Update System 2014-11-28 15:15:32 UTC
unbound-1.5.0-3.fc21 has been submitted as an update for Fedora 21.
https://admin.fedoraproject.org/updates/unbound-1.5.0-3.fc21

Comment 16 Fedora Update System 2014-11-28 18:09:51 UTC
unbound-1.5.1-0.1.rc1.fc21 has been submitted as an update for Fedora 21.
https://admin.fedoraproject.org/updates/unbound-1.5.1-0.1.rc1.fc21

Comment 17 Fedora Update System 2014-12-10 16:46:24 UTC
unbound-1.5.1-2.fc21 has been submitted as an update for Fedora 21.
https://admin.fedoraproject.org/updates/unbound-1.5.1-2.fc21

Comment 18 Fedora Update System 2014-12-20 08:36:43 UTC
unbound-1.5.1-2.fc21 has been pushed to the Fedora 21 stable repository.  If problems still persist, please make note of it in this bug report.


Note You need to log in before you can comment on or make changes to this bug.