Bug 1787426

Summary: bind-9.11.4-9.P2.el7.ppc64 SIGSEGV Crash
Product: Red Hat Enterprise Linux 7 Reporter: Anthony Zone <azone>
Component: bind Assignee: Petr Menšík <pemensik>
Status: CLOSED DUPLICATE QA Contact: qe-baseos-daemons
Severity: high Docs Contact:
Priority: unspecified    
Version: 7.7 CC: mjtarsel, mlichvar, mtarsel, thozza
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: All   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-02-27 14:52:13 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Anthony Zone 2020-01-02 19:53:52 UTC
Description of problem:
The customer's system, running bind-9.11.4-9.P2.el7.ppc64, randomly experiences a SIGSEGV.  This is not seen when they roll back to bind-9.9.4-74.el7_6.1.ppc64.

Version-Release number of selected component (if applicable):
bind-9.11.4-9.P2.el7.ppc64

How reproducible:

The customer can't reproduce the crash on demand, but named appears to run for a day or two and then segfaults.


Steps to Reproduce:
1. Install bind-9.11.4-9.P2.el7.ppc64
2. unknown
3. Coredump

Actual results:

bind-9.11.4-9.P2.el7.ppc64 dumps core with signal 11 after an unknown trigger.

Expected results:

Continues to run and doesn't crash.

Additional info:

Looking at the core file, we see a null pointer dereference in ttl_sooner(), called from isc_heap_delete() with v1=0x0:

(gdb) bt
#0  ttl_sooner (v1=0x0, v2=0x3fff219f0280) at ../../../lib/dns/rbtdb.c:1127
#1  0x00003fff78cf15bc in isc_heap_delete (heap=0x3fff6c501278, idx=<optimized out>) at ../../../lib/isc/heap.c:233
#2  0x00003fff791a75e8 in free_rdataset (rdataset=0x3fff219f0280, mctx=<optimized out>, rbtdb=0x3fff6c554010)
    at ../../../lib/dns/rbtdb.c:1721
#3  clean_stale_headers (top=0x3fff421834f0, mctx=<optimized out>, rbtdb=0x3fff6c554010) at ../../../lib/dns/rbtdb.c:1805
#4  clean_cache_node (node=0x3fff6c582780, rbtdb=0x3fff6c554010) at ../../../lib/dns/rbtdb.c:1822
#5  decrement_reference (rbtdb=rbtdb@entry=0x3fff6c554010, node=node@entry=0x3fff6c582780, least_serial=least_serial@entry=0, 
    nlock=nlock@entry=isc_rwlocktype_read, tlock=tlock@entry=isc_rwlocktype_none, pruning=pruning@entry=isc_boolean_false)
    at ../../../lib/dns/rbtdb.c:2254
#6  0x00003fff791a9cc0 in detachnode (db=0x3fff6c554010, targetp=targetp@entry=0x3fff70f0e020) at ../../../lib/dns/rbtdb.c:5523
#7  0x00003fff791a9f6c in rdataset_disassociate (rdataset=<optimized out>) at ../../../lib/dns/rbtdb.c:8783
#8  0x00003fff792173d0 in dns_rdataset_disassociate (rdataset=<optimized out>) at ../../../lib/dns/rdataset.c:116
#9  0x00003fff79116980 in free_adbfetch (adb=0x3fff6c280010, fetch=<synthetic pointer>) at ../../../lib/dns/adb.c:1963
#10 fetch_callback (task=<optimized out>, ev=0x3fff373c70a0) at ../../../lib/dns/adb.c:3994
#11 0x00003fff78d18304 in dispatch (manager=0x3fff77ff7010) at ../../../lib/isc/task.c:1141
#12 run (uap=0x3fff77ff7010) at ../../../lib/isc/task.c:1313
#13 0x00003fff7894cafc in start_thread (arg=0x3fff70f0f0b0) at pthread_create.c:309
#14 0x00003fff78436f4c in .__clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:104

(gdb) f 0
#0  ttl_sooner (v1=0x0, v2=0x3fff219f0280) at ../../../lib/dns/rbtdb.c:1127
1127            return (ISC_TF(h1->rdh_ttl < h2->rdh_ttl));
(gdb) p h1
$3 = (rdatasetheader_t *) 0x0
(gdb) p h2
$4 = (rdatasetheader_t *) 0x3fff219f0280
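
For readers unfamiliar with the failing frame, here is a minimal sketch of the comparison pattern shown at rbtdb.c:1127. It only illustrates why a NULL first argument faults in a heap comparator of this shape. The type sketch_header_t and both function names are hypothetical stand-ins, not BIND's definitions, and the checked variant is a diagnostic illustration, not the upstream fix.

```c
#include <stddef.h>

/* Hypothetical stand-in for rdatasetheader_t; only the TTL field is modeled. */
typedef struct sketch_header {
    unsigned int rdh_ttl;
} sketch_header_t;

/*
 * Same shape as the comparison at rbtdb.c:1127 (the ISC_TF wrapper is
 * omitted): both pointers are dereferenced unconditionally, so a NULL
 * v1 faults on the h1->rdh_ttl read.
 */
int
sketch_ttl_sooner(void *v1, void *v2) {
    sketch_header_t *h1 = v1;
    sketch_header_t *h2 = v2;
    return (h1->rdh_ttl < h2->rdh_ttl);
}

/*
 * Diagnostic variant (an assumption for illustration only): reports a
 * NULL heap slot instead of crashing. Returns -1 on NULL input,
 * otherwise 1 if h1 expires sooner than h2 and 0 if not.
 */
int
sketch_ttl_sooner_checked(void *v1, void *v2) {
    if (v1 == NULL || v2 == NULL)
        return (-1);
    return (sketch_ttl_sooner(v1, v2));
}
```

The comparator itself is not necessarily at fault; the backtrace only shows that the heap handed it a NULL element during isc_heap_delete().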

Comment 3 Petr Menšík 2020-01-09 19:50:44 UTC
It seems this crash matches a recently fixed upstream issue, solved by merge request [1]. We were unable to figure out why only the ppc64le platform seems to be affected by those issues, but very similar crashes were noticed in RHEL 8, tracked in bug #1740511.

1. https://gitlab.isc.org/isc-projects/bind9/merge_requests/2703

Comment 4 Miroslav Lichvar 2020-01-27 12:15:43 UTC
This does look like a duplicate of bug #1779589 (and RHEL8 bug #1740511).

There is a potential fix that modifies the memory order of some atomic operations. Could you please test the packages from the following build?

http://people.redhat.com/~mlichvar/tmp/bind-1779589/
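
As background on what "memory order" means here, below is a minimal C11 sketch of a reference count whose final decrement uses release ordering followed by an acquire fence. This is not the patch in the build above and not BIND's code; sketch_node_t and the sketch_node_* functions are hypothetical names. It only shows the general pattern: on weakly ordered architectures such as POWER, relaxed ordering alone does not guarantee that one thread's earlier writes to an object are visible to the thread that drops the last reference and cleans up.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical reference-counted object; not a BIND structure. */
typedef struct sketch_node {
    atomic_uint references;
    int payload;
} sketch_node_t;

void
sketch_node_init(sketch_node_t *node, int payload) {
    atomic_init(&node->references, 1);
    node->payload = payload;
}

/* Taking a reference needs atomicity only, so relaxed ordering suffices. */
void
sketch_node_attach(sketch_node_t *node) {
    atomic_fetch_add_explicit(&node->references, 1, memory_order_relaxed);
}

/*
 * Dropping a reference uses release ordering so this thread's earlier
 * writes to the node are published. The thread that sees the count fall
 * from 1 to 0 issues an acquire fence before cleanup, so it observes
 * the writes of every thread that released a reference before it.
 * Returns true when the caller dropped the last reference.
 */
bool
sketch_node_detach(sketch_node_t *node) {
    if (atomic_fetch_sub_explicit(&node->references, 1,
                                  memory_order_release) == 1) {
        atomic_thread_fence(memory_order_acquire);
        return (true);
    }
    return (false);
}
```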

Comment 6 Tomáš Hozza 2020-02-27 14:52:13 UTC
We believe that this bug is a duplicate of Bug #1779589. However, since we do not have a reproducer, we cannot be 100% sure. Please reopen if resolving Bug #1779589 does not fix the crash.

*** This bug has been marked as a duplicate of bug 1779589 ***

Comment 7 Red Hat Bugzilla 2023-09-14 05:49:19 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days