Bug 1175321

Summary: named is crashing in load_configuration due to race condition in isc__task_beginexclusive
Product: Red Hat Enterprise Linux 6 Reporter: Mohit Agrawal <moagrawa>
Component: bind Assignee: Tomáš Hozza 🤓 <thozza>
Status: CLOSED ERRATA QA Contact: qe-baseos-daemons
Severity: high Docs Contact:
Priority: urgent    
Version: 6.3 CC: gnaik, hmatsumo, mmatsuya, ovasik, psklenar, pspacek, shane.seymour, thozza, yozone
Target Milestone: rc Keywords: OtherQA, Patch
Target Release: 6.7   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: bind-9.8.2-0.34.rc1.el6 Doc Type: Bug Fix
Doc Text:
Due to a race condition in the beginexclusive() function, the BIND DNS server (named) could terminate unexpectedly while loading configuration. To fix this bug, a patch has been applied, and the race condition no longer occurs.
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-07-22 05:50:21 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On:    
Bug Blocks: 1126841    
Attachments:
Description            Flags
Possible patch         none
Patch for the issue    none

Description Mohit Agrawal 2014-12-17 13:47:25 UTC
Description of problem:
named crashes in load_configuration due to a race condition in isc__task_beginexclusive.

Version-Release number of selected component (if applicable):
bind-9.8.2-0.10.rc1.el6.x86_64

How reproducible:
Unknown

Steps to Reproduce:
1.
2.
3.

Actual results:
named aborts in load_configuration.

Expected results:
named should not crash.

Additional info:

Comment 1 Mohit Agrawal 2014-12-17 13:50:26 UTC
According to the backtraces, named crashes right after calling isc_task_beginexclusive in load_configuration. Thread 5 is already waiting inside isc__task_beginexclusive, so the call in thread 1 returns a non-success result, the RUNTIME_CHECK at server.c:4493 fails, and named aborts.

(gdb) bt
#0  0x00007f01bcc858a5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00007f01bcc87085 in abort () at abort.c:92
#2  0x00007f01bf44cb6c in library_fatal_error (file=0x7f01bf4945fc "server.c", line=<value optimized out>, format=0x7f01bde383e2 "RUNTIME_CHECK(%s) %s", args=0x7f01baac3070) at ./main.c:260
#3  0x00007f01bde06174 in isc_error_fatal (file=<value optimized out>, line=<value optimized out>, format=<value optimized out>) at error.c:74
#4  0x00007f01bde061d4 in isc_error_runtimecheck (file=0x7f01bf4945fc "server.c", line=4493, expression=0x7f01bf49a41a "result == 0") at error.c:81
#5  0x00007f01bf46d5c3 in load_configuration (filename=0x7f01baac32b0 "\360#H\214\001\177", server=0x7f01bf3de010, first_time=isc_boolean_false) at server.c:4493
#6  0x00007f01bf46f8c6 in loadconfig (server=0x7f01bf3de010) at server.c:5805
#7  0x00007f01bf46fffe in reconfig (server=<value optimized out>, args=<value optimized out>) at server.c:5845
#8  ns_server_reconfigcommand (server=<value optimized out>, args=<value optimized out>) at server.c:6067
#9  0x00007f01bf445c67 in ns_control_docommand (message=<value optimized out>, text=0x7f01baac3880) at control.c:104
#10 0x00007f01bf449346 in control_recvmessage (task=0x7f01bf3ea010, event=<value optimized out>) at controlconf.c:458
#11 0x00007f01bde222f8 in dispatch (uap=0x7f01bf3d5010) at task.c:1012
#12 run (uap=0x7f01bf3d5010) at task.c:1157
#13 0x00007f01bd7d7851 in start_thread (arg=0x7f01baac4700) at pthread_create.c:301
#14 0x00007f01bcd3a67d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
(gdb) f 5
#5  0x00007f01bf46d5c3 in load_configuration (filename=0x7f01baac32b0 "\360#H\214\001\177", server=0x7f01bf3de010, first_time=isc_boolean_false) at server.c:4493
4493			RUNTIME_CHECK(result == ISC_R_SUCCESS);
(gdb) l
4488		}
4489	
4490		/* Ensure exclusive access to configuration data. */
4491		if (!exclusive) {
4492			result = isc_task_beginexclusive(server->task);
4493			RUNTIME_CHECK(result == ISC_R_SUCCESS);
4494			exclusive = ISC_TRUE;
4495		}
4496	
4497		/*
(gdb) thread 5
[Switching to thread 5 (Thread 0x7f01bb4c5700 (LWP 16711))]#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
162	62:	movl	(%rsp), %edi
(gdb) bt
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007f01bde219bc in isc__task_beginexclusive (task0=<value optimized out>) at task.c:1456
#2  0x00007f01bec90c16 in grow_entries (task=0x7f018eedf7a0, ev=0x0) at adb.c:520
#3  0x00007f01bde222f8 in dispatch (uap=0x7f01bf3d5010) at task.c:1012
#4  run (uap=0x7f01bf3d5010) at task.c:1157
#5  0x00007f01bd7d7851 in start_thread (arg=0x7f01bb4c5700) at pthread_create.c:301
#6  0x00007f01bcd3a67d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
(gdb) f 1
#1  0x00007f01bde219bc in isc__task_beginexclusive (task0=<value optimized out>) at task.c:1456
1456			WAIT(&manager->exclusive_granted, &manager->lock);
(gdb) p manager
$1 = (isc__taskmgr_t *) 0x7f01bf3d5010
(gdb) p *manager
$2 = {common = {impmagic = 1414744909, magic = 1098149223, methods = 0x7f01be04b480}, mctx = 0x7f01c0b992d0, lock = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 1, __kind = 0, 
      __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 12 times>, "\001", '\000' <repeats 26 times>, __align = 0}, workers = 2, threads = 0x7f01bf3d2078, 
  default_quantum = 5, tasks = {head = 0x7f01bf3ea010, tail = 0x7f01972c4b50}, ready_tasks = {head = 0x7f01932435d0, tail = 0x7f018eb90f90}, work_available = {__data = {__lock = 0, 
      __futex = 18822202, __total_seq = 9411101, __wakeup_seq = 9411101, __woken_seq = 9411101, __mutex = 0x7f01bf3d5028, __nwaiters = 0, __broadcast_seq = 222}, 
    __size = "\000\000\000\000:4\037\001\035\232\217\000\000\000\000\000\035\232\217\000\000\000\000\000\035\232\217\000\000\000\000\000(P=\277\001\177\000\000\000\000\000\000\336\000\000", 
    __align = 80840742028705792}, exclusive_granted = {__data = {__lock = 0, __futex = 257, __total_seq = 129, __wakeup_seq = 128, __woken_seq = 128, __mutex = 0x7f01bf3d5028, __nwaiters = 2, 
      __broadcast_seq = 0}, 
    __size = "\000\000\000\000\001\001\000\000\201\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000(P=\277\001\177\000\000\002\000\000\000\000\000\000", 
    __align = 1103806595072}, tasks_running = 2, exclusive_requested = isc_boolean_true, exiting = isc_boolean_false}
(gdb)

Comment 3 Tomáš Hozza 🤓 2014-12-17 14:07:55 UTC
Result from the investigation:

The beginexclusive function should be called by only one task at a time server-wide, but the backtraces show that it was executed in two different threads. One thread is still inside the function, while the second called it and got a return value other than SUCCESS. From the code of beginexclusive it is clear that it can return a non-SUCCESS value (LOCKBUSY) only when some other task has already called the function.
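
The behaviour described above can be illustrated with a minimal sketch. This is a simplified model built on plain pthreads, not the BIND source: the names taskmgr_t, result_t, begin_exclusive, end_exclusive and second_request_result are hypothetical, and only the manager fields that appear in the gdb dump (lock, exclusive_granted, tasks_running, exclusive_requested) are modelled. It assumes that a request stays outstanding until it is explicitly released.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for ISC_R_SUCCESS and ISC_R_LOCKBUSY. */
typedef enum { R_SUCCESS = 0, R_LOCKBUSY = 1 } result_t;

/* Hypothetical, pared-down task manager (assumption, not the real struct). */
typedef struct {
	pthread_mutex_t lock;
	pthread_cond_t exclusive_granted;
	unsigned int tasks_running;
	bool exclusive_requested;
} taskmgr_t;

static void taskmgr_init(taskmgr_t *m, unsigned int tasks_running) {
	pthread_mutex_init(&m->lock, NULL);
	pthread_cond_init(&m->exclusive_granted, NULL);
	m->tasks_running = tasks_running;
	m->exclusive_requested = false;
}

/* Only one exclusive request may be outstanding; a second caller is
 * turned away with R_LOCKBUSY instead of being queued. */
static result_t begin_exclusive(taskmgr_t *m) {
	pthread_mutex_lock(&m->lock);
	if (m->exclusive_requested) {
		pthread_mutex_unlock(&m->lock);
		return R_LOCKBUSY;
	}
	m->exclusive_requested = true;
	/* Wait until the caller is the only running task. */
	while (m->tasks_running > 1)
		pthread_cond_wait(&m->exclusive_granted, &m->lock);
	pthread_mutex_unlock(&m->lock);
	return R_SUCCESS;
}

static void end_exclusive(taskmgr_t *m) {
	pthread_mutex_lock(&m->lock);
	m->exclusive_requested = false;
	pthread_mutex_unlock(&m->lock);
}

/* Issues two requests without releasing the first one; stores the first
 * result in *first and returns the second. */
static result_t second_request_result(result_t *first) {
	taskmgr_t m;
	result_t second;

	taskmgr_init(&m, 1); /* one running task, so the first call does not block */
	*first = begin_exclusive(&m);
	assert(*first == R_SUCCESS);
	second = begin_exclusive(&m); /* request still outstanding */
	end_exclusive(&m);
	return second;
}
```

In this model the second request is rejected rather than blocked. In the crash above, load_configuration wraps the call in RUNTIME_CHECK(result == ISC_R_SUCCESS), so a rejected request of this kind ends in abort() instead of being retried or handled.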

Comment 6 Tomáš Hozza 🤓 2014-12-18 18:20:41 UTC
Created attachment 970700 [details]
Possible patch

Comment 8 Tomáš Hozza 🤓 2015-01-28 11:51:10 UTC
Created attachment 985105 [details]
Patch for the issue

New patch, which also fixes a bug that I found during the backport.

Reported also upstream:
[ISC-Bugs #38470] Bug in IF condition in lib/dns/adb.c:new_adbentry()

Comment 24 errata-xmlrpc 2015-07-22 05:50:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-1250.html