Bug 1175321 - named is crashing in load_configuration due to race condition in isc__task_beginexclusive
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: bind
Version: 6.3
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: high
Target Milestone: rc
Target Release: 6.7
Assignee: Tomáš Hozza 🤓
QA Contact: qe-baseos-daemons
URL:
Whiteboard:
Depends On:
Blocks: 1126841
Reported: 2014-12-17 13:47 UTC by Mohit Agrawal
Modified: 2019-09-12 08:08 UTC (History)

Fixed In Version: bind-9.8.2-0.34.rc1.el6
Doc Type: Bug Fix
Doc Text:
Due to a race condition in the beginexclusive() function, the BIND DNS server (named) could terminate unexpectedly while loading configuration. To fix this bug, a patch has been applied, and the race condition no longer occurs.
Clone Of:
Environment:
Last Closed: 2015-07-22 05:50:21 UTC


Attachments
Possible patch (11.19 KB, patch)
2014-12-18 18:20 UTC, Tomáš Hozza 🤓
Patch for the issue (10.64 KB, patch)
2015-01-28 11:51 UTC, Tomáš Hozza 🤓


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2015:1250 normal SHIPPED_LIVE bind bug fix and enhancement update 2015-07-20 17:50:10 UTC

Description Mohit Agrawal 2014-12-17 13:47:25 UTC
Description of problem:
named is crashing in load_configuration due to race condition in isc__task_beginexclusive

Version-Release number of selected component (if applicable):
bind-9.8.2-0.10.rc1.el6.x86_64

How reproducible:
Unknown

Steps to Reproduce:
1.
2.
3.

Actual results:
named crashes while reloading its configuration.

Expected results:
named should not crash.
Additional info:

Comment 1 Mohit Agrawal 2014-12-17 13:50:26 UTC
According to the backtrace, named crashes on the result of isc__task_beginexclusive: thread 5 is still waiting inside that function, so the call made from thread 1 returns an error result and the RUNTIME_CHECK in load_configuration aborts the process.

(gdb) bt
#0  0x00007f01bcc858a5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00007f01bcc87085 in abort () at abort.c:92
#2  0x00007f01bf44cb6c in library_fatal_error (file=0x7f01bf4945fc "server.c", line=<value optimized out>, format=0x7f01bde383e2 "RUNTIME_CHECK(%s) %s", args=0x7f01baac3070) at ./main.c:260
#3  0x00007f01bde06174 in isc_error_fatal (file=<value optimized out>, line=<value optimized out>, format=<value optimized out>) at error.c:74
#4  0x00007f01bde061d4 in isc_error_runtimecheck (file=0x7f01bf4945fc "server.c", line=4493, expression=0x7f01bf49a41a "result == 0") at error.c:81
#5  0x00007f01bf46d5c3 in load_configuration (filename=0x7f01baac32b0 "\360#H\214\001\177", server=0x7f01bf3de010, first_time=isc_boolean_false) at server.c:4493
#6  0x00007f01bf46f8c6 in loadconfig (server=0x7f01bf3de010) at server.c:5805
#7  0x00007f01bf46fffe in reconfig (server=<value optimized out>, args=<value optimized out>) at server.c:5845
#8  ns_server_reconfigcommand (server=<value optimized out>, args=<value optimized out>) at server.c:6067
#9  0x00007f01bf445c67 in ns_control_docommand (message=<value optimized out>, text=0x7f01baac3880) at control.c:104
#10 0x00007f01bf449346 in control_recvmessage (task=0x7f01bf3ea010, event=<value optimized out>) at controlconf.c:458
#11 0x00007f01bde222f8 in dispatch (uap=0x7f01bf3d5010) at task.c:1012
#12 run (uap=0x7f01bf3d5010) at task.c:1157
#13 0x00007f01bd7d7851 in start_thread (arg=0x7f01baac4700) at pthread_create.c:301
#14 0x00007f01bcd3a67d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
(gdb) f 5
#5  0x00007f01bf46d5c3 in load_configuration (filename=0x7f01baac32b0 "\360#H\214\001\177", server=0x7f01bf3de010, first_time=isc_boolean_false) at server.c:4493
4493			RUNTIME_CHECK(result == ISC_R_SUCCESS);
(gdb) l
4488		}
4489	
4490		/* Ensure exclusive access to configuration data. */
4491		if (!exclusive) {
4492			result = isc_task_beginexclusive(server->task);
4493			RUNTIME_CHECK(result == ISC_R_SUCCESS);
4494			exclusive = ISC_TRUE;
4495		}
4496	
4497		/*
(gdb) thread 5
[Switching to thread 5 (Thread 0x7f01bb4c5700 (LWP 16711))]#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
162	62:	movl	(%rsp), %edi
(gdb) bt
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007f01bde219bc in isc__task_beginexclusive (task0=<value optimized out>) at task.c:1456
#2  0x00007f01bec90c16 in grow_entries (task=0x7f018eedf7a0, ev=0x0) at adb.c:520
#3  0x00007f01bde222f8 in dispatch (uap=0x7f01bf3d5010) at task.c:1012
#4  run (uap=0x7f01bf3d5010) at task.c:1157
#5  0x00007f01bd7d7851 in start_thread (arg=0x7f01bb4c5700) at pthread_create.c:301
#6  0x00007f01bcd3a67d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
(gdb) f 1
#1  0x00007f01bde219bc in isc__task_beginexclusive (task0=<value optimized out>) at task.c:1456
1456			WAIT(&manager->exclusive_granted, &manager->lock);
(gdb) p manager
$1 = (isc__taskmgr_t *) 0x7f01bf3d5010
(gdb) p *manager
$2 = {common = {impmagic = 1414744909, magic = 1098149223, methods = 0x7f01be04b480}, mctx = 0x7f01c0b992d0, lock = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 1, __kind = 0, 
      __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 12 times>, "\001", '\000' <repeats 26 times>, __align = 0}, workers = 2, threads = 0x7f01bf3d2078, 
  default_quantum = 5, tasks = {head = 0x7f01bf3ea010, tail = 0x7f01972c4b50}, ready_tasks = {head = 0x7f01932435d0, tail = 0x7f018eb90f90}, work_available = {__data = {__lock = 0, 
      __futex = 18822202, __total_seq = 9411101, __wakeup_seq = 9411101, __woken_seq = 9411101, __mutex = 0x7f01bf3d5028, __nwaiters = 0, __broadcast_seq = 222}, 
    __size = "\000\000\000\000:4\037\001\035\232\217\000\000\000\000\000\035\232\217\000\000\000\000\000\035\232\217\000\000\000\000\000(P=\277\001\177\000\000\000\000\000\000\336\000\000", 
    __align = 80840742028705792}, exclusive_granted = {__data = {__lock = 0, __futex = 257, __total_seq = 129, __wakeup_seq = 128, __woken_seq = 128, __mutex = 0x7f01bf3d5028, __nwaiters = 2, 
      __broadcast_seq = 0}, 
    __size = "\000\000\000\000\001\001\000\000\201\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000(P=\277\001\177\000\000\002\000\000\000\000\000\000", 
    __align = 1103806595072}, tasks_running = 2, exclusive_requested = isc_boolean_true, exiting = isc_boolean_false}
(gdb)

Comment 3 Tomáš Hozza 🤓 2014-12-17 14:07:55 UTC
Result from the investigation:

The beginexclusive function should be called by a single task server-wide, but the backtrace clearly shows that it was executed in two different threads. One thread is still waiting inside the function, while the second called it and got back a return value other than SUCCESS. From the code of beginexclusive it is clear that it can return a value other than SUCCESS (namely LOCKBUSY) only when some other task has already called the function.
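
The interleaving described above can be replayed with a small C model. This is only a sketch under stated assumptions, not the BIND source: manager_t, begin_exclusive, simulate_race and the R_SUCCESS/R_LOCKBUSY codes are hypothetical stand-ins, with field names borrowed from the gdb dump of the task manager in comment 1. The first caller plays the role of thread 5 (waiting for exclusive access), and the second caller plays the role of thread 1 (refused while the first request is pending).

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for ISC_R_SUCCESS and ISC_R_LOCKBUSY. */
typedef enum { R_SUCCESS = 0, R_LOCKBUSY = 1 } result_t;

/* Simplified model of the task manager (assumption, not the real struct). */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t exclusive_granted;
    unsigned int tasks_running;
    bool exclusive_requested;
} manager_t;

/* Refuses a second request while one is pending; otherwise waits
 * until the caller is the only running task. */
static result_t begin_exclusive(manager_t *m) {
    pthread_mutex_lock(&m->lock);
    if (m->exclusive_requested) {
        pthread_mutex_unlock(&m->lock);
        return R_LOCKBUSY;
    }
    m->exclusive_requested = true;
    while (m->tasks_running > 1)
        pthread_cond_wait(&m->exclusive_granted, &m->lock);
    pthread_mutex_unlock(&m->lock);
    return R_SUCCESS;
}

typedef struct { manager_t *m; result_t result; } caller_arg_t;

static void *first_caller(void *p) {
    caller_arg_t *arg = p;
    arg->result = begin_exclusive(arg->m);
    return NULL;
}

/* Replays the race: the first caller blocks, the second is refused. */
static void simulate_race(result_t *first, result_t *second) {
    manager_t m;
    pthread_mutex_init(&m.lock, NULL);
    pthread_cond_init(&m.exclusive_granted, NULL);
    m.tasks_running = 2;           /* two tasks running, as in the dump */
    m.exclusive_requested = false;

    caller_arg_t arg = { &m, R_LOCKBUSY };
    pthread_t t;
    int rc = pthread_create(&t, NULL, first_caller, &arg);
    assert(rc == 0);
    (void)rc;

    /* Wait until the first caller has registered its request. */
    bool requested = false;
    while (!requested) {
        pthread_mutex_lock(&m.lock);
        requested = m.exclusive_requested;
        pthread_mutex_unlock(&m.lock);
        if (!requested)
            sched_yield();
    }

    /* Second request arrives while the first is still waiting. */
    *second = begin_exclusive(&m);

    /* Pretend the other task paused so the first caller can proceed. */
    pthread_mutex_lock(&m.lock);
    m.tasks_running = 1;
    pthread_cond_signal(&m.exclusive_granted);
    pthread_mutex_unlock(&m.lock);

    pthread_join(t, NULL);
    *first = arg.result;
    pthread_cond_destroy(&m.exclusive_granted);
    pthread_mutex_destroy(&m.lock);
}
```

Calling simulate_race(&first, &second) leaves the success code in first and the busy code in second. A caller that treats anything other than success as fatal, as the RUNTIME_CHECK at server.c:4493 does, would abort at that point.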

Comment 6 Tomáš Hozza 🤓 2014-12-18 18:20:41 UTC
Created attachment 970700 [details]
Possible patch

Comment 8 Tomáš Hozza 🤓 2015-01-28 11:51:10 UTC
Created attachment 985105 [details]
Patch for the issue

New patch, fixing a bug that I found during the backport.

Reported also upstream:
[ISC-Bugs #38470] Bug in IF condition in lib/dns/adb.c:new_adbentry()

Comment 24 errata-xmlrpc 2015-07-22 05:50:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-1250.html

