Bug 878139

Summary: [abrt] bind-utils-9.9.2-2.fc17: next_origin: Process /usr/bin/nslookup was killed by signal 11 (SIGSEGV)
Product: [Fedora] Fedora Reporter: Anatolii Vorona <vorona.tolik>
Component: bind Assignee: Tomáš Hozza <thozza>
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 17 CC: bhubbard, ovasik, thenscheid, thozza, ville.skytta
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Unspecified   
Whiteboard: abrt_hash:06e62dadd1325d9a82c7515b3bc8097c73697c0b
Fixed In Version: dhcp-4.2.5-2.fc17 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-06-18 01:29:11 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
File: core_backtrace none
File: environ none
File: backtrace none
File: limits none
File: cgroup none
File: smolt_data none
File: executable none
File: maps none
File: dso_list none
File: proc_pid_status none
File: var_log_messages none
File: open_fds none

Description Anatolii Vorona 2012-11-19 17:53:29 UTC
Description of problem:

#!/bin/bash
alphabet="d s c r w k l m"
dzone="7.net"
for x in $alphabet
do
    for y in $alphabet
    do
        for z in $alphabet
        do
            echo "$x$y$z$dzone $(nslookup $x$y$z$dzone | grep '*\|Address: ' ) "
        done
    done
done


Version-Release number of selected component:
bind-utils-9.9.2-2.fc17

Additional info:
libreport version: 2.0.18
abrt_version:   2.0.18
backtrace_rating: 4
cmdline:        nslookup ckk7.net
crash_function: next_origin
kernel:         3.6.6-1.fc17.x86_64

truncated backtrace:
:Thread no. 1 (4 frames)
: #0 next_origin at dighost.c:1914
: #1 connect_timeout at dighost.c:2712
: #2 dispatch at task.c:1116
: #3 run at task.c:1286

Comment 1 Anatolii Vorona 2012-11-19 17:53:32 UTC
Created attachment 647905 [details]
File: core_backtrace

Comment 2 Anatolii Vorona 2012-11-19 17:53:34 UTC
Created attachment 647906 [details]
File: environ

Comment 3 Anatolii Vorona 2012-11-19 17:53:36 UTC
Created attachment 647907 [details]
File: backtrace

Comment 4 Anatolii Vorona 2012-11-19 17:53:38 UTC
Created attachment 647908 [details]
File: limits

Comment 5 Anatolii Vorona 2012-11-19 17:53:40 UTC
Created attachment 647909 [details]
File: cgroup

Comment 6 Anatolii Vorona 2012-11-19 17:53:42 UTC
Created attachment 647910 [details]
File: smolt_data

Comment 7 Anatolii Vorona 2012-11-19 17:53:44 UTC
Created attachment 647911 [details]
File: executable

Comment 8 Anatolii Vorona 2012-11-19 17:53:47 UTC
Created attachment 647912 [details]
File: maps

Comment 9 Anatolii Vorona 2012-11-19 17:53:49 UTC
Created attachment 647913 [details]
File: dso_list

Comment 10 Anatolii Vorona 2012-11-19 17:53:51 UTC
Created attachment 647914 [details]
File: proc_pid_status

Comment 11 Anatolii Vorona 2012-11-19 17:53:53 UTC
Created attachment 647915 [details]
File: var_log_messages

Comment 12 Anatolii Vorona 2012-11-19 17:53:55 UTC
Created attachment 647916 [details]
File: open_fds

Comment 13 Fedora Admin XMLRPC Client 2013-04-25 11:38:04 UTC
This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.

Comment 14 Tomáš Hozza 2013-05-15 18:56:45 UTC
*** Bug 919637 has been marked as a duplicate of this bug. ***

Comment 15 Tomáš Hozza 2013-05-15 19:01:34 UTC
*** Bug 919710 has been marked as a duplicate of this bug. ***

Comment 16 Tomáš Hozza 2013-05-17 07:06:51 UTC
The issue is caused by a mistake in the patch added some time ago [1].

It appears that timing is critical for this issue to occur. host/nslookup sends
a UDP DNS query and starts a timer. The timer handler is
"connect_timeout()". When an answer is received, "recv_done()" is called, which
ends by calling "clear_query(query)". In some circumstances the "query"
passed is lookup->current_query, in which case lookup->current_query is
freed and set to NULL. It gets interesting when the timeout then expires
and "connect_timeout()" is called.

<snip from connect_timeout()>
...
l = event->ev_arg;
query = l->current_query;  /* this is NULL */
...
<snip>
...
} else {
		fputs(l->cmdline, stdout);
		if (!next_origin(query)) {    /* <- query is NULL */
			printf(";; connection timed out; no servers could be "
			       "reached\n");
		} else {
			printf(";; connection timed out; trying next "
			       "origin\n");
		}
...

However, there are situations in which the timeout handler is called before
current_query is freed and set to NULL, and then it works.

The lookup structure is protected by a mutex, so it looks like the issue depends
on how the system schedules threads and which thread locks the lookup structure
first.

But this is expected behaviour (I think), since "connect_timeout()"
checks whether query (current_query) is NULL. Later on, when retrying to send
the queries once more, ISC_LIST_HEAD(l->q) is used instead of current_query.
l->q is a list of queries, and when it is empty the whole lookup structure
is destroyed, along with its timer if there is one. So there should never be
a situation in which the timeout handler is called and the query list is empty.

From what I tested, using "ISC_LIST_HEAD(l->q)" instead of "query" when calling
next_origin() works well. It is also good to mention that for the next_origin()
function it is irrelevant with which query it is called. The parameter is used
only to get to the "parent" lookup structure pointer which is the same for all
queries in the list.
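
A minimal sketch of the idea described above, not the actual patch. The
structs are simplified stand-ins for the lookup and query structures in
dighost.c. The "next" link, the "origins_left" counter and the helper
"timeout_try_next_origin()" are hypothetical names made up for illustration,
and a plain pointer plays the role of ISC_LIST_HEAD(l->q).

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the lookup/query structures in dighost.c.
 * Only current_query and q come from the discussion; the other fields
 * are assumptions made for this sketch. */
struct lookup;

struct query {
	struct lookup *lookup;  /* parent lookup, the same for every query */
	struct query *next;     /* hypothetical list link */
};

struct lookup {
	struct query *current_query;  /* may already be NULL after clear_query() */
	struct query *q;              /* head of the query list */
	int origins_left;             /* hypothetical: search origins still untried */
};

/* Stand-in for next_origin(): the query is used only to reach the
 * parent lookup, so any query from the list serves equally well. */
static int next_origin(struct query *query) {
	assert(query != NULL);  /* the original crash was a NULL dereference here */
	struct lookup *l = query->lookup;
	if (l->origins_left == 0)
		return 0;
	l->origins_left--;
	return 1;
}

/* Timeout path: take the list head instead of current_query, which
 * the receive path may have freed and set to NULL in the meantime. */
static int timeout_try_next_origin(struct lookup *l) {
	struct query *head = l->q;  /* plays the role of ISC_LIST_HEAD(l->q) */
	if (head == NULL)
		return 0;  /* defensive only; the list should not be empty here */
	return next_origin(head);
}
```

With current_query set to NULL and one query still on the list, calling
timeout_try_next_origin() reaches the parent lookup through the list head
instead of dereferencing a NULL pointer.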

[1] http://lists.fedoraproject.org/pipermail/scm-commits/2011-October/677202.html

Comment 17 Tomáš Hozza 2013-05-17 08:20:52 UTC
Fixed in:
bind-9.9.3-0.7.rc2.fc20
bind-9.9.3-0.7.rc2.fc19
bind-9.9.2-12.P2.fc18
bind-9.9.2-8.P2.fc17

Comment 18 Ville Skyttä 2013-05-22 08:22:32 UTC
(In reply to Tomas Hozza from comment #17)
> bind-9.9.2-12.P2.fc18

It seems that at least for this, only a koji build exists but no update has been submitted, is that on purpose?

Comment 19 Tomáš Hozza 2013-05-22 09:52:38 UTC
(In reply to Ville Skyttä from comment #18)
> (In reply to Tomas Hozza from comment #17)
> > bind-9.9.2-12.P2.fc18
> 
> It seems that at least for this, only a koji build exists but no update has
> been submitted, is that on purpose?

This is true. I'm waiting for bind-9.9.3 to be released to push an update in
bodhi. Currently there is 9.9.3rc2. So this is intentional and therefore the
Bug status is MODIFIED and not ON_QA.

Comment 20 Fedora Update System 2013-06-03 19:48:42 UTC
bind-dyndb-ldap-2.6-2.fc18,dnsperf-2.0.0.0-4.fc18,dhcp-4.2.5-12.fc18,bind-9.9.3-2.fc18 has been submitted as an update for Fedora 18.
https://admin.fedoraproject.org/updates/bind-dyndb-ldap-2.6-2.fc18,dnsperf-2.0.0.0-4.fc18,dhcp-4.2.5-12.fc18,bind-9.9.3-2.fc18

Comment 21 Fedora Update System 2013-06-03 19:52:19 UTC
dhcp-4.2.5-2.fc17,dnsperf-2.0.0.0-3.fc17,bind-dyndb-ldap-2.5-2.fc17,bind-9.9.3-2.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/dhcp-4.2.5-2.fc17,dnsperf-2.0.0.0-3.fc17,bind-dyndb-ldap-2.5-2.fc17,bind-9.9.3-2.fc17

Comment 22 Fedora Update System 2013-06-06 01:29:01 UTC
Package dhcp-4.2.5-2.fc17, dnsperf-2.0.0.0-3.fc17, bind-dyndb-ldap-2.5-2.fc17, bind-9.9.3-3.P1.fc17:
* should fix your issue,
* was pushed to the Fedora 17 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing dhcp-4.2.5-2.fc17 dnsperf-2.0.0.0-3.fc17 bind-dyndb-ldap-2.5-2.fc17 bind-9.9.3-3.P1.fc17'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2013-10100/dhcp-4.2.5-2.fc17,dnsperf-2.0.0.0-3.fc17,bind-dyndb-ldap-2.5-2.fc17,bind-9.9.3-3.P1.fc17
then log in and leave karma (feedback).

Comment 23 Fedora Update System 2013-06-18 01:29:11 UTC
bind-dyndb-ldap-2.6-2.fc18, dnsperf-2.0.0.0-4.fc18, dhcp-4.2.5-12.fc18, bind-9.9.3-3.P1.fc18 has been pushed to the Fedora 18 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 24 Fedora Update System 2013-06-18 01:36:32 UTC
dhcp-4.2.5-2.fc17, dnsperf-2.0.0.0-3.fc17, bind-dyndb-ldap-2.5-2.fc17, bind-9.9.3-3.P1.fc17 has been pushed to the Fedora 17 stable repository.  If problems still persist, please make note of it in this bug report.