Bug 683648 - bind hang on shutdown at pthread_join
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: bind
Version: 14
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Adam Tkac
QA Contact: Fedora Extras Quality Assurance
 
Reported: 2011-03-09 22:56 UTC by Cesar Eduardo Barros
Modified: 2013-04-30 23:48 UTC

Fixed In Version: bind-9.7.4-0.1.b1.fc14
Doc Type: Bug Fix
Last Closed: 2011-05-27 20:26:16 UTC


Attachments
named.run.1 (69.13 KB, application/x-gzip), 2011-04-14 10:42 UTC, Cesar Eduardo Barros
named.run.0 (124.18 KB, application/x-gzip), 2011-04-14 10:42 UTC, Cesar Eduardo Barros
named.run (86.92 KB, application/x-gzip), 2011-04-14 10:43 UTC, Cesar Eduardo Barros
named.run.0 (2.00 MB, application/octet-stream), 2011-05-09 16:02 UTC, gene c
named.run.1 (2.00 MB, application/octet-stream), 2011-05-09 16:03 UTC, gene c
named.run.2 (2.00 MB, application/octet-stream), 2011-05-09 16:03 UTC, gene c
named.run.0 (2.00 MB, text/plain), 2011-05-09 17:17 UTC, gene c
named.run.1 (2.00 MB, text/plain), 2011-05-09 17:18 UTC, gene c
named.run.2 (2.00 MB, text/plain), 2011-05-09 17:18 UTC, gene c

Description Cesar Eduardo Barros 2011-03-09 22:56:51 UTC
Description of problem:

Sometimes, when attempting to stop bind (service named stop, as would be done on shutdown), the named process becomes stuck at pthread_join and fails to exit.

Version-Release number of selected component (if applicable):

bind-9.7.3-1.fc14.x86_64

How reproducible:

Often. The hang seems to need named to have been running for several hours: if I kill the process manually, start it again with "service named start", and run "service named stop" immediately, the stop works.

Steps to Reproduce:
1. Boot laptop with bind installed and configured to start on boot (but with nothing actually using it).
2. Use the laptop for several hours.
3. Attempt to shut down the laptop.
  
Actual results:

The /etc/init.d/named script takes several seconds to give up, and then prints a umount error message, twice. The result is a noticeably slower shutdown.

Expected results:

It should stop quickly.

Additional info:

Here is what appears on the log for a stop attempt:

Mar  9 18:39:36 cesarb-inspiron named[1368]: received control channel command 'stop'
Mar  9 18:39:36 cesarb-inspiron named[1368]: shutting down: flushing changes
Mar  9 18:39:36 cesarb-inspiron named[1368]: stopping command channel on 127.0.0.1#953
Mar  9 18:39:36 cesarb-inspiron named[1368]: stopping command channel on ::1#953
Mar  9 18:39:36 cesarb-inspiron named[1368]: no longer listening on 127.0.0.1#53
Mar  9 18:39:36 cesarb-inspiron named[1368]: no longer listening on ::1#53

The process is stuck at this point. Attaching to it with gdb gives me this information (I saved a core dump with the gdb "gcore" command, so it should be possible to easily get more information even if I have difficulty reproducing the problem again):

$ su -c 'gdb -p 1368'
Password:
GNU gdb (GDB) Fedora (7.2-46.fc14)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Attaching to process 1368
Reading symbols from /usr/sbin/named...Reading symbols from /usr/lib/debug/usr/sbin/lwresd.debug...done.
done.
Reading symbols from /usr/lib64/liblwres.so.60...Reading symbols from /usr/lib/debug/usr/lib64/liblwres.so.60.0.1.debug...done.
done.
Loaded symbols for /usr/lib64/liblwres.so.60
Reading symbols from /usr/lib64/libdns.so.69...Reading symbols from /usr/lib/debug/usr/lib64/libdns.so.69.1.2.debug...done.
done.
Loaded symbols for /usr/lib64/libdns.so.69
Reading symbols from /usr/lib64/libbind9.so.60...Reading symbols from /usr/lib/debug/usr/lib64/libbind9.so.60.0.4.debug...done.
done.
Loaded symbols for /usr/lib64/libbind9.so.60
Reading symbols from /usr/lib64/libisccfg.so.62...Reading symbols from /usr/lib/debug/usr/lib64/libisccfg.so.62.0.0.debug...done.
done.
Loaded symbols for /usr/lib64/libisccfg.so.62
Reading symbols from /lib64/libgssapi_krb5.so.2...Reading symbols from /usr/lib/debug/lib64/libgssapi_krb5.so.2.2.debug...done.
done.
Loaded symbols for /lib64/libgssapi_krb5.so.2
Reading symbols from /lib64/libkrb5.so.3...Reading symbols from /usr/lib/debug/lib64/libkrb5.so.3.3.debug...done.
done.
Loaded symbols for /lib64/libkrb5.so.3
Reading symbols from /lib64/libk5crypto.so.3...Reading symbols from /usr/lib/debug/lib64/libk5crypto.so.3.1.debug...done.
done.
Loaded symbols for /lib64/libk5crypto.so.3
Reading symbols from /lib64/libcom_err.so.2...Reading symbols from /usr/lib/debug/lib64/libcom_err.so.2.1.debug...done.
done.
Loaded symbols for /lib64/libcom_err.so.2
Reading symbols from /lib64/libcrypto.so.10...Reading symbols from /usr/lib/debug/lib64/libcrypto.so.1.0.0d.debug...done.
done.
Loaded symbols for /lib64/libcrypto.so.10
Reading symbols from /usr/lib64/libisccc.so.60...Reading symbols from /usr/lib/debug/usr/lib64/libisccc.so.60.0.0.debug...done.
done.
Loaded symbols for /usr/lib64/libisccc.so.60
Reading symbols from /usr/lib64/libisc.so.62...Reading symbols from /usr/lib/debug/usr/lib64/libisc.so.62.1.1.debug...done.
done.
Loaded symbols for /usr/lib64/libisc.so.62
Reading symbols from /lib64/libcap.so.2...Reading symbols from /usr/lib/debug/lib64/libcap.so.2.17.debug...done.
done.
Loaded symbols for /lib64/libcap.so.2
Reading symbols from /lib64/libpthread.so.0...Reading symbols from /usr/lib/debug/lib64/libpthread-2.13.so.debug...done.
[Thread debugging using libthread_db enabled]
[New Thread 0x7f32fcda4700 (LWP 1374)]
[New Thread 0x7f32fd5a5700 (LWP 1373)]
[New Thread 0x7f32fdda6700 (LWP 1372)]
[New Thread 0x7f32fe5a7700 (LWP 1371)]
[New Thread 0x7f32feda8700 (LWP 1370)]
[New Thread 0x7f32ff5a9700 (LWP 1369)]
done.
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /usr/lib64/libxml2.so.2...Reading symbols from /usr/lib/debug/usr/lib64/libxml2.so.2.7.7.debug...done.
done.
Loaded symbols for /usr/lib64/libxml2.so.2
Reading symbols from /lib64/libz.so.1...Reading symbols from /usr/lib/debug/lib64/libz.so.1.2.5.debug...done.
done.
Loaded symbols for /lib64/libz.so.1
Reading symbols from /lib64/libm.so.6...Reading symbols from /usr/lib/debug/lib64/libm-2.13.so.debug...done.
done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/libc.so.6...Reading symbols from /usr/lib/debug/lib64/libc-2.13.so.debug...done.
done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/libkrb5support.so.0...Reading symbols from /usr/lib/debug/lib64/libkrb5support.so.0.1.debug...done.
done.
Loaded symbols for /lib64/libkrb5support.so.0
Reading symbols from /lib64/libdl.so.2...Reading symbols from /usr/lib/debug/lib64/libdl-2.13.so.debug...done.
done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/libkeyutils.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libkeyutils.so.1
Reading symbols from /lib64/libresolv.so.2...Reading symbols from /usr/lib/debug/lib64/libresolv-2.13.so.debug...done.
done.
Loaded symbols for /lib64/libresolv.so.2
Reading symbols from /lib64/ld-linux-x86-64.so.2...Reading symbols from /usr/lib/debug/lib64/ld-2.13.so.debug...done.
done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib64/libattr.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libattr.so.1
Reading symbols from /lib64/libselinux.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libselinux.so.1
Reading symbols from /lib64/libnss_files.so.2...Reading symbols from /usr/lib/debug/lib64/libnss_files-2.13.so.debug...done.
done.
Loaded symbols for /lib64/libnss_files.so.2
0x00007f3300f96de5 in pthread_join (threadid=139857009219328, 
    thread_return=0x0) at pthread_join.c:89
89	    lll_wait_tid (pd->tid);
Missing separate debuginfos, use: debuginfo-install keyutils-libs-1.2-6.fc12.x86_64 libattr-2.4.44-6.fc14.x86_64 libselinux-2.0.96-6.fc14.1.x86_64
(gdb) info threads
  7 Thread 0x7f32ff5a9700 (LWP 1369)  pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
  6 Thread 0x7f32feda8700 (LWP 1370)  pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
  5 Thread 0x7f32fe5a7700 (LWP 1371)  pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
  4 Thread 0x7f32fdda6700 (LWP 1372)  pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
  3 Thread 0x7f32fd5a5700 (LWP 1373)  pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
  2 Thread 0x7f32fcda4700 (LWP 1374)  0x00007f33004e4193 in epoll_wait ()
    at ../sysdeps/unix/syscall-template.S:82
* 1 Thread 0x7f33030977e0 (LWP 1368)  0x00007f3300f96de5 in pthread_join (
    threadid=139857009219328, thread_return=0x0) at pthread_join.c:89
(gdb) bt
#0  0x00007f3300f96de5 in pthread_join (threadid=139857009219328, 
    thread_return=0x0) at pthread_join.c:89
#1  0x00007f33013db0c4 in isc__taskmgr_destroy (managerp=0x7f3303338e70)
    at task.c:1387
#2  0x00007f33030e770f in destroy_managers (argc=<value optimized out>, 
    argv=0x7fffc7fb0858) at ./main.c:633
#3  cleanup (argc=<value optimized out>, argv=0x7fffc7fb0858) at ./main.c:863
#4  main (argc=<value optimized out>, argv=0x7fffc7fb0858) at ./main.c:1066
(gdb) thread 2
[Switching to thread 2 (Thread 0x7f32fcda4700 (LWP 1374))]#0  0x00007f33004e4193 in epoll_wait () at ../sysdeps/unix/syscall-template.S:82
82	T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
(gdb) bt
#0  0x00007f33004e4193 in epoll_wait ()
    at ../sysdeps/unix/syscall-template.S:82
#1  0x00007f33013ec6ac in watcher (uap=0x7f3303061010) at socket.c:3721
#2  0x00007f3300f95ccb in start_thread (arg=0x7f32fcda4700)
    at pthread_create.c:301
#3  0x00007f33004e3c2d in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
(gdb) thread 3
[Switching to thread 3 (Thread 0x7f32fd5a5700 (LWP 1373))]#0  pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
162	62:	movl	(%rsp), %edi
(gdb) bt
#0  pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007f33013de015 in run (uap=0x7f330305f010) at timer.c:832
#2  0x00007f3300f95ccb in start_thread (arg=0x7f32fd5a5700)
    at pthread_create.c:301
#3  0x00007f33004e3c2d in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
(gdb) thread 4
[Switching to thread 4 (Thread 0x7f32fdda6700 (LWP 1372))]#0  pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
162	62:	movl	(%rsp), %edi
(gdb) bt
#0  pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007f33013db241 in dispatch (uap=0x7f330305d010) at task.c:961
#2  run (uap=0x7f330305d010) at task.c:1158
#3  0x00007f3300f95ccb in start_thread (arg=0x7f32fdda6700)
    at pthread_create.c:301
#4  0x00007f33004e3c2d in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
(gdb) thread 5
[Switching to thread 5 (Thread 0x7f32fe5a7700 (LWP 1371))]#0  pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
162	62:	movl	(%rsp), %edi
(gdb) bt
#0  pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007f33013db241 in dispatch (uap=0x7f330305d010) at task.c:961
#2  run (uap=0x7f330305d010) at task.c:1158
#3  0x00007f3300f95ccb in start_thread (arg=0x7f32fe5a7700)
    at pthread_create.c:301
#4  0x00007f33004e3c2d in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
(gdb) thread 6
[Switching to thread 6 (Thread 0x7f32feda8700 (LWP 1370))]#0  pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
162	62:	movl	(%rsp), %edi
(gdb) bt
#0  pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007f33013db241 in dispatch (uap=0x7f330305d010) at task.c:961
#2  run (uap=0x7f330305d010) at task.c:1158
#3  0x00007f3300f95ccb in start_thread (arg=0x7f32feda8700)
    at pthread_create.c:301
#4  0x00007f33004e3c2d in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
(gdb) thread 7
[Switching to thread 7 (Thread 0x7f32ff5a9700 (LWP 1369))]#0  pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
162	62:	movl	(%rsp), %edi
(gdb) bt
#0  pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007f33013db241 in dispatch (uap=0x7f330305d010) at task.c:961
#2  run (uap=0x7f330305d010) at task.c:1158
#3  0x00007f3300f95ccb in start_thread (arg=0x7f32ff5a9700)
    at pthread_create.c:301
#4  0x00007f33004e3c2d in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
(gdb)
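
A note on reading the session above. The argument to pthread_join in frame #0, threadid=139857009219328, is 0x7f32ff5a9700 in hex, which is the address gdb lists for thread 7 (LWP 1369). So the main thread is waiting for a task worker that is itself parked in pthread_cond_wait inside dispatch(). This report does not establish why that worker was never woken. The sketch below is only a toy model of the general pattern, written in Python with the threading module standing in for pthreads. WorkerPool, destroy and wake_workers are made-up names, not BIND's API, and the join timeout exists only so the demo terminates (pthread_join has no timeout).

```python
import threading

# pthread_join's argument from frame #0, as printed by gdb (decimal).
JOINED_THREADID = 139857009219328
# In hex this is the address gdb shows for thread 7 (LWP 1369).
assert hex(JOINED_THREADID) == "0x7f32ff5a9700"


class WorkerPool:
    """Toy stand-in for a task manager. NOT BIND's implementation."""

    def __init__(self, nworkers):
        self.cond = threading.Condition()
        self.exiting = False
        # Daemon threads, so the demo process can exit even if workers stay stuck.
        self.threads = [
            threading.Thread(target=self._dispatch, daemon=True)
            for _ in range(nworkers)
        ]
        for t in self.threads:
            t.start()

    def _dispatch(self):
        # Workers park on the condition variable until told to exit,
        # like the threads sitting in pthread_cond_wait in the backtraces.
        with self.cond:
            while not self.exiting:
                self.cond.wait()

    def destroy(self, wake_workers=True, timeout=0.2):
        """Return True if every worker could be joined within the timeout."""
        if wake_workers:
            with self.cond:
                self.exiting = True
                self.cond.notify_all()
        for t in self.threads:
            t.join(timeout)  # demo-only timeout; a plain join would block forever
        return not any(t.is_alive() for t in self.threads)


if __name__ == "__main__":
    print(WorkerPool(4).destroy(wake_workers=True))   # workers are woken and joined
    print(WorkerPool(4).destroy(wake_workers=False))  # nobody wakes them, join gives up
```

Calling destroy(wake_workers=False) models what the backtraces look like: every worker is still waiting on the condition variable and the joiner cannot finish.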

Comment 1 Cesar Eduardo Barros 2011-03-10 00:56:18 UTC
For future reference, someone else reporting a similar problem: https://lists.isc.org/pipermail/bind-workers/2002-June/000700.html

Comment 2 Gabriele Turchi 2011-03-12 09:43:44 UTC
For me this happens every time on system shutdown, same version involved.

Comment 3 gene c 2011-03-31 17:05:16 UTC
I see something similar on a fully updated 64-bit F14. Here's what I see:


   shutdown ...

   network goes offline
   ntpd asks for an update (it looks like it traps the kill -15
         and stays up trying to clean up)
   named times out and gets an error; it won't shut down cleanly
           (it has received a kill -15, which it has trapped)
   named waits, trying to clean up; it cannot, because the network is down
   ntpd finally exits (and stops nagging named)

  shutdown is now delayed by about 4 minutes
       eventually named gets killed harder (and finally exits)

====================== messages =======================
Mar 31 06:55:38 lap1 ntpd[2767]: 192.5.41.41 interface 10.10.10.70 -> (null)
Mar 31 06:55:43 lap1 avahi-daemon[1467]: Got SIGTERM, quitting.
Mar 31 06:55:43 lap1 avahi-daemon[1467]: Leaving mDNS multicast group on interface virbr0.IPv4 with address 192.168.122.1.
Mar 31 06:55:43 lap1 avahi-daemon[1467]: avahi-daemon 0.6.27 exiting.
Mar 31 06:55:43 lap1 ntpd[2767]: ntpd exiting on signal 15
Mar 31 06:55:43 lap1 ntpd[1546]: ntpd 4.2.6p3-RC10 Thu Nov 25 16:18:33 UTC 2010 (1)
Mar 31 06:55:43 lap1 ntpd[1549]: proto: precision = 0.839 usec
Mar 31 06:55:43 lap1 ntpd[1549]: 0.0.0.0 c01d 0d kern kernel time sync enabled
Mar 31 06:55:43 lap1 ntpd[1549]: Listen and drop on 0 v4wildcard 0.0.0.0 UDP 123
Mar 31 06:55:43 lap1 ntpd[1549]: Listen and drop on 1 v6wildcard :: UDP 123
Mar 31 06:55:43 lap1 ntpd[1549]: Listen normally on 2 lo 127.0.0.1 UDP 123
Mar 31 06:55:43 lap1 ntpd[1549]: Listen normally on 3 virbr0 192.168.122.1 UDP 123
Mar 31 06:55:43 lap1 ntpd[1549]: Listen normally on 4 wlan0 fe80::21f:3bff:fe27:f3c5 UDP 123
Mar 31 06:55:43 lap1 ntpd[1549]: Listen normally on 5 lo ::1 UDP 123
Mar 31 06:55:43 lap1 ntpd[1549]: Listening on routing socket on fd #22 for interface updates
Mar 31 06:55:44 lap1 named[1483]: error (network unreachable) resolving 'tock.usno.navy.mil/AAAA/IN': 10.10.10.63#53
Mar 31 06:55:44 lap1 named[1483]: error (network unreachable) resolving 'charon.nofs.navy.mil/A/IN': 10.10.10.63#53


... (loads of similar errors)


Mar 31 06:55:49 lap1 named[1483]: error (network unreachable) resolving 'i.root-servers.net/AAAA/IN': 2001:dc3::35#53
Mar 31 06:55:49 lap1 named[1483]: error (network unreachable) resolving 'l.root-servers.net/AAAA/IN': 2001:500:1::803f:235#53
Mar 31 06:55:49 lap1 ntpd[1549]: ntpd exiting on signal 15
Mar 31 06:55:49 lap1 named[1483]: received control channel command 'stop'
Mar 31 06:55:49 lap1 named[1483]: shutting down: flushing changes
Mar 31 06:55:49 lap1 named[1483]: stopping command channel on 127.0.0.1#953
Mar 31 06:55:49 lap1 named[1483]: stopping command channel on ::1#953
Mar 31 06:55:49 lap1 named[1483]: no longer listening on 127.0.0.1#53
Mar 31 06:56:13 lap1 dnsmasq[1991]: no servers found in /etc/resolv.conf, will retry


==================================================================

Comment 4 gene c 2011-03-31 18:11:33 UTC
I note in the above log that, in spite of dnsmasq being chkconfig'ed off, something has started it (NM perhaps) ...

Could this be a problem?

Comment 5 Adam Tkac 2011-04-13 11:27:01 UTC
(In reply to comment #2)
> For me this happens every time on system shutdown, same version involved.

Does named also hang when you start your computer and then immediately shut it down? Or must the computer run for some time first?

Comment 6 gene c 2011-04-13 11:35:16 UTC
Good observation!

No - the computer must be running a while - there is no problem with a reboot followed immediately by a shutdown.

Comment 7 Adam Tkac 2011-04-13 12:12:05 UTC
Unfortunately, I'm not able to figure out why named hangs on shutdown from the backtrace alone.

Can you please try the following?

1. Put OPTIONS='-d99' in /etc/sysconfig/named
2. Add the following statement to your named.conf:

logging {
        channel default_debug {
                file "data/named.run" versions 3 size 2m;
                print-category yes;
                severity debug 99;
        };
};

3. Restart named
4. Wait until you reproduce the issue (for example, reboot the machine) and then please attach the /var/named/data/named.run, /var/named/data/named.run.1 and /var/named/data/named.run.2 files to this bug.

Thank you in advance.
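
For anyone following the steps above, the logs can be gathered in one go. This is a minimal sketch, assuming the log directory is /var/named/data as in step 4 and that the live and rotated files share the named.run prefix (the attachments on this bug are named.run, named.run.0, named.run.1 and named.run.2). The function name and the archive name are illustrative only.

```python
import tarfile
from pathlib import Path


def bundle_named_logs(log_dir="/var/named/data", out="named-debug-logs.tar.gz"):
    """Pack named.run and its rotated copies into one gzip'ed tarball.

    Returns the list of file names that were added, in sorted order.
    """
    # Matches named.run, named.run.0, named.run.1, ... in log_dir only.
    logs = sorted(Path(log_dir).glob("named.run*"))
    with tarfile.open(str(out), "w:gz") as tar:
        for path in logs:
            # Store bare file names so the archive unpacks into one flat directory.
            tar.add(str(path), arcname=path.name)
    return [path.name for path in logs]


if __name__ == "__main__":
    # Reading /var/named/data normally needs root.
    print(bundle_named_logs())
```

Copy the files out before restarting named, since each rotation shifts the numbered copies and the oldest one is dropped.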

Comment 8 Cesar Eduardo Barros 2011-04-14 10:42:02 UTC
Created attachment 492031
named.run.1

Lines from before debug mode was enabled were removed.

Comment 9 Cesar Eduardo Barros 2011-04-14 10:42:37 UTC
Created attachment 492034
named.run.0

Comment 10 Cesar Eduardo Barros 2011-04-14 10:43:05 UTC
Created attachment 492035
named.run

Comment 11 Cesar Eduardo Barros 2011-04-14 10:48:58 UTC
I enabled debug mode as requested, restarted named, rebooted the machine (the bug did not trigger here since it was only a few minutes since named was started), waited a few hours, and shut down the machine (the bug triggered here). Then today I turned on the machine, waited for it to connect to the wireless, waited a bit more, and copied the logs out from /var/named/data.

Comment 12 gene c 2011-05-09 16:02:15 UTC
Created attachment 497851
named.run.0

Comment 13 gene c 2011-05-09 16:03:03 UTC
Created attachment 497852
named.run.1

Comment 14 gene c 2011-05-09 16:03:24 UTC
Created attachment 497853
named.run.2

Comment 15 gene c 2011-05-09 17:17:35 UTC
Created attachment 497871
named.run.0

changed to plain text

Comment 16 gene c 2011-05-09 17:18:14 UTC
Created attachment 497872
named.run.1

changed to plain text

Comment 17 gene c 2011-05-09 17:18:40 UTC
Created attachment 497873
named.run.2

changed to plain text

Comment 18 Adam Tkac 2011-05-12 08:18:48 UTC
I was able to reproduce this issue with the current 9.7.3 F14 build but not with the latest upstream release.

I uploaded x86_64 test packages to http://atkac.fedorapeople.org/bind/rh683648; can you please verify that they fix the issue? Make sure you update at least the bind and bind-libs packages. Thank you in advance.

Comment 19 gene c 2011-05-12 15:04:08 UTC
Thank you.

I am installing these now and will report back at the end of the day. I will leave it running long enough to ensure the problem would ordinarily be certain to occur.

Comment 20 gene c 2011-05-12 23:58:11 UTC
Initial long run test - the problem has gone away with the updated bind.
I'll run for 12 hours and check again to make absolutely sure.

Comment 21 gene c 2011-05-13 12:40:57 UTC
Confirmed after overnight test - this most certainly seems to have fixed the problem.

Thanks!

Comment 22 gene c 2011-05-13 20:04:03 UTC
BTW - any reason not to push bind 9.8.0 for F14 testing instead of 9.7.4 ?

Comment 23 Cesar Eduardo Barros 2011-05-14 16:35:40 UTC
Could you also upload the bind-chroot package? If I were to install these I would have to remove bind-chroot, and that could change the result.

$ su -c 'yum localinstall http://atkac.fedorapeople.org/bind/rh683648/bind-9.7.4-0.0.b1.fc14.x86_64.rpm http://atkac.fedorapeople.org/bind/rh683648/bind-libs-9.7.4-0.0.b1.fc14.x86_64.rpm http://atkac.fedorapeople.org/bind/rh683648/bind-utils-9.7.4-0.0.b1.fc14.x86_64.rpm'
Password:
Loaded plugins: auto-update-debuginfo, langpacks, presto, refresh-packagekit
Adding pt_BR to language list
Setting up Local Package Process
bind-9.7.4-0.0.b1.fc14.x86_64.rpm                        | 3.9 MB     00:41     
Examining /var/tmp/yum-root-r2rnZO/bind-9.7.4-0.0.b1.fc14.x86_64.rpm: 32:bind-9.7.4-0.0.b1.fc14.x86_64
Marking /var/tmp/yum-root-r2rnZO/bind-9.7.4-0.0.b1.fc14.x86_64.rpm as an update to 32:bind-9.7.3-1.fc14.x86_64
Found 8 installed debuginfo package(s)
[...]
bind-libs-9.7.4-0.0.b1.fc14.x86_64.rpm                   | 843 kB     00:09     
Examining /var/tmp/yum-root-r2rnZO/bind-libs-9.7.4-0.0.b1.fc14.x86_64.rpm: 32:bind-libs-9.7.4-0.0.b1.fc14.x86_64
Marking /var/tmp/yum-root-r2rnZO/bind-libs-9.7.4-0.0.b1.fc14.x86_64.rpm as an update to 32:bind-libs-9.7.3-1.fc14.x86_64
bind-utils-9.7.4-0.0.b1.fc14.x86_64.rpm                  | 176 kB     00:03     
Examining /var/tmp/yum-root-r2rnZO/bind-utils-9.7.4-0.0.b1.fc14.x86_64.rpm: 32:bind-utils-9.7.4-0.0.b1.fc14.x86_64
Marking /var/tmp/yum-root-r2rnZO/bind-utils-9.7.4-0.0.b1.fc14.x86_64.rpm as an update to 32:bind-utils-9.7.3-1.fc14.x86_64
Resolving Dependencies
--> Running transaction check
--> Processing Dependency: bind = 32:9.7.3-1.fc14 for package: 32:bind-chroot-9.7.3-1.fc14.x86_64
---> Package bind.x86_64 32:9.7.4-0.0.b1.fc14 set to be updated
---> Package bind-libs.x86_64 32:9.7.4-0.0.b1.fc14 set to be updated
---> Package bind-utils.x86_64 32:9.7.4-0.0.b1.fc14 set to be updated
--> Finished Dependency Resolution
Error: Package: 32:bind-chroot-9.7.3-1.fc14.x86_64 (@updates)
           Requires: bind = 32:9.7.3-1.fc14
           Removing: 32:bind-9.7.3-1.fc14.x86_64 (@updates)
               bind = 32:9.7.3-1.fc14
           Updated By: 32:bind-9.7.4-0.0.b1.fc14.x86_64 (/bind-9.7.4-0.0.b1.fc14.x86_64)
               bind = 32:9.7.4-0.0.b1.fc14
           Available: 32:bind-9.7.2-2.P2.fc14.x86_64 (fedora)
               bind = 32:9.7.2-2.P2.fc14
 You could try using --skip-broken to work around the problem
 You could try running: rpm -Va --nofiles --nodigest
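
The dependency error above comes down to an exact-version requirement: bind-chroot 9.7.3-1.fc14 requires bind = 32:9.7.3-1.fc14, while the test package provides bind = 32:9.7.4-0.0.b1.fc14. Below is a small sketch of that check, with the strings copied from the yum output. parse_evr and satisfies_exact are hypothetical helpers, not part of yum or rpm; real RPM version ordering is more involved, and only equality matters here.

```python
def parse_evr(evr):
    """Split 'epoch:version-release' into (epoch, version, release)."""
    epoch, colon, rest = evr.partition(":")
    if not colon:
        # No explicit epoch in the string: treat it as 0.
        epoch, rest = "0", evr
    version, dash, release = rest.rpartition("-")
    if not dash:
        # No release part at all.
        version, release = rest, ""
    return int(epoch), version, release


def satisfies_exact(required, provided):
    """True if a 'name = EVR' requirement is met by the provided EVR."""
    return parse_evr(required) == parse_evr(provided)


if __name__ == "__main__":
    required = "32:9.7.3-1.fc14"       # what bind-chroot-9.7.3-1.fc14 requires
    provided = "32:9.7.4-0.0.b1.fc14"  # what the test bind package provides
    print(parse_evr(required))
    print(satisfies_exact(required, provided))
```

Because the requirement is an equality, any newer bind fails it until a matching bind-chroot build is installed in the same transaction.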

Comment 24 Adam Tkac 2011-05-17 09:14:26 UTC
(In reply to comment #22)
> BTW - any reason not to push bind 9.8.0 for F14 testing instead of 9.7.4 ?

Rebasing to the next major release in a stable distribution is generally not a good idea because it can cause unexpected issues (for example, BIND 9.8 decreased the query timeout from 30 sec to 10 sec, which might cause issues on slow networks).

Comment 25 Fedora Update System 2011-05-17 09:27:48 UTC
bind-9.7.4-0.1.b1.fc14 has been submitted as an update for Fedora 14.
https://admin.fedoraproject.org/updates/bind-9.7.4-0.1.b1.fc14

Comment 26 Adam Tkac 2011-05-17 09:29:03 UTC
(In reply to comment #23)
> Could you also upload the bind-chroot package? If I were to install these I
> would have to remove bind-chroot, and that could change the result.

I submitted a new package as an update; you can test it when it lands in updates-testing.

Comment 27 gene c 2011-05-17 12:02:25 UTC
OK - I'll ack it (I'm a proventester) - thanks for fixing this.
OK on not going up to 9.8 - I'd be happy with it but understand the wish for minimal disruption.

Thanks again for taking care of this ...  I have had no issues since installing your updated version.

Comment 28 Adam Tkac 2011-05-17 12:09:21 UTC
(In reply to comment #27)
> Thanks again for taking care of this ...  I have had no issues since installing
> your updated version.

OK, thanks for the positive feedback!

Comment 29 Fedora Update System 2011-05-17 20:55:30 UTC
Package bind-9.7.4-0.1.b1.fc14:
* should fix your issue,
* was pushed to the Fedora 14 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing bind-9.7.4-0.1.b1.fc14'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/bind-9.7.4-0.1.b1.fc14
then log in and leave karma (feedback).

Comment 30 Cesar Eduardo Barros 2011-05-19 10:45:37 UTC
Yesterday, in the same situation where the hang on shutdown always happened, it did not happen with bind-9.7.4-0.1.b1.fc14.x86_64.

Comment 31 Fedora Update System 2011-05-27 20:26:09 UTC
bind-9.7.4-0.1.b1.fc14 has been pushed to the Fedora 14 stable repository.  If problems still persist, please make note of it in this bug report.

