| Summary: | bind hang on shutdown at pthread_join | | |
|---|---|---|---|
| Product: | Fedora | Reporter: | Cesar Eduardo Barros <cesarb> |
| Component: | bind | Assignee: | Adam Tkac <atkac> |
| Status: | CLOSED ERRATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | medium | Priority: | unspecified |
| Version: | 14 | CC: | ame.fedora, atkac, eddie, gjunk, ovasik, turchi, twaugh |
| Hardware: | x86_64 | OS: | Linux |
| Fixed In Version: | bind-9.7.4-0.1.b1.fc14 | Doc Type: | Bug Fix |
| Last Closed: | 2011-05-27 20:26:16 UTC | | |
Description
Cesar Eduardo Barros
2011-03-09 22:56:51 UTC
For future reference, someone else reporting a similar problem: https://lists.isc.org/pipermail/bind-workers/2002-June/000700.html

For me this happens every time on system shutdown, same version involved.

I see something similar on a fully updated 64-bit F14. Here's what I see:
shutdown ..
network goes offline
ntpd asks for an update (it looks like it traps the kill -15 and stays up trying to clean up)
named times out and gets an error; it won't shut down cleanly (it has received a kill -15, which it has trapped)
named waits, trying to clean up; it cannot clean up because the network is down.
ntpd finally exits (and stops nagging named)
shutdown is now delayed by about 4 minutes
eventually named gets killed harder (and finally exits)
====================== messages =======================
Mar 31 06:55:38 lap1 ntpd[2767]: 192.5.41.41 interface 10.10.10.70 -> (null)
Mar 31 06:55:43 lap1 avahi-daemon[1467]: Got SIGTERM, quitting.
Mar 31 06:55:43 lap1 avahi-daemon[1467]: Leaving mDNS multicast group on interface virbr0.IPv4 with address 192.168.122.1.
Mar 31 06:55:43 lap1 avahi-daemon[1467]: avahi-daemon 0.6.27 exiting.
Mar 31 06:55:43 lap1 ntpd[2767]: ntpd exiting on signal 15
Mar 31 06:55:43 lap1 ntpd[1546]: ntpd 4.2.6p3-RC10 Thu Nov 25 16:18:33 UTC 2010 (1)
Mar 31 06:55:43 lap1 ntpd[1549]: proto: precision = 0.839 usec
Mar 31 06:55:43 lap1 ntpd[1549]: 0.0.0.0 c01d 0d kern kernel time sync enabled
Mar 31 06:55:43 lap1 ntpd[1549]: Listen and drop on 0 v4wildcard 0.0.0.0 UDP 123
Mar 31 06:55:43 lap1 ntpd[1549]: Listen and drop on 1 v6wildcard :: UDP 123
Mar 31 06:55:43 lap1 ntpd[1549]: Listen normally on 2 lo 127.0.0.1 UDP 123
Mar 31 06:55:43 lap1 ntpd[1549]: Listen normally on 3 virbr0 192.168.122.1 UDP 123
Mar 31 06:55:43 lap1 ntpd[1549]: Listen normally on 4 wlan0 fe80::21f:3bff:fe27:f3c5 UDP 123
Mar 31 06:55:43 lap1 ntpd[1549]: Listen normally on 5 lo ::1 UDP 123
Mar 31 06:55:43 lap1 ntpd[1549]: Listening on routing socket on fd #22 for interface updates
Mar 31 06:55:44 lap1 named[1483]: error (network unreachable) resolving 'tock.usno.navy.mil/AAAA/IN': 10.10.10.63#53
Mar 31 06:55:44 lap1 named[1483]: error (network unreachable) resolving 'charon.nofs.navy.mil/A/IN': 10.10.10.63#53
... (loads of similar errors)
Mar 31 06:55:49 lap1 named[1483]: error (network unreachable) resolving 'i.root-servers.net/AAAA/IN': 2001:dc3::35#53
Mar 31 06:55:49 lap1 named[1483]: error (network unreachable) resolving 'l.root-servers.net/AAAA/IN': 2001:500:1::803f:235#53
Mar 31 06:55:49 lap1 ntpd[1549]: ntpd exiting on signal 15
Mar 31 06:55:49 lap1 named[1483]: received control channel command 'stop'
Mar 31 06:55:49 lap1 named[1483]: shutting down: flushing changes
Mar 31 06:55:49 lap1 named[1483]: stopping command channel on 127.0.0.1#953
Mar 31 06:55:49 lap1 named[1483]: stopping command channel on ::1#953
Mar 31 06:55:49 lap1 named[1483]: no longer listening on 127.0.0.1#53
Mar 31 06:56:13 lap1 dnsmasq[1991]: no servers found in /etc/resolv.conf, will retry
==================================================================
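One way to see where the shutdown stalls in an excerpt like the one above is to look for the longest pause between consecutive syslog timestamps. The following is only a rough sketch: the function name `largest_gap` is hypothetical, the year is an assumption (classic syslog lines do not record it), and wrapped continuation lines are simply skipped.

```python
from datetime import datetime


def largest_gap(lines, year=2011):
    """Return (seconds, earlier_line, later_line) for the longest pause, or None."""
    stamped = []
    for line in lines:
        try:
            # Classic syslog prefix, e.g. "Mar 31 06:55:49"; the year is assumed.
            ts = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # continuation line or separator, not a new log entry
        stamped.append((ts, line))

    best = None
    for (t0, l0), (t1, l1) in zip(stamped, stamped[1:]):
        gap = (t1 - t0).total_seconds()
        if best is None or gap > best[0]:
            best = (gap, l0, l1)
    return best


# A few lines copied from the excerpt, still wrapped as they were pasted.
sample = """\
Mar 31 06:55:49 lap1 ntpd[1549]: ntpd exiting on signal 15
Mar 31 06:55:49 lap1 named[1483]: shutting down: flushing changes
Mar 31 06:55:49 lap1 named[1483]: no longer listening on
127.0.0.1#53
Mar 31 06:56:13 lap1 dnsmasq[1991]: no servers found in
/etc/resolv.conf, will retry
""".splitlines()

gap, before, after = largest_gap(sample)
print(f"{gap:.0f}s of silence after: {before}")
```

On this excerpt the last named message is followed by 24 seconds with no log output at all, which is consistent with named having stopped listening but not yet exited.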
I note in the above log that, in spite of dnsmasq being chkconfig'ed off, something has started it (NM perhaps). Can this be a problem?

(In reply to comment #2)
> For me this happens every time on system shutdown, same version involved.

Does named also hang when you start your computer and then immediately shut it down? Or must the computer run for some time first?

Good observation! No, the computer must be running for a while. There is no problem with a reboot followed immediately by a shutdown.

Unfortunately, I'm not able to figure out why named hangs on shutdown from the backtrace alone.
Can you please try the following?
1. Put OPTIONS='-d99' in /etc/sysconfig/named
2. Put the following statement in your named.conf:
logging {
    channel default_debug {
        file "data/named.run" versions 3 size 2m;
        print-category yes;
        severity debug 99;
    };
};
3. Restart named
4. Wait until you reproduce the issue (for example, reboot the machine) and then please attach the /var/named/data/named.run, /var/named/data/named.run.1 and /var/named/data/named.run.2 files to this bug.
Thank you in advance.
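For step 4, the rotated debug logs can be gathered into a single archive before attaching them. This is a minimal sketch, not part of the original request: the helper name `bundle_debug_logs` and the output file name are hypothetical, and it assumes the rotated copies all match `named.run*` in /var/named/data.

```python
import tarfile
from pathlib import Path


def bundle_debug_logs(log_dir="/var/named/data",
                      out_path="named-debug-logs.tar.gz"):
    """Pack named.run and its rotated copies into one archive.

    Returns the file names that were added, in sorted order.
    """
    # Assumption: rotated copies share the "named.run" prefix (named.run.0, ...).
    logs = sorted(p for p in Path(log_dir).glob("named.run*") if p.is_file())
    with tarfile.open(out_path, "w:gz") as tar:
        for path in logs:
            # Store only the bare file name so the archive has no directory prefix.
            tar.add(path, arcname=path.name)
    return [path.name for path in logs]
```

Calling `bundle_debug_logs()` after reproducing the hang (with permission to read /var/named/data) writes the archive to the current directory and returns the list of files it picked up, which makes it easy to confirm nothing was missed.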
Created attachment 492031
named.run.1
Lines from before debug mode was enabled were removed
Created attachment 492034
named.run.0
Created attachment 492035
named.run
I enabled debug mode as requested, restarted named, rebooted the machine (the bug did not trigger here since it was only a few minutes since named was started), waited a few hours, and shut down the machine (the bug triggered here). Then today I turned on the machine, waited for it to connect to the wireless, waited a bit more, and copied the logs out from /var/named/data.

Created attachment 497851
named.run.0
Created attachment 497852
named.run.1
Created attachment 497853
named.run.2
Created attachment 497871
named.run.0
changed to plain text
Created attachment 497872
named.run.1
changed to plain text
Created attachment 497873
named.run.2
changed to plain text
I was able to reproduce this issue with the current 9.7.3 F14 build but not with the latest upstream release. I uploaded x86_64 test packages to http://atkac.fedorapeople.org/bind/rh683648. Can you please verify that they fix the issue? Make sure you update at least both the bind and bind-libs packages. Thank you in advance.

Thank you. I am installing these now and will report back after the end of the day. I will leave it running long enough to ensure the problem would ordinarily be certain to occur.

Initial long run test: the problem has gone away with the updated bind. I'll run for 12 hours and check again to make absolutely sure.

Confirmed after overnight test: this most certainly seems to have fixed the problem. Thanks! BTW, any reason not to push bind 9.8.0 for F14 testing instead of 9.7.4?

Could you also upload the bind-chroot package? If I were to install these I would have to remove bind-chroot, and that could change the result.

$ su -c 'yum localinstall http://atkac.fedorapeople.org/bind/rh683648/bind-9.7.4-0.0.b1.fc14.x86_64.rpm http://atkac.fedorapeople.org/bind/rh683648/bind-libs-9.7.4-0.0.b1.fc14.x86_64.rpm http://atkac.fedorapeople.org/bind/rh683648/bind-utils-9.7.4-0.0.b1.fc14.x86_64.rpm'
Password:
Loaded plugins: auto-update-debuginfo, langpacks, presto, refresh-packagekit
Adding pt_BR to language list
Setting up Local Package Process
bind-9.7.4-0.0.b1.fc14.x86_64.rpm | 3.9 MB 00:41
Examining /var/tmp/yum-root-r2rnZO/bind-9.7.4-0.0.b1.fc14.x86_64.rpm: 32:bind-9.7.4-0.0.b1.fc14.x86_64
Marking /var/tmp/yum-root-r2rnZO/bind-9.7.4-0.0.b1.fc14.x86_64.rpm as an update to 32:bind-9.7.3-1.fc14.x86_64
Found 8 installed debuginfo package(s)
[...]
bind-libs-9.7.4-0.0.b1.fc14.x86_64.rpm | 843 kB 00:09
Examining /var/tmp/yum-root-r2rnZO/bind-libs-9.7.4-0.0.b1.fc14.x86_64.rpm: 32:bind-libs-9.7.4-0.0.b1.fc14.x86_64
Marking /var/tmp/yum-root-r2rnZO/bind-libs-9.7.4-0.0.b1.fc14.x86_64.rpm as an update to 32:bind-libs-9.7.3-1.fc14.x86_64
bind-utils-9.7.4-0.0.b1.fc14.x86_64.rpm | 176 kB 00:03
Examining /var/tmp/yum-root-r2rnZO/bind-utils-9.7.4-0.0.b1.fc14.x86_64.rpm: 32:bind-utils-9.7.4-0.0.b1.fc14.x86_64
Marking /var/tmp/yum-root-r2rnZO/bind-utils-9.7.4-0.0.b1.fc14.x86_64.rpm as an update to 32:bind-utils-9.7.3-1.fc14.x86_64
Resolving Dependencies
--> Running transaction check
--> Processing Dependency: bind = 32:9.7.3-1.fc14 for package: 32:bind-chroot-9.7.3-1.fc14.x86_64
---> Package bind.x86_64 32:9.7.4-0.0.b1.fc14 set to be updated
---> Package bind-libs.x86_64 32:9.7.4-0.0.b1.fc14 set to be updated
---> Package bind-utils.x86_64 32:9.7.4-0.0.b1.fc14 set to be updated
--> Finished Dependency Resolution
Error: Package: 32:bind-chroot-9.7.3-1.fc14.x86_64 (@updates)
Requires: bind = 32:9.7.3-1.fc14
Removing: 32:bind-9.7.3-1.fc14.x86_64 (@updates)
bind = 32:9.7.3-1.fc14
Updated By: 32:bind-9.7.4-0.0.b1.fc14.x86_64 (/bind-9.7.4-0.0.b1.fc14.x86_64)
bind = 32:9.7.4-0.0.b1.fc14
Available: 32:bind-9.7.2-2.P2.fc14.x86_64 (fedora)
bind = 32:9.7.2-2.P2.fc14
You could try using --skip-broken to work around the problem
You could try running: rpm -Va --nofiles --nodigest

(In reply to comment #22)
> BTW - any reason not to push bind 9.8.0 for F14 testing instead of 9.7.4 ?

Rebasing to the next major release in a stable distribution is generally not a good idea because it can cause unexpected issues (for example, BIND 9.8 decreased the query timeout from 30 sec to 10 sec, which might cause issues on slow networks).

bind-9.7.4-0.1.b1.fc14 has been submitted as an update for Fedora 14.
https://admin.fedoraproject.org/updates/bind-9.7.4-0.1.b1.fc14

(In reply to comment #23)
> Could you also upload the bind-chroot package? If I were to install these I
> would have to remove bind-chroot, and that could change the result.

I submitted a new package as an update; you can test it when it lands in updates-testing.

OK, I'll ack it (I'm a proventester). Thanks for fixing this. OK on not going up to 9.8. I'd be happy with it, but I understand the minimal-disruption reasoning.

Thanks again for taking care of this ... I have had no issues since installing your updated version.

(In reply to comment #27)
> Thanks again for taking care of this ... I have had no issues since installing
> your updated version.

OK, thanks for the positive feedback!

Package bind-9.7.4-0.1.b1.fc14:
* should fix your issue,
* was pushed to the Fedora 14 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing bind-9.7.4-0.1.b1.fc14'
as soon as you are able to. Please go to the following url:
https://admin.fedoraproject.org/updates/bind-9.7.4-0.1.b1.fc14
then log in and leave karma (feedback).

Yesterday, in the same situation where the hang on shutdown always happened, it did not happen with bind-9.7.4-0.1.b1.fc14.x86_64.

bind-9.7.4-0.1.b1.fc14 has been pushed to the Fedora 14 stable repository. If problems still persist, please make note of it in this bug report.