Bug 709205
Summary: | bind 9.7.4 chews 100% CPU
---|---
Product: | [Fedora] Fedora
Reporter: | Ian Donaldson <idonaldson0>
Component: | bind
Assignee: | Tomáš Hozza <thozza>
Status: | CLOSED ERRATA
QA Contact: | Fedora Extras Quality Assurance <extras-qa>
Severity: | high
Priority: | unspecified
Version: | 18
CC: | aad, al.dunsmuir, bill-bugzilla.redhat.com, bugzilla.redhat.com, bugzilla.redhat, dan, dc, deron.meranda, fedora, giomac, gustavo.farias.santos, higkoohk, jesus, john.haxby, j, kas, mcepl, mcepl, me, mihai, nicku, niki.guldbrand, nitind, ovasik, pmatilai, pomec, redhat-bugzilla, redhat, rrauenza, ruth, silfreed, thub, webmaster
Hardware: | All
OS: | Linux
Fixed In Version: | dhcp-4.2.5-2.fc17
Doc Type: | Bug Fix
Last Closed: | 2013-06-18 01:28:28 UTC

Description
Ian Donaldson
2011-05-31 04:39:22 UTC
I should add that named *is* returning answers OK to queries, and recursion and zone transfers are only enabled for restricted hosts.

Backing out to previous version...

# ls bind*
bind-9.7.3-1.fc14.i686.rpm bind-utils-9.7.3-1.fc14.i686.rpm bind-libs-9.7.3-1.fc14.i686.rpm
# rpm -Uvh --oldpackage bind*
...

and now named doesn't show in top anymore.

I'm not able to reproduce this issue; named on my machine works fine (i.e. CPU utilization is the same with both versions). Can you please attach your named.conf? Please strip all private data.

Attached is named.conf minus private data. The original has 73 zones: 72 slaves, 1 master.

Created attachment 502226 [details]
named.conf
Unfortunately I'm still unable to reproduce this issue. Does named consume 100% CPU immediately when it is started? Can you run named the following way, please (as root)?

$ strace -f -s 1024 -o stracelog named -gunamed -d99 > log 2>&1

When named starts to consume 100% CPU, simply terminate it and attach the "log" and "stracelog" files here (you can bzip2 them if they are too big). Thank you in advance.

I've set up this FC14 9.7.4 bind on another (x86_64) FC14 server and tried to reproduce it. Started with the stock named.conf; no problem. Then I progressively put in bits of my production named.conf. It was all happy until I commented out this line, at which point named started going nuts at startup (without any requests even being sent to it):

managed-keys-directory "/var/named/dynamic";

My production named.conf doesn't have this line. I've confirmed that the stock named.conf from bind-9.7.4-0.1.b1.fc14.x86_64 with the above line commented out will produce the problem every time.

I can confirm the above: without the managed-keys-directory setting (which my configuration didn't have), bind-9.7.4-0.1.b1.fc14.x86_64 goes crazy eating CPU cycles like there's no tomorrow. Adding that line makes things normal again.

Ah, that was the trick: removal of the managed-keys-directory directive triggers this bug. Let me check what happens.

*** Bug 710315 has been marked as a duplicate of this bug. ***

Me too. The "managed-keys-directory" directive mentioned in comment #7 did not help for me. In my /var/log/messages, there is the following message:

Jun 13 21:32:14 thorin setroubleshoot: SELinux is preventing /usr/sbin/named from write access on the directory /var/named/chroot/var/named. For complete SELinux messages. run sealert -l 75f9b442-f638-4f28-9d82-747a09131e2c

After running

setenforce 0; /etc/init.d/named restart; setenforce 1

my bind no longer eats 100% of CPU.

*** Bug 739059 has been marked as a duplicate of this bug. ***

Adding managed-keys-directory to my conf did make it work properly again for me. Ruth

For what it's worth, I have exactly the same issue and exactly the same fix as Ruth in comment #13, but with bind-9.8.1-1.fc15.x86_64.

Yeah, both 9.8.1 and 9.7.4 are bad. I reported this issue to upstream but I haven't received any response yet...

Same with FC15 bind-9.8.1-1.fc15.x86_64. The managed-keys-directory option plus creating the directory solved the problem; thanks for posting. I've run named in debug mode and under strace, and I remember seeing something related to the writing of jnl files, which now go into the specified directory.

SELinux disabled
DNSSEC disabled
zone transfer disabled
recursion disabled
notify disabled

I hit this on f15 also, while setting up DNSSEC for my recursive resolver, and the workaround fixed it. This has something to do with DNSSEC keys, because this happened to me when I added:

include "/etc/named.root.key";

which contains:

managed-keys {
        # DNSKEY for the root zone.
        # Updates are published on root-dnssec-announce
        . initial-key 257 3 8 "AwEAAagAIKlVZrpC6Ia7gEzahOR+9W29euxhJhVVLOyQbSEW0O8gcCjF
                FVQUTf6v58fLjwBd0YI0EzrAcQqBGCzh/RStIoO8g0NfnfL2MTJRkxoX
                bfDaUeVPQuYEhg37NZWAJQ9VnMVDxP/VHL496M/QZxkjf5/Efucp2gaD
                X6RS6CXpoY68LsvPVjR0ZSwzz1apAzvN9dlzEheX7ICJBBtuA6G3LQpz
                W5hOA2hzCTMjJPJ8LbqF6dsV6DoBQzgul0sGIcGOYl7OyQdXfZ57relS
                Qageu+ipAdTTJ25AsRTAoub8ONGcLmqrAmRLKBP1dfwhYB4N7knNnulq
                QxA+Uk1ihz0=";
};

That include might exist by default now, I dunno; this is an upgraded server. So, managed-keys without managed-keys-directory is the killer. What's surprising is that /var/named/dynamic exists outside of the chroot jail and its equivalent inside the jail doesn't exist. Even though adding the workaround stopped the CPU burn, there are no files at all in the directory. SELinux is disabled on this machine.

This is still an issue in Fedora 16 with bind-9.8.1-4.P1.fc16.x86_64. Is it possible to update the bug report to include Fedora versions 14, 15, and 16? That would make it a lot easier to find, as it is still unresolved. If it helps, the low-level debugging output from named when it is in this infinite loop state looks like this:

20-Nov-2011 01:35:18.439 general: debug 1: zone_maintenance: managed-keys-zone ./IN: enter
20-Nov-2011 01:35:18.439 general: debug 1: zone_settimer: managed-keys-zone ./IN: enter
20-Nov-2011 01:35:18.439 general: debug 1: zone_timer: managed-keys-zone ./IN: enter
20-Nov-2011 01:35:18.439 general: debug 1: zone_maintenance: managed-keys-zone ./IN: enter
....repeat indefinitely...

Changing the product version per comment 18.

I'm also still having this issue with FC16 (32-bit, for what it's worth); I first saw it with FC14. I compiled my own bind-9.8.0-3.P1.fc15.i686.rpm to work around it on FC14. I'll attach my named.conf as well. I went ahead and added

managed-keys-directory "/var/named/dynamic";

and it made no difference.

Created attachment 550096 [details]
named.conf
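The trigger several commenters describe above (managed keys in use, but no managed-keys-directory set) can be checked mechanically. The following is only a rough sketch, not a BIND tool: the function name check_managed_keys and its three output strings are made up for illustration, it only strips '#' and '//' comments (not /* ... */ blocks), and it does not follow include files. Some reports in this thread hit the loop for other reasons (ownership, SELinux), so a clean result here does not rule the bug out.

```shell
#!/bin/sh
# Hypothetical helper (not part of BIND): classify a named.conf-style file
# according to the trigger described in this bug report.
# Prints one of: directory-set, keys-without-directory, no-managed-keys-found
check_managed_keys() {
    conf=$1
    # Drop '#' and '//' comments so a commented-out directive does not count.
    stripped=$(sed -e 's/#.*$//' -e 's|//.*$||' "$conf")
    if printf '%s\n' "$stripped" | grep -q 'managed-keys-directory'; then
        echo "directory-set"
    elif printf '%s\n' "$stripped" | grep -Eq 'managed-keys|named\.root\.key'; then
        # managed-keys block or the root key include, but no directory option
        echo "keys-without-directory"
    else
        echo "no-managed-keys-found"
    fi
}
```

For example, running check_managed_keys /etc/named.conf on a configuration that includes /etc/named.root.key but has the managed-keys-directory line commented out would report keys-without-directory.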
Not sure this helps at all, but here's a stack trace:

Thread 5 (Thread 0xb777ab40 (LWP 8250)):
#0 0x002af416 in __kernel_vsyscall ()
#1 0x00ced4c2 in __lll_lock_wait () from /lib/libpthread.so.0
#2 0x00ce8e53 in _L_lock_693 () from /lib/libpthread.so.0
#3 0x00ce8c98 in pthread_mutex_lock () from /lib/libpthread.so.0
#4 0x002608bc in isc__timer_reset () from /usr/lib/libisc.so.83
#5 0x0073359a in ?? () from /usr/lib/libdns.so.81
#6 0x007539b9 in ?? () from /usr/lib/libdns.so.81
#7 0x0025d204 in ?? () from /usr/lib/libisc.so.83
#8 0x00ce6cd3 in start_thread () from /lib/libpthread.so.0
#9 0x010377ce in clone () from /lib/libc.so.6

Thread 4 (Thread 0xb6f79b40 (LWP 8251)):
#0 0x002af416 in __kernel_vsyscall ()
#1 0x00cea84c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2 0x0025d0d9 in ?? () from /usr/lib/libisc.so.83
#3 0x00ce6cd3 in start_thread () from /lib/libpthread.so.0
#4 0x010377ce in clone () from /lib/libc.so.6

Thread 3 (Thread 0xb6778b40 (LWP 8252)):
#0 0x002af416 in __kernel_vsyscall ()
#1 0x00ced582 in __lll_unlock_wake () from /lib/libpthread.so.0
#2 0x00ce9cd7 in _L_unlock_622 () from /lib/libpthread.so.0
#3 0x00ce9c1a in __pthread_mutex_unlock_usercnt () from /lib/libpthread.so.0
#4 0x0025c43a in ?? () from /usr/lib/libisc.so.83
#5 0x0025fa14 in ?? () from /usr/lib/libisc.so.83
#6 0x00ce6cd3 in start_thread () from /lib/libpthread.so.0
#7 0x010377ce in clone () from /lib/libc.so.6

Thread 2 (Thread 0xb5f77b40 (LWP 8253)):
#0 0x002af416 in __kernel_vsyscall ()
#1 0x010381d6 in epoll_wait () from /lib/libc.so.6
#2 0x002707f1 in ?? () from /usr/lib/libisc.so.83
#3 0x00ce6cd3 in start_thread () from /lib/libpthread.so.0
#4 0x010377ce in clone () from /lib/libc.so.6

Thread 1 (Thread 0xb77bc980 (LWP 8249)):
#0 0x002af416 in __kernel_vsyscall ()
#1 0x00f74d82 in sigsuspend () from /lib/libc.so.6
#2 0x00261aa4 in isc__app_ctxrun () from /usr/lib/libisc.so.83
#3 0x00261ffe in isc__app_run () from /usr/lib/libisc.so.83
#4 0x003eee43 in main ()

Well, on my system, I fixed this (so far) by making this file:

-rw-r--r-- 1 named named 0 Dec 30 16:20 /var/named/chroot/var/named/dynamic/managed-keys.bind

I saw an error about it in a log file with debug turned up to 20:

channel simple_log {
        severity debug 20;
        print-category yes;
        print-severity yes;
        print-time yes;
        file "/var/log/named/tmp.log";
};
category default { simple_log; };

I also had one here:

# ll /var/named/dynamic/managed-keys.bind
-rw-r--r--. 1 named named 708 Dec 7 2010 /var/named/dynamic/managed-keys.bind

but since my environment is chroot'ed, it is in the wrong place... Maybe the rpm needs to drop a zero-length file into the chroot?

I have the same problem of high CPU usage by named, solved this way:

mkdir /var/named/chroot/var/named/dynamic
chown named:named /var/named/chroot/var/named/dynamic

+1, I have a high load average as well: http://savepic.su/2012809.png

-sh-4.2$ uname -a
Linux *.net 2.6.41.4-1.fc15.i686.PAE #1 SMP Tue Nov 29 11:47:02 UTC 2011 i686 i686 i386 GNU/Linux
-sh-4.2$ cat /etc/issue
Fedora release 15 (Lovelock)

Adding this line to named.conf:

managed-keys-directory "/var/named/dynamic";

Fixed, thanks!

I just ran into this bug in Fedora 17, so it appears nobody has fixed what seems to be a race condition or something similar driving the CPU crazy. I am running bind-chroot.x86_64 32:9.9.1-2.P1.fc17. I had named-chroot.service running, and it immediately went to 137% CPU. The config was valid according to named-checkconf and the local zone was valid according to named-checkzone.
However, the directories under /var/named/chroot/var/named were all owned by root:named instead of named:named. This caused the named service to run at 130% CPU in 'top', but did not actually stop it from running and returning results. A simple change was required:

# chown -R named: /var/named/chroot/var/named/{data,dynamic,slaves}

This then triggered an SELinux report because the labels were named_conf_t rather than named_cache_t, so one more command was needed to fix it:

# restorecon -R /var/named/chroot/

But why did it not complain??! Surely the service should be checking that the file permissions are valid if we are telling it to use a cache directory? named-checkconf did not complain about the permissions being wrong. It would also be helpful (maybe not possible upstream) to have named-checkconf check the SELinux settings. I happened across this bug report after a Google search to see if it was known. It seems to have been around for a long time.

This problem is still present with 32:bind-9.8.2-0.10.rc1.el6_3.2.x86_64 on CentOS 6.3, on two different servers. I solved the problem by means of:

chown -R named.named /var/named/chroot
restorecon -R /var/named/chroot/

I run bind 9.8.1-P1 on an Ubuntu 12.04 server and encountered this issue. I resolved it by:

1. Adding to /etc/bind/named.conf.options this line inside the options braces:
   managed-keys-directory "/var/named/dynamic";
2. Creating /var/named and /var/named/dynamic and doing:
   sudo chown -R bind /var/named
3. Turning off AppArmor for named using:
   sudo aa-complain usr.sbin.named

The last step was necessary because in dmesg I observed:

[22242455.552280] type=1400 audit(1345503871.458:10): apparmor="DENIED" operation="mknod" parent=11043 profile="/usr/sbin/named" name="/var/named/dynamic/managed-keys.bind.jnl" pid=11045 comm="named" requested_mask="c" denied_mask="c" fsuid=106 ouid=106

(In reply to comment #28)
> This problem is still present with : 32:bind-9.8.2-0.10.rc1.el6_3.2.x86_64
> on CentOs 6.3, on two different servers
>
> I solved the problem by means of :
>
> chown -R named.named /var/named/chroot
>
> restorecon -R /var/named/chroot/

That helps for bind-9.9.1-5.P2.fc17.x86_64 in Fedora 17. My custom named.conf was missing the

managed-keys-directory "/var/named/dynamic";

mentioned in comment #29, but adding that plus the chown and restorecon makes the CPU idle properly. There is surely some sort of bug in bind. It needs access to something, but it cannot complain about the fact and starts chewing CPU. Please fix this.

Can someone please update the version for this bug to Fedora 17, as this issue still exists in current releases?

(In reply to comment #31)
> Can someone please update the version for this bug to Fedora 17, as this
> issue still exists in current releases.

It is better to create a new bug and refer to this one. That way we have a real user as the reporter who is actually active and cares about the bug. Thank you.

This message is a reminder that Fedora 16 is nearing its end of life. Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 16. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '16'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 16's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 16 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to click on "Clone This Bug" and open it against that version of Fedora.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Still valid on F18 with bind-9.9.2-8.P1.fc18.i686, and adding the following line to named.conf fixes it:

managed-keys-directory "dynamic";

Should I open a new bug for it, as noted above?

I vote yes: clone it. It's a significant problem with bind.

Just created a new bug #948026 since I'm experiencing this very same problem using BIND named 9.9.2 on Fedora 18. The managed-keys-directory statement whose absence triggers this bug is present in the named.conf file that comes with the package. However, it is missing from the template configuration file /usr/share/doc/bind-9.8.2/sample/etc/named.conf that a lot of people might be using. A corrected sample file is attached to the new bug report here: https://bugzilla.redhat.com/show_bug.cgi?id=948026

This package has changed ownership in the Fedora Package Database. Reassigning to the new owner of this component.

Notice from the BIND 9.9.3 release notes:

Missing 'managed-keys-directory' is now handled better. Prior to this change, when misconfigured, named could loop and consume 100% CPU. [RT #30625]

So bind-9.9.3 should fix the issue when managed-keys-directory is missing in named.conf.

bind-9.9.3-1.fc19 has been submitted as an update for Fedora 19.
https://admin.fedoraproject.org/updates/bind-9.9.3-1.fc19

bind-dyndb-ldap-2.6-2.fc18,dnsperf-2.0.0.0-4.fc18,dhcp-4.2.5-12.fc18,bind-9.9.3-2.fc18 has been submitted as an update for Fedora 18.
https://admin.fedoraproject.org/updates/bind-dyndb-ldap-2.6-2.fc18,dnsperf-2.0.0.0-4.fc18,dhcp-4.2.5-12.fc18,bind-9.9.3-2.fc18

dhcp-4.2.5-2.fc17,dnsperf-2.0.0.0-3.fc17,bind-dyndb-ldap-2.5-2.fc17,bind-9.9.3-2.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/dhcp-4.2.5-2.fc17,dnsperf-2.0.0.0-3.fc17,bind-dyndb-ldap-2.5-2.fc17,bind-9.9.3-2.fc17

Package dhcp-4.2.5-2.fc17, dnsperf-2.0.0.0-3.fc17, bind-dyndb-ldap-2.5-2.fc17, bind-9.9.3-3.P1.fc17:
* should fix your issue,
* was pushed to the Fedora 17 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing dhcp-4.2.5-2.fc17 dnsperf-2.0.0.0-3.fc17 bind-dyndb-ldap-2.5-2.fc17 bind-9.9.3-3.P1.fc17'
as soon as you are able to. Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2013-10100/dhcp-4.2.5-2.fc17,dnsperf-2.0.0.0-3.fc17,bind-dyndb-ldap-2.5-2.fc17,bind-9.9.3-3.P1.fc17
then log in and leave karma (feedback).

bind-9.9.3-3.P1.fc19 has been pushed to the Fedora 19 stable repository. If problems still persist, please make note of it in this bug report.

bind-dyndb-ldap-2.6-2.fc18, dnsperf-2.0.0.0-4.fc18, dhcp-4.2.5-12.fc18, bind-9.9.3-3.P1.fc18 has been pushed to the Fedora 18 stable repository. If problems still persist, please make note of it in this bug report.

dhcp-4.2.5-2.fc17, dnsperf-2.0.0.0-3.fc17, bind-dyndb-ldap-2.5-2.fc17, bind-9.9.3-3.P1.fc17 has been pushed to the Fedora 17 stable repository. If problems still persist, please make note of it in this bug report.
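
Since the release notes quoted above name 9.9.3 as the release that handles a missing managed-keys-directory better, a quick way to sanity-check an upstream version number is a dotted-version comparison. This is only a sketch: version_ge and has_managed_keys_fix are hypothetical names, the comparison assumes GNU sort with the -V option, and it expects a bare dotted version (strip suffixes such as -P1 or the distribution release first). Distribution packages may carry backported fixes or their own regressions, so the upstream number alone is not conclusive.

```shell
#!/bin/sh
# Hypothetical helper: succeed if dotted version $1 is >= dotted version $2.
# Assumes GNU sort, whose -V option sorts version numbers numerically by field.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# 9.9.3 is the upstream release whose notes mention the fix [RT #30625].
has_managed_keys_fix() {
    version_ge "$1" "9.9.3"
}
```

For example, has_managed_keys_fix 9.8.1 fails while has_managed_keys_fix 9.9.3 succeeds, which matches the versions reported as bad and fixed in this thread.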

Hello everyone: I ran into this issue recently; even with no queries, CPU use was > 100%. The issue goes away when I delete 'include "/etc/named.root.key";' from named.conf. Try it!

Please update the Fedora version to the current one (Fedora 20), because the error persists in current versions. My bind is version bind-9.9.1-4.P3.el6.x86_64 and only runs below 99% CPU when I add the following line to the options {} section of my named.conf:

managed-keys-directory "/var/named/dynamic";

(In reply to Gustavo Farias dos Santos from comment #47)
> Please update the Fedora version to the current one (Fedora 20), because the
> error persists in current versions.
>
> My bind is version bind-9.9.1-4.P3.el6.x86_64 and only runs below 99% CPU
> when I add the following line to the options {} section of my named.conf:
>
> managed-keys-directory "/var/named/dynamic";

Disregard, my system is RH 4.4.4-13. Thanks.

Hi. There is no report from a Fedora 20 user complaining about the current bind-9.9.4-12.P2.fc20 version. If there is any, the user is welcome to reopen this bug...

Reporting from 6.5 with bind-9.8.2-0.23.rc1.el6_5.1.x86_64: the bug exists. I created that /dynamic folder and all is good now.
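
Most of the workarounds reported in this thread come down to the managed-keys directory existing (inside the chroot, if one is used) and being owned by the user named runs as. A minimal sketch of that check follows; check_keys_dir is a hypothetical name, it compares numeric user IDs using GNU stat (-c %u), and it does not look at SELinux or AppArmor, which some commenters also had to fix.

```shell
#!/bin/sh
# Hypothetical helper: check a managed-keys directory for the two problems
# most often reported in this thread (missing directory, wrong owner).
# $1 = directory path, $2 = expected numeric uid of the owner
check_keys_dir() {
    dir=$1
    want_uid=$2
    if [ ! -d "$dir" ]; then
        echo "missing"
        return 1
    fi
    # GNU stat: %u prints the numeric uid of the owner.
    have_uid=$(stat -c %u "$dir")
    if [ "$have_uid" != "$want_uid" ]; then
        echo "wrong-owner:$have_uid"
        return 1
    fi
    echo "ok"
}
```

On a chrooted Fedora setup like the ones described above, this might be called as check_keys_dir /var/named/chroot/var/named/dynamic "$(id -u named)"; on the Ubuntu system mentioned earlier the user would be bind instead of named.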