Bug 457036

Summary: Latest security patches have downgraded performance on big recursors
Product: Red Hat Enterprise Linux 5
Reporter: Adam Tkac <atkac>
Component: bind
Assignee: Adam Tkac <atkac>
Status: CLOSED ERRATA
QA Contact:
Severity: medium
Docs Contact:
Priority: medium
Version: 5.2
CC: cward, dmair, jch, john.haxby, msusta, ovasik, rphipps+bugzredhat, rvokal, syeghiay, tao
Target Milestone: rc
Keywords: Rebase
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Rebase: Bug Fixes and Enhancements
Doc Text:
Cause:
=======
The fix for CVE-2008-1447 changed the internal socket management logic, so named needs more sockets open simultaneously than before. The select() system call cannot handle that many open descriptors.

Consequence:
============
When server load rises to approximately 1000 queries per second, messages like "too many open file descriptors" appear in the log and named fails to handle further incoming queries.

Fix:
====
The internal socket infrastructure has been reworked. The select() system call is no longer used; the epoll infrastructure is used instead.

Result:
=======
BIND is able to handle a far larger number of queries simultaneously.

For a complete list of bug fixes, check /usr/share/doc/bind-*/CHANGES
Story Points: ---
Clone Of:
Environment:
Last Closed: 2009-09-02 03:36:57 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Bug Depends On:
Bug Blocks: 513501

Description Adam Tkac 2008-07-29 07:24:18 EDT
Description of problem:
The latest security patches have degraded performance on big recursors. When named
has to handle about 10000 queries per second, performance is poor and many
queries go unanswered. This is due to the latest security patches. ISC (the upstream)
is going to solve this issue in the upcoming 9.3.5-P2 update, so we should apply
the patches too.

Version-Release number of selected component (if applicable):
# rpm -q bind
bind-9.3.4-6.0.2.P1.el5_2

How reproducible:
always

Steps to Reproduce:
Quite hard. named has to handle about 10000 queries per second.

Actual results:
Many queries fail due to "too many open file descriptors"
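
To put a number on how often the error shows up, a small log counter can help. This is only a sketch: it assumes named logs through syslog with classic "Mon D HH:MM:SS" timestamps, and the function name errors_per_minute is made up for illustration. Only the quoted error text comes from this report.

```python
import re
import sys
from collections import Counter

# Assumption: classic syslog timestamps such as "Aug  5 14:42:59 host ...".
TIMESTAMP = re.compile(r"^(\w{3}\s+\d+\s+\d{2}:\d{2}):\d{2}\s")
# The error text quoted in this bug report.
ERROR_TEXT = "too many open file descriptors"


def errors_per_minute(lines):
    """Count log lines containing ERROR_TEXT, keyed by 'Mon D HH:MM'."""
    counts = Counter()
    for line in lines:
        if ERROR_TEXT not in line:
            continue
        match = TIMESTAMP.match(line)
        if match:
            # Collapse runs of spaces so "Aug  5" and "Aug 5" share a key.
            counts[" ".join(match.group(1).split())] += 1
    return counts


if __name__ == "__main__":
    # Reads log lines from standard input and prints one line per minute,
    # in the order the minutes first appear in the log.
    for minute, count in errors_per_minute(sys.stdin).items():
        print(minute, count)
```

Feed it whichever file your syslog configuration sends named messages to; lines without a recognizable timestamp are skipped.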

Expected results:
Nearly the same performance as in previous versions

Additional info:
There are many mails about this problem on the bind-users list
(http://marc.info/?l=bind-users) and also on the vendor-sec list. Other vendors are
also going to release improved versions (at least Debian and Mandriva).

I recommend a rebase to version 9.3.5-P2, which will contain a fix for this issue. It
is a maintenance release with many other fixes and no new features. At the very least we
should backport the performance fixes from 9.3.5-P2 (but, as I wrote above, using the
9.3.5-P2 code is the better long-term solution).
Comment 2 Richard Phipps 2008-08-05 14:42:59 EDT
It does not take 10000 queries per second to reproduce this. I was getting
an average of about one message ("too many open file descriptors") per minute
on a recursive DNS server that receives fewer than 200 queries per second. As far as I
can tell, this happens because of queries that time out.

This should be reproducible by hitting your test recursor with a bunch
of queries where the authoritative DNS server for those queries is unreachable.
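
A rough sketch of such a test client follows. It is not a tool used in this bug; the names build_query and send_queries and the zone passed in are made up for illustration. It assumes you have a zone whose authoritative server is unreachable, and it should only be pointed at a test recursor you control.

```python
import random
import socket
import struct
import time


def build_query(name, qid):
    """Build a minimal DNS query: type A, class IN, recursion desired."""
    # Header: ID, flags (RD bit set), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0.
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def send_queries(resolver, zone, qps, seconds, port=53):
    """Send roughly `qps` queries per second for unique names under `zone`.

    Replies are ignored. Returns the number of packets sent.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    interval = 1.0 / qps
    deadline = time.monotonic() + seconds
    sent = 0
    try:
        while time.monotonic() < deadline:
            # A fresh label per query keeps the recursor from answering
            # out of its cache.
            name = "q%d.%s" % (sent, zone)
            packet = build_query(name, random.randrange(65536))
            sock.sendto(packet, (resolver, port))
            sent += 1
            time.sleep(interval)
    finally:
        sock.close()
    return sent
```

For example, send_queries("192.0.2.53", "unreachable.example", 200, 60) would send about 200 queries per second for a minute; both the address and the zone are placeholders. The sleep-based pacing is approximate, so the real rate will be somewhat below `qps`.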

This particular fix (noted in the upstream P2 release)
may alleviate the "too many open file descriptors" errors in my situation:

2399.   [bug]           Abort timeout queries to reduce the number of open
                        UDP sockets. [RT #18367]
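
A back-of-the-envelope estimate shows why timed-out queries matter even at modest load. The sketch below assumes that each upstream query still waiting for an answer holds one UDP socket, and that select() is limited to the usual glibc FD_SETSIZE of 1024. The function name and the numbers in the usage note are hypothetical, not measurements from this bug.

```python
# Assumption: the usual glibc compile-time limit on descriptors for select().
FD_SETSIZE = 1024


def sockets_in_flight(qps, timeout_fraction, hold_seconds):
    """Estimate the UDP sockets held by queries that never get an answer.

    Little's law: items in the system = arrival rate * time in the system.
    qps              -- upstream queries started per second
    timeout_fraction -- share of those queries that time out (0.0 to 1.0)
    hold_seconds     -- how long a timed-out query keeps its socket open
    """
    return qps * timeout_fraction * hold_seconds


def exceeds_select_limit(qps, timeout_fraction, hold_seconds):
    """True if the estimate alone would not fit in a select() descriptor set."""
    return sockets_in_flight(qps, timeout_fraction, hold_seconds) >= FD_SETSIZE
```

With 200 queries per second, 20% of them timing out, and a socket held for 30 seconds, the estimate is about 1200 sockets, which is already past 1024. Shortening the hold time, which is what change 2399 appears to aim at, shrinks the product directly.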
Comment 10 Adam Tkac 2009-04-06 06:05:04 EDT
Rebased to 9.3.6-P1.
Comment 13 Adam Tkac 2009-06-26 09:26:56 EDT
Release note added. If any revisions are required, please set the 
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

New Contents:
Cause:
=======
The fix for CVE-2008-1447 changed the internal socket management logic, so named needs more sockets open simultaneously than before. The select() system call cannot handle that many open descriptors.

Consequence:
============
When server load rises to approximately 1000 queries per second, messages like "too many open file descriptors" appear in the log and named fails to handle further incoming queries.

Fix:
====
The internal socket infrastructure has been reworked. The select() system call is no longer used; the epoll infrastructure is used instead.

Result:
=======
BIND is able to handle a far larger number of queries simultaneously.
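
The select() limit can be seen outside named with a few lines of Python. This is an illustration of the general mechanism only, not of BIND's C code. The helper names select_accepts and wait_readable_epoll are made up, and the sketch assumes Linux, where FD_SETSIZE is normally 1024 and select.epoll is available.

```python
import select
import socket


def select_accepts(fd):
    """Return False if select() rejects this descriptor number outright."""
    try:
        select.select([fd], [], [], 0)
    except ValueError:
        # CPython refuses descriptor numbers >= FD_SETSIZE before it even
        # makes the system call, because they cannot fit in an fd_set.
        return False
    except OSError:
        # The number is in range, but no such descriptor is open.
        return True
    return True


def wait_readable_epoll(socks, timeout):
    """Return the sockets that become readable within `timeout` seconds."""
    by_fd = {s.fileno(): s for s in socks}
    ep = select.epoll()  # Linux only
    try:
        for fd in by_fd:
            # epoll tracks descriptors by registration, so there is no
            # fixed-size bitmap that caps the descriptor number.
            ep.register(fd, select.EPOLLIN)
        return [by_fd[fd] for fd, _events in ep.poll(timeout)]
    finally:
        ep.close()
```

select_accepts(5000) returns False on a typical Linux build, whether or not descriptor 5000 is open. epoll has no such cap on the descriptor number, so the per-process open-file limit becomes the practical ceiling.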
Comment 14 Adam Tkac 2009-06-26 09:31:11 EDT
Release note updated. If any revisions are required, please set the 
"requires_release_notes"  flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -12,4 +12,6 @@
 
 Result:
 =======
-BIND is able to handle a far larger number of queries simultaneously.
+BIND is able to handle a far larger number of queries simultaneously.
+
+For a complete list of bug fixes, check /usr/share/doc/bind-*/CHANGES
Comment 15 Chris Ward 2009-07-03 14:05:47 EDT
~~ Attention - RHEL 5.4 Beta Released! ~~

RHEL 5.4 Beta has been released! There should be a fix present in the Beta release that addresses this particular request. Please test and report back results here, at your earliest convenience. RHEL 5.4 General Availability release is just around the corner!

If you encounter any issues while testing Beta, please describe the issues you have encountered and set the bug into NEED_INFO. If you encounter new issues, please clone this bug to open a new issue and request it be reviewed for inclusion in RHEL 5.4 or a later update, if it is not of urgent severity.

Please do not flip the bug status to VERIFIED. Only post your verification results, and if available, update Verified field with the appropriate value.

Questions can be posted to this bug or your customer or partner representative.
Comment 19 errata-xmlrpc 2009-09-02 03:36:57 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-1420.html