Bug 457036 - Latest security patches have downgraded performance on big recursors
Summary: Latest security patches have downgraded performance on big recursors
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: bind
Version: 5.2
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
: ---
Assignee: Adam Tkac
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 5.4, TechnicalNotes
 
Reported: 2008-07-29 11:24 UTC by Adam Tkac
Modified: 2018-11-26 17:19 UTC (History)

Fixed In Version:
Doc Type: Rebase: Bug Fixes and Enhancements
Doc Text:
Cause: The fix for CVE-2008-1447 changed the internal socket management logic, so named needs more sockets open simultaneously than before. The select() system call cannot handle that many open descriptors.
Consequence: When server load rises to approximately 1000 queries per second, messages like "too many open file descriptors" appear in the log and named fails to handle subsequent incoming queries.
Fix: The internal socket infrastructure has been reworked. The select() system call is no longer used; the epoll infrastructure is used instead.
Result: BIND is able to handle a far larger number of queries simultaneously. For a complete list of bug fixes, check /usr/share/doc/bind-*/CHANGES
Clone Of:
Environment:
Last Closed: 2009-09-02 07:36:57 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2009:1420 0 normal SHIPPED_LIVE bind bug fix and enhancement update 2009-09-02 07:36:40 UTC

Description Adam Tkac 2008-07-29 11:24:18 UTC
Description of problem:
Latest security patches have downgraded performance on big recursors. When named
has to handle about 10000 queries per second, performance is pretty bad and many
queries go unanswered. This is due to the latest security patches. ISC (= upstream)
is going to solve this issue in the upcoming 9.3.5-P2 update, so we should apply
the patches too.

Version-Release number of selected component (if applicable):
# rpm -q bind
bind-9.3.4-6.0.2.P1.el5_2

How reproducible:
always

Steps to Reproduce:
Quite hard. named has to handle about 10000 queries per second.

Actual results:
Many queries fail due to "too many open file descriptors"

Expected results:
Nearly the same performance as in previous versions

Additional info:
There are many mails about this problem on the bind-users list
(http://marc.info/?l=bind-users) and also on the vendor-sec list. Other vendors
are also going to release improved versions (at least Debian and Mandriva).

I recommend a rebase to version 9.3.5-P2, which will contain the fix for this
issue. It is a maintenance release with many other fixes and no new features.
At the very least we should backport the performance fixes from 9.3.5-P2 (but,
as I wrote above, using the 9.3.5-P2 code is the better long-term solution).

Comment 2 Richard Phipps 2008-08-05 18:42:59 UTC
It does not take 10000 queries per second to reproduce this. I was getting
an average of about one msg ("too many open file descriptors") per minute
on a recursive DNS server that receives fewer than 200 queries per sec. As far
as I can tell, this happens because of queries that time out.

This should be reproducible by hitting your test recursor with a bunch
of queries where the authoritative DNS server for those queries is unreachable.
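A rough sketch of that reproduction in Python, using only the standard library. It is not taken from this bug report: the function names, the resolver address and the zone name are placeholders (assumptions). The zone must be one whose authoritative server really is unreachable from the test recursor, so that each query stays outstanding until it times out.

```python
import random
import socket
import struct


def build_query(name, qtype=1):
    """Build a minimal DNS query packet (RFC 1035 wire format), type A by default."""
    qid = random.randint(0, 0xFFFF)
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1, other counts 0.
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE as given, QCLASS = 1 (IN).
    return header + qname + struct.pack("!HH", qtype, 1)


def send_queries(resolver, zone, count, port=53):
    """Send `count` queries for distinct names under `zone`; replies are not read.

    `resolver` and `zone` are hypothetical test values supplied by the caller.
    Distinct names keep the recursor from answering out of its cache.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        for i in range(count):
            packet = build_query("host%d.%s" % (i, zone))
            sock.sendto(packet, (resolver, port))
    finally:
        sock.close()
    return count
```

For example, send_queries("192.0.2.53", "unreachable.example", 5000) would send 5000 queries to a test recursor at a documentation-range address; both arguments need replacing with real test values. Watching the named log for "too many open file descriptors" while this runs should show whether the problem is triggered.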

This particular fix (noted in the upstream P2 release)
may alleviate the "too many open file descriptors" in my situation:

2399.   [bug]           Abort timeout queries to reduce the number of open
                        UDP sockets. [RT #18367]

Comment 10 Adam Tkac 2009-04-06 10:05:04 UTC
Rebased to 9.3.6-P1.

Comment 13 Adam Tkac 2009-06-26 13:26:56 UTC
Release note added. If any revisions are required, please set the 
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

New Contents:
Cause:
=======
The fix for CVE-2008-1447 changed the internal socket management logic, so named needs more sockets open simultaneously than before. The select() system call cannot handle that many open descriptors.

Consequence:
============
When server load rises to approximately 1000 queries per second, messages like "too many open file descriptors" appear in the log and named fails to handle subsequent incoming queries.

Fix:
====
The internal socket infrastructure has been reworked. The select() system call is no longer used; the epoll infrastructure is used instead.

Result:
=======
BIND is able to handle a far larger number of queries simultaneously.
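
For illustration only, the difference between select() and epoll can be sketched in Python. This is not BIND's code, and the helper name is made up for the example. With glibc on Linux, select() can only watch descriptors numbered below FD_SETSIZE (1024), whereas epoll keeps its interest list in the kernel and has no such fixed ceiling, so it is bounded mainly by the process's open-file limit.

```python
import select
import socket


def wait_readable_epoll(socks, timeout=1.0):
    """Return the sockets in `socks` that are readable, using epoll (Linux only).

    Hypothetical helper for illustration; it creates and tears down the epoll
    object on every call, which a real server would not do.
    """
    by_fd = {s.fileno(): s for s in socks}
    ep = select.epoll()
    try:
        for fd in by_fd:
            # Unlike select(), registration does not depend on the fd number
            # being below FD_SETSIZE.
            ep.register(fd, select.EPOLLIN)
        return [by_fd[fd] for fd, _events in ep.poll(timeout)]
    finally:
        ep.close()
```

A server that keeps one UDP socket open per outstanding query quickly reaches descriptor numbers above 1023 under load, which is the point where a select()-based loop stops working and an epoll-based one does not.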

Comment 14 Adam Tkac 2009-06-26 13:31:11 UTC
Release note updated. If any revisions are required, please set the 
"requires_release_notes"  flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -12,4 +12,6 @@
 
 Result:
 =======
-BIND is able to handle far more bigger number of queries simulateously.
+BIND is able to handle far more bigger number of queries simulateously.
+
+For complete list of bug fixes check /usr/share/doc/bind-*/CHANGES

Comment 15 Chris Ward 2009-07-03 18:05:47 UTC
~~ Attention - RHEL 5.4 Beta Released! ~~

RHEL 5.4 Beta has been released! There should be a fix present in the Beta release that addresses this particular request. Please test and report back results here, at your earliest convenience. RHEL 5.4 General Availability release is just around the corner!

If you encounter any issues while testing Beta, please describe the issues you have encountered and set the bug into NEED_INFO. If you encounter new issues, please clone this bug to open a new issue and request it be reviewed for inclusion in RHEL 5.4 or a later update, if it is not of urgent severity.

Please do not flip the bug status to VERIFIED. Only post your verification results, and if available, update Verified field with the appropriate value.

Questions can be posted to this bug or your customer or partner representative.

Comment 19 errata-xmlrpc 2009-09-02 07:36:57 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-1420.html

