Bug 807670 - Filedescriptor out of range in select
Summary: Filedescriptor out of range in select
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: python-qpid
Version: 2.1.2
Hardware: All
OS: All
Priority: high
Severity: high
Target Milestone: 2.5.1
Assignee: Ken Giusti
QA Contact: Leonid Zhaldybin
URL:
Whiteboard:
Depends On:
Blocks: 1076470
Reported: 2012-03-28 12:59 UTC by Petr Matousek
Modified: 2018-12-05 15:30 UTC
CC List: 11 users

Fixed In Version: python-qpid-0.18-10.el7
Doc Type: Bug Fix
Doc Text:
Previously, the qpid-python client used select() to wait on file descriptors. A known limitation of select() is that it only handles descriptor numbers below FD_SETSIZE (1024), which caused problems for processes that open more than 1024 files (for example, a web server). If qpid-python was allocated a descriptor numbered 1024 or higher, the call to select() failed and the connection dropped. With this update, qpid-python uses poll() instead of select(). poll() has no such limit on descriptor numbers, so the issue is fixed.
Clone Of:
Clones: 1076470
Environment:
Last Closed: 2014-06-30 10:25:55 UTC
Target Upstream Version:
Embargoed:


Attachments
terminal transcript (9.39 KB, text/plain)
2013-01-23 17:21 UTC, Petr Matousek


Links
Apache JIRA QPID-5588
Red Hat Product Errata RHBA-2014:0804 (priority: normal; status: SHIPPED_LIVE; summary: Red Hat Enterprise MRG Messaging 2 update; last updated 2014-06-30 14:25:19 UTC)

Description Petr Matousek 2012-03-28 12:59:24 UTC
Description of problem:
The following unit tests from endpoints.py are currently failing with qpid-cpp-mrg-0.14-14:
qpid.tests.messaging.endpoints.SetupTests.testOpenCloseResourceLeaks
qpid.tests.messaging.endpoints.SetupTests.testOpenFailResourceLeaks

The tests fail while opening the broker connection with ValueError: filedescriptor out of range in select()

Please see Additional info for more details.

This was seen on RHEL 5 and RHEL 6 (x86_64 and i386).

Version-Release number of selected component (if applicable):
python-qpid-0.14-6.el5
python-qpid-qmf-0.14-4.el5
qpid-cpp-client-0.14-14.el5
qpid-cpp-client-devel-0.14-14.el5
qpid-cpp-client-devel-docs-0.14-14.el5
qpid-cpp-client-rdma-0.14-14.el5
qpid-cpp-client-ssl-0.14-14.el5
qpid-cpp-mrg-debuginfo-0.14-14.el5
qpid-cpp-server-0.14-14.el5
qpid-cpp-server-cluster-0.14-14.el5
qpid-cpp-server-devel-0.14-14.el5
qpid-cpp-server-rdma-0.14-14.el5
qpid-cpp-server-ssl-0.14-14.el5
qpid-cpp-server-store-0.14-14.el5
qpid-cpp-server-xml-0.14-14.el5
qpid-java-client-0.14-3.el5
qpid-java-common-0.14-3.el5
qpid-java-example-0.14-3.el5
qpid-qmf-0.14-4.el5
qpid-qmf-debuginfo-0.14-4.el5
qpid-qmf-devel-0.14-4.el5
qpid-tests-0.14-1.el5
qpid-tools-0.14-1.el5
rh-qpid-cpp-tests-0.14-14.el5
ruby-qpid-qmf-0.14-4.el5

How reproducible:
100%

Steps to Reproduce:
1. Start the broker and run the tests:
# qpid-python-test --broker localhost:5672 'qpid.tests.messaging.endpoints.SetupTests.testOpen*Leaks'
  
Actual results:
Two unit tests from python-qpid fail.

Expected results:
The unit tests run without errors and pass, or an explanation is provided and the tests are removed or skipped.

Additional info:

qpid-python-test --broker localhost:5672 'qpid.tests.messaging.endpoints.SetupTests.testOpen*Leaks'
qpid.tests.messaging.endpoints.SetupTests.testOpenCloseResourceLeaks ................................................................................................................................... start
  Exception in thread Thread-1:
  Traceback (most recent call last):
    File "/usr/lib64/python2.4/threading.py", line 442, in __bootstrap
      self.run()
    File "/usr/lib64/python2.4/threading.py", line 422, in run
      self.__target(*self.__args, **self.__kwargs)
    File "/usr/lib/python2.4/site-packages/qpid/selector.py", line 119, in run
      rd, wr, ex = select(self.reading, self.writing, (), timeout)
  ValueError: filedescriptor out of range in select()
  
qpid.tests.messaging.endpoints.SetupTests.testOpenCloseResourceLeaks ................................................................................................................................... fail
Error during test:
  Traceback (most recent call last):
    File "/usr/bin/qpid-python-test", line 311, in run
      phase()
    File "/usr/lib/python2.4/site-packages/qpid/tests/messaging/endpoints.py", line 87, in testOpenCloseResourceLeaks
      conn = Connection.establish(self.broker, **self.connection_options())
    File "/usr/lib/python2.4/site-packages/qpid/messaging/endpoints.py", line 68, in establish
      conn.open()
    File "<string>", line 6, in open
    File "/usr/lib/python2.4/site-packages/qpid/messaging/endpoints.py", line 244, in open
      self.attach()
    File "<string>", line 6, in attach
    File "/usr/lib/python2.4/site-packages/qpid/messaging/endpoints.py", line 262, in attach
      self._ewait(lambda: self._transport_connected and not self._unlinked())
    File "/usr/lib/python2.4/site-packages/qpid/messaging/endpoints.py", line 196, in _ewait
      result = self._wait(lambda: self.error or predicate(), timeout)
    File "/usr/lib/python2.4/site-packages/qpid/messaging/endpoints.py", line 181, in _wait
      return self._waiter.wait(predicate, timeout=timeout)
    File "/usr/lib/python2.4/site-packages/qpid/concurrency.py", line 57, in wait
      self.condition.wait(3)
    File "/usr/lib/python2.4/site-packages/qpid/concurrency.py", line 96, in wait
      sw.wait(timeout)
    File "/usr/lib/python2.4/site-packages/qpid/compat.py", line 53, in wait
      ready, _, _ = select([self], [], [], timeout)
  ValueError: filedescriptor out of range in select()
qpid.tests.messaging.endpoints.SetupTests.testOpenFailResourceLeaks .................................................................................................................................... fail
Error during test:
  Traceback (most recent call last):
    File "/usr/bin/qpid-python-test", line 311, in run
      phase()
    File "/usr/lib/python2.4/site-packages/qpid/tests/messaging/endpoints.py", line 103, in testOpenFailResourceLeaks
      conn._wait(lambda: False, timeout=0.001)
    File "/usr/lib/python2.4/site-packages/qpid/messaging/endpoints.py", line 181, in _wait
      return self._waiter.wait(predicate, timeout=timeout)
    File "/usr/lib/python2.4/site-packages/qpid/concurrency.py", line 59, in wait
      self.condition.wait(timeout - passed)
    File "/usr/lib/python2.4/site-packages/qpid/concurrency.py", line 96, in wait
      sw.wait(timeout)
    File "/usr/lib/python2.4/site-packages/qpid/compat.py", line 53, in wait
      ready, _, _ = select([self], [], [], timeout)
  ValueError: filedescriptor out of range in select()
Totals: 2 tests, 0 passed, 0 skipped, 0 ignored, 2 failed
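
The ValueError in the tracebacks above can be reproduced outside qpid with a short sketch (not part of the original report). It duplicates a pipe onto a descriptor numbered above 1024, then hands that descriptor to select() and to poll(), the call the Doc Text names as the fix. HIGH_FD = 1100 is an arbitrary, hypothetical choice, and the helper names are illustrative, not python-qpid APIs.

```python
import os
import resource
import select

HIGH_FD = 1100  # hypothetical: any descriptor number >= 1024 (FD_SETSIZE on glibc)


def high_fd_pipe(target=HIGH_FD):
    """Return (r, w) for a pipe whose read end is descriptor number `target`.

    The soft RLIMIT_NOFILE is raised first if needed, as 'ulimit -n' would.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft != resource.RLIM_INFINITY and soft <= target:
        if hard != resource.RLIM_INFINITY and hard <= target:
            raise RuntimeError("hard RLIMIT_NOFILE is too low for this demo")
        resource.setrlimit(resource.RLIMIT_NOFILE, (target + 1, hard))
    r, w = os.pipe()
    os.dup2(r, target)  # `target` now refers to the same pipe read end
    os.close(r)
    return target, w


def select_accepts(fd):
    """True if select() accepts fd; False if fd is outside the FD_SETSIZE range."""
    try:
        select.select([fd], [], [], 0)
        return True
    except ValueError:  # "filedescriptor out of range in select()"
        return False


def poll_accepts(fd):
    """Register fd with poll() and poll once; poll() has no FD_SETSIZE cap."""
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    poller.poll(0)  # zero timeout: check readiness without blocking
    return True
```

On a Linux host whose hard limit allows it, calling `fd, w = high_fd_pipe()` and then `select_accepts(fd)` should return False while `poll_accepts(fd)` returns True, matching the failure seen in selector.py and compat.py.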

Comment 2 Justin Ross 2012-11-06 20:34:35 UTC
Petr, is this still an issue with 0.18?

Comment 3 Petr Matousek 2013-01-23 17:17:30 UTC
Justin, I missed your needinfo question; my apologies.

I've investigated this issue a bit and found that the test failures are caused by our QE test wrapper, which sets the maximum number of open file descriptors to 16384. With the system default value (1024), the tests pass. Further testing showed that the tests start to fail when the maximum number of open file descriptors is set higher than 1052 (which struck me as a rather strange number).

In detail:
qpid.tests.messaging.endpoints.SetupTests.testOpenCloseResourceLeaks starts to fail when 'ulimit -n' is set to 1053 or higher 
qpid.tests.messaging.endpoints.SetupTests.testOpenFailResourceLeaks reports an error (but still passes) when 'ulimit -n' is set to 1054-1060, and fails when it is set to 1061 or higher.

This holds for every supported OS and architecture combination.

I can easily update our QE test to use the system default maximum number of open fds for these unit tests, but I'm not sure whether this is a defect. Can someone assess it, please?
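
To reproduce the 'ulimit -n' thresholds above from a script rather than an interactive shell, a wrapper can set the soft open-file limit for the child process only. This is a hypothetical sketch: `run_with_nofile` is not part of qpid or of the QE harness, and it clamps the requested value to the hard limit because an unprivileged process cannot raise that.

```python
import resource
import subprocess
import sys


def run_with_nofile(cmd, limit):
    """Run cmd (an argv list) with its soft RLIMIT_NOFILE set to `limit`.

    Roughly what 'ulimit -n <limit>' does in a shell wrapper. The value is
    clamped to the hard limit, which an unprivileged process cannot raise.
    Returns the child's exit status.
    """
    _, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        limit = min(limit, hard)

    def set_limit():
        # Runs in the child between fork() and exec(); the parent's limit is unchanged.
        resource.setrlimit(resource.RLIMIT_NOFILE, (limit, hard))

    return subprocess.call(cmd, preexec_fn=set_limit)
```

For example, `run_with_nofile(["qpid-python-test", "--broker", "localhost:5672", "qpid.tests.messaging.endpoints.SetupTests.testOpen*Leaks"], 1053)` would run the same tests with the lowest limit at which testOpenCloseResourceLeaks was reported to fail. The test pattern needs no quoting here because no shell is involved.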

Comment 4 Petr Matousek 2013-01-23 17:21:35 UTC
Created attachment 686086 [details]
terminal transcript

Please see the terminal transcript for details.

Comment 6 Justin Ross 2013-02-21 22:32:20 UTC
(In reply to comment #3)

Petr, that's pretty weird!  And worth investigating.  I'm going to keep this as an issue for 2.4.

Comment 9 Pavel Moravec 2014-02-27 17:07:45 UTC
This effectively limits the Python client to 1024 endpoints, and there is already a real-world use case hitting this limit.

The BaseWaiter class in /usr/lib/python2.6/site-packages/qpid/compat.py must use epoll instead of select, because select() works on an fd_set, a fixed-size bitmask that can only hold descriptors numbered below the FD_SETSIZE constant. On RHEL, FD_SETSIZE is 1024.
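
A minimal sketch of a pipe-based waiter whose wait() uses poll() instead of select() illustrates the idea. `PollWaiter` and its methods are hypothetical and are not taken from the upstream change; the sketch also uses poll() rather than the epoll suggested here. Both avoid the FD_SETSIZE cap because neither relies on a fixed-size fd_set.

```python
import os
import select


class PollWaiter(object):
    """Hypothetical pipe-based waiter; wait() uses poll() instead of select()."""

    def __init__(self):
        self._r, self._w = os.pipe()
        self._poller = select.poll()
        self._poller.register(self._r, select.POLLIN)

    def fileno(self):
        return self._r

    def wakeup(self):
        # Writing one byte makes the read end readable and unblocks wait().
        os.write(self._w, b"\0")

    def wait(self, timeout=None):
        """Block until wakeup() is called or `timeout` seconds pass.

        Returns True if woken up, False on timeout.
        """
        # poll() takes milliseconds; None blocks indefinitely.
        ms = None if timeout is None else int(timeout * 1000)
        if not self._poller.poll(ms):
            return False
        # Consume one wakeup byte so the next wait() blocks again.
        os.read(self._r, 1)
        return True

    def close(self):
        os.close(self._r)
        os.close(self._w)


if __name__ == "__main__":
    import threading

    waiter = PollWaiter()
    threading.Timer(0.1, waiter.wakeup).start()
    print(waiter.wait(timeout=5))     # True: woken by the timer thread
    print(waiter.wait(timeout=0.05))  # False: the wakeup byte was consumed
    waiter.close()
```

Because poll() is given the descriptor number directly, this waiter behaves the same whether its pipe lands on fd 5 or fd 5000.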

Bumping severity/priority and notifying engineering via mail.

Comment 10 Ken Giusti 2014-02-28 19:00:23 UTC
Candidate fix pushed upstream:

https://svn.apache.org/viewvc?view=revision&revision=r1573028

Comment 16 Leonid Zhaldybin 2014-06-04 14:20:22 UTC
This issue has been fixed on RHEL 6. On RHEL 5 the problem is on the Python side and cannot be fixed on the MRG/Messaging side.

Packages used for testing:

python-qpid-0.18-11
python-qpid-qmf-0.18-23
qpid-cpp-0.18-23
qpid-qmf-0.18-23
qpid-tests-0.18-2
qpid-tools-0.18-10

-> VERIFIED

Comment 19 errata-xmlrpc 2014-06-30 10:25:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0804.html

