| Summary: | Filedescriptor out of range in select | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise MRG | Reporter: | Petr Matousek <pematous> | ||||
| Component: | python-qpid | Assignee: | Ken Giusti <kgiusti> | ||||
| Status: | CLOSED ERRATA | QA Contact: | Leonid Zhaldybin <lzhaldyb> | ||||
| Severity: | high | Docs Contact: | |||||
| Priority: | high | ||||||
| Version: | 2.1.2 | CC: | esammons, ftsiadim, iboverma, jross, kgiusti, lzhaldyb, mcressma, mmahudha, pmoravec, rafaels, tross | ||||
| Target Milestone: | 2.5.1 | ||||||
| Target Release: | --- | ||||||
| Hardware: | All | ||||||
| OS: | All | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | python-qpid-0.18-10.el7 | Doc Type: | Bug Fix | ||||
| Doc Text: |
Previously, the qpid-python client used select() to manage file descriptors. select() can only handle descriptors numbered below FD_SETSIZE (1024), which caused problems for processes that keep many files open (for example, a web server). If qpid-python was allocated a file descriptor numbered 1024 or higher, the call to select() failed and the connection was dropped. The fix changes qpid-python to use poll() instead of select(), which removes the limit on file descriptor numbers and resolves the issue.
|
Story Points: | --- | ||||
| Clone Of: | |||||||
| : | 1076470 (view as bug list) | Environment: | |||||
| Last Closed: | 2014-06-30 10:25:55 UTC | Type: | --- | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Bug Depends On: | |||||||
| Bug Blocks: | 1076470 | ||||||
| Attachments: |
|
||||||
Petr, is this still an issue with 0.18?

Justin, I missed your needinfo question, my apologies.

I've investigated this issue a bit and found that the test failures are caused by our QE test wrapper, which resets the maximum number of open file descriptors to 16384. When the system default value (1024) is used, the tests pass. Further testing showed that the tests start to fail when the maximum number of open file descriptors is set higher than 1052 (which struck me as a rather strange number).

In detail:
qpid.tests.messaging.endpoints.SetupTests.testOpenCloseResourceLeaks starts to fail when 'ulimit -n' is set to 1053 or higher.
qpid.tests.messaging.endpoints.SetupTests.testOpenFailResourceLeaks starts to report an error when 'ulimit -n' is set to 1054-1060 (but still passes), and starts to fail at 1061 and higher.

This holds across all supported OS and architecture combinations.

I can easily update our QE test to use the default system value of max open fds for these unit tests, but I'm not sure whether this is a defect. Can someone assess, please?

Created attachment 686086: terminal transcript
Please see the terminal transcript for details.
(In reply to comment #3)
Petr, that's pretty weird! And worth investigating. I'm going to keep this as an issue for 2.4.

This limits the python client to a maximum of 1024 endpoints, and there is already a real user case behind this. The class BaseWaiter in /usr/lib/python2.6/site-packages/qpid/compat.py must use epoll instead of the select method, because the select() system call uses a fixed-size buffer whose size equals the FD_SETSIZE constant. On RHEL this is set to 1024.

Bumping severity/priority and notifying engineering via mail.

Candidate fix pushed upstream: https://svn.apache.org/viewvc?view=revision&revision=r1573028

This issue has been fixed on RHEL6. On RHEL5, the issue is on the python side, so it cannot be fixed on the MRG/Messaging side.
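The change discussed above, waiting on a descriptor without select(), can be sketched roughly as follows. This is only an illustration and not the upstream patch: `wait_readable` is a hypothetical helper name, the code assumes Python 3 on a POSIX system (the report itself concerns Python 2.4/2.6), and it uses `select.poll()` as named in the Doc Text rather than epoll.

```python
import math
import os
import select


def wait_readable(fd, timeout=None):
    """Hypothetical helper: return True if `fd` has an event within `timeout` seconds.

    Unlike select.select(), poll() takes descriptor numbers directly instead of
    a fixed-size bitmap, so it is not restricted to descriptors below FD_SETSIZE.
    """
    poller = select.poll()
    poller.register(fd, select.POLLIN | select.POLLPRI)
    # poll() expects milliseconds (or None to block), whereas select() took seconds.
    if timeout is None:
        timeout_ms = None
    else:
        timeout_ms = max(0, math.ceil(timeout * 1000))
    # A non-empty result means the descriptor is readable, hung up, or in error.
    return bool(poller.poll(timeout_ms))


if __name__ == "__main__":
    read_end, write_end = os.pipe()
    print(wait_readable(read_end, 0.01))  # nothing written yet
    os.write(write_end, b"x")
    print(wait_readable(read_end, 0.01))  # one byte is pending
    os.close(read_end)
    os.close(write_end)
```

Note the unit change: a caller that previously passed seconds to select() has to convert to milliseconds for poll(), which the helper does internally.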
Packages used for testing:
python-qpid-0.18-11
python-qpid-qmf-0.18-23
qpid-cpp-0.18-23
qpid-qmf-0.18-23
qpid-tests-0.18-2
qpid-tools-0.18-10

-> VERIFIED

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
http://rhn.redhat.com/errata/RHBA-2014-0804.html |
Description of problem:
The following unit tests from endpoints.py are currently failing with qpid-cpp-mrg-0.14-14:
qpid.tests.messaging.endpoints.SetupTests.testOpenCloseResourceLeaks
qpid.tests.messaging.endpoints.SetupTests.testOpenFailResourceLeaks

The tests fail while opening the broker connection, with
ValueError: filedescriptor out of range in select()
Please see the additional info for more details.

This was seen on rhel5 and rhel6 (x86_64 & i386).

Version-Release number of selected component (if applicable):
python-qpid-0.14-6.el5
python-qpid-qmf-0.14-4.el5
qpid-cpp-client-0.14-14.el5
qpid-cpp-client-devel-0.14-14.el5
qpid-cpp-client-devel-docs-0.14-14.el5
qpid-cpp-client-rdma-0.14-14.el5
qpid-cpp-client-ssl-0.14-14.el5
qpid-cpp-mrg-debuginfo-0.14-14.el5
qpid-cpp-server-0.14-14.el5
qpid-cpp-server-cluster-0.14-14.el5
qpid-cpp-server-devel-0.14-14.el5
qpid-cpp-server-rdma-0.14-14.el5
qpid-cpp-server-ssl-0.14-14.el5
qpid-cpp-server-store-0.14-14.el5
qpid-cpp-server-xml-0.14-14.el5
qpid-java-client-0.14-3.el5
qpid-java-common-0.14-3.el5
qpid-java-example-0.14-3.el5
qpid-qmf-0.14-4.el5
qpid-qmf-debuginfo-0.14-4.el5
qpid-qmf-devel-0.14-4.el5
qpid-tests-0.14-1.el5
qpid-tools-0.14-1.el5
rh-qpid-cpp-tests-0.14-14.el5
ruby-qpid-qmf-0.14-4.el5

How reproducible:
100%

Steps to Reproduce:
1. Start the broker and execute the tests:
# qpid-python-test --broker localhost:5672 'qpid.tests.messaging.endpoints.SetupTests.testOpen*Leaks'

Actual results:
Two unit tests from python-qpid are currently failing.

Expected results:
No error occurs when running these unit tests: the result is a pass, or an explanation is provided and the test is removed/skipped.

Additional info:

    qpid-python-test --broker localhost:5672 'qpid.tests.messaging.endpoints.SetupTests.testOpen*Leaks'
    qpid.tests.messaging.endpoints.SetupTests.testOpenCloseResourceLeaks ........................................ start
    Exception in thread Thread-1:
    Traceback (most recent call last):
      File "/usr/lib64/python2.4/threading.py", line 442, in __bootstrap
        self.run()
      File "/usr/lib64/python2.4/threading.py", line 422, in run
        self.__target(*self.__args, **self.__kwargs)
      File "/usr/lib/python2.4/site-packages/qpid/selector.py", line 119, in run
        rd, wr, ex = select(self.reading, self.writing, (), timeout)
    ValueError: filedescriptor out of range in select()

    qpid.tests.messaging.endpoints.SetupTests.testOpenCloseResourceLeaks ........................................ fail
    Error during test:
    Traceback (most recent call last):
      File "/usr/bin/qpid-python-test", line 311, in run
        phase()
      File "/usr/lib/python2.4/site-packages/qpid/tests/messaging/endpoints.py", line 87, in testOpenCloseResourceLeaks
        conn = Connection.establish(self.broker, **self.connection_options())
      File "/usr/lib/python2.4/site-packages/qpid/messaging/endpoints.py", line 68, in establish
        conn.open()
      File "<string>", line 6, in open
      File "/usr/lib/python2.4/site-packages/qpid/messaging/endpoints.py", line 244, in open
        self.attach()
      File "<string>", line 6, in attach
      File "/usr/lib/python2.4/site-packages/qpid/messaging/endpoints.py", line 262, in attach
        self._ewait(lambda: self._transport_connected and not self._unlinked())
      File "/usr/lib/python2.4/site-packages/qpid/messaging/endpoints.py", line 196, in _ewait
        result = self._wait(lambda: self.error or predicate(), timeout)
      File "/usr/lib/python2.4/site-packages/qpid/messaging/endpoints.py", line 181, in _wait
        return self._waiter.wait(predicate, timeout=timeout)
      File "/usr/lib/python2.4/site-packages/qpid/concurrency.py", line 57, in wait
        self.condition.wait(3)
      File "/usr/lib/python2.4/site-packages/qpid/concurrency.py", line 96, in wait
        sw.wait(timeout)
      File "/usr/lib/python2.4/site-packages/qpid/compat.py", line 53, in wait
        ready, _, _ = select([self], [], [], timeout)
    ValueError: filedescriptor out of range in select()

    qpid.tests.messaging.endpoints.SetupTests.testOpenFailResourceLeaks ........................................ fail
    Error during test:
    Traceback (most recent call last):
      File "/usr/bin/qpid-python-test", line 311, in run
        phase()
      File "/usr/lib/python2.4/site-packages/qpid/tests/messaging/endpoints.py", line 103, in testOpenFailResourceLeaks
        conn._wait(lambda: False, timeout=0.001)
      File "/usr/lib/python2.4/site-packages/qpid/messaging/endpoints.py", line 181, in _wait
        return self._waiter.wait(predicate, timeout=timeout)
      File "/usr/lib/python2.4/site-packages/qpid/concurrency.py", line 59, in wait
        self.condition.wait(timeout - passed)
      File "/usr/lib/python2.4/site-packages/qpid/concurrency.py", line 96, in wait
        sw.wait(timeout)
      File "/usr/lib/python2.4/site-packages/qpid/compat.py", line 53, in wait
        ready, _, _ = select([self], [], [], timeout)
    ValueError: filedescriptor out of range in select()

    Totals: 2 tests, 0 passed, 0 skipped, 0 ignored, 2 failed
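
The ValueError in the transcript can likely be reproduced outside qpid with a few lines of Python. The following is a minimal sketch under stated assumptions: Linux with FD_SETSIZE equal to 1024, Python 3 (not the Python 2.4 from the report), and a hard open-files limit above 1100. The names `probe_high_fd`, `ensure_nofile_limit`, and `is_open` are hypothetical helpers, not part of qpid.

```python
import os
import resource
import select


def is_open(fd):
    """Return True if descriptor number `fd` is currently open in this process."""
    try:
        os.fstat(fd)
    except OSError:
        return False
    return True


def ensure_nofile_limit(minimum):
    """Raise the soft RLIMIT_NOFILE to `minimum` if needed (like 'ulimit -n').

    Returns False when the hard limit does not allow it.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    unlimited = resource.RLIM_INFINITY
    if soft == unlimited or soft >= minimum:
        return True
    if hard != unlimited and hard < minimum:
        return False
    resource.setrlimit(resource.RLIMIT_NOFILE, (minimum, hard))
    return True


def probe_high_fd(start=1100):
    """Place a readable pipe end on a descriptor >= 1024 and try select() and poll().

    Returns (fd, select_error, poll_ready), or None if the limit cannot be raised.
    """
    high_fd = start
    while is_open(high_fd):  # avoid clobbering a descriptor that is in use
        high_fd += 1
    if not ensure_nofile_limit(high_fd + 1):
        return None
    read_end, write_end = os.pipe()
    to_close = [read_end, write_end]
    try:
        os.dup2(read_end, high_fd)  # duplicate the read end onto the high number
        to_close.append(high_fd)
        os.write(write_end, b"x")   # make the pipe readable
        try:
            select.select([high_fd], [], [], 0)
            select_error = None
        except ValueError as exc:   # assumption: raised when fd >= FD_SETSIZE
            select_error = str(exc)
        poller = select.poll()
        poller.register(high_fd, select.POLLIN)
        poll_ready = [fd for fd, _event in poller.poll(0)]
        return high_fd, select_error, poll_ready
    finally:
        for fd in to_close:
            os.close(fd)


if __name__ == "__main__":
    print(probe_high_fd())
```

With the default limit of 1024 no descriptor can reach that range, which would be consistent with the tests passing until 'ulimit -n' is raised.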