Occasionally, the call to ::io_getevents() returns EINTR (interrupted system call). The normal response is to ignore this error and retry the call. In the store's wmgr::get_events(), however, this special handling is missing, and the error is thrown as an exception. This causes the broker to close the connection needlessly. The rmgr::get_events() call does handle this case correctly. The fix is trivial: replicate the handling already in place in rmgr::get_events() in wmgr::get_events(), i.e., ignore the error and return 0. The man pages for the other libaio calls do not indicate that they can return EINTR, so no such handling exists for those calls.
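A minimal sketch of the handling described above, assuming the store's call can be wrapped in a single callable. The names `GetEventsFn` and `get_events` are hypothetical, and the real ::io_getevents() call (context, min/max event counts, event array, timeout) is replaced by a stub so the sketch compiles without libaio. It relies on one documented libaio detail: the libaio wrapper for io_getevents() returns a negated error number (for example `-EINTR`) instead of setting errno.

```cpp
#include <cerrno>
#include <cstring>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a bound ::io_getevents(ctx, min_nr, nr, events, timeout)
// call. Like the libaio wrapper, it returns the number of events reaped (>= 0)
// or a negated errno value on failure; it does not set errno.
using GetEventsFn = std::function<int()>;

// Hypothetical free-function version of the handling described for
// rmgr::get_events(): an interrupted call is not treated as a failure.
int get_events(const GetEventsFn& io_getevents_fn) {
    const int ret = io_getevents_fn();
    if (ret == -EINTR) {
        // Interrupted before any events were reaped: report zero events so the
        // caller simply polls again instead of tearing down the connection.
        return 0;
    }
    if (ret < 0) {
        // Any other failure is still a genuine error and is propagated.
        throw std::runtime_error(std::string("io_getevents() failed: ") + std::strerror(-ret));
    }
    return ret;  // number of completed AIO events
}
```

With this shape, `get_events([] { return -EINTR; })` returns 0, `get_events([] { return 3; })` returns 3, and `get_events([] { return -EINVAL; })` throws `std::runtime_error`. Returning 0 rather than looping inside the function keeps the retry decision with the caller, which already polls for events repeatedly; injecting the call as a callable also makes the EINTR branch easy to exercise in a unit test.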
Fixed in r.4487.
QE: This change has been tested for normal operation. However, I don't know how to force EINTR in libaio operations, which makes it difficult to test the positive case.
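On the question of forcing EINTR: one general POSIX technique is to deliver a signal to a thread that is blocked in a system call, with the signal handler installed without `SA_RESTART`. The sketch below demonstrates this with read() on an empty pipe, because that is easy to run anywhere; the function name `provoke_eintr_on_read` is hypothetical, and it is an assumption (not shown here) that the same approach applied to a thread blocked in io_getevents() would exercise the store's EINTR path.

```cpp
#include <cerrno>
#include <cstring>
#include <stdexcept>
#include <string>
#include <signal.h>
#include <sys/time.h>
#include <unistd.h>

namespace {
// The handler does nothing; it only makes SIGALRM "caught" so the blocked
// system call is interrupted instead of the process being terminated.
void on_alarm(int) {}
}  // namespace

// Blocks in read() on an empty pipe and arranges for SIGALRM after delay_ms.
// Because the handler is installed without SA_RESTART, read() fails with EINTR.
// Returns the errno observed (EINTR expected), or 0 if read() did not fail.
int provoke_eintr_on_read(int delay_ms) {
    if (delay_ms <= 0) {
        throw std::invalid_argument("delay_ms must be positive, or read() blocks forever");
    }
    int fds[2];
    if (::pipe(fds) != 0) {
        throw std::runtime_error(std::string("pipe() failed: ") + std::strerror(errno));
    }

    struct sigaction sa{};
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;  // deliberately no SA_RESTART
    struct sigaction old_sa{};
    if (::sigaction(SIGALRM, &sa, &old_sa) != 0) {
        ::close(fds[0]);
        ::close(fds[1]);
        throw std::runtime_error("sigaction() failed");
    }

    struct itimerval timer{};
    timer.it_value.tv_sec = delay_ms / 1000;
    timer.it_value.tv_usec = (delay_ms % 1000) * 1000;
    ::setitimer(ITIMER_REAL, &timer, nullptr);

    char buf;
    // The write end stays open and nothing is written, so read() blocks
    // until the timer's signal interrupts it.
    const ssize_t n = ::read(fds[0], &buf, 1);
    const int err = (n < 0) ? errno : 0;  // capture errno immediately

    // Disarm the timer and restore the previous signal disposition.
    struct itimerval off{};
    ::setitimer(ITIMER_REAL, &off, nullptr);
    ::sigaction(SIGALRM, &old_sa, nullptr);
    ::close(fds[0]);
    ::close(fds[1]);
    return err;
}
```

Calling `provoke_eintr_on_read(50)` should return `EINTR` after roughly 50 ms. The key design choice is omitting `SA_RESTART`: with it, the kernel would transparently restart read() and the error path would never be reached.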
*** Bug 784890 has been marked as a duplicate of this bug. ***
The reproduction scenario from bug 784890 (a long run of cluster_test/testset_cluster_tx) will be used for verification.
Focused reproducers have been launched in soak mode; results are expected within the next 4 days.
The issue is fixed. It was extensively tested with long-term cluster tests on RHEL 5.7 / 6.2 (total test time slightly over 112 hours) using the following packages:
  corosync-1.4.1-4.el6.x86_64
  corosynclib-1.4.1-4.el6.x86_64
  python-qpid-0.14-3.el6.noarch
  python-qpid-qmf-0.14-4.el6.x86_64
  python-saslwrapper-0.10-2.el6.x86_64
  qpid-cpp-client-0.14-6.el6.x86_64
  qpid-cpp-client-devel-0.14-6.el6.x86_64
  qpid-cpp-client-devel-docs-0.14-6.el6.noarch
  qpid-cpp-client-rdma-0.14-6.el6.x86_64
  qpid-cpp-client-ssl-0.14-6.el6.x86_64
  qpid-cpp-server-0.14-6.el6.x86_64
  qpid-cpp-server-cluster-0.14-6.el6.x86_64
  qpid-cpp-server-devel-0.14-6.el6.x86_64
  qpid-cpp-server-rdma-0.14-6.el6.x86_64
  qpid-cpp-server-ssl-0.14-6.el6.x86_64
  qpid-cpp-server-store-0.14-6.el6.x86_64
  qpid-cpp-server-xml-0.14-6.el6.x86_64
  qpid-java-client-0.14-2.el6.noarch
  qpid-java-common-0.14-2.el6.noarch
  qpid-java-example-0.14-2.el6.noarch
  qpid-qmf-0.14-4.el6.x86_64
  qpid-qmf-devel-0.14-4.el6.x86_64
  qpid-tests-0.14-1.el6.noarch
  qpid-tools-0.14-1.el6.noarch
  rh-qpid-cpp-tests-0.14-6.el6.x86_64
  ruby-qpid-qmf-0.14-4.el6.x86_64
  ruby-saslwrapper-0.10-2.el6.x86_64
  saslwrapper-0.10-2.el6.x86_64
  saslwrapper-devel-0.10-2.el6.x86_64
  sesame-1.0-2.el6.x86_64
-> VERIFIED
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Cause: The libaio call io_getevents() may occasionally be interrupted. When this happens, the error code EINTR is returned. The store did not handle this case correctly.
Consequence: The store threw an exception, causing the connection to close.
Result: The store now handles this case correctly, and the connection is no longer closed as a result of EINTR.