Bug 768407

Summary: C++ store module throws exception on EINTR from libaio
Product: Red Hat Enterprise MRG Reporter: Kim van der Riet <kim.vdriet>
Component: qpid-cpp Assignee: Kim van der Riet <kim.vdriet>
Status: CLOSED CURRENTRELEASE QA Contact: Frantisek Reznicek <freznice>
Severity: medium Docs Contact:
Priority: medium    
Version: Development CC: esammons, freznice, jross
Target Milestone: 2.1.2   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: qpid-cpp-mrg-0.14-5 Doc Type: Bug Fix
Doc Text:
Cause: Some libaio calls to disk may be interrupted by the kernel. When this happens, the error code EINTR is returned. This was not being handled correctly by the store. Consequence: The store would throw an exception, causing the connection to close. Result: The store now handles this case correctly, and the connection is no longer closed as a result of EINTR.

Description Kim van der Riet 2011-12-16 15:16:35 UTC
The call to ::io_getevents() occasionally fails with EINTR (interrupted system call). The normal response is to ignore this error and retry the call.

In the store's wmgr::get_events() call, however, this special handling is omitted and the error code is thrown as an exception. This causes the broker to close the connection needlessly. The rmgr::get_events() call, by contrast, already has the correct handling.

It would be a trivial fix to replicate the handling already in place in rmgr::get_events() in wmgr::get_events(), i.e. to ignore the error and return 0.
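The handling described above can be sketched roughly as follows. This is a minimal illustration, not the store's actual code: the helper name get_events_ignoring_eintr and its callable parameter are hypothetical, and the callable stands in for the real ::io_getevents() call so the sketch compiles without libaio. It assumes the libaio wrapper convention of returning a negative error number (for example -EINTR) on failure.

```cpp
#include <cerrno>
#include <cstring>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of the store): invokes a callable that wraps
// ::io_getevents() and treats an interrupted call as "no events this time".
// Assumption: the callable returns the number of events on success, or a
// negative error number such as -EINTR on failure.
template <typename GetEvents>
int get_events_ignoring_eintr(GetEvents get_events)
{
    const int ret = get_events();
    if (ret >= 0)
        return ret;          // normal case: number of completed events
    if (ret == -EINTR)
        return 0;            // interrupted: ignore, the caller polls again later
    // Any other error is still reported to the caller as an exception.
    throw std::runtime_error(std::string("io_getevents failed: ")
                             + std::strerror(-ret));
}
```

A caller would pass something like a lambda that forwards to ::io_getevents() with its own context and event buffer. Because the call is injected, a test can also pass a lambda that returns -EINTR to exercise the interrupted path without relying on the kernel.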

The man pages for the other libaio calls do not indicate that it is possible to return EINTR, so no such handling exists for these calls.

Comment 1 Kim van der Riet 2011-12-16 15:32:13 UTC
Fixed in r.4487

QE: This change has been tested for normal operation. However, I don't know how to force EINTR in libaio operations, and this makes it difficult to test for the positive case.

Comment 2 Kim van der Riet 2012-01-26 15:57:10 UTC
*** Bug 784890 has been marked as a duplicate of this bug. ***

Comment 3 Frantisek Reznicek 2012-01-30 09:05:34 UTC
The reproducer scenario from bug 784890 will be used for verification (a long run of cluster_test/testset_cluster_tx).

Comment 4 Frantisek Reznicek 2012-02-24 11:28:40 UTC
Focused reproducers have been launched in soak mode; results are expected within the next 4 days.

Comment 5 Frantisek Reznicek 2012-02-28 08:55:19 UTC
The issue is fixed. It was tested extensively in long-running cluster tests on RHEL 5.7 and 6.2 (total test time slightly over 112 hours) with the following packages:

  corosync-1.4.1-4.el6.x86_64
  corosynclib-1.4.1-4.el6.x86_64
  python-qpid-0.14-3.el6.noarch
  python-qpid-qmf-0.14-4.el6.x86_64
  python-saslwrapper-0.10-2.el6.x86_64
  qpid-cpp-client-0.14-6.el6.x86_64
  qpid-cpp-client-devel-0.14-6.el6.x86_64
  qpid-cpp-client-devel-docs-0.14-6.el6.noarch
  qpid-cpp-client-rdma-0.14-6.el6.x86_64
  qpid-cpp-client-ssl-0.14-6.el6.x86_64
  qpid-cpp-server-0.14-6.el6.x86_64
  qpid-cpp-server-cluster-0.14-6.el6.x86_64
  qpid-cpp-server-devel-0.14-6.el6.x86_64
  qpid-cpp-server-rdma-0.14-6.el6.x86_64
  qpid-cpp-server-ssl-0.14-6.el6.x86_64
  qpid-cpp-server-store-0.14-6.el6.x86_64
  qpid-cpp-server-xml-0.14-6.el6.x86_64
  qpid-java-client-0.14-2.el6.noarch
  qpid-java-common-0.14-2.el6.noarch
  qpid-java-example-0.14-2.el6.noarch
  qpid-qmf-0.14-4.el6.x86_64
  qpid-qmf-devel-0.14-4.el6.x86_64
  qpid-tests-0.14-1.el6.noarch
  qpid-tools-0.14-1.el6.noarch
  rh-qpid-cpp-tests-0.14-6.el6.x86_64
  ruby-qpid-qmf-0.14-4.el6.x86_64
  ruby-saslwrapper-0.10-2.el6.x86_64
  saslwrapper-0.10-2.el6.x86_64
  saslwrapper-devel-0.10-2.el6.x86_64
  sesame-1.0-2.el6.x86_64


-> VERIFIED

Comment 6 Kim van der Riet 2012-03-09 17:51:04 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause: Some libaio calls to disk may be interrupted by the kernel. When this happens, the error code EINTR is returned. This was not being handled correctly by the store.

Consequence: The store would throw an exception, causing the connection to close.

Result: The store now handles this case correctly, and the connection is no longer closed as a result of EINTR.