Bug 768407 - C++ store module throws exception on EINTR from libaio
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: Development
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: 2.1.2
Assignee: Kim van der Riet
QA Contact: Frantisek Reznicek
URL:
Whiteboard:
Duplicates: 784890
Depends On:
Blocks:
Reported: 2011-12-16 15:16 UTC by Kim van der Riet
Modified: 2018-11-26 18:36 UTC

Fixed In Version: qpid-cpp-mrg-0.14-5
Doc Type: Bug Fix
Doc Text:
Cause: Some libaio calls to disk may be interrupted by the kernel. When this happens, the error code EINTR is returned. This was not being handled correctly by the store.
Consequence: The store would throw an exception, causing the connection to close.
Result: The store now handles this case correctly, and the connection is no longer closed as a result of EINTR.
Clone Of:
Environment:
Last Closed:
Target Upstream Version:


Description Kim van der Riet 2011-12-16 15:16:35 UTC
The call to ::io_getevents() occasionally fails with EINTR (interrupted system call). The normal response is to ignore this error and retry the call.

The store's wmgr::get_events() call omits this special handling and throws the error code as an exception, which causes the broker to close the connection needlessly. The rmgr::get_events() call already has the correct handling.

The fix is trivial: replicate the handling already in place in rmgr::get_events() in wmgr::get_events(), i.e. ignore the error and return 0.
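
As a rough illustration of the handling described above, the following is a minimal sketch and not the actual store code. The helper name get_events_ignoring_eintr and its callable parameter are hypothetical; the callable stands in for a call such as ::io_getevents() and is assumed to follow the libaio convention of returning the number of events on success or a negative errno value on failure.

```cpp
#include <cerrno>
#include <cstring>
#include <stdexcept>
#include <string>

// Hypothetical helper (not from the qpid store sources).
// `get_events` is assumed to return >= 0 (event count) on success
// or a negative errno value on failure, as libaio calls do.
template <typename GetEvents>
int get_events_ignoring_eintr(GetEvents get_events)
{
    const int ret = get_events();
    if (ret >= 0)
        return ret;          // normal case: number of completed events
    if (ret == -EINTR)
        return 0;            // interrupted: report "no events" so the caller polls again
    // Any other error is still treated as fatal.
    throw std::runtime_error(std::string("get_events failed: ") + std::strerror(-ret));
}
```

Returning 0 rather than looping inside the helper matches the "ignore the error and return 0" wording above: the caller simply sees no completed events and retries on its next pass.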

The man pages for the other libaio calls do not indicate that they can return EINTR, so no such handling exists for those calls.

Comment 1 Kim van der Riet 2011-12-16 15:32:13 UTC
Fixed in r.4487

QE: This change has been tested for normal operation. However, I don't know how to force EINTR in libaio operations, and this makes it difficult to test for the positive case.
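
For reference, EINTR in general can be provoked by delivering a signal whose handler was installed without SA_RESTART while a thread is blocked in a system call. The sketch below shows this with a plain read() on an empty pipe. It does not exercise libaio, and whether the same approach reliably interrupts io_getevents() inside the broker is an assumption that has not been tested here. The function name provoke_eintr is hypothetical.

```cpp
#include <cerrno>
#include <csignal>
#include <sys/time.h>
#include <unistd.h>

// Empty handler: its only purpose is to make the signal interrupt the syscall.
extern "C" void on_alarm(int) {}

// Hypothetical demo: returns the errno seen when a blocking read() is interrupted.
int provoke_eintr()
{
    struct sigaction sa = {};
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                        // deliberately no SA_RESTART
    if (sigaction(SIGALRM, &sa, nullptr) != 0)
        return errno;

    int fds[2];
    if (pipe(fds) != 0)
        return errno;

    // Fire SIGALRM every 50 ms until the timer is disarmed below.
    struct itimerval timer = {};
    timer.it_value.tv_usec = 50000;
    timer.it_interval.tv_usec = 50000;
    setitimer(ITIMER_REAL, &timer, nullptr);

    char byte;
    const ssize_t n = read(fds[0], &byte, 1);   // blocks: nothing is ever written
    const int err = (n < 0) ? errno : 0;

    struct itimerval off = {};                  // disarm the timer
    setitimer(ITIMER_REAL, &off, nullptr);
    close(fds[0]);
    close(fds[1]);
    return err;
}
```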

Comment 2 Kim van der Riet 2012-01-26 15:57:10 UTC
*** Bug 784890 has been marked as a duplicate of this bug. ***

Comment 3 Frantisek Reznicek 2012-01-30 09:05:34 UTC
The reproduction scenario from bug 784890 will be used for verification (a long run of cluster_test/testset_cluster_tx).

Comment 4 Frantisek Reznicek 2012-02-24 11:28:40 UTC
Focused reproducers have been launched in soak mode; results are expected within the next 4 days.

Comment 5 Frantisek Reznicek 2012-02-28 08:55:19 UTC
The issue is fixed. It was tested extensively with long-term cluster tests on RHEL 5.7 and 6.2 (total test time slightly over 112 hours) using the following packages:

  corosync-1.4.1-4.el6.x86_64
  corosynclib-1.4.1-4.el6.x86_64
  python-qpid-0.14-3.el6.noarch
  python-qpid-qmf-0.14-4.el6.x86_64
  python-saslwrapper-0.10-2.el6.x86_64
  qpid-cpp-client-0.14-6.el6.x86_64
  qpid-cpp-client-devel-0.14-6.el6.x86_64
  qpid-cpp-client-devel-docs-0.14-6.el6.noarch
  qpid-cpp-client-rdma-0.14-6.el6.x86_64
  qpid-cpp-client-ssl-0.14-6.el6.x86_64
  qpid-cpp-server-0.14-6.el6.x86_64
  qpid-cpp-server-cluster-0.14-6.el6.x86_64
  qpid-cpp-server-devel-0.14-6.el6.x86_64
  qpid-cpp-server-rdma-0.14-6.el6.x86_64
  qpid-cpp-server-ssl-0.14-6.el6.x86_64
  qpid-cpp-server-store-0.14-6.el6.x86_64
  qpid-cpp-server-xml-0.14-6.el6.x86_64
  qpid-java-client-0.14-2.el6.noarch
  qpid-java-common-0.14-2.el6.noarch
  qpid-java-example-0.14-2.el6.noarch
  qpid-qmf-0.14-4.el6.x86_64
  qpid-qmf-devel-0.14-4.el6.x86_64
  qpid-tests-0.14-1.el6.noarch
  qpid-tools-0.14-1.el6.noarch
  rh-qpid-cpp-tests-0.14-6.el6.x86_64
  ruby-qpid-qmf-0.14-4.el6.x86_64
  ruby-saslwrapper-0.10-2.el6.x86_64
  saslwrapper-0.10-2.el6.x86_64
  saslwrapper-devel-0.10-2.el6.x86_64
  sesame-1.0-2.el6.x86_64


-> VERIFIED

Comment 6 Kim van der Riet 2012-03-09 17:51:04 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause: Some libaio calls to disk may be interrupted by the kernel. When this happens, the error code EINTR is returned. This was not being handled correctly by the store.

Consequence: The store would throw an exception, causing the connection to close.

Result: The store now handles this case correctly, and the connection is no longer closed as a result of EINTR.