Bug 472215 - qpidd rmgr::get_events() threw JERR__AIO: AIO error
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Hardware: All Linux
Priority: urgent, Severity: high
: 1.1
: ---
Assigned To: Kim van der Riet
Kim van der Riet
Reported: 2008-11-19 07:56 EST by Frantisek Reznicek
Modified: 2015-11-15 19:06 EST

Doc Type: Bug Fix
Last Closed: 2009-02-04 10:35:42 EST
Description Frantisek Reznicek 2008-11-19 07:56:25 EST
Description of problem:

RHTS qpid_txtest_fails_bz458053 test triggered JERR__AIO: AIO error. (AIO read operation failed: Invalid argument (-22) [pg=2 buf=0x2a97288200 rsize=0x80 offset=0x120200 fh=78]) (MessageStoreImpl.cpp:938)

RHTS run http://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=36588
recipe 27623 test /distribution/MRG_Messaging/qpid_txtest_fails_bz458053 failed.

The main transcript is located here:

The first failure, run(A) 70/200, can be found on line 2434.

Digging into details:
http://rhts.redhat.com/testlogs/36588/127623/1080066/qpidd_txtest.transcript.log line 8812 shows jexception 0x0103 rmgr::get_events() threw JERR__AIO: AIO error...

qpidd_txtest.transcript.log line 8812
'Queue cfb445aa5177ede791995-6: recoverMessages() failed: jexception 0x0103 rmgr::get_events() threw JERR__AIO: AIO error. (AIO read operation failed: Invalid argument (-22) [pg=2 buf=0x2a97288200 rsize=0x80 offset=0x120200 fh=78]) (MessageStoreImpl.cpp:938)'

The corresponding journal can be found in:

Version-Release number of selected component (if applicable):
qpidd-0.3.714058-4.el4, rhm-0.3.2804-1.el4, libaio-0.3.105-2

How reproducible:

Steps to Reproduce:
1a. Schedule RHTS test /distribution/MRG_Messaging/qpid_txtest_fails_bz458053 on an RHEL4.7 x86_64 machine (hp-xw9400-02.rhts.bos.redhat.com)

1b. Analyze qpidd journal store here:

Actual results:
  run(A) 70/200 failed

Expected results:
  no failure

Additional info:
RHTS qpid_txtest_fails_bz458053 test results from
  (an RHEL4.7 x86_64 hp-xw9400-02.rhts.bos.redhat.com 2.6.9-78.ELsmp):

There is one more failure, 'recoverMessages() failed: jexception 0x0900 rmgr::read() threw JERR_RMGR_UNKNOWNMAGIC' (qpidd_txtest.transcript.log:11256), but I don't have the journal for that one.
Comment 1 Kim van der Riet 2008-11-20 08:54:13 EST
The read pipeline tries to read complete read pages, but when there is insufficient material to read, it will read whatever is available. However, as we are using O_DIRECT, we are constrained by disk softblock (sblk) boundaries of 512 bytes. Looking at the above, rsize=0x80 clearly violates this condition. Looking at the source, this condition may arise when the read pipeline catches up with the write pointer.
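
To make the violation concrete, here is a small sketch (not taken from the qpid or rhm source) that checks the three values from the error message against the 512-byte sblk boundary. It assumes that rsize and offset are byte counts, which the error message does not state explicitly; the names SBLK_BYTES and is_sblk_aligned are hypothetical.

```cpp
#include <cstdint>

// Soft-block (sblk) size named in the comment above: 512 bytes.
constexpr std::uint64_t SBLK_BYTES = 512;

// True when value is a whole multiple of the soft-block size.
constexpr bool is_sblk_aligned(std::uint64_t value) {
    return value % SBLK_BYTES == 0;
}

// Values copied from the reported error message.
// Assumption: rsize and offset are expressed in bytes.
constexpr std::uint64_t reported_buf    = 0x2a97288200ULL;
constexpr std::uint64_t reported_rsize  = 0x80;
constexpr std::uint64_t reported_offset = 0x120200;

// The buffer address and file offset sit on 512-byte boundaries;
// only the read size (0x80 = 128 bytes) does not.
static_assert(is_sblk_aligned(reported_buf), "buffer address is sblk-aligned");
static_assert(is_sblk_aligned(reported_offset), "file offset is sblk-aligned");
static_assert(!is_sblk_aligned(reported_rsize), "read size is not sblk-aligned");
```

Under that assumption, the buffer and the offset are acceptable for O_DIRECT and the read size alone explains the EINVAL (-22).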

This is a logic error; the fix is to ensure that the read size is floored to the nearest sblk boundary.
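
The flooring described above could look roughly like the sketch below. This is an illustration only, not the change committed to the journal code: the helper names floor_to_sblk and readable_bytes are hypothetical, and sizes are assumed to be in bytes.

```cpp
#include <cstdint>

// Soft-block (sblk) size imposed by O_DIRECT in this report: 512 bytes.
constexpr std::uint64_t SBLK_BYTES = 512;

// Round a byte count down to the nearest multiple of the soft-block size.
constexpr std::uint64_t floor_to_sblk(std::uint64_t bytes) {
    return bytes - (bytes % SBLK_BYTES);
}

// Hypothetical helper: given how many bytes are available to read and the
// size of a read page, return the number of bytes that can be submitted
// in one aligned read. Reads a full page when possible, otherwise whatever
// is available, floored to an sblk boundary.
constexpr std::uint64_t readable_bytes(std::uint64_t available,
                                       std::uint64_t page_size) {
    const std::uint64_t wanted = available < page_size ? available : page_size;
    return floor_to_sblk(wanted);
}
```

With this sketch, the 0x80 bytes from the failing request floor to 0, so a caller would presumably skip the read and wait until the writer has completed at least one full sblk.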

This may be a difficult condition to reproduce as it arises based on dynamic asynchronous events in the read and write pipelines.
Comment 2 Kim van der Riet 2008-11-24 14:42:44 EST
Fixed in r.2875

QA: This was found by inspection of the code, and no known reproducer exists (other than blind chance with very small odds). It should be sufficient to check that no regressions occur as a result of this checkin.
Comment 4 Frantisek Reznicek 2008-12-03 09:28:44 EST
Six long-term qpid_test_transaction_integrity test instances on RHEL 5.2 / 4.7, i386 / x86_64, show that the issue has been fixed.
Validated on packages: rhm-0.3.2898-1.el5 qpidd-0.3.722122-2.el5
Comment 6 errata-xmlrpc 2009-02-04 10:35:42 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

