Bug 472215 - qpidd rmgr::get_events() threw JERR__AIO: AIO error
Summary: qpidd rmgr::get_events() threw JERR__AIO: AIO error
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: 1.1
Hardware: All
OS: Linux
urgent
high
Target Milestone: 1.1
Assignee: Kim van der Riet
QA Contact: Kim van der Riet
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2008-11-19 12:56 UTC by Frantisek Reznicek
Modified: 2015-11-16 00:06 UTC
1 user

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-02-04 15:35:42 UTC
Target Upstream Version:
Embargoed:


Links
System                 | ID             | Private | Priority | Status       | Summary                                      | Last Updated
Red Hat Product Errata | RHEA-2009:0035 | 0       | normal   | SHIPPED_LIVE | Red Hat Enterprise MRG Messaging 1.1 Release | 2009-02-04 15:33:44 UTC

Description Frantisek Reznicek 2008-11-19 12:56:25 UTC
Description of problem:

The RHTS test qpid_txtest_fails_bz458053 triggered JERR__AIO: AIO error. (AIO read operation failed: Invalid argument (-22) [pg=2 buf=0x2a97288200 rsize=0x80 offset=0x120200 fh=78]) (MessageStoreImpl.cpp:938)

In RHTS run http://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=36588, recipe 27623, the test /distribution/MRG_Messaging/qpid_txtest_fails_bz458053 failed.

The main transcript is here:
http://rhts.redhat.com/testlogs/36588/127623/1080066/TESTOUT.log

The first failure, {run(A) 70/200}, is on line 2434.

Digging into the details, line 8812 of
http://rhts.redhat.com/testlogs/36588/127623/1080066/qpidd_txtest.transcript.log shows jexception 0x0103 rmgr::get_events() threw JERR__AIO: AIO error...

qpidd_txtest.transcript.log line 8812
'Queue cfb445aa5177ede791995-6: recoverMessages() failed: jexception 0x0103 rmgr::get_events() threw JERR__AIO: AIO error. (AIO read operation failed: Invalid argument (-22) [pg=2 buf=0x2a97288200 rsize=0x80 offset=0x120200 fh=78]) (MessageStoreImpl.cpp:938)'

The corresponding journal is at:
http://rhts.redhat.com/testlogs/36588/127623/1080066/qpidd_journal_a0070.tar.bz2




Version-Release number of selected component (if applicable):
qpidd-0.3.714058-4.el4, rhm-0.3.2804-1.el4, libaio-0.3.105-2


How reproducible:
unknown

Steps to Reproduce:
1a. Schedule RHTS test /distribution/MRG_Messaging/qpid_txtest_fails_bz458053 on an RHEL4.7 x86_64 machine (hp-xw9400-02.rhts.bos.redhat.com)

1b. Analyze the qpidd journal store at:
  http://rhts.redhat.com/testlogs/36588/127623/1080066/qpidd_journal_a0070.tar.bz2

Actual results:
  run(A) 70/200 failed

Expected results:
  no failure

Additional info:
RHTS qpid_txtest_fails_bz458053 test results from an RHEL4.7 x86_64 machine
  (hp-xw9400-02.rhts.bos.redhat.com, kernel 2.6.9-78.ELsmp):

There is one more failure, 'recoverMessages() failed: jexception 0x0900 rmgr::read() threw JERR_RMGR_UNKNOWNMAGIC' (qpidd_txtest.transcript.log:11256), but I don't have the journal for that one.

Comment 1 Kim van der Riet 2008-11-20 13:54:13 UTC
The read pipeline tries to read complete read pages, but when there is insufficient material to read, it reads whatever is available. However, because we are using O_DIRECT, reads are constrained to disk softblock (sblk) boundaries of 512 bytes. In the error above, rsize=0x80 (128 bytes) is not a multiple of 512 and clearly violates this condition, which is consistent with the Invalid argument (-22) result. Looking at the source, this condition may arise when the read pipeline catches up with the write pointer.

This is a logic error; fix it by ensuring that the read size is floored to the nearest sblk boundary.

This may be a difficult condition to reproduce as it arises based on dynamic asynchronous events in the read and write pipelines.

Comment 2 Kim van der Riet 2008-11-24 19:42:44 UTC
Fixed in r.2875

QA: This was found by inspection of the code, and no known reproducer exists (other than blind chance, with very small odds). It should be sufficient to check that this check-in causes no regressions.

Comment 4 Frantisek Reznicek 2008-12-03 14:28:44 UTC
Six long-term qpid_test_transaction_integrity test instances on RHEL 5.2 / 4.7, i386 / x86_64, show that the issue has been fixed.
Validated on packages: rhm-0.3.2898-1.el5 qpidd-0.3.722122-2.el5
->VERIFIED

Comment 6 errata-xmlrpc 2009-02-04 15:35:42 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-0035.html

