Description of problem:
The RHTS test qpid_txtest_fails_bz458053 triggered:
JERR__AIO: AIO error. (AIO read operation failed: Invalid argument (-22) [pg=2 buf=0x2a97288200 rsize=0x80 offset=0x120200 fh=78]) (MessageStoreImpl.cpp:938)

RHTS run http://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=36588 recipe 27623 test /distribution/MRG_Messaging/qpid_txtest_fails_bz458053 failed.

The main transcript is here:
http://rhts.redhat.com/testlogs/36588/127623/1080066/TESTOUT.log
The first failure {run(A) 70/200} can be found on line 2434.

Digging into the details:
http://rhts.redhat.com/testlogs/36588/127623/1080066/qpidd_txtest.transcript.log
Line 8812 shows jexception 0x0103 rmgr::get_events() threw JERR__AIO: AIO error...

qpidd_txtest.transcript.log line 8812:
'Queue cfb445aa5177ede791995-6: recoverMessages() failed: jexception 0x0103 rmgr::get_events() threw JERR__AIO: AIO error. (AIO read operation failed: Invalid argument (-22) [pg=2 buf=0x2a97288200 rsize=0x80 offset=0x120200 fh=78]) (MessageStoreImpl.cpp:938)'

The corresponding journal can be found at:
http://rhts.redhat.com/testlogs/36588/127623/1080066/qpidd_journal_a0070.tar.bz2

Version-Release number of selected component (if applicable):
qpidd-0.3.714058-4.el4, rhm-0.3.2804-1.el4, libaio-0.3.105-2

How reproducible:
unknown

Steps to Reproduce:
1a. Schedule RHTS test /distribution/MRG_Messaging/qpid_txtest_fails_bz458053 on an RHEL4.7 x86_64 machine (hp-xw9400-02.rhts.bos.redhat.com)
1b. Analyze the qpidd journal store here: http://rhts.redhat.com/testlogs/36588/127623/1080066/qpidd_journal_a0070.tar.bz2

Actual results:
run(A) 70/200 failed

Expected results:
no failure

Additional info:
RHTS qpid_txtest_fails_bz458053 test results from (an RHEL4.7 x86_64 hp-xw9400-02.rhts.bos.redhat.com 2.6.9-78.ELsmp):
There is one more failure, 'recoverMessages() failed: jexception 0x0900 rmgr::read() threw JERR_RMGR_UNKNOWNMAGIC' at qpidd_txtest.transcript.log:11256, but I don't have the journal for that.
The read pipeline tries to read complete read pages, but when there is not enough data to fill a page, it reads whatever is available. However, because we are using O_DIRECT, every read is constrained to disk softblock (sblk) boundaries of 512 bytes. In the error above, rsize=0x80 (128 bytes) clearly violates this condition. Looking at the source, this condition can arise when the read pipeline catches up with the write pointer. This is a logic error; the fix is to ensure the read size is floored to the nearest sblk boundary. This may be a difficult condition to reproduce, as it depends on dynamic asynchronous events in the read and write pipelines.
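The flooring described above can be sketched as follows. This is a minimal illustration only, not the code from the actual fix: the names SBLK_SIZE_BYTES, floor_to_sblk and aligned_read_size are hypothetical, and the only value taken from this report is the 512-byte sblk size.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: one softblock (sblk) is 512 bytes, as stated in this report.
constexpr std::size_t SBLK_SIZE_BYTES = 512;

// Hypothetical helper: round a byte count down to a whole number of sblks.
constexpr std::size_t floor_to_sblk(std::size_t nbytes) {
    return nbytes - (nbytes % SBLK_SIZE_BYTES);
}

// Hypothetical helper: size of the next read to submit.
//   available - bytes between the read position and the write pointer
//   page_size - size of a full read page (must be a multiple of the sblk size)
// Returns at most one page, floored to an sblk boundary. A result of 0 means
// less than one full sblk is available yet.
inline std::size_t aligned_read_size(std::size_t available, std::size_t page_size) {
    assert(page_size % SBLK_SIZE_BYTES == 0);
    const std::size_t want = available < page_size ? available : page_size;
    return floor_to_sblk(want);
}

// The failing request had rsize=0x80, which floors to zero bytes.
static_assert(floor_to_sblk(0x80) == 0, "0x80 is less than one sblk");
static_assert(floor_to_sblk(0x280) == 0x200, "0x280 floors to one sblk");
```

With this approach, a caller that gets 0 back would presumably hold off until the write pipeline has produced at least one more complete sblk, rather than submitting an unaligned AIO read; how the real read pipeline handles that case is not described in this report.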
Fixed in r.2875

QA: This was found by inspection of the code, but no known reproducer (other than blind chance and very small odds) exists. It should be sufficient to check that no regressions occur as a result of this checkin.
Six long-term qpid_test_transaction_integrity test instances on RHEL 5.2 / 4.7, i386 / x86_64 show that the issue has been fixed. Validated on packages: rhm-0.3.2898-1.el5 qpidd-0.3.722122-2.el5 -> VERIFIED
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHEA-2009-0035.html