qpidd with store plugin loaded fails i.e. exits during the journal recovery with message (exit code=1): 'critical Broker start-up failed: Timeout waiting for AIO in MessageStoreImpl::recoverMessages() (MessageStoreImpl.cpp:930)' Data stored in: mrg3.lab.bos.redhat.com:/root/qpid_test_transaction_integrity_105fails_rhel5.2_x86_64_081203.tar.bz2 qpid_test_transaction_integrity/qpid_test_transaction_integrity.log:2700 qpid_test_transaction_integrity/qpidd_txtest.transcript.log:13333 qpid_test_transaction_integrity/qpidd_journal_a0088.tar.bz2 Version-Release number of selected component (if applicable): qpidd rev. 722122 / rhm rev. 2898 How reproducible: sporadic Steps to Reproduce: 1. run qpid_test_transaction_integrity 2. check qpidd_txtest.transcript.log for message 3. go-to 1. if not found otherwise analyze stored journals Actual results: 2008-dec-02 08:36:32 info Loaded Module: /usr/lib64/qpid/daemon/xml.so 2008-dec-02 08:36:32 info Loaded Module: /usr/lib64/qpid/client/rdmaconnector.so 2008-dec-02 08:36:32 warning SSL connector not enabled, you must set QPID_SSL_CERT_DB to enable it. 2008-dec-02 08:36:32 info Loaded Module: /usr/lib64/qpid/client/sslconnector.so 2008-dec-02 08:36:32 info Loaded Module: /usr/lib64/qpid/daemon/cluster.so 2008-dec-02 08:36:32 info Loaded Module: /usr/lib64/qpid/daemon/ssl.so 2008-dec-02 08:36:32 info Loaded Module: /usr/lib64/qpid/daemon/acl.so 2008-dec-02 08:36:32 info Loaded Module: /usr/lib64/qpid/daemon/msgstore.so 2008-dec-02 08:36:32 info Loaded Module: /usr/lib64/qpid/daemon/rdma.so 2008-dec-02 08:36:32 info Management enabled 2008-dec-02 08:36:32 notice Journal "TplStore": Created 2008-dec-02 08:36:32 notice Store module initialized; dir=/root/qpid_test_transaction_integrity/rhts_qpidd/20081202_060832/broker.3213 2008-dec-02 08:36:32 info > Default files per journal: 8 2008-dec-02 08:36:32 info > Default journal file size: 24 (wpgs) 2008-dec-02 08:36:32 info > Default write cache page size: 32 (Kib) 2008-dec-02 08:36:32 info > Default number of write cache pages: 32 2008-dec-02 08:36:32 info > TPL files per journal: 8 2008-dec-02 08:36:32 info > TPL journal file size: 24 (wpgs) 2008-dec-02 08:36:32 info > TPL write cache page size: 4 (Kib) 2008-dec-02 08:36:32 info > TPL number of write cache pages: 64 2008-dec-02 08:36:33 notice Journal "1e2214bf2e87d528f104c6fea6b0dc5401a6d6df987dff5a63b1cf3338bfe3a5-1": Created 2008-dec-02 08:36:33 info Recovered queue "1e2214bf2e87d528f104c6fea6b0dc5401a6d6df987dff5a63b1cf3338bfe3a5-1": 10661 messages recovered; 0 messages in-doubt. 2008-dec-02 08:36:33 notice Journal "1e2214bf2e87d528f104c6fea6b0dc5401a6d6df987dff5a63b1cf3338bfe3a5-2": Created 2008-dec-02 08:36:33 info Recovered queue "1e2214bf2e87d528f104c6fea6b0dc5401a6d6df987dff5a63b1cf3338bfe3a5-2": 612 messages recovered; 0 messages in -doubt. 2008-dec-02 08:36:33 notice Journal "1e2214bf2e87d528f104c6fea6b0dc5401a6d6df987dff5a63b1cf3338bfe3a5-3": Created 2008-dec-02 08:36:33 info Recovered queue "1e2214bf2e87d528f104c6fea6b0dc5401a6d6df987dff5a63b1cf3338bfe3a5-3": 204 messages recovered; 0 messages in -doubt. 2008-dec-02 08:36:33 notice Journal "1e2214bf2e87d528f104c6fea6b0dc5401a6d6df987dff5a63b1cf3338bfe3a5-4": Created 2008-dec-02 08:36:34 info Recovered queue "1e2214bf2e87d528f104c6fea6b0dc5401a6d6df987dff5a63b1cf3338bfe3a5-4": 102 messages recovered; 0 messages in -doubt. 2008-dec-02 08:36:34 notice Journal "1e2214bf2e87d528f104c6fea6b0dc5401a6d6df987dff5a63b1cf3338bfe3a5-5": Created 2008-dec-02 08:36:34 info Recovered queue "1e2214bf2e87d528f104c6fea6b0dc5401a6d6df987dff5a63b1cf3338bfe3a5-5": 204 messages recovered; 0 messages in -doubt. 2008-dec-02 08:36:34 notice Journal "1e2214bf2e87d528f104c6fea6b0dc5401a6d6df987dff5a63b1cf3338bfe3a5-6": Created 2008-dec-02 08:36:34 info Recovered queue "1e2214bf2e87d528f104c6fea6b0dc5401a6d6df987dff5a63b1cf3338bfe3a5-6": 306 messages recovered; 0 messages in -doubt. 2008-dec-02 08:36:34 notice Journal "1e2214bf2e87d528f104c6fea6b0dc5401a6d6df987dff5a63b1cf3338bfe3a5-7": Created 2008-dec-02 08:36:34 info Recovered queue "1e2214bf2e87d528f104c6fea6b0dc5401a6d6df987dff5a63b1cf3338bfe3a5-7": 408 messages recovered; 0 messages in -doubt. 2008-dec-02 08:36:34 notice Journal "1e2214bf2e87d528f104c6fea6b0dc5401a6d6df987dff5a63b1cf3338bfe3a5-8": Created 2008-dec-02 08:36:34 info Recovered queue "1e2214bf2e87d528f104c6fea6b0dc5401a6d6df987dff5a63b1cf3338bfe3a5-8": 204 messages recovered; 0 messages in -doubt. 2008-dec-02 08:36:34 notice Journal "1e2214bf2e87d528f104c6fea6b0dc5401a6d6df987dff5a63b1cf3338bfe3a5-9": Created 2008-dec-02 08:36:35 info Recovered queue "1e2214bf2e87d528f104c6fea6b0dc5401a6d6df987dff5a63b1cf3338bfe3a5-9": 306 messages recovered; 0 messages in -doubt. 2008-dec-02 08:36:35 notice Journal "1e2214bf2e87d528f104c6fea6b0dc5401a6d6df987dff5a63b1cf3338bfe3a5-10": Created 2008-dec-02 08:36:37 critical Broker start-up failed: Timeout waiting for AIO in MessageStoreImpl::recoverMessages() (MessageStoreImpl.cpp:930) Expected results: Successful journal recovery Additional info: Data stored in: mrg3.lab.bos.redhat.com: /root/qpid_test_transaction_integrity_105fails_rhel5.2_x86_64_081203.tar.bz2 another failure occurrences are stored in mrg3.lab.bos.redhat.com: /root/qpid_test_transaction_integrity_56fails_rhel4.7_x86_64_081203.tar.bz2
One more observation (RHEL5.2, x86_64) including trace qpidd logging stored in mrg3.lab.bos.redhat.com: /root/qpid_test_transaction_integrity_3fails_rhel5.2_x86_64_081204.tar.bz2 see qpid_test_transaction_integrity/qpid_test_transaction_integrity.log qpid_test_transaction_integrity/qpidd_txtest.transcript.log.a370 qpid_test_transaction_integrity/qpidd_journal_a0370.tar.bz2
Fixed in r.2939 This error was highlighted indirectly by the fix for Bug 472215. The rounding down of the read size to the nearest sblk boundary prevented the recover from reading the first dblk of a new page which was the last dblk containing data. However the deeper issue is that the initial recovery did not correct the sblk offset as it should by writing additional filler records until the dblk boundary was reached. This has been added, and solves this issue. QA - test as follows: 1. By extracting the journal from queue 1e2214bf2e87d528f104c6fea6b0dc5401a6d6df987dff5a63b1cf3338bfe3a5-10 and recovering from it - the recovery should time out without the fix and be normal with the fix in place. In addition, with the fix and running with --log-enable info+, you should see the following messages: 2008-dec-05 15:48:40 warning Journal "1e2214bf2e87d528f104c6fea6b0dc5401a6d6df987dff5a63b1cf3338bfe3a5-10": Bad record alignment found at fid=0x2 offs=0x158280 (likely journal overwrite boundary); 3 filler record(s) required. 2008-dec-05 15:48:40 notice Journal "1e2214bf2e87d528f104c6fea6b0dc5401a6d6df987dff5a63b1cf3338bfe3a5-10": Recover phase write: Wrote filler record at offs=0x158280 2008-dec-05 15:48:40 notice Journal "1e2214bf2e87d528f104c6fea6b0dc5401a6d6df987dff5a63b1cf3338bfe3a5-10": Recover phase write: Wrote filler record at offs=0x158300 2008-dec-05 15:48:40 notice Journal "1e2214bf2e87d528f104c6fea6b0dc5401a6d6df987dff5a63b1cf3338bfe3a5-10": Recover phase write: Wrote filler record at offs=0x158380 2008-dec-05 15:48:40 info Journal "1e2214bf2e87d528f104c6fea6b0dc5401a6d6df987dff5a63b1cf3338bfe3a5-10": Bad record alignment fixed. 2. Check for regressions and make sure that a) no further occurrences of this bug; and b) nothing else was broken.
Manually and automatically (via qpid_test_transaction_integrity test) validated that issue has been fixed on RHEL5.2 / 4.7 i386 / x86_64 on packages: qpidd-0.4.725652-2.el5, rhm-0.4.2970-1.el5 ->VERIFIED
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on therefore solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHEA-2009-0035.html