Bug 474371
Summary: | qpidd+store exits on journal recovery because of 'Timeout waiting for AIO in MessageStoreImpl::recoverMessages()' | ||
---|---|---|---|
Product: | Red Hat Enterprise MRG | Reporter: | Frantisek Reznicek <freznice> |
Component: | qpid-cpp | Assignee: | Kim van der Riet <kim.vdriet> |
Status: | CLOSED ERRATA | QA Contact: | Frantisek Reznicek <freznice> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 1.1 | CC: | esammons |
Target Milestone: | 1.1 | ||
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Last Closed: | 2009-02-04 15:36:40 UTC | Type: | --- |
Description
Frantisek Reznicek
2008-12-03 15:19:46 UTC
One more observation (RHEL5.2, x86_64), including trace-level qpidd logging, stored on mrg3.lab.bos.redhat.com: /root/qpid_test_transaction_integrity_3fails_rhel5.2_x86_64_081204.tar.bz2

See:
qpid_test_transaction_integrity/qpid_test_transaction_integrity.log
qpid_test_transaction_integrity/qpidd_txtest.transcript.log.a370
qpid_test_transaction_integrity/qpidd_journal_a0370.tar.bz2

Fixed in r.2939

This error was highlighted indirectly by the fix for Bug 472215. The rounding down of the read size to the nearest sblk boundary prevented the recovery from reading the first dblk of a new page, which was the last dblk containing data. However, the deeper issue is that the initial recovery did not correct the sblk offset, as it should, by writing additional filler records until the dblk boundary was reached. This has been added, and solves this issue.

QA - test as follows:

1. Extract the journal from queue 1e2214bf2e87d528f104c6fea6b0dc5401a6d6df987dff5a63b1cf3338bfe3a5-10 and recover from it. The recovery should time out without the fix and be normal with the fix in place. In addition, with the fix and running with --log-enable info+, you should see the following messages:

2008-dec-05 15:48:40 warning Journal "1e2214bf2e87d528f104c6fea6b0dc5401a6d6df987dff5a63b1cf3338bfe3a5-10": Bad record alignment found at fid=0x2 offs=0x158280 (likely journal overwrite boundary); 3 filler record(s) required.
2008-dec-05 15:48:40 notice Journal "1e2214bf2e87d528f104c6fea6b0dc5401a6d6df987dff5a63b1cf3338bfe3a5-10": Recover phase write: Wrote filler record at offs=0x158280
2008-dec-05 15:48:40 notice Journal "1e2214bf2e87d528f104c6fea6b0dc5401a6d6df987dff5a63b1cf3338bfe3a5-10": Recover phase write: Wrote filler record at offs=0x158300
2008-dec-05 15:48:40 notice Journal "1e2214bf2e87d528f104c6fea6b0dc5401a6d6df987dff5a63b1cf3338bfe3a5-10": Recover phase write: Wrote filler record at offs=0x158380
2008-dec-05 15:48:40 info Journal "1e2214bf2e87d528f104c6fea6b0dc5401a6d6df987dff5a63b1cf3338bfe3a5-10": Bad record alignment fixed.

2. Check for regressions and make sure that a) there are no further occurrences of this bug; and b) nothing else was broken.

Manually and automatically (via the qpid_test_transaction_integrity test) validated that the issue has been fixed on RHEL5.2 / 4.7 i386 / x86_64 on packages: qpidd-0.4.725652-2.el5, rhm-0.4.2970-1.el5

->VERIFIED

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-0035.html
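Note for anyone re-checking the recovery log: the filler count and offsets it reports can be reproduced with simple alignment arithmetic. The sketch below is not the store's code. It assumes a dblk of 0x80 (128) bytes, inferred from the spacing between the logged filler offsets, and an sblk of 4 dblks, which is an assumption that happens to match the "3 filler record(s) required" message. The function name filler_offsets is hypothetical.

```python
def filler_offsets(offs, dblk_size=0x80, dblks_per_sblk=4):
    """Return the offsets at which dblk-sized filler records would be
    written to pad `offs` up to the next aligned boundary.

    dblk_size and dblks_per_sblk are assumptions inferred from the log
    in this report, not values read from the journal code.
    """
    if offs % dblk_size != 0:
        raise ValueError("offset is not aligned to a dblk")
    sblk_size = dblk_size * dblks_per_sblk
    remainder = offs % sblk_size
    if remainder == 0:
        return []  # already aligned, no filler needed
    count = (sblk_size - remainder) // dblk_size
    return [offs + i * dblk_size for i in range(count)]


if __name__ == "__main__":
    # Offset taken from the "Bad record alignment found" warning.
    for o in filler_offsets(0x158280):
        print(hex(o))
```

Under those assumptions, an input of 0x158280 gives three offsets, 0x158280, 0x158300 and 0x158380, which are the three "Wrote filler record" lines in the log. An offset that is already aligned gives an empty list, so a journal without the misalignment should show none of these messages.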