Bug 462046

Summary: Automate bdb 'recover' routine on restarting crashed broker
Product: Red Hat Enterprise MRG
Reporter: Gordon Sim <gsim>
Component: qpid-cpp
Assignee: Kim van der Riet <kim.vdriet>
Status: CLOSED ERRATA
QA Contact: Kim van der Riet <kim.vdriet>
Severity: high
Priority: high
Version: 1.0
CC: freznice
Target Milestone: 1.1   
Target Release: ---   
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2009-02-04 15:36:27 UTC

Description Gordon Sim 2008-09-12 08:08:19 UTC
It may be possible to do this on every restart.

Comment 1 Gordon Sim 2008-09-12 08:09:46 UTC
Kim, I've assigned this to you to investigate when you get done with 1.0.1 issues, is that ok? If not assign back to me.

Comment 2 Kim van der Riet 2008-10-29 18:04:37 UTC
The db4 DB_RECOVER flag was added to the DbEnv::open() call. This not only recovers the database if it is corrupted, but also upgrades the database if it is from a previous version of db4. However, for this to work, the creation of the BDB databases has to be delayed until after the recovery. As these were previously initialized in the MessageStoreImpl constructor, I had to change them to pointers and create them in MessageStoreImpl::init() once the recovery is complete.

Note that there are two levels of recover, DB_RECOVER and DB_RECOVER_FATAL. At present the former is used and works against the test corrupted database; testing will reveal if there is any requirement to use the latter.
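
The ordering constraint described above can be sketched roughly as follows. This is only an illustration, not the actual MessageStoreImpl code: StubEnv, StubDb, StoreSketch, openWithRecovery() and the database names are all hypothetical stand-ins, used so the example compiles without Berkeley DB installed. In the real store, the environment step corresponds to DbEnv::open() with DB_RECOVER set.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for a Berkeley DB environment (NOT the real DbEnv).
class StubEnv {
public:
    // Stand-in for opening the environment with a recovery flag set.
    void openWithRecovery(const std::string& home) {
        home_ = home;
        recovered_ = true;
    }
    bool recovered() const { return recovered_; }
private:
    std::string home_;
    bool recovered_ = false;
};

// Hypothetical stand-in for a database handle bound to an environment.
class StubDb {
public:
    StubDb(const StubEnv& env, std::string name) : name_(std::move(name)) {
        // Model the constraint: a handle must not exist before recovery has run.
        if (!env.recovered())
            throw std::logic_error("database created before recovery: " + name_);
    }
    const std::string& name() const { return name_; }
private:
    std::string name_;
};

// Sketch of the store: database members are pointers, left empty by the
// constructor and only created in init(), after the environment has recovered.
class StoreSketch {
public:
    StoreSketch() = default;  // creates no databases

    void init(const std::string& dir) {
        env_.openWithRecovery(dir);                     // step 1: recover
        queueDb_.reset(new StubDb(env_, "queues.db"));  // step 2: create handles
        messageDb_.reset(new StubDb(env_, "messages.db"));
    }
    bool isInitialised() const { return queueDb_ && messageDb_; }
private:
    StubEnv env_;
    std::unique_ptr<StubDb> queueDb_;    // hypothetical database name
    std::unique_ptr<StubDb> messageDb_;  // hypothetical database name
};
```

Constructing a StoreSketch and then calling init() with a store directory leaves both handles populated, whereas constructing a StubDb against an environment that has not been opened throws, which is the situation the move from constructor to init() avoids.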

Fixed in r.2693.
Updated RHEL4 patch to match in r.2694.

To test, run the txtest soak test without the call to db_recover (currently it is one of the steps in the test script prior to restarting the broker). However, this will not give confirmation that an actual corrupted database was overcome, as the recover is silent.

It may be useful to run this test against a version prior to this fix and check that several errors are encountered in a typical soak run. Then running the same test against this new version with other conditions unchanged should yield no errors due to corruption.

Comment 3 Frantisek Reznicek 2008-11-11 13:19:04 UTC
RHTS test qpid_start_fails_bz458466 (with db_recover now disabled) shows that the issue is fixed. Validated on RHEL4.7/5.2 i386/x86_64 using packages:
qpidd-0.3.712127-4.el4/5 and rhm-0.3.2759-2.el4/5
->VERIFIED

Comment 5 errata-xmlrpc 2009-02-04 15:36:27 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-0035.html