Bug 487214
Summary: | upgrade node to 5.3, openais dies after trying to join 5.2 cluster | | |
---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | Shane Bradley <sbradley> |
Component: | openais | Assignee: | Steven Dake <sdake> |
Status: | CLOSED ERRATA | QA Contact: | Cluster QE <mspqa-list> |
Severity: | medium | Docs Contact: | |
Priority: | urgent | | |
Version: | 5.3 | CC: | cbuissar, cluster-maint, cward, davdunc, edamato, mgoulish, schlegel, sghosh, shota.a, slords, tao |
Target Milestone: | rc | Keywords: | ZStream |
Target Release: | --- | | |
Hardware: | All | | |
OS: | Linux | | |
Whiteboard: | | | |
Fixed In Version: | openais-0.80.5-4.el5_4 | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | | |
: | 489445 | Environment: | |
Last Closed: | 2009-09-02 11:29:26 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Bug Depends On: | | | |
Bug Blocks: | 489445, 490307, 509894 | | |
Description
Shane Bradley
2009-02-24 20:20:37 UTC
I took a fresh install of 3 nodes. The lowest node ID was 5.3; the second and third highest node IDs were 5.2. I started cman on the 5.2 nodes with service cman start, then started cman on the 5.3 node with no segfault or crash.

Could you be more specific about the steps you take to reproduce the issue? The platform of the lowest node ID is important since it indicates who is responsible for synchronization.

Regards
-steve

Well, I'm embarrassed to say this defect shouldn't have made it through engineering unit testing, but it still doesn't reproduce on my hardware. Thanks for access to your test cluster.

Anyway, I have a patch for the problem. Please clone a 5.3.z bugzilla immediately and we will release the fix in the upcoming 5.3.z release of openais. The 5.3.z release is pending, so there is some urgency here.

The root of the problem is that checkpoints of type GLOBALID are rejected in checkpoint sync, but not in section sync or refcount sync. The globalid checkpoint is virtual, meaning it doesn't really consume a checkpoint position in the system. At the beginning of the sync algo, the globalid message sets the global checkpoint id, and the rest of the message is then ignored without actually creating a checkpoint. This same check must be added to refcount and section synchronization to be backward compatible.

The 5.3 code only sends this checkpoint sync at the start of synchronization, whereas the old sync code would actually create a checkpoint for the section. Then when refcount sync went to refcount the "virtual checkpoint", it wouldn't find it and would assert.

Thanks for your help
Regards
-steve

*** Bug 489596 has been marked as a duplicate of this bug. ***

~~ Attention - RHEL 5.4 Beta Released! ~~

RHEL 5.4 Beta has been released! There should be a fix present in the Beta release that addresses this particular request. Please test and report back results here, at your earliest convenience. RHEL 5.4 General Availability release is just around the corner!

If you encounter any issues while testing Beta, please describe the issues you have encountered and set the bug into NEED_INFO. If you encounter new issues, please clone this bug to open a new issue and request it be reviewed for inclusion in RHEL 5.4 or a later update, if it is not of urgent severity.

Please do not flip the bug status to VERIFIED. Only post your verification results, and if available, update Verified field with the appropriate value.

Questions can be posted to this bug or your customer or partner representative.

Verified rolling upgrades from 5.2 and 5.3 GA to 5.4 snapshot 4.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-1366.html
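The root cause described in the comments above (the virtual GLOBALID checkpoint being skipped in checkpoint sync but not in section or refcount sync) can be illustrated with a small Python model. This is only a sketch of the described behaviour, not the openais source: `SyncState`, the three `on_*_sync` handlers, and the `skip_globalid` switch are hypothetical names invented for the illustration, and the real code is C with different data structures.

```python
from dataclasses import dataclass, field

GLOBALID = "GLOBALID"  # hypothetical marker for the virtual checkpoint


@dataclass
class SyncState:
    global_ckpt_id: int = 0
    # checkpoint name -> {"sections": [...], "refcount": int}
    checkpoints: dict = field(default_factory=dict)


def on_checkpoint_sync(state, name, ckpt_id):
    """Checkpoint phase: GLOBALID only sets the global id, creates nothing."""
    if name == GLOBALID:
        state.global_ckpt_id = ckpt_id
        return
    state.checkpoints[name] = {"sections": [], "refcount": 0}


def on_section_sync(state, name, section, skip_globalid=True):
    """Section phase: with the guard, GLOBALID messages are ignored."""
    if skip_globalid and name == GLOBALID:
        return
    state.checkpoints[name]["sections"].append(section)


def on_refcount_sync(state, name, refcount, skip_globalid=True):
    """Refcount phase: without the guard, the lookup for GLOBALID fails."""
    if skip_globalid and name == GLOBALID:
        return
    ckpt = state.checkpoints.get(name)
    if ckpt is None:
        # Models the assertion failure described in the report.
        raise AssertionError(f"refcount sync for unknown checkpoint {name!r}")
    ckpt["refcount"] = refcount
```

Calling `on_refcount_sync(state, GLOBALID, 1, skip_globalid=False)` after a GLOBALID checkpoint sync raises, which mirrors the crash when an older peer sends refcount data for the virtual checkpoint. With the guard enabled in all three phases, the same message is ignored.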