Description of problem:
groupd creates uint32 global ids for each group. It does not use them itself, but provides one to each registered app to use if it wants. (The dlm and gfs each use the global id in messages to distinguish between different lockspaces/filesystems.) groupd's method of creating these gids (local counter | local nodeid) can result in duplicate gids in the cluster given a somewhat uncommon sequence of events. This has been sitting on my todo list for a long time; I have finally fixed it.

Version-Release number of selected component (if applicable):

How reproducible:
uncommon

Steps to Reproduce:
1. mount fsX on nodeA
2. mount fsX on nodeB
3. umount fsX on nodeA
4. stop cluster stuff on nodeA
5. start cluster stuff on nodeA
6. mount fsY on nodeA
7. mount fsY on nodeB

Actual results:
dlm messages will get mixed up between the X and Y lockspaces, causing dlm recovery to become stuck.

Expected results:

Additional info:
Fix checked into HEAD and RHEL5 branches.
Requesting blocker status for this defect. It closes a hole discovered during dlm recovery unit testing. The fix is available and its impact is minimal. We are rebuilding these packages in the next week, so we can pick up this fix as well.
Checked into RHEL50 branch.

Checking in app.c;
/cvs/cluster/cluster/group/daemon/app.c,v  <--  app.c
new revision: 1.52.4.2; previous revision: 1.52.4.1
done
Checking in cpg.c;
/cvs/cluster/cluster/group/daemon/cpg.c,v  <--  cpg.c
new revision: 1.36.4.2; previous revision: 1.36.4.1
done
Checking in gd_internal.h;
/cvs/cluster/cluster/group/daemon/gd_internal.h,v  <--  gd_internal.h
new revision: 1.44.4.2; previous revision: 1.44.4.1
done
A package has been built which should resolve the problem described in this bug report. This report is therefore being closed with a resolution of CURRENTRELEASE. You may reopen this bug report if the solution does not work for you.