
Bug 217626

Summary: Failure to update global_last_id results in same ID being issued to multiple components
Product: [Retired] Red Hat Cluster Suite
Reporter: Jonathan Earl Brassow <jbrassow>
Component: cman
Assignee: David Teigland <teigland>
Status: CLOSED CURRENTRELEASE
QA Contact: Cluster QE <mspqa-list>
Severity: medium
Priority: medium
Version: 4
CC: cfeist, cluster-maint, jbrassow, lenny, teigland
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2008-08-05 21:41:09 UTC
Bug Blocks: 214808
Attachments:
  Patch to fix problem (flags: none)

Description Jonathan Earl Brassow 2006-11-29 00:11:09 UTC
Got the following by simply running a create/delete loop of cluster mirrors:

node4 (the node on which the commands are being run):

*nothing on console*

[root@neo-04 ~]# cat /proc/cluster/services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           7   2 run       -
[4 5 6]

DLM Lock Space:  "clvmd"                             7   3 run       -
[4 6 5]

DLM Lock Space:  "clustered_log"                     7  10 join      S-4,4,1
[4 6]


Node5:

SM: 00000000 process_reply duplicate id=9 nodeid=4 2/2

[root@neo-05 ~]# cat /proc/cluster/services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           7   2 run       -
[5 6 4]

DLM Lock Space:  "clvmd"                             7   3 run       -
[4 5 6]

DLM Lock Space:  "clustered_log"                     0  10 join      S-1,80,3
[]


Node6:

SM: 01000007 process_join_stop: bad num nodes 2 3
SM: 01000007 process_one_uevent error -1 state 2

[root@neo-06 ~]# cat /proc/cluster/services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           7   2 run       -
[6 5 4]

DLM Lock Space:  "clvmd"                             7   3 run       U-2,0,4
[6 4 5]

DLM Lock Space:  "clustered_log"                     7  10 run       -
[6]

Comment 1 Jonathan Earl Brassow 2006-11-29 00:11:09 UTC
Created attachment 142344 [details]
Patch to fix problem

Comment 2 Christine Caulfield 2006-11-29 09:49:47 UTC
Can you check this please Dave?

Comment 3 David Teigland 2006-11-29 15:04:19 UTC
I believe that the fix for bug 206193 (which went out in the
last rhel4 errata?) created this bug.  I'm thinking that two
consecutive groups created by a single node will have the same
global id because global_last_id is not updated on that node
(an update I had expected to happen).

I'm trying to understand how the tests showed that the change
in 206193 worked but didn't show this extremely basic bug.  Also,
what flawed assumption was I making while working on the other
bug that caused me not to see this?

Comment 4 David Teigland 2006-12-01 20:45:12 UTC
The fix for bug 206193 introduced this even worse bug.
We need to update our global_last_id locally when we create a new id.

% cvs commit sm_message.c 
Checking in sm_message.c;
/cvs/cluster/cluster/cman-kernel/src/Attic/sm_message.c,v  <--  sm_message.c
new revision: 1.4.2.3; previous revision: 1.4.2.2
done

% cvs commit sm_message.c 
Checking in sm_message.c;
/cvs/cluster/cluster/cman-kernel/src/Attic/sm_message.c,v  <--  sm_message.c
new revision: 1.4.8.3; previous revision: 1.4.8.2
done
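
For illustration only, here is a minimal userspace C sketch of the behavior
described above; the function names, the plain counter, and the id layout are
assumptions, not the actual sm_message.c code.  The point is simply that the
node which creates a new global id must also advance its own global_last_id,
otherwise two consecutive groups created on that node receive the same id:

#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the per-node counter discussed in this bug
 * (global_last_id in cman-kernel's service manager). */
static uint32_t global_last_id = 7;

/* Broken allocation, analogous to the regression from the bug 206193 fix:
 * a new id is handed out but the local counter is never advanced, so the
 * next group created on this node gets the same id again. */
static uint32_t alloc_global_id_broken(void)
{
	return global_last_id + 1;
}

/* Fixed allocation: bump global_last_id locally at the moment the new id
 * is created, so consecutive groups get distinct ids. */
static uint32_t alloc_global_id_fixed(void)
{
	return ++global_last_id;
}

int main(void)
{
	uint32_t a, b;

	a = alloc_global_id_broken();
	b = alloc_global_id_broken();
	printf("broken: %u %u\n", a, b);   /* prints "broken: 8 8" - duplicate id */

	global_last_id = 7;                /* reset for contrast */
	a = alloc_global_id_fixed();
	b = alloc_global_id_fixed();
	printf("fixed:  %u %u\n", a, b);   /* prints "fixed:  8 9" - distinct ids */
	return 0;
}

The duplicate value from the broken variant corresponds to the duplicate-id
error reported on node5 above.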


Comment 6 Christine Caulfield 2007-01-24 15:56:36 UTC
*** Bug 207690 has been marked as a duplicate of this bug. ***

Comment 7 Chris Feist 2008-08-05 21:41:09 UTC
Fixed in current release (4.7).