Got the following by simply running a create/delete loop of cluster mirrors:

node4 (the node on which the commands are being run): *nothing on console*

[root@neo-04 ~]# cat /proc/cluster/services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           7   2 run       -
[4 5 6]

DLM Lock Space:  "clvmd"                             7   3 run       -
[4 6 5]

DLM Lock Space:  "clustered_log"                     7  10 join      S-4,4,1
[4 6]

Node5:
SM: 00000000 process_reply duplicateid=9 nodeid=4 2/2

[root@neo-05 ~]# cat /proc/cluster/services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           7   2 run       -
[5 6 4]

DLM Lock Space:  "clvmd"                             7   3 run       -
[4 5 6]

DLM Lock Space:  "clustered_log"                     0  10 join      S-1,80,3
[]

Node6:
SM: 01000007 process_join_stop: bad num nodes 2 3
SM: 01000007 process_one_uevent error -1 state 2

[root@neo-06 ~]# cat /proc/cluster/services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           7   2 run       -
[6 5 4]

DLM Lock Space:  "clvmd"                             7   3 run       U-2,0,4
[6 4 5]

DLM Lock Space:  "clustered_log"                     7  10 run       -
[6]
Created attachment 142344: Patch to fix problem
Can you check this please, Dave?
I believe the fix for bug 206193 (which went out in the last RHEL4 errata?) introduced this bug. My thinking is that two consecutive groups created by a single node end up with the same global id, because global_last_id is not updated on that node (an update I had expected to happen). I'm trying to understand how the tests showed that the change for 206193 worked yet missed this extremely basic bug, and also what flawed assumption I was making while working on the other bug that kept me from seeing this.
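To make the suspected failure mode concrete, here is a minimal userspace sketch, not the actual cman-kernel/sm_message.c code; the function name alloc_global_id and the exact calculation are assumptions, only global_last_id comes from the description above. It shows how a node that derives a new global id from the highest id it has seen, but never records the id it just handed out, gives two consecutive groups the same id:

/*
 * Minimal sketch of the suspected failure mode.  Illustrative names,
 * not the actual cman-kernel/sm_message.c symbols.
 */
#include <stdint.h>
#include <stdio.h>

static uint32_t global_last_id;   /* highest global id this node has seen */

static uint32_t alloc_global_id(uint32_t highest_remote_id)
{
        uint32_t id = (highest_remote_id > global_last_id ?
                       highest_remote_id : global_last_id) + 1;

        /* Suspected bug: global_last_id is not updated here, so a second
         * create on the same node, before any remote node reports the new
         * id back, repeats exactly the same calculation. */
        return id;
}

int main(void)
{
        /* Two consecutive group creations on one node, with no remote
         * update in between: both get global id 8 instead of 8 and 9,
         * matching the "process_reply duplicateid" errors in the logs. */
        printf("first  group id: %u\n", alloc_global_id(7));
        printf("second group id: %u\n", alloc_global_id(7));
        return 0;
}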
The fix for bug 206193 introduced this even worse bug. We need to update our global_last_id locally when we create a new one.

% cvs commit sm_message.c
Checking in sm_message.c;
/cvs/cluster/cluster/cman-kernel/src/Attic/sm_message.c,v  <--  sm_message.c
new revision: 1.4.2.3; previous revision: 1.4.2.2
done

% cvs commit sm_message.c
Checking in sm_message.c;
/cvs/cluster/cluster/cman-kernel/src/Attic/sm_message.c,v  <--  sm_message.c
new revision: 1.4.8.3; previous revision: 1.4.8.2
done
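In terms of the sketch above, the kind of change described here (again with assumed names, not the actual committed sm_message.c diff) is simply to record the id the node just generated, so the next group created locally gets a fresh, larger id:

static uint32_t alloc_global_id(uint32_t highest_remote_id)
{
        uint32_t id = (highest_remote_id > global_last_id ?
                       highest_remote_id : global_last_id) + 1;

        /* Fix: remember the id we just handed out locally, so a second
         * group created on this node before any remote update arrives
         * does not reuse the same global id. */
        global_last_id = id;
        return id;
}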
*** Bug 207690 has been marked as a duplicate of this bug. ***
Fixed in current release (4.7).