Bug 217626 - Failure to update global_last_id results in same ID being issued to multiple components
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Cluster Suite
Classification: Retired
Component: cman
Version: 4
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: David Teigland
QA Contact: Cluster QE
Duplicates: 207690
Blocks: 214808
 
Reported: 2006-11-29 00:11 UTC by Jonathan Earl Brassow
Modified: 2009-04-16 20:31 UTC
CC List: 5 users

Doc Type: Bug Fix
Last Closed: 2008-08-05 21:41:09 UTC


Attachments
Patch to fix problem (524 bytes, patch)
2006-11-29 00:11 UTC, Jonathan Earl Brassow

Description Jonathan Earl Brassow 2006-11-29 00:11:09 UTC
Got the following by simply running a create/delete loop of cluster mirrors:

node4 (the node on which the commands are being run):

*nothing on console*

[root@neo-04 ~]# cat /proc/cluster/services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           7   2 run       -
[4 5 6]

DLM Lock Space:  "clvmd"                             7   3 run       -
[4 6 5]

DLM Lock Space:  "clustered_log"                     7  10 join      S-4,4,1
[4 6]


Node5:

SM: 00000000 process_reply duplicateid=9 nodeid=4 2/2

[root@neo-05 ~]# cat /proc/cluster/services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           7   2 run       -
[5 6 4]

DLM Lock Space:  "clvmd"                             7   3 run       -
[4 5 6]

DLM Lock Space:  "clustered_log"                     0  10 join      S-1,80,3
[]


Node6:

SM: 01000007 process_join_stop: bad num nodes 2 3
SM: 01000007 process_one_uevent error -1 state 2

[root@neo-06 ~]# cat /proc/cluster/services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           7   2 run       -
[6 5 4]

DLM Lock Space:  "clvmd"                             7   3 run       U-2,0,4
[6 4 5]

DLM Lock Space:  "clustered_log"                     7  10 run       -
[6]

Comment 1 Jonathan Earl Brassow 2006-11-29 00:11:09 UTC
Created attachment 142344
Patch to fix problem

Comment 2 Christine Caulfield 2006-11-29 09:49:47 UTC
Can you check this please Dave?

Comment 3 David Teigland 2006-11-29 15:04:19 UTC
I believe that the fix for bug 206193 (which went out in the
last rhel4 errata?) created this bug.  I suspect that two
consecutive groups created by a single node will get the same
global id, because global_last_id is not updated on that node
(an update I had expected to happen).

I'm trying to understand how the tests showed that the change
in 206193 worked but didn't catch this extremely basic bug,
and what flawed assumption I was making while working on the
other bug that caused me to miss this.

Comment 4 David Teigland 2006-12-01 20:45:12 UTC
The fix for bug 206193 introduced this even worse bug.
We need to update global_last_id locally whenever we generate
a new global ID.
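
To illustrate the failure mode and the one-line nature of the
fix, here is a minimal C sketch.  It is not the actual
sm_message.c code: the function name new_global_id(), the
nodeid bit-packing, and the bare counter are hypothetical
stand-ins for cman's real service-manager internals.

#include <stdint.h>
#include <inttypes.h>
#include <stdio.h>

/* Hypothetical stand-in for the global_last_id kept by cman's
 * service manager: the last global ID this node issued. */
static uint32_t global_last_id;

/* Allocate a global ID for a newly created group.  The node ID
 * is packed into the high bits (hypothetical layout) so that IDs
 * issued by different nodes cannot collide; the low-bit counter
 * must make IDs from the same node unique -- but only if it
 * actually advances. */
static uint32_t new_global_id(int nodeid)
{
        uint32_t id = global_last_id + 1;

        /* The bug: without this store, two consecutive groups
         * created by one node both read global_last_id + 1 and
         * receive the same GID, matching the duplicate GIDs in
         * the report above. */
        global_last_id = id;    /* the fix: record the issued ID */

        return ((uint32_t)nodeid << 24) | id;
}

int main(void)
{
        /* With the store in place the two IDs differ; with it
         * removed (the buggy behavior), both calls return the
         * same value. */
        printf("%08" PRIx32 "\n", new_global_id(4));
        printf("%08" PRIx32 "\n", new_global_id(4));
        return 0;
}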

% cvs commit sm_message.c 
Checking in sm_message.c;
/cvs/cluster/cluster/cman-kernel/src/Attic/sm_message.c,v  <--  sm_message.c
new revision: 1.4.2.3; previous revision: 1.4.2.2
done

% cvs commit sm_message.c 
Checking in sm_message.c;
/cvs/cluster/cluster/cman-kernel/src/Attic/sm_message.c,v  <--  sm_message.c
new revision: 1.4.8.3; previous revision: 1.4.8.2
done


Comment 6 Christine Caulfield 2007-01-24 15:56:36 UTC
*** Bug 207690 has been marked as a duplicate of this bug. ***

Comment 7 Chris Feist 2008-08-05 21:41:09 UTC
Fixed in current release (4.7).

