Bug 217626 - Failure to update global_last_id results in same ID being issued to multiple components
Status: CLOSED CURRENTRELEASE
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: cman
Version: 4
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assigned To: David Teigland
QA Contact: Cluster QE
Duplicates: 207690
Depends On:
Blocks: 214808
 
Reported: 2006-11-28 19:11 EST by Jonathan Earl Brassow
Modified: 2009-04-16 16:31 EDT
CC: 5 users

Doc Type: Bug Fix
Last Closed: 2008-08-05 17:41:09 EDT


Attachments
Patch to fix problem (524 bytes, patch)
2006-11-28 19:11 EST, Jonathan Earl Brassow

Description Jonathan Earl Brassow 2006-11-28 19:11:09 EST
Got the following by simply running a create/delete loop of cluster mirrors:

node4 (the node on which the commands are being run):

*nothing on console*

[root@neo-04 ~]# cat /proc/cluster/services
Service          Name                              GID LID State
Code
Fence Domain:    "default"                           7   2 run       -
[4 5 6]

DLM Lock Space:  "clvmd"                             7   3 run       -
[4 6 5]

DLM Lock Space:  "clustered_log"                     7  10 join
S-4,4,1
[4 6]


Node5:

SM: 00000000 process_reply duplicateid=9 nodeid=4 2/2

[root@neo-05 ~]# cat /proc/cluster/services
Service          Name                              GID LID State
Code
Fence Domain:    "default"                           7   2 run       -
[5 6 4]

DLM Lock Space:  "clvmd"                             7   3 run       -
[4 5 6]

DLM Lock Space:  "clustered_log"                     0  10 join
S-1,80,3
[]


Node6:

SM: 01000007 process_join_stop: bad num nodes 2 3
SM: 01000007 process_one_uevent error -1 state 2

[root@neo-06 ~]# cat /proc/cluster/services
Service          Name                              GID LID State
Code
Fence Domain:    "default"                           7   2 run       -
[6 5 4]

DLM Lock Space:  "clvmd"                             7   3 run
U-2,0,4
[6 4 5]

DLM Lock Space:  "clustered_log"                     7  10 run       -
[6]
Comment 1 Jonathan Earl Brassow 2006-11-28 19:11:09 EST
Created attachment 142344 [details]
Patch to fix problem
Comment 2 Christine Caulfield 2006-11-29 04:49:47 EST
Can you check this please Dave?
Comment 3 David Teigland 2006-11-29 10:04:19 EST
I believe that the fix for bug 206193 (which went out in the
last RHEL 4 errata?) created this bug.  I'm thinking that
two consecutive groups created by a single node will have the
same global id, because global_last_id is not updated on that
node (which I had expected would happen).
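
A minimal sketch of that suspected failure mode, using hypothetical
names rather than the actual sm_message.c code:

#include <stdint.h>

/* Per-node record of the last global id this node handed out. */
static uint32_t global_last_id;

/* Broken allocation: the new id is derived from global_last_id, but
 * the updated value is never stored back, so the next group created
 * by this node is issued the same id again. */
static uint32_t new_global_id(void)
{
        return global_last_id + 1;
}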

I'm trying to understand how the tests showed that the change
in 206193 worked but didn't show this extremely basic bug.  Also,
what flawed assumption was I making while working on the other
bug that caused me not to see this?
Comment 4 David Teigland 2006-12-01 15:45:12 EST
The fix for bug 206193 introduced this even worse bug.
We need to update our global_last_id locally when we create a new id.
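
A minimal sketch of that change, again with hypothetical names rather
than the actual sm_message.c code:

#include <stdint.h>

/* Per-node record of the last global id this node handed out. */
static uint32_t global_last_id;

/* Fixed allocation: remember the id we just issued so the next group
 * created locally starts from the updated value. */
static uint32_t new_global_id(void)
{
        uint32_t id = global_last_id + 1;
        global_last_id = id;
        return id;
}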

% cvs commit sm_message.c 
Checking in sm_message.c;
/cvs/cluster/cluster/cman-kernel/src/Attic/sm_message.c,v  <--  sm_message.c
new revision: 1.4.2.3; previous revision: 1.4.2.2
done

% cvs commit sm_message.c 
Checking in sm_message.c;
/cvs/cluster/cluster/cman-kernel/src/Attic/sm_message.c,v  <--  sm_message.c
new revision: 1.4.8.3; previous revision: 1.4.8.2
done
Comment 6 Christine Caulfield 2007-01-24 10:56:36 EST
*** Bug 207690 has been marked as a duplicate of this bug. ***
Comment 7 Chris Feist 2008-08-05 17:41:09 EDT
Fixed in current release (4.7).
