Got the following by simply running a create/delete loop of cluster mirrors:

node4 (the node on which the commands are being run): *nothing on console*

[root@neo-04 ~]# cat /proc/cluster/services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           7   2 run       -
[4 5 6]

DLM Lock Space:  "clvmd"                             7   3 run       -
[4 6 5]

DLM Lock Space:  "clustered_log"                     7  10 join      S-4,4,1
[4 6]

Node5:
SM: 00000000 process_reply duplicateid=9 nodeid=4 2/2

[root@neo-05 ~]# cat /proc/cluster/services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           7   2 run       -
[5 6 4]

DLM Lock Space:  "clvmd"                             7   3 run       -
[4 5 6]

DLM Lock Space:  "clustered_log"                     0  10 join      S-1,80,3
[]

Node6:
SM: 01000007 process_join_stop: bad num nodes 2 3
SM: 01000007 process_one_uevent error -1 state 2

[root@neo-06 ~]# cat /proc/cluster/services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           7   2 run       -
[6 5 4]

DLM Lock Space:  "clvmd"                             7   3 run       U-2,0,4
[6 4 5]

DLM Lock Space:  "clustered_log"                     7  10 run       -
[6]
Created attachment 142344: Patch to fix problem
Can you check this please, Dave?
I believe the fix for bug 206193 (which went out in the last RHEL4 errata?) introduced this bug. My thinking is that two consecutive groups created by a single node end up with the same global id, because global_last_id is not updated on that node (an update I had expected to happen). I'm trying to understand how the tests showed that the change for 206193 worked yet missed this extremely basic bug, and also what flawed assumption I was making while working on the other bug that kept me from seeing this.
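To make the suspected failure mode concrete, here is a minimal userspace sketch, not the actual cman-kernel/sm_message.c code; the function name alloc_global_id and the exact calculation are assumptions, only global_last_id comes from the description above. It shows how a node that derives a new global id from the highest id it has seen, but never records the id it just handed out, gives two consecutive groups the same id:

/*
 * Minimal sketch of the suspected failure mode.  Illustrative names,
 * not the actual cman-kernel/sm_message.c symbols.
 */
#include <stdint.h>
#include <stdio.h>

static uint32_t global_last_id;   /* highest global id this node has seen */

static uint32_t alloc_global_id(uint32_t highest_remote_id)
{
        uint32_t id = (highest_remote_id > global_last_id ?
                       highest_remote_id : global_last_id) + 1;

        /* Suspected bug: global_last_id is not updated here, so a second
         * create on the same node, before any remote node reports the new
         * id back, repeats exactly the same calculation. */
        return id;
}

int main(void)
{
        /* Two consecutive group creations on one node, with no remote
         * update in between: both get global id 8 instead of 8 and 9,
         * matching the "process_reply duplicateid" errors in the logs. */
        printf("first  group id: %u\n", alloc_global_id(7));
        printf("second group id: %u\n", alloc_global_id(7));
        return 0;
}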
The fix for bug 206193 introduced this even worse bug. We need to update our global_last_id locally when we create a new one.

% cvs commit sm_message.c
Checking in sm_message.c;
/cvs/cluster/cluster/cman-kernel/src/Attic/sm_message.c,v  <--  sm_message.c
new revision: 1.4.2.3; previous revision: 1.4.2.2
done

% cvs commit sm_message.c
Checking in sm_message.c;
/cvs/cluster/cluster/cman-kernel/src/Attic/sm_message.c,v  <--  sm_message.c
new revision: 1.4.8.3; previous revision: 1.4.8.2
done
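In terms of the sketch above, the kind of change described here (again with assumed names, not the actual committed sm_message.c diff) is simply to record the id the node just generated, so the next group created locally gets a fresh, larger id:

static uint32_t alloc_global_id(uint32_t highest_remote_id)
{
        uint32_t id = (highest_remote_id > global_last_id ?
                       highest_remote_id : global_last_id) + 1;

        /* Fix: remember the id we just handed out locally, so a second
         * group created on this node before any remote update arrives
         * does not reuse the same global id. */
        global_last_id = id;
        return id;
}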
*** Bug 207690 has been marked as a duplicate of this bug. ***
Fixed in current release (4.7).