508128 – long scheduling delays of corosync process cause totem to meltdown

Keywords:
Status:	CLOSED NEXTRELEASE
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	openais
Sub Component:
Version:	5.4
Hardware:	All
OS:	Linux
Priority:	low
Severity:	low
Target Milestone:	rc
Target Release:	---
Assignee:	Steven Dake
QA Contact:	Cluster QE
Docs Contact:
URL:
Whiteboard:
Depends On:	508124
Blocks:

Reported:	2009-06-25 18:00 UTC by Steven Dake
Modified:	2016-04-26 14:00 UTC
CC List:	5 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:	508124
Environment:
Last Closed:	2009-12-15 21:11:52 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments
Proposed patch, sent to ml (10.89 KB, patch) 2009-07-15 08:00 UTC, Jan Friesse

Description Steven Dake 2009-06-25 18:00:44 UTC

+++ This bug was initially created as a clone of Bug #508124 +++

Description of problem:
When a process pauses for longer than the token timeout, the other
processors in the system form a new ring.  The paused processor
eventually gets rescheduled and processes the pending membership
multicast messages in its kernel queues.  This wreaks havoc on the
membership of the other nodes.

While a proper kernel shouldn't pause for long periods, it's a reality
that many kernels still have long periods of spinlocking without
scheduling and no proper preemption.

This patch resolves the scenario by creating a timer which records a
timestamp at an interval of token timeout / 5.  Then, when a process
executes the membership algorithm on receiving a join message, the
current time is retrieved and compared to the timestamp.  If they
differ by more than token timeout / 2, the process is assumed to have
been unable to schedule (because it couldn't trigger the timer
callbacks via poll), and it calls totemnet to flush any pending
multicasts in the file descriptor responsible for receiving multicast
messages.  As a result, the old membership messages are thrown away,
allowing the new membership to form properly.
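
The check described above can be modelled roughly as follows.  This is
a minimal Python sketch for illustration only, not the patch itself
(the real change is in the C totem code).  The class name, method
names, and the flush_pending callback are all hypothetical; only the
token timeout / 5 interval and the token timeout / 2 threshold come
from the description.

```python
import time


class PauseDetector:
    """Hypothetical model of the pause check; not corosync code."""

    def __init__(self, token_timeout_ms, clock=time.monotonic):
        self.clock = clock
        # The timer fires at an interval of token timeout / 5.
        self.timer_interval_ms = token_timeout_ms / 5
        # A gap larger than token timeout / 2 counts as a pause.
        self.threshold_ms = token_timeout_ms / 2
        self.last_stamp = clock()

    def on_timer(self):
        # Called by the event loop every timer_interval_ms.
        # If the process is not scheduled, this stops being called
        # and last_stamp goes stale.
        self.last_stamp = self.clock()

    def on_join_message(self, flush_pending):
        # Called when a join message starts the membership algorithm.
        elapsed_ms = (self.clock() - self.last_stamp) * 1000.0
        if elapsed_ms > self.threshold_ms:
            # Assume the process was paused: discard the stale
            # multicasts still queued on the receive descriptor.
            flush_pending()
            return True
        return False
```

With an assumed token timeout of 1000 ms, the timer would fire every
200 ms and a join message arriving more than 500 ms after the last
timestamp would trigger the flush.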

This can be tested by suspending a corosync process with ctrl-z in an
8-node cluster, then using fg to bring it back into the foreground.
Pre-patch, the membership breaks down; post-patch, corosync prints a
notice and proceeds properly.
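
For a process that is not attached to a terminal, the same suspend and
resume can be scripted with signals.  The sketch below is an
assumption about how one might do that, not part of the original test:
the function name is hypothetical, and it uses SIGSTOP (which cannot
be caught) rather than the SIGTSTP that ctrl-z sends, on the
assumption that the effect on scheduling is the same.  SIGCONT is what
fg sends to resume the job.

```python
import os
import signal
import time


def suspend_and_resume(pid, pause_seconds):
    """Stop process `pid`, hold it for pause_seconds, then resume it."""
    os.kill(pid, signal.SIGSTOP)      # suspend, like ctrl-z
    try:
        # Hold the process stopped; for this test the pause should be
        # longer than the token timeout so the other nodes form a
        # new ring in the meantime.
        time.sleep(pause_seconds)
    finally:
        os.kill(pid, signal.SIGCONT)  # resume, like fg
```

The pid of the corosync process on the node under test has to be
supplied by the caller, and the script needs permission to signal it.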

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Set up an 8-node cluster
2. ctrl-z corosync on 1 node
3. Wait until the other nodes form a new ring
4. fg the ctrl-z'd node
  
Actual results:
totem membership explodes

Expected results:
new ring formed properly

Additional info:

--- Additional comment from sdake on 2009-06-25 13:52:09 EDT ---

patch posted to ml.

Comment 1 Steven Dake 2009-06-25 18:01:33 UTC

this problem exists with openais as well.

Comment 4 Jan Friesse 2009-07-15 08:00:28 UTC

Created attachment 353797
Proposed patch, sent to ml

This is a backport of trunk 2304.
