316031 – dm-mirror: incorrect order of mirror presuspend ops causes cluster mirror hang

Summary: dm-mirror: incorrect order of mirror presuspend ops causes cluster mirror hang

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 4
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	4.5
Hardware:	All
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Jonathan Earl Brassow
QA Contact:	Martin Jenner
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	358871

Reported:	2007-10-02 20:24 UTC by Jonathan Earl Brassow
Modified:	2007-11-17 01:14 UTC
CC List:	4 users
Fixed In Version:	RHBA-2007-0791
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2007-11-15 16:33:08 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2007:0791	0	normal	SHIPPED_LIVE	Updated kernel packages available for Red Hat Enterprise Linux 4 Update 6	2007-11-14 18:25:55 UTC

Description Jonathan Earl Brassow 2007-10-02 20:24:16 UTC

Cluster mirrors do not (and cannot) allow writes to mirror regions that have
not yet been recovered.

When a mirror suspends, it:
1) stops recovery
2) flushes the work queue (finishing outstanding writes)
3) calls log presuspend

The log has no way of knowing that the mirror is suspending until step 3, so it
blocks any outstanding writes to regions that are not in sync.  Step 2 waits for
those writes to finish, but recovery has already been stopped, so the
not-in-sync regions will never be cleared and the writes are delayed
indefinitely.  The suspend never reaches step 3, and the mirror hangs.

The solution is to switch the order of operations to:
1) stop recovery
2) call log presuspend
3) flush work queue

With this order, the log knows that recovery has been stopped and that no
recovery I/O can collide with writes.  Therefore, it can allow writes to regions
that have not yet been recovered.

Comment 3 Jason Baron 2007-10-03 22:35:12 UTC

Committed in stream U6 build 61. A test kernel with this patch is available from
http://people.redhat.com/~jbaron/rhel4/

Comment 6 errata-xmlrpc 2007-11-15 16:33:08 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0791.html
