Bug 161808 - nodes don't wait for first mounter to finish recovery
Status: CLOSED ERRATA
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: gfs
Version: 4
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Assigned To: David Teigland
QA Contact: GFS Bugs
Depends On:
Blocks:
Reported: 2005-06-27 11:15 EDT by David Teigland
Modified: 2010-01-11 22:05 EST

See Also:
Fixed In Version: RHBA-2005-740
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2005-10-07 12:56:51 EDT
Type: ---
Regression: ---


Attachments: None
Description David Teigland 2005-06-27 11:15:58 EDT
Description of problem:
When the first node mounts a gfs filesystem, other nodes must not be
allowed to mount until that first node has recovered all journals and
called others_may_mount().  It is lock_dlm's job to enforce this,
but it doesn't: it currently allows other nodes to mount while the
first node may still be recovering journals.

This can potentially lead to fs corruption.
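
As an illustration of the expected gating, here is a minimal user-space
sketch (plain pthreads, not the real GFS/lock_dlm code): the first mounter
replays every journal and only then lets the other mounters past their wait.
The names used here (JOURNALS, recovery_done, others_may_mount_flag) are made
up for the sketch; the only piece taken from the bug is the
others_may_mount() hand-off it describes.

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

#define JOURNALS 3

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  recovery_done = PTHREAD_COND_INITIALIZER;
static int others_may_mount_flag = 0;   /* set when first-mount recovery ends */

static void *first_mounter(void *arg)
{
    int j;

    for (j = 0; j < JOURNALS; j++) {
        printf("first mounter: replaying journal %d\n", j);
        usleep(100000);                 /* stand-in for journal replay */
    }
    pthread_mutex_lock(&lock);
    others_may_mount_flag = 1;          /* the others_may_mount() hand-off */
    pthread_cond_broadcast(&recovery_done);
    pthread_mutex_unlock(&lock);
    return NULL;
}

static void *other_mounter(void *arg)
{
    long id = (long)arg;

    pthread_mutex_lock(&lock);
    while (!others_may_mount_flag)      /* the wait this bug says is missing */
        pthread_cond_wait(&recovery_done, &lock);
    pthread_mutex_unlock(&lock);
    printf("node %ld: mount allowed, all journals replayed\n", id);
    return NULL;
}

int main(void)
{
    pthread_t first, others[2];
    long i;

    pthread_create(&first, NULL, first_mounter, NULL);
    for (i = 0; i < 2; i++)
        pthread_create(&others[i], NULL, other_mounter, (void *)(i + 1));

    pthread_join(first, NULL);
    for (i = 0; i < 2; i++)
        pthread_join(others[i], NULL);
    return 0;
}

Removing the while-loop wait in other_mounter() reproduces the window this
bug describes: a node proceeds while some journals may still be dirty.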

Version-Release number of selected component (if applicable):


How reproducible:
often

Steps to Reproduce:
1. all mounted nodes fail at once and are reset
2. all nodes come back and mount at once
3. the first mounter doesn't complete recovery of all journals before the others also mount
  
Actual results:


Expected results:


Additional info:
Comment 1 David Teigland 2005-06-27 11:19:23 EDT
Email from Ken:

there is a chance of corruption.  The scenario I see is:

1) 3 machines are mounted.
2) They all fail at the same time.
3) Machine A comes back up and starts replay on the three journals
   serially.
4) Machine B comes back up, replays its own journal really quickly
   while Machine A is still working on the first journal.
5) Machine B starts a workload and comes across blocks that are inconsistent
   because the third journal hasn't been replayed yet.  Because all the
   machines died, there are no expired locks to protect the data.

In order to hit the failure case, you always need at least three nodes
to have been mounted at one time or another.  But not all three nodes
need to be running at the time of the power failure.  (The key is that
there must be a dirty journal beyond the first two to be mounted.)
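
Modeled as code, the race described above is essentially the sketch in the
description with the wait removed.  A toy version follows; the names are made
up, the shared array is deliberately unsynchronized, and none of this is real
GFS code.

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

#define JOURNALS 3

/* Which journals have been replayed.  Deliberately no locking around it:
 * the missing coordination is the point of the model. */
static volatile int replayed[JOURNALS];

static void *machine_a(void *arg)
{
    int j;

    for (j = 0; j < JOURNALS; j++) {
        usleep(200000);              /* slow, serial replay of each journal */
        replayed[j] = 1;
        printf("A: journal %d replayed\n", j);
    }
    return NULL;
}

static void *machine_b(void *arg)
{
    replayed[1] = 1;                 /* B replays only its own journal, fast */
    printf("B: my journal is clean, starting workload\n");
    if (!replayed[2])                /* the third journal is still dirty */
        printf("B: reading blocks that journal 2 has not repaired yet\n");
    return NULL;
}

int main(void)
{
    pthread_t a, b;

    pthread_create(&a, NULL, machine_a, NULL);
    pthread_create(&b, NULL, machine_b, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return 0;
}

With A replaying the journals serially, B's check of journal 2 almost always
happens before A reaches it, which is the "dirty journal beyond the first two"
condition noted above.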
Comment 2 David Teigland 2005-06-29 03:32:34 EDT
Fixed on RHEL4 and STABLE branches.

The likelihood of this bug causing a problem or corruption is
even smaller than originally thought.  Even if the lock module
doesn't prevent other mounts until the first mounter's recovery
is done, there's a gfs lock that the other mounters block on,
which has nearly the same effect.
Comment 4 Red Hat Bugzilla 2005-10-07 12:56:51 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2005-740.html
