Bug 515348 - RFE - allow ability to mount with 'o debug' yet not panic if corruption is detected
Summary: RFE - allow ability to mount with 'o debug' yet not panic if corruption is detected
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: doc-Global_File_System_Guide
Version: 5.4
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: rc
Target Release: 5.5
Assignee: Steven J. Levine
QA Contact: Joshua Wulf
URL:
Whiteboard:
Depends On: 488499
Blocks:
Reported: 2009-08-03 19:47 UTC by Subhendu Ghosh
Modified: 2016-10-04 04:28 UTC
CC List: 11 users

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Clone Of: 488499
Clones: 572651 572721
Environment:
Last Closed: 2010-06-07 23:34:27 UTC
Target Upstream Version:
Embargoed:



Description Subhendu Ghosh 2009-08-03 19:47:04 UTC
+++ This bug was initially created as a clone of Bug #488499 +++

Description of problem:
This ability would allow QA to continue running revolver with the debug mount option (in order to acquire better debugging in case of real failures) while at the same time causing I/O errors on the filesystem as one of the failure injection methods.

Version-Release number of selected component (if applicable):
gfs-utils-0.1.18-1.el5
kmod-gfs-0.1.31-3.el5

--- Additional comment from rpeterso on 2009-03-04 13:12:27 EDT ---

What "better debugging" are you looking for?  Today, the "-o debug"
mount option for GFS doesn't provide additional GFS messages.  It only
causes GFS to call BUG() and dump the call stack when it does a
withdraw.  Are you saying that you don't want the subsequent kernel
panic()?

--- Additional comment from cmarthal on 2009-03-04 14:23:14 EDT ---

We definitely need the call stack for when legitimate corruption is detected. So if that's the only way to grab it now, then it would be nice if it didn't panic afterwards. I wonder if anyone besides QA mounts with the debug flag and what their thoughts would be about just removing the panic altogether.

--- Additional comment from rpeterso on 2009-05-05 15:17:56 EDT ---

I need to dig into this a little deeper.  I know that RHEL4 handles
withdrawing on errors differently from RHEL5.  RHEL4 did a BUG_ON
but RHEL5 is a little more graceful.

Adding Dave T and Steve W to the cc list to get a historical
perspective on the topic.  I wasn't part of the design process so
I don't know the impact of changing this.  This seems more like a
design issue to me, although I can see its value in debugging.
Is this something that we should do for RHEL6 rather than RHEL5.x?

--- Additional comment from swhiteho on 2009-05-06 05:24:16 EDT ---


The error handling in gfs is pretty poor really. I don't really expect that it will make a lot of difference what is done. As Bob says, the debug option doesn't change any of the messages which are printed, and if the fs withdraws then it usually means something pretty serious has gone wrong and there is not anything which can be done to recover from it.

We hope to improve on this in gfs2 by adding an errors= mount option along the lines of ext2/3/4 and being able to handle more of the possible fs errors by returning I/O errors to the user rather than panicking.

I'm not at all sure that there is anything that we can reasonably change in gfs though at this stage.

--- Additional comment from teigland on 2009-05-06 11:19:07 EDT ---

There are two issues here:
1. mount option to enable extra messages (unfortunately there are very few)
2. mount option to enable panic instead of withdraw (on i/o errors)

Both make sense, and we need both (I've seen customer requests for each).
The "problem" is quite trivial: they don't have distinct mount options.  We should let "debug" mean one of them (probably 1), and add a new option for the other.

--- Additional comment from rpeterso on 2009-07-30 12:07:19 EDT ---

*** Bug 509233 has been marked as a duplicate of this bug. ***

--- Additional comment from rpeterso on 2009-07-30 12:11:17 EDT ---

There are really two general classes of GFS errors: (1) withdraw
problems due to file system inconsistency, and (2) run-time errors
(for example, memory corruption) that cause an assertion error.

So really there should be two mount options: ar_debug, which means
to BUG on withdraw, and -o panic_on_assert (versus BUG() on assert).

--- Additional comment from rpeterso on 2009-07-31 17:48:20 EDT ---

Created an attachment (id=355866)
First crack at a patch

This is a first stab and is in no way complete.  At the very least
I need to change the man pages.  Just thought I'd post what I have
so far.

This implements "errors=panic|continue|remount-ro" similar to ext3.
If none of the three are specified, the default behavior remains
unchanged from how it is today.  That can be restored with
errors=default.
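
The option syntax described in this comment could be parsed roughly as follows. This is a hypothetical userspace sketch, not the prototype patch: struct mount_opts, enum errors_mode, parse_errors_value(), and parse_mount_opts() are illustrative names, unknown options are simply ignored, and input longer than the local buffer is truncated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model of the policies named in the prototype patch. */
enum errors_mode {
    ERRORS_DEFAULT,    /* no errors= given (or errors=default): today's behavior */
    ERRORS_PANIC,
    ERRORS_CONTINUE,
    ERRORS_REMOUNT_RO,
    ERRORS_INVALID
};

struct mount_opts {
    bool debug;
    enum errors_mode errors;
};

/* Map the value after "errors=" to a mode. */
static enum errors_mode parse_errors_value(const char *val)
{
    if (strcmp(val, "panic") == 0)      return ERRORS_PANIC;
    if (strcmp(val, "continue") == 0)   return ERRORS_CONTINUE;
    if (strcmp(val, "remount-ro") == 0) return ERRORS_REMOUNT_RO;
    if (strcmp(val, "default") == 0)    return ERRORS_DEFAULT;
    return ERRORS_INVALID;
}

/* Parse a comma-separated string such as "debug,errors=continue".
 * Returns false on an unrecognized errors= value; the last errors= wins. */
static bool parse_mount_opts(const char *data, struct mount_opts *opts)
{
    char buf[256];

    assert(data != NULL && opts != NULL);
    opts->debug = false;
    opts->errors = ERRORS_DEFAULT;

    /* strtok() modifies its input, so work on a copy (truncated if too long). */
    snprintf(buf, sizeof(buf), "%s", data);
    for (char *tok = strtok(buf, ","); tok != NULL; tok = strtok(NULL, ",")) {
        if (strcmp(tok, "debug") == 0) {
            opts->debug = true;
        } else if (strncmp(tok, "errors=", 7) == 0) {
            enum errors_mode m = parse_errors_value(tok + 7);
            if (m == ERRORS_INVALID)
                return false;
            opts->errors = m;
        }
    }
    return true;
}
```

In this sketch "debug" and "errors=" are parsed independently, so the debug flag no longer implies any particular error policy, and a trailing errors=default resets the policy even if an earlier errors= value was given.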

--- Additional comment from rpeterso on 2009-08-03 13:25:20 EDT ---

Created an attachment (id=356067)
Second prototype patch

This patch adds the missing man page elements and fixes a bug I
spotted.

Comment 1 Subhendu Ghosh 2009-08-03 19:48:25 UTC
Need to update docs with new mount options and debug impact

Comment 3 Steven J. Levine 2009-08-11 19:51:27 UTC
Just noting that I've seen this bug and had an IRC exchange with Bob, who pointed me to the Target Release field, which indicates that this fix is targeted for RHEL 5.5 (at the earliest).

So I'll hold off on docs updates until that release.

Comment 6 Steven J. Levine 2010-03-17 19:56:21 UTC
Some of the features described here are in RHEL 5.5 (-o errors) and others may make it into a later release. So this bug is for RHEL 5.5 -- I have documented the new option in the GFS and GFS2 manuals -- and I have cloned two bugs for the 5.6 features:

572651: GFS for 5.6

572721: GFS2 for 5.6

The documentation of the new mount option will be available in the 5.5 versions of the GFS and GFS2 documents.

Comment 7 John Skeoch 2010-06-07 23:34:27 UTC
Verified documentation of -o errors in:

http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5/html-single/Global_File_System/

Red_Hat_Enterprise_Linux-Global_File_System-5-web-en-US-4-14.el5

http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5/html-single/Global_File_System_2/

Red_Hat_Enterprise_Linux-Global_File_System_2-5-web-en-US-7-16.el5

