517145 – [RFE] GFS: New mount option: -o errors=withdraw|panic

Summary: [RFE] GFS: New mount option: -o errors=withdraw|panic

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	gfs-kmod
Sub Component:
Version:	5.4
Hardware:	All
OS:	Linux
Priority:	urgent
Severity:	urgent
Target Milestone:	rc
Target Release:	5.5
Assignee:	Robert Peterson
QA Contact:	Cluster QE
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	488499 5.5TechNotes-Updates 557292 572651

Reported:	2009-08-12 18:22 UTC by Robert Peterson
Modified:	2018-10-20 00:53 UTC
CC List:	15 users
Fixed In Version:	gfs-kmod-0.1.34-5.el5
Doc Type:	Enhancement
Doc Text:	A new "-o errors=" mount option has been added for gfs file systems, similar to some other file systems such as ext3. The option controls how gfs behaves in the unlikely event that file system errors occur. The normal behaviour (same as -o errors=withdraw) is to withdraw from the file system and make it inaccessible until the next reboot, and in some cases the system may remain running. If -o errors=panic is specified file system errors will cause a kernel panic to occur.
Clone Of:	488499
Clones:	518106 (view as bug list)
Environment:
Last Closed:	2010-03-30 08:55:58 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments
Proposed patch (8.67 KB, patch) 2009-08-13 19:23 UTC, Robert Peterson

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2010:0291	0	normal	SHIPPED_LIVE	Moderate: gfs-kmod security, bug fix and enhancement update	2010-03-29 14:12:22 UTC

Comment 1 David Teigland 2009-08-12 19:10:22 UTC

I decided to go and look at the code to see what's actually changed.... and I
can't find anything that's changed.

In various error conditions, gfs currently calls the following:

gfs_assert() -> if ar_oopses_ok then BUG() -> panic(); else panic()
gfs_assert_warn() -> if ar_debug then BUG() -> panic()
gfs_assert_withdraw() -> gfs_lm_withdraw -> if ar_debug then BUG() -> panic()
gfs_io_error() -> gfs_lm_withdraw -> if ar_debug then BUG() -> panic()
gfs_metatype_check() -> gfs_lm_withdraw -> if ar_debug then BUG() -> panic()
gfs_consist*() -> gfs_lm_withdraw -> if ar_debug then BUG() -> panic()

mounting with -o debug sets ar_debug = 1 and ar_oopses_ok = 1.

So, shouldn't mount -o debug still panic the machine as it always has?
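
The call chains listed in this comment can be condensed into a small userspace model. This is only a sketch for reasoning about the question above, not code from the gfs module: the enum names, `struct mount_args`, and `model_outcome()` are hypothetical, and the non-debug branches (warn only, plain withdraw) are assumed from the function names rather than stated in the comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of the call chains above; not gfs source. */
enum gfs_error_kind {
    ERR_ASSERT,          /* gfs_assert() */
    ERR_ASSERT_WARN,     /* gfs_assert_warn() */
    ERR_WITHDRAW_CLASS   /* gfs_assert_withdraw(), gfs_io_error(),
                            gfs_metatype_check(), gfs_consist*() */
};

enum gfs_outcome {
    OUT_WARN_ONLY,       /* assumed non-debug result of gfs_assert_warn() */
    OUT_WITHDRAW,        /* assumed non-debug result of gfs_lm_withdraw() */
    OUT_BUG_THEN_PANIC,  /* BUG() -> panic() */
    OUT_PANIC            /* direct panic() */
};

struct mount_args {
    bool ar_debug;
    bool ar_oopses_ok;
};

/* Per the comment, -o debug sets both flags to 1. */
struct mount_args args_for_debug(bool debug)
{
    struct mount_args ar = { debug, debug };
    return ar;
}

enum gfs_outcome model_outcome(enum gfs_error_kind kind,
                               const struct mount_args *ar)
{
    assert(ar != NULL);
    switch (kind) {
    case ERR_ASSERT:
        return ar->ar_oopses_ok ? OUT_BUG_THEN_PANIC : OUT_PANIC;
    case ERR_ASSERT_WARN:
        return ar->ar_debug ? OUT_BUG_THEN_PANIC : OUT_WARN_ONLY;
    case ERR_WITHDRAW_CLASS:
        return ar->ar_debug ? OUT_BUG_THEN_PANIC : OUT_WITHDRAW;
    }
    return OUT_PANIC; /* not reached for valid kinds */
}
```

In this model every error kind ends in a panic path once `-o debug` is set, which is the behavior the question above refers to.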

Comment 3 Steve Whitehouse 2009-08-12 19:56:40 UTC

There is a userland only way to ensure that a node reboots when a withdraw occurs:

Create a file /etc/udev/rules.d/51-gfs.rules with the single line:

KERNEL=="lock_module", ACTION=="offline", PROGRAM+="/sbin/reboot -fin"

Then whenever a withdraw happens, reboot will be called. Since this requires no kernel changes, it can be used right away if it is acceptable. Let us know if that doesn't fix the problem.

Bob also has a back up plan if the above doesn't solve the problem.

Comment 4 Paul Batkowski 2009-08-12 20:38:05 UTC

How about making this a bit more reliable by using PROGRAM+="echo c > /proc/sysrq-trigger" instead of calling reboot? This would be closer to Veritas cluster behavior, where a node is panicked if shared storage access is lost.

Comment 5 Toure Dunnon 2009-08-12 20:45:04 UTC

So instead of "c" you could use "b", which will reboot the system and not dump a core every time...

Comment 6 David Teigland 2009-08-12 20:49:27 UTC

This is getting ridiculous.  To make gfs panic you turn off withdraw, you don't reconfigure withdraw to do precisely what withdraw is meant to avoid!

Comment 13 Robert Peterson 2009-08-13 19:23:23 UTC

Created attachment 357362
Proposed patch

This is my proposed patch for RHEL5.  I have tested the withdraw
code path, the panic code path, and all the mount option
combinations I could think of and they all work as expected.

Comment 14 Robert Peterson 2009-08-13 19:41:39 UTC

This bug will be used to add the mount options "-o errors=panic"
and "-o errors=withdraw" for mounting GFS file systems.

If "-o errors" is not specified or if "errors=withdraw" is
specified, GFS file system errors will be treated as they are
today.  That is, if the error is not severe, the file system will
simply "withdraw" and the file system will become unusable until
the system is rebooted.  What happens next depends on the value of
the "kernel.panic_on_oops" sysctl value.  The default behavior for
the kernel is to panic when the oops occurs.  If the users have
overridden this with "sysctl -w kernel.panic_on_oops=0" the system
will keep running after the file system withdraw occurs.

If "-o errors=panic" is specified, even non-severe file system errors
will cause the system to force a kernel panic.  This may be
useful (when combined with other options) for automatically
rebooting a system with errors when a fencing device is used that
does not force a reboot (fabric-level fencing, such as fence_scsi).
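
The option semantics described in this comment can be sketched as a small userspace parser. This is not the attached patch: `enum errors_policy` and `parse_errors_option()` are hypothetical names, and two behaviors are assumptions made for the sketch, namely that the last `errors=` token wins and that an unrecognized value is rejected.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical names; a sketch of the described semantics only. */
enum errors_policy { ERRORS_WITHDRAW, ERRORS_PANIC };

/*
 * Scan a comma-separated mount option string for "errors=".
 * The policy defaults to withdraw when the option is absent.
 * Returns 0 on success, -1 on an unrecognized errors= value (assumption).
 */
int parse_errors_option(const char *opts, enum errors_policy *policy)
{
    const char *p = opts;

    assert(policy != NULL);
    *policy = ERRORS_WITHDRAW;
    if (p == NULL)
        return 0;

    while (*p != '\0') {
        size_t len = strcspn(p, ",");   /* length of the current token */

        if (len >= 7 && strncmp(p, "errors=", 7) == 0) {
            const char *val = p + 7;
            size_t vlen = len - 7;

            if (vlen == 8 && strncmp(val, "withdraw", 8) == 0)
                *policy = ERRORS_WITHDRAW;
            else if (vlen == 5 && strncmp(val, "panic", 5) == 0)
                *policy = ERRORS_PANIC;
            else
                return -1;
        }
        p += len;
        if (*p == ',')
            p++;                        /* skip the separator */
    }
    return 0;
}
```

With this sketch, an option string without `errors=` and one with `errors=withdraw` both yield the withdraw policy, matching the statement above that the two cases are treated the same.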

Comment 19 Robert Peterson 2009-08-19 15:49:23 UTC

Release note added. If any revisions are required, please set the 
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

New Contents:
A new "-o errors=" mount option has been added for gfs file systems, similar to some other file systems such as ext3.  The option controls how gfs behaves in the unlikely event that file system errors occur. The normal behaviour (same as -o errors=withdraw) is to withdraw from the file system and make it inaccessible until the next reboot, and in some cases the system may remain running.  If -o errors=panic is specified file system errors will cause a kernel panic to occur.

Comment 23 Robert Peterson 2009-10-09 19:27:36 UTC

This patch is now pushed to the master branch of the gfs1-utils
git tree and the RHEL55 and STABLE3 branches of the cluster git
tree for inclusion into 5.5.  Changing status to POST.

Comment 24 Robert Peterson 2009-10-09 21:00:40 UTC

Build 2024343 successful.  Fix is in gfs-kmod-0.1.34-5.el5.
Changing status to Modified.

Comment 27 Chris Ward 2010-02-11 10:10:57 UTC

~~ Attention Customers and Partners - RHEL 5.5 Beta is now available on RHN ~~

RHEL 5.5 Beta has been released! There should be a fix present in this 
release that addresses your request. Please test and report back results 
here, by March 3rd 2010 (2010-03-03) or sooner.

Upon successful verification of this request, post your results and update 
the Verified field in Bugzilla with the appropriate value.

If you encounter any issues while testing, please describe them and set 
this bug into NEED_INFO. If you encounter new defects or have additional 
patch(es) to request for inclusion, please clone this bug per each request
and escalate through your support representative.

Comment 28 Jaroslav Kortus 2010-03-03 19:45:34 UTC

With errors=panic the node panics as expected.
With errors=withdraw (default) the following happens:

GFS: fsid=Z_Cluster:vedder0.0: withdrawing from cluster at user's request
GFS: fsid=Z_Cluster:vedder0.0: about to withdraw from the cluster
GFS: fsid=Z_Cluster:vedder0.0: telling LM to withdraw
VFS:Filesystem freeze failed

and gfs_tool withdraw hangs:
open("/proc/fs/gfs", O_RDWR)            = 3
write(3, "list", 4)                     = 4
read(3, "18446675904265158656 253:2 Z_Clu"..., 1048575) = 47
close(3)                                = 0
open("/proc/mounts", O_RDONLY)          = 3
fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b1efa792000
read(3, "rootfs / rootfs rw 0 0\n/dev/root"..., 4096) = 742
stat("/dev/mapper/vedder-vedder0", {st_mode=S_IFBLK|0660, st_rdev=makedev(253, 2), ...}) = 0
close(3)                                = 0
munmap(0x2b1efa792000, 4096)            = 0
open("/proc/fs/gfs", O_RDWR)            = 3
write(3, "withdraw 18446675904265158656\n", 30) = 30
read(3,

Comment 29 Robert Peterson 2010-03-04 14:40:32 UTC

Hi Jaroslav,

I suspect that once gfs has withdrawn from a file system, it closes
the connection to the lock manager and all, and refuses to do
anything to that file system, including responding to the read()
shown in your strace.

I believe what you discovered is not a problem with the kernel
patch.  The patch adds errors=panic and withdraw options to the
GFS kernel module.  The panic function is new, and you said it
works.  The "withdraw" option should revert to previous behavior,
which should not have changed.  Can you try the same test on a
GFS file system without the patch and see if your test behaves/hangs
the same way?  If the test behaves the same way as it does now,
there is no regression.  (It might be a bug we should solve, but not
a regression).  If the test behaves differently and it used to do
your command without hanging on the read, then it could be a
regression.

Bob Peterson

Comment 30 Robert Peterson 2010-03-04 15:55:51 UTC

I spoke with Steve Whitehouse about this issue this morning.
It turns out that the hang in comment #28 is caused not by the
patch for this bug, but by the gfs patch to allow gfs to use
the standard interface for freeze/thaw.  Apparently, when a
withdraw occurs, the kernel generates a uevent which causes
gfs_controld to use dm to isolate the storage.  The gfs_controld
program does so by calling dmsetup with the suspend parameter.
That causes the file system to be frozen, so it can't respond
to the withdraw.

Based on an idea from Steve Whitehouse, I patched gfs_controld
so that when it calls dmsetup, it uses the --nolockfs and --noflush
parameters.  With this patch in place, the hang did not occur.
The patch looks something like this:

+++ b/group/gfs_controld/recover.c
@@ -2711,7 +2711,8 @@ static int run_dmsetup_suspend(struct mountgroup *mg, char *dev)
-               execlp("dmsetup", "dmsetup", "suspend", buf, NULL);
+               execlp("dmsetup", "dmsetup", "suspend",  "--nolockfs",
+                      "--noflush", buf, NULL);

I've given a patched version of gfs_controld to jkortus to try
and hope to get feedback today.  If it indeed fixes the problem
that means (1) this bug's gfs patch did not cause a regression
and can therefore be placed back into ON_QA or maybe even VERIFIED.
(2) We need to open a new blocker bug record and get it into 5.5
for the patch to gfs_controld.
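
For readers who want to see the changed call in context, here is a minimal standalone sketch of spawning `dmsetup suspend` with the two extra parameters. It is not the gfs_controld source: `build_suspend_argv()` and `suspend_device()` are hypothetical helpers, and the fork/wait handling is an assumption about how such a call is typically wrapped.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define DMSETUP_ARGC 5

/* Hypothetical helper: fill argv for
 * "dmsetup suspend --nolockfs --noflush <dev>". */
void build_suspend_argv(const char *dev, const char *argv[DMSETUP_ARGC + 1])
{
    assert(dev != NULL);
    argv[0] = "dmsetup";
    argv[1] = "suspend";
    argv[2] = "--nolockfs";  /* do not freeze the file system on suspend */
    argv[3] = "--noflush";   /* do not flush outstanding I/O first */
    argv[4] = dev;
    argv[5] = NULL;
}

/* Hypothetical helper: run dmsetup in a child process.
 * Returns 0 if dmsetup exited with status 0, -1 otherwise. */
int suspend_device(const char *dev)
{
    const char *argv[DMSETUP_ARGC + 1];
    pid_t pid;
    int status;

    build_suspend_argv(dev, argv);
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], (char *const *)argv);
        _exit(127);          /* only reached if exec failed */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Calling `suspend_device()` requires dmsetup to be installed and sufficient privileges, so only the argument construction is easy to check in isolation.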

Comment 31 Robert Peterson 2010-03-04 16:34:55 UTC

I have opened bug #570530 to address the gfs_controld issue at
the direction of Subhendu.  This bug should be changed to
indicate it did not cause a regression.  I'll let jkortus fix
the status as appropriate.

Comment 32 Jaroslav Kortus 2010-03-04 19:31:10 UTC

In RHEL5.4+RHN updates this works as expected, so it's not a regression caused by this bug. Please see bug 487610 for additional, possibly related information.


Mar  4 12:35:19 a2 kernel: GFS: fsid=a_cluster:vedder0.1: withdrawing from cluster at user's request
Mar  4 12:35:19 a2 kernel: GFS: fsid=a_cluster:vedder0.1: about to withdraw from the cluster
Mar  4 12:35:19 a2 kernel: GFS: fsid=a_cluster:vedder0.1: telling LM to withdraw
Mar  4 12:35:20 a2 kernel: GFS: fsid=a_cluster:vedder0.1: withdrawn
Mar  4 12:35:20 a2 kernel:
Mar  4 12:35:20 a2 kernel: Call Trace:
Mar  4 12:35:20 a2 kernel:  [<a000000100013b40>] show_stack+0x40/0xa0
Mar  4 12:35:20 a2 kernel:                                 sp=e00000010e5a7bd0 bsp=e00000010e5a1298
Mar  4 12:35:20 a2 kernel:  [<a000000100013bd0>] dump_stack+0x30/0x60
Mar  4 12:35:20 a2 kernel:                                 sp=e00000010e5a7da0 bsp=e00000010e5a1280
Mar  4 12:35:20 a2 kernel:  [<a00000020331df40>] gfs_lm_withdraw+0x1e0/0x220 [gfs]
Mar  4 12:35:20 a2 kernel:                                 sp=e00000010e5a7da0 bsp=e00000010e5a1218
Mar  4 12:35:20 a2 kernel:  [<a000000203348600>] gfs_proc_read+0xaa0/0xd60 [gfs]
Mar  4 12:35:20 a2 kernel:                                 sp=e00000010e5a7de0 bsp=e00000010e5a11b8
Mar  4 12:35:20 a2 kernel:  [<a000000100177300>] vfs_read+0x200/0x3a0
Mar  4 12:35:20 a2 kernel:                                 sp=e00000010e5a7e20 bsp=e00000010e5a1168
Mar  4 12:35:20 a2 kernel:  [<a0000001001779d0>] sys_read+0x70/0xe0
Mar  4 12:35:20 a2 kernel:                                 sp=e00000010e5a7e20 bsp=e00000010e5a10f0
Mar  4 12:35:20 a2 kernel:  [<a00000010000bd70>] __ia64_trace_syscall+0xd0/0x110
Mar  4 12:35:20 a2 kernel:                                 sp=e00000010e5a7e30 bsp=e00000010e5a10f0
Mar  4 12:35:20 a2 kernel:  [<a000000000010620>] __start_ivt_text+0xffffffff00010620/0x400
Mar  4 12:35:20 a2 kernel:                                 sp=e00000010e5a8000 bsp=e00000010e5a10f0

Comment 33 Jaroslav Kortus 2010-03-04 19:33:03 UTC

As it works as expected, I'm marking this as verified.

Comment 35 errata-xmlrpc 2010-03-30 08:55:58 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0291.html
