192082 – System won't suspend with GFS2 file system mounted

Summary: System won't suspend with GFS2 file system mounted

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	5.1
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Don Zickus
QA Contact:	GFS Bugs
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	204760 235665

Reported:	2006-05-17 14:14 UTC by Rob Kenna
Modified:	2007-11-30 22:07 UTC
CC List:	5 users
Fixed In Version:	RHBA-2007-0959
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2007-11-07 19:12:36 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments
Fixes suspend to RAM/DISK issue w.r.t GFS2 (1.48 KB, patch) 2007-06-26 21:33 UTC, Abhijith Das

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2007:0959	0	normal	SHIPPED_LIVE	Updated kernel packages for Red Hat Enterprise Linux 5 Update 1	2007-11-08 00:47:37 UTC

Description Rob Kenna 2006-05-17 14:14:47 UTC

Description of problem:
Machine won't suspend with GFS fs mounted


Version-Release number of selected component (if applicable):
RHEL 4 - U3, and FC5

How reproducible:
100%


Steps to Reproduce:
1. Mount a gfs fs
2. Try to suspend
3. The suspend starts but does not complete; the system immediately wakes back up
  
Actual results:
Won't suspend

Expected results:
Should suspend

Additional info:

Comment 1 Kiersten (Kerri) Anderson 2006-06-05 15:51:27 UTC

I am assuming this is a gfs2 problem.

Comment 3 Steve Whitehouse 2006-08-02 14:41:16 UTC

Is this GFS2 or not? It says RHEL4 which makes me think not...

I can't see that it's valid to suspend a node while it's a member of a cluster, so
umount is the only option I think, since you must be a member of a cluster to
have GFS(2) mounted. If we were to suspend a node while it was a member of a
cluster, its lack of response would presumably cause it to be fenced.

Using GFS(2) as a single node filesystem is a different matter. Can we confirm
that this is the case here? Some more details about the system would be useful
too if possible. I presume this was discovered on a laptop of some kind?

Comment 4 Rob Kenna 2006-08-31 16:07:16 UTC

Yes, this is GFS2 in single node on a laptop.  I'll reconfirm once I run on Beta 1.

Comment 6 Kiersten (Kerri) Anderson 2007-01-03 21:22:06 UTC

Changing component to gfs2-kernel - lost track of this one since it was in the
kernel queue.  Also, fixing summary.

Comment 7 Kiersten (Kerri) Anderson 2007-01-03 21:23:11 UTC

Moving to 5.1

Comment 9 Steve Whitehouse 2007-02-02 16:47:42 UTC

Abhi, please have a quick look at this one, with the emphasis on quick :-) If
this is something that can be solved relatively easily then please go ahead,
otherwise just document what you discover.

Do you have a suitable laptop on which to test this?

Comment 11 Kiersten (Kerri) Anderson 2007-06-19 16:55:04 UTC

Marked NEEDINFO until we can get access to hardware that can reliably suspend.

Comment 12 Abhijith Das 2007-06-26 21:33:58 UTC

Created attachment 157960 [details]
Fixes suspend to RAM/DISK issue w.r.t GFS2

Ok... I figured out the problem with the help of the laptop Rob sent me
(Thanks!). The kernel threads in gfs2, namely gfs2_scand, gfs2_logd,
gfs2_quotad, gfs2_glockd, gfs2_recoverd weren't doing anything when the suspend
mechanism was trying to freeze them.

See linux-2.6.22/Documentation/power/kernel_threads.txt.
And, from linux-2.6.22/Documentation/power/swsusp.txt:
"Q: A kernel thread must voluntarily freeze itself (call 'refrigerator').
I found some kernel threads that don't do it, and they don't freeze
so the system can't sleep. Is this a known behavior?

A: All such kernel threads need to be fixed, one by one. Select the
place where the thread is safe to be frozen (no kernel semaphores
should be held at that point and it must be safe to sleep there), and
add:

       try_to_freeze();

If the thread is needed for writing the image to storage, you should
instead set the PF_NOFREEZE process flag when creating the thread (and
be very careful)."

I put in calls to refrigerator() in the loops for all the daemons; however, I'm
not sure of the placement of these calls. I need some higher being
(swhiteho/dct-like) to bless this patch.
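
The pattern described above (a long-running daemon loop that parks itself at a safe point when a freeze is requested) can be sketched in user space. This is only an analogy built on POSIX threads, not the kernel freezer API and not the attached patch; `struct freezer`, `freezer_try_to_freeze()`, `daemon_loop()` and the other names are hypothetical and exist purely for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical user-space stand-in for the kernel freezer (not a kernel API). */
struct freezer {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool freeze_requested; /* set by the "suspend" side */
    bool stop;             /* asks the daemon loop to exit */
    int frozen_threads;    /* daemons currently parked */
    long iterations;       /* loop passes, so callers can observe progress */
};

void freezer_init(struct freezer *f)
{
    int rc = pthread_mutex_init(&f->lock, NULL);
    assert(rc == 0);
    rc = pthread_cond_init(&f->cond, NULL);
    assert(rc == 0);
    (void)rc;
    f->freeze_requested = false;
    f->stop = false;
    f->frozen_threads = 0;
    f->iterations = 0;
}

/* Analogue of try_to_freeze(): the daemon calls this at a point where it
 * holds no other locks, and parks here for as long as a freeze is requested. */
void freezer_try_to_freeze(struct freezer *f)
{
    pthread_mutex_lock(&f->lock);
    if (f->freeze_requested) {
        f->frozen_threads++;
        pthread_cond_broadcast(&f->cond);      /* tell the suspend side */
        while (f->freeze_requested)
            pthread_cond_wait(&f->cond, &f->lock);
        f->frozen_threads--;
    }
    pthread_mutex_unlock(&f->lock);
}

/* Daemon loop: one unit of periodic work, then the voluntary freeze point. */
void *daemon_loop(void *arg)
{
    struct freezer *f = arg;
    for (;;) {
        pthread_mutex_lock(&f->lock);
        bool stop = f->stop;
        f->iterations++;                       /* stands in for real work */
        pthread_mutex_unlock(&f->lock);
        if (stop)
            break;
        freezer_try_to_freeze(f);
        usleep(1000);
    }
    return NULL;
}

/* Suspend side: request a freeze and wait until nworkers daemons have parked. */
void freezer_freeze_all(struct freezer *f, int nworkers)
{
    pthread_mutex_lock(&f->lock);
    f->freeze_requested = true;
    while (f->frozen_threads < nworkers)
        pthread_cond_wait(&f->cond, &f->lock);
    pthread_mutex_unlock(&f->lock);
}

/* Resume side: clear the request and wake every parked daemon. */
void freezer_thaw_all(struct freezer *f)
{
    pthread_mutex_lock(&f->lock);
    f->freeze_requested = false;
    pthread_cond_broadcast(&f->cond);
    pthread_mutex_unlock(&f->lock);
}

/* Ask the daemon to exit; also clears any pending freeze so it can wake. */
void freezer_stop(struct freezer *f)
{
    pthread_mutex_lock(&f->lock);
    f->stop = true;
    f->freeze_requested = false;
    pthread_cond_broadcast(&f->cond);
    pthread_mutex_unlock(&f->lock);
}
```

A caller would start `daemon_loop()` with `pthread_create()`, call `freezer_freeze_all()` before the simulated suspend, and `freezer_thaw_all()` afterwards. If the loop never called `freezer_try_to_freeze()`, `freezer_freeze_all()` would wait forever, which loosely mirrors the failed suspend reported here.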

I tried this patch on the laptop and it works for me.
Also, this fixes only the gfs2 module. There are probably such kernel threads in
lock_dlm and dlm as well, although I suspect Rob isn't using dlm/lock_dlm on
his demo laptop and that fixing those isn't a priority.

Rob, it would be really nice if you could try this patch on your laptop and
tell me if it works for you.

Comment 13 Kiersten (Kerri) Anderson 2007-06-26 21:54:45 UTC

Setting blocker flag to get patch into a 5.1 kernel build after it has been
reviewed, accepted upstream and submitted to rhkernel-list.

Comment 14 Wendy Cheng 2007-06-27 00:55:38 UTC

Abhi, add a macro to the top of daemon.c or a common gfs header (say gfs.h ?).
Otherwise, the patch looks good. Nice job !

Comment 15 Abhijith Das 2007-06-27 16:12:58 UTC

The macro idea was NACKed. Sent original patch in comment #12 to cluster-devel.

Comment 16 Abhijith Das 2007-06-27 16:44:34 UTC

Moving back to ASSIGNED. Accidentally moved to MODIFIED. Need to post patch to
rhkernel-list and mark POST.

Comment 17 Abhijith Das 2007-06-27 22:50:12 UTC

Patch posted to rhkernel list
http://post-office.corp.redhat.com/archives/rhkernel-list/2007-June/msg02346.html

Comment 19 Rob Kenna 2007-07-13 13:23:28 UTC

This patch in the "34" build is working for me.  I have a GFS2 partition mounted
with an active build and, in initial testing, it both suspends and resumes
successfully. Though not the subject of this BZ, hibernate (to disk) also appears
to work.  This is very light testing, but seems promising.

Comment 20 Don Zickus 2007-07-13 15:42:44 UTC

Included in 2.6.18-34.el5.
You can download this test kernel from http://people.redhat.com/dzickus/el5

Comment 22 Mike Gahagan 2007-09-12 18:39:35 UTC

Successfully tested using hibernate (my test system won't recover from a suspend
to RAM for some reason) by copying a kernel source tree to a gfs volume, then
hibernating the system in the middle of the copy operation. The system recovered
and the copy continued. Once the copy finished, I diff'ed the two trees and they
are identical.

Comment 24 errata-xmlrpc 2007-11-07 19:12:36 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0959.html
