676851 – hung task warnings when NFS server is unavailable during writeback

Bug 676851 - hung task warnings when NFS server is unavailable during writeback

Summary: hung task warnings when NFS server is unavailable during writeback

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	initscripts
Sub Component:
Version:	5.6
Hardware:	i686
OS:	Linux
Priority:	urgent
Severity:	urgent
Target Milestone:	rc
Target Release:	---
Assignee:	initscripts Maintenance Team
QA Contact:	qe-baseos-daemons
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	682879
TreeView+	depends on / blocked

Reported:	2011-02-11 14:56 UTC by Steven Devoe
Modified:	2018-11-26 18:08 UTC (History)
CC List:	14 users (show)
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:	Previously, when the netfs script performed a lazy unmount on a NFS filesystem, sometimes cached data would be written out before the shutdown scripts were able to take down the network interfaces. This caused various machines to have been hanging on shutdown. With this update, the netfs script has been fixed and the physical machines no longer hang in the described scenario.
Clone Of:
Clones:	682879 (view as bug list)
Environment:
Last Closed:	2011-07-21 08:34:10 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments	(Terms of Use)
Console Page 1 (37.62 KB, image/jpeg) 2011-02-11 14:56 UTC, Steven Devoe	no flags	Details
Console Page 2 (33.94 KB, image/jpeg) 2011-02-11 14:57 UTC, Steven Devoe	no flags	Details
Console Page 3 (35.26 KB, image/jpeg) 2011-02-11 14:57 UTC, Steven Devoe	no flags	Details
Console Page 4 (33.29 KB, image/jpeg) 2011-02-11 14:57 UTC, Steven Devoe	no flags	Details
Console Page 5 (34.15 KB, image/jpeg) 2011-02-11 14:58 UTC, Steven Devoe	no flags	Details
Console Page 6 (31.50 KB, image/jpeg) 2011-02-11 14:58 UTC, Steven Devoe	no flags	Details
Console Page 7 (33.80 KB, image/jpeg) 2011-02-11 14:58 UTC, Steven Devoe	no flags	Details
Console Page 8 (36.02 KB, image/jpeg) 2011-02-11 14:59 UTC, Steven Devoe	no flags	Details
Console Page 9 (32.81 KB, image/jpeg) 2011-02-11 15:00 UTC, Steven Devoe	no flags	Details
Console Page 23 (30.18 KB, image/jpeg) 2011-02-11 15:01 UTC, Steven Devoe	no flags	Details
SOS report (2.98 MB, application/octet-stream) 2011-02-16 20:06 UTC, Steven Devoe	no flags	Details
patch -- only pass umount args on retry in __umount_loop (879 bytes, patch) 2011-03-04 16:34 UTC, Jeff Layton	no flags	Details \| Diff
View All

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2011:1081	0	normal	SHIPPED_LIVE	initscripts bug fix and enhancement update	2011-07-21 08:33:52 UTC

Description Steven Devoe 2011-02-11 14:56:47 UTC

Created attachment 478253 [details]
Console Page 1

Description of problem:

RHEL 5.6 VMware guest with NFS homedirs crashes if users have recently logged in remotely via NX while a shutdown occurs. Problem was introduced with the RHEL 5.5 -> RHEL 5.6 upgrade, and is not present on unpatched systems. kdump fails to write dump logs to local disk, I've attached screenshots gathered from the VMware console in their place.

Version-Release number of selected component (if applicable):

RHEL 5.6 w/kernel 2.6.18-238.1.1.el5PAE

How reproducible:

Consistently reproduceable by having users log into NX sessions before issuing the shutdown command. Problem does not occur if the system receives no NX user activity between reboots.

Steps to Reproduce:
1. Connect to KDE or Gnome session via NX Client
2. Issue shutdown command as root directly or via cron
3. Wait for crash
  
Actual results:

See attached .jpg images of console output. System does not shut down cleanly, requiring manual administrative intervention (power cycling/VM reset) to bring the machine back into a ready state.

Expected results:

A clean reboot or shutdown.

Additional info:

Comment 1 Steven Devoe 2011-02-11 14:57:21 UTC

Created attachment 478254 [details]
Console Page 2

Comment 2 Steven Devoe 2011-02-11 14:57:39 UTC

Created attachment 478255 [details]
Console Page 3

Comment 3 Steven Devoe 2011-02-11 14:57:57 UTC

Created attachment 478256 [details]
Console Page 4

Comment 4 Steven Devoe 2011-02-11 14:58:11 UTC

Created attachment 478257 [details]
Console Page 5

Comment 5 Steven Devoe 2011-02-11 14:58:28 UTC

Created attachment 478258 [details]
Console Page 6

Comment 6 Steven Devoe 2011-02-11 14:58:46 UTC

Created attachment 478259 [details]
Console Page 7

Comment 7 Steven Devoe 2011-02-11 14:59:05 UTC

Created attachment 478260 [details]
Console Page 8

Comment 8 Steven Devoe 2011-02-11 15:00:27 UTC

Created attachment 478261 [details]
Console Page 9

Comment 9 Steven Devoe 2011-02-11 15:01:59 UTC

Created attachment 478262 [details]
Console Page 23

Comment 10 Ric Wheeler 2011-02-11 15:37:44 UTC

Hi Steven,

A few quick questions for you.

What is NX client? Is it a binary driver?

If you have a support contract with RH, please open a support ticket so our field people can help us debug. 

Thanks!

Comment 11 Jeff Layton 2011-02-11 15:47:50 UTC

It's not a crash, you're getting hung task warnings...

Looks like the client is stuckwaiting for the PG_BUSY bit to clear after being written back. Perhaps the network interface went down before the fs was
unmounted?

Comment 12 Steven Devoe 2011-02-11 15:58:31 UTC

Ric,

NX referring to NoMachine NX, a VNC-type service tightly integrated with X Windows. No active support contract to my knowledge.

Jeff,

When /etc/rc6.d is set up to run K20nfs and K90network, NFS should come down before the network interfaces, correct? Is there a way to revert back to the behavior of RHEL 5.5 in this case without downgrading?

Comment 13 Jeff Layton 2011-02-11 16:05:24 UTC

/etc/init.d/nfs refers to server-side NFS

The unmounts should happen in /etc/init.d/netfs, which is typically set to K75. But...there are a lot of possible network interfaces that don't necessarily go down in K90.

Ric is quite right that you'll likely need to open a support case for us to gather more info about the problem.

Comment 14 Steven Devoe 2011-02-11 16:19:54 UTC

What additional information is required? I'm happy to supply anything necessary, but don't understand why a support contract would be required to triage this bug.

Comment 15 Ric Wheeler 2011-02-11 18:06:40 UTC

Hi Steven,

When you have our support people work with you, they help us get information, debug and even produce patches. Part of the value you get from being a customer. It also means that they can make sure that the development team does not have to triage issues that might be resolved already.

If you don't have a RH subscription, we prefer to work the issues through upstream lists or community based distros.

Does not mean you won't get help (probably from the same people), just helps us make sure that our support efforts are working correctly.

In general, Red Hat bugzilla is not meant to be an end user support tool since that cuts our field out.

Comment 16 Ric Wheeler 2011-02-11 19:35:30 UTC

One thought since this is a vmware guest - do you have a different/vmware specific virt nic that gets shut down too early?

Comment 17 Steven Devoe 2011-02-14 13:36:55 UTC

Gents,

I do appreciate the help. To give you some background, my situation is that I'm a relatively new admin here in one of the smaller units of one of the colleges, and our RHEL support contract is all handled through central IT. I don't have access to any of that information directly. I'll attempt to obtain it, though this could take some time.

As far as the VMnic, it's a good thought but they appear to be configured identically to other NICs. vmware-tools unloads at K99, after K75netfs and K90network. 

Thanks again for the assistance. I'll work through the support team to try to track down this bug.

Comment 18 Steven Devoe 2011-02-16 20:06:19 UTC

Created attachment 479205 [details]
SOS report

Adding sosreport per request of RHEL Engineer Jessica

Comment 34 Steven Devoe 2011-02-24 18:44:31 UTC

Just wondering if any status update was available?

Thanks as always!

Comment 37 Jeff Layton 2011-03-01 14:30:47 UTC

(In reply to comment #35)

> The customer turned up nfs client debugging and crashed the system while the
> issue was occurring to get us a core dump.
> 

Ok, I had a look at the core. Not much interesting there really. I found the file that the httplog program was blocking on, and backtracked that to the rpc structures that make up the client and its queue. There are only two calls in queue -- one is a write call which is pretty much what I expected.

All of the network interfaces have been deconfigured which is also expected and therein lies the problem -- the NFS client can't make progress flushing pages without network connectivity.

I think I've also ID'ed a potential initscripts problem that could cause this but I need to look at it a bit more closely and involve the people that maintain that package.

Comment 41 Jeff Layton 2011-03-01 15:51:19 UTC

Looking at the netfs init script, it does this:

    NFSMTAB=`LC_ALL=C awk '$3 ~ /^nfs/ && $3 != "nfsd" && $2 != "/" { print $2 }' /proc/mounts`

...so a non empty $NFSMTAB means that we have nfs mounts that need to be unmounted. In the "stop" section we have this:

        if [ -n "$NFSMTAB" ]; then
                __umount_loop '$3 ~ /^nfs/ && $3 != "nfsd" && $2 != "/" {print $2}' \
                        /proc/mounts \
                        $"Unmounting NFS filesystems: " \
                        $"Unmounting NFS filesystems (retry): " \
                        "-f -l" 
        fi

...which calls __umount loop with a list of mounts that need to be unmounted. __umount_loop is in /etc/init.d/functions:

---------------[snip]-------------------

# __umount_loop awk_program fstab_file first_msg retry_msg umount_args
# awk_program should process fstab_file and return a list of fstab-encoded
# paths; it doesn't have to handle comments in fstab_file.
__umount_loop() {
        local remaining sig=
        local retry=3

        remaining=$(LC_ALL=C awk "/^#/ {next} $1" "$2" | sort -r)
        while [ -n "$remaining" -a "$retry" -gt 0 ]; do
                if [ "$retry" -eq 3 ]; then
                        action "$3" fstab-decode umount $5 $remaining
                else
                        action "$4" fstab-decode umount $5 $remaining
                fi
                sleep 2
                remaining=$(LC_ALL=C awk "/^#/ {next} $1" "$2" | sort -r)
                [ -z "$remaining" ] && break
                fstab-decode /sbin/fuser -k -m $sig $remaining >/dev/null
                sleep 5
                retry=$(($retry -1))
                sig=-9
        done
}
---------------[snip]-------------------

...the interesting part is are the two umount calls in the 'if [ "$retry" -eq 3 ]' block. They both pass in $5 unconditionally, which is the "-f -l" argument. 

So if I'm interpreting this correctly, RHEL5 pretty much *never* attempts a normal umount on NFS filesystems -- it always does a lazy umount. This means that the first umount always "succeeds" due to the "-f", and the "fuser -k" calls don't usually get done.

I'm actually surprised we don't see this hang more often.

I think the right solution is to turn the "umount_args" argument into a "retry_umount_args" argument. IOW, call the umount without any extra flags first, and then add in the extra flags if and when that fails.

That may not prevent all hangs, but should take care of most of these problems.

Comment 43 Bill Nottingham 2011-03-01 17:51:20 UTC

I'm not sure why this would change on 5.5 -> 5.6; nothing changed there in initscripts.  The lazy umount has been there since 2002, and I'm somewhat worried that changing that in a RHEL update could cause more issues than it solves.

There's a change upstream that may help:

commit decf19bb9dc7b70ad89f9154899e73df069f5e62
Author: Bill Nottingham <notting>
Date:   Fri Feb 25 16:41:25 2011 -0500

    Call sync after nfs unmount, otherwise we'll hang when the kernel syncs later. (#637500)

diff --git a/rc.d/init.d/netfs b/rc.d/init.d/netfs
index 8d9854f..3713cf4 100755
--- a/rc.d/init.d/netfs
+++ b/rc.d/init.d/netfs
@@ -116,6 +116,7 @@ case "$1" in
                        $"Unmounting NFS filesystems: " \
                        $"Unmounting NFS filesystems (retry): " \
                        "-f -l"
+               sync
        fi
        if [ -n "$CIFSMTAB" ]; then
                for MNT in $CIFSMTAB; do

Comment 44 Jeff Layton 2011-03-01 19:11:16 UTC

I think I've found the probable cause of the change -- commit f4fa2b45.

Because of a possible oops when truncating a file, we had to change nfs_wait_on_request() to use an uninterruptible sleep. Many of the processes are hung in this codepath, and since it now uses an uninterruptible sleep they're no longer affected by signals. The fact that they used to be probably is what helps work around the fundamental problem.

I really don't see an alternative to fixing the init scripts properly. The lazy umount allows processes with files already open to continue dirtying pages even after the umount returns. Adding a /bin/sync call won't really do anything to fix that problem. Those processes can just dirty more pages after it returns.

Comment 45 John Ruemker 2011-03-04 16:21:22 UTC

(In reply to comment #41)
> 
> I think the right solution is to turn the "umount_args" argument into a
> "retry_umount_args" argument. IOW, call the umount without any extra flags
> first, and then add in the extra flags if and when that fails.
> 

I'm not sure if this is what you had in mind, but I had the customer try this:

--- initscripts-8.45.33/rc.d/init.d/functions.orig	2011-03-01 15:29:20.000000000 -0500
+++ initscripts-8.45.33/rc.d/init.d/functions	2011-03-01 15:33:13.000000000 -0500
@@ -85,7 +85,7 @@ __umount_loop() {
 		if [ "$retry" -eq 3 ]; then
 			action "$3" fstab-decode umount $5 $remaining
 		else
-			action "$4" fstab-decode umount $5 $remaining
+			action "$4" fstab-decode umount "" $remaining
 		fi
 		sleep 2
 		remaining=$(LC_ALL=C awk "/^#/ {next} $1" "$2" | sort -r)


And it did not resolve their issues.

Comment 46 Jeff Layton 2011-03-04 16:34:49 UTC

Created attachment 482324 [details]
patch -- only pass umount args on retry in __umount_loop

No, that looks wrong. "retry" starts at 3, and we want to only pass in the umount_args on the passes after the first attempt. This patch should be more correct.

Bill, any thoughts?

Comment 48 Bill Nottingham 2011-03-04 16:50:56 UTC

It makes me a bit nervous in that it's changing the semantics of the function. That's alleviated by the fact that at least in what we ship, nfs umount is the only thing that uses the args. I don't *think* we have third-party scripts piggybacking on that function.

Comment 49 Jeff Layton 2011-03-04 17:23:38 UTC

That's a good point. We could mitigate that risk by leaving __umount_loop alone, add a new variant that has these semantics, and have netfs and maybe halt call the new one.

Before we do anything further though I'd like to know whether this approach actually fixes the issue or not. For now, I'll set this to NEEDINFO for John. He can let us know whether it helps or not once they've had a chance to test it.

Comment 50 John Ruemker 2011-03-07 15:47:41 UTC

Customer reported the test package successfully resolved their issues.  They had 5 servers they were using for testing that would demonstrate the issue on most reboots.  Over the weekend, they rebooted each of them 130 times and none of them showed any problems.

-John

Comment 51 Jeff Layton 2011-03-07 16:03:26 UTC

That's excellent news.

Bill, would you still prefer that we add a new variant of this function instead of altering __umount_loop?

Comment 52 Bill Nottingham 2011-03-07 20:52:51 UTC

Altering __umount_loop is probably OK, given the "__" prefix (aka, for-internal-use-only.)

Added upstream in Fedora as http://git.fedorahosted.org/git?p=initscripts.git;a=commitdiff;h=a9b0d6b5c655da96783851d5304c4d800d4e4553

Comment 53 RHEL Program Management 2011-03-07 21:00:26 UTC

This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 57 Steven Devoe 2011-03-22 13:12:40 UTC

I've tested the patch provided along with the modification in 682879. Unfortunately the machine still enters the same unresponsive state when performing a reboot after an NX user has logged in, with similar error messages concerning NFS and call traces displaying on the console, refreshing periodically.

Comment 58 Jeff Layton 2011-03-22 13:30:10 UTC

Not sure why that would be. We have a report of a similar problem by a customer who reported that this patch fixed it for him. I'm not very familiar with what "NX" does, so you'll probably need to do some debugging to track down what the actual problem is.

Comment 59 Steven Devoe 2011-03-22 19:32:17 UTC

NX essentially acts as a compressing proxy for X11 connections. The problem I'm seeing is also apparently not specific to NX: it appears that a shutdown or reboot will fail following any level of system load activity.

Presently, the workaround I'm utilizing is to have set up heartbeat alarms within VMware that monitor the RHEL VMs for an unresponsive and issue a hardware reset when that state is detected. This is working to good effect, but the fact that the VM becomes unresponsive to the VMM seems to indicate an OS-level problem of some sort and I'm not quite sure where to go next.

Comment 60 Ric Wheeler 2011-03-22 20:32:26 UTC

Steven, are you 100% positive that you are not redirecting the NFS traffic through this NX proxy?

Without using the proxy, does the issue go away?

Thanks!

Comment 61 Jeff Layton 2011-03-22 20:49:40 UTC

I think to debug this you're going to have to do some analysis of what happens at shutdown time. Specifically, you'll probably want to instrument the NFS __umount_loop code in the shutdown script. Determine:

a) what processes are not being killed before the netfs script decides to lazy umount the filesystem

b) what those processes are doing that prevents them from being killed (this will probably mean getting stack traces of them, probably via sysrq-t)

One possibility is that there is just too much dirty data to be flushed before the network interfaces come down. If so, then you may also want to play with the patch that Bill suggested in comment #43.

If that doesn't fix it, then you probably have processes that are still actively dirtying pages after being SIGKILL'ed which is a little worrisome...

Comment 62 Steven Devoe 2011-03-23 15:30:52 UTC

Ric,

100% sure on that. NX doesn't have any functionality like that, it essentially just proxies and compresses X11 to make remote X sessions less network intensive. The NFS share /uhome is being mounted directly on each VM from a NetApp Filer.

Jeff,

I'll look into those things you suggest at first opportunity. It might take me some time to do the more in depth debugging, but I'll report back about adding in a sync as Bill suggests in #43 soon.

Comment 67 Viktor Varga 2011-06-21 15:20:59 UTC

Hi,

Based on https://access.redhat.com/kb/docs/DOC-47990, it seems I have this bug. Do you have an estimation when will we get errata for this?

Comment 68 Jeff Layton 2011-06-21 15:28:17 UTC

It's slated to ship with the 5.7 update.

Comment 69 Chris Williams 2011-06-21 15:37:55 UTC

Hi Viktor,
If you don't already have a support case on this issue, go ahead and open one. GSS will be happy to work with you on this.

Chris

Comment 70 Tomas Capek 2011-07-13 12:22:03 UTC

    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Previously, when the netfs script performed a lazy unmount on a NFS filesystem, sometimes cached data would be written out before the shutdown scripts were able to take down the network interfaces. This caused various machines to have been hanging on shutdown. With this update, the netfs script has been fixed and the physical machines no longer hang in the described scenario.

Comment 71 errata-xmlrpc 2011-07-21 08:34:10 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-1081.html

Comment 72 errata-xmlrpc 2011-07-21 12:40:28 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-1081.html

Note You need to log in before you can comment on or make changes to this bug.