Created attachment 478253 [details]
Console Page 1

Description of problem:
RHEL 5.6 VMware guest with NFS home directories crashes on shutdown if users have recently logged in remotely via NX. The problem was introduced with the RHEL 5.5 -> RHEL 5.6 upgrade and is not present on unpatched systems. kdump fails to write dump logs to local disk, so I've attached screenshots gathered from the VMware console in their place.

Version-Release number of selected component (if applicable):
RHEL 5.6 w/kernel 2.6.18-238.1.1.el5PAE

How reproducible:
Consistently reproducible by having users log into NX sessions before issuing the shutdown command. The problem does not occur if the system receives no NX user activity between reboots.

Steps to Reproduce:
1. Connect to a KDE or GNOME session via the NX client
2. Issue the shutdown command as root, directly or via cron
3. Wait for the crash

Actual results:
See attached .jpg images of console output. The system does not shut down cleanly, requiring manual administrative intervention (power cycling/VM reset) to bring the machine back into a ready state.

Expected results:
A clean reboot or shutdown.

Additional info:
Created attachment 478254 [details] Console Page 2
Created attachment 478255 [details] Console Page 3
Created attachment 478256 [details] Console Page 4
Created attachment 478257 [details] Console Page 5
Created attachment 478258 [details] Console Page 6
Created attachment 478259 [details] Console Page 7
Created attachment 478260 [details] Console Page 8
Created attachment 478261 [details] Console Page 9
Created attachment 478262 [details] Console Page 23
Hi Steven,

A few quick questions for you. What is NX client? Is it a binary driver?

If you have a support contract with RH, please open a support ticket so our field people can help us debug. Thanks!
It's not a crash, you're getting hung task warnings... Looks like the client is stuck waiting for the PG_BUSY bit to clear on a page that's being written back. Perhaps the network interface went down before the fs was unmounted?
Ric,

NX refers to NoMachine NX, a VNC-type remote desktop service tightly integrated with the X Window System. There's no active support contract to my knowledge.

Jeff,

When /etc/rc6.d is set up to run K20nfs and K90network, NFS should come down before the network interfaces, correct? Is there a way to revert to the RHEL 5.5 behavior in this case without downgrading?
/etc/init.d/nfs refers to server-side NFS. The unmounts should happen in /etc/init.d/netfs, which is typically set to K75. But there are a lot of possible network interfaces that don't necessarily go down in K90.

Ric is quite right that you'll likely need to open a support case for us to gather more info about the problem.
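For reference, the ordering can be checked by listing the kill scripts for the runlevel; based on what's been described so far, it should look roughly like this (illustrative output, not captured from the affected system):

	# ls /etc/rc6.d/ | grep '^K'
	K20nfs
	K75netfs
	K90network
	...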
What additional information is required? I'm happy to supply anything necessary, but I don't understand why a support contract would be required to triage this bug.
Hi Steven,

When you have our support people work with you, they help us get information, debug, and even produce patches. That's part of the value you get from being a customer. It also means they can make sure the development team doesn't have to triage issues that might already be resolved.

If you don't have a RH subscription, we prefer to work issues through upstream lists or community-based distros. That doesn't mean you won't get help (probably from the same people); it just helps us make sure our support efforts are working correctly. In general, Red Hat Bugzilla is not meant to be an end-user support tool, since that cuts our field people out.
One thought, since this is a VMware guest: do you have a VMware-specific virtual NIC that gets shut down too early?
Gents,

I do appreciate the help. To give you some background: I'm a relatively new admin here in one of the smaller units of one of the colleges, and our RHEL support contract is all handled through central IT. I don't have access to any of that information directly. I'll attempt to obtain it, though this could take some time.

As for the VMware NIC, it's a good thought, but it appears to be configured identically to the other NICs. vmware-tools unloads at K99, after K75netfs and K90network.

Thanks again for the assistance. I'll work through the support team to try to track down this bug.
Created attachment 479205 [details]
SOS report

Adding sosreport per request of RHEL Engineer Jessica.
Just wondering if any status update was available? Thanks as always!
(In reply to comment #35)
> The customer turned up nfs client debugging and crashed the system while the
> issue was occurring to get us a core dump.

Ok, I had a look at the core. Not much interesting there, really. I found the file that the httplog program was blocking on, and backtracked that to the rpc structures that make up the client and its queue. There are only two calls in the queue -- one is a write call, which is pretty much what I expected. All of the network interfaces have been deconfigured, which is also expected, and therein lies the problem -- the NFS client can't make progress flushing pages without network connectivity.

I think I've also ID'ed a potential initscripts problem that could cause this, but I need to look at it a bit more closely and involve the people who maintain that package.
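For the record, the sort of core analysis described above can be done with the crash utility. A representative session might look like this (the PID and specifics below are hypothetical):

	crash> ps | grep UN    # find tasks in uninterruptible (D) sleep
	crash> bt 4321         # stack trace of the blocked httplog task
	crash> files 4321      # open files -- including the one on the NFS mount
	crash> net             # confirm the network interfaces are deconfigured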
Looking at the netfs init script, it does this:

NFSMTAB=`LC_ALL=C awk '$3 ~ /^nfs/ && $3 != "nfsd" && $2 != "/" { print $2 }' /proc/mounts`

...so a non-empty $NFSMTAB means that we have NFS mounts that need to be unmounted. In the "stop" section we have this:

	if [ -n "$NFSMTAB" ]; then
		__umount_loop '$3 ~ /^nfs/ && $3 != "nfsd" && $2 != "/" {print $2}' \
			/proc/mounts \
			$"Unmounting NFS filesystems: " \
			$"Unmounting NFS filesystems (retry): " \
			"-f -l"
	fi

...which calls __umount_loop with a list of mounts that need to be unmounted. __umount_loop is in /etc/init.d/functions:

---------------[snip]-------------------
# __umount_loop awk_program fstab_file first_msg retry_msg umount_args
# awk_program should process fstab_file and return a list of fstab-encoded
# paths; it doesn't have to handle comments in fstab_file.
__umount_loop() {
	local remaining sig=
	local retry=3

	remaining=$(LC_ALL=C awk "/^#/ {next} $1" "$2" | sort -r)
	while [ -n "$remaining" -a "$retry" -gt 0 ]; do
		if [ "$retry" -eq 3 ]; then
			action "$3" fstab-decode umount $5 $remaining
		else
			action "$4" fstab-decode umount $5 $remaining
		fi
		sleep 2
		remaining=$(LC_ALL=C awk "/^#/ {next} $1" "$2" | sort -r)
		[ -z "$remaining" ] && break
		fstab-decode /sbin/fuser -k -m $sig $remaining >/dev/null
		sleep 5
		retry=$(($retry -1))
		sig=-9
	done
}
---------------[snip]-------------------

...the interesting bits are the two umount calls in the 'if [ "$retry" -eq 3 ]' block. Both pass in $5 unconditionally, which is the "-f -l" argument. So if I'm interpreting this correctly, RHEL5 pretty much *never* attempts a normal umount on NFS filesystems -- it always does a forced, lazy umount. This means that the first umount always "succeeds" thanks to the "-f -l" flags, and the "fuser -k" calls don't usually get done. I'm actually surprised we don't see this hang more often.

I think the right solution is to turn the "umount_args" argument into a "retry_umount_args" argument. IOW, call umount without any extra flags first, and then add the extra flags in if and when that fails. That may not prevent all hangs, but it should take care of most of these problems.
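Something along these lines, as a rough, untested sketch of what I mean (only the first-attempt branch changes; $5 becomes retry-only):

		if [ "$retry" -eq 3 ]; then
			# first pass: plain umount, which waits for writeback
			# to complete before returning
			action "$3" fstab-decode umount $remaining
		else
			# retries only: fall back to the forced/lazy flags in $5
			action "$4" fstab-decode umount $5 $remaining
		fi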
I'm not sure why this would change on 5.5 -> 5.6; nothing changed there in initscripts. The lazy umount has been there since 2002, and I'm somewhat worried that changing that in a RHEL update could cause more issues than it solves.

There's a change upstream that may help:

commit decf19bb9dc7b70ad89f9154899e73df069f5e62
Author: Bill Nottingham <notting>
Date:   Fri Feb 25 16:41:25 2011 -0500

    Call sync after nfs unmount, otherwise we'll hang when the kernel
    syncs later. (#637500)

diff --git a/rc.d/init.d/netfs b/rc.d/init.d/netfs
index 8d9854f..3713cf4 100755
--- a/rc.d/init.d/netfs
+++ b/rc.d/init.d/netfs
@@ -116,6 +116,7 @@ case "$1" in
 			$"Unmounting NFS filesystems: " \
 			$"Unmounting NFS filesystems (retry): " \
 			"-f -l"
+		sync
 	fi
 	if [ -n "$CIFSMTAB" ]; then
 		for MNT in $CIFSMTAB; do
I think I've found the probable cause of the change -- commit f4fa2b45. Because of a possible oops when truncating a file, we had to change nfs_wait_on_request() to use an uninterruptible sleep. Many of the processes are hung in this codepath, and since it now uses an uninterruptible sleep they're no longer affected by signals. The fact that they used to be interruptible is probably what helped work around the fundamental problem.

I really don't see an alternative to fixing the init scripts properly. The lazy umount allows processes with files already open to continue dirtying pages even after the umount returns. Adding a /bin/sync call won't really do anything to fix that problem: those processes can just dirty more pages after it returns.
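To illustrate the failure mode, something like the following sequence (hypothetical mount point and writer -- don't try this on a machine you care about) recreates the same situation the shutdown scripts produce:

	# /mnt/nfs is an NFS mount; dd keeps the file open and keeps writing
	dd if=/dev/zero of=/mnt/nfs/bigfile bs=1M count=2048 &
	umount -f -l /mnt/nfs   # "succeeds" immediately; dd keeps dirtying pages
	ifdown eth0             # network interface goes away
	sync                    # flushing the dirty NFS pages now blocks forever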
(In reply to comment #41)
>
> I think the right solution is to turn the "umount_args" argument into a
> "retry_umount_args" argument. IOW, call the umount without any extra flags
> first, and then add in the extra flags if and when that fails.
>

I'm not sure if this is what you had in mind, but I had the customer try this:

--- initscripts-8.45.33/rc.d/init.d/functions.orig	2011-03-01 15:29:20.000000000 -0500
+++ initscripts-8.45.33/rc.d/init.d/functions	2011-03-01 15:33:13.000000000 -0500
@@ -85,7 +85,7 @@ __umount_loop() {
 		if [ "$retry" -eq 3 ]; then
 			action "$3" fstab-decode umount $5 $remaining
 		else
-			action "$4" fstab-decode umount $5 $remaining
+			action "$4" fstab-decode umount "" $remaining
 		fi
 		sleep 2
 		remaining=$(LC_ALL=C awk "/^#/ {next} $1" "$2" | sort -r)

And it did not resolve their issues.
Created attachment 482324 [details]
patch -- only pass umount args on retry in __umount_loop

No, that looks wrong. "retry" starts at 3, and we want to only pass in the umount_args on the passes after the first attempt. This patch should be more correct. Bill, any thoughts?
It makes me a bit nervous in that it's changing the semantics of the function. That's alleviated by the fact that at least in what we ship, nfs umount is the only thing that uses the args. I don't *think* we have third-party scripts piggybacking on that function.
That's a good point. We could mitigate that risk by leaving __umount_loop alone, adding a new variant with these semantics, and having netfs (and maybe halt) call the new one.

Before we do anything further, though, I'd like to know whether this approach actually fixes the issue. For now, I'll set this to NEEDINFO for John. He can let us know whether it helps once they've had a chance to test it.
Customer reported the test package successfully resolved their issues. They had 5 servers they were using for testing that would demonstrate the issue on most reboots. Over the weekend, they rebooted each of them 130 times and none of them showed any problems. -John
That's excellent news. Bill, would you still prefer that we add a new variant of this function instead of altering __umount_loop?
Altering __umount_loop is probably OK, given the "__" prefix (i.e., for internal use only).

Added upstream in Fedora as http://git.fedorahosted.org/git?p=initscripts.git;a=commitdiff;h=a9b0d6b5c655da96783851d5304c4d800d4e4553
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
I've tested the provided patch along with the modification in bug 682879. Unfortunately, the machine still enters the same unresponsive state when performing a reboot after an NX user has logged in, with similar NFS error messages and call traces displaying on the console and refreshing periodically.
Not sure why that would be. We have a report of a similar problem from a customer, who confirmed that this patch fixed it for him. I'm not very familiar with what "NX" does, so you'll probably need to do some debugging to track down what the actual problem is.
NX essentially acts as a compressing proxy for X11 connections. The problem I'm seeing is also apparently not specific to NX: it appears that a shutdown or reboot will fail following any significant level of system load.

Presently, my workaround is a set of heartbeat alarms within VMware that monitor the RHEL VMs for an unresponsive state and issue a hardware reset when that state is detected. This is working to good effect, but the fact that the VM becomes unresponsive even to the VMM seems to indicate an OS-level problem of some sort, and I'm not quite sure where to go next.
Steven, are you 100% positive that you are not redirecting the NFS traffic through this NX proxy? Without using the proxy, does the issue go away? Thanks!
I think to debug this you're going to have to do some analysis of what happens at shutdown time. Specifically, you'll probably want to instrument the __umount_loop code called from the netfs shutdown script. Determine:

a) what processes are not being killed before the netfs script decides to lazy umount the filesystem

b) what those processes are doing that prevents them from being killed (this will probably mean getting stack traces of them, probably via sysrq-t; see the sketch below)

One possibility is that there is just too much dirty data to be flushed before the network interfaces come down. If so, then you may also want to play with the patch that Bill suggested in comment #43. If that doesn't fix it, then you probably have processes that are still actively dirtying pages after being SIGKILL'ed, which is a little worrisome...
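A minimal way to capture those stack traces during shutdown might be something like this (assuming sysrq is available; exactly where to hook it into the script is up to you):

	echo 1 > /proc/sys/kernel/sysrq   # make sure sysrq is enabled
	echo t > /proc/sysrq-trigger      # dump every task's stack to the kernel log
	dmesg > /tmp/shutdown-tasks.txt   # capture it before the console scrolls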
Ric,

100% sure on that. NX doesn't have any functionality like that; it essentially just proxies and compresses X11 to make remote X sessions less network-intensive. The NFS share /uhome is mounted directly on each VM from a NetApp filer.

Jeff,

I'll look into the things you suggest at the first opportunity. It might take me some time to do the more in-depth debugging, but I'll report back soon about adding in a sync as Bill suggests in comment #43.
Hi,

Based on https://access.redhat.com/kb/docs/DOC-47990, it seems I'm hitting this bug. Do you have an estimate of when we will get an erratum for this?
It's slated to ship with the 5.7 update.
Hi Viktor,

If you don't already have a support case on this issue, go ahead and open one. GSS will be happy to work with you on this.

Chris
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Previously, the netfs script performed a forced, lazy unmount of NFS filesystems, so cached data could still be awaiting writeback when the shutdown scripts took down the network interfaces. This caused various machines to hang on shutdown. With this update, the netfs script attempts a normal unmount first, and machines no longer hang in the described scenario.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-1081.html