Bug 454193

Summary: hung sync() syscall on box shutdown due to stuck NFS4 mount
Product: [Fedora] Fedora
Reporter: Jeff Layton <jlayton>
Component: kernel
Assignee: Jeff Layton <jlayton>
Status: CLOSED WONTFIX
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Docs Contact:
Priority: medium
Version: 8
CC: development, marcin.wolyniak, staubach, steved
Target Milestone: ---
Flags: development: needinfo-
Target Release: ---
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2009-01-09 07:50:51 UTC
Attachments:
Description                                               Flags
output from dmesg                                         none
lspci -vvv                                                none
dmidecode output                                          none
[Tyan Thunder K8W] dmesg output                           none
[Tyan Thunder K8W] lspci -vvv output                      none
[Tyan Thunder K8W] dmidecode output                       none
dmesg -- sysrq-t after sync command is hung on shutdown   none

Description Jeff Layton 2008-07-06 11:20:59 UTC
My wife's machine currently runs Fedora 8. Ever since updating the kernel to
2.6.25, it no longer powers off at shutdown. It just hangs with the message
"Halting system...". Reverting the kernel to 2.6.24 seems to make the problem go away.

I instrumented the "halt" init script and I can see that it's calling:

/sbin/halt -d -p

...it's just not actually powering off.

Comment 1 Jeff Layton 2008-07-06 11:21:34 UTC
Created attachment 311093 [details]
output from dmesg

Comment 2 Jeff Layton 2008-07-06 11:21:53 UTC
Created attachment 311094 [details]
lspci -vvv

Comment 3 Jeff Layton 2008-07-06 11:25:17 UTC
Created attachment 311096 [details]
dmidecode output

...let me know if you need any other info or need me to test patches. The
machine seems to reliably fail to power off.

Comment 4 Jeff Layton 2008-07-06 11:31:56 UTC
It also hangs when rebooting.

Comment 5 Marcin 2008-07-21 22:21:41 UTC
Confirmed on two machines based on MSI KT800 motherboards. Exactly the same behaviour
after upgrading to the 2.6.25 line.

Comment 6 Pieter de Rijk 2008-08-23 16:54:43 UTC
This problem also exists on a Tyan S2885 Thunder K8W Mainboard.

Comment 7 Pieter de Rijk 2008-08-23 16:58:14 UTC
Created attachment 314868 [details]
[Tyan Thunder K8W] dmesg output

Comment 8 Pieter de Rijk 2008-08-23 16:58:48 UTC
Created attachment 314869 [details]
[Tyan Thunder K8W] lspci -vvv output

Comment 9 Pieter de Rijk 2008-08-23 16:59:28 UTC
Created attachment 314870 [details]
[Tyan Thunder K8W] dmidecode output

Comment 10 Jeff Layton 2008-09-10 15:50:49 UTC
I think the problem in my case may be a buggy DSDT. Link to interesting post on the Gentoo forums about this:

http://forums.gentoo.org/viewtopic.php?t=122145

...in my case, I was able to fix this on my wife's machine by adding this to the kernel command line:

acpi_os_name="Microsoft Windows XP"

I haven't tried any of the other methods of forcing the kernel to use a fixed DSDT, however.

Comment 11 Jeff Layton 2008-09-10 16:00:01 UTC
actually...

that didn't fix it after all. For some reason, it successfully powered off a couple of times after I did this and then went back to hanging at shutdown.

Still, there definitely seems to be something motherboard specific here, and that probably means that the DSDT is broken.

Comment 12 Jeff Layton 2008-09-13 11:36:24 UTC
I just tried kernel-2.6.26.3-14.fc8.i686 on this machine and it also didn't power off at shutdown. Whatever it is seems to affect kernels newer than 2.6.24. Shutdown occasionally works with these later kernels, but it often hangs.

Finally, I have another machine with the exact same motherboard (same BIOS rev and everything) that's running x86_64 F9. This machine never has a problem shutting down. I don't think the shutdown procedure has changed appreciably between F8 and F9, so I think the relevant thing is a 2.6.25+ kernel on i686.

Comment 13 Jeff Layton 2008-11-09 12:40:10 UTC
I may have found a workaround. If I change the "halt" script to add -n to the halt args, then the power off seems to generally work. Perhaps something is happening to make the sync call in the poweroff hang? I'll need to experiment a bit more. Perhaps I can strace it...

Comment 14 Jeff Layton 2008-11-09 13:16:44 UTC
Confirmed....

strace shows that the halt command calls sync(). That syscall hangs indefinitely and prevents the poweroff from reliably occurring on this machine. I tried to gather some sysrq-t data at that point, but it didn't display anything on the screen.

If anyone knows of a way to determine why sync() is hanging at this point, let me know and I'll try to gather it.
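
For anyone who wants to check for the same hang from a script without blocking the caller, one option is to run the sync in a child process with a deadline. This is only a rough sketch: `finishes_in_time` and `sync_hangs` are made-up helper names, the 10 second timeout is arbitrary, and it assumes Python 3 is installed on the box.

```python
import subprocess
import sys


def finishes_in_time(argv, timeout):
    """Return True if the command exits within `timeout` seconds, else False."""
    child = subprocess.Popen(argv)
    try:
        child.wait(timeout=timeout)
        return True
    except subprocess.TimeoutExpired:
        # Best-effort cleanup only. A task blocked inside the kernel
        # (uninterruptible sleep) does not act on the signal until it
        # unblocks, so we deliberately do not wait for the child here.
        child.kill()
        return False


def sync_hangs(timeout=10.0):
    """Return True if a sync() issued from a child process misses the deadline."""
    # The child calls os.sync(), which wraps the sync() syscall.
    argv = [sys.executable, "-c", "import os; os.sync()"]
    return not finishes_in_time(argv, timeout)
```

Calling `sync_hangs()` just before the point where the halt script would power off should return True on an affected machine, assuming the hang reproduces. It says nothing about why sync() is stuck, only that it is.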

Comment 15 Jeff Layton 2008-11-10 15:27:07 UTC
Created attachment 323078 [details]
dmesg -- sysrq-t after sync command is hung on shutdown

I hacked up the /etc/init.d/halt script to drop to a shell just before it would ordinarily power off the computer. Just after it did this, I ran:

# sync &

...ps then showed that the sync command hung. I was then able to fire off sysrq-t a couple of times, remount / rw, and collect the contents of dmesg.

It shows that sync() is hung waiting for the s_umount semaphore. That seems to be held by the gam_server process which is hung waiting for an RPC call to come back.
It's not ever going to come back, however, because the network interfaces have already been torn down.
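
If sysrq-t output is hard to capture, a lighter-weight check is to list the tasks that are in uninterruptible sleep (state D) straight from /proc. The sketch below is an assumption-laden illustration, not what was used to gather the attachment: `parse_stat` and `blocked_tasks` are hypothetical names, it only scans thread group leaders (not every thread), and /proc/<pid>/wchan may be empty or unreadable depending on the kernel configuration and privileges.

```python
import os


def parse_stat(line):
    """Split a /proc/<pid>/stat line into (pid, comm, state).

    comm is wrapped in parentheses and may itself contain spaces or
    parentheses, so locate the last ')' instead of splitting on spaces.
    """
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren])
    comm = line[open_paren + 1:close_paren]
    state = line[close_paren + 1:].split()[0]
    return pid, comm, state


def blocked_tasks(proc_root="/proc"):
    """Return (pid, comm, wchan) for every process currently in state D."""
    found = []
    for entry in sorted(os.listdir(proc_root)):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError):
            continue  # process exited while we were scanning
        if state != "D":
            continue
        try:
            with open(os.path.join(proc_root, entry, "wchan")) as f:
                wchan = f.read().strip()
        except OSError:
            wchan = ""  # wchan is absent or not readable
        found.append((pid, comm, wchan))
    return found
```

On a machine in the state described above, one would expect both the sync process and gam_server to show up in the result, though the wait channel names will depend on the kernel version.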

Comment 16 Jeff Layton 2008-11-10 15:29:08 UTC
Pieter, does the machine that you're having problems powering off act as an NFS client?

Comment 17 Jeff Layton 2008-11-10 15:36:17 UTC
I'll go ahead and grab this for now since it looks like it's an RPC/NFS problem of some sort (at least mine is).

Comment 18 Jeff Layton 2008-11-10 17:41:12 UTC
Correction...

gam_server looks like it's doing the last mntput() and that's causing the sb to get torn down. In the process, we're returning delegations, so the delegreturn thread was spawned. I think it's trying to do RPC calls, but since the network is down, they're not going to make it.

The part I'm not sure about -- did the gam_server process get signaled during the netfs shutdown? It seems like the umount should have had to wait until the process was actually dead.
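
One thing that could be checked from the debugging shell is whether any NFS superblocks are still mounted at the point where the network has already gone down. A minimal sketch, assuming the usual /proc/mounts layout (device, mount point, filesystem type, options, ...). The function names are made up for illustration, and note that a mount pinned only by a process that is about to do the last mntput() may no longer be listed here.

```python
NFS_TYPES = {"nfs", "nfs4"}


def nfs_mounts(mounts_text):
    """Return (device, mount_point, fstype) for each NFS entry in /proc/mounts text."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Field 3 is the filesystem type. Spaces inside paths are escaped
        # as \040 in /proc/mounts, so a whitespace split is safe.
        if len(fields) >= 3 and fields[2] in NFS_TYPES:
            result.append((fields[0], fields[1], fields[2]))
    return result


def read_nfs_mounts(path="/proc/mounts"):
    """Read the mount table from `path` and return its NFS entries."""
    with open(path) as f:
        return nfs_mounts(f.read())
```

If `read_nfs_mounts()` returns anything after the interfaces are torn down, any RPC issued against those mounts (such as a delegation return) has no way to complete.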

Comment 19 Bug Zapper 2008-11-26 10:58:06 UTC
This message is a reminder that Fedora 8 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 8.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '8'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 8's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 8 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 20 Bug Zapper 2009-01-09 07:50:51 UTC
Fedora 8 changed to end-of-life (EOL) status on 2009-01-07. Fedora 8 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.