Red Hat Bugzilla – Bug 454193
hung sync() syscall on box shutdown due to stuck NFS4 mount
Last modified: 2014-06-16 07:28:23 EDT
My wife's machine currently runs Fedora 8. Ever since patching the kernel to
2.6.25, it no longer powers off at shutdown. It just hangs with the message
"Halting system...". Reverting the kernel to 2.6.24 seems to make the problem go away.
I instrumented the "halt" init script and I can see that it's calling:
/sbin/halt -d -p
...it's just not actually powering off.
Created attachment 311093 [details]
output from dmesg
Created attachment 311094 [details]
Created attachment 311096 [details]
...let me know if you need any other info or need me to test patches. The
machine seems to reliably fail to power off.
It also hangs when rebooting.
Confirmed on two machines based on MSI KT800 motherboards. Exactly the same behaviour
after upgrading to the 2.6.25 line.
This problem also exists on a Tyan S2885 Thunder K8W Mainboard.
Created attachment 314868 [details]
[Tyan Thunder K8W] dmesg output
Created attachment 314869 [details]
[Tyan Thunder K8W] lspci -vvv output
Created attachment 314870 [details]
[Tyan Thunder K8W] dmidecode output
I think the problem in my case may be a buggy DSDT. Link to interesting post on the Gentoo forums about this:
...in my case, I was able to fix this on my wife's machine by adding this to the kernel command line:
acpi_os_name="Microsoft Windows XP"
I haven't tried any of the other methods of forcing the kernel to use a fixed DSDT, however.
The acpi_os_name change didn't fix it after all. For some reason, the machine successfully powered off a couple of times after I added it and then went back to hanging at shutdown.
Still, there definitely seems to be something motherboard specific here, and that probably means that the DSDT is broken.
I just tried kernel-126.96.36.199-14.fc8.i686 on this machine and it also didn't power off at shutdown. Whatever it is seems to be affecting kernels since 2.6.24. It also seems to occasionally work with these later kernels, but shutdown often hangs.
Finally, I have another machine with the exact same motherboard (same BIOS rev and everything) that's running x86_64 F9. This machine never has a problem shutting down. I don't think the shutdown procedure has changed appreciably between F8 and F9, so I think the relevant thing is a 2.6.25+ kernel on i686.
I may have found a workaround. If I change the "halt" script to add -n to the halt args, then the power off seems to generally work. Perhaps something is happening to make the sync call in the poweroff hang? I'll need to experiment a bit more. Perhaps I can strace it...
strace shows that the halt command calls sync(). That syscall hangs indefinitely and prevents the poweroff from reliably occurring on this machine. I tried to gather some sysrq-t data at that point, but it didn't display anything on the screen.
If anyone knows of a way to determine why sync() is hanging at this point, let me know and I'll try to gather it.
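One way to confirm a hang like this without losing the shell is to run the suspect command in the background and poll for completion. The sketch below is only illustrative: the helper name run_with_deadline and the marker-file approach are assumptions, not anything taken from the halt script. It polls instead of trying to kill the command, since a process blocked inside a syscall such as sync() may not respond to signals.

```shell
# Hypothetical helper: run a command in the background and report whether it
# finished within a deadline given in seconds. Nothing is killed on timeout.
run_with_deadline() {
    rwd_deadline=$1
    shift
    rwd_dir=$(mktemp -d) || return 2
    # The subshell creates a marker file once the command returns,
    # whatever its exit status was.
    ( "$@" || :; : > "$rwd_dir/done" ) >/dev/null 2>&1 &
    rwd_waited=0
    while [ ! -e "$rwd_dir/done" ]; do
        if [ "$rwd_waited" -ge "$rwd_deadline" ]; then
            echo "still running after ${rwd_deadline}s: $*"
            return 1
        fi
        sleep 1
        rwd_waited=$((rwd_waited + 1))
    done
    rm -rf "$rwd_dir"
    echo "finished within ${rwd_deadline}s: $*"
}

# Example (not run here): run_with_deadline 30 sync
```

A non-zero return only says the command had not returned by the deadline; it does not say why.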
Created attachment 323078 [details]
dmesg -- sysrq-t after sync command is hung on shutdown
I hacked up the /etc/init.d/halt script to drop to a shell just before it would ordinarily power off the computer. Just after it did this, I ran:
# sync &
...ps then showed that the sync command hung. I was then able to fire off sysrq-t a couple of times, remount / rw, and collect the contents of dmesg.
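For anyone repeating this, the collection step can be sketched as a small helper. The function name dump_tasks and the default output path are illustrative assumptions. /proc/sysrq-trigger is the standard interface for triggering SysRq functions from a shell, and writing t to it requests the same task-state dump as the sysrq-t key combination. It needs root, and the output path must be on a writable filesystem.

```shell
# Hypothetical helper: request a dump of all task states and save the
# kernel log. Both arguments are optional.
dump_tasks() {
    dt_trigger=${1:-/proc/sysrq-trigger}   # SysRq control file
    dt_out=${2:-/root/sysrq-t.txt}         # illustrative output path
    echo t > "$dt_trigger" || return 1     # 't' = show task states
    dmesg > "$dt_out"                      # task dump lands in the kernel log
}

# Example (not run here), after remounting / read-write:
#   sync &
#   dump_tasks
```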
It shows that sync() is hung waiting for the s_umount semaphore. That seems to be held by the gam_server process which is hung waiting for an RPC call to come back.
It's not ever going to come back, however, because the network interfaces have already been torn down.
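A quicker first look than a full sysrq-t dump is to list the tasks in uninterruptible sleep together with the kernel symbol they are waiting in. This is a sketch assuming a procps-style ps; the helper names are made up for illustration, and the wait channel reported by ps is less detailed than the stack traces in the sysrq-t output.

```shell
# Keep the header line plus any row whose STAT column starts with D
# (uninterruptible sleep). Expects "ps -eo pid,stat,wchan,comm"-style input.
filter_blocked() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Hypothetical wrapper: list blocked tasks on the running system.
list_blocked_tasks() {
    ps -eo pid,stat,wchan:32,comm | filter_blocked
}
```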
Pieter, does the machine that you're having problems powering off act as an NFS client?
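Whether a machine is acting as an NFS client can be checked from the mount table. A minimal sketch, assuming the usual /proc/mounts layout (device, mount point, filesystem type, options); the function name nfs_mounts is illustrative.

```shell
# Print the mount point and type of every NFS mount listed in a mounts file.
# Defaults to /proc/mounts when no file is given.
nfs_mounts() {
    awk '$3 == "nfs" || $3 == "nfs4" { print $2, $3 }' "${1:-/proc/mounts}"
}

# Example: nfs_mounts
```

Empty output means no NFS or NFSv4 filesystems are currently mounted.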
I'll go ahead and grab this for now since it looks like it's an RPC/NFS problem of some sort (at least mine is).
gam_server looks like it's doing the last mntput() and that's causing the sb to get torn down. In the process, we're returning delegations, so the delegreturn thread was spawned. I think it's trying to do RPC calls, but since the network is down, they're not going to make it.
The part I'm not sure about -- did the gam_server process get signaled during the netfs shutdown? It seems like the umount should have had to wait until the process was actually dead.
This message is a reminder that Fedora 8 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 8. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora
'version' of '8'.
Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version prior to Fedora 8's end of life.
Bug Reporter: Thank you for reporting this issue and we are sorry that
we may not be able to fix it before Fedora 8 is end of life. If you
would still like to see this bug fixed and are able to reproduce it
against a later version of Fedora please change the 'version' of this
bug to the applicable version. If you are unable to change the version,
please add a comment here and someone will do it for you.
Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.
The process we are following is described here:
Fedora 8 changed to end-of-life (EOL) status on 2009-01-07. Fedora 8 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.
If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version.
Thank you for reporting this bug and we are sorry it could not be fixed.