Red Hat Bugzilla – Bug 828823
NFS unreliable on F17 kernel on client with NFS root
Last modified: 2013-07-04 10:42:32 EDT
Description of problem:
This problem happens to a diskless client with NFS root, which I think is related to the problem. (I have the boot option root=nfs:172.17.0.1:/remote/pluto.) It started after I upgraded the client from F15 to F17. The original symptom was that it failed to boot with the new kernel. Systemd fails early on, complaining that a lot of things like "media.mount" fail, and after a little while it hangs.
While investigating the problem, I found that if I boot directly to a shell and try to mount a tmpfs filesystem on /media, that command hangs with the new kernel. With the old F15 kernel, but otherwise the same system, it returns as expected.
I've also occasionally had "ls -l /media" hang before anything is mounted. But that problem is not completely reproducible; it doesn't happen all the time.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Boot with init=/usr/bin/bash
2. At the shell prompt: mount -t tmpfs tmpfs /media
Actual results:
I don't get a new prompt.
Expected results:
New prompt, and a tmpfs mounted on /media
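Because the failing command never returns, it helps to run it under a timeout when testing several kernels. The following is only a sketch, assuming Python is available on the client; the helper name run_with_timeout and the 30 second limit are made up for illustration and are not part of the original report.

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd and classify the outcome as "ok", "failed", or "hung"."""
    proc = subprocess.Popen(cmd)
    try:
        returncode = proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Request a kill, but do not wait for the child again: a process
        # stuck in uninterruptible sleep may never actually exit.
        proc.kill()
        return "hung"
    return "ok" if returncode == 0 else "failed"
```

Run as root, run_with_timeout(["mount", "-t", "tmpfs", "tmpfs", "/media"], 30) would be expected to return "hung" on the affected kernel and "ok" on the old F15 kernel, if the behavior matches what is described above.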
I've investigated this a bit more, using the newer 3.4.0-1 kernel. The behavior is not completely reproducible: sometimes it works better, other times it fails earlier. But NFS certainly appears to be unreliable with the new kernel.
It seems there are problems if either the client or the server has a new kernel. If both the server and the client run a 3.4.0-1 kernel there appear to be slightly fewer problems, but it still fails.
If it doesn't fail directly when mounting as I described in comment 0, the accesses to files are unreliable. For example, I did an "rpm -Va" on the client in one case where I managed to boot it. Several files were indicated to have the wrong MD5 sum. When I did a chroot to the client's root on the server machine and did the same thing there, the same files checked fine.
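The same client-versus-server comparison can be done without rpm by hashing a set of files on each machine and comparing the results. This is a rough sketch, not the procedure actually used in this report; the function names sha1_of and mismatches and the manifest layout (a dict mapping path to hex digest) are assumptions made for the example.

```python
import hashlib


def sha1_of(path, chunk_size=1 << 20):
    """Return the SHA-1 hex digest of a file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mismatches(server_manifest, client_manifest):
    """Return sorted paths whose digests differ or exist on one side only."""
    paths = set(server_manifest) | set(client_manifest)
    return sorted(
        p for p in paths if server_manifest.get(p) != client_manifest.get(p)
    )
```

Each manifest would be built on its own machine, for example {p: sha1_of(p) for p in paths}, with the server reading the exported directory locally and the client reading the same files over NFS.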
I picked one of the files to investigate more closely. Running sha1sum on it gives different results on the server and the client. If I copy the incorrect file on the client and then compare the original and the copy on the server, I see that some apparently random parts of the file have been changed, mostly but not exclusively to zeroes. Around 5% of the file had changed.
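A comparison like the one described above (which parts changed, how much of the file, and how many changed bytes became zero) could be scripted roughly as follows. This is a sketch only; the names changed_ranges and summarize are invented for the example, and it assumes both copies fit in memory.

```python
def changed_ranges(good: bytes, bad: bytes):
    """Return half-open (start, end) ranges where the two buffers differ."""
    ranges = []
    start = None
    common = min(len(good), len(bad))
    for i in range(common):
        if good[i] != bad[i]:
            if start is None:
                start = i
        elif start is not None:
            ranges.append((start, i))
            start = None
    if start is not None:
        ranges.append((start, common))
    # A length mismatch is reported as one extra trailing range.
    if len(good) != len(bad):
        ranges.append((common, max(len(good), len(bad))))
    return ranges


def summarize(good: bytes, bad: bytes):
    """Summarize how much of the file changed and how much became zero."""
    ranges = changed_ranges(good, bad)
    changed = sum(end - start for start, end in ranges)
    zeroed = sum(1 for s, e in ranges for b in bad[s:e] if b == 0)
    total = max(len(good), len(bad))
    return {
        "ranges": ranges,
        "changed": changed,
        "zeroed": zeroed,
        "changed_fraction": changed / total if total else 0.0,
    }
```

Here good would be the file as read on the server and bad the copy made on the client, each loaded with open(path, "rb").read(). The start offsets of the ranges may also show whether the damage lines up with page or NFS read-size boundaries.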
I don't really know if this depends on the client using an NFS root. Obviously, that client is most dependent on NFS, so any problems will appear there first.
The most recent combination I happen to have which does work reliably for me is 3.2.7-1.bz795141.1.fc16.x86_64 on the server and 22.214.171.124-2.fc15.x86_64 on the client. I haven't found any immediate problems using those old kernels with the system otherwise running F17. But I guess it is just a matter of time before something actually needs features from a newer kernel.
There are a number of fixes queued up to NFS in the 3.4.x stable series. That might explain some of the issues you are seeing, but it's hard to say for sure. Perhaps the NFS team will know which questions to ask.
I did a quick test with the 3.5.0-2.fc17 kernel on the client, but it didn't boot. The server was still running the old kernel in this case. I didn't do any closer investigation this time.
I had an opportunity to upgrade the server-side kernel. After that, I made another test. Now I have 3.5.1-1.fc17.x86_64 on the server, and the even newer 3.5.2-1.fc17.x86_64 on the client. With that combination, the client actually did boot! :-)
I just did it some minutes ago, and I don't dare to trust the stability just yet. But there is hope.
This message is a reminder that Fedora 17 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 17. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora
'version' of '17'.
Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version prior to Fedora 17's end of life.
Bug Reporter: Thank you for reporting this issue and we are sorry that
we may not be able to fix it before Fedora 17 is end of life. If you
would still like to see this bug fixed and are able to reproduce it
against a later version of Fedora, you are encouraged to change the
'version' to a later Fedora version prior to Fedora 17's end of life.
Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.
This bug hasn't reappeared since my last comments. So I think it's safe to assume that the fixes mentioned in comment 2 solved the issue.