Bug 828823 - NFS unreliable on F17 kernel on client with NFS root
Status: CLOSED ERRATA
Product: Fedora
Classification: Fedora
Component: kernel
Version: 17
Hardware: x86_64 Linux
Priority: unspecified
Severity: unspecified
Assigned To: nfs-maint
QA Contact: Fedora Extras Quality Assurance
Reported: 2012-06-05 08:37 EDT by Göran Uddeborg
Modified: 2013-07-04 10:42 EDT
Doc Type: Bug Fix
Last Closed: 2013-07-04 10:42:32 EDT
Type: Bug
Description Göran Uddeborg 2012-06-05 08:37:06 EDT
Description of problem:
This problem happens on a diskless client with an NFS root; I suspect that is related to the problem.  (I have a boot option root=nfs:172.17.0.1:/remote/pluto.)  It started after I upgraded the client from F15 to F17.  The original symptom was that it failed to boot with the new kernel.  Systemd fails early on, complaining that a lot of things like "media.mount" fail, and after a little while it hangs.

While investigating the problem, I found that if I boot directly to a shell and try to mount a tmpfs filesystem on /media, that command hangs with the new kernel.  With the old F15 kernel, on an otherwise identical system, it returns as expected.

I've also occasionally had "ls -l /media" hang before anything is mounted.  But that problem is not completely reproducible; it doesn't happen all the time.

Version-Release number of selected component (if applicable):
kernel-2.6.43.5-2.fc15.x86_64 (works)
kernel-3.3.7-1.fc17.x86_64 (hangs)
util-linux-2.21.2-1.fc17.x86_64

How reproducible:
Every time

Steps to Reproduce:
1. Boot with init=/usr/bin/bash
2. At the shell prompt: mount -t tmpfs tmpfs /media
  
Actual results:
I don't get a new prompt.

Expected results:
New prompt, and a tmpfs mounted on /media
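
One way to tell a hang like this apart from an ordinary failure is to run the command under a timeout. The following is only a minimal Python sketch under stated assumptions: the helper name `probe` and the 10-second default are hypothetical, chosen for illustration, and it assumes a Python interpreter is reachable on the client.

```python
import subprocess
import time


def probe(cmd, timeout_s=10.0):
    """Run cmd and classify the outcome as "ok", "failed" or "hung".

    Hypothetical helper: the name and the default timeout are assumptions.
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        rc = proc.poll()  # None while the child is still running
        if rc is not None:
            return "ok" if rc == 0 else "failed"
        time.sleep(0.05)
    # Best effort only: a process stuck in uninterruptible sleep may not
    # react to the signal, so we deliberately do not wait for it afterwards.
    proc.kill()
    return "hung"


# Example call for the reproduction step above (run as root on the client):
#   probe(["mount", "-t", "tmpfs", "tmpfs", "/media"])
```

The polling loop is used instead of `subprocess.run(..., timeout=...)` because `run` waits for the child after killing it, which could itself block if the child cannot be killed.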
Comment 1 Göran Uddeborg 2012-06-16 17:44:53 EDT
I've investigated this a bit more, using the newer 3.4.0-1 kernel.  The behavior is not completely reproducible; sometimes it works better, other times it fails earlier.  But NFS certainly appears to be unreliable with the new kernel.

It seems there are problems if either the client or the server has a new kernel.  If both the server and the client run a 3.4.0-1 kernel, there appear to be slightly fewer problems, but it still fails.

If it doesn't fail directly when mounting as I described in comment 0, accesses to files are unreliable.  For example, I did an "rpm -Va" on the client in one case where I managed to boot it.  Several files were reported to have the wrong MD5 sum.  When I did a chroot to the client's root on the server machine and ran the same check there, the same files checked out fine.

I picked one of the files to investigate more closely.  Running sha1sum on it gives different results on the server and the client.  If I copy the incorrect file on the client and then compare the original and the copy on the server, I see that some apparently random parts of the file have been changed, mostly but not exclusively to zeroes.  Around 5% of the file had changed.
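
The comparison described above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the procedure actually used: the function name `compare_copies` and the returned keys are hypothetical, and both files are read fully into memory, which only suits files of moderate size.

```python
import hashlib
from pathlib import Path


def compare_copies(good_path, bad_path):
    """Compare a known-good file with a suspect copy, byte by byte.

    Hypothetical helper: name and result keys are assumptions.
    """
    good = Path(good_path).read_bytes()
    bad = Path(bad_path).read_bytes()
    changed = 0  # positions where the two copies differ
    zeroed = 0   # differing positions where the suspect copy holds a zero byte
    for g, b in zip(good, bad):  # compares only the common prefix
        if g != b:
            changed += 1
            if b == 0:
                zeroed += 1
    common = min(len(good), len(bad))
    return {
        "sha1_good": hashlib.sha1(good).hexdigest(),
        "sha1_bad": hashlib.sha1(bad).hexdigest(),
        "changed_bytes": changed,
        "zeroed_bytes": zeroed,
        "changed_fraction": changed / common if common else 0.0,
    }
```

Run on the server against the original and the copy made on the client, `changed_fraction` would correspond to the share of the file that differs, and `zeroed_bytes` to how much of that difference is zero-filled.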

I don't really know if this depends on the client using an NFS root.  Obviously, that client is most dependent on NFS, so any problems will appear there first.

The most recent combination I happen to have which does work reliably for me is 3.2.7-1.bz795141.1.fc16.x86_64 on the server and 2.6.43.5-2.fc15.x86_64 on the client.  I haven't found any immediate problems using those old kernels with the system otherwise running F17.  But I guess it is just a matter of time before something actually needs features from a newer kernel.
Comment 2 Josh Boyer 2012-07-05 11:50:22 EDT
There are a number of fixes queued up to NFS in the 3.4.x stable series.  That might explain some of the issues you are seeing, but it's hard to say for sure.  Perhaps the NFS team will know which questions to ask.
Comment 3 Göran Uddeborg 2012-08-07 09:13:54 EDT
I did a quick test with the 3.5.0-2.fc17 kernel on the client, but it didn't boot.  The server was still running the old kernel in this case.  I didn't do any closer investigation this time.
Comment 4 Göran Uddeborg 2012-08-20 12:39:30 EDT
I had an opportunity to upgrade the server-side kernel.  After that, I made another test.  Now I have 3.5.1-1.fc17.x86_64 on the server, and the even newer 3.5.2-1.fc17.x86_64 on the client.  With that combination, the client actually did boot! :-)

I just did it some minutes ago, and I don't dare to trust the stability just yet.  But there is hope.
Comment 5 Fedora End Of Life 2013-07-03 21:18:19 EDT
This message is a reminder that Fedora 17 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 17. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '17'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 17's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that
we may not be able to fix it before Fedora 17 is end of life. If you
would still like to see this bug fixed and are able to reproduce it
against a later version of Fedora, you are encouraged to change the
'version' to a later Fedora version prior to Fedora 17's end of life.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.
Comment 6 Göran Uddeborg 2013-07-04 10:42:32 EDT
This bug hasn't reappeared since my last comments.  So I think it's safe to assume that the fixes mentioned in comment 2 solved the issue.
