Red Hat Bugzilla – Bug 377661
nfs and autofs in new kernel broken
Last modified: 2007-11-30 10:54:26 EST
With the last upgrade to 5.1 the following problem happens:
We have a setup with autofs-mounted filesystems. There is also management stuff
on NFS which is accessed at boot time. Now, after the upgrade, every FIRST access
to such a share is broken. The second access works as expected.
We debugged the problem and found that it happens only when
booting the new kernel. When booting the old kernel with all other packages
upgraded, the bug does not happen.
Version-Release number of selected component (if applicable):
- Set up a system with Red Hat EL5.1
- Use an autofs map to mount an NFS share from anywhere (/net is also affected.)
- Run ls on such a mounted path twice, one after another
The first ls fails but the second lists the files.
Both ls runs should list the same files.
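The reproduction steps above can be sketched as a small script. This is only an illustration under assumptions, not the reporter's actual test: the function name `list_twice` is made up for this example, and the path to check is whatever automounted path you pass on the command line.

```python
import os
import sys


def list_twice(path):
    """List `path` twice in a row and return both outcomes.

    Each outcome is either a sorted list of entry names or the OSError
    raised by that attempt, so a failing first access is recorded
    instead of aborting the script.
    """
    results = []
    for _ in range(2):
        try:
            results.append(sorted(os.listdir(path)))
        except OSError as exc:
            results.append(exc)
    return results


if __name__ == "__main__":
    # Pass one or more automounted paths as arguments.
    for path in sys.argv[1:]:
        first, second = list_twice(path)
        print(path)
        print("  first access: ", first)
        print("  second access:", second)
        # Expected behaviour: both attempts succeed and list the same files.
        print("  both succeeded and match:",
              isinstance(first, list) and first == second)
```

On an affected kernel, the report suggests the first outcome would be an error and the second a file list; on a fixed kernel both should be identical lists.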
Please note: This is urgent, as all management tools fail because they first need a
running autofs path! So the bug has the same severity as the broken PVM on
WS3 years ago (Bug 191841)!
Sounds more like a problem with autofs. I see that a patch went into -55.el5 to
fix a similar problem. You may want to try that kernel and see if this is
Well, not sure. It seems to be an NFS problem. I use CFS, which uses NFS to
localhost, and this is also a bit broken.
So yes, on a plain Red Hat machine it only happens with autofs.
But no, it is an NFS issue, not an autofs problem, as also other nfs tools are
I'm also using autofs to access a central NFS server here and it fails on every
first try.
I'll give the -55.el5 kernel a try and confirm whether it works after that (I guess
you're talking about your test kernel packages available at
http://people.redhat.com/jlayton/ ?)
Btw, if that kernel corrects the problem, how long will we have to wait for an
'official' kernel to be released/available ?
Related to Bug #371341. This is indeed a serious problem.
I ran a test with the -56.el5 kernel. It fixed the problem for me both in i686
Ok, I've updated both kernel and nfs-utils on my test server and it solves the
problem ... so the question is: when will a newer kernel be available in RHN,
because I don't want to roll out this 'test' kernel on every server/workstation here ...
As Akemi mentioned, bug https://bugzilla.redhat.com/show_bug.cgi?id=371341 is
the same and can be considered a duplicate ... but I guess more and more
people will complain about autofs failing, especially people relying on
autofs to mount home directories ...
Update: the test kernel 2.6.18-56 available at http://people.redhat.com/jlayton/
solves the problem but brings another one: if the machine is itself an NFS
server, the kernel panics as soon as the nfs service is launched!
I confirm the note by Fabian. If I turn nfs on, reboot results in kernel panic.
The summary is wrong: not only autofs is broken. Third-party software
using the NFS stack is also broken (CFS, for example).
I was able to reproduce the panic when starting the nfs service. The problem was
the patch that queued rpc connection attempts to the rpciod workqueue. I've got
an additional patch that seems to fix it and am building a new set of test
kernels with that now.
That problem seems unrelated to the original problem reported here though. I
think the original issue was fixed in -55.el5 (bug 354621).
I am not able to view bug 354621 as I do not have the permissions. So I
cannot say anything about that bug.
(In reply to comment #11)
> That problem seems unrelated to the original problem reported here though. I
> think the original issue was fixed in -55.el5 (bug 354621).
Which patch(es) fixed this problem in -55.el5? Are they these?
linux-2.6-autofs4-fix-race-between-mount-and-expire.patch
linux-2.6-autofs4-fix-race-between-mount-and-expire-2.patch
That sounds right. Closing this as a dupe.
*** This bug has been marked as a duplicate of 354621 ***
(In reply to comment #13)
> (In reply to comment #11)
> > That problem seems unrelated to the original problem reported here though. I
> > think the original issue was fixed in -55.el5 (bug 354621).
> Which patch(es) fixed this problem in -55.el5? Are they these?
Actually, the second patch here contains the two corrections.
The first patch was already present.
(In reply to comment #15)
> (In reply to comment #13)
> > (In reply to comment #11)
> > linux-2.6-autofs4-fix-race-between-mount-and-expire.patch
> > linux-2.6-autofs4-fix-race-between-mount-and-expire-2.patch
> Actually, the second patch here contains the two corrections.
> The first patch was already present.
Thank you for the reply. In fact, I had figured that out from your comment in
Bug #371341. I rebuilt the kernel with that patch and it is working fine now.
Is it likely that this fix makes it into the next kernel update in 5.1?
A workaround for this is to set DEFAULT_BROWSE_MODE="yes" in /etc/sysconfig/autofs
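To check whether the workaround above is actually in place on a machine, something like the following could be used. It is a rough sketch: the helper name `browse_mode_enabled` is invented for this example, and it assumes the file consists of plain shell-style KEY=value lines.

```python
import os


def browse_mode_enabled(config_text):
    """Return True if the last DEFAULT_BROWSE_MODE assignment is "yes".

    Parses shell-style KEY=value lines; comments and blank lines are
    skipped, and a later assignment overrides an earlier one.
    """
    value = None
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, val = line.partition("=")
        if sep and key.strip() == "DEFAULT_BROWSE_MODE":
            # Drop surrounding quotes, compare case-insensitively.
            value = val.strip().strip("\"'").lower()
    return value == "yes"


if __name__ == "__main__":
    path = "/etc/sysconfig/autofs"  # file named in the workaround
    if os.path.exists(path):
        with open(path) as handle:
            print("browse mode enabled:", browse_mode_enabled(handle.read()))
    else:
        print(path, "not found")
```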
I am still not allowed to read the possible duplicate bug. So the problem is not
solved at all!
Note for Thorsten:
This workaround is only for autofs. Other (non-Red Hat) software which is also
affected by this bug is still problematic.
Ah, yes, another crosslink to the related service request: 1783044
(In reply to comment #17)
> A workaround for this is to set DEFAULT_BROWSE_MODE="yes" in /etc/sysconfig/autofs
I think the note in Comment #4 of Bug #371341 must be appended here as well:
This only works if your auto.home map explicitly
lists every entry, i.e., it does NOT use wildcards like
I tried the DEFAULT_BROWSE_MODE="yes" workaround and there's still a timing
issue with autofs performing the mount. Upon login I get an error window that says
~/.dmrc isn't readable/writable by the user. This may be gdm checking permissions
just before the mount occurs. Any fixes for this?
The kernels that contained the patch in question were under security embargo
last week. That embargo is now lifted. Could you test the kernels on his people
page and report back as to whether they fix the issue for you?
What do you mean by "security embargo"????? Does that mean that Red Hat
holds fixes back, allowing intruders to use the hole?
But another question: there is a new kernel erratum
(http://rhn.redhat.com/errata/RHSA-2007-0993.html) which should fix the problem
(information from Red Hat, given days ago in a telephone conference). But I cannot
find the number 377661 in the list of fixed bugs. 371341, which seems to be the
same bug, is also not listed there. So what is the difference between the official errata and the
From the -58.el5 changelog:
- [autofs4] fix race between mount and expire (Ian Kent)
...the same patch also seems to be in -53.1.4 as:
- [autofs4] fix race between mount and expire (Ian Kent)
...this is the patch that I suspect fixes this problem for autofs users.
Regrettably, I can't make bug 354621 public, but if you speak with your Red Hat
support contact they may be able to CC you on one of these bugs so that you can
read the details.
If you have a reproducible problem that doesn't involve autofs (like you hinted
in comment #3), then I suggest opening a new BZ for that.
I will note that mounting via nfs over the loopback is a known problematic
configuration and is not recommended.
For now, I'm going to close this bug as a duplicate of these other bugs. If
you're still able to reproduce the problem with autofs on one of the above
kernels, then please reopen this bug and we'll have a closer look.
*** This bug has been marked as a duplicate of 371341 ***
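For anyone wanting to check a batch of machines against the builds named in this thread, a rough sketch follows. It rests on assumptions: that -53.1.4 is the earliest build carrying the autofs4 patch (the changelog excerpts only say the patch "seems to be" in it), and that a plain numeric comparison of build numbers is meaningful. The names `FIRST_FIXED`, `el5_release_tuple` and `has_autofs_race_fix` are made up for this example. Kernels older than the 5.1 one may not show the bug at all, so a False result only means the patch is absent.

```python
import platform
import re

# Earliest build mentioned in this thread as carrying the autofs4 patch.
# Treated as an assumption, not a confirmed fact.
FIRST_FIXED = (53, 1, 4)


def el5_release_tuple(release):
    """Extract the build number, e.g. '2.6.18-53.1.4.el5' -> (53, 1, 4).

    Returns None if the string is not a 2.6.18 EL5 kernel release.
    """
    match = re.match(r"^2\.6\.18-(\d+(?:\.\d+)*)\.el5", release)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def has_autofs_race_fix(release):
    """Return True/False for EL5 2.6.18 kernels, None for anything else."""
    build = el5_release_tuple(release)
    if build is None:
        return None
    return build >= FIRST_FIXED


if __name__ == "__main__":
    running = platform.release()  # same string as `uname -r`
    print(running, "->", has_autofs_race_fix(running))
```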
The problem seems to be solved with 2.6.18-53.1.4.el5.
However, I understand that you cannot make the private bug 354621 public. Private
bugs in Bugzilla simply don't exist for other users, as they are not searchable
or checkable in any way. So don't link to such bugs. As for your suggestion
that the support team should CC me, they told me that they cannot, as they have
no (technical) rights to do that. But if a bug is closed and addressed by an
erratum, I would expect it to be listed in the erratum under "Bugs fixed". But
neither this bug nor 371341 is listed there.
Whether the problem with CFS is also fixed I still have to see. (CFS technically
uses NFS to localhost, and only to localhost. Note that any other handling
would be insecure and would defeat the purpose of CFS!)
Closing the bug is OK for me. I will post a new one if the latter problem still
exists. (371341 is not private, so this is also OK.)
Just so I know: you didn't answer my question about the "security embargo" above.
Can you please do that?
I believe this page from the fedora wiki explains what an embargoed security bug is:
...since the bug in question was not public we were barred from releasing the
fix until the agreed-upon date.