Bug 377661 - nfs and autofs in new kernel broken
Summary: nfs and autofs in new kernel broken
Keywords:
Status: CLOSED DUPLICATE of bug 371341
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.1
Hardware: All
OS: Linux
Priority: low
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Jeff Layton
QA Contact: Martin Jenner
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2007-11-12 12:45 UTC by Klaus Ethgen
Modified: 2007-11-30 15:54 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-11-30 11:50:48 UTC
Target Upstream Version:
Embargoed:


Description Klaus Ethgen 2007-11-12 12:45:33 UTC
With the last upgrade to 5.1 the following problem happens:
We have a setup with autofs-mounted filesystems. There is also management stuff
on NFS which is accessed at boot time. Now, after the upgrade, every FIRST access
to such a share is broken. The second access works as expected.

We debugged the problem and found that it happens only when booting the new
kernel. When booting the old kernel with all other packages upgraded, the bug
does not happen.

Version-Release number of selected component (if applicable):
2.6.18-53.el5

How reproducible:
- Set up a system with Red Hat EL 5.1
- Use an autofs map to mount an NFS share from anywhere (/net is also affected).
- Run ls on such a mounted path twice, one after another

Actual results:
The first ls fails but the second lists the files.

Expected results:
Both ls runs should list the same files.
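
The reproduction steps above can be sketched as a small script. This is only an illustration, not something posted in the report: the function names are made up here, and because the report does not say how the first ls fails, the sketch treats either an error or a listing that differs from the second attempt as the symptom.

```python
import os


def list_twice(path):
    """List `path` twice in a row and return one result per attempt.

    Each result is a sorted list of entry names, or the OSError
    raised by that attempt.
    """
    results = []
    for _ in range(2):
        try:
            results.append(sorted(os.listdir(path)))
        except OSError as exc:
            results.append(exc)
    return results


def first_access_broken(results):
    """True when only the first attempt misbehaved (hypothetical helper)."""
    first, second = results
    if isinstance(second, OSError):
        # The path is unavailable altogether, which is a different problem.
        return False
    return isinstance(first, OSError) or first != second
```

On an affected machine one would call `first_access_broken(list_twice("/net/server/export"))` right after boot, where `/net/server/export` stands in for any autofs-managed path that has not been mounted yet.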

Comment 1 Klaus Ethgen 2007-11-12 12:59:44 UTC
Please note: This is urgent, as all management tools fail because they first
need a running autofs path! So the bug has the same severity as the broken PVM
on WS3 years ago (Bug 191841)!

Comment 2 Jeff Layton 2007-11-12 15:47:21 UTC
Sounds more like a problem with autofs. I see that a patch went into -55.el5 to
fix a similar problem. You may want to try that kernel and see if this is
reproducible there.



Comment 3 Klaus Ethgen 2007-11-12 17:05:50 UTC
Well, not sure. It seems to be an NFS problem. I use CFS, which uses NFS to
localhost, and this is also a bit broken.

So yes, on a plain Red Hat machine it only happens with autofs.
But no, it is an NFS issue, not an autofs problem, as other NFS tools are also
affected.

Comment 4 Fabian Arrotin 2007-11-17 14:51:21 UTC
I'm also using autofs to access a central NFS server here and it fails on every
first try.
I'll give the -55.el5 kernel a try and confirm whether it's working after that
(I guess you're talking about your test kernel packages available at
http://people.redhat.com/jlayton/ ?)
Btw, if that kernel corrects the problem, how long will we have to wait for an
'official' kernel to be released/available?
Thanks

Comment 5 Akemi Yagi 2007-11-17 16:41:54 UTC
Related to Bug #371341. This is indeed a serious problem.

Comment 6 Akemi Yagi 2007-11-17 17:45:44 UTC
I ran a test with the -56.el5 kernel.  It fixed the problem for me on both i686
and x86_64.

Comment 7 Fabian Arrotin 2007-11-18 14:30:25 UTC
OK, I've updated both kernel and nfs-utils on my test server and it solves the
problem ... so the question is: when will a newer kernel be available in RHN?
I don't want to roll out this 'test' kernel on every server/workstation here ...
As Akemi mentioned, bug https://bugzilla.redhat.com/show_bug.cgi?id=371341 is
the same and this one can be considered a duplicate ... but I guess more and
more people will complain about autofs failing, especially people relying on
autofs to mount home directories ...

Comment 8 Fabian Arrotin 2007-11-18 14:39:57 UTC
Update: the 2.6.18-56 test kernel available at http://people.redhat.com/jlayton/
solves the problem but brings another one: if the machine is itself an NFS
server, the kernel panics as soon as the nfs service is launched!

Comment 9 Akemi Yagi 2007-11-18 14:44:12 UTC
I confirm the note by Fabian.  If I turn nfs on, a reboot results in a kernel panic.

Comment 10 Klaus Ethgen 2007-11-19 10:11:21 UTC
The Summary is wrong: not only autofs is broken. Third-party software using
the NFS stack is broken as well (CFS, for example).

Comment 11 Jeff Layton 2007-11-19 14:29:19 UTC
I was able to reproduce the panic when starting the nfs service. The problem was
the patch that queued rpc connection attempts to the rpciod workqueue. I've got
an additional patch that seems to fix it and am building a new set of test
kernels with that now. 

That problem seems unrelated to the original problem reported here though. I
think the original issue was fixed in -55.el5 (bug 354621).

Comment 12 Klaus Ethgen 2007-11-19 15:17:16 UTC
I am not able to view bug 354621, as I do not have the permissions. So I
cannot say anything about that bug.

Comment 13 Akemi Yagi 2007-11-19 16:29:39 UTC
(In reply to comment #11)

> That problem seems unrelated to the original problem reported here though. I
> think the original issue was fixed in -55.el5 (bug 354621).

Jeff,

Which patch(es) fixed this problem in -55.el5?  Are they these?

linux-2.6-autofs4-fix-race-between-mount-and-expire.patch
linux-2.6-autofs4-fix-race-between-mount-and-expire-2.patch



Comment 14 Jeff Layton 2007-11-19 17:08:01 UTC
That sounds right. Closing this as a dupe.

*** This bug has been marked as a duplicate of 354621 ***

Comment 15 Ian Kent 2007-11-21 01:28:02 UTC
(In reply to comment #13)
> (In reply to comment #11)
> 
> > That problem seems unrelated to the original problem reported here though. I
> > think the original issue was fixed in -55.el5 (bug 354621).
> 
> Jeff,
> 
> Which patch(es) fixed this problem in -55.el5?  Are they these?
> 
> linux-2.6-autofs4-fix-race-between-mount-and-expire.patch
> linux-2.6-autofs4-fix-race-between-mount-and-expire-2.patch

Actually, the second patch here contains the two corrections.
The first patch was already present.

Ian

Comment 16 Akemi Yagi 2007-11-21 02:16:30 UTC
(In reply to comment #15)
> (In reply to comment #13)
> > (In reply to comment #11)
> > linux-2.6-autofs4-fix-race-between-mount-and-expire.patch
> > linux-2.6-autofs4-fix-race-between-mount-and-expire-2.patch
> 
> Actually, the second patch here contains the two corrections.
> The first patch was already present.
> 
> Ian

Thank you for the reply.  In fact, I had figured that out from your comment in
Bug #371341.  I rebuilt the kernel with that patch and it is working fine now.
Is it likely that this fix makes it into the next kernel update in 5.1?

Akemi 


Comment 17 Thorsten Scherf 2007-11-21 13:29:55 UTC
A workaround for this is to set DEFAULT_BROWSE_MODE="yes" in /etc/sysconfig/autofs


Comment 18 Klaus Ethgen 2007-11-21 13:40:17 UTC
I am still not allowed to read the possible duplicate bug. So the problem is
not solved at all!

Comment 19 Klaus Ethgen 2007-11-21 13:42:24 UTC
Note for Thorsten:
This workaround is only for autofs. Other (non-Red Hat) software which is also
affected by this bug is still problematic.

Comment 20 Klaus Ethgen 2007-11-21 13:45:03 UTC
Ah, yes, another crosslink to the related service request: 1783044

Comment 21 Akemi Yagi 2007-11-21 15:13:15 UTC
(In reply to comment #17)
> workaround for this is to set DEFAULT_BROWSE_MODE="yes" in /etc/sysconfig/autofs

I think the note in Comment #4 of Bug #371341 must be appended here as well:
=============================================================
This only works if your auto.home map explicitly
lists every entry, i.e., it does NOT use wildcards like
  * server:/export/home/&
=============================================================
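
The restriction quoted above can be checked mechanically. The following is a minimal sketch, assuming a flat-file autofs map whose entries have the form `key [options] location` with `#` comments; the function name is hypothetical, and maps served from NIS or LDAP are not covered.

```python
def map_uses_wildcard(map_text):
    """Return True if any entry in a flat-file autofs map uses the '*' key."""
    for raw_line in map_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        # The key is the first whitespace-separated field of the entry.
        key = line.split(None, 1)[0]
        if key == "*":
            return True
    return False
```

If this returns True for the contents of a map such as auto.home, the DEFAULT_BROWSE_MODE="yes" workaround is, according to the note above, not expected to help for that map.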


Comment 22 sabuj pattanayek 2007-11-29 17:41:23 UTC
I tried the DEFAULT_BROWSE_MODE="yes" workaround and there's still a timing
issue with autofs performing the mount. Upon login I get an error window that
says ~/.dmrc isn't readable/writable by the user. This may be gdm checking
permissions just before the mount occurs. Any fixes for this?

Comment 23 Jeff Layton 2007-11-29 17:55:38 UTC
The kernels that contained the patch in question were under security embargo
last week. That embargo is now lifted. Could you test the kernels on the people
page below and report back as to whether they fix the issue for you?

http://people.redhat.com/dzickus/el5/58.el5/

Comment 24 Klaus Ethgen 2007-11-30 09:25:57 UTC
What do you mean by "security embargo"? Does that mean that Red Hat holds
fixes back, allowing intruders to use the hole?

But another question: there is a new kernel erratum
(http://rhn.redhat.com/errata/RHSA-2007-0993.html) which should fix the problem
(information from Red Hat, given days ago in a telephone conference). But I
cannot find the number 377661 in the list of fixed bugs. 371341, which seems to
be the same bug, is also not listed there. So what is the difference between
the official erratum and the 58.el5 kernel?

Comment 25 Jeff Layton 2007-11-30 11:50:48 UTC
From the -58.el5 changelog:

   - [autofs4] fix race between mount and expire (Ian Kent ) [354621]

...the same patch also seems to be in -53.1.4 as:

   - [autofs4] fix race between mount and expire (Ian Kent ) [381071]

...this is the patch that I suspect fixes this problem for autofs users.
Regrettably, I can't make bug 354621 public, but if you speak with your Red Hat
support contact they may be able to CC you on one of these bugs so that you can
read the details.
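
As a rough aid for checking a given machine, the kernel releases named in this thread can be turned into a small test. This is a sketch under assumptions, not an official statement: the thresholds (-53.1.4.el5 on the 53 branch, -55.el5 and later otherwise) are taken only from the comments here, and the function names are made up for illustration. Kernels older than -53.el5 also return False, even though the thread suggests they did not show the problem.

```python
import re


def el5_release_tuple(release):
    """Parse '2.6.18-53.1.4.el5' into (53, 1, 4); None if it does not match."""
    match = re.match(r"^2\.6\.18-(\d+(?:\.\d+)*)\.el5", release)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def has_autofs_race_fix(release):
    """Guess whether a RHEL 5 kernel release carries the autofs4 race fix.

    Assumption from this thread: fixed in 53.1.4 on the 53 branch and
    in -55.el5 and later.
    """
    parts = el5_release_tuple(release)
    if parts is None:
        return False
    if parts[0] == 53:
        return parts >= (53, 1, 4)
    return parts[0] >= 55
```

For the running kernel, the release string can be obtained with `platform.release()` from the Python standard library and passed to `has_autofs_race_fix`.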

If you have a reproducible problem that doesn't involve autofs (like you hinted
in comment #3), then I suggest opening a new BZ for that.

I will note that mounting via nfs over the loopback is a known problematic
configuration and is not recommended.

For now, I'm going to close this bug as a duplicate of these other bugs. If
you're still able to reproduce the problem with autofs on one of the above
kernels, then please reopen this bug and we'll have a closer look.


*** This bug has been marked as a duplicate of 371341 ***

Comment 26 Klaus Ethgen 2007-11-30 12:37:18 UTC
The problem seems to be solved with 2.6.18-53.1.4.el5.

However, I understand that you cannot make the private bug 354621 public.
Private bugs in Bugzilla simply don't exist for other users, as they cannot be
searched or checked in any way. So please don't link to such bugs. As for your
suggestion that the support team should CC me: they told me that they cannot,
as they have no (technical) rights to do that. But if a bug is closed and
addressed by an erratum, I would expect it to be listed in the erratum under
"Bugs fixed". But neither this bug nor 371341 is listed in the erratum.

Whether the problem with CFS is also fixed, I still have to see. (CFS
technically uses NFS to localhost and only to localhost. Note that any other
setup would be insecure and would defeat the purpose of CFS!)

Closing the bug is OK for me. I will post a new one if the latter problem still
exists. (371341 is not private, so this is also OK.)

Comment 27 Klaus Ethgen 2007-11-30 12:38:29 UTC
Just so I know: you didn't answer my question about the "security embargo"
above. Can you please do that?

Comment 28 Jeff Layton 2007-11-30 12:45:59 UTC
I believe this page from the Fedora wiki explains what an embargoed security bug is:

http://fedoraproject.org/wiki/Security/Bugs#head-0b84564dbeb494452afe89f812d3e78112a6e82e

...since the bug in question was not public we were barred from releasing the
fix until the agreed-upon date.


