Since the upgrade to 5.1, the following problem occurs. We have a setup with autofs-mounted filesystems, and some management tooling lives on NFS and is accessed at boot time. Since the upgrade, every FIRST access to such a share fails; the second access works as expected. While debugging, we found that the problem only occurs when booting the new kernel. When we boot the old kernel with all other packages upgraded, the bug does not occur.

Version-Release number of selected component (if applicable):
2.6.18-53.el5

How reproducible:
- Set up a system with Red Hat EL 5.1.
- Use an autofs map to mount an NFS share from anywhere (/net is also affected).
- Run ls on such a mounted path twice in a row.

Actual results:
The first ls fails, but the second lists the files.

Expected results:
Both ls commands should list the same files.
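As a rough aid for checking whether a given kernel is affected, the reproduction steps can be scripted. This is a minimal sketch, assuming Python 3 is available on the client. os.listdir stands in for ls, and the default path /net/nfsserver/export is a hypothetical placeholder for a real autofs-managed path. The script lists the path twice and reports whether the first attempt failed while the second succeeded.

```python
import os
import sys


def list_twice(path):
    """Call os.listdir(path) twice in a row and return both outcomes.

    Each outcome is either the sorted list of entries or the OSError raised.
    On an affected system the first attempt on a not-yet-mounted autofs
    path is expected to fail while the second one succeeds.
    """
    results = []
    for _ in range(2):
        try:
            results.append(sorted(os.listdir(path)))
        except OSError as exc:
            results.append(exc)
    return results


def is_affected(results):
    """True if the first listing failed but the second succeeded."""
    first, second = results
    return isinstance(first, OSError) and not isinstance(second, OSError)


if __name__ == "__main__":
    # Hypothetical default; pass the real autofs path as the first argument.
    target = sys.argv[1] if len(sys.argv) > 1 else "/net/nfsserver/export"
    outcome = list_twice(target)
    for attempt, result in enumerate(outcome, start=1):
        print("attempt %d: %r" % (attempt, result))
    if is_affected(outcome):
        print("first-access failure reproduced")
    else:
        print("no first-access failure")
```

Point it at a path that autofs has not mounted yet (for example right after boot, or after the mount has expired). If the share is already mounted, the first access will not trigger a mount and the check says nothing about the bug.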
Please note: this is urgent, because all of our management tools fail, since they first need a working autofs path! This bug therefore has the same severity as the broken PVM on WS3 years ago (Bug 191841)!
Sounds more like a problem with autofs. I see that a patch went into -55.el5 to fix a similar problem. You may want to try that kernel and see if this is reproducible there.
Well, I'm not sure. It seems to be an NFS problem. I use CFS, which uses NFS to localhost, and that is also partly broken. So yes, on a plain Red Hat machine it only shows up with autofs. But no, it is an NFS issue rather than an autofs problem, since other NFS-based tools are affected as well.
I'm also using autofs to access a central NFS server here, and it fails on every first try. I'll give the -55.el5 kernel a try and confirm whether it works afterwards (I guess you mean your test kernel packages available at http://people.redhat.com/jlayton/ ?). By the way, if that kernel corrects the problem, how long will we have to wait for an 'official' kernel to be released? Thanks
Related to Bug #371341. This is indeed a serious problem.
I ran a test with the -56.el5 kernel. It fixed the problem for me on both i686 and x86_64.
OK, I've updated both the kernel and nfs-utils on my test server, and that solves the problem. So the question is: when will a newer kernel be available in RHN? I don't want to roll out this 'test' kernel to every server and workstation here. As Akemi mentioned, bug https://bugzilla.redhat.com/show_bug.cgi?id=371341 is the same issue and can be considered a duplicate. I expect more and more people will complain about autofs failing, especially those relying on autofs to mount home directories.
Update: the 2.6.18-56 test kernel available at http://people.redhat.com/jlayton/ solves the problem but introduces another one: if the machine is itself an NFS server, the kernel panics as soon as the nfs service is started!
I can confirm Fabian's note. If I enable the nfs service, rebooting results in a kernel panic.
The summary is wrong: autofs is not the only thing that is broken. Third-party software using the NFS stack is broken too (CFS, for example).
I was able to reproduce the panic when starting the nfs service. The problem was the patch that queued rpc connection attempts to the rpciod workqueue. I've got an additional patch that seems to fix it and am building a new set of test kernels with that now. That problem seems unrelated to the original problem reported here though. I think the original issue was fixed in -55.el5 (bug 354621).
I am not able to view bug 354621, as I don't have the permissions, so I cannot say anything about that bug.
(In reply to comment #11)
> That problem seems unrelated to the original problem reported here though. I
> think the original issue was fixed in -55.el5 (bug 354621).

Jeff,

Which patch(es) fixed this problem in -55.el5? Are they these?

linux-2.6-autofs4-fix-race-between-mount-and-expire.patch
linux-2.6-autofs4-fix-race-between-mount-and-expire-2.patch
That sounds right. Closing this as a dupe. *** This bug has been marked as a duplicate of 354621 ***
(In reply to comment #13)
> Which patch(es) fixed this problem in -55.el5? Are they these?
>
> linux-2.6-autofs4-fix-race-between-mount-and-expire.patch
> linux-2.6-autofs4-fix-race-between-mount-and-expire-2.patch

Actually, the second patch here contains the two corrections. The first patch was already present.

Ian
(In reply to comment #15)
> > linux-2.6-autofs4-fix-race-between-mount-and-expire.patch
> > linux-2.6-autofs4-fix-race-between-mount-and-expire-2.patch
>
> Actually, the second patch here contains the two corrections.
> The first patch was already present.
>
> Ian

Thank you for the reply. In fact, I had already figured that out from your comment in Bug #371341. I rebuilt the kernel with that patch, and it is working fine now. Is this fix likely to make it into the next 5.1 kernel update?

Akemi
A workaround for this is to set DEFAULT_BROWSE_MODE="yes" in /etc/sysconfig/autofs.
I am still not allowed to read the possible duplicate bug, so for me the problem is not solved at all!
Note for Thorsten: this workaround only applies to autofs. Other (non-Red Hat) software that is also affected by this bug still has problems.
Ah, yes, and here is a cross-link to the related service request: 1783044
(In reply to comment #17)
> workaround for this is to set DEFAULT_BROWSE_MODE="yes" in /etc/sysconfig/autofs

I think the note in Comment #4 of Bug #371341 should be repeated here as well:

=============================================================
This only works if your auto.home map explicitly lists every entry, i.e., it does NOT use wildcards like

* server:/export/home/&
=============================================================
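Before relying on the browse-mode workaround, it may help to check whether a map uses a wildcard key at all. The sketch below is a hypothetical helper (not part of autofs), written in Python under a simplifying assumption about the map format: one entry per line, '#' starts a comment, and backslash continuations and '+' map includes are not handled. It flags entries whose key is '*', such as the `* server:/export/home/&` line quoted above.

```python
def find_wildcard_entries(map_text):
    """Return (line_number, line) pairs whose map key is the '*' wildcard.

    Simplified parser (assumption): one entry per line, '#' starts a
    comment, no backslash continuations or '+' includes are interpreted.
    """
    hits = []
    for lineno, raw in enumerate(map_text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        key = line.split()[0]  # the first field of an entry is its key
        if key == "*":
            hits.append((lineno, raw.strip()))
    return hits


def browse_mode_workaround_applies(map_text):
    """Per the note above, browse mode only helps when no wildcard key is used."""
    return not find_wildcard_entries(map_text)
```

For a map that lists `alice server:/export/home/alice` and `bob server:/export/home/bob` explicitly, the helper finds no wildcard entries, so the workaround should apply; a map containing `* server:/export/home/&` is reported instead.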
I tried the DEFAULT_BROWSE_MODE="yes" workaround, and there's still a timing issue with autofs performing the mount. Upon login I get an error window saying that ~/.dmrc isn't readable/writable by the user. This may be gdm checking permissions just before the mount occurs. Any fixes for this?
The kernels that contained the patch in question were under a security embargo last week. That embargo has now been lifted. Could you test the kernels on his people page, http://people.redhat.com/dzickus/el5/58.el5/, and report back on whether they fix the issue for you?
What do you mean by "security embargo"?! Does that mean Red Hat holds fixes back, allowing intruders to exploit the hole?

Another question: there is a new kernel erratum (http://rhn.redhat.com/errata/RHSA-2007-0993.html) that should fix the problem (according to information from Red Hat in a telephone conference a few days ago). But I cannot find the number 377661 in its list of fixed bugs, and 371341, which seems to be the same bug, is not listed there either. So what is the difference between the official erratum and the -58.el5 kernel?
From the -58.el5 changelog:

- [autofs4] fix race between mount and expire (Ian Kent) [354621]

...the same patch also seems to be in -53.1.4 as:

- [autofs4] fix race between mount and expire (Ian Kent) [381071]

...this is the patch that I suspect fixes this problem for autofs users. Regrettably, I can't make bug 354621 public, but if you speak with your Red Hat support contact, they may be able to CC you on one of these bugs so that you can read the details.

If you have a reproducible problem that doesn't involve autofs (as you hinted in comment #3), then I suggest opening a new BZ for that. I will note that mounting via NFS over the loopback is a known problematic configuration and is not recommended.

For now, I'm going to close this bug as a duplicate of these other bugs. If you're still able to reproduce the problem with autofs on one of the above kernels, then please reopen this bug and we'll have a closer look.

*** This bug has been marked as a duplicate of 371341 ***
The problem seems to be solved with 2.6.18-53.1.4.el5. I understand that you cannot make the private bug 354621 public. However, private bugs in Bugzilla simply don't exist for other users: they cannot be searched or checked in any way. So please don't link to such bugs. As for your suggestion that the support team CC me, they told me they cannot, because they don't have the (technical) rights to do so.

Also, if a bug is closed and addressed by an erratum, I would expect it to be listed in the erratum under "Bugs fixed". But neither this bug nor 371341 is listed in the erratum.

Whether the problem with CFS is also fixed remains to be seen. (Technically, CFS uses NFS to localhost, and only to localhost. Any other setup would be insecure and would defeat the purpose of CFS!)

Closing the bug is OK with me. I will open a new one if the latter problem still exists. (371341 is not private, so that is also OK.)
Just so I know: you didn't answer my question about the "security embargo" above. Could you please do that?
I believe this page from the Fedora wiki explains what an embargoed security bug is: http://fedoraproject.org/wiki/Security/Bugs#head-0b84564dbeb494452afe89f812d3e78112a6e82e ...since the bug in question was not public, we were barred from releasing the fix until the agreed-upon date.