Bug 714023
| Summary: | Error in loading kernel module nfs using module-init-tools 3.13 or later | | |
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Edgar Hoch <edgar.hoch> |
| Component: | module-init-tools | Assignee: | Jon Masters <jonathan> |
| Status: | CLOSED ERRATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | urgent | Priority: | unspecified |
| Version: | 15 | CC: | aquini, gansalmon, itamar, jfeeney, jonathan, kernel-maint, madhu.chinakonda, m.a.young, schrett, sergio |
| Hardware: | Unspecified | OS: | Unspecified |
| Fixed In Version: | module-init-tools-3.16-2.fc15 | Doc Type: | Bug Fix |
| Last Closed: | 2011-07-26 03:37:48 UTC | | |
Description
Edgar Hoch
2011-06-17 06:40:36 UTC
I want to note that I have now rebooted the host with the old installation kernel, kernel-2.6.38.6-26.rc1.fc15.x86_64, and I am surprised that the same problem exists with this kernel.

For the installation I did a kickstart installation whose postinstall part installs an init script that starts after the reboot and performs further postinstallation tasks (updates, more package installations, configuration of name services, printers, local data partitions, NFS export, autofs, etc.). This script mounts a partition over NFS, and that worked without problems. The script then updated the kernel, installed another postinstall script for the next reboot, and rebooted to do the rest of the postinstall tasks with the current kernel. That kernel was kernel-2.6.38.8-32.fc15.x86_64, and it also mounted an NFS partition successfully. After these scripts had finished, the host was (automatically) rebooted again, and from then on the NFS problem occurred.

I use the same fstab and autofs* files as in Fedora 14, but I think this should not inhibit loading the nfs kernel module.

The insmod warnings about unknown symbols are expected, as insmod doesn't pull in dependent modules like modprobe does. The modprobe hang is the root problem that needs to be fixed. Does the box hang totally when this happens, or can you still use other ttys? If so, the output of dmesg after modprobe hangs might be useful.

The problem for me appears to be with the update to module-init-tools. Suddenly, modprobe is looking in /usr/local/lib/modprobe.d. So what happens is: autofs starts, and /usr/local is mounted via NFS by autofs. The nfs module hasn't been loaded yet, since all our NFS mounts are autofs controlled. So the first time anything tries to access any NFS automount, modprobe tries to load the nfs module and reads /usr/local/lib/modprobe.d, which is not yet available.
So everything hangs waiting for the nfs module to get loaded, which it can't, since it depends on an NFS mount. I'm not sure if the bug is autofs hanging when it should just silently fail, mount hanging due to the lack of the nfs module, or, most likely, modprobe, since it was the last change that broke things. I consider this a rather critical bug: it rendered all our systems inoperable and took many hours to track down. Stay out of /usr/local, you don't belong there! It's LOCAL...

I can confirm that we have the same situation as described in comment #3. Fedora 15 hangs only on our networked hosts that have /usr/local configured as an automounted NFS filesystem, not on standalone hosts like notebooks. This also explains why the kickstart installation described in comment #1 does not hang after the first and second reboot but only after the last one: /usr/local is only configured as an autofs NFS mount after the last reboot (changed in the scripts before the last reboot, but not activated).

And to answer the question in comment #2 (sorry for the delayed answer): the host is not totally hanging; I can log in using ssh. It seems sshd is started before autofs. But there is no graphical login (no gdm), and there is no vt* login. Could the reason be that these binaries have /usr/local/bin hardcoded in their PATH variable?

When I log in using ssh as root, ssh hangs. When I press Control-C twice, I get a bash shell prompt. If I then enter `export PATH=/sbin:/usr/sbin:/bin:/usr/bin`, search for processes that touch /usr/local (like ls, mount, etc.) and for modprobe processes, and kill them, I can enter `modprobe nfs`, and that succeeds without error. After that I can access /usr/local and the autofs NFS mount succeeds.

We know the Fedora people deem (is it the right word?) /usr/local to be a filesystem local to the host, not shared by NFS among a set of similar hosts. But I think the LSB is not clear in this case; that is another topic.
We have a long history with a shared /usr/local, from the days of SunOS and Solaris, and then used the same design on Linux. Much of our locally developed software is installed in /usr/local, without making packages and without the need to update every new piece of local software on about a hundred hosts. But there is another known problem: ssh login hangs when the NFS server for /usr/local is not available, because /usr/local/bin is hardcoded in the sshd and login binaries and is listed before /bin and /usr/bin. So we will consider changing our shared "local" directory to something else and letting /usr/local be a really local filesystem. It may be that we make a radical cut with Fedora 15 and stop NFS-mounting /usr/local, which means that some of our software will not work for some time until it is recompiled for the new location.

Is there a suggestion for where to mount such an (NFS) shared "/usr/local" partition? /usr/share has a name suggesting that it may be shared, but in reality it must be local because rpm packages install data in this directory. I looked at the LSB but found no suggestion.

Independent of the above: does anybody know why modprobe searches in /usr/local/lib/modprobe.d in Fedora 15? We had no such problems in Fedora 14 and earlier.

`-bash-4.2# strings - /sbin/modprobe | grep /usr/local`

`/usr/local/lib/modprobe.d`

My issue is fixed on kernel 2.6.38.8-35. I think it was:

* Wed Jul 06 2011 Chuck Ebbert <cebbert> 2.6.38.8-35
- Revert SCSI/block patches from 2.6.38.6 that caused more problems.

Please try it and report back.

Caused by: http://git.kernel.org/?p=utils/kernel/module-init-tools/module-init-tools.git;a=commitdiff_plain;h=9454d710137be3799f343cc9d0f833f0802e2111 (in v3.13)

module-init-tools is now looking in /usr/local/lib/modprobe.d when loading modules, which breaks horribly when we are trying to automount /usr/local via NFS.

*** Bug 710197 has been marked as a duplicate of this bug. ***

Can you test the build at http://koji.fedoraproject.org/koji/taskinfo?taskID=3208055 and check whether it resolves this issue?

(In reply to comment #8)
> Can you test the build at http://koji.fedoraproject.org/koji/taskinfo?taskID=3208055 and check whether it resolves this issue?

Yes, my system boots cleanly with that version of module-init-tools.

module-init-tools-3.16-2.fc15 has been submitted as an update for Fedora 15.
https://admin.fedoraproject.org/updates/module-init-tools-3.16-2.fc15

If people could test and give appropriate karma to the update, that would be great.

Thank you for creating the update! module-init-tools-3.16-2.fc15 from comment #10 works for me and solves the described problem!

Package module-init-tools-3.16-2.fc15:

* should fix your issue,
* was pushed to the Fedora 15 testing repository,
* should be available at your local mirror within two days.

Update it with:
`# su -c 'yum update --enablerepo=updates-testing module-init-tools-3.16-2.fc15'`
as soon as you are able to. Please go to the following url:
https://admin.fedoraproject.org/updates/module-init-tools-3.16-2.fc15
then log in and leave karma (feedback).

module-init-tools-3.16-2.fc15 has been pushed to the Fedora 15 stable repository. If problems still persist, please make note of it in this bug report.
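For anyone who wants to check whether a host is exposed to the hang described in this bug, here is a minimal shell sketch. It reads a mount table in /proc/mounts format and reports whether a given directory, such as /usr/local/lib/modprobe.d, sits on an autofs or NFS mount. The function names `fs_type_for_path` and `config_dir_needs_network` are made up for illustration and are not part of module-init-tools. The set of filesystem types treated as network-backed (autofs, nfs, nfs4) is also an assumption.

```shell
#!/bin/sh
# Hypothetical helpers, not part of module-init-tools.

# Print the filesystem type of the mount that contains a path.
#   $1: mount table in /proc/mounts format (device mountpoint fstype ...)
#   $2: absolute path to look up
# The longest matching mount point wins; on ties the later entry wins.
# Note: mount points with escaped characters (e.g. \040) are not decoded.
fs_type_for_path() {
    awk -v target="$2" '
        {
            mp = $2
            # A mount point matches if it is "/", equals the path,
            # or is a parent directory of the path.
            if (mp == "/")
                ok = 1
            else
                ok = (target == mp || index(target, mp "/") == 1)
            if (ok && length(mp) >= best_len) {
                best_len = length(mp)
                best_type = $3
            }
        }
        END { if (best_type != "") print best_type }
    ' "$1"
}

# Succeed (exit 0) if the path lives on an automounted or NFS filesystem,
# i.e. reading it may itself need the nfs module to be loaded already.
# The list of types below is an assumption made for this sketch.
config_dir_needs_network() {
    case "$(fs_type_for_path "$1" "$2")" in
        autofs|nfs|nfs4) return 0 ;;
        *) return 1 ;;
    esac
}
```

On a live system this could be called as `config_dir_needs_network /proc/mounts /usr/local/lib/modprobe.d && echo "at risk"`. The check only inspects the mount table; it does not touch the directory itself, so it should not trigger the automount.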