Description of problem:
The kernel module "nfs" does not load, so no NFS filesystems can be mounted. After reboot I see processes like these:

  0 S root 9398 8291 0 80 0 - 30321 autofs 08:24 pts/0 00:00:00 ls -l /mount/data
  0 S root 9400 1319 0 80 0 - 29801 wait 08:24 ? 00:00:00 /bin/mount -n -t nfs -s -o rw,hard,intr fileserver:/fs/data /mount/data
  4 D root 9401 9400 0 80 0 - 4253 call_u 08:24 ? 00:00:00 /sbin/mount.nfs fileserver:/fs/data /mount/data -s -n -o rw,hard,intr
  1 S root 9402 50 0 80 0 - 0 wait 08:24 ? 00:00:00 [kworker/u:5]
  0 S root 9403 9402 0 80 0 - 1594 autofs 08:24 ? 00:00:00 /sbin/modprobe -q -- nfs

There is also a permanent load on the host, which increases with each request to an NFS filesystem. We use autofs, but I don't think this is the problem. The problem seems to be the line "/sbin/modprobe -q -- nfs". I searched for the nfs module, tried to load it manually, and got the following error:

  -bash-4.2# insmod /lib/modules/2.6.38.8-32.fc15.x86_64/kernel/fs/nfs/nfs.ko
  insmod: error inserting '/lib/modules/2.6.38.8-32.fc15.x86_64/kernel/fs/nfs/nfs.ko': -1 Unknown symbol in module

The relevant part of the "dmesg" output is:

  [44596.303781] nfs: Unknown symbol nlmclnt_proc (err 0)
  [44596.304285] nfs: Unknown symbol __fscache_read_or_alloc_pages (err 0)
  [44596.306414] nfs: Unknown symbol __fscache_relinquish_cookie (err 0)
  [44596.306895] nfs: Unknown symbol nfsacl_decode (err 0)
  [44596.308749] nfs: Unknown symbol __fscache_unregister_netfs (err 0)
  [44596.308943] nfs: Unknown symbol nfsacl_encode (err 0)
  [44596.310800] nfs: Unknown symbol __fscache_maybe_release_page (err 0)
  [44596.311803] nfs: Unknown symbol __fscache_read_or_alloc_page (err 0)
  [44596.313133] nfs: Unknown symbol __fscache_uncache_page (err 0)
  [44596.313494] nfs: Unknown symbol __fscache_register_netfs (err 0)
  [44596.313890] nfs: Unknown symbol svc_gss_principal (err 0)
  [44596.314225] nfs: Unknown symbol __fscache_write_page (err 0)
  [44596.314801] nfs: Unknown symbol nlmclnt_init (err 0)
  [44596.315019] nfs: Unknown symbol nlmclnt_done (err 0)
  [44596.315252] nfs: Unknown symbol __fscache_wait_on_page_write (err 0)
  [44596.315633] nfs: Unknown symbol __fscache_acquire_cookie (err 0)

Version-Release number of selected component (if applicable):
kernel-2.6.38.8-32.fc15.x86_64

How reproducible:
Always.

Steps to Reproduce:
1. Boot with the (current) Fedora 15 kernel 2.6.38.8-32.
2. Try to mount an NFS filesystem, or try to load the kernel module nfs.

Actual results:
The kernel module nfs and its dependencies are not loaded. There are error messages in "dmesg" and /var/log/messages when using "insmod". "modprobe nfs" hangs. NFS filesystems cannot be mounted (the mount command hangs).

Expected results:
The kernel module nfs and its dependencies are loaded. NFS filesystems can be mounted.

Additional info:
I checked another Fedora 15 host that is still running kernel-2.6.38.7-30.fc15.x86_64. There, NFS filesystems can be mounted. That system has the following modules loaded ("lsmod | grep nfs"):

  nfs 316713 12
  lockd 70321 1 nfs
  fscache 43442 1 nfs
  nfs_acl 2357 1 nfs
  auth_rpcgss 38390 1 nfs
  sunrpc 195388 36 nfs,lockd,nfs_acl,auth_rpcgss
I have now rebooted the host with the original installation kernel, kernel-2.6.38.6-26.rc1.fc15.x86_64, and to my surprise the same problem exists with this kernel. The host was installed with kickstart. The postinstall section installs an init script that runs after the first reboot and performs further postinstallation tasks (updates, more package installations, configuration of name services, printers, local data partitions, NFS exports, autofs, etc.). That script mounts a partition over NFS, which worked without problems. It also updated the kernel, installed another postinstall script for the next reboot, and rebooted to finish the remaining postinstall tasks under the current kernel. That kernel was kernel-2.6.38.8-32.fc15.x86_64, and it too mounted an NFS partition successfully. After these scripts had finished, the host was (automatically) rebooted again, and from then on the NFS problem occurred. I use the same fstab and autofs* files as in Fedora 14, but I think this should not prevent the nfs kernel module from loading.
The insmod warnings about unknown symbols are expected, as insmod doesn't pull in dependent modules the way modprobe does. The modprobe hang is the root problem that needs to be fixed. Does the box hang totally when this happens, or can you still use other ttys? If you can, the output of dmesg after modprobe hangs might be useful.
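To see which symbols insmod could not resolve, the kernel log lines can be reduced to a plain symbol list. The following is a minimal sketch; the helper name `unknown_symbols` is made up for illustration and is not part of any tool mentioned in this report.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and print the unique
# symbol names from lines of the form "<module>: Unknown symbol <name> (err N)".
unknown_symbols() {
    sed -n 's/.*Unknown symbol \([A-Za-z0-9_]*\).*/\1/p' | sort -u
}

# Possible use on a live system (left commented out so the sketch has no
# side effects):
#   dmesg | unknown_symbols
#   modinfo -F depends nfs    # modules that nfs declares as dependencies
```

The symbols listed in the description (nlmclnt_*, __fscache_*, nfsacl_*, svc_gss_principal) line up with the lockd, fscache, nfs_acl and auth_rpcgss modules shown in the lsmod output of the working host, which is consistent with insmod simply not loading dependencies.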
The problem for me appears to be with the update to module-init-tools. Suddenly, modprobe is looking in /usr/local/lib/modprobe.d. So what happens is: autofs starts, and /usr/local is mounted via NFS by autofs. The nfs module hasn't been loaded yet, since all our NFS mounts are autofs controlled. So the first time anything tries to access any NFS automount, modprobe tries to load the nfs module and reads /usr/local/lib/modprobe.d, which is not yet available. Everything then hangs waiting for the nfs module to be loaded, which cannot happen because loading it depends on an NFS mount. I'm not sure whether the bug is autofs hanging when it should just silently fail, mount hanging for lack of the nfs module, or, most likely, modprobe, since it was the last change that broke things. I consider this a rather critical bug: it rendered all our systems inoperable and took many hours to track down. Stay out of /usr/local, you don't belong there! It's LOCAL...
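The circular dependency described here (modprobe needs a directory that only becomes available once the nfs module is loaded) can be checked for by asking whether a modprobe configuration directory sits below an autofs mount point. This is only a sketch under assumptions: the function name `autofs_parent` is hypothetical, and it assumes the mount table is in /proc/mounts format with autofs-managed mount points listed with filesystem type "autofs".

```shell
#!/bin/sh
# Hypothetical helper: print every autofs mount point that contains a path.
#   $1  path to check
#   $2  mount table in /proc/mounts format (defaults to /proc/mounts)
autofs_parent() {
    path=$1
    mounts=${2:-/proc/mounts}
    # Field 2 is the mount point, field 3 the filesystem type.
    awk -v p="$path" '$3 == "autofs" {
        m = $2
        # Match the mount point itself or anything below it.
        if (p == m || index(p, m "/") == 1) print m
    }' "$mounts"
}

# Possible use (commented out; the directory list is taken from this report):
#   for dir in /etc/modprobe.d /usr/local/lib/modprobe.d; do
#       mp=$(autofs_parent "$dir")
#       if [ -n "$mp" ]; then echo "$dir is below autofs mount $mp"; fi
#   done
```

Any directory reported this way would have to be mounted before modprobe can finish reading its configuration, which is the hang described above when the mount itself needs the nfs module.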
I can confirm that we have the same situation as described in comment #3. Fedora 15 hangs only on our networked hosts that have /usr/local configured as an automounted NFS filesystem, not on standalone hosts like notebooks. This also explains why the kickstart installation described in comment #1 does not hang after the first and second reboot but only after the last one: /usr/local is configured as an autofs NFS mount only after the last reboot (the scripts change the configuration before the last reboot, but it is not activated until then). And to answer the question in comment #2 (sorry for the delayed answer): the host does not hang totally, I can log in using ssh. It seems sshd is started before autofs. But there is no graphical login, no gdm, and there is no vt* login. Could the reason be that these binaries have /usr/local/bin hardcoded in their PATH variable? When I log in as root using ssh, ssh hangs. When I press Control-C twice, I get a bash shell prompt. If I then enter export PATH=/sbin:/usr/sbin:/bin:/usr/bin, search for processes that touch /usr/local (like ls, mount, etc.) and for modprobe processes, and kill them, I can enter modprobe nfs and it succeeds without error. After that I can access /usr/local and the autofs NFS mount succeeds. We know the Fedora people regard /usr/local as a filesystem local to the host, not one shared by NFS among a set of similar hosts. But I think the LSB is not clear in this case, though that is another topic. We have a long history with a shared /usr/local, from the days of SunOS and Solaris, and we then used the same design on Linux. Much of our locally developed software is installed in /usr/local, without making packages and without the need to install every new piece of local software separately on about a hundred hosts.
But there is another known problem: ssh login hangs when the NFS server for /usr/local is not available, because /usr/local/bin is hardcoded in the sshd and login binaries and is listed before /bin and /usr/bin. So we are considering moving our shared "local" directory somewhere else and making /usr/local a truly local filesystem. We may make a radical cut with Fedora 15 and stop NFS-mounting /usr/local, which means that some of our software will not work for a while until it is recompiled for the new location. Is there a suggestion where to mount such an (NFS) shared "/usr/local" partition? /usr/share sounds by its name as if it could be shared, but in practice it must be local because RPM packages install data in this directory. I looked at the LSB but found no suggestion. Independent of the above: does anybody know why modprobe searches in /usr/local/lib/modprobe.d in Fedora 15? We did not have these problems in Fedora 14 and earlier.

  -bash-4.2# strings - /sbin/modprobe | grep /usr/local
  /usr/local/lib/modprobe.d
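The manual recovery described in this comment (find the processes touching /usr/local and the pending modprobe calls, kill them, then load nfs by hand) could be sketched as a filter over a process listing. The function name `stuck_candidates` is hypothetical, and the match is deliberately crude: it flags any command line mentioning /usr/local, so the resulting list should be reviewed before anything is killed.

```shell
#!/bin/sh
# Hypothetical helper: read "PID COMMAND ARGS..." lines on stdin and print the
# PIDs of processes whose command line mentions /usr/local or whose command
# is modprobe.
stuck_candidates() {
    awk '$0 ~ /\/usr\/local/ || $2 ~ /(^|\/)modprobe$/ { print $1 }'
}

# Possible use as root, following the steps from the comment (commented out
# because it kills processes and loads a module):
#   export PATH=/sbin:/usr/sbin:/bin:/usr/bin
#   ps -e -o pid= -o args= | stuck_candidates    # review this list first
#   kill <the reviewed PIDs>
#   modprobe nfs
```

Resetting PATH first matters for the reason given in the comment: with /usr/local/bin ahead of /bin and /usr/bin, command lookup itself would touch the hung automount.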
My issue is fixed on kernel 2.6.38.8-35. I think it was:

  * Wed Jul 06 2011 Chuck Ebbert <cebbert> 2.6.38.8-35
  - Revert SCSI/block patches from 2.6.38.6 that caused more problems.

Please try it and report back.
Caused by: http://git.kernel.org/?p=utils/kernel/module-init-tools/module-init-tools.git;a=commitdiff_plain;h=9454d710137be3799f343cc9d0f833f0802e2111 (in v3.13) module-init-tools is now looking in /usr/local/lib/modprobe.d when loading modules, which breaks horribly when we are trying to automount /usr/local via nfs.
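Whether a given modprobe binary carries the /usr/local/lib/modprobe.d search path can be checked without the strings utility by searching the file for the literal path, as a variation on the `strings - /sbin/modprobe | grep /usr/local` check shown earlier in this report. A minimal sketch; the function name `has_usr_local_path` is hypothetical, and finding the string only shows that the path is compiled in, not when modprobe actually reads it.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) if the given file contains the
# literal string /usr/local/lib/modprobe.d. Defaults to /sbin/modprobe.
has_usr_local_path() {
    grep -q -F '/usr/local/lib/modprobe.d' "${1:-/sbin/modprobe}"
}

# Possible use (commented out):
#   if has_usr_local_path; then
#       echo "modprobe references /usr/local/lib/modprobe.d"
#   fi
```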
*** Bug 710197 has been marked as a duplicate of this bug. ***
Can you test the build at http://koji.fedoraproject.org/koji/taskinfo?taskID=3208055 and check whether it resolves this issue?
(In reply to comment #8)
> Can you test the build at
> http://koji.fedoraproject.org/koji/taskinfo?taskID=3208055 and check whether it
> resolves this issue?

Yes, my system boots cleanly with that version of module-init-tools.
module-init-tools-3.16-2.fc15 has been submitted as an update for Fedora 15. https://admin.fedoraproject.org/updates/module-init-tools-3.16-2.fc15
If people could test and give appropriate karma to the update, that would be great.
Thank you for creating the update! module-init-tools-3.16-2.fc15 from comment #10 works for me and solves the described problem!
Package module-init-tools-3.16-2.fc15:
* should fix your issue,
* was pushed to the Fedora 15 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing module-init-tools-3.16-2.fc15'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/module-init-tools-3.16-2.fc15
then log in and leave karma (feedback).
module-init-tools-3.16-2.fc15 has been pushed to the Fedora 15 stable repository. If problems still persist, please make note of it in this bug report.