Bug 714023

Summary: Error in loading kernel module nfs using module-init-tools 3.13 or later
Product: [Fedora] Fedora
Reporter: Edgar Hoch <edgar.hoch>
Component: module-init-tools
Assignee: Jon Masters <jonathan>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: 15
CC: aquini, gansalmon, itamar, jfeeney, jonathan, kernel-maint, madhu.chinakonda, m.a.young, schrett, sergio
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: module-init-tools-3.16-2.fc15
Doc Type: Bug Fix
Last Closed: 2011-07-26 03:37:48 UTC
Type: ---

Description Edgar Hoch 2011-06-17 06:40:36 UTC
Description of problem:

The kernel module "nfs" does not load. This means that no NFS filesystems can be mounted!

After reboot I see processes like these:

0 S root      9398  8291  0  80   0 - 30321 autofs 08:24 pts/0    00:00:00 ls -l /mount/data
0 S root      9400  1319  0  80   0 - 29801 wait   08:24 ?        00:00:00 /bin/mount -n -t nfs -s -o rw,hard,intr fileserver:/fs/data /mount/data
4 D root      9401  9400  0  80   0 -  4253 call_u 08:24 ?        00:00:00 /sbin/mount.nfs fileserver:/fs/data /mount/data -s -n -o rw,hard,intr
1 S root      9402    50  0  80   0 -     0 wait   08:24 ?        00:00:00 [kworker/u:5]
0 S root      9403  9402  0  80   0 -  1594 autofs 08:24 ?        00:00:00 /sbin/modprobe -q -- nfs


There is also a permanent load on the host, which increases with each request to an NFS filesystem.
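
To triage a hang like this, it can help to pick out the processes in uninterruptible sleep (state D) and those waiting in autofs. The following is a minimal Python sketch, assuming the column layout of the listing above (it looks like `ps -elf` output); the names `Proc`, `parse_ps_line` and `stuck_processes` are made up for illustration.

```python
from typing import List, NamedTuple

# Process lines copied from the listing in this report.
SAMPLE = """\
0 S root      9398  8291  0  80   0 - 30321 autofs 08:24 pts/0    00:00:00 ls -l /mount/data
0 S root      9400  1319  0  80   0 - 29801 wait   08:24 ?        00:00:00 /bin/mount -n -t nfs -s -o rw,hard,intr fileserver:/fs/data /mount/data
4 D root      9401  9400  0  80   0 -  4253 call_u 08:24 ?        00:00:00 /sbin/mount.nfs fileserver:/fs/data /mount/data -s -n -o rw,hard,intr
1 S root      9402    50  0  80   0 -     0 wait   08:24 ?        00:00:00 [kworker/u:5]
0 S root      9403  9402  0  80   0 -  1594 autofs 08:24 ?        00:00:00 /sbin/modprobe -q -- nfs
"""


class Proc(NamedTuple):
    state: str
    pid: int
    ppid: int
    wchan: str
    cmd: str


def parse_ps_line(line: str) -> Proc:
    # Assumed columns: F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
    f = line.split(None, 14)
    return Proc(state=f[1], pid=int(f[3]), ppid=int(f[4]), wchan=f[10], cmd=f[14].strip())


def stuck_processes(text: str) -> List[Proc]:
    """Return processes in state D or waiting in autofs (WCHAN as printed by ps)."""
    procs = [parse_ps_line(line) for line in text.splitlines() if line.strip()]
    return [p for p in procs if p.state == "D" or p.wchan == "autofs"]


if __name__ == "__main__":
    for p in stuck_processes(SAMPLE):
        print(p.pid, p.state, p.wchan, p.cmd)
```

On the sample this selects the `ls`, the `mount.nfs` and the `modprobe` processes; note that the modprobe itself is waiting in autofs.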

We use autofs, but I don't think this is the problem. The problem seems to be the line "/sbin/modprobe -q -- nfs".

I searched for the module nfs and tried to load it manually and got the following error:

-bash-4.2# insmod /lib/modules/2.6.38.8-32.fc15.x86_64/kernel/fs/nfs/nfs.ko 
insmod: error inserting '/lib/modules/2.6.38.8-32.fc15.x86_64/kernel/fs/nfs/nfs.ko': -1 Unknown symbol in module

The relevant part of "dmesg" output is:

[44596.303781] nfs: Unknown symbol nlmclnt_proc (err 0)
[44596.304285] nfs: Unknown symbol __fscache_read_or_alloc_pages (err 0)
[44596.306414] nfs: Unknown symbol __fscache_relinquish_cookie (err 0)
[44596.306895] nfs: Unknown symbol nfsacl_decode (err 0)
[44596.308749] nfs: Unknown symbol __fscache_unregister_netfs (err 0)
[44596.308943] nfs: Unknown symbol nfsacl_encode (err 0)
[44596.310800] nfs: Unknown symbol __fscache_maybe_release_page (err 0)
[44596.311803] nfs: Unknown symbol __fscache_read_or_alloc_page (err 0)
[44596.313133] nfs: Unknown symbol __fscache_uncache_page (err 0)
[44596.313494] nfs: Unknown symbol __fscache_register_netfs (err 0)
[44596.313890] nfs: Unknown symbol svc_gss_principal (err 0)
[44596.314225] nfs: Unknown symbol __fscache_write_page (err 0)
[44596.314801] nfs: Unknown symbol nlmclnt_init (err 0)
[44596.315019] nfs: Unknown symbol nlmclnt_done (err 0)
[44596.315252] nfs: Unknown symbol __fscache_wait_on_page_write (err 0)
[44596.315633] nfs: Unknown symbol __fscache_acquire_cookie (err 0)



Version-Release number of selected component (if applicable):
kernel-2.6.38.8-32.fc15.x86_64

How reproducible:
Always.

Steps to Reproduce:
1. Boot with (current) Fedora 15 kernel 2.6.38.8-32.
2. Try to mount an NFS filesystem.
   Or try to load the kernel module nfs.
  
Actual results:
The kernel module nfs and its dependencies are not loaded.
There are error messages in "dmesg" and /var/log/messages when using "insmod".
"modprobe nfs" hangs.
NFS filesystems cannot be mounted (the mount command hangs).

Expected results:
The kernel module nfs and its dependencies are loaded.
NFS filesystems can be mounted.


Additional info:

I checked another host with Fedora 15 that is still running kernel-2.6.38.7-30.fc15.x86_64. There, NFS filesystems can be mounted. This system has the following modules loaded ("lsmod | grep nfs"):

nfs                   316713  12 
lockd                  70321  1 nfs
fscache                43442  1 nfs
nfs_acl                 2357  1 nfs
auth_rpcgss            38390  1 nfs
sunrpc                195388  36 nfs,lockd,nfs_acl,auth_rpcgss

Comment 1 Edgar Hoch 2011-06-17 11:13:46 UTC
I want to note that I have now rebooted the host with the old installation kernel, kernel-2.6.38.6-26.rc1.fc15.x86_64, and to my surprise the same problem exists with this kernel.

For installation I did a kickstart installation whose postinstall part installs an init script that runs after the reboot and performs further postinstallation tasks (updates, more package installations, configuration of name services, printers, local data partitions, NFS exports, autofs, etc.). This script mounts a partition over NFS, and that worked without problems.

This script updated the kernel, installed another postinstall script for the next reboot, and then rebooted to do the rest of the postinstall tasks with the current kernel. That kernel was kernel-2.6.38.8-32.fc15.x86_64, and it also mounted an NFS partition successfully.

After these scripts had finished, the host was (automatically) rebooted again, and from then on the NFS problem occurred.

I use the same fstab and autofs* files as in Fedora 14, but I think this should not inhibit loading the nfs kernel module.

Comment 2 Dave Jones 2011-06-27 18:43:37 UTC
It is expected that insmod shows warnings about unknown symbols, as it doesn't pull in dependent modules like modprobe does.
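
To illustrate what "pulling in dependent modules" means, here is a rough Python sketch that derives a load order for nfs from the "Used by" column of the lsmod output in the description. This is only an approximation for illustration: modprobe itself reads its dependency information from the modules.dep file generated by depmod, not from lsmod, and the function names `deps_from_lsmod` and `load_order` are hypothetical.

```python
from typing import Dict, List

# lsmod output copied from the working host in the description.
LSMOD = """\
nfs                   316713  12
lockd                  70321  1 nfs
fscache                43442  1 nfs
nfs_acl                 2357  1 nfs
auth_rpcgss            38390  1 nfs
sunrpc                195388  36 nfs,lockd,nfs_acl,auth_rpcgss
"""


def deps_from_lsmod(text: str) -> Dict[str, List[str]]:
    """Invert the 'Used by' column: map each module to the modules it needs."""
    deps: Dict[str, List[str]] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        name = fields[0]
        deps.setdefault(name, [])
        users = fields[3].split(",") if len(fields) > 3 else []
        for user in users:
            deps.setdefault(user, []).append(name)
    return deps


def load_order(module: str, deps: Dict[str, List[str]]) -> List[str]:
    """Depth-first post-order: every dependency comes before its user."""
    order: List[str] = []
    seen = set()

    def visit(name: str) -> None:
        if name in seen:
            return
        seen.add(name)
        for dep in deps.get(name, []):
            visit(dep)
        order.append(name)

    visit(module)
    return order


if __name__ == "__main__":
    print(" ".join(load_order("nfs", deps_from_lsmod(LSMOD))))
```

A plain insmod of nfs.ko skips everything before the last entry of that order, which is consistent with the unknown-symbol messages (nlmclnt_*, __fscache_*, nfsacl_*, svc_gss_principal) in the dmesg output.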

The modprobe hang is the root problem that needs to be fixed. Does the box hang completely when this happens, or can you still use other ttys? If so, the output of dmesg after modprobe hangs might be useful.

Comment 3 Andrew Schretter 2011-07-01 13:10:06 UTC
The problem for me appears to be with the update to module-init-tools.
Suddenly, modprobe is looking in /usr/local/lib/modprobe.d.  So what
happens is: autofs starts, and /usr/local is mounted via NFS by autofs.
The nfs module hasn't been loaded yet, since all our NFS mounts are
autofs controlled.  So the first time anything tries to access any
NFS automount, modprobe tries to load the nfs module and reads
/usr/local/lib/modprobe.d, which is not yet available.  So everything
hangs waiting for the nfs module to get loaded, which it can't, since
it depends on an NFS mount.
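
The circular wait described here can be written down as a small wait-for graph. The following Python sketch is only an illustration of that reasoning: the node labels paraphrase this comment, and `find_cycle` is a hypothetical helper, not part of any tool mentioned in this bug.

```python
from typing import Dict, List, Optional

# Each step waits for exactly one other step (labels paraphrase the comment).
WAITS_FOR: Dict[str, str] = {
    "access to an NFS automount": "autofs mount of /usr/local",
    "autofs mount of /usr/local": "mount.nfs",
    "mount.nfs": "modprobe nfs",
    "modprobe nfs": "read /usr/local/lib/modprobe.d",
    "read /usr/local/lib/modprobe.d": "autofs mount of /usr/local",
}


def find_cycle(waits_for: Dict[str, str], start: str) -> Optional[List[str]]:
    """Follow wait-for edges from start; return the cycle if one is reached."""
    path: List[str] = []
    position: Dict[str, int] = {}
    node: Optional[str] = start
    while node is not None:
        if node in position:
            # Reached a node already on the path: that closes the cycle.
            return path[position[node]:] + [node]
        position[node] = len(path)
        path.append(node)
        node = waits_for.get(node)
    return None


if __name__ == "__main__":
    cycle = find_cycle(WAITS_FOR, "access to an NFS automount")
    print(" -> ".join(cycle) if cycle else "no cycle")
```

Any edge removed from the cycle breaks the deadlock; for example, if modprobe does not read from /usr/local, the chain ends at "modprobe nfs".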

I'm not sure if the bug is autofs hanging when it should just silently
fail, mount hanging due to lack of NFS module, or most likely modprobe
since it was the last change that broke things.

I consider this a rather critical bug, it rendered all our systems
inoperable and took many hours to track down.  Stay out of /usr/local,
you don't belong there!  It's LOCAL...

Comment 4 Edgar Hoch 2011-07-01 15:00:28 UTC
I can confirm that we have the same situation as described in comment #3.

Fedora 15 hangs only on our networked hosts that have /usr/local configured as an automounted NFS filesystem, not on standalone hosts like notebooks.

This also explains why the kickstart installation described in comment #1 does not hang after the first and second reboot but only after the last one: /usr/local is configured as an autofs NFS mount only after the last reboot (the scripts change the configuration before the last reboot, but it is not yet activated).

And to answer the question in comment #2 (sorry for the delayed answer):
The host does not hang completely; I can log in using ssh. It seems sshd is started before autofs.
But there is no graphical login (no gdm), and there is no vt* login. Could the reason be that these binaries have /usr/local/bin hardcoded in their PATH variable?

When I log in as root using ssh, ssh hangs. When I press Control-C twice, I get a bash shell prompt.
Then, if I enter
export PATH=/sbin:/usr/sbin:/bin:/usr/bin
and then search for processes that touch /usr/local (like ls, mount, etc.) and for modprobe processes, and kill them, I can enter
modprobe nfs
and that succeeds without error.
After that I can access /usr/local and the autofs NFS mount succeeds.
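
The PATH change in this workaround amounts to dropping every entry under /usr/local so that command lookup in the shell no longer touches the automount. A minimal Python sketch of that filtering, with a hypothetical helper name `path_without` and an example input PATH that is an assumption (the report does not show the original value):

```python
def path_without(prefix: str, path_value: str) -> str:
    """Remove PATH entries equal to prefix or located below it."""
    prefix = prefix.rstrip("/")
    kept = [
        entry
        for entry in path_value.split(":")
        if entry and entry != prefix and not entry.startswith(prefix + "/")
    ]
    return ":".join(kept)


if __name__ == "__main__":
    # Assumed example of a login PATH that lists /usr/local first.
    original = "/usr/local/sbin:/usr/local/bin:/sbin:/usr/sbin:/bin:/usr/bin"
    print(path_without("/usr/local", original))
```

For the assumed input this yields the same value as the export line above.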


We know the Fedora people regard /usr/local as a filesystem local to the host, not as one shared by NFS among a set of similar hosts. But I think the LSB is not clear on this point; that is another topic.

We have a long history with a shared /usr/local, from the days of SunOS and Solaris, and we kept the same design on Linux. Much of our locally developed software is installed in /usr/local, without making packages, and without the need to install every new piece of local software on about a hundred hosts.
But there is another known problem: ssh login hangs when the NFS server for /usr/local is not available, because /usr/local/bin is hardcoded in the sshd and login binaries and is listed before /bin and /usr/bin.

So we will consider changing our shared "local" directory to something else and letting /usr/local be a truly local filesystem. We may make a radical cut with Fedora 15 and no longer NFS-mount /usr/local, which means that some of our software will not work for some time until it is recompiled for the new location.

Is there a suggestion for where to mount such an (NFS) shared "/usr/local" partition? The name /usr/share suggests that it may be shared, but in reality it must be local because rpm packages install data in this directory.
I looked at the LSB but found no suggestion.


Independent of the above: does anybody know why modprobe searches in /usr/local/lib/modprobe.d in Fedora 15? We did not have this problem in Fedora 14 and earlier.

-bash-4.2# strings - /sbin/modprobe | grep /usr/local
/usr/local/lib/modprobe.d
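
For hosts where the strings utility is not at hand, roughly the same check can be sketched in Python: extract runs of printable ASCII from the binary and keep those that contain the path of interest. The helper name `embedded_strings` and the sample bytes are made up for illustration, and the printable-character rule is a simplification of what strings does.

```python
import re
from typing import List


def embedded_strings(data: bytes, needle: bytes, min_len: int = 4) -> List[str]:
    """Return printable ASCII runs of at least min_len bytes that contain needle."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [
        match.group().decode("ascii")
        for match in pattern.finditer(data)
        if needle in match.group()
    ]


if __name__ == "__main__":
    # Made-up stand-in for binary contents; on a real host one could pass
    # open("/sbin/modprobe", "rb").read() instead.
    sample = b"\x00\x01/etc/modprobe.d\x00/usr/local/lib/modprobe.d\x00\x7fELF"
    for text in embedded_strings(sample, b"/usr/local"):
        print(text)
```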

Comment 5 Sergio Basto 2011-07-14 16:24:43 UTC
My issue is fixed in kernel 2.6.38.8-35. I think it was:

* Wed Jul 06 2011 Chuck Ebbert <cebbert> 2.6.38.8-35 - Revert
SCSI/block patches from 2.6.38.6 that caused more problems.

Please try it and report back.

Comment 6 Chuck Ebbert 2011-07-14 18:08:36 UTC
Caused by:
http://git.kernel.org/?p=utils/kernel/module-init-tools/module-init-tools.git;a=commitdiff_plain;h=9454d710137be3799f343cc9d0f833f0802e2111

(in v3.13)

module-init-tools is now looking in /usr/local/lib/modprobe.d when loading modules, which breaks horribly when we are trying to automount /usr/local via nfs.
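
To check whether a given host is exposed, one can look up which mount each modprobe configuration directory falls under without touching the directory itself (touching it could trigger the automount and hang). Below is a minimal Python sketch that does a longest-prefix match against text in the /proc/mounts format. The mount table and the helper names are assumptions for illustration, and of the two directories listed only /usr/local/lib/modprobe.d is taken from this report; /etc/modprobe.d is included as the usual configuration directory.

```python
from typing import List, Tuple

# Hypothetical /proc/mounts excerpt for a host that automounts /usr/local.
MOUNTS = """\
rootfs / rootfs rw 0 0
/dev/sda1 / ext4 rw,relatime 0 0
/etc/auto.local /usr/local autofs rw,relatime,fd=7,timeout=300 0 0
"""

NETWORK_TYPES = {"autofs", "nfs", "nfs4"}


def mount_for(path: str, mounts_text: str) -> Tuple[str, str]:
    """Return (mount point, fstype) of the deepest mount containing path."""
    best = ("", "")
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fstype = fields[1], fields[2]
        inside = path == mount_point or path.startswith(mount_point.rstrip("/") + "/")
        # ">=" lets a later entry win when mounts are stacked on the same point.
        if inside and len(mount_point) >= len(best[0]):
            best = (mount_point, fstype)
    return best


def risky_dirs(dirs: List[str], mounts_text: str) -> List[str]:
    """Directories that live on an automounted or NFS filesystem."""
    return [d for d in dirs if mount_for(d, mounts_text)[1] in NETWORK_TYPES]


if __name__ == "__main__":
    dirs = ["/etc/modprobe.d", "/usr/local/lib/modprobe.d"]
    print(risky_dirs(dirs, MOUNTS))
```

On a real system the text would come from reading /proc/mounts; the sketch ignores the octal escapes that file uses for spaces in paths.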

Comment 7 Chuck Ebbert 2011-07-14 18:09:18 UTC
*** Bug 710197 has been marked as a duplicate of this bug. ***

Comment 8 Matthew Garrett 2011-07-18 20:15:13 UTC
Can you test the build at http://koji.fedoraproject.org/koji/taskinfo?taskID=3208055 and check whether it resolves this issue?

Comment 9 Michael Young 2011-07-19 09:19:03 UTC
(In reply to comment #8)
> Can you test the build at
> http://koji.fedoraproject.org/koji/taskinfo?taskID=3208055 and check whether it
> resolves this issue?

Yes, my system boots cleanly with that version of module-init-tools.

Comment 10 Fedora Update System 2011-07-21 19:38:24 UTC
module-init-tools-3.16-2.fc15 has been submitted as an update for Fedora 15.
https://admin.fedoraproject.org/updates/module-init-tools-3.16-2.fc15

Comment 11 Matthew Garrett 2011-07-21 19:40:20 UTC
If people could test and give appropriate karma to the update, that would be great.

Comment 12 Edgar Hoch 2011-07-22 16:54:35 UTC
Thank you for creating the update!

module-init-tools-3.16-2.fc15 from comment #10 works for me and solves the described problem!

Comment 13 Fedora Update System 2011-07-23 01:59:18 UTC
Package module-init-tools-3.16-2.fc15:
* should fix your issue,
* was pushed to the Fedora 15 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing module-init-tools-3.16-2.fc15'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/module-init-tools-3.16-2.fc15
then log in and leave karma (feedback).

Comment 14 Fedora Update System 2011-07-26 03:37:42 UTC
module-init-tools-3.16-2.fc15 has been pushed to the Fedora 15 stable repository.  If problems still persist, please make note of it in this bug report.