Description of problem:
kernel BUG at fs/nfs/namespace.c:108!

Version-Release number of selected component (if applicable):
kernel-2.6.23.14-107.fc8.i686

How reproducible:
Often

Steps to Reproduce:
I have an NFS share mounted as follows (/etc/fstab entry):

netapp39:/vol/projects1/builds /mnt/builds \
    nfs rsize=32768,wsize=32768,soft,intr 0 0

If I then type "cd /mnt/builds" and hit TAB a few times in bash to perform completion, the kernel often gives me an Oops message (shown below), and kills my login/ssh/...

Additional info:
Jan 25 12:10:25 swarren-lx1 kernel: kernel BUG at fs/nfs/namespace.c:108!
Jan 25 12:10:25 swarren-lx1 kernel: invalid opcode: 0000 [#1] SMP
Jan 25 12:10:25 swarren-lx1 kernel: Modules linked in: nvidia(P)(U) nfsd exportfs auth_rpcgss rfcomm l2cap bluetooth autofs4 nfs lockd nfs_acl sunrpc loop dm_multipath ipv6 snd_hda_intel snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device firewire_ohci snd_pcm_oss firewire_core forcedeth k8temp crc_itu_t hwmon i2c_nforce2 snd_mixer_oss snd_pcm i2c_core snd_timer snd_page_alloc serio_raw button snd_hwdep snd soundcore parport_pc pcspkr parport sr_mod cdrom sg floppy pata_amd dm_snapshot dm_zero dm_mirror dm_mod sata_nv ata_generic libata sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd
Jan 25 12:10:25 swarren-lx1 kernel: CPU: 0
Jan 25 12:10:25 swarren-lx1 kernel: EIP: 0060:[<f9177509>] Tainted: P VLI
Jan 25 12:10:25 swarren-lx1 kernel: EFLAGS: 00210246 (2.6.23.14-107.fc8 #1)
Jan 25 12:10:25 swarren-lx1 kernel: EIP is at nfs_follow_mountpoint+0x37/0x34a [nfs]
Jan 25 12:10:25 swarren-lx1 kernel: eax: f7dabc00 ebx: e49e2cc0 ecx: f918b7e0 edx: f5ef8f04
Jan 25 12:10:25 swarren-lx1 kernel: esi: f5ef8f04 edi: 00000000 ebp: da04a900 esp: f5ef8cb0
Jan 25 12:10:25 swarren-lx1 kernel: ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068
Jan 25 12:10:25 swarren-lx1 kernel: Process bash (pid: 4124, ti=f5ef8000 task=f337cc20 task.ti=f5ef8000)
Jan 25 12:10:25 swarren-lx1 kernel: Stack: f919c240 00001000 c06f7340 f6c79ee0 f7385a80 00000000 c0430006 da04a300
Jan 25 12:10:25 swarren-lx1 kernel:        da04a348 f5ef8000 c8402b0a 00000781 f337cc20 00000002 000041ff 00000009
Jan 25 12:10:25 swarren-lx1 kernel:        00000000 00000000 00001000 00000000 00001000 00000000 00000000 f5ef8f38
Jan 25 12:10:25 swarren-lx1 kernel: Call Trace:
Jan 25 12:10:25 swarren-lx1 kernel: [<c0430006>] do_exit+0x37/0x6fc
Jan 25 12:10:25 swarren-lx1 kernel: [<f9179dca>] nfs3_decode_dirent+0x1b/0x163 [nfs]
Jan 25 12:10:25 swarren-lx1 kernel: [<c046bd8e>] page_address+0x78/0x98
Jan 25 12:10:25 swarren-lx1 kernel: [<f8b603e6>] rpcauth_lookup_credcache+0x4c/0x183 [sunrpc]
Jan 25 12:10:25 swarren-lx1 kernel: [<f916bdf9>] nfs_access_get_cached+0x97/0xd7 [nfs]
Jan 25 12:10:25 swarren-lx1 kernel: [<f8b5ff5c>] put_rpccred+0x2c/0xc0 [sunrpc]
Jan 25 12:10:25 swarren-lx1 kernel: [<f916bfc9>] nfs_permission+0x190/0x19c [nfs]
Jan 25 12:10:25 swarren-lx1 kernel: [<c0490198>] dput+0x30/0xd7
Jan 25 12:10:25 swarren-lx1 kernel: [<c048746f>] __follow_mount+0x1e/0x60
Jan 25 12:10:25 swarren-lx1 kernel: [<c04875c0>] do_lookup+0x4f/0x140
Jan 25 12:10:25 swarren-lx1 kernel: [<c048950d>] __link_path_walk+0x8c5/0xbaf
Jan 25 12:10:25 swarren-lx1 kernel: [<c05461e3>] n_tty_receive_buf+0xc77/0xcc3
Jan 25 12:10:25 swarren-lx1 kernel: [<c048983b>] link_path_walk+0x44/0xb3
Jan 25 12:10:25 swarren-lx1 kernel: [<c0489b23>] do_path_lookup+0x162/0x1c7
Jan 25 12:10:25 swarren-lx1 kernel: [<c048896d>] getname+0x59/0xad
Jan 25 12:10:25 swarren-lx1 kernel: [<c048a2f7>] __user_walk_fd+0x2f/0x40
Jan 25 12:10:25 swarren-lx1 kernel: [<c0483f37>] vfs_stat_fd+0x19/0x40
Jan 25 12:10:25 swarren-lx1 kernel: [<c0483feb>] sys_stat64+0xf/0x23
Jan 25 12:10:25 swarren-lx1 kernel: [<c0459322>] audit_syscall_exit+0x2aa/0x2c6
Jan 25 12:10:25 swarren-lx1 kernel: [<c045904e>] audit_syscall_entry+0x10d/0x137
Jan 25 12:10:25 swarren-lx1 kernel: [<c0407f4d>] do_syscall_trace+0xd7/0xde
Jan 25 12:10:25 swarren-lx1 kernel: [<c040518a>] syscall_call+0x7/0xb
Jan 25 12:10:25 swarren-lx1 kernel: =======================
Jan 25 12:10:25 swarren-lx1 kernel: Code: 00 00 8b 40 0c f6 05 8c c5 b7 f8 01 8b 80 9c 00 00 00 8b a8 64 01 00 00 74 0c c7 04 24 d3 e9 18 f9 e8 ef 69 2b c7 3b 5b 18 75 04 <0f> 0b eb fe f6 05 8c c5 b7 f8 01 74 14 c7 44 24 04 98 b8 18 f9
Jan 25 12:10:25 swarren-lx1 kernel: EIP: [<f9177509>] nfs_follow_mountpoint+0x37/0x34a [nfs] SS:ESP 0068:f5ef8cb0
There are fixes in 2.6.24 for some causes of this problem, but those errors are triggered by a buggy NFS server. Is the server software up-to-date?
I asked our IT department, and they state the server has the latest OS loaded. Which specific bug # / patch # / version should the server have from netapp to solve this?
Apparently the specific software version on the netapp is 7.2.2. Is there a fix available for this issue from netapp?
(In reply to comment #3)
> Apparently the specific software version on the netapp is 7.2.2.
>
> Is there a fix available for this issue from netapp?

7.2.4 is the latest version.
IT says this: "I need confirmation of the bug before upgrading this filer. This filer has been on 7.2.2 for at least 259 days. This is the first report of any client crash." Can anyone tell me the specific netapp bug number that causes this and that will be fixed by upgrading to 7.2.4?
Apparently, netapp is not aware of this problem. Chuck, please tell me what the NFS server bug is exactly, and which netapp bug # was assigned to this issue.
According to Neil's description of this patch, it certainly is suggestive that this is a bug in Linux that's being triggered by buggy server behavior:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=4c1fe2f78a08e2c514a39c91a0eb7b55bbd3c0d2

I'm afraid I don't have any info on whether this is a NetApp bug that's already fixed. If you're hitting this regularly, you might want to try backporting this patch to 2.6.23. Otherwise, it sounds like this is already fixed in 2.6.24.
Your best bet though is probably to get some network captures of the problem behavior, verify whether the problem is what we think it is, and open a case with NetApp about it.
FYI, I have supplied packet captures to our IT department, who I assume are in communication with NetApp support. We also have a 7.2.4 test server, and it seems like that fixes the issue, in brief testing. We'll see if/when it gets rolled out to the production server.
Created attachment 295844
Re-diffed (vs. 2.6.23) patch from comment #7.
I added the patch from comment #7 to the kernel RPM and rebuilt, and it *does* appear to fix the issue. I've attached the patch to this bug report (basically, just re-diffed against the 2.6.23 kernel). Is there any chance of including it in the standard Fedora kernel releases? I'd rather not be stuck with rebuilding my kernel each time a new one comes out :-)
Kernel 2.6.24.3-12 is in the updates-testing repository now... please test.
Hmmm. For some reason, I didn't see your earlier comment; must have missed the email. Sorry. Anyway, the update just came in via the updates repository, and does appear to have solved the problem. I am awaiting final confirmation that the netapp wasn't also upgraded yet, to isolate that it was the kernel fix that solved the issue. I'll close out the bug when I get that. Thanks.
Setting to NEEDINFO based on last comment.
Well, IT can't be bothered to answer my question. Since I already tested the fix in a previous kernel, and since I don't see the issue in the current kernel, I'll assume the kernel update (rather than a netapp upgrade) fixed the issue. Hence, you can close this out. (I would do that myself, but I'm not sure whether to choose upstream/errata/rawhide/...) Thanks.
Ick. Clicking in the resolution list in order to see what I might want to select forcibly selected the option to close the bug. Damn Javascript. Oh well, you can change the resolution to whatever is appropriate...
Thanks for following up! Since it did look like an actual bug, I'll change this to CURRENTRELEASE.