Bug 727927 - [cifs] Kernel 2.6.40.4 Panics on issue of mount command.
Status: CLOSED ERRATA
Product: Fedora
Classification: Fedora
Component: kernel
Version: 15
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Assigned To: Jeff Layton
QA Contact: Fedora Extras Quality Assurance
Duplicates: 728684 731278
Depends On:
Blocks:
Reported: 2011-08-03 12:58 EDT by TR Bentley
Modified: 2014-06-18 03:41 EDT
CC List: 14 users

See Also:
Fixed In Version: kernel-2.6.40.4-5.fc15
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-09-09 04:30:07 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
Video of the crash if it helps (986.88 KB, video/mp4)
2011-08-03 16:45 EDT, TR Bentley
Best picture of the dump (134.15 KB, image/png)
2011-08-03 16:46 EDT, TR Bentley
patch -- cope with negative dentries in cifs_get_root (1019 bytes, patch)
2011-08-04 10:01 EDT, Jeff Layton

Description TR Bentley 2011-08-03 12:58:19 EDT
Description of problem:
A Fedora 15 box running the above kernel panics, kills the UI and locks up when the following command is issued:

sudo mount -t cifs //192.168.0.5/photos /mnt/tigger5/pictures -o user=timali,password=zzzzzz,uid=502,gid=100

Version-Release number of selected component (if applicable):
Installed       kernel-2.6.38.8-32.fc15.x86_64                  The Linux kernel
Installed       kernel-2.6.38.8-35.fc15.x86_64                  The Linux kernel
Installed       kernel-2.6.40-4.fc15.x86_64

When the machine is booted with kernel-2.6.40-4.fc15.x86_64 the machine will always panic and lockup.

How reproducible:
Always. It took 6 reboots to find the source of the problem.

Steps to Reproduce:
1. Boot with the kernel and issue the mount command.
  
Actual results:
Locked machine with no UI, Panic trace on screen and a hard reboot necessary.


Expected results:
Working desktop with drive mounted.


Additional info:

I have looked, but I am unable to get any trace info, sorry.
Comment 1 Dave Jones 2011-08-03 14:22:55 EDT
Can you try a mount from a tty (Ctrl-Alt-F2)? That might get you a trace you can capture with a camera.
Comment 2 Jeff Layton 2011-08-03 15:11:18 EDT
I've not seen this sort of problem with a similar mount command on 2.6.40. I'll need a stack trace or something to go on in order to pursue this.
Comment 3 TR Bentley 2011-08-03 16:45:47 EDT
Created attachment 516577 [details]
Video of the crash if it helps
Comment 4 TR Bentley 2011-08-03 16:46:27 EDT
Created attachment 516578 [details]
Best picture of the dump
Comment 5 TR Bentley 2011-08-03 16:48:19 EDT
The trace starts with 5 umount commands, then the crash.
The message appears to read:

CIFS NFS default security mechanism required.  The default security mechanism will be .....................
Comment 6 Jeff Layton 2011-08-03 17:02:15 EDT
I'm afraid that doesn't really help as I can't read any of the oops message.
Comment 7 Iain Arnell 2011-08-04 08:42:45 EDT
I can reliably reproduce this with 2.6.40-4.fc15.x86_64 (but not with 3.0.0-1.fc16.x86_64) and got a stack trace:

$ sudo mount -v -t cifs //server2/data/mydata data -o user=cifsuser
Password: 
mount.cifs kernel mount options: ip=10.0.0.2,unc=\\server2\data,,ver=1,user=cifsuser,prefixpath=mydata,pass=********


[   41.012855] BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
[   41.013117] IP: [<ffffffff814b636e>] mutex_lock+0x2c/0x4a
[   41.013358] PGD 37c00067 PUD 37b1f067 PMD 0 
[   41.013608] Oops: 0002 [#1] SMP 
[   41.013798] CPU 0 
[   41.013875] Modules linked in: des_generic md4 nls_utf8 cifs fscache sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ip6table_filter ip6_tables ppdev parport_pc parport e1000 microcode vmw_balloon shpchp i2c_piix4 i2c_core vmw_pvscsi [last unloaded: speedstep_lib]
[   41.018285] 
[   41.018354] Pid: 963, comm: mount.cifs Not tainted 2.6.40-4.fc15.x86_64 #1 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
[   41.018945] RIP: 0010:[<ffffffff814b636e>]  [<ffffffff814b636e>] mutex_lock+0x2c/0x4a
[   41.019408] RSP: 0018:ffff88003d4cbd08  EFLAGS: 00010246
[   41.019660] RAX: 0000000000000000 RBX: 0000000000000038 RCX: 000000000000005c
[   41.019943] RDX: 0000000000000000 RSI: 0000000000000055 RDI: 0000000000000038
[   41.020226] RBP: ffff88003d4cbd28 R08: ffffea0000cd91c8 R09: ffffffffa01441bf
[   41.020509] R10: ffff88003d4cb998 R11: ffff88003d4cb998 R12: ffff88003cb08600
[   41.020791] R13: ffff88003abbfcc0 R14: ffff88003552f240 R15: ffff88003abbfcef
[   41.021087] FS:  00007fd5faaf8740(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
[   41.021520] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[   41.021781] CR2: 0000000000000038 CR3: 000000003744f000 CR4: 00000000000006f0
[   41.022093] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   41.022402] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[   41.022688] Process mount.cifs (pid: 963, threadinfo ffff88003d4ca000, task ffff880037eb0000)
[   41.023113] Stack:
[   41.023308]  ffff88003d4cbd28 ffffffff81137c49 ffff88003b1f7000 ffff88003b1f7000
[   41.023889]  ffff88003d4cbda8 ffffffffa0133d2f ffff88003abbfcda ffffffff811f845c
[   41.024475]  0000000000000038 00000004378fb000 ffff88003a88f000 ffff88003aed8400
[   41.025055] Call Trace:
[   41.025280]  [<ffffffff81137c49>] ? dput+0x42/0xea
[   41.025542]  [<ffffffffa0133d2f>] cifs_do_mount+0x396/0x466 [cifs]
[   41.025834]  [<ffffffff811f845c>] ? selinux_sb_copy_data+0x148/0x1ab
[   41.026113]  [<ffffffff81129858>] mount_fs+0x69/0x155
[   41.026382]  [<ffffffff810f528a>] ? __alloc_percpu+0x10/0x12
[   41.026646]  [<ffffffff8113d5e9>] vfs_kern_mount+0x63/0x9d
[   41.026903]  [<ffffffff8113df6c>] do_kern_mount+0x4d/0xdf
[   41.027158]  [<ffffffff8113f5f1>] do_mount+0x63c/0x69f
[   41.027417]  [<ffffffff8113f8d6>] sys_mount+0x88/0xc2
[   41.027675]  [<ffffffff814bd7c2>] system_call_fastpath+0x16/0x1b
[   41.027939] Code: 48 89 e5 53 48 83 ec 18 66 66 66 66 90 31 d2 be 55 00 00 00 48 89 fb 48 c7 c7 82 36 7b 81 e8 9c 10 b9 ff e8 43 f7 ff ff 48 89 df <3e> ff 0f 79 05 e8 51 00 00 00 65 48 8b 04 25 80 cd 00 00 48 89 
[   41.031502] RIP  [<ffffffff814b636e>] mutex_lock+0x2c/0x4a
[   41.031801]  RSP <ffff88003d4cbd08>
[   41.032020] CR2: 0000000000000038
[   41.032256] ---[ end trace 3bc5d4d0271d1502 ]---
Comment 8 Jeff Layton 2011-08-04 09:36:19 EDT
(cc'ing Al since he wrote this code...)

Thanks, that helps somewhat:

(gdb) list *(cifs_do_mount+0x396)
0xd53 is in cifs_do_mount (fs/cifs/cifsfs.c:580).
575			/* next separator */
576			while (*s && *s != sep)
577				s++;
578	
579			mutex_lock(&dir->i_mutex);
580			child = lookup_one_len(p, dentry, s - p);
581			mutex_unlock(&dir->i_mutex);
582			dput(dentry);
583			dentry = child;
584		} while (!IS_ERR(dentry));


...which is actually the inlined cifs_get_root. So the problem is likely that dir is NULL here, which would imply that we hit a negative dentry while walking down to the root of the vfsmount.
Comment 9 Jeff Layton 2011-08-04 10:01:54 EDT
Created attachment 516711 [details]
patch -- cope with negative dentries in cifs_get_root

Here's a possible patch (untested, as I don't seem to be able to reproduce this). If lookup_one_len returns a negative dentry, then put it and set the dentry pointer to an error of -ENOENT.

I'd like Al to weigh in on this before I propose it upstream, but it might be worth testing in the meantime if you feel brave.
Comment 10 Iain Arnell 2011-08-05 05:37:23 EDT
The patch certainly avoids the oops for me with no negative side-effects that I've noticed. (But then I'm still stuck with bug 727834, of course).
Comment 11 Jeff Layton 2011-08-05 09:04:08 EDT
Ok, patch sent upstream. I have a feeling we'll be ripping and replacing this code with something that doesn't require access to top-level directories in order to mount a lower one, but this patch should at least stop the oopses in the interim.
Comment 12 Dave Jones 2011-08-08 11:56:18 EDT
*** Bug 728684 has been marked as a duplicate of this bug. ***
Comment 13 Vlad 2011-08-16 13:03:18 EDT
fyi: I'm also having the same issue on 2.6.40-4.fc15.i686.
Comment 14 Jeff Layton 2011-08-16 13:09:30 EDT
Yep. Looks like 3.0.2 got released upstream in the last day or two, and the relevant patches should be in there. You may want to test this build out of koji:

    http://koji.fedoraproject.org/koji/buildinfo?buildID=258790
Comment 15 Vlad 2011-08-16 13:47:16 EDT
The kernel-2.6.40.3-0.fc15.i686.rpm from the link above does not improve things for me.
The kernel halts a few seconds after mounting CIFS:

Message from syslogd@test at Aug 16 13:37:07 ...
 kernel:[  134.876632] Oops: 0000 [#1] SMP

Message from syslogd@test at Aug 16 13:37:07 ...
 kernel:[  134.876632] Process sshd (pid: 1654, ti=f658c000 task=f6545860 task.ti=f658c000)

Message from syslogd@test at Aug 16 13:37:07 ...
 kernel:[  134.876632] Stack:

Message from syslogd@test at Aug 16 13:37:07 ...
 kernel:[  134.876632] Call Trace:

Message from syslogd@test at Aug 16 13:37:07 ...
 kernel:[  134.876632] Code: 08 85 c9 89 4d f0 75 17 8b 7d d8 8b 55 ec 89 04 24 89 f0 89 f9 e8 aa 5d 30 00 89 45 f0 eb 2d 8b 5d f0 8b 46 14 8b 7d e4 8b 55 e4 <8b> 04 03 47 89 7d dc 89 f9 8b 3e 89 45 e0 89 c3 8b 45 f0 64 0f

Message from syslogd@test at Aug 16 13:37:07 ...
 kernel:[  134.876632] EIP: [<c04de862>] kmem_cache_alloc_trace+0x78/0xd8 SS:ESP 0068:f658dd10

Message from syslogd@test at Aug 16 13:37:07 ...
 kernel:[  134.876632] CR2: 00000000800a8115

Message from syslogd@test at Aug 16 13:37:10 ...
 kernel:[  137.537968] Oops: 0000 [#2] SMP

Message from syslogd@test at Aug 16 13:37:10 ...
 kernel:[  137.538260] Process sshd (pid: 1656, ti=f6588000 task=f6544bc0 task.ti=f6588000)

Message from syslogd@test at Aug 16 13:37:10 ...
 kernel:[  137.538260] Stack:

Message from syslogd@test at Aug 16 13:37:10 ...
 kernel:[  137.538260] Call Trace:

Message from syslogd@test at Aug 16 13:37:10 ...
 kernel:[  137.538260] Code: 08 85 c9 89 4d f0 75 17 8b 7d d8 8b 55 ec 89 04 24 89 f0 89 f9 e8 aa 5d 30 00 89 45 f0 eb 2d 8b 5d f0 8b 46 14 8b 7d e4 8b 55 e4 <8b> 04 03 47 89 7d dc 89 f9 8b 3e 89 45 e0 89 c3 8b 45 f0 64 0f

Message from syslogd@test at Aug 16 13:37:10 ...
 kernel:[  137.538260] EIP: [<c04de862>] kmem_cache_alloc_trace+0x78/0xd8 SS:ESP 0068:f6589f0c

Message from syslogd@test at Aug 16 13:37:10 ...
 kernel:[  137.538260] CR2: 00000000800a8115
Comment 16 Jeff Layton 2011-08-16 13:55:51 EDT
That's almost certainly a different bug than this one. Could you open a new bug for this, and include the entire stack trace if you're able. Please cc me on the bug too.
Comment 17 Vlad 2011-08-16 14:05:21 EDT
Jeff:

unfortunately I don't have easy access to that server to reset it and every time it locks I need to contact support / create ticket, etc. :(
Comment 18 TR Bentley 2011-08-20 15:11:44 EDT
(In reply to comment #14)
> Yep. Looks like 3.0.2 got released upstream in the last day or two, and the
> relevant patches should be in there. You may want to test this build out of
> koji:
> 
>     http://koji.fedoraproject.org/koji/buildinfo?buildID=258790

Tested with 2.6.40.3-0.fc15 and it still fails.
Comment 19 Jeff Layton 2011-08-25 06:24:17 EDT
Could you test with the kernel here and let me know if it fails?

    https://koji.fedoraproject.org/koji/taskinfo?taskID=3298071
Comment 20 Vlad 2011-08-25 09:45:48 EDT
Jeff:

One hour of runtime, so far so good. Before, it would crash within seconds.

-- vlad
Comment 21 Chuck Ebbert 2011-08-29 17:33:52 EDT
Fixes are in 2.6.40.4-3
Comment 22 Chuck Ebbert 2011-08-29 17:35:47 EDT
*** Bug 731278 has been marked as a duplicate of this bug. ***
Comment 23 TR Bentley 2011-08-30 12:15:33 EDT
I can confirm that 2 days of testing have been successful and the original bug is fixed.


[16:54][timali@tigger3] bug-fixes $map_drives
umount: /mnt/tigger5/pictures: not mounted
umount: /mnt/tigger5/backups: not mounted
umount: /mnt/tigger5/share: not mounted
umount: /mnt/tigger5/music: not mounted
umount: /mnt/tigger5/sarah: not mounted
umount: /mnt/tigger5/peter: not mounted
[16:54][timali@tigger3] bug-fixes $uname -s
Linux
[17:14][timali@tigger3] bug-fixes $uname -r
2.6.40.3-1.cifs.1.fc15.x86_64
Comment 24 Fedora Update System 2011-09-01 07:06:53 EDT
kernel-2.6.40.4-5.fc15 has been submitted as an update for Fedora 15.
https://admin.fedoraproject.org/updates/kernel-2.6.40.4-5.fc15
Comment 25 Fedora Update System 2011-09-06 20:01:02 EDT
kernel-2.6.40.4-5.fc15 has been pushed to the Fedora 15 stable repository.  If problems still persist, please make note of it in this bug report.
Comment 26 gene smith 2011-09-08 00:57:48 EDT
(In reply to comment #25)
> kernel-2.6.40.4-5.fc15 has been pushed to the Fedora 15 stable repository.  If
> problems still persist, please make note of it in this bug report.

I see an immediate kernel panic with this. The previous kernel, 2.6.40-3.0.fc15.x86_64, is OK.
Here is an abbreviated version of the dump (nothing in the logs, so I wrote it down):

----
Kernel Panic - not syncing: VFS Unable to mount root fs or unknown block(0,0)
Pid: 1 comm: swapper Not tainted 2.6.40.4-5.fc15.x86_64 #1
Call trace:
Panic
mount block root
mount root
prepare namespace
? release_tgcred
kernel init
? sched_tail
kernel thread helper
? sfmt kernel
? gs change
------

Note: When I did the yum update that installed the new kernel, yum hung at the end of cleanup for a long time. Finally I pressed Ctrl-C to get back to the prompt. Also, I ran lsinitrd on the new initramfs for the new kernel and it shows strange errors:
gzip: initramfs-2.6.40.4-5.fc15.x86_64.img: unexpected end of file

gzip: initramfs-2.6.40.4-5.fc15.x86_64.img: unexpected end of file
cpio: premature end of file

Other kernels still in /boot don't show this error from lsinitrd.

I rebuilt it with dracut --force and see the same results. Possibly the RAM image file is corrupt?
System: HP pavilion dv7-6195 laptop (i7/sandybridge)
Comment 27 Jeff Layton 2011-09-08 06:53:51 EDT
That sounds like a different problem entirely, unrelated to the issue here. I'd recommend opening a new bug for that.
Comment 28 gene smith 2011-09-14 00:43:41 EDT
(In reply to comment #27)
> That sounds like a different problem entirely, unrelated to the issue here. I'd
> recommend opening a new bug for that.

Just for the record, my problem was apparently user-induced: I killed yum update before the new kernel was completely installed. Removing the partially installed kernel and running yum update again fixed the problem.
