Trying to mount a CIFS share on Rawhide - 3.0-0.rc4.git3.1.fc16.x86_64 - hangs the entire system (monitor displays part of the traceback, and system becomes entirely unresponsive, can only restart via the reset switch). Same share worked fine with F15. Trace:

Jun 27 13:12:22 adam kernel: [ 1026.522051] FS-Cache: Loaded
Jun 27 13:12:22 adam kernel: [ 1026.523794] FS-Cache: Netfs 'cifs' registered for caching
Jun 27 13:12:22 adam kernel: [ 1026.536690] CIFS VFS: default security mechanism requested. The default security mechanism will be upgraded from ntlm to ntlmv2 in kernel release 3.1
Jun 27 13:12:22 adam kernel: [ 1026.620900] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
Jun 27 13:12:22 adam kernel: [ 1026.620934] IP: [<ffffffffa04e6280>] cifs_get_tcp_session+0x62/0x5e0 [cifs]
Jun 27 13:12:22 adam kernel: [ 1026.620958] PGD 416610067 PUD 419ccc067 PMD 0
Jun 27 13:12:22 adam kernel: [ 1026.620975] Oops: 0000 [#1] SMP
Jun 27 13:12:22 adam kernel: [ 1026.620987] CPU 4
Jun 27 13:12:22 adam kernel: [ 1026.620993] Modules linked in: des_generic md4 nls_utf8 cifs fscache tcp_lp fuse ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat xt_CHECKSUM iptable_mangle tun bridge stp llc ppdev parport_pc lp parport sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack coretemp snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_usb_audio snd_seq uvcvideo snd_hwdep eeepc_wmi videodev asus_wmi media snd_usbmidi_lib btusb bluetooth snd_rawmidi sparse_keymap snd_seq_device snd_pcm v4l2_compat_ioctl32 rfkill snd_timer snd iTCO_wdt soundcore r8169 mii microcode i2c_i801 snd_page_alloc shpchp xhci_hcd iTCO_vendor_support e1000e virtio_net kvm uinput firewire_ohci firewire_core crc_itu_t usb_storage uas nouveau ttm drm_kms_helper drm i2c_algo_bit i2c_core mxm_wmi wmi video [last unloaded: scsi_wait_scan]
Jun 27 13:12:22 adam kernel: [ 1026.621318]
Jun 27 13:12:22 adam kernel: [ 1026.621325] Pid: 19141, comm: mount.cifs Tainted: G W 3.0-0.rc4.git3.1.fc16.x86_64 #1 System manufacturer System Product Name/P8P67 DELUXE
Jun 27 13:12:22 adam kernel: [ 1026.621352] RIP: 0010:[<ffffffffa04e6280>] [<ffffffffa04e6280>] cifs_get_tcp_session+0x62/0x5e0 [cifs]
Jun 27 13:12:22 adam kernel: [ 1026.621375] RSP: 0018:ffff8804162a9bd8 EFLAGS: 00010246
Jun 27 13:12:22 adam kernel: [ 1026.621385] RAX: 0000000000000000 RBX: ffff88044171d918 RCX: 0000000000000000
Jun 27 13:12:22 adam kernel: [ 1026.621397] RDX: ffff8804162a9be0 RSI: 000000000000013d RDI: ffff8804162a9c60
Jun 27 13:12:22 adam kernel: [ 1026.621409] RBP: ffff8804162a9c98 R08: 0000000000000002 R09: 0000000000000000
Jun 27 13:12:22 adam kernel: [ 1026.621420] R10: 0000000000000000 R11: ffffea000edeb1c0 R12: 0000000000000000
Jun 27 13:12:22 adam kernel: [ 1026.621432] R13: ffff88042037ca88 R14: ffff880443876910 R15: 0000000000000005
Jun 27 13:12:22 adam kernel: [ 1026.621444] FS: 00007f011310e740(0000) GS:ffff88045ee00000(0000) knlGS:0000000000000000
Jun 27 13:12:22 adam kernel: [ 1026.621459] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Jun 27 13:12:22 adam kernel: [ 1026.621469] CR2: 0000000000000020 CR3: 0000000419fbf000 CR4: 00000000000406e0
Jun 27 13:12:22 adam kernel: [ 1026.621481] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jun 27 13:12:22 adam kernel: [ 1026.621492] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jun 27 13:12:22 adam kernel: [ 1026.621505] Process mount.cifs (pid: 19141, threadinfo ffff8804162a8000, task ffff880406500000)
Jun 27 13:12:22 adam kernel: [ 1026.621517] Stack:
Jun 27 13:12:22 adam kernel: [ 1026.621523] 000000000000cc50 0000000000000000 0000000000000000 0000000000000000
Jun 27 13:12:22 adam kernel: [ 1026.621545] 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Jun 27 13:12:22 adam kernel: [ 1026.621566] 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Jun 27 13:12:22 adam kernel: [ 1026.621586] Call Trace:
Jun 27 13:12:22 adam kernel: [ 1026.621601] [<ffffffffa04eadf5>] cifs_mount+0xe1/0x4de [cifs]
Jun 27 13:12:22 adam kernel: [ 1026.621615] [<ffffffffa04dbf64>] cifs_do_mount+0x1ab/0x364 [cifs]
Jun 27 13:12:22 adam kernel: [ 1026.621631] [<ffffffff81211929>] ? selinux_sb_copy_data+0x192/0x1ab
Jun 27 13:12:22 adam kernel: [ 1026.621646] [<ffffffff8113a84c>] mount_fs+0x69/0x155
Jun 27 13:12:22 adam kernel: [ 1026.621658] [<ffffffff81103f9c>] ? __alloc_percpu+0x10/0x12
Jun 27 13:12:22 adam kernel: [ 1026.621671] [<ffffffff8114f90a>] vfs_kern_mount+0x63/0xa0
Jun 27 13:12:22 adam kernel: [ 1026.621683] [<ffffffff811505de>] do_kern_mount+0x4d/0xdf
Jun 27 13:12:22 adam kernel: [ 1026.621695] [<ffffffff81151c74>] do_mount+0x63c/0x69f
Jun 27 13:12:22 adam kernel: [ 1026.621706] [<ffffffff81151f58>] sys_mount+0x88/0xc2
Jun 27 13:12:22 adam kernel: [ 1026.621718] [<ffffffff814f9e82>] system_call_fastpath+0x16/0x1b
Jun 27 13:12:22 adam kernel: [ 1026.621728] Code: 48 ff ff ff 49 89 fc 48 89 d7 f3 ab 74 1d 49 8b 4c 24 20 49 8b 54 24 18 48 c7 c6 c1 76 50 a0 48 c7 c7 29 79 50 a0 e8 89 2b 00 e1
Jun 27 13:12:22 adam kernel: [ 1026.621924] RIP [<ffffffffa04e6280>] cifs_get_tcp_session+0x62/0x5e0 [cifs]
Jun 27 13:12:22 adam kernel: [ 1026.621943] RSP <ffff8804162a9bd8>
Jun 27 13:12:22 adam kernel: [ 1026.621951] CR2: 0000000000000020
Jun 27 13:12:22 adam kernel: [ 1026.644600] ---[ end trace 33b5bdcde362acf5 ]---
I hoped that one of the many cifs fixes in rc5 would fix this, but no joy: it still crashes instantly with 3.0-0.rc5.git0.1.fc16.x86_64.
cc'ing Steve F. as I'm on vacation this week and won't have time to dig into it until next week... What might be helpful is starting with some details of what mount options you're using and what server you're mounting. For bonus points, if you could follow the directions here to get a listing of the crash site, that would be very helpful: http://wiki.samba.org/index.php/LinuxCIFS_troubleshooting#Oopses
Thanks. At first I had a line in /etc/fstab:

//192.168.1.13/Volume_1 /share/data cifs rsize=8192,wsize=8192,nosuid,soft,user=guest,noauto,comment=systemd.automount 0 0

but after I removed that, simply mounting it manually with:

mount.cifs //192.168.1.13/Volume_1 /share/data

is enough to cause the crash, which happens instantly - like, I hit enter, and boom, I see the console with the trace on it.

The server is a D-Link DNS-323 - http://www.dlink.com/products/?pid=509 - running stock, up-to-date firmware. I just logged into it via telnet, and it appears to be running:

/ # smbd -V
Version 3.0.24

I still have one system on Fedora 15, which is able to mount and use the exact same share just fine, using the same fstab line. That's running 2.6.38.8-32.fc15.x86_64.

I'll try for the bonus points in a minute :)
OK, for the bonus points:

(gdb) list *(cifs_get_tcp_session+0x62)
0xb211 is in cifs_get_tcp_session (fs/cifs/connect.c:1695).
1690
1691    memset(&addr, 0, sizeof(struct sockaddr_storage));
1692
1693    cFYI(1, "UNC: %s ip: %s", volume_info->UNC, volume_info->UNCip);
1694
1695    if (volume_info->UNCip && volume_info->UNC) {
1696        rc = cifs_fill_sockaddr((struct sockaddr *)&addr,
1697                volume_info->UNCip,
1698                strlen(volume_info->UNCip),
1699                volume_info->port);

I wasn't sure if 0x5e0 mattered, so I did that too:

(gdb) list *(cifs_get_tcp_session+0x5e0)
0xb78f is in cifs_reconnect (fs/cifs/connect.c:79).
74     * reconnect tcp session
75     * wake up waiters on reconnection? - (not needed currently)
76     */
77    static int
78    cifs_reconnect(struct TCP_Server_Info *server)
79    {
80        int rc = 0;
81        struct list_head *tmp, *tmp2;
82        struct cifs_ses *ses;
83        struct cifs_tcon *tcon;

Hope that helps.
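On the 0x5e0 question: the oops prints code locations as symbol+offset/size, so in cifs_get_tcp_session+0x62/0x5e0 the 0x5e0 is the length of the function, not a second offset. Using the gdb addresses above, the function starts at 0xb211 - 0x62 = 0xb1af, and 0xb1af + 0x5e0 = 0xb78f is the first byte past its end, which is consistent with gdb landing at the opening brace of cifs_reconnect (most likely the next function in the module). A small sketch of that arithmetic, with hypothetical helper names (not a kernel or gdb API):

```c
#include <assert.h>
#include <stdio.h>

/* Sketch only: one "symbol+offset/size" location as printed in a kernel
 * oops, e.g. "cifs_get_tcp_session+0x62/0x5e0". */
struct ksym_loc {
	char name[64];
	unsigned long offset;	/* bytes from the start of the function */
	unsigned long size;	/* total length of the function in bytes */
};

/* Returns 1 if S parses as symbol+offset/size, 0 otherwise.
 * %lx accepts an optional "0x" prefix, so "0x62" parses as 0x62. */
static int parse_ksym_loc(const char *s, struct ksym_loc *loc)
{
	assert(s != NULL && loc != NULL);
	return sscanf(s, "%63[^+]+%lx/%lx",
		      loc->name, &loc->offset, &loc->size) == 3;
}

/* True if ADDR lies in [start, start + size), i.e. inside the function
 * that begins at START and is SIZE bytes long. */
static int within_function(unsigned long start, unsigned long size,
			   unsigned long addr)
{
	return addr >= start && addr - start < size;
}
```

So an offset equal to the size (as in list *(cifs_get_tcp_session+0x5e0)) always falls outside the function, and only the +0x62 listing points at the crash site.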
Hmmm...just to be sure, you did do the analysis on the same kernel on which you saw the oops, right? If you didn't, the offsets might not match and you'll end up in the wrong place.

In any case...it looks like it crashed here on a NULL pointer dereference:

if (volume_info->UNCip && volume_info->UNC) {

...which would imply that volume_info was NULL, but that's *really* odd, as I don't see any way that that could occur. In any case, I'll try to recreate this when I get the chance...
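The register dump fits that reading: CR2 (the faulting address) is 0000000000000020 and R12 is 0, and the Code: bytes appear to copy the first argument into R12 (49 89 fc) and later load from 0x20(%r12) (49 8b 4c 24 20). If volume_info were NULL, reading a pointer member stored at offset 0x20 would fault at exactly 0x20. The sketch below uses a hypothetical struct (member names and layout are assumptions, not the real fs/cifs definition) to show how offsetof turns a NULL base into that address, computed without dereferencing NULL:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* This sketch assumes 8-byte pointers, as on the x86_64 box above. */
static_assert(sizeof(char *) == 8, "sketch assumes LP64 (8-byte pointers)");

/* HYPOTHETICAL layout for illustration; not the real volume-info struct
 * from fs/cifs. Five pointer-sized members put UNCip at offset 0x20. */
struct vol_sketch {
	char *username;		/* offset 0x00 */
	char *password;		/* offset 0x08 */
	char *domainname;	/* offset 0x10 */
	char *UNC;		/* offset 0x18 */
	char *UNCip;		/* offset 0x20 */
};

/* Address a load of vol->UNCip would touch, computed with integer
 * arithmetic so that nothing is actually dereferenced. */
static uintptr_t uncip_load_address(uintptr_t vol_base)
{
	return vol_base + offsetof(struct vol_sketch, UNCip);
}
```

With vol_base == 0 this yields 0x20, the same value as CR2 in the oops; a small CR2 like this is the usual signature of a member access through a NULL struct pointer.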
I wondered that too, but yes, I checked: they match.

--
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers
(In reply to comment #3)
> but after I removed that, simply mounting it manually with:
>
> mount.cifs //192.168.1.13/Volume_1 /share/data
>
> is enough to cause the crash, which happens instantly - like, I hit enter, and
> boom, I see the console with the trace on it.

Strange. I tried to reproduce this with a similar set of mount options, but no luck. Can you paste in the oops from the more recent kernel? I'd like to verify that it looks similar.

Also, if you're able, could you strace the above mount.cifs command? Something like:

# strace -o /tmp/mount.cifs.strace -f -v -s 256 mount.cifs //192...

...and then attach mount.cifs.strace here? That should show me exactly what mount options are getting passed to the kernel.
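When reading the strace output, the interesting part is the option string handed to mount(2). Assuming it is a comma-separated list of key=value pairs and bare flags, like the fstab options earlier in this bug, a minimal lookup helper might look like this (find_mount_opt is a hypothetical name, not part of cifs-utils):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Look up KEY in a comma-separated mount option string such as
 * "rsize=8192,wsize=8192,nosuid,soft,user=guest". On a match, copies
 * the value (empty for bare flags like "nosuid") into OUT and returns 1;
 * returns 0 if KEY is absent or OUT is too small. */
static int find_mount_opt(const char *opts, const char *key,
			  char *out, size_t outlen)
{
	assert(opts != NULL && key != NULL);
	size_t keylen = strlen(key);
	const char *p = opts;

	while (*p != '\0') {
		const char *end = strchr(p, ',');
		size_t len = end ? (size_t)(end - p) : strlen(p);

		/* Match "key" exactly or "key=...", not a longer key. */
		if (len >= keylen && strncmp(p, key, keylen) == 0 &&
		    (len == keylen || p[keylen] == '=')) {
			const char *val = (len == keylen) ? p + len
							  : p + keylen + 1;
			size_t vlen = (size_t)(p + len - val);

			if (vlen + 1 > outlen)
				return 0;
			memcpy(out, val, vlen);
			out[vlen] = '\0';
			return 1;
		}
		if (end == NULL)
			break;
		p = end + 1;
	}
	return 0;
}
```

Matching on "key=" rather than a bare prefix keeps "user" from matching a hypothetical "username=..." entry.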
okay, here's the strace output, and the oops, from 3.0-0.rc6.git0.1.fc16.x86_64. The behaviour seems a bit different now (or maybe it was like this before and somehow I missed it): the system doesn't crash. Immediately upon hitting 'enter' I see a console with the trace on it, as before, but ctrl-alt-f2 gets me to a console, ctrl-alt-f1 gets me back to the desktop, and everything still seems to be working (except that the mount didn't happen, obviously).
Created attachment 511371
strace output
Created attachment 511372
trace from 3.0-0.rc6.git0.1.fc16.x86_64
I just checked again and we're still at the same lines of code with that trace.
one thing I forgot to mention is that it prompts for a password when run from console; I usually just hit enter. But I just tried entering a password - 'pass' - and it failed again. So I don't think the empty password is significant.
Ok, I wonder if you're hitting a DFS referral here. I see one bit of suspect code in cifs_mount that might be causing this:

    if (referral_walks_count) {
        if (tcon)
            cifs_put_tcon(tcon);
        else if (pSesInfo)
            cifs_put_smb_ses(pSesInfo);

        cifs_cleanup_volume_info(&volume_info);
        FreeXid(xid);
    }

...the problem here though is that cifs_cleanup_volume_info will zero out the volume_info pointer and then the subsequent call to cifs_get_tcp_session will oops like this. Let me look over the code a bit more and I'll see if I can get you a test patch.
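The failure mode described here is a cleanup helper that takes a pointer-to-pointer, frees the object and NULLs the caller's pointer, after which the caller keeps using it. A heavily simplified sketch of that pattern follows; every type and function in it is hypothetical, not the real fs/cifs code, and the session stand-in returns -EFAULT where the kernel would oops:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* HYPOTHETICAL stand-ins: simplified sketches, not the real fs/cifs
 * types or functions. */
struct vol_sketch {
	const char *UNC;
	const char *UNCip;
};

static struct vol_sketch *new_vol_sketch(const char *unc, const char *ip)
{
	struct vol_sketch *v = calloc(1, sizeof(*v));

	if (v != NULL) {
		v->UNC = unc;
		v->UNCip = ip;
	}
	return v;
}

/* Like the cleanup described above: frees the volume info and zeroes
 * the caller's pointer through a pointer-to-pointer. */
static void cleanup_volume_info_sketch(struct vol_sketch **pvol)
{
	assert(pvol != NULL);
	free(*pvol);
	*pvol = NULL;
}

/* Stand-in for the session setup that reads vol->UNCip and vol->UNC.
 * The kernel code has no NULL check and oopses; here we return -EFAULT
 * so the failure is observable. */
static int get_tcp_session_sketch(const struct vol_sketch *vol)
{
	if (vol == NULL)
		return -EFAULT;
	return (vol->UNCip && vol->UNC) ? 0 : -EINVAL;
}

/* One pass of a referral-chasing mount loop. BUGGY reproduces the
 * problem: clean up on a referral walk, then keep using the pointer. */
static int mount_pass_sketch(struct vol_sketch **pvol,
			     int referral_walks_count, bool buggy)
{
	if (referral_walks_count && buggy)
		cleanup_volume_info_sketch(pvol);
	return get_tcp_session_sketch(*pvol);
}
```

With buggy set to false the pass simply skips the cleanup, echoing the title of the first test patch ("remove bogus call to cifs_cleanup_volume_info"); the later patchset in this bug also fixes other referral-chasing regressions that this sketch does not model.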
Created attachment 511393
patch -- remove bogus call to cifs_cleanup_volume_info

Adam, could you test this patch and let me know if it fixes the issue?
Ok, I figured out how to reproduce the panic -- just needed to have the client chase a DFS referral at mount time. I'll test the patch out now...
Created attachment 511491
patch -- fix several regressions when chasing DFS referrals at mount time

If you haven't yet built a kernel with the other patch, it might be good to test this one instead. It incorporates the earlier patch but also fixes a number of other regressions I uncovered while investigating this. This patchset has been sent upstream and I'm awaiting comment from the cifs maintainer. With luck, it should make 3.0.
Adam tested these out and they fixed the issue for him. I've posted the patches upstream and they should (hopefully) make 3.0, assuming that Steve F pushes them in time.
did the patches make rc6?
They made -rc7