Bug 717060
Summary: | Trying to mount a CIFS share crashes the entire system | | |
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Adam Williamson <awilliam> |
Component: | kernel | Assignee: | Jeff Layton <jlayton> |
Status: | CLOSED UPSTREAM | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | urgent | Priority: | unspecified |
Version: | rawhide | CC: | aquini, gansalmon, itamar, jlayton, jonathan, kernel-maint, madhu.chinakonda, smfltc, smfrench, steved |
Hardware: | Unspecified | OS: | Unspecified |
Doc Type: | Bug Fix | Last Closed: | 2011-07-08 10:59:32 UTC |
Description
Adam Williamson
2011-06-27 20:19:44 UTC
I hoped that one of the big list of cifs fixes in rc5 would fix this, but no joy: still crashes instantly with 3.0-0.rc5.git0.1.fc16.x86_64.

cc'ing Steve F. as I'm on vacation this week and won't have time to dig into it until next week...

What might be helpful is starting with some details of what mount options you're using and what server you're mounting. For bonus points, if you could follow the directions here to get a listing of the crash site, that would be very helpful:

http://wiki.samba.org/index.php/LinuxCIFS_troubleshooting#Oopses

Thanks.

At first I had a line in /etc/fstab:

    //192.168.1.13/Volume_1 /share/data cifs rsize=8192,wsize=8192,nosuid,soft,user=guest,noauto,comment=systemd.automount 0 0

but after I removed that, simply mounting it manually with:

    mount.cifs //192.168.1.13/Volume_1 /share/data

is enough to cause the crash, which happens instantly - like, I hit enter, and boom, I see the console with the trace on it.

The server is a D-Link DNS-323 - http://www.dlink.com/products/?pid=509 - running stock, up-to-date firmware. I just logged into it via telnet, and it appears to be running:

    / # smbd -V
    Version 3.0.24

I still have one system on Fedora 15, which is able to mount and use the exact same share just fine, using the same fstab line. That's running 2.6.38.8-32.fc15.x86_64.

I'll try for the bonus points in a minute :)

OK, for the bonus points:

    (gdb) list *(cifs_get_tcp_session+0x62)
    0xb211 is in cifs_get_tcp_session (fs/cifs/connect.c:1695).
    1690
    1691        memset(&addr, 0, sizeof(struct sockaddr_storage));
    1692
    1693        cFYI(1, "UNC: %s ip: %s", volume_info->UNC, volume_info->UNCip);
    1694
    1695        if (volume_info->UNCip && volume_info->UNC) {
    1696                rc = cifs_fill_sockaddr((struct sockaddr *)&addr,
    1697                                        volume_info->UNCip,
    1698                                        strlen(volume_info->UNCip),
    1699                                        volume_info->port);

I wasn't sure if 0x5e0 mattered, so I did that too:

    (gdb) list *(cifs_get_tcp_session+0x5e0)
    0xb78f is in cifs_reconnect (fs/cifs/connect.c:79).
    74       * reconnect tcp session
    75       * wake up waiters on reconnection? - (not needed currently)
    76       */
    77      static int
    78      cifs_reconnect(struct TCP_Server_Info *server)
    79      {
    80              int rc = 0;
    81              struct list_head *tmp, *tmp2;
    82              struct cifs_ses *ses;
    83              struct cifs_tcon *tcon;

Hope that helps.

Hmmm...just to be sure, you did do the analysis on the same kernel where you saw the oops, right? If you don't, the offsets might not match and you'll end up in the wrong place.

In any case...it looks like it crashed here on a NULL pointer dereference:

    if (volume_info->UNCip && volume_info->UNC) {

...which would imply that volume_info was NULL, but that's *really* odd as I don't see any way that that could occur. In any case, I'll try to recreate this when I get the chance...

I wondered that too, but yes, I checked, they match.

--
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers

(In reply to comment #3)
> but after I removed that, simply mounting it manually with:
>
> mount.cifs //192.168.1.13/Volume_1 /share/data
>
> is enough to cause the crash, which happens instantly - like, I hit enter, and
> boom, I see the console with the trace on it.
>

Strange. I tried to reproduce this with a similar set of mount options, but no luck. Can you paste in the oops from the more recent kernel? I'd like to verify that it looks similar.

Also, if you're able, could you strace the above mount.cifs command? Something like

    # strace -o /tmp/mount.cifs.strace -f -v -s 256 mount.cifs //192...

...and then attach mount.cifs.strace here? That should show me exactly what mount options are getting passed to the kernel.

okay, here's the strace output, and the oops, from 3.0-0.rc6.git0.1.fc16.x86_64. The behaviour seems a bit different now (or maybe it was like this before and somehow I missed it) - the system doesn't crash.
Immediately upon hitting 'enter' I see a console with the trace on it, as before, but ctrl-alt-f2 gets me to a console, and ctrl-alt-f1 gets me back to the desktop, and everything still seems to be working (except the mount didn't happen, obviously).

Created attachment 511371 [details]
strace output
Created attachment 511372 [details]
trace from 3.0-0.rc6.git0.1.fc16.x86_64
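
A note on the earlier +0x5e0 lookup, for anyone reading these traces later. Assuming the trace entry has the usual symbol+offset/size form (here that would be cifs_get_tcp_session+0x62/0x5e0), only the first number locates the crash; the second is the length of the function. That is why listing +0x5e0 landed at the start of the next function, cifs_reconnect: 0xb211 - 0x62 = 0xb1af is the start of cifs_get_tcp_session, and 0xb1af + 0x5e0 = 0xb78f is the address gdb reported for cifs_reconnect. Below is a minimal user-space sketch of that arithmetic in C. The struct and the function names (oops_loc, parse_oops_loc, func_start, func_end) are hypothetical and made up for illustration; they are not kernel or gdb APIs.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical holder for one "symbol+0xOFFSET/0xSIZE" trace entry. */
struct oops_loc {
    char symbol[64];
    unsigned long offset; /* bytes from function start to the faulting instruction */
    unsigned long size;   /* total length of the function in bytes */
};

/* Parse e.g. "cifs_get_tcp_session+0x62/0x5e0". Returns 0 on success, -1 otherwise. */
int parse_oops_loc(const char *text, struct oops_loc *loc)
{
    assert(text != NULL && loc != NULL);
    /* %63[^+] stops at the '+'; %lx accepts an optional "0x" prefix. */
    if (sscanf(text, "%63[^+]+%lx/%lx", loc->symbol, &loc->offset, &loc->size) != 3)
        return -1;
    return 0;
}

/* Given the address gdb printed for the faulting line, recover the function start. */
unsigned long func_start(unsigned long fault_addr, const struct oops_loc *loc)
{
    return fault_addr - loc->offset;
}

/* First address past the function, i.e. where the next symbol may begin. */
unsigned long func_end(unsigned long fault_addr, const struct oops_loc *loc)
{
    return func_start(fault_addr, loc) + loc->size;
}
```

Feeding it the entry above together with the 0xb211 address from the first gdb listing gives a function end of 0xb78f, so only the +0x62 listing is relevant to the crash site.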
I just checked again and we're still at the same lines of code with that trace.

one thing I forgot to mention is that it prompts for a password when run from console; I usually just hit enter. But I just tried entering a password - 'pass' - and it failed again. So I don't think the empty password is significant.

Ok, I wonder if you're hitting a DFS referral here. I see one bit of suspect code in cifs_mount that might be causing this:

    if (referral_walks_count) {
            if (tcon)
                    cifs_put_tcon(tcon);
            else if (pSesInfo)
                    cifs_put_smb_ses(pSesInfo);

            cifs_cleanup_volume_info(&volume_info);
            FreeXid(xid);
    }

...the problem here though is that cifs_cleanup_volume_info will zero out the volume_info pointer and then the subsequent call to cifs_get_tcp_session will oops like this. Let me look over the code a bit more and I'll see if I can get you a test patch.

Created attachment 511393 [details]
patch -- remove bogus call to cifs_cleanup_volume_info
Adam, could you test this patch and let me know if it fixes the issue?
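
For readers following along, the failure mode described above (a cleanup helper that takes a pointer-to-pointer, frees the object and NULLs the caller's variable, followed by a retry pass that still uses that variable) can be sketched in user-space C. This is only an illustration under stated assumptions: vol_info, vol_alloc, vol_cleanup, vol_connect and mount_with_referral are hypothetical stand-ins, not the cifs code and not the attached patch, and the sketch returns -1 at the point where the kernel would dereference NULL and oops.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the mount-time volume description. */
struct vol_info {
    const char *unc;    /* share path; not owned by the struct */
    const char *unc_ip; /* server address; not owned by the struct */
};

struct vol_info *vol_alloc(const char *unc, const char *ip)
{
    struct vol_info *v = calloc(1, sizeof(*v));
    if (v) {
        v->unc = unc;
        v->unc_ip = ip;
    }
    return v;
}

/* Frees *pvol and zeroes the caller's pointer, like the helper discussed above. */
void vol_cleanup(struct vol_info **pvol)
{
    assert(pvol != NULL);
    free(*pvol);
    *pvol = NULL;
}

/* The kernel dereferences the pointer unconditionally; here we report -1 instead. */
int vol_connect(const struct vol_info *vol)
{
    if (vol == NULL)
        return -1; /* this is where the NULL dereference would happen */
    return (vol->unc_ip && vol->unc) ? 0 : -1;
}

/* One simulated referral retry. cleanup_on_retry != 0 mimics the suspect path. */
int mount_with_referral(int cleanup_on_retry)
{
    struct vol_info *vol = vol_alloc("//server/share", "192.0.2.1");
    int rc;

    if (vol == NULL)
        return -1;
    if (cleanup_on_retry)
        vol_cleanup(&vol); /* vol is NULL from here on */
    rc = vol_connect(vol); /* the retry pass still expects a valid vol */
    vol_cleanup(&vol);     /* safe either way: free(NULL) is a no-op */
    return rc;
}
```

With these stand-ins, mount_with_referral(0) returns 0 and mount_with_referral(1) returns -1, which matches the observation that the crash only shows up when a referral is chased at mount time.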
Ok, I figured out how to reproduce the panic -- just needed to have the client chase a DFS referral at mount time. I'll test the patch out now...

Created attachment 511491 [details]
patch -- fix several regressions when chasing DFS referrals at mount time
If you haven't yet built a kernel with the other patch, it might be good to test this one instead. It incorporates the earlier patch but also fixes a number of other regressions I uncovered while investigating this.
This patchset has been sent upstream and I'm awaiting comment from the cifs maintainer. With luck, it should make 3.0.
Adam tested these out and they fixed the issue for him. I've posted the patches upstream and they should (hopefully) make 3.0, assuming that Steve F pushes them in time.

did the patches make rc6?

They made -rc7