Bug 247204
Summary: | [PATCH] cifs / smbfs, mount error 20, NAS device only supports smb | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Kai Engert (:kaie) (inactive account) <kengert> |
Component: | kernel | Assignee: | Jeff Layton <jlayton> |
Status: | CLOSED WORKSFORME | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | low | Docs Contact: | |
Priority: | low | ||
Version: | 7 | CC: | abartlet, chris.brown, esandeen, smfrench, steved, triage |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Fixed In Version: | Doc Type: | Bug Fix | |
Last Closed: | 2008-06-17 01:48:37 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Attachments: |
Description
Kai Engert (:kaie) (inactive account)
2007-07-05 23:02:00 UTC
Similar is bug 190006

The long term aim is for the cifs kernel modules to be a replacement for smbfs. Steve (CC'ed) should be able to clarify the status.

BTW, what does smbclient -L show as the server name (often Samba 1.9, if smbfs is having trouble)? A network trace (pcap format) might also show what it claims to support, and help chase this down.

[kaie@kaiez1:~]$ smbclient -L '//192.168.2.73'
Password:
Domain=[ȇ] OS=[] Server=[�]

        Sharename       Type      Comment
        ---------       ----      -------
        PUBLIC          Disk
        IPC$            IPC
Domain=[ȇ] OS=[] Server=[�]

        Server               Comment
        ---------            -------

        Workgroup            Master
        ---------            -------

(yes, smbclient reports one "special/weird" character for Domain and Server).

(In reply to comment #2)
> A network trace (pcap format) might also show what it claims to support, and
> help chase this down.

Can you please recommend what tool / commandline I should use to produce the dump you require?

tcpdump -w /tmp/smblog host 192.168.2.73

strings smblog
D DBDJDCCODBDGDICODCCODHDDCACACACA ELE SMBr SMBr SMBs SMBs SMBu SMBu SMB% SMB% SMB% SMB% SMB% SMB% SMBq SMBq
D DBDJDCCODBDGDICODCCODHDDCACACACA ELE SMBr SMBr SMBs SMBs SMBu SMBu SMB% SMB% SMB% SMB% SMBq SMBq

Possible problem with access control for that user on the root of the share. Perhaps the user is authenticating with a different default user id in the two different cases. In any case an ethereal/tcpdump/wireshark binary trace is more helpful for us to read/debug. See http://wiki.samba.org/index.php/Capture_Packets

Alternatively turn on cifs debugging:
"dmesg -c" (to clear the error log)
"echo 7 > /proc/fs/cifs/cifsFYI"
attempt the mount
then send or attach the dmesg output

on a fedora 7 machine:

[root@intel tmp]# "echo 7 > /proc/fs/cifs/cifsFYI"
-bash: echo 7 > /proc/fs/cifs/cifsFYI: No such file or directory

(In reply to comment #6)
> on a fedora 7 machine:
>
> [root@intel tmp]# "echo 7 > /proc/fs/cifs/cifsFYI"
> -bash: echo 7 > /proc/fs/cifs/cifsFYI: No such file or directory

Argh, I'm dumb.
I removed the " chars then it's ok ;-)

Created attachment 158793 [details]
tcpdump output from mount attempt
Created attachment 158794 [details]
dmesg output from mount attempt
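The debugging steps requested above can be collected into a couple of small shell helpers. This is a minimal sketch, assuming a root shell on a kernel that exposes /proc/fs/cifs/cifsFYI; the `CIFS_FYI` variable, the function names, the mount point and the log path are illustrative and do not come from this report. It also shows why the quoted form failed: the redirection has to stay outside the quotes so that the shell treats `>` as an operator.

```shell
#!/bin/sh
# Sketch only: CIFS_FYI, the function names and the paths in the usage notes
# are illustrative. The default is the proc file named in this report; it can
# be overridden to try the helpers without root.
: "${CIFS_FYI:=/proc/fs/cifs/cifsFYI}"

# Turn cifs debug logging on. The redirection stays outside any quotes:
# a fully quoted "echo 7 > file" is looked up as one command name and fails.
cifs_debug_on() {
    echo 7 > "$CIFS_FYI"
}

# Turn cifs debug logging off again.
cifs_debug_off() {
    echo 0 > "$CIFS_FYI"
}

# Usage (as root; the mount point and log file are placeholders):
#   dmesg -c > /dev/null            # clear the kernel log
#   cifs_debug_on
#   mount -t cifs //192.168.2.73/PUBLIC /mnt/nas
#   dmesg > /tmp/cifs-debug.log     # attach this file to the bug
#   cifs_debug_off
```

Because the proc path is read when the function is called, the helpers can be pointed at an ordinary file first to check that the redirection behaves as intended.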
From Steve French:

Looks like the Linux cifs client is authenticating ok - but I don't recognize the server type. The server does not seem to report its version. The server is mishandling at least one request (probably two) but it claims to support the CIFS Unix Extensions (I wonder if it is a fork of Samba).

In any case, I am suspicious that the server is mishandling the response to frame 31 (in your trace) but the main problem seems to be the response to the SetFSUnixInfo (see frame 30) which is malformed (looks like invalid word count).

Try disabling the Unix Extensions to see if that will work around the server bug (the client will then not try this SetFSInfo request that the server seems to incorrectly respond to)

("echo 0 > /proc/fs/cifs/LinuxExtensionsEnabled" then try the mount)

(In reply to comment #10)
> Try disabling the Unix Extensions to see if that will work around the
> server bug (the client will then not try this SetFSInfo request that
> the server seems to incorrectly respond to)
>
> ("echo 0 > /proc/fs/cifs/LinuxExtensionsEnabled" then try the mount)

Wow, that helped. Thanks!
I was able to mount and access/read/write files on the device now, wonderful.

I will attach a dmesg debug logfile in case you are curious about the details.

Should we (I) report something to the hardware vendor?
(Should cifs retry without extensions by default?)

Created attachment 158978 [details]
dmesg output from working mount (after disabling unix extensions)
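The workaround that made the mount succeed can be wrapped the same way. This is a minimal sketch, assuming the proc switch named in this report; the `CIFS_UNIX_EXT` variable and the function names are illustrative. The switch is a single file rather than a per-mount setting, so it presumably applies to every cifs mount made while it is off, which is a reason to turn it back on afterwards.

```shell
#!/bin/sh
# Sketch only: CIFS_UNIX_EXT and the function names are illustrative.
# The default is the proc file named in this report.
: "${CIFS_UNIX_EXT:=/proc/fs/cifs/LinuxExtensionsEnabled}"

# Print whatever the switch currently contains.
unix_ext_status() {
    cat "$CIFS_UNIX_EXT"
}

# Stop the client from using the Unix Extensions.
unix_ext_disable() {
    echo 0 > "$CIFS_UNIX_EXT"
}

# Allow the Unix Extensions again.
unix_ext_enable() {
    echo 1 > "$CIFS_UNIX_EXT"
}

# Usage (as root; the mount point is a placeholder):
#   unix_ext_disable
#   mount -t cifs //192.168.2.73/PUBLIC /mnt/nas
#   unix_ext_enable
```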
I am moving this to the kernel as it is obviously a cifs fs issue.

Hello Kai,

I'm reviewing this bug as part of the kernel bug triage project, an attempt to isolate current bugs in the fedora kernel.

http://fedoraproject.org/wiki/KernelBugTriage

I am CC'ing myself to this bug and will try and assist you in resolving it if I can.

There hasn't been much activity on this bug for a while. Could you tell me if you are still having problems with the latest kernel?

If the problem no longer exists then please close this bug or I'll do so in a few days if there is no additional information lodged.

Cheers
Chris

Christopher, yes, this bug is still happening with latest Fedora-7 kernel-2.6.22.5-76.fc7

This bug is still waiting for an answer to my comment 11.

While a workaround exists (disable linux extensions in order to work around a bug in the server device), the open question is:

If a server behaves as described in this bug report, should the cifs code automatically retry without extensions?

I can't answer that one Kai so I'm re-assigning to the filesystem maintainer who might be able to comment better.

Cheers
Chris

Answering Kai's question in comment #11 and #15 about fallback...

First, there is a mount option (nounix) in very recent kernels that should allow you to forcibly disable unix extensions on a per-mount basis. That patch should make it into 2.6.23 (I think).

It sounds like the server is just plain misbehaving here. I'd definitely file a bug report with the vendor. You might want to include network traces and point them toward the info in this BZ case.

As to falling back if the mount fails... If the server claims to support unix extensions, I don't see why we should second guess that and retry the mount without them. Mounts can fail for many reasons and trying to automatically guess the reason why it failed can be very difficult. This is the first case I've heard of where a server claimed to support unix extensions when it actually doesn't.
If this problem were more widespread, I'd be more inclined to consider automatic workarounds for it. I suggest we close this as NOTABUG...

Jeff, thanks for your explanations.
It's good to hear that we will soon see mount options that would enable the use of such devices.

But I think we should try hard to be compatible with as many devices as possible out of the box. The average user will connect the device and see a failure. IMHO we should target those users. IIRC my device isn't the only one that behaves in such a way, see the posts I quoted.

If we claim to be compatible with smb devices, and other or older distributions can access the device out of the box, but our new code can not, then WE appear broken (even if you can argue we are not).

After all, we're talking about an extension to the original protocol. If using the extension fails, we should fall back.

FYI, this strategy is what we use with the SSL/TLS protocols in Firefox. That protocol is evolving, too, and there are many servers that don't correctly implement the new extensions. We try hard to fall back and give the users a pleasant experience without error messages.

(In reply to comment #18)
> Jeff, thanks for your explanations.
> It's good to hear that we will soon see mount options that would enable the use
> of such devices.
>
> But I think we should try hard to be compatible with as many devices as possible
> out of the box. The average user will connect the device and see a failure. IMHO
> we should target those users. IIRC my device isn't the only one that behaves in
> such a way, see the posts I quoted.
>
> If we claim to be compatible with smb devices, and other or older distributions
> can access the device out of the box, but our new code can not, then WE appear
> broken (even if you can argue we are not).
>
> After all, we're talking about an extension to the original protocol. If using
> the extension fails, we should fall back.
>
> FYI, this strategy is what we use with the SSL/TLS protocols in Firefox. That
> protocol is evolving, too, and there are many servers that don't correctly
> implement the new extensions. We try hard to fall back and give the users a
> pleasant experience without error messages.
>

Problem is, falling back might become a security problem for some users if they are counting on the extensions being enabled.

Right. There's also the issue of interpreting what the error you get back actually means. In this case, we got an ENOTDIR, but that's not necessarily a 1:1 correspondence of problem and error. I'm not convinced that we can reliably detect when the server is reporting that it supports POSIX extensions but doesn't.

There's also the chicken and egg problem. If we automatically work around this, then most people won't know that their servers are broken this way. At least with this method, users will know something is wrong...

The safest thing to me seems to be to leave things as they are. The manual workarounds should work for anyone needing them until these servers are patched.

Eric suggested trying to report a more helpful error when this occurs. That seems like it might be possible. I'll plan to give that a look and see whether we can...

There are more problems here than responding incorrectly to the "Query File Unix Basic" SMB transact2 call. Various non-Unix (original SMB/CIFS) calls have problems too - e.g.

a) SMB TreeConnect does not return the file system type (so we can not work around this whenever we see a server whose filesystem is of the same type)
b) SMB SessionSetup does not return the server type (NativeOS and Network Operating System name). It does not even return the server's domain.

The response to QueryFSDeviceInfo returns an error, which is in fact malformed/illegal (0xFFFF0002).

None of these have anything to do with Unix Extensions.
It does claim to support Unix Extensions (on negotiate protocol response capabilities) but then returns a bizarre return code 0x00040002 on the initial SetFSInfo which presumably we don't map - we could try to add this particular return code (e.g. make up ERR_BROKEN_UNIX_EXTENSIONS) to a new POSIX return code which we could look for in SetFSInfo.

We certainly should log the failed SetFSInfo - perhaps we could add more text to the error on line 4458 of fs/cifs/cifssmb.c

    cERROR(1, ("Send error in SETFSUnixInfo = %d", rc));

or add more text to line 1744 of fs/cifs/connect.c suggesting disabling unix extensions?

    if (CIFSSMBSetFSUnixInfo(xid, tcon, cap)) {
        cFYI(1, ("setting capabilities failed"));
    }

In any case, there are multiple problems with this server implementation as can be seen from the various warning messages it triggered in dmesg on the client, and these can also be seen from looking at the tcpdump trace. Not all of these problems have to do with the Unix extensions. There may be a workaround possibility based on the strange return code on SetFSInfo but the other bugs (non-Unix extensions related) should also be fixed by the server even if we work around this.

I added better debug messages for this case (instructing how to turn off Unix Extensions when this error is encountered) - could you try them out and see if they help?

http://pserver.samba.org/samba/ftp/cifs-cvs/cifs-1.50c.tar.gz

Given this is not a successful negotiation of the Unix extensions, I agree with Kai that they should not be considered negotiated in the first place. Just logging it and asking the admin to fix things seems wrong.

If the server negotiated Unix extensions, then failed to honor them, this would be a different situation.

The problem is that various servers supported (only) the original Unix extensions but not SetFSInfo (Unix SetFSInfo level) and would be indistinguishable at first from the broken server.

The server has already said:
1) it supports Unix Extensions (capability returned on Negotiate)
2) it supports some optional Unix capabilities (on the QFSInfo)

That it did not support SetFSInfo makes it look like an older (pre-"CIFS POSIX Extensions") server, like a server which simply implements the original CIFS Unix Extensions (i.e. those documented in the SNIA CIFS Technical Reference and by HP before that). We do not want to forbid using the Unix Extensions simply because SetFSInfo was not supported, that was only added in the last few years, and is not as important as the QueryFSInfo Unix (which it does support).

But it is after that that we get problems ... It is not until the first stat (QueryPathInfo Unix for the root directory of the mount) that we can see a malformed response. It seems risky to turn off a set of features just because QueryUnixInfo of "." fails. Perhaps we could always fall back on a corrupt/malformed Query Unix Info response to using the non-Unix NT style QueryPathInfo but it seems drastic to turn off other Unix Info features because of a bug in implementing one.

In any case, it is important for a bug report to be opened against the server (it seems to have other SMB problems as I noted).

Created attachment 201661 [details]
patch to retry failed Unix Query Path Info with older CIFS style equivalent
Here is a patch (on top of cifs version 1.50c referenced above) to retry
UnixQPathInfo with older style Query Path Info. Could you see if that gets you
past the worst of the problems (it will still log error messages a lot to dmesg
- we will have to make a change to only log those once - but I want to make
sure that something else is not broken too in the Unix Extensions
implementation).
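The try-then-fall-back shape discussed in this thread can also be illustrated from user space. The sketch below is not the kernel patch attached above: it is a hypothetical shell wrapper that retries a failed cifs mount once with the nounix option mentioned earlier. `MOUNT_CMD` and `mount_with_fallback` are illustrative names, and treating any mount failure as a reason to retry is a simplifying assumption, since a mount can fail for many unrelated reasons.

```shell
#!/bin/sh
# Sketch only: MOUNT_CMD and mount_with_fallback are illustrative names.
# MOUNT_CMD can be pointed at a stub to try the logic without a real server.
: "${MOUNT_CMD:=mount}"

# Try a cifs mount with default options. If it fails, retry once with
# "nounix" and report that on stderr, so that the misbehaving server and the
# loss of the Unix Extensions do not go unnoticed.
mount_with_fallback() {
    share=$1
    target=$2
    if "$MOUNT_CMD" -t cifs "$share" "$target"; then
        return 0
    fi
    echo "warning: mount of $share failed, retrying with nounix" >&2
    "$MOUNT_CMD" -t cifs -o nounix "$share" "$target"
}

# Usage (as root; the mount point is a placeholder):
#   mount_with_fallback //192.168.2.73/PUBLIC /mnt/nas
```

The warning is deliberate: as noted in the discussion, a silent fallback could surprise users who rely on the extensions being enabled.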
Can I bug people for an update on this? Is the nounix mount option working for reporters here and have the error messages been made more .. helpful? Steve - what is the latest on the patch you attached to this report?

Thanks for the reminder. I apologize for not yet having recompiled a kernel with this patch. Let me try to compile the latest Fedora 7 kernel with the attached patch now.

I downloaded cifs 1.50c from the link in comment 22.
Then I tried to apply the attached patch.
Hunk 2 applied.

There is no context that completely matches Hunk 1, because I can't find

    cERROR(1, ("Malformed FILE_UNIX_BASIC_INFO response.\n" ...

anywhere in the file.

But the second part of the context

    } else {
        __u16 data_offset = le16_to_cpu(pSMBr->t2.DataOffset);
        memcpy((char *) pFindData,

is present in the file only once, so I think I know where I must change -EIO to -EOPNOTSUPP

It confuses me that I'm getting compiler errors when trying to recompile the latest Fedora 7 kernel SRPM on a Fedora 7 system...

drivers/net/wireless/ipw2200.c:1236:23: error: bad constant expression
drivers/net/wireless/ath5k/base.c:288:38: error: marked inline, but without a definition
drivers/net/wireless/ath5k/base.c:253:36: error: marked inline, but without a definition
drivers/net/wireless/ath5k/base.c:253:36: error: marked inline, but without a definition
drivers/net/wireless/ath5k/base.c:253:36: error: marked inline, but without a definition
drivers/net/wireless/ath5k/base.c:281:35: error: marked inline, but without a definition
drivers/net/wireless/ath5k/base.c:288:38: error: marked inline, but without a definition

It seems strange, but the above compiler errors seem to be a consequence of cifs-1.50c.

I can successfully rebuild the unchanged kernel source rpm.

When I try to build a rpm that has the cifs-1.50c added as a patch, please a small change to kernel.spec to apply that patch, then I get above compiler errors...

Created attachment 291263 [details]
cifs 1.50c as a patch against fedora 7 kernel rpm
This is the patch I used, which resulted in the compiler errors mentioned in the
previous comment.
> When I try to build a rpm that has the cifs-1.50c added as a patch, please a
> small change to kernel.spec to apply that patch, then I get above compiler
> errors...
s/please a/plus a/
I would like to donate the NAS hardware and free shipping to someone who would be interested to work on this and test it.

Unfortunately I've been unable to get it working for testing, see above. I think it would be simpler if coding and testing can be done at the same time.

Who wants it? Steve?

This message is a reminder that Fedora 7 is nearing the end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 7. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '7'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 7's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 7 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug. If you are unable to change the version, please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. If possible, it is recommended that you try the newest available Fedora distribution to see if your bug still exists.

Please read the Release Notes for the newest Fedora distribution to make sure it will meet your needs:
http://docs.fedoraproject.org/release-notes/

The process we are following is described here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Fedora 7 changed to end-of-life (EOL) status on June 13, 2008. Fedora 7 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

More than one year later, I have now repeated my attempts to use this hardware with Fedora 10.

I'm happy to report that it's working now, when using
-o nosfu,nounix

Thanks a lot!
Regards,
Kai