Description of problem: While trying to mount a share (DFS or 'classic') with Kerberos, I got an error message like:
mount error(5): Input/output error
Mounting with username/password works fine.
Does anything pop up in dmesg when you mount that way?
Aug 10 14:52:39 host kernel: CIFS VFS: Send error in SessSetup = -5
Aug 10 14:52:39 host kernel: CIFS VFS: cifs_mount failed w/return code = -5
EIO usually means either that you got some malformed packets back from the server, or we got some other CIFS error that the kernel doesn't know how to translate. It might be good to have some cFYI info from one of these attempts. Could you follow the instructions here, capture the messages to a file and attach it? http://wiki.samba.org/index.php/LinuxCIFS_troubleshooting#Enabling_Debugging
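For anyone matching the numbers up: the "mount error(5)" from mount.cifs and the "-5" in the kernel log are the same errno, EIO. A minimal sketch of that mapping follows; describe_kernel_rc is a hypothetical helper name, not something from cifs-utils.

```python
import errno
import os


def describe_kernel_rc(rc: int) -> str:
    """Map a kernel-style negative return code (e.g. -5) to its errno name and text."""
    code = abs(rc)
    # errno.errorcode maps numeric errno values to symbolic names such as "EIO".
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name} ({os.strerror(code)})"


# The value logged by "cifs_mount failed w/return code = -5".
print(describe_kernel_rc(-5))
```

The message text comes from the local C library, so the exact wording can differ between platforms.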
I believe you may have attached the info to the wrong bug... https://bugzilla.redhat.com/show_bug.cgi?id=622802#c2 Looks like the client is sending a session setup, and the server is sending back an error that gets translated to ERRgeneral which is sort of a catch-all error like EIO is. Would it be possible to get a binary capture of this? The instructions for how to do this are a little ways down on the same samba wiki page. If not, you could do the capture yourself and tell me what wireshark says the error response to the Session Setup packet is.
Created attachment 437892 [details] trace file
That doesn't show it. The problem here is that this error is cropping up when chasing a DFS referral, so limiting the capture to the particular mount host means that you aren't getting anything once the referral is chased. You may want to redo the capture and leave out the 'host cifs_server.example.com' part of the capture filter. If you capture everything on port 445 it should get it.
Created attachment 437901 [details] tcp dump only limited to port 445
Wireshark says the error was this: NT Status: RPC_NT_INVALID_BINDING (0xc0020003) ...which is one I've never seen before. What kind of server is at 192.168.50.100?
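Part of why this status looks out of place: an NTSTATUS value is a bit field (2 bits of severity, a customer bit, a reserved bit, a 12-bit facility and a 16-bit code), and 0xc0020003 carries the RPC runtime facility rather than the generic one that typical logon errors use. A small sketch of the decoding, assuming that layout; the facility table is deliberately partial and decode_ntstatus is a hypothetical helper.

```python
def decode_ntstatus(status: int) -> dict:
    """Split a 32-bit NTSTATUS value into its bit fields."""
    return {
        "severity": (status >> 30) & 0x3,    # 3 means "error"
        "customer": (status >> 29) & 0x1,    # 0 means Microsoft-defined
        "facility": (status >> 16) & 0xFFF,  # 12-bit facility number
        "code": status & 0xFFFF,             # 16-bit facility-specific code
    }


# Partial table: only the facilities needed for this comparison.
FACILITY_NAMES = {
    0x0: "generic NT status",
    0x2: "FACILITY_RPC_RUNTIME",
    0x3: "FACILITY_RPC_STUBS",
}

for label, value in [
    ("RPC_NT_INVALID_BINDING", 0xC0020003),  # the status seen in this capture
    ("STATUS_LOGON_FAILURE", 0xC000006D),    # a more typical auth failure, for contrast
]:
    fields = decode_ntstatus(value)
    facility = FACILITY_NAMES.get(fields["facility"], "unknown")
    print(f"{label}: {fields} -> {facility}")
```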
This is an EMC Storage.
Thanks. I can find no helpful mention of that error in any of the MS docs. The closest I found is some googling that turned up some bugs in RDP protocol handling. Nothing in CIFS protocol however. I suspect this is a bug in the EMC SPNEGO/krb5 implementation. You may want to file a bug report with them. You should be able to reference this bug if they need captures and such. If they think it's a bug in the Linux cifs implementation, then please have them explain what they think we're doing wrong... I'll leave this bug open for now with the needinfo flag set in case you or they have questions or we need to discuss this further.
One assumption could be that the enctype is not supported by the EMC, as the TGT is fetched from a 2008R2 server, which does not support DES anymore (by default).
Possibly -- only EMC could tell us that. I'll note that the krb5 ticket in the session setup request has this: Encryption type: rc4-hmac (23) ...which is a very common enctype for windows. It would be a stretch for them to claim cifs+krb5 compatibility without supporting it... Also from the capture, the negTokenTarg reply is "reject", but we can't tell much beyond that. The SMB error code is also very odd. It's one I've never seen before, and I've seen servers throw quite a few different errors in response to krb5 auth problems.
Setting:
default_tkt_enctypes = arcfour-hmac-md5
default_tgs_enctypes = arcfour-hmac-md5
solved the problem, so I guess we can close this one. Thanks for keeping an eye on it, Jeff.
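A note on naming, since the capture shows "rc4-hmac (23)" and the setting says "arcfour-hmac-md5": MIT krb5 treats these as aliases for the same enctype. The sketch below assumes the two lines live in the [libdefaults] section of the client's krb5.conf (the thread does not say which file or section was edited) and resolves the names to their enctype numbers. parse_enctype_settings is a hypothetical helper, and the name table is partial.

```python
# Enctype numbers as assigned for Kerberos; partial table, names as MIT krb5 accepts them.
ENCTYPES = {
    "des-cbc-crc": 1,
    "des-cbc-md5": 3,
    "aes128-cts-hmac-sha1-96": 17,
    "aes256-cts-hmac-sha1-96": 18,
    "arcfour-hmac": 23,
    "arcfour-hmac-md5": 23,
    "rc4-hmac": 23,
}

# Assumed placement: the [libdefaults] section of the client's krb5.conf.
KRB5_CONF_FRAGMENT = """
[libdefaults]
    default_tkt_enctypes = arcfour-hmac-md5
    default_tgs_enctypes = arcfour-hmac-md5
"""


def parse_enctype_settings(text: str) -> dict:
    """Return {setting: [enctype numbers]} for each 'key = names' line."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "[")):
            continue  # skip blanks, comments and section headers
        key, _, value = line.partition("=")
        names = value.replace(",", " ").split()
        # Raises KeyError for names missing from the partial table above.
        settings[key.strip()] = [ENCTYPES[name] for name in names]
    return settings


print(parse_enctype_settings(KRB5_CONF_FRAGMENT))
```

Both settings resolve to 23, the same enctype the service ticket in the capture already used.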
Sorry...where was this set? Someplace on the server?
No, this has been set on the client. From my observation, if the TGT is AES256, you won't be able to fetch service tickets from machines that do not support it. Setting the above value forced the TGT to arcfour-hmac-md5, which is compatible with every component (yes, even EMC) in our environment.
Huh....ok. Here's what I don't quite get -- the enctype of the service ticket in the session setup request was rc4-hmac. Why would the server care that the TGT used to get this ticket from the KDC was AES? It doesn't seem like that should matter at all...
Ah, sorry, it's not about the TGT, but about the Client/Server Session Key enc type negotiation which does not seem to work if not forced.
hmm, after some tries the problem occurs again so it may just have been luck that it worked once.
Ok, I think this is probably a server issue. That error is pretty strange and not something I've ever seen Windows return. I suggest opening a bug with EMC -- I don't think we'll be able to solve this without their involvement. For now, I'll set this to NEEDINFO. If you get some more info from the EMC support people, I'll be happy to look it over.
This seems to be a problem with the MIT krb5 libraries. I've tested mount.cifs and smbclient from samba4 (commit b0b73ca041ba3d90b3924b380abed4975e5354d9) with the "HACK bin/smbclient4... and don't use MS KRB5 oid." patch.
This worked:
/tmp/smbclient4-nomskrb5 '//nas-nethz-users.d.ethz.ch/share-m-$/metze' -k yes --option="gensec_gssapi:delegation=no" --option="gensec_gssapi:mutual=no"
while this failed:
mount.cifs -o user=metze,uid=21866,sec=krb5 '//nas-nethz-users.d.ethz.ch/share-m-$/metze' /home/metze
See metze-both-arcfour-06.pcap: frame 10 is the cifs mount and frame 26 is smbclient4-nomskrb5.
Then I tested with the installed smbclient from samba3 (Version 3.5.4-62.fc13) and also got the RPC_NT_INVALID_BINDING (0xc0020003) error:
smbclient '//nas-nethz-users.d.ethz.ch/share-m-$/metze' -k yes
cli_session_setup_blob: receive failed (NT code 0xc0020003)
session setup failed: NT code 0xc0020003
As cifs.upcall and smbclient both use the MIT krb5 libraries (krb5-libs-1.7.1-10.fc13.i686), but smbclient4-nomskrb5 uses heimdal, I assume the problem is within the krb5 libraries. The only difference between frames 10 and 26 is the different length of the authenticator.
Created attachment 439107 [details] Patch to build a source3/bin/smbclient4 staticly
Created attachment 439108 [details] Capture of mount.cifs and smbclient4-nomskrb5 with arcfour
Nice work, Metze. Ok, moving this bug to be against the krb5 libs.
Pity that wireshark isn't better able to stitch together SMB packets... There is a difference in the length of the session setup requests. Both session setup requests have a trailing frame that wireshark ids as an "NBSS Continuation Message". The first one (from the failed session setup request) is 222 bytes long. The second is 226 bytes long. It seems likely that the problem is related to that difference.
Then again, session setup requests have a couple of UCS2 strings at the end. Those strings can be variable length, so it's not a given that the two different programs will send the exact same length session setup request. It would be helpful to have more info from the EMC side of things. Like, what specifically does it not like about the tickets that MIT krb5 is creating?
I've also tested
/tmp/smbclient4-nomskrb5 '//nas-nethz-users.d.ethz.ch/share-m-$/metze' -k yes --option="gensec_gssapi:delegation=no" --option="gensec_gssapi:mutual=no" "--option=gensec:fake_gssapi_krb5=yes" "--option=gensec:gssapi_krb5=no"
and it also works... I first thought that the problem might be that smbclient (samba3) and cifs.upcall use the kerberos library directly (with selfmade gss wrapping) instead of using the gssapi wrapper, but the fake_gssapi_krb5 module should simulate that behavior...
from IRC:
15:48 < jlayton> metze: the length difference between those two session setup packets could have more to do with the strings at the end of the session setup packet too
15:48 < jlayton> unless I'm missing a length field in there
15:53 < metze> jlayton: no, they're not part of the authenticator
15:57 < jlayton> I didn't see an authenticator field length in there
16:00 < metze> click in ticket and compare the offsets
16:01 < metze> they start at the same offset
16:01 < metze> and then click on the authenticator
16:01 < metze> they also start at the same offset
16:01 < metze> but have different starting bytes
16:01 < metze> the length is asn1 encoded
16:02 < metze> but the smb level security blob length is also different
16:06 < metze> 1396 of mount.cifs and 1470 of smbclient4
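On "the length is asn1 encoded": there is no fixed-width length field to look for, because ASN.1 DER stores each length as either a single byte (values below 128) or a prefix byte 0x80|n followed by n big-endian length bytes. That is why two authenticators can start at the same offset yet have different starting bytes. A minimal sketch of the decoding follows; der_length is a hypothetical helper and the sample bytes are made up for illustration, not taken from the capture.

```python
def der_length(data: bytes, offset: int = 0) -> tuple:
    """Decode a DER length at data[offset]; return (length, number of bytes consumed)."""
    first = data[offset]
    if first < 0x80:
        # Short form: the byte itself is the length.
        return first, 1
    num_octets = first & 0x7F
    if num_octets == 0:
        raise ValueError("indefinite length is not valid in DER")
    end = offset + 1 + num_octets
    if end > len(data):
        raise ValueError("truncated length field")
    # Long form: the next num_octets bytes hold the length, big-endian.
    return int.from_bytes(data[offset + 1:end], "big"), 1 + num_octets


# Illustrative inputs only.
for sample in (bytes([0x7F]), bytes([0x81, 0xB4]), bytes([0x82, 0x05, 0xBE])):
    print(sample.hex(), der_length(sample))
```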
If you disable tcp checksum validation wireshark reassembles the session setup requests fine...
I think I found the problem, but I need to verify my guess. Inside the authenticator we have a checksum field. For real gssapi this checksum has type 8003 and contains things like the channel binding and delegated credentials. For selfmade gssapi using krb5_mk_req_extended() the checksum should be of type CKSUMTYPE_RSA_MD5(7). This is what heimdal is using and the reason why
/tmp/smbclient4-nomskrb5 '//nas-nethz-users.d.ethz.ch/share-m-$/metze' -k yes --option="gensec_gssapi:delegation=no" --option="gensec_gssapi:mutual=no" "--option=gensec:fake_gssapi_krb5=yes" "--option=gensec:gssapi_krb5=no"
works. MIT uses CKSUMTYPE_HMAC_MD5(-138) and that's the reason why
smbclient '//nas-nethz-users.d.ethz.ch/share-m-$/metze' -k yes
gets rejected. I need to modify heimdal to also use CKSUMTYPE_HMAC_MD5(-138) and see if it also gets rejected.
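To keep the three checksum types in this comment straight, here is a small lookup sketch. The numeric values are the ones quoted above (0x8003 is 32771 in decimal); the "observed" column only restates what this comment reports against the EMC server and is not a general compatibility table. summarize is a hypothetical helper.

```python
# Authenticator checksum types discussed in this bug, keyed by numeric value.
CKSUMTYPES = {
    7: "CKSUMTYPE_RSA_MD5",
    -138: "CKSUMTYPE_HMAC_MD5",
    0x8003: "GSSAPI checksum (0x8003)",
}

# Outcome against the EMC server as reported in this comment (still a guess at this point).
OBSERVED = {
    7: "accepted (heimdal-based smbclient4)",
    -138: "rejected (MIT-based smbclient)",
}


def summarize(cksumtype: int) -> str:
    """Return a one-line description of a checksum type and its reported outcome."""
    name = CKSUMTYPES.get(cksumtype, "unknown")
    outcome = OBSERVED.get(cksumtype, "not tested in this bug")
    return f"{cksumtype} ({name}): {outcome}"


for value in (7, -138, 0x8003):
    print(summarize(value))
```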
Created attachment 441233 [details] Patch for krb5-1.7.1 to use checksum rsa md5 (7) as heimdal and windows clients
Do we know which server implementations are doing this? It's a bit hard to tell here if we're talking about a bug in the client or bug-compatibility with a server (and if so, which one(s)?).
I have an EMC Celerra running revision 5.6.47.11. (My EMC folks tell me they are planning an upgrade to 6.0.) I'm using RHEL 6 Beta 2 with kerberos: krb5-libs-1.8.2-2.el6.x86_64 This combination results in the above errors when using kerberos authentication.
Created attachment 456988 [details] patch for kerberos 1.8.2
Porting the patch to 1.8.2 took some tweaking, but I can confirm that kerberos 1.8.2 in RHEL 6 Beta 2 built with this patch works with our EMC Celerra as well as normal Windows workstations. (Before, it only worked when a Windows workstation was the CIFS server.)
It's bug-compatibility with EMC Servers. As the client doesn't use the GSSAPI-Checksum 0x8003, there might be some other possible fixes using krb5_auth_con_set_req_cksumtype(). See the discussion here: http://mailman.mit.edu/pipermail/krbdev/2010-September/thread.html#9478 But as nobody tried a krb5_auth_con_set_req_cksumtype() based workaround, it's hard to tell if it would really fix the problem.
Looking at the source of smbclient, it seems that krb5_auth_con_set_req_cksumtype() is already used to trigger the GSSAPI-Checksum, but the problem still exists without using the patched krb5 library. (But I haven't analyzed this in detail).
I've fixed the problem with smbclient, see https://bugzilla.samba.org/show_bug.cgi?id=7883 I'll provide a patch for cifs-utils soon.
The patches for cifs-utils are here: https://bugzilla.samba.org/show_bug.cgi?id=7890
This message is a reminder that Fedora 13 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 13. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '13'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 13's end of life. Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 13 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
This has long since been fixed in f14 and beyond. I'm hesitant to push fixes into f13 as it's almost EOL'ed. So, I'll close this WONTFIX. Please reopen if you wish to make a case for fixing this in f13 too.