Bug 622790

Summary: Input/output error with sec=krb5
Product: [Fedora] Fedora
Component: cifs-utils
Version: 13
Hardware: All
OS: Linux
Status: CLOSED WONTFIX
Severity: medium
Priority: low
Reporter: Marcus Moeller <marcus.moeller>
Assignee: Jeff Layton <jlayton>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: gdeschner, jjneely, jlayton, metze, nalin, ssorce, steved
Doc Type: Bug Fix
Last Closed: 2011-06-15 16:38:50 EDT
Bug Blocks: 645127, 667644, 667647, 667675
Attachments (flags: none on all):
  trace file
  tcp dump only limited to port 445
  Patch to build a source3/bin/smbclient4 staticly
  Capture of mount.cifs and smbclient4-nomskrb5 with arcfour
  Patch for krb5-1.7.1 to use checksum rsa md5 (7) as heimdal and windows clients
  patch for kerberos 1.8.2

Description Marcus Moeller 2010-08-10 08:31:45 EDT
Description of problem:

While trying to mount a share (DFS or 'classic') with Kerberos, I got an error message like:

mount error(5): Input/output error

Mounting with username/password works fine.
Comment 1 Jeff Layton 2010-08-10 08:45:02 EDT
Does anything pop up in dmesg when you mount that way?
Comment 2 Marcus Moeller 2010-08-10 08:53:11 EDT
Aug 10 14:52:39 host kernel: CIFS VFS: Send error in SessSetup = -5
Aug 10 14:52:39 host kernel: CIFS VFS: cifs_mount failed w/return code = -5
Comment 3 Jeff Layton 2010-08-10 09:08:33 EDT
EIO usually means either that you got some malformed packets back from the server, or we got some other CIFS error that the kernel doesn't know how to translate.
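
For reference, the -5 in the kernel messages from comment 2 is a negated errno value, and 5 is EIO. A minimal Python sketch of that mapping follows. It is not part of the original report: the regular expression and the helper name decode_cifs_errno are illustrative assumptions, and the sketch only understands log lines that end in "= <number>".

```python
import errno
import os
import re

# Assumption: the CIFS VFS log lines of interest end in "= <negated errno>",
# as in "CIFS VFS: Send error in SessSetup = -5".
_CODE_RE = re.compile(r"=\s*(-?\d+)\s*$")


def decode_cifs_errno(line):
    """Return (number, symbolic name, message) for the errno in a log line.

    Returns None when the line does not end in "= <number>".
    """
    match = _CODE_RE.search(line)
    if match is None:
        return None
    code = abs(int(match.group(1)))
    # errno.errorcode maps numbers to names; unknown values fall back to "?".
    return code, errno.errorcode.get(code, "?"), os.strerror(code)


if __name__ == "__main__":
    print(decode_cifs_errno("CIFS VFS: Send error in SessSetup = -5"))
```

On Linux the message part should match the "Input/output error" text that mount printed, but the symbolic name is the portable part.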

It might be good to have some cFYI info from one of these attempts. Could you follow the instructions here, capture the messages to a file and attach it?

http://wiki.samba.org/index.php/LinuxCIFS_troubleshooting#Enabling_Debugging
Comment 4 Jeff Layton 2010-08-10 09:53:29 EDT
I believe you may have attached the info to the wrong bug...

https://bugzilla.redhat.com/show_bug.cgi?id=622802#c2

Looks like the client is sending a session setup, and the server is sending back an error that gets translated to ERRgeneral which is sort of a catch-all error like EIO is. Would it be possible to get a binary capture of this? The instructions for how to do this are a little ways down on the same samba wiki page.

If not, you could do the capture yourself and tell me what wireshark says the error response to the Session Setup packet is.
Comment 5 Marcus Moeller 2010-08-10 10:08:48 EDT
Created attachment 437892
trace file
Comment 6 Jeff Layton 2010-08-10 10:23:16 EDT
That doesn't show it. The problem here is that this error is cropping up when chasing a DFS referral, so limiting the capture to the particular mount host means that you aren't getting anything once the referral is chased.

You may want to redo the capture and leave out the 'host cifs_server.example.com' part of the capture filter. If you capture everything on port 445 it should get it.
Comment 7 Marcus Moeller 2010-08-10 10:36:19 EDT
Created attachment 437901
tcp dump only limited to port 445
Comment 8 Jeff Layton 2010-08-10 12:40:09 EDT
Wireshark says the error was this:

    NT Status: RPC_NT_INVALID_BINDING (0xc0020003)

...which is one I've never seen before. What kind of server is at 192.168.50.100?
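
As a side note, the status value can be split into the standard NTSTATUS bit fields (2-bit severity, customer bit, reserved bit, 12-bit facility, 16-bit code). The Python sketch below is an illustration added for readers, not something from the capture; the function name split_ntstatus is made up here, and the reading of facility 0x2 as the RPC runtime facility is from memory of ntstatus.h and should be treated as an assumption.

```python
def split_ntstatus(value):
    """Split a 32-bit NTSTATUS value into its bit fields."""
    return {
        "severity": (value >> 30) & 0x3,    # 3 is the "error" severity
        "customer": (value >> 29) & 0x1,    # 0 for Microsoft-defined codes
        "facility": (value >> 16) & 0xFFF,  # subsystem that raised the status
        "code": value & 0xFFFF,             # facility-specific code
    }


if __name__ == "__main__":
    fields = split_ntstatus(0xC0020003)
    # Assumption: facility 0x2 is the RPC runtime facility.
    print({name: hex(number) for name, number in fields.items()})
```

If the facility reading is right, the server answered an SMB session setup with an RPC-layer status, which fits the observation that this is not a code normally seen for krb5 authentication failures.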
Comment 9 Marcus Moeller 2010-08-10 12:55:49 EDT
This is an EMC Storage.
Comment 10 Jeff Layton 2010-08-10 13:11:04 EDT
Thanks. I can find no helpful mention of that error in any of the MS docs. The closest I found is some googling that turned up some bugs in RDP protocol handling. Nothing in CIFS protocol however.

I suspect this is a bug in the EMC SPNEGO/krb5 implementation. You may want to file a bug report with them. You should be able to reference this bug if they need captures and such. If they think it's a bug in the Linux cifs implementation, then please have them explain what they think we're doing wrong...

I'll leave this bug open for now with the needinfo flag set in case you or they have questions or we need to discuss this further.
Comment 11 Marcus Moeller 2010-08-10 13:17:03 EDT
One assumption could be that the enctype is not supported by the EMC, as the TGT is fetched from a 2008 R2 server, which no longer supports DES by default.
Comment 12 Jeff Layton 2010-08-10 13:33:56 EDT
Possibly -- only EMC could tell us that. I'll note that the krb5 ticket in the session setup request has this:

Encryption type: rc4-hmac (23)

...which is a very common enctype for windows. It would be a stretch for them to claim cifs+krb5 compatibility without supporting it...

Also from the capture, the negTokenTarg reply is "reject", but we can't tell much beyond that.

The SMB error code is also very odd. It's one I've never seen before, and I've seen servers throw quite a few different errors in response to krb5 auth problems.
Comment 13 Marcus Moeller 2010-08-10 14:02:32 EDT
Setting:

        default_tkt_enctypes = arcfour-hmac-md5
        default_tgs_enctypes = arcfour-hmac-md5

solved the problem, so I guess we can close this one. Thanks for keeping an eye on it, Jeff.
Comment 14 Jeff Layton 2010-08-10 14:09:10 EDT
Sorry...where was this set? Someplace on the server?
Comment 15 Marcus Moeller 2010-08-10 14:17:42 EDT
No, this has been set on the client.

From my observation, if the TGT is AES256, you won't be able to fetch service tickets from machines that do not support it. Setting the above values forced the TGT to arcfour-hmac-md5, which is compatible with every component (yes, even EMC) in our environment.
Comment 16 Jeff Layton 2010-08-10 14:31:46 EDT
Huh....ok. Here's what I don't quite get -- the enctype of the service ticket in the session setup request was rc4-hmac. Why would the server care that the TGT used to get this ticket from the KDC was AES?

It doesn't seem like that should matter at all...
Comment 17 Marcus Moeller 2010-08-10 15:04:35 EDT
Ah, sorry, it's not about the TGT, but about the Client/Server Session Key enc type negotiation which does not seem to work if not forced.
Comment 18 Marcus Moeller 2010-08-11 09:43:50 EDT
Hmm, after some more tries the problem occurred again, so it may just have been luck that it worked once.
Comment 19 Jeff Layton 2010-08-16 14:38:35 EDT
Ok, I think this is probably a server issue. That error is pretty strange and not something I've ever seen Windows return. I suggest opening a bug with EMC -- I don't think we'll be able to solve this without their involvement.


For now, I'll set this to NEEDINFO. If you get some more info from the EMC support people, I'll be happy to look it over.
Comment 20 Stefan Metzmacher 2010-08-17 09:13:35 EDT
This seems to be a problem with the MIT krb5 libraries.

I've tested mount.cifs and smbclient from samba4 (commit b0b73ca041ba3d90b3924b380abed4975e5354d9) with the
"HACK bin/smbclient4... and don't use MS KRB5 oid." patch.

This worked:

/tmp/smbclient4-nomskrb5 '//nas-nethz-users.d.ethz.ch/share-m-$/metze' -k yes --option="gensec_gssapi:delegation=no" --option="gensec_gssapi:mutual=no"

while this failed:

mount.cifs -o user=metze,uid=21866,sec=krb5 '//nas-nethz-users.d.ethz.ch/share-m-$/metze' /home/metze


See metze-both-arcfour-06.pcap: frame 10 is the cifs mount and frame 26 is smbclient4-nomskrb5.

Then I've tested with the installed smbclient from samba3
(Version 3.5.4-62.fc13) and also got the RPC_NT_INVALID_BINDING (0xc0020003)
error.

smbclient '//nas-nethz-users.d.ethz.ch/share-m-$/metze' -k yes
cli_session_setup_blob: receive failed (NT code 0xc0020003)
session setup failed: NT code 0xc0020003

As cifs.upcall and smbclient both use the MIT krb5 libraries
(krb5-libs-1.7.1-10.fc13.i686), but smbclient4-nomskrb5 uses
heimdal, I assume the problem is within the krb5 libraries.

The only difference between frames 10 and 26 is the different length
of the authenticator.
Comment 21 Stefan Metzmacher 2010-08-17 09:14:48 EDT
Created attachment 439107
Patch to build a source3/bin/smbclient4 staticly
Comment 22 Stefan Metzmacher 2010-08-17 09:16:06 EDT
Created attachment 439108
Capture of mount.cifs and smbclient4-nomskrb5 with arcfour
Comment 23 Jeff Layton 2010-08-17 09:40:04 EDT
Nice work, Metze. Ok, moving this bug to be against the krb5 libs.
Comment 24 Jeff Layton 2010-08-17 09:46:40 EDT
Pity that wireshark isn't better able to stitch together SMB packets...

There is a difference in the length of the session setup requests. Both session setup requests have a trailing frame that wireshark ids as an "NBSS Continuation Message". The first one (from the failed session setup request) is 222 bytes long. The second is 226 bytes long.

It seems likely that the problem is related to that difference.
Comment 25 Jeff Layton 2010-08-17 09:50:25 EDT
Then again, session setup requests have a couple of UCS2 strings at the end. Those strings can be variable length, so it's not a given that the two different programs will send the exact same length session setup request.

It would be helpful to have more info from the EMC side of things. Like, what specifically does it not like about the tickets that MIT krb5 is creating?
Comment 26 Stefan Metzmacher 2010-08-17 10:04:50 EDT
I've also tested

/tmp/smbclient4-nomskrb5 '//nas-nethz-users.d.ethz.ch/share-m-$/metze' -k yes --option="gensec_gssapi:delegation=no" --option="gensec_gssapi:mutual=no" "--option=gensec:fake_gssapi_krb5=yes" "--option=gensec:gssapi_krb5=no"

and it also works...

I first thought that the problem might be that smbclient (samba3)
and cifs.upcall use the kerberos library directly (with selfmade gss wrapping) instead of using the gssapi wrapper, but the fake_gssapi_krb5 module should simulate that behavior...
Comment 27 Stefan Metzmacher 2010-08-17 10:07:04 EDT
from IRC:

15:48 < jlayton> metze: the length difference between those two session setup packets could have more to do with the strings at the end of the session setup packet too
15:48 < jlayton> unless I'm missing a length field in there
15:53 < metze> jlayton: no, they're not part of the authenticator
15:57 < jlayton> I didn't see an authenticator field length in there
16:00 < metze> click in ticket and compare the offsets
16:01 < metze> they start at the same offset
16:01 < metze> and then click on the authenticator 
16:01 < metze> they also start at the same offset
16:01 < metze> but have different starting bytes
16:01 < metze> the length is asn1 encoded
16:02 < metze> but the smb level security blob length is also different
16:06 < metze> 1396 of mount.cifs and 1470 of smbclient4
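
The remark that "the length is asn1 encoded" explains why two authenticators can start at the same offset yet begin with different bytes: in DER, the length field sits directly after the tag and its own size depends on the value. A minimal Python sketch of DER length decoding is given below as an illustration. The function name read_der_length is hypothetical, and the numbers 1396 and 1470 are reused only as sample inputs; in the capture they are SMB-level security blob lengths, not necessarily DER length fields.

```python
def read_der_length(buf, offset=0):
    """Decode the DER length field starting at buf[offset].

    Returns (length, number_of_bytes_the_field_occupies).
    """
    first = buf[offset]
    if first < 0x80:
        # Short form: the byte itself is the length (0..127).
        return first, 1
    count = first & 0x7F
    if count == 0:
        raise ValueError("indefinite length is not allowed in DER")
    body = buf[offset + 1:offset + 1 + count]
    if len(body) != count:
        raise ValueError("truncated length field")
    # Long form: the next `count` bytes hold the length, big-endian.
    return int.from_bytes(body, "big"), 1 + count


if __name__ == "__main__":
    print(read_der_length(bytes([0x82, 0x05, 0x74])))
    print(read_der_length(bytes([0x82, 0x05, 0xBE])))
```

Comparing the decoded lengths of the two authenticators, rather than the raw bytes, is what shows that they differ in size.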
Comment 28 Stefan Metzmacher 2010-08-17 10:41:14 EDT
If you disable tcp checksum validation, wireshark reassembles the session setup requests fine...
Comment 29 Stefan Metzmacher 2010-08-17 11:01:58 EDT
I think I found the problem, but I need to verify my guess.

Inside the authenticator we have a checksum field.

For real gssapi this checksum has type 8003 and contains things like
the channel binding and delegated credentials.

For selfmade gssapi using krb5_mk_req_extended() the checksum should
be of type CKSUMTYPE_RSA_MD5(7); this is what heimdal is using and the reason why
/tmp/smbclient4-nomskrb5 '//nas-nethz-users.d.ethz.ch/share-m-$/metze' -k yes --option="gensec_gssapi:delegation=no" --option="gensec_gssapi:mutual=no" "--option=gensec:fake_gssapi_krb5=yes" "--option=gensec:gssapi_krb5=no"
works.

MIT uses CKSUMTYPE_HMAC_MD5(-138) and that's the reason why
smbclient '//nas-nethz-users.d.ethz.ch/share-m-$/metze' -k yes
gets rejected.

I need to modify heimdal to also use CKSUMTYPE_HMAC_MD5(-138)
and see if it also gets rejected.
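
The guess above can be restated as a small lookup, sketched here in Python. This is only an illustration of the hypothesis at this point in the thread: the three numbers come from the comment, while the dictionary, the function name describe_authenticator_cksum, and the "accepted"/"rejected" labels are assumptions that mirror the observed smbclient results, not verified server behavior.

```python
# Checksum type numbers as quoted in the comment above.
CKSUMTYPE_RSA_MD5 = 7        # used by heimdal in the self-made wrapping
CKSUMTYPE_HMAC_MD5 = -138    # used by MIT krb5 in the same situation
CKSUMTYPE_GSSAPI = 0x8003    # used by real GSSAPI (channel bindings etc.)

# Hypothetical table: what the EMC server appeared to do in the tests so far.
OBSERVED = {
    CKSUMTYPE_RSA_MD5: "accepted (smbclient4 with heimdal worked)",
    CKSUMTYPE_HMAC_MD5: "rejected (smbclient with MIT krb5 failed)",
    CKSUMTYPE_GSSAPI: "accepted (smbclient4 using gssapi worked)",
}


def describe_authenticator_cksum(cksumtype):
    """Return the observed outcome for a checksum type, if one is known."""
    return OBSERVED.get(cksumtype, "not tested in this report")


if __name__ == "__main__":
    for value in (CKSUMTYPE_GSSAPI, CKSUMTYPE_RSA_MD5, CKSUMTYPE_HMAC_MD5):
        print(value, describe_authenticator_cksum(value))
```

The planned heimdal experiment would fill in the one missing data point: whether the same client stack gets rejected once it sends type -138.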
Comment 30 Stefan Metzmacher 2010-08-26 10:38:16 EDT
Created attachment 441233
Patch for krb5-1.7.1 to use checksum rsa md5 (7) as heimdal and windows clients
Comment 31 Nalin Dahyabhai 2010-11-01 13:57:50 EDT
Do we know which server implementations are doing this?  It's a bit hard to tell here if we're talking about a bug in the client or bug-compatibility with a server (and if so, which one(s)?).
Comment 32 Jack Neely 2010-11-01 15:36:32 EDT
I have an EMC Celerra running revision 5.6.47.11.  (My EMC folks tell me they are planning an upgrade to 6.0.)

I'm using RHEL 6 Beta 2 with kerberos: krb5-libs-1.8.2-2.el6.x86_64

This combination results in the above errors when using kerberos authentication.
Comment 33 Jack Neely 2010-11-01 16:20:02 EDT
Created attachment 456988
patch for kerberos 1.8.2

Porting the patch to 1.8.2 took some tweaking, but I can confirm that kerberos 1.8.2 in RHEL 6 Beta 2 built with this patch works with our EMC Celerra as well as normal Windows workstations.  (Before, only Windows workstations would work as the CIFS server.)
Comment 34 Stefan Metzmacher 2010-11-02 04:39:41 EDT
It's bug-compatibility with EMC Servers.

As the client doesn't use the GSSAPI-Checksum 0x8003,
there might be some other possible fixes using
krb5_auth_con_set_req_cksumtype().

See the discussion here:
http://mailman.mit.edu/pipermail/krbdev/2010-September/thread.html#9478

But as nobody tried a krb5_auth_con_set_req_cksumtype() based workaround,
it's hard to tell if it would really fix the problem.
Comment 35 Stefan Metzmacher 2010-11-02 04:46:45 EDT
Looking at the source of smbclient, it seems that krb5_auth_con_set_req_cksumtype() is already used to trigger the GSSAPI-Checksum,
but the problem still exists without using the patched krb5 library.
(But I haven't analyzed this in detail).
Comment 36 Stefan Metzmacher 2010-12-23 12:59:05 EST
I've fixed the problem with smbclient,
see https://bugzilla.samba.org/show_bug.cgi?id=7883

I'll provide a patch for cifs-utils soon.
Comment 37 Stefan Metzmacher 2010-12-27 15:35:00 EST
The patches for cifs-utils are here:
https://bugzilla.samba.org/show_bug.cgi?id=7890
Comment 38 Bug Zapper 2011-06-01 07:44:47 EDT
This message is a reminder that Fedora 13 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 13.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '13'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 13's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 13 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Comment 39 Jeff Layton 2011-06-15 16:38:50 EDT
This has long since been fixed in f14 and beyond. I'm hesitant to push fixes into f13 as it's almost EOL'ed. So, I'll close this WONTFIX. Please reopen if you wish to make a case for fixing this in f13 too.