Bug 562807
| Summary: | secure nfs mount sec=krb5 fails in Fedora 12 | | |
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Michael Young <m.a.young> |
| Component: | libtirpc | Assignee: | Steve Dickson <steved> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | medium | Priority: | low |
| Version: | 12 | CC: | chuck.lever, jlayton, k.georgiou, steved |
| Hardware: | All | OS: | Linux |
| Fixed In Version: | libtirpc-0.2.1-1 | Doc Type: | Bug Fix |
| Last Closed: | 2010-05-19 11:50:02 UTC | Bug Blocks: | 619792 |
Description
Michael Young
2010-02-08 13:04:22 UTC
Going back through the f12 versions (that are still available from koji) 1.2.0-1.f12 works, 1.2.0-5.f12 doesn't. I have done a bit of experimenting and the problem seems to have been introduced when --enable-tirpc was made the default. If I add --disable-tirpc to the spec file of nfs-utils-1.2.1-5.fc12 and build and install an RPM then it works, however the nfs-utils-1.2.1-5.fc12 RPM from updates-testing doesn't.

I'll test this as soon as I'm able. One question -- are any messages logged to syslog during these mount attempts? Even better might be to run gssd in the foreground in debug mode and see whether it prints out anything suspicious:

    # service rpcgssd stop
    # rpc.gssd -f -vvvvv

...attempt the mount in another shell, then kill gssd and copy the output to a file. That might help point out where the problem is.

So far, this works for me. Client and server are both f12, both using nfs-utils-1.2.1-4.fc12. You'll probably need debug output from gssd to understand what's happening here.

Created attachment 397288
rpc.gssd log from failed attempt

Here are the logs (slightly anonymized). I had already looked at this but I didn't think they were very informative.
This is failing:

    auth = authgss_create_default(rpc_clnt, clp->servicename, &sec);
    if (!auth) {
            /* Our caller should print appropriate message */
            printerr(2, "WARNING: Failed to create %s context for "
                        "user with uid %d for server %s\n",
                    (authtype == AUTHTYPE_KRB5 ? "krb5":"spkm3"),

...though it's not clear to me why it's failing for you and not me. I'll have to look and see what sort of logging we can get out of libtirpc to diagnose this.

Created attachment 397327
patch -- have gssd print rpc_createerr when auth_gss creation fails

Here's an initial patch that might help point us in the right direction. Tested for compilation only. You'll want to apply this patch to the nfs-utils sources and rebuild gssd (or maybe just build a new package with the patch).

Then, run gssd in foreground debug mode again and reattempt the mount. With luck, we'll get a bit more info when that error message prints. If that doesn't help then we may need to rebuild libtirpc with -DDEBUG and see whether that gives us more info.
It returns "RPC: Success", which still isn't very helpful.

Actually that doesn't surprise me, as we know why the remote end rejects the call: it receives a malformed packet. The question is why libtirpc is malforming the packet by not attaching the GSS token.

Created attachment 397384
rpc.gssd log with libtirpc debugging turned on

This is the log with libtirpc debugging turned on. I have not had a chance to analyze it much yet.
    rpcsec_gss: in authgss_marshal()
    rpcsec_gss: xdr_rpc_gss_cred: encode success (v 1, proc 1, seq 0, svc 1, ctx (nil):0)
    rpcsec_gss: xdr_rpc_gss_init_args: encode failure (token 0x1992e30:1221)

I have a hunch that I know what this is...

From your logs it looks like you're using AD as a KDC. This is fine, but one thing about AD is that it puts extra authorization info into krb5 tickets (the PAC -- Privilege Attribute Certificate). They can grow to be quite large (on the order of 64k).

xdr_rpc_gss_init_args does this:

    xdr_stat = xdr_bytes(xdrs, (char **)&p->value,
                         (u_int *)&p->length,
                         MAX_NETOBJ_SZ);

...and...

    #define MAX_NETOBJ_SZ 1024

I suspect that the tickets from your AD server are larger than 1k and that's causing this to fail. What might be interesting is to increase this value and then rebuild tirpc and see if that works around the problem. A real fix will probably mean inlining the bytes, but we'll need to go over this carefully to be sure.

Here's what I'd do:

Try a mount, let it fail
stat /tmp/krb5cc_machine_MDS.AD.DUR.AC.UK

...then increase MAX_NETOBJ_SZ to something bigger than the size of the credcache.

I haven't surveyed this code fully, so I don't know whether a really big MAX_NETOBJ_SZ is ok, but it's worth a shot.

(In reply to comment #10)
> Here's what I'd do:
>
> Try a mount, let it fail
> stat /tmp/krb5cc_machine_MDS.AD.DUR.AC.UK
>
> ...then increase MAX_NETOBJ_SZ to something bigger than the size of the
> credcache.
>
> I haven't surveyed this code fully, so I don't know whether a really big
> MAX_NETOBJ_SZ is ok, but it's worth a shot.

I haven't looked at this code, but do note that a netobj is a well-known XDR type which is never larger than 1024, so I don't think the value of that constant should be changed. If the argument being marshalled can be larger than 1024, the use of MAX_NETOBJ_SZ for the maximum size of that particular argument is not appropriate.

I tried increasing MAX_NETOBJ_SZ in two steps.
Firstly we know from the logs how big the packet that fails was (1221 bytes) so I increased MAX_NETOBJ_SZ to 1280. That allowed me to mount the filesystem but not to access it. This is because the user tickets seem to be a bit bigger. Thus I increased it further to 1536 and I was then able to access the files. For reference /tmp/krb5cc_machine_MDS.AD.DUR.AC.UK is 2325 bytes and the user krb5cc file 2599 bytes, somewhat larger than the packets actually sent because they contain a krbtgt ticket as well as the ticket for the file server.

Ok, that's good news. Yep, I knew that we'd have more than one ticket there, but figured you wouldn't need larger than that.

Regarding Chuck's comment -- I'm not planning to propose that as a fix. It was simply a way to check to see whether the problem is what I think it is. From what I can tell, librpcsecgss inlines the service ticket rather than copying in the bytes, but I need to look over this code more closely and see what the proper fix should be.

Changing this to a libtirpc bug since that's where the problem seems to be.

Created attachment 397659
patch -- allow larger ticket sizes with auth_gss
Here's an initial (untested) patch that I think will fix this issue the correct way. It also "backports" a number of other fixes that went into librpcsecgss. Please test this patch if you're able and let me know if it fixes the problem.
Chuck, any comments?
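
The size check at the heart of this bug can be modelled without libtirpc. The following is a minimal sketch, not the attached patch: `encode_bounded_opaque` is a hypothetical helper that mimics how an XDR variable-length opaque is encoded (a 4-byte big-endian length, then the data padded to a multiple of 4) and refuses anything longer than the caller's `maxsize`, which is the kind of check that made the 1221-byte token fail against MAX_NETOBJ_SZ. `GSS_TOKEN_MAX` is an assumed, illustrative limit for GSS tokens only; the real patch may choose its bound differently.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Limit of the XDR netobj type, as quoted earlier in this thread. */
#define MAX_NETOBJ_SZ 1024u

/* Assumption for illustration only: a separate, larger bound for GSS tokens. */
#define GSS_TOKEN_MAX 65536u

/*
 * Hypothetical model of a bounded XDR opaque encoder.
 * Writes a 4-byte big-endian length followed by the data, zero-padded to a
 * multiple of 4 bytes. Returns the number of bytes written, or 0 if the
 * payload exceeds maxsize or the output buffer is too small.
 */
size_t encode_bounded_opaque(unsigned char *out, size_t out_cap,
                             const unsigned char *data, uint32_t len,
                             uint32_t maxsize)
{
    size_t padded = ((size_t)len + 3u) & ~(size_t)3u;

    if (len > maxsize)              /* the check that rejects large tokens */
        return 0;
    if (out_cap < 4u + padded)      /* not enough room in the output buffer */
        return 0;

    out[0] = (unsigned char)(len >> 24);
    out[1] = (unsigned char)(len >> 16);
    out[2] = (unsigned char)(len >> 8);
    out[3] = (unsigned char)len;

    if (len > 0)
        memcpy(out + 4, data, len);
    memset(out + 4 + len, 0, padded - len);   /* XDR padding bytes */

    return 4u + padded;
}
```

With these definitions, encoding a 1221-byte token with a maxsize of MAX_NETOBJ_SZ returns 0 (failure), while the same call with a limit of 1280 or GSS_TOKEN_MAX returns 1228 (4 length bytes plus 1224 padded data bytes). Giving tokens their own limit leaves the netobj constant at 1024, in line with Chuck's comment above.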
(In reply to comment #15)
> Chuck, any comments?

I don't have any immediate objections, but you should have Kevin Coffman review this fix.

Good idea. If it tests out ok, I'll cc him when I send it out to the list.

(In reply to comment #15)
> Created an attachment (id=397659)
> patch -- allow larger ticket sizes with auth_gss
>
> Here's an initial (untested) patch that I think will fix this issue the correct
> way. It also "backports" a number of other fixes that went into librpcsecgss.
> Please test this patch if you're able and let me know if it fixes the problem.

Yes, with the patch it builds and works for me. I can mount the filesystem and view and write to files and directories within it.

The patch has been pushed to mainline libtirpc. Reassigning to steved so he can work out how to release the fix.