When running the cthon04 test suite on NFSv3 w/ sec=krb5i, we get a lot of messages that look like:

RPC request reserved 172 but used 216
RPC request reserved 164 but used 208
RPC request reserved 172 but used 216
RPC request reserved 164 but used 208
RPC request reserved 196 but used 216

...and eventually get an oops in svcauth_gss_release. Here's one such oops, though some others have crashed in other places:

kernel BUG at net/sunrpc/auth_gss/svcauth_gss.c:1001!
invalid operand: 0000 [#1]
SMP
Modules linked in: nfsd exportfs lockd nfs_acl md5 ipv6 autofs4 rpcsec_gss_krb5 auth_rpcgss des sunrpc dm_mirror dm_mod xennet ext3 jbd xenblk sd_mod scsi_mod
CPU:    0
EIP:    0061:[<e0af1b61>]    Not tainted VLI
EFLAGS: 00010202   (2.6.9-44.ELxenU)
EIP is at svcauth_gss_release+0x130/0x301 [auth_rpcgss]
eax: 00000198   ebx: d3d30044   ecx: de205f0c   edx: 00000008
esi: c159cc64   edi: 00000070   ebp: 00000048   esp: de205f14
ds: 007b   es: 007b   ss: 0068
Process nfsd (pid: 1840, threadinfo=de205000 task=de091360)
Stack: 00000010 ffffffea dae85aa0 00000001 c159cc00 d3d30048 00000068 d3d300b0
       00000008 c016e2e3 d58012ac 00000000 00000070 00000070 e0c05af9 d7b799c0
       e0c2df40 e0af6144 00000000 d3d30048 c159cc64 e0b5fe2b c159cc00 e0c2e404
Call Trace:
 [<c016e2e3>] dput+0x38/0x1ab
 [<e0c05af9>] fh_put+0x177/0x188 [nfsd]
 [<e0b5fe2b>] svc_authorise+0x1a/0x4f [sunrpc]
 [<e0b5da36>] svc_process+0x4b7/0x6fd [sunrpc]
 [<e0c03469>] nfsd+0x1ce/0x33e [nfsd]
 [<e0c0329b>] nfsd+0x0/0x33e [nfsd]
 [<c0105341>] kernel_thread_helper+0x5/0xb
Code: 07 00 83 c4 10 85 c0 74 08 0f 0b e5 03 8d 28 af e0 83 7e 18 00 75 23 8b 56 0c 8d 82 90 01 00 00 3d ff 0f 00 00 77 13 85 d2 74 08 <0f> 0b e9 03 8d 28 af e0 89 f3 e9 85 00 00 00 83 7e 08 00 8d 5e

...though we had another that looked like this:

Unable to handle kernel NULL pointer dereference at virtual address 00000004
 printing eip:
c0233406
1de91000 -> *pde = 00000000:0e6c9001
1de95000 -> *pme = 00000000:00000000
Oops: 0002 [#1]
SMP
Modules linked in: nfsd exportfs lockd nfs_acl md5 ipv6 autofs4 rpcsec_gss_krb5 auth_rpcgss des sunrpc dm_mirror dm_mod xennet ext3 jbd xenblk sd_mod scsi_mod
CPU:    0
EIP:    0061:[<c0233406>]    Not tainted VLI
EFLAGS: 00010246   (2.6.9-44.ELxenU)
EIP is at do_tcp_sendpages+0x307/0x9e7
eax: dbc81240   ebx: dc139e40   ecx: 00000000   edx: 00000000
esi: 0000002c   edi: dbc81458   ebp: 00000002   esp: ddf80e7c
ds: 007b   es: 007b   ss: 0068
Process nfsd (pid: 1876, threadinfo=ddf80000 task=df3bd1a0)
Stack: 00000000 0000002c 00000000 00000000 00000000 000005a8 00000bb8 00000000
       dbc81240 0000002c 00000000 c0233b32 dbc81240 ddf80ed8 00000000 0000002c
       00000000 00000000 00000000 0000002c dae94c80 e0b5e236 dae94c80 00000000
Call Trace:
 [<c0233b32>] tcp_sendpage+0x4c/0x5f
 [<e0b5e236>] svc_sendto+0x1b7/0x208 [sunrpc]
 [<e0b5ff7c>] auth_domain_put+0xe/0x11 [sunrpc]
 [<e0af1d19>] svcauth_gss_release+0x2e8/0x301 [auth_rpcgss]
 [<e0b5ef61>] svc_tcp_sendto+0x63/0xa9 [sunrpc]
 [<e0b5f5e2>] svc_send+0xbd/0xf4 [sunrpc]
 [<e0b5da44>] svc_process+0x4c5/0x6fd [sunrpc]
 [<e0c03469>] nfsd+0x1ce/0x33e [nfsd]
 [<e0c0329b>] nfsd+0x0/0x33e [nfsd]
 [<c0105341>] kernel_thread_helper+0x5/0xb
Code: ff 83 c4 0c 85 c0 0f 84 5a 02 00 00 83 3c 24 00 74 15 8b 93 ac 00 00 00 0f b7 44 ea 0e 8d 04 06 66 89 44 ea 0e eb 33 8b 54 24 0c <f0> ff 42 04 8b 83 ac 00 00 00 8b 4c 24 0c 8d 04 e8 89 48 10 8d

I think they're related, but it's hard to be certain. My first read of this makes me think that we aren't reserving a large enough buffer for the response when using krb5i with NFSv3, at least for some operations.
This problem also seems to exist on recent upstream kernels, so it's probably also a problem on FC-6/RHEL-5.
Finally got a chance to start looking at this again. Mounted a fc7 rawhide server (2.6.20-2925.5.fc7xen) on a RHEL4 client:

host:/export on /mnt/krb5i type nfs (rw,sec=krb5i,addr=xxx.yyy.xxx.yyy)

When I do simple operations in the filesystem (ls, etc.), I get messages like this:

RPC request reserved 164 but used 208
RPC request reserved 164 but used 208
RPC request reserved 172 but used 216
RPC request reserved 164 but used 208
RPC request reserved 172 but used 216
RPC request reserved 164 but used 208

These all seem to be 44 bytes more than they should be. Not yet sure of the significance of 44 bytes here.
Had a look on the wire at what's happening when these messages pop up. This message:

RPC request reserved 164 but used 208

...seems to correspond to a GETATTR call. On the wire, the GETATTR packets are 274 bytes total, and the RPC+NFS payload is 208 bytes. This one:

RPC request reserved 172 but used 216

...comes from an ACCESS call. On the wire, this is 282 bytes total, and the RPC+NFS payload is 216 bytes.

So the kernel seems to be consistently underestimating the rq_reserved size that we actually need when using krb5i with nfs3. This code is pretty complex, though, so I'm still tracking down the actual cause...
In comparing the same ACCESS packet in a krb5 vs. a krb5i mount, the size difference is actually in the NFS section of the packet (as decoded by wireshark). The RPC sections of both are exactly the same size (68 bytes). The main difference seems to be that the krb5i packet has an extra 8 bytes at the beginning of the NFS section:

    4 bytes for GSS Data Length
    4 bytes for GSS Sequence Number

This part seems to be accounted for in the iov_len passed to svc_reserve. The part that's not accounted for is the 44-byte (!) GSS Checksum at the very end of the packet. I think that's the likely problem here...
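Rough byte accounting for the ACCESS reply, based on the numbers above (the GETATTR case works out the same way):

    reserved via svc_reserve:            172 bytes
    actual RPC+NFS payload on the wire:  216 bytes
    shortfall:                            44 bytes  (== the trailing GSS checksum)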
At least one issue seems to be that after writing out the first part of the response header, svc_process() calls this:

	/* un-reserve some of the out-queue now that we have a
	 * better idea of reply size */
	if (procp->pc_xdrressize)
		svc_reserve(rqstp, procp->pc_xdrressize<<2);

...but this does not take into account the size of the checksum at the end of the response. This is probably also a problem for krb5p. We don't seem to see this issue with NFSv4, but my guess is that's because the "reserved" size for a compound packet is likely large enough to encompass the checksum in most cases. I did a quick hack to add 44 bytes to the second argument in the call above, and that got rid of most of these errors. I still see some, so apparently there's another related issue someplace...
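For the record, the quick hack amounted to something like this (illustrative only, not the final fix; note the extra parentheses needed around the shift, since << binds more loosely than +):

	/* quick hack: pad the un-reserve by the 44-byte GSS checksum
	 * observed on the wire with krb5i */
	if (procp->pc_xdrressize)
		svc_reserve(rqstp, (procp->pc_xdrressize<<2) + 44);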
The other related issue is the svc_reserve calls in nfsd/nfs3proc.c and nfsd/nfsproc.c. When I also add 44 bytes to the "space" arg in those calls, the errors go away completely. My thinking at this point is to add a wrapper of some sort for svc_reserve that adds an appropriate amount for the checksum in this situation, and to call that wrapper instead of svc_reserve in these 3 places; see the sketch below.
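A minimal sketch of what such a wrapper might look like (the name and the pad size are placeholders; a real fix would probably use a constant generous enough for any auth flavor rather than the krb5i-specific 44):

	/* Wrap svc_reserve so callers don't have to know about the
	 * GSS checksum appended to integrity-protected replies. */
	static inline void svc_reserve_auth(struct svc_rqst *rqstp, int space)
	{
		svc_reserve(rqstp, space + 44);	/* 44 == krb5i checksum seen on the wire */
	}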
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
The wrapper seems to have taken care of the extra printks, but I'm still getting near-immediate panics when I have a RHEL4 box act as a krb5i/nfs3 server. I still need to test with Neil Brown's patch to remove the BUG in svcauth_gss_release, but I'm thinking there's more than one problem at play here. I may split this up into multiple BZ's.

Finally, NeilB mentioned that the second panic here may be related to this:

http://bugzilla.kernel.org/show_bug.cgi?id=7795

...but he wasn't sure. I need to look into that as well...
Looking at a core from a panic with this stack trace (similar to the second one in comment #1):

#5 [ffffff801c32bc40] error_exit at ffffffff8010ddc3
    [exception RIP: tcp_sendpage+905]
    RIP: ffffffff8025a876  RSP: ffffff801c32bcf8  RFLAGS: 00010246
    RAX: 0000000000000000  RBX: ffffff801d2be500  RCX: 0000000000000000
    RDX: 0000000000000002  RSI: 00000000000005dc  RDI: 00000000000005a8
    RBP: 000000000000002c   R8: 0000000000000000   R9: ffffffff8035b580
    R10: ffffff801d2be500  R11: 00000000000000e4  R12: ffffff801b224100
    R13: 0000000000000002  R14: ffffff801b224430  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: e030  SS: e02b
#6 [ffffff801c32bcf0] tcp_sendpage at ffffffff8025a5ac
#7 [ffffff801c32bd80] svc_sendto at ffffffffa00a6e00
#8 [ffffff801c32be40] svc_tcp_sendto at ffffffffa00a763e
#9 [ffffff801c32be60] svc_send at ffffffffa00a7c9e
#10 [ffffff801c32be90] svc_process at ffffffffa00a64d8
#11 [ffffff801c32bef0] nfsd at ffffffffa015b534
#12 [ffffff801c32bf50] kernel_thread at ffffffff8010e092

I *think* we're crashing in do_tcp_sendpages, at the get_page() call here:

	if (can_coalesce) {
		skb_shinfo(skb)->frags[i - 1].size += copy;
	} else {
		get_page(page);
		skb_fill_page_desc(skb, i, page, offset, copy);
	}

get_page() is an inline that just increments the reference count in the page, so it does seem likely that page is NULL here. The box crashes pretty consistently in this spot.

When I add the "don't assume page data when there is tail data" patch, this crashes on the mount, whereas without it, it doesn't crash until the first access. So that seems to change the behavior somehow.
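For reference, a rough sketch of what get_page() boils down to on kernels of this vintage (illustrative only; the real inline also has a compound-page case, and the refcount field is named count or _count depending on the kernel version):

	/* get_page() just bumps the page's reference count, so with
	 * page == NULL the atomic_inc dereferences a near-NULL address.
	 * That would also line up with the i386 oops in comment #1
	 * ("NULL pointer dereference at virtual address 00000004"),
	 * since the refcount sits right after the flags word at the
	 * start of struct page. */
	static inline void get_page(struct page *page)
	{
		atomic_inc(&page->_count);
	}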
Created attachment 154924 [details]
patch -- don't assume that when xdr_buf has tail data that it also has page data

Tried this patch out, and it actually seems to make the kernel crash sooner (on the mount rather than on first access), so I think we are dealing with a NULL payload or something along those lines...
Made some progress here. The issue (at least sometimes) seems to be that we have a non-zero tail[0].iov_len but a NULL page for the tail. I added this small patch for debugging, and the printk fired when the bug reproduced:

--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -414,6 +418,8 @@ svc_sendto(struct svc_rqst *rqstp, struct xdr_buf *xdr)
 	}
 	/* send tail */
 	if (xdr->tail[0].iov_len) {
+		if (rqstp->rq_respages[rqstp->rq_restailpage] == NULL)
+			printk("%s: rqstp->rq_respages[rqstp->rq_restailpage]==NULL\n", __func__);
 		result = sock->ops->sendpage(sock, rqstp->rq_respages[rqstp->rq_restailpage],
 				((unsigned long)xdr->tail[0].iov_base) & (PAGE_SIZE-1),
 				xdr->tail[0].iov_len, 0);

So it seems like bad tail section handling is the issue...
Got this just before the second svc_send in svc_process:

svc_process: 2 tail[0].iov_len=44, rqstp->rq_respages[rqstp->rq_restailpage] == NULL

This would seem to indicate that the tail length is 44 bytes (!), but the tail page is NULL.
Created attachment 155819 [details]
patch -- set tail[0].iov_base to NULL in svc_process

This patch seems to fix it. It went upstream as part of the patch that added krb5p to the kernel, but it seems to be necessary to make krb5i work correctly as well. Without it, we should be hitting this condition in svcauth_gss_release, but we never do:

	if (resbuf->tail[0].iov_base == NULL) {

...so we end up assuming that we have a page for the response tail when we actually don't. I need to do a bunch more testing with this, and I think we'll also want to incorporate a couple of other patches that seem important here, but this seems to be the root cause of most of the panics.
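For reference, the change is roughly this, in the per-request response setup in svc_process() (a sketch based on the upstream krb5p patch; exact placement may differ):

	/* Make sure the response tail starts out empty, so that
	 * svcauth_gss_release() can tell "no tail allocated yet"
	 * apart from a real tail by checking iov_base == NULL. */
	rqstp->rq_res.tail[0].iov_base = NULL;
	rqstp->rq_res.tail[0].iov_len = 0;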
Created attachment 155826 [details]
patch -- combine several gssapi fixes

This is the patch I'm currently testing. It contains the one-line patch posted earlier (which is part of the addition of rpcsec_gss privacy). That piece comes from:

commit 7c9fdcfb1b64c47ed618c103b617af3f86e1239c

I also rolled in this patch, since it looks like it could prevent a possible bug:

commit dfee55f06213a23b250ea63ba41caa461cdd5e72
Author: NeilBrown <neilb>
Date:   Mon Apr 10 22:55:34 2006 -0700

    [PATCH] knfsd: svcrpc: gss: don't call svc_take_page unnecessarily

...and this one, which removes the bogus BUG_ON() that I hit at connectathon while testing this:

commit 153e44d22fb5f98198f90fbf56e89b345e48534d
Author: NeilBrown <neilb>
Date:   Wed May 9 02:34:52 2007 -0700

    knfsd: rpc: fix server-side wrapping of krb5i replies

So far with this patch, the RHEL4 server seems to be very stable when serving both nfs3 and nfs4 with krb5i. Continuing testing to make sure nothing is broken by this.
This request was evaluated by Red Hat Kernel Team for inclusion in a Red Hat Enterprise Linux maintenance release, and has moved to bugzilla status POST.
Committed in stream U6 build 55.12. A test kernel with this patch is available from:

http://people.redhat.com/~jbaron/rhel4/
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2007-0791.html