When running the cthon04 test suite on NFSv3 w/ sec=krb5i, we get a lot of messages that look like:

RPC request reserved 172 but used 216
RPC request reserved 164 but used 208
RPC request reserved 172 but used 216
RPC request reserved 164 but used 208
RPC request reserved 196 but used 216

...and eventually get an oops in svcauth_gss_release. Here's one such oops, though some others have crashed in other places:

kernel BUG at net/sunrpc/auth_gss/svcauth_gss.c:1001!
invalid operand: 0000 [#1]
SMP
Modules linked in: nfsd exportfs lockd nfs_acl md5 ipv6 autofs4 rpcsec_gss_krb5 auth_rpcgss des sunrpc dm_mirror dm_mod xennet ext3 jbd xenblk sd_mod scsi_mod
CPU:    0
EIP:    0061:[<e0af1b61>]    Not tainted VLI
EFLAGS: 00010202   (2.6.9-44.ELxenU)
EIP is at svcauth_gss_release+0x130/0x301 [auth_rpcgss]
eax: 00000198   ebx: d3d30044   ecx: de205f0c   edx: 00000008
esi: c159cc64   edi: 00000070   ebp: 00000048   esp: de205f14
ds: 007b   es: 007b   ss: 0068
Process nfsd (pid: 1840, threadinfo=de205000 task=de091360)
Stack: 00000010 ffffffea dae85aa0 00000001 c159cc00 d3d30048 00000068 d3d300b0
       00000008 c016e2e3 d58012ac 00000000 00000070 00000070 e0c05af9 d7b799c0
       e0c2df40 e0af6144 00000000 d3d30048 c159cc64 e0b5fe2b c159cc00 e0c2e404
Call Trace:
 [<c016e2e3>] dput+0x38/0x1ab
 [<e0c05af9>] fh_put+0x177/0x188 [nfsd]
 [<e0b5fe2b>] svc_authorise+0x1a/0x4f [sunrpc]
 [<e0b5da36>] svc_process+0x4b7/0x6fd [sunrpc]
 [<e0c03469>] nfsd+0x1ce/0x33e [nfsd]
 [<e0c0329b>] nfsd+0x0/0x33e [nfsd]
 [<c0105341>] kernel_thread_helper+0x5/0xb
Code: 07 00 83 c4 10 85 c0 74 08 0f 0b e5 03 8d 28 af e0 83 7e 18 00 75 23 8b 56 0c 8d 82 90 01 00 00 3d ff 0f 00 00 77 13 85 d2 74 08 <0f> 0b e9 03 8d 28 af e0 89 f3 e9 85 00 00 00 83 7e 08 00 8d 5e

...though we had another that looked like this:

Unable to handle kernel NULL pointer dereference at virtual address 00000004
 printing eip:
c0233406
1de91000 -> *pde = 00000000:0e6c9001
1de95000 -> *pme = 00000000:00000000
Oops: 0002 [#1]
SMP
Modules linked in: nfsd exportfs lockd nfs_acl md5 ipv6 autofs4 rpcsec_gss_krb5 auth_rpcgss des sunrpc dm_mirror dm_mod xennet ext3 jbd xenblk sd_mod scsi_mod
CPU:    0
EIP:    0061:[<c0233406>]    Not tainted VLI
EFLAGS: 00010246   (2.6.9-44.ELxenU)
EIP is at do_tcp_sendpages+0x307/0x9e7
eax: dbc81240   ebx: dc139e40   ecx: 00000000   edx: 00000000
esi: 0000002c   edi: dbc81458   ebp: 00000002   esp: ddf80e7c
ds: 007b   es: 007b   ss: 0068
Process nfsd (pid: 1876, threadinfo=ddf80000 task=df3bd1a0)
Stack: 00000000 0000002c 00000000 00000000 00000000 000005a8 00000bb8 00000000
       dbc81240 0000002c 00000000 c0233b32 dbc81240 ddf80ed8 00000000 0000002c
       00000000 00000000 00000000 0000002c dae94c80 e0b5e236 dae94c80 00000000
Call Trace:
 [<c0233b32>] tcp_sendpage+0x4c/0x5f
 [<e0b5e236>] svc_sendto+0x1b7/0x208 [sunrpc]
 [<e0b5ff7c>] auth_domain_put+0xe/0x11 [sunrpc]
 [<e0af1d19>] svcauth_gss_release+0x2e8/0x301 [auth_rpcgss]
 [<e0b5ef61>] svc_tcp_sendto+0x63/0xa9 [sunrpc]
 [<e0b5f5e2>] svc_send+0xbd/0xf4 [sunrpc]
 [<e0b5da44>] svc_process+0x4c5/0x6fd [sunrpc]
 [<e0c03469>] nfsd+0x1ce/0x33e [nfsd]
 [<e0c0329b>] nfsd+0x0/0x33e [nfsd]
 [<c0105341>] kernel_thread_helper+0x5/0xb
Code: ff 83 c4 0c 85 c0 0f 84 5a 02 00 00 83 3c 24 00 74 15 8b 93 ac 00 00 00 0f b7 44 ea 0e 8d 04 06 66 89 44 ea 0e eb 33 8b 54 24 0c <f0> ff 42 04 8b 83 ac 00 00 00 8b 4c 24 0c 8d 04 e8 89 48 10 8d

I think they're related, but it's hard to be certain. My first read of this makes me think that we aren't reserving a large enough buffer for the response when using krb5i with NFSv3, at least for some operations.
This problem also seems to exist on recent upstream kernels, so it's probably also a problem on FC-6/RHEL-5.
Finally got a chance to start looking at this again. Mounted a fc7 rawhide server (2.6.20-2925.5.fc7xen) on a RHEL4 client:

host:/export on /mnt/krb5i type nfs (rw,sec=krb5i,addr=xxx.yyy.xxx.yyy)

When I do simple operations in the filesystem (ls, etc.), I get messages like this:

RPC request reserved 164 but used 208
RPC request reserved 164 but used 208
RPC request reserved 172 but used 216
RPC request reserved 164 but used 208
RPC request reserved 172 but used 216
RPC request reserved 164 but used 208

These all seem to be 44 bytes more than they should be. Not yet sure of the significance of 44 bytes here.
Had a look on the wire at what's happening when these messages pop up. This message:

RPC request reserved 164 but used 208

...seems to correspond to a GETATTR call. On the wire, the GETATTR packets are 274 bytes total, and the RPC+NFS payload is 208 bytes. This one:

RPC request reserved 172 but used 216

...comes from an ACCESS call. On the wire, this is 282 bytes total, and the RPC+NFS payload is 216 bytes.

So the kernel seems to be consistently underestimating the rq_reserved size that we actually need when using krb5i with nfs3. This code is pretty complex, though, so I'm still tracking down the actual cause...
In comparing the same ACCESS packet in a krb5 vs. a krb5i mount, the size difference is actually in the NFS section of the packet (as decoded by wireshark). The RPC sections of both are exactly the same size (68 bytes). The main difference seems to be that the krb5i packet has an extra 8 bytes at the beginning of the NFS section:

    4 bytes for GSS Data Length
    4 bytes for GSS Sequence Number

This part seems to be accounted for in the iov_len passed to svc_reserve. The part that's not accounted for is the 44-byte (!) GSS Checksum at the very end of the packet. I think that's the likely problem here...
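Rough byte accounting for the ACCESS reply, based on the numbers above (the GETATTR case works out the same way):

    reserved via svc_reserve:            172 bytes
    actual RPC+NFS payload on the wire:  216 bytes
    shortfall:                            44 bytes  (== the trailing GSS checksum)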
At least one issue seems to be that after writing out the first part of the response header, svc_process() calls this:

	/* un-reserve some of the out-queue now that we have a
	 * better idea of reply size */
	if (procp->pc_xdrressize)
		svc_reserve(rqstp, procp->pc_xdrressize<<2);

...but this does not take into account the size of the checksum at the end of the response. This is probably also a problem for krb5p. We don't seem to see this issue with NFSv4, but my guess is that's because the "reserved" size for a compound packet is likely large enough to encompass the checksum in most cases. I did a quick hack to add 44 bytes to the second argument in the call above, and that got rid of most of these errors. I still see some, so apparently there's another related issue someplace...
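For the record, the quick hack amounted to something like this (illustrative only, not the final fix; note the extra parentheses needed around the shift, since << binds more loosely than +):

	/* quick hack: pad the un-reserve by the 44-byte GSS checksum
	 * observed on the wire with krb5i */
	if (procp->pc_xdrressize)
		svc_reserve(rqstp, (procp->pc_xdrressize<<2) + 44);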
The other related issue is the svc_reserve calls in nfsd/nfs3proc.c and nfsd/nfsproc.c. When I also add 44 bytes to the "space" arg in those calls, the errors go away completely. My thinking at this point is to add a wrapper of some sort for svc_reserve that adds an appropriate amount for the checksum in this situation, and to call that wrapper instead of svc_reserve in these 3 places; see the sketch below.
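A minimal sketch of what such a wrapper might look like (the name and the pad size are placeholders; a real fix would probably use a constant generous enough for any auth flavor rather than the krb5i-specific 44):

	/* Wrap svc_reserve so callers don't have to know about the
	 * GSS checksum appended to integrity-protected replies. */
	static inline void svc_reserve_auth(struct svc_rqst *rqstp, int space)
	{
		svc_reserve(rqstp, space + 44);	/* 44 == krb5i checksum seen on the wire */
	}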
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
The wrapper seems to have taken care of the extra printks, but I'm still getting near-immediate panics when I have a RHEL4 box act as a krb5i/nfs3 server. I still need to test with Neil Brown's patch to remove the BUG in svcauth_gss_release, but I'm thinking there's more than one problem at play here. I may split this up into multiple BZ's.

Finally, NeilB mentioned that the second panic here may be related to this:

http://bugzilla.kernel.org/show_bug.cgi?id=7795

...but he wasn't sure. I need to look into that as well...
Looking at a core from a panic with this stack trace (similar to the second one in comment #1):

#5 [ffffff801c32bc40] error_exit at ffffffff8010ddc3
    [exception RIP: tcp_sendpage+905]
    RIP: ffffffff8025a876  RSP: ffffff801c32bcf8  RFLAGS: 00010246
    RAX: 0000000000000000  RBX: ffffff801d2be500  RCX: 0000000000000000
    RDX: 0000000000000002  RSI: 00000000000005dc  RDI: 00000000000005a8
    RBP: 000000000000002c   R8: 0000000000000000   R9: ffffffff8035b580
    R10: ffffff801d2be500  R11: 00000000000000e4  R12: ffffff801b224100
    R13: 0000000000000002  R14: ffffff801b224430  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: e030  SS: e02b
#6 [ffffff801c32bcf0] tcp_sendpage at ffffffff8025a5ac
#7 [ffffff801c32bd80] svc_sendto at ffffffffa00a6e00
#8 [ffffff801c32be40] svc_tcp_sendto at ffffffffa00a763e
#9 [ffffff801c32be60] svc_send at ffffffffa00a7c9e
#10 [ffffff801c32be90] svc_process at ffffffffa00a64d8
#11 [ffffff801c32bef0] nfsd at ffffffffa015b534
#12 [ffffff801c32bf50] kernel_thread at ffffffff8010e092

I *think* we're crashing in do_tcp_sendpages, at the get_page() call here:

	if (can_coalesce) {
		skb_shinfo(skb)->frags[i - 1].size += copy;
	} else {
		get_page(page);
		skb_fill_page_desc(skb, i, page, offset, copy);
	}

get_page() is an inline that just increments the reference count in the page, so it does seem likely that page is NULL here. The box crashes pretty consistently in this spot.

When I add the "don't assume page data when there is tail data" patch, this crashes on the mount, whereas without it, it doesn't crash until the first access. So that seems to change the behavior somehow.
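For reference, a rough sketch of what get_page() boils down to on kernels of this vintage (illustrative only; the real inline also has a compound-page case, and the refcount field is named count or _count depending on the kernel version):

	/* get_page() just bumps the page's reference count, so with
	 * page == NULL the atomic_inc dereferences a near-NULL address.
	 * That would also line up with the i386 oops in comment #1
	 * ("NULL pointer dereference at virtual address 00000004"),
	 * since the refcount sits right after the flags word at the
	 * start of struct page. */
	static inline void get_page(struct page *page)
	{
		atomic_inc(&page->_count);
	}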
Created attachment 154924 [details]
patch -- don't assume that when xdr_buf has tail data that it also has page data

Tried this patch out, and it actually seems to make the kernel crash sooner (on the mount rather than on first access), so I think we are dealing with a NULL payload or something along those lines...
Made some progress here. The issue (at least sometimes) seems to be that we have a non-zero tail[0].iov_len but a NULL page for the tail. I added this small patch for debugging, and the printk fired when the bug reproduced:

--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -414,6 +418,8 @@ svc_sendto(struct svc_rqst *rqstp, struct xdr_buf *xdr)
 	}
 	/* send tail */
 	if (xdr->tail[0].iov_len) {
+		if (rqstp->rq_respages[rqstp->rq_restailpage] == NULL)
+			printk("%s: rqstp->rq_respages[rqstp->rq_restailpage]==NULL\n", __func__);
 		result = sock->ops->sendpage(sock, rqstp->rq_respages[rqstp->rq_restailpage],
 				((unsigned long)xdr->tail[0].iov_base) & (PAGE_SIZE-1),
 				xdr->tail[0].iov_len, 0);

So it seems like bad tail section handling is the issue...
Got this just before the second svc_send in svc_process:

svc_process: 2 tail[0].iov_len=44, rqstp->rq_respages[rqstp->rq_restailpage] == NULL

This would seem to indicate that the tail length is 44 bytes (!), but the tail page is NULL.
Created attachment 155819 [details]
patch -- set tail[0].iov_base to NULL in svc_process

This patch seems to fix it. It went upstream as part of the patch that added krb5p to the kernel, but it seems to be necessary to make krb5i work correctly as well. Without it, we should be hitting this condition in svcauth_gss_release, but we never do:

	if (resbuf->tail[0].iov_base == NULL) {

...so we end up assuming that we have a page for the response tail when we actually don't. I need to do a bunch more testing with this, and I think we'll also want to incorporate a couple of other patches that seem important here, but this seems to be the root cause of most of the panics.
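For reference, the change is roughly this, in the per-request response setup in svc_process() (a sketch based on the upstream krb5p patch; exact placement may differ):

	/* Make sure the response tail starts out empty, so that
	 * svcauth_gss_release() can tell "no tail allocated yet"
	 * apart from a real tail by checking iov_base == NULL. */
	rqstp->rq_res.tail[0].iov_base = NULL;
	rqstp->rq_res.tail[0].iov_len = 0;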
Created attachment 155826 [details]
patch -- combine several gssapi fixes

This is the patch I'm currently testing. It contains the one-line patch posted earlier (which is part of the addition of rpcsec_gss privacy). That piece comes from:

commit 7c9fdcfb1b64c47ed618c103b617af3f86e1239c

I also rolled in this patch, since it looks like it could prevent a possible bug:

commit dfee55f06213a23b250ea63ba41caa461cdd5e72
Author: NeilBrown <neilb>
Date:   Mon Apr 10 22:55:34 2006 -0700

    [PATCH] knfsd: svcrpc: gss: don't call svc_take_page unnecessarily

...and this one, which removes the bogus BUG_ON() that I hit at connectathon while testing this:

commit 153e44d22fb5f98198f90fbf56e89b345e48534d
Author: NeilBrown <neilb>
Date:   Wed May 9 02:34:52 2007 -0700

    knfsd: rpc: fix server-side wrapping of krb5i replies

So far with this patch, the RHEL4 server seems to be very stable when serving both nfs3 and nfs4 with krb5i. Continuing testing to make sure nothing is broken by this.
This request was evaluated by Red Hat Kernel Team for inclusion in a Red Hat Enterprise Linux maintenance release, and has moved to bugzilla status POST.
Committed in stream U6 build 55.12. A test kernel with this patch is available from:

http://people.redhat.com/~jbaron/rhel4/
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2007-0791.html