Bug 428720
| Summary: | [RHEL5 U2] Connectathon RHEL5 client to RHEL4 server, Connectathon failure | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Jeff Burke <jburke> |
| Component: | kernel | Assignee: | Jeff Layton <jlayton> |
| Status: | CLOSED ERRATA | QA Contact: | Martin Jenner <mjenner> |
| Severity: | low | Docs Contact: | |
| Priority: | low | CC: | davem, duck, dzickus, jiali, mgahagan, nhorman, staubach, steved, tgraf |
| Version: | 5.2 | Target Milestone: | rc |
| Target Release: | --- | Hardware: | All |
| OS: | Linux | URL: | http://rhts.lab.boston.redhat.com/cgi-bin/rhts/test_log.cgi?id=1539980 |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Last Closed: | 2009-01-20 20:23:41 UTC | Regression: | --- |
| Bug Blocks: | 443655 | | |
Description
Jeff Burke
2008-01-14 20:04:38 UTC
I'll try to set up a test rig today and see if I can run this test in a loop and reproduce this.

Ran the test in a loop for several hours and never hit the problem once. Looking at the capture is interesting, though. If I understand the test correctly, it creates 200 files in a directory and then removes the files one by one, doing a readdir in between each remove and checking that the directory contents are what is expected. The capture shows everything going along well until after file.86 is removed. After that, we do a READDIR call. It looks like the first part of the READDIR response is lost, however, and we get a UDP packet that contains what looks like the last part of the READDIR response (starting at ~file.145). Presumably, the client just discards this. After this are 2 retransmissions of the READDIR request, and finally a READDIR reply, which contains just the first 70 bytes of the packet. It ends just after the NFS status code in the header (which was NFS_OK). Wireshark says it's a malformed packet. Judging from the results of the test, though, it looks like the client accepted the packet and took it to mean that the directory was totally empty. Maybe the nfs2 XDR decoder is borked here? This packet should probably have been rejected...

The NFSv2 code may not handle a borked response correctly, but the real questions are, I think, why the server generated such a packet and what happened to the rest of the first response -- or whether the server was doing the right things and the client mangled these two packets. This sure looks like either a lower-level networking transport implementation issue, or an issue where the NFS server manipulated (i.e. mangled/freed/etc.) the response data structures before the network transport implementation was done with them.

Indeed. I'm thinking we're looking at (at least) 2 bugs here: one on the client and one on the server. I wonder if we could reproduce the server bug with pynfs or something -- make it so that the first readdir response on a directory is dropped and then see how the response looks after the request is retransmitted...

After having a closer look, I think that Peter's theory about the server generating a bogus reply is probably the most likely. The UDP wrapper on the packet has the correct length field, so the packet is consistent from the standpoint of the UDP layer. The NFS payload is definitely truncated, though: after the status code, there's nothing. So this does seem like it's probably a server bug, one that is likely triggered by spotty network connectivity causing retransmits. On the client side, the NFS readdir XDR decoder (nfs_xdr_readdirres) has this:

```c
	if ((status = ntohl(*p++)))
		return -nfs_stat_to_errno(status);
	hdrlen = (u8 *) p - (u8 *) iov->iov_base;
	if (iov->iov_len < hdrlen) {
		printk(KERN_WARNING "NFS: READDIR reply header overflowed:"
				"length %d > %Zu\n", hdrlen, iov->iov_len);
		return -errno_NFSERR_IO;
	} else if (iov->iov_len != hdrlen) {
		...
```

...after setting "status", p should be beyond the end of the packet, and I think we would have fallen into the first case of the following if statement. Do we happen to have dmesg on the server at the time this occurred? It would be good to confirm the theory, since that KERN_WARNING message should have popped up.

Jeff, just a point of observation: this has never been seen at all up until recently. So far it has only been seen with a RHEL4 server and RHEL5 client. We have only just started testing RHEL5.2 kernels. It has never been seen with a RHEL5.1-Z client.
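To make the scenario easier to picture, here is a minimal sketch of the create/remove/readdir/verify cycle described above. This is not the actual connectathon basic test, just an illustration; the "file.N" naming and the count of 200 are taken from the capture discussion, everything else is assumed.

```c
/*
 * Illustrative sketch only -- not the real connectathon test.
 * Create NFILES files, then remove them one at a time; after each
 * unlink, re-read the directory and check that exactly the expected
 * number of "file.*" entries remain.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <dirent.h>

#define NFILES 200

static int count_entries(const char *dir)
{
	DIR *dp = opendir(dir);
	struct dirent *de;
	int n = 0;

	if (!dp) {
		perror("opendir");
		exit(1);
	}
	while ((de = readdir(dp)) != NULL) {
		if (strncmp(de->d_name, "file.", 5) == 0)
			n++;
	}
	closedir(dp);
	return n;
}

int main(int argc, char **argv)
{
	const char *dir = argc > 1 ? argv[1] : ".";
	char path[4096];
	int i;

	/* create file.0 .. file.199 */
	for (i = 0; i < NFILES; i++) {
		snprintf(path, sizeof(path), "%s/file.%d", dir, i);
		int fd = open(path, O_CREAT | O_WRONLY, 0644);
		if (fd < 0) { perror("open"); exit(1); }
		close(fd);
	}

	/* remove them one by one, re-listing the directory each time */
	for (i = 0; i < NFILES; i++) {
		snprintf(path, sizeof(path), "%s/file.%d", dir, i);
		if (unlink(path) < 0) { perror("unlink"); exit(1); }

		int remaining = count_entries(dir);
		if (remaining != NFILES - i - 1) {
			fprintf(stderr, "after removing file.%d: expected %d "
				"entries, found %d\n", i, NFILES - i - 1,
				remaining);
			exit(1);
		}
	}
	printf("all %d files created and removed as expected\n", NFILES);
	return 0;
}
```

In the failing run, it is the directory re-read step of this loop that suddenly comes back empty after file.86 is removed, even though files 87 through 199 still exist on the server.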
Jeff, sorry -- my last post should have read: "Do we happen to have logs from the client when this occurred?" Kernel logs, in particular...

Below in the Bug Info section (and here) you will find the URL to the logs: http://rhts.lab.boston.redhat.com/cgi-bin/rhts/test_log.cgi?id=1539980 -- scroll to the bottom to see the console logs.

Thanks. Hmm, not quite what I was expecting to see, but similar:

    NFS: readdir reply truncated!

...so it did detect a problem with a readdir response at some point. Most likely, the problem is with the response packet we got. Here's the relevant code from nfs_xdr_readdirres:

```c
 out:
	kunmap_atomic(kaddr, KM_USER0);
	return nr;
 short_pkt:
	entry[0] = entry[1] = 0;
	/* truncate listing ? */
	if (!nr) {
		printk(KERN_NOTICE "NFS: readdir reply truncated!\n");
		entry[1] = 1;
	}
	goto out;
```

...so nr was 0, which I think is the number of entries. I wonder if we should also be setting this inside that if statement?

```c
	nr = -errno_NFSERR_IO;
```

It seems like there's clearly a problem with the response. Returning zero from this function doesn't seem to be considered an error, and I think it's what makes higher levels consider this to be an empty dir.

Is this problem solved by DaveM's solution in bz 429417 comment #5?

I think probably not (though if DaveM wants to chime in and tell me otherwise, I'd be happy to listen). As best I can tell in this case, the server sent us a bogus readdir response in an RPC retransmission, and the client accepted it as a valid response. I think this is at least 2 bugs, really:

A) The server probably has a bug that causes it to send these bogus responses. Peter's theory was that the server might be prematurely releasing a buffer, which makes it send a packet with a status code but no actual entries to the client. I've been considering ways to reproduce this. We might be able to do it by selectively filtering packets. Something like this:

1) client does a readdir request
2) server sends a readdir response -- we block the response here
3) client retransmits the request; we block the retransmissions for a bit, hopefully giving the server time to do whatever it does that's wrong here
4) unblock traffic -- hopefully the server will send along a truncated response

...we might even get lucky and force a crash if we turn on memory poisoning here too.

B) Client-side bug: the capture shows the bogus readdir response. I think the client accepted the response and interpreted it to mean that the directory was empty (no entries at all, not even "." or ".."). The function that interprets this is nfs_xdr_readdirres(). The logic in this function is a bit... obtuse. This particular packet seems to be a very special case. We see this message in the client's log:

    NFS: readdir reply truncated!

So we do know that we ended up in this section of code:

```c
 out:
	kunmap_atomic(kaddr, KM_USER0);
	return nr;
 short_pkt:
	entry[0] = entry[1] = 0;
	/* truncate listing ? */
	if (!nr) {
		printk(KERN_NOTICE "NFS: readdir reply truncated!\n");
		entry[1] = 1;
	}
	goto out;
```

...obviously, this function returned 0 here. My thinking is that we should consider a patch like this:

```diff
diff --git a/fs/nfs/nfs2xdr.c b/fs/nfs/nfs2xdr.c
index 2cd35f2..e72d37b 100644
--- a/fs/nfs/nfs2xdr.c
+++ b/fs/nfs/nfs2xdr.c
@@ -474,6 +474,7 @@ nfs_xdr_readdirres(struct rpc_rqst *req, u32 *p, void *dummy)
 	if (!nr) {
 		printk(KERN_NOTICE "NFS: readdir reply truncated!\n");
 		entry[1] = 1;
+		nr = -errno_NFSERR_IO;
 	}
 	goto out;
 err_unmap:
```

...instead of returning 0 entries, we return -EIO and (hopefully) the request would be retransmitted.
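To make the application-visible consequence concrete: with the current decoder, a reply that carries only the NFS_OK status word is surfaced to applications as a normal end-of-directory, indistinguishable from a genuinely empty directory; with the proposed -errno_NFSERR_IO change, the same reply would surface as a failed readdir()/getdents() call instead. The sketch below shows how a careful application can tell the two cases apart -- this is standard POSIX readdir() behavior, nothing NFS-specific is assumed.

```c
/*
 * Sketch of the application-visible difference discussed above.
 * POSIX readdir() returns NULL both at end-of-directory and on error;
 * the only way to distinguish them is to clear errno before the call
 * and check it afterwards.  With the unpatched client, a truncated
 * READDIR reply looks like case (a): NULL with errno still 0, i.e.
 * "empty directory".  With the proposed change it becomes case (b):
 * NULL with errno set to EIO.
 */
#include <stdio.h>
#include <errno.h>
#include <string.h>
#include <dirent.h>

int main(int argc, char **argv)
{
	const char *path = argc > 1 ? argv[1] : ".";
	DIR *dp = opendir(path);
	struct dirent *de;
	int n = 0;

	if (!dp) {
		perror("opendir");
		return 1;
	}

	errno = 0;
	while ((de = readdir(dp)) != NULL) {
		printf("%s\n", de->d_name);
		n++;
		errno = 0;
	}

	if (errno != 0)		/* case (b): readdir reported an error */
		fprintf(stderr, "readdir: %s\n", strerror(errno));
	else			/* case (a): clean end of directory */
		printf("%d entries, end of directory reached\n", n);

	closedir(dp);
	return 0;
}
```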
Peter, any thoughts on that patch? Or am I misinterpreting the readdir response XDR code?

Well, I don't think that I would be quite so quick to dismiss the possibility that it is the bug from 429417. I don't know that it is, but it sounds like perhaps it could be. On the client side, it seems that the client is not being quite as defensive as it could be. The suggestion to return the EIO error seems reasonable to me.

I could be very wrong here, but here's why I think that's not it... If I'm reading bug 429417 correctly, the problem is that the UDP layer is occasionally getting SKBs that have inconsistent size fields. I don't think that would affect the host's ability to collect captures, however. The packet capture here seems to pretty clearly indicate that the server sent the client a bad packet. I suppose the problem could ultimately be a manifestation of that problem on the server, but I'm not sure; I've not really looked at the server side of this yet... Either way, we should probably go ahead and clone this bug for RHEL4 and start treating the server side as a separate issue.

OK, I think I understand what's happening on the client side, and I think we can probably fix it, though I need to propose the patch upstream first. Before I do that, I'd like to see if I can come up with a way to reproduce this, maybe via unfsd or by patching the server's kernel to make it send these bogus responses.

This is the info from the server side. We'll probably need to look at why it's sending the bogus responses as a separate issue:

```
# uname -a
Linux dell-pe840-01.rhts.boston.redhat.com 2.6.9-67.0.3.ELsmp #1 SMP Thu Jan 17 13:21:04 EST 2008 i686 i686 i386 GNU/Linux
```

I'm convinced that the server is actually sending these bogus responses, though I don't see why right offhand. I also took a quick look at the server and don't see anything in the logs for that time. I have a patch for RHEL4 that causes it to send these packets. The packets we're seeing here are consistent with what we'd see if the server were not doing the pc_encode portion on a READDIR reply. Looking at the nfsd_dispatch function (which is essentially the same in RHEL4 and upstream):

```c
	/* Encode result.
	 * For NFSv2, additional info is never returned in case of an error.
	 */
	if (!(nfserr && rqstp->rq_vers == 2)) {
		xdr = proc->pc_encode;
		if (xdr && !xdr(rqstp, nfserrp, rqstp->rq_resp)) {
			/* Failed to encode result. Release cache entry */
			dprintk("nfsd: failed to encode result!\n");
			nfsd_cache_update(rqstp, RC_NOCACHE, NULL);
			*statp = rpc_system_err;
			return 1;
		}
	}
```

...I wonder if this is correct. If nfserr is set and rq_vers is 2, then we don't try to encode the reply, but we don't do a NULL cache update either. I think that might be wrong. It seems like we should be doing a negative cache update there, and that block of code could use some changes. As to why we only seem to see this problem on a retransmit, I'm not sure... For now, I'm going to focus on the client-side piece; I'll post a debug patch here soon...

Server-side bug cloned to RHEL4 bug 430946.

Did some testing today with a patch that has this case return -EIO. While I think that's probably correct and will likely push it upstream, it won't make this testcase pass when the server sends one of these short packets. It does make the getdents() call return an error, however, which I think is appropriate, but Peter is quite correct: we'll have to track down this problem on the server and fix it there.

How is the upstream feedback on your suggested patch?
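For reference, the change hinted at in the nfsd_dispatch() analysis above might look roughly like the fragment below. This is only a sketch of the idea (do a negative reply-cache update when pc_encode is skipped for an NFSv2 error reply), not the actual RHEL4 fix; the call and flag are borrowed from the encode-failure path already quoted, and whether this is the right placement is an open question in the discussion above.

```c
	/* Encode result.
	 * For NFSv2, additional info is never returned in case of an error.
	 */
	if (!(nfserr && rqstp->rq_vers == 2)) {
		xdr = proc->pc_encode;
		if (xdr && !xdr(rqstp, nfserrp, rqstp->rq_resp)) {
			/* Failed to encode result. Release cache entry */
			dprintk("nfsd: failed to encode result!\n");
			nfsd_cache_update(rqstp, RC_NOCACHE, NULL);
			*statp = rpc_system_err;
			return 1;
		}
	} else {
		/*
		 * Sketch of the suggestion above (assumption, not the real
		 * fix): when the encode step is skipped for an NFSv2 error
		 * reply, drop the reply-cache entry as well, so a retransmit
		 * is re-executed rather than answered from a possibly
		 * half-built cached reply.
		 */
		nfsd_cache_update(rqstp, RC_NOCACHE, NULL);
	}
```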
I haven't pushed it upstream yet... In truth, this is really a server bug. Even if we had patched the client, it would have still failed this connectathon test. I think it's probably reasonable that we return an error in this situation, but this should be very rare. Now that I look at the code, I'll probably want to propose that we return an error for NFSv3 and v4 as well when the client gets a packet like this. I'll need to do some testing once I do that to make sure that we actually get an error when it occurs. Either way, now that we've got this pinned down as a server-side problem, I don't think there's much urgency to patch the client side. I'll put this on the 5.3 proposed list, but it may slip to later if there's pushback from upstream.

Created attachment 294274 [details]
patch -- when truncating a readdir reply and setting EOF return -EIO
This is the patch I'm considering. I'm not yet convinced that it's the right
thing to do. It seems like the current behavior in this special case is so
deliberate that there must be some reasoning behind it that I'm missing.
I'll plan to post it upstream as an RFC unless Peter or Steve want to chime in
on it -- if it's not the right thing to do, then maybe someone can clarify why
that is...
This does seem like a reasonable patch. However, is it the right thing to retry the request -- which I would think would normally be the right thing to do -- and assume that the server will respond more properly the next time? Or is it the right thing to just return an error to the application, because in this case we "know" that the server will just continue to return the same short response? In general, bugs in implementations should not be recovered from by returning errors to the application, but in this case, I am not sure that we have much of an option.

Thanks, Peter. I posted the patch upstream as an RFC. Hopefully we'll get some confirmation either way...

After discussing with Trond (and others), he mentioned that the existing behavior is a workaround for some buggy servers, so we can change it but there may be consequences... I've sent a new patchset upstream that tries to preserve this workaround but should make the client return -EIO when we get an actual malformed packet. If it gets taken upstream, I'll plan to backport it for 5.3.

Created attachment 303350 [details]
clean up short packet handling for NFSv2 readdir
Created attachment 303352 [details]
clean up short packet handling for NFSv3 readdir
Created attachment 303353 [details]
clean up short packet handling for NFSv4 readdir
This is the patchset sent upstream. It's currently in Bruce Fields' tree and should be on track for 2.6.26.

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

In kernel-2.6.18-107.el5. You can download this test kernel from http://people.redhat.com/dzickus/el5

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2009-0225.html