Bug 430946
Field | Value
---|---
Summary | nfs server sending short packets on nfsv2 UDP readdir replies
Product | Red Hat Enterprise Linux 4
Component | kernel
Version | 4.6
Hardware | All
OS | Linux
Status | CLOSED ERRATA
Severity | medium
Priority | medium
Reporter | Jeff Layton <jlayton>
Assignee | Jeff Layton <jlayton>
QA Contact | Martin Jenner <mjenner>
CC | davem, dzickus, guenter.koellner, jgiles, mgahagan, nhorman, staubach, steved, tgraf, vgoyal
Target Milestone | beta
Keywords | TestBlocker
URL | http://rhts.lab.boston.redhat.com/cgi-bin/rhts/test_log.cgi?id=1539980
Fixed In Version | RHSA-2008-0665
Doc Type | Bug Fix
Last Closed | 2008-07-24 19:26:10 UTC
Attachments | patch -- don't cache nfsv2 readdir replies (attachment 293632)
Description  Jeff Layton  2008-01-30 19:00:21 UTC
I think it's likely that this problem occurred when we hit that stanza in nfsd_dispatch. Most of the error messages that would indicate that are dprintk's, though. In both captures, this seems to occur when we receive a garbled packet or two and then ask the server to retransmit. I think what I may try to do here is to simulate a lossy network using the --hashlimit rule in iptables on the client. I'll have it drop packets on some interval and then run connectathon test 6 in a loop. With luck, maybe I can make this happen on my test rig...

Wow. This turned out to be even easier to simulate than I thought. I set up the netem scheduling discipline on the server. This does 10% packet loss between the server and the client (10.10.10.10):

    tc qdisc add dev eth0 handle 1: root prio
    tc qdisc add dev eth0 parent 1:3 handle 30: netem loss 10%
    tc filter add dev eth0 protocol ip parent 1:0 prio 3 u32 match ip dst 10.10.10.10/32 flowid 10:3

After that, I ran test6 from the cthon04 test suite. It failed on the first pass. Looks like we have a reproducer...

Added a little bit of debug code to nfsd_dispatch:

...first READDIR request:

    nfsd_dispatch: vers 2 proc 16
    nfsd: READDIR 32: 01010001 00000000 00102f64 a3833ac1 00000000 00000000 4096 bytes at 0
    nfsd: fh_verify(32: 01010001 00000000 00102f64 a3833ac1 00000000 00000000)
    nfsd: READDIR returning 0

...subsequent READDIR requests (which are satisfied from the nfsd cache):

    nfsd_dispatch: vers 2 proc 16
    nfsd_dispatch: vers 2 proc 16

...my guess is that the problem occurs when we have READDIR requests that are satisfied from the cache. That's not universal, though -- it appears that we often have valid replies sent from the cache, though I need to confirm that.

A bit more debug code:

...actual call into the VFS layer to generate a new READDIR reply:

    nfsd: READDIR 32: 01010001 00000000 00102f64 624e73fd 00000000 00000000 4096 bytes at 0
    nfsd: fh_verify(32: 01010001 00000000 00102f64 624e73fd 00000000 00000000)
    nfsd: READDIR returning 0

...we cache the reply -- only 4 bytes?

    nfsd_cache_update: len=4

...second call comes in. The earlier reply was probably lost and we had to ask for it again:

    nfsd_dispatch: vers 2 proc 16

...cache_lookup sends back a DROPIT reply -- maybe a different READDIR call here? I should probably be sniffing traffic as I do this...

    nfsd: nfsd_cache_lookup returning 0

...third readdir -- we find a reply in the cache, and append the 4 bytes that are in the cache to the reply iovec:

    nfsd_dispatch: vers 2 proc 16
    nfsd_cache_append: before iov_len=24
    nfsd_cache_append: after iov_len=28
    nfsd: nfsd_cache_lookup returning 1

...the problem appears to be in how the cache entries are being stored. Still looking at the details...

I think I sort of see what's happening here. The nfsd_cache code only seems to cache the head section of the reply. If there's anything in the pages or tail section of the xdr_buf, it doesn't seem to be written to the cache, and I don't see any provision for recreating that info. This caching really seems to be pretty useless for anything but tiny replies that fit entirely into the head section. nfsd_cache_update has this check in it:

    /* Don't cache excessive amounts of data and XDR failures */
    if (!statp || len > (256 >> 2)) {
            rp->c_state = RC_UNUSED;
            return;
    }

...I think we should probably add another condition and also not cache the data if the xdr_buf has anything but a zero-length page_len. If I'm interpreting this correctly, it looks like upstream also has this problem.
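As a concrete sketch of that extra condition (hypothetical code, not taken from any actual patch; it assumes the 2.6-era nfsd_cache_update(), where the request is available as rqstp and the reply xdr_buf as rqstp->rq_res):

-------------[snip]---------------
/*
 * Hypothetical extension of the existing nfsd_cache_update() check:
 * besides rejecting oversized or failed replies, also skip caching any
 * reply whose xdr_buf carries data in its page list, since only the
 * head iovec is ever copied into the duplicate-reply cache entry.
 */
/* Don't cache excessive amounts of data and XDR failures */
if (!statp || len > (256 >> 2) || rqstp->rq_res.page_len != 0) {
	rp->c_state = RC_UNUSED;
	return;
}
-------------[snip]---------------

Refusing to cache such replies just means a retransmitted READDIR gets re-executed instead of being answered with a truncated reply, which is the same net effect as not caching READDIR replies at all.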
I don't seem to be able to reproduce this on a rawhide server. The nfsd_debug output shows that READDIR requests are never satisfied from the nfsd cache, though I'm not exactly clear yet as to why that is. I suspect that we're not storing those replies in the cache at all for some reason, but haven't tracked it down yet...

Created attachment 293632 [details]
patch -- don't cache nfsv2 readdir replies

Upstream patch that went in soon after RHEL4's release. I haven't yet tested it, but I'm pretty sure this will fix the problem.
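The attachment itself isn't reproduced inline here, but going by the patch title, the change is presumably along these lines in the NFSv2 procedure table in fs/nfsd/nfsproc.c. This is a hedged sketch, not a quote of the actual patch: the PROC() macro layout follows the 2.6-era source, and the pre-patch cache type is assumed to have been a reply-caching value.

-------------[snip]---------------
/*
 * Hypothetical sketch of the fix described by the patch title.
 * Marking the NFSv2 READDIR procedure RC_NOCACHE means the duplicate
 * reply cache never tries to store (and later replay) a READDIR reply,
 * so a retransmitted request is re-executed instead of being answered
 * with only the cached head of the original xdr_buf.
 */
static struct svc_procedure nfsd_procedures2[18] = {
	/* ... other entries unchanged ... */
	PROC(readdir, readdirargs, readdirres, none,
	     RC_NOCACHE,	/* previously a reply-caching type */
	     0),
	/* ... */
};
-------------[snip]---------------

Handling this per procedure is arguably cleaner than a page_len check in nfsd_cache_update(), since READDIR replies are exactly the NFSv2 results that routinely spill into the page list.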
To reproduce this:

1) Set up a RHEL4 NFS server and export a directory:

    /export *(rw,sync,no_root_squash)

2) Set up an NFS client -- I used a RHEL5 client, but anything should do. Mount the directory:

    mount -t nfs -o nfsvers=2,udp,hard,intr rhel4box:/export /mnt/rhel4

3) Install the connectathon test suite on the client.

4) On the server, run a script like this to simulate packet loss between the client and server:

-------------[snip]---------------
#!/bin/sh
CLIENT_IP_ADDR=10.10.10.10
tc qdisc add dev eth0 handle 1: root prio
tc qdisc add dev eth0 parent 1:3 handle 30: netem loss 10%
tc filter add dev eth0 protocol ip parent 1:0 prio 3 u32 match ip dst $CLIENT_IP_ADDR/32 flowid 10:3
-------------[snip]---------------

This has the server drop 10% of the packets going from it to the client. Be sure to set the right $CLIENT_IP_ADDR. (As a side note, this might be a good thing to consider in RHTS in general -- testing that simulates lossy networks.)

5) Run basic test6 against the server:

    # cd cthon04/basic
    # env NFSTESTDIR=/mnt/rhel4/cthon ./test6

...note that this much packet loss can make this take a loooooong time. This fails pretty reliably against a server without this patch. With this patch I've yet to see it fail.

*** Bug 429734 has been marked as a duplicate of this bug. ***

Committed in 68.17.EL. RPMS are available at http://people.redhat.com/vgoyal/rhel4/

*** Bug 447989 has been marked as a duplicate of this bug. ***

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2008-0665.html

*** Bug 218919 has been marked as a duplicate of this bug. ***

*** Bug 461008 has been marked as a duplicate of this bug. ***