Bug 430946
Field | Value
---|---
Summary | nfs server sending short packets on nfsv2 UDP readdir replies
Product | Red Hat Enterprise Linux 4
Component | kernel
Version | 4.6
Hardware | All
OS | Linux
Status | CLOSED ERRATA
Severity | medium
Priority | medium
Reporter | Jeff Layton <jlayton>
Assignee | Jeff Layton <jlayton>
QA Contact | Martin Jenner <mjenner>
CC | davem, dzickus, guenter.koellner, jgiles, mgahagan, nhorman, staubach, steved, tgraf, vgoyal
Target Milestone | beta
Keywords | TestBlocker
URL | http://rhts.lab.boston.redhat.com/cgi-bin/rhts/test_log.cgi?id=1539980
Fixed In Version | RHSA-2008-0665
Doc Type | Bug Fix
Last Closed | 2008-07-24 19:26:10 UTC
Attachments | patch -- don't cache nfsv2 readdir replies (attachment 293632)
Description  Jeff Layton  2008-01-30 19:00:21 UTC
I think it's likely that this problem occurred when we hit that stanza in nfsd_dispatch. Most of the error messages that would indicate that are dprintk's, though. In both captures, this seems to occur when we receive a garbled packet or two and then ask the server to retransmit. I think what I may try to do here is to simulate a lossy network using the --hashlimit rule in iptables on the client. I'll have it drop packets on some interval and then run connectathon test 6 in a loop. With luck, maybe I can make this happen on my test rig...

Wow. This turned out to be even easier to simulate than I thought. I set up the netem scheduling discipline on the server. This does 10% packet loss between the server and the client (10.10.10.10):

    tc qdisc add dev eth0 handle 1: root prio
    tc qdisc add dev eth0 parent 1:3 handle 30: netem loss 10%
    tc filter add dev eth0 protocol ip parent 1:0 prio 3 u32 match ip dst 10.10.10.10/32 flowid 10:3

After that, I ran test6 from the cthon04 test suite. It failed on the first pass. Looks like we have a reproducer...

Added a little bit of debug code to nfsd_dispatch:

...first READDIR request:

    nfsd_dispatch: vers 2 proc 16
    nfsd: READDIR 32: 01010001 00000000 00102f64 a3833ac1 00000000 00000000 4096 bytes at 0
    nfsd: fh_verify(32: 01010001 00000000 00102f64 a3833ac1 00000000 00000000)
    nfsd: READDIR returning 0

...subsequent READDIR requests (which are satisfied from the nfsd cache):

    nfsd_dispatch: vers 2 proc 16
    nfsd_dispatch: vers 2 proc 16

...my guess is that the problem occurs when we have READDIR requests that are satisfied from the cache. That's not universal, though -- it appears that we often have valid replies sent from the cache, though I need to confirm that.

A bit more debug code:

...actual call into the VFS layer to generate a new READDIR reply:

    nfsd: READDIR 32: 01010001 00000000 00102f64 624e73fd 00000000 00000000 4096 bytes at 0
    nfsd: fh_verify(32: 01010001 00000000 00102f64 624e73fd 00000000 00000000)
    nfsd: READDIR returning 0

...we cache the reply -- only 4 bytes?

    nfsd_cache_update: len=4

...second call comes in. The earlier reply was probably lost and we had to ask for it again:

    nfsd_dispatch: vers 2 proc 16

...cache_lookup sends back a DROPIT reply -- maybe a different READDIR call here? I should probably be sniffing traffic as I do this...

    nfsd: nfsd_cache_lookup returning 0

...third readdir -- we find a reply in the cache, and append the 4 bytes that are in the cache to the reply iovec:

    nfsd_dispatch: vers 2 proc 16
    nfsd_cache_append: before iov_len=24
    nfsd_cache_append: after iov_len=28
    nfsd: nfsd_cache_lookup returning 1

...the problem appears to be in how the cache entries are being stored. Still looking at the details...

I think I sort of see what's happening here. The nfsd_cache code only seems to cache the head section of the reply. If there's anything in the pages or tail section of the xdr_buf, it doesn't seem to be written to the cache, and I don't see any provision for recreating that info. This caching really seems to be pretty useless for anything but tiny replies that fit entirely into the head section. nfsd_cache_update has this check in it:

    /* Don't cache excessive amounts of data and XDR failures */
    if (!statp || len > (256 >> 2)) {
            rp->c_state = RC_UNUSED;
            return;
    }

...I think we should probably add another condition and also not cache the data if the xdr_buf has anything but a zero-length page_len. If I'm interpreting this correctly, it looks like upstream also has this problem.
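As a concrete sketch of that extra condition (hypothetical code, not taken from any actual patch; it assumes the 2.6-era nfsd_cache_update(), where the request is available as rqstp and the reply xdr_buf as rqstp->rq_res):

-------------[snip]---------------
/*
 * Hypothetical extension of the existing nfsd_cache_update() check:
 * besides rejecting oversized or failed replies, also skip caching any
 * reply whose xdr_buf carries data in its page list, since only the
 * head iovec is ever copied into the duplicate-reply cache entry.
 */
/* Don't cache excessive amounts of data and XDR failures */
if (!statp || len > (256 >> 2) || rqstp->rq_res.page_len != 0) {
	rp->c_state = RC_UNUSED;
	return;
}
-------------[snip]---------------

Refusing to cache such replies just means a retransmitted READDIR gets re-executed instead of being answered with a truncated reply, which is the same net effect as not caching READDIR replies at all.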
I don't seem to be able to reproduce this on a rawhide server. The nfsd_debug output shows that READDIR requests are never satisfied from the nfsd cache, though I'm not exactly clear yet as to why that is. I suspect that we're not storing those replies in the cache at all for some reason, but haven't tracked it down yet...

Created attachment 293632 [details]
patch -- don't cache nfsv2 readdir replies

Upstream patch that went in soon after RHEL4's release. I haven't yet tested it, but I'm pretty sure this will fix the problem.
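The attachment itself isn't reproduced inline here, but going by the patch title, the change is presumably along these lines in the NFSv2 procedure table in fs/nfsd/nfsproc.c. This is a hedged sketch, not a quote of the actual patch: the PROC() macro layout follows the 2.6-era source, and the pre-patch cache type is assumed to have been a reply-caching value.

-------------[snip]---------------
/*
 * Hypothetical sketch of the fix described by the patch title.
 * Marking the NFSv2 READDIR procedure RC_NOCACHE means the duplicate
 * reply cache never tries to store (and later replay) a READDIR reply,
 * so a retransmitted request is re-executed instead of being answered
 * with only the cached head of the original xdr_buf.
 */
static struct svc_procedure nfsd_procedures2[18] = {
	/* ... other entries unchanged ... */
	PROC(readdir, readdirargs, readdirres, none,
	     RC_NOCACHE,	/* previously a reply-caching type */
	     0),
	/* ... */
};
-------------[snip]---------------

Handling this per procedure is arguably cleaner than a page_len check in nfsd_cache_update(), since READDIR replies are exactly the NFSv2 results that routinely spill into the page list.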
To reproduce this:

1) Set up a RHEL4 NFS server and export a directory:

    /export *(rw,sync,no_root_squash)

2) Set up an NFS client -- I used a RHEL5 client, but anything should do. Mount the directory:

    mount -t nfs -o nfsvers=2,udp,hard,intr rhel4box:/export /mnt/rhel4

3) Install the connectathon test suite on the client.

4) On the server, run a script like this to simulate packet loss between the client and server:

-------------[snip]---------------
#!/bin/sh
CLIENT_IP_ADDR=10.10.10.10
tc qdisc add dev eth0 handle 1: root prio
tc qdisc add dev eth0 parent 1:3 handle 30: netem loss 10%
tc filter add dev eth0 protocol ip parent 1:0 prio 3 u32 match ip dst $CLIENT_IP_ADDR/32 flowid 10:3
-------------[snip]---------------

This has the server drop 10% of the packets going from it to the client. Be sure to set the right $CLIENT_IP_ADDR. (As a side note, this might be a good thing to consider in RHTS in general -- testing that simulates lossy networks.)

5) Run basic test6 against the server:

    # cd cthon04/basic
    # env NFSTESTDIR=/mnt/rhel4/cthon ./test6

...note that this much packet loss can make this take a loooooong time. This fails pretty reliably against a server without this patch. With this patch I've yet to see it fail.

*** Bug 429734 has been marked as a duplicate of this bug. ***

Committed in 68.17.EL. RPMS are available at http://people.redhat.com/vgoyal/rhel4/

*** Bug 447989 has been marked as a duplicate of this bug. ***

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2008-0665.html

*** Bug 218919 has been marked as a duplicate of this bug. ***

*** Bug 461008 has been marked as a duplicate of this bug. ***