Bug 218919 - NFS Readdir Call failes occassionally
Summary: NFS Readdir Call failes occassionally
Keywords:
Status: CLOSED DUPLICATE of bug 430946
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: i686
OS: Linux
medium
medium
Target Milestone: ---
: ---
Assignee: Peter Staubach
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2006-12-08 14:02 UTC by Guenter Koellner
Modified: 2009-03-25 15:06 UTC (History)
2 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-03-25 15:06:21 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
Comparison of the partly failing READDIR CALL (left) and the always working one (right) (80.03 KB, image/gif)
2006-12-08 14:07 UTC, Guenter Koellner
no flags Details
Comparison of the READDIR REPLY (left) and the correct answer (right) (107.92 KB, image/gif)
2006-12-08 14:08 UTC, Guenter Koellner
no flags Details

Description Guenter Koellner 2006-12-08 14:02:34 UTC
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; ICM60; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322; InfoPath.1; .NET CLR 2.0.50727)

Description of problem:
We have a client running Montavista Carrier Grade Linux that is mounting a directory IN ITS BOOT PHASE on a Redhat Linux server. In about 10% of the tries the client reports a "NFS READDIR TRUNCATED" and does, of course, not mount the directory.

We found that the NFS commands of the Montavista Client do not contain a field named "Auxiliary GID"-"GID". The NFS client accepts this message, but somehow the internal readdir crashes in about 10% of cases.

Detailed description - long because it contains the Ethereal traces. See also the two GIF pictures in above web page which show the comparison of the two traces in a better way (will be loaded in about 6h)

Using ethereal we found that our client while booting is sending his NFS READDIR CALL like this:
-----
No.     Time        Source                Destination           Protocol Info
     79 0.003147    126.0.17.44           126.0.253.253         NFS      V2 READDIR Call (Reply In 80), FH:0xdb03a6ea

Frame 79 (150 bytes on wire, 150 bytes captured)
    Packet Length: 150 bytes
    Capture Length: 150 bytes
    Protocols in frame: eth:ip:udp:rpc
Ethernet II, Src: Intel_41:37:79 (00:07:e9:41:37:79), Dst: Intel_9c:5d:24 (00:0e:0c:9c:5d:24)
    Type: IP (0x0800)
Internet Protocol, Src: 126.0.17.44 (126.0.17.44), Dst: 126.0.253.253 (126.0.253.253)
User Datagram Protocol, Src Port: 800 (800), Dst Port: nfsd (2049)
    Source port: 800 (800)
    Destination port: nfsd (2049)
    Length: 116
    Checksum: 0x3cec [correct]
Remote Procedure Call, Type:Call XID:0x0001f002
    XID: 0x1f002 (126978)
    Message Type: Call (0)
    RPC Version: 2
    Program: NFS (100003)
    Program Version: 2
    Procedure: READDIR (16)
    The reply to this request is in frame 80
    Credentials
        Flavor: AUTH_UNIX (1)
        Length: 28
        Stamp: 0x0000001f
        Machine Name: (none)
            length: 6
            contents: (none)
            fill bytes: opaque data
        UID: 0
        GID: 0
>>>>>   Auxiliary GIDs
    Verifier
        Flavor: AUTH_NULL (0)
        Length: 0
Network File System, READDIR Call FH:0xdb03a6ea
    Program Version: 2
    V2 Procedure: READDIR (16)
    dir
        hash: 0xdb03a6ea
        type: unknown
        data: 01000001000300010C70060068AF0600
              DA2DCB77000000000000000000000000
    Cookie: 0
    Count: 4096

0000  00 0e 0c 9c 5d 24 00 07 e9 41 37 79 08 00 45 00   ....]$...A7y..E.
0010  00 88 00 00 40 00 40 11 2f 3b 7e 00 11 2c 7e 00   ....@.@./;~..,~.
0020  fd fd 03 20 08 01 00 74 3c ec 00 01 f0 02 00 00   ... ...t<.......
0030  00 00 00 00 00 02 00 01 86 a3 00 00 00 02 00 00   ................
0040  00 10 00 00 00 01 00 00 00 1c 00 00 00 1f 00 00   ................
0050  00 06 28 6e 6f 6e 65 29 00 00 00 00 00 00 00 00   ..(none)........
0060  00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00   ................
0070  00 01 00 03 00 01 0c 70 06 00 68 af 06 00 da 2d   .......p..h....-
0080  cb 77 00 00 00 00 00 00 00 00 00 00 00 00 00 00   .w..............
0090  00 00 00 00 10 00                                 ......
-----
The errored reply (in about 10% of the cases) is like this

No.     Time        Source                Destination           Protocol Info
     80 0.000038    126.0.253.253         126.0.17.44           NFS      V2 READDIR Reply (Call In 79)[Malformed Packet]

Frame 80 (70 bytes on wire, 70 bytes captured)
    Packet Length: 70 bytes
    Capture Length: 70 bytes
    Protocols in frame: eth:ip:udp:rpc
Ethernet II, Src: Intel_9c:5d:24 (00:0e:0c:9c:5d:24), Dst: Intel_41:37:79 (00:07:e9:41:37:79)
    Type: IP (0x0800)
Internet Protocol, Src: 126.0.253.253 (126.0.253.253), Dst: 126.0.17.44 (126.0.17.44)
User Datagram Protocol, Src Port: nfsd (2049), Dst Port: 800 (800)
    Source port: nfsd (2049)
    Destination port: 800 (800)
    Length: 36
    Checksum: 0x0b60 [incorrect, should be 0xf955]
Remote Procedure Call, Type:Reply XID:0x0001f002
    XID: 0x1f002 (126978)
    Message Type: Reply (1)
    Program: NFS (100003)
    Program Version: 2
    Procedure: READDIR (16)
    Reply State: accepted (0)
    This is a reply to a request in frame 79
    Time from request: 0.000038000 seconds
    Verifier
        Flavor: AUTH_NULL (0)
        Length: 0
    Accept State: RPC executed successfully (0)
Network File System, READDIR Reply
    Program Version: 2
    V2 Procedure: READDIR (16)
    Status: NFS_OK (0)
[Malformed Packet: NFS]

0000  00 07 e9 41 37 79 00 0e 0c 9c 5d 24 08 00 45 00   ...A7y....]$..E.
0010  00 38 2d 8e 40 00 40 11 01 fd 7e 00 fd fd 7e 00   .8-.@.@...~...~.
0020  11 2c 08 01 03 20 00 24 0b 60 00 01 f0 02 00 00   .,... .$.`......
0030  00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ................
0040  00 00 00 00 00 00                                 ......

-----
After doing "echo 1023 >> /proc/sys/sunrpc/nfsd_debug" we see this in /var/log/messages that the malformed packet is already caused in the NFS server:
Dec 5 11:11:33 demmc03c kernel: nfsd: fh_compose(exp 03:01/421900 RNC_01_12_00_015_C1_01_RNL/rnld, ino=422431)
Dec 5 11:11:33 demmc03c kernel: nfsd: fh_compose(exp 03:01/421900 rnld/*, ino=0)

-----

After the system has booted, we manually mount the directory, which always works. Surprisingly the message then contains another field "Auxiliary GIDs"-"GID" containing 0x00 0x00 0x00 0x00 (marked with >>>> below

No.     Time        Source                Destination           Protocol Info
     21 0.016634    126.1.9.1             126.0.253.253         NFS      V2 READDIR Call (Reply In 22), FH:0xc9b7ea98

Frame 21 (154 bytes on wire, 154 bytes captured)
    Packet Length: 154 bytes
    Capture Length: 154 bytes
    Protocols in frame: eth:ip:udp:rpc
Ethernet II, Src: IntelCor_00:b2:fc (00:15:17:00:b2:fc), Dst: Intel_7f:7b:be (00:0e:0c:7f:7b:be)
    Type: IP (0x0800)
Internet Protocol, Src: 126.1.9.1 (126.1.9.1), Dst: 126.0.253.253 (126.0.253.253)
User Datagram Protocol, Src Port: 800 (800), Dst Port: nfsd (2049)
    Source port: 800 (800)
    Destination port: nfsd (2049)
    Length: 120
    Checksum: 0x0337 [correct]
Remote Procedure Call, Type:Call XID:0x0001a555
    XID: 0x1a555 (107861)
    Message Type: Call (0)
    RPC Version: 2
    Program: NFS (100003)
    Program Version: 2
    Procedure: READDIR (16)
    The reply to this request is in frame 22
    Credentials
        Flavor: AUTH_UNIX (1)
        Length: 32
        Stamp: 0x000006dc
        Machine Name: (none)
            length: 6
            contents: (none)
            fill bytes: opaque data
        UID: 0
        GID: 0
>>>>>   Auxiliary GIDs
>>>>>       GID: 0
    Verifier
        Flavor: AUTH_NULL (0)
        Length: 0
Network File System, READDIR Call FH:0xc9b7ea98
    Program Version: 2
    V2 Procedure: READDIR (16)
    dir
        hash: 0xc9b7ea98
        type: unknown
        data: 01000001000300010C7006001F720600
              775AFD4A000000000000000000000000
    Cookie: 0
    Count: 4096

0000  00 0e 0c 7f 7b be 00 15 17 00 b2 fc 08 00 45 00   ....{.........E.
0010  00 8c 00 00 40 00 40 11 37 61 7e 01 09 01 7e 00   ....@.@.7a~...~.
0020  fd fd 03 20 08 01 00 78 03 37 00 01 a5 55 00 00   ... ...x.7...U..
0030  00 00 00 00 00 02 00 01 86 a3 00 00 00 02 00 00   ................
0040  00 10 00 00 00 01 00 00 00 20 00 00 06 dc 00 00   ......... ......
0050  00 06 28 6e 6f 6e 65 29 00 00 00 00 00 00 00 00   ..(none)........
0060  00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00   ................
0070  00 00 01 00 00 01 00 03 00 01 0c 70 06 00 1f 72   ...........p...r
0080  06 00 77 5a fd 4a 00 00 00 00 00 00 00 00 00 00   ..wZ.J..........
0090  00 00 00 00 00 00 00 00 10 00                     ..........

------
In this case we see a correct READDIR from the NFS server in /var/log/messages, too:
Dec 5 11:14:33 demmc03c kernel: nfsd: fh_compose(exp 03:01/421900 RNC_01_12_00_015_C1_01_RNL/rnld, ino=422431)
Dec 5 11:14:33 demmc03c kernel: nfsd: fh_compose(exp 03:01/421900 rnld/WBSrvr, ino=422432)
Dec 5 11:14:33 demmc03c kernel: nfsd: fh_compose(exp 03:01/421900 rnld/apddrv.o, ino=422433)
Dec 5 11:14:33 demmc03c kernel: nfsd: fh_compose(exp 03:01/421900 rnld/backplane_stats_2800, ino=422434)
...

-----

Unfortunatly we are not able to manually generate the NFS packets without the Auxiliary GID field manually. This is another item...

Conclusions:
- either the server must not accept a NFS call without "Auxiliary GID"-"GID"
- or the "Auxiliary GID"-"GID" does not have a default value in the server

or what else?


Version-Release number of selected component (if applicable):
nfs-utils-1.0.6-65.EL4

How reproducible:
Sometimes


Steps to Reproduce:
Send a READDIR CALL without "Auxiliary GID"-"GID" parameter

Actual Results:
The client reports "NFS: READDIR REPLY TRUNCATED" and does not mount the remote file system

Expected Results:
NFS file system is mounted

Additional info:

Comment 1 Guenter Koellner 2006-12-08 14:06:00 UTC
Pictures are not on the web page, but attached directly here.

Comment 2 Guenter Koellner 2006-12-08 14:07:20 UTC
Created attachment 143146 [details]
Comparison of the partly failing READDIR CALL (left) and the always working one (right)

Comment 3 Guenter Koellner 2006-12-08 14:08:37 UTC
Created attachment 143147 [details]
Comparison of the READDIR REPLY (left) and the correct answer (right)

Comment 4 Peter Staubach 2008-02-07 16:38:11 UTC
There are several things here.

First, it is legal to have a request which contains no auxiliary groups.
The server can not make any judgements based on whether the client passed
it a list of groups or not.

Second, I believe that this is the same as bz430946, which was recently
found and fixed.  To verify this, could a raw capture file of the
failing traffic be attached?

Comment 5 Peter Staubach 2009-03-25 15:06:21 UTC

*** This bug has been marked as a duplicate of bug 430946 ***


Note You need to log in before you can comment on or make changes to this bug.