From Bugzilla Helper: User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; ICM60; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322; InfoPath.1; .NET CLR 2.0.50727) Description of problem: We have a client running Montavista Carrier Grade Linux that is mounting a directory IN ITS BOOT PHASE on a Redhat Linux server. In about 10% of the tries the client reports a "NFS READDIR TRUNCATED" and does, of course, not mount the directory. We found that the NFS commands of the Montavista Client do not contain a field named "Auxiliary GID"-"GID". The NFS client accepts this message, but somehow the internal readdir crashes in about 10% of cases. Detailed description - long because it contains the Ethereal traces. See also the two GIF pictures in above web page which show the comparison of the two traces in a better way (will be loaded in about 6h) Using ethereal we found that our client while booting is sending his NFS READDIR CALL like this: ----- No. Time Source Destination Protocol Info 79 0.003147 126.0.17.44 126.0.253.253 NFS V2 READDIR Call (Reply In 80), FH:0xdb03a6ea Frame 79 (150 bytes on wire, 150 bytes captured) Packet Length: 150 bytes Capture Length: 150 bytes Protocols in frame: eth:ip:udp:rpc Ethernet II, Src: Intel_41:37:79 (00:07:e9:41:37:79), Dst: Intel_9c:5d:24 (00:0e:0c:9c:5d:24) Type: IP (0x0800) Internet Protocol, Src: 126.0.17.44 (126.0.17.44), Dst: 126.0.253.253 (126.0.253.253) User Datagram Protocol, Src Port: 800 (800), Dst Port: nfsd (2049) Source port: 800 (800) Destination port: nfsd (2049) Length: 116 Checksum: 0x3cec [correct] Remote Procedure Call, Type:Call XID:0x0001f002 XID: 0x1f002 (126978) Message Type: Call (0) RPC Version: 2 Program: NFS (100003) Program Version: 2 Procedure: READDIR (16) The reply to this request is in frame 80 Credentials Flavor: AUTH_UNIX (1) Length: 28 Stamp: 0x0000001f Machine Name: (none) length: 6 contents: (none) fill bytes: opaque data UID: 0 GID: 0 >>>>> Auxiliary GIDs Verifier Flavor: AUTH_NULL (0) Length: 0 Network File System, READDIR Call FH:0xdb03a6ea Program Version: 2 V2 Procedure: READDIR (16) dir hash: 0xdb03a6ea type: unknown data: 01000001000300010C70060068AF0600 DA2DCB77000000000000000000000000 Cookie: 0 Count: 4096 0000 00 0e 0c 9c 5d 24 00 07 e9 41 37 79 08 00 45 00 ....]$...A7y..E. 0010 00 88 00 00 40 00 40 11 2f 3b 7e 00 11 2c 7e 00 ....@.@./;~..,~. 0020 fd fd 03 20 08 01 00 74 3c ec 00 01 f0 02 00 00 ... ...t<....... 0030 00 00 00 00 00 02 00 01 86 a3 00 00 00 02 00 00 ................ 0040 00 10 00 00 00 01 00 00 00 1c 00 00 00 1f 00 00 ................ 0050 00 06 28 6e 6f 6e 65 29 00 00 00 00 00 00 00 00 ..(none)........ 0060 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00 ................ 0070 00 01 00 03 00 01 0c 70 06 00 68 af 06 00 da 2d .......p..h....- 0080 cb 77 00 00 00 00 00 00 00 00 00 00 00 00 00 00 .w.............. 0090 00 00 00 00 10 00 ...... ----- The errored reply (in about 10% of the cases) is like this No. Time Source Destination Protocol Info 80 0.000038 126.0.253.253 126.0.17.44 NFS V2 READDIR Reply (Call In 79)[Malformed Packet] Frame 80 (70 bytes on wire, 70 bytes captured) Packet Length: 70 bytes Capture Length: 70 bytes Protocols in frame: eth:ip:udp:rpc Ethernet II, Src: Intel_9c:5d:24 (00:0e:0c:9c:5d:24), Dst: Intel_41:37:79 (00:07:e9:41:37:79) Type: IP (0x0800) Internet Protocol, Src: 126.0.253.253 (126.0.253.253), Dst: 126.0.17.44 (126.0.17.44) User Datagram Protocol, Src Port: nfsd (2049), Dst Port: 800 (800) Source port: nfsd (2049) Destination port: 800 (800) Length: 36 Checksum: 0x0b60 [incorrect, should be 0xf955] Remote Procedure Call, Type:Reply XID:0x0001f002 XID: 0x1f002 (126978) Message Type: Reply (1) Program: NFS (100003) Program Version: 2 Procedure: READDIR (16) Reply State: accepted (0) This is a reply to a request in frame 79 Time from request: 0.000038000 seconds Verifier Flavor: AUTH_NULL (0) Length: 0 Accept State: RPC executed successfully (0) Network File System, READDIR Reply Program Version: 2 V2 Procedure: READDIR (16) Status: NFS_OK (0) [Malformed Packet: NFS] 0000 00 07 e9 41 37 79 00 0e 0c 9c 5d 24 08 00 45 00 ...A7y....]$..E. 0010 00 38 2d 8e 40 00 40 11 01 fd 7e 00 fd fd 7e 00 .8-.@.@...~...~. 0020 11 2c 08 01 03 20 00 24 0b 60 00 01 f0 02 00 00 .,... .$.`...... 0030 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 0040 00 00 00 00 00 00 ...... ----- After doing "echo 1023 >> /proc/sys/sunrpc/nfsd_debug" we see this in /var/log/messages that the malformed packet is already caused in the NFS server: Dec 5 11:11:33 demmc03c kernel: nfsd: fh_compose(exp 03:01/421900 RNC_01_12_00_015_C1_01_RNL/rnld, ino=422431) Dec 5 11:11:33 demmc03c kernel: nfsd: fh_compose(exp 03:01/421900 rnld/*, ino=0) ----- After the system has booted, we manually mount the directory, which always works. Surprisingly the message then contains another field "Auxiliary GIDs"-"GID" containing 0x00 0x00 0x00 0x00 (marked with >>>> below No. Time Source Destination Protocol Info 21 0.016634 126.1.9.1 126.0.253.253 NFS V2 READDIR Call (Reply In 22), FH:0xc9b7ea98 Frame 21 (154 bytes on wire, 154 bytes captured) Packet Length: 154 bytes Capture Length: 154 bytes Protocols in frame: eth:ip:udp:rpc Ethernet II, Src: IntelCor_00:b2:fc (00:15:17:00:b2:fc), Dst: Intel_7f:7b:be (00:0e:0c:7f:7b:be) Type: IP (0x0800) Internet Protocol, Src: 126.1.9.1 (126.1.9.1), Dst: 126.0.253.253 (126.0.253.253) User Datagram Protocol, Src Port: 800 (800), Dst Port: nfsd (2049) Source port: 800 (800) Destination port: nfsd (2049) Length: 120 Checksum: 0x0337 [correct] Remote Procedure Call, Type:Call XID:0x0001a555 XID: 0x1a555 (107861) Message Type: Call (0) RPC Version: 2 Program: NFS (100003) Program Version: 2 Procedure: READDIR (16) The reply to this request is in frame 22 Credentials Flavor: AUTH_UNIX (1) Length: 32 Stamp: 0x000006dc Machine Name: (none) length: 6 contents: (none) fill bytes: opaque data UID: 0 GID: 0 >>>>> Auxiliary GIDs >>>>> GID: 0 Verifier Flavor: AUTH_NULL (0) Length: 0 Network File System, READDIR Call FH:0xc9b7ea98 Program Version: 2 V2 Procedure: READDIR (16) dir hash: 0xc9b7ea98 type: unknown data: 01000001000300010C7006001F720600 775AFD4A000000000000000000000000 Cookie: 0 Count: 4096 0000 00 0e 0c 7f 7b be 00 15 17 00 b2 fc 08 00 45 00 ....{.........E. 0010 00 8c 00 00 40 00 40 11 37 61 7e 01 09 01 7e 00 ....@.@.7a~...~. 0020 fd fd 03 20 08 01 00 78 03 37 00 01 a5 55 00 00 ... ...x.7...U.. 0030 00 00 00 00 00 02 00 01 86 a3 00 00 00 02 00 00 ................ 0040 00 10 00 00 00 01 00 00 00 20 00 00 06 dc 00 00 ......... ...... 0050 00 06 28 6e 6f 6e 65 29 00 00 00 00 00 00 00 00 ..(none)........ 0060 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 ................ 0070 00 00 01 00 00 01 00 03 00 01 0c 70 06 00 1f 72 ...........p...r 0080 06 00 77 5a fd 4a 00 00 00 00 00 00 00 00 00 00 ..wZ.J.......... 0090 00 00 00 00 00 00 00 00 10 00 .......... ------ In this case we see a correct READDIR from the NFS server in /var/log/messages, too: Dec 5 11:14:33 demmc03c kernel: nfsd: fh_compose(exp 03:01/421900 RNC_01_12_00_015_C1_01_RNL/rnld, ino=422431) Dec 5 11:14:33 demmc03c kernel: nfsd: fh_compose(exp 03:01/421900 rnld/WBSrvr, ino=422432) Dec 5 11:14:33 demmc03c kernel: nfsd: fh_compose(exp 03:01/421900 rnld/apddrv.o, ino=422433) Dec 5 11:14:33 demmc03c kernel: nfsd: fh_compose(exp 03:01/421900 rnld/backplane_stats_2800, ino=422434) ... ----- Unfortunatly we are not able to manually generate the NFS packets without the Auxiliary GID field manually. This is another item... Conclusions: - either the server must not accept a NFS call without "Auxiliary GID"-"GID" - or the "Auxiliary GID"-"GID" does not have a default value in the server or what else? Version-Release number of selected component (if applicable): nfs-utils-1.0.6-65.EL4 How reproducible: Sometimes Steps to Reproduce: Send a READDIR CALL without "Auxiliary GID"-"GID" parameter Actual Results: The client reports "NFS: READDIR REPLY TRUNCATED" and does not mount the remote file system Expected Results: NFS file system is mounted Additional info:
Pictures are not on the web page, but attached directly here.
Created attachment 143146 [details] Comparison of the partly failing READDIR CALL (left) and the always working one (right)
Created attachment 143147 [details] Comparison of the READDIR REPLY (left) and the correct answer (right)
There are several things here. First, it is legal to have a request which contains no auxiliary groups. The server can not make any judgements based on whether the client passed it a list of groups or not. Second, I believe that this is the same as bz430946, which was recently found and fixed. To verify this, could a raw capture file of the failing traffic be attached?
*** This bug has been marked as a duplicate of bug 430946 ***