Bug 2231130

Summary: Decoding filenames longer than 255 characters causes READDIR loop
Product: Red Hat Enterprise Linux 8 Reporter: Frank Sorenson <fsorenso>
Component: kernelAssignee: NFS Team <nfs-team>
kernel sub component: NFS QA Contact: JianHong Yin <jiyin>
Status: NEW --- Docs Contact:
Severity: medium    
Priority: unspecified CC: xzhou, yoyang
Version: 8.8Keywords: Regression
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Frank Sorenson 2023-08-10 17:28:39 UTC
Description of problem:

The following commit went into RHEL between RHEL  8.7 and RHEL 8.8:
    3bce449d67c0 NFS: Return valid errors from nfs2/3_decode_dirent()

This changed the return value on READDIR directory entry decoding to EAGAIN.  When the nfs server returns an entry which is invalid, the client will request the entry again.  If the nfs server continues to return the same invalid entry, the client goes into a loop, requesting the same cookie over and over again, without making any progress.

In this specific case, the nfs server is returning filenames with lengths longer than 255 characters.


Version-Release number of selected component (if applicable):

kernel-4.18.0-477.15.1.el8_8



How reproducible:

Always at customer's site; unknown otherwise


Steps to Reproduce:

unknown


Actual results:

nfs client goes into READDIR loop, never returns from getdents()



Expected results:

nfs client ignores or returns the error, or truncates the filename, or ...?


Additional info:

customer-provided packet capture shows 262-byte filenames, and several thousand READDIRPLUS calls with the same cookie per second

Comment 2 Frank Sorenson 2023-08-10 18:37:11 UTC
just confirming that the server's reply is the same for both the RHEL 8.7 and 8.8 kernels:

$ tshark -z proto,colinfo,nfs.cookie3,nfs.cookie3 -n -r non-working-trimmed.pcap 'rpc.msgtyp==0 && nfs.fh.hash==0x64f6b3e6' | head
   15   0.000198 10.24.85.207 → 10.24.85.215 NFS 206 V3 READDIRPLUS Call, FH: 0x64f6b3e6  nfs.cookie3 == 3360432128
   40   0.000627 10.24.85.207 → 10.24.85.215 NFS 206 V3 READDIRPLUS Call, FH: 0x64f6b3e6  nfs.cookie3 == 3360432128
   59   0.000999 10.24.85.207 → 10.24.85.215 NFS 206 V3 READDIRPLUS Call, FH: 0x64f6b3e6  nfs.cookie3 == 3360432128
   78   0.001366 10.24.85.207 → 10.24.85.215 NFS 362 V3 READDIRPLUS Call, FH: 0x2aba2fe0  ; V3 READDIRPLUS Call, FH: 0x64f6b3e6  nfs.cookie3 == 2801733632  nfs.cookie3 == 3360432128
   94   0.001725 10.24.85.207 → 10.24.85.215 NFS 210 V3 READDIRPLUS Call, FH: 0x64f6b3e6  nfs.cookie3 == 3360432128
  116   0.002086 10.24.85.207 → 10.24.85.215 NFS 210 V3 READDIRPLUS Call, FH: 0x64f6b3e6  nfs.cookie3 == 3360432128
  145   0.002516 10.24.85.207 → 10.24.85.215 NFS 210 V3 READDIRPLUS Call, FH: 0x64f6b3e6  nfs.cookie3 == 3360432128
  163   0.002898 10.24.85.207 → 10.24.85.215 NFS 210 V3 READDIRPLUS Call, FH: 0x64f6b3e6  nfs.cookie3 == 3360432128
  182   0.003257 10.24.85.207 → 10.24.85.215 NFS 210 V3 READDIRPLUS Call, FH: 0x64f6b3e6  nfs.cookie3 == 3360432128
  205   0.003644 10.24.85.207 → 10.24.85.215 NFS 210 V3 READDIRPLUS Call, FH: 0x64f6b3e6  nfs.cookie3 == 3360432128
(there are 77,000+ of these in the full packet capture)

looking at one response:

   35   0.000585 10.24.85.215 → 10.24.85.207 NFS 2362 V3 READDIRPLUS Reply (Call In 15) ����������������������...
Frame 35: 2362 bytes on wire (18896 bits), 2362 bytes captured (18896 bits) on interface unknown, id 0
Ethernet II, Src: 88:e9:a4:23:7a:fa, Dst: 24:6e:96:7a:2d:22
Internet Protocol Version 4, Src: 10.24.85.215, Dst: 10.24.85.207
Transmission Control Protocol, Src Port: 2049, Dst Port: 1019, Seq: 21773, Ack: 1977, Len: 2308
Remote Procedure Call, Type:Reply XID:0x1d38602c
Network File System, READDIRPLUS Reply
...
    Value Follows: Yes
     [truncated]Entry: name ���������������������� �������������������� ������������������ ����������
        File ID: 278459986
         [truncated]Name: ���������������������� �������������������� ������������������ ������������
            length: 256
             [truncated]contents: ���������������������� �������������������� ������������������ �����������
        Cookie: 3716853760
...


and in a packet capture from RHEL 8.7 kernel:
$ tshark -z proto,colinfo,nfs.cookie3,nfs.cookie3 -n -r working.pcap 'rpc.msgtyp==0 && nfs.fh.hash==0x64f6b3e6'
16351  45.443913 10.24.85.207 → 10.24.85.215 NFS 182 V3 GETATTR Call, FH: 0x64f6b3e6
16737  48.342009 10.24.85.207 → 10.24.85.215 NFS 182 V3 GETATTR Call, FH: 0x64f6b3e6
16739  48.342247 10.24.85.207 → 10.24.85.215 NFS 206 V3 READDIRPLUS Call, FH: 0x64f6b3e6  nfs.cookie3 == 3360432128

16740  48.343576 10.24.85.215 → 10.24.85.207 NFS 2362 V3 READDIRPLUS Reply (Call In 16739) ����������������������...
Frame 16740: 2362 bytes on wire (18896 bits), 2362 bytes captured (18896 bits)
Ethernet II, Src: 88:e9:a4:23:7a:fa, Dst: 24:6e:96:7a:2d:22
Internet Protocol Version 4, Src: 10.24.85.215, Dst: 10.24.85.207
Transmission Control Protocol, Src Port: 2049, Dst Port: 890, Seq: 40776121, Ack: 932721, Len: 2308
Remote Procedure Call, Type:Reply XID:0x13085c1c
Network File System, READDIRPLUS Reply
...
    Value Follows: Yes
     [truncated]Entry: name ���������������������� �������������������� ������������������ ����������
        File ID: 278459986
         [truncated]Name: ���������������������� �������������������� ������������������ ������������
            length: 256
             [truncated]contents: ���������������������� �������������������� ������������������ �����������
        Cookie: 3716853760
...