Bug 2231130 - Decoding filenames longer than 255 characters causes READDIR loop
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: kernel
Version: 8.8
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Benjamin Coddington
QA Contact: JianHong Yin
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2023-08-10 17:28 UTC by Frank Sorenson
Modified: 2024-05-22 09:55 UTC (History)

Fixed In Version: kernel-4.18.0-553.el8_10
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2024-05-22 09:51:48 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments
systemtap reproducer (4.13 KB, text/x-csrc), 2023-09-12 22:28 UTC, Frank Sorenson


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Gitlab | redhat/rhel/src/kernel rhel-8 merge_requests 5318 | 0 | None | None | None | 2023-09-05 20:17:02 UTC
Red Hat Issue Tracker | RHELPLAN-165534 | 0 | None | None | None | 2023-08-10 17:29:55 UTC
Red Hat Product Errata | RHSA-2024:3138 | 0 | None | None | None | 2024-05-22 09:51:50 UTC

Description Frank Sorenson 2023-08-10 17:28:39 UTC
Description of problem:

The following commit went into RHEL between RHEL 8.7 and RHEL 8.8:
    3bce449d67c0 NFS: Return valid errors from nfs2/3_decode_dirent()

This changed the error returned by READDIR directory entry decoding to EAGAIN.  When the NFS server returns an invalid entry, the client requests the entry again.  If the server keeps returning the same invalid entry, the client loops, requesting the same cookie over and over without making any progress.

In this specific case, the nfs server is returning filenames with lengths longer than 255 characters.
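
As a rough illustration of the loop described above, the following Python sketch models a client that retries the same cookie whenever decoding yields EAGAIN. It is not kernel code: `decode_entry`, `readdir`, the use of list indexes as cookies, and the `max_requests` cap are all hypothetical names and simplifications chosen for the example. ENAMETOOLONG is used as the contrasting error only as an assumption for comparison.

```python
import errno

NAME_MAX = 255  # assumed client-side filename limit for this sketch


def decode_entry(name, oversize_errno):
    """Return 0 on success or a negative errno (hypothetical decoder model)."""
    if len(name) > NAME_MAX:
        return -oversize_errno
    return 0


def readdir(server_entries, oversize_errno, max_requests=1000):
    """Walk entries by cookie, retrying the same cookie on EAGAIN.

    Returns (names, requests_sent, error). The max_requests cap exists only
    so the model terminates; the real client had no such cap.
    """
    names, cookie, requests = [], 0, 0
    while cookie < len(server_entries):
        if requests >= max_requests:
            return names, requests, "no progress"
        requests += 1
        name = server_entries[cookie]  # server always replies with the same entry
        rc = decode_entry(name, oversize_errno)
        if rc == -errno.EAGAIN:
            continue  # same cookie is requested again
        if rc < 0:
            return names, requests, errno.errorcode[-rc]
        names.append(name)
        cookie += 1
    return names, requests, None
```

With an oversize entry in the list, `readdir(entries, errno.EAGAIN)` keeps re-requesting the same cookie until the artificial cap is hit, while `readdir(entries, errno.ENAMETOOLONG)` stops after one failed decode and reports the error.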


Version-Release number of selected component (if applicable):

kernel-4.18.0-477.15.1.el8_8



How reproducible:

Always at customer's site; unknown otherwise


Steps to Reproduce:

unknown


Actual results:

nfs client goes into READDIR loop, never returns from getdents()



Expected results:

nfs client ignores or returns the error, or truncates the filename, or ...?


Additional info:

customer-provided packet capture shows 262-byte filenames, and several thousand READDIRPLUS calls with the same cookie per second

Comment 2 Frank Sorenson 2023-08-10 18:37:11 UTC
just confirming that the server's reply is the same for both the RHEL 8.7 and 8.8 kernels:

$ tshark -z proto,colinfo,nfs.cookie3,nfs.cookie3 -n -r non-working-trimmed.pcap 'rpc.msgtyp==0 && nfs.fh.hash==0x64f6b3e6' | head
   15   0.000198 10.24.85.207 → 10.24.85.215 NFS 206 V3 READDIRPLUS Call, FH: 0x64f6b3e6  nfs.cookie3 == 3360432128
   40   0.000627 10.24.85.207 → 10.24.85.215 NFS 206 V3 READDIRPLUS Call, FH: 0x64f6b3e6  nfs.cookie3 == 3360432128
   59   0.000999 10.24.85.207 → 10.24.85.215 NFS 206 V3 READDIRPLUS Call, FH: 0x64f6b3e6  nfs.cookie3 == 3360432128
   78   0.001366 10.24.85.207 → 10.24.85.215 NFS 362 V3 READDIRPLUS Call, FH: 0x2aba2fe0  ; V3 READDIRPLUS Call, FH: 0x64f6b3e6  nfs.cookie3 == 2801733632  nfs.cookie3 == 3360432128
   94   0.001725 10.24.85.207 → 10.24.85.215 NFS 210 V3 READDIRPLUS Call, FH: 0x64f6b3e6  nfs.cookie3 == 3360432128
  116   0.002086 10.24.85.207 → 10.24.85.215 NFS 210 V3 READDIRPLUS Call, FH: 0x64f6b3e6  nfs.cookie3 == 3360432128
  145   0.002516 10.24.85.207 → 10.24.85.215 NFS 210 V3 READDIRPLUS Call, FH: 0x64f6b3e6  nfs.cookie3 == 3360432128
  163   0.002898 10.24.85.207 → 10.24.85.215 NFS 210 V3 READDIRPLUS Call, FH: 0x64f6b3e6  nfs.cookie3 == 3360432128
  182   0.003257 10.24.85.207 → 10.24.85.215 NFS 210 V3 READDIRPLUS Call, FH: 0x64f6b3e6  nfs.cookie3 == 3360432128
  205   0.003644 10.24.85.207 → 10.24.85.215 NFS 210 V3 READDIRPLUS Call, FH: 0x64f6b3e6  nfs.cookie3 == 3360432128
(there are 77,000+ of these in the full packet capture)
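
To quantify the repetition, the `nfs.cookie3 == N` column of output like the above could be tallied per cookie. This is a minimal sketch, assuming the tshark text has been saved or piped as lines; `count_cookies` is a hypothetical helper, not part of tshark.

```python
import re
from collections import Counter

# Matches the "nfs.cookie3 == <number>" column printed by the tshark command above.
COOKIE_RE = re.compile(r"nfs\.cookie3 == (\d+)")


def count_cookies(lines):
    """Count how often each READDIRPLUS cookie appears in tshark text output."""
    counts = Counter()
    for line in lines:
        # A frame may carry more than one call, so collect every match on the line.
        counts.update(int(c) for c in COOKIE_RE.findall(line))
    return counts
```

Calling `count_cookies(open("calls.txt")).most_common(3)` on a saved copy of the output (the filename is an assumption) would list the most frequently requested cookies.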

looking at one response:

   35   0.000585 10.24.85.215 → 10.24.85.207 NFS 2362 V3 READDIRPLUS Reply (Call In 15) ����������������������...
Frame 35: 2362 bytes on wire (18896 bits), 2362 bytes captured (18896 bits) on interface unknown, id 0
Ethernet II, Src: 88:e9:a4:23:7a:fa, Dst: 24:6e:96:7a:2d:22
Internet Protocol Version 4, Src: 10.24.85.215, Dst: 10.24.85.207
Transmission Control Protocol, Src Port: 2049, Dst Port: 1019, Seq: 21773, Ack: 1977, Len: 2308
Remote Procedure Call, Type:Reply XID:0x1d38602c
Network File System, READDIRPLUS Reply
...
    Value Follows: Yes
     [truncated]Entry: name ���������������������� �������������������� ������������������ ����������
        File ID: 278459986
         [truncated]Name: ���������������������� �������������������� ������������������ ������������
            length: 256
             [truncated]contents: ���������������������� �������������������� ������������������ �����������
        Cookie: 3716853760
...


and in a packet capture from RHEL 8.7 kernel:
$ tshark -z proto,colinfo,nfs.cookie3,nfs.cookie3 -n -r working.pcap 'rpc.msgtyp==0 && nfs.fh.hash==0x64f6b3e6'
16351  45.443913 10.24.85.207 → 10.24.85.215 NFS 182 V3 GETATTR Call, FH: 0x64f6b3e6
16737  48.342009 10.24.85.207 → 10.24.85.215 NFS 182 V3 GETATTR Call, FH: 0x64f6b3e6
16739  48.342247 10.24.85.207 → 10.24.85.215 NFS 206 V3 READDIRPLUS Call, FH: 0x64f6b3e6  nfs.cookie3 == 3360432128

16740  48.343576 10.24.85.215 → 10.24.85.207 NFS 2362 V3 READDIRPLUS Reply (Call In 16739) ����������������������...
Frame 16740: 2362 bytes on wire (18896 bits), 2362 bytes captured (18896 bits)
Ethernet II, Src: 88:e9:a4:23:7a:fa, Dst: 24:6e:96:7a:2d:22
Internet Protocol Version 4, Src: 10.24.85.215, Dst: 10.24.85.207
Transmission Control Protocol, Src Port: 2049, Dst Port: 890, Seq: 40776121, Ack: 932721, Len: 2308
Remote Procedure Call, Type:Reply XID:0x13085c1c
Network File System, READDIRPLUS Reply
...
    Value Follows: Yes
     [truncated]Entry: name ���������������������� �������������������� ������������������ ����������
        File ID: 278459986
         [truncated]Name: ���������������������� �������������������� ������������������ ������������
            length: 256
             [truncated]contents: ���������������������� �������������������� ������������������ �����������
        Cookie: 3716853760
...

Comment 9 Frank Sorenson 2023-09-12 22:28:47 UTC
Created attachment 1988540
systemtap reproducer

Systemtap reproducer for the NFS server that modifies the filename and its length.  Whenever nfsd encodes a 255-byte filename, the script rewrites the filename to 260 bytes and changes the length to 260 accordingly.

Can be used as a single-system reproducer (just mount the export from localhost)

# stap -vg mod_entry3_filename_len.stp



Create a 255-byte file in the exported directory or nfs mounted directory:
    # touch AAAAABBBBBCCCCCDDDDDEEEEEFFFFFGGGGGHHHHHIIIIIJJJJJKKKKKLLLLLMMMMMNNNNNOOOOOPPPPPQQQQQRRRRRSSSSSTTTTTUUUUUVVVVVWWWWWXXXXXYYYYYZZZZZAAAAABBBBBCCCCCDDDDDEEEEEFFFFFGGGGGHHHHHIIIIIJJJJJKKKKKLLLLLMMMMMNNNNNOOOOOPPPPPQQQQQRRRRRSSSSSTTTTTUUUUUVVVVVWWWWWXXXXXYYYYY

    # sysctl vm.drop_caches=3

list the directory:
    # ls -l
    (command hangs)
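
The 255-byte name used in the steps above can be generated rather than typed. This is a small sketch, assuming the same pattern of five-letter runs over A to Z; `make_name` is a hypothetical helper and is not part of the attached reproducer.

```python
import string


def make_name(length=255, run=5):
    """Build a filename of `length` ASCII bytes from runs of A..Z (AAAAABBBBB...)."""
    pattern = "".join(ch * run for ch in string.ascii_uppercase)  # 130 characters
    repeats = length // len(pattern) + 1
    return (pattern * repeats)[:length]


if __name__ == "__main__":
    # Print the name so it can be passed to touch, e.g. touch "$(python3 make_name.py)"
    print(make_name())
```

The script name `make_name.py` in the comment is an assumption for illustration.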


packet capture looks like:

   18 11.416369410    127.0.0.1 → 127.0.0.1    NFS 226 V3 READDIRPLUS Call (Reply In 19), FH: 0xe6a616ba
   19 11.416626921    127.0.0.1 → 127.0.0.1    NFS 938 V3 READDIRPLUS Reply (Call In 18) . .. AAAAABBBBBCCCCCDDDDDEEEEEFFFFFGGGGGHHHHHIIIIIJJJJJKKKKKLLLLLMMMMMNNNNNOOOOOPPPPPQQQQQRRRRRSSSSSTTTTTUUUUUVVVVVWWWWWXXXXXYYYYYZZZZZAAAAABBBBBCCCCCDDDDDEEEEEFFFFFGGGGGHHHHHIIIIIJJJJJKKKKKLLLLLMMMMMNNNNNOOOOOPPPPPQQQQQRRRRRSSSSSTTTTTUUUUUVVVVVWWWWWXXXXXYYYYYZZZZZ
   20 11.416807354    127.0.0.1 → 127.0.0.1    NFS 226 V3 READDIRPLUS Call (Reply In 21), FH: 0xe6a616ba
   21 11.416992454    127.0.0.1 → 127.0.0.1    NFS 622 V3 READDIRPLUS Reply (Call In 20) AAAAABBBBBCCCCCDDDDDEEEEEFFFFFGGGGGHHHHHIIIIIJJJJJKKKKKLLLLLMMMMMNNNNNOOOOOPPPPPQQQQQRRRRRSSSSSTTTTTUUUUUVVVVVWWWWWXXXXXYYYYYZZZZZAAAAABBBBBCCCCCDDDDDEEEEEFFFFFGGGGGHHHHHIIIIIJJJJJKKKKKLLLLLMMMMMNNNNNOOOOOPPPPPQQQQQRRRRRSSSSSTTTTTUUUUUVVVVVWWWWWXXXXXYYYYYZZZZZ
   22 11.417131927    127.0.0.1 → 127.0.0.1    NFS 226 V3 READDIRPLUS Call (Reply In 23), FH: 0xe6a616ba
   23 11.417265637    127.0.0.1 → 127.0.0.1    NFS 622 V3 READDIRPLUS Reply (Call In 22) AAAAABBBBBCCCCCDDDDDEEEEEFFFFFGGGGGHHHHHIIIIIJJJJJKKKKKLLLLLMMMMMNNNNNOOOOOPPPPPQQQQQRRRRRSSSSSTTTTTUUUUUVVVVVWWWWWXXXXXYYYYYZZZZZAAAAABBBBBCCCCCDDDDDEEEEEFFFFFGGGGGHHHHHIIIIIJJJJJKKKKKLLLLLMMMMMNNNNNOOOOOPPPPPQQQQQRRRRRSSSSSTTTTTUUUUUVVVVVWWWWWXXXXXYYYYYZZZZZ
   ...

Comment 10 Frank Sorenson 2023-09-12 22:32:32 UTC
Note that the reproducer is kernel-specific, and likely requires some setup for other kernel versions (currently works for 4.18.0-477.15.1.el8_8.x86_64).  See comments in the .stp for details.

Other than verifying that this bug is fixed, I doubt this reproducer is worth making into a general-purpose test.

Comment 11 Yongcheng Yang 2023-09-26 13:51:40 UTC
Hi Frank,

The command no longer hangs; it now fails with the error "File name too long".

IMO this is the expected behavior, so I'm pre-verifying it for now.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[root@ibm-x3850x5-03-vm-01 ~]# uname -r
4.18.0-513.el8.5318_994444625.x86_64
[root@ibm-x3850x5-03-vm-01 ~]# mount $HOSTNAME:/export_test /mnt_test -o vers=3
[root@ibm-x3850x5-03-vm-01 ~]# nfsstat -m
/mnt_test from ibm-x3850x5-03-vm-01.rhts.eng.pek2.redhat.com:/export_test
 Flags: rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.73.4.115,mountvers=3,mountport=20048,mountproto=udp,local_lock=none,addr=10.73.4.115

[root@ibm-x3850x5-03-vm-01 ~]# cd /mnt_test/
[root@ibm-x3850x5-03-vm-01 mnt_test]# touch AAAAABBBBBCCCCCDDDDDEEEEEFFFFFGGGGGHHHHHIIIIIJJJJJKKKKKLLLLLMMMMMNNNNNOOOOOPPPPPQQQQQRRRRRSSSSSTTTTTUUUUUVVVVVWWWWWXXXXXYYYYYZZZZZAAAAABBBBBCCCCCDDDDDEEEEEFFFFFGGGGGHHHHHIIIIIJJJJJKKKKKLLLLLMMMMMNNNNNOOOOOPPPPPQQQQQRRRRRSSSSSTTTTTUUUUUVVVVVWWWWWXXXXXYYYYY
[root@ibm-x3850x5-03-vm-01 mnt_test]# sysctl vm.drop_caches=3
vm.drop_caches = 3
[root@ibm-x3850x5-03-vm-01 mnt_test]# ls -l
ls: reading directory '.': File name too long      <<<<<<<<<<<<<<<<<
total 0
[root@ibm-x3850x5-03-vm-01 mnt_test]# echo $?
2
[root@ibm-x3850x5-03-vm-01 mnt_test]#

#
# On another terminal running the systemtap
#
[root@ibm-x3850x5-03-vm-01 ~]# stap -vg mod_entry3_filename_len.stp
Pass 1: parsed user script and 486 library scripts using 299124virt/97164res/16700shr/82192data kb, in 490usr/420sys/1232real ms.
WARNING: liveness analysis unable to parse binary /lib/modules/4.18.0-513.el8.5318_994444625.x86_64/kernel/fs/nfsd/nfsd.ko.xz: identifier '$namlen' at mod_entry3_filename_len.stp:23:3
 source: 		$namlen = strlen(new_name)
         		^
Pass 2: analyzed script: 4 probes, 17 functions, 1 embed, 7 globals using 303440virt/102848res/18136shr/86508data kb, in 670usr/1470sys/2973real ms.
Pass 3: using cached /root/.systemtap/cache/7a/stap_7aaa2af9fb26756c8542cc8bf5a70440_14025.c
Pass 4: using cached /root/.systemtap/cache/7a/stap_7aaa2af9fb26756c8542cc8bf5a70440_14025.ko
Pass 5: starting run.
1: modified maximum filename length to 260 bytes        <<<<<<<<<<<<<
^CPass 5: run completed in 430usr/3970sys/179605real ms.
[root@ibm-x3850x5-03-vm-01 ~]#
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


Compared with previous kernel:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[root@kvm-02-guest17 ~]# nfsstat -m
/mnt_test from kvm-02-guest17.rhts.eng.brq.redhat.com:/export_test/
 Flags: rw,relatime,vers=3,rsize=524288,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.37.153.91,mountvers=3,mountport=20048,mountproto=udp,local_lock=none,addr=10.37.153.91

[root@kvm-02-guest17 ~]# ls -l /export_test/
total 0
-rw-r--r--. 1 root root 0 Sep 26 11:17 AAAAABBBBBCCCCCDDDDDEEEEEFFFFFGGGGGHHHHHIIIIIJJJJJKKKKKLLLLLMMMMMNNNNNOOOOOPPPPPQQQQQRRRRRSSSSSTTTTTUUUUUVVVVVWWWWWXXXXXYYYYYZZZZZAAAAABBBBBCCCCCDDDDDEEEEEFFFFFGGGGGHHHHHIIIIIJJJJJKKKKKLLLLLMMMMMNNNNNOOOOOPPPPPQQQQQRRRRRSSSSSTTTTTUUUUUVVVVVWWWWWXXXXXYYYYY
[root@kvm-02-guest17 ~]# sysctl vm.drop_caches=3
vm.drop_caches = 3
[root@kvm-02-guest17 ~]# ls -l /mnt_test/

^C
[root@kvm-02-guest17 ~]# uname -r
4.18.0-514.el8.x86_64
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[root@kvm-02-guest17 ~]#

Comment 12 Frank Sorenson 2023-10-02 15:44:59 UTC
(In reply to Yongcheng Yang from comment #11)
> Hi Frank,
> 
> The command no longer hangs; it now fails with the error "File name too
> long".
> 
> IMO this is the expected behavior, so I'm pre-verifying it for now.

correct, that's the expected error

Comment 17 Yongcheng Yang 2023-10-07 13:51:29 UTC
Verified in kernel-4.18.0-516.el8

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[root@hp-dl360g9-06 ~]# uname -r
4.18.0-516.el8.x86_64
[root@hp-dl360g9-06 ~]# mount $HOSTNAME:/export_test /mnt_test -o vers=3
[root@hp-dl360g9-06 ~]# nfsstat -m
/mnt_test from hp-dl360g9-06.rhts.eng.pek2.redhat.com:/export_test
 Flags: rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.73.4.165,mountvers=3,mountport=20048,mountproto=udp,local_lock=none,addr=10.73.4.165

#
# On another terminal start the systemtap i.e. stap -vg mod_entry3_filename_len.stp
#

[root@hp-dl360g9-06 ~]# cd /mnt_test/
[root@hp-dl360g9-06 mnt_test]# touch AAAAABBBBBCCCCCDDDDDEEEEEFFFFFGGGGGHHHHHIIIIIJJJJJKKKKKLLLLLMMMMMNNNNNOOOOOPPPPPQQQQQRRRRRSSSSSTTTTTUUUUUVVVVVWWWWWXXXXXYYYYYZZZZZAAAAABBBBBCCCCCDDDDDEEEEEFFFFFGGGGGHHHHHIIIIIJJJJJKKKKKLLLLLMMMMMNNNNNOOOOOPPPPPQQQQQRRRRRSSSSSTTTTTUUUUUVVVVVWWWWWXXXXXYYYYY
[root@hp-dl360g9-06 mnt_test]# sysctl vm.drop_caches=3
vm.drop_caches = 3
[root@hp-dl360g9-06 mnt_test]# ls -l
ls: reading directory '.': File name too long
total 0
[root@hp-dl360g9-06 mnt_test]# cd
[root@hp-dl360g9-06 ~]# umount /mnt_test/
[root@hp-dl360g9-06 ~]# uname -r
4.18.0-516.el8.x86_64
[root@hp-dl360g9-06 ~]#

Comment 19 errata-xmlrpc 2024-05-22 09:51:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: kernel security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:3138

