Description of problem:
The ext4 filesystem has supported case-insensitive filenames since kernel version 5.2. This requires the CONFIG_UNICODE kernel option to be enabled. The feature is useful, for example, for allowing Samba to operate on large directories (e.g. millions of files), given that SMB is case-insensitive.

Version-Release number of selected component (if applicable):
all

How reproducible:
Wishlist, so always

Steps to Reproduce:
0. modprobe ext4
1. ls /sys/fs/ext4/features/casefold

Actual results:
ls: cannot access '/sys/fs/ext4/features/casefold': No such file or directory

Expected results:
File exists, indicating kernel was compiled with CONFIG_UNICODE=y.

Additional info:
/boot/config-5.14.0-162.6.1.el9_1.x86_64 (for example) has
# CONFIG_UNICODE is not set
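For anyone who wants to script the check in the report, here is a rough Python sketch. It tests for the sysfs feature file and looks for CONFIG_UNICODE in a kernel config file. The helper names (`config_option_state`, `ext4_casefold_available`) are made up for illustration, and the `/boot/config-<release>` location is an assumption based on the path quoted in the report.

```python
import os
import platform
import re

# Path taken from the reproduction steps in the report.
CASEFOLD_FEATURE = "/sys/fs/ext4/features/casefold"


def config_option_state(config_text, option):
    """Return the value of `option` (e.g. 'y' or 'm'), or None if it is not set."""
    # A line such as '# CONFIG_UNICODE is not set' does not match this pattern.
    pattern = re.compile(r"^%s=(.*)$" % re.escape(option), re.MULTILINE)
    match = pattern.search(config_text)
    return match.group(1) if match else None


def ext4_casefold_available(feature_path=CASEFOLD_FEATURE):
    """True if the ext4 casefold feature file is present in sysfs."""
    return os.path.exists(feature_path)


if __name__ == "__main__":
    print("ext4 casefold feature present:", ext4_casefold_available())
    # Assumed location of the running kernel's config file.
    config_path = "/boot/config-" + platform.release()
    try:
        with open(config_path) as f:
            state = config_option_state(f.read(), "CONFIG_UNICODE")
        print("CONFIG_UNICODE:", state or "not set")
    except OSError as exc:
        print("could not read", config_path, "-", exc)
```

As in the reproduction steps, the ext4 module has to be loaded before the sysfs check means anything.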
Yes, this is useful for Samba as it improves performance for those shares! If Windows looks for a file foo, we do not have to check all permutations.

See the following options in `man smb.conf` to improve performance:

  case sensitive
  default case
  preserve case
  short preserve case

See also https://wiki.samba.org/index.php/Performance_Tuning#Directories_with_a_Large_Number_of_Files
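To illustrate why this matters on a case-sensitive filesystem, here is a minimal Python sketch of a case-insensitive lookup done in userspace. It is not Samba's actual code; `find_case_insensitive` is a hypothetical helper, and Python's `str.casefold()` stands in for whatever comparison the server really uses. The point is only that the lookup has to walk the directory entries, so the cost grows with the directory size, whereas a casefolded ext4 directory can resolve the name in the filesystem itself.

```python
import os


def find_case_insensitive(directory, wanted):
    """Scan `directory` for an entry whose name matches `wanted` ignoring case.

    Returns the name as stored on disk, or None if nothing matches.
    The scan visits entries one by one, so it is linear in the directory size.
    """
    target = wanted.casefold()
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.name.casefold() == target:
                return entry.name
    return None
```

With millions of files in one directory, a miss in this kind of scan means reading every entry before giving up.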
(Re-adding these notes as non-private comments, sorry about that.)

XFS has had ascii-ci capability for a long time, for just this purpose (samba). However, it was recently deprecated due to problems:

commit 7ba83850ca2691865713b307ed001bde5fddb084
Author: Darrick J. Wong <djwong>
Date:   Tue Apr 11 19:05:19 2023 -0700

    xfs: deprecate the ascii-ci feature

    This feature is a mess -- the hash function has been broken for the entire 15 years of its existence if you create names with extended ascii bytes; metadump name obfuscation has silently failed for just as long; and the feature clashes horribly with the UTF8 encodings that most systems use today. There is exactly one fstest for this feature.

    In other words, this feature is crap. Let's deprecate it now so we can remove it from the codebase in 2030.

    Signed-off-by: Darrick J. Wong <djwong>
    Reviewed-by: Christoph Hellwig <hch>

The more involved CONFIG_UNICODE implementation actually started with XFS as well, as noted in the upstream commit that got merged:

commit 955405d1174eebcd1b89ab335f720adc27d52b67
Author: Gabriel Krisman Bertazi <krisman>
Date:   Thu Apr 25 13:38:44 2019 -0400

    unicode: introduce UTF-8 character database

    The original XFS RFCs for this feature can be found at:
    V1: https://lore.kernel.org/linux-xfs/20140911203735.GA19952@sgi.com/
    V2: https://lore.kernel.org/linux-xfs/20140918195650.GI19952@sgi.com/
    V3: https://lore.kernel.org/linux-xfs/20141003214758.GY1865@sgi.com/

    The EXT4 RFC based on the above can be found here:
    https://lore.kernel.org/linux-ext4/20180112071234.29470-1-krisman@collabora.co.uk/

These contain a lot of the information about how this works and the rationale for it. I don't remember why the work died on the vine for XFS - Dave?

Maybe the other thing to note is that today, enabling CONFIG_UNICODE affects/alters primarily ext4, but also f2fs (which we don't build or ship) and ksmbd.
In the past, this would have been a kABI issue, but in RHEL9 that would not be a problem since we reset anyway. I'm also not sure how robust the ext4 implementation is; ext4 has a habit of merging new features that are not quite complete. At a minimum we'd want to look for robust test coverage before enabling and supporting this.