Bug 2230217 - Please enable CONFIG_UNICODE kernel option [NEEDINFO]
Summary: Please enable CONFIG_UNICODE kernel option
Keywords:
Status: NEW
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: kernel
Version: 9.1
Hardware: All
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: fs-maint Bot
QA Contact: Boyang Xue
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-08-09 02:13 UTC by Martin Schwenke
Modified: 2023-08-14 15:42 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Bug
Target Upstream Version:
Embargoed:
swhiteho: needinfo? (madam)
esandeen: needinfo? (dchinner)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHELPLAN-164945 0 None None None 2023-08-09 02:17:44 UTC

Description Martin Schwenke 2023-08-09 02:13:26 UTC
Description of problem:

The ext4 filesystem has supported case-insensitive filenames since kernel version 5.2.  This needs the CONFIG_UNICODE kernel option to be enabled.

This feature is useful, for example, for allowing Samba to operate on large directories (e.g. millions of files), given that SMB is case-insensitive.
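
For context, a rough sketch of how the feature is used once the kernel supports it. It assumes e2fsprogs 1.45 or later; the device /dev/vdX, the mount point /mnt/scratch, and the directory name "share" are placeholders, not taken from this report. Because mkfs destroys data, the script only prints the commands rather than running them.

```shell
#!/bin/sh
# Print (do not run) the steps for creating a case-insensitive ext4 directory.
# $1 = block device (placeholder), $2 = mount point (placeholder).
casefold_plan() {
    dev=$1
    mnt=$2
    # The casefold feature has to be enabled when the filesystem is created.
    printf 'mkfs.ext4 -O casefold %s\n' "$dev"
    printf 'mount %s %s\n' "$dev" "$mnt"
    # Case-insensitivity is per directory: set +F while the directory is empty.
    printf 'mkdir %s/share\n' "$mnt"
    printf 'chattr +F %s/share\n' "$mnt"
    # After that, lookups under the directory ignore case.
    printf 'touch %s/share/Foo\n' "$mnt"
    printf 'ls %s/share/foo\n' "$mnt"
}

casefold_plan /dev/vdX /mnt/scratch
```

On a kernel built without CONFIG_UNICODE, mounting such a filesystem is expected to fail, which is the situation this bug describes.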


Version-Release number of selected component (if applicable):

all


How reproducible:

Wishlist, so always


Steps to Reproduce:
0. modprobe ext4
1. ls /sys/fs/ext4/features/casefold

Actual results:

ls: cannot access '/sys/fs/ext4/features/casefold': No such file or directory

Expected results:

File exists, indicating kernel was compiled with CONFIG_UNICODE=y.


Additional info:

/boot/config-5.14.0-162.6.1.el9_1.x86_64 (for example) has

# CONFIG_UNICODE is not set
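
Both checks can be combined into a small script. This is only a sketch: the function names are made up for illustration, and it assumes the running kernel's config is installed as /boot/config-$(uname -r) and that the ext4 module is already loaded (step 0).

```shell
#!/bin/sh
# Report how a kernel config file sets an option.
# Prints "y", "m", "unset", or "unknown" (config file not readable).
# $1 = option name (e.g. CONFIG_UNICODE), $2 = path to a kernel config file.
kconfig_state() {
    opt=$1
    cfg=$2
    if [ ! -r "$cfg" ]; then
        echo "unknown"
    elif grep -q "^${opt}=y\$" "$cfg"; then
        echo "y"
    elif grep -q "^${opt}=m\$" "$cfg"; then
        # Handled in case the option is tristate in a given kernel.
        echo "m"
    else
        echo "unset"
    fi
}

# Report whether ext4 advertises casefold support.
# $1 = features directory (defaults to the real sysfs location).
ext4_casefold_state() {
    dir=${1:-/sys/fs/ext4/features}
    if [ -e "$dir/casefold" ]; then
        echo "supported"
    else
        echo "missing"
    fi
}

echo "CONFIG_UNICODE: $(kconfig_state CONFIG_UNICODE "/boot/config-$(uname -r)")"
echo "ext4 casefold:  $(ext4_casefold_state)"
```

On the kernel quoted above this would be expected to report "unset" and "missing".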

Comment 2 Andreas Schneider 2023-08-09 10:03:24 UTC
Yes, this is useful for Samba, as it improves performance for those shares! If Windows looks for a file foo, we do not have to check all case permutations of the name.

See the following options in `man smb.conf` to improve performance:

case sensitive
default case
preserve case
short preserve case


See also
https://wiki.samba.org/index.php/Performance_Tuning#Directories_with_a_Large_Number_of_Files
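
As an illustration of how those four options fit together, the sketch below prints a share stanza. The values are, to my recollection, the ones the wiki page above suggests for large directories on a case-sensitive filesystem (force everything to lowercase so no scan for case variants is needed); the right values on a casefold ext4 directory may differ, so check `man smb.conf` before using them. The share name and path are hypothetical.

```shell
#!/bin/sh
# Print an smb.conf share stanza using the four options listed above.
# $1 = share name (hypothetical), $2 = exported path (hypothetical).
print_share_stanza() {
    share=$1
    path=$2
    cat <<EOF
[$share]
    path = $path
    case sensitive = true
    default case = lower
    preserve case = no
    short preserve case = no
EOF
}

print_share_stanza bigdir /srv/samba/bigdir
```

The output is meant to be reviewed and pasted into smb.conf by hand, then validated with testparm.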

Comment 7 Eric Sandeen 2023-08-14 15:42:10 UTC
(Re-adding these notes as non-private comments, sorry about that.)

XFS has had ascii-ci capability for a long time, for just this purpose (samba). However, it was recently deprecated due to problems:

commit 7ba83850ca2691865713b307ed001bde5fddb084
Author: Darrick J. Wong <djwong>
Date:   Tue Apr 11 19:05:19 2023 -0700

    xfs: deprecate the ascii-ci feature
    
    This feature is a mess -- the hash function has been broken for the
    entire 15 years of its existence if you create names with extended ascii
    bytes; metadump name obfuscation has silently failed for just as long;
    and the feature clashes horribly with the UTF8 encodings that most
    systems use today.  There is exactly one fstest for this feature.
    
    In other words, this feature is crap.  Let's deprecate it now so we can
    remove it from the codebase in 2030.
    
    Signed-off-by: Darrick J. Wong <djwong>
    Reviewed-by: Christoph Hellwig <hch>

The more involved CONFIG_UNICODE implementation actually started with XFS as well, as noted in the upstream commit that got merged:

commit 955405d1174eebcd1b89ab335f720adc27d52b67
Author: Gabriel Krisman Bertazi <krisman>
Date:   Thu Apr 25 13:38:44 2019 -0400

    unicode: introduce UTF-8 character database

The original XFS RFCs for this feature can be found at:

V1: https://lore.kernel.org/linux-xfs/20140911203735.GA19952@sgi.com/
V2: https://lore.kernel.org/linux-xfs/20140918195650.GI19952@sgi.com/
V3: https://lore.kernel.org/linux-xfs/20141003214758.GY1865@sgi.com/

The EXT4 RFC based on the above can be found here:

https://lore.kernel.org/linux-ext4/20180112071234.29470-1-krisman@collabora.co.uk/

These contain a lot of the information about how this works and the rationale for it.

I don't remember why the work died on the vine for XFS - Dave?

Maybe the other thing to note is that today, enabling CONFIG_UNICODE affects/alters primarily ext4, but also f2fs (which we don't build or ship) and ksmbd.  In the past, this would have been a KABI issue but in RHEL9 that would not be a problem since we reset anyway.

I'm also not sure how robust the ext4 implementation is; ext4 has a habit of merging new features that are not quite complete. At a minimum we'd want to look for robust test coverage before enabling and supporting this.

