
Bug 2230217

Summary: Please enable CONFIG_UNICODE kernel option
Product: Red Hat Enterprise Linux 9
Reporter: Martin Schwenke <martin>
Component: kernel
Assignee: fs-maint Bot <fs-maint>
kernel sub component: File Systems
QA Contact: Boyang Xue <bxue>
Status: CLOSED MIGRATED
Docs Contact:
Severity: medium
Priority: unspecified
CC: asn, dchinner, dhowells, esandeen, madam, mszeredi, swhiteho, xzhou
Version: 9.1
Keywords: MigratedToJIRA
Target Milestone: rc
Flags: pm-rhel: mirror+
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-09-23 12:06:31 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Martin Schwenke 2023-08-09 02:13:26 UTC
Description of problem:

The ext4 filesystem has supported case-insensitive filenames since kernel version 5.2.  This needs the CONFIG_UNICODE kernel option to be enabled.

This feature is useful, for example, for allowing Samba to operate on large directories (e.g. millions of files), given that SMB is case-insensitive.


Version-Release number of selected component (if applicable):

all


How reproducible:

Wishlist, so always


Steps to Reproduce:
0. modprobe ext4
1. ls /sys/fs/ext4/features/casefold

Actual results:

ls: cannot access '/sys/fs/ext4/features/casefold': No such file or directory

Expected results:

File exists, indicating kernel was compiled with CONFIG_UNICODE=y.


Additional info:

/boot/config-5.14.0-162.6.1.el9_1.x86_64 (for example) has

# CONFIG_UNICODE is not set
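
The same check can be scripted. The following is a minimal sketch in Python, assuming the sysfs path and the config line quoted in this report; the function names are hypothetical, and accepting "m" as well as "y" is an assumption in case the option is built as a module on some kernels.

```python
import os
import re

# Path taken from the "Steps to Reproduce" in this report.
CASEFOLD_FEATURE = "/sys/fs/ext4/features/casefold"


def ext4_casefold_available(path=CASEFOLD_FEATURE):
    """Return True if the feature file exists (ext4 must be loaded first)."""
    return os.path.exists(path)


def config_unicode_state(config_text):
    """Return 'y' or 'm' if CONFIG_UNICODE is enabled in the given
    kernel config text (e.g. the contents of a /boot/config-* file),
    or None if it is unset or absent."""
    # Assumption: 'm' is accepted too, in case the option is modular.
    match = re.search(r"^CONFIG_UNICODE=([ym])$", config_text, re.MULTILINE)
    return match.group(1) if match else None


if __name__ == "__main__":
    print("ext4 casefold feature present:", ext4_casefold_available())
```

On the kernel described here, `config_unicode_state()` would return None for a config containing "# CONFIG_UNICODE is not set".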

Comment 2 Andreas Schneider 2023-08-09 10:03:24 UTC
Yes, this is useful for Samba as it improves performance for those shares! If Windows looks for a file foo, we do not have to check all permutations.

See the following options in `man smb.conf` to improve performance:

case sensitive
default case
preserve case
short preserve case


See also
https://wiki.samba.org/index.php/Performance_Tuning#Directories_with_a_Large_Number_of_Files

Comment 7 Eric Sandeen 2023-08-14 15:42:10 UTC
(Re-adding these notes as non-private comments, sorry about that.)

XFS has had ascii-ci capability for a long time, for just this purpose (samba). However, it was recently deprecated due to problems:

commit 7ba83850ca2691865713b307ed001bde5fddb084
Author: Darrick J. Wong <djwong>
Date:   Tue Apr 11 19:05:19 2023 -0700

    xfs: deprecate the ascii-ci feature
    
    This feature is a mess -- the hash function has been broken for the
    entire 15 years of its existence if you create names with extended ascii
    bytes; metadump name obfuscation has silently failed for just as long;
    and the feature clashes horribly with the UTF8 encodings that most
    systems use today.  There is exactly one fstest for this feature.
    
    In other words, this feature is crap.  Let's deprecate it now so we can
    remove it from the codebase in 2030.
    
    Signed-off-by: Darrick J. Wong <djwong>
    Reviewed-by: Christoph Hellwig <hch>

The more involved CONFIG_UNICODE implementation actually started with XFS as well, as noted in the upstream commit that got merged:

commit 955405d1174eebcd1b89ab335f720adc27d52b67
Author: Gabriel Krisman Bertazi <krisman>
Date:   Thu Apr 25 13:38:44 2019 -0400

    unicode: introduce UTF-8 character database

The original XFS RFCs for this feature can be found at:

V1: https://lore.kernel.org/linux-xfs/20140911203735.GA19952@sgi.com/
V2: https://lore.kernel.org/linux-xfs/20140918195650.GI19952@sgi.com/
V3: https://lore.kernel.org/linux-xfs/20141003214758.GY1865@sgi.com/

The EXT4 RFC based on the above can be found here:

https://lore.kernel.org/linux-ext4/20180112071234.29470-1-krisman@collabora.co.uk/

These contain a lot of the information about how this works and the rationale for it.

I don't remember why the work died on the vine for XFS - Dave?

Maybe the other thing to note is that today, enabling CONFIG_UNICODE affects/alters primarily ext4, but also f2fs (which we don't build or ship) and ksmbd.  In the past, this would have been a KABI issue but in RHEL9 that would not be a problem since we reset anyway.

I'm also not sure how robust the ext4 implementation is; ext4 has a habit of merging new features that are not quite complete. At a minimum we'd want to look for robust test coverage before enabling and supporting this.

Comment 8 Dave Chinner 2023-08-21 09:01:33 UTC
(In reply to Eric Sandeen from comment #7)
> I don't remember why the work died on the vine for XFS - Dave?

Because the people that proposed it (SGI employees) refused to make any changes to the code to support the unicode trie in the way that the wider upstream community wanted it to be done. They essentially said "We've already shipped this to customers and so the on-disk format is already fixed and cannot be changed. Take it or leave it." In the absence of anyone else having the time, resources or need to implement full unicode case folding, "leave it" was what happened.

I can't think of a case other than Android or Samba for CI filesystem support. There just isn't a demand for this outside of these two applications.

> Maybe the other thing to note is that today, enabling CONFIG_UNICODE
> affects/alters primarily ext4, but also f2fs (which we don't build or ship)
> and ksmbd.  In the past, this would have been a KABI issue but in RHEL9 that
> would not be a problem since we reset anyway.
> 
> I'm also not sure how robust the ext4 implementation is; ext4 has a habit of
> merging new features that are not quite complete. At a minimum we'd want to
> look for robust test coverage before enabling and supporting this.

Yeah, it has not been particularly robust so far - it's had on-disk name hashing problems (which then led us to the XFS ascii-ci name hashing issues referenced) - and there are still upstream VFS changes going on to solve various problems with general CI support. There is only one test in fstests (generic/556) that exercises basic case folding for CI filesystems, so I'm not really sure how much trust we can put in an "it's supported upstream" statement given how little test coverage the feature has....

-Dave.

Comment 9 Eric Sandeen 2023-08-21 15:38:13 UTC
Thanks Dave, that aligns with what I thought the situation was.

So I think we'd want to hear more from Samba folks (both upstream and the RHEL team) about how much performance difference this makes, to determine whether there's a good RHEL business case for it - because enabling it in RHEL sounds like it would take some significant effort to get testing and fixes in place for ext4, and TBH we'd want to at least consider implementing it on XFS as well, for parity.

Comment 10 Andreas Schneider 2023-08-22 08:44:57 UTC
As always, it depends on the customer's setup. A directory which is shared via Samba could also be modified locally.

If an SMB client connects to a share and wants to open a file, first we look for the file by the filename provided by the client. If we are lucky, the filename matches exactly the name we have on disk. If not, we have to find it with a full directory scan, checking every file name in the directory with a case-insensitive string compare!

If you have a directory with 30k files, the directory scan can make things really slow. What if you have 100 clients trying to open a file in that directory at the same time and you have 100 scans running walking over 30k files?

Another issue is that you might have both a file Foo and a file fOO. The one that gets opened is the first one the directory scan finds, so the result depends on how the directory is walked, and that is not in alphabetical order ...
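
The lookup described in this comment can be sketched as follows. This is only an illustration in Python, not Samba's code: `find_name_ci` is a hypothetical name, and `str.casefold()` stands in for whatever comparison rule the server actually applies.

```python
import os


def find_name_ci(directory, wanted):
    """Return the on-disk name matching `wanted`, or None if there is none."""
    # Fast path: the client's spelling matches the name on disk exactly.
    if os.path.lexists(os.path.join(directory, wanted)):
        return wanted
    # Slow path: compare every entry case-insensitively, O(n) per lookup.
    folded = wanted.casefold()
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.name.casefold() == folded:
                # First hit wins. scandir() yields entries in directory
                # order, not alphabetical order, so with both Foo and fOO
                # present the result depends on how the directory is walked.
                return entry.name
    return None
```

With 100 clients each missing the fast path in a 30k-entry directory, the slow path runs 100 times over all 30k names, which is the cost the comment describes.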

Comment 11 Steve Whitehouse 2023-08-22 09:15:51 UTC
So as far as I can tell there are really two issues here: ideally the kernel doesn't need to know about char sets, since it just treats the filename as a string and doesn't need to parse it, and also it seems as if case folding unicode is non-trivial.

Without filesystem support it seems that the main concern is mixing of Samba and non-Samba I/O (i.e. local modifications) in the same directory tree. The penalty is paid in terms of performance and some odd access semantics in case of filenames that differ only by the case of one or more letters.

If we could restrict the access to the tree to only Samba, then a simple fix would be to do the case folding (say, for example, to all lower case) in userspace, and use that folded name as the filename. An xattr could then be used to keep a note of the original filename case for display purposes. That would avoid any performance issues with lookup I think, but does introduce the need to read xattrs on a directory read. It is clearly not a complete solution, but it would probably be an improvement on where we are today.

Comment 12 Martin Schwenke 2023-08-25 02:35:13 UTC
(In reply to Steve Whitehouse from comment #11)
> So as far as I can tell there are really two issues here: ideally the kernel
> doesn't need to know about char sets, since it just treats the filename as a
> string and doesn't need to parse it, and also it seems as if case folding
> unicode is non-trivial.
> 
> Without filesystem support it seems that the main concern is mixing of Samba
> and non-Samba I/O (i.e. local modifications) in the same directory tree. The
> penalty is paid in terms of performance and some odd access semantics in
> case of filenames that differ only by the case of one or more letters.
> 
> If we could restrict the access to the tree to only Samba, then a simple fix
> would be to do the case folding (say, for example, to all lower case) in
> userspace, and use that folded name as the filename. An xattr could then be
> used to keep a note of the original filename case for display purposes. That
> would avoid any performance issues with lookup I think, but does introduce
> the need to read xattrs on a directory read. It is clearly not a complete
> solution, but it would probably be an improvement on where we are today.

Yes, that's a very good summary.

There is even an abandoned Samba merge request
(https://gitlab.com/samba-team/samba/-/merge_requests/2278)
to do this in Samba, storing the original filename in an xattr.  I believe
the reasons for abandoning it were:

* It touches too many places...
* ... so, it is too hard to maintain...
* ... partly because Samba-only exports aren't interesting enough to warrant
  all of those tweaks

There are other methods of accomplishing desired behaviours around
case-insensitivity.  There are Windows applications that continuously
dump large numbers of files into a single directory (e.g. electron microscopes),
resulting in hundreds of thousands (and perhaps even millions) of files in a
single directory.  In this case, since we know unique filenames are generated,
we can use lossy case folding, where the original case is lost.  A native or
NFS application must then also make assumptions about case.  So, that can work,
but a lot of care is required - one slip up and it can break in ugly ways.

However, for general purpose, cross-protocol storage the filesystem is one of
the protocols.  So, if you want case-insensitivity then it has to be in the
filesystem... and when it is there, it is foolproof.

Comment 13 Martin Schwenke 2023-08-25 03:05:05 UTC
Given that nobody else is offering up speed comparisons, here is a naive one, on
my home hardware...

The server hardware is hardly enterprise grade:

  Intel 10th gen i3 NUC
  8GB RAM
  Samsung 8TB 870 QVO 2.5in SATA SSD

but it was also idle during the test.

The Linux client is mounting the Samba share via the cifs filesystem.

# time touch /mnt/test/foobar00999999

real	0m0.126s
user	0m0.004s
sys	0m0.001s
# time touch /mnt/test2/foobar00999999

real	0m0.076s
user	0m0.001s
sys	0m0.005s

The first directory (/mnt/test) contains 100K files (foobar00000000-foobar00100000).
So, each comparison compares 9 bytes of the filename with each file in the directory.

The second directory (/mnt/test2) is empty, which approximates being able to directly
compare filenames... or not bother, which is roughly what would happen if the underlying
filesystem was case-insensitive.

The numbers vary, but the above look to be eerily repeatable.  There are occasional
outliers, possibly due to client-side caching or similar.

So, in a directory with 100K files, the "case insensitive scanning" in Samba causes an
extra ~0.05s to create a file.

Reducing the filename being created to "a" still seems to result in an extra 0.035s,
so common prefixes don't cost an order of magnitude.  The readdir() time is probably
the limiting factor.

So, if we extend this to a directory containing 1M files, it should cost ~0.5s to do the
"case insensitive scan" on file creation.

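The arithmetic behind that estimate is sketched below in Python. It assumes the scan cost grows linearly with the number of directory entries, which is an assumption: the measurements above cover only one directory size.

```python
# Timings ("real") from the two `touch` runs above, in seconds.
t_populated = 0.126   # /mnt/test, about 100K files
t_empty = 0.076       # /mnt/test2, empty

files_measured = 100_000
scan_overhead = t_populated - t_empty        # about 0.05 s per create
per_entry = scan_overhead / files_measured   # about 0.5 microseconds


def estimated_overhead(num_files):
    """Linear extrapolation of the per-create scan cost, in seconds."""
    return per_entry * num_files


print(f"{estimated_overhead(1_000_000):.2f} s")  # prints 0.50 s
```
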
Comment 14 Dave Chinner 2023-08-26 22:20:44 UTC
There's no question that the kernel CI code is faster than trying to do it in userspace in Samba.

The question is whether a -single application- is worth the initial and ongoing effort and resources to develop and support that functionality in RHEL and upstream in multiple filesystems.

i.e. supporting unicode CI is not a technical decision, it's a business decision. We can't keep expanding the feature support matrix without having a corresponding increase in engineering and QE resources to support that feature matrix.

> However, for general purpose, cross-protocol storage the filesystem is one of
> the protocols.  So, if you want case-insensitivity then it has to be in the
> filesystem... and when it is there, it is foolproof.

CI in the filesystem is not foolproof. It's just as complex as doing it in userspace, along with the additional problems of not having any of the userspace context or support and having to define and implement a robust on-disk directory structure format for the CI infrastructure. The former makes unicode case folding a nasty problem (which of the 5 models is the least worst?), _none_ of them are quite right in all situations, and we have to ensure the unicode parser has no exploitable bugs in it. As for the latter, we've demonstrated that we screw up the on-disk format in non-obvious ways, both on XFS and ext4 (e.g. the CI name hashing having endian encoding issues).

So any argument that "it can only be done right in the filesystem" really needs to be taken with a grain of salt. It just moves the complexity from userspace to the kernel - we can still screw it up in exciting and new ways but now we are doing it in ring 0 instead of in userspace...

-Dave.

Comment 15 Dave Chinner 2023-08-26 22:41:03 UTC
(In reply to Martin Schwenke from comment #13)
> The first directory (/mnt/test) contains 100K files,
> foobar00000000-foobar00100000).
> So, each comparison compares 9 bytes of the filename with each file in the
> directory.

And there's the problem. Filesystems don't scan directory names to find a match. They have a tree of hashes that they search for matches. Only once hash matches are found do they then do a string compare. For CI filesystems, the name is case converted to lower case, then hashed. On a hash match, we do an exact name string check, followed by a case-folded string check. The hashes in XFS are stored in a btree and the hash key points to the dirent that contains the original name, so searches in large directories scale very well.

IOWs, the idea that you have to exhaustively scan every filename in the directory to find a CI match is 1990s thinking. All that is needed is a scalable search tree with a CI key and CI string matches when a key match is found. Yes, the app needs to read the directory to populate the cache first, but then it can be watched via fanotify events to capture create/unlink/link/rename operations on the dir and updated with the delta changes that are occurring in the directory from outside the application.
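
A minimal userspace sketch of such a search structure follows. It assumes Python's `str.casefold()` as the folding rule and a plain dict in place of a btree of hashes; the class and method names are hypothetical, and the fanotify wiring that would call `add()` and `remove()` is left out.

```python
from collections import defaultdict


class CIDirCache:
    """Case-insensitive index over the names in one directory."""

    def __init__(self, names=()):
        # Folded key -> list of original on-disk names sharing that key.
        self._by_key = defaultdict(list)
        for name in names:
            self.add(name)

    def add(self, name):
        """Apply a create/link/rename-to delta."""
        bucket = self._by_key[name.casefold()]
        if name not in bucket:
            bucket.append(name)

    def remove(self, name):
        """Apply an unlink/rename-from delta."""
        key = name.casefold()
        bucket = self._by_key.get(key, [])
        if name in bucket:
            bucket.remove(name)
        if not bucket:
            self._by_key.pop(key, None)

    def lookup(self, wanted):
        """Key match first, then exact string check, then folded match."""
        bucket = self._by_key.get(wanted.casefold(), [])
        if wanted in bucket:
            return wanted
        return bucket[0] if bucket else None
```

After the initial directory read populates the cache, each lookup is a dictionary probe plus a short bucket check rather than a scan of every name.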

So, in reality, I'm not convinced that "CI name matching can only be done fast in the kernel" is actually true these days...

-Dave.

Comment 16 Martin Schwenke 2023-08-27 23:53:57 UTC
Hi Dave.  I really appreciate you spending time looking at this. Thanks!

I understand that this is a business decision that might not go the way I want it to...

I agree that "CI name matching can only be done fast in the kernel" probably isn't true.  The problem is that correctly implementing all of the pieces in userspace seems very fragile...

We could maintain a search-tree/cache in Samba and update it via fanotify.  fanotify is a bit disappointing because you can't recursively monitor a directory, so you need to mark newly created subdirectories and then scan them anyway, to cover the inherent race... and recurse on any subdirectories.  Still, that's not terrible and it can be done. The only alternative seems to be monitoring whole filesystems and then filtering on exported directories, since even monitoring whole mounts doesn't appear to support create and move.  However, monitoring things we don't care about might result in a low signal-to-noise ratio.  With fanotify there are also issues around correctly configuring the maximum queue length and the maximum number of marks - it seems like we would be playing a game of whack-a-mole while users potentially lose data between whacks.  Monitoring whole filesystems would work around the limit on the number of marks, but at the cost of signal-to-noise.  When looking at fanotify I always find myself sighing a lot... though I do understand that the magic I really want from its API can only be provided by implementing all the mitigations in the kernel, which isn't realistic.
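
The mark-then-scan pattern described above can be sketched as follows. Python's standard library has no fanotify binding, so `mark` here is a hypothetical callback standing in for whatever adds a watch on one directory, and `on_entry` is a hypothetical callback that feeds the cache; only the ordering is the point.

```python
import os


def watch_tree(root, mark, on_entry):
    """Mark `root`, then scan it, recursing into subdirectories.

    Marking before scanning covers the race: anything created after the
    mark arrives as an event, and anything created before it is seen by
    the scan. An entry created in between may be reported twice, so
    `on_entry` should be idempotent.
    """
    mark(root)
    with os.scandir(root) as entries:
        for entry in entries:
            on_entry(entry.path)
            # Do not follow symlinks, to avoid leaving the tree or looping.
            if entry.is_dir(follow_symlinks=False):
                watch_tree(entry.path, mark, on_entry)
```

When an event later reports a newly created subdirectory, the same function would be called on that subdirectory, which is the extra scan the comment refers to.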

Then, if we also want to access those files via NFS, we need the search-tree/cache in the NFS server too.  If that's NFS-Ganesha then that could be something we can do.  However, if it is kernel then we're back to the kernel.

One way of doing this across applications might be FUSE.  I've never looked at FUSE, so I'm not aware of performance and other implications.  I'd be happy to take advice about whether to invest any time there.

Thanks again...

--martin

Comment 17 RHEL Program Management 2023-09-23 12:03:56 UTC
Issue migration from Bugzilla to Jira is in process at this time. This will be the last message in Jira copied from the Bugzilla bug.

Comment 18 RHEL Program Management 2023-09-23 12:06:31 UTC
This BZ has been automatically migrated to the issues.redhat.com Red Hat Issue Tracker. All future work related to this report will be managed there.

Due to differences in account names between systems, some fields were not replicated.  Be sure to add yourself to Jira issue's "Watchers" field to continue receiving updates and add others to the "Need Info From" field to continue requesting information.

To find the migrated issue, look in the "Links" section for a direct link to the new issue location. The issue key will have an icon of 2 footprints next to it, and begin with "RHEL-" followed by an integer.  You can also find this issue by visiting https://issues.redhat.com/issues/?jql= and searching the "Bugzilla Bug" field for this BZ's number, e.g. a search like:

"Bugzilla Bug" = 1234567

In the event you have trouble locating or viewing this issue, you can file an issue by sending mail to rh-issues. You can also visit https://access.redhat.com/articles/7032570 for general account information.

Comment 19 Red Hat Bugzilla 2024-01-22 04:25:36 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days