The expansion code for the directory exhash table currently works like this:
1. vmalloc a chunk of memory
2. read the current hash table into it
3. write out a new version doubling the size
This requires allocating a lot of memory, and it potentially also increases
latency because all the reading is done before any of the writing. A better
algorithm looks like this:
1. Expand the directory's data blocks to the newly required size, allocating
blocks as required
2. Use two "pointers": one starts at the new end of the file, the other at the
original end of the file.
3. Copy the data from original to new, moving towards the start of the file. The
"new" pointer will catch up with the "original" pointer only on the final copy
4. Update the i_size to indicate that the operation is complete
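The copy in steps 2 and 3 can be sketched in user-space C as below. This is only an illustration, not the gfs2 code: the function name exhash_double_in_place is hypothetical, a plain in-memory array stands in for the directory's data blocks, and it assumes that doubling the table means each original entry i is duplicated into new entries 2i and 2i+1. Steps 1 and 4 (block allocation and the i_size update) are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch: double a hash table of n entries in place.
 * tbl must already have room for 2 * n entries (step 1 in the list
 * above); only the first n entries hold valid data on entry.
 */
void exhash_double_in_place(uint64_t *tbl, size_t n)
{
    size_t src = n;      /* one past the original end of the table */
    size_t dst = 2 * n;  /* one past the new end of the table */

    assert(tbl != NULL);

    /* Walk both "pointers" back towards the start of the table. */
    while (src > 0) {
        /* Read first: on the final pass dst lands on src (index 0). */
        uint64_t entry = tbl[--src];

        /* Assumed doubling rule: entry i becomes entries 2i and 2i+1. */
        tbl[--dst] = entry;
        tbl[--dst] = entry;
    }
}
```

Because the write position moves two slots for every slot read, the writes never touch an entry that has not been read yet, and the two positions only meet at index 0. That is why no separate copy of the whole table is needed.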
Potentially we might be able to increase the maximum size of the hash table,
since the limiting factor appears to be set only by the maximum size of memory
that it is reasonable to allocate using vmalloc. If this is done then we need
to check that it will remain backward compatible, but it would seem reasonable
that this should be the case.
Moving priority to low, as this is really a performance issue rather than a correctness one.
nothing to triage
There are places in the dir code where we are using GFP_NOFAIL for allocations which might be larger than order 0. In the latest upstream kernels this causes warnings to appear, as it is not allowed.
We will need to review the memory allocations in the dir code to ensure that this doesn't happen, and we should look at fixing this bug at the same time, as it's all related.
Created attachment 348275
dmesg showing the issue
The full stack and warning are shown at the end of this attachment.
This was done upstream some time ago.