Bug 1657200

Summary: fill_dir API runs out of memory when creating 10^6 files
Product: [Fedora] Fedora
Reporter: Richard W.M. Jones <rjones>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: NEW
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: rawhide
CC: airlied, bskeggs, ewk, hdegoede, ichavero, itamar, jarodwilson, jglisse, john.j5live, jonathan, josef, kernel-maint, linville, mchehab, mjg59, ptoscano, steved, yaneti
Hardware: Unspecified
OS: Unspecified
Type: Bug
Attachments: debug log (flags: none)

Description Richard W.M. Jones 2018-12-07 12:46:37 UTC
Created attachment 1512483: debug log

Description of problem:

The following one-liner creates a 50G ext4 filesystem, makes a
directory in it, and creates a million files in that directory.  It
runs out of memory during the test on some machines.

$ guestfish -N fs:ext4:50G -m /dev/sda1 mkdir /dir : fill-dir /dir 1000000

Add -vx flags to show debugging output.

This seems to be caused by the Linux kernel rather than libguestfs
itself, but I haven't identified the exact cause.

The debug log from a failed run is attached.

Version-Release number of selected component (if applicable):

libguestfs-1.39.11-1.fc30.x86_64

Fails with:
  kernel-4.20.0-0.rc5.git2.1.fc30.x86_64 compiled with nodebug

Works with:
  kernel-4.18.18-300.fc29.x86_64

How reproducible:

100%

Steps to Reproduce:
1. See above.

Additional information:

I tried using different amounts of appliance memory by setting the
LIBGUESTFS_MEMSIZE environment variable before running the test:

LIBGUESTFS_MEMSIZE (megabytes)       result
---------------------------------------------------
(default = 500)                      fails
768                                  fails
1024                                 fails
2048                                 works
4096                                 works

Note that all of the above were tested with kernel 4.20.0-0.rc5.git2.1.fc30.
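
The sweep in the table can be scripted.  The following is a minimal
sketch, not the script actually used to produce the table: the function
names try_memsize and sweep_memsizes and the GUESTFISH override variable
are hypothetical, introduced only for illustration.  It runs the same
one-liner once per memory size and prints whether the run succeeded.

```shell
# GUESTFISH is a hypothetical override so another binary (or a stub)
# can be substituted; by default the guestfish found in $PATH is used.
GUESTFISH="${GUESTFISH:-guestfish}"

# Run the reproducer once with the appliance memory size given in $1
# (in megabytes, as LIBGUESTFS_MEMSIZE expects).
try_memsize() {
    LIBGUESTFS_MEMSIZE="$1" "$GUESTFISH" -N fs:ext4:50G -m /dev/sda1 \
        mkdir /dir : fill-dir /dir 1000000
}

# Try each memory size given on the command line and report the result
# using the same "works"/"fails" wording as the table.
sweep_memsizes() {
    for m in "$@"; do
        if try_memsize "$m" >/dev/null 2>&1; then
            echo "$m works"
        else
            echo "$m fails"
        fi
    done
}
```

For example, "sweep_memsizes 768 1024 2048 4096" would cover the
non-default rows of the table.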

This is essentially what the following upstream test does, so that test is failing:
https://github.com/libguestfs/libguestfs/blob/master/tests/bigdirs/test-big-dirs.pl

Comment 1 Richard W.M. Jones 2018-12-07 16:51:09 UTC
Dan noticed that most of the memory is reclaimable:

[  218.957865]  slab_reclaimable:109662 slab_unreclaimable:5656

and so the oom-killer really ought not to run on userspace.
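
To put those counters in perspective, here is a rough conversion to
mebibytes.  It assumes the counters are page counts and that the page
size is 4096 bytes (the usual x86_64 value); pages_to_mib is a
hypothetical helper name, not an existing tool.

```shell
# Convert a page count ($1) to whole mebibytes, rounded down.
# Assumption: 4096-byte pages unless a page size is passed as $2.
pages_to_mib() {
    echo $(( $1 * ${2:-4096} / 1024 / 1024 ))
}

pages_to_mib 109662   # slab_reclaimable   -> 428
pages_to_mib 5656     # slab_unreclaimable -> 22
```

Under those assumptions, roughly 428 MiB of slab is reclaimable versus
about 22 MiB unreclaimable.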