Created attachment 1915352 [details]
reproducer program

Description of problem:

When multiple processes perform parallel directory listings and readdir results are cached, a process listing a cached directory will periodically begin reading from a page containing bogus data. The kernel typically detects the bogus data due to an invalid (very large) entry name length/entry size, outputs a WARNING, and returns EIO to userspace. Every subsequent readdir of the directory will (attempt to) parse the same invalid data until the cache times out.

[3109130.031012] WARNING: CPU: 2 PID: 317313 at fs/fuse/readdir.c:396 fuse_readdir+0x5bb/0x680 [fuse]
[3109130.031070] CPU: 2 PID: 317313 Comm: find Kdump: loaded Tainted: GF W OE --------- - - 4.18.0-348.7.1.el8_5.x86_64 #1
[3109130.031072] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1.fc35 04/01/2014
[3109130.031081] RIP: 0010:fuse_readdir+0x5bb/0x680 [fuse]
[3109130.031100] Call Trace:
[3109130.031125]  iterate_dir+0x13c/0x190
[3109130.031128]  ksys_getdents64+0x9c/0x130
[3109130.031130]  ? iterate_dir+0x190/0x190
[3109130.031133]  __x64_sys_getdents64+0x16/0x20
[3109130.031138]  do_syscall_64+0x5b/0x1a0
[3109130.031142]  entry_SYSCALL_64_after_hwframe+0x65/0xca

fs/fuse/readdir.c:

377 static enum fuse_parse_result fuse_parse_cache(struct fuse_file *ff,
378                                                void *addr, unsigned int size,
379                                                struct dir_context *ctx)
380 {
381         unsigned int offset = ff->readdir.cache_off & ~PAGE_MASK;
382         enum fuse_parse_result res = FOUND_NONE;
...
396                 if (WARN_ON(dirent->namelen > FUSE_NAME_MAX))
397                         return FOUND_ERR;

Examining a vmcore or using systemtap, the contents of the page are unrecognizable (beginning with offset 0). namelen will be some large number, such as 100663296 (0x6000000) or 1269538688 (0x4bab9f80).
Version-Release number of selected component (if applicable):

RHEL kernel versions that support fuse cache_readdir, at least since 8.3; seen with:
kernel-4.18.0-240.22.1.el8_3
kernel-4.18.0-348.20.1.el8_5
kernel-4.18.0-372.27.1.el8_6
also reproduced with upstream kernel

fuse3-libs versions which support readdir caching: fuse3-libs-2.9.7-16.el8 and newer

How reproducible:
easy

Steps to Reproduce:

# dnf install fuse{,-libs}-2.9.7-16.el8.x86_64 fuse3{,-devel,-libs,-common}-3.3.0-16.el8.x86_64

download & install cvmfs rpms from https://cernvm.cern.ch/fs/
https://ecsft.cern.ch/dist/cvmfs/cvmfs-2.9.4/cvmfs-2.9.4-1.el8.x86_64.rpm
https://ecsft.cern.ch/dist/cvmfs/cvmfs-2.9.4/cvmfs-fuse3-2.9.4-1.el8.x86_64.rpm
http://ecsft.cern.ch/dist/cvmfs/cvmfs-config/cvmfs-config-default-2.0-2.noarch.rpm

simple cvmfs setup, /etc/cvmfs/default.local:

CVMFS_REPOSITORIES="$((echo oasis.opensciencegrid.org;echo cms.cern.ch;ls /cvmfs)|sort -u|paste -sd ,)"
CVMFS_HTTP_PROXY="DIRECT"
# and if you need to limit the size of the cached data:
# CVMFS_QUOTA_LIMIT=500
# or to move it to another location (from the default of /var/lib/cvmfs) -- it will then complain if the limit is less than 1 GiB
# CVMFS_CACHE_BASE=/other/path/cvmfs

setup and restart autofs, /etc/auto.master.d/cvmfs.autofs:

/cvmfs /etc/auto.cvmfs

# systemctl restart autofs.service

compile the provided walk_tree.c:

# gcc -Wall walk_tree.c -o walk_tree -g

start multiple processes crawling a cvmfs filesystem:

usage: ./walk_tree <child_threads> <starting path>

# ./walk_tree 8 /cvmfs/oasis.opensciencegrid.org
tid 5670, child 0: alive
tid 5671, child 1: alive
tid 5672, child 2: alive
tid 5673, child 3: alive
tid 5675, child 5: alive
tid 5676, child 6: alive
tid 5674, child 4: alive
tid 5677, child 7: alive
tid 5672, child 2: error getting directory entries in '/cvmfs/oasis.opensciencegrid.org/geant4/externals/cmake/v3_9_0/source/cmake-3.9.0/Tests/TryCompile/Inner', inode # 327519: Input/output error
tid 5673, child 3: error getting directory entries in '/cvmfs/oasis.opensciencegrid.org/geant4/externals/cmake/v3_9_0/source/cmake-3.9.0/Tests/TryCompile/Inner', inode # 327519: Input/output error
tid 5673, child 3: exiting with ERROR
tid 5676, child 6: error getting directory entries in '/cvmfs/oasis.opensciencegrid.org/geant4/externals/cmake/v3_9_0/source/cmake-3.9.0/Tests/TryCompile/Inner', inode # 327519: Input/output error
...

Actual results:
EIO is returned to userspace when listing the directory; the kernel outputs a WARNING

Expected results:
no kernel warnings or errors when accessing the filesystem

Additional info:
Thus far, this has only been reproduced with cvmfs2, despite some attempts to create a stripped-down filesystem to simplify debugging. However, the issue occurs after the kernel has sanity-checked the response from the userspace filesystem and copied the dirents into the cache, so the problem appears to be entirely within the kernel. The fact that this only occurs when multiple processes are listing the same directories also suggests a race in the kernel code which manages the readdir cache.
Can you please provide a crashdump (echo 1 > /proc/sys/kernel/panic_on_warn) and upload it to https://galvatron-x86.cee.redhat.com/
looking at the vmcore:

[  852.595240] WARNING: CPU: 3 PID: 15069 at fs/fuse/readdir.c:396 fuse_readdir+0x5bb/0x680 [fuse]
[  852.595323] Kernel panic - not syncing: panic_on_warn set ...
[  852.595451] CPU: 3 PID: 15069 Comm: walk_tree Kdump: loaded Not tainted 4.18.0-372.31.1.el8_6.x86_64 #1
[  852.595574] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1.fc35 04/01/2014
[  852.595708] Call Trace:
[  852.595928]  dump_stack+0x41/0x60
[  852.596126]  panic+0xe7/0x2ac
[  852.596281]  ? fuse_readdir+0x5bb/0x680 [fuse]
[  852.596500]  __warn.cold.14+0x31/0x40
[  852.596712]  ? fuse_readdir+0x5bb/0x680 [fuse]
[  852.596898]  ? fuse_readdir+0x5bb/0x680 [fuse]
[  852.597048]  report_bug+0xb1/0xd0
[  852.597176]  ? terminate_walk+0x7a/0xe0
[  852.597319]  do_error_trap+0x9e/0xd0
[  852.597456]  do_invalid_op+0x36/0x40
[  852.597608]  ? fuse_readdir+0x5bb/0x680 [fuse]
[  852.597761]  invalid_op+0x14/0x20
[  852.597902] RIP: 0010:fuse_readdir+0x5bb/0x680 [fuse]
[  852.598064] Code: c1 48 39 c2 0f 85 81 00 00 00 48 d1 e9 48 89 8b f0 02 00 00 e9 74 fb ff ff 4d 89 fe 48 8b 5c 24 10 4c 8b 64 24 08 4c 8b 3c 24 <0f> 0b c7 44 24 1c ff ff ff ff e9 5a fe ff ff 4d 89 fe 48 8b 5c 24
[  852.598417] RSP: 0018:ffffbcd3036abe10 EFLAGS: 00010286
[  852.598586] RAX: 0000000000000090 RBX: ffff9971d2edaa00 RCX: 000000008949ffff
[  852.598749] RDX: 000000008949ffff RSI: 0000000000000000 RDI: ffff9971d2edacf8
[  852.598918] RBP: ffffbcd3036abe80 R08: ffffbcd3036abd80 R09: 0000000000000000
[  852.599083] R10: ffffbcd3036abe90 R11: ffff99718d86b000 R12: ffffe54584361ac0
[  852.599244] R13: ffff9970b30d99c0 R14: ffffbcd3036abed0 R15: ffff997182ccab00
[  852.599446]  iterate_dir+0x13c/0x190
[  852.599619]  ksys_getdents64+0x9c/0x130
[  852.599752]  ? iterate_dir+0x190/0x190
[  852.599891]  __x64_sys_getdents64+0x16/0x20
[  852.600031]  do_syscall_64+0x5b/0x1a0
[  852.600173]  entry_SYSCALL_64_after_hwframe+0x65/0xca
[  852.600325] RIP: 0033:0x7f3a1857f78d
[  852.600468] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d cb 56 2c 00 f7 d8 64 89 01 48
[  852.600854] RSP: 002b:00007ffdb84e18e8 EFLAGS: 00000246 ORIG_RAX: 00000000000000d9
[  852.601026] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f3a1857f78d
[  852.601192] RDX: 0000000000010000 RSI: 000000000105e930 RDI: 000000000000001b
[  852.601367] RBP: 00007ffdb84e19d0 R08: 0000000000000000 R09: 0000000000b5e680
[  852.601529] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000400c30
[  852.601713] R13: 00007ffdb84e2780 R14: 0000000000000000 R15: 0000000000000000

PID: 15069  TASK: ffff997241270000  CPU: 3  COMMAND: "walk_tree"
 [exception RIP: fuse_readdir+0x5bb]
    RIP: ffffffffc021521b  RSP: ffffbcd3036abe10  RFLAGS: 00010286
    RAX: 0000000000000090  RBX: ffff9971d2edaa00  RCX: 000000008949ffff
    RDX: 000000008949ffff  RSI: 0000000000000000  RDI: ffff9971d2edacf8
    RBP: ffffbcd3036abe80  R8:  ffffbcd3036abd80  R9:  0000000000000000
    R10: ffffbcd3036abe90  R11: ffff99718d86b000  R12: ffffe54584361ac0
    R13: ffff9970b30d99c0  R14: ffffbcd3036abed0  R15: ffff997182ccab00
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#10 [ffffbcd3036abe88] iterate_dir at ffffffff9775593c
#11 [ffffbcd3036abec8] ksys_getdents64 at ffffffff977565bc
#12 [ffffbcd3036abf30] __x64_sys_getdents64 at ffffffff97756666
#13 [ffffbcd3036abf38] do_syscall_64 at ffffffff9740430b
#14 [ffffbcd3036abf50] entry_SYSCALL_64_after_hwframe at ffffffff97e000ad

the getdents64() is on fd 27:

PID: 15069  TASK: ffff997241270000  CPU: 3  COMMAND: "walk_tree"
ROOT: /    CWD: /var/tmp/test_fs
 FD       FILE            DENTRY           INODE       TYPE PATH
 27 ffff997182ccab00 ffff9971d2ec36c0 ffff9971d2edaa00 DIR  /cvmfs/oasis.opensciencegrid.org/geant4/externals/boost/v1_57_0/source/boost_1_57_0/doc/html/boost_asio/reference/windows__basic_stream_handle/get_implementation

the WARNING is at:

377 static enum fuse_parse_result fuse_parse_cache(struct fuse_file *ff,
378                                                void *addr, unsigned int size,
379                                                struct dir_context *ctx)
380 {
381         unsigned int offset = ff->readdir.cache_off & ~PAGE_MASK;
382         enum fuse_parse_result res = FOUND_NONE;
383
384         WARN_ON(offset >= size);
385
386         for (;;) {
387                 struct fuse_dirent *dirent = addr + offset;
388                 unsigned int nbytes = size - offset;
389                 size_t reclen;
390
391                 if (nbytes < FUSE_NAME_OFFSET || !dirent->namelen)
392                         break;

/usr/src/debug/kernel-4.18.0-372.31.1.el8_6/linux-4.18.0-372.31.1.el8_6.x86_64/fs/fuse/readdir.c: 391
0xffffffffc0214fc3 <fuse_readdir+0x363>:        cmp    $0x17,%eax
0xffffffffc0214fc6 <fuse_readdir+0x366>:        jbe    0xffffffffc02152ab <fuse_readdir+0x64b>
    ^^^^^ (nbytes < FUSE_NAME_OFFSET)
0xffffffffc0214fcc <fuse_readdir+0x36c>:        mov    0x10(%r11),%ecx
0xffffffffc0214fd0 <fuse_readdir+0x370>:        movl   $0x0,0x1c(%rsp)
0xffffffffc0214fd8 <fuse_readdir+0x378>:        test   %ecx,%ecx
0xffffffffc0214fda <fuse_readdir+0x37a>:        je     0xffffffffc0215084 <fuse_readdir+0x424>
    ^^^^^ (!dirent->namelen)

dirent is in %r11, dirent->namelen is in %ecx:

RAX: 0000000000000090 RBX: ffff9971d2edaa00 RCX: 000000008949ffff
R10: ffffbcd3036abe90 R11: ffff99718d86b000 R12: ffffe54584361ac0

393
394                 reclen = FUSE_DIRENT_SIZE(dirent); /* derefs ->namelen */
395
396                 if (WARN_ON(dirent->namelen > FUSE_NAME_MAX))
397                         return FOUND_ERR;

/usr/src/debug/kernel-4.18.0-372.31.1.el8_6/linux-4.18.0-372.31.1.el8_6.x86_64/fs/fuse/readdir.c: 396
0xffffffffc0215001 <fuse_readdir+0x3a1>:        cmp    $0x400,%ecx
0xffffffffc0215007 <fuse_readdir+0x3a7>:        ja     0xffffffffc021520a <fuse_readdir+0x5aa>
    ^^^^^ (dirent->namelen > FUSE_NAME_MAX)

crash> struct fuse_dirent.namelen ffff99718d86b000 -d
  namelen = 2303328255,
...
0xffffffffc021520a <fuse_readdir+0x5aa>:        mov    %r15,%r14
0xffffffffc021520d <fuse_readdir+0x5ad>:        mov    0x10(%rsp),%rbx
0xffffffffc0215212 <fuse_readdir+0x5b2>:        mov    0x8(%rsp),%r12
0xffffffffc0215217 <fuse_readdir+0x5b7>:        mov    (%rsp),%r15
/usr/src/debug/kernel-4.18.0-372.31.1.el8_6/linux-4.18.0-372.31.1.el8_6.x86_64/fs/fuse/readdir.c: 396
0xffffffffc021521b <fuse_readdir+0x5bb>:        ud2

crash> struct fuse_dirent ffff99718d86b000
struct fuse_dirent {
  ino = 0x8b480000441f0fff,
  off = 0xf95ce8f7894c1053,
  namelen = 0x8949ffff,
  type = 0xc08548c7,
  name = 0xffff99718d86b018 "\017\204\220\376\377\377H\213\005\371\266*"

crash> rd ffff99718d86b000 10
ffff99718d86b000:  8b480000441f0fff f95ce8f7894c1053   ...D..H.S.L...\.
ffff99718d86b010:  c08548c78949ffff 8b48fffffe90840f   ..I..H........H.
ffff99718d86b020:  38394c002ab6f905 f64100000120840f   ...*.L98.. ...A.
ffff99718d86b030:  0000cf840f010147 8324558b24438b00   G.........C$.U$.
ffff99718d86b040:  000000da840ffff8 000209840ffffa83   ................

crash> inode.i_mapping ffff9971d2edaa00
  i_mapping = 0xffff9971d2edab78,
crash> address_space.i_pages 0xffff9971d2edab78 -ox
struct address_space {
  [ffff9971d2edab80] struct xarray i_pages;
crash> tree -t x -r address_space.i_pages 0xffff9971d2edab78 -s page.index
ffffe54584361ac0
  index = 0x0,
crash> kmem ffffe54584361ac0
      PAGE         PHYSICAL      MAPPING       INDEX CNT FLAGS
ffffe54584361ac0 10d86b000 ffff9971d2edab78        0  3 57ffffc00000a1 locked,lru,waiters
crash> ptov 10d86b000
VIRTUAL           PHYSICAL
ffff99718d86b000  10d86b000

crash> fuse_inode.rdc ffff9971d2edaa00
  rdc = {
    cached = 0x1,
    size = 0x90,
    pos = 0x90,
    version = 0x2,
    mtime = {
      tv_sec = 0x54521895,
      tv_nsec = 0x0
    },
    iversion = 0x0,

crash> struct file.private_data ffff997182ccab00
  private_data = 0xffff9970b30d99c0,
crash> fuse_file.open_flags 0xffff9970b30d99c0
  open_flags = 0x8,

include/uapi/linux/fuse.h:
#define FOPEN_DIRECT_IO         (1 << 0)
#define FOPEN_KEEP_CACHE        (1 << 1)
#define FOPEN_NONSEEKABLE       (1 << 2)
#define FOPEN_CACHE_DIR         (1 << 3)

the inode is also found in this process, which is currently opening that directory:

PID: 15075  TASK: ffff9972424b4d80  CPU: 2  COMMAND: "walk_tree"
 #0 [ffffbcd30167f9a0] __schedule at ffffffff97da1861
 #1 [ffffbcd30167fa30] schedule at ffffffff97da1df5
 #2 [ffffbcd30167fa40] io_schedule at ffffffff97da21f2
 #3 [ffffbcd30167fa50] __lock_page at ffffffff976832ed
 #4 [ffffbcd30167fad8] invalidate_inode_pages2_range at ffffffff97694e29
 #5 [ffffbcd30167fc60] fuse_finish_open at ffffffffc020ce81 [fuse]
 #6 [ffffbcd30167fc88] fuse_open_common at ffffffffc020d07a [fuse]
 #7 [ffffbcd30167fcd0] do_dentry_open at ffffffff9773b832
 #8 [ffffbcd30167fd00] path_openat at ffffffff977501ee
 #9 [ffffbcd30167fdd8] do_filp_open at ffffffff97752503
#10 [ffffbcd30167fee0] do_sys_open at ffffffff9773d0a4
#11 [ffffbcd30167ff38] do_syscall_64 at ffffffff9740430b
#12 [ffffbcd30167ff50] entry_SYSCALL_64_after_hwframe at ffffffff97e000ad

openat() for:

 FD       FILE            DENTRY           INODE       TYPE PATH
 25 ffff99718171d000 ffff9971d2e7d780 ffff9971d2edd080 DIR  /cvmfs/oasis.opensciencegrid.org/geant4/externals/boost/v1_57_0/source/boost_1_57_0/doc/html/boost_asio/reference/windows__basic_stream_handle

crash> filename.name ffff99718242d000
  name = 0xffff99718242d020 "get_implementation",

symbols in scope at 0xffffffffc020ce7c in 'fuse_finish_open':
void fuse_finish_open(struct inode *, struct file *)
    inode - len 8: * in register $rdi * in register $rbp
    file - len 8: in register $rbx
    ff - len 8: in register $r13
    fc - len 8: in register $r12

r13 got stored in invalidate_inode_pages2:
+RBP: 0xffff9971d2edaa00  << inode
+RBX: 0xffff9971824b5500  << file
+R12: 0xffff997241eabe00  << fuse_conn
+R13: 0xffff9970b30d93c0  << fuse_file

crash> struct file.private_data 0xffff9971824b5500
  private_data = 0xffff9970b30d93c0,
(just making sure)

the invalidate_inode_pages2_range() is called from fs/fuse/file.c:

198 void fuse_finish_open(struct inode *inode, struct file *file)
199 {
200         struct fuse_file *ff = file->private_data;
201         struct fuse_conn *fc = get_fuse_conn(inode);
202
203         if (!(ff->open_flags & FOPEN_KEEP_CACHE))
204                 invalidate_inode_pages2(inode->i_mapping);

crash> fuse_file.open_flags 0xffff9970b30d93c0
  open_flags = 0x8,

so FOPEN_CACHE_DIR

mm/truncate.c:
780 int invalidate_inode_pages2(struct address_space *mapping)
781 {
782         return invalidate_inode_pages2_range(mapping, 0, -1);
783 }

(so the cached pages are being invalidated because FOPEN_KEEP_CACHE isn't set... however, this is a directory, and libfuse says this flag has no effect with an opendir:

    /** Can be filled in by open. It signals the kernel that any
        currently cached file data (ie., data that the filesystem
        provided the last time the file was open) need not be
        invalidated. Has no effect when set in other contexts (in
        particular it does nothing when set by opendir()). */
    unsigned int keep_cache : 1;

so I believe this means that directories are being cached, but the cache is then immediately invalidated on the next opendir)

I suspect the page we're choking on while reading from the cache has just recently been attached to this address_space, but does not actually hold any cached fuse_dirents (hence the bogus data).
Created attachment 1916128 [details]
proposed fix (upstream)

Untested patch against upstream kernel.
There might also be an issue with FOPEN_KEEP_CACHE usage. It might be required, despite the misleading comment in libfuse.

FOPEN_CACHE_DIR: cache directory contents and use them if available
FOPEN_KEEP_CACHE: if not set, clear the cache on open

They are orthogonal; both need to be set for effective caching. FOPEN_CACHE_DIR without FOPEN_KEEP_CACHE means: reset the cache, but build up another one.
Created attachment 1916749 [details]
patch to control file and directory caching separately

(In reply to Miklos Szeredi from comment #9)
> There might also be an issue with FOPEN_KEEP_CACHE usage. It might be
> required, despite the misleading comment in libfuse.

I did spot this... How about something like this (untested) patch against upstream?
No, I think the existing kernel behavior is okay. The absence of FOPEN_KEEP_CACHE means: invalidate the current cache, but continue building a new one (if caching is enabled, which is the default for regular files). This is a perfectly valid concept for the directory cache as well. The issue seems to be with the libfuse API documentation, and possibly with the cvmfs code.
Comment on attachment 1916749 [details]
patch to control file and directory caching separately

okay, gotcha
I fail to reproduce this bug by following the steps in #0. Is there something I've missed? I will try it again and try from other data centers.

log
```
[root@kvm102 ~]# rpm -qa | grep -E "fuse|cvm"
fuse3-libs-3.3.0-16.el8.x86_64
cvmfs-config-default-2.0-2.noarch
fuse-common-3.3.0-16.el8.x86_64
fuse3-3.3.0-16.el8.x86_64
cvmfs-2.9.4-1.el8.x86_64
fuse3-devel-3.3.0-16.el8.x86_64
fuse-libs-2.9.7-16.el8.x86_64
fuse-2.9.7-16.el8.x86_64
cvmfs-fuse3-2.9.4-1.el8.x86_64
[root@kvm102 ~]# uname -a
Linux kvm102 4.18.0-424.el8.x86_64 #1 SMP Mon Sep 5 20:37:40 EDT 2022 x86_64 x86_64 x86_64 GNU/Linux
[root@kvm102 ~]# cat /etc/cvmfs/default.local
CVMFS_REPOSITORIES=cms.cern.ch,oasis.opensciencegrid.org
CVMFS_HTTP_PROXY=DIRECT
[root@kvm102 ~]# cat /etc/auto.master.d/cvmfs.autofs
/cvmfs /etc/auto.cvmfs
[root@kvm102 ~]# systemctl status autofs.service
● autofs.service - Automounts filesystems on demand
   Loaded: loaded (/usr/lib/systemd/system/autofs.service; disabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/autofs.service.d
           └─50-cvmfs.conf
   Active: active (running) since Tue 2022-10-11 08:31:01 EDT; 2h 18min ago
 Main PID: 28625 (automount)
    Tasks: 6 (limit: 14592)
   Memory: 1.4G
   CGroup: /system.slice/autofs.service
           └─28625 /usr/sbin/automount --systemd-service --dont-check-daemon

Oct 11 08:50:37 kvm102 cvmfs2[29138]: (oasis.opensciencegrid.org) released nested catalogs
Oct 11 09:05:48 kvm102 cvmfs2[29066]: high watermark of pinned files (1500M > 1500M)
Oct 11 09:05:48 kvm102 cvmfs2[29075]: (cvmfs-config.cern.ch) released nested catalogs
Oct 11 09:05:48 kvm102 cvmfs2[29138]: (oasis.opensciencegrid.org) released nested catalogs
Oct 11 09:20:18 kvm102 cvmfs2[29066]: clean up cache until at most 2048000 KB is used
Oct 11 09:24:58 kvm102 cvmfs2[29066]: high watermark of pinned files (1529M > 1500M)
Oct 11 09:24:58 kvm102 cvmfs2[29075]: (cvmfs-config.cern.ch) released nested catalogs
Oct 11 09:24:58 kvm102 cvmfs2[29138]: (oasis.opensciencegrid.org) released nested catalogs
Oct 11 09:43:48 kvm102 cvmfs2[29138]: (oasis.opensciencegrid.org) CernVM-FS: unmounted /cvmfs/oasis.opensciencegrid.org (oasis.opensciencegrid.org)
Oct 11 09:47:34 kvm102 cvmfs2[29075]: (cvmfs-config.cern.ch) CernVM-FS: unmounted /cvmfs/cvmfs-config.cern.ch (cvmfs-config.cern.ch)
[root@kvm102 ~]# ./walk_tree 8 /cvmfs/oasis.opensciencegrid.org
tid 28994, child 0: alive
tid 28997, child 3: alive
tid 28998, child 4: alive
tid 28999, child 5: alive
tid 28995, child 1: alive
tid 29000, child 6: alive
tid 28996, child 2: alive
tid 29001, child 7: alive
tid 28995, child 1: exiting with NO ERROR
tid 28997, child 3: exiting with NO ERROR
tid 29000, child 6: exiting with NO ERROR
tid 28994, child 0: exiting with NO ERROR
tid 29001, child 7: exiting with NO ERROR
tid 28998, child 4: exiting with NO ERROR
tid 28996, child 2: exiting with NO ERROR
tid 28999, child 5: exiting with NO ERROR
tid 0, child -1: child 0 exited
tid 0, child -1: child 1 exited
tid 0, child -1: child 2 exited
tid 0, child -1: child 3 exited
tid 0, child -1: child 4 exited
tid 0, child -1: child 5 exited
tid 0, child -1: child 6 exited
tid 0, child -1: child 7 exited

# no warning in kernel log
```
fuse-2.9.7-16.el8 / fuse3-3.3.0-16.el8 should be recent enough to contain the readdir caching code:

%changelog
* Mon May 30 2022 Pavel Reichl <preichl> - 2.9.7-16
- Back-port max_pages support,
- caching symlinks in kernel page cache,
- and in-kernel readdir caching
- Fixed rhbz#2080000

and cvmfs 2.9.4-1.el8 is the same version I've been testing with

kernel 4.18.0-424.el8.x86_64 should definitely support it

How long did the walk_tree run?

does cvmfs have contents? for example:

# ls -al /cvmfs/oasis.opensciencegrid.org
total 56
drwxr-xr-x. 22 cvmfs cvmfs 4096 Nov 16  2017 .
drwxr-xr-x.  4 cvmfs cvmfs 4096 Oct  3  2018 accre
drwxr-xr-x.  2 cvmfs cvmfs 1024 Nov 16  2017 atlas
drwxr-xr-x.  2 cvmfs cvmfs 1024 Nov 16  2017 auger
drwxr-xr-x.  2 cvmfs cvmfs 1024 Nov 16  2017 cmssoft
drwxr-xr-x.  5 cvmfs cvmfs 1024 Nov 16  2017 csiu
-rw-r--r--.  1 cvmfs cvmfs  511 Jun 14  2021 .cvmfsdirtab
drwxrwxr-x.  3 cvmfs cvmfs 1024 Nov 16  2017 enmr
drwxrwxr-x.  2 cvmfs cvmfs 4096 Jan 17  2016 fermilab
drwxrwxr-x.  5 cvmfs cvmfs 1024 Apr  4  2017 geant4
drwxrwxr-x.  3 cvmfs cvmfs 1024 Nov 16  2017 glow
drwxr-xr-x. 19 cvmfs cvmfs 4096 May  3 19:51 gluex
drwxrwxr-x.  6 cvmfs cvmfs 1024 Nov 16  2017 ilc
drwxr-xr-x.  7 cvmfs cvmfs 4096 Dec 21  2020 jlab
drwxrwxr-x.  6 cvmfs cvmfs 4096 Mar 30  2020 ligo
drwxr-xr-x. 10 cvmfs cvmfs 4096 Oct  7 10:24 mis
drwxr-xr-x.  4 cvmfs cvmfs 1024 Nov 17  2017 nanohub
drwxrwxr-x.  2 cvmfs cvmfs 1024 Nov 17  2017 nova
drwxrwxr-x.  8 cvmfs cvmfs 4096 Jun 23  2020 osg
drwxr-xr-x.  2 cvmfs cvmfs 1024 Nov 16  2017 osg-software
drwxr-xr-x. 17 cvmfs cvmfs 1024 Nov 17  2017 sbgrid
(In reply to Frank Sorenson from comment #16)
> fuse-2.9.7-16.el8 / fuse3-3.3.0-16.el8 should be recent enough to contain
> the readdir caching code:
>
> %changelog
> * Mon May 30 2022 Pavel Reichl <preichl> - 2.9.7-16
> - Back-port max_pages support,
> - caching symlinks in kernel page cache,
> - and in-kernel readdir caching
> - Fixed rhbz#2080000
>
> and cvmfs 2.9.4-1.el8 is the same version I've been testing with
>
> kernel 4.18.0-424.el8.x86_64 should definitely support it
>
> How long did the walk_tree run?

It took 70 minutes for a single run.

> does cvmfs have contents? for example:
>
> # ls -al /cvmfs/oasis.opensciencegrid.org
> [...]

Yes. During the `walk_tree` run, my `ls` output:

```
[root@kvm102 ~]# ls -al /cvmfs/oasis.opensciencegrid.org
total 56
drwxr-xr-x. 22 cvmfs cvmfs 4096 Nov 16  2017 .
-rw-r--r--.  1 cvmfs cvmfs  511 Jun 14  2021 .cvmfsdirtab
drwxr-xr-x.  4 cvmfs cvmfs 4096 Oct  3  2018 accre
drwxr-xr-x.  2 cvmfs cvmfs 1024 Nov 16  2017 atlas
drwxr-xr-x.  2 cvmfs cvmfs 1024 Nov 16  2017 auger
drwxr-xr-x.  2 cvmfs cvmfs 1024 Nov 16  2017 cmssoft
drwxr-xr-x.  5 cvmfs cvmfs 1024 Nov 16  2017 csiu
drwxrwxr-x.  3 cvmfs cvmfs 1024 Nov 16  2017 enmr
drwxrwxr-x.  2 cvmfs cvmfs 4096 Jan 17  2016 fermilab
drwxrwxr-x.  5 cvmfs cvmfs 1024 Apr  4  2017 geant4
drwxrwxr-x.  3 cvmfs cvmfs 1024 Nov 16  2017 glow
drwxr-xr-x. 19 cvmfs cvmfs 4096 May  3 20:51 gluex
drwxrwxr-x.  6 cvmfs cvmfs 1024 Nov 16  2017 ilc
drwxr-xr-x.  7 cvmfs cvmfs 4096 Dec 21  2020 jlab
drwxrwxr-x.  6 cvmfs cvmfs 4096 Mar 30  2020 ligo
drwxr-xr-x. 10 cvmfs cvmfs 4096 Oct  7 11:24 mis
drwxr-xr-x.  4 cvmfs cvmfs 1024 Nov 17  2017 nanohub
drwxrwxr-x.  2 cvmfs cvmfs 1024 Nov 17  2017 nova
drwxrwxr-x.  8 cvmfs cvmfs 4096 Jun 23  2020 osg
drwxr-xr-x.  2 cvmfs cvmfs 1024 Nov 16  2017 osg-software
drwxr-xr-x. 17 cvmfs cvmfs 1024 Nov 17  2017 sbgrid
drwxrwxr-x.  3 cvmfs cvmfs 1024 Nov 17  2017 snoplussnolabca
```

I have tried:
1) Randomizing the number of child threads from 1-16 (my system is a 4 vCPU KVM guest)
2) Running the test from the RDU2 data center (my original test was done from a PEK2 system)

Both of these attempts failed to reproduce the bug.
Created attachment 1918546 [details]
proposed fix (v2)

Attaching updated patch.
Created attachment 1919090 [details]
proposed patch (v3)

Updated patch.
Since we have qa_ack+ and devel_ack+ and the BZ status is ASSIGNED, I'm setting the ITR to 8.8. Developer, please set the DTM when it's ready.
TEST PASS. Unable to reproduce the bug with the reproducer on the fixed kernel. Verified by running regression tests; no regression was found.

Reproduced with kernel-4.18.0-439.el8.
Link to Beaker jobs: https://url.corp.redhat.com/bz2131391-reproduce

Verified with kernel-4.18.0-437.el8.mr3741_221114_1537.g9461.
Link to Beaker jobs: https://url.corp.redhat.com/bz2131391-verify
TEST PASS. Unable to reproduce the bug with the reproducer on the fixed kernel. Verified by running regression tests; no regression was found.

Reproduced with kernel-4.18.0-439.el8.
Link to Beaker jobs: https://url.corp.redhat.com/bz2131391-reproduce

Verified with kernel-4.18.0-441.el8.
Link to Beaker jobs: https://url.corp.redhat.com/bz2131391-final-verify
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: kernel security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2023:2951