+++ This bug was initially created as a clone of Bug #1371587 +++

Description of problem:

When multiple processes are listing the contents of a directory simultaneously, calls to getdents() can take a very long time. This occurs even if the contents of the directory are unchanging. As the number of directory entries increases, and as the number of processes listing the directory concurrently increases, the time required to list a directory can become huge.

Version-Release number of selected component (if applicable):

RHEL 7 kernels
    kernel 3.10.0-327.10.1.el7.x86_64
    kernel 3.10.0-229.el7.x86_64
RHEL 6 kernels
    2.6.32-573.8.1.el6.x86_64
    2.6.32-642.el6.x86_64
upstream
    4.8.0-rc2+

How reproducible:

Fairly reliably

Steps to Reproduce:

Start multiple 'ls -fl' commands on an nfs-mounted directory simultaneously; a simple reproducer is attached.

Actual results:

For a directory with 200,000 files, each having a 23-byte filename:

    1 process:                    13.250018110
    20 simultaneously:            53.123712762
    20 processes, 1 second delay: 58.065284049
    20 processes, 2 second delay: 98.601665008
    20 processes, 3 second delay: 96.790224244
    20 processes, 4 second delay: 97.487171089

(time from start of first thread to finish of last thread)

Expected results:

Some performance loss due to scalability problems is expected, but the very long listing times seen here are much worse than expected.

Additional info:

Using 30 processes simultaneously listing a directory with 1.8 million 23-byte files (arbitrary, but seen in a customer case), I have reproduced a 39-hour hang (after which all 30 processes completed within a second of each other).

I have additional results, debugging, and analysis, which I will add/attach.

The parallel_ls reproducer script can take up to 3 arguments, with default values if not provided:

    # parallel_ls <threads> <path> <delay>

    threads: # of processes to start simultaneously (default: 2)
    path:    directory to list (default: /mnt/tmp)
    delay:   time to sleep between starting each process (default: no delay)
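The attached parallel_ls script is not reproduced here. Purely as an illustration of the approach (and not the actual attachment), a minimal C equivalent might look like the following: fork <threads> children that each enumerate <path> once, with an optional delay between starts. Note that 'ls -fl' also stats every entry, which this sketch does not do.

    /*
     * Illustrative only -- NOT the attached parallel_ls script.
     * Forks N children, each of which enumerates the target directory
     * once; an optional delay is inserted between process starts.
     * Defaults mirror the script's documented defaults.
     */
    #include <dirent.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static void list_dir(const char *path)
    {
        DIR *d = opendir(path);
        struct dirent *de;
        unsigned long count = 0;

        if (!d) {
            perror("opendir");
            _exit(1);
        }
        while ((de = readdir(d)) != NULL)   /* drives getdents() */
            count++;
        closedir(d);
        printf("pid %d: %lu entries\n", (int)getpid(), count);
    }

    int main(int argc, char *argv[])
    {
        int threads = argc > 1 ? atoi(argv[1]) : 2;
        const char *path = argc > 2 ? argv[2] : "/mnt/tmp";
        int delay = argc > 3 ? atoi(argv[3]) : 0;
        int i;

        for (i = 0; i < threads; i++) {
            if (fork() == 0) {
                list_dir(path);
                _exit(0);
            }
            if (delay)
                sleep(delay);
        }
        while (wait(NULL) > 0)
            ;
        return 0;
    }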
--- Additional comment from Frank Sorenson on 2016-08-30 11:17:31 EDT ---

After doing some debugging, I believe that much of the issue may be related to the following (cut down):

    int nfs_readdir_filler(nfs_readdir_descriptor_t *desc, struct page* page)
    {
        ...
        ret = nfs_readdir_xdr_to_array(desc, page, inode);
        ...
        if (invalidate_inode_pages2_range(inode->i_mapping, page->index + 1, -1) < 0) {
            /* Should never happen */
            nfs_zap_mapping(inode, inode->i_mapping);
        }
        ...

Here, every time we finish with an nfs_readdir, we invalidate all the remaining pages of the directory inode's mapping. This includes any pages just filled by another process further along in the directory listing. As the number of processes calling nfs_readdir increases, the processes each invalidate pages, so each is required to make its own READDIR/READDIRPLUS calls.

(Note: I have not found evidence of the invalidate failing and nfs_zap_mapping being called, but it's possible that may occur as well.)

Also, calling nfs_zap_mapping every time we decide to force readdirplus seems suspicious:

    void nfs_force_use_readdirplus(struct inode *dir)
    {
        if (!list_empty(&NFS_I(dir)->open_files)) {
            nfs_advise_use_readdirplus(dir);
            nfs_zap_mapping(dir, dir->i_mapping);
        }
    }

Wouldn't something like this make more sense?

    void nfs_force_use_readdirplus(struct inode *dir)
    {
        if (!list_empty(&NFS_I(dir)->open_files)) {
            if (!test_and_set_bit(NFS_INO_ADVISE_RDPLUS, &NFS_I(dir)->flags))
                nfs_zap_mapping(dir, dir->i_mapping);
        }
    }

It also seems non-intuitive that 'nfs_use_readdirplus', while returning the status of the NFS_INO_ADVISE_RDPLUS flag, also clears that flag:

    bool nfs_use_readdirplus(struct inode *dir, struct dir_context *ctx)
    {
        if (!nfs_server_capable(dir, NFS_CAP_READDIRPLUS))
            return false;
        if (test_and_clear_bit(NFS_INO_ADVISE_RDPLUS, &NFS_I(dir)->flags))
            return true;
        if (ctx->pos == 0)
            return true;
        return false;
    }

What if this were to only test the bit:

    bool nfs_use_readdirplus(struct inode *dir, struct dir_context *ctx)
    {
        if (!nfs_server_capable(dir, NFS_CAP_READDIRPLUS))
            return false;
        if (test_bit(NFS_INO_ADVISE_RDPLUS, &NFS_I(dir)->flags))
            return true;
        if (ctx->pos == 0)
            return true;
        return false;
    }

and have nfs_readdir clear the bit at the bottom IFF nfs_force_use_readdirplus() hasn't been called to set it again since the last time nfs_readdir was called (at least by this process)? (This obviously requires a bit more logic to get right.)
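One possible shape of that extra logic, as an untested sketch only (it assumes a hypothetical rdplus_gen counter in struct nfs_inode that does not exist in the current code, bumped each time nfs_force_use_readdirplus() asserts the advice, so nfs_readdir() can tell whether the advice was re-asserted while it was listing):

    /*
     * Untested sketch only -- not a proposed patch.  'rdplus_gen' is a
     * hypothetical counter that would have to be added to struct nfs_inode.
     */
    void nfs_force_use_readdirplus(struct inode *dir)
    {
        if (!list_empty(&NFS_I(dir)->open_files)) {
            if (!test_and_set_bit(NFS_INO_ADVISE_RDPLUS, &NFS_I(dir)->flags))
                nfs_zap_mapping(dir, dir->i_mapping);
            NFS_I(dir)->rdplus_gen++;    /* hypothetical field */
        }
    }

    static int nfs_readdir(struct file *file, struct dir_context *ctx)
    {
        struct inode *inode = file_inode(file);
        unsigned long gen = NFS_I(inode)->rdplus_gen;    /* sample at entry */
        int res = 0;

        /* ... existing nfs_readdir() body, with nfs_use_readdirplus()
         * changed to use test_bit() as suggested above ... */

        /* Clear the advice only if nobody re-asserted it while we were
         * listing.  (Still racy between the compare and the clear; getting
         * this right would need the inode lock or a cmpxchg-style update.) */
        if (NFS_I(inode)->rdplus_gen == gen)
            clear_bit(NFS_INO_ADVISE_RDPLUS, &NFS_I(inode)->flags);
        return res;
    }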
--- Additional comment from Benjamin Coddington on 2016-08-30 11:23:18 EDT ---

Hi Frank, we've been working on a very similar problem in bug 1325766. I think things should work better with this fix: http://marc.info/?l=linux-nfs&m=147222590709214&w=2

Please note the following items were found in the vmcores for this issue:

- the directory was not being modified while the listing was being done, so this is not https://bugzilla.redhat.com/show_bug.cgi?id=1325766
- the directory contained over 1 million entries
- multiple processes were listing the directory at the same time
- the process listing the directory and holding the directory mutex was waiting on a readdirplus response
- there were 43 waiters on the directory mutex
- the oldest waiter had been waiting almost 48 minutes, and the youngest waiter 40 minutes
This bug is actually a duplicate of bug 1325766. The 3 patches for this bug are already in RHEL 6 kernels starting with kernel-2.6.32-682.el6, so it is fixed in RHEL 6.9 (2.6.32-696.el6) or above.

*** This bug has been marked as a duplicate of bug 1325766 ***