The following information was provided by Fujitsu:

Description of Problem:

If we delete files and get file statistics concurrently, the system may work incorrectly (e.g. a panic happens).

For instance, there is a conflict between d_delete() and d_invalidate(). (These two functions can operate on the same dentry->d_flags at the same time.)

On the one hand, while do_lookup() is being executed to find a certain dentry:
do_lookup() calls do_revalidate(), and do_revalidate() calls d_invalidate(). d_invalidate() then calls __d_drop().

On the other hand, while unlink() is being executed to delete that dentry:
unlink() calls do_unlinkat(), and do_unlinkat() calls vfs_unlink(). vfs_unlink() then calls d_delete().

If dentry->d_count.counter is 1, d_delete() cuts off the connected inode (dentry_iput()) and finally changes dentry->d_flags without holding any spinlock.

    dentry->d_flags &= ~DCACHE_INOTIFY_PARENT_WATCHED;    ...(1)

NOTE: the spinlocks dentry->d_lock and dcache_lock are released in dentry_iput().

At the same moment, __d_drop(), which is called by d_invalidate(), changes dentry->d_flags while holding the spinlocks.

    dentry->d_flags |= DCACHE_UNHASHED;    ...(2)

Therefore dentry->d_flags is corrupted by d_delete(), because (1) and (2) can run concurrently.

    do_lookup                                   | sys_unlink
    -> do_revalidate                            | -> do_unlinkat
    -> d_invalidate                             | -> vfs_unlink
                                                | -> d_delete
                                                |    spin_lock(&dcache_lock)
                                                |    spin_lock(&dentry->d_lock)
       spin_lock(&dcache_lock)                  |        .
                                                |    if (atomic_read(&dentry->d_count) == 1) {
            .                                   |    -> dentry_iput
            .                                   |         spin_unlock(&dentry->d_lock);
            .                                   |         spin_unlock(&dcache_lock);
       spin_lock(&dentry->d_lock)               |
       -> __d_drop                              |
          -> if (!(dentry->d_flags &            |
                   DCACHE_UNHASHED)) {          |
             dentry->d_flags |= DCACHE_UNHASHED | -> dentry->d_flags &=
                                                |       ~DCACHE_INOTIFY_PARENT_WATCHED

This bug is fixed in linux-2.6.25.
Report: http://lkml.org/lkml/2007/9/7/93
Patch: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=0d71bd5993b630a989d15adc2562a9ffe41cd26d

Version-Release number of selected component:
Red Hat Enterprise Linux Version Number: 5
Release Number: 2
Architecture: x86_64
Kernel Version: 2.6.18-92.el5

How reproducible:
Sometimes.

Steps to Reproduce:
1) Create many files in a directory (e.g. over 10000 files)
   # cd <dir>; for ((i=0;i<10000;i++));do touch $i; done; cd
2) Create a lot of processes which delete the files in that directory and run them concurrently
   # for ((i=0;i<10;i++));do find <dir> -type f -exec rm {} \; & done

NOTE: The system needs multiple CPUs. Many CPUs are welcome.

Actual Results:
Panic.

Expected Results:
No panic.

Additional Info:
* The customer has provided a kernel dump that suggests that the aforementioned race condition could have been the culprit.
* I provided the customer with a patched kernel to see if it stops the panic; still awaiting results. I will update this bz as soon as I receive feedback.
* Upstream bug report: http://bugzilla.kernel.org/show_bug.cgi?id=8938
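
For readers who want to see why (1) and (2) in the description above conflict, here is a minimal user-space sketch. It is not kernel code and not the upstream patch. It replays the reported interleaving in a single thread by splitting each read-modify-write on d_flags into its load and store steps. The function names replay_lost_update() and replay_serialized() are hypothetical, and the flag values are illustrative and may not match the real kernel headers.

```c
/* Illustrative flag values only; the real kernel constants may differ. */
#define DCACHE_UNHASHED               0x0010u
#define DCACHE_INOTIFY_PARENT_WATCHED 0x0020u

/*
 * Hypothetical helper: replays the interleaving from the report.
 * d_delete() loads d_flags, __d_drop() then completes its update,
 * and d_delete() finally stores a value computed from its stale copy.
 */
unsigned int replay_lost_update(unsigned int d_flags)
{
    /* d_delete(): load half of "d_flags &= ~DCACHE_INOTIFY_PARENT_WATCHED" */
    unsigned int tmp_delete = d_flags;

    /* __d_drop(): complete "d_flags |= DCACHE_UNHASHED" (done under d_lock) */
    unsigned int tmp_drop = d_flags;
    tmp_drop |= DCACHE_UNHASHED;
    d_flags = tmp_drop;

    /* d_delete(): modify the stale copy and store it back (no lock held) */
    tmp_delete &= ~DCACHE_INOTIFY_PARENT_WATCHED;
    d_flags = tmp_delete;

    return d_flags; /* the DCACHE_UNHASHED bit set by __d_drop() is gone */
}

/*
 * Hypothetical helper: the same two updates when they cannot overlap,
 * as would be the case if both ran under the same lock.
 */
unsigned int replay_serialized(unsigned int d_flags)
{
    d_flags |= DCACHE_UNHASHED;                /* __d_drop() */
    d_flags &= ~DCACHE_INOTIFY_PARENT_WATCHED; /* d_delete() */
    return d_flags;
}
```

Starting from a dentry that has only DCACHE_INOTIFY_PARENT_WATCHED set, the interleaved replay ends with no flags set, while the serialized replay keeps DCACHE_UNHASHED. On a real SMP system the window is only a few instructions wide, which would be consistent with the problem reproducing only sometimes.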
Created attachment 359347: Backported patch
See BZ 499019, same patch, different reason.
I'm dup'ing this against bz 499019. Even though the problem is different, the same patch fixes both. *** This bug has been marked as a duplicate of bug 499019 ***