Bug 520556
| Summary: | d_delete() and d_invalidate() can simultaneously change the same dentry->d_flags possibly causing a panic | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Harshula Jayasuriya <harshula> |
| Component: | kernel | Assignee: | Red Hat Kernel Manager <kernel-mgr> |
| Status: | CLOSED DUPLICATE | QA Contact: | Red Hat Kernel QE team <kernel-qe> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 5.2 | CC: | eparis, esandeen, jlayton, rwheeler, tao |
| Target Milestone: | rc | | |
| Target Release: | 5.5 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2009-09-18 03:06:02 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |

Attachments:
Created attachment 359347
Backported patch
I'm dup'ing this against bz 499019. Even though the problem is different, the same patch fixes both.

*** This bug has been marked as a duplicate of bug 499019 ***
The following information was provided by Fujitsu:

Description of Problem:
If files are deleted while file statistics are being gathered concurrently, the system may malfunction (e.g. panic). For instance, d_delete() and d_invalidate() conflict: the two functions can modify the same dentry->d_flags at the same time.

On the one hand, while do_lookup() is looking up a dentry, do_lookup() calls do_revalidate(), do_revalidate() calls d_invalidate(), and d_invalidate() calls __d_drop().

On the other hand, while unlink() is deleting the same dentry, unlink() calls do_unlinkat(), do_unlinkat() calls vfs_unlink(), and vfs_unlink() calls d_delete(). If dentry->d_count.counter is 1, d_delete() detaches the inode via dentry_iput() and finally modifies dentry->d_flags without holding any spinlock:

    dentry->d_flags &= ~DCACHE_INOTIFY_PARENT_WATCHED;    ...(1)

NOTE: dentry_iput() releases both dentry->d_lock and dcache_lock.

At the same moment, __d_drop(), called by d_invalidate(), modifies dentry->d_flags with the spinlocks held:

    dentry->d_flags |= DCACHE_UNHASHED;    ...(2)

Because (1) and (2) can run concurrently, d_delete() can corrupt dentry->d_flags.

    do_lookup                                     | sys_unlink
     -> do_revalidate                             |  -> do_unlinkat
      -> d_invalidate                             |   -> vfs_unlink
                                                  |    -> d_delete
                                                  |       spin_lock(&dcache_lock)
                                                  |       spin_lock(&dentry->d_lock)
         spin_lock(&dcache_lock)                  |
         .                                        |       if (atomic_read(&dentry->d_count) == 1) {
         .                                        |        -> dentry_iput
         .                                        |           spin_unlock(&dentry->d_lock);
         .                                        |           spin_unlock(&dcache_lock);
         spin_lock(&dentry->d_lock)               |
         -> __d_drop                              |
          -> if (!(dentry->d_flags &              |
                   DCACHE_UNHASHED)) {            |
             dentry->d_flags |= DCACHE_UNHASHED   |       -> dentry->d_flags &=
                                                  |            ~DCACHE_INOTIFY_PARENT_WATCHED

This bug is fixed in linux-2.6.25.
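
The lost update between (1) and (2) can be replayed deterministically in user space. The sketch below is not kernel code: it steps through the interleaving by hand on a plain integer to show why an unlocked read-modify-write on d_flags can discard the DCACHE_UNHASHED bit. The names FLAG_UNHASHED, FLAG_PARENT_WATCHED, racy_interleaving() and serialized_updates() are hypothetical, and the bit values are illustrative stand-ins rather than values quoted from the kernel headers.

```c
#include <assert.h>
#include <stdio.h>

/* Illustrative bit values only (assumption); the real flags are
 * DCACHE_UNHASHED and DCACHE_INOTIFY_PARENT_WATCHED in the kernel. */
#define FLAG_UNHASHED       0x10u
#define FLAG_PARENT_WATCHED 0x20u

/*
 * Replays the interleaving from the report, one step at a time.
 * "x &= ~bit" is a load, a modify and a store, so an update made by
 * another CPU between the load and the store is overwritten.
 */
unsigned int racy_interleaving(unsigned int d_flags)
{
    /* CPU B (d_delete, no lock held): load d_flags */
    unsigned int cpu_b_copy = d_flags;

    /* CPU A (__d_drop, under d_lock): set the unhashed bit */
    d_flags |= FLAG_UNHASHED;

    /* CPU B: clear its bit in the stale copy and store it back */
    cpu_b_copy &= ~FLAG_PARENT_WATCHED;
    d_flags = cpu_b_copy;              /* CPU A's update is lost here */

    return d_flags;
}

/* The same two updates when they cannot overlap, e.g. both under one lock. */
unsigned int serialized_updates(unsigned int d_flags)
{
    d_flags |= FLAG_UNHASHED;          /* CPU A finishes first */
    d_flags &= ~FLAG_PARENT_WATCHED;   /* then CPU B */
    return d_flags;
}

/* Prints both outcomes and checks that only the racy one loses the bit. */
void show_lost_update(void)
{
    unsigned int racy = racy_interleaving(FLAG_PARENT_WATCHED);
    unsigned int safe = serialized_updates(FLAG_PARENT_WATCHED);

    printf("racy:       0x%02x\n", racy);
    printf("serialized: 0x%02x\n", safe);
    assert((racy & FLAG_UNHASHED) == 0);
    assert((safe & FLAG_UNHASHED) != 0);
}
```

Calling show_lost_update() from a small main() prints the two results. In the racy case the dentry has been unhashed by __d_drop() but its flags no longer say so, which is the inconsistency the report describes.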
Report: http://lkml.org/lkml/2007/9/7/93
Patch: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=0d71bd5993b630a989d15adc2562a9ffe41cd26d

Version-Release number of selected component:
Red Hat Enterprise Linux Version Number: 5
Release Number: 2
Architecture: x86_64
Kernel Version: 2.6.18-92.el5

How reproducible: Sometimes.

Steps to Reproduce:
1) Create many files in a directory (e.g. over 10000 files)
   (# cd <dir>; for ((i=0;i<10000;i++));do touch $i; done; cd)
2) Start many processes that delete the files in that directory and run them concurrently
   (# for ((i=0;i<10;i++));do find <dir> -type f -exec rm {} \; & done)
NOTE: The system needs multiple CPUs. The more CPUs, the better.

Actual Results: Panic.

Expected Results: No panic.

Additional Info:
* The customer has provided a kernel dump that suggests that the aforementioned race condition could have been the culprit.
* I provided the customer with a patched kernel to see if it stops the panic; still awaiting results. I will update this bz as soon as I receive feedback.
* Upstream bug report: http://bugzilla.kernel.org/show_bug.cgi?id=8938
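
For anyone who prefers a single program over the shell one-liners in Steps to Reproduce, the following is a rough user-space approximation: it creates the files, then forks several workers that each stat and unlink every name, so that lookups and unlinks on the same dentries overlap. It is a sketch under assumptions, not the reporter's test case. The function name run_unlink_race() and its parameters are hypothetical, and nothing here guarantees that the race is actually hit on a given machine.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Step 1: create empty files named 0..nfiles-1 in dir. Returns 0 on success. */
static int create_files(const char *dir, int nfiles)
{
    char path[4096];
    for (int i = 0; i < nfiles; i++) {
        snprintf(path, sizeof(path), "%s/%d", dir, i);
        int fd = open(path, O_CREAT | O_WRONLY, 0644);
        if (fd < 0)
            return -1;
        close(fd);
    }
    return 0;
}

/* One worker: look up and unlink every name. Another worker may have
 * removed the file already, so failures are expected and ignored. */
static void stat_and_unlink_all(const char *dir, int nfiles)
{
    char path[4096];
    struct stat st;
    for (int i = 0; i < nfiles; i++) {
        snprintf(path, sizeof(path), "%s/%d", dir, i);
        if (lstat(path, &st) == 0)
            unlink(path);
    }
}

/* Counts directory entries other than "." and "..", or -1 on error. */
static int count_entries(const char *dir)
{
    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;
    int n = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (strcmp(e->d_name, ".") != 0 && strcmp(e->d_name, "..") != 0)
            n++;
    }
    closedir(d);
    return n;
}

/* Step 2: run nworkers concurrent deleters over dir.
 * Returns the number of files left afterwards, or -1 on error. */
int run_unlink_race(const char *dir, int nfiles, int nworkers)
{
    assert(dir != NULL && nfiles > 0 && nworkers > 0);
    if (create_files(dir, nfiles) != 0)
        return -1;

    int fork_failed = 0;
    for (int w = 0; w < nworkers; w++) {
        pid_t pid = fork();
        if (pid < 0) {
            fork_failed = 1;
            break;
        }
        if (pid == 0) {
            stat_and_unlink_all(dir, nfiles);
            _exit(0);
        }
    }

    /* Reap every worker; wait() fails with ECHILD once none are left. */
    for (;;) {
        if (wait(NULL) > 0)
            continue;
        if (errno == EINTR)
            continue;
        break;
    }
    return fork_failed ? -1 : count_entries(dir);
}
```

A caller would pass an empty scratch directory, for example run_unlink_race(dir, 10000, 10) to mirror the counts in the steps above. As the NOTE says, a multi-CPU system is needed for the workers to overlap at all.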