Kernel: vmlinux-2.4.21-52.ELsmp

From the vmcore of a hung system and from sysrq-T output on systems showing hung umount processes, the hang appears to happen in the following part of void __prune_dcache(int count, struct super_block *sb):

    while (skip > 0 && tmp != &dentry_unused &&
           list_entry(tmp, struct dentry, d_lru)->d_sb != sb) {
            skip--;
            tmp = tmp->prev;
    }

Judging from the vmcore of the hung system, the problem appears to have happened because of a large number of unused dentries. The system has 39 different mounts, with 3 mounts having more than 100,000 unused dentries. The code goes through the list of unused dentries searching for dentries which belong to the superblock being unmounted; since the list is so large, it is stuck in the loop for a long time while holding the dcache_lock spinlock.

This bit of code was added in the update

* Wed Apr 18 2007 Ernie Petrides <petrides> kernel-2.4.21-49.EL
- fix dput() crash regression caused in -47.5.EL (Eric Sandeen)

which was meant to fix a regression caused by the update

* Wed Feb 7 2007 Ernie Petrides <petrides> kernel-2.4.21-47.5.EL
- avoid more races between unmount and dcache pruning (Eric Sandeen)
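To make the cost of that walk concrete, here is a small self-contained user-space mock-up of the skip loop. This is illustration only: the list_head/dentry definitions, the add_unused_tail() helper, and the element counts are hypothetical stand-ins (not taken from the kernel or from the vmcore); only the while loop itself is the one quoted above. In the real kernel every pointer step of this walk happens with dcache_lock held.

/* skip_walk_demo.c -- user-space mock-up for illustration only; the
 * list_head/dentry definitions are minimal stand-ins, not the kernel's. */
#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };
struct dentry    { struct list_head d_lru; void *d_sb; };

#define list_entry(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))

static struct list_head dentry_unused = { &dentry_unused, &dentry_unused };

static void add_unused_tail(void *sb)
{
        struct dentry *d = malloc(sizeof(*d));

        d->d_sb = sb;
        d->d_lru.prev = dentry_unused.prev;
        d->d_lru.next = &dentry_unused;
        dentry_unused.prev->next = &d->d_lru;
        dentry_unused.prev = &d->d_lru;
}

int main(void)
{
        void *my_sb = (void *)1;        /* the sb being unmounted        */
        void *other_sb = (void *)2;     /* unused dentries of other sbs  */
        struct list_head *tmp;
        long skip = 1000000, steps = 0;
        int i;

        /* Illustrative scale only: the unmounting sb's unused dentries
         * are buried behind ~100,000 dentries belonging to other sbs at
         * the tail of the list. */
        for (i = 0; i < 100000; i++)
                add_unused_tail(my_sb);
        for (i = 0; i < 100000; i++)
                add_unused_tail(other_sb);

        /* The skip loop quoted above: walk backwards from the tail until
         * a dentry belonging to my_sb is found. */
        tmp = dentry_unused.prev;
        while (skip > 0 && tmp != &dentry_unused &&
               list_entry(tmp, struct dentry, d_lru)->d_sb != my_sb) {
                skip--;
                tmp = tmp->prev;
                steps++;
        }

        printf("stepped over %ld foreign dentries under the (simulated) lock\n",
               steps);
        return 0;
}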
This bugzilla has Keywords: Regression. Since no regressions are allowed between releases, it is also being proposed as a blocker for this release. Please resolve ASAP.
So, as I understand it, this is not a permanent hang, but rather a long delay?
Yes. The system is unresponsive while this happens; in one case it was unresponsive for about 2 hours. In the vmcore we have, the 2-cpu machine has 2 umount processes running, one on each cpu. One of them is holding the dcache_lock and spinning in the while loop, and the other umount process is waiting for the spinlock in the same __prune_dcache function.
Is this only seen when 2 filesystems are simultaneously unmounting?

In theory at unmount time the dentries we wish to free should be at the end of the unused list; previously we had assumed that they *were* the *only* dentries at the end of the unused list, but no locking assured this, so we could wind up missing some dentries, and oopsing when they were not properly freed. IIRC simultaneous unmounts could make this more likely.

The loop mentioned above simply makes sure we skip over dentries at the end of the list which are *not* for the unmounting fs in question; in general I would not expect that there would be too many to skip. Perhaps, though, 2 dueling unmounts could get into a ping-pong match where neither can complete in one pass?

Does serializing the unmounts rather than doing them in parallel avoid the issue? You're quite certain that this did not happen prior to the mentioned updates? Is a vmcore taken during the long delay available?

I'll refamiliarize myself with this codepath in the RHEL3 kernel.

Thanks,
-Eric
Created attachment 284251 [details]
messages file showing multiple SysRq T

Check the umount process with pid 3398.
In the IT it was mentioned that there was a large number of unused dentries on the system. Was this also true for the non-problematic older kernel?

Sachin has dumped out the remaining unused dentry list from the core. If we look for the 2 unmounting filesystems in the unused list:

-bash-3.00$ grep "102749bc000\|103db655800" dentry.sb | uniq -c
      7 d_sb = 0x102749bc000,
 113542 d_sb = 0x103db655800,
  28505 d_sb = 0x102749bc000,

we see that the 2 unmounting filesystems are indeed interlaced on the unused list. The process which is currently in the "find my dentries" loop is looking for sb 0x103db655800, which is in the "middle" of the list (excluding the various other sbs), and there are 28k entries for the "other" (blocked) umount at the very tail which it must work through while it looks for "its" dentries.

Right now what I think is happening is that __prune_dcache(count, sb) will look for "count" dentries on the unused list which match the given sb. In the process of searching, it will skip over up to the remaining "count" dentries which belong to other superblocks. However, when it finds a dentry to clear and calls prune_one_dentry, it drops & retakes the dcache_lock. I think this gives a window to the other umount process to add more of its dentries to the end of the list via select_parent. The for (;;) loop in _prune_dcache then starts over with the last entry on the list, and may have another big handful of entries to skip over. I think it is this ping-ponging contention that is causing the inefficiency. However, in the various sysrq-t outputs, I would then expect to see the 2 umount threads alternating, rather than one staying stuck.

What I don't quite understand is that without this skip-loop patch, prune_dcache will happily free dentries for *other* sbs until it reaches its "count" goal and proceed with umount, leaving busy dentries and busy inodes to be found when the umount completes. I'm not sure how this wasn't hit on the older kernels.
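A rough sketch of the control flow being described, for readers following along. This is a schematic paraphrase built from this comment and the snippet quoted in the description, not the literal RHEL3 source (the function name prune_loop_sketch is made up; details such as unlinking the dentry and updating dentry_stat are omitted). The point of interest is the window where dcache_lock is dropped:

/*
 * Schematic paraphrase of the pruning loop as described above -- not
 * the literal RHEL3 code, just the shape of the contention.
 */
static void prune_loop_sketch(int count, struct super_block *sb)
{
        spin_lock(&dcache_lock);
        for (;;) {
                struct list_head *tmp = dentry_unused.prev;
                int skip = count;

                /* Skip dentries at the tail that belong to other sbs
                 * (the loop quoted in the description). */
                while (skip > 0 && tmp != &dentry_unused &&
                       list_entry(tmp, struct dentry, d_lru)->d_sb != sb) {
                        skip--;
                        tmp = tmp->prev;
                }
                if (tmp == &dentry_unused || count == 0)
                        break;
                count--;

                /* prune_one_dentry() drops and retakes dcache_lock.  That
                 * is the window in which the other umount's select_parent()
                 * can append more of *its* dentries to the tail of
                 * dentry_unused, so the next pass restarts the skip walk
                 * from the (newly refilled) tail of the list. */
                prune_one_dentry(list_entry(tmp, struct dentry, d_lru));
        }
        spin_unlock(&dcache_lock);
}

If that picture is right, serializing the two umounts (as asked in comment #5) should avoid the worst case, since nothing would refill the tail between passes.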
re: comment #6, strange, I only see 1 umount process. how far apart were the sysrq-t's taken?
Eric,

The SysRq Ts were taken a few seconds apart.

I have requested test results from the 47.0.1 kernel. The customer has been using a -40 kernel which did not show the problem. When the problem is seen, the umounts generally take a few minutes to complete.

Sachin Prabhu

This event sent from IssueTracker by sprabhu
issue 135397
Eric,

The customer has not tested with the -47 kernel. However, he has the -40 kernel on several of his production machines where he hasn't seen this issue.

Sachin Prabhu

This event sent from IssueTracker by sprabhu
issue 135397
Created attachment 293756 [details]
an alternate potential fix

Tumeya,

Since you are having problems with my original fix, please try this one and see if it gives you better results. My RHEL3 box seems to have gone off in the weeds, so the same disclaimer applies to this patch: I haven't compiled it or anything, so if something goes wrong let me know exactly what it is so I can fix it.
(In reply to comment #20)
> Created an attachment (id=293616) [edit]
> correct potential fix.
>
> Ok I built and did some tests with this patch and it didn't seem to make
> anything blow up, please have your customers test this.

This patch makes my system unhappy. Hung while populating a directory with files full of random data ('for i in `seq 1 4096`; do dd if=/dev/urandom of=$i bs=1M count=1; done'). Also hung after rebooting, messing around on the filesystem, and trying to unmount it.

Trying the patch in Comment #23 shortly.
Yeah this needs to be tested with heavy use, as this is also the same path for reclaiming memory under memory pressure. I wouldn't feel comfortable including this patch unless it had gone through a week of good hard testing.
Racer tests look like they finished okay on all hosts, but they only ran for about an hour and a half. Not sure what's needed to make the test run longer.

Internal Status set to 'Waiting on SEG'

This event sent from IssueTracker by dwa
issue 160842
I don't remember for sure how I made it run longer last time; maybe just put it in a loop. It sometimes took several (>10) hours to eventually trip.

Was the test done on a multi (2 or hopefully 4)-cpu machine?

Thanks,
-Eric
(In reply to comment #35)
> I don't remember for sure how I made it run longer last time; maybe just put it
> in a loop. It sometimes took several (>10) hours to eventually trip. Was the
> test done on a multi (2 or hopefully 4)-cpu machine?

It was run on a few different systems, both i386 and x86_64. At least one of them was an 8-way system (4x dual cores). Take a look at the RHTS URLs in Comment #33 for a list of the systems it ran on.

I'll ask the rhts guys tomorrow how to either make the test run longer or how to extract that test from RHTS and run it standalone on a couple of systems.
I've run the racer test in a loop overnight - looks like it's run about 15 times so far, and the system hasn't fallen over. The system is an x86_64 with 8 cores and 1GB memory.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Created attachment 294675 [details] patch posted to rhkernel list.
A patch addressing this issue has been included in kernel-2.4.21-54.EL.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2008-0211.html