Bug 413731 - RHEL3: System hangs at unmount
Summary: RHEL3: System hangs at unmount
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.9
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Assignee: Josef Bacik
QA Contact: Martin Jenner
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2007-12-06 10:44 UTC by Sachin Prabhu
Modified: 2018-11-14 09:45 UTC
CC List: 6 users

Fixed In Version: RHSA-2008-0211
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-05-07 07:04:05 UTC
Target Upstream Version:
Embargoed:


Attachments
messages file showing multiple SysRq T (336.29 KB, application/octet-stream)
2007-12-11 15:43 UTC, Sachin Prabhu
patch posted to rhkernel list. (3.31 KB, patch)
2008-02-12 16:40 UTC, Josef Bacik


Links:
  System: Red Hat Product Errata
  ID: RHSA-2008:0211
  Private: 0
  Priority: normal
  Status: SHIPPED_LIVE
  Summary: Important: kernel security and bug fix update
  Last Updated: 2008-05-07 07:03:52 UTC

Description Sachin Prabhu 2007-12-06 10:44:49 UTC
Kernel: vmlinux-2.4.21-52.ELsmp

From the vmcore of a hung system and from sysrq-Ts on systems showing hung
unmount processes, the hang appears to happen in the following part of the code
in void __prune_dcache(int count, struct super_block *sb):

        while (skip > 0 &&
               tmp != &dentry_unused &&
               list_entry(tmp, struct dentry, d_lru)->d_sb != sb) {
                skip--;
                tmp = tmp->prev;
        }

Using the vmcore from the hung system, the problem appears to have happened
because of a large number of unused dentries on this system. The system has 39
different mounts, with 3 mounts having more than 100,000 unused dentries.


The code goes through the list of unused dentries and searches for dentries
which belong to the superblock being unmounted. Since the number of dentries on
the list is large, it is stuck in the loop for a long time while holding the
dcache_lock spinlock.
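
For context, here is a rough sketch of how that skip loop sits inside
__prune_dcache(). This is a hand-written paraphrase of the 2.4-era structure,
not the literal RHEL3 source; the point is that the whole scan of dentry_unused
runs with dcache_lock held:

    /* Approximate shape of __prune_dcache() (paraphrased, not the exact
     * RHEL3 code): the entire walk of dentry_unused happens with
     * dcache_lock held, so a long tail of foreign dentries keeps the
     * lock held for the whole scan. */
    void __prune_dcache(int count, struct super_block *sb)
    {
            spin_lock(&dcache_lock);
            for (;;) {
                    struct list_head *tmp = dentry_unused.prev;
                    int skip = count;

                    /* Skip over unused dentries that belong to other
                     * superblocks -- the loop quoted above. */
                    while (skip > 0 &&
                           tmp != &dentry_unused &&
                           list_entry(tmp, struct dentry, d_lru)->d_sb != sb) {
                            skip--;
                            tmp = tmp->prev;
                    }
                    if (tmp == &dentry_unused || skip == 0)
                            break;

                    /* prune_one_dentry() drops and retakes dcache_lock
                     * on every dentry it frees. */
                    prune_one_dentry(list_entry(tmp, struct dentry, d_lru));
                    if (!--count)
                            break;
            }
            spin_unlock(&dcache_lock);
    }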


This bit of code was added in the update:
* Wed Apr 18 2007 Ernie Petrides <petrides> kernel-2.4.21-49.EL
- fix dput() crash regression caused in -47.5.EL (Eric Sandeen)

which was meant to fix a regression caused by the update:
* Wed Feb  7 2007 Ernie Petrides <petrides> kernel-2.4.21-47.5.EL
- avoid more races between unmount and dcache pruning (Eric Sandeen)

Comment 1 RHEL Program Management 2007-12-06 11:45:39 UTC
This bugzilla has Keywords: Regression.

Since no regressions are allowed between releases, it is also being proposed
as a blocker for this release.

Please resolve ASAP.

Comment 2 Eric Sandeen 2007-12-06 13:27:07 UTC
So, as I understand it, this is not a permanent hang, but rather a long delay?

Comment 3 Sachin Prabhu 2007-12-06 13:42:55 UTC
Yes. The system is unresponsive when this happens. In one case, it was
unresponsive for about 2 hours.

In the vmcore we have, the 2-CPU machine has 2 umount processes running, one on
each CPU. One of these is holding the dcache_lock and spinning in the while
loop, while the other umount process is waiting for the spinlock in the same
__prune_dcache function.



Comment 4 Eric Sandeen 2007-12-06 16:14:54 UTC
Is this only seen when 2 filesystems are simultaneously unmounting?

In theory at unmount time the dentries we wish to free should be at the end of
the unused list; previously we had assumed that they *were* the *only* dentries
at the end of the unused list, but no locking assured this, so we could wind up
missing some dentries, and oopsing when they were not properly freed.  IIRC
simultaneous unmounts could make this more likely.

The loop mentioned above simply makes sure we skip over dentries at the end of
the list which are *not* for the unmounting fs in question; in general I would
not expect that there would be too many to skip.  Perhaps, though, 2 dueling
unmounts could get into a ping-pong match where neither can complete in one pass?

Does serializing the unmounts rather than doing them in parallel avoid the
issue?  You're quite certain that this did not happen prior to the mentioned
updates?

Is a vmcore taken during the long delay available?

I'll refamiliarize myself with this codepath in the RHEL3 kernel.

Thanks,
-Eric



Comment 6 Sachin Prabhu 2007-12-11 15:43:52 UTC
Created attachment 284251 [details]
messages file showing multiple SysRq T

Check the umount process with pid 3398

Comment 7 Eric Sandeen 2007-12-11 16:49:29 UTC
In the IT it was mentioned that there was a large number of unused dentries on
the system.  Was this also true for the non-problematic older kernel?

Sachin has dumped out the remaining unused dentry list from the core.  If we
look for the 2 unmounting filesystems in the unused list:

 -bash-3.00$ grep "102749bc000\|103db655800" dentry.sb | uniq -c
      7   d_sb = 0x102749bc000, 
 113542   d_sb = 0x103db655800, 
  28505   d_sb = 0x102749bc000, 

we see that the 2 unmounting filesystems are indeed interlaced on the unused list.

The process which is currently in the "find my dentries" loop is looking for sb
0x103db655800, which is in the "middle" of the list (excluding the other various
sbs) and there are 28k entries for the "other" (blocked) umount at the very tail
which it must work through while it looks for "its" dentries.

Right now what I think is happening is that __prune_dcache(count, sb) will look
for "count" dentries on the unused list which match the given sb.  In the
process of searching, it will skip over up to the remaining "count" dentries
which belong to other superblocks.  However, when it finds a dentry to clear and
calls prune_one_dentry, it drops and retakes the dcache_lock.  I think this
gives a window to the other umount process to add more of its dentries to the
end of the list via select_parent.  The for (;;) loop in __prune_dcache then
starts over with the last entry on the list, and may have another big handful
of entries to skip over.  I think it is this ping-ponging contention that is
causing the inefficiency.  However, in the various sysrq-t outputs, I would
then expect to see the 2 umount threads alternating, rather than one staying
stuck.
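
To make the interleaving concrete, the umount-side path looks roughly like this
(a paraphrase of the stock 2.4 shrink_dcache_parent(); the exact RHEL3 variant
passing the superblock down is an assumption here):

    /* Paraphrased 2.4-era umount-time pruning path, not the literal
     * RHEL3 source.  Each pass moves this filesystem's unused dentries
     * to the tail of the global dentry_unused list and then prunes
     * them; if another umount does the same while dcache_lock is
     * dropped inside prune_one_dentry(), the next pass starts behind a
     * fresh run of foreign dentries. */
    void shrink_dcache_parent(struct dentry *parent)
    {
            int found;

            while ((found = select_parent(parent)) != 0)
                    __prune_dcache(found, parent->d_sb);
    }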

What I don't quite understand is that without this skip-loop patch, prune_dcache
will happily free dentries for *other* sb's until it reaches its "count" goal
and proceed with umount, leaving busy dentries and busy inodes to be found when
the umount completes.  I'm not sure how this wasn't hit on the older kernels.

Comment 8 Eric Sandeen 2007-12-11 17:39:20 UTC
Re: comment #6 - strange, I only see 1 umount process. How far apart were the
sysrq-t's taken?

Comment 9 Issue Tracker 2007-12-12 10:37:43 UTC
Eric,

The SysRq Ts were taken a few seconds apart.

I have requested test results from the 47.0.1 kernel. The customer has
been using a -40 kernel which did not show the problem.

When the problem is seen, the umounts generally take a few minutes to
complete.

Sachin Prabhu


This event sent from IssueTracker by sprabhu 
 issue 135397

Comment 10 Issue Tracker 2007-12-13 07:37:49 UTC
Eric,

The customer has not tested with the -47 kernel. However, he has the -40
kernel on several of his production machines, where he hasn't seen this
issue.

Sachin Prabhu


This event sent from IssueTracker by sprabhu 
 issue 135397

Comment 23 Josef Bacik 2008-02-01 18:49:47 UTC
Created attachment 293756 [details]
an alternate potential fix.

Tumeya,

Since you are having problems with my original fix, please try this one and see
if it gives you better results.  My RHEL3 box seems to have gone off in the
weeds, so the same disclaimer applies to this patch: I haven't compiled it or
anything, so if something goes wrong let me know exactly what it is so I can
fix it.

Comment 24 David Aquilina 2008-02-01 20:46:48 UTC
(In reply to comment #20)
> Created an attachment (id=293616) [edit]
> correct potential fix.
> 
> Ok I built and did some tests with this patch and it didn't seem to make
> anything blow up, please have your customers test this.

This patch makes my system unhappy. Hung while populating a directory with files
full of random data ('for i in `seq 1 4096`; do dd if=/dev/urandom of=$i bs=1M
count=1; done'). Also hung after rebooting, messing around on the filesystem,
and trying to unmount it. 

Trying the patch in Comment #23 shortly. 


Comment 28 Josef Bacik 2008-02-05 18:03:56 UTC
Yeah, this needs to be tested with heavy use, as this is also the same path used
for reclaiming memory under memory pressure.  I wouldn't feel comfortable
including this patch unless it had gone through a week of good hard testing.
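
(For reference, the memory-pressure entry point funnels into the same code; the
sketch below is a rough paraphrase of the stock 2.4 shrink_dcache_memory(),
with dentry_stat, dentry_cache, and kmem_cache_shrink as the usual 2.4 names
rather than anything verified against the RHEL3 tree.)

    /* Paraphrased 2.4-era VM hook, not the literal RHEL3 source: the
     * dcache shrinker called under memory pressure ends up in the same
     * prune_dcache() path that umount uses, so a change there affects
     * ordinary reclaim as well as unmount. */
    int shrink_dcache_memory(int priority, unsigned int gfp_mask)
    {
            int count = 0;

            /* Avoid re-entering filesystems while reclaiming on behalf
             * of a filesystem allocation. */
            if (!(gfp_mask & __GFP_FS))
                    goto out;

            count = dentry_stat.nr_unused / priority;
            prune_dcache(count);
    out:
            return kmem_cache_shrink(dentry_cache);
    }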

Comment 34 Issue Tracker 2008-02-07 04:38:54 UTC
Racer tests look like they finished okay on all hosts, but only ran for about
an hour and a half. Not sure what's needed to make the test run longer.

Internal Status set to 'Waiting on SEG'

This event sent from IssueTracker by dwa 
 issue 160842

Comment 35 Eric Sandeen 2008-02-07 04:56:19 UTC
I don't remember for sure how I made it run longer last time; maybe just put it
in a loop.  It sometimes took several (>10) hours to eventually trip.  Was the
test done on a multi (2 or hopefully 4)-cpu machine?

Thanks,
-Eric

Comment 36 David Aquilina 2008-02-07 05:03:34 UTC
(In reply to comment #35)
> I don't remember for sure how I made it run longer last time; maybe just put it
> in a loop.  It sometimes took several (>10) hours to eventually trip.  Was the
> test done on a multi (2 or hopefully 4)-cpu machine?

It was run on a few different systems, both i386 and x86_64. At least one of
them was an 8-way system (4x dual cores). Take a look at the RHTS URLs in
Comment #33 for a list of the systems it ran on.

I'll ask the RHTS guys tomorrow either how to make the test run longer or how
to extract that test from RHTS and run it standalone on a couple of systems.



Comment 37 David Aquilina 2008-02-08 19:08:03 UTC
I've run the racer test in a loop overnight - looks like it's run about 15 times
so far, and the system hasn't fallen over. The system is an x86_64 with 8 cores
and 1GB memory. 

Comment 41 RHEL Program Management 2008-02-12 15:28:52 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 43 Josef Bacik 2008-02-12 16:40:08 UTC
Created attachment 294675 [details]
patch posted to rhkernel list.

Comment 52 Don Howard 2008-02-26 18:46:51 UTC
A patch addressing this issue has been included in kernel-2.4.21-54.EL.

Comment 57 errata-xmlrpc 2008-05-07 07:04:05 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2008-0211.html


