Red Hat Bugzilla – Bug 237558
GFS2: problem with drop_inode logic in memory pressure situations
Last modified: 2007-11-30 17:07:43 EST
So I'm doing that screwy rm -rf on one node and du -h on the other node test,
and then I get a hang on both nodes. The du -h node was under memory pressure
(it only has about 256 MB of RAM) and hung like this:
du D C57DE754 972 5455 4292 (NOTLB)
c57de768 00000096 00000002 c57de754 c57de750 00000000 c134122c 00000002
00000007 cace69b0 e5474f38 000009c6 00000713 cace6ad4 c1345de0 00000000
c526193c c134121c 00000296 009f0660 00000296 ffffffff 00000000 00000000
[<d0acbffd>] holder_wait+0x8/0xc [gfs2]
[<d0acbff0>] wait_on_holder+0x41/0x46 [gfs2]
[<d0acced8>] glock_wait_internal+0xf1/0x21e [gfs2]
[<d0acd176>] gfs2_glock_nq+0x171/0x1a6 [gfs2]
[<d0acd579>] gfs2_glock_nq_m+0x27/0x1a6 [gfs2]
[<d0ac4a2c>] do_strip+0x19e/0x3b4 [gfs2]
[<d0ac3817>] recursive_scan+0x108/0x193 [gfs2]
[<d0ac396f>] trunc_dealloc+0xcd/0xea [gfs2]
[<d0ac3998>] gfs2_file_dealloc+0xc/0xe [gfs2]
[<d0ad9ebe>] gfs2_delete_inode+0xdd/0x154 [gfs2]
[<d0ada065>] gfs2_drop_inode+0x33/0x35 [gfs2]
[<d0ab0ca4>] dlm_lowcomms_get_buffer+0xe0/0x150 [dlm]
[<d0aa934c>] _create_message+0x22/0x8a [dlm]
[<d0aa9415>] create_message+0x61/0x68 [dlm]
[<d0aab8ca>] _request_lock+0xfb/0x224 [dlm]
[<d0aaba5a>] request_lock+0x67/0x86 [dlm]
[<d0aad47f>] dlm_lock+0xcc/0x107 [dlm]
[<d0b185ce>] gdlm_do_lock+0x9b/0x12f [lock_dlm]
[<d0b18867>] gdlm_lock+0xf1/0xf9 [lock_dlm]
[<d0ad01f8>] gfs2_lm_lock+0x30/0x3a [gfs2]
[<d0acc8aa>] gfs2_glock_xmote_th+0xed/0x167 [gfs2]
[<d0accbd3>] run_queue+0x2af/0x36d [gfs2]
[<d0acd15e>] gfs2_glock_nq+0x159/0x1a6 [gfs2]
[<d0acea69>] gfs2_inode_lookup+0x160/0x1b4 [gfs2]
[<d0acebd5>] gfs2_lookupi+0x118/0x188 [gfs2]
[<d0ad8e79>] gfs2_lookup+0x1d/0x4d [gfs2]
Now what bothers me is that we are doing an iput just so we can drop the inode
cache and free up some memory, but we are going down the deletion path. If
there is an nlink count and the glock is being demoted (which would be the case
here, since the other node is likely trying to get hold of the lock as well),
then we clear the nlink count, which makes us try to delete the inode. That
isn't good: we just want to be freeing the inode cache, not deleting the inode.
Hmm, well, I'm an idiot; chances are the other node had removed it and this is
doing what it's supposed to. I'll try and figure out why we are hanging.
Well, at the point of the hang we should be holding both the inode's glock and
the inode's iopen glock in exclusive mode. We are then trying to get the lock
on an rgrp in order to deallocate some of the inode's blocks, and that is
what's causing the hang at this point.
So it would seem that something on the other node is hanging onto one (or more)
of the rgrp locks and refusing to release them for some reason.
The other interesting point is that the reason the inode in question is being
disposed of in the first place is that we are short of memory; the allocation
in question is for a DLM message being sent to request an otherwise unrelated
lock. I wonder if we need to change the DLM to use GFP_NOFS, although I still
can't see exactly what the other node might be doing at this point in time that
prevents it from granting the locks that we need.
It would be interesting to know whether you still see this bug with the current
-nmw kernel, since it might be that the fix for bz #231910 has some bearing on this.
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release. Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products. This request is not yet committed for inclusion in an Update release.
Needinfo status got dropped somehow.
A further thought: this seems to be caused by a memory allocation in the DLM
code causing GFS2 to try to push out inodes, which then requires a lock that is
probably blocked behind that same memory allocation in the DLM.
So maybe there are some DLM allocations which need to be marked GFP_NOFS?
Dave & Patrick, please take a quick look at the stack trace in this bug and let
me know if you agree with me (comment #6) as to the cause. I'm assuming that
the easiest fix would probably be to use GFP_NOFS, but it's always possible
that it might be OK for the DLM to recurse like this... I suspect from the bug
report that it's not.
It looks like the allocation that is passed into lowcomms_get_buffer is
hard-coded to GFP_KERNEL, which is not very handy when you have a filesystem
above it in the stack.
The allocation policy should probably be lockspace-specific, as it was in RHEL4.
Created attachment 156348 [details]
Patch to add
We can quibble about the names (and maybe the use of flags) but I think this is
sort of what's needed
Yes, that looks good. My only comment is that it would be nicer if we could
simply pass the allocation type to the DLM directly rather than inventing a new
flag for it.
Yeah, but that's an ABI change. Unless we add a new call to set it after creation.
posted to rhkernel
This request was evaluated by Red Hat Kernel Team for inclusion in a Red
Hat Enterprise Linux maintenance release, and has moved to bugzilla
reposted to rhkernel
*** Bug 243718 has been marked as a duplicate of this bug. ***
You can download this test kernel from http://people.redhat.com/dzickus/el5
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.