Red Hat Bugzilla – Bug 237558
GFS2: problem with drop_inode logic in memory pressure situations
Last modified: 2007-11-30 17:07:43 EST
So I'm doing that screwy rm -rf on one node and du -h on the other node test,
and then I get a hang on both nodes. The du -h node was under memory pressure
(it only has about 256 MB of RAM) and hung like this:
du D C57DE754 972 5455 4292 (NOTLB)
c57de768 00000096 00000002 c57de754 c57de750 00000000 c134122c 00000002
00000007 cace69b0 e5474f38 000009c6 00000713 cace6ad4 c1345de0 00000000
c526193c c134121c 00000296 009f0660 00000296 ffffffff 00000000 00000000
[<d0acbffd>] holder_wait+0x8/0xc [gfs2]
[<d0acbff0>] wait_on_holder+0x41/0x46 [gfs2]
[<d0acced8>] glock_wait_internal+0xf1/0x21e [gfs2]
[<d0acd176>] gfs2_glock_nq+0x171/0x1a6 [gfs2]
[<d0acd579>] gfs2_glock_nq_m+0x27/0x1a6 [gfs2]
[<d0ac4a2c>] do_strip+0x19e/0x3b4 [gfs2]
[<d0ac3817>] recursive_scan+0x108/0x193 [gfs2]
[<d0ac396f>] trunc_dealloc+0xcd/0xea [gfs2]
[<d0ac3998>] gfs2_file_dealloc+0xc/0xe [gfs2]
[<d0ad9ebe>] gfs2_delete_inode+0xdd/0x154 [gfs2]
[<d0ada065>] gfs2_drop_inode+0x33/0x35 [gfs2]
[<d0ab0ca4>] dlm_lowcomms_get_buffer+0xe0/0x150 [dlm]
[<d0aa934c>] _create_message+0x22/0x8a [dlm]
[<d0aa9415>] create_message+0x61/0x68 [dlm]
[<d0aab8ca>] _request_lock+0xfb/0x224 [dlm]
[<d0aaba5a>] request_lock+0x67/0x86 [dlm]
[<d0aad47f>] dlm_lock+0xcc/0x107 [dlm]
[<d0b185ce>] gdlm_do_lock+0x9b/0x12f [lock_dlm]
[<d0b18867>] gdlm_lock+0xf1/0xf9 [lock_dlm]
[<d0ad01f8>] gfs2_lm_lock+0x30/0x3a [gfs2]
[<d0acc8aa>] gfs2_glock_xmote_th+0xed/0x167 [gfs2]
[<d0accbd3>] run_queue+0x2af/0x36d [gfs2]
[<d0acd15e>] gfs2_glock_nq+0x159/0x1a6 [gfs2]
[<d0acea69>] gfs2_inode_lookup+0x160/0x1b4 [gfs2]
[<d0acebd5>] gfs2_lookupi+0x118/0x188 [gfs2]
[<d0ad8e79>] gfs2_lookup+0x1d/0x4d [gfs2]
Now what bothers me is that we are doing an iput just so we can drop the inode
cache and free up some memory, but we are going down the deletion path. If
there is an nlink count and the glock is being demoted (which would be the case
here, since the other node is likely trying to get hold of the lock as well),
then we clear the nlink count, which makes us try to delete the inode. That
isn't good: we just want to be freeing the inode cache, not deleting the inode.
Hmm, well, I'm an idiot; chances are the other node had removed it and this is
doing what it's supposed to. I'll try and figure out why we are hanging.
Well, at the point of the hang we should be holding both the inode's glock and
the inode's iopen glock in exclusive mode. We are then trying to get the lock
on an rgrp in order to deallocate some of the inode's blocks, and that is
what's causing the hang at this point.
So it would seem that something on the other node is hanging onto one (or more)
of the rgrp locks and refusing to release them for some reason.
The other interesting point is that the reason the inode in question is being
disposed of in the first place is that we are short of memory; the allocation
in question is for a DLM message being sent to request an otherwise unrelated
lock. I wonder if we need to change the DLM to use GFP_NOFS, although I still
can't see exactly what the other node might be doing at this point in time that
prevents it from granting the locks that we need.
It would be interesting to know whether you still see this bug with the current
-nmw kernel, since it might be that the fix for bz #231910 has some bearing on this.
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release. Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products. This request is not yet committed for inclusion in an Update release.
Needinfo status got dropped somehow.
A further thought: this seems to be caused by a memory allocation in the DLM
code causing GFS2 to try to push out inodes, which then requires a lock that is
probably blocked behind that same memory allocation in the DLM.
So maybe there are some DLM allocations which need to be marked GFP_NOFS?
Dave & Patrick, please take a quick look at the stack trace in this bug and let
me know if you agree with me (comment #6) as to the cause. I'm assuming that
the easiest fix would probably be to use GFP_NOFS, but it's always possible
that it might be OK for the DLM to recurse like this... I suspect from the bug
report that it's not.
It looks like the allocation that is passed into lowcomms_get_buffer is
hard-coded to GFP_KERNEL, which is not very handy when you have a filesystem
above it in the stack.
The allocation policy should probably be lockspace-specific, as it was in RHEL4.
Created attachment 156348 [details]
Patch to add
We can quibble about the names (and maybe the use of flags) but I think this is
sort of what's needed
Yes, that looks good. My only comment is that it would be nicer if we could
simply pass the allocation type to the DLM directly rather than inventing a new
flag for it.
Yeah, but that's an ABI change. Unless we add a new call to set it after creation.
posted to rhkernel
This request was evaluated by Red Hat Kernel Team for inclusion in a Red
Hat Enterprise Linux maintenance release, and has moved to bugzilla
reposted to rhkernel
*** Bug 243718 has been marked as a duplicate of this bug. ***
You can download this test kernel from http://people.redhat.com/dzickus/el5
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.