Red Hat Bugzilla – Bug 237558
GFS2: problem with drop_inode logic in memory pressure situations
Last modified: 2007-11-30 17:07:43 EST
So I'm doing that screwy rm -rf on one node and du -h on the other node test,
and then I get a hang on both nodes. The du -h node was under memory pressure
(it only has about 256 MB of RAM) and hung like this:
du D C57DE754 972 5455 4292 (NOTLB)
c57de768 00000096 00000002 c57de754 c57de750 00000000 c134122c 00000002
00000007 cace69b0 e5474f38 000009c6 00000713 cace6ad4 c1345de0 00000000
c526193c c134121c 00000296 009f0660 00000296 ffffffff 00000000 00000000
[<d0acbffd>] holder_wait+0x8/0xc [gfs2]
[<d0acbff0>] wait_on_holder+0x41/0x46 [gfs2]
[<d0acced8>] glock_wait_internal+0xf1/0x21e [gfs2]
[<d0acd176>] gfs2_glock_nq+0x171/0x1a6 [gfs2]
[<d0acd579>] gfs2_glock_nq_m+0x27/0x1a6 [gfs2]
[<d0ac4a2c>] do_strip+0x19e/0x3b4 [gfs2]
[<d0ac3817>] recursive_scan+0x108/0x193 [gfs2]
[<d0ac396f>] trunc_dealloc+0xcd/0xea [gfs2]
[<d0ac3998>] gfs2_file_dealloc+0xc/0xe [gfs2]
[<d0ad9ebe>] gfs2_delete_inode+0xdd/0x154 [gfs2]
[<d0ada065>] gfs2_drop_inode+0x33/0x35 [gfs2]
[<d0ab0ca4>] dlm_lowcomms_get_buffer+0xe0/0x150 [dlm]
[<d0aa934c>] _create_message+0x22/0x8a [dlm]
[<d0aa9415>] create_message+0x61/0x68 [dlm]
[<d0aab8ca>] _request_lock+0xfb/0x224 [dlm]
[<d0aaba5a>] request_lock+0x67/0x86 [dlm]
[<d0aad47f>] dlm_lock+0xcc/0x107 [dlm]
[<d0b185ce>] gdlm_do_lock+0x9b/0x12f [lock_dlm]
[<d0b18867>] gdlm_lock+0xf1/0xf9 [lock_dlm]
[<d0ad01f8>] gfs2_lm_lock+0x30/0x3a [gfs2]
[<d0acc8aa>] gfs2_glock_xmote_th+0xed/0x167 [gfs2]
[<d0accbd3>] run_queue+0x2af/0x36d [gfs2]
[<d0acd15e>] gfs2_glock_nq+0x159/0x1a6 [gfs2]
[<d0acea69>] gfs2_inode_lookup+0x160/0x1b4 [gfs2]
[<d0acebd5>] gfs2_lookupi+0x118/0x188 [gfs2]
[<d0ad8e79>] gfs2_lookup+0x1d/0x4d [gfs2]
Now what bothers me is that we are doing an iput just so we can drop the inode
cache and free up some memory, but we are going down the deletion path. If
there is an nlink count and the glock is being demoted (which would be the case
here, since the other node is likely trying to get hold of the lock as well),
then we clear the nlink count, which makes us try to delete the inode. That
isn't good: we just want to be freeing the inode cache, not deleting the inode.
Hmm, well, I'm an idiot; chances are the other node had removed it and this is
doing what it's supposed to. I'll try and figure out why we are hanging.
Well, at the point of the hang we should be holding both the inode's glock and
the inode's iopen glock in exclusive mode. We are then trying to get the lock
on an rgrp in order to deallocate some of the inode's blocks, and that is
what's causing the hang at this point.
So it would seem that something on the other node is hanging onto one (or more)
of the rgrp locks and refusing to release them for some reason.
The other interesting point is that the reason the inode in question is being
disposed of in the first place is that we are short of memory; the allocation
in question is for a DLM message being sent to request an otherwise unrelated
lock. I wonder if we need to change the DLM to use GFP_NOFS, although I still
can't see exactly what the other node might be doing at this point in time that
prevents it from granting the locks that we need.
It would be interesting to know whether you still see this bug with the current
-nmw kernel, since it might be that the fix for bz #231910 has some bearing on this.
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release. Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products. This request is not yet committed for inclusion in an Update release.
Needinfo status got dropped somehow.
A further thought: this seems to be caused by a memory allocation in the DLM
code causing GFS2 to try to push out inodes, which then requires a lock that is
probably blocked behind that same memory allocation in the DLM.
So maybe there are some DLM allocations which need to be marked GFP_NOFS?
Dave & Patrick, please take a quick look at the stack trace in this bug and let
me know if you agree with me (comment #6) as to the cause. I'm assuming that
the easiest fix would probably be to use GFP_NOFS, but it's always possible
that it might be OK for the DLM to recurse like this... I suspect from the bug
report that it's not.
It looks like the allocation that is passed into lowcomms_get_buffer is
hard-coded to GFP_KERNEL, which is not very handy when you have a filesystem
above it in the stack.
The allocation policy should probably be lockspace-specific, as it was in RHEL4.
Created attachment 156348 [details]
Patch to add
We can quibble about the names (and maybe the use of flags) but I think this is
sort of what's needed
Yes, that looks good. My only comment is that it would be nicer if we could
simply pass the allocation type to the DLM directly rather than inventing a new
flag for it.
Yeah, but that's an ABI change. Unless we add a new call to set it after creation.
posted to rhkernel
This request was evaluated by Red Hat Kernel Team for inclusion in a Red
Hat Enterprise Linux maintenance release, and has moved to bugzilla
reposted to rhkernel
*** Bug 243718 has been marked as a duplicate of this bug. ***
You can download this test kernel from http://people.redhat.com/dzickus/el5
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.