Description of problem: If the unstuffing code in gfs2_adjust_quota ever runs, it will likely cause a crash, both because of the incorrect locking order and because calling gfs2_alloc_get recursively isn't allowed. Also, gfs2_adjust_quota is called under a transaction, so gfs2_inplace_reserve must not be called from it (it locks rgrps, and the rgrps must be locked before the transaction is started). The simple solution is to add one block to the reservation at the higher layer. This can be done unconditionally, even if an unstuff isn't needed, since the block will be released back to the rgrp if it's not allocated during the transaction. Fixing this is required by the next step of the tree walking bz.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Created attachment 296251: patch to correct lock ordering in gfs2_adjust_quota
Using the patch in comment #2, unstuffing the quota inode doesn't crash the machine. However, I'm seeing this withdraw message some time later. Any ideas?

GFS2: fsid=dm-3.0: fatal: assertion "tr->tr_num_revoke <= tr->tr_revokes" failed
GFS2: fsid=dm-3.0: function = gfs2_trans_end, file = fs/gfs2/trans.c, line = 102
GFS2: fsid=dm-3.0: about to withdraw this file system
GFS2: fsid=dm-3.0: telling LM to withdraw
GFS2: fsid=dm-3.0: withdrawn
[<e0d1cfec>] gfs2_lm_withdraw+0x73/0x7f [gfs2]
[<e0d2dcc5>] gfs2_assert_withdraw_i+0x1e/0x30 [gfs2]
[<e0d2dae0>] gfs2_trans_end+0xc1/0x129 [gfs2]
[<e0d20c51>] gfs2_write_cache_jdata+0x27e/0x32f [gfs2]
[<c04e21e0>] __next_cpu+0x12/0x21
[<c041efff>] find_busiest_group+0x177/0x462
[<e0d21291>] gfs2_jdata_writepages+0x1d/0x46 [gfs2]
[<c045a0ef>] do_writepages+0x20/0x32
[<c048d8a6>] __writeback_single_inode+0x170/0x2af
[<c048dcbb>] sync_sb_inodes+0x170/0x213
[<c048df0a>] writeback_inodes+0x6a/0xb0
[<c045a52e>] wb_kupdate+0x7b/0xdb
[<c045a94d>] pdflush+0x0/0x1a3
[<c045aa58>] pdflush+0x10b/0x1a3
[<c045a4b3>] wb_kupdate+0x0/0xdb
[<c0435f05>] kthread+0xc0/0xeb
[<c0435e45>] kthread+0x0/0xeb
[<c0405c3b>] kernel_thread_helper+0x7/0x10
=======================
[<e0d2dccf>] gfs2_assert_withdraw_i+0x28/0x30 [gfs2]
[<e0d2dae0>] gfs2_trans_end+0xc1/0x129 [gfs2]
[<e0d20c51>] gfs2_write_cache_jdata+0x27e/0x32f [gfs2]
[<c04e21e0>] __next_cpu+0x12/0x21
[<c041efff>] find_busiest_group+0x177/0x462
[<e0d21291>] gfs2_jdata_writepages+0x1d/0x46 [gfs2]
[<c045a0ef>] do_writepages+0x20/0x32
[<c048d8a6>] __writeback_single_inode+0x170/0x2af
[<c048dcbb>] sync_sb_inodes+0x170/0x213
[<c048df0a>] writeback_inodes+0x6a/0xb0
[<c045a52e>] wb_kupdate+0x7b/0xdb
[<c045a94d>] pdflush+0x0/0x1a3
[<c045aa58>] pdflush+0x10b/0x1a3
[<c045a4b3>] wb_kupdate+0x0/0xdb
[<c0435f05>] kthread+0xc0/0xeb
[<c0435e45>] kthread+0x0/0xeb
[<c0405c3b>] kernel_thread_helper+0x7/0x10
=======================
GFS2: fsid=dm-3.0: tr_num_revoke = 1, tr_revokes = 0
GFS2: Transaction created at: gfs2_write_cache_jdata+0x15e/0x32f [gfs2]
Created attachment 296564: Patch to add correct number of revokes
Please try the following patch, which fixes the number of revokes. I'd also suggest checking that the size of the quota inode is being updated in the correct places, since it appears that the writepages code thought the inode had been truncated.
Posted the combined patch to rhkernel-list: http://post-office.corp.redhat.com/archives/rhkernel-list/2008-March/msg00240.html
Included in kernel-2.6.18-86.el5. You can download this test kernel from http://people.redhat.com/dzickus/el5
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2008-0314.html