From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.7) Gecko/20050427 Red Hat/1.7.7-1.1.3.4

Description of problem:
GFS deadlock due to the following lock sequence:

(vfs) do_truncate {
    down(.... i_sem);
    down_write(.... i_alloc_sem);
    err = notify_change(dentry, &newattrs);
        -> call gfs_setattr
            -> glock
    up_write(.... i_alloc_sem);
    up(.... i_sem);
}

(gfs) do_write_direct {
    glock
    down(.... i_sem)
    __blockdev_direct_IO
        -> down(.... i_alloc_sem)
        -> up(.... i_alloc_sem)
    up(.... i_sem)
}

(gfs) gfs_read {
    glock
    __blockdev_direct_IO
        -> down(.... i_sem)
        -> up(.... i_sem)
}

Version-Release number of selected component (if applicable):
2.6.9-22.ELsmp

How reproducible:
Sometimes

Steps to Reproduce:
1. (Running Oracle performance stress test)
2. Haven't tried using a simpler test case.

Actual Results:
Oracle process hang.

Additional info:

oracle D 0000010061fdfe38 0 4588 1 4619 4571 (NOTLB)
Call Trace:
    <ffffffff803034ff>{wait_for_completion+167}
    <ffffffff80132e8d>{default_wake_function+0}
    <ffffffffa019c83b>{:gfs:do_write_direct+1523}
    <ffffffff80132e8d>{default_wake_function+0}
    <ffffffffa01884a5>{:gfs:glock_wait_internal+350}
    <ffffffffa0188cd2>{:gfs:gfs_glock_nq+961}
    <ffffffffa0188efb>{:gfs:gfs_glock_nq_init+20}
    <ffffffffa01a09ae>{:gfs:gfs_setattr+75}
    <ffffffff80132ede>{__wake_up_common+67}
    <ffffffff80190126>{notify_change+340}
    <ffffffff801756ed>{do_truncate+135}
    <ffffffff801759be>{sys_ftruncate+248}

oracle D 00000100e69f3270 0 4548 1 4550 4546 (NOTLB)
Call Trace:
    <ffffffff8030353f>{wait_for_completion+231}
    <ffffffff8030353f>{wait_for_completion+231}
    <ffffffff80132e8d>{default_wake_function+0}
    <ffffffff80302637>{__down+147}
    <ffffffff80132e8d>{default_wake_function+0}
    <ffffffffa01884a5>{:gfs:glock_wait_internal+350}
    <ffffffff80303c13>{__down_failed+53}
    <ffffffffa019daa7>{:gfs:.text.lock.ops_file+15}
    <ffffffff801313f5>{recalc_task_prio+337}
    <ffffffff80131483>{activate_task+124}
    <ffffffff80131931>{try_to_wake_up+734}
    <ffffffffa019bcca>{:gfs:walk_vm+265}
    <ffffffffa019c248>{:gfs:do_write_direct+0}
    <ffffffffa019ce66>{:gfs:gfs_write+194}
    <ffffffff80177098>{vfs_write+207}
    <ffffffff80177275>{sys_pwrite64+86}
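The lock sequence in the description can be checked mechanically. Below is a minimal userspace sketch, not kernel code, that records the order in which each path takes its locks and looks for a cycle in the resulting "held before" graph. The names lock_id, add_path, reported_paths_can_deadlock and glock_first_paths_can_deadlock are made up for illustration. The glock-first ordering in the second function is a hypothetical consistent order used only for contrast; it is not claimed to be what the attached patches do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The three locks named in the report. */
enum lock_id { GLOCK, I_SEM, I_ALLOC_SEM, NUM_LOCKS };

/* edge[a][b] is true when some path acquires b while already holding a. */
static bool edge[NUM_LOCKS][NUM_LOCKS];

static void reset_graph(void)
{
    for (int a = 0; a < NUM_LOCKS; a++)
        for (int b = 0; b < NUM_LOCKS; b++)
            edge[a][b] = false;
}

/* Record one code path as the order in which it acquires its locks.
 * Simplification: every lock is treated as held until the path ends,
 * which matches the nesting shown in the report. */
static void add_path(const enum lock_id *order, size_t n)
{
    assert(n <= NUM_LOCKS);
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            edge[order[i]][order[j]] = true;
}

/* Depth-first search; state: 0 = unvisited, 1 = on stack, 2 = finished. */
static bool dfs(int node, int *state)
{
    state[node] = 1;
    for (int next = 0; next < NUM_LOCKS; next++) {
        if (!edge[node][next])
            continue;
        if (state[next] == 1)
            return true;            /* back edge: ordering cycle */
        if (state[next] == 0 && dfs(next, state))
            return true;
    }
    state[node] = 2;
    return false;
}

static bool has_order_cycle(void)
{
    int state[NUM_LOCKS] = {0};
    for (int n = 0; n < NUM_LOCKS; n++)
        if (state[n] == 0 && dfs(n, state))
            return true;
    return false;
}

/* Lock orders exactly as listed in the report. */
bool reported_paths_can_deadlock(void)
{
    const enum lock_id truncate_path[] = { I_SEM, I_ALLOC_SEM, GLOCK };
    const enum lock_id write_path[]    = { GLOCK, I_SEM, I_ALLOC_SEM };
    const enum lock_id read_path[]     = { GLOCK, I_SEM };

    reset_graph();
    add_path(truncate_path, 3);
    add_path(write_path, 3);
    add_path(read_path, 2);
    return has_order_cycle();
}

/* HYPOTHETICAL ordering for contrast only: every path takes glock first. */
bool glock_first_paths_can_deadlock(void)
{
    const enum lock_id truncate_path[] = { GLOCK, I_SEM, I_ALLOC_SEM };
    const enum lock_id write_path[]    = { GLOCK, I_SEM, I_ALLOC_SEM };
    const enum lock_id read_path[]     = { GLOCK, I_SEM };

    reset_graph();
    add_path(truncate_path, 3);
    add_path(write_path, 3);
    add_path(read_path, 2);
    return has_order_cycle();
}
```

Calling reported_paths_can_deadlock() from a small main() reports a cycle because do_truncate takes i_sem before glock while do_write_direct and gfs_read take glock before i_sem. This is the same inversion visible in the two call traces: one task waits on the glock inside gfs_setattr, the other waits in __down under do_write_direct.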
Patch is ready. Tested by:
1. Running sct's Verify-data (with a small tweak so it can run across multiple nodes) doing a continuous write to a GFS file.
2. Running a simple program that calls ftruncate() in an endless loop on the same file that Verify-data is writing.

Without the patch, the processes lock up easily. With the patch, the test appears able to run indefinitely. Will send the RPMs to Oracle test to further verify the patch.
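For anyone wanting to repeat step 2, a truncate loop along these lines should be enough. This is a sketch under assumptions, not the program actually used: the function name truncate_loop, the length parameter and the bounded-iteration mode are invented here, and the comment above does not say what length the original test truncated to.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Repeatedly ftruncate() the file at `path` to `length` bytes.
 * iterations < 0 means loop until ftruncate() fails, which mimics the
 * endless loop described above; iterations >= 0 bounds the loop.
 * Returns the number of successful calls counted in bounded mode,
 * or -1 if the file cannot be opened. */
long truncate_loop(const char *path, off_t length, long iterations)
{
    assert(path != NULL);

    int fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    long done = 0;
    while (iterations < 0 || done < iterations) {
        if (ftruncate(fd, length) != 0)
            break;                  /* stop on the first error */
        if (iterations >= 0)
            done++;                 /* only count when bounded */
    }

    close(fd);
    return done;
}
```

To reproduce the race, one would call truncate_loop(path, 0, -1) on one node against the file that the writer is updating on another; on an unpatched kernel the expectation from the comment above is that both processes end up in D state.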
Created attachment 121385: gfs_kernel_i_alloc.patch
Base kernel patch.
Created attachment 121386: gfs_i_alloc.patch
GFS patch.
Code checked into CVS. Moving this bug into the MODIFIED state.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2006-0234.html