Bug 173913 - GFS deadlock - gfs_write (do_write_direct) and gfs_setattr (do_truncate)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Cluster Suite
Classification: Retired
Component: gfs
Version: 4
Hardware: All
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Wendy Cheng
QA Contact: GFS Bugs
URL:
Whiteboard:
Depends On:
Blocks: 164915
Reported: 2005-11-22 15:40 UTC by Wendy Cheng
Modified: 2010-01-12 03:08 UTC (History)

Fixed In Version: RHBA-2006-0234
Clone Of:
Environment:
Last Closed: 2006-03-09 19:46:25 UTC
Embargoed:


Attachments
gfs_kernel_i_alloc.patch (1.31 KB, patch)
2005-11-23 05:39 UTC, Wendy Cheng
gfs_i_alloc.patch (1.78 KB, patch)
2005-11-23 05:40 UTC, Wendy Cheng


Links
System: Red Hat Product Errata
ID: RHBA-2006:0234
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: GFS-kernel bug fix update
Last Updated: 2006-03-09 05:00:00 UTC

Description Wendy Cheng 2005-11-22 15:40:27 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.7) Gecko/20050427 Red Hat/1.7.7-1.1.3.4

Description of problem:
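GFS deadlocks because two paths take the inode semaphores and the glock in
opposite orders: do_truncate takes i_sem and i_alloc_sem and then requests the
glock (via gfs_setattr), while do_write_direct takes the glock and then i_sem
and i_alloc_sem. The lock sequences are: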

(vfs) do_truncate {
        down(.... i_sem);
        down_write(.... i_alloc_sem);
        err = notify_change(dentry, &newattrs); -> call gfs_setattr -> glock
        up_write(.... i_alloc_sem);
        up(.... i_sem);
      }

(gfs) do_write_direct {
         glock
         down(.... i_sem)
         __blockdev_direct_IO -> down(.... i_alloc_sem) -> up(.... i_alloc_sem)
         up(.... i_sem)
       }
(gfs) gfs_read {
         glock
         __blockdev_direct_IO -> down(.... i_sem) -> up(.... i_sem)
       }
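
The ordering conflict between the sequences above can be checked mechanically.
The following is only an illustrative user-space sketch, not kernel or GFS code:
the names lock_id, order_graph, record_path, has_inversion and
report_paths_conflict are hypothetical, and each lock is assumed to stay held
until its path ends. It records which locks are held when another is taken and
flags any pair that is taken in both orders.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the three locks named in this report. */
enum lock_id { I_SEM, I_ALLOC_SEM, GLOCK, NLOCKS };

/* g[a][b] != 0 means some path acquires b while still holding a. */
typedef int order_graph[NLOCKS][NLOCKS];

/* Record every "a is held when b is taken" pair for one code path.
 * Assumption: each lock stays held until the path ends. */
void record_path(order_graph g, const enum lock_id *path, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            g[path[i]][path[j]] = 1;
}

/* Return 1 if any two locks are taken in both orders (ABBA inversion). */
int has_inversion(order_graph g)
{
    for (int a = 0; a < NLOCKS; a++)
        for (int b = a + 1; b < NLOCKS; b++)
            if (g[a][b] && g[b][a])
                return 1;
    return 0;
}

/* Feed in the do_truncate and do_write_direct sequences from this report. */
int report_paths_conflict(void)
{
    const enum lock_id truncate_path[] = { I_SEM, I_ALLOC_SEM, GLOCK };
    const enum lock_id write_path[]    = { GLOCK, I_SEM, I_ALLOC_SEM };
    order_graph g = { { 0 } };

    record_path(g, truncate_path, 3);
    record_path(g, write_path, 3);
    return has_inversion(g);
}
```

report_paths_conflict() returns 1 because i_sem is taken before the glock in one
path and after it in the other. The sketch only detects the inversion; it does
not show how the attached patches resolve it.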
         



Version-Release number of selected component (if applicable):
2.6.9-22.ELsmp

How reproducible:
Sometimes

Steps to Reproduce:
1. (Running Oracle performance stress test)
2. Haven't tried a simpler test case.
  

Actual Results: Oracle processes hang.

Additional info:

oracle        D 0000010061fdfe38     0  4588      1          4619  4571 (NOTLB)
Call Trace:<ffffffff803034ff>{wait_for_completion+167} <ffffffff80132e8d>{default_wake_function+0}
           <ffffffffa019c83b>{:gfs:do_write_direct+1523} <ffffffff80132e8d>{default_wake_function+0}
           <ffffffffa01884a5>{:gfs:glock_wait_internal+350} <ffffffffa0188cd2>{:gfs:gfs_glock_nq+961}
           <ffffffffa0188efb>{:gfs:gfs_glock_nq_init+20} <ffffffffa01a09ae>{:gfs:gfs_setattr+75}
           <ffffffff80132ede>{__wake_up_common+67} <ffffffff80190126>{notify_change+340}
           <ffffffff801756ed>{do_truncate+135} <ffffffff801759be>{sys_ftruncate+248}
oracle        D 00000100e69f3270     0  4548      1          4550  4546 (NOTLB)
Call Trace:<ffffffff8030353f>{wait_for_completion+231} <ffffffff8030353f>{wait_for_completion+231}
           <ffffffff80132e8d>{default_wake_function+0} <ffffffff80302637>{__down+147}
           <ffffffff80132e8d>{default_wake_function+0} <ffffffffa01884a5>{:gfs:glock_wait_internal+350}
           <ffffffff80303c13>{__down_failed+53} <ffffffffa019daa7>{:gfs:.text.lock.ops_file+15}
           <ffffffff801313f5>{recalc_task_prio+337} <ffffffff80131483>{activate_task+124}
           <ffffffff80131931>{try_to_wake_up+734} <ffffffffa019bcca>{:gfs:walk_vm+265}
           <ffffffffa019c248>{:gfs:do_write_direct+0} <ffffffffa019ce66>{:gfs:gfs_write+194}
           <ffffffff80177098>{vfs_write+207} <ffffffff80177275>{sys_pwrite64+86}

Comment 1 Wendy Cheng 2005-11-23 05:36:03 UTC
Patch is ready. Tested as follows:

1. Run sct's Verify-data (with a small tweak so it can run across multiple
nodes), writing to a GFS file in an endless loop.
2. Run a simple program that calls ftruncate() in an endless loop on the same
file that Verify-data is writing.

Without the patch, the processes lock up easily. With the patch, the test
appears to run indefinitely.
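
The ftruncate() program from step 2 is not attached to this bug. A minimal
sketch of what such a program might do is shown here; the function name
truncate_loop and its parameters are hypothetical, and the real test program may
differ.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Repeatedly truncate `path` to `length` bytes.
 * iterations < 0 loops until ftruncate() fails (the "forever" case);
 * otherwise the loop stops after `iterations` successful calls.
 * Returns the number of counted successful calls, or -1 if open() fails.
 */
long truncate_loop(const char *path, off_t length, long iterations)
{
    long done = 0;
    int fd = open(path, O_WRONLY | O_CREAT, 0644);

    if (fd < 0)
        return -1;

    for (;;) {
        if (iterations >= 0 && done >= iterations)
            break;
        if (ftruncate(fd, length) != 0)
            break;
        if (iterations >= 0)
            done++;         /* only count in the bounded case */
    }

    close(fd);
    return done;
}
```

Calling truncate_loop() with a negative iteration count on the file being
written by Verify-data would approximate the test described in step 2.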

Will send the RPMs to Oracle test to further verify the patch.

Comment 2 Wendy Cheng 2005-11-23 05:39:55 UTC
Created attachment 121385
gfs_kernel_i_alloc.patch

Base kernel patch.

Comment 3 Wendy Cheng 2005-11-23 05:40:37 UTC
Created attachment 121386
gfs_i_alloc.patch

GFS patch.

Comment 4 Wendy Cheng 2005-12-14 20:43:45 UTC
Code checked into CVS. Moving the bug to the MODIFIED state.

Comment 7 Red Hat Bugzilla 2006-03-09 19:46:25 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2006-0234.html
