Bug 173913

Summary: GFS deadlock - gfs_write (do_write_direct) and gfs_setattr (do_truncate)
Product: [Retired] Red Hat Cluster Suite Reporter: Wendy Cheng <nobody+wcheng>
Component: gfs Assignee: Wendy Cheng <nobody+wcheng>
Status: CLOSED ERRATA QA Contact: GFS Bugs <gfs-bugs>
Severity: high Docs Contact:
Priority: medium    
Version: 4 CC: rkenna
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: RHBA-2006-0234 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2006-03-09 19:46:25 UTC Type: ---
Bug Depends On:    
Bug Blocks: 164915    
Attachments:
Description                 Flags
gfs_kernel_i_alloc.patch    none
gfs_i_alloc.patch           none

Description Wendy Cheng 2005-11-22 15:40:27 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.7) Gecko/20050427 Red Hat/1.7.7-1.1.3.4

Description of problem:
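GFS deadlocks because two paths take the same locks in opposite order:
do_truncate holds i_sem (and i_alloc_sem) and then waits for the glock in
gfs_setattr, while do_write_direct holds the glock and then waits for i_sem.
The lock sequences are: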

(vfs) do_truncate {
        down(.... i_sem);
        down_write(.... i_alloc_sem);
        err = notify_change(dentry, &newattrs); -> calls gfs_setattr -> glock
        up_write(.... i_alloc_sem);
        up(.... i_sem);
      }

(gfs) do_write_direct {
        glock
        down(.... i_sem)
        __blockdev_direct_IO -> down(.... i_alloc_sem) -> up(.... i_alloc_sem)
        up(.... i_sem)
      }

(gfs) gfs_read {
        glock
        __blockdev_direct_IO -> down(.... i_sem) -> up(.... i_sem)
      }
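
The ordering conflict in the sequences above can be checked mechanically. The following is only a toy userspace model, not kernel code and not the attached patch: it lists the locks each path takes, in the order given in this report, and flags any pair that two paths acquire in opposite orders. The names PATHS, order_edges and inversions are made up for this illustration, and the model assumes each lock is still held when the next one in the list is requested.

```python
# Lock acquisition order per code path, transcribed from the report.
# Assumption: each lock is still held when the next one is requested.
PATHS = {
    "do_truncate": ["i_sem", "i_alloc_sem", "glock"],
    "do_write_direct": ["glock", "i_sem", "i_alloc_sem"],
    "gfs_read": ["glock", "i_sem"],
}


def order_edges(paths):
    """Map (held, wanted) -> list of paths that request `wanted` while holding `held`."""
    edges = {}
    for name, locks in paths.items():
        for i, held in enumerate(locks):
            for wanted in locks[i + 1:]:
                edges.setdefault((held, wanted), []).append(name)
    return edges


def inversions(paths):
    """Return (a, b, paths taking a then b, paths taking b then a) for each conflicting pair."""
    edges = order_edges(paths)
    found = []
    for (a, b), forward in edges.items():
        # Report each unordered pair once, keyed on the alphabetically smaller name.
        if a < b and (b, a) in edges:
            found.append((a, b, forward, edges[(b, a)]))
    return found


if __name__ == "__main__":
    for a, b, forward, backward in inversions(PATHS):
        print(f"{a} -> {b} in {forward}, but {b} -> {a} in {backward}")
```

In this model the glock is the only lock whose position differs between the VFS truncate path and the GFS I/O paths, which matches the two hung call traces in this report.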
         



Version-Release number of selected component (if applicable):
2.6.9-22.ELsmp

How reproducible:
Sometimes

Steps to Reproduce:
1. (Running Oracle performance stress test)
2. Haven't tried a simpler test case.
  

Actual Results:  Oracle processes hang.

Additional info:

oracle        D 0000010061fdfe38     0  4588      1          4619  4571 (NOTLB)
Call Trace:<ffffffff803034ff>{wait_for_completion+167} <ffffffff80132e8d>{default_wake_function+0}
           <ffffffffa019c83b>{:gfs:do_write_direct+1523} <ffffffff80132e8d>{default_wake_function+0}
           <ffffffffa01884a5>{:gfs:glock_wait_internal+350} <ffffffffa0188cd2>{:gfs:gfs_glock_nq+961}
           <ffffffffa0188efb>{:gfs:gfs_glock_nq_init+20} <ffffffffa01a09ae>{:gfs:gfs_setattr+75}
           <ffffffff80132ede>{__wake_up_common+67} <ffffffff80190126>{notify_change+340}
           <ffffffff801756ed>{do_truncate+135} <ffffffff801759be>{sys_ftruncate+248}
oracle        D 00000100e69f3270     0  4548      1          4550  4546 (NOTLB)
Call Trace:<ffffffff8030353f>{wait_for_completion+231} <ffffffff8030353f>{wait_for_completion+231}
           <ffffffff80132e8d>{default_wake_function+0} <ffffffff80302637>{__down+147}
           <ffffffff80132e8d>{default_wake_function+0} <ffffffffa01884a5>{:gfs:glock_wait_internal+350}
           <ffffffff80303c13>{__down_failed+53} <ffffffffa019daa7>{:gfs:.text.lock.ops_file+15}
           <ffffffff801313f5>{recalc_task_prio+337} <ffffffff80131483>{activate_task+124}
           <ffffffff80131931>{try_to_wake_up+734} <ffffffffa019bcca>{:gfs:walk_vm+265}
           <ffffffffa019c248>{:gfs:do_write_direct+0} <ffffffffa019ce66>{:gfs:gfs_write+194}
           <ffffffff80177098>{vfs_write+207} <ffffffff80177275>{sys_pwrite64+86}

Comment 1 Wendy Cheng 2005-11-23 05:36:03 UTC
Got patch ready - tested by:

1. Run sct's Verify-data (with a small tweak so it can run across multiple
nodes), writing endlessly to a GFS file.
2. Run a simple program that calls ftruncate() in an endless loop on the same
file that Verify-data is writing.

The processes get locked up easily without the patch. With the patch, it seems
to be able to run forever.
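
The truncate program from step 2 is not attached to this bug. A minimal sketch of what such a program might look like follows; the function name truncate_loop and its parameters are assumptions made for illustration. Passing iterations=None loops until interrupted, which corresponds to the "forever" case described above.

```python
import os


def truncate_loop(path, iterations=None, length=0):
    """Call ftruncate() on `path` repeatedly; iterations=None loops until interrupted."""
    fd = os.open(path, os.O_RDWR)
    count = 0
    try:
        while iterations is None or count < iterations:
            os.ftruncate(fd, length)  # the ftruncate(2) path seen in the first call trace
            count += 1
    finally:
        os.close(fd)
    return count
```

To exercise the race it would be run against the file being written by Verify-data, for example truncate_loop("/mnt/gfs/testfile"), where the path is a placeholder for the real GFS file.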

Will send the RPMs to Oracle test to further verify the patch.

Comment 2 Wendy Cheng 2005-11-23 05:39:55 UTC
Created attachment 121385
gfs_kernel_i_alloc.patch

Base kernel patch.

Comment 3 Wendy Cheng 2005-11-23 05:40:37 UTC
Created attachment 121386
gfs_i_alloc.patch

GFS patch.

Comment 4 Wendy Cheng 2005-12-14 20:43:45 UTC
Code checked into CVS. Moving this bug to MODIFIED state.

Comment 7 Red Hat Bugzilla 2006-03-09 19:46:25 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2006-0234.html