173913 – GFS deadlock - gfs_write (do_write_direct) and gfs_setattr (do_truncate)


Summary: GFS deadlock - gfs_write (do_write_direct) and gfs_setattr (do_truncate)

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Cluster Suite
Classification:	Retired
Component:	gfs
Sub Component:
Version:	4
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Assignee:	Wendy Cheng
QA Contact:	GFS Bugs
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	164915

Reported:	2005-11-22 15:40 UTC by Wendy Cheng
Modified:	2010-01-12 03:08 UTC
CC List:	1 user
Fixed In Version:	RHBA-2006-0234
Clone Of:
Environment:
Last Closed:	2006-03-09 19:46:25 UTC
Embargoed:

Attachments
gfs_kernel_i_alloc.patch (1.31 KB, patch) 2005-11-23 05:39 UTC, Wendy Cheng	no flags
gfs_i_alloc.patch (1.78 KB, patch) 2005-11-23 05:40 UTC, Wendy Cheng	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2006:0234	0	normal	SHIPPED_LIVE	GFS-kernel bug fix update	2006-03-09 05:00:00 UTC

Description Wendy Cheng 2005-11-22 15:40:27 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.7) Gecko/20050427 Red Hat/1.7.7-1.1.3.4

Description of problem:
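GFS deadlocks because of the following lock sequences:

(vfs) do_truncate {
        down(.... i_sem);
        down_write(.... i_alloc_sem);
        err = notify_change(dentry, &newattrs);  -> calls gfs_setattr -> glock
        up_write(.... i_alloc_sem);
        up(.... i_sem);
      }

(gfs) do_write_direct {
        glock
        down(.... i_sem)
        __blockdev_direct_IO -> down(.... i_alloc_sem) -> up(.... i_alloc_sem)
        up(.... i_sem)
      }

(gfs) gfs_read {
        glock
        __blockdev_direct_IO -> down(.... i_sem) -> up(.... i_sem)
      }

do_truncate takes i_sem (and i_alloc_sem) before the glock, while
do_write_direct and the direct-I/O read path take the glock before i_sem.
When a truncate races with a direct write (or read) on the same file, each
process can end up holding its first lock while waiting for the other's, and
neither makes progress. The call traces below match this: pid 4588
(sys_ftruncate -> gfs_setattr) is waiting for the glock, and pid 4548
(sys_pwrite64 -> gfs_write -> do_write_direct) is blocked on a semaphore
(__down_failed).
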
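The lock-order inversion in the sequences above can be checked mechanically. The following is a minimal user-space sketch, not GFS or kernel code: the enum `lock_id`, the arrays and `record_path()` are hypothetical names, every lock is treated as exclusive (a simplification, since i_alloc_sem is a read/write semaphore), and each path is assumed to hold all of its earlier locks while taking the next one. It records which lock was taken while holding which, and flags any pair that a later path takes in the reverse order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical model: the three locks named in the report. */
enum lock_id { L_I_SEM, L_I_ALLOC_SEM, L_GLOCK, L_NLOCKS };

static const char *const lock_name[L_NLOCKS] = { "i_sem", "i_alloc_sem", "glock" };

/* Acquisition order of each path, outermost lock first (from the report). */
static const enum lock_id truncate_path[]     = { L_I_SEM, L_I_ALLOC_SEM, L_GLOCK };
static const enum lock_id write_direct_path[] = { L_GLOCK, L_I_SEM, L_I_ALLOC_SEM };
static const enum lock_id read_direct_path[]  = { L_GLOCK, L_I_SEM };

/* order_seen[a][b] is true once some path has taken b while holding a. */
static bool order_seen[L_NLOCKS][L_NLOCKS];

/* Record every (held, acquired) pair of one nested path and return how many
 * of those pairs reverse an order recorded by an earlier path. */
static int record_path(const char *path, const enum lock_id *seq, size_t n)
{
    int inversions = 0;

    for (size_t j = 1; j < n; j++) {
        for (size_t i = 0; i < j; i++) {
            enum lock_id held = seq[i], want = seq[j];

            assert(held < L_NLOCKS && want < L_NLOCKS);
            if (order_seen[want][held]) {
                printf("%s: takes %s while holding %s, reverse of an earlier path\n",
                       path, lock_name[want], lock_name[held]);
                inversions++;
            }
            order_seen[held][want] = true;
        }
    }
    return inversions;
}
```

Recording `truncate_path` first and then `write_direct_path` reports two inversions (glock/i_sem and glock/i_alloc_sem); `read_direct_path` then reports the glock/i_sem inversion. This is a toy version of the dependency tracking that the kernel's lockdep validator (added in later kernels) performs.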
Version-Release number of selected component (if applicable):
2.6.9-22.ELsmp

How reproducible:
Sometimes

Steps to Reproduce:
1. Running the Oracle performance stress test.
2. Haven't tried a simpler test case yet.


Actual Results:  Oracle processes hang.

Additional info:

oracle        D 0000010061fdfe38     0  4588      1          4619  4571 (NOTLB)
Call Trace:<ffffffff803034ff>{wait_for_completion+167} <ffffffff80132e8d>{default_wake_function+0}
           <ffffffffa019c83b>{:gfs:do_write_direct+1523} <ffffffff80132e8d>{default_wake_function+0}
           <ffffffffa01884a5>{:gfs:glock_wait_internal+350} <ffffffffa0188cd2>{:gfs:gfs_glock_nq+961}
           <ffffffffa0188efb>{:gfs:gfs_glock_nq_init+20} <ffffffffa01a09ae>{:gfs:gfs_setattr+75}
           <ffffffff80132ede>{__wake_up_common+67} <ffffffff80190126>{notify_change+340}
           <ffffffff801756ed>{do_truncate+135} <ffffffff801759be>{sys_ftruncate+248}
oracle        D 00000100e69f3270     0  4548      1          4550  4546 (NOTLB)
Call Trace:<ffffffff8030353f>{wait_for_completion+231} <ffffffff8030353f>{wait_for_completion+231}
           <ffffffff80132e8d>{default_wake_function+0} <ffffffff80302637>{__down+147}
           <ffffffff80132e8d>{default_wake_function+0} <ffffffffa01884a5>{:gfs:glock_wait_internal+350}
           <ffffffff80303c13>{__down_failed+53} <ffffffffa019daa7>{:gfs:.text.lock.ops_file+15}
           <ffffffff801313f5>{recalc_task_prio+337} <ffffffff80131483>{activate_task+124}
           <ffffffff80131931>{try_to_wake_up+734} <ffffffffa019bcca>{:gfs:walk_vm+265}
           <ffffffffa019c248>{:gfs:do_write_direct+0} <ffffffffa019ce66>{:gfs:gfs_write+194}
           <ffffffff80177098>{vfs_write+207} <ffffffff80177275>{sys_pwrite64+86}
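Together, the two traces form a wait-for cycle. The following minimal C sketch (hypothetical, not GFS code) encodes the hang as a small `struct task` table in which each blocked process points at the process assumed to hold the lock it is waiting for. Which lock each process holds is inferred from the lock sequences in the description, not read from the traces themselves. Following the wait edges from either process leads back to where it started, which distinguishes a deadlock from ordinary contention.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TASKS 8

/* One entry per blocked process in a (hypothetical) hang snapshot. */
struct task {
    int pid;
    int waits_for;          /* index of the task holding the awaited lock, or -1 */
    const char *blocked_on; /* lock being waited for, for readability only */
};

/* Snapshot inferred from the two call traces plus the lock sequences. */
static const struct task hang_snapshot[] = {
    /* do_truncate holds i_sem/i_alloc_sem; gfs_setattr waits for the glock */
    { 4588, 1, "glock" },
    /* do_write_direct holds the glock; waits in __down_failed (assumed i_sem) */
    { 4548, 0, "i_sem" },
};

/* Walk the wait-for chain from task `start`. Each task waits on at most one
 * other task, so revisiting a task means the chain is a cycle (deadlock).
 * Returns the pid at which the cycle closes, or 0 if the chain ends. */
static int find_wait_cycle(const struct task *t, size_t ntasks, int start)
{
    int seen[MAX_TASKS] = { 0 };

    assert(ntasks <= MAX_TASKS);
    for (int cur = start; cur >= 0 && (size_t)cur < ntasks; cur = t[cur].waits_for) {
        if (seen[cur])
            return t[cur].pid;
        seen[cur] = 1;
    }
    return 0;
}
```

Starting from pid 4588 the chain 4588 -> 4548 -> 4588 closes, so the function returns 4588; if 4588 were not blocked (`waits_for = -1`), the chain would simply end and the function would return 0.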

Comment 1 Wendy Cheng 2005-11-23 05:36:03 UTC

Patch is ready. Tested as follows:

1. Run sct's Verify-data (slightly tweaked so it can run on multiple nodes),
writing continuously to a file on GFS.
2. Run a simple program that calls ftruncate() in an endless loop on the same
file that Verify-data is writing.

Without the patch, the processes lock up easily. With the patch, the test
appears to run indefinitely.

Will send the RPMs for the Oracle test to further verify the patch.
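The truncate side of this test can be sketched as follows. This is a hypothetical stand-in for the "simple program" mentioned in step 2, not the one actually used: the name `truncate_loop`, the alternation between size 0 and `max_size`, and the optional finite iteration count are assumptions. Verify-data itself is not reproduced here. To reproduce the race, the loop would have to run concurrently with a writer on the same GFS file; on its own it only exercises the truncate path.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical truncate-side stressor: repeatedly ftruncate() the file that a
 * concurrent writer is writing. iterations <= 0 loops forever, as in the
 * original test. Returns the final file size, or -1 on error. */
static off_t truncate_loop(const char *path, long iterations, off_t max_size)
{
    struct stat st;
    int fd;

    assert(max_size >= 0);
    fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd < 0) {
        perror("open");
        return -1;
    }
    for (long i = 0; iterations <= 0 || i < iterations; i++) {
        /* Alternate shrinking to zero and extending to max_size so that
         * each call actually changes the file size. */
        off_t len = (i % 2 == 0) ? 0 : max_size;

        if (ftruncate(fd, len) != 0) {
            perror("ftruncate");
            close(fd);
            return -1;
        }
    }
    if (fstat(fd, &st) != 0) {
        perror("fstat");
        close(fd);
        return -1;
    }
    close(fd);
    return st.st_size;
}
```

With a finite count the final size is predictable: an even number of iterations leaves the file at `max_size`, an odd number leaves it empty.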

Comment 2 Wendy Cheng 2005-11-23 05:39:55 UTC

Created attachment 121385 [details]
gfs_kernel_i_alloc.patch

Base kernel patch.

Comment 3 Wendy Cheng 2005-11-23 05:40:37 UTC

Created attachment 121386 [details]
gfs_i_alloc.patch

GFS patch.

Comment 4 Wendy Cheng 2005-12-14 20:43:45 UTC

Code checked into CVS. Moving this bugzilla into MODIFIED state.

Comment 7 Red Hat Bugzilla 2006-03-09 19:46:25 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2006-0234.html
