Bug 173203

Summary: Fix i_sem(aphore) logic in error code path
Product: [Retired] Red Hat Cluster Suite
Reporter: Wendy Cheng <nobody+wcheng>
Component: gfs
Assignee: Wendy Cheng <nobody+wcheng>
Status: CLOSED ERRATA
QA Contact: GFS Bugs <gfs-bugs>
Severity: medium
Priority: medium    
Version: 4   
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2006-10-17 03:42:15 UTC

Description Wendy Cheng 2005-11-14 22:19:36 UTC
Description of problem:
gfs_write() acquires the inode's i_sem (semaphore) before handing control to a
lower-level routine such as do_write_direct(). A patch added via bugzilla 171488
fixed a deadlock by dropping i_sem before requesting an exclusive glock within
do_write_direct() and re-acquiring it once the glock request completes.
The "down(&inode->i_sem)" call should have been placed immediately after the
gfs_glock_nq_m() call, but it currently comes after the if (error) clause:

 restart:
        up(&inode->i_sem);
                                                                                
        gfs_holder_init(ip->i_gl, state, 0, &ghs[num_gh]);
                                                                                
        error = gfs_glock_nq_m(num_gh + 1, ghs);
        if (error)
                goto out;
                                                                                
        down(&inode->i_sem);

If gfs_glock_nq_m() returns an error (which rarely happens), control returns
to gfs_write() without i_sem held. Because gfs_write() acquired i_sem and
expects it to still be held, the semaphore count will be incorrect from then
on. A new patch is needed to correct this:

--- gfs.old/src/gfs/ops_file.c  2005-11-11 10:03:09.000000000 -0500
+++ gfs.new/src/gfs/ops_file.c  2005-11-11 10:04:24.000000000 -0500
@@ -603,11 +603,12 @@ do_write_direct(struct file *file, char
        gfs_holder_init(ip->i_gl, state, 0, &ghs[num_gh]);
                                                                                
        error = gfs_glock_nq_m(num_gh + 1, ghs);
-       if (error)
-               goto out;
                                                                                
        down(&inode->i_sem);
 
+       if (error)
+               goto out;
+
        error = -EINVAL;
        if (gfs_is_jdata(ip))
                goto out_gunlock;
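
The effect of the reordering can be illustrated with a small user-space sketch.
This is not the GFS code: sem_count, sem_down(), sem_up(), fake_glock_nq_m(),
write_direct_old(), write_direct_new() and run_caller() are hypothetical
stand-ins, and the sketch assumes that the caller releases the semaphore once
after the lower-level routine returns. It only models the semaphore count as a
plain integer to show why the error path must re-acquire i_sem before bailing
out.

```c
#include <assert.h>

/* Hypothetical stand-in for inode->i_sem: 1 means free, 0 means held. */
int sem_count = 1;

void sem_down(void) { sem_count--; }  /* models down(&inode->i_sem) */
void sem_up(void)   { sem_count++; }  /* models up(&inode->i_sem) */

/* Hypothetical stand-in for gfs_glock_nq_m(): fails when asked to. */
int fake_glock_nq_m(int fail) { return fail ? -1 : 0; }

/* Old ordering: the error check runs before the semaphore is re-acquired. */
int write_direct_old(int fail)
{
        int error;

        sem_up();
        error = fake_glock_nq_m(fail);
        if (error)
                return error;   /* returns with the semaphore not held */
        sem_down();
        return 0;
}

/* Patched ordering: re-acquire the semaphore first, then check the error. */
int write_direct_new(int fail)
{
        int error;

        sem_up();
        error = fake_glock_nq_m(fail);
        sem_down();
        if (error)
                return error;   /* returns with the semaphore held */
        return 0;
}

/*
 * Caller loosely modeled on gfs_write() (assumption: it takes the semaphore,
 * calls the lower-level routine, then releases the semaphore exactly once).
 * Returns the final count; 1 means the down/up calls were balanced.
 */
int run_caller(int (*write_direct)(int), int fail)
{
        sem_count = 1;
        sem_down();
        assert(sem_count == 0);         /* caller holds the semaphore */
        (void)write_direct(fail);
        sem_up();
        return sem_count;
}
```

In this model, run_caller(write_direct_old, 1) ends with a count of 2, meaning
the semaphore was released once more than it was acquired, while
run_caller(write_direct_new, 1) ends with the balanced count of 1. Both
orderings end at 1 when the glock request succeeds.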


Additional info:
Since feist has already included the patch from bugzilla 171488 in his new
build, 171488 can't be reused. Opening this new bugzilla to log the change.

Comment 1 Wendy Cheng 2005-11-14 22:22:05 UTC
Found this issue while doing self code review.  

Comment 2 Wendy Cheng 2005-11-14 22:50:53 UTC
Changes checked into CVS RHEL 4 branch.