Bug 1943467 - AFR does not release eager locks for fsync when there is contention
Summary: AFR does not release eager locks for fsync when there is contention
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: replicate
Version: rhgs-3.5
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: RHGS 3.5.z Batch Update 7
Assignee: Karthik U S
QA Contact: SATHEESARAN
URL:
Whiteboard:
Duplicates: 2276826 (view as bug list)
Depends On:
Blocks: 1915037
 
Reported: 2021-03-26 06:10 UTC by Ravishankar N
Modified: 2024-12-20 19:49 UTC (History)
9 users

Fixed In Version: glusterfs-6.0-57
Doc Type: Bug Fix
Doc Text:
Previously, in AFR, the fsync FOP did not release the active client's eager lock on an inode when there was a conflicting request from another client. As a result, FOPs on RHHI-V workloads could time out (call-bail) behind the blocked locks, leading to paused VMs. With this update, AFR's fsync is made aware of conflicting locks and releases the active lock when there is contention.
Clone Of:
Environment:
Last Closed: 2021-10-05 07:56:26 UTC
Embargoed:


Attachments (Terms of Use)
C program to test fsync hang. (1.27 KB, text/plain)
2021-03-26 06:10 UTC, Ravishankar N
Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2021:3729 0 None None None 2021-10-05 07:56:42 UTC

Description Ravishankar N 2021-03-26 06:10:11 UTC
Created attachment 1766492 [details]
C program to test fsync hang.

Description:

The issue was found by Xavi while working on customer BZ 1915037; see https://bugzilla.redhat.com/show_bug.cgi?id=1915037#c7 for details. We do not yet know how the issue leads to a VM pause, but the fix is nevertheless needed as a first step toward reducing the occurrence of the problem.

Reproducer:
1. Create a 1x3 replicate volume with the RHHI volume options enabled (virt profile).
2. Run the attached program against the same file from two different FUSE mounts of the same volume.
3. The program writes and fsyncs the file in a loop. Without the fix, fsync on the second mount hangs forever.

Comment 15 SATHEESARAN 2021-08-23 16:09:01 UTC
Verified with an RHGS 3.5.5 interim build (glusterfs-6.0-59.el8rhgs):
1. First part of verification with RHGS 3.5.4 (glusterfs-6.0-56.2.el8rhgs):
       a. Ran the C program from one host, passing a file on the replicate Gluster volume as the argument.
       b. Ran the C program from another host against the same file on the FUSE-mounted Gluster volume.
     Observed that fsync hung.

2. Upgraded glusterfs to 6.0-59.el8rhgs and repeated the same test; the fsync hang was not observed.

3. Ran the regression tests

Comment 17 errata-xmlrpc 2021-10-05 07:56:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (RHGS 3.5.z Batch Update 5 glusterfs bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3729

Comment 19 Mohit Agrawal 2024-04-25 04:56:41 UTC
*** Bug 2276826 has been marked as a duplicate of this bug. ***

