Bug 1776152 - glusterfsd does not release the POSIX lock when multiple glusterfs clients run flock -xo on the same file in parallel
Summary: glusterfsd does not release the POSIX lock when multiple glusterfs clients run flock ...
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: GlusterFS
Classification: Community
Component: locks
Version: 7
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Susant Kumar Palai
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 1779089 1851315
 
Reported: 2019-11-25 09:03 UTC by Hunang Shujun
Modified: 2020-06-26 06:39 UTC
CC List: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1779089
Environment:
Last Closed: 2020-03-12 14:25:17 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:


Attachments
gluster log and statedump; very easy to reproduce (79.84 KB, application/zip)
2019-11-25 09:03 UTC, Hunang Shujun
some analysis about this problem (25.50 KB, application/x-ole-storage)
2019-11-28 06:41 UTC, zhou lin
patch for this issue (1.32 KB, patch)
2019-11-29 09:17 UTC, zhou lin

Description Hunang Shujun 2019-11-25 09:03:27 UTC
Created attachment 1639407
gluster log and statedump; very easy to reproduce

Description of problem:
glusterfsd does not release the POSIX lock when multiple glusterfs clients run flock -xo on the same file in parallel.

Version-Release number of selected component (if applicable):
glusterfs 7.0

How reproducible:
Very easy; the scripts below get stuck within a few iterations.

Steps to Reproduce:
1. create a volume with one brick
   gluster volume create test3  192.168.0.14:/mnt/vol3-test force
2. mount the brick on two different node
  node name: node2
       mkdir /mnt/test-vol3
       mount -t glusterfs 192.168.0.14:/test3 /mnt/test-vol3
  node name: test
       mkdir /mnt/test-vol3
       mount -t glusterfs 192.168.0.14:/test3 /mnt/test-vol3

3. Prepare the same flock script on both nodes (a minimal C equivalent is sketched after the steps):
  [root@node2 ~]# vi flock.sh 

#!/bin/bash
file=/mnt/test-vol3/test.log
touch $file
(

         flock -xo 200
         echo "client1 do something" > $file
         sleep 1

 ) 200>$file
[root@node2 ~]# vi repeat_flock.sh 

#!/bin/bash
i=1
while [ "1" = "1" ]
do
    ./flock.sh
    ((i=i+1))
    echo $i
done
Similar scripts on the "test" node:
[root@test ~]# vi flock.sh 

#!/bin/bash
file=/mnt/test-vol3/test.log
touch $file
(
         flock -xo 200
         echo "client2 do something" > $file
         sleep 1

 ) 200>$file

[root@test ~]# vi repeat_flock.sh 

#!/bin/bash
i=1
while [ "1" = "1" ]
do
    ./flock.sh
    ((i=i+1))
    echo $i
done

4. Start repeat_flock.sh on both nodes.
   It does not take long for the two scripts to get stuck:

   [root@test ~]# ./repeat_flock.sh
2
3
4
5
6
7
   [root@node2 ~]# ./repeat_flock.sh
2
issue reproduced

5. Take a statedump of the volume test3:
  gluster v statedump test3
[xlator.features.locks.test3-locks.inode]
path=/test.log
mandatory=0
posixlk-count=3
posixlk.posixlk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 22752, owner=8c9cd93f8ee486a0, client=0x7f76e8082100, connection-id=CTX_ID:7da20ab3-cc70-41bd-ab83-955481288ba2-GRAPH_ID:0-PID:22649-HOST:node2-PC_NAME:test3-client-0-RECON_NO:-0, blocked at 2019-11-25 08:30:12, granted at 2019-11-25 08:30:12
posixlk.posixlk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 10928, owner=b42ee151db035df9, client=0x7f76e0006390, connection-id=CTX_ID:c4cf488c-2d8e-4f7c-87e9-a0cb1f2648cd-GRAPH_ID:0-PID:10850-HOST:test-PC_NAME:test3-client-0-RECON_NO:-0, blocked at 2019-11-25 08:30:12
posixlk.posixlk[2](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 22757, owner=f62dd9ff96cefaf5, client=0x7f76e8082100, connection-id=CTX_ID:7da20ab3-cc70-41bd-ab83-955481288ba2-GRAPH_ID:0-PID:22649-HOST:node2-PC_NAME:test3-client-0-RECON_NO:-0, blocked at 2019-11-25 08:30:13
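
For reference, a minimal C equivalent of the flock.sh/repeat_flock.sh loop above, written directly against flock(2). This is only a sketch under the same assumptions as the steps (the /mnt/test-vol3 mount point from step 2 and the same test.log file); flock -x maps onto LOCK_EX, and closing the descriptor is what releases the lock:

/* C equivalent of the shell loop: open, flock -x, write, sleep, close.
 * Assumes the /mnt/test-vol3 mount from step 2. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/file.h>
#include <unistd.h>

int
main (void)
{
        const char *path = "/mnt/test-vol3/test.log";
        for (int i = 1; ; i++) {
                int fd = open (path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
                if (fd < 0) {
                        perror ("open");
                        return 1;
                }
                if (flock (fd, LOCK_EX) < 0) {  /* like flock -x */
                        perror ("flock");
                        return 1;
                }
                dprintf (fd, "client do something\n");
                sleep (1);
                close (fd);     /* closing the fd releases the lock */
                printf ("%d\n", i);
        }
}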


Actual results:
Both repeat_flock.sh scripts get stuck and the lock is held forever; the statedump shows the granted lock still ACTIVE with two requests BLOCKED behind it.

Expected results:
Neither of the two repeat_flock.sh scripts should get stuck.

Additional info:

Comment 1 zhou lin 2019-11-28 06:41:39 UTC
Created attachment 1640300
some analysis about this problem

Comment 2 zhou lin 2019-11-29 03:08:45 UTC
I tried adding an unref in grant_blocked_locks just before the stack unwind, and it seems to work.
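
To sketch the idea (hypothetical names only, not the actual locks-translator code; the real patch is the review linked in comment 4): if the frame local that pins the granted-lock state keeps its reference after the grant is unwound to the client, the lock stays ACTIVE forever, which matches the statedump above. A toy model of the leak and the proposed unref:

/* Toy model of the leak and the fix this comment proposes.  All names
 * here are hypothetical stand-ins, not GlusterFS's actual API. */
#include <stdio.h>
#include <stdlib.h>

typedef struct {
        int refcount;           /* pins the server-side lock state */
} lock_local_t;

static void
local_unref (lock_local_t *local)
{
        if (--local->refcount == 0) {
                printf ("lock state freed -> lock released\n");
                free (local);
        }
}

/* Buggy path: replies to the client (the "stack unwind") but never
 * drops the ref taken when the request was queued as BLOCKED, so the
 * granted lock stays ACTIVE forever. */
static void
grant_blocked_lock_buggy (lock_local_t *local)
{
        printf ("granted (ref %d leaked)\n", local->refcount);
}

/* Fixed path, per this comment: unref right after the unwind. */
static void
grant_blocked_lock_fixed (lock_local_t *local)
{
        printf ("granted\n");
        local_unref (local);    /* the added unref */
}

int
main (void)
{
        lock_local_t *leaked = calloc (1, sizeof (*leaked));
        leaked->refcount = 1;   /* ref taken when the lock was queued */
        grant_blocked_lock_buggy (leaked);      /* never released */

        lock_local_t *fixed = calloc (1, sizeof (*fixed));
        fixed->refcount = 1;
        grant_blocked_lock_fixed (fixed);       /* released on grant */
        return 0;
}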

Comment 3 zhou lin 2019-11-29 09:17:23 UTC
Created attachment 1640612
patch for this issue

Please review the patch for this issue.

Comment 4 Worker Ant 2019-12-03 05:52:33 UTC
REVIEW: https://review.gluster.org/23794 (add clean local after grant lock) posted (#1) for review on master by None

Comment 5 Worker Ant 2019-12-04 01:32:10 UTC
REVISION POSTED: https://review.gluster.org/23794 (add clean local after grant lock) posted (#5) for review on master by None

Comment 6 Worker Ant 2020-03-12 14:25:17 UTC
This bug has been moved to https://github.com/gluster/glusterfs/issues/1046 and will be tracked there from now on. Visit the GitHub issue for further details.

