Bug 1776152
Summary: glusterfsd does not release the POSIX lock when multiple glusterfs clients run flock -xo on the same file in parallel

| Field | Value |
|---|---|
| Product | [Community] GlusterFS |
| Component | locks |
| Version | 7 |
| Hardware | Unspecified |
| OS | Unspecified |
| Status | CLOSED UPSTREAM |
| Severity | high |
| Priority | unspecified |
| Reporter | Hunang Shujun <shujun.huang> |
| Assignee | Susant Kumar Palai <spalai> |
| QA Contact | |
| Docs Contact | |
| CC | bugs, spalai, zz.sh.cynthia |
| Keywords | Triaged |
| Target Milestone | --- |
| Target Release | --- |
| Whiteboard | |
| Fixed In Version | |
| Doc Type | If docs needed, set a value |
| Doc Text | |
| Story Points | --- |
| Clone Of | |
| Clones | 1779089 (view as bug list) |
| Environment | |
| Last Closed | 2020-03-12 14:25:17 UTC |
| Type | Bug |
| Regression | --- |
| Mount Type | --- |
| Documentation | --- |
| CRM | |
| Verified Versions | |
| Category | --- |
| oVirt Team | --- |
| RHEL 7.3 requirements from Atomic Host | |
| Cloudforms Team | --- |
| Target Upstream Version | |
| Embargoed | |
| Bug Depends On | |
| Bug Blocks | 1779089, 1851315 |
Created attachment 1640300 [details]
some analysis about this problem
I tried adding an unref in grant_blocked_locks just before the stack unwind, and it seems to work (a rough sketch of the idea is below).

Created attachment 1640612 [details]
patch for this issue
Please review the patch for this issue.
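
To make the idea concrete, here is a rough C sketch of what "clean local after grant lock" means inside the locks translator. It is only an illustration of the suspicion from the analysis attachment; the names pl_inode_t, posix_lock_t, __grant_blocked_locks, pl_local_unref and __destroy_lock are stand-ins modeled on the translator's internals, and this is not the code of the posted patch.

```c
/* Illustration only -- not the change posted on review.gluster.org/23794.
 * Suspicion from the analysis attachment: when a BLOCKED posix lock is
 * finally granted, the waiting frame is unwound to the client, but the
 * per-call state ("local") attached to that frame is never released, so
 * the brick keeps a reference and the lock entry stays pinned forever.  */

#include "locks.h"    /* glusterfs-internal headers, xlators/features/locks/src */
#include "common.h"

static void
grant_blocked_locks_sketch (xlator_t *this, pl_inode_t *pl_inode)
{
        struct list_head  granted;
        posix_lock_t     *lock = NULL;
        posix_lock_t     *tmp  = NULL;

        INIT_LIST_HEAD (&granted);

        pthread_mutex_lock (&pl_inode->mutex);
        {
                /* move every blocked lock that can now be granted onto
                 * the local 'granted' list (assumed helper)            */
                __grant_blocked_locks (this, pl_inode, &granted);
        }
        pthread_mutex_unlock (&pl_inode->mutex);

        list_for_each_entry_safe (lock, tmp, &granted, list)
        {
                pl_local_t *local = lock->frame->local;

                /* the fix suggested in the comment above: release the
                 * per-call state before unwinding, so nothing keeps a
                 * stale reference once the client has its lock         */
                if (local) {
                        lock->frame->local = NULL;
                        pl_local_unref (this, local);   /* assumed helper */
                }

                /* answer the client that was waiting on the lock */
                STACK_UNWIND_STRICT (lk, lock->frame, 0, 0,
                                     &lock->user_flock, NULL);

                list_del_init (&lock->list);
                __destroy_lock (lock);                  /* assumed helper */
        }
}
```

Whether the cleanup belongs exactly before or after the unwind, and which reference is actually leaked, is what the posted patch settles; the sketch only marks the spot where the release appears to be missing.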
REVIEW: https://review.gluster.org/23794 (add clean local after grant lock) posted (#1) for review on master by None

REVISION POSTED: https://review.gluster.org/23794 (add clean local after grant lock) posted (#5) for review on master by None

This bug is moved to https://github.com/gluster/glusterfs/issues/1046, and will be tracked there from now on. Visit the GitHub issue URL for further details.
Created attachment 1639407 [details]
gluster log and statedump; very easy to reproduce

Description of problem:
glusterfsd does not release the POSIX lock when multiple glusterfs clients run "flock -xo" on the same file in parallel.

Version-Release number of selected component (if applicable):
glusterfs 7.0

How reproducible:
Very easily; the hang shows up within a short time of running the scripts below.

Steps to Reproduce:
1. Create a volume with one brick:

gluster volume create test3 192.168.0.14:/mnt/vol3-test force

2. Mount the volume on two different nodes.

On node "node2":
mkdir /mnt/test-vol3
mount -t glusterfs 192.168.0.14:/test3 /mnt/test-vol3

On node "test":
mkdir /mnt/test-vol3
mount -t glusterfs 192.168.0.14:/test3 /mnt/test-vol3

3. Prepare the same pair of scripts on both nodes to flock the same file.

[root@node2 ~]# vi flock.sh
#!/bin/bash
file=/mnt/test-vol3/test.log
touch $file
(
flock -xo 200
echo "client1 do something" > $file
sleep 1
) 200>$file

[root@node2 ~]# vi repeat_flock.sh
#!/bin/bash
i=1
while [ "1" = "1" ]
do
./flock.sh
((i=i+1))
echo $i
done

The scripts on the "test" node are the same, except that flock.sh echoes "client2 do something":

[root@test ~]# vi flock.sh
#!/bin/bash
file=/mnt/test-vol3/test.log
touch $file
(
flock -xo 200
echo "client2 do something" > $file
sleep 1
) 200>$file

[root@test ~]# vi repeat_flock.sh
#!/bin/bash
i=1
while [ "1" = "1" ]
do
./flock.sh
((i=i+1))
echo $i
done

4. Start repeat_flock.sh on both nodes. It does not take long for both scripts to get stuck:

[root@test ~]# ./repeat_flock.sh
2
3
4
5
6
7

[root@node2 ~]# ./repeat_flock.sh
2

Issue reproduced.

5. Take a statedump of volume test3:

gluster v statedump test3

[xlator.features.locks.test3-locks.inode]
path=/test.log
mandatory=0
posixlk-count=3
posixlk.posixlk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 22752, owner=8c9cd93f8ee486a0, client=0x7f76e8082100, connection-id=CTX_ID:7da20ab3-cc70-41bd-ab83-955481288ba2-GRAPH_ID:0-PID:22649-HOST:node2-PC_NAME:test3-client-0-RECON_NO:-0, blocked at 2019-11-25 08:30:12, granted at 2019-11-25 08:30:12
posixlk.posixlk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 10928, owner=b42ee151db035df9, client=0x7f76e0006390, connection-id=CTX_ID:c4cf488c-2d8e-4f7c-87e9-a0cb1f2648cd-GRAPH_ID:0-PID:10850-HOST:test-PC_NAME:test3-client-0-RECON_NO:-0, blocked at 2019-11-25 08:30:12
posixlk.posixlk[2](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 22757, owner=f62dd9ff96cefaf5, client=0x7f76e8082100, connection-id=CTX_ID:7da20ab3-cc70-41bd-ab83-955481288ba2-GRAPH_ID:0-PID:22649-HOST:node2-PC_NAME:test3-client-0-RECON_NO:-0, blocked at 2019-11-25 08:30:13

Actual results:
Both repeat_flock.sh scripts hang, and the lock on /test.log is held forever.

Expected results:
Neither repeat_flock.sh script should get stuck.

Additional info:
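
For anyone who hits the hang before a fix is available, the stale locks can be inspected from the brick side and cleared as a stop-gap. This is only a sketch assuming the volume name, brick host, and file from the reproduction above, plus the default statedump directory; adjust the lock range in the last command to whatever your statedump reports.

```sh
# On the brick node (192.168.0.14 in the reproduction): dump the lock state
gluster volume statedump test3

# Statedumps land in the default dump directory; look at the posix lock
# entries for the stuck file
grep -A 3 posixlk /var/run/gluster/*.dump.*

# Stop-gap only: clear the stale posix locks on /test.log.  The trailing
# range argument must match the lock reported in the statedump (see the
# clear-locks documentation for its exact format); clearing does not fix
# the underlying leak in the locks translator.
gluster volume clear-locks test3 /test.log kind all posix 0,0-1
```

Clearing only unblocks the clients that are currently stuck; the hang can reappear as long as the two flock loops keep racing.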