Bug 1411338 - [GNFS+EC] lock is being granted to 2 different clients for the same data range at a time after performing lock acquire/release from the clients
Summary: [GNFS+EC] lock is being granted to 2 different client for the same data range...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: locks
Version: rhgs-3.2
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: unspecified
Target Milestone: ---
Target Release: RHGS 3.3.1
Assignee: Pranith Kumar K
QA Contact: Manisha Saini
URL:
Whiteboard:
Depends On: 1411344 1444515 1455049 1462121
Blocks: 1475687
 
Reported: 2017-01-09 14:17 UTC by Manisha Saini
Modified: 2018-08-16 08:49 UTC (History)
15 users

Fixed In Version: glusterfs-3.8.4-46
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1472961 (view as bug list)
Environment:
Last Closed: 2017-11-29 03:29:14 UTC
Target Upstream Version:


Attachments (Terms of Use)
annotated tshark output while looking for incorrect behaviour in gNFS/NLM (6.75 KB, text/plain)
2017-07-19 09:16 UTC, Niels de Vos


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2017:3276 0 normal SHIPPED_LIVE glusterfs bug fix update 2017-11-29 08:28:52 UTC

Description Manisha Saini 2017-01-09 14:17:27 UTC
Description of problem:
When the lock is taken by client1 and another client (client2) tries to take the lock, the request from client2 is blocked, as the lock is already granted to client1.

Now release the lock from client1; the lock gets granted to client2.
Now try taking the lock from client1 again. The lock is granted, which should not happen, as the file is already locked by client2.

Version-Release number of selected component (if applicable):
glusterfs-3.8.4-10.el7rhgs.x86_64

How reproducible:
Consistently

Steps to Reproduce:
1. Create disperseVol 2 x (4 + 2) and enable MDCache and GNFS on it
2. Mount the volume from a single server on 2 different clients
3. Create a 512-byte file from 1 client on the mount point
4. Take a lock from client 1. The lock is acquired
5. Try taking the lock from client 2. The lock is blocked (as it is already held by client 1)
6. Release the lock from client 1. Take the lock from client 2
7. Try taking the lock from client 1 again.

Actual results:
Lock is granted to client1

Expected results:
Lock should not be granted to client 1, as the lock is currently held by client 2
Additional info:

Comment 2 Niels de Vos 2017-01-10 11:01:59 UTC
Is this a regression compared to older versions? I would also like to know whether this happens with any type of volume/configuration, or only with disperse.

Comment 3 Manisha Saini 2017-01-10 13:45:32 UTC
Neils,

The issue is only observed with EC volumes.
I tried the same lock test case on Distributed-Replicate+GNFS; the issue was not observed.

Comment 4 Manisha Saini 2017-01-10 13:52:26 UTC
Even when tested with an EC+Fuse mount, the issue was not observed.

This issue is specific to EC+GNFS mounts.

Comment 5 surabhi 2017-01-10 14:01:19 UTC
We are not sure whether this is a regression, as very extensive lock tests have not been done on gNFS in the past few releases.

Comment 6 Atin Mukherjee 2017-01-12 03:37:47 UTC
Pranith - could you check whether the RCA of this BZ is the same as that of BZ 1408705?

Comment 7 Atin Mukherjee 2017-01-31 11:25:12 UTC
I had a chat with Pranith and he says that this BZ is not related to 1408705, as the same test passes with Fuse.

Comment 11 Manisha Saini 2017-04-21 13:45:08 UTC
While verifying whether the issue still persists with the latest gluster bits, I hit bug https://bugzilla.redhat.com/show_bug.cgi?id=1444515, in which the test got stuck at step 1 itself.

Due to this I am unable to proceed further with verification.

Comment 12 Niels de Vos 2017-05-15 10:16:58 UTC
(In reply to Manisha Saini from comment #11)
> While verifying the issue still persist with latest gluster bits,I hit the
> bug -https://bugzilla.redhat.com/show_bug.cgi?id=1444515 ,in which it got
> stuck in step 1 itself . 

Same here...

Comment 14 Pranith Kumar K 2017-06-17 06:49:22 UTC
Manisha,
   I tried to verify this using nfs, but I get the following error:
root@dhcp35-190 - /mnt/nfs 
12:11:36 :) ⚡ /root/a.out a
opening a
opened; hit Enter to lock... 
locking
fcntl failed (Permission denied) <<<----
locked; hit Enter to write... 
Write succeeeded 
locked; hit Enter to unlock... 
unlocking



On fuse mounts I am able to verify that it worked fine after the fix to https://bugzilla.redhat.com/show_bug.cgi?id=1444515.

This is how I mounted:
mount -t nfs -o vers=3 localhost.localdomain:/ec2 /mnt/nfs

As per the documentation (https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Setting%20Up%20Clients/), we don't need to do anything special for NLM to be in action.
Do let me know if I did anything wrong here.

Comment 15 Manisha Saini 2017-06-19 12:53:28 UTC
Hey Pranith,
We need to open the NLM ports on the client for locking.


# rpcinfo -p
   program vers proto   port  service
    100000    4   tcp    111  portmapper
    100000    3   tcp    111  portmapper
    100000    2   tcp    111  portmapper
    100000    4   udp    111  portmapper
    100000    3   udp    111  portmapper
    100000    2   udp    111  portmapper
    100024    1   udp    662  status
    100024    1   tcp    662  status
    100005    1   udp    892  mountd
    100005    1   tcp    892  mountd
    100005    2   udp    892  mountd
    100005    2   tcp    892  mountd
    100005    3   udp    892  mountd
    100005    3   tcp    892  mountd
    100003    3   tcp   2049  nfs
    100003    4   tcp   2049  nfs
    100227    3   tcp   2049  nfs_acl
    100003    3   udp   2049  nfs
    100003    4   udp   2049  nfs
    100227    3   udp   2049  nfs_acl
    100021    1   udp  43091  nlockmgr
    100021    3   udp  43091  nlockmgr
    100021    4   udp  43091  nlockmgr
    100021    1   tcp  42369  nlockmgr
    100021    3   tcp  42369  nlockmgr
    100021    4   tcp  42369  nlockmgr


[root@dhcp37-192 home]# firewall-cmd --add-port=42369/tcp
success
[root@dhcp37-192 home]# firewall-cmd --add-port=42369/tcp --permanent
success
[root@dhcp37-192 home]# firewall-cmd --add-port=43091/udp
success
[root@dhcp37-192 home]# firewall-cmd --add-port=43091/udp --permanent

# ./a.out /mnt/gnfs/1G 
opening /mnt/gnfs/1G
opened; hit Enter to lock... 
locking
locked; hit Enter to write... 
Write succeeeded 
locked; hit Enter to unlock... 
unlocking

Comment 16 Manisha Saini 2017-06-21 07:35:18 UTC
Tested the use case reported in this bug with the scratch RPMs provided by Pranith.

I was able to take the lock from 2 clients at a time on the same file (taking the lock on the same file at the same time should not be allowed)

1st client -> Took a lock on file 1 -> Lock granted
2nd client -> Tried taking the lock on the same file -> Lock blocked
1st client -> Released the lock
2nd client -> Lock got acquired
1st client -> Tried taking the lock on the same file -> Lock got acquired (even though it was still held by client 2)
1st client -> Released the lock
2nd client -> fcntl UNLOCK failed (No locks available)

GNFS even crashed, as in the already reported issue https://bugzilla.redhat.com/show_bug.cgi?id=1411344




Client 1:

[root@dhcp37-192 home]# ./a.out /mnt/gnfs_scratch/1G
opening /mnt/gnfs_scratch/1G
opened; hit Enter to lock...
locking
locked; hit Enter to write...
Write succeeeded
locked; hit Enter to unlock...
unlocking
[root@dhcp37-192 home]# ./a.out /mnt/gnfs_scratch/1G
opening /mnt/gnfs_scratch/1G
opened; hit Enter to lock...
locking
locked; hit Enter to write...
Write succeeeded
locked; hit Enter to unlock...
unlocking


Client 2:

[root@dhcp37-142 gnfs_scratch]# ls
1G
[root@dhcp37-142 gnfs_scratch]# cd /home/
[root@dhcp37-142 home]# ./a.out /mnt/gnfs_scratch/1G
opening /mnt/gnfs_scratch/1G
opened; hit Enter to lock...
locking
locked; hit Enter to write...
Write succeeeded
locked; hit Enter to unlock...
unlocking
fcntl UNLOCK failed (No locks available)

Comment 18 Niels de Vos 2017-07-05 07:56:38 UTC
With the changes posted for bug 1411344, this test passes as well (for me).

Comment 19 Atin Mukherjee 2017-07-05 10:19:55 UTC
nfs: make nfs3_call_state_t refcounted
- https://review.gluster.org/17696

nfs/nlm: unref fds in nlm_client_free()
- https://review.gluster.org/17697

nfs/nlm: handle reconnect for non-NLM4_LOCK requests
- https://review.gluster.org/17698

nfs/nlm: use refcounting for nfs3_call_state_t
- https://review.gluster.org/17699

nfs/nlm: keep track of the call-state and frame for notifications
- https://review.gluster.org/17700

Comment 22 Manisha Saini 2017-07-17 13:34:06 UTC

I can reproduce the same issue with glusterfs-3.8.4-33.el7rhgs.x86_64 build 

Steps:
1. Mount the volume on two different clients.
2. Take a lock from client 1 on a 1G file -> Lock granted

[root@dhcp37-192 home]# ./a.out /mnt/GNFS_33/1G 
opening /mnt/GNFS_33/1G
opened; hit Enter to lock... 
locking
locked; hit Enter to write... 
Write succeeeded 
locked; hit Enter to unlock...

3. Take the lock on the same file from client 2 -> Lock is blocked

[root@dhcp37-142 home]# ./a.out /mnt/GNFS_33/1G 
opening /mnt/GNFS_33/1G
opened; hit Enter to lock... 
locking


4. Release the lock from client 1. The lock will be granted to client 2

Client 1:
locked; hit Enter to unlock... 
unlocking

Client 2:
[root@dhcp37-142 home]# ./a.out /mnt/GNFS_33/1G 
opening /mnt/GNFS_33/1G
opened; hit Enter to lock... 
locking
locked; hit Enter to write... 
Write succeeeded 
locked; hit Enter to unlock... 

5. Try taking the lock from client 1 on the same file -> Lock is granted, which should not happen, as client 2 is still holding the lock on the same file

Client 2:
[root@dhcp37-142 home]# ./a.out /mnt/GNFS_33/1G 
opening /mnt/GNFS_33/1G
opened; hit Enter to lock... 
locking
locked; hit Enter to write... 
Write succeeeded 
locked; hit Enter to unlock... 

Client 1:
[root@dhcp37-192 home]# ./a.out /mnt/GNFS_33/1G 
opening /mnt/GNFS_33/1G
opened; hit Enter to lock... 
locking
locked; hit Enter to write... 
Write succeeeded 
locked; hit Enter to unlock... 



The following messages are observed in the nfs.log file:

[2017-07-17 12:55:17.343118] I [MSGID: 109063] [dht-layout.c:713:dht_layout_normalize] 0-disperseVol-dht: Found anomalies in / (gfid = 00000000-0000-0000-0000-000000000001). Holes=1 overlaps=0
[2017-07-17 13:02:41.467613] N [MSGID: 122055] [ec-locks.c:926:ec_combine_lk] 0-disperseVol-disperse-0: Mismatching lock in answers of 'GF_FOP_LK'
[2017-07-17 13:02:41.468088] N [MSGID: 122055] [ec-locks.c:926:ec_combine_lk] 0-disperseVol-disperse-0: Mismatching lock in answers of 'GF_FOP_LK'
[2017-07-17 13:06:12.032620] N [MSGID: 122055] [ec-locks.c:926:ec_combine_lk] 0-disperseVol-disperse-0: Mismatching lock in answers of 'GF_FOP_LK'
[2017-07-17 13:06:12.033221] N [MSGID: 122055] [ec-locks.c:926:ec_combine_lk] 0-disperseVol-disperse-0: Mismatching lock in answers of 'GF_FOP_LK'
[2017-07-17 13:14:43.251547] W [socket.c:595:__socket_rwv] 0-NLM-client: readv on 10.70.37.142:43882 failed (No data available)
[2017-07-17 13:20:22.011378] W [socket.c:595:__socket_rwv] 0-NLM-client: readv on 10.70.37.192:35323 failed (No data available)
[2017-07-17 13:27:48.099835] N [MSGID: 122055] [ec-locks.c:926:ec_combine_lk] 0-disperseVol-disperse-0: Mismatching lock in answers of 'GF_FOP_LK'
The message "N [MSGID: 122055] [ec-locks.c:926:ec_combine_lk] 0-disperseVol-disperse-0: Mismatching lock in answers of 'GF_FOP_LK'" repeated 63 times between [2017-07-17 13:27:48.099835] and [2017-07-17 13:27:48.109673]


# rpm -qa | grep gluster
glusterfs-fuse-3.8.4-33.el7rhgs.x86_64
glusterfs-rdma-3.8.4-33.el7rhgs.x86_64
glusterfs-cli-3.8.4-33.el7rhgs.x86_64
glusterfs-3.8.4-33.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-33.el7rhgs.x86_64
libvirt-daemon-driver-storage-gluster-3.2.0-14.el7.x86_64
python-gluster-3.8.4-33.el7rhgs.noarch
glusterfs-libs-3.8.4-33.el7rhgs.x86_64
glusterfs-api-3.8.4-33.el7rhgs.x86_64
glusterfs-server-3.8.4-33.el7rhgs.x86_64
vdsm-gluster-4.17.33-1.1.el7rhgs.noarch
gluster-nagios-addons-0.2.9-1.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-33.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch

Comment 23 Manisha Saini 2017-07-17 13:37:43 UTC
Tested on  2 x (4 + 2)  Distributed-Disperse Volume

Comment 24 Niels de Vos 2017-07-17 14:34:26 UTC
Hi Manisha,

because there are EC-specific log messages, I wonder whether this problem occurs on non-EC volumes too? If the problem is in gNFS/NLM, you should also not see this on FUSE mounts.


Ashish, Sunil, could you explain what the following log message means?

[ec-locks.c:926:ec_combine_lk] 0-disperseVol-disperse-0: Mismatching lock in answers of 'GF_FOP_LK'

I'd like to know if the lock gets granted in this case, or not. It might be interesting for us to know how this can happen. I do not understand why different bricks would return different answers on the locking FOP.

Thanks!
Niels

Comment 25 Pranith Kumar K 2017-07-17 15:10:10 UTC
(In reply to Niels de Vos from comment #24)
> Hi Manisha,
> 
> because there are specific EC log messages, I wonder if this problem does
> occur on non-ec volumes too? If the problem is in gNFS/NLM, you should also
> not see this on FUSE mounts.
> 
> 
> Ashish, Sunil, could you explain what the following log message means?
> 
> [ec-locks.c:926:ec_combine_lk] 0-disperseVol-disperse-0: Mismatching lock in
> answers of 'GF_FOP_LK'

This means different bricks responded differently. This can happen if any of the flock parameters are different. You may want to get a tcpdump of what happened.

> 
> I'd like to know if the lock gets granted in this case, or not. It might be
> interesting for us to know how this can happen. I do not understand why
> different bricks would return different answers on the locking FOP.

This code path gets executed when the locks are granted i.e. op_ret is '0'

My guess is that some of the locks are either merged or cut, maybe because the earlier locks which we thought were unlocked are not actually unlocked?

We will definitely know more if we have tcpdumps.

I am clearing the needinfo on Ashish and Sunil.

> 
> Thanks!
> Niels

Comment 27 Manisha Saini 2017-07-18 09:06:07 UTC
(In reply to Niels de Vos from comment #24)
> Hi Manisha,
> 
> because there are specific EC log messages, I wonder if this problem does
> occur on non-ec volumes too? If the problem is in gNFS/NLM, you should also
> not see this on FUSE mounts.
> 
> 
> Ashish, Sunil, could you explain what the following log message means?
> 
> [ec-locks.c:926:ec_combine_lk] 0-disperseVol-disperse-0: Mismatching lock in
> answers of 'GF_FOP_LK'
> 
> I'd like to know if the lock gets granted in this case, or not. It might be
> interesting for us to know how this can happen. I do not understand why
> different bricks would return different answers on the locking FOP.
> 
> Thanks!
> Niels

Niels,

I tested the same on 

--->Dist-Replicate volume + GNFS -> Issue is not observed. Seems to be a problem with only the EC volume

--->Dist-Replicate volume + Fuse -> Works fine. Issue is not observed.

--->EC+Fuse mount ->

Took a lock on the file from client 1 -> Lock granted
Tried taking the lock from client 2 -> Lock not granted
Tried releasing the lock from client 1 -> Unable to release the lock from client 1
Seems to again be an issue with the EC volume

Client 1:

# ./a.out /mnt/GNFS_33/1G_newFile 
opening /mnt/GNFS_33/1G_newFile
opened; hit Enter to lock... 
locking
locked; hit Enter to write... 
Write succeeeded 
locked; hit Enter to unlock... 
unlocking


Client 2:

# cd /home/
[root@dhcp37-142 home]# ./a.out /mnt/GNFS_33/1G_newFile 
opening /mnt/GNFS_33/1G_newFile
opened; hit Enter to lock... 
locking

Comment 29 Ashish Pandey 2017-07-18 11:56:15 UTC
(In reply to Manisha Saini from comment #27)
> (In reply to Niels de Vos from comment #24)
> > Hi Manisha,
> > 
> > because there are specific EC log messages, I wonder if this problem does
> > occur on non-ec volumes too? If the problem is in gNFS/NLM, you should also
> > not see this on FUSE mounts.
> > 
> > 
> > Ashish, Sunil, could you explain what the following log message means?
> > 
> > [ec-locks.c:926:ec_combine_lk] 0-disperseVol-disperse-0: Mismatching lock in
> > answers of 'GF_FOP_LK'
> > 
> > I'd like to know if the lock gets granted in this case, or not. It might be
> > interesting for us to know how this can happen. I do not understand why
> > different bricks would return different answers on the locking FOP.
> > 
> > Thanks!
> > Niels
> 
> Niels,
> 
> I tested the same on 
> 
> --->Dist-replicate Voume + GNFS -> Issue is not observed.Seems to be problem
> with only EC Volume
> 
> --->Dist-replicate Voume + Fuse -> Works fine.Issue is not observed.
> 
> --->EC+Fuse Mount-> 
> 
> Took lock on File from Client 1 -> Lock Granted
> Tried Taking lock from Client 2-> Lock is Not Granted
> Release the Lock from Client 1-> Unable to Release the Lock from Client 1
> Seems again the issue with EC Volume
> 
> Client 1:
> 
> # ./a.out /mnt/GNFS_33/1G_newFile 
> opening /mnt/GNFS_33/1G_newFile
> opened; hit Enter to lock... 
> locking
> locked; hit Enter to write... 
> Write succeeeded 
> locked; hit Enter to unlock... 
> unlocking
> 
> 
> Client 2:
> 
> # cd /home/
> [root@dhcp37-142 home]# ./a.out /mnt/GNFS_33/1G_newFile 
> opening /mnt/GNFS_33/1G_newFile
> opened; hit Enter to lock... 
> locking

I checked and found that the client version for the fuse mount was old, so it did not have our fix.
I have also verified just now that it is working fine with the latest upstream.

Comment 36 Niels de Vos 2017-07-19 09:16:18 UTC
Created attachment 1300929 [details]
annotated tshark output while looking for incorrect behaviour in gNFS/NLM

The attached NOTES.txt contains the details of the tcpdump that was taken on the gNFS server. This tcpdump contains all the network packets that are required for the analysis (both NFS-clients mount from the same gNFS server).

From what I can gather from the tcpdump, the NLM part of gNFS just passes the requests and replies to and back from the EC volume. It does not look like EC denies the 2nd lock request of the 1st client, even though the 2nd client already holds the lock.

I suspect that there is a missing check on the lk_owner somewhere in EC. The lk_owner is checked in EC in several places, so it is not trivial (for me) to find out where this check could be missing.

Could someone from the EC team have a look and verify my analysis?

Comment 37 Niels de Vos 2017-07-19 09:34:41 UTC
Actually, because the LK requests are sent to the bricks, this might be a problem in the locks xlator there?

Comment 38 Pranith Kumar K 2017-07-19 13:45:43 UTC
(In reply to Niels de Vos from comment #37)
> Actually, because the LK requests are sent to the bricks, this might be a
> problem in the locks xlator there?

Here is the root cause of the bug:
Frame 552 in the tcpdump sends the following request:
Type: GF_LK_F_WRLCK (1)
Whence: SEEK_SET (0)
Start: 0
Length: 10
PID: 5 <<<<<<<<------------------------
Owner: 35406468637033372d3134322e6c61622e656e672e626c72...

Response in frame 656:
Type: GF_LK_F_WRLCK (1)
Whence: SEEK_SET (0)
Start: 0
Length: 10
PID: 1 <<<<<<<<------------------------
Owner: 35406468637033372d3134322e6c61622e656e672e626c72...

The remaining 5 responses have PID: 5

The PIDs of the request and the response differ. This leads EC to fail the lk request with EIO.

ec_lock_check() has the following code to reject responses when multiple groups exist with op_ret >= 0. Since the PIDs of the responses differ, this code gets executed.
    list_for_each_entry(ans, &fop->cbk_list, list) {
        if (ans->op_ret >= 0) {
            if (locked != 0) {
                error = EIO;
            }
            locked |= ans->mask;
            cbk = ans;

This leads to the unlock of the granted locks in frames 673-678, which is why the other client is able to take the lock.

The locks xlator uses frame->root->pid and flock->l_pid interchangeably, assuming those pids are the same, whereas in NFS frame->root->pid is '1' and flock->l_pid is 5. This is why the issue appears only with gNFS and not with Fuse. Because replicate is not strict about the pid, it doesn't treat the mismatch as an error, so this works fine with gNFS+afr as well.

Credentials
Flavor: AUTH_GLUSTERFS (390039)
Length: 60
PID: 1
...


I did this test in fuse and this is how it appears:

Thread 21 "glusterfsd" hit Breakpoint 1, pl_lk (frame=0x7fbd54001e20, this=0x7fbd90016df0, fd=0x7fbd70002880, cmd=6, flock=0x7fbd7000ad30, xdata=0x0) at posix.c:2231
2231	        pl_inode_t   *pl_inode   = NULL;
(gdb) p frame->root->pid 
$1 = 8585
(gdb) p flock->l_pid 
$2 = 8585

Both gNFS and EC are doing the right thing, so the only potential fix has to be in the locks xlator. The only problem is that the usage of frame->root->pid vs flock->l_pid is all over the place in the locks xlator, so if we have to make that change we will need confidence that it doesn't break anything else. To reduce risk, I made the change in a localised way, and we can worry about the other places a bit later, maybe for 3.4.0?

This is the patch:
diff --git a/xlators/features/locks/src/common.c b/xlators/features/locks/src/common.c
index 95aa749fe..08950a89d 100644
--- a/xlators/features/locks/src/common.c
+++ b/xlators/features/locks/src/common.c
@@ -921,7 +921,8 @@ __grant_blocked_locks (xlator_t *this, pl_inode_t *pl_inode, struct list_head *g
                         conf->frame = l->frame;
                         l->frame = NULL;
 
-                        posix_lock_to_flock (l, &conf->user_flock);
+                        memcpy (&conf->user_flock, &l->user_flock,
+                                sizeof(conf->user_flock));
 
                         gf_log (this->name, GF_LOG_TRACE,
                                 "%s (pid=%d) lk-owner:%s %"PRId64" - %"PRId64" => Granted",

Manisha,
       How can we increase confidence that this fix is not breaking anything else? What tests do you do for posix-locks?

Pranith

Comment 40 Manisha Saini 2017-07-19 17:18:51 UTC
(In reply to Pranith Kumar K from comment #38)
> (In reply to Niels de Vos from comment #37)
> > Actually, because the LK requests are sent to the bricks, this might be a
> > problem in the locks xlator there?
> 
> Here is the RC for the bug:
> Frame 552 in the tcpdump sends the following request:
> Type: GF_LK_F_WRLCK (1)
> Whence: SEEK_SET (0)
> Start: 0
> Length: 10
> PID: 5 <<<<<<<<------------------------
> Owner: 35406468637033372d3134322e6c61622e656e672e626c72...
> 
> Response in frame 656:
> Type: GF_LK_F_WRLCK (1)
> Whence: SEEK_SET (0)
> Start: 0
> Length: 10
> PID: 1 <<<<<<<<------------------------
> Owner: 35406468637033372d3134322e6c61622e656e672e626c72...
> 
> Rest of the 5 responses have PID: 5
> 
> The PID of request and response are different. This leads to EC failing lk
> request with EIO,
> 
> ec_lock_check() has the following code to reject a response where multiple
> groups exist with op_ret >= 0. Since the pids of the responses are different
> this code gets executed.
>     list_for_each_entry(ans, &fop->cbk_list, list) {
>         if (ans->op_ret >= 0) {
>             if (locked != 0) {
>                 error = EIO;
>             }
>             locked |= ans->mask;
>             cbk = ans;
> 
> This leads to unlock of the granted locks frames 673-678. That is the reason
> other client is able to take the lock.
> 
> Locks xlator uses frame->root->pid and flock->l_pid interchangeably assuming
> those pids would be same. Where as in NFS frame->root->pid is '1' and
> flock->l_pid is 5. This is the reason why the issue appears only with gNFS
> and not with Fuse. Because replicate is not strict about pid, it doesn't
> treat it as error, so this works fine in gNFS+afr as well.
> 
> Credentials
> Flavor: AUTH_GLUSTERFS (390039)
> Length: 60
> PID: 1
> ...
> 
> 
> I did this test in fuse and this is how it appears:
> 
> Thread 21 "glusterfsd" hit Breakpoint 1, pl_lk (frame=0x7fbd54001e20,
> this=0x7fbd90016df0, fd=0x7fbd70002880, cmd=6, flock=0x7fbd7000ad30,
> xdata=0x0) at posix.c:2231
> 2231	        pl_inode_t   *pl_inode   = NULL;
> (gdb) p frame->root->pid 
> $1 = 8585
> (gdb) p flock->l_pid 
> $2 = 8585
> 
> Both gNFS and EC are doing the right thing, so the only potential fix has to
> be in locks xlator. The only problem is, the usage of frame->root->pid vs
> flock->l_pid is all over the place in locks xlator. So if we have to make
> that change, then we will need confidence that it doesn't break anything
> else. To reduce risk, I made the change in a localised way and we can worry
> about other places a bit later, may be for 3.4.0?
> 
> This is the patch:
> diff --git a/xlators/features/locks/src/common.c
> b/xlators/features/locks/src/common.c
> index 95aa749fe..08950a89d 100644
> --- a/xlators/features/locks/src/common.c
> +++ b/xlators/features/locks/src/common.c
> @@ -921,7 +921,8 @@ __grant_blocked_locks (xlator_t *this, pl_inode_t
> *pl_inode, struct list_head *g
>                          conf->frame = l->frame;
>                          l->frame = NULL;
>  
> -                        posix_lock_to_flock (l, &conf->user_flock);
> +                        memcpy (&conf->user_flock, &l->user_flock,
> +                                sizeof(conf->user_flock));
>  
>                          gf_log (this->name, GF_LOG_TRACE,
>                                  "%s (pid=%d) lk-owner:%s %"PRId64" -
> %"PRId64" => Granted",
> 
> Manisha,
>        How can we increase confidence that this fix is not breaking anything
> else? What tests do you do for posix-locks?
> 
> Pranith

Pranith

As a sanity check we can run the cthon lock suite, the fs-sanity lock tests, and the POSIX compliance suite

Comment 43 Atin Mukherjee 2017-07-20 04:09:41 UTC
upstream patch : https://review.gluster.org/#/c/17826/

Comment 52 Pranith Kumar K 2017-09-26 07:44:20 UTC
(In reply to Atin Mukherjee from comment #43)
> upstream patch : https://review.gluster.org/#/c/17826/

To take this patch in, we also need to port: https://review.gluster.org/16838 as a dependent patch.

Comment 56 Manisha Saini 2017-10-06 05:29:17 UTC
Verified this bug on glusterfs-3.8.4-46.el7rhgs.x86_64

The issue reported in this bug is not reproducible with this build. Hence moving this bug to the verified state.

Comment 59 errata-xmlrpc 2017-11-29 03:29:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:3276

