Bug 1035491 - File lock propagates on failover but not failback
Summary: File lock propagates on failover but not failback
Keywords:
Status: CLOSED EOL
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: glusterfs
Version: 2.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Bug Updates Notification Mailing List
QA Contact: storage-qa-internal@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2013-11-27 22:17 UTC by Mike Watkins
Modified: 2015-12-03 17:10 UTC (History)
CC List: 5 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-12-03 17:10:51 UTC
Target Upstream Version:


Attachments (Terms of Use)
test C lock program (2.94 KB, text/x-c++src)
2013-11-27 22:17 UTC, Mike Watkins

Description Mike Watkins 2013-11-27 22:17:13 UTC
Created attachment 829924 [details]
test C lock program

Description of problem:

This bug seems related to BZ 856985 - Lock migration. Current testing was done with a similar locktest.c program (attached to this case).

I've uploaded videos here to show the behavior
https://drive.google.com/?tab=wo&authuser=0#folders/0B6xfdVrKfseUQld2Z0NDTF94eFE

Specifically, look at the "+ failback" video examples.

The setup is that there are two VMs (i.e. servers), both running the locktest program. vm1 has the lock and counts to 200, while the other VM runs locktest, waiting to acquire the lock on vm1 failure.

Both VMs are mounting (via glusterfs native client)
mount -t glusterfs 192.168.122.135:/amqtest /mnt/amqrhss
mount -t glusterfs 192.168.122.135:/amqdist /mnt/amqrhss

I'm testing with 2 RHS2.1 nodes:

host120=192.168.122.135 (aka rhs-node1)
host217=192.168.122.237 (aka rhs-node2)

[root@host120 ~]# gluster volume info
 
Volume Name: amqdist
Type: Distribute
Volume ID: 1e24295b-e83a-438e-85a3-9f0e0645f1cf
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: 192.168.122.135:/gluster_brick/dist1
Brick2: 192.168.122.237:/gluster_brick/dist2
Options Reconfigured:
network.ping-timeout: 42
 
Volume Name: amqtest
Type: Replicate
Volume ID: 61162100-39d1-4124-b9cc-1ba7f262a0b7
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 192.168.122.135:/gluster_brick/node1
Brick2: 192.168.122.237:/gluster_brick/node2
Options Reconfigured:
server.statedump-path: /tmp
[root@host120 ~]#

In my test, vm1 holds the lock and the other VM is waiting to acquire it; both mount with the gluster native client using the IP of the 1st RHS node.

When rhs-node1 goes offline, vm1 holds lock.
When rhs-node1 comes back online, vm1 holds lock.
When rhs-node2 goes offline, vm1's lock gets corrupted.







Version-Release number of selected component (if applicable):

RHS 2.1 nodes:

[root@host217 ~]# rpm -qa gluster*
glusterfs-geo-replication-3.4.0.33rhs-1.el6rhs.x86_64
gluster-swift-container-1.8.0-6.11.el6rhs.noarch
glusterfs-libs-3.4.0.33rhs-1.el6rhs.x86_64
glusterfs-3.4.0.33rhs-1.el6rhs.x86_64
glusterfs-server-3.4.0.33rhs-1.el6rhs.x86_64
gluster-swift-proxy-1.8.0-6.11.el6rhs.noarch
gluster-swift-account-1.8.0-6.11.el6rhs.noarch
glusterfs-rdma-3.4.0.33rhs-1.el6rhs.x86_64
gluster-swift-plugin-1.8.0-7.el6rhs.noarch
glusterfs-api-3.4.0.33rhs-1.el6rhs.x86_64
gluster-swift-1.8.0-6.11.el6rhs.noarch
glusterfs-fuse-3.4.0.33rhs-1.el6rhs.x86_64
gluster-swift-object-1.8.0-6.11.el6rhs.noarch
[root@host217 ~]# 



Client running gluster 2.1 native client:

[root@vm3 ~]# mount
/dev/mapper/vg_vm3-lv_root on / type ext4 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw,rootcontext="system_u:object_r:tmpfs_t:s0")
/dev/vda1 on /boot type ext4 (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
192.168.122.135:/amqtest on /mnt/amqrhss type fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072)
[root@vm3 ~]# 


[root@vm3 ~]# rpm -qa gluster*
glusterfs-fuse-3.4.0.36rhs-1.el6.x86_64
glusterfs-libs-3.4.0.36rhs-1.el6.x86_64
glusterfs-3.4.0.36rhs-1.el6.x86_64
[root@vm3 ~]# 


How reproducible:

See above description and link


Actual results:
Lock gets lost, resulting in fcntl() failures with errno 77 (EBADFD) and errno 107 (ENOTCONN)

Expected results:
RHS nodes can go offline and come back online in any order, and the master client retains the file lock

Additional info:

Comment 2 Anand Avati 2013-12-11 08:46:33 UTC
I had a look at the videos. From what I understood, failover and failback of I/O and locks is happening on the distributed-replicated volume (as expected), but not working over the distributed (non-replicated) volume. This is all as expected.

Depending on how the hashing algorithm has distributed the files, locktest will surely fail when the node storing the file is powered off. For high availability, a distributed-replicated volume is a requirement, and from what I understand of the videos, that is working.

Unless I have missed something, we can close this bug as NOTABUG.

Comment 3 Mike Watkins 2014-01-03 14:38:32 UTC
Anand,

I tested 2-node replicated and 2-node distributed volumes for my initial tests (and videos). This is evident from my gluster volume info output in this BZ.

However, based on your comment about full HA requiring a distributed-replicated volume, I re-did my testing with this volume type.

[root@host120 ~]# gluster volume info amq-dist-rep-volume
 
Volume Name: amq-dist-rep-volume
Type: Distributed-Replicate
Volume ID: f55427c1-ef24-4c2c-86d4-2afc92675ee5
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 192.168.122.135:/gluster_brick/distrep
Brick2: 192.168.122.237:/gluster_brick/distrep
Brick3: 192.168.122.53:/gluster_brick/distrep
Brick4: 192.168.122.178:/gluster_brick/distrep
[root@host120 ~]# 

Note here the following
host120 = 192.168.122.135
host217 = 192.168.122.237
host300 = 192.168.122.53
host400 = 192.168.122.178

I then mounted the client...

[root@vm3 ~]# mount -t glusterfs 192.168.122.135:/amq-dist-rep-volume /mnt/amqrhss

Note here that I'm using two client VMs running the locktest C program, namely vm1 and vm3.

I then ran the locktest C program on vm3, then on vm1 (waiting for lock) and powered off/on RHS host120, host217, host300 and host400 with the locktest program behaving as expected.

Please look at the new uploaded video in the gdrive link, named "C locktest with distrib-replic gluster volume + testing multiple single-node failures.webm"



So... this does work :) But I need a detailed technical explanation of why this works only with distributed-replicated volumes, and fails with 2-node distributed and 2-node replicated setups.


Thanks,
Mike

Comment 4 Anand Avati 2014-01-03 19:31:34 UTC
Mike,
I think the videos of the replicated volume with failover (and failback) show that it works. But you mention it "fails with distributed volume and with replicated volume 2-node setups", yet I could not find any failures in the replicated video. Can you point to the exact filename and timestamp in the video that shows the failure?

Avati

Comment 5 Mike Watkins 2014-01-03 21:37:40 UTC
DISTRIBUTED VOLUME TEST
-----------------------
Video: C locktest with distributed gluster volume.webm
Status: Doesn't show failure since only powering off (then back on) 1st RHS node, and not running test long enough
I can run this test longer to see if the ./locktest prog on the other VM picks up the lock (as expected)

Video: C locktest with distributed gluster volume + failback.webm
Status: Fails ~4:38 mark


REPLICATED VOLUME TEST
----------------------
Video: C locktest with replicated gluster volume.webm
Status: Doesn't show failure since only powering off (then back on) 1st RHS node, and not running test long enough
I can run this test longer to see if the ./locktest prog on the other VM picks up the lock (as expected)

Video: C locktest with replicated gluster volume + failback.webm
Status: Fails at the ~5:57 mark. The locktest program is running on vm3 (holds the lock, counts to 200). During this time, locktest is also running on vm1 (waiting to grab the lock when vm3 releases it after count=200). I power off/on the first RHS node (host120); the lock holds. Then I power off/on the 2nd RHS node (host217). The lock appears to still be held, but when the locktest counter reaches 200, vm1 DOES NOT grab the lock as expected


DISTRIBUTED-REPLICATED VOLUME TEST
----------------------------------
Video: C locktest with distrib-replic gluster volume + testing multiple single-node failures.webm
Status: Active-lock VM (vm3) and waiting-for-lock VM (vm1) work as expected, EVEN after powering off/on (i.e. failing) SEVERAL RHS nodes individually




If you can explain how lock behavior works on different volume types (distributed, replicated and distributed-replicated)...that will help. 

From my tests, you can see that the locktest program behaves differently for different volumes types.


Thanks,
Mike

Comment 6 Vivek Agarwal 2015-12-03 17:10:51 UTC
Thank you for submitting this issue for consideration in Red Hat Gluster Storage. The release you asked us to review is now End of Life. Please see https://access.redhat.com/support/policy/updates/rhs/

If you can reproduce this bug against a currently maintained version of Red Hat Gluster Storage, please feel free to file a new report against the current release.

