Description of problem:
I have an HA VM with a lease on a gluster storage domain. When the host running the VM shuts down unexpectedly, the VM gets stuck in Unknown status instead of failing over to a different, active host.
I tried the same test both on an engine with libgfapi enabled and disabled. In both cases, the HA VM failed to start on another host.

Version-Release number of selected component (if applicable):
rhvm-4.2.2.1-0.1.el7.noarch
vdsm-4.20.19-1.el7ev.x86_64
qemu-kvm-rhev-2.10.0-21.el7.x86_64
libvirt-3.9.0-13.el7.x86_64
glusterfs-3.8.4-54.el7.x86_64
glusterfs-api-3.8.4-54.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Create a VM with a disk on a gluster domain and an HA lease on a gluster domain
2. Start the VM
3. Block the connection between the engine and the host running the VM, and between the same host and the storage domain with the VM's disk and lease

Actual results:
VM is stuck in UNKNOWN status

Expected results:
VM should start on another active host

Additional info:
engine.log
2018-02-27 19:57:31,003+02 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-4418) [] EVENT_ID: VM_SET_TO_UNKNOWN_STATUS(142), VM vm_0_TestCase17618_2719513854 was set to the Unknown status.
vdsm.log
2018-02-27 19:59:36,244+0200 ERROR (vm/4bea0481) [virt.vm] (vmId='4bea0481-65d3-461a-88f0-c78ed49a81ef') The vm start process failed (vm:940)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 869, in _startUnderlyingVm
    self._run()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2813, in _run
    dom.createWithFlags(flags)
  File "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py", line 130, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 92, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1099, in createWithFlags
    if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
libvirtError: internal error: process exited while connecting to monitor: 2018-02-27T17:59:32.923114Z qemu-kvm: -drive file=/rhev/data-center/mnt/glusterSD/gluster01.scl.lab.tlv.redhat.com:_storage__jenkins__ge5__volume__2/fd6e9ad0-51ff-447e-bc91-038ef7cb736e/images/99de1119-aa46-4d10-a1a4-f6371d24a8f9/565d618f-800e-4749-9404-5d526e1c5f5e,format=qcow2,if=none,id=drive-virtio-disk0,serial=99de1119-aa46-4d10-a1a4-f6371d24a8f9,cache=none,werror=stop,rerror=stop,aio=threads: 'serial' is deprecated, please use the corresponding option of '-device' instead
2018-02-27T17:59:33.199650Z qemu-kvm: -drive file=/rhev/data-center/mnt/glusterSD/gluster01.scl.lab.tlv.redhat.com:_storage__jenkins__ge5__volume__2/fd6e9ad0-51ff-447e-bc91-038ef7cb736e/images/99de1119-aa46-4d10-a1a4-f6371d24a8f9/565d618f-800e-4749-9404-5d526e1c5f5e,format=qcow2,if=none,id=drive-virtio-disk0,serial=99de1119-aa46-4d10-a1a4-f6371d24a8f9,cache=none,werror=stop,rerror=stop,aio=threads: Failed to get "write" lock
Is another process using the image?
Created attachment 1401747 [details]
logs
Seems like the same issue as bug 1547095 [only here it's gluster and there it's NFS v3]
(In reply to Lilach Zitnitski from comment #0)
> Steps to Reproduce:
> 1. create VM with a disk on gluster domain and HA lease on gluster domain
> 2. start the VM
> 3. block connection between engine and host running the VM, and between the
> same host and the storage domain with the VM's disk and lease

This is similar to, but not, the test we requested.

Blocking access to storage may cause the qemu process to be killed, and in this case the NFS lock may be released, hiding the issue we want to test. We want to make sure the NFS lock on the client side exists on the original host.

Please test also this:

3. Power off the host, simulating power loss. Using a nested host would be best for this test.

In this case another host can detect that the sanlock lease has expired, and the engine will start the VM on another host.
Seems that the root cause is bug 1547095. Setting the bug to depend on that until we find a workaround.
Sahina, can you get someone from the gluster team to explain what the expected behavior is in glusterfs in this case?

We have two use cases:
- The host becomes disconnected from the network, so the host cannot reach any of the gluster nodes, and the engine cannot reach the host. This is what was tested here, see comment 0.
- The host loses power. The VM is not running on the host, but no code on the host can update the gluster nodes about this.

We want to know what the behaviour of the fcntl-based locks used by qemu on the images is. Do we have NFSv3 behavior, where the locks remain until the client recovers and connects again to the storage nodes, or NFSv4 behavior, where the lock expires after some time without any activity from the client?

Finally, do we have a way to disable remote locks, similar to the NFSv3 "nolock" mount option?
I've done some testing with this from the qemu command line, and it is readily reproducible with qemu when using fuse-mounted gluster shares (as the description shows: "[...] file=/rhev/data-center/mnt/glusterSD/gluster01.scl.la[...]").

For instance, when doing the following:

1. Launch qemu against a raw image on a fuse-mounted share:

# qemu-kvm -enable-kvm -drive file=/home/jcody/gfs/f25.img,format=raw,if=virtio,aio=native,cache=none,discard=unmap [...]

2. Disconnect the host from the network

3. Run qemu-io against that image from a different machine, with the same fuse mount:

# qemu-io /home/jcody/gfs/f25.img
can't open device /home/jcody/gfs/f25.img: Failed to get "write" lock
Is another process using the image?

However, if I wait ~10-15 minutes (sorry, I did not time it exactly), I do have a timeout on the fcntl lock, and I am able to then open the image from the same machine as step 3.

Does this accurately capture the issue you are encountering?

From the Description:
> I tried the same test both on an engine with libgfapi enabled and disabled.
> In both cases, the HA VM failed to start on another host.

Does this mean the HA VM was using the gluster:// protocol, and still encountering this issue? The file locking is via fcntl in the posix file layer, so it does not make sense to me for it to be seen if the native gluster protocol is being used by qemu.
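The failure described above is, at bottom, a POSIX advisory-lock conflict propagated through the fuse mount. A simplified sketch of such a probe follows; this is illustrative, not qemu's actual code (qemu takes open-file-description locks on specific byte ranges of the image, and probe_write_lock is a name invented here):

```python
import errno
import fcntl
import os

def probe_write_lock(path):
    """Try to take a non-blocking exclusive POSIX record lock on path.

    Returns True if the lock was free, False if another process,
    possibly a qemu on another client of the same fuse mount, holds it.
    Simplified: real qemu uses OFD locks on specific byte offsets.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except (IOError, OSError) as e:
        if e.errno in (errno.EACCES, errno.EAGAIN):
            return False
        raise
    finally:
        # Closing the descriptor also releases any lock we took.
        os.close(fd)
```

On a gluster fuse mount the lock request travels to the bricks, which is why a second host keeps seeing the conflict until the bricks decide the first client is gone.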
(In reply to Lilach Zitnitski from comment #0)
> I tried the same test both on an engine with libgfapi enabled and disabled.
> In both cases, the HA VM failed to start on another host.

Lilach, the logs seem to contain only the libgfapi-disabled tests:

$ ls -lh libgfapi_disabled
total 5.4M
-rw-r--r--. 1 nsoffer nsoffer 182K Feb 27 20:15 engine.log
-rw-r--r--. 1 nsoffer nsoffer 1.8M Feb 27 20:15 host_mixed_1_vdsm.log
-rw-r--r--. 1 nsoffer nsoffer 3.4M Feb 27 20:16 host_mixed_2_vdsm.log

Looking in the logs, the vm is *not* using libgfapi:

<disk device="disk" snapshot="no" type="file">
    <source file="/rhev/data-center/mnt/glusterSD/gluster01.scl.lab.tlv.redhat.com:_storage__jenkins__ge5__volume__2/fd6e9ad0-51ff-447e-bc91-038ef7cb736e/images/99de1119-aa46-4d10-a1a4-f6371d24a8f9/565d618f-800e-4749-9404-5d526e1c5f5e"/>
    <target bus="virtio" dev="vda"/>
    <serial>99de1119-aa46-4d10-a1a4-f6371d24a8f9</serial>
    <boot order="1"/>
    <driver cache="none" error_policy="stop" io="threads" name="qemu" type="qcow2"/>
</disk>

Please check again that the vm was started with a *network* disk and not with a *file*. If this fails with libgfapi, let's file another bug for it.
(In reply to Nir Soffer from comment #5)
> Sahina, can you get someone from the gluster team to explain what is the
> expected behavior in glusterfs in this case?
>
> We have too use cases:
> - host becomes disconnected from network, so the host cannot reach any of
> the gluster nodes, and engine cannot reach the host. This is what was
> tested here, see comment 0.
> - host loose power. The vm is not running on the host, but no code on the
> host can update the gluster nodes about this.
>
> We we want to know what is the behaviour of the fcntl based locks used by
> qemu on the images. Do we have NFSv3 behavior when the locks remain until
> the client recover and connect again to the storage nodes, or NFSv4
> behavior, when the lock expire after some time without any activity from
> the client?

On a fuse mount, locks are relinquished the moment the client dies/disconnects. Do you have brick logs, or a setup where we can get a statedump of the bricks when this issue happens? It looks like that didn't happen when the host died in your setup.

> Finally, do we have a way to disable remote locks, similar to NFSv3 "nolock"
> mount option?

Not that I know of, but it is not difficult to implement.
Can this be relevant? https://review.gluster.org/#/c/16731/

Note that a hard shutdown of the client may not be identified by the brick without keepalive/user-timeout monitoring of the connection status on the brick side.
(In reply to Nir Soffer from comment #5)
> We we want to know what is the behaviour of the fcntl based locks used by
> qemu on the images. Do we have NFSv3 behavior when the locks remain until
> the client recover and connect again to the storage nodes, or NFSv4
> behavior, when the lock expire after some time without any activity from
> the client?

In GlusterFS we definitely do not remove locks on account of an idle client connection. But the behaviour on disconnection is based on two volume set options, namely 'lock-heal' and 'grace-timeout', which are off by default. These options and the corresponding logic have been removed (from v4.0) due to the incomplete implementation and lack of test coverage. The logic that existed till v4.0 is briefly explained in the following doc under the sections "What happens on client side after disconnect?:" and "What happens on server side after disconnect?:"

https://anoopcs.fedorapeople.org/Lock%20recovery%20in%20GlusterFS.txt

> Finally, do we have a way to disable remote locks, similar to NFSv3 "nolock"
> mount option?

No.

@Pranith,
Please feel free to add/comment.
(In reply to Anoop C S from comment #10)
> (In reply to Nir Soffer from comment #5)
> > We we want to know what is the behaviour of the fcntl based locks used by
> > qemu on the images. Do we have NFSv3 behavior when the locks remain until
> > the client recover and connect again to the storage nodes, or NFSv4
> > behavior, when the lock expire after some time without any activity from
> > the client?
>
> In GlusterFS we definitely do not remove locks on account of idle client
> connection. But the behaviour on disconnection is based on two volume set
> options namely 'lock-heal' and 'grace-timeout' which are off by default.
> These options and corresponding logic has been removed (from v4.0) due to
> its incomplete implementation and lack of test coverage. The logic that
> existed till v4.0 is briefly explained in the following doc under the
> sections "What happens on client side after disconnect?:" and "What happens
> on server side after disconnect?:"
>
> https://anoopcs.fedorapeople.org/Lock%20recovery%20in%20GlusterFS.txt
>
> > Finally, do we have a way to disable remote locks, similar to NFSv3
> > "nolock" mount option?
>
> No.
>
> @Pranith,
> Please feel free to add/comment.

Nope, I am just waiting for the setup to see what the issue is.
(In reply to Pranith Kumar K from comment #11)
> (In reply to Anoop C S from comment #10)
> > (In reply to Nir Soffer from comment #5)
> > > We we want to know what is the behaviour of the fcntl based locks used
> > > by qemu on the images. Do we have NFSv3 behavior when the locks remain
> > > until the client recover and connect again to the storage nodes, or
> > > NFSv4 behavior, when the lock expire after some time without any
> > > activity from the client?
> >
> > In GlusterFS we definitely do not remove locks on account of idle client
> > connection. But the behaviour on disconnection is based on two volume set
> > options namely 'lock-heal' and 'grace-timeout' which are off by default.
> > These options and corresponding logic has been removed (from v4.0) due to
> > its incomplete implementation and lack of test coverage. The logic that
> > existed till v4.0 is briefly explained in the following doc under the
> > sections "What happens on client side after disconnect?:" and "What
> > happens on server side after disconnect?:"
> >
> > https://anoopcs.fedorapeople.org/Lock%20recovery%20in%20GlusterFS.txt
> >
> > > Finally, do we have a way to disable remote locks, similar to NFSv3
> > > "nolock" mount option?
> >
> > No.
> >
> > @Pranith,
> > Please feel free to add/comment.
>
> Nope, I am just waiting for the setup to see what the issue is.

I mean I don't have any further comments :-)
(In reply to Pranith Kumar K from comment #11)
> (In reply to Anoop C S from comment #10)
> > (In reply to Nir Soffer from comment #5)
> > > We we want to know what is the behaviour of the fcntl based locks used
> > > by qemu on the images. Do we have NFSv3 behavior when the locks remain
> > > until the client recover and connect again to the storage nodes, or
> > > NFSv4 behavior, when the lock expire after some time without any
> > > activity from the client?
> >
> > In GlusterFS we definitely do not remove locks on account of idle client
> > connection. But the behaviour on disconnection is based on two volume set
> > options namely 'lock-heal' and 'grace-timeout' which are off by default.
> > These options and corresponding logic has been removed (from v4.0) due to
> > its incomplete implementation and lack of test coverage. The logic that
> > existed till v4.0 is briefly explained in the following doc under the
> > sections "What happens on client side after disconnect?:" and "What
> > happens on server side after disconnect?:"
> >
> > https://anoopcs.fedorapeople.org/Lock%20recovery%20in%20GlusterFS.txt

v4.0 in my above reply refers to GlusterFS v4.0, *NOT* the NFS protocol version. Sorry for the confusion.

> > > Finally, do we have a way to disable remote locks, similar to NFSv3
> > > "nolock" mount option?
> >
> > No.
> >
> > @Pranith,
> > Please feel free to add/comment.
>
> Nope, I am just waiting for the setup to see what the issue is.
Retargeting to RHV 4.1.10 (tentatively), as we did with bug 1550127. If the research here results in a conclusion that platform changes in RHEL 7.5 require vdsm changes, they should also be backported to 4.1.z.
Looking at the logs from Lilach, this issue is related only to using fuse mounts and not to the gluster native connection (libgfapi) which is the default. Lilach, please update the bug if you can reproduce this with libgfapi enabled.
Based on comments from glusterfs (starting at comment 8):
- glusterfs locks do not behave as NFSv4 leases
- I could not understand how they do behave
- I don't think we can fix anything in vdsm
- I think qemu should use local locks for glusterfs fuse mounts
(In reply to Nir Soffer from comment #15)
> and not to the gluster native connection (libgfapi) which is the default.

Turns out that libgfapi is the default only on RHHI systems (hyperconverged setup). On oVirt/RHV the default is still fuse mount, because live storage migration with a gluster native disk is partly broken because of bug 1306562.

So this bug affects all oVirt/RHV installations using glusterfs.
Lilach, in comment 6 Jeff reports that he could acquire a lock on another host after 10-15 minutes. How much time did you wait in your test for the VM to start on another host?

Can you run the test again, and wait 20 minutes after powering off the original host? I think the engine will retry to run the VM on another host every few minutes.
Sas, could you provide a reproducer to Pranith as asked for in Comment 11
Nir, Can you please change the bug to be on qemu-kvm-rhev? as this is on RHEL side
(In reply to Raz Tamir from comment #20) > Nir, > > Can you please change the bug to be on qemu-kvm-rhev? as this is on RHEL side sorry for typo - *if* this is on RHEL side
Ademar, this bug is similar to bug 1547095. I don't think we can do anything about this in oVirt, and it should move to qemu.
(In reply to Nir Soffer from comment #3)
> (In reply to Lilach Zitnitski from comment #0)
> > Steps to Reproduce:
> > 1. create VM with a disk on gluster domain and HA lease on gluster domain
> > 2. start the VM
> > 3. block connection between engine and host running the VM, and between
> > the same host and the storage domain with the VM's disk and lease
>
> This is similar but not the tests we requested.
>
> Blocking access to storage may cause qemu to kill the qemu process, and in
> this case the NFS lock may be released, hiding the issue we want to test.
> We want to make sure the NFS lock on the client side exists on the original
> host.
>
> Please test also this:
>
> 3. power off the host, simulating power loss. Using a nested host would be
> best for this test.
>
> In this case other host can detect that the sanlock lease has expired, and
> engine will start the VM on another host.

I ran some tests. When powering off the host, the engine starts the VM on a different host. I saw this both when libgfapi was enabled and when it was disabled. I also tested killing the qemu process of the host's VM (using a nested host), and the VM started on a different host as well. I'm adding the logs for all those tests.
Created attachment 1404803 [details]
host poweroff logs

Name of the tested VM for searching the logs: vm_test_glusterfs
(In reply to Lilach Zitnitski from comment #23)
> When powering off the host, the engine starts the VM on a different host. I
> saw it when LibgfApi was enabled as well when it was disabled.
> I also tested it with killing the qemu process of the host's VM (using
> nested host) and the VM started on a different host as well.

Thanks, Lilach. I tested also on my setup, and oVirt is starting the vm successfully after a host was powered off (simulating a fatal failure like lost power).

Here is my gluster volume state (taken from gluster get-state):

Volume73.name: nir1
Volume73.id: c1228d3f-2998-485b-8fb6-d5149a351a3b
Volume73.type: Replicate
Volume73.transport_type: tcp
Volume73.status: Started
Volume73.brickcount: 3
Volume73.Brick1.path: gluster01.scl.lab.tlv.redhat.com:/gluster_volumes/nir1
Volume73.Brick1.hostname: gluster01.scl.lab.tlv.redhat.com
Volume73.Brick1.port: 49224
Volume73.Brick1.rdma_port: 0
Volume73.Brick1.status: Started
Volume73.Brick1.spacefree: 168958840832Bytes
Volume73.Brick1.spacetotal: 696518180864Bytes
Volume73.Brick2.path: gluster02.scl.lab.tlv.redhat.com:/gluster_volumes/nir1
Volume73.Brick2.hostname: gluster02.scl.lab.tlv.redhat.com
Volume73.Brick3.path: gluster03.scl.lab.tlv.redhat.com:/gluster_volumes/nir1
Volume73.Brick3.hostname: gluster03.scl.lab.tlv.redhat.com
Volume73.snap_count: 0
Volume73.stripe_count: 1
Volume73.replica_count: 3
Volume73.subvol_count: 1
Volume73.arbiter_count: 0
Volume73.disperse_count: 0
Volume73.redundancy_count: 0
Volume73.quorum_status: not_applicable
Volume73.snapd_svc.online_status: Offline
Volume73.snapd_svc.inited: True
Volume73.rebalance.id: 00000000-0000-0000-0000-000000000000
Volume73.rebalance.status: not_started
Volume73.rebalance.failures: 0
Volume73.rebalance.skipped: 0
Volume73.rebalance.lookedup: 0
Volume73.rebalance.files: 0
Volume73.rebalance.data: 0Bytes
Volume73.time_left: 0
Volume73.gsync_count: 0
Volume73.options.storage.owner-uid: 36
Volume73.options.storage.owner-gid: 36
Volume73.options.transport.address-family: inet
Volume73.options.nfs.disable: on

Seems that gluster has a mechanism to break locks when a host that took the locks was powered off. Is this behavior expected? If it is, what timeout does gluster use to break a lock after a client becomes idle?

This seems to contradict comment 10, so maybe there was some misunderstanding.
Created attachment 1404933 [details]
locking test script

I tested locking behavior when a host is powered off:

Run lock_test.py on host 1:

# python lock_test.py glusterfs
locking '/rhev/data-center/mnt/glusterSD/gluster01.scl.lab.tlv.redhat.com:_nir1/lock'
got the lock in 0.00 seconds, sleeping

Run the same command on another host:

# date; python lock_test.py glusterfs
Tue Mar 6 18:19:10 IST 2018
locking '/rhev/data-center/mnt/glusterSD/gluster01.scl.lab.tlv.redhat.com:_nir1/lock'

[Power off host 1 here]

got the lock in 38.92 seconds, sleeping

So it seems that glusterfs breaks the lock about 40 seconds after a client stops communicating with the server.
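The attachment itself is not inlined in the comment; below is a minimal sketch of an equivalent lock_test.py, under the assumption that it simply blocks in fcntl.lockf on a file in the given mount and reports how long the wait took. The path constant and argument handling are guesses matching the transcript above, not the attachment's actual code:

```python
import fcntl
import os
import sys
import time

# Assumed path; matches the mount shown in the transcript above.
GLUSTER_LOCK = ("/rhev/data-center/mnt/glusterSD/"
                "gluster01.scl.lab.tlv.redhat.com:_nir1/lock")

def take_lock(path):
    """Block until an exclusive POSIX record lock on path is granted;
    return the open fd and how long the wait took in seconds."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    start = time.time()
    fcntl.lockf(fd, fcntl.LOCK_EX)  # forwarded to the bricks on a fuse mount
    return fd, time.time() - start

if __name__ == "__main__" and sys.argv[1:2] == ["glusterfs"]:
    print("locking %r" % GLUSTER_LOCK)
    fd, elapsed = take_lock(GLUSTER_LOCK)
    print("got the lock in %.2f seconds, sleeping" % elapsed)
    time.sleep(600)  # hold the lock so the other host's run has to wait
```

Run on the first host it acquires immediately and sleeps holding the lock; the second host's run then measures how long the bricks keep honoring the dead client's lock.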
(In reply to Lilach Zitnitski from comment #0) > Steps to Reproduce: > 1. create VM with a disk on gluster domain and HA lease on gluster domain > 2. start the VM > 3. block connection between engine and host running the VM, and between the > same host and the storage domain with the VM's disk and lease How did you block the connection to storage?
I tested locking behavior when blocking access to glusterfs storage.

The gluster volume contains 3 bricks:

Volume73.Brick1.hostname: gluster01.scl.lab.tlv.redhat.com
Volume73.Brick2.hostname: gluster02.scl.lab.tlv.redhat.com
Volume73.Brick3.hostname: gluster03.scl.lab.tlv.redhat.com

1. I started the locking test script (see attachment 1404933 [details]) on one host:

[root@voodoo8 ~]# date; python lock_test.py glusterfs
Tue Mar 6 19:27:26 IST 2018
locking '/rhev/data-center/mnt/glusterSD/gluster01.scl.lab.tlv.redhat.com:_nir1/lock'
got the lock in 0.02 seconds, sleeping

2. Then started the same script trying to acquire the lock on another host:

[root@voodoo6 ~]# date; python lock_test.py glusterfs
Tue Mar 6 19:27:43 IST 2018
locking '/rhev/data-center/mnt/glusterSD/gluster01.scl.lab.tlv.redhat.com:_nir1/lock'

3. Then blocked access to *all* bricks on the first host:

[root@voodoo8 ~]# date; iptables -A OUTPUT -p tcp -d gluster01.scl.lab.tlv.redhat.com,gluster02.scl.lab.tlv.redhat.com,gluster03.scl.lab.tlv.redhat.com -j DROP
Tue Mar 6 19:27:41 IST 2018
[root@voodoo8 ~]# iptables -L
...
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
ACCEPT udp -- anywhere anywhere udp dpt:6081
DROP tcp -- anywhere gluster01.scl.lab.tlv.redhat.com
DROP tcp -- anywhere gluster02.scl.lab.tlv.redhat.com
DROP tcp -- anywhere gluster03.scl.lab.tlv.redhat.com

4. After about 40 seconds, the second host got the lock:

got the lock in 36.63 seconds, sleeping

So it seems that the test described in comment 0 was incorrect.

I did the same test with a running vm:

1. Started the vm on a host

2. On the engine host, blocked access to the host:

[root@voodoo4 ~]# date; iptables -A OUTPUT -p tcp -d voodoo8.tlv.redhat.com -j DROP
Tue Mar 6 19:43:49 IST 2018

3. On the host, blocked access to all bricks:

[root@voodoo8 ~]# date; iptables -A OUTPUT -p tcp -d gluster01.scl.lab.tlv.redhat.com,gluster02.scl.lab.tlv.redhat.com,gluster03.scl.lab.tlv.redhat.com -j DROP
Tue Mar 6 19:44:29 IST 2018

4. On the host running the vm, we could see that the sanlock lease was removed after about 80 seconds

5. The vm was paused:

[root@voodoo8 ~]# virsh -r list
 Id Name State
----------------------------------------------------
 1 v42-ha-glusterfs paused

6. The engine started the vm on another host at 19:46:58

Lilach, can you confirm that blocking access to storage was not correct in your test?
(In reply to Sahina Bose from comment #19)
> Sas, could you provide a reproducer to Pranith as asked for in Comment 11

Sahina,

I have tried to reproduce with qemu-kvm-rhev & gluster with the following steps:

1. Created 3 VMs to run RHGS 3.3.1 async, and created a replica 3 volume
2. Provisioned 2 baremetal machines with RHEL 7.5 RC
3. Fuse mounted the replica 3 volume on the 2 physical hosts provisioned in step 2
4. Created a VM and started OS installation
5. While installation was in progress, turned off the network (systemctl stop network) [later repeated the experiment with powering off the node abruptly]
6. Momentarily, when I ran 'qemu-io' on the same disk image from another host, I saw the lock:

# qemu-io /mnt/test/rhel75.img
can't open device /mnt/test/rhel75.img: Failed to get "write" lock
Is another process using the image?

But on the very next attempt, the lock was released and I could access the file.

I have also tried to rerun the test by powering off the host abruptly, and by blocking all IO from one of the hosts to all gluster bricks. In all such cases, I haven't seen any locks.
(In reply to SATHEESARAN from comment #30)
> I have tried to reproduce with qemu-kvm-rhev & gluster with the following
> steps.
>
> 1. Created 3 VMs to run RHGS 3.3.1 async, and create replica 3 volume
> 2. Provisioned 2 baremetal machines with RHEL 7.5 RC
> 3. Fuse mount replica 3 volume on 2 physical hosts provisioned in 2
> 4. Created VM and started OS installation
> 5. While installation is in progress, turned off network ( systemctl stop
> network ), [later repeated the experiment with powering off the node
> abruptly]
> 6. Momentarily when I did 'qemu-io' on the same disk image from another
> host, I saw the lock
>
> # qemu-io /mnt/test/rhel75.img
> can't open device /mnt/test/rhel75.img: Failed to get "write" lock
> Is another process using the image?
>
> But in the very next attempt, the lock was released and I could access the
> file.
>
> I have also tried to rerun the test by powering off the host abruptly,
> blocking all IO from one of the host to all gluster bricks. In all such
> cases, I haven't seen any locks

Tested with:
glusterfs-3.8.4-54.el7rhgs
qemu-kvm-rhev-2.10.0-21.el7.x86_64
libvirt-daemon-driver-qemu-3.9.0-13.el7.x86_64
qemu-kvm-common-rhev-2.10.0-21.el7.x86_64
qemu-img-rhev-2.10.0-21.el7.x86_64
kernel-3.10.0-858.el7.x86_64
(In reply to Nir Soffer from comment #27)
> Created attachment 1404933 [details]
> locking test script
>
> I tested locking behavior when a host is powered off:
>
> Run lock_test.py on host 1:
>
> # python lock_test.py glusterfs
> locking '/rhev/data-center/mnt/glusterSD/gluster01.scl.lab.tlv.redhat.com:_nir1/lock'
> got the lock in 0.00 seconds, sleeping
>
> Run same command on another host:
>
> # date; python lock_test.py glusterfs
> Tue Mar 6 18:19:10 IST 2018
> locking '/rhev/data-center/mnt/glusterSD/gluster01.scl.lab.tlv.redhat.com:_nir1/lock'
>
> [Power off host 1 here]

This indicates a clear situation where the client's tcp connection with the server is lost (the client being idle is a different case).

> got the lock in 38.92 seconds, sleeping
> So it seems that glusterfs break the lock 40 seconds after a client stop
> communicating with the server.

Locks are not flushed unless the server receives an RPCSVC_EVENT_DISCONNECT event after a client disconnection. I think the delay in receiving the *_DISCONNECT event on the server is due to the socket-keepalive interval following the client being powered off (the client may not get enough time for graceful socket cleanup).

Are you able to match the time at which the host was powered off with entries in the brick logs under /var/log/glusterfs/bricks/ containing the keywords "..disconnecting connection from .."? If you see a time difference between the two, then my assumption is confirmed.
(In reply to Nir Soffer from comment #29)
> I tested locking behavior when blocking access to glusterfs storage.
>
> The gluster volume contains 3 bricks:
>
> Volume73.Brick1.hostname: gluster01.scl.lab.tlv.redhat.com
> Volume73.Brick2.hostname: gluster02.scl.lab.tlv.redhat.com
> Volume73.Brick3.hostname: gluster03.scl.lab.tlv.redhat.com
>
> 1. I started the locking test script (see attachment 1404933 [details]) on
> one host:
>
> [root@voodoo8 ~]# date; python lock_test.py glusterfs
> Tue Mar 6 19:27:26 IST 2018
> locking '/rhev/data-center/mnt/glusterSD/gluster01.scl.lab.tlv.redhat.com:_nir1/lock'
> got the lock in 0.02 seconds, sleeping
>
> 2. Then started same script trying to acquire the lock on another host:
>
> [root@voodoo6 ~]# date; python lock_test.py glusterfs
> Tue Mar 6 19:27:43 IST 2018
> locking '/rhev/data-center/mnt/glusterSD/gluster01.scl.lab.tlv.redhat.com:_nir1/lock'
>
> 3. Then blocked access to *all* bricks on the first host:
>
> [root@voodoo8 ~]# date; iptables -A OUTPUT -p tcp -d gluster01.scl.lab.tlv.redhat.com,gluster02.scl.lab.tlv.redhat.com,gluster03.scl.lab.tlv.redhat.com -j DROP
> Tue Mar 6 19:27:41 IST 2018
> [root@voodoo8 ~]# iptables -L
> ...
> Chain OUTPUT (policy ACCEPT)
> target prot opt source destination
> ACCEPT udp -- anywhere anywhere udp dpt:6081
> DROP tcp -- anywhere gluster01.scl.lab.tlv.redhat.com
> DROP tcp -- anywhere gluster02.scl.lab.tlv.redhat.com
> DROP tcp -- anywhere gluster03.scl.lab.tlv.redhat.com
>
> 4. After about 40 seconds, the second host got the lock:
>
> got the lock in 36.63 seconds, sleeping
>
> So it seems that test described in comment 0 was inccorrect.
>
> I did the same test with running vm:
>
> 1. Started vm on a host
>
> 2. On engine host, blocked access to the host:
>
> [root@voodoo4 ~]# date; iptables -A OUTPUT -p tcp -d voodoo8.tlv.redhat.com -j DROP
> Tue Mar 6 19:43:49 IST 2018
>
> 3. On the host, block access to all bricks:
>
> [root@voodoo8 ~]# date; iptables -A OUTPUT -p tcp -d gluster01.scl.lab.tlv.redhat.com,gluster02.scl.lab.tlv.redhat.com,gluster03.scl.lab.tlv.redhat.com -j DROP
> Tue Mar 6 19:44:29 IST 2018
>
> 4. On the host running the vm, we could see that sanlock lease was removed
> after about 80 seconds
>
> 5. The vm was paused:
>
> [root@voodoo8 ~]# virsh -r list
>  Id Name State
> ----------------------------------------------------
>  1 v42-ha-glusterfs paused
>
> 6. Engine started the vm on another host at 19:46:58
>
> Lilach, can you confirm that blocking access to storage was not correct in
> your test?

Yes, you're right. In the automation test it blocked only one brick instead of all. I'll run it again, this time blocking all of the bricks.
Lilach, any updates?
(In reply to Yaniv Kaul from comment #34)
> Lilach, any updates?

Tested on 4.2.2.2-0.1

I see that with the revised test (see details below) the VM is moved to another host successfully.

1) Block all 3 gluster bricks on the VDSM host running the VM

Blocked bricks:
gluster01.scl.lab.tlv.redhat.com
gluster02.scl.lab.tlv.redhat.com
gluster03.scl.lab.tlv.redhat.com

2) Block the connection between the engine and the VDSM host running the VM

Result: the VM is moved to another host successfully.
(In reply to Avihai from comment #35)

Based on this I think we can safely close this as NOTABUG.
Fine by me