Created attachment 1034199 [details]
logs+image

Description of problem:
Had 2 Gluster domains and a RHEL VM with 2 disks, one on each domain. While writing to one of the disks, I stopped the volume (gluster volume stop) on the gluster server that one of the domains resides on. This caused the VM to become paused. I then started the volume and the domain was reported as active once again. After that I tried to start the VM and it failed. The main problem is that vdsm did not recreate the symbolic link to the mount point of the Gluster domain after it was activated again.

Version-Release number of selected component (if applicable):
ovirt-engine-3.6.0-0.0.master.20150519172219.git9a2e2b3.el6.noarch
vdsm-4.17.0-822.git9b11a18.el7.noarch

How reproducible:
100%

Steps to Reproduce:
Setup: 2 Gluster storage domains
1. Create a VM with 1 disk and install an OS
2. Add a second disk from the other domain and write to it with a dd operation
3. While the write is in progress, stop the second domain's volume from the Gluster server (gluster volume stop GlusterDomain2) and wait until the VM is paused
4. From the Gluster server, start the volume (gluster volume start GlusterDomain2)
5. Once the domain is active, try to start the VM

Actual results:
Failed to run the VM. Checked under /rhev/data-center; the symbolic link doesn't exist.

Failure in vdsm.log:
Thread-392780::ERROR::2015-06-03 10:36:05,414::vm::741::vm.Vm::(_startUnderlyingVm) vmId=`884277c1-2d89-40ea-b23a-650b7812a229`::The vm start process failed

From engine.log:
2015-06-03 10:36:08,131 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-12-thread-2) [] Correlation ID: 66711c01, Job ID: a63e9547-fa8f-40c3-a9dc-916c2b0efc72, Call Stack: null, Custom Event ID: -1, Message: Failed to run VM vmGluster (User: admin@internal).

Expected results:
The symbolic link of the Gluster domain should be recreated when the domain is activated again
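For reference, the disk write and the volume stop/start in steps 2-4 were done roughly as follows. The guest disk device (/dev/vdb) and the count are examples and may differ on other setups; the volume name is the one used here.

Inside the guest, write to the second disk:
  # dd if=/dev/zero of=/dev/vdb bs=1M count=4096 oflag=direct

On the Gluster server, stop the volume while the write is running, and start it again later:
  # gluster volume stop GlusterDomain2
  # gluster volume start GlusterDomain2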
This sounds very familiar. Adam - didn't you handle something similar in 3.5?
(In reply to Allon Mureinik from comment #1) > This sounds very familiar. Adam - didn't you handle something similar in 3.5? I don't think so but I'll take a look anyway :)
Target release should be placed once a package build is known to fix an issue. Since this bug is not in MODIFIED status, the target version has been reset. Please use the target milestone to plan a fix for an oVirt release.
This is an automated message. oVirt 3.6.0 RC3 has been released and GA is targeted for next week, Nov 4th 2015. Please review this bug and, if it is not a blocker, postpone it to a later release. All bugs not postponed by the GA release will be automatically re-targeted to:
- 3.6.1 if severity >= high
- 4.0 if severity < high
Will the fuse mount of multiple Gluster nodes resolve the issue here as well?
Not really. Assuming no changes have been made in this area, and considering the changes I made to fix the "mount multiple servers" issue, we will still hit this problem. Keep in mind that that fix only supports replica 1 and replica 3, so even with it: if the volume is replica 1 we will fail, and if the volume is replica 3 we will still hit this issue after stopping all the replicas.
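For reference, the volume's replica count can be checked on the Gluster server (volume name is an example from this setup):
  # gluster volume info GlusterDomain2
The "Type" and "Number of Bricks" lines (e.g. "1 x 3 = 3") show whether the volume is replica 1 or replica 3.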
Idan/Ala, isn't this a subset of bug 1271771?
This is a subset of bug 1271771, but there is some extra work to do. After some investigation we found that the issue is only related to Gluster storage domains. We tried the following tests:
1. Created a Gluster storage domain on a gluster volume and created a VM with a disk on that SD. Then stopped the volume ==> the VM paused, and it stayed paused even after the volume was started again.
2. Created a POSIX compliant FS domain backed by a gluster volume and repeated the same steps ==> the VM successfully resumed from its paused state after the gluster volume was started.
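For reference, the POSIX compliant FS domain in test 2 is typically configured in the New Domain dialog with something like the following (server and volume names here are examples, not the exact values used in the test):
  Path:     gluster-server.example.com:/GlusterDomain2
  VFS Type: glusterfs
which vdsm mounts with the equivalent of:
  # mount -t glusterfs gluster-server.example.com:/GlusterDomain2 /rhev/data-center/mnt/<mount-dir>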
Moving from 4.0 alpha to 4.0 beta since 4.0 alpha has already been released and this bug is not ON_QA.
oVirt 4.0 beta has been released, moving to RC milestone.
Cannot reproduce. Probably fixed as a result of the work done in the Gluster area.
Tested with the following code:
---------------------------------------
vdsm-4.18.4-2.el7ev.x86_64
rhevm-4.0.2-0.2.rc1.el7ev.noarch

Tested using the following scenario:
---------------------------------------
1. Create a VM with 1 disk and install an OS
2. Add a second disk from the other domain and write to it with a dd operation
3. While the write is in progress, stop the second domain's volume from the Gluster server (gluster volume stop GlusterDomain2) and wait until the VM is paused
4. From the Gluster server, start the volume (gluster volume start GlusterDomain2)
5. Once the domain is active, try to start the VM
>>>>> VM starts successfully!

Actual results:
VM started successfully.
The symbolic link to the gluster storage domain is intact.

Moving to VERIFIED!
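For reference, the symlink was checked on the host under /rhev/data-center; the exact mount directory and UUIDs depend on the setup and are shown here as placeholders:
  # ls -l /rhev/data-center/mnt/glusterSD/
  # ls -l /rhev/data-center/<data-center-uuid>/
Both the glusterSD mount directory and the storage domain symlink under the data center UUID directory should be present after the domain is reactivated.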