Bug 1227665 - Symbolic link of Gluster domain is not recreated once the domain is activated from failure
Summary: Symbolic link of Gluster domain is not recreated once the domain is activated from failure
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm
Classification: oVirt
Component: General
Version: ---
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ovirt-4.0.0-rc
Target Release: 4.18.0
Assignee: Ala Hino
QA Contact: Kevin Alon Goldblatt
URL:
Whiteboard:
Depends On: 1271771
Blocks:
 
Reported: 2015-06-03 09:11 UTC by lkuchlan
Modified: 2016-08-01 12:27 UTC
CC List: 15 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-08-01 12:27:25 UTC
oVirt Team: Storage
Embargoed:
rule-engine: ovirt-4.0.0+
rule-engine: planning_ack+
rule-engine: devel_ack+
acanan: testing_ack+


Attachments
logs+image (1.13 MB, application/x-gzip)
2015-06-03 09:11 UTC, lkuchlan

Description lkuchlan 2015-06-03 09:11:54 UTC
Created attachment 1034199 [details]
logs+image

Description of problem:
Had 2 Gluster domains and a RHEL VM with 2 disks, one on each domain. While writing to one of the disks, I stopped the volume backing one of the domains on the Gluster server (gluster volume stop). This caused the VM to pause. I then started the volume again, and the domain was reported as active once more. After that I tried to start the VM and it failed. The root cause was that vdsm did not recreate the symbolic link to the mount point of the Gluster domain after the domain was activated again.

Version-Release number of selected component (if applicable):
ovirt-engine-3.6.0-0.0.master.20150519172219.git9a2e2b3.el6.noarch
vdsm-4.17.0-822.git9b11a18.el7.noarch

How reproducible:
100%

Steps to Reproduce:

setup:
2 Gluster storage domains

1. Create a VM + 1 disk and install an OS
2. Add a second disk from the other domain and write to it using a dd operation
3. While the write is in progress, stop the second domain's volume from the Gluster server (gluster volume stop GlusterDomain2) and wait until the VM is paused
4. From the Gluster server, start the volume (gluster volume start GlusterDomain2)
5. Once the domain is active, try to start the VM

Actual results:
Failed to run the VM. Checking under /rhev/data-center, the symbolic link does not exist.

Failure in vdsm.log:
Thread-392780::ERROR::2015-06-03 10:36:05,414::vm::741::vm.Vm::(_startUnderlyingVm) vmId=`884277c1-2d89-40ea-b23a-650b7812a229`::The vm start process failed


From engine.log:

2015-06-03 10:36:08,131 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-12-thread-2) [] Correlation ID: 66711c01, Job ID: a63e9547-fa8f-40c3-a9dc-916c2b0efc72, Call Stack: null, Custom Event ID: -1, Message: Failed to run VM vmGluster (User: admin@internal).
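
For anyone re-checking this state, here is a minimal diagnostic sketch in Python (the pool and domain UUIDs below are hypothetical placeholders, not values from the attached logs; it assumes the usual /rhev/data-center/<pool-uuid>/<domain-uuid> link layout):

#!/usr/bin/env python
# Illustrative diagnostic only: reports whether the per-data-center symlink
# for a storage domain is missing, dangling, or healthy.
import os

POOL_UUID = "00000001-0001-0001-0001-000000000001"    # hypothetical data-center UUID
DOMAIN_UUID = "11111111-1111-1111-1111-111111111111"  # hypothetical Gluster domain UUID

link = os.path.join("/rhev/data-center", POOL_UUID, DOMAIN_UUID)

if not os.path.islink(link):
    print("missing symlink: %s" % link)
elif not os.path.exists(link):
    print("dangling symlink: %s -> %s" % (link, os.readlink(link)))
else:
    print("symlink OK: %s -> %s" % (link, os.readlink(link)))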

Expected results:
The symbolic link of the Gluster domain should be created with its activation
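
To illustrate what "created with its activation" would mean in practice, a conceptual sketch follows (this is not vdsm's actual code path; the UUIDs and mount point are hypothetical) that recreates the link once the Gluster domain is fuse-mounted again:

import os

POOL_UUID = "00000001-0001-0001-0001-000000000001"    # hypothetical
DOMAIN_UUID = "11111111-1111-1111-1111-111111111111"  # hypothetical
MOUNT = "/rhev/data-center/mnt/glusterSD/gluster-server:_GlusterDomain2"  # hypothetical mount point

target = os.path.join(MOUNT, DOMAIN_UUID)
link = os.path.join("/rhev/data-center", POOL_UUID, DOMAIN_UUID)

if os.path.lexists(link):
    os.unlink(link)       # drop any stale or dangling link left over from the failure
os.symlink(target, link)  # recreate the link so the VM's disk path resolves again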

Comment 1 Allon Mureinik 2015-06-04 12:35:00 UTC
This sounds very familiar. Adam - didn't you handle something similar in 3.5?

Comment 2 Adam Litke 2015-06-17 13:48:29 UTC
(In reply to Allon Mureinik from comment #1)
> This sounds very familiar. Adam - didn't you handle something similar in 3.5?

I don't think so but I'll take a look anyway :)

Comment 3 Red Hat Bugzilla Rules Engine 2015-10-19 10:53:46 UTC
Target release should be set once a package build is known to fix an issue. Since this bug is not in MODIFIED status, the target version has been reset. Please use the target milestone to plan a fix for an oVirt release.

Comment 4 Sandro Bonazzola 2015-10-26 12:37:46 UTC
This is an automated message. oVirt 3.6.0 RC3 has been released and GA is targeted for next week, Nov 4th 2015.
Please review this bug and, if it is not a blocker, please postpone it to a later release.
All bugs not postponed by the GA release will be automatically re-targeted to:

- 3.6.1 if severity >= high
- 4.0 if severity < high

Comment 5 Yaniv Lavi 2015-10-29 09:16:22 UTC
Will the fuse mount of multiple Gluster nodes resolve the issue here as well?

Comment 6 Ala Hino 2015-10-29 09:35:35 UTC
Not really.
Assuming no changes have been made in this area, and considering the changes I made to fix the multiple-server mount issue, we will still have this problem.
Keep in mind that the fix supports replica 1 and replica 3, so even with that fix: if the volume is replica 1 we will fail, and if the volume is replica 3 we will hit the issue after stopping all replicas.

Comment 7 Allon Mureinik 2016-01-13 14:34:44 UTC
Idan/Ala, isn't this a subset of bug 1271771?

Comment 8 Ala Hino 2016-01-17 14:23:17 UTC
This is a subset of bug 1271771, but there is some extra work to do.

After some investigation we found that the issue is only related to Gluster.
Tried the following tests:

1. Created a Gluster storage domain backed by a Gluster volume and created a VM with a disk on that SD. Then stopped the volume ==> the VM paused, and it stayed paused even after the volume was started again.

2. Created a POSIX-compliant FS domain using a Gluster volume and tried the same steps as before ==> the VM successfully resumed from its paused state after the Gluster volume was started.

Comment 9 Sandro Bonazzola 2016-05-02 09:55:28 UTC
Moving from 4.0 alpha to 4.0 beta since 4.0 alpha has already been released and the bug is not ON_QA.

Comment 10 Yaniv Lavi 2016-05-23 13:16:58 UTC
oVirt 4.0 beta has been released, moving to RC milestone.

Comment 11 Yaniv Lavi 2016-05-23 13:23:25 UTC
oVirt 4.0 beta has been released, moving to RC milestone.

Comment 12 Ala Hino 2016-05-25 11:56:38 UTC
Cannot reproduce. Probably fixed as a result of the work done in the Gluster area.

Comment 13 Kevin Alon Goldblatt 2016-07-21 12:51:19 UTC
Tested with the following code:
---------------------------------------
vdsm-4.18.4-2.el7ev.x86_64
rhevm-4.0.2-0.2.rc1.el7ev.noarch

Tested using the following scenario:
---------------------------------------
1. Create a VM + 1 disk and install an OS
2. Add a second disk from the other domain and write to it using a dd operation
3. While the write is in progress, stop the second domain's volume from the Gluster server (gluster volume stop GlusterDomain2) and wait until the VM is paused
4. From the Gluster server, start the volume (gluster volume start GlusterDomain2)
5. Once the domain is active, try to start the VM >>>>> VM starts successfully!

Actual results:
VM started successfully.
The symbolic link to the Gluster storage domain is intact.


Moving to VERIFIED!

