Bug 1093924 - Connect to storage and refresh pool when a domain returns visible
Summary: Connect to storage and refresh pool when a domain returns visible
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.4.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.5.0
Assignee: Liron Aravot
QA Contact: Kevin Alon Goldblatt
URL:
Whiteboard: storage
Depends On: 1119852 1121420
Blocks: 1102782 rhev3.5beta 1156165
 
Reported: 2014-05-03 11:35 UTC by Federico Simoncelli
Modified: 2016-02-10 19:38 UTC
CC List: 10 users

Fixed In Version: vt1.3
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 1102782
Environment:
Last Closed: 2015-02-16 19:08:49 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 27523 0 master MERGED core: introducing host immediate domain recovery mechanism Never

Description Federico Simoncelli 2014-05-03 11:35:36 UTC
Description of problem:
When a data domain is inactive, it is possible to activate new hosts even if they're not able to connect to the relevant storage.

When the storage becomes visible again, these new hosts will be moved to NonOperational (since they failed to connect at activation time).

Currently this resolves itself after another 5 minutes thanks to autorecovery, but preventing the situation in the first place would be best.

The solution would be to try to reconnect to the storage and send a refreshStoragePool when the domain moves from Inactive to Active.
This approach would also fix bug 1086210 (and a few other similar ones).
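
A minimal sketch of that flow in Python, assuming a hypothetical host handle that exposes the two VDSM verbs named here (connectStorageServer, refreshStoragePool); this illustrates the proposed ordering, not the actual ovirt-engine code:

    import logging

    log = logging.getLogger("domain-recovery")

    def on_domain_became_active(domain, pool, hosts):
        # Sketch: react to a domain moving from Inactive to Active.
        for host in hosts:
            if host.status != "Up":
                continue
            try:
                # Re-issue the mount/login: the initial attempt may have
                # failed while the domain was unreachable (this bz's case).
                host.connectStorageServer(domain.storage_type, pool.id,
                                          domain.connections)
                # Rebuild the pool metadata and domain links on the host
                # (covers the bug 1086210 case as well).
                host.refreshStoragePool(pool.id, pool.master_domain_id,
                                        pool.master_version)
            except Exception as err:
                # If this fails, fall back to the existing 5-minute
                # autorecovery rather than failing the activation.
                log.warning("recovery on host %s failed: %s", host.name, err)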

Version-Release number of selected component (if applicable):
rhevm-backend-3.4.0-0.16.rc.el6ev.noarch.rpm

How reproducible:
100%

Steps to Reproduce:
I tested this using NFS storage domains, so I suggest starting the reproduction with those and then moving on to block domains.

1. activate 1 host and 2 data storage domains: DomainA (master) DomainB (regular)
2. block connectivity to DomainB (no reconstructMaster; e.g. with the iptables sketch after this list), wait for the domain to become Inactive
3. activate a second host (it must not be able to reach DomainB as well)
4. restore connectivity to DomainB on both hosts
5. the second host is not connected to DomainB and within 5 minutes will be moved to NonOperational
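
For steps 2 and 4, a simple way to block and restore connectivity is an iptables rule on each host; a minimal sketch, where the storage server address is hypothetical:

    import subprocess

    # Hypothetical address of the server backing DomainB; adjust to the setup.
    DOMAIN_B_SERVER = "192.0.2.10"

    def block_domain_b():
        # Step 2: drop all traffic to DomainB's storage server (run as root).
        subprocess.run(
            ["iptables", "-A", "OUTPUT", "-d", DOMAIN_B_SERVER, "-j", "DROP"],
            check=True)

    def restore_domain_b():
        # Step 4: remove the rule so DomainB becomes reachable again.
        subprocess.run(
            ["iptables", "-D", "OUTPUT", "-d", DOMAIN_B_SERVER, "-j", "DROP"],
            check=True)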

Actual results:
The second host is not connected to DomainB and within 5 minutes is moved to NonOperational.

Expected results:
As soon as DomainB is visible again, the engine should make sure that the hosts are connected to DomainB and send a refreshStoragePool.

Additional info:

Comment 1 Liron Aravot 2014-05-04 10:25:00 UTC
In that scenario, the host shouldn't move to NonOperational, as we still have domain monitoring on the unreachable domain (which will fail to produce it).
So the host will remain Up even after the domain connection returns in that scenario, as we'll then manage to produce the domain.

The issue is that we won't have a link to the domain, so operations related to it (like accessing disks on that domain from that specific host) will fail.

Fede, it seems to me that the solution is to always create the links. That would solve this issue, and it would also put hosts where the link was created before the domain became unreachable in the same situation as hosts that were connected to the pool later on. Calling refresh from the engine, on every host, for each domain that returns, just to rebuild the links seems unneeded.
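
A rough sketch of what "always create the links" could look like on the vdsm side; the /rhev/data-center layout and the helper below are assumptions for illustration, not the actual patch:

    import os

    # vdsm's link tree; the exact layout is assumed here for illustration.
    RHEV_DC = "/rhev/data-center"

    def ensure_domain_link(pool_uuid, domain_uuid, mount_path):
        # Create (or refresh) the pool-level symlink to a domain even while
        # the domain is unreachable, so it is usable as soon as the storage
        # returns.
        link = os.path.join(RHEV_DC, pool_uuid, domain_uuid)
        target = os.path.join(mount_path, domain_uuid)
        os.makedirs(os.path.dirname(link), exist_ok=True)
        if os.path.islink(link):
            os.unlink(link)  # drop a stale link before recreating it
        os.symlink(target, link)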

Comment 2 Federico Simoncelli 2014-05-04 11:37:48 UTC
(In reply to Liron Aravot from comment #1)
> In that scenario, the host shouldn't move to NonOperational, as we still
> have domain monitoring on the unreachable domain (which will fail to
> produce it).

I am not sure what "that" scenario is, but I assume it's the one this bug is referring to.

The domain monitoring is there, but the mountpoint is *not* mounted, because the storage is not reachable when "mount" (connectStorageServer) is issued on the second host.

> So the host will remain Up even after the domain connection returns in that
> scenario, as we'll then manage to produce the domain.

We won't be able to produce the domain unless connectStorageServer is issued once again, since the host failed to mount it while it was unreachable.

> The issue is that we won't have a link to the domain, so operations related
> to it (like accessing disks on that domain from that specific host) will
> fail.

That's a different problem, and it is not worth solving on its own: even if we have the links, the share is not mounted.
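
To illustrate the point: a host can only produce the domain once the share is actually mounted, so a dangling link buys nothing. A tiny check, with hypothetical paths:

    import os

    def can_produce_domain(mountpoint, domain_uuid):
        # True only if the share is mounted and the domain directory is
        # visible; a symlink pointing into an unmounted mountpoint still
        # fails.
        return (os.path.ismount(mountpoint)
                and os.path.isdir(os.path.join(mountpoint, domain_uuid)))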

> Fede, it seems to me that the solution is to always create the links. That
> would solve this issue, and it would also put hosts where the link was
> created before the domain became unreachable in the same situation as hosts
> that were connected to the pool later on. Calling refresh from the engine,
> on every host, for each domain that returns, just to rebuild the links
> seems unneeded.

Agreed: calling refreshStoragePool without connectStorageServer may solve bug 1086210, but not the one in this bz, so on its own it's unneeded.

However, since we need to cover this bz's scenario (which includes bug 1086210), we may as well fix both at once.

Comment 6 Kevin Alon Goldblatt 2014-08-17 08:51:49 UTC
Ran the scenario above. Both hosts reconnect successfully to the storage when it becomes available again. Moving to Verified.

Comment 7 Kevin Alon Goldblatt 2014-08-17 08:56:25 UTC
The GetStoragePool function now updates the status of the domain.

Comment 8 Allon Mureinik 2015-02-16 19:08:49 UTC
RHEV-M 3.5.0 has been released, closing this bug.

