
Bug 840838

Summary:

[rhevm] [engine-core] reconstruct should be called in case fenceSpm() failed

Product:

Red Hat Enterprise Virtualization Manager

Reporter:

Haim <hateya>

Component:

ovirt-engine

Assignee:

Liron Aravot <laravot>

Status:

CLOSED WONTFIX

QA Contact:

Haim <hateya>

Severity:

high

Docs Contact:

Priority:

unspecified

Version:

3.1.0

CC:

abaron, amureini, dyasny, ewarszaw, fsimonce, hateya, iheim, lpeer, Rhev-m-bugs, sgrinber, yeylon, ykaul

Target Milestone:

---

Target Release:

3.1.5

Hardware:

x86_64

OS:

Linux

Whiteboard:

storage

Fixed In Version:

Doc Type:

Bug Fix

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2012-11-07 10:00:29 UTC

Type:

Bug

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

Storage

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Attachments:

Description	Flags
engine and vdsm logs	none

Description Haim 2012-07-17 11:03:05 UTC

Description of problem:

My case:

- 2 hosts: host A (SPM) and host B (HSM); host B is not connected (in maintenance)
- host A became non-responsive
- activate host B; connectStoragePool fails because it cannot find the master domain
- in web-admin, right-click host A >> 'confirm host has been rebooted'

The action fails on the vdsm side with the following error:

Thread-1909::INFO::2012-07-17 16:49:37,742::logUtils::37::dispatcher::(wrapper) Run and protect: fenceSpmStorage(spUUID='cacbdf16-d006-11e1-b98a-001a4a16970e', lastOwner=None, lastLver=None, options=None)
Thread-1909::ERROR::2012-07-17 16:49:37,742::task::853::TaskManager.Task::(_setError) Task=`bd036ea7-e322-4fe2-b51a-b014af6763cc`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 861, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 38, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 2976, in fenceSpmStorage
    pool = self.getPool(spUUID)
  File "/usr/share/vdsm/storage/hsm.py", line 263, in getPool
    raise se.StoragePoolUnknown(spUUID)
StoragePoolUnknown: Unknown pool id, pool not connected: ('cacbdf16-d006-11e1-b98a-001a4a16970e',)

The only option left in this scenario is to call reconstruct on the available domains; fenceSpm() will succeed afterwards.
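
The requested fallback can be sketched roughly as follows. This is only an illustration of the control flow, not the engine's or vdsm's actual code: `fence_spm_with_fallback`, `fence_spm`, and `reconstruct_master` are hypothetical names, and the local `StoragePoolUnknown` class is a stand-in for the vdsm error in the traceback above.

```python
class StoragePoolUnknown(Exception):
    """Stand-in for the vdsm StoragePoolUnknown error (assumption, not the real class)."""


def fence_spm_with_fallback(fence_spm, reconstruct_master, pool_id):
    """Try to fence the SPM; if the pool is not connected, reconstruct and retry once.

    fence_spm and reconstruct_master are hypothetical callables supplied by
    the caller; each takes the storage pool UUID.
    """
    try:
        return fence_spm(pool_id)
    except StoragePoolUnknown:
        # The host is not connected to the pool, so fencing cannot proceed.
        # Reconstruct the master on the domains that are still available,
        # then retry the fence exactly once.
        reconstruct_master(pool_id)
        return fence_spm(pool_id)
```

The retry is attempted only once, so a second StoragePoolUnknown would propagate to the caller rather than loop.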

Comment 1 Haim 2012-07-17 11:09:14 UTC

Created attachment 598605
engine and vdsm logs

Comment 3 Ayal Baron 2012-08-06 09:54:53 UTC

Indeed, when only one host is available (being activated) and the connect failed, the engine can try to reconstruct.

Comment 4 Federico Simoncelli 2012-09-27 09:52:54 UTC

I'm not sure what flow/use case fenceSpm was meant to cover, but I can no longer see its point. If none of the active hosts can reach the master domain, we should elect one to run reconstruct (as suggested in comment 3).

Comment 5 Ayal Baron 2012-09-27 10:11:37 UTC

From checking the code and trying to reproduce, it appears that the system does fix itself after a couple of minutes.
Considering the above, and the fact that changing the behaviour requires really complex and risky changes, I will close the bug as WONTFIX unless you can show that the system does indeed reach a state it cannot get out of.

Comment 6 Haim 2012-10-11 11:35:23 UTC

(In reply to comment #5)
> From checking the code + trying to reproduce it appears that the system does
> fix itself after a couple of minutes.
> Considering the above and the fact that changing the behaviour requires
> really complex and risky changes I will close the bug as wontfix unless you
> can show that the system does indeed reach a state it cannot get out of.

Well, it can't. The only thing I can do is the recovery procedure, which will destroy my other domains.