Bug 954301

Summary: [TEXT] Event "rhevm failed to elect SPM for a DC" is misleading
Product: Red Hat Enterprise Virtualization Manager Reporter: Ilanit Stein <istein>
Component: ovirt-engine Assignee: Mooli Tayer <mtayer>
Status: CLOSED CURRENTRELEASE QA Contact: Pavel Stehlik <pstehlik>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 3.2.0 CC: abaron, acathrow, amureini, bazulay, iheim, istein, jkt, lpeer, mtayer, pstehlik, Rhev-m-bugs, yeylon, yzaslavs
Target Milestone: ---   
Target Release: 3.3.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: infra
Fixed In Version: is7 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Infra RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
south05 none
south01 none
south07 none

Description Ilanit Stein 2013-04-22 08:18:55 UTC
Description of problem:

In the following 2 scenarios, the [1] "rhevm failed to elect SPM for a DC" event is not generated:

1.
A single data center with 2 hosts, one of which was SPM, and 1 NFS storage domain. I put the SPM host into maintenance; then, while the second host was in 'contending' status, I blocked it from the storage server (iptables DROP).
The DC/host remained in that state for 12 min.
Then the host status became normal, along with a "Failed to Reconstruct Master Domain for Data Center Default." event.

2.
A single data center with 2 hosts and 1 NFS data domain.
All are up, and 1 of the hosts is SPM.
After blocking the storage server on both hosts with iptables DROP:
The host that was SPM became non-operational, as it could not connect to the
service.
The second host contends and eventually becomes SPM, while the storage
domain itself goes to an unknown state.

Comments:

1. The [1] event is generated only if there is a network exception that prevents the connection to vdsm and, in addition, fencing fails for some reason (single host in the data center).
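
For illustration, the condition described in the comment above can be sketched as a small predicate. This is a minimal sketch, not the engine's code: the class and function names are hypothetical, and treating scenarios 1 and 2 as "vdsm still reachable" is an assumption based on the fact that only the storage server was blocked.

```python
from dataclasses import dataclass


@dataclass
class SpmElectionAttempt:
    """Hypothetical summary of one failed SPM election attempt."""
    vdsm_network_exception: bool  # engine could not connect to vdsm on the host
    fencing_failed: bool          # fencing the host failed (e.g. single host in DC)


def emits_failed_to_elect_spm(attempt: SpmElectionAttempt) -> bool:
    """Return True only when both conditions from the comment hold."""
    return attempt.vdsm_network_exception and attempt.fencing_failed


# Assumption: in scenarios 1 and 2 only storage is blocked, so vdsm stays reachable.
storage_blocked = SpmElectionAttempt(vdsm_network_exception=False, fencing_failed=False)
# Case named in the comment: vdsm unreachable and fencing fails.
vdsm_unreachable = SpmElectionAttempt(vdsm_network_exception=True, fencing_failed=True)

print(emits_failed_to_elect_spm(storage_blocked))
print(emits_failed_to_elect_spm(vdsm_unreachable))
```

Under these assumptions the storage-blocked scenarios never satisfy the predicate, which matches the observation that the event is not generated for them.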

Version-Release number of selected component (if applicable):
SF13.1

How reproducible:
always

Comment 1 Liron Aravot 2013-06-10 11:27:55 UTC
Ilanit,
I'm not sure what we want to solve here, can you please elaborate?
From what I can see, that AuditLog is shown only in a very specific case (and possibly we should edit its text regardless).
In case of a problem, the engine will try to select an SPM indefinitely, so I guess we don't want to flood the log with that event. Otherwise, at what phase do we need to log it? It seems to me it would just annoy the user and flood the event log.
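
One way to avoid the flooding concern raised above would be to report the failure once per failure streak. The following is a hypothetical sketch only; the class and method names are invented for illustration and do not correspond to the engine's AuditLog code.

```python
class SpmFailureEventThrottle:
    """Hypothetical helper: allow one failure event per data center per streak."""

    def __init__(self) -> None:
        # Data centers for which a failure event was already reported.
        self._reported = set()

    def on_election_failed(self, data_center: str) -> bool:
        """Return True if the failure event should be logged now."""
        if data_center in self._reported:
            return False  # already reported during this failure streak
        self._reported.add(data_center)
        return True

    def on_election_succeeded(self, data_center: str) -> None:
        """Reset the streak so the next failure is reported again."""
        self._reported.discard(data_center)


throttle = SpmFailureEventThrottle()
first = throttle.on_election_failed("Default")    # first failure: log it
second = throttle.on_election_failed("Default")   # retry failed again: stay quiet
throttle.on_election_succeeded("Default")         # an SPM was elected
third = throttle.on_election_failed("Default")    # new streak: log it again
```

With this approach the indefinite retries produce a single event until the data center recovers.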

Comment 2 Ilanit Stein 2013-06-11 06:43:47 UTC
The main point of this bug is that there is no consistency in when the "rhevm failed to elect SPM for a DC" event is sent:
For a network exception that prevents the connection to vdsm, we send it.
On the other hand, for the 2 cases described in the bug description, we don't.
Why should it differ?

If, for example, this event is used for event notification, the user will be notified for some cases of SPM election failure and not for others.

Comment 6 Ilanit Stein 2013-09-08 13:18:15 UTC
What is the scenario to generate this message, please?

Comment 7 David Botzer 2013-09-11 08:15:51 UTC
"Fencing failed on Storage Pool Manager for
Data Center ${StoragePoolName}. Setting status to Non-Operational"

How can I reproduce the scenario that generates the above error?

Comment 8 Yair Zaslavsky 2013-09-11 08:28:57 UTC
David - where do you see this? Do you see ${StoragePoolName} in the UI?
If so, this is a different bug.

Comment 9 David Botzer 2013-09-11 08:54:15 UTC
Hi,

The message I should see, and now can see, is:
Fencing failed on Storage Pool Manager south-01.xx.xx.xx.com for Data Center DC33-is12-Tabs. Setting status to Non-Operational.

But what scenario should I use?

Comment 10 David Botzer 2013-09-11 09:14:11 UTC
3.3/is12 & is13
----------------------
The main problem I see, disregarding the text issue,
is the recovery process - it fails.

With 2 hosts, 1 DC:
I removed all iptables rules
I tried "Select SPM" - failed
     "Host south-01.lab.bos.redhat.com was force selected by admin@internal"
Only the first host that was SPM succeeds eventually
- Even after I recover host1 to SPM,
  I try to "Select SPM" for the second, and it fails
   "Host south-01.lab.bos.redhat.com was force selected by admin@internal"
- After a couple of tries it succeeds in becoming SPM

With 1 host, 1 DC:
I removed all iptables rules
I tried "Select SPM" - failed
The host is active, but -
   "Failed to connect Host south-05.lab.bos.redhat.com to Storage Servers"
   "Failed to Reconstruct Master Domain for Data Center DC33-is13-BOS."

Comment 11 David Botzer 2013-09-11 09:14:59 UTC
Created attachment 796301 [details]
south05

Comment 12 David Botzer 2013-09-11 09:15:41 UTC
Created attachment 796302 [details]
south01

Comment 13 David Botzer 2013-09-11 09:16:27 UTC
Created attachment 796303 [details]
south07

Comment 15 Itamar Heim 2014-01-21 22:22:58 UTC
Closing - RHEV 3.3 Released

Comment 16 Itamar Heim 2014-01-21 22:24:06 UTC
Closing - RHEV 3.3 Released

Comment 17 Itamar Heim 2014-01-21 22:27:50 UTC
Closing - RHEV 3.3 Released