Bug 954301 - [TEXT] Event "rhevm failed to elect SPM for a DC" is misleading
Summary: [TEXT] Event "rhevm failed to elect SPM for a DC" is misleading
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.2.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 3.3.0
Assignee: Mooli Tayer
QA Contact: Pavel Stehlik
URL:
Whiteboard: infra
Depends On:
Blocks:
 
Reported: 2013-04-22 08:18 UTC by Ilanit Stein
Modified: 2016-02-10 19:00 UTC (History)
13 users

Fixed In Version: is7
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
oVirt Team: Infra
Target Upstream Version:
Embargoed:


Attachments
south05 (962.10 KB, application/x-gzip)
2013-09-11 09:14 UTC, David Botzer
no flags
south01 (263.67 KB, application/x-gzip)
2013-09-11 09:15 UTC, David Botzer
no flags
south07 (460.55 KB, application/x-gzip)
2013-09-11 09:16 UTC, David Botzer
no flags


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 17407 0 None None None Never

Description Ilanit Stein 2013-04-22 08:18:55 UTC
Description of problem:

In the following 2 scenarios, the "rhevm failed to elect SPM for a DC" event [1] is not generated:

1.
A single data center with 2 hosts, one of them the SPM, and 1 NFS storage domain. I put the SPM host into maintenance; then, while the second host was in 'Contending' status, I blocked it from the storage server (iptables DROP).
The DC/host remained in that state for 12 minutes.
Then the host status became normal, along with a "Failed to Reconstruct Master Domain for Data Center Default." event.

2.
A single data center with 2 hosts and 1 NFS data domain.
All up, and 1 of the hosts is the SPM.
Blocking the storage server on both hosts by iptables DROP:
the host that was SPM becomes Non Operational, as it cannot connect to the service.
The second host contends and eventually becomes SPM, while the storage domain itself moves to an Unknown state.
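The storage outage in both scenarios above can be simulated with iptables. A minimal sketch follows; the storage server address 10.0.0.5 is a placeholder (substitute your NFS server), and the commands only echo by default so they can be reviewed before being applied as root:

```shell
#!/bin/sh
# Simulate losing the NFS storage server, as in the scenarios above.
# STORAGE_SERVER is a placeholder address, not taken from the bug.
STORAGE_SERVER=${STORAGE_SERVER:-10.0.0.5}
RUN=${RUN:-echo}   # dry run by default; set RUN="" to apply (needs root)

# Drop all outbound packets from this host to the storage server:
$RUN iptables -A OUTPUT -d "$STORAGE_SERVER" -j DROP

# After observing the SPM contention/failover behaviour, undo the rule:
$RUN iptables -D OUTPUT -d "$STORAGE_SERVER" -j DROP
```

With RUN="" this blocks the host's traffic to the storage server, matching the "iptables DROP" step in the description.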

Comments:

1. The event [1] is generated only if there is a network exception that prevents the connection to VDSM and, in addition, fencing fails for some reason (a single host in the data center).

Version-Release number of selected component (if applicable):
SF13.1

How reproducible:
always

Comment 1 Liron Aravot 2013-06-10 11:27:55 UTC
Ilanit,
I'm not sure what we want to solve here; can you please elaborate?
From what it seems, that AuditLog is shown only in a very specific case (and possibly we should edit its text regardless).
In case of a problem, the engine will try to select an SPM indefinitely, so I guess we don't want to flood the log with that event; and at what phase would we need to log it? It seems to me it would just be annoying to the user and would flood the event log.

Comment 2 Ilanit Stein 2013-06-11 06:43:47 UTC
The main point of this bug is that there is no consistency in when the "rhevm failed to elect SPM for a DC" event is sent:
for a network exception that prevents the connection to VDSM, we send it;
on the other hand, for the 2 cases described in the bug description, we don't.
Why should it differ?

If, for example, this event is reported in event notifications, then for some cases of SPM election failure the user will be notified, and for others not.

Comment 6 Ilanit Stein 2013-09-08 13:18:15 UTC
What is the scenario to generate this message, please?

Comment 7 David Botzer 2013-09-11 08:15:51 UTC
"Fencing failed on Storage Pool Manager for
Data Center ${StoragePoolName}. Setting status to Non-Operational"

How can I reproduce the scenario that generates the above error?

Comment 8 Yair Zaslavsky 2013-09-11 08:28:57 UTC
David - where do you see this? Do you see ${StoragePoolName} in the UI ?
If so, this is a different bug.

Comment 9 David Botzer 2013-09-11 08:54:15 UTC
Hi,

The message I should see, and now can see, is:
Fencing failed on Storage Pool Manager south-01.xx.xx.xx.com for Data Center DC33-is12-Tabs. Setting status to Non-Operational.

But what scenario should I use ?

Comment 10 David Botzer 2013-09-11 09:14:11 UTC
3.3/is12 & is13
----------------------
The main problem I see, disregarding the text issue,
is the recovery process: it fails.

With two hosts, 1 DC:
I removed all iptables rules.
I tried "Select SPM" - failed:
     "Host south-01.lab.bos.redhat.com was force selected by admin@internal"
Only the first host, which was SPM, eventually succeeds.
- Even after I recover host1 to SPM,
  I try "Select SPM" for the second host, and it fails:
   "Host south-01.lab.bos.redhat.com was force selected by admin@internal"
- After a couple of tries it succeeds in becoming SPM.

With 1 host, 1 DC:
I removed all iptables rules.
I tried "Select SPM" - failed.
The host is active, but:
   "Failed to connect Host south-05.lab.bos.redhat.com to Storage Servers"
   "Failed to Reconstruct Master Domain for Data Center DC33-is13-BOS."
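The recovery step above ("I removed all iptables rules") can be sketched as follows. The address 10.0.0.5 and the use of showmount to verify NFS reachability are assumptions for illustration, not taken from the bug; commands echo by default:

```shell
#!/bin/sh
# Undo the simulated outage and check that the storage server is
# reachable again before retrying "Select SPM" in the UI.
STORAGE_SERVER=${STORAGE_SERVER:-10.0.0.5}   # placeholder address
RUN=${RUN:-echo}   # dry run by default; set RUN="" to apply (needs root)

# Flush the filter-table rules added during the test:
$RUN iptables -F

# Confirm NFS exports are visible from this host again:
$RUN showmount -e "$STORAGE_SERVER"
```

Only once the exports are visible again is a "Select SPM" retry expected to have a chance of succeeding.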

Comment 11 David Botzer 2013-09-11 09:14:59 UTC
Created attachment 796301 [details]
south05

Comment 12 David Botzer 2013-09-11 09:15:41 UTC
Created attachment 796302 [details]
south01

Comment 13 David Botzer 2013-09-11 09:16:27 UTC
Created attachment 796303 [details]
south07

Comment 15 Itamar Heim 2014-01-21 22:22:58 UTC
Closing - RHEV 3.3 Released

Comment 16 Itamar Heim 2014-01-21 22:24:06 UTC
Closing - RHEV 3.3 Released

Comment 17 Itamar Heim 2014-01-21 22:27:50 UTC
Closing - RHEV 3.3 Released

