Bug 1609565

Summary: Engine restart causes the DC to become non-responsive for a few seconds
Product: [oVirt] ovirt-engine
Reporter: Michael Burman <mburman>
Component: BLL.Storage
Assignee: Tal Nisan <tnisan>
Status: CLOSED WONTFIX
QA Contact: Raz Tamir <ratamir>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.2.5.1
CC: bugs, khakimi, ratamir, reliezer
Target Milestone: ---
Keywords: Automation, Reopened
Target Release: ---
Flags: tnisan: devel_ack-
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-10-22 08:13:13 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
engine log (Flags: none)

Description Michael Burman 2018-07-29 14:35:36 UTC
Description of problem:
Engine restart causes the DC to become non-responsive for a few seconds.

This looks like the same issue as BZ 1482454, which was closed a few months ago, but it is now 100% reproducible and happens every time the engine is restarted.
This bug is affecting automation tests and causing some of them to fail.
VMs fail to start because of the invalid DC status.

Version-Release number of selected component (if applicable):
4.2.6_SNAPSHOT-84.gad3de30.0.scratch.master.el7ev


How reproducible:
100%

Steps to Reproduce:
1. Restart engine

Actual results:
DCs become non-responsive for a few seconds. VMs fail to start with Detail: [Cannot run VM. Unknown Data Center status.]

Expected results:
Engine restart shouldn't affect the DC state at all.

Additional info:
See also BZ 1482454, which was closed as WONTFIX.
Please fix this issue, as it is affecting our tests.

Comment 1 Michael Burman 2018-07-29 14:38:14 UTC
Created attachment 1471371 [details]
engine log

Comment 2 Martin Perina 2018-07-30 14:08:26 UTC
Moving to storage team for investigation

Comment 3 Tal Nisan 2018-07-31 11:17:41 UTC
As Allon commented in the other bug, "As far as I can see, the engine just marks statuses as "unknown" until it gets confirmation that they are up." This is still true.
The automation needs to be fixed so that it starts the VM only after the DC status has been validated as up; any other way of running the automation is wrong by definition.
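
A minimal sketch of such a gate in a test, assuming a hypothetical `get_status` callable that the test framework supplies and that returns the DC status as a string (the value "up" is also an assumption, not the engine's confirmed API value):

```python
import time


def wait_for_dc_up(get_status, timeout=120.0, interval=2.0,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll get_status() until it returns "up" or the timeout expires.

    get_status is a hypothetical zero-argument callable; how it queries
    the engine is left to the test framework.
    Returns True if "up" was observed, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        if get_status() == "up":
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

The test would call `wait_for_dc_up(...)` after restarting the engine and start the VM only if it returns True.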

Comment 4 Michael Burman 2018-07-31 11:24:49 UTC
(In reply to Tal Nisan from comment #3)
> As Allon commented in the second bug "As far as I can see, the engine just
> marks statuses as "unknown" until it gets confirmation that they are up.",
> this is still true.
> The automation need to be fixed in a way that running the VM should start
> only after the DC status is validated to be up, any other way the automation
> runs is wrong by definition.

Hi,
I don't accept this answer.
The fact is that when you try to start a VM right after an engine restart, it fails because the Data Center status is unknown.
How exactly is this not a bug? You may say you won't fix it, but this is a real bug.
It makes no sense that the DCs' state changes on each engine restart.
The resolution is not acceptable to QE.

Comment 5 Michael Burman 2018-07-31 11:26:26 UTC
Raz,
Tal wants to close this report as not a bug, which is not true.
This is a bug. They may close it as won't fix, but they can't say this isn't a bug.
An engine restart should never change the DCs' state.

Comment 6 Michael Burman 2018-07-31 11:28:52 UTC
Tal,
Next time please talk to QE before closing any bug. Thanks!

Comment 7 Raz Tamir 2018-08-01 09:38:43 UTC
I strongly agree that we have a bug here.
The DC (and its entities) should not be affected by a restart of the ovirt-engine service.

Comment 8 Tal Nisan 2018-08-02 08:09:51 UTC
If this behavior is by design, then it is not a bug; we already stated that in the previous bug. The proper solution is to adjust the test to the actual flow.

Comment 9 Tal Nisan 2018-09-16 12:19:25 UTC
The data center status is moved to Non Responsive 10 seconds after Engine startup as part of the initialization flow.
We recommend adding a sleep of at least 10 seconds to all Engine restart automation test flows before checking the DC status.
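
Because the status can read as up before the initialization flow moves it to Non Responsive, a single "up" reading right after restart is not conclusive. As an alternative to a fixed sleep, a test could require the status to stay up for a continuous window longer than the 10 seconds mentioned above. This is only a sketch: `get_status` is a hypothetical callable, the status string "up" is an assumption, and the 15-second default window is an arbitrary choice.

```python
import time


def wait_for_stable_up(get_status, stable_for=15.0, timeout=180.0,
                       interval=1.0, clock=time.monotonic, sleep=time.sleep):
    """Return True once get_status() has returned "up" continuously for
    stable_for seconds; return False if timeout expires first.

    get_status is a hypothetical zero-argument callable returning the
    DC status as a string.
    """
    start = clock()
    up_since = None
    while True:
        now = clock()
        if get_status() == "up":
            if up_since is None:
                up_since = now  # first "up" reading of this window
            if now - up_since >= stable_for:
                return True
        else:
            up_since = None  # status flipped: restart the window
        if now - start >= timeout:
            return False
        sleep(interval)
```

The `clock` and `sleep` parameters exist so that the helper can be exercised with a fake clock instead of real waiting.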

Comment 10 Roni 2018-10-03 08:16:22 UTC
I also don't agree with closing this BZ; it is definitely an Engine bug.
By default, on start and before the Engine has done its validation, all Engine states should be set to None, Unstable, etc.

The real bug title should be: "Engine returns a fake DC status on start"

Comment 11 Kobi Hakimi 2018-10-09 06:39:45 UTC
I don't like sleep as a solution in tests.
If we need to add it, we should list this issue as a limitation of our product in the release notes, the same as I wrote in:
https://bugzilla.redhat.com/show_bug.cgi?id=1608790#c10

As a client, when I write a script that checks the DC status after a restart and gets an Up status, I expect to continue with my operations, NOT to wait while all the objects change their status in the background.

It looks like an unreliable product when it returns one status and then changes it without any notice or any other operation being performed.

Comment 12 Tal Nisan 2018-10-22 08:13:13 UTC
The system has undergone a restart and requires initialization time; this is the flow.

Comment 13 Raz Tamir 2018-11-07 09:21:06 UTC
A fix without a sleep is in, and it seems we are working around this bug.