Description of problem:
Engine restart causes the DC to become non-responsive for a few seconds.
It looks like the same issue as BZ 1482454, which was closed a few months ago, but it is now 100% reproducible and happens every time the engine is restarted.
This bug affects automation tests and causes some of them to fail: VMs fail to start because of the invalid DC status.

Version-Release number of selected component (if applicable):
4.2.6_SNAPSHOT-84.gad3de30.0.scratch.master.el7ev

How reproducible:
100%

Steps to Reproduce:
1. Restart the engine.

Actual results:
DCs become non-responsive for a few seconds. VMs can't start, failing with:
Detail: [Cannot run VM. Unknown Data Center status.]

Expected results:
Engine restart shouldn't affect the DC state at all.

Additional info:
See also BZ 1482454, closed as WONTFIX.
Please fix this issue, as it is affecting our tests.
Created attachment 1471371: engine log
Moving to storage team for investigation
As Allon commented on the other bug, "As far as I can see, the engine just marks statuses as "unknown" until it gets confirmation that they are up." This is still true. The automation needs to be fixed so that the VM is started only after the DC status has been validated as Up; any other way of running the automation is wrong by definition.
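A minimal sketch of the "start the VM only after the DC is validated as Up" flow, in Python. `get_status` is a hypothetical callable standing in for whatever the automation uses to read the DC status (for example, a query through the oVirt SDK); the lowercase status strings, the 60-second timeout, and the 2-second interval are assumptions for illustration, not values taken from the engine.

```python
import time
from typing import Callable


def wait_for_dc_status(
    get_status: Callable[[], str],
    expected: str = "up",
    timeout: float = 60.0,
    interval: float = 2.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll get_status() until it returns `expected` or `timeout` seconds pass.

    Returns True if the expected status was observed, False on timeout.
    `clock` and `sleep` are injectable so the helper can be tested
    without real waiting.
    """
    deadline = clock() + timeout
    while True:
        if get_status() == expected:
            return True
        if clock() >= deadline:
            return False
        # Not there yet: wait one interval and ask again.
        sleep(interval)
```

A test would call `wait_for_dc_status(...)` right after restarting the engine and only start the VM if it returns True, failing with a clear timeout message otherwise.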
(In reply to Tal Nisan from comment #3)
> As Allon commented in the second bug "As far as I can see, the engine just
> marks statuses as "unknown" until it gets confirmation that they are up.",
> this is still true.
> The automation need to be fixed in a way that running the VM should start
> only after the DC status is validated to be up, any other way the automation
> runs is wrong by definition.

Hi,
I don't accept this answer. The fact is that when you try to start a VM right after an engine restart, it fails with "Data center status is unknown". How exactly is this not a bug? You may say you won't fix this, but this is a real bug. It makes no sense that the DCs' state changes on every engine restart. This resolution is not acceptable to QE.
Raz, Tal wants to close this report as not a bug, which is not true. This is a bug. It may be closed as won't fix, but it can't be said that this isn't a bug. An engine restart should never change the DCs' state.
Tal, next time please talk to QE before closing a bug. Thanks!
I strongly agree that we have a bug here. The DC (and its entities) should not be affected by a restart of the ovirt-engine service.
If this behavior is by design, then it is not a bug; we already stated that in the previous bug. The proper solution is to adjust the test to the actual flow.
As part of the initialization flow, the data center status is moved to Non Responsive 10 seconds after engine startup. We recommend adding a sleep of at least 10 seconds to all engine-restart automation test flows before checking the DC status.
I also don't agree with closing this BZ; it is definitely an engine bug. By default, all engine states should be set to None, Unstable, etc. on start, before the engine has done its validation. The real bug title should be: "Engine returns a fake DC status on start"
I don't like sleep as a solution in tests. If we need to add it, this issue should be documented as a limitation of our product in the release notes, as I wrote in comment 10 of another bug (https://bugzilla.redhat.com/show_bug.cgi?id=1608790#c10).
As a client, when I write a script that checks the DC status after a restart and gets Up, I expect to continue with my operations, NOT to wait while all the objects change their status in the background. A product that returns one status and then changes it without any notice, and without any other operation having been done, looks unreliable.
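Because of the transient Up status described in the two comments above, a plain "wait until Up" check can return too early. A hypothetical alternative, sketched below in Python, proceeds only once the status has stayed Up for a settling window and fails explicitly on timeout. The `settle` default of 15 seconds is an assumption chosen to exceed the roughly 10-second initialization window mentioned earlier in this thread; `get_status` and the status strings are placeholders, not engine or SDK names.

```python
import time
from typing import Callable, Optional


def wait_for_stable_status(
    get_status: Callable[[], str],
    expected: str = "up",
    settle: float = 15.0,
    timeout: float = 120.0,
    interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once get_status() has reported `expected` continuously
    for at least `settle` seconds; return False if that does not happen
    within `timeout` seconds."""
    start = clock()
    stable_since: Optional[float] = None
    while clock() - start < timeout:
        now = clock()
        if get_status() == expected:
            if stable_since is None:
                stable_since = now  # first read in a run of `expected`
            if now - stable_since >= settle:
                return True
        else:
            stable_since = None  # status flipped; restart the settling window
        sleep(interval)
    return False
```

Unlike a fixed sleep, this returns as soon as the status has been stable long enough, but it still only works around the transient status rather than removing it.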
The system has undergone a restart and requires initialization time; this is the flow.
A fix without a sleep is in, and it seems we have a workaround (W/A) for this bug.