Bug 1369415

Summary: [z-stream clone - 3.6.9] [InClusterUpgrade] Possible race condition with large amount of VMs in cluster
Product: Red Hat Enterprise Virtualization Manager Reporter: rhev-integ
Component: ovirt-engine Assignee: Arik <ahadas>
Status: CLOSED ERRATA QA Contact: sefi litmanovich <slitmano>
Severity: high Docs Contact:
Priority: high    
Version: 3.6.8 CC: achareka, ahadas, fdelorey, gklein, jcoscia, kshukla, lsurette, mavital, melewis, mgoldboi, michal.skrivanek, mlibra, mtessun, rbalakri, Rhev-m-bugs, srevivo, ykaul
Target Milestone: ovirt-3.6.9 Keywords: ZStream
Target Release: ---   
Hardware: Unspecified   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Previously, updating the compatibility version of a cluster with many running virtual machines that had the guest agent installed caused a database deadlock, and the update failed. In some cases, these clusters could not be upgraded to a newer compatibility version. Now, the database deadlock is prevented, so a cluster with many running virtual machines that have the guest agent installed can be upgraded to a newer compatibility version.
Story Points: ---
Clone Of: 1366786 Environment:
Last Closed: 2016-09-21 18:06:00 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Virt RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1366786    
Bug Blocks:    

Comment 2 sefi litmanovich 2016-08-29 17:02:24 UTC
Verified with rhevm-3.6.9-0.1.el6.noarch.

Had a cluster with 126 VMs running.
Changed the cluster compatibility version from 3.5 to 3.6 (hosts were 3.6 the whole time) and monitored the updateVm calls with tail on the engine log.
Repeated this several times (each time setting the cluster compatibility back to 3.5 via the DB).
Ran the upgrade 5 times; no race occurred.
Please advise if this test isn't sufficient.
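
For anyone repeating this check, watching the log by eye could be supplemented with a small script. The following is only a minimal sketch: the match strings ("UpdateVm", "deadlock") and the log file path passed on the command line are assumptions, not taken from the actual engine log format, and should be adjusted to what the log really contains.

```python
import re
import sys
from collections import Counter
from typing import Iterable

# Hypothetical patterns: the exact wording of engine log lines is an
# assumption and should be adapted to the real log output.
PATTERNS = {
    "update_vm": re.compile(r"UpdateVm", re.IGNORECASE),
    "deadlock": re.compile(r"deadlock", re.IGNORECASE),
}


def summarize(lines: Iterable[str]) -> Counter:
    """Count how many lines match each named pattern."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


# Only runs when a log file path is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as log:
        print(dict(summarize(log)))
```

With patterns that match the real log, a run would be expected to show a nonzero update count and a zero deadlock count after each compatibility version change.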

Comment 3 sefi litmanovich 2016-08-30 09:04:47 UTC
I see this test was approved in the 4.0.4 version of this bug. Thanks.

Comment 5 errata-xmlrpc 2016-09-21 18:06:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-1929.html

Comment 6 Frank DeLorey 2016-10-26 20:12:50 UTC
I have a customer running 3.6.9 that just hit this in a cluster that only has 30 VMs. I am gathering all the data and will open a new BZ.

Regards,

Frank

Comment 8 Frank DeLorey 2016-10-27 10:06:40 UTC
This is happening on every attempt to upgrade the cluster. I will grab the engine log and post the related errors into this BZ.

Frank

Comment 9 Michal Skrivanek 2016-10-29 06:14:34 UTC
(In reply to Frank DeLorey from comment #8)
> This is happening on every attempt to upgrade the cluster. I will grab the
> engine log and post the related errors into this BZ.
> 
> Frank

It is a different issue, now tracked in bug 1389764.