Bug 1369418

Summary: [z-stream clone - 4.0.3] [InClusterUpgrade] Possible race condition with large amount of VMs in cluster
Product: Red Hat Enterprise Virtualization Manager Reporter: rhev-integ
Component: ovirt-engine Assignee: Arik <ahadas>
Status: CLOSED CURRENTRELEASE QA Contact: sefi litmanovich <slitmano>
Severity: high Docs Contact:
Priority: high    
Version: 3.6.8 CC: achareka, ahadas, gklein, jcoscia, kshukla, lsurette, mavital, mgoldboi, michal.skrivanek, mlibra, rbalakri, rgolan, Rhev-m-bugs, sbonazzo, srevivo, tjelinek, ykaul
Target Milestone: ovirt-4.0.3 Keywords: ZStream
Target Release: 4.0.3   
Hardware: Unspecified   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: Updating the compatibility version of a cluster with many running VMs that have the guest agent installed could lead to a database deadlock that fails the update. Consequence: In some cases, such clusters could not be upgraded to a newer compatibility version. Fix: The deadlock in the database is now prevented. Result: Clusters with many running VMs that have the guest agent installed can be upgraded to a newer compatibility version.
Story Points: ---
Clone Of: 1366786 Environment:
Last Closed: 2016-09-15 08:06:42 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Virt RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1366786    
Bug Blocks:    

Comment 2 Sandro Bonazzola 2016-08-25 11:55:27 UTC
Arik, can you check the target milestone? This bug is referenced in the 4.0.3 changelog.

Comment 3 sefi litmanovich 2016-08-29 15:08:25 UTC
Verified with rhevm-4.0.3-0.1.el7ev.noarch.

Had a cluster with 144 VMs running.
Set the cluster to InClusterUpgrade; this change did not trigger an update of the VMs' configuration.
Then changed the cluster compatibility version from 3.6 to 4.0 (the hosts were 4.0 the whole time), monitored the updateVm calls by tailing the engine log, and once it finished checked in the DB that all the VMs had indeed been updated.
Repeated this several times (each time setting the cluster compatibility back to 3.6 via the DB and resetting the VMs' custom compatibility version in vm_static from '3.6' to null, a bit of "cheating" there).
Ran the upgrade 5 times; no race occurred.
Please advise if this test isn't sufficient.
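
For anyone repeating this check, the log-monitoring step above could be scripted roughly as follows. This is only a sketch under assumptions, not the procedure actually used for verification: the marker string "UpdateVmCommand" and the word "deadlock" are assumed to appear in the engine log lines, and both function names are hypothetical.

```python
import re
from typing import Iterable, Tuple

# ASSUMPTION: the engine logs each VM update with this command name.
# Adjust the marker to whatever the real log lines contain.
UPDATE_MARKER = "UpdateVmCommand"

# ASSUMPTION: a database deadlock shows up as a line mentioning "deadlock".
DEADLOCK_RE = re.compile(r"deadlock", re.IGNORECASE)


def summarize_upgrade_log(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (update_vm_lines, deadlock_lines) counted over the log lines."""
    updates = 0
    deadlocks = 0
    for line in lines:
        if UPDATE_MARKER in line:
            updates += 1
        if DEADLOCK_RE.search(line):
            deadlocks += 1
    return updates, deadlocks


def upgrade_looks_clean(lines: Iterable[str], expected_vms: int) -> bool:
    """True if no deadlock line was seen and at least one update
    line per expected VM was logged."""
    updates, deadlocks = summarize_upgrade_log(lines)
    return deadlocks == 0 and updates >= expected_vms


# Usage idea (hypothetical file name): save the part of the engine log
# covering one upgrade run to a file, then:
#
#     with open("upgrade-run.log") as fh:
#         print(upgrade_looks_clean(fh, expected_vms=144))
```

A log scan like this only complements the DB check described above; it does not replace confirming in the DB that every VM was updated.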

Comment 4 sefi litmanovich 2016-08-29 15:09:13 UTC
Didn't mention it in comment 3, but all the VMs had rhevm-guest-agent running.

Comment 5 Arik 2016-08-30 08:00:20 UTC
(In reply to sefi litmanovich from comment #3)
This test is good.