Bug 1099081

Summary: [scale] monitoring: separate VDS and VM monitoring
Product: Red Hat Enterprise Virtualization Manager
Reporter: Roman Hodain <rhodain>
Component: ovirt-engine
Assignee: Roy Golan <rgolan>
Status: CLOSED ERRATA
QA Contact: Eldad Marciano <emarcian>
Severity: high
Docs Contact:
Priority: high
Version: 3.3.0
CC: agkesos, amureini, bazulay, gklein, iheim, istein, jentrena, lpeer, michal.skrivanek, mkalinin, nsimsolo, rbalakri, rgolan, Rhev-m-bugs, scohen, sherold, tdosek, yeylon, yobshans
Target Milestone: ovirt-3.6.0-rc
Target Release: 3.6.0
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version: ovirt-engine-3.6.0_qa1
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1099068
Environment:
Last Closed: 2016-03-09 20:45:45 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Virt
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Bug Depends On: 1099068
Bug Blocks: 1081962

Description Roman Hodain 2014-05-19 12:55:08 UTC
+++ This bug was initially created as a clone of Bug #1099068 +++

Description of problem:

Stalling calls to VDSM from within a monitoring cycle might delay other important monitoring work, such as storage domain monitoring.


Version-Release number of selected component (if applicable):


How reproducible:
always - e.g. when a VM is shut down and VURTI needs to send a destroy to VDSM and the call stalls, the whole VURTI thread is stuck

Steps to Reproduce:
1. Create some timeout in the destroy call and observe that domain monitoring isn't being called in the meantime (see the sketch below).
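To make the failure mode concrete, here is a minimal self-contained Java sketch, not the actual engine code (class and method names are made up for illustration): a monitoring cycle holds the per-host lock while a blocking destroy call stalls, so storage domain monitoring that needs the same lock cannot run until the call returns.

import java.util.concurrent.TimeUnit;

// Illustrative model of the reported problem, not the real oVirt engine code.
public class MonitoringStallSketch {

    private final Object vdsManagerLock = new Object();

    // Step 1 of the reproducer: make the "destroy" call take a long time,
    // as if the connection to VDSM had stalled.
    private void destroyVmBlocking() throws InterruptedException {
        TimeUnit.SECONDS.sleep(30); // simulated network stall
    }

    // Runs inside the VURTI-style refresh cycle, under the per-host lock.
    public void refreshVmData() throws InterruptedException {
        synchronized (vdsManagerLock) {
            destroyVmBlocking(); // VM-related call made while the lock is held
        }
    }

    // Storage domain monitoring needs the same lock, so it cannot start
    // until refreshVmData() returns -- this is the observed delay.
    public void monitorStorageDomains() {
        synchronized (vdsManagerLock) {
            System.out.println("storage domain monitoring ran");
        }
    }

    public static void main(String[] args) throws Exception {
        MonitoringStallSketch sketch = new MonitoringStallSketch();
        Thread vmCycle = new Thread(() -> {
            try {
                sketch.refreshVmData();
            } catch (InterruptedException ignored) {
            }
        });
        vmCycle.start();
        Thread.sleep(100);              // let the VM cycle grab the lock first
        sketch.monitorStorageDomains(); // blocks for ~30s, illustrating the bug
    }
}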

Actual results:
Other calls to VDSM cannot be made while the VdsManager lock is held and one of the two connections to VDSM is unavailable.

Expected results:
The VURTI thread shouldn't stall on calls to VDSM for VM-related operations.
VURTI shall contain only VDS-related logic and thus won't need to call VDSM for other VM-related calls.
The VdsManager lock should be free while VDSM calls are in progress and not yet complete (i.e. throughout the lifetime of the network call).
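A rough sketch of the direction these expected results describe, assuming a hypothetical executor-based split (names below are illustrative, not the actual 3.6 implementation): VM-related VDSM calls run off the monitoring thread, and the per-host lock is only taken briefly to apply results, never across the network call.

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative sketch of the expected behaviour: VM-related VDSM calls are
// dispatched asynchronously so the per-host lock is only held while local
// state is updated, never across the network call itself.
public class SeparatedMonitoringSketch {

    private final Object vdsManagerLock = new Object();
    private final ExecutorService vmCallExecutor = Executors.newCachedThreadPool();

    // Hypothetical stand-in for the VDSM "destroy" verb.
    private void destroyVmOverNetwork(String vmId) {
        // network call; may stall, but no engine lock is held here
    }

    // VDS-only refresh: takes the lock briefly and never calls VDSM for VMs.
    public void refreshVdsData() {
        synchronized (vdsManagerLock) {
            // update host statistics / status only
        }
    }

    // VM monitoring issues its VDSM call outside the lock and only re-takes
    // the lock (briefly) to apply the result.
    public CompletableFuture<Void> destroyVm(String vmId) {
        return CompletableFuture
                .runAsync(() -> destroyVmOverNetwork(vmId), vmCallExecutor)
                .thenRun(() -> {
                    synchronized (vdsManagerLock) {
                        // apply the resulting VM state change
                    }
                });
    }
}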

Additional info:

Comment 5 Michal Skrivanek 2015-01-14 08:22:21 UTC
done upstream.
this is definitely not recommended to push to 3.5.z

Comment 6 Allon Mureinik 2015-04-27 13:00:13 UTC
(In reply to Michal Skrivanek from comment #5)
> done upstream.
> this is definitely not recommended to push to 3.5.z

Michal - I don't see any patch references so I can't tell by myself, but is this included in the last build delivered to QE?

Comment 7 Michal Skrivanek 2015-04-28 13:35:01 UTC
if you mean 3.6 then yes.
I nack 3.5.z backport

Comment 8 Allon Mureinik 2015-04-29 08:29:39 UTC
Based on this comment:

(In reply to Michal Skrivanek from comment #7)
> if you mean 3.6 then yes.
Moving to ON_QA

> I nack 3.5.z backport
Removing 3.5.z flag

Comment 13 Nisim Simsolo 2015-08-12 11:49:34 UTC
I accidentally removed the needinfo flag from yobshans.
New flag created. Please look at comment 12.

Comment 14 Yuri Obshansky 2015-08-26 07:24:00 UTC
Unfortunately, we cannot reproduce this bug using a regular RHEV-M scale setup and load test. We ran load tests with 100 concurrent threads performing the REST API calls ShutdownVM and StartVM. No storage-related errors were detected during the test execution.
RHEV-M setup: 1 Data Center, 1 Cluster, 1/2 Hosts, 10 Storage Domains, 100 VMs.
Storage Domain is NFS.
You need to provide a clearer scenario for how to reproduce it (possibly from customer experience).
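The kind of load driver described above could look roughly like the Java sketch below. The /vms/{id}/shutdown and /vms/{id}/start action paths follow the oVirt REST API, but the engine URL, credentials, VM ids and pool size are placeholders, and certificate handling is omitted.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Base64;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Rough sketch of a load driver issuing concurrent shutdown/start actions
// against the engine REST API. Host name, credentials and VM ids are placeholders.
public class VmRestLoadSketch {

    static final String ENGINE = "https://engine.example.com/ovirt-engine/api";
    static final String AUTH = "Basic " + Base64.getEncoder()
            .encodeToString("admin@internal:password".getBytes());

    static void postAction(HttpClient client, String vmId, String action) throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(ENGINE + "/vms/" + vmId + "/" + action))
                .header("Authorization", AUTH)
                .header("Content-Type", "application/xml")
                .POST(HttpRequest.BodyPublishers.ofString("<action/>"))
                .build();
        client.send(request, HttpResponse.BodyHandlers.discarding());
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        ExecutorService pool = Executors.newFixedThreadPool(100); // 100 concurrent threads
        List<String> vmIds = List.of("vm-0001", "vm-0002" /* ... */);
        for (String vmId : vmIds) {
            pool.submit(() -> {
                try {
                    postAction(client, vmId, "shutdown");
                    postAction(client, vmId, "start");
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
        }
        pool.shutdown();
    }
}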

Comment 15 Eldad Marciano 2015-11-17 13:56:01 UTC
any updates? see comment 14

Comment 16 Allon Mureinik 2016-01-04 13:41:14 UTC
(In reply to Eldad Marciano from comment #15)
> any updates? see comment 14

I don't have a clearer scenario.
If we can't reproduce, I suggest closing based on the work done in 3.6.0.

Comment 17 Michal Skrivanek 2016-01-20 11:05:33 UTC
documented in bug 1099068

Comment 18 Gil Klein 2016-02-23 17:05:49 UTC
This bz is verified based on the verification results of bz #1099068

https://bugzilla.redhat.com/show_bug.cgi?id=1099068#c6

Comment 20 errata-xmlrpc 2016-03-09 20:45:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-0376.html