Bug 1099081 - [scale] monitoring: separate VDS and VM monitoring
Summary: [scale] monitoring: separate VDS and VM monitoring
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.3.0
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: ovirt-3.6.0-rc
Target Release: 3.6.0
Assignee: Roy Golan
QA Contact: Eldad Marciano
URL:
Whiteboard:
Depends On: 1099068
Blocks: 1081962
Reported: 2014-05-19 12:55 UTC by Roman Hodain
Modified: 2019-06-13 08:01 UTC (History)
CC List: 19 users

Fixed In Version: ovirt-engine-3.6.0_qa1
Doc Type: Bug Fix
Doc Text:
Clone Of: 1099068
Environment:
Last Closed: 2016-03-09 20:45:45 UTC
oVirt Team: Virt




Links
System: Red Hat Product Errata
ID: RHEA-2016:0376
Priority: normal
Status: SHIPPED_LIVE
Summary: Red Hat Enterprise Virtualization Manager 3.6.0
Last Updated: 2016-03-10 01:20:52 UTC

Description Roman Hodain 2014-05-19 12:55:08 UTC
+++ This bug was initially created as a clone of Bug #1099068 +++

Description of problem:

Stalling calls to VDSM from within a monitoring cycle might delay other important monitoring tasks, such as storage domain monitoring.


Version-Release number of selected component (if applicable):


How reproducible:
always - e.g. when a VM was shut down and VURTI needs to send a destroy to VDSM
and the call stalls, the whole VURTI thread is stuck

Steps to Reproduce:
1. Introduce a timeout in the destroy call and observe that domain monitoring isn't called while the call is pending

Actual results:
Other calls to VDSM cannot be made while the VdsManager lock is held and 1 of the 2 connections to VDSM is unavailable.

Expected results:
The VURTI thread shouldn't stall on calls to VDSM for VM-related operations.
VURTI should contain only VDS-related logic and thus won't need to call VDSM for other VM-related calls.
The VdsManager lock should be free while VDSM calls are in progress and not yet complete (i.e. throughout the lifetime of the network use).
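
The expected behaviour above can be illustrated with a small Java sketch. It is not the actual ovirt-engine code: the class MonitoringSketch, its methods and the event names are hypothetical, and the stalled VDSM call is simulated with a CountDownLatch. It only shows the idea that a VM-related call runs on its own thread without holding the per-host lock, so the host monitoring cycle can still proceed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch; names do not correspond to real ovirt-engine classes.
class MonitoringSketch {
    // Stand-in for the per-host VdsManager lock.
    private final ReentrantLock hostLock = new ReentrantLock();
    // VM-related calls get their own thread, separate from host monitoring.
    private final ExecutorService vmCalls = Executors.newSingleThreadExecutor();
    // Recorded events; guarded by hostLock.
    private final List<String> events = new ArrayList<>();

    // The lock is held only for the short in-memory update.
    private void record(String event) {
        hostLock.lock();
        try {
            events.add(event);
        } finally {
            hostLock.unlock();
        }
    }

    // Host-only monitoring cycle: never waits on a VM-related call.
    void hostMonitoringCycle() {
        record("domain-monitoring");
    }

    // VM-related call (e.g. destroy): the simulated remote call blocks
    // on the latch while the host lock is NOT held.
    void destroyVm(CountDownLatch stalledRemoteCall) {
        vmCalls.submit(() -> {
            try {
                stalledRemoteCall.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            record("vm-destroyed");
        });
    }

    // Returns a snapshot of the recorded events.
    List<String> events() {
        hostLock.lock();
        try {
            return new ArrayList<>(events);
        } finally {
            hostLock.unlock();
        }
    }

    // Waits for outstanding VM calls to finish.
    void shutdown() {
        vmCalls.shutdown();
        try {
            vmCalls.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        MonitoringSketch m = new MonitoringSketch();
        CountDownLatch stall = new CountDownLatch(1);
        m.destroyVm(stall);          // stalls in the background
        m.hostMonitoringCycle();     // still runs while destroy is stalled
        System.out.println(m.events()); // [domain-monitoring]
        stall.countDown();           // the "remote call" finally returns
        m.shutdown();
        System.out.println(m.events()); // [domain-monitoring, vm-destroyed]
    }
}
```

In the scenario from this report, the equivalent of destroyVm ran inside the monitoring cycle while the lock was held, so hostMonitoringCycle could not run until the stalled call returned.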

Additional info:

Comment 5 Michal Skrivanek 2015-01-14 08:22:21 UTC
done upstream.
this is definitely not recommended to push to 3.5.z

Comment 6 Allon Mureinik 2015-04-27 13:00:13 UTC
(In reply to Michal Skrivanek from comment #5)
> done upstream.
> this is definitely not recommended to push to 3.5.z

Michal - I don't see any patch references so I can't tell by myself, but is this included in the last build delivered to QE?

Comment 7 Michal Skrivanek 2015-04-28 13:35:01 UTC
if you mean 3.6 then yes.
I nack 3.5.z backport

Comment 8 Allon Mureinik 2015-04-29 08:29:39 UTC
Based on this comment:

(In reply to Michal Skrivanek from comment #7)
> if you mean 3.6 then yes.
Moving to ON_QA

> I nack 3.5.z backport
Removing 3.5.z flag

Comment 13 Nisim Simsolo 2015-08-12 11:49:34 UTC
I accidentally removed the needinfo flag from yobshans@redhat.com.
New flag created. Please look at comment 12.

Comment 14 Yuri Obshansky 2015-08-26 07:24:00 UTC
Unfortunately, we cannot reproduce this bug using a regular RHEV-M scale setup and load test. We ran load tests with 100 concurrent threads performing the REST API calls ShutdownVM and StartVM. No storage-related errors were detected during the test execution.
RHEV-M setup: 1 Data Center, 1 Cluster, 1/2 Hosts, 10 Storage Domains, 100 VMs.
The Storage Domains are NFS.
You need to provide a clearer scenario for reproducing it
(possibly from customer experience).

Comment 15 Eldad Marciano 2015-11-17 13:56:01 UTC
any updates? see comment 14

Comment 16 Allon Mureinik 2016-01-04 13:41:14 UTC
(In reply to Eldad Marciano from comment #15)
> any updates? see comment 14

I don't have a clearer scenario.
If we can't reproduce, I suggest closing based on the work done in 3.6.0.

Comment 17 Michal Skrivanek 2016-01-20 11:05:33 UTC
documented in bug 1099068

Comment 18 Gil Klein 2016-02-23 17:05:49 UTC
This bz is verified based on the verification results of bz #1099068

https://bugzilla.redhat.com/show_bug.cgi?id=1099068#c6

Comment 20 errata-xmlrpc 2016-03-09 20:45:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-0376.html

