Bug 1198248 - [performance] bad getVMList output creates unnecessary calls from Engine
Summary: [performance] bad getVMList output creates unnecessary calls from Engine
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.5.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: ---
Target Release: 3.5.1
Assignee: Piotr Kliczewski
QA Contact: Eldad Marciano
URL:
Whiteboard: virt
Duplicates: 1198680
Depends On: 1196327 1196735 1202360 1203305
Blocks: 1193058
Reported: 2015-03-03 16:10 UTC by Michal Skrivanek
Modified: 2022-07-09 07:08 UTC (History)

Fixed In Version: vt14.1
Doc Type: Bug Fix
Doc Text:
Clone Of: 1196735
Environment:
Last Closed: 2015-04-28 18:49:22 UTC
oVirt Team: ---
Target Upstream Version:
Embargoed:



Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2015:0888 0 normal SHIPPED_LIVE Moderate: Red Hat Enterprise Virtualization Manager 3.5.1 update 2015-04-28 22:40:04 UTC
oVirt gerrit 38172 0 None None None Never
oVirt gerrit 38271 0 None None None Never
oVirt gerrit 38304 0 None None None Never
oVirt gerrit 38321 0 None None None Never
oVirt gerrit 38322 0 None None None Never
oVirt gerrit 38448 0 ovirt-engine-3.5.2 MERGED getVMList: introducing onlyUUID parameter Never
oVirt gerrit 38462 0 master MERGED getVMList: introducing onlyUUID parameter Never
oVirt gerrit 38463 0 ovirt-engine-3.5 MERGED getVMList: introducing onlyUUID parameter Never

Comment 1 Michal Skrivanek 2015-03-03 16:11:08 UTC
all in

Comment 4 Gil Klein 2015-03-05 19:06:47 UTC
Verification results were mistakenly added to the VDSM clone BZ.

See: https://bugzilla.redhat.com/show_bug.cgi?id=1196735#c11

build vt13.12 successfully installed with no crashes or abnormal behavior, and the following method statistics benchmark was taken.

build vt13.7 vs build vt13.12 (both engine & host).

use case:
sampling 600sec 1 host 1 vm.


method invocations count results:

build vt13.7 reproducing the bug.
19 - getAllVmStats
76 - list
76 - getVmStats 

-----------------------------
build vt13.12 fixing the bug.
20 - getAllVmStats
116 - list
0 - getVmStats

Comment 5 Gil Klein 2015-03-06 06:01:19 UTC
Failed QE, as this fix has introduced a regression that is now documented in new BZ #1198680

Comment 8 Eyal Edri 2015-03-10 08:36:39 UTC
moving back to modified until build will be delivered.

Comment 9 Eyal Edri 2015-03-10 15:29:57 UTC
clearing failedQA since a new build will be delivered with vt13.14

Comment 10 Gil Klein 2015-03-11 09:02:54 UTC
I've set up a VT13.11 engine with the latest vdsm build, VT13.14, which should fix this issue. I no longer see the error we saw with the VT13.12 build (host moved to "ERROR" state), but I keep seeing this error in vdsm.log:

Thread-964::ERROR::2015-03-11 09:52:44,842::__init__::493::jsonrpc.JsonRpcServer::(_serveRequest) Internal server error
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/yajsonrpc/__init__.py", line 488, in _serveRequest
    res = method(**params)
  File "/usr/share/vdsm/rpc/Bridge.py", line 278, in _dynamicMethod
    ret = retfield(result)
  File "/usr/share/vdsm/rpc/Bridge.py", line 341, in Host_getVMList_Ret
    return [v['vmId'] for v in ret['vmList']]
TypeError: string indices must be integers, not str
Thread-964::DEBUG::2015-03-11 09:52:44,843::stompReactor::163::yajsonrpc.StompServer::(send) Sending response
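
The TypeError in the traceback above is what Python raises when a string is indexed with a string key. That is consistent with `ret['vmList']` already holding UUID strings rather than dicts, although the log alone does not prove the payload shape. A minimal sketch of the two shapes, where `extract_vm_ids`, the sample payloads, and the UUID value are all hypothetical and not taken from vdsm:

```python
def extract_vm_ids(vm_list):
    """Return VM UUIDs from a list of dicts or a list of UUID strings.

    Hypothetical helper for illustration only; not part of vdsm.
    """
    ids = []
    for entry in vm_list:
        if isinstance(entry, dict):
            # Full-entry shape: the UUID lives under 'vmId'.
            ids.append(entry['vmId'])
        else:
            # UUID-only shape: the entry is already the UUID string.
            ids.append(entry)
    return ids


# Assumed payload shapes with a made-up UUID.
full = {'vmList': [{'vmId': 'aaaa-1111', 'status': 'Up'}]}
only_uuid = {'vmList': ['aaaa-1111']}

# The comprehension from the traceback fails on the UUID-only shape.
try:
    [v['vmId'] for v in only_uuid['vmList']]
except TypeError as exc:
    print('reproduced TypeError:', exc)

print(extract_vm_ids(full['vmList']))
print(extract_vm_ids(only_uuid['vmList']))
```

Both calls to the helper return the same list of UUIDs, which is the property a caller needs if it may see either shape.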

Comment 11 Michal Skrivanek 2015-03-12 13:30:55 UTC
this is not a failure of engine, no other changes needed there, only in the complementary vdsm bug 1196735
so moving back to ON_QA

Comment 12 Eldad Marciano 2015-03-12 15:37:14 UTC
verified.
build vt13.15 applied on engine & vdsm both.

use case:
sampling 500 sec 1 host 1 vm.


method invocations count results:

16 - getAllVmStats
61 - list
0 - getVmStats
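
The runs in this bug used different sampling windows (600 sec in comment 4, 500 sec here), so the raw counts are easier to compare as per-minute rates. A small sketch of that arithmetic, using only the counts reported in the comments; the `per_minute` helper and the `runs` layout are illustrative names, not part of any tool used in the verification:

```python
def per_minute(count, seconds):
    """Convert an invocation count over a sampling window to calls per minute."""
    return count * 60.0 / seconds


# Counts copied from comments 4 and 12: (window in seconds, method counts).
runs = {
    'vt13.7': (600, {'getAllVmStats': 19, 'list': 76, 'getVmStats': 76}),
    'vt13.12': (600, {'getAllVmStats': 20, 'list': 116, 'getVmStats': 0}),
    'vt13.15': (500, {'getAllVmStats': 16, 'list': 61, 'getVmStats': 0}),
}

for build, (seconds, counts) in runs.items():
    rates = {name: round(per_minute(n, seconds), 2) for name, n in counts.items()}
    print(build, rates)
```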

Comment 13 Francesco Romani 2015-03-12 16:42:56 UTC
Please also check MOM is working properly after this change. If not, we'll need a MOM update.

Comment 14 Gil Klein 2015-03-16 11:47:07 UTC
Re-opening, due to possible incompatibility issue with MOM,  based on:

https://bugzilla.redhat.com/show_bug.cgi?id=1196735#c20

If no code change is needed, feel free to move this BZ back to ON_QA

Comment 15 Oved Ourfali 2015-03-16 13:04:34 UTC
(In reply to Gil Klein from comment #14)
> Re-opening, due to possible incompatibility issue with MOM,  based on:
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1196735#c20
> 
> If no code change is needed, feel free to move this BZ back to ON_QA

Shouldn't we create a bug on MOM, and move this one back to MODIFIED?

Comment 16 Gil Klein 2015-03-16 13:08:02 UTC
(In reply to Oved Ourfali from comment #15)
> (In reply to Gil Klein from comment #14)
> > Re-opening, due to possible incompatibility issue with MOM,  based on:
> > 
> > https://bugzilla.redhat.com/show_bug.cgi?id=1196735#c20
> > 
> > If no code change is needed, feel free to move this BZ back to ON_QA
> 
> Shouldn't we create a bug on MOM, and move this one back to MODIFIED?
Yes, that's possible. Do you know the exact flow that breaks MOM, so we can try it out and document it in a new BZ?

Comment 17 Oved Ourfali 2015-03-16 13:11:23 UTC
Francesco, can you share the exact issue or just open a MOM bug on that?

Comment 18 Michal Skrivanek 2015-03-16 13:14:59 UTC
see the sister bug 1196735
Engine doesn't need any changes

Comment 19 Francesco Romani 2015-03-16 13:32:42 UTC
(In reply to Oved Ourfali from comment #17)
> Francesco, can you share the exact issue or just open a MOM bug on that?

Sure. Will file new bug later if needed.

Problem is: with current VDSM 3.5.x, when any VM is booted, MOM crashes like this

2015-03-12 18:20:36,996 - mom - INFO - MOM starting
2015-03-12 18:20:37,038 - mom.HostMonitor - INFO - Host Monitor starting
2015-03-12 18:20:37,042 - mom - INFO - hypervisor interface vdsm
2015-03-12 18:20:37,068 - mom.GuestManager - INFO - Guest Manager starting
2015-03-12 18:20:37,075 - mom.Policy - INFO - Loaded policy '00-defines'
2015-03-12 18:20:37,102 - mom.Policy - INFO - Loaded policy '02-balloon'
2015-03-12 18:20:37,136 - mom.Policy - INFO - Loaded policy '03-ksm'
2015-03-12 18:20:37,190 - mom.Policy - INFO - Loaded policy '04-cputune'
2015-03-12 18:20:37,194 - mom.PolicyEngine - INFO - Policy Engine starting
2015-03-12 18:20:37,195 - mom.RPCServer - INFO - RPC Server is disabled
2015-03-12 18:20:37,197 - mom.HostMonitor - INFO - HostMonitor is ready
2015-03-12 18:20:42,081 - mom.GuestManager - ERROR - Guest Manager crashed
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/mom/GuestManager.py", line 86, in run
    self._step()
  File "/usr/lib/python2.7/site-packages/mom/GuestManager.py", line 106, in _step
    self._spawn_guest_monitors(domain_list)
  File "/usr/lib/python2.7/site-packages/mom/GuestManager.py", line 122, in _spawn_guest_monitors
    info = self.hypervisor_iface.getVmInfo(id)
  File "/usr/lib/python2.7/site-packages/mom/HypervisorInterfaces/vdsmInterface.py", line 144, in getVmInfo
    data['pid'] = self.getVmPid(id)
  File "/usr/lib/python2.7/site-packages/mom/HypervisorInterfaces/vdsmInterface.py", line 67, in getVmPid
    return response['vmList'][0]['pid']
TypeError: string indices must be integers
2015-03-12 18:20:42,200 - mom - ERROR - Thread 'GuestManager' has exited
2015-03-12 18:20:42,202 - mom.HostMonitor - INFO - Host Monitor ending
2015-03-12 18:20:47,220 - mom.Controllers.KSM - INFO - Updating KSM configuration: pages_to_scan:0 run:0 sleep_millisecs:0
2015-03-12 18:20:47,223 - mom.PolicyEngine - INFO - Policy Engine ending
2015-03-12 18:20:47,223 - mom - INFO - MOM ending
2015-03-12 18:20:50,662 - mom.RPCServer - INFO - getStatistics()

due to the changes introduced in VDSM internal API, which MOM happens to use when running in VDSM.

It is worth noting that while this breaks MOM, VDSM continues to run unchanged. This is also one of the reasons why the breakage went unnoticed during verification of the VDSM changes: one has to specifically look for MOM failures in /var/log/vdsm/mom.log
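
The failing line in the MOM traceback, `response['vmList'][0]['pid']`, assumes the first `vmList` entry is a dict. A minimal sketch of a lookup that tolerates a UUID-only entry instead of crashing the thread; `get_vm_pid` and the sample responses are hypothetical and do not reproduce MOM's actual code or its eventual fix:

```python
def get_vm_pid(response):
    """Return the first VM's pid from a getVMList-style response, or None.

    Hypothetical stand-in for the lookup shown in the traceback; it returns
    None when the list is empty or the entry carries no 'pid' field.
    """
    vm_list = response.get('vmList', [])
    if not vm_list:
        return None
    first = vm_list[0]
    if not isinstance(first, dict):
        # UUID-only entry: there is no pid to read from a plain string.
        return None
    return first.get('pid')


# Assumed response shapes with made-up values.
print(get_vm_pid({'vmList': [{'vmId': 'aaaa-1111', 'pid': '4242'}]}))
print(get_vm_pid({'vmList': ['aaaa-1111']}))
```

Returning None leaves the decision to the caller, which can skip the guest or request the full entry, rather than letting an uncaught TypeError end the GuestManager thread.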

Comment 20 Michal Skrivanek 2015-03-18 11:38:26 UTC
*** Bug 1198680 has been marked as a duplicate of this bug. ***

Comment 21 Eldad Marciano 2015-03-19 12:23:45 UTC
looks like VT14 is missing this patch.

engine logged the following exceptions related to the structure of the list result:

2015-03-19 12:17:59,605 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.ListVDSCommand] (DefaultQuartzScheduler_Worker-90) [385fc317] Failed in ListVDS method, for vds: fake477; host: fake477
2015-03-19 12:17:59,605 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.ListVDSCommand] (DefaultQuartzScheduler_Worker-90) [385fc317] Command ListVDSCommand(HostName = fake477, HostId = 79879619-988e-4740-a37f-db59d9bc80b7, vds=Host[fake477,79879619-988e-4740-a37f-db59d9bc80b7]) execution failed. Exception: ClassCastException: java.util.LinkedHashMap cannot be cast to java.lang.String

Comment 22 Michal Skrivanek 2015-03-19 13:26:21 UTC
(In reply to Eldad Marciano from comment #21)
> looks like VT14 is missing this patch.

yes, sorry, it's only in vt14.1

Comment 24 Eldad Marciano 2015-03-26 09:38:20 UTC
verified on vt14.1

Comment 25 errata-xmlrpc 2015-04-28 18:49:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0888.html

