Bug 1198248

Summary: [performance] bad getVMList output creates unnecessary calls from Engine
Product: Red Hat Enterprise Virtualization Manager
Reporter: Michal Skrivanek <michal.skrivanek>
Component: ovirt-engine
Assignee: Piotr Kliczewski <pkliczew>
Status: CLOSED ERRATA
QA Contact: Eldad Marciano <emarcian>
Severity: urgent
Priority: high
Version: 3.5.0
CC: bazulay, bkorren, bugs, ebenahar, ecohen, eedri, emarcian, fdeutsch, fromani, gklein, lpeer, lsurette, mavital, mgoldboi, michal.skrivanek, ofrenkel, oourfali, pdwyer, rbalakri, rgolan, Rhev-m-bugs, sherold, ycui, yeylon
Keywords: ZStream
Target Release: 3.5.1
Hardware: Unspecified
OS: Unspecified
Whiteboard: virt
Fixed In Version: vt14.1
Doc Type: Bug Fix
Clone Of: 1196735
Last Closed: 2015-04-28 18:49:22 UTC
Type: Bug
Bug Depends On: 1196327, 1196735, 1202360, 1203305
Bug Blocks: 1193058

Comment 1 Michal Skrivanek 2015-03-03 16:11:08 UTC
all in

Comment 4 Gil Klein 2015-03-05 19:06:47 UTC
Verification results were mistakenly added to the VDSM clone BZ.

See: https://bugzilla.redhat.com/show_bug.cgi?id=1196735#c11

Build vt13.12 was installed successfully with no crashes or abnormal behavior, and the following method-statistics benchmark was taken.

Build vt13.7 vs. build vt13.12 (both engine and host).

Use case:
600-second sampling, 1 host, 1 VM.


Method invocation counts:

Method          vt13.7 (reproduces the bug)   vt13.12 (fixes the bug)
getAllVmStats   19                            20
list            76                            116
getVmStats      76                            0
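Summing the three verb counts puts the two runs side by side. The Python sketch below does only that. The numbers are copied from the 600-second benchmark above, and the dictionary layout and function name are illustrative. It counts calls and says nothing about the cost of each call.

```python
# Per-verb invocation counts copied from the 600-second benchmark above
# (1 host, 1 VM). The dict layout is illustrative only.
COUNTS = {
    "vt13.7": {"getAllVmStats": 19, "list": 76, "getVmStats": 76},
    "vt13.12": {"getAllVmStats": 20, "list": 116, "getVmStats": 0},
}


def total_calls(per_verb):
    """Sum the invocations of all recorded verbs for one build."""
    return sum(per_verb.values())


before = total_calls(COUNTS["vt13.7"])
after = total_calls(COUNTS["vt13.12"])
print(f"vt13.7: {before} calls, vt13.12: {after} calls")
print(f"reduction: {before - after} calls ({(before - after) / before:.1%})")
```

On these numbers, vt13.12 makes 40 more list calls, but it also drops all 76 getVmStats calls, so the net result is 35 fewer calls in the same window.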

Comment 5 Gil Klein 2015-03-06 06:01:19 UTC
Failed QE, as this fix introduced a regression that is now documented in the new BZ #1198680.

Comment 8 Eyal Edri 2015-03-10 08:36:39 UTC
Moving back to MODIFIED until the build is delivered.

Comment 9 Eyal Edri 2015-03-10 15:29:57 UTC
Clearing failedQA, since a new build will be delivered with vt13.14.

Comment 10 Gil Klein 2015-03-11 09:02:54 UTC
I've set up a VT13.11 engine with the latest VDSM build, VT13.14, which should fix this issue. I no longer see the error we saw with the VT13.12 build (host moved to "ERROR" state), but I keep seeing this error in vdsm.log:

Thread-964::ERROR::2015-03-11 09:52:44,842::__init__::493::jsonrpc.JsonRpcServer::(_serveRequest) Internal server error
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/yajsonrpc/__init__.py", line 488, in _serveRequest
    res = method(**params)
  File "/usr/share/vdsm/rpc/Bridge.py", line 278, in _dynamicMethod
    ret = retfield(result)
  File "/usr/share/vdsm/rpc/Bridge.py", line 341, in Host_getVMList_Ret
    return [v['vmId'] for v in ret['vmList']]
TypeError: string indices must be integers, not str
Thread-964::DEBUG::2015-03-11 09:52:44,843::stompReactor::163::yajsonrpc.StompServer::(send) Sending response
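The failing line is `return [v['vmId'] for v in ret['vmList']]`, which raises this TypeError whenever `v` is a `str`. The Python sketch below reproduces the error under one assumption: that `vmList` held bare VM ID strings instead of per-VM dicts. The payloads, IDs and the tolerant helper are hypothetical illustrations. They are not the actual VDSM fix.

```python
from typing import Any, Dict, List

# Hypothetical payloads; IDs and fields are illustrative only.
# Shape the Bridge code expects: a list of dicts carrying 'vmId'.
full_response = {"vmList": [{"vmId": "vm-1", "status": "Up"}]}
# Shape assumed to trigger the traceback: a list of bare ID strings.
short_response = {"vmList": ["vm-1"]}


def extract_ids_strict(ret: Dict[str, Any]) -> List[str]:
    # Same expression as the failing Bridge.py line: indexing a str
    # with the key 'vmId' raises TypeError.
    return [v["vmId"] for v in ret["vmList"]]


def extract_ids_tolerant(ret: Dict[str, Any]) -> List[str]:
    # Hypothetical defensive variant: accept either a dict or a bare ID.
    return [v["vmId"] if isinstance(v, dict) else v for v in ret["vmList"]]


if __name__ == "__main__":
    print(extract_ids_strict(full_response))
    try:
        extract_ids_strict(short_response)
    except TypeError as exc:
        print("TypeError:", exc)
    print(extract_ids_tolerant(short_response))
```

A tolerant reader like this can hide a protocol mismatch instead of surfacing it. Whether that is desirable depends on whether both shapes are meant to be supported.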

Comment 11 Michal Skrivanek 2015-03-12 13:30:55 UTC
This is not an engine failure. No other changes are needed there, only in the complementary VDSM bug 119735, so moving back to ON_QA.

Comment 12 Eldad Marciano 2015-03-12 15:37:14 UTC
Verified.
Build vt13.15 applied on both engine and VDSM.

Use case:
500-second sampling, 1 host, 1 VM.

Method invocation counts:

16 - getAllVmStats
61 - list
0 - getVmStats
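The verification runs used different sampling windows: 600 s for vt13.7 and vt13.12, and 500 s for vt13.15. Their raw counts are therefore not directly comparable. Normalizing to calls per minute is one simple way to line them up. The Python sketch below uses the counts reported in this bug; the data layout and function name are illustrative.

```python
# (sampling window in seconds, per-verb invocation counts), copied from the
# benchmarks reported in this bug (1 host, 1 VM in every run).
RUNS = {
    "vt13.7": (600, {"getAllVmStats": 19, "list": 76, "getVmStats": 76}),
    "vt13.12": (600, {"getAllVmStats": 20, "list": 116, "getVmStats": 0}),
    "vt13.15": (500, {"getAllVmStats": 16, "list": 61, "getVmStats": 0}),
}


def per_minute(count, window_s):
    """Convert a raw invocation count over window_s seconds to calls/minute."""
    return count * 60 / window_s


for build, (window_s, counts) in RUNS.items():
    rates = {verb: round(per_minute(n, window_s), 2) for verb, n in counts.items()}
    print(build, rates)
```

getVmStats stays at zero in both fixed builds, while the list rate differs noticeably between vt13.12 and vt13.15. Single short samples like these are best read as indicative, not precise.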

Comment 13 Francesco Romani 2015-03-12 16:42:56 UTC
Please also check MOM is working properly after this change. If not, we'll need a MOM update.

Comment 14 Gil Klein 2015-03-16 11:47:07 UTC
Reopening due to a possible incompatibility issue with MOM, based on:

https://bugzilla.redhat.com/show_bug.cgi?id=1196735#c20

If no code change is needed, feel free to move this BZ back to ON_QA.

Comment 15 Oved Ourfali 2015-03-16 13:04:34 UTC
(In reply to Gil Klein from comment #14)
> Reopening due to a possible incompatibility issue with MOM, based on:
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1196735#c20
> 
> If no code change is needed, feel free to move this BZ back to ON_QA.

Shouldn't we create a bug on MOM and move this one back to MODIFIED?

Comment 16 Gil Klein 2015-03-16 13:08:02 UTC
(In reply to Oved Ourfali from comment #15)
> (In reply to Gil Klein from comment #14)
> > Reopening due to a possible incompatibility issue with MOM, based on:
> > 
> > https://bugzilla.redhat.com/show_bug.cgi?id=1196735#c20
> > 
> > If no code change is needed, feel free to move this BZ back to ON_QA.
> 
> Shouldn't we create a bug on MOM and move this one back to MODIFIED?
Yes, that's possible. Do you know the exact flow that breaks MOM, so we can try it out and document it in a new BZ?

Comment 17 Oved Ourfali 2015-03-16 13:11:23 UTC
Francesco, can you share the exact issue or just open a MOM bug on that?

Comment 18 Michal Skrivanek 2015-03-16 13:14:59 UTC
See the sister bug 119735.
Engine doesn't need any changes.

Comment 19 Francesco Romani 2015-03-16 13:32:42 UTC
(In reply to Oved Ourfali from comment #17)
> Francesco, can you share the exact issue or just open a MOM bug on that?

Sure. I will file a new bug later if needed.

The problem is this: with the current VDSM 3.5.x, MOM crashes as soon as any VM is booted:

2015-03-12 18:20:36,996 - mom - INFO - MOM starting
2015-03-12 18:20:37,038 - mom.HostMonitor - INFO - Host Monitor starting
2015-03-12 18:20:37,042 - mom - INFO - hypervisor interface vdsm
2015-03-12 18:20:37,068 - mom.GuestManager - INFO - Guest Manager starting
2015-03-12 18:20:37,075 - mom.Policy - INFO - Loaded policy '00-defines'
2015-03-12 18:20:37,102 - mom.Policy - INFO - Loaded policy '02-balloon'
2015-03-12 18:20:37,136 - mom.Policy - INFO - Loaded policy '03-ksm'
2015-03-12 18:20:37,190 - mom.Policy - INFO - Loaded policy '04-cputune'
2015-03-12 18:20:37,194 - mom.PolicyEngine - INFO - Policy Engine starting
2015-03-12 18:20:37,195 - mom.RPCServer - INFO - RPC Server is disabled
2015-03-12 18:20:37,197 - mom.HostMonitor - INFO - HostMonitor is ready
2015-03-12 18:20:42,081 - mom.GuestManager - ERROR - Guest Manager crashed
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/mom/GuestManager.py", line 86, in run
    self._step()
  File "/usr/lib/python2.7/site-packages/mom/GuestManager.py", line 106, in _step
    self._spawn_guest_monitors(domain_list)
  File "/usr/lib/python2.7/site-packages/mom/GuestManager.py", line 122, in _spawn_guest_monitors
    info = self.hypervisor_iface.getVmInfo(id)
  File "/usr/lib/python2.7/site-packages/mom/HypervisorInterfaces/vdsmInterface.py", line 144, in getVmInfo
    data['pid'] = self.getVmPid(id)
  File "/usr/lib/python2.7/site-packages/mom/HypervisorInterfaces/vdsmInterface.py", line 67, in getVmPid
    return response['vmList'][0]['pid']
TypeError: string indices must be integers
2015-03-12 18:20:42,200 - mom - ERROR - Thread 'GuestManager' has exited
2015-03-12 18:20:42,202 - mom.HostMonitor - INFO - Host Monitor ending
2015-03-12 18:20:47,220 - mom.Controllers.KSM - INFO - Updating KSM configuration: pages_to_scan:0 run:0 sleep_millisecs:0
2015-03-12 18:20:47,223 - mom.PolicyEngine - INFO - Policy Engine ending
2015-03-12 18:20:47,223 - mom - INFO - MOM ending
2015-03-12 18:20:50,662 - mom.RPCServer - INFO - getStatistics()

This is due to the changes introduced in the VDSM internal API, which MOM happens to use when running inside VDSM.

It is worth noting that while this breaks MOM, VDSM itself keeps running unaffected. This is also one of the reasons the breakage went unnoticed during verification of the VDSM changes: one has to look specifically for MOM failures in /var/log/vdsm/mom.log.
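The MOM failure only surfaces in its own log, so a quick scan for ERROR entries is one way to catch it. The Python sketch below assumes that every entry follows the `<timestamp> - <logger> - <LEVEL> - <message>` layout seen in the excerpt above. The function names are hypothetical. Traceback continuation lines are skipped rather than attached to their entry.

```python
import re
from typing import Iterable, List, Tuple

# Assumed layout, taken from the mom.log excerpt above:
# "2015-03-12 18:20:42,081 - mom.GuestManager - ERROR - Guest Manager crashed"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) - "
    r"(?P<logger>\S+) - (?P<level>[A-Z]+) - (?P<msg>.*)$"
)


def find_mom_errors(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (timestamp, logger, message) for every ERROR entry."""
    errors = []
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        # Lines that do not match (e.g. traceback frames) are skipped.
        if m and m.group("level") == "ERROR":
            errors.append((m.group("ts"), m.group("logger"), m.group("msg")))
    return errors


def scan_file(path: str = "/var/log/vdsm/mom.log") -> List[Tuple[str, str, str]]:
    """Hypothetical convenience wrapper that scans a log file on disk."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return find_mom_errors(fh)
```

Run against the excerpt above, this would report the "Guest Manager crashed" and "Thread 'GuestManager' has exited" entries and ignore the INFO lines.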

Comment 20 Michal Skrivanek 2015-03-18 11:38:26 UTC
*** Bug 1198680 has been marked as a duplicate of this bug. ***

Comment 21 Eldad Marciano 2015-03-19 12:23:45 UTC
Looks like VT14 is missing this patch.

The engine logged the following exceptions, related to the structure of the list response:

2015-03-19 12:17:59,605 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.ListVDSCommand] (DefaultQuartzScheduler_Worker-90) [385fc317] Failed in ListVDS method, for vds: fake477; host: fake477
2015-03-19 12:17:59,605 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.ListVDSCommand] (DefaultQuartzScheduler_Worker-90) [385fc317] Command ListVDSCommand(HostName = fake477, HostId = 79879619-988e-4740-a37f-db59d9bc80b7, vds=Host[fake477,79879619-988e-4740-a37f-db59d9bc80b7]) execution failed. Exception: ClassCastException: java.util.LinkedHashMap cannot be cast to java.lang.String

Comment 22 Michal Skrivanek 2015-03-19 13:26:21 UTC
(In reply to Eldad Marciano from comment #21)
> looks like VT14 missing this patch.

Yes, sorry, it's only in vt14.1.

Comment 24 Eldad Marciano 2015-03-26 09:38:20 UTC
verified on vt14.1

Comment 25 errata-xmlrpc 2015-04-28 18:49:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0888.html