all in
Verification results were mistakenly added to the VDSM clone BZ. See: https://bugzilla.redhat.com/show_bug.cgi?id=1196735#c11
Build vt13.12 was successfully installed with no crashes or abnormal behavior, and the following method statistics benchmark was taken.
Build vt13.7 vs. build vt13.12 (both engine & host).
Use case: sampling 600 sec, 1 host, 1 VM.
Method invocation count results:

build vt13.7 (reproducing the bug):
19 - getAllVmStats
76 - list
76 - getVmStats
-----------------------------
build vt13.12 (fixing the bug):
20 - getAllVmStats
116 - list
0 - getVmStats
Failed QE, as this fix has introduced a regression that is now documented in new BZ #1198680.
Moving back to MODIFIED until the build is delivered.
clearing failedQA since a new build will be delivered with vt13.14
I've set up a VT13.11 engine with the latest VDSM build, VT13.14, which should fix this issue. I no longer see the error we saw with the VT13.12 build (host moved to "ERROR" state), but I keep seeing this error in vdsm.log:

Thread-964::ERROR::2015-03-11 09:52:44,842::__init__::493::jsonrpc.JsonRpcServer::(_serveRequest) Internal server error
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/yajsonrpc/__init__.py", line 488, in _serveRequest
    res = method(**params)
  File "/usr/share/vdsm/rpc/Bridge.py", line 278, in _dynamicMethod
    ret = retfield(result)
  File "/usr/share/vdsm/rpc/Bridge.py", line 341, in Host_getVMList_Ret
    return [v['vmId'] for v in ret['vmList']]
TypeError: string indices must be integers, not str
Thread-964::DEBUG::2015-03-11 09:52:44,843::stompReactor::163::yajsonrpc.StompServer::(send) Sending response
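The traceback suggests that the entries of ret['vmList'] were plain strings (presumably VM IDs) while Host_getVMList_Ret still expected dicts carrying a 'vmId' key. A minimal Python sketch of that mismatch, assuming those two shapes (inferred from the traceback, not taken from the VDSM source); extract_vm_ids is a hypothetical helper, not VDSM code:

```python
def extract_vm_ids_strict(ret):
    # Mirrors the failing line in Bridge.py: assumes every entry is a dict.
    return [v['vmId'] for v in ret['vmList']]


def extract_vm_ids(ret):
    # Hypothetical tolerant variant: accepts dict entries or bare ID strings.
    ids = []
    for v in ret['vmList']:
        if isinstance(v, dict):
            ids.append(v['vmId'])
        else:
            ids.append(v)
    return ids


# Assumed response shapes, for illustration only.
full = {'vmList': [{'vmId': 'vm-1', 'status': 'Up'}]}
short = {'vmList': ['vm-1']}

print(extract_vm_ids(full))
print(extract_vm_ids(short))

try:
    extract_vm_ids_strict(short)
except TypeError as exc:
    # Indexing a str with a str key raises TypeError, as in the vdsm.log traceback.
    print('TypeError: %s' % exc)
```

Whether tolerating both shapes is desirable depends on how the engine and VDSM negotiate the response format; the sketch only shows where the error comes from.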
This is not an engine failure; no other changes are needed there, only in the complementary VDSM bug 119735, so moving back to ON_QA.
Verified. Build vt13.15 applied on both engine & VDSM.
Use case: sampling 500 sec, 1 host, 1 VM.
Method invocation count results:
16 - getAllVmStats
61 - list
0 - getVmStats
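Because the vt13.15 run sampled 500 seconds instead of 600, the raw counts are easier to compare once normalized. A small Python sketch that sums the invocation counts reported in this thread per build and converts them to calls per minute (the counts are copied from the comments; the per-minute normalization is only an illustration):

```python
# Invocation counts as reported in the verification comments.
samples = {
    'vt13.7':  {'seconds': 600, 'getAllVmStats': 19, 'list': 76, 'getVmStats': 76},
    'vt13.12': {'seconds': 600, 'getAllVmStats': 20, 'list': 116, 'getVmStats': 0},
    'vt13.15': {'seconds': 500, 'getAllVmStats': 16, 'list': 61, 'getVmStats': 0},
}


def calls_per_minute(sample):
    """Total verb invocations per minute for one sampling run."""
    total = sum(count for verb, count in sample.items() if verb != 'seconds')
    return total * 60.0 / sample['seconds']


for build in ('vt13.7', 'vt13.12', 'vt13.15'):
    print('%-8s %5.2f calls/min' % (build, calls_per_minute(samples[build])))
```

With these numbers the total falls from 17.10 to 13.60 to 9.24 calls per minute for this single-host, single-VM setup, and per-VM getVmStats calls disappear from vt13.12 onward.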
Please also check MOM is working properly after this change. If not, we'll need a MOM update.
Re-opening due to a possible incompatibility issue with MOM, based on: https://bugzilla.redhat.com/show_bug.cgi?id=1196735#c20
If no code change is needed, feel free to move this BZ back to ON_QA.
(In reply to Gil Klein from comment #14)
> Re-opening, due to possible incompatibility issue with MOM, based on:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1196735#c20
>
> If no code change is needed, feel free to move this BZ back to ON_QA

Shouldn't we create a bug on MOM, and move this one back to MODIFIED?
(In reply to Oved Ourfali from comment #15)
> (In reply to Gil Klein from comment #14)
> > Re-opening, due to possible incompatibility issue with MOM, based on:
> >
> > https://bugzilla.redhat.com/show_bug.cgi?id=1196735#c20
> >
> > If no code change is needed, feel free to move this BZ back to ON_QA
>
> Shouldn't we create a bug on MOM, and move this one to back MODIFIED?

Yes, that's possible. Do you know the exact flow that breaks MOM, so we can try it out and document it in a new BZ?
Francesco, can you share the exact issue or just open a MOM bug on that?
See the sister bug 119735; the engine doesn't need any changes.
(In reply to Oved Ourfali from comment #17)
> Francesco, can you share the exact issue or just open a MOM bug on that?

Sure. I will file a new bug later if needed.
The problem is: with the current VDSM 3.5.x, when any VM is booted, MOM crashes like this:

2015-03-12 18:20:36,996 - mom - INFO - MOM starting
2015-03-12 18:20:37,038 - mom.HostMonitor - INFO - Host Monitor starting
2015-03-12 18:20:37,042 - mom - INFO - hypervisor interface vdsm
2015-03-12 18:20:37,068 - mom.GuestManager - INFO - Guest Manager starting
2015-03-12 18:20:37,075 - mom.Policy - INFO - Loaded policy '00-defines'
2015-03-12 18:20:37,102 - mom.Policy - INFO - Loaded policy '02-balloon'
2015-03-12 18:20:37,136 - mom.Policy - INFO - Loaded policy '03-ksm'
2015-03-12 18:20:37,190 - mom.Policy - INFO - Loaded policy '04-cputune'
2015-03-12 18:20:37,194 - mom.PolicyEngine - INFO - Policy Engine starting
2015-03-12 18:20:37,195 - mom.RPCServer - INFO - RPC Server is disabled
2015-03-12 18:20:37,197 - mom.HostMonitor - INFO - HostMonitor is ready
2015-03-12 18:20:42,081 - mom.GuestManager - ERROR - Guest Manager crashed
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/mom/GuestManager.py", line 86, in run
    self._step()
  File "/usr/lib/python2.7/site-packages/mom/GuestManager.py", line 106, in _step
    self._spawn_guest_monitors(domain_list)
  File "/usr/lib/python2.7/site-packages/mom/GuestManager.py", line 122, in _spawn_guest_monitors
    info = self.hypervisor_iface.getVmInfo(id)
  File "/usr/lib/python2.7/site-packages/mom/HypervisorInterfaces/vdsmInterface.py", line 144, in getVmInfo
    data['pid'] = self.getVmPid(id)
  File "/usr/lib/python2.7/site-packages/mom/HypervisorInterfaces/vdsmInterface.py", line 67, in getVmPid
    return response['vmList'][0]['pid']
TypeError: string indices must be integers
2015-03-12 18:20:42,200 - mom - ERROR - Thread 'GuestManager' has exited
2015-03-12 18:20:42,202 - mom.HostMonitor - INFO - Host Monitor ending
2015-03-12 18:20:47,220 - mom.Controllers.KSM - INFO - Updating KSM configuration: pages_to_scan:0 run:0 sleep_millisecs:0
2015-03-12 18:20:47,223 - mom.PolicyEngine - INFO - Policy Engine ending
2015-03-12 18:20:47,223 - mom - INFO - MOM ending
2015-03-12 18:20:50,662 - mom.RPCServer - INFO - getStatistics()

This is due to the changes introduced in the VDSM internal API, which MOM happens to use when running inside VDSM. It is worth noting that while this breaks MOM, VDSM continues to run unchanged. This is also one of the reasons why this breakage went unnoticed during verification of the VDSM changes: one has to specifically look for MOM failures in /var/log/vdsm/mom.log.
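Since the MOM crash only shows up in /var/log/vdsm/mom.log, a quick scan of that file for ERROR entries is one way to catch it during verification. A minimal Python sketch, assuming the "timestamp - logger - LEVEL - message" line format seen in the log above; mom_errors and scan_mom_log are hypothetical helpers, not part of MOM or VDSM:

```python
import re

# Matches MOM log lines such as:
# "2015-03-12 18:20:42,081 - mom.GuestManager - ERROR - Guest Manager crashed"
# Traceback lines do not match and are skipped.
LOG_LINE = re.compile(
    r'^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) - '
    r'(?P<logger>\S+) - (?P<level>[A-Z]+) - (?P<msg>.*)$')


def mom_errors(lines):
    """Return (timestamp, logger, message) for every ERROR entry."""
    errors = []
    for line in lines:
        m = LOG_LINE.match(line.rstrip('\n'))
        if m and m.group('level') == 'ERROR':
            errors.append((m.group('ts'), m.group('logger'), m.group('msg')))
    return errors


def scan_mom_log(path='/var/log/vdsm/mom.log'):
    # Default path taken from the comment above; adjust if MOM logs elsewhere.
    with open(path) as log:
        return mom_errors(log)
```

On a host, scan_mom_log() returning a non-empty list (for example an entry for "Guest Manager crashed") would be a hint that MOM broke even though VDSM itself keeps running.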
*** Bug 1198680 has been marked as a duplicate of this bug. ***
It looks like VT14 is missing this patch. The engine logged the following exceptions related to the structure of the list response:

2015-03-19 12:17:59,605 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.ListVDSCommand] (DefaultQuartzScheduler_Worker-90) [385fc317] Failed in ListVDS method, for vds: fake477; host: fake477
2015-03-19 12:17:59,605 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.ListVDSCommand] (DefaultQuartzScheduler_Worker-90) [385fc317] Command ListVDSCommand(HostName = fake477, HostId = 79879619-988e-4740-a37f-db59d9bc80b7, vds=Host[fake477,79879619-988e-4740-a37f-db59d9bc80b7]) execution failed. Exception: ClassCastException: java.util.LinkedHashMap cannot be cast to java.lang.String
(In reply to Eldad Marciano from comment #21)
> looks like VT14 missing this patch.

Yes, sorry, it's only in vt14.1.
verified on vt14.1
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2015-0888.html