Bug 1454633
Summary: | mom continuously crashing on getVmInfo (mom/HypervisorInterfaces/vdsmjsonrpcInterface.py) data['pid'] = vm['pid'] KeyError: 'pid' | | |
---|---|---|---|
Product: | [oVirt] mom | Reporter: | Shira Maximov <mshira> |
Component: | General | Assignee: | Andrej Krejcir <akrejcir> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Liran Rotenberg <lrotenbe> |
Severity: | urgent | Docs Contact: | |
Priority: | high | | |
Version: | --- | CC: | akrejcir, bugs, dfediuck, mavital, michal.skrivanek, mshira |
Target Milestone: | ovirt-4.2.0 | Keywords: | Regression |
Target Release: | --- | Flags: | rule-engine: ovirt-4.2+, rule-engine: blocker+, rule-engine: planning_ack+, rule-engine: devel_ack+, mavital: testing_ack+ |
Hardware: | Unspecified | | |
OS: | Unspecified | | |
Whiteboard: | | | |
Fixed In Version: | mom-0.5.11-1 | Doc Type: | No Doc Update |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2017-12-20 11:27:33 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | SLA | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Bug Depends On: | 1496413 | | |
Bug Blocks: | | | |
Attachments: | | | |

Have you seen this in 4.1 as well? Or is it 4.2 only issue?

This bug report has Keywords: Regression or TestBlocker. Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

(In reply to Martin Sivák from comment #1)
> Have you seen this in 4.1 as well? Or is it 4.2 only issue?

It's only in 4.2.

IIRC the PID was dropped from stats.

I'm seeing it constantly crashing in ovirt-system-tests. Raising severity.

2017-09-12 06:53:47,056 - mom.GuestManager - INFO - Guest Manager starting: multi-thread
2017-09-12 06:53:47,061 - mom.Policy - INFO - Loaded policy '00-defines'
2017-09-12 06:53:47,064 - mom.Policy - INFO - Loaded policy '01-parameters'
2017-09-12 06:53:47,070 - mom.GuestManager - ERROR - Guest Manager crashed
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/mom/GuestManager.py", line 88, in run
    self._spawn_guest_monitors(domain_list)
  File "/usr/lib/python2.7/site-packages/mom/GuestManager.py", line 113, in _spawn_guest_monitors
    info = self.hypervisor_iface.getVmInfo(id)
  File "/usr/lib/python2.7/site-packages/mom/HypervisorInterfaces/vdsmjsonrpcInterface.py", line 133, in getVmInfo
    data['pid'] = vm['pid']
KeyError: 'pid'
2017-09-12 06:53:47,097 - mom.Policy - INFO - Loaded policy '02-balloon'
2017-09-12 06:53:47,122 - mom.Policy - INFO - Loaded policy '03-ksm'
2017-09-12 06:53:47,153 - mom.Policy - INFO - Loaded policy '04-cputune'
2017-09-12 06:53:47,189 - mom.Policy - INFO - Loaded policy '05-iotune'
2017-09-12 06:53:47,189 - mom.PolicyEngine - INFO - Policy Engine starting
2017-09-12 06:53:47,190 - mom.RPCServer - INFO - Using unix socket /var/run/vdsm/mom-vdsm.sock
2017-09-12 06:53:47,191 - mom.RPCServer - INFO - RPC Server starting
2017-09-12 06:53:48,924 - mom.RPCServer - INFO - ping()
2017-09-12 06:53:48,925 - mom.RPCServer - INFO - getStatistics()
2017-09-12 06:53:56,545 - mom.RPCServer - INFO - ping()
2017-09-12 06:53:56,546 - mom.RPCServer - INFO - getStatistics()
2017-09-12 06:54:02,205 - mom - ERROR - Thread 'GuestManager' has exited
2017-09-12 06:54:02,228 - mom.Controllers.KSM - INFO - Updating KSM configuration: pages_to_scan:0 merge_across_nodes:1 run:0 sleep_millisecs:0
2017-09-12 06:54:02,233 - mom.PolicyEngine - INFO - Policy Engine ending
2017-09-12 06:54:02,559 - mom.RPCServer - INFO - RPC Server ending
2017-09-12 06:54:07,559 - mom - INFO - MOM ending
2017-09-12 06:54:12,724 - mom - INFO - MOM starting
2017-09-12 06:54:12,753 - mom.HostMonitor - INFO - Host Monitor starting
2017-09-12 06:54:12,753 - mom - INFO - hypervisor interface vdsmjsonrpcbulk
2017-09-12 06:54:12,929 - mom.HostMonitor - INFO - HostMonitor is ready
2017-09-12 06:54:13,074 - mom.GuestManager - INFO - Guest Manager starting: multi-thread
2017-09-12 06:54:13,082 - mom.Policy - INFO - Loaded policy '00-defines'
2017-09-12 06:54:13,088 - mom.Policy - INFO - Loaded policy '01-parameters'
2017-09-12 06:54:13,090 - mom.GuestManager - ERROR - Guest Manager crashed
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/mom/GuestManager.py", line 88, in run
    self._spawn_guest_monitors(domain_list)
  File "/usr/lib/python2.7/site-packages/mom/GuestManager.py", line 113, in _spawn_guest_monitors
    info = self.hypervisor_iface.getVmInfo(id)
  File "/usr/lib/python2.7/site-packages/mom/HypervisorInterfaces/vdsmjsonrpcInterface.py", line 133, in getVmInfo
    data['pid'] = vm['pid']
KeyError: 'pid'
2017-09-12 06:54:13,110 - mom.Policy - INFO - Loaded policy '02-balloon'
2017-09-12 06:54:13,135 - mom.Policy - INFO - Loaded policy '03-ksm'
2017-09-12 06:54:13,167 - mom.Policy - INFO - Loaded policy '04-cputune'
2017-09-12 06:54:13,208 - mom.Policy - INFO - Loaded policy '05-iotune'

mom-0.5.10-0.0.master.el7.centos.noarch
vdsm-4.20.3-22.git95788e5.el7.centos.x86_64

Verified on:
4.2.0-0.0.master.20170929123516.git007c392.el7.centos
vdsm-4.20.3-121.git77235c7.el7.centos.x86_64

Steps of verification:
1. Created a CPU QoS with a 10% limit.
2. Created a CPU profile with the QoS created in step 1.
3. Attached the created CPU profile to a VM and started a load on the VM.

As mentioned before, the host should allocate the following CPU percentage for the VM:
host cores / VM cores * 10 (the CPU QoS limit)

I tried it on two hosts:
- One with 4 cores:
  VM set to 1 core and I got 40% CPU. Host is 10%.
  VM set to 2 cores and I got 20% CPU. Host is 10%.
- Second host with 8 cores:
  VM set to 1 core and I got 80% CPU. Host is 10%.
  VM set to 2 cores and I got 40% CPU. Host is 10%.

Actual results:
The VM gets the expected CPU. The host stays within the QoS limit.
mom.log doesn't show any errors about the guest manager.

This bugzilla is included in oVirt 4.2.0 release, published on Dec 20th 2017. Since the problem described in this bug report should be resolved in oVirt 4.2.0 release, published on Dec 20th 2017, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.

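
The formula quoted in the verification steps (host cores / VM cores * QoS limit) can be checked against the numbers reported for the 4-core and 8-core hosts. The following is only a small arithmetic sketch; the function name expected_vm_cpu_percent is made up for illustration and is not part of mom or VDSM.

```python
def expected_vm_cpu_percent(host_cores, vm_cores, qos_limit_percent):
    """Expected per-VM CPU percentage, using the formula quoted in this report.

    Hypothetical helper for illustration only; not a mom or VDSM function.
    """
    if host_cores <= 0 or vm_cores <= 0:
        raise ValueError("core counts must be positive")
    # float() keeps the division exact on Python 2 as well as Python 3.
    return float(host_cores) / vm_cores * qos_limit_percent


if __name__ == "__main__":
    # (host cores, VM cores) pairs taken from the verification comment.
    for host_cores, vm_cores in [(4, 1), (4, 2), (8, 1), (8, 2)]:
        percent = expected_vm_cpu_percent(host_cores, vm_cores, 10)
        print("host=%d vm=%d expected=%.0f%%" % (host_cores, vm_cores, percent))
```

With a 10% limit this gives 40 and 20 for the 4-core host and 80 and 40 for the 8-core host, which matches the values measured during verification.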
Created attachment 1281371: mom.log

Description of problem:
CPU QoS is not working as expected because the Guest Manager crashed.

Version-Release number of selected component (if applicable):
oVirt Engine Version: 4.2.0-0.0.master.20170521155744.gitb6f1a86.el7.centos

How reproducible:
100%

Steps to Reproduce:
1. Create a CPU QoS with a 10% limit.
2. Create a CPU profile with the QoS created in step 1.
3. Attach the created CPU profile to a VM and start a load on the VM.

The host should allocate the following CPU percentage for the VM:
host cores / VM cores * 10 (the CPU QoS limit)
In my case, the host has 8 cores and the VM has 1 core, so the host should allocate 80% CPU (of 1 core) to that VM. Instead, the VM gets 100%.

Actual results:
The VM gets 100% of 1 core, and it should get only 80%.

Expected results:

Additional info:
In mom.log:
2017-05-23 10:38:52,637 - mom.GuestManager - ERROR - Guest Manager crashed
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/mom/GuestManager.py", line 88, in run
    self._spawn_guest_monitors(domain_list)
  File "/usr/lib/python2.7/site-packages/mom/GuestManager.py", line 113, in _spawn_guest_monitors
    info = self.hypervisor_iface.getVmInfo(id)
  File "/usr/lib/python2.7/site-packages/mom/HypervisorInterfaces/vdsmjsonrpcInterface.py", line 133, in getVmInfo
    data['pid'] = vm['pid']
KeyError: 'pid'

A guest monitor is responsible for evaluating all the policies, so the VM will have no QoS when the Guest Manager crashes.
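
The traceback shows an unconditional vm['pid'] lookup raising KeyError when the stats dictionary has no 'pid' entry. One generic way to tolerate a missing optional key is sketched below. This is an assumption-laden illustration, not the change that shipped in mom-0.5.11-1 (this report does not describe that change); the function name build_vm_info and the sample dictionaries are hypothetical.

```python
import logging

logger = logging.getLogger("pid-example")


def build_vm_info(vm):
    """Copy the optional 'pid' field from a VM stats dict without raising.

    Hypothetical stand-in for the failing lookup; the real getVmInfo
    copies other fields that are not shown in this report.
    """
    data = {}
    # dict.get returns None instead of raising KeyError when 'pid' is absent.
    pid = vm.get('pid')
    if pid is None:
        logger.debug("VM stats contain no 'pid'; leaving it out")
    else:
        data['pid'] = pid
    return data


if __name__ == "__main__":
    # Sample inputs are invented for illustration.
    print(build_vm_info({'pid': 1234}))
    print(build_vm_info({'name': 'vm1'}))
```

Whether 'pid' can safely be omitted depends on what the rest of mom does with it, which this report does not say.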