1390675 – Hosted Engine CPU usage is always shown as 100 % in the web UI

Summary: Hosted Engine CPU usage is always shown as 100 % in the web UI

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	ovirt-engine
Classification:	oVirt
Component:	Backend.Core
Sub Component:
Version:	4.0.6
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	ovirt-4.1.0-rc
Target Release:	4.1.0.2
Assignee:	Martin Sivák
QA Contact:	Artyom
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2016-11-01 16:20 UTC by RamaKasturi
Modified:	2017-05-11 09:27 UTC
CC List:	9 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2017-02-01 14:44:45 UTC
oVirt Team:	SLA
Embargoed:
Dependent Products:
Flags:	rule-engine: ovirt-4.1+ rule-engine: blocker+

Attachments
Adding screenshot for the same (152.35 KB, image/png) 2016-11-01 16:22 UTC, RamaKasturi	no flags
CPU on the engine web UI (138.06 KB, image/png) 2016-12-06 17:33 UTC, Simone Tiraboschi	no flags
Screenshot from 2016-12-07 15-52-28.png (165.41 KB, image/png) 2016-12-07 13:53 UTC, Nikolai Sednev	no flags
Vdsm master reports normal CPU usage for a regular VM (471.45 KB, image/png) 2016-12-09 17:28 UTC, Francesco Romani	no flags
Screenshot from 2016-12-14 12-39-16.png (162.14 KB, image/png) 2016-12-14 10:40 UTC, Nikolai Sednev	no flags
engine.log (8.21 MB, text/plain) 2016-12-14 10:41 UTC, Nikolai Sednev	no flags
engine-setup.log (3.77 MB, text/plain) 2016-12-14 10:42 UTC, Nikolai Sednev	no flags
server.log (456.33 KB, text/plain) 2016-12-14 10:44 UTC, Nikolai Sednev	no flags
ui.log (2.22 KB, text/plain) 2016-12-14 10:44 UTC, Nikolai Sednev	no flags
Screenshot from 2016-12-19 14-38-21.png (189.48 KB, image/png) 2016-12-19 12:38 UTC, Nikolai Sednev	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
oVirt gerrit	68927	0	master	MERGED	core: Update VmManager caches when Hosted Engine VM is imported	2020-05-26 23:03:54 UTC
oVirt gerrit	69837	0	ovirt-engine-4.1	MERGED	core: Update VmManager caches when Hosted Engine VM is imported	2020-05-26 23:03:54 UTC

Description RamaKasturi 2016-11-01 16:20:12 UTC

Description of problem:
After installing the latest upstream oVirt bits (i.e. 4.1), I see that the hosted engine CPU usage is always shown as 100%, although the usage is very low when checked directly on the VM with the top command.

top - 13:33:12 up 32 min,  1 user,  load average: 0.69, 0.66, 0.57
Tasks: 150 total,   1 running, 149 sleeping,   0 stopped,   0 zombie
%Cpu(s):  8.3 us,  9.0 sy,  0.0 ni, 82.5 id,  0.0 wa,  0.0 hi,  0.2 si,  0.0 st

Version-Release number of selected component (if applicable):
ovirt-engine-setup-4.1.0-0.0.master.20161024211322.gitfc0de31.el7.centos.noarch
ovirt-engine-4.1.0-0.0.master.20161024211322.gitfc0de31.el7.centos.noarch

How reproducible:
Always

Steps to Reproduce:
1. Install the latest upstream oVirt bits (i.e. 4.1).
2. Once the hosted engine deployment is done, log in to the web admin portal and check the CPU usage of the hosted engine VM.

Actual results:
Hosted Engine VM CPU usage is always reported as 100%, although the actual usage on the system is very low.

Expected results:
Hosted Engine VM CPU usage should match what is shown on the system when checked directly with the top command.

Additional info:

Comment 1 RamaKasturi 2016-11-01 16:22:22 UTC

Created attachment 1216153
Adding screenshot for the same

Comment 2 Michal Skrivanek 2016-11-03 09:34:43 UTC

Please add the vdsm/engine logs showing the actual report for the HE VM. Alternatively, please include the "top" CPU usage of that qemu process rather than the aggregated data for the whole host.

Comment 5 Nikolai Sednev 2016-11-30 12:42:14 UTC

I have not seen such CPU loads in the web UI in downstream 4.0.6.1-0.1.el7ev or earlier, so this looks like a regression. My deployments are regular hosted engine environments deployed on NFS with NFS data storage domains. My CPU loads are shown as >1%.

Comment 6 Simone Tiraboschi 2016-12-06 17:33:07 UTC

Created attachment 1228656
CPU on the engine web UI

Comment 7 Simone Tiraboschi 2016-12-06 17:34:23 UTC

Also reproduced here with engine and vdsm from 4.1.
The CPU is constantly at 100% in the engine, while vdsm shows:

[root@c72he20161206h1 ovirt]# vdsClient -s 0 getVmStats c4a7bdfc-552e-4bf3-889d-b4141e59393b | grep cpu
        cpuUser = 4.88
        cpuSys = 0.60
        cpuUsage = 88830000000
        vcpuPeriod = 100000
        vcpuQuota = -1
        vcpuCount = 4
[root@c72he20161206h1 ovirt]# vdsClient -s 0 getVmStats c4a7bdfc-552e-4bf3-889d-b4141e59393b | grep cpu
        cpuUser = 4.88
        cpuSys = 0.60
        cpuUsage = 88830000000
        vcpuPeriod = 100000
        vcpuQuota = -1
        vcpuCount = 4
[root@c72he20161206h1 ovirt]# vdsClient -s 0 getVmStats c4a7bdfc-552e-4bf3-889d-b4141e59393b | grep cpu
        cpuUser = 4.88
        cpuSys = 0.60
        cpuUsage = 88830000000
        vcpuPeriod = 100000
        vcpuQuota = -1
        vcpuCount = 4
[root@c72he20161206h1 ovirt]# vdsClient -s 0 getVmStats c4a7bdfc-552e-4bf3-889d-b4141e59393b | grep cpu
        cpuUser = 6.78
        cpuSys = 0.67
        cpuUsage = 89230000000
        vcpuPeriod = 100000
        vcpuQuota = -1
        vcpuCount = 4

Comment 8 Simone Tiraboschi 2016-12-06 17:39:01 UTC

While in the engine:

[root@enginevm ~]# sudo -u postgres psql engine -c "select num_of_cpus, cpu_user, cpu_sys, usage_cpu_percent from vms where vm_guid='c4a7bdfc-552e-4bf3-889d-b4141e59393b'"
could not change directory to "/root"
 num_of_cpus | cpu_user | cpu_sys | usage_cpu_percent 
-------------+----------+---------+-------------------
           4 |        7 |       1 |               100
(1 row)

[root@enginevm ~]# sudo -u postgres psql engine -c "select num_of_cpus, cpu_user, cpu_sys, usage_cpu_percent from vms where vm_guid='c4a7bdfc-552e-4bf3-889d-b4141e59393b'"
could not change directory to "/root"
 num_of_cpus | cpu_user | cpu_sys | usage_cpu_percent 
-------------+----------+---------+-------------------
           4 |        7 |       1 |               100
(1 row)

[root@enginevm ~]# sudo -u postgres psql engine -c "select num_of_cpus, cpu_user, cpu_sys, usage_cpu_percent from vms where vm_guid='c4a7bdfc-552e-4bf3-889d-b4141e59393b'"
could not change directory to "/root"
 num_of_cpus | cpu_user | cpu_sys | usage_cpu_percent 
-------------+----------+---------+-------------------
           4 |        4 |       0 |               100
(1 row)

[root@enginevm ~]# sudo -u postgres psql engine -c "select num_of_cpus, cpu_user, cpu_sys, usage_cpu_percent from vms where vm_guid='c4a7bdfc-552e-4bf3-889d-b4141e59393b'"
could not change directory to "/root"
 num_of_cpus | cpu_user | cpu_sys | usage_cpu_percent 
-------------+----------+---------+-------------------
           4 |        4 |       0 |               100
(1 row)

[root@enginevm ~]#
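For comparison, the vdsm figures from comment 7 can be pushed through the engine-side formula quoted later in comment 34, (cpuSys + cpuUser) / numOfCpus. The Java sketch below is hypothetical: the class and method names are made up, and using vcpuCount as the CPU count is an assumption. It only parses the "key = value" lines printed by vdsClient and recomputes the value that usage_cpu_percent would be expected to hold, roughly 1.4% for the first sample rather than 100.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: parses "key = value" lines as printed by
// "vdsClient -s 0 getVmStats <vmId> | grep cpu" and recomputes the
// percentage with the engine formula quoted in comment 34.
final class VdsmCpuCheck {

    // Collect "key = value" pairs; lines without " = " are ignored.
    static Map<String, String> parse(String output) {
        Map<String, String> stats = new HashMap<>();
        for (String line : output.split("\n")) {
            int sep = line.indexOf(" = ");
            if (sep < 0) {
                continue;
            }
            stats.put(line.substring(0, sep).trim(), line.substring(sep + 3).trim());
        }
        return stats;
    }

    // (cpuSys + cpuUser) / numOfCpus, taking vcpuCount as the CPU count (assumption).
    static double expectedPercent(Map<String, String> stats) {
        double cpuUser = Double.parseDouble(stats.get("cpuUser"));
        double cpuSys = Double.parseDouble(stats.get("cpuSys"));
        int numOfCpus = Integer.parseInt(stats.get("vcpuCount"));
        return (cpuSys + cpuUser) / numOfCpus;
    }

    public static void main(String[] args) {
        // First sample from comment 7.
        String sample = String.join("\n",
            "        cpuUser = 4.88",
            "        cpuSys = 0.60",
            "        cpuUsage = 88830000000",
            "        vcpuPeriod = 100000",
            "        vcpuQuota = -1",
            "        vcpuCount = 4");
        double pct = expectedPercent(parse(sample));
        System.out.printf("expected usage_cpu_percent: %.2f%n", pct);
    }
}
```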

Comment 9 Simone Tiraboschi 2016-12-06 17:39:50 UTC

Not HC (hyperconverged) specific, since I'm not on an HC environment.

Comment 10 Simone Tiraboschi 2016-12-06 17:48:32 UTC

The issue is probably here in vdsm logs:

2016-12-06 18:41:01,161 ERROR (periodic/0) [root] VM metrics collection failed (vmstats:263)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vmstats.py", line 204, in send_metrics
    stat['balloonInfo']['balloon_max']
KeyError: 'balloon_max'

Comment 11 Michal Skrivanek 2016-12-06 18:08:26 UTC

With bulk sampling, do we still read the sum of per-CPU guest usage as in the original code https://bugzilla.redhat.com/show_bug.cgi?id=1078897#c20, or are we perhaps mistakenly reading only the first vCPU?

Comment 12 Michal Skrivanek 2016-12-06 18:09:52 UTC

(In reply to Simone Tiraboschi from comment #10)
> The issue is probably here in vdsm logs:
> 
> 2016-12-06 18:41:01,161 ERROR (periodic/0) [root] VM metrics collection
> failed (vmstats:263)
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/vdsm/virt/vmstats.py", line 204, in
> send_metrics
>     stat['balloonInfo']['balloon_max']
> KeyError: 'balloon_max'

That's part of ybronhei's metrics, which are not enabled by default AFAIK. Why do you enable them? The code is probably wrong though, so it's worth filing separately.

Comment 13 Simone Tiraboschi 2016-12-06 23:07:33 UTC

(In reply to Michal Skrivanek from comment #12)
> That's part of ybronhei's metrics, which are not enabled by default AFAIK.
> Why do you enable them? The code is probably wrong though, so it's worth
> filing separately.

I simply installed ovirt-hosted-engine-setup from master, which requires vdsm.
No special configuration.

Comment 14 Nikolai Sednev 2016-12-07 13:47:37 UTC

I also see it on my new and clean 4.1 deployment:
ovirt-engine-dwh-setup-4.1.0-0.0.master.20161129154019.el7.centos.noarch
ovirt-engine-tools-backup-4.1.0-0.0.master.20161201071307.gita5ff876.el7.centos.noarch
ovirt-engine-userportal-4.1.0-0.0.master.20161201071307.gita5ff876.el7.centos.noarch
ovirt-engine-backend-4.1.0-0.0.master.20161201071307.gita5ff876.el7.centos.noarch
ovirt-engine-setup-plugin-vmconsole-proxy-helper-4.1.0-0.0.master.20161201071307.gita5ff876.el7.centos.noarch
ovirt-engine-4.1.0-0.0.master.20161201071307.gita5ff876.el7.centos.noarch
ovirt-engine-wildfly-10.1.0-1.el7.x86_64
ovirt-engine-wildfly-overlay-10.0.0-1.el7.noarch
ovirt-vmconsole-1.0.4-0.0.master.20161130185641.git51ed572.el7.centos.noarch
ovirt-engine-cli-3.6.9.2-1.el7.centos.noarch
ovirt-release-master-4.1.0-0.5.master.20161201000129.gitf370ec3.el7.centos.noarch
ovirt-setup-lib-1.1.0-0.0.master.20161107100014.gitb73abeb.el7.centos.noarch
ovirt-engine-extensions-api-impl-4.1.0-0.0.master.20161201071307.gita5ff876.el7.centos.noarch
ovirt-engine-sdk-python-3.6.9.2-0.1.20161130.gite99bbd1.el7.centos.noarch
ovirt-host-deploy-1.6.0-0.0.master.20161107121647.gitfd7ddcd.el7.centos.noarch
ovirt-host-deploy-java-1.6.0-0.0.master.20161107121647.gitfd7ddcd.el7.centos.noarch
ovirt-engine-setup-plugin-ovirt-engine-common-4.1.0-0.0.master.20161201071307.gita5ff876.el7.centos.noarch
ovirt-engine-dwh-4.1.0-0.0.master.20161129154019.el7.centos.noarch
ovirt-imageio-proxy-0.5.0-0.201612010904.git743fc2d.el7.centos.noarch
ovirt-engine-setup-plugin-websocket-proxy-4.1.0-0.0.master.20161201071307.gita5ff876.el7.centos.noarch
ovirt-iso-uploader-4.1.0-0.0.master.20160909154152.git14502bd.el7.centos.noarch
ovirt-engine-dbscripts-4.1.0-0.0.master.20161201071307.gita5ff876.el7.centos.noarch
ovirt-engine-webadmin-portal-4.1.0-0.0.master.20161201071307.gita5ff876.el7.centos.noarch
ovirt-engine-setup-4.1.0-0.0.master.20161201071307.gita5ff876.el7.centos.noarch
ovirt-engine-vmconsole-proxy-helper-4.1.0-0.0.master.20161201071307.gita5ff876.el7.centos.noarch
ovirt-engine-setup-plugin-ovirt-engine-4.1.0-0.0.master.20161201071307.gita5ff876.el7.centos.noarch
ovirt-engine-restapi-4.1.0-0.0.master.20161201071307.gita5ff876.el7.centos.noarch
ovirt-guest-agent-common-1.0.12-3.el7.noarch
ovirt-vmconsole-proxy-1.0.4-0.0.master.20161130185641.git51ed572.el7.centos.noarch
ovirt-engine-lib-4.1.0-0.0.master.20161201071307.gita5ff876.el7.centos.noarch
ovirt-imageio-proxy-setup-0.5.0-0.201612010904.git743fc2d.el7.centos.noarch
ovirt-engine-websocket-proxy-4.1.0-0.0.master.20161201071307.gita5ff876.el7.centos.noarch
ovirt-engine-dashboard-1.1.0-0.4.20161128git5ed6f96.el7.centos.noarch
ovirt-engine-hosts-ansible-inventory-4.1.0-0.0.master.20161201071307.gita5ff876.el7.centos.noarch
ovirt-engine-tools-4.1.0-0.0.master.20161201071307.gita5ff876.el7.centos.noarch
ovirt-engine-extension-aaa-jdbc-1.1.3-0.0.master.20161118164738.gitd0ff686.el7.noarch
ovirt-imageio-common-0.5.0-0.201612010904.git743fc2d.el7.centos.noarch
ovirt-engine-setup-base-4.1.0-0.0.master.20161201071307.gita5ff876.el7.centos.noarch
Linux version 3.10.0-327.36.3.el7.x86_64 (builder.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP Mon Oct 24 16:09:20 UTC 2016
Linux 3.10.0-327.36.3.el7.x86_64 #1 SMP Mon Oct 24 16:09:20 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
CentOS Linux release 7.2.1511 (Core)

Comment 15 Nikolai Sednev 2016-12-07 13:53:32 UTC

Created attachment 1229062
Screenshot from 2016-12-07 15-52-28.png

Comment 16 Francesco Romani 2016-12-09 16:28:45 UTC

(In reply to Michal Skrivanek from comment #11)
> With bulk sampling, do we still read the sum of per-CPU guest usage as in
> the original code https://bugzilla.redhat.com/show_bug.cgi?id=1078897#c20,
> or are we perhaps mistakenly reading only the first vCPU?

The idea is still to read the sum[1]. This code has not changed since 4.0.z,
so this issue is weird.

+++

[1] http://libvirt.org/html/libvirt-libvirt-domain.html#virConnectGetAllDomainStats
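To illustrate the question from comment 11, here is a hypothetical Java sketch (vdsm itself is Python, and this is not its code) over a map shaped like a libvirt bulk-stats record, which reports vcpu.current plus one vcpu.<num>.time counter per vCPU. It contrasts summing every vCPU with the suspected mistake of reading only vCPU 0. It assumes vCPU indices are contiguous from 0, and the values in main are made up.

```java
import java.util.Map;

// Hypothetical sketch: aggregate per-vCPU time from a map shaped like a
// libvirt bulk-stats record ("vcpu.current", "vcpu.<num>.time").
final class VcpuTimeSum {

    // Intended behavior: add up vcpu.0.time .. vcpu.(current-1).time.
    // Assumes vCPU indices are contiguous from 0.
    static long sumAllVcpus(Map<String, Long> record) {
        long count = record.getOrDefault("vcpu.current", 0L);
        long total = 0L;
        for (long i = 0; i < count; i++) {
            total += record.getOrDefault("vcpu." + i + ".time", 0L);
        }
        return total;
    }

    // The suspected mistake from comment 11: only vCPU 0 is read.
    static long firstVcpuOnly(Map<String, Long> record) {
        return record.getOrDefault("vcpu.0.time", 0L);
    }

    public static void main(String[] args) {
        // Made-up values for a 2-vCPU guest.
        Map<String, Long> record = Map.of(
            "vcpu.current", 2L,
            "vcpu.0.time", 300L,
            "vcpu.1.time", 500L);
        System.out.println(sumAllVcpus(record));   // 800
        System.out.println(firstVcpuOnly(record)); // 300
    }
}
```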

Comment 17 Francesco Romani 2016-12-09 17:27:10 UTC

I had my development environment handy, so I ran this check on one random non-hosted VM. Engine: oVirt Engine Version 4.0.0.4-1.el7.centos (from the official CentOS RPMs); Vdsm from master:
vdsm-hook-vmfex-dev-4.18.999-1112.gite377d72.el7.centos.noarch
vdsm-xmlrpc-4.18.999-1112.gite377d72.el7.centos.noarch
vdsm-jsonrpc-4.18.999-1112.gite377d72.el7.centos.noarch
vdsm-python-4.18.999-1112.gite377d72.el7.centos.noarch
vdsm-4.18.999-1112.gite377d72.el7.centos.x86_64
vdsm-yajsonrpc-4.18.999-1112.gite377d72.el7.centos.noarch
vdsm-api-4.18.999-1112.gite377d72.el7.centos.noarch
vdsm-cli-4.18.999-1112.gite377d72.el7.centos.noarch

The reported VM CPU usage is well below 100%; please see the attached screenshot as proof.

Comment 18 Francesco Romani 2016-12-09 17:28:27 UTC

Created attachment 1230096
Vdsm master reports normal CPU usage for a regular VM

Comment 19 Michal Skrivanek 2016-12-09 17:47:13 UTC

Thanks.
I had a chance to take a look at Nikolai's freshly installed system, where it indeed reproduced for the HE VM but not for a regular VM. After looking around I restarted ovirt-engine; since then the issue has not reappeared, and both regular and HE VMs show correct values. I'm starting to suspect the initial HE import.

Moving to SLA.

Comment 20 Nikolai Sednev 2016-12-14 10:32:37 UTC

I see this issue in my environment after deploying a fresh 4.1 HE, and even after restarting the whole environment (the host with the engine on it) and then adding 2 NFS data storage domains to it. The deployment itself was also made over NFS.

My host is running:
ovirt-engine-appliance-4.1-20161202.1.el7.centos.noarch
ovirt-imageio-common-0.5.0-0.201611201242.gitb02532b.el7.centos.noarch
ovirt-setup-lib-1.1.0-0.0.master.20161107100014.gitb73abeb.el7.centos.noarch
ovirt-hosted-engine-setup-2.1.0-0.0.master.20161130101611.gitb3ad261.el7.centos.noarch
ovirt-engine-sdk-python-3.6.9.1-1.el7ev.noarch
qemu-kvm-rhev-2.6.0-28.el7_3.2.x86_64
ovirt-host-deploy-1.6.0-0.0.master.20161107121647.gitfd7ddcd.el7.centos.noarch
rhev-release-4.0.6-6-001.noarch
sanlock-3.4.0-1.el7.x86_64
mom-0.5.8-1.el7ev.noarch
ovirt-imageio-daemon-0.5.0-0.201611201242.gitb02532b.el7.centos.noarch
vdsm-4.18.999-1020.git1ff41b1.el7.centos.x86_64
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
ovirt-release41-pre-4.1.0-0.0.beta.20161201085255.git731841c.el7.centos.noarch
ovirt-hosted-engine-ha-2.1.0-0.0.master.20161130135331.20161130135328.git3541725.el7.centos.noarch
libvirt-client-2.0.0-10.el7_3.2.x86_64
ovirt-vmconsole-1.0.4-1.el7ev.noarch
Linux version 3.10.0-514.2.2.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Wed Nov 16 13:15:13 EST 2016
Linux 3.10.0-514.2.2.el7.x86_64 #1 SMP Wed Nov 16 13:15:13 EST 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.3 (Maipo)

Engine is:
ovirt-engine-4.1.0-0.0.master.20161201071307.gita5ff876.el7.centos.noarch
Linux version 3.10.0-327.36.3.el7.x86_64 (builder.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP Mon Oct 24 16:09:20 UTC 2016
Linux 3.10.0-327.36.3.el7.x86_64 #1 SMP Mon Oct 24 16:09:20 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
CentOS Linux release 7.2.1511 (Core)

Comment 21 Nikolai Sednev 2016-12-14 10:40:02 UTC

Created attachment 1231594
Screenshot from 2016-12-14 12-39-16.png

Comment 22 Nikolai Sednev 2016-12-14 10:41:38 UTC

Created attachment 1231595
engine.log

Comment 23 Nikolai Sednev 2016-12-14 10:42:39 UTC

Created attachment 1231596
engine-setup.log

Comment 24 Nikolai Sednev 2016-12-14 10:43:58 UTC

Adding sosreport from host alma04 here https://drive.google.com/a/redhat.com/file/d/0B85BEaDBcF88amdrQlJQcVdzdXc/view?usp=sharing

Comment 25 Nikolai Sednev 2016-12-14 10:44:31 UTC

Created attachment 1231598
server.log

Comment 26 Nikolai Sednev 2016-12-14 10:44:55 UTC

Created attachment 1231599
ui.log

Comment 27 Nikolai Sednev 2016-12-14 11:05:35 UTC

Then I manually ran:
1) hosted-engine --set-maintenance --mode=global
2) hosted-engine --vm-poweroff
3) hosted-engine --vm-start
4) hosted-engine --set-maintenance --mode=none
5) The CPU usage normalized (1% CPU load instead of 100%) and was shown properly in the web UI.

Comment 28 Martin Sivák 2016-12-14 12:13:02 UTC

Can you reproduce this and attach the output of vdsClient -s 0 getAllVmStats from the host? We really need to know what vdsm reports here.

Comment 29 Nikolai Sednev 2016-12-19 12:37:57 UTC

puma18 ~]# vdsClient -s 0 getAllVmStats

a64f9d31-ab4a-4a66-9237-ddd6873a4770
        Status = Up
        displayInfo = [{'tlsPort': '-1', 'ipAddress': '0', 'port': '5900', 'type': 'vnc'}]
        memUsage = 20
        acpiEnable = true
        session = Unknown
        displaySecurePort = -1
        timeOffset = 0
        memoryStats = {'swap_out': '0', 'majflt': '0', 'swap_usage': '0', 'mem_cached': '465804', 'mem_free': '13256624', 'mem_buffers': '22032', 'swap_in': '0', 'swap_total': '0', 'pageflt': '143', 'mem_total': '16268136', 'mem_unused': '12768788'}
        pauseCode = NOERR
        disksUsage = [{'path': '/', 'total': '10433613824', 'used': '2231328768', 'fs': 'ext4'}]
        network = {'vnet0': {'macAddr': '00:16:3E:7D:DD:DD', 'rxDropped': '0', 'tx': '19605651', 'txDropped': '0', 'rxErrors': '0', 'rx': '5981735', 'txErrors': '0', 'state': 'unknown', 'sampleTime': 4302766.34, 'speed': '1000', 'name': 'vnet0'}}
        vmJobs = {}
        cpuUser = 6.41
        elapsedTime = 2395
        displayType = vnc
        cpuSys = 0.67
        appsList = ['ovirt-guest-agent-common-1.0.12-3.el7', 'kernel-3.10.0-327.36.3.el7', 'cloud-init-0.7.5-10.el7.centos.1']
        guestOs = 3.10.0-327.36.3.el7.x86_64
        vmName = HostedEngine
        guestFQDN = nsednev-he-4.scl.lab.tlv.redhat.com
        clientIp = 
        hash = 8338893227631203832
        guestCPUCount = 6
        vmType = kvm
        pid = 23577
        displayIp = 0
        cpuUsage = 25660000000
        vcpuPeriod = 100000
        displayPort = 5900
        guestTimezone = {'zone': 'Asia/Jerusalem', 'offset': 120}
        vcpuQuota = -1
        statusTime = 4302766340
        kvmEnable = true
        disks = {'vda': {'readLatency': '675443', 'writtenBytes': '193187840', 'truesize': '2378899456', 'apparentsize': '53687091200', 'readOps': '13209', 'writeLatency': '1582444', 'imageID': '2536e0f4-7e84-4da8-8eae-85bfdae7e08c', 'readBytes': '452727808', 'flushLatency': '190939', 'readRate': '1639.49299531', 'writeOps': '9118', 'writeRate': '43173.3155431'}, 'hdc': {'readLatency': '0', 'writtenBytes': '0', 'truesize': '0', 'apparentsize': '0', 'readOps': '4', 'writeLatency': '0', 'readBytes': '152', 'flushLatency': '0', 'readRate': '0.0', 'writeOps': '0', 'writeRate': '0.0'}}
        monitorResponse = 0
        guestOsInfo = {'kernel': '3.10.0-327.36.3.el7.x86_64', 'type': 'linux', 'version': '7.2.1511', 'distribution': 'centos', 'arch': 'x86_64', 'codename': 'Core'}
        username = root
        guestName = nsednev-he-4.scl.lab.tlv.redhat.com
        lastLogin = 1482149471.95
        vcpuCount = 6
        guestIPs = 10.35.163.167
        guestContainers = []
        netIfaces = [{'name': 'eth0', 'inet6': ['fe80::216:3eff:fe7d:dddd', '2620:52:0:23a0:216:3eff:fe7d:dddd'], 'inet': ['10.35.163.167'], 'hw': '00:16:3e:7d:dd:dd'}]

Comment 30 Nikolai Sednev 2016-12-19 12:38:48 UTC

Created attachment 1233373
Screenshot from 2016-12-19 14-38-21.png

Comment 31 Nikolai Sednev 2016-12-19 12:41:20 UTC

BTW, vm.conf on the host looks as follows:
puma18 ~]# cat /run/ovirt-hosted-engine-ha/vm.conf
vmId=a64f9d31-ab4a-4a66-9237-ddd6873a4770
memSize=16384
display=vnc
devices={index:2,iface:ide,address:{ controller:0, target:0,unit:0, bus:1, type:drive},specParams:{},readonly:true,deviceId:4fcf5f91-77fe-4cc1-af98-86de48c89abe,path:,device:cdrom,shared:false,type:disk}
devices={index:0,iface:virtio,format:raw,poolID:00000000-0000-0000-0000-000000000000,volumeID:4a4686ee-be11-4ec2-8bd7-c4520b4b3367,imageID:2536e0f4-7e84-4da8-8eae-85bfdae7e08c,specParams:{},readonly:false,domainID:e9bf18cb-cb04-4e7d-af0a-2d7d8e2a74d3,optional:false,deviceId:2536e0f4-7e84-4da8-8eae-85bfdae7e08c,address:{bus:0x00, slot:0x06, domain:0x0000, type:pci, function:0x0},device:disk,shared:exclusive,propagateErrors:off,type:disk,bootOrder:1}
devices={device:scsi,model:virtio-scsi,type:controller}
devices={nicModel:pv,macAddr:00:16:3E:7D:DD:DD,linkActive:true,network:ovirtmgmt,specParams:{},deviceId:f9c977a3-6059-4775-a31f-2df457196bf1,address:{bus:0x00, slot:0x03, domain:0x0000, type:pci, function:0x0},device:bridge,type:interface}
devices={device:console,specParams:{},type:console,deviceId:fcd74d3a-1d9b-4069-a421-35172e594115,alias:console0}
devices={device:vga,alias:video0,type:video}
devices={device:vnc,type:graphics}
vmName=HostedEngine
spiceSecureChannels=smain,sdisplay,sinputs,scursor,splayback,srecord,ssmartcard,susbredir
smp=6
maxVCpus=6
cpuType=Westmere
emulatedMachine=rhel6.5.0
devices={device:virtio,specParams:{source:random},model:virtio,type:rng}

I have a separate bug, https://bugzilla.redhat.com/show_bug.cgi?id=1402435, on "HE still uses 6.5-based machine type"; this issue might also be related.

Comment 32 Martin Sivák 2016-12-19 12:56:37 UTC

This is horribly wrong: cpuUsage = 25660000000

I believe the number should be between 0 and 100%.

Comment 33 Martin Sivák 2016-12-19 13:35:25 UTC

Hmm, I might be wrong there; I see it was introduced like this in vdsm 4.17. I wonder how the engine-side percentage is computed.

Comment 34 Martin Sivák 2016-12-19 13:41:45 UTC

But still, there is cpuUser = 6.41, which sounds like a pretty high number for a 6-CPU VM.

The value in the engine is computed as Double percent = (getCpuSys() + getCpuUser()) / numOfCpus;

This means VDSM itself returns 7.08 / 6 -> more than 100% load.

Comment 35 Michal Skrivanek 2016-12-19 13:46:18 UTC

this is in percent, so 7.08/6 -> 1% usage
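The disagreement between comments 34 and 35 is only about units. A minimal Java sketch, using the cpuSys/cpuUser values from comment 29 and the formula quoted in comment 34 (the class and method names are hypothetical): if cpuUser and cpuSys are already percentages, the result is about 1.18%; only the fraction-of-a-CPU reading turns the same number into more than 100%.

```java
// Hypothetical sketch of the two readings discussed in comments 34 and 35,
// using the values from comment 29 (6-vCPU HostedEngine VM).
final class PercentReading {

    // Engine formula quoted in comment 34: (cpuSys + cpuUser) / numOfCpus.
    static double enginePercent(double cpuSys, double cpuUser, int numOfCpus) {
        return (cpuSys + cpuUser) / numOfCpus;
    }

    public static void main(String[] args) {
        double pct = enginePercent(0.67, 6.41, 6);
        // Comment 35's reading: the inputs are already percentages,
        // so the result is a percentage as well.
        System.out.printf("as percent: %.2f%%%n", pct);
        // Comment 34's first reading: the inputs treated as fractions of a
        // CPU, which would mean more than 100% load.
        System.out.printf("as fraction x 100: %.0f%%%n", pct * 100);
    }
}
```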

Comment 36 Nikolai Sednev 2016-12-19 14:05:11 UTC

(In reply to Michal Skrivanek from comment #35)
> this is in percent, so 7.08/6 -> 1% usage

I've seen that 1% shown in the web UI after the engine VM was restarted.

Comment 37 Martin Sivák 2016-12-20 14:03:31 UTC

OK, so Nikolai reproduced this with debugging enabled, and here are the findings:

org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer#updateVmStatistics calls statistics.updateRuntimeData and passes 0 as the number of CPUs. Dividing by zero yields Infinity, which is then rounded down to 100%.
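A minimal Java sketch of that failure mode, assuming the engine clamps the value with something like Math.min(percent, 100) (the comment only says the result ends up at 100%; the class and method names are hypothetical). In Java, dividing a double by an int zero does not throw; it yields Infinity, so a cached CPU count of 0 always displays as 100%.

```java
// Hypothetical sketch of the failure mode in comment 37: the comment 34
// formula evaluated with a cached CPU count of 0.
final class ZeroCpuCount {

    // Assumption: the value is clamped to 100 before display; the real
    // engine code may clamp differently.
    static int displayedPercent(double cpuSys, double cpuUser, int numOfCpus) {
        // double / int 0 -> Infinity (0.0 / 0 would be NaN); no exception is thrown.
        double percent = (cpuSys + cpuUser) / numOfCpus;
        return (int) Math.min(percent, 100.0);
    }

    public static void main(String[] args) {
        System.out.println((0.67 + 6.41) / 0);               // Infinity
        System.out.println(displayedPercent(0.67, 6.41, 0)); // 100
        System.out.println(displayedPercent(0.67, 6.41, 6)); // 1
    }
}
```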

The number of CPUs is cached in the VmManager; the cache is populated by org.ovirt.engine.core.vdsbroker.VmManager#updateStaticFields. That method, however, is not called for external VMs, and it seems it was not called for the hosted engine VM either, because all the cached fields (name, CPU count, memory, origin, autostart...) have their default values of 0 or null.

So this is indeed a hosted engine VM import issue.

The engine restart solves this, because the VmManager reads the information about the already imported VM from the database during initialization (and gets the number of CPUs from there).
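The cache behavior described above can be modelled with a hypothetical Java sketch (all names are illustrative, not the real VmManager API): a missing cache entry reads as 0, start-up initialization from the database fills it, and the HE import path only fills it once it explicitly updates the cache, which is what the merged change "core: Update VmManager caches when Hosted Engine VM is imported" appears to do.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of the VmManager cache issue from comment 37.
// Names are illustrative; this is not the real ovirt-engine API.
final class CpuCountCache {

    // Cached static field per VM id; a missing entry reads as the default 0.
    private final Map<String, Integer> numOfCpus = new ConcurrentHashMap<>();

    // Analogue of updateStaticFields(): copy values from the VM definition.
    void updateStaticFields(String vmId, int cpus) {
        numOfCpus.put(vmId, cpus);
    }

    int getNumOfCpus(String vmId) {
        return numOfCpus.getOrDefault(vmId, 0);
    }

    // Engine start-up: VMs already in the database get their fields cached,
    // which is why restarting ovirt-engine made the problem disappear.
    void initFromDatabase(Map<String, Integer> vmsInDb) {
        vmsInDb.forEach((vmId, cpus) -> updateStaticFields(vmId, cpus));
    }

    // HE import as observed in the bug: the VM is stored, the cache is untouched.
    void importHostedEngineWithoutCacheUpdate(String vmId, int cpus) {
        // persisting the VM to the database is omitted in this sketch
    }

    // HE import with the cache refreshed right after the import.
    void importHostedEngineWithCacheUpdate(String vmId, int cpus) {
        importHostedEngineWithoutCacheUpdate(vmId, cpus);
        updateStaticFields(vmId, cpus);
    }

    public static void main(String[] args) {
        CpuCountCache broken = new CpuCountCache();
        broken.importHostedEngineWithoutCacheUpdate("he-vm", 6);
        System.out.println(broken.getNumOfCpus("he-vm"));    // 0, the divisor from comment 37

        CpuCountCache restarted = new CpuCountCache();
        restarted.initFromDatabase(Map.of("he-vm", 6));
        System.out.println(restarted.getNumOfCpus("he-vm")); // 6

        CpuCountCache fixed = new CpuCountCache();
        fixed.importHostedEngineWithCacheUpdate("he-vm", 6);
        System.out.println(fixed.getNumOfCpus("he-vm"));     // 6
    }
}
```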

Comment 38 Martin Sivák 2016-12-21 11:45:13 UTC

BTW, can you check whether this can also be reproduced using a plain engine and external VMs? Just take a running engine, start a VM using virsh/libvirt and see what happens.

Comment 42 Nikolai Sednev 2016-12-26 10:26:55 UTC

The issue is no longer reproduced. The HE VM appears as down in the web admin portal right after a fresh deployment, until the ovirt-engine service is manually restarted on the HE VM; after that, there is no sign of a 100% CPU load being reported by the UI.
Checked with these components on the hosts:
ovirt-engine-appliance-4.1-20161222.1.el7.centos.noarch
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
mom-0.5.8-1.el7ev.noarch
ovirt-hosted-engine-setup-2.1.0-0.0.master.20161221071755.git46cacd3.el7.centos.noarch
ovirt-setup-lib-1.1.0-1.el7.centos.noarch
libvirt-client-2.0.0-10.el7_3.2.x86_64
ovirt-release41-pre-4.1.0-0.6.beta2.20161221025826.gitc487776.el7.centos.noarch
ovirt-vmconsole-1.0.4-1.el7ev.noarch
qemu-kvm-rhev-2.6.0-28.el7_3.2.x86_64
ovirt-hosted-engine-ha-2.1.0-0.0.master.20161221070856.20161221070854.git387fa53.el7.centos.noarch
sanlock-3.4.0-1.el7.x86_64
ovirt-host-deploy-1.6.0-0.0.master.20161215101008.gitb76ad50.el7.centos.noarch
ovirt-engine-sdk-python-3.6.9.1-1.el7ev.noarch
ovirt-imageio-common-0.5.0-0.201611201242.gitb02532b.el7.centos.noarch
vdsm-4.18.999-1218.gitd36143e.el7.centos.x86_64
ovirt-imageio-daemon-0.5.0-0.201611201242.gitb02532b.el7.centos.noarch
Linux version 3.10.0-514.2.2.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Wed Nov 16 13:15:13 EST 2016
Linux 3.10.0-514.2.2.el7.x86_64 #1 SMP Wed Nov 16 13:15:13 EST 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.3 (Maipo)

On engine:
ovirt-engine-setup-plugin-ovirt-engine-common-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-imageio-proxy-0.5.0-0.201611201242.gitb02532b.el7.centos.noarch
ovirt-iso-uploader-4.1.0-0.0.master.20160909154152.git14502bd.el7.centos.noarch
ovirt-engine-userportal-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-engine-dbscripts-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-engine-setup-plugin-vmconsole-proxy-helper-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-engine-extensions-api-impl-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-imageio-common-0.5.0-0.201611201242.gitb02532b.el7.centos.noarch
ovirt-host-deploy-1.6.0-0.0.master.20161215101008.gitb76ad50.el7.centos.noarch
python-ovirt-engine-sdk4-4.1.0-0.1.a0.20161215git77fce51.el7.centos.x86_64
ovirt-host-deploy-java-1.6.0-0.0.master.20161215101008.gitb76ad50.el7.centos.noarch
ovirt-release41-pre-4.1.0-0.6.beta2.20161221025826.gitc487776.el7.centos.noarch
ovirt-setup-lib-1.1.0-1.el7.centos.noarch
ovirt-engine-extension-aaa-jdbc-1.1.2-1.el7.noarch
ovirt-engine-dwh-setup-4.1.0-0.0.master.20161129154019.el7.centos.noarch
ovirt-imageio-proxy-setup-0.5.0-0.201611201242.gitb02532b.el7.centos.noarch
ovirt-engine-tools-backup-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-engine-websocket-proxy-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-engine-setup-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-engine-backend-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-engine-tools-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-engine-webadmin-portal-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-engine-restapi-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-engine-vmconsole-proxy-helper-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-engine-setup-plugin-ovirt-engine-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-engine-wildfly-overlay-10.0.0-1.el7.noarch
ovirt-engine-cli-3.6.9.2-1.el7.centos.noarch
ovirt-web-ui-0.1.1-2.el7.centos.x86_64
ovirt-engine-setup-base-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-vmconsole-1.0.4-1.el7.centos.noarch
ovirt-engine-dwh-4.1.0-0.0.master.20161129154019.el7.centos.noarch
ovirt-engine-setup-plugin-websocket-proxy-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-engine-hosts-ansible-inventory-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-engine-dashboard-1.1.0-0.4.20161128git5ed6f96.el7.centos.noarch
ovirt-engine-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-guest-agent-common-1.0.13-1.20161220085008.git165fff1.el7.centos.noarch
ovirt-engine-sdk-python-3.6.9.1-1.el7.centos.noarch
ovirt-engine-wildfly-10.1.0-1.el7.x86_64
ovirt-engine-lib-4.1.0-0.3.beta2.20161221085908.el7.centos.noarch
ovirt-vmconsole-proxy-1.0.4-1.el7.centos.noarch
Linux version 3.10.0-514.2.2.el7.x86_64 (builder.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Tue Dec 6 23:06:41 UTC 2016
Linux 3.10.0-514.2.2.el7.x86_64 #1 SMP Tue Dec 6 23:06:41 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
CentOS Linux release 7.3.1611 (Core)

Comment 43 Red Hat Bugzilla Rules Engine 2016-12-26 10:27:02 UTC

This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Comment 44 Nikolai Sednev 2016-12-26 10:33:54 UTC

Comment #42 is about 4.1 beta. I will provide info for this bug once I have a fresh 4.0.6 environment ready.

Comment 45 Doron Fediuck 2017-01-02 12:29:48 UTC

Michal,
we can fix this for HE. However, this also needs a fix in the external VM import area.
Would you like us to fix just the HE part, or would you prefer to take over and handle for external VMs as well?

Comment 46 Michal Skrivanek 2017-01-02 13:17:37 UTC

I'm afraid fixing it properly would be a lot more complex. Unfortunately, we missed the opportunity to generalize it for external VMs during the work on HE import; external VMs are also not really supported yet. So fixing it just for HE is good enough for now, I guess, and once we finally start working on proper, full external/KVM VM support, we can refactor and unify the implementation.

Comment 47 Red Hat Bugzilla Rules Engine 2017-01-02 13:20:23 UTC

This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Comment 48 Artyom 2017-01-24 11:35:41 UTC

Verified rhevm-4.1.0.2-0.1.el7.noarch
