Bug 1420878

Summary: oVirt 4 engine generates significantly more load than 3
Product: [oVirt] ovirt-engine Reporter: Marcus Sundberg <devel>
Component: General Assignee: bugs <bugs>
Status: CLOSED WORKSFORME QA Contact:
Severity: medium Docs Contact:
Priority: unspecified    
Version: 4.1.0.4 CC: bugs, devel, rgolan, sradco, ykaul, ylavi
Target Milestone: ovirt-4.2.0 Keywords: Performance
Target Release: --- Flags: devel: needinfo-
rule-engine: ovirt-4.2+
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-02-26 12:27:32 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Metrics RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
CPU graph covering the last year. none

Description Marcus Sundberg 2017-02-09 17:59:30 UTC
Created attachment 1248870
CPU graph covering the last year.

We have been running oVirt for a couple of years, and up until and including
3.6.7 the load on the engine node (an old HP DL360 Gen5 with SAS disks but
no battery-backed RAID controller) was low. Then at the beginning of
October we upgraded to 4.0.4, and this caused the IO load to become
several times higher (I guess this is due to the DWH functionality).

In November we upgraded to 4.0.5, which didn't have any effect on the load.

But in January, when we upgraded to 4.0.6, the CPU usage increased to
more than three times its previous level. Since we upgraded to 4.1.0
yesterday and the load remains at the 4.0.6 level, I'm reporting this
against 4.1.0.4.

The numbers of hosts and VMs have been fairly constant during this time:

hosts-active          : 11
hosts-total           : 12
storage_domains-active: 7
storage_domains-total : 9
users-active          : 52
users-total           : 52
vms-active            : 242
vms-total             : 579

Comment 1 Yaniv Kaul 2017-02-13 19:24:09 UTC
Marcus - is this hosted engine or regular engine installation?
Can you provide engine, apache and DWH logs? 
Can you see in 'top' what seems to consume most of the CPU usage?
(We've also made various improvements in 4.1; I wonder how it looks with 4.1.)

Comment 2 Yaniv Lavi 2017-02-26 12:27:32 UTC
I'm closing this as there has been no reply for 2 weeks.
Please reopen if you can provide the needed info.

Comment 3 Marcus Sundberg 2017-03-27 11:57:03 UTC
This was a regular engine installation.

I managed to cut the IO load in half by modifying the postgresql config
from the EL7 defaults: 
-shared_buffers = 32MB			# min 128kB
+shared_buffers = 2048MB			# min 128kB
IO load was then still IMHO a bit high.

The high CPU usage introduced in 4.0.6 then resolved itself after a reboot,
while still running 4.0.6.

Then about a month ago I replaced the hardware of the engine with
a Dell PowerEdge R610 with battery-backed RAID, and that reduced the
average IO wait time of the system to about 1.8%, which I consider
fine, even though it is twice as much as the IO load was on
oVirt 3.6 on worse hardware.

(Also upgraded the new engine to 4.1.1 a few days ago, and both IO
and CPU load remained consistent).

Thus closing this as WORKSFORME now.

Comment 4 Roy Golan 2017-03-27 19:35:51 UTC
(In reply to Marcus Sundberg from comment #3)
> This was a regular engine installation.
> 
> I managed to cut the IO load in half by modifying the postgresql config
> from the EL7 defaults: 
> -shared_buffers = 32MB			# min 128kB
> +shared_buffers = 2048MB			# min 128kB
> IO load was then still IMHO a bit high.

- what was the iowait before and after that change?
- how much shared memory does pg take now? I suspect that 2048MB is much higher than needed and that 512MB could work just as well.
- how much physical memory does this machine have?

PG recommends setting this to between 25% and 40% of the machine's memory, depending on the workload.
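
As a rough illustration of the 25%-40% guideline above, the following Python sketch computes the corresponding shared_buffers range for a few memory sizes. The helper name and the sample sizes are hypothetical; the thread does not state how much RAM the engine machine has, so none of these values should be read as the reporter's actual configuration.

```python
def shared_buffers_range_mb(total_mem_mb, low_pct=25, high_pct=40):
    """Return (low, high) shared_buffers candidates in MB.

    Assumes the 25%-40% rule of thumb quoted in the comment above;
    integer arithmetic keeps the results exact and rounds down.
    """
    low = total_mem_mb * low_pct // 100
    high = total_mem_mb * high_pct // 100
    return low, high


# Illustrative machine sizes only (not taken from the bug report).
for total_mem_mb in (2048, 4096, 8192, 16384):
    low, high = shared_buffers_range_mb(total_mem_mb)
    print(f"{total_mem_mb:>6} MB RAM -> shared_buffers between {low}MB and {high}MB")
```

Under this rule of thumb, a 2048MB setting would sit at the low end of the range for a machine with 8 GB of RAM, while 512MB would match a machine with about 2 GB, which is why the amount of physical memory matters for the question above.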