Created attachment 1248870
CPU graph covering the last year.

We have been running oVirt for a couple of years, and up to and including 3.6.7 the load on the engine node (an old HP DL360 Gen5 with SAS disks but no battery-backed RAID controller) was low. Then at the beginning of October we upgraded to 4.0.4, and this made the IO load several times higher (I guess this is due to the DWH functionality). In November we upgraded to 4.0.5, which had no effect on the load. But in January, when we upgraded to 4.0.6, the CPU usage increased to more than three times its previous level. Since we upgraded to 4.1.0 yesterday and the load remains at the 4.0.6 levels, I'm reporting this against 4.1.0.4.

The number of hosts and VMs has been pretty constant during this time:
hosts-active : 11
hosts-total : 12
storage_domains-active : 7
storage_domains-total : 9
users-active : 52
users-total : 52
vms-active : 242
vms-total : 579
Marcus - is this a hosted engine or a regular engine installation? Can you provide the engine, Apache and DWH logs? Can you see in 'top' what consumes most of the CPU? (We've also made various improvements in 4.1; I wonder what it'll look like with 4.1.)
I'm closing this as there has been no reply for 2 weeks. Please reopen if you can provide the needed info.
This was a regular engine installation.

I managed to cut the IO load in half by modifying the postgresql config from the EL7 defaults:
-shared_buffers = 32MB # min 128kB
+shared_buffers = 2048MB # min 128kB
The IO load was then still a bit high, IMHO.

The high CPU usage introduced in 4.0.6 then resolved itself after a reboot, while still running 4.0.6.

Then about a month ago I replaced the engine hardware with a Dell PowerEdge R610 with battery-backed RAID, and that reduced the average IO wait time of the system to about 1.8%, which I consider fine, even though it is twice what the IO load was on oVirt 3.6 on worse hardware. (I also upgraded the new engine to 4.1.1 a few days ago, and both IO and CPU load remained consistent.)

Thus closing this as WORKSFORME now.
(In reply to Marcus Sundberg from comment #3)
> This was a regular engine installation.
>
> I managed to cut the IO load in half by modifying the postgresql config
> from the EL7 defaults:
> -shared_buffers = 32MB # min 128kB
> +shared_buffers = 2048MB # min 128kB
> IO load was then still IMHO a bit high.

- What was the iowait before and after that change?
- How much shared memory does PG take now? I suspect 2048MB is far too high and that 512MB would work just as well.
- How much physical memory does this machine have? PG recommends setting this to between 25% and 40% of the machine's memory, depending on the workload.
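
To make the 25%-40% guideline above concrete, here is a minimal sketch that turns a machine's total RAM into a candidate shared_buffers range. The function name and the 8192MB figure are hypothetical, chosen only for illustration; the engine's actual memory size is not stated in this bug.

```python
def shared_buffers_range_mb(total_mem_mb, low_pct=25, high_pct=40):
    """Return (low, high) shared_buffers candidates in MB.

    Applies the 25%-40% of RAM guideline quoted in this comment.
    Integer arithmetic is used so results round down to whole MB.
    """
    low_mb = total_mem_mb * low_pct // 100
    high_mb = total_mem_mb * high_pct // 100
    return low_mb, high_mb


# Hypothetical example: an engine host with 8192MB of RAM
# (an assumed value, not taken from the report).
low_mb, high_mb = shared_buffers_range_mb(8192)
print(f"shared_buffers candidate range: {low_mb}MB to {high_mb}MB")
```

Under this guideline, 2048MB would sit at the low end for an 8GB machine, while 512MB would correspond to the low end for a 2GB machine. The right value still depends on the workload, so the iowait figures before and after the change are the better evidence.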