Bug 1845747 - After upgrading to 4.3 and updating cluster, VM tab is extremely slow, until VM's are restarted
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 4.3.10
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ovirt-4.3.11
Target Release: 4.3.11
Assignee: Lucia Jelinkova
QA Contact: Tzahi Ashkenazi
URL:
Whiteboard:
Duplicates: 1877120
Depends On:
Blocks:
 
Reported: 2020-06-10 00:24 UTC by Marcus West
Modified: 2023-12-15 18:07 UTC (History)

Fixed In Version: rhv-4.3.11-4
Doc Type: Bug Fix
Doc Text:
Previously, after upgrading to 4.3 and updating the cluster, the virtual machine (VM) tab in the Administration Portal was extremely slow until you restarted the VMs. This issue happened because updating the page recalculated the list of changed fields for every VM on the VM list page (read from the snapshot). The current release fixes this issue. It eliminates the previous performance impact by calculating the changed fields only once when the next run snapshot is created.
Clone Of:
Environment:
Last Closed: 2020-09-30 10:07:13 UTC
oVirt Team: Virt
Target Upstream Version:
Embargoed:
ahadas: needinfo-


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHV-37524 0 None None None 2022-08-18 17:06:45 UTC
Red Hat Knowledge Base (Solution) 5229841 0 None None None 2020-07-17 00:24:31 UTC
Red Hat Product Errata RHBA-2020:4112 0 None None None 2020-09-30 10:07:44 UTC
oVirt gerrit 109736 0 master MERGED engine: Add changedFields to next-run Snapshots 2021-01-07 21:45:37 UTC
oVirt gerrit 109777 0 ovirt-engine-4.3 ABANDONED engine: Add changedFields to next-run Snapshots 2021-01-07 21:45:37 UTC
oVirt gerrit 109802 0 ovirt-engine-4.3 MERGED engine: Add changedFields to next-run Snapshots 2021-01-07 21:44:58 UTC

Description Marcus West 2020-06-10 00:24:00 UTC
Description of problem:

After upgrading to 4.3 and updating the cluster, VM tab is extremely slow, until VM's are restarted

Version-Release number of selected component (if applicable):

ovirt-engine-4.2.8.9-0.1.el7ev.noarch
ovirt-engine-4.3.10.4-0.1.el7.noarch

How reproducible:

100% so far

Steps to Reproduce:
1. Upgrade from 4.2 to 4.3, with multiple VM's running
2. Upgrade the cluster version from 4.2 to 4.3
3. Observe that the VMs need restarting for the config update to take effect

Actual results:

VM tab takes around 10 seconds to complete (62 VM's).  Via api, request takes over 1 minute.  This gets worse if there are more VM's

Expected results:

performance should be similar to before (~1sec via gui or api)

Additional info:

While restarting the VM's is required and does resolve the issue, it's not always practical to restart all VM's immediately.  This can mean that GUI performance will be degraded for quite some time. (until a significant number of VM's are rebooted)

Comment 5 Arik 2020-06-17 09:00:26 UTC
Summarizing an offline discussion about this:

1. Additional database queries that were added as part of retrieving the changed fields in next-run configuration compared to the current configuration probably have a significant effect on the overhead added to the search query.
2. While eliminating the additional database queries seems possible, we'll still need to execute an additional database query per-VM and parse the OVF on every refresh.
3. The downsides of in-memory caching: (a) we'll need to make sure it's in-sync; and (b) increase memory consumption.

So avoiding the computation of the changed-fields on refreshes by retrieving them from the database makes sense.
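
A minimal sketch of the trade-off summarized above, written in Python rather than the engine's Java and using entirely hypothetical names (`NextRunSnapshot`, `diff_fields`, `list_vms_recomputing`, `list_vms_stored`); it is not the engine's code. It contrasts diffing the current and next-run configuration on every list refresh with diffing once when the next-run snapshot is created and only reading the stored result afterwards.

```python
from dataclasses import dataclass, field

# Hypothetical counter, kept only so the sketch can show how often the diff runs.
stats = {"diff_calls": 0}


def diff_fields(current: dict, next_run: dict) -> list:
    """Return the sorted names of fields whose values differ."""
    stats["diff_calls"] += 1
    keys = set(current) | set(next_run)
    return sorted(k for k in keys if current.get(k) != next_run.get(k))


@dataclass
class NextRunSnapshot:
    """Hypothetical stand-in for a next-run snapshot record."""
    config: dict
    changed_fields: list = field(default_factory=list)


def create_next_run_snapshot(current: dict, next_run: dict) -> NextRunSnapshot:
    # Proposed behaviour: diff once, at creation time, and store the result.
    return NextRunSnapshot(config=next_run,
                           changed_fields=diff_fields(current, next_run))


def list_vms_recomputing(vms: list) -> list:
    # Previous behaviour: every refresh diffs every VM again.
    return [diff_fields(cur, snap.config) for cur, snap in vms]


def list_vms_stored(vms: list) -> list:
    # Proposed behaviour: every refresh only reads the stored list.
    return [snap.changed_fields for _, snap in vms]
```

In this sketch the cost of `list_vms_stored` does not grow with the number of refreshes, at the price of persisting one extra list per snapshot.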

Comment 11 mlehrer 2020-07-16 12:42:56 UTC
Regarding the reproducer steps for scale:

#Seems like we'd need to:
Create 4.2 Engine and host with 120 vms on 1 host.
Update engine only to 4.3
Edit Cluster which has 120 vms running 4.2 and set compatibility version to 4.3
Click VM tab - scale will generate trace of all sqls related to VM tabs UI action and engine utilization.

Then,
While 4.2 vms are still running (without being rebooted) - bring down 4.3 engine, upgrade to fixed in version of 4.3.11.
Click VM tab - to generate trace of all sqls related to VM tabs UI.


Is the expected result a reduction in the SQL statements executed when the VM tab is viewed in the UI?  A reduction in engine CPU utilization for the VM tab view?
Are there specific queries we shouldn't see in the trace once we've upgraded to 4.3.11?

Do you agree with the above?

Comment 12 Arik 2020-07-16 14:52:47 UTC
Yes, that's correct.
As for the questions -
Most importantly, we should see a reduction in CPU utilization by the engine (both at the database level and at the Java level).
With the fix, we should also see far fewer database queries on the 'snapshots' table, possibly none at all.
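
One way to check for this in a captured SQL trace is to count the statements that call snapshot-related procedures. The sketch below is only an illustration: it assumes the trace is available as a list of statement strings, and that snapshot-related procedures have "snapshot" somewhere in their name and are invoked as `name(...)`. Neither assumption comes from this report.

```python
import re
from collections import Counter

# Assumption: snapshot-related procedures contain "snapshot" in their name
# and appear in the trace as "<name>(...)".
_CALL = re.compile(r"\b([a-z_]*snapshot[a-z_]*)\s*\(", re.IGNORECASE)


def snapshot_query_counts(statements):
    """Count snapshot-related procedure calls, keyed by lower-cased name."""
    counts = Counter()
    for stmt in statements:
        for name in _CALL.findall(stmt):
            counts[name.lower()] += 1
    return counts
```

Comparing `sum(snapshot_query_counts(trace).values())` between the trace taken before the upgrade and the one taken after it gives a single number to compare.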

Comment 18 Tzahi Ashkenazi 2020-08-03 13:10:28 UTC
Comparing results across RHV versions:

     1. rhv-release-4.2.13-2 (baseline)
     2. rhv-release-4.3.10-7 (bad version, as reported in this BZ)
     3. rhv-release-4.3.11-4 (fixed version)

environment:

VMs Count : 180

The API command used for this test: curl -k -u admin@1 https://rhev-green-01./ovirt-engine/api/vms
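
For anyone who wants to repeat the measurement from a script instead of the shell, here is a rough Python equivalent. It is a sketch under assumptions: `timed` and `fetch_vms` are hypothetical helper names, the base URL and credentials are supplied by the caller, and TLS verification is disabled to mirror `curl -k`.

```python
import base64
import ssl
import time
import urllib.request


def timed(fn):
    """Run fn() and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def fetch_vms(base_url, user, password):
    """GET <base_url>/ovirt-engine/api/vms with basic auth, like curl -k -u."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False       # must be disabled before verify_mode
    ctx.verify_mode = ssl.CERT_NONE  # same effect as curl -k
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(
        f"{base_url}/ovirt-engine/api/vms",
        headers={"Authorization": f"Basic {token}"},
    )
    with urllib.request.urlopen(req, context=ctx) as resp:
        return resp.read()
```

Usage would be along the lines of `body, seconds = timed(lambda: fetch_vms(base_url, user, password))`, repeated a few times per version so that a single slow request does not skew the comparison.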


1. Baseline cycle on rhv-release-4.2.13-2
   a. API call using the curl command took 0m0.657s
   b. CPU utilization was normal

2. Problematic version, rhv-release-4.3.10-7
   a. API call using the curl command took 0m44.725s
   b. From the GUI (Compute > VMs), loading took around 10 seconds
   c. CPU utilization was 95%, which is very high
   d. Java and postmaster were consuming 75% CPU


3. Fixed version tested, rhv-release-4.3.11-4
   a. API call using the curl command took 1.1 sec
   b. CPU utilization was normal
   c. Java and postmaster were consuming 5% CPU, which is normal

Notes:
1. With the fix, total JDBC time was 260.3 ms over 435 queries; the only snapshot-related query was the call to delete_entity_snapshot_by_command_id.

2. Without the fix, total JDBC time was 10.9 sec over 4215 queries; the snapshot-related queries were getsnapshotbysnapshotid, getsnapshotbyvmidandtype, and getsnapshotsbyvmsnapshotid, with a total execution count of 540.
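
To put the figures in this comment side by side, the ratios can be worked out as below. All inputs are the numbers reported in this comment; only the variable names are made up for the sketch.

```python
vm_count = 180

# Without the fix (rhv-release-4.3.10-7)
api_seconds_bad = 44.725
jdbc_seconds_bad = 10.9
queries_bad = 4215
snapshot_query_executions_bad = 540

# With the fix (rhv-release-4.3.11-4)
api_seconds_fixed = 1.1
jdbc_seconds_fixed = 0.2603  # 260.3 ms
queries_fixed = 435

# Snapshot query executions per VM without the fix.
snapshot_calls_per_vm = snapshot_query_executions_bad / vm_count  # 3.0

# How much larger the unfixed figures are than the fixed ones.
query_ratio = queries_bad / queries_fixed           # about 9.7x
jdbc_ratio = jdbc_seconds_bad / jdbc_seconds_fixed  # about 41.9x
api_ratio = api_seconds_bad / api_seconds_fixed     # about 40.7x
```

The 3.0 executions per VM equals the number of snapshot procedures listed, which would be consistent with each being called once per VM per request, although the trace summary here does not confirm that breakdown.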

Comment 21 Arik 2020-09-15 05:44:02 UTC
*** Bug 1877120 has been marked as a duplicate of this bug. ***

Comment 24 errata-xmlrpc 2020-09-30 10:07:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Virtualization Engine security, bug fix 4.3.11), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4112

