Bug 1845747
| Summary: | After upgrading to 4.3 and updating cluster, VM tab is extremely slow, until VM's are restarted | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Marcus West <mwest> |
| Component: | ovirt-engine | Assignee: | Lucia Jelinkova <ljelinko> |
| Status: | CLOSED ERRATA | QA Contact: | Tzahi Ashkenazi <tashkena> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 4.3.10 | CC: | ahadas, aoconnor, bcholler, dagur, fdelorey, jtejal, ljelinko, lrotenbe, mavital, mlehrer, nobody, pelauter, rdlugyhe, rmcswain |
| Target Milestone: | ovirt-4.3.11 | Keywords: | Performance |
| Target Release: | 4.3.11 | Flags: | ahadas: needinfo- |
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | rhv-4.3.11-4 | Doc Type: | Bug Fix |
| Doc Text: | Previously, after upgrading to 4.3 and updating the cluster, the virtual machine (VM) tab in the Administration Portal was extremely slow until you restarted the VMs. This issue happened because updating the page recalculated the list of changed fields for every VM on the VM list page (read from the snapshot). The current release fixes this issue. It eliminates the previous performance impact by calculating the changed fields only once when the next run snapshot is created. | Story Points: | --- |
| Clone Of: | Environment: | ||
| Last Closed: | 2020-09-30 10:07:13 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | Virt | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
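The Doc Text above describes moving the changed-fields calculation from every page refresh to the moment the next run snapshot is created. A minimal Python sketch of that idea follows. It is not the ovirt-engine implementation (which is Java); `Vm`, `DiffCounter`, `create_next_run` and the other names are hypothetical, and the dictionary comparison only stands in for reading the snapshot and comparing configurations.

```python
from dataclasses import dataclass, field
from typing import Optional


class DiffCounter:
    """Counts how often the (expensive) field comparison runs."""

    def __init__(self) -> None:
        self.calls = 0

    def changed_fields(self, current: dict, next_run: dict) -> list:
        self.calls += 1
        # Stand-in for reading the snapshot and comparing configurations.
        return sorted(k for k, v in next_run.items() if current.get(k) != v)


@dataclass
class Vm:
    name: str
    current: dict                      # configuration the VM is running with
    next_run: Optional[dict] = None    # hypothetical next run configuration
    stored_changed_fields: list = field(default_factory=list)


def refresh_recomputing(vms, counter):
    """Slow behaviour as described: recompute the diff per VM on each refresh."""
    return {
        vm.name: counter.changed_fields(vm.current, vm.next_run)
        if vm.next_run is not None else []
        for vm in vms
    }


def create_next_run(vm, new_config, counter):
    """Fixed behaviour as described: compute the diff once and store it."""
    vm.next_run = new_config
    vm.stored_changed_fields = counter.changed_fields(vm.current, new_config)


def refresh_stored(vms):
    """A refresh only reads the stored result; nothing is compared."""
    return {vm.name: vm.stored_changed_fields for vm in vms}


def simulate(vm_count=180, refreshes=5):
    """Return (comparisons when recomputing, comparisons when storing)."""
    old, new = DiffCounter(), DiffCounter()
    vms = [Vm(f"vm{i}", {"compat": "4.2"}) for i in range(vm_count)]
    for vm in vms:
        create_next_run(vm, {"compat": "4.3"}, new)
    for _ in range(refreshes):
        # Both strategies must report the same changed fields.
        assert refresh_recomputing(vms, old) == refresh_stored(vms)
    return old.calls, new.calls
```

In this sketch the recomputing variant performs one comparison per VM per refresh, while the stored variant performs one comparison per VM in total, regardless of how often the list is refreshed.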
Description
Marcus West
2020-06-10 00:24:00 UTC
Summary of an offline discussion about this:

1. The additional database queries that were added to retrieve the fields that differ between the next-run configuration and the current configuration probably add significant overhead to the search query.
2. While eliminating the additional database queries seems possible, we would still need to execute an additional database query per VM and parse the OVF on every refresh.
3. In-memory caching has downsides: (a) we would need to make sure it stays in sync; and (b) it increases memory consumption.

So avoiding the computation of the changed fields on refreshes by retrieving them from the database makes sense.

Regarding the reproducer steps for scale, it seems we would need to:

1. Create a 4.2 Engine and a host, with 120 VMs on 1 host.
2. Update the engine only to 4.3.
3. Edit the cluster that has the 120 VMs running 4.2 and set its compatibility version to 4.3.
4. Click the VM tab. Scale will generate a trace of all SQL statements related to the VM tab UI action, along with engine utilization.
5. While the 4.2 VMs are still running (without being rebooted), bring down the 4.3 engine and upgrade it to the fixed-in version of 4.3.11.
6. Click the VM tab again to generate a trace of all SQL statements related to the VM tab UI.

What is the expected result: a reduction in the SQL statements called when viewing the VM tab? A reduction in Engine CPU utilization for the VM tab view? Are there specific queries we shouldn't see in the trace once we've upgraded to 4.3.11? Do you agree with the above?

Yes, that's correct. As for the questions: most importantly, we should see a reduction in CPU utilization by the engine (both at the database level and at the Java level). With the fix, we should also see a much lower number of database queries on the 'snapshots' table, possibly none at all.

Comparing results on RHV versions:
1. rhv-release-4.2.13-2 (baseline)
2. rhv-release-4.3.10-7 (bad version, as reported in this BZ)
3. rhv-release-4.3.11-4 (fix version)
Environment:
VM count: 180
API command used for this test: curl -k -u admin@1 https://rhev-green-01./ovirt-engine/api/vms
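For anyone repeating the measurement without curl, here is a rough Python sketch that times the same kind of GET request against the `/ovirt-engine/api/vms` path. The URL, user, and password are parameters you must supply, `build_request` and `time_request` are hypothetical helper names, and certificate verification is disabled only to mirror `curl -k`.

```python
import base64
import ssl
import time
import urllib.request


def build_request(url: str, user: str, password: str) -> urllib.request.Request:
    """Build a GET request with HTTP basic authentication (like curl -u)."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def time_request(url: str, user: str, password: str) -> float:
    """Return the wall-clock seconds taken to fetch and read the response."""
    # Equivalent of curl -k: skip certificate verification (test setups only).
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    start = time.perf_counter()
    with urllib.request.urlopen(build_request(url, user, password), context=ctx) as resp:
        resp.read()
    return time.perf_counter() - start


# Example call (placeholders, not values from this report):
# seconds = time_request("https://<engine-host>/ovirt-engine/api/vms", "<user>", "<password>")
```

The figure this returns covers the request and the full response download, so it is only roughly comparable to the wall-clock time of a curl run.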
1. Baseline cycle, on rhv-release-4.2.13-2
a. API call using the curl command took 0m0.657s
b. CPU utilization was normal
2. Problematic version, rhv-release-4.3.10-7
a. API call using the curl command took 0m44.725s
b. From the GUI (Compute > VMs), loading/response took around 10 seconds
c. CPU utilization was 95%, which is very high
d. Java and postmaster were consuming 75% CPU
3. Fix version tested, rhv-release-4.3.11-4
a. API call using the curl command took 1.1 sec
b. CPU utilization was normal
c. Java and postmaster were consuming 5% CPU, which is normal
Notes:
1. With the fix, JDBC total time was 260.3 ms over 435 queries; the only snapshot-related query is the call to delete_entity_snapshot_by_command_id.
2. Without the fix, JDBC total time was 10.9 sec over 4215 queries; the snapshot queries used were getsnapshotbysnapshotid, getsnapshotbyvmidandtype, and getsnapshotsbyvmsnapshotid, with a total execution count of 540.
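To put the numbers above side by side, the following small Python sketch derives a few ratios from them. Every input is copied from the results and notes in this comment; the dictionary keys and the `ratios` function name are only illustrative.

```python
# Figures copied from the test results and notes above.
VM_COUNT = 180
CURL_SECONDS = {"4.2.13-2": 0.657, "4.3.10-7": 44.725, "4.3.11-4": 1.1}
JDBC = {
    "4.3.10-7": {"total_ms": 10900.0, "queries": 4215},  # without the fix
    "4.3.11-4": {"total_ms": 260.3, "queries": 435},     # with the fix
}
SNAPSHOT_QUERY_EXECUTIONS_WITHOUT_FIX = 540


def ratios() -> dict:
    """Derive simple comparison ratios from the reported measurements."""
    bad, fixed = JDBC["4.3.10-7"], JDBC["4.3.11-4"]
    return {
        "api_slowdown_vs_baseline": CURL_SECONDS["4.3.10-7"] / CURL_SECONDS["4.2.13-2"],
        "api_speedup_from_fix": CURL_SECONDS["4.3.10-7"] / CURL_SECONDS["4.3.11-4"],
        "jdbc_time_reduction": bad["total_ms"] / fixed["total_ms"],
        "query_count_reduction": bad["queries"] / fixed["queries"],
        "snapshot_queries_per_vm": SNAPSHOT_QUERY_EXECUTIONS_WITHOUT_FIX / VM_COUNT,
    }


if __name__ == "__main__":
    for name, value in ratios().items():
        print(f"{name}: {value:.2f}")
```

The 540 snapshot query executions work out to three per VM for the 180 VMs, which is consistent with each of the three listed queries running once per VM, although the trace summary does not break the count down per query.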
*** Bug 1877120 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat Virtualization Engine security, bug fix 4.3.11), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4112