Bug 1414207 - Too many ISO refreshing events
Summary: Too many ISO refreshing events
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 4.0.5
Hardware: x86_64
OS: Linux
high
high
Target Milestone: ovirt-4.2.0
: 4.2.0
Assignee: Tal Nisan
QA Contact: Kevin Alon Goldblatt
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2017-01-18 02:59 UTC by Germano Veit Michel
Modified: 2020-06-11 13:12 UTC (History)
14 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Previously, the ForceRefreshDomainFilesListByDefault parameter used by virt-viewer was set to "True" by default, sometimes causing a flood of ISO domain refresh events in the logs if many console clients were connected. In the current release, setting this parameter to "False" reduces the number of logged refresh events.
Clone Of:
Environment:
Last Closed: 2018-05-15 17:40:52 UTC
oVirt Team: Storage
Target Upstream Version:


Attachments (Terms of Use)
server, engine and vdsm logs (2.06 MB, application/x-gzip)
2017-02-22 15:51 UTC, Kevin Alon Goldblatt
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 2934581 0 None None None 2017-02-20 00:19:02 UTC
Red Hat Product Errata RHEA-2018:1488 0 None None None 2018-05-15 17:42:08 UTC
oVirt gerrit 79029 0 ovirt-engine-4.1 ABANDONED core: Suppress multiple audit logs on ISO list refresh 2020-04-03 12:29:04 UTC
oVirt gerrit 79030 0 master MERGED core: Added invalidate cache mechanism for domain files refresh 2020-04-03 12:29:04 UTC
oVirt gerrit 79263 0 master MERGED Added a refresh parameter to FilesService list 2020-04-03 12:29:04 UTC
oVirt gerrit 79422 0 model_4.1 MERGED Added a refresh parameter to FilesService list 2020-04-03 12:29:03 UTC
oVirt gerrit 79616 0 master MERGED restapi: Update to model 4.2.14 2020-04-03 12:29:03 UTC
oVirt gerrit 79619 0 ovirt-engine-4.1 ABANDONED restapi: Update to model 4.1.38 2020-04-03 12:29:03 UTC
oVirt gerrit 80051 0 master MERGED restapi: Make force refresh of domain files defined by param 2020-04-03 12:29:03 UTC

Description Germano Veit Michel 2017-01-18 02:59:00 UTC
Description of problem:

Since https://bugzilla.redhat.com/show_bug.cgi?id=1347726#c5 virt-viewer asks for a refresh of the ISO Domain images every 5 minutes.

For every connected console client, a refresh is made every 5 minutes. So if there are 100 clients, we have roughly 1 refresh every 3 seconds. Even with just a few clients connected it's already disturbing. I wonder what I looks like on VDI solutions. Apparently there will be further improvements from that BZ in the future. But currently this mechanism simply floods the engine logs and doesn't look optimal. Currently, when the admin clicks on the Events tab, to check how RHV is working for the last days, he mostly sees pages and pages of useless information:

Refresh image list succeeded for domain(s): ISOs (All file type)

And if some clients use the older virt-viewer versions (5 seconds), it's even worse.

- Does this really need to go to audit_log? Why not just log in case of error? engine.log is enough.

Alternatively, just throwing some random ideas:
- refresh ONCE every X minutes and cache the results for all subsequent requests from the clients?
- Somehow monitor the domain for new files (inotify doesn't right but there could be some other solution?) and refresh only when it's needed? 
- Or make engine-iso-upload tell the manager via api to refresh?

Version-Release number of selected component (if applicable):
4.0.5

How reproducible:
100%

Steps to Reproduce:
1. Add ISO domain
2. Have clients using the VMs

Actual results:
Logs flooded, Events tab becomes useless without filters

Expected results:
Less irrelevant information flooding the Events tab.

Comment 2 Tal Nisan 2017-01-18 08:20:10 UTC
Yaniv, what do you think will be the best course of action here?

Comment 3 Yaniv Lavi 2017-01-22 10:00:41 UTC
Why is the client requesting a refresh?
To allow mounting the ISOs on the client drop-down?

Comment 4 Tal Nisan 2017-01-22 13:55:24 UTC
It comes from virt-viewer apparently, we have a mechanism to avoid repeating audit logs and suppress them for a given amount of time, sounds OK to you?

Comment 5 Yaniv Lavi 2017-01-23 22:22:25 UTC
(In reply to Tal Nisan from comment #4)
> It comes from virt-viewer apparently, we have a mechanism to avoid repeating
> audit logs and suppress them for a given amount of time, sounds OK to you?

Why is this audit needed? Maybe drop it?

Comment 6 Yaniv Lavi 2017-01-23 22:24:47 UTC
Wouldn't the client refresh cause a scale issue. Why is the client doing this? 
Maybe a refresh every few minutes is enough.

Comment 7 Yaniv Kaul 2017-01-24 05:56:21 UTC
(In reply to Yaniv Dary from comment #6)
> Wouldn't the client refresh cause a scale issue. Why is the client doing
> this? 
> Maybe a refresh every few minutes is enough.
But this is exactly what's happening. Every client is asking every 5 minutes. But since we do not respond in a cached response, multiple clients cause a refresh storm.

Comment 8 Tal Nisan 2017-01-24 10:01:16 UTC
(In reply to Yaniv Kaul from comment #7)
> (In reply to Yaniv Dary from comment #6)
> > Wouldn't the client refresh cause a scale issue. Why is the client doing
> > this? 
> > Maybe a refresh every few minutes is enough.
> But this is exactly what's happening. Every client is asking every 5
> minutes. But since we do not respond in a cached response, multiple clients
> cause a refresh storm.

Exactly, the problem is not the interval, it's the fact that it's per client and not a general interval

Comment 9 Yaniv Kaul 2017-02-06 08:17:23 UTC
(In reply to Tal Nisan from comment #8)
> (In reply to Yaniv Kaul from comment #7)
> > (In reply to Yaniv Dary from comment #6)
> > > Wouldn't the client refresh cause a scale issue. Why is the client doing
> > > this? 
> > > Maybe a refresh every few minutes is enough.
> > But this is exactly what's happening. Every client is asking every 5
> > minutes. But since we do not respond in a cached response, multiple clients
> > cause a refresh storm.
> 
> Exactly, the problem is not the interval, it's the fact that it's per client
> and not a general interval

Tal, how do we proceed? The audit log is the problem, or the fact we are actually performing so many refreshes? I reckon the latter. If it's just an audit log issue, I'm far less concerned.

Comment 10 Tal Nisan 2017-02-06 09:32:43 UTC
As I wrote in comment 4, we can suppress those quite easily but I think the refresh every 5 minutes multiplied by each running guest is the real issue here, some caching mechanism with an expiry time of a few minutes should do the trick

Comment 11 Yaniv Kaul 2017-02-06 10:18:34 UTC
(In reply to Tal Nisan from comment #10)
> As I wrote in comment 4, we can suppress those quite easily but I think the
> refresh every 5 minutes multiplied by each running guest is the real issue
> here, some caching mechanism with an expiry time of a few minutes should do
> the trick

Agreed, please implement.

Comment 15 Kevin Alon Goldblatt 2017-02-22 15:38:17 UTC
I reproduced the scenario with the following code:
------------------------------------------------------
ovirt-engine-4.1.1-0.1.el7.noarch
rhevm-4.1.1-0.1.el7.noarch
vdsm-4.19.5-1.el7ev.x86_64

Reproduced with the following scenario:
-----------------------------------------------------
1. Created a vm with a disk and started the vm and accessed the console:
>> only one messaged was reported in the engine log every 5 minutes
"Refresh image list succeeded for domain(s): ISO_DOMAIN (All file type)"
2. Created 9 more vms and started them
3. Accessed each VM's via the console with about 1 to 2 minutes appart >>> Now a request is reported in the log up to 3 times a minute

Print out from engine log:
----------------------------
2017-02-22 16:51:58,106+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-7) [d23ae7c5-6349-4b25-b6a8-4ae741f1dd71] EVENT_ID: REFRESH_REPOSITORY_IMAGE_LIST_SUCCEEDED(998), Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Refresh image list succeeded for domain(s): ISO_DOMAIN (All file type)
2017-02-22 16:52:21,031+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-12) [de0ade3b-f406-4ea3-ab1e-7ed68220dbfc] EVENT_ID: REFRESH_REPOSITORY_IMAGE_LIST_SUCCEEDED(998), Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Refresh image list succeeded for domain(s): ISO_DOMAIN (All file type)
2017-02-22 16:52:54,095+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-4) [0ec6a6d4-e0ac-4374-839a-b929761703e7] EVENT_ID: REFRESH_REPOSITORY_IMAGE_LIST_SUCCEEDED(998), Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Refresh image list succeeded for domain(s): ISO_DOMAIN (All file type)
2017-02-22 16:53:34,122+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-16) [c0d62762-3786-4393-9dcc-25530fe1769c] EVENT_ID: REFRESH_REPOSITORY_IMAGE_LIST_SUCCEEDED(998), Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Refresh image list succeeded for domain(s): ISO_DOMAIN (All file type)
2017-02-22 16:53:40,781+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-13) [8bd4c7f8-b9a1-4fde-b206-f6372855fb3b] EVENT_ID: REFRESH_REPOSITORY_IMAGE_LIST_SUCCEEDED(998), Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Refresh image list succeeded for domain(s): ISO_DOMAIN (All file type)
2017-02-22 16:53:52,777+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-7) [123a046d-4f82-4ccd-b38c-311d9c0a7577] EVENT_ID: REFRESH_REPOSITORY_IMAGE_LIST_SUCCEEDED(998), Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Refresh image list succeeded for domain(s): ISO_DOMAIN (All file type)
2017-02-22 16:54:54,148+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-15) [6dc78534-75bb-402f-8c5e-ebf5d0503f43] EVENT_ID: REFRESH_REPOSITORY_IMAGE_LIST_SUCCEEDED(998), Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Refresh image list succeeded for domain(s): ISO_DOMAIN (All file type)
2017-02-22 16:55:24,517+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-8) [b0dfac5c-85cf-4f4b-8dac-baff0cb43b05] EVENT_ID: REFRESH_REPOSITORY_IMAGE_LIST_SUCCEEDED(998), Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Refresh image list succeeded for domain(s): ISO_DOMAIN (All file type)
2017-02-22 16:56:58,713+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-15) [5e5822cd-5519-4657-a108-cca63fa30aee] EVENT_ID: REFRESH_REPOSITORY_IMAGE_LIST_SUCCEEDED(998), Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Refresh image list succeeded for domain(s): ISO_DOMAIN (All file type)
2017-02-22 16:57:21,759+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-4) [a65d3c24-a89a-4db1-895c-13b225a7f3a8] EVENT_ID: REFRESH_REPOSITORY_IMAGE_LIST_SUCCEEDED(998), Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Refresh image list succeeded for domain(s): ISO_DOMAIN (All file type)
2017-02-22 16:57:54,817+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-7) [c69e93d5-9c8e-49e4-82be-3851459b1409] EVENT_ID: REFRESH_REPOSITORY_IMAGE_LIST_SUCCEEDED(998), Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Refresh image list succeeded for domain(s): ISO_DOMAIN (All file type)
2017-02-22 16:58:34,791+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-12) [41c6b2c8-4ecb-4e96-a17c-865215d71196] EVENT_ID: REFRESH_REPOSITORY_IMAGE_LIST_SUCCEEDED(998), Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Refresh image list succeeded for domain(s): ISO_DOMAIN (All file type)
2017-02-22 16:58:42,990+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-2) [8cdb28b1-cb80-41a7-b9c1-4ff2506806c1] EVENT_ID: REFRESH_REPOSITORY_IMAGE_LIST_SUCCEEDED(998), Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Refresh image list succeeded for domain(s): ISO_DOMAIN (All file type)
2017-02-22 16:58:53,719+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-1) [3c99063c-a313-49b9-aa78-c290f5061eb7] EVENT_ID: REFRESH_REPOSITORY_IMAGE_LIST_SUCCEEDED(998), Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Refresh image list succeeded for domain(s): ISO_DOMAIN (All file type)

Comment 16 Kevin Alon Goldblatt 2017-02-22 15:51:00 UTC
Created attachment 1256522 [details]
server, engine and vdsm logs

Added logs

Comment 20 Allon Mureinik 2017-07-02 20:37:42 UTC
4.1.4 is planned as a minimal, fast, z-stream version to fix any open issues we may have in supporting the upcoming EL 7.4.

Pushing out anything unrelated, although if there's a minimal/trival, SAFE fix that's ready on time, we can consider introducing it in 4.1.4.

Comment 21 Allon Mureinik 2017-07-12 12:24:33 UTC
The proposed patch is way too delicate for a z-stream.
In interests of stability, I propose moving this out to 4.2.

Comment 22 rhev-integ 2017-11-02 13:39:20 UTC
INFO: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[No relevant external trackers attached]

For more info please contact: rhv-devops@redhat.com

Comment 23 rhev-integ 2017-11-02 21:08:59 UTC
INFO: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[Project 'ovirt-engine-api-model'/Component 'ovirt-engine' mismatch]

For more info please contact: rhv-devops@redhat.com

Comment 24 Yaniv Kaul 2017-11-05 10:07:34 UTC
(In reply to rhev-integ from comment #23)
> INFO: Bug status wasn't changed from MODIFIED to ON_QA due to the following
> reason:
> 
> [Project 'ovirt-engine-api-model'/Component 'ovirt-engine' mismatch]

At least one patch was on the correct project.

> 
> For more info please contact: rhv-devops@redhat.com

Comment 25 Eyal Edri 2017-11-05 15:19:08 UTC
Its in our hands, do we want to allow other projects attached to the bug and move bugs to ON_QA even if it has one patch on the same component? if not one, then how many? 

If we can agree on a logic, we can work on change it in our script.

Comment 28 Kevin Alon Goldblatt 2017-12-13 11:26:22 UTC
Tested with the following code:
--------------------------------
ovirt-engine-4.2.0-0.6.el7.noarch
vdsm-4.20.9-1.el7ev.x86_64


Verified with the following scenario:
-----------------------------------------------------
1. Created a vm with a disk and started the vm and accessed the console:
>> only one messaged was reported in the engine log every 5 minutes
"2017-12-12 11:57:41,202+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-2) [6d0552f8-b976-4e01-82c4-a1b20c0b93e3] EVENT_ID: REFRESH_REPOSITORY_IMAGE_LIST_SUCCEEDED(998), Refresh image list succeeded for domain(s): ISO369 (All file type)
"
2. Created 9 more vms and started them
3. Started the console on all 10 vms
After 20 minutes 26 records
After 90 minutes 156 records and then Access 1 more vm
After 180 minutes 166 records

4. Changed the config value on the engine as follows:
engine-config --set ForceRefreshDomainFilesListByDefault=false
[root@kevin4 ~]# engine-config --get ForceRefreshDomainFilesListByDefault
ForceRefreshDomainFilesListByDefault: false version: general
[root@kevin4 ~]# systemctl restart ovirt-engine
5. Waited around 2 hours, access the same 10 consoles >>>>> only 30 records were  reported.


Moving to VERIFIED

Comment 31 errata-xmlrpc 2018-05-15 17:40:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:1488

Comment 32 Franta Kust 2019-05-16 13:05:55 UTC
BZ<2>Jira Resync


Note You need to log in before you can comment on or make changes to this bug.