Description of problem: Since https://bugzilla.redhat.com/show_bug.cgi?id=1347726#c5 virt-viewer asks for a refresh of the ISO Domain images every 5 minutes. For every connected console client, a refresh is made every 5 minutes. So if there are 100 clients, we have roughly 1 refresh every 3 seconds. Even with just a few clients connected it's already disturbing. I wonder what I looks like on VDI solutions. Apparently there will be further improvements from that BZ in the future. But currently this mechanism simply floods the engine logs and doesn't look optimal. Currently, when the admin clicks on the Events tab, to check how RHV is working for the last days, he mostly sees pages and pages of useless information: Refresh image list succeeded for domain(s): ISOs (All file type) And if some clients use the older virt-viewer versions (5 seconds), it's even worse. - Does this really need to go to audit_log? Why not just log in case of error? engine.log is enough. Alternatively, just throwing some random ideas: - refresh ONCE every X minutes and cache the results for all subsequent requests from the clients? - Somehow monitor the domain for new files (inotify doesn't right but there could be some other solution?) and refresh only when it's needed? - Or make engine-iso-upload tell the manager via api to refresh? Version-Release number of selected component (if applicable): 4.0.5 How reproducible: 100% Steps to Reproduce: 1. Add ISO domain 2. Have clients using the VMs Actual results: Logs flooded, Events tab becomes useless without filters Expected results: Less irrelevant information flooding the Events tab.
Yaniv, what do you think will be the best course of action here?
Why is the client requesting a refresh? To allow mounting the ISOs on the client drop-down?
It comes from virt-viewer apparently, we have a mechanism to avoid repeating audit logs and suppress them for a given amount of time, sounds OK to you?
(In reply to Tal Nisan from comment #4) > It comes from virt-viewer apparently, we have a mechanism to avoid repeating > audit logs and suppress them for a given amount of time, sounds OK to you? Why is this audit needed? Maybe drop it?
Wouldn't the client refresh cause a scale issue. Why is the client doing this? Maybe a refresh every few minutes is enough.
(In reply to Yaniv Dary from comment #6) > Wouldn't the client refresh cause a scale issue. Why is the client doing > this? > Maybe a refresh every few minutes is enough. But this is exactly what's happening. Every client is asking every 5 minutes. But since we do not respond in a cached response, multiple clients cause a refresh storm.
(In reply to Yaniv Kaul from comment #7) > (In reply to Yaniv Dary from comment #6) > > Wouldn't the client refresh cause a scale issue. Why is the client doing > > this? > > Maybe a refresh every few minutes is enough. > But this is exactly what's happening. Every client is asking every 5 > minutes. But since we do not respond in a cached response, multiple clients > cause a refresh storm. Exactly, the problem is not the interval, it's the fact that it's per client and not a general interval
(In reply to Tal Nisan from comment #8) > (In reply to Yaniv Kaul from comment #7) > > (In reply to Yaniv Dary from comment #6) > > > Wouldn't the client refresh cause a scale issue. Why is the client doing > > > this? > > > Maybe a refresh every few minutes is enough. > > But this is exactly what's happening. Every client is asking every 5 > > minutes. But since we do not respond in a cached response, multiple clients > > cause a refresh storm. > > Exactly, the problem is not the interval, it's the fact that it's per client > and not a general interval Tal, how do we proceed? The audit log is the problem, or the fact we are actually performing so many refreshes? I reckon the latter. If it's just an audit log issue, I'm far less concerned.
As I wrote in comment 4, we can suppress those quite easily but I think the refresh every 5 minutes multiplied by each running guest is the real issue here, some caching mechanism with an expiry time of a few minutes should do the trick
(In reply to Tal Nisan from comment #10) > As I wrote in comment 4, we can suppress those quite easily but I think the > refresh every 5 minutes multiplied by each running guest is the real issue > here, some caching mechanism with an expiry time of a few minutes should do > the trick Agreed, please implement.
I reproduced the scenario with the following code: ------------------------------------------------------ ovirt-engine-4.1.1-0.1.el7.noarch rhevm-4.1.1-0.1.el7.noarch vdsm-4.19.5-1.el7ev.x86_64 Reproduced with the following scenario: ----------------------------------------------------- 1. Created a vm with a disk and started the vm and accessed the console: >> only one messaged was reported in the engine log every 5 minutes "Refresh image list succeeded for domain(s): ISO_DOMAIN (All file type)" 2. Created 9 more vms and started them 3. Accessed each VM's via the console with about 1 to 2 minutes appart >>> Now a request is reported in the log up to 3 times a minute Print out from engine log: ---------------------------- 2017-02-22 16:51:58,106+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-7) [d23ae7c5-6349-4b25-b6a8-4ae741f1dd71] EVENT_ID: REFRESH_REPOSITORY_IMAGE_LIST_SUCCEEDED(998), Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Refresh image list succeeded for domain(s): ISO_DOMAIN (All file type) 2017-02-22 16:52:21,031+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-12) [de0ade3b-f406-4ea3-ab1e-7ed68220dbfc] EVENT_ID: REFRESH_REPOSITORY_IMAGE_LIST_SUCCEEDED(998), Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Refresh image list succeeded for domain(s): ISO_DOMAIN (All file type) 2017-02-22 16:52:54,095+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-4) [0ec6a6d4-e0ac-4374-839a-b929761703e7] EVENT_ID: REFRESH_REPOSITORY_IMAGE_LIST_SUCCEEDED(998), Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Refresh image list succeeded for domain(s): ISO_DOMAIN (All file type) 2017-02-22 16:53:34,122+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-16) [c0d62762-3786-4393-9dcc-25530fe1769c] EVENT_ID: REFRESH_REPOSITORY_IMAGE_LIST_SUCCEEDED(998), Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Refresh image list succeeded for domain(s): ISO_DOMAIN (All file type) 2017-02-22 16:53:40,781+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-13) [8bd4c7f8-b9a1-4fde-b206-f6372855fb3b] EVENT_ID: REFRESH_REPOSITORY_IMAGE_LIST_SUCCEEDED(998), Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Refresh image list succeeded for domain(s): ISO_DOMAIN (All file type) 2017-02-22 16:53:52,777+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-7) [123a046d-4f82-4ccd-b38c-311d9c0a7577] EVENT_ID: REFRESH_REPOSITORY_IMAGE_LIST_SUCCEEDED(998), Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Refresh image list succeeded for domain(s): ISO_DOMAIN (All file type) 2017-02-22 16:54:54,148+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-15) [6dc78534-75bb-402f-8c5e-ebf5d0503f43] EVENT_ID: REFRESH_REPOSITORY_IMAGE_LIST_SUCCEEDED(998), Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Refresh image list succeeded for domain(s): ISO_DOMAIN (All file type) 2017-02-22 16:55:24,517+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-8) [b0dfac5c-85cf-4f4b-8dac-baff0cb43b05] EVENT_ID: REFRESH_REPOSITORY_IMAGE_LIST_SUCCEEDED(998), Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Refresh image list succeeded for domain(s): ISO_DOMAIN (All file type) 2017-02-22 16:56:58,713+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-15) [5e5822cd-5519-4657-a108-cca63fa30aee] EVENT_ID: REFRESH_REPOSITORY_IMAGE_LIST_SUCCEEDED(998), Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Refresh image list succeeded for domain(s): ISO_DOMAIN (All file type) 2017-02-22 16:57:21,759+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-4) [a65d3c24-a89a-4db1-895c-13b225a7f3a8] EVENT_ID: REFRESH_REPOSITORY_IMAGE_LIST_SUCCEEDED(998), Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Refresh image list succeeded for domain(s): ISO_DOMAIN (All file type) 2017-02-22 16:57:54,817+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-7) [c69e93d5-9c8e-49e4-82be-3851459b1409] EVENT_ID: REFRESH_REPOSITORY_IMAGE_LIST_SUCCEEDED(998), Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Refresh image list succeeded for domain(s): ISO_DOMAIN (All file type) 2017-02-22 16:58:34,791+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-12) [41c6b2c8-4ecb-4e96-a17c-865215d71196] EVENT_ID: REFRESH_REPOSITORY_IMAGE_LIST_SUCCEEDED(998), Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Refresh image list succeeded for domain(s): ISO_DOMAIN (All file type) 2017-02-22 16:58:42,990+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-2) [8cdb28b1-cb80-41a7-b9c1-4ff2506806c1] EVENT_ID: REFRESH_REPOSITORY_IMAGE_LIST_SUCCEEDED(998), Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Refresh image list succeeded for domain(s): ISO_DOMAIN (All file type) 2017-02-22 16:58:53,719+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-1) [3c99063c-a313-49b9-aa78-c290f5061eb7] EVENT_ID: REFRESH_REPOSITORY_IMAGE_LIST_SUCCEEDED(998), Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Refresh image list succeeded for domain(s): ISO_DOMAIN (All file type)
Created attachment 1256522 [details] server, engine and vdsm logs Added logs
4.1.4 is planned as a minimal, fast, z-stream version to fix any open issues we may have in supporting the upcoming EL 7.4. Pushing out anything unrelated, although if there's a minimal/trival, SAFE fix that's ready on time, we can consider introducing it in 4.1.4.
The proposed patch is way too delicate for a z-stream. In interests of stability, I propose moving this out to 4.2.
INFO: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason: [No relevant external trackers attached] For more info please contact: rhv-devops
INFO: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason: [Project 'ovirt-engine-api-model'/Component 'ovirt-engine' mismatch] For more info please contact: rhv-devops
(In reply to rhev-integ from comment #23) > INFO: Bug status wasn't changed from MODIFIED to ON_QA due to the following > reason: > > [Project 'ovirt-engine-api-model'/Component 'ovirt-engine' mismatch] At least one patch was on the correct project. > > For more info please contact: rhv-devops
Its in our hands, do we want to allow other projects attached to the bug and move bugs to ON_QA even if it has one patch on the same component? if not one, then how many? If we can agree on a logic, we can work on change it in our script.
Tested with the following code: -------------------------------- ovirt-engine-4.2.0-0.6.el7.noarch vdsm-4.20.9-1.el7ev.x86_64 Verified with the following scenario: ----------------------------------------------------- 1. Created a vm with a disk and started the vm and accessed the console: >> only one messaged was reported in the engine log every 5 minutes "2017-12-12 11:57:41,202+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-2) [6d0552f8-b976-4e01-82c4-a1b20c0b93e3] EVENT_ID: REFRESH_REPOSITORY_IMAGE_LIST_SUCCEEDED(998), Refresh image list succeeded for domain(s): ISO369 (All file type) " 2. Created 9 more vms and started them 3. Started the console on all 10 vms After 20 minutes 26 records After 90 minutes 156 records and then Access 1 more vm After 180 minutes 166 records 4. Changed the config value on the engine as follows: engine-config --set ForceRefreshDomainFilesListByDefault=false [root@kevin4 ~]# engine-config --get ForceRefreshDomainFilesListByDefault ForceRefreshDomainFilesListByDefault: false version: general [root@kevin4 ~]# systemctl restart ovirt-engine 5. Waited around 2 hours, access the same 10 consoles >>>>> only 30 records were reported. Moving to VERIFIED
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:1488
BZ<2>Jira Resync