Bug 1367519 - VmsStatisticsFetcher excessive logging in engine.log (clone of bug 1366138 for 3.6.9)
Summary: VmsStatisticsFetcher excessive logging in engine.log (clone of bug 1366138 fo...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.6.7
Hardware: All
OS: Linux
high
high
Target Milestone: ovirt-3.6.9
: ---
Assignee: Arik
QA Contact: Israel Pinto
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2016-08-16 15:48 UTC by Julio Entrena Perez
Modified: 2022-07-09 09:30 UTC (History)
13 users (show)

Fixed In Version:
Doc Type: Known Issue
Doc Text:
Previously, recently added logs that printed the amount of virtual machines running on a host were excessively written to the log file. Now, the frequency of these log have been reduced by printing them only upon a change in the number of virtual machines running on the host.
Clone Of:
Environment:
Last Closed: 2016-09-21 18:05:52 UTC
oVirt Team: Virt
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHV-47383 0 None None None 2022-07-09 09:30:29 UTC
Red Hat Knowledge Base (Solution) 2532251 0 None None None 2016-08-16 16:00:10 UTC
Red Hat Product Errata RHSA-2016:1929 0 normal SHIPPED_LIVE Moderate: Red Hat Virtualization Manager (RHV) bug fix 3.6.9 2016-09-21 21:57:10 UTC
oVirt gerrit 62636 0 ovirt-engine-3.6 MERGED core: log num of running vms only upon change 2016-08-22 12:19:13 UTC

Description Julio Entrena Perez 2016-08-16 15:48:20 UTC
Description of problem:
A huge amount of lines from VmsStatisticsFetcher are continuously logged to engine.log under normal operation.
This vastly increases the size of the log files and makes checking its contents frustrating.

Version-Release number of selected component (if applicable):
rhevm-3.6.7.5-0.1.el6

How reproducible:
Always

Steps to Reproduce:
1. Observe engine.log
2.
3.

Actual results:
A huge amount of lines like the following:
2016-08-16 13:09:12,149 INFO  [org.ovirt.engine.core.vdsbroker.VmsStatisticsFetcher] (DefaultQuartzScheduler_Worker-38) [62369c0f] Fetched 1 VMs from VDS 'c6c2e51a-277b-4aaf-86bb-0099b
ad0c826'
2016-08-16 13:09:29,013 INFO  [org.ovirt.engine.core.vdsbroker.VmsStatisticsFetcher] (DefaultQuartzScheduler_Worker-15) [476c44e9] Fetched 1 VMs from VDS 'c6c2e51a-277b-4aaf-86bb-0099b
ad0c826'

Those account for most of the content in engine.log :
# wc -l engine.log
54045 engine.log

# grep org.ovirt.engine.core.vdsbroker.VmsStatisticsFetcher engine.log | wc -l
45378

Expected results:
Maybe these events should be logged with DEBUG level (rather than INFO) so they are not written to engine.log with the default logging configuration.

Additional info:
These lines are being logged at INFO level:

backend/manager/modules/vdsbroker/src/main/java/org/ovirt/engine/core/vdsbroker/VmsStatisticsFetcher.java

 41     @Override
 42     protected void onFetchVms() {
 43         if (log.isDebugEnabled()) {
 44             logBuilder = new StringBuilder();
 45         }
 46         super.onFetchVms();
*47         log.info("Fetched {} VMs from VDS '{}'",
 48                 vdsmVms.size(),
 49                 vdsManager.getVdsId());
 50         if (log.isDebugEnabled()) {
 51             log.debug(logBuilder.toString());
 52         }
 53     }

Comment 1 Arik 2016-08-17 05:54:50 UTC

*** This bug has been marked as a duplicate of bug 1366138 ***

Comment 2 Yaniv Kaul 2016-08-17 06:43:34 UTC
(In reply to Arik from comment #1)
> 
> *** This bug has been marked as a duplicate of bug 1366138 ***

Arik - do we want to fix this in 3.6.9?

Comment 3 Arik 2016-08-17 07:06:31 UTC
(In reply to Yaniv Kaul from comment #2)
It is a really simple and easy to backport fix.
If people find it an issue with high severity then sure.

Comment 5 Michal Skrivanek 2016-08-17 07:38:22 UTC
Well, no. 3.6 is a version where this information could be most useful(reproduce known issues fixed in later versions), it's one line per each host every 15s. The impact is very low since in stable conditions the lines are exactly the same except timestamp so they are very well compressed.

Comment 6 Julio Entrena Perez 2016-08-17 08:19:09 UTC
The impact is that this, at least, doubles the size of engine.log.
In my test environment those lines account for 84% of engine.log.

(In reply to Michal Skrivanek from comment #5)
> Well, no. 3.6 is a version where this information could be most
> useful(reproduce known issues fixed in later versions)

That's the very definition of DEBUG.
What's the value of dumping all these details with INFO level?

> it's one line per
> each host every 15s.

When there are 100s of hosts several lines per second.

Comment 8 Michal Skrivanek 2016-08-18 13:12:00 UTC
(In reply to Julio Entrena Perez from comment #6)
> The impact is that this, at least, doubles the size of engine.log.
> In my test environment those lines account for 84% of engine.log.

the compressed one? or you're talking about the current engine.log. Does it cause any issues?

Comment 9 Julio Entrena Perez 2016-08-18 13:56:01 UTC
(In reply to Michal Skrivanek from comment #8)

> the compressed one? or you're talking about the current engine.log. 
The engine.log file attached to this bz.

> Does it
> cause any issues?

Yes, it creates up to 10 to 12 lines per second (depending on the size of the environment).
This results in most of the contents of engine.log being useless and results in increased disk usage.

In some circumstances it might be useful to have this information, which is why we are not requesting to get rid of those lines, but to log them with DEBUG level instead, so they are not sent to engine.log by default but can still be enabled if required.

Comment 11 Julio Entrena Perez 2016-08-22 09:16:32 UTC
http://gerrit.ovirt.org/62636 shows that we are intending to still log some of those entries at INFO level.
Could you please estimate the amount of lines what will still be logged in comparison with the actual approach? (e.g. we expect this to result in 2% of the current lines still being logged).

Comment 12 Arik 2016-08-22 11:04:48 UTC
Julio, we found these printings helpful for the investigation of severe issues that were hard to reproduce. That is why changing them to lower log level is not a good option - telling a user that got into a split-brain that happens in a typical setup once in a year "great, now please change to debug level and wait for it to reproduce so we can investigate" is problematic.

I do not think we can provide an estimation like you are asking for without further information about the setup: number of hosts, scheduling policy that is used and the frequency that VMs are being started+stopped+suspended+restored+migrated.

But in a typical setup there should be significantly less printings - currently it is printed once in every 15 sec per-host. With the http://gerrit.ovirt.org/62636 it would be once in every 15 sec per-host *and only if the number of VMs running on it is different than it used to 15sec ago*

Comment 21 Israel Pinto 2016-08-29 05:39:29 UTC
Verify with: 
Engine: rhevm-3.6.9-0.1.el6.noarch
Steps:
1. Create 3 VMs and run them on one Host.
2. Monitor engine log
3. Stop On of the VMs

Results: PASS the log in update only one time.

Output:
[root@violet-rhel yum.repos.d]# tail -f /var/log/ovirt-engine/engine.log | grep Fetched
2016-08-29 08:22:55,229 INFO  [org.ovirt.engine.core.vdsbroker.VmsStatisticsFetcher] (DefaultQuartzScheduler_Worker-45) [] Fetched 3 VMs from VDS '8d1a80f2-b180-44e3-8e17-735e1a759cd7'

--- after stop VM ----
2016-08-29 08:28:46,758 INFO  [org.ovirt.engine.core.vdsbroker.VmsStatisticsFetcher] (DefaultQuartzScheduler_Worker-14) [] Fetched 2 VMs from VDS '8d1a80f2-b180-44e3-8e17-735e1a759cd7'

Comment 23 errata-xmlrpc 2016-09-21 18:05:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-1929.html


Note You need to log in before you can comment on or make changes to this bug.