1367519 – VmsStatisticsFetcher excessive logging in engine.log (clone of bug 1366138 for 3.6.9)

Bug 1367519 - VmsStatisticsFetcher excessive logging in engine.log (clone of bug 1366138 for 3.6.9)

Summary: VmsStatisticsFetcher excessive logging in engine.log (clone of bug 1366138 fo...

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Virtualization Manager
Classification:	Red Hat
Component:	ovirt-engine
Sub Component:
Version:	3.6.7
Hardware:	All
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	ovirt-3.6.9
Target Release:	---
Assignee:	Arik
QA Contact:	Israel Pinto
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2016-08-16 15:48 UTC by Julio Entrena Perez
Modified:	2022-07-09 09:30 UTC (History)
CC List:	13 users (show)
Fixed In Version:
Doc Type:	Known Issue
Doc Text:	Previously, recently added logs that printed the amount of virtual machines running on a host were excessively written to the log file. Now, the frequency of these log have been reduced by printing them only upon a change in the number of virtual machines running on the host.
Clone Of:
Environment:
Last Closed:	2016-09-21 18:05:52 UTC
oVirt Team:	Virt
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)

Links
System	ID	Priority	Status	Summary	Last Updated
Red Hat Issue Tracker	RHV-47383	None	None	None	2022-07-09 09:30:29 UTC
Red Hat Knowledge Base (Solution)	2532251	None	None	None	2016-08-16 16:00:10 UTC
Red Hat Product Errata	RHSA-2016:1929	normal	SHIPPED_LIVE	Moderate: Red Hat Virtualization Manager (RHV) bug fix 3.6.9	2016-09-21 21:57:10 UTC
oVirt gerrit	62636	ovirt-engine-3.6	MERGED	core: log num of running vms only upon change	2016-08-22 12:19:13 UTC

Description Julio Entrena Perez 2016-08-16 15:48:20 UTC

Description of problem:
A huge amount of lines from VmsStatisticsFetcher are continuously logged to engine.log under normal operation.
This vastly increases the size of the log files and makes checking its contents frustrating.

Version-Release number of selected component (if applicable):
rhevm-3.6.7.5-0.1.el6

How reproducible:
Always

Steps to Reproduce:
1. Observe engine.log
2.
3.

Actual results:
A huge amount of lines like the following:
2016-08-16 13:09:12,149 INFO  [org.ovirt.engine.core.vdsbroker.VmsStatisticsFetcher] (DefaultQuartzScheduler_Worker-38) [62369c0f] Fetched 1 VMs from VDS 'c6c2e51a-277b-4aaf-86bb-0099b
ad0c826'
2016-08-16 13:09:29,013 INFO  [org.ovirt.engine.core.vdsbroker.VmsStatisticsFetcher] (DefaultQuartzScheduler_Worker-15) [476c44e9] Fetched 1 VMs from VDS 'c6c2e51a-277b-4aaf-86bb-0099b
ad0c826'

Those account for most of the content in engine.log :
# wc -l engine.log
54045 engine.log

# grep org.ovirt.engine.core.vdsbroker.VmsStatisticsFetcher engine.log | wc -l
45378

Expected results:
Maybe these events should be logged with DEBUG level (rather than INFO) so they are not written to engine.log with the default logging configuration.

Additional info:
These lines are being logged at INFO level:

backend/manager/modules/vdsbroker/src/main/java/org/ovirt/engine/core/vdsbroker/VmsStatisticsFetcher.java

 41     @Override
 42     protected void onFetchVms() {
 43         if (log.isDebugEnabled()) {
 44             logBuilder = new StringBuilder();
 45         }
 46         super.onFetchVms();
*47         log.info("Fetched {} VMs from VDS '{}'",
 48                 vdsmVms.size(),
 49                 vdsManager.getVdsId());
 50         if (log.isDebugEnabled()) {
 51             log.debug(logBuilder.toString());
 52         }
 53     }

Comment 1 Arik 2016-08-17 05:54:50 UTC


*** This bug has been marked as a duplicate of bug 1366138 ***

Comment 2 Yaniv Kaul 2016-08-17 06:43:34 UTC

(In reply to Arik from comment #1)
> 
> *** This bug has been marked as a duplicate of bug 1366138 ***

Arik - do we want to fix this in 3.6.9?

Comment 3 Arik 2016-08-17 07:06:31 UTC

(In reply to Yaniv Kaul from comment #2)
It is a really simple and easy to backport fix.
If people find it an issue with high severity then sure.

Comment 5 Michal Skrivanek 2016-08-17 07:38:22 UTC

Well, no. 3.6 is a version where this information could be most useful(reproduce known issues fixed in later versions), it's one line per each host every 15s. The impact is very low since in stable conditions the lines are exactly the same except timestamp so they are very well compressed.

Comment 6 Julio Entrena Perez 2016-08-17 08:19:09 UTC

The impact is that this, at least, doubles the size of engine.log.
In my test environment those lines account for 84% of engine.log.

(In reply to Michal Skrivanek from comment #5)
> Well, no. 3.6 is a version where this information could be most
> useful(reproduce known issues fixed in later versions)

That's the very definition of DEBUG.
What's the value of dumping all these details with INFO level?

> it's one line per
> each host every 15s.

When there are 100s of hosts several lines per second.

Comment 8 Michal Skrivanek 2016-08-18 13:12:00 UTC

(In reply to Julio Entrena Perez from comment #6)
> The impact is that this, at least, doubles the size of engine.log.
> In my test environment those lines account for 84% of engine.log.

the compressed one? or you're talking about the current engine.log. Does it cause any issues?

Comment 9 Julio Entrena Perez 2016-08-18 13:56:01 UTC

(In reply to Michal Skrivanek from comment #8)

> the compressed one? or you're talking about the current engine.log. 
The engine.log file attached to this bz.

> Does it
> cause any issues?

Yes, it creates up to 10 to 12 lines per second (depending on the size of the environment).
This results in most of the contents of engine.log being useless and results in increased disk usage.

In some circumstances it might be useful to have this information, which is why we are not requesting to get rid of those lines, but to log them with DEBUG level instead, so they are not sent to engine.log by default but can still be enabled if required.

Comment 11 Julio Entrena Perez 2016-08-22 09:16:32 UTC

http://gerrit.ovirt.org/62636 shows that we are intending to still log some of those entries at INFO level.
Could you please estimate the amount of lines what will still be logged in comparison with the actual approach? (e.g. we expect this to result in 2% of the current lines still being logged).

Comment 12 Arik 2016-08-22 11:04:48 UTC

Julio, we found these printings helpful for the investigation of severe issues that were hard to reproduce. That is why changing them to lower log level is not a good option - telling a user that got into a split-brain that happens in a typical setup once in a year "great, now please change to debug level and wait for it to reproduce so we can investigate" is problematic.

I do not think we can provide an estimation like you are asking for without further information about the setup: number of hosts, scheduling policy that is used and the frequency that VMs are being started+stopped+suspended+restored+migrated.

But in a typical setup there should be significantly less printings - currently it is printed once in every 15 sec per-host. With the http://gerrit.ovirt.org/62636 it would be once in every 15 sec per-host *and only if the number of VMs running on it is different than it used to 15sec ago*

Comment 21 Israel Pinto 2016-08-29 05:39:29 UTC

Verify with: 
Engine: rhevm-3.6.9-0.1.el6.noarch
Steps:
1. Create 3 VMs and run them on one Host.
2. Monitor engine log
3. Stop On of the VMs

Results: PASS the log in update only one time.

Output:
[root@violet-rhel yum.repos.d]# tail -f /var/log/ovirt-engine/engine.log | grep Fetched
2016-08-29 08:22:55,229 INFO  [org.ovirt.engine.core.vdsbroker.VmsStatisticsFetcher] (DefaultQuartzScheduler_Worker-45) [] Fetched 3 VMs from VDS '8d1a80f2-b180-44e3-8e17-735e1a759cd7'

--- after stop VM ----
2016-08-29 08:28:46,758 INFO  [org.ovirt.engine.core.vdsbroker.VmsStatisticsFetcher] (DefaultQuartzScheduler_Worker-14) [] Fetched 2 VMs from VDS '8d1a80f2-b180-44e3-8e17-735e1a759cd7'

Comment 23 errata-xmlrpc 2016-09-21 18:05:52 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-1929.html

Note You need to log in before you can comment on or make changes to this bug.