Bug 1358136 - [LOGGING] /var/log/ovirt-engine/ui.log is growing at a rate of 16mb/second
Summary: [LOGGING] /var/log/ovirt-engine/ui.log is growing at a rate of 16mb/second
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: General
Version: 3.6.7
Hardware: All
OS: Linux
high
urgent
Target Milestone: ovirt-4.0.2
: 4.0.2.4
Assignee: Vojtech Szocs
QA Contact: Pavel Novotny
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2016-07-20 07:11 UTC by Gil Klein
Modified: 2016-08-22 16:22 UTC (History)
9 users (show)

Fixed In Version:
Doc Type: Enhancement
Doc Text:
This update prevents UI exceptions from repeatedly spamming the remote /var/log/ovirt-engine/ui.log after subsequent occurrences of the same error. Each remote operation is an HTTP request and consumes network resources. Any uncaught exception in the Administration Portal or User Portal is compared to the last one, and only logged remotely if it's not the same. All occurrences are still logged on the web browser.
Clone Of:
Environment:
Last Closed: 2016-08-22 12:32:21 UTC
oVirt Team: UX
Embargoed:
rule-engine: ovirt-4.0.z+
mgoldboi: exception+
rule-engine: planning_ack+
rule-engine: devel_ack+
pstehlik: testing_ack+


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 61237 0 master MERGED webadmin,userportal: Improve the way logging works 2016-07-26 13:48:05 UTC
oVirt gerrit 61411 0 ovirt-engine-4.0 MERGED webadmin,userportal: Improve the way logging works 2016-07-26 15:03:08 UTC
oVirt gerrit 61414 0 ovirt-engine-4.0.2 MERGED webadmin,userportal: Improve the way logging works 2016-07-26 15:59:21 UTC

Description Gil Klein 2016-07-20 07:11:24 UTC
Description of problem:
/var/log/ovirt-engine/ui.log is growing at a rate of 16mb/second on a production system

Version-Release number of selected component (if applicable):
rhevm-3.6.7.5-0.1.el6.noarch

How reproducible:
100% on this system

Steps to Reproduce:
Monitor the /var/log/ovirt-engine/ui.log log file

Actual results:
# while [ 1 ]; do du -sh ui.log; sleep 60; done
86M	ui.log
102M	ui.log
118M	ui.log
134M	ui.log


Expected results:
Logs should suppress repeating messages or rotate quickly based on file size  

Additional info:

Log snippet:

2016-07-20 10:08:20,176 ERROR [org.ovirt.engine.ui.frontend.server.gwt.OvirtRemoteLoggingService] (ajp-/127.0.0.1:8702-13) [] Permutation name: 44BAC8E4AC393CDE74A1EDFCB3C555A1
2016-07-20 10:08:20,176 ERROR [org.ovirt.engine.ui.frontend.server.gwt.OvirtRemoteLoggingService] (ajp-/127.0.0.1:8702-13) [] Uncaught exception: : com.google.gwt.core.client.JavaScriptException: (NS_ERROR_NOT_INITIALIZED)
 __gwt$exception: <skipped>
 result: 3253927937
 filename: https://rhevm-3.qa.lab.tlv.redhat.com/ovirt-engine/webadmin/44BAC8E4AC393CDE74A1EDFCB3C555A1.cache.html
 lineNumber: 17734
 columnNumber: 0
 inner: null
 data: null
 stack: kv@https://rhevm-3.qa.lab.tlv.redhat.com/ovirt-engine/webadmin/44BAC8E4AC393CDE74A1EDFCB3C555A1.cache.html:17734:25
hu@https://rhevm-3.qa.lab.tlv.redhat.com/ovirt-engine/webadmin/44BAC8E4AC393CDE74A1EDFCB3C555A1.cache.html:9982:24
dwf@https://rhevm-3.qa.lab.tlv.redhat.com/ovirt-engine/webadmin/44BAC8E4AC393CDE74A1EDFCB3C555A1.cache.html:17259:24
O3@https://rhevm-3.qa.lab.tlv.redhat.com/ovirt-engine/webadmin/44BAC8E4AC393CDE74A1EDFCB3C555A1.cache.html:24153:7163
hwf/c.onreadystatechange<@https://rhevm-3.qa.lab.tlv.redhat.com/ovirt-engine/webadmin/44BAC8E4AC393CDE74A1EDFCB3C555A1.cache.html:16937:62
Yt@https://rhevm-3.qa.lab.tlv.redhat.com/ovirt-engine/webadmin/44BAC8E4AC393CDE74A1EDFCB3C555A1.cache.html:12103:26
au@https://rhevm-3.qa.lab.tlv.redhat.com/ovirt-engine/webadmin/44BAC8E4AC393CDE74A1EDFCB3C555A1.cache.html:21295:56
_t/<@https://rhevm-3.qa.lab.tlv.redhat.com/ovirt-engine/webadmin/44BAC8E4AC393CDE74A1EDFCB3C555A1.cache.html:20646:53
:
	at Unknown.kv(Unknown Source)
	at Unknown.hu(Unknown Source)
	at Unknown.dwf(Unknown Source)
	at Unknown.O3(Unknown Source)
	at Unknown.hwf/c.onreadystatechange<(Unknown Source)
	at Unknown.Yt(Unknown Source)
	at Unknown.au(Unknown Source)
	at Unknown._t/<(Unknown Source)
	at Unknown.anonymous(Unknown Source)

Comment 2 Yaniv Kaul 2016-07-21 07:23:05 UTC
1. I assume it also affect the server CPU usage.
2. Can you install debug symbols and report the issue specifically? This bug will be used for flood protection for this mechanism, but we should also look at the specific exception.

Comment 3 Michal Skrivanek 2016-07-21 07:30:19 UTC
can you add /var/log/httpd/ssl_access_log and check it's "POST /ovirt-engine/userportal/remote_logging" calls?
If so, you can identify the client causing that, restart that browser and the flooding should stop

Comment 5 Gil Klein 2016-07-21 08:18:20 UTC
(In reply to Yaniv Kaul from comment #2)
> 1. I assume it also affect the server CPU usage.
CPU is around 25%-30% 
> 2. Can you install debug symbols and report the issue specifically? This bug
> will be used for flood protection for this mechanism, but we should also
> look at the specific exception.
Sure, here it is with debug symbols installed:

2016-07-21 11:16:18,675 ERROR [org.ovirt.engine.ui.frontend.server.gwt.OvirtRemoteLoggingService] (ajp-/127.0.0.1:8702-7) [] Uncaught exception: : com.google.gwt.core.client.JavaScriptException: (NS_ERROR_NOT_INITIALIZED)
 __gwt$exception: <skipped>
 result: 3253927937
 filename: https://rhevm-3.qa.lab.tlv.redhat.com/ovirt-engine/webadmin/44BAC8E4AC393CDE74A1EDFCB3C555A1.cache.html
 lineNumber: 17734
 columnNumber: 0
 inner: null
 data: null
 stack: kv@https://rhevm-3.qa.lab.tlv.redhat.com/ovirt-engine/webadmin/44BAC8E4AC393CDE74A1EDFCB3C555A1.cache.html:17734:25
hu@https://rhevm-3.qa.lab.tlv.redhat.com/ovirt-engine/webadmin/44BAC8E4AC393CDE74A1EDFCB3C555A1.cache.html:9982:24
dwf@https://rhevm-3.qa.lab.tlv.redhat.com/ovirt-engine/webadmin/44BAC8E4AC393CDE74A1EDFCB3C555A1.cache.html:17259:24
O3@https://rhevm-3.qa.lab.tlv.redhat.com/ovirt-engine/webadmin/44BAC8E4AC393CDE74A1EDFCB3C555A1.cache.html:24153:7163
hwf/c.onreadystatechange<@https://rhevm-3.qa.lab.tlv.redhat.com/ovirt-engine/webadmin/44BAC8E4AC393CDE74A1EDFCB3C555A1.cache.html:16937:62
Yt@https://rhevm-3.qa.lab.tlv.redhat.com/ovirt-engine/webadmin/44BAC8E4AC393CDE74A1EDFCB3C555A1.cache.html:12103:26
au@https://rhevm-3.qa.lab.tlv.redhat.com/ovirt-engine/webadmin/44BAC8E4AC393CDE74A1EDFCB3C555A1.cache.html:21295:56
_t/<@https://rhevm-3.qa.lab.tlv.redhat.com/ovirt-engine/webadmin/44BAC8E4AC393CDE74A1EDFCB3C555A1.cache.html:20646:53
:
	at com.google.gwt.core.client.impl.UnloadSupport.setTimeout0(UnloadSupport.java:40) [gwt-servlet.jar:]
	at com.google.gwt.core.client.impl.Impl.setTimeout(Impl.java:285) [gwt-servlet.jar:]
	at com.google.gwt.xhr.client.XMLHttpRequest.$clearOnReadyStateChange(XMLHttpRequest.java:162) [gwt-servlet.jar:]
	at com.google.gwt.http.client.RequestBuilder$1.onReadyStateChange(RequestBuilder.java:409) [gwt-servlet.jar:]
	at Unknown.hwf/c.onreadystatechange<(Unknown Source)
	at com.google.gwt.core.client.impl.Impl.apply(Impl.java:296) [gwt-servlet.jar:]
	at com.google.gwt.core.client.impl.Impl.entry0(Impl.java:335) [gwt-servlet.jar:]
	at Unknown._t/<(Unknown Source)
	at Unknown.anonymous(Unknown Source)

Comment 6 Vojtech Szocs 2016-07-22 11:11:28 UTC
I have a fix to prevent spamming ui.log, just need to test it.

The actual error seems to be: http://stackoverflow.com/q/26781855

TL;DR seems like a Firefox bug (race condition when canceling request while the response is coming in) -- can someone verify this doesn't reproduce in Chrome or IE?

Comment 7 Vojtech Szocs 2016-07-22 13:11:06 UTC
Fix posted and verified on artificially triggered error that is always effectively the same.

Comment 8 Michal Skrivanek 2016-07-22 14:03:23 UTC
(In reply to vszocs from comment #6)
> TL;DR seems like a Firefox bug (race condition when canceling request while
> the response is coming in) -- can someone verify this doesn't reproduce in
> Chrome or IE?

Gil has the list of client IPs, worth checking with owners what browser they use

Comment 9 Vojtech Szocs 2016-07-22 14:06:59 UTC
(In reply to Michal Skrivanek from comment #8)
> Gil has the list of client IPs, worth checking with owners what browser they
> use

OK, but in any case, the stack trace doesn't contain any reference to oVirt UI code, so the actual error encountered is definitely GWT core / browser related, not application code related.

Comment 12 Pavel Novotny 2016-08-17 14:59:19 UTC
Verified in rhevm-4.0.2.6-0.1.el7ev.noarch

I managed to repeatedly throw a UI exception in Webadmin
(by frantically clicking and drag&dropping in dialog host -> Setup Host Networks). 

In ui.log file, the exception is logged only once.
In browser console, the exception is logged every time and has a counter for second and further occurrence.


ui.log:
--~-- 
2016-08-17 16:10:52,000 ERROR [org.ovirt.engine.ui.frontend.server.gwt.OvirtRemoteLoggingService] (default task-59) [] Permutation name: E99F8DCCF76495B2EE9534AA6AA5E8A0
2016-08-17 16:10:52,001 ERROR [org.ovirt.engine.ui.frontend.server.gwt.OvirtRemoteLoggingService] (default task-59) [] Uncaught exception: com.google.gwt.event.shared.UmbrellaException: Exception caught: null Operands
        at java.lang.Throwable.fillInStackTrace(Throwable.java:114) [rt.jar:1.8.0_101]
        at java.lang.RuntimeException.RuntimeException(RuntimeException.java:32) [rt.jar:1.8.0_101]
--~--



browser console (Firefox 48):
--~--
Wed Aug 17 16:10:51 GMT+200 2016 remote
SEVERE: Uncaught exceptioncom.google.gwt.event.shared.UmbrellaException: Exception caught: null Operands
	at Unknown.Bv(Unknown Source)
	at Unknown.Jv(Unknown Source)
	at Unknown.D7(Unknown Source)
...
...
Wed Aug 17 16:11:03 GMT+200 2016 
SEVERE: Uncaught exception (2x)com.google.gwt.event.shared.UmbrellaException: Exception caught: null Operands
	at Unknown.Bv(Unknown Source)
	at Unknown.Jv(Unknown Source)
	at Unknown.D7(Unknown Source)
...
...
Wed Aug 17 16:11:28 GMT+200 2016 
SEVERE: Uncaught exception (3x)com.google.gwt.event.shared.UmbrellaException: Exception caught: null Operands
	at Unknown.Bv(Unknown Source)
	at Unknown.Jv(Unknown Source)
	at Unknown.D7(Unknown Source)
...
...
Wed Aug 17 16:11:53 GMT+200 2016 
SEVERE: Uncaught exception (4x)com.google.gwt.event.shared.UmbrellaException: Exception caught: null Operands
	at Unknown.Bv(Unknown Source)
	at Unknown.Jv(Unknown Source)
	at Unknown.D7(Unknown Source)
...
...
Wed Aug 17 16:14:03 GMT+200 2016 
SEVERE: Uncaught exception (5x)com.google.gwt.event.shared.UmbrellaException: Exception caught: null Operands
	at Unknown.Bv(Unknown Source)
	at Unknown.Jv(Unknown Source)
	at Unknown.D7(Unknown Source)
...
--~--

Comment 13 Vojtech Szocs 2016-08-22 16:22:43 UTC
(In reply to Pavel Novotny from comment #12)
> I managed to repeatedly throw a UI exception in Webadmin
> (by frantically clicking and drag&dropping in dialog host -> Setup Host
> Networks). 
> 
> In ui.log file, the exception is logged only once.
> In browser console, the exception is logged every time and has a counter for
> second and further occurrence.

Hm, maybe we should open a bug on that (Setup Host Networks) dialog, but thanks for confirming this fix.


Note You need to log in before you can comment on or make changes to this bug.