Bug 1358136

Summary: [LOGGING] /var/log/ovirt-engine/ui.log is growing at a rate of 16mb/second
Product: [oVirt] ovirt-engine Reporter: Gil Klein <gklein>
Component: GeneralAssignee: Vojtech Szocs <vszocs>
Status: CLOSED CURRENTRELEASE QA Contact: Pavel Novotny <pnovotny>
Severity: urgent Docs Contact:
Priority: high    
Version: 3.6.7CC: bgraveno, bugs, gklein, mgoldboi, michal.skrivanek, mperina, oourfali, sbonazzo, vszocs
Target Milestone: ovirt-4.0.2Flags: rule-engine: ovirt-4.0.z+
mgoldboi: exception+
rule-engine: planning_ack+
rule-engine: devel_ack+
pstehlik: testing_ack+
Target Release: 4.0.2.4   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Enhancement
Doc Text:
This update prevents UI exceptions from repeatedly spamming the remote /var/log/ovirt-engine/ui.log after subsequent occurrences of the same error. Each remote operation is an HTTP request and consumes network resources. Any uncaught exception in the Administration Portal or User Portal is compared to the last one, and only logged remotely if it's not the same. All occurrences are still logged on the web browser.
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-08-22 12:32:21 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: UX RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Gil Klein 2016-07-20 07:11:24 UTC
Description of problem:
/var/log/ovirt-engine/ui.log is growing at a rate of 16mb/second on a production system

Version-Release number of selected component (if applicable):
rhevm-3.6.7.5-0.1.el6.noarch

How reproducible:
100% on this system

Steps to Reproduce:
Monitor the /var/log/ovirt-engine/ui.log log file

Actual results:
# while [ 1 ]; do du -sh ui.log; sleep 60; done
86M	ui.log
102M	ui.log
118M	ui.log
134M	ui.log


Expected results:
Logs should suppress repeating messages or rotate quickly based on file size  

Additional info:

Log snippet:

2016-07-20 10:08:20,176 ERROR [org.ovirt.engine.ui.frontend.server.gwt.OvirtRemoteLoggingService] (ajp-/127.0.0.1:8702-13) [] Permutation name: 44BAC8E4AC393CDE74A1EDFCB3C555A1
2016-07-20 10:08:20,176 ERROR [org.ovirt.engine.ui.frontend.server.gwt.OvirtRemoteLoggingService] (ajp-/127.0.0.1:8702-13) [] Uncaught exception: : com.google.gwt.core.client.JavaScriptException: (NS_ERROR_NOT_INITIALIZED)
 __gwt$exception: <skipped>
 result: 3253927937
 filename: https://rhevm-3.qa.lab.tlv.redhat.com/ovirt-engine/webadmin/44BAC8E4AC393CDE74A1EDFCB3C555A1.cache.html
 lineNumber: 17734
 columnNumber: 0
 inner: null
 data: null
 stack: kv@https://rhevm-3.qa.lab.tlv.redhat.com/ovirt-engine/webadmin/44BAC8E4AC393CDE74A1EDFCB3C555A1.cache.html:17734:25
hu@https://rhevm-3.qa.lab.tlv.redhat.com/ovirt-engine/webadmin/44BAC8E4AC393CDE74A1EDFCB3C555A1.cache.html:9982:24
dwf@https://rhevm-3.qa.lab.tlv.redhat.com/ovirt-engine/webadmin/44BAC8E4AC393CDE74A1EDFCB3C555A1.cache.html:17259:24
O3@https://rhevm-3.qa.lab.tlv.redhat.com/ovirt-engine/webadmin/44BAC8E4AC393CDE74A1EDFCB3C555A1.cache.html:24153:7163
hwf/c.onreadystatechange<@https://rhevm-3.qa.lab.tlv.redhat.com/ovirt-engine/webadmin/44BAC8E4AC393CDE74A1EDFCB3C555A1.cache.html:16937:62
Yt@https://rhevm-3.qa.lab.tlv.redhat.com/ovirt-engine/webadmin/44BAC8E4AC393CDE74A1EDFCB3C555A1.cache.html:12103:26
au@https://rhevm-3.qa.lab.tlv.redhat.com/ovirt-engine/webadmin/44BAC8E4AC393CDE74A1EDFCB3C555A1.cache.html:21295:56
_t/<@https://rhevm-3.qa.lab.tlv.redhat.com/ovirt-engine/webadmin/44BAC8E4AC393CDE74A1EDFCB3C555A1.cache.html:20646:53
:
	at Unknown.kv(Unknown Source)
	at Unknown.hu(Unknown Source)
	at Unknown.dwf(Unknown Source)
	at Unknown.O3(Unknown Source)
	at Unknown.hwf/c.onreadystatechange<(Unknown Source)
	at Unknown.Yt(Unknown Source)
	at Unknown.au(Unknown Source)
	at Unknown._t/<(Unknown Source)
	at Unknown.anonymous(Unknown Source)

Comment 2 Yaniv Kaul 2016-07-21 07:23:05 UTC
1. I assume it also affect the server CPU usage.
2. Can you install debug symbols and report the issue specifically? This bug will be used for flood protection for this mechanism, but we should also look at the specific exception.

Comment 3 Michal Skrivanek 2016-07-21 07:30:19 UTC
can you add /var/log/httpd/ssl_access_log and check it's "POST /ovirt-engine/userportal/remote_logging" calls?
If so, you can identify the client causing that, restart that browser and the flooding should stop

Comment 5 Gil Klein 2016-07-21 08:18:20 UTC
(In reply to Yaniv Kaul from comment #2)
> 1. I assume it also affect the server CPU usage.
CPU is around 25%-30% 
> 2. Can you install debug symbols and report the issue specifically? This bug
> will be used for flood protection for this mechanism, but we should also
> look at the specific exception.
Sure, here it is with debug symbols installed:

2016-07-21 11:16:18,675 ERROR [org.ovirt.engine.ui.frontend.server.gwt.OvirtRemoteLoggingService] (ajp-/127.0.0.1:8702-7) [] Uncaught exception: : com.google.gwt.core.client.JavaScriptException: (NS_ERROR_NOT_INITIALIZED)
 __gwt$exception: <skipped>
 result: 3253927937
 filename: https://rhevm-3.qa.lab.tlv.redhat.com/ovirt-engine/webadmin/44BAC8E4AC393CDE74A1EDFCB3C555A1.cache.html
 lineNumber: 17734
 columnNumber: 0
 inner: null
 data: null
 stack: kv@https://rhevm-3.qa.lab.tlv.redhat.com/ovirt-engine/webadmin/44BAC8E4AC393CDE74A1EDFCB3C555A1.cache.html:17734:25
hu@https://rhevm-3.qa.lab.tlv.redhat.com/ovirt-engine/webadmin/44BAC8E4AC393CDE74A1EDFCB3C555A1.cache.html:9982:24
dwf@https://rhevm-3.qa.lab.tlv.redhat.com/ovirt-engine/webadmin/44BAC8E4AC393CDE74A1EDFCB3C555A1.cache.html:17259:24
O3@https://rhevm-3.qa.lab.tlv.redhat.com/ovirt-engine/webadmin/44BAC8E4AC393CDE74A1EDFCB3C555A1.cache.html:24153:7163
hwf/c.onreadystatechange<@https://rhevm-3.qa.lab.tlv.redhat.com/ovirt-engine/webadmin/44BAC8E4AC393CDE74A1EDFCB3C555A1.cache.html:16937:62
Yt@https://rhevm-3.qa.lab.tlv.redhat.com/ovirt-engine/webadmin/44BAC8E4AC393CDE74A1EDFCB3C555A1.cache.html:12103:26
au@https://rhevm-3.qa.lab.tlv.redhat.com/ovirt-engine/webadmin/44BAC8E4AC393CDE74A1EDFCB3C555A1.cache.html:21295:56
_t/<@https://rhevm-3.qa.lab.tlv.redhat.com/ovirt-engine/webadmin/44BAC8E4AC393CDE74A1EDFCB3C555A1.cache.html:20646:53
:
	at com.google.gwt.core.client.impl.UnloadSupport.setTimeout0(UnloadSupport.java:40) [gwt-servlet.jar:]
	at com.google.gwt.core.client.impl.Impl.setTimeout(Impl.java:285) [gwt-servlet.jar:]
	at com.google.gwt.xhr.client.XMLHttpRequest.$clearOnReadyStateChange(XMLHttpRequest.java:162) [gwt-servlet.jar:]
	at com.google.gwt.http.client.RequestBuilder$1.onReadyStateChange(RequestBuilder.java:409) [gwt-servlet.jar:]
	at Unknown.hwf/c.onreadystatechange<(Unknown Source)
	at com.google.gwt.core.client.impl.Impl.apply(Impl.java:296) [gwt-servlet.jar:]
	at com.google.gwt.core.client.impl.Impl.entry0(Impl.java:335) [gwt-servlet.jar:]
	at Unknown._t/<(Unknown Source)
	at Unknown.anonymous(Unknown Source)

Comment 6 Vojtech Szocs 2016-07-22 11:11:28 UTC
I have a fix to prevent spamming ui.log, just need to test it.

The actual error seems to be: http://stackoverflow.com/q/26781855

TL;DR seems like a Firefox bug (race condition when canceling request while the response is coming in) -- can someone verify this doesn't reproduce in Chrome or IE?

Comment 7 Vojtech Szocs 2016-07-22 13:11:06 UTC
Fix posted and verified on artificially triggered error that is always effectively the same.

Comment 8 Michal Skrivanek 2016-07-22 14:03:23 UTC
(In reply to vszocs from comment #6)
> TL;DR seems like a Firefox bug (race condition when canceling request while
> the response is coming in) -- can someone verify this doesn't reproduce in
> Chrome or IE?

Gil has the list of client IPs, worth checking with owners what browser they use

Comment 9 Vojtech Szocs 2016-07-22 14:06:59 UTC
(In reply to Michal Skrivanek from comment #8)
> Gil has the list of client IPs, worth checking with owners what browser they
> use

OK, but in any case, the stack trace doesn't contain any reference to oVirt UI code, so the actual error encountered is definitely GWT core / browser related, not application code related.

Comment 12 Pavel Novotny 2016-08-17 14:59:19 UTC
Verified in rhevm-4.0.2.6-0.1.el7ev.noarch

I managed to repeatedly throw a UI exception in Webadmin
(by frantically clicking and drag&dropping in dialog host -> Setup Host Networks). 

In ui.log file, the exception is logged only once.
In browser console, the exception is logged every time and has a counter for second and further occurrence.


ui.log:
--~-- 
2016-08-17 16:10:52,000 ERROR [org.ovirt.engine.ui.frontend.server.gwt.OvirtRemoteLoggingService] (default task-59) [] Permutation name: E99F8DCCF76495B2EE9534AA6AA5E8A0
2016-08-17 16:10:52,001 ERROR [org.ovirt.engine.ui.frontend.server.gwt.OvirtRemoteLoggingService] (default task-59) [] Uncaught exception: com.google.gwt.event.shared.UmbrellaException: Exception caught: null Operands
        at java.lang.Throwable.fillInStackTrace(Throwable.java:114) [rt.jar:1.8.0_101]
        at java.lang.RuntimeException.RuntimeException(RuntimeException.java:32) [rt.jar:1.8.0_101]
--~--



browser console (Firefox 48):
--~--
Wed Aug 17 16:10:51 GMT+200 2016 remote
SEVERE: Uncaught exceptioncom.google.gwt.event.shared.UmbrellaException: Exception caught: null Operands
	at Unknown.Bv(Unknown Source)
	at Unknown.Jv(Unknown Source)
	at Unknown.D7(Unknown Source)
...
...
Wed Aug 17 16:11:03 GMT+200 2016 
SEVERE: Uncaught exception (2x)com.google.gwt.event.shared.UmbrellaException: Exception caught: null Operands
	at Unknown.Bv(Unknown Source)
	at Unknown.Jv(Unknown Source)
	at Unknown.D7(Unknown Source)
...
...
Wed Aug 17 16:11:28 GMT+200 2016 
SEVERE: Uncaught exception (3x)com.google.gwt.event.shared.UmbrellaException: Exception caught: null Operands
	at Unknown.Bv(Unknown Source)
	at Unknown.Jv(Unknown Source)
	at Unknown.D7(Unknown Source)
...
...
Wed Aug 17 16:11:53 GMT+200 2016 
SEVERE: Uncaught exception (4x)com.google.gwt.event.shared.UmbrellaException: Exception caught: null Operands
	at Unknown.Bv(Unknown Source)
	at Unknown.Jv(Unknown Source)
	at Unknown.D7(Unknown Source)
...
...
Wed Aug 17 16:14:03 GMT+200 2016 
SEVERE: Uncaught exception (5x)com.google.gwt.event.shared.UmbrellaException: Exception caught: null Operands
	at Unknown.Bv(Unknown Source)
	at Unknown.Jv(Unknown Source)
	at Unknown.D7(Unknown Source)
...
--~--

Comment 13 Vojtech Szocs 2016-08-22 16:22:43 UTC
(In reply to Pavel Novotny from comment #12)
> I managed to repeatedly throw a UI exception in Webadmin
> (by frantically clicking and drag&dropping in dialog host -> Setup Host
> Networks). 
> 
> In ui.log file, the exception is logged only once.
> In browser console, the exception is logged every time and has a counter for
> second and further occurrence.

Hm, maybe we should open a bug on that (Setup Host Networks) dialog, but thanks for confirming this fix.