Bug 1139688 - [scale] - "Unable to evaluate payload" when loading cluster policy
Summary: [scale] - "Unable to evaluate payload" when loading cluster policy
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.4.1-1
Hardware: x86_64
OS: Linux
unspecified
high
Target Milestone: ---
: 3.5.0
Assignee: Liran Zelkha
QA Contact: Eldad Marciano
URL:
Whiteboard: infra
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2014-09-09 12:57 UTC by Eldad Marciano
Modified: 2016-02-10 19:38 UTC (History)
12 users (show)

Fixed In Version: org.ovirt.engine-root-3.5.0-15
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-02-17 17:10:44 UTC
oVirt Team: Infra
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
jstack logs (30.87 KB, application/octet-stream)
2014-09-15 09:51 UTC, Eldad Marciano
no flags Details
follow thread "ajp-/127.0.0.1:8702-10" (57.15 KB, application/octet-stream)
2014-09-15 13:05 UTC, Eldad Marciano
no flags Details


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 33513 0 master MERGED webadmin: Remove unneccessary queries from Cluster Dialog show Never
oVirt gerrit 33790 0 ovirt-engine-3.5 ABANDONED webadmin: Remove unneccessary queries from Cluster Dialog show Never
oVirt gerrit 33791 0 ovirt-engine-3.5 MERGED webadmin: Remove unneccessary queries from Cluster Dialog show Never

Description Eldad Marciano 2014-09-09 12:57:29 UTC
Description of problem:
In trying to edit the cluster (by left click > edit)
after up to 10 sec waiting i have got the following message:
"Operation Canceled
Error while executing action: A Request to the Server failed: Unable to evaluate payload"

there is nothing at the engine.log 
at the server log i found this which is correlated to "payload" [1].

my distribution is:
-3 DC's
-500 Hosts
-~7000 vms.

this is not reproduced on small setup, because of that i have tagged it as 'scale'




[1]
2014-09-09 11:59:47,781 ERROR [org.apache.catalina.core.ContainerBase.[jboss.web].[default-host].[/ovirt-engine/webadmin].[GenericApiServlet]] (ajp-/127.0.0.1:8702-33) JBWEB000236: Servlet.service() for servlet Ge
nericApiServlet threw exception: java.lang.RuntimeException: Unable to report failure
        at com.google.gwt.user.server.rpc.AbstractRemoteServiceServlet.doUnexpectedFailure(AbstractRemoteServiceServlet.java:107) [gwt-servlet.jar:]
        at org.ovirt.engine.ui.frontend.server.gwt.GenericApiGWTServiceImpl.doUnexpectedFailure(GenericApiGWTServiceImpl.java:238) [frontend.jar:]
        at com.google.gwt.user.server.rpc.AbstractRemoteServiceServlet.doPost(AbstractRemoteServiceServlet.java:67) [gwt-servlet.jar:]
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:754) [jboss-servlet-api_3.0_spec.jar:1.0.2.Final-redhat-1]
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:847) [jboss-servlet-api_3.0_spec.jar:1.0.2.Final-redhat-1]
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:295) [jbossweb.jar:7.4.8.Final-redhat-4]
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214) [jbossweb.jar:7.4.8.Final-redhat-4]
        at org.ovirt.engine.ui.frontend.server.gwt.GwtCachingFilter.doFilter(GwtCachingFilter.java:132) [frontend.jar:]
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:246) [jbossweb.jar:7.4.8.Final-redhat-4]
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214) [jbossweb.jar:7.4.8.Final-redhat-4]
        at org.ovirt.engine.core.branding.BrandingFilter.doFilter(BrandingFilter.java:72) [branding.jar:]
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:246) [jbossweb.jar:7.4.8.Final-redhat-4]
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214) [jbossweb.jar:7.4.8.Final-redhat-4]
        at org.ovirt.engine.core.utils.servlet.LocaleFilter.doFilter(LocaleFilter.java:64) [utils.jar:]
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:246) [jbossweb.jar:7.4.8.Final-redhat-4]
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214) [jbossweb.jar:7.4.8.Final-redhat-4]
        at org.ovirt.engine.core.bll.AutomaticLoginFilter.doFilter(AutomaticLoginFilter.java:58) [bll.jar:]
        at org.ovirt.engine.core.bll.AutomaticLoginFilter.doFilter(AutomaticLoginFilter.java:49) [bll.jar:]
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:246) [jbossweb.jar:7.4.8.Final-redhat-4]
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214) [jbossweb.jar:7.4.8.Final-redhat-4]
        at org.ovirt.engine.core.authentication.AuthenticationFilter.doFilter(AuthenticationFilter.java:80) [common.jar:]
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:246) [jbossweb.jar:7.4.8.Final-redhat-4]
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214) [jbossweb.jar:7.4.8.Final-redhat-4]
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:231) [jbossweb.jar:7.4.8.Final-redhat-4]
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:149) [jbossweb.jar:7.4.8.Final-redhat-4]
        at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:512) [jbossweb.jar:7.4.8.Final-redhat-4]
        at org.jboss.as.web.security.SecurityContextAssociationValve.invoke(SecurityContextAssociationValve.java:169) [jboss-as-web.jar:7.4.0.Final-redhat-19]
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:145) [jbossweb.jar:7.4.8.Final-redhat-4]
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:97) [jbossweb.jar:7.4.8.Final-redhat-4]
        at org.jboss.web.rewrite.RewriteValve.invoke(RewriteValve.java:466) [jbossweb.jar:7.4.8.Final-redhat-4]
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:102) [jbossweb.jar:7.4.8.Final-redhat-4]
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:344) [jbossweb.jar:7.4.8.Final-redhat-4]
        at org.apache.coyote.ajp.AjpProcessor.process(AjpProcessor.java:490) [jbossweb.jar:7.4.8.Final-redhat-4]
        at org.apache.coyote.ajp.AjpProtocol$AjpConnectionHandler.process(AjpProtocol.java:420) [jbossweb.jar:7.4.8.Final-redhat-4]
        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:926) [jbossweb.jar:7.4.8.Final-redhat-4]
Caused by: java.lang.RuntimeException: Unable to continue
        at com.google.gwt.rpc.client.ast.RpcCommandVisitor.halt(RpcCommandVisitor.java:337) [gwt-servlet.jar:]
        at com.google.gwt.rpc.server.WebModePayloadSink$PayloadVisitor.commit(WebModePayloadSink.java:527) [gwt-servlet.jar:]
        at com.google.gwt.rpc.server.WebModePayloadSink$PayloadVisitor.commit(WebModePayloadSink.java:500) [gwt-servlet.jar:]
        at com.google.gwt.rpc.server.WebModePayloadSink$PayloadVisitor.visit(WebModePayloadSink.java:414) [gwt-servlet.jar:]
        at com.google.gwt.rpc.client.ast.ReturnCommand.traverse(ReturnCommand.java:44) [gwt-servlet.jar:]
        at com.google.gwt.rpc.client.ast.RpcCommandVisitor.doAccept(RpcCommandVisitor.java:320) [gwt-servlet.jar:]
        at com.google.gwt.rpc.client.ast.RpcCommandVisitor.accept(RpcCommandVisitor.java:42) [gwt-servlet.jar:]
        at com.google.gwt.rpc.server.WebModePayloadSink.accept(WebModePayloadSink.java:890) [gwt-servlet.jar:]
        at com.google.gwt.rpc.server.RPC.streamResponse(RPC.java:472) [gwt-servlet.jar:]
        at com.google.gwt.rpc.server.RPC.invokeAndStreamResponse(RPC.java:198) [gwt-servlet.jar:]
        at com.google.gwt.rpc.server.RpcServlet.processCall(RpcServlet.java:172) [gwt-servlet.jar:]
        at com.google.gwt.rpc.server.RpcServlet.processPost(RpcServlet.java:233) [gwt-servlet.jar:]
        at com.google.gwt.user.server.rpc.AbstractRemoteServiceServlet.doPost(AbstractRemoteServiceServlet.java:62) [gwt-servlet.jar:]
        ... 33 more
Caused by: com.google.gwt.user.client.rpc.SerializationException: Could not send data
        at com.google.gwt.rpc.server.WebModePayloadSink.send(WebModePayloadSink.java:930) [gwt-servlet.jar:]
        at com.google.gwt.rpc.server.WebModePayloadSink$PayloadVisitor.commit(WebModePayloadSink.java:525) [gwt-servlet.jar:]
        ... 44 more
Caused by: ClientAbortException:  java.net.SocketException: Connection reset
        at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:426) [jbossweb.jar:7.4.8.Final-redhat-4]
        at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:356) [jbossweb.jar:7.4.8.Final-redhat-4]
        at org.apache.catalina.connector.OutputBuffer.writeBytes(OutputBuffer.java:449) [jbossweb.jar:7.4.8.Final-redhat-4]
        at org.apache.catalina.connector.OutputBuffer.write(OutputBuffer.java:438) [jbossweb.jar:7.4.8.Final-redhat-4]
        at org.apache.catalina.connector.CoyoteOutputStream.write(CoyoteOutputStream.java:90) [jbossweb.jar:7.4.8.Final-redhat-4]
        at com.google.gwt.rpc.server.WebModePayloadSink.send(WebModePayloadSink.java:928) [gwt-servlet.jar:]
        ... 45 more
Caused by: java.net.SocketException: Connection reset
        at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:118) [rt.jar:1.7.0_65]
        at java.net.SocketOutputStream.write(SocketOutputStream.java:159) [rt.jar:1.7.0_65]
        at org.apache.coyote.ajp.AjpProcessor$SocketOutputBuffer.doWrite(AjpProcessor.java:1330) [jbossweb.jar:7.4.8.Final-redhat-4]
        at org.apache.coyote.Response.doWrite(Response.java:594) [jbossweb.jar:7.4.8.Final-redhat-4]
        at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:421) [jbossweb.jar:7.4.8.Final-redhat-4]
        ... 50 more

Version-Release number of selected component (if applicable):
3.4.2

How reproducible:
95%

Steps to Reproduce:
1. scale setup (500 hosts, ~7000 VMS)
2. left click at the web-admin on cluster tab.

Actual results:
getting a "Operation Canceled" message.

Expected results:
gets the edit window as well.

Additional info:

Comment 1 Liran Zelkha 2014-09-15 08:36:39 UTC
Please give much more info:
1. How many simulatenous users
2. Attach 3 jstack logs for server
3. Attach full engine.log

Comment 2 Eldad Marciano 2014-09-15 09:36:14 UTC
Liran from some reason, at some point nothing writing to server log
there is no traffic at this log since 10-Sep.

Once i reproduced the bug i wanted to add the server log since we saw the error \ exception there

this bug might be side-affect of another issue, to me it looks like resources lack.

please advise.

Comment 3 Eldad Marciano 2014-09-15 09:51:31 UTC
Created attachment 937512 [details]
jstack logs

Comment 4 Eldad Marciano 2014-09-15 13:05:01 UTC
Created attachment 937605 [details]
follow thread "ajp-/127.0.0.1:8702-10"

Comment 5 Liran Zelkha 2014-09-15 18:50:12 UTC
Following the thread dump, it seems all your HTTP threads are blocking on 0x00000007025f87d0, which is a lock by RollingFile of the Log4J infrastructure we use. The weird thing is that no thread is locking that file.
I suggest the following:
1. Please enclose the full log.
2. Please check the free file handles and free size of your disk.
3. I'll check if there are issues with our log4j

Comment 6 Liran Zelkha 2014-09-16 05:50:11 UTC
The log you sent has data in Sep-15, so I'm not following you.
It seems like a network issue from the server to your browser...
Is it always reproducible, and only for this specific action?

Comment 7 Eldad Marciano 2014-09-16 14:07:20 UTC
(In reply to Liran Zelkha from comment #6)
> The log you sent has data in Sep-15, so I'm not following you.
> It seems like a network issue from the server to your browser...
> Is it always reproducible, and only for this specific action?

Yep.

Comment 8 Liran Zelkha 2014-09-17 07:30:04 UTC
Are the jstack files from exactly when you activate this scenario? Because it doesn't show anything related...

Comment 9 Eldad Marciano 2014-09-17 12:37:57 UTC
not correlated to description.
because of that i comment the jstcks files to "follow thread "ajp-/127.0.0.1:8702-10"

Comment 10 Liran Zelkha 2014-09-23 12:49:10 UTC
It seems like there are 2 DB queries that execute once the Edit Cluster option is selected:
SELECT * FROM ((SELECT distinct vms.* FROM  vms   WHERE  vms.vds_group_name LIKE 'PerfCluster' )  ORDER BY vm_name ASC ) as T1 OFFSET (1 -1) LIMIT 2147483647

SELECT * FROM ((SELECT distinct vds.* FROM  vds   WHERE  vds.vds_group_name LIKE 'PerfCluster' )  ORDER BY vds_name,vds_name ASC ) as T1 OFFSET (1 -1) LIMIT 2147483647

These are heavy queries, and probably cause the timeout. I will check why these are called.

Comment 11 Liran Zelkha 2014-09-23 12:57:16 UTC
Eldad, can you run both queries (change the cluster name to your cluster name) and tell me how long it takes them to execute?

Comment 12 Eldad Marciano 2014-09-24 08:43:35 UTC
see the results using pgadmin:
SELECT * FROM ((SELECT distinct vms.* FROM  vms   WHERE  vms.vds_group_name LIKE 'dc_fake' )  ORDER BY vm_name ASC ) as T1 OFFSET (1 -1) LIMIT 2147483647

results first run 32 sec, second run 26 sec.


==============================================================

SELECT * FROM ((SELECT distinct vds.* FROM  vds   WHERE  vds.vds_group_name LIKE 'dc_fake' )  ORDER BY vds_name,vds_name ASC ) as T1 OFFSET (1 -1) LIMIT 214748364

results first run 4 sec, second run 3.8 sec.

looks like using vds much more faster.

Comment 13 Eldad Marciano 2014-10-01 11:21:55 UTC
which query is it?
i want to replace it im mine 3.4 setup

Comment 14 Liran Zelkha 2014-10-01 11:24:04 UTC
The change I did is for 3.5, not 3.4. Can't be easily backported.

Comment 15 Eldad Marciano 2015-01-28 12:29:29 UTC
verified on VT13.7

Comment 16 Eyal Edri 2015-02-17 17:10:44 UTC
rhev 3.5.0 was released. closing.


Note You need to log in before you can comment on or make changes to this bug.