Description of problem: In trying to edit the cluster (by left click > edit) after up to 10 sec waiting i have got the following message: "Operation Canceled Error while executing action: A Request to the Server failed: Unable to evaluate payload" there is nothing at the engine.log at the server log i found this which is correlated to "payload" [1]. my distribution is: -3 DC's -500 Hosts -~7000 vms. this is not reproduced on small setup, because of that i have tagged it as 'scale' [1] 2014-09-09 11:59:47,781 ERROR [org.apache.catalina.core.ContainerBase.[jboss.web].[default-host].[/ovirt-engine/webadmin].[GenericApiServlet]] (ajp-/127.0.0.1:8702-33) JBWEB000236: Servlet.service() for servlet Ge nericApiServlet threw exception: java.lang.RuntimeException: Unable to report failure at com.google.gwt.user.server.rpc.AbstractRemoteServiceServlet.doUnexpectedFailure(AbstractRemoteServiceServlet.java:107) [gwt-servlet.jar:] at org.ovirt.engine.ui.frontend.server.gwt.GenericApiGWTServiceImpl.doUnexpectedFailure(GenericApiGWTServiceImpl.java:238) [frontend.jar:] at com.google.gwt.user.server.rpc.AbstractRemoteServiceServlet.doPost(AbstractRemoteServiceServlet.java:67) [gwt-servlet.jar:] at javax.servlet.http.HttpServlet.service(HttpServlet.java:754) [jboss-servlet-api_3.0_spec.jar:1.0.2.Final-redhat-1] at javax.servlet.http.HttpServlet.service(HttpServlet.java:847) [jboss-servlet-api_3.0_spec.jar:1.0.2.Final-redhat-1] at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:295) [jbossweb.jar:7.4.8.Final-redhat-4] at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214) [jbossweb.jar:7.4.8.Final-redhat-4] at org.ovirt.engine.ui.frontend.server.gwt.GwtCachingFilter.doFilter(GwtCachingFilter.java:132) [frontend.jar:] at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:246) [jbossweb.jar:7.4.8.Final-redhat-4] at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214) [jbossweb.jar:7.4.8.Final-redhat-4] at org.ovirt.engine.core.branding.BrandingFilter.doFilter(BrandingFilter.java:72) [branding.jar:] at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:246) [jbossweb.jar:7.4.8.Final-redhat-4] at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214) [jbossweb.jar:7.4.8.Final-redhat-4] at org.ovirt.engine.core.utils.servlet.LocaleFilter.doFilter(LocaleFilter.java:64) [utils.jar:] at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:246) [jbossweb.jar:7.4.8.Final-redhat-4] at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214) [jbossweb.jar:7.4.8.Final-redhat-4] at org.ovirt.engine.core.bll.AutomaticLoginFilter.doFilter(AutomaticLoginFilter.java:58) [bll.jar:] at org.ovirt.engine.core.bll.AutomaticLoginFilter.doFilter(AutomaticLoginFilter.java:49) [bll.jar:] at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:246) [jbossweb.jar:7.4.8.Final-redhat-4] at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214) [jbossweb.jar:7.4.8.Final-redhat-4] at org.ovirt.engine.core.authentication.AuthenticationFilter.doFilter(AuthenticationFilter.java:80) [common.jar:] at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:246) [jbossweb.jar:7.4.8.Final-redhat-4] at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214) [jbossweb.jar:7.4.8.Final-redhat-4] at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:231) [jbossweb.jar:7.4.8.Final-redhat-4] at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:149) [jbossweb.jar:7.4.8.Final-redhat-4] at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:512) [jbossweb.jar:7.4.8.Final-redhat-4] at org.jboss.as.web.security.SecurityContextAssociationValve.invoke(SecurityContextAssociationValve.java:169) [jboss-as-web.jar:7.4.0.Final-redhat-19] at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:145) [jbossweb.jar:7.4.8.Final-redhat-4] at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:97) [jbossweb.jar:7.4.8.Final-redhat-4] at org.jboss.web.rewrite.RewriteValve.invoke(RewriteValve.java:466) [jbossweb.jar:7.4.8.Final-redhat-4] at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:102) [jbossweb.jar:7.4.8.Final-redhat-4] at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:344) [jbossweb.jar:7.4.8.Final-redhat-4] at org.apache.coyote.ajp.AjpProcessor.process(AjpProcessor.java:490) [jbossweb.jar:7.4.8.Final-redhat-4] at org.apache.coyote.ajp.AjpProtocol$AjpConnectionHandler.process(AjpProtocol.java:420) [jbossweb.jar:7.4.8.Final-redhat-4] at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:926) [jbossweb.jar:7.4.8.Final-redhat-4] Caused by: java.lang.RuntimeException: Unable to continue at com.google.gwt.rpc.client.ast.RpcCommandVisitor.halt(RpcCommandVisitor.java:337) [gwt-servlet.jar:] at com.google.gwt.rpc.server.WebModePayloadSink$PayloadVisitor.commit(WebModePayloadSink.java:527) [gwt-servlet.jar:] at com.google.gwt.rpc.server.WebModePayloadSink$PayloadVisitor.commit(WebModePayloadSink.java:500) [gwt-servlet.jar:] at com.google.gwt.rpc.server.WebModePayloadSink$PayloadVisitor.visit(WebModePayloadSink.java:414) [gwt-servlet.jar:] at com.google.gwt.rpc.client.ast.ReturnCommand.traverse(ReturnCommand.java:44) [gwt-servlet.jar:] at com.google.gwt.rpc.client.ast.RpcCommandVisitor.doAccept(RpcCommandVisitor.java:320) [gwt-servlet.jar:] at com.google.gwt.rpc.client.ast.RpcCommandVisitor.accept(RpcCommandVisitor.java:42) [gwt-servlet.jar:] at com.google.gwt.rpc.server.WebModePayloadSink.accept(WebModePayloadSink.java:890) [gwt-servlet.jar:] at com.google.gwt.rpc.server.RPC.streamResponse(RPC.java:472) [gwt-servlet.jar:] at com.google.gwt.rpc.server.RPC.invokeAndStreamResponse(RPC.java:198) [gwt-servlet.jar:] at com.google.gwt.rpc.server.RpcServlet.processCall(RpcServlet.java:172) [gwt-servlet.jar:] at com.google.gwt.rpc.server.RpcServlet.processPost(RpcServlet.java:233) [gwt-servlet.jar:] at com.google.gwt.user.server.rpc.AbstractRemoteServiceServlet.doPost(AbstractRemoteServiceServlet.java:62) [gwt-servlet.jar:] ... 33 more Caused by: com.google.gwt.user.client.rpc.SerializationException: Could not send data at com.google.gwt.rpc.server.WebModePayloadSink.send(WebModePayloadSink.java:930) [gwt-servlet.jar:] at com.google.gwt.rpc.server.WebModePayloadSink$PayloadVisitor.commit(WebModePayloadSink.java:525) [gwt-servlet.jar:] ... 44 more Caused by: ClientAbortException: java.net.SocketException: Connection reset at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:426) [jbossweb.jar:7.4.8.Final-redhat-4] at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:356) [jbossweb.jar:7.4.8.Final-redhat-4] at org.apache.catalina.connector.OutputBuffer.writeBytes(OutputBuffer.java:449) [jbossweb.jar:7.4.8.Final-redhat-4] at org.apache.catalina.connector.OutputBuffer.write(OutputBuffer.java:438) [jbossweb.jar:7.4.8.Final-redhat-4] at org.apache.catalina.connector.CoyoteOutputStream.write(CoyoteOutputStream.java:90) [jbossweb.jar:7.4.8.Final-redhat-4] at com.google.gwt.rpc.server.WebModePayloadSink.send(WebModePayloadSink.java:928) [gwt-servlet.jar:] ... 45 more Caused by: java.net.SocketException: Connection reset at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:118) [rt.jar:1.7.0_65] at java.net.SocketOutputStream.write(SocketOutputStream.java:159) [rt.jar:1.7.0_65] at org.apache.coyote.ajp.AjpProcessor$SocketOutputBuffer.doWrite(AjpProcessor.java:1330) [jbossweb.jar:7.4.8.Final-redhat-4] at org.apache.coyote.Response.doWrite(Response.java:594) [jbossweb.jar:7.4.8.Final-redhat-4] at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:421) [jbossweb.jar:7.4.8.Final-redhat-4] ... 50 more Version-Release number of selected component (if applicable): 3.4.2 How reproducible: 95% Steps to Reproduce: 1. scale setup (500 hosts, ~7000 VMS) 2. left click at the web-admin on cluster tab. Actual results: getting a "Operation Canceled" message. Expected results: gets the edit window as well. Additional info:
Please give much more info: 1. How many simulatenous users 2. Attach 3 jstack logs for server 3. Attach full engine.log
Liran from some reason, at some point nothing writing to server log there is no traffic at this log since 10-Sep. Once i reproduced the bug i wanted to add the server log since we saw the error \ exception there this bug might be side-affect of another issue, to me it looks like resources lack. please advise.
Created attachment 937512 [details] jstack logs
Created attachment 937605 [details] follow thread "ajp-/127.0.0.1:8702-10"
Following the thread dump, it seems all your HTTP threads are blocking on 0x00000007025f87d0, which is a lock by RollingFile of the Log4J infrastructure we use. The weird thing is that no thread is locking that file. I suggest the following: 1. Please enclose the full log. 2. Please check the free file handles and free size of your disk. 3. I'll check if there are issues with our log4j
The log you sent has data in Sep-15, so I'm not following you. It seems like a network issue from the server to your browser... Is it always reproducible, and only for this specific action?
(In reply to Liran Zelkha from comment #6) > The log you sent has data in Sep-15, so I'm not following you. > It seems like a network issue from the server to your browser... > Is it always reproducible, and only for this specific action? Yep.
Are the jstack files from exactly when you activate this scenario? Because it doesn't show anything related...
not correlated to description. because of that i comment the jstcks files to "follow thread "ajp-/127.0.0.1:8702-10"
It seems like there are 2 DB queries that execute once the Edit Cluster option is selected: SELECT * FROM ((SELECT distinct vms.* FROM vms WHERE vms.vds_group_name LIKE 'PerfCluster' ) ORDER BY vm_name ASC ) as T1 OFFSET (1 -1) LIMIT 2147483647 SELECT * FROM ((SELECT distinct vds.* FROM vds WHERE vds.vds_group_name LIKE 'PerfCluster' ) ORDER BY vds_name,vds_name ASC ) as T1 OFFSET (1 -1) LIMIT 2147483647 These are heavy queries, and probably cause the timeout. I will check why these are called.
Eldad, can you run both queries (change the cluster name to your cluster name) and tell me how long it takes them to execute?
see the results using pgadmin: SELECT * FROM ((SELECT distinct vms.* FROM vms WHERE vms.vds_group_name LIKE 'dc_fake' ) ORDER BY vm_name ASC ) as T1 OFFSET (1 -1) LIMIT 2147483647 results first run 32 sec, second run 26 sec. ============================================================== SELECT * FROM ((SELECT distinct vds.* FROM vds WHERE vds.vds_group_name LIKE 'dc_fake' ) ORDER BY vds_name,vds_name ASC ) as T1 OFFSET (1 -1) LIMIT 214748364 results first run 4 sec, second run 3.8 sec. looks like using vds much more faster.
which query is it? i want to replace it im mine 3.4 setup
The change I did is for 3.5, not 3.4. Can't be easily backported.
verified on VT13.7
rhev 3.5.0 was released. closing.