Description of problem:

The logrotate configuration for engine.log specifies both 'minsize' and 'daily', which means both conditions must be met before rotation: more than a day must have passed AND the file must exceed minsize (20M). Less than a day has passed since I installed the engine, but I'm hitting communication issues with one of the hosts, so my engine log keeps growing with communication exceptions. It is now at 320MB and still growing.

The repetitive exception:

2016-03-22 11:16:55,533 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.PollVDSCommand] (org.ovirt.thread.pool-8-thread-4) [602b6a1e] Command 'PollVDSCommand(HostName = lago_basic_suite_3_6_host0, VdsIdVDSCommandParametersBase:{runAsync='true', hostId='b99902a4-92a2-4642-a6ad-ed98f389a94e'})' execution failed: java.util.concurrent.ExecutionException: org.ovirt.engine.core.vdsbroker.xmlrpc.XmlRpcRunTimeException: Connection issues during send request
2016-03-22 11:16:55,533 INFO  [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [7fdd752] Connecting to /192.168.200.6
2016-03-22 11:16:58,531 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.PollVDSCommand] (org.ovirt.thread.pool-8-thread-4) [602b6a1e] Error: org.ovirt.engine.core.vdsbroker.xmlrpc.XmlRpcRunTimeException: Connection issues during send request
2016-03-22 11:16:58,531 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.PollVDSCommand] (org.ovirt.thread.pool-8-thread-4) [602b6a1e] Exception: java.util.concurrent.ExecutionException: org.ovirt.engine.core.vdsbroker.xmlrpc.XmlRpcRunTimeException: Connection issues during send request
    at java.util.concurrent.FutureTask.report(FutureTask.java:122) [rt.jar:1.7.0_79]
    at java.util.concurrent.FutureTask.get(FutureTask.java:202) [rt.jar:1.7.0_79]
    at org.ovirt.engine.core.vdsbroker.vdsbroker.FutureVDSCommand.get(FutureVDSCommand.java:72) [vdsbroker.jar:]
    at org.ovirt.engine.core.bll.network.host.HostSetupNetworkPoller.poll(HostSetupNetworkPoller.java:30) [bll.jar:]
    at org.ovirt.engine.core.bll.network.host.HostSetupNetworksCommand.invokeSetupNetworksCommand(HostSetupNetworksCommand.java:423) [bll.jar:]
    at org.ovirt.engine.core.bll.network.host.HostSetupNetworksCommand.executeCommand(HostSetupNetworksCommand.java:256) [bll.jar:]
    at org.ovirt.engine.core.bll.CommandBase.executeWithoutTransaction(CommandBase.java:1215) [bll.jar:]
    at org.ovirt.engine.core.bll.CommandBase.executeActionInTransactionScope(CommandBase.java:1359) [bll.jar:]
    ...

Version-Release number of selected component (if applicable):

[root@lago_basic_suite_3_6_engine ~]# rpm -qa | grep ovirt
ovirt-engine-sdk-python-3.6.3.0-1.el6.noarch
ovirt-setup-lib-1.0.1-1.el6.noarch
ovirt-host-deploy-1.4.1-1.el6.noarch
ovirt-engine-dbscripts-3.6.3.4-1.el6.noarch
ovirt-engine-setup-plugin-ovirt-engine-3.6.3.4-1.el6.noarch
ovirt-engine-userportal-3.6.3.4-1.el6.noarch
ovirt-engine-jboss-as-7.1.1-1.el6.x86_64
ovirt-engine-extension-aaa-jdbc-1.0.6-1.el6.noarch
ovirt-engine-cli-3.6.2.0-1.el6.noarch
ovirt-image-uploader-3.6.0-1.el6.noarch
ovirt-engine-setup-base-3.6.3.4-1.el6.noarch
ovirt-engine-setup-plugin-websocket-proxy-3.6.3.4-1.el6.noarch
ovirt-engine-wildfly-8.2.1-1.el6.x86_64
ovirt-vmconsole-proxy-1.0.0-1.el6.noarch
ovirt-engine-websocket-proxy-3.6.3.4-1.el6.noarch
ebay-cors-filter-1.0.1-0.1.ovirt.el6.noarch
ovirt-engine-setup-3.6.3.4-1.el6.noarch
ovirt-engine-setup-plugin-vmconsole-proxy-helper-3.6.3.4-1.el6.noarch
ovirt-engine-restapi-3.6.3.4-1.el6.noarch
ovirt-engine-tools-3.6.3.4-1.el6.noarch
ovirt-engine-webadmin-portal-3.6.3.4-1.el6.noarch
ovirt-engine-lib-3.6.3.4-1.el6.noarch
ovirt-engine-extensions-api-impl-3.6.3.4-1.el6.noarch
ovirt-iso-uploader-3.6.0-1.el6.noarch
ovirt-engine-setup-plugin-ovirt-engine-common-3.6.3.4-1.el6.noarch
ovirt-host-deploy-java-1.4.1-1.el6.noarch
ovirt-vmconsole-1.0.0-1.el6.noarch
ovirt-engine-wildfly-overlay-8.0.4-1.el6.noarch
ovirt-engine-vmconsole-proxy-helper-3.6.3.4-1.el6.noarch
ovirt-engine-backend-3.6.3.4-1.el6.noarch
ovirt-engine-3.6.3.4-1.el6.noarch

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
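For reference, a minimal sketch of a logrotate stanza that would avoid this growth pattern, assuming the default /var/log/ovirt-engine/engine.log path (this is a hypothetical example, not the stanza shipped by ovirt-engine). The key difference: 'maxsize' rotates as soon as the size threshold is crossed, even if the daily interval has not elapsed, whereas 'daily' combined with 'minsize' requires BOTH conditions:

```
# Hypothetical sketch, not the shipped ovirt-engine logrotate config.
# 'daily' + 'maxsize 20M': rotate once a day, OR sooner if the file
# exceeds 20M ('minsize' would instead require a day AND 20M).
/var/log/ovirt-engine/engine.log {
    daily
    maxsize 20M
    rotate 14
    compress
    missingok
    notifempty
}
```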
This is part of the setup networks flow. I guess they can hide the exception if needed. Moving to network.
I'm also seeing:

2016-03-27 05:33:32,409 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.PollVDSCommand] (org.ovirt.thread.pool-10-thread-2) [705b4264] Error: null
2016-03-27 05:33:32,410 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.PollVDSCommand] (org.ovirt.thread.pool-10-thread-2) [705b4264] Exception: java.lang.NullPointerException
    at org.ovirt.engine.core.vdsbroker.vdsbroker.StatusForXmlRpc.<init>(StatusForXmlRpc.java:20) [vdsbroker.jar:]
    at org.ovirt.engine.core.vdsbroker.vdsbroker.StatusOnlyReturnForXmlRpc.<init>(StatusOnlyReturnForXmlRpc.java:13) [vdsbroker.jar:]
    at org.ovirt.engine.core.vdsbroker.vdsbroker.FutureVDSCommand.get(FutureVDSCommand.java:66) [vdsbroker.jar:]
    at org.ovirt.engine.core.bll.network.host.HostSetupNetworkPoller.poll(HostSetupNetworkPoller.java:31) [bll.jar:]
    at org.ovirt.engine.core.bll.network.host.HostSetupNetworksCommand.invokeSetupNetworksCommand(HostSetupNetworksCommand.java:433) [bll.jar:]
    at org.ovirt.engine.core.bll.network.host.HostSetupNetworksCommand.executeCommand(HostSetupNetworksCommand.java:294) [bll.jar:]
The above causes host installation to fail. Setting severity to High and adding AutomationBlocker, as this affects testing in Lago.
Hi, I need help deciding how to resolve this bug. As one of the comments was private, I'm sticking to that policy as well.

There are three exceptions mentioned in the comments. None of them are new, but I think the first two (in the order I mention them) should be fixed. The log size was inflated by an error in the poller, which was fixed by https://gerrit.ovirt.org/#/c/54644/ and backported to 3.6 and 3.6.5.

• comment 4: The exception text from this command: "Exception: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: VDSGenericException: VDSNetworkException: Message timeout which can be caused by communication issues" leads to FutureVDSCommand. Any exception (literally any) caught there, except for TimeoutException, is only logged, and a (probably null) return value is returned. If this is the source of the excessive logging (I think it is), then it needs to be refactored, and that affects the whole project. It is not possible to 'swallow' this exception in the Poller (network code), because it is logged by the superclass FutureVDSCommand, which is reached via calling FutureVDSCommandType.Poll.

• source of the NPE from comment 2: it is the very same issue as comment 4. 'Something bad' happens and FutureVDSCommand logs it. We cannot influence it in the poller. 'Something bad' here probably means that the response for FutureVDSCommandType.Poll does not contain the 'message' key in the innerMap parameter. See: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusForXmlRpc#StatusForXmlRpc(java.util.Map<java.lang.String,java.lang.Object>)

• and finally, the exception from comment 1: again the very same issue as comment 4. 'Something bad' happens and FutureVDSCommand logs it. We cannot influence it in the poller. 'Something bad' here probably means that getBroker().poll() failed with some exception. That is probably OK, as there might be some error during the run of HostSetupNetworksCommand.
(In reply to Martin Mucha from comment #5)
> Hi, I need a help with deciding how to resolve this bug. As one of comments
> were private, I'm sticking to that policy as well.
>
> There are three exceptions mentioned in comments. All of them are nothing
> new, but I think first two (in order I mention them) should be fixed. Log
> size is influenced by error in poller, which was fixed by
> https://gerrit.ovirt.org/#/c/54644/ and backported to 3.6 and 3.6.5

I think we are looking at this from the wrong direction. The question we should ask ourselves is why, while the Engine is sending a setup networks command to the host, the host never receives it.

My feeling is that it is caused by the *2nd* disconnect (which I'm not seeing when all is well). In comment 4, it's:

JsonRpc (StompReactor)::ERROR::2016-03-28 14:51:36,220::betterAsyncore::124::vds.dispatcher::(recv) SSL error during reading data: unexpected eof

- and as you can see, this is NOT happening when all is well.
*** Bug 1326847 has been marked as a duplicate of this bug. ***
This bug report has Keywords: Regression or TestBlocker. Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.
Is there any possibility of having this fixed in the 3.6.x release as well? This breaks our deployment currently, and waiting for 4.x would be a show stopper for us.
(In reply to Charlie Inglese from comment #10) > Is there any possibility of having this fixed in the 3.6.x release as well? > This breaks our deployment currently, and waiting for 4.x would be a show > stopper for us. Definitely. However, our practice is to always fix for the next release before back-porting to existing release. We will fix it promptly for 3.6.x as soon as it's completed in 4.0.
(In reply to Yaniv Kaul from comment #11)
> (In reply to Charlie Inglese from comment #10)
> > Is there any possibility of having this fixed in the 3.6.x release as well?
> > This breaks our deployment currently, and waiting for 4.x would be a show
> > stopper for us.
>
> Definitely. However, our practice is to always fix for the next release
> before back-porting to existing release. We will fix it promptly for 3.6.x
> as soon as it's completed in 4.0.

But this does not mean that you will wait for the 4.0 release (which is months away) before backporting to 3.6, does it?

This is also a showstopper for me (see BZ 1329317) and breaks every deployment.

Thanks

Sven
I just saw this was already merged in 3.6. Is there a snapshot release that I might be able to test? I need this fix ASAP.

Thanks

Sven
(In reply to Sven Kieske from comment #12)
> (In reply to Yaniv Kaul from comment #11)
> > (In reply to Charlie Inglese from comment #10)
> > > Is there any possibility of having this fixed in the 3.6.x release as well?
> > > This breaks our deployment currently, and waiting for 4.x would be a show
> > > stopper for us.
> >
> > Definitely. However, our practice is to always fix for the next release
> > before back-porting to existing release. We will fix it promptly for 3.6.x
> > as soon as it's completed in 4.0.
>
> But this does not imply that you wait for 4.0 release (which will be months
> away) until you backport to 3.6 ?

Of course not. As soon as it's in 4.0 and verified, we backport.

> this is also a showstopper for me (see BZ 1329317) and breaking any
> deployment.

How often does it happen? We see it as a race that happens from time to time, and re-installation solves it. We now suspect that IPv6 on the host makes it more apparent, and we are looking into that as well.

> Thanks
>
> Sven
Hi,

I redeployed the same hosts today and still get some errors. I agree it doesn't happen every time (I was able to reinstall the host from the GUI), but it totally breaks automation.

Here's an excerpt from engine.log:

grep -A26 "Connection issues during send request" /var/log/ovirt-engine/engine.log

2016-04-27 10:49:22,859 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (DefaultQuartzScheduler_Worker-77) [] ERROR, org.ovirt.engine.core.vdsbroker.irsbroker.GetStoragePoolInfoVDSCommand, exception: Connection issues during send request, log id: 18ba111
2016-04-27 10:49:22,859 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (DefaultQuartzScheduler_Worker-77) [] Exception: org.ovirt.engine.core.vdsbroker.xmlrpc.XmlRpcRunTimeException: Connection issues during send request
    at org.ovirt.engine.core.vdsbroker.jsonrpc.FutureMap.<init>(FutureMap.java:70) [vdsbroker.jar:]
    at org.ovirt.engine.core.vdsbroker.jsonrpc.JsonRpcIIrsServer.getStoragePoolInfo(JsonRpcIIrsServer.java:398) [vdsbroker.jar:]
    at org.ovirt.engine.core.vdsbroker.irsbroker.GetStoragePoolInfoVDSCommand.executeIrsBrokerCommand(GetStoragePoolInfoVDSCommand.java:31) [vdsbroker.jar:]
    at org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand.executeVDSCommand(IrsBrokerCommand.java:161) [vdsbroker.jar:]
    at org.ovirt.engine.core.vdsbroker.VDSCommandBase.executeCommand(VDSCommandBase.java:65) [vdsbroker.jar:]
    at org.ovirt.engine.core.dal.VdcCommandBase.execute(VdcCommandBase.java:33) [dal.jar:]
    at org.ovirt.engine.core.vdsbroker.ResourceManager.runVdsCommand(ResourceManager.java:467) [vdsbroker.jar:]
    at org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxyData.proceedStoragePoolStats(IrsProxyData.java:336) [vdsbroker.jar:]
    at org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxyData._updatingTimer_Elapsed(IrsProxyData.java:238) [vdsbroker.jar:]
    at sun.reflect.GeneratedMethodAccessor73.invoke(Unknown Source) [:1.8.0_71]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_71]
    at java.lang.reflect.Method.invoke(Method.java:497) [rt.jar:1.8.0_71]
    at org.ovirt.engine.core.utils.timer.JobWrapper.invokeMethod(JobWrapper.java:81) [scheduler.jar:]
    at org.ovirt.engine.core.utils.timer.JobWrapper.execute(JobWrapper.java:52) [scheduler.jar:]
    at org.quartz.core.JobRunShell.run(JobRunShell.java:213) [quartz.jar:]
    at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557) [quartz.jar:]
Caused by: org.ovirt.vdsm.jsonrpc.client.ClientConnectionException: Connection timeout
    at org.ovirt.vdsm.jsonrpc.client.reactors.stomp.SSLStompClient.waitForConnect(SSLStompClient.java:107) [vdsm-jsonrpc-java-client.jar:]
    at org.ovirt.vdsm.jsonrpc.client.reactors.stomp.SSLStompClient.sendMessage(SSLStompClient.java:77) [vdsm-jsonrpc-java-client.jar:]
    at org.ovirt.vdsm.jsonrpc.client.JsonRpcClient.call(JsonRpcClient.java:73) [vdsm-jsonrpc-java-client.jar:]
    at org.ovirt.engine.core.vdsbroker.jsonrpc.FutureMap.<init>(FutureMap.java:68) [vdsbroker.jar:]
    ... 15 more

You can find the version information regarding engine and vdsm here: https://bugzilla.redhat.com/show_bug.cgi?id=1329317#c20 (I still don't know if I'm facing 1, 2 or more bugs here). You can also find complete logs in the above-mentioned BZ (tagged as the "new" logs).

HTH

Sven
I see from the description that this BZ is related to, or the same as, BZ #1329317.
*** Bug 1331344 has been marked as a duplicate of this bug. ***
*** Bug 1315824 has been marked as a duplicate of this bug. ***
Verified on 3.6.6.2-0.1.el6 and vdsm-4.17.28-0.el7ev.noarch.