+++ This bug was initially created as a clone of Bug #1176407 +++ Description of problem: Our automation jobs fail randomly with: Engine: 2014-12-21 10:59:14,282 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor) Unable to process messages 2014-12-21 10:59:14,284 WARN [org.ovirt.vdsm.jsonrpc.client.internal.ResponseWorker] (ResponseWorker) Exception thrown during message processing 2014-12-21 10:59:14,284 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor) Unable to process messages 2014-12-21 10:59:14,285 WARN [org.ovirt.vdsm.jsonrpc.client.internal.ResponseWorker] (ResponseWorker) Exception thrown during message processing Server: 2014-12-21 10:43:51,129 ERROR [stderr] (Response tracker) Exception in thread "Response tracker" java.lang.NullPointerException 2014-12-21 10:43:51,130 ERROR [stderr] (Response tracker) at org.ovirt.vdsm.jsonrpc.client.internal.ResponseTracker.removeRequestFromTracking(ResponseTracker.java:58) 2014-12-21 10:43:51,131 ERROR [stderr] (Response tracker) at org.ovirt.vdsm.jsonrpc.client.internal.ResponseTracker.run(ResponseTracker.java:93) 2014-12-21 10:43:51,131 ERROR [stderr] (Response tracker) at java.lang.Thread.run(Thread.java:745) switching to xml-rpc solve the issue Version-Release number of selected component (if applicable): rhevm-3.5.0-0.26.el6ev.noarch How reproducible: Steps to Reproduce: 1. Automation jobs: Port mirroring vNIC Profiles Network migration --- Additional comment from Piotr Kliczewski on 2014-12-21 18:13:08 IST --- I checked vdsm logs and this is the issue found on vdsm: Thread-2882::ERROR::2014-12-19 23:08:33,431::API::1499::vds::(_rollback) Cannot delete network u'sw162': It doesn't exist in the system Traceback (most recent call last): File "/usr/share/vdsm/API.py", line 1497, in _rollback yield rollbackCtx File "/usr/share/vdsm/API.py", line 1385, in setupNetworks supervdsm.getProxy().setupNetworks(networks, bondings, options) File "/usr/share/vdsm/supervdsm.py", line 50, in __call__ return callMethod() File "/usr/share/vdsm/supervdsm.py", line 48, in <lambda> **kwargs) File "<string>", line 2, in setupNetworks File "/usr/lib64/python2.7/multiprocessing/managers.py", line 773, in _callmethod raise convert_to_error(kind, result) ConfigNetworkError: (27, "Cannot delete network u'sw162': It doesn't exist in the system") which causes error on the engine side. I found additional issue in the engine logs which seems to be unrelated to networking issue: 2014-12-19 21:55:19,167 INFO [org.ovirt.engine.core.utils.transaction.TransactionSupport] (DefaultQuartzScheduler_Worker-30) [371dfc3f] transaction rolled back 2014-12-19 21:55:19,168 ERROR [org.ovirt.engine.core.bll.AddVmFromScratchCommand] (DefaultQuartzScheduler_Worker-30) [371dfc3f] Command org.ovirt.engine.core.bll.AddVmFromScratchCommand throw exception: java.lang.NullPointerException at org.ovirt.engine.core.bll.AddVmCommand.addDiskPermissions(AddVmCommand.java:1247) [bll.jar:] at org.ovirt.engine.core.bll.AddVmCommand$2.runInTransaction(AddVmCommand.java:825) [bll.jar:] at org.ovirt.engine.core.bll.AddVmCommand$2.runInTransaction(AddVmCommand.java:820) [bll.jar:] at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInNewTransaction(TransactionSupport.java:210) [utils.jar:] at org.ovirt.engine.core.bll.AddVmCommand.executeVmCommand(AddVmCommand.java:820) [bll.jar:] at org.ovirt.engine.core.bll.VmCommand.executeCommand(VmCommand.java:104) [bll.jar:] at org.ovirt.engine.core.bll.CommandBase.executeWithoutTransaction(CommandBase.java:1191) [bll.jar:] at org.ovirt.engine.core.bll.CommandBase.executeActionInTransactionScope(CommandBase.java:1330) [bll.jar:] at org.ovirt.engine.core.bll.CommandBase.runInTransaction(CommandBase.java:1955) [bll.jar:] at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInSuppressed(TransactionSupport.java:174) [utils.jar:] at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInScope(TransactionSupport.java:116) [utils.jar:] at org.ovirt.engine.core.bll.CommandBase.execute(CommandBase.java:1354) [bll.jar:] at org.ovirt.engine.core.bll.CommandBase.executeAction(CommandBase.java:352) [bll.jar:] at org.ovirt.engine.core.bll.Backend.runAction(Backend.java:430) [bll.jar:] at org.ovirt.engine.core.bll.Backend.runActionImpl(Backend.java:411) [bll.jar:] at org.ovirt.engine.core.bll.Backend.runInternalAction(Backend.java:617) [bll.jar:] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [rt.jar:1.7.0_71] --- Additional comment from Oved Ourfali on 2014-12-22 09:28:15 IST --- Dan - can you look at this as well? --- Additional comment from Piotr Kliczewski on 2014-12-22 10:27:07 IST --- I pushed a patch to remove side effect (NPE) of setupNetwork failure. If test case covers the failure then we have a fix for this bug. We need to understand nature of setupNetworks failure. =============================================================================== During the analysis of bug 1176407, an NPE was discovered when adding a VM from scratch, which seems unrelated to the original issue in the BZ. If it's a real issue, though, it's probably important (if not critical) to fix.
Arik, the only thing I can see that looks remotely connected to this NPE is your patch 6fd0381949be01270a881f6a2466029e4dd2a50e (That's a RHEVM commit hash. Change-Id is Idd9604c7a219e49d140576a66c9b15e9abb34d79) which was introduced in vt8 to solve bug 1149135. Can you please take a look and either confirm (and fix) or refute (in which case, feel free to change the whiteboard if you don't think this is a virt issue).
The relevant stack trace: 2014-12-19 21:55:19,167 INFO [org.ovirt.engine.core.utils.transaction.TransactionSupport] (DefaultQuartzScheduler_Worker-30) [371dfc3f] transaction rolled back 2014-12-19 21:55:19,168 ERROR [org.ovirt.engine.core.bll.AddVmFromScratchCommand] (DefaultQuartzScheduler_Worker-30) [371dfc3f] Command org.ovirt.engine.core.bll.AddVmFromScratchCommand throw exception: java.lang.NullPointerException at org.ovirt.engine.core.bll.AddVmCommand.addDiskPermissions(AddVmCommand.java:1247) [bll.jar:] at org.ovirt.engine.core.bll.AddVmCommand$2.runInTransaction(AddVmCommand.java:825) [bll.jar:] at org.ovirt.engine.core.bll.AddVmCommand$2.runInTransaction(AddVmCommand.java:820) [bll.jar:] at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInNewTransaction(TransactionSupport.java:210) [utils.jar:] at org.ovirt.engine.core.bll.AddVmCommand.executeVmCommand(AddVmCommand.java:820) [bll.jar:] at org.ovirt.engine.core.bll.VmCommand.executeCommand(VmCommand.java:104) [bll.jar:] at org.ovirt.engine.core.bll.CommandBase.executeWithoutTransaction(CommandBase.java:1191) [bll.jar:] at org.ovirt.engine.core.bll.CommandBase.executeActionInTransactionScope(CommandBase.java:1330) [bll.jar:] at org.ovirt.engine.core.bll.CommandBase.runInTransaction(CommandBase.java:1955) [bll.jar:] at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInSuppressed(TransactionSupport.java:174) [utils.jar:] at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInScope(TransactionSupport.java:116) [utils.jar:] at org.ovirt.engine.core.bll.CommandBase.execute(CommandBase.java:1354) [bll.jar:] at org.ovirt.engine.core.bll.CommandBase.executeAction(CommandBase.java:352) [bll.jar:] at org.ovirt.engine.core.bll.Backend.runAction(Backend.java:430) [bll.jar:] at org.ovirt.engine.core.bll.Backend.runActionImpl(Backend.java:411) [bll.jar:] at org.ovirt.engine.core.bll.Backend.runInternalAction(Backend.java:617) [bll.jar:] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [rt.jar:1.7.0_71]
The NPE happens in the following line: Guid diskOperatorId = diskOperatorIdFromParams != null ? diskOperatorIdFromParams : getCurrentUser().getId(); The only way this can produce an NPE is if getCurrentUser() is null, and I'm not sure what flow leads there.
The AddVmFromScratch command is executed as internal command, thus eliminates the possibility that it was trigger from REST-API. The only places which invokes it as internal commands are: 1. when adding VM and attach it to pool - I don't see it in the logs 2. when the monitoring adds external VM It makes sense that #2 is the flow in this case, as in this flow getCurrentUser is indeed null. I'll add a null-check to preserve the null-safety of the previous code before my change.
missing backports
Hi Arik can you provide me a best way to verified this bug?
(In reply to Artyom from comment #6) It can be easily verified: 1. run a VM in some environment 2. add the host the VM runs on to other environment (while the VM is running) If the VM is added as 'external' VM in the 'other' environment and no NPE exception appears in the log then it is verified.
Thanks Arik. Verified on rhevm-3.5.0-0.28.el6ev.noarch(both engines) and host vdsm-4.16.8.1-4.el7ev.x86_64 Vm added as external and engine.log don't have NullPointerExceptions only some errors: 2015-01-07 18:23:12,736 INFO [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) Connecting to alma05.qa.lab.tlv.redhat.com/10.35.117.28 2015-01-07 18:23:12,741 WARN [org.ovirt.vdsm.jsonrpc.client.utils.retry.Retryable] (SSL Stomp Reactor) Retry failed 2015-01-07 18:23:12,742 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (org.ovirt.thread.pool-7-thread-37) Exception during connection 2015-01-07 18:23:12,743 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.TimeBoundPollVDSCommand] (org.ovirt.thread.pool-7-thread-50) [76fd6f56] java.util.concurrent.ExecutionException: org.ovirt.engine.core.vdsbroker.xmlrpc.XmlRpcRunTimeException: Connection issues during send request 2015-01-07 18:23:12,746 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.TimeBoundPollVDSCommand] (org.ovirt.thread.pool-7-thread-50) [76fd6f56] Command TimeBoundPollVDSCommand(HostName = alma05.qa.lab.tlv.redhat.com, HostId = 0488bdc6-4a4e-4d81-9311-ed1a7c25e701) execution failed. Exception: RuntimeException: java.util.concurrent.ExecutionException: org.ovirt.engine.core.vdsbroker.xmlrpc.XmlRpcRunTimeException: Connection issues during send request
RHEV-M 3.5.0 has been released