Bug 1176528 - NPE When attempting to create a new VM from scratch
Summary: NPE When attempting to create a new VM from scratch
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.5.0
Hardware: x86_64
OS: Linux
unspecified
urgent
Target Milestone: ---
: 3.5.0
Assignee: Arik
QA Contact: Artyom
URL:
Whiteboard: virt
Depends On:
Blocks: rhev35rcblocker rhev35gablocker
TreeView+ depends on / blocked
 
Reported: 2014-12-22 09:03 UTC by Allon Mureinik
Modified: 2015-02-17 08:26 UTC (History)
20 users (show)

Fixed In Version: vt13.6
Doc Type: Bug Fix
Doc Text:
Clone Of: 1176407
Environment:
Last Closed: 2015-02-17 08:26:31 UTC
oVirt Team: ---


Attachments (Terms of Use)


Links
System ID Priority Status Summary Last Updated
oVirt gerrit 36346 master MERGED core: fix NPE on addition of external vm Never
oVirt gerrit 36348 ovirt-engine-3.5 MERGED core: fix NPE on addition of external vm Never

Description Allon Mureinik 2014-12-22 09:03:27 UTC
+++ This bug was initially created as a clone of Bug #1176407 +++

Description of problem:
Our automation jobs fail randomly with:
Engine:
2014-12-21 10:59:14,282 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor) Unable to process messages
2014-12-21 10:59:14,284 WARN  [org.ovirt.vdsm.jsonrpc.client.internal.ResponseWorker] (ResponseWorker) Exception thrown during message processing
2014-12-21 10:59:14,284 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor) Unable to process messages
2014-12-21 10:59:14,285 WARN  [org.ovirt.vdsm.jsonrpc.client.internal.ResponseWorker] (ResponseWorker) Exception thrown during message processing

Server:
2014-12-21 10:43:51,129 ERROR [stderr] (Response tracker) Exception in thread "Response tracker" java.lang.NullPointerException
2014-12-21 10:43:51,130 ERROR [stderr] (Response tracker)       at org.ovirt.vdsm.jsonrpc.client.internal.ResponseTracker.removeRequestFromTracking(ResponseTracker.java:58)
2014-12-21 10:43:51,131 ERROR [stderr] (Response tracker)       at org.ovirt.vdsm.jsonrpc.client.internal.ResponseTracker.run(ResponseTracker.java:93)
2014-12-21 10:43:51,131 ERROR [stderr] (Response tracker)       at java.lang.Thread.run(Thread.java:745)

switching to xml-rpc solve the issue

Version-Release number of selected component (if applicable):
rhevm-3.5.0-0.26.el6ev.noarch

How reproducible:


Steps to Reproduce:
1. Automation jobs:
Port mirroring 
vNIC Profiles
Network migration

--- Additional comment from Piotr Kliczewski on 2014-12-21 18:13:08 IST ---

I checked vdsm logs and this is the issue found on vdsm:

Thread-2882::ERROR::2014-12-19 23:08:33,431::API::1499::vds::(_rollback) Cannot delete network u'sw162': It doesn't exist in the system
Traceback (most recent call last):
  File "/usr/share/vdsm/API.py", line 1497, in _rollback
    yield rollbackCtx
  File "/usr/share/vdsm/API.py", line 1385, in setupNetworks
    supervdsm.getProxy().setupNetworks(networks, bondings, options)
  File "/usr/share/vdsm/supervdsm.py", line 50, in __call__
    return callMethod()
  File "/usr/share/vdsm/supervdsm.py", line 48, in <lambda>
    **kwargs)
  File "<string>", line 2, in setupNetworks
  File "/usr/lib64/python2.7/multiprocessing/managers.py", line 773, in _callmethod
    raise convert_to_error(kind, result)
ConfigNetworkError: (27, "Cannot delete network u'sw162': It doesn't exist in the system")


which causes error on the engine side.

I found additional issue in the engine logs which seems to be unrelated to networking issue:

2014-12-19 21:55:19,167 INFO  [org.ovirt.engine.core.utils.transaction.TransactionSupport] (DefaultQuartzScheduler_Worker-30) [371dfc3f] transaction rolled back
2014-12-19 21:55:19,168 ERROR [org.ovirt.engine.core.bll.AddVmFromScratchCommand] (DefaultQuartzScheduler_Worker-30) [371dfc3f] Command org.ovirt.engine.core.bll.AddVmFromScratchCommand throw exception: java.lang.NullPointerException
	at org.ovirt.engine.core.bll.AddVmCommand.addDiskPermissions(AddVmCommand.java:1247) [bll.jar:]
	at org.ovirt.engine.core.bll.AddVmCommand$2.runInTransaction(AddVmCommand.java:825) [bll.jar:]
	at org.ovirt.engine.core.bll.AddVmCommand$2.runInTransaction(AddVmCommand.java:820) [bll.jar:]
	at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInNewTransaction(TransactionSupport.java:210) [utils.jar:]
	at org.ovirt.engine.core.bll.AddVmCommand.executeVmCommand(AddVmCommand.java:820) [bll.jar:]
	at org.ovirt.engine.core.bll.VmCommand.executeCommand(VmCommand.java:104) [bll.jar:]
	at org.ovirt.engine.core.bll.CommandBase.executeWithoutTransaction(CommandBase.java:1191) [bll.jar:]
	at org.ovirt.engine.core.bll.CommandBase.executeActionInTransactionScope(CommandBase.java:1330) [bll.jar:]
	at org.ovirt.engine.core.bll.CommandBase.runInTransaction(CommandBase.java:1955) [bll.jar:]
	at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInSuppressed(TransactionSupport.java:174) [utils.jar:]
	at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInScope(TransactionSupport.java:116) [utils.jar:]
	at org.ovirt.engine.core.bll.CommandBase.execute(CommandBase.java:1354) [bll.jar:]
	at org.ovirt.engine.core.bll.CommandBase.executeAction(CommandBase.java:352) [bll.jar:]
	at org.ovirt.engine.core.bll.Backend.runAction(Backend.java:430) [bll.jar:]
	at org.ovirt.engine.core.bll.Backend.runActionImpl(Backend.java:411) [bll.jar:]
	at org.ovirt.engine.core.bll.Backend.runInternalAction(Backend.java:617) [bll.jar:]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [rt.jar:1.7.0_71]

--- Additional comment from Oved Ourfali on 2014-12-22 09:28:15 IST ---

Dan - can you look at this as well?

--- Additional comment from Piotr Kliczewski on 2014-12-22 10:27:07 IST ---

I pushed a patch to remove side effect (NPE) of setupNetwork failure. If test case covers the failure then we have a fix for this bug.

We need to understand nature of setupNetworks failure.
===============================================================================
During the analysis of bug 1176407, an NPE was discovered when adding a VM from scratch, which seems unrelated to the original issue in the BZ.

If it's a real issue, though, it's probably important (if not critical) to fix.

Comment 1 Allon Mureinik 2014-12-22 09:07:37 UTC
Arik, the only thing I can see that looks remotely connected to this NPE is your patch 6fd0381949be01270a881f6a2466029e4dd2a50e (That's a RHEVM commit hash. Change-Id is Idd9604c7a219e49d140576a66c9b15e9abb34d79) which was introduced in vt8 to solve bug 1149135.

Can you please take a look and either confirm (and fix) or refute (in which case, feel free to change the whiteboard if you don't think this is a virt issue).

Comment 2 Allon Mureinik 2014-12-22 09:17:13 UTC
The relevant stack trace:

2014-12-19 21:55:19,167 INFO  [org.ovirt.engine.core.utils.transaction.TransactionSupport] (DefaultQuartzScheduler_Worker-30) [371dfc3f] transaction rolled back
2014-12-19 21:55:19,168 ERROR [org.ovirt.engine.core.bll.AddVmFromScratchCommand] (DefaultQuartzScheduler_Worker-30) [371dfc3f] Command org.ovirt.engine.core.bll.AddVmFromScratchCommand throw exception: java.lang.NullPointerException
	at org.ovirt.engine.core.bll.AddVmCommand.addDiskPermissions(AddVmCommand.java:1247) [bll.jar:]
	at org.ovirt.engine.core.bll.AddVmCommand$2.runInTransaction(AddVmCommand.java:825) [bll.jar:]
	at org.ovirt.engine.core.bll.AddVmCommand$2.runInTransaction(AddVmCommand.java:820) [bll.jar:]
	at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInNewTransaction(TransactionSupport.java:210) [utils.jar:]
	at org.ovirt.engine.core.bll.AddVmCommand.executeVmCommand(AddVmCommand.java:820) [bll.jar:]
	at org.ovirt.engine.core.bll.VmCommand.executeCommand(VmCommand.java:104) [bll.jar:]
	at org.ovirt.engine.core.bll.CommandBase.executeWithoutTransaction(CommandBase.java:1191) [bll.jar:]
	at org.ovirt.engine.core.bll.CommandBase.executeActionInTransactionScope(CommandBase.java:1330) [bll.jar:]
	at org.ovirt.engine.core.bll.CommandBase.runInTransaction(CommandBase.java:1955) [bll.jar:]
	at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInSuppressed(TransactionSupport.java:174) [utils.jar:]
	at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInScope(TransactionSupport.java:116) [utils.jar:]
	at org.ovirt.engine.core.bll.CommandBase.execute(CommandBase.java:1354) [bll.jar:]
	at org.ovirt.engine.core.bll.CommandBase.executeAction(CommandBase.java:352) [bll.jar:]
	at org.ovirt.engine.core.bll.Backend.runAction(Backend.java:430) [bll.jar:]
	at org.ovirt.engine.core.bll.Backend.runActionImpl(Backend.java:411) [bll.jar:]
	at org.ovirt.engine.core.bll.Backend.runInternalAction(Backend.java:617) [bll.jar:]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [rt.jar:1.7.0_71]

Comment 3 Allon Mureinik 2014-12-22 09:18:30 UTC
The NPE happens in the following line:

Guid diskOperatorId = diskOperatorIdFromParams != null ? diskOperatorIdFromParams : getCurrentUser().getId();

The only way this can produce an NPE is if getCurrentUser() is null, and I'm not sure what flow leads there.

Comment 4 Arik 2014-12-22 11:17:27 UTC
The AddVmFromScratch command is executed as internal command, thus eliminates the possibility that it was trigger from REST-API. The only places which invokes it as internal commands are:
1. when adding VM and attach it to pool - I don't see it in the logs
2. when the monitoring adds external VM

It makes sense that #2 is the flow in this case, as in this flow getCurrentUser is indeed null.
I'll add a null-check to preserve the null-safety of the previous code before my change.

Comment 5 Michal Skrivanek 2014-12-23 09:29:17 UTC
missing backports

Comment 6 Artyom 2015-01-07 09:20:30 UTC
Hi Arik can you provide me a best way to verified this bug?

Comment 7 Arik 2015-01-07 10:56:21 UTC
(In reply to Artyom from comment #6)
It can be easily verified:
1. run a VM in some environment
2. add the host the VM runs on to other environment (while the VM is running)

If the VM is added as 'external' VM in the 'other' environment and no NPE exception appears in the log then it is verified.

Comment 8 Artyom 2015-01-07 16:30:07 UTC
Thanks Arik.
Verified on rhevm-3.5.0-0.28.el6ev.noarch(both engines) and host vdsm-4.16.8.1-4.el7ev.x86_64
Vm added as external and engine.log don't have NullPointerExceptions only some errors:
2015-01-07 18:23:12,736 INFO  [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) Connecting to alma05.qa.lab.tlv.redhat.com/10.35.117.28
2015-01-07 18:23:12,741 WARN  [org.ovirt.vdsm.jsonrpc.client.utils.retry.Retryable] (SSL Stomp Reactor) Retry failed
2015-01-07 18:23:12,742 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (org.ovirt.thread.pool-7-thread-37) Exception during connection
2015-01-07 18:23:12,743 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.TimeBoundPollVDSCommand] (org.ovirt.thread.pool-7-thread-50) [76fd6f56] java.util.concurrent.ExecutionException: org.ovirt.engine.core.vdsbroker.xmlrpc.XmlRpcRunTimeException: Connection issues during send request
2015-01-07 18:23:12,746 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.TimeBoundPollVDSCommand] (org.ovirt.thread.pool-7-thread-50) [76fd6f56] Command TimeBoundPollVDSCommand(HostName = alma05.qa.lab.tlv.redhat.com, HostId = 0488bdc6-4a4e-4d81-9311-ed1a7c25e701) execution failed. Exception: RuntimeException: java.util.concurrent.ExecutionException: org.ovirt.engine.core.vdsbroker.xmlrpc.XmlRpcRunTimeException: Connection issues during send request

Comment 9 Omer Frenkel 2015-02-17 08:26:31 UTC
RHEV-M 3.5.0 has been released


Note You need to log in before you can comment on or make changes to this bug.