Bug 1270725 - [Cinder] Stateless VM fails to start with NullPointerException, operation does not rollback
Status: CLOSED CURRENTRELEASE
Product: ovirt-engine
Classification: oVirt
Component: BLL.Storage
Version: 3.6.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ovirt-3.6.2
Target Release: 3.6.2
Assigned To: Maor
QA Contact: Natalie Gavrielov
Duplicates: 1270726
Depends On: 1305809
Blocks: 1135132 1288157
Reported: 2015-10-12 05:21 EDT by Ori Gofen
Modified: 2016-02-23 08:32 EST

Doc Type: Bug Fix
Last Closed: 2016-02-23 08:32:25 EST
Type: Bug
oVirt Team: Infra
amureini: ovirt‑3.6.z?
rule-engine: planning_ack?
tnisan: devel_ack+
pstehlik: testing_ack+


Attachments
log (4.55 MB, text/plain)
2015-10-12 05:21 EDT, Ori Gofen


External Trackers
Tracker      | ID    | Priority         | Status | Summary                                                  | Last Updated
-------------+-------+------------------+--------+----------------------------------------------------------+-------------
oVirt gerrit | 49677 | master           | MERGED | core: Update executionContext with Steps.                | Never
oVirt gerrit | 49680 | master           | MERGED | core: Use execution context when calling parent command. | Never
oVirt gerrit | 49949 | ovirt-engine-3.6 | MERGED | core: Update executionContext with Steps.                | Never
oVirt gerrit | 49950 | ovirt-engine-3.6 | MERGED | core: Use execution context when calling parent command. | Never

Description Ori Gofen 2015-10-12 05:21:15 EDT
Created attachment 1081926
log

Description of problem:
Cinder with a Ceph backend does not yet support stateless VM functionality.
When a Cinder disk is attached to a stateless VM and the VM is started, the operation fails with a NullPointerException:
"2015-10-12 11:56:07,020 INFO  [org.ovirt.engine.core.bll.CreateAllSnapshotsFromVmCommand] (DefaultQuartzScheduler_Worker-6) [] Lock freed to object 'EngineLock:{exclusiveLocks='[a0dcfb3d-591e-47e1-8dba-c6353fbd9cd5=<VM, ACTION_TYPE_FAILED_SNAPSHOT_IS_BEING_TAKEN_FOR_VM$VmName n_vm>]', sharedLocks='null'}'
2015-10-12 11:56:07,150 ERROR [org.ovirt.engine.core.bll.tasks.CommandExecutor] (DefaultQuartzScheduler_Worker-6) [] Error invoking callback method 'onSucceeded' for 'SUCCEEDED' command '1f191124-b705-45c3-843a-48991d879e5f'
2015-10-12 11:56:07,151 ERROR [org.ovirt.engine.core.bll.tasks.CommandExecutor] (DefaultQuartzScheduler_Worker-6) [] Exception: javax.ejb.EJBException: java.lang.NullPointerException
        at org.jboss.as.ejb3.tx.CMTTxInterceptor.handleExceptionInNoTx(CMTTxInterceptor.java:216) [jboss-as-ejb3.jar:7.5.3.Final-redhat-2]
        at org.jboss.as.ejb3.tx.CMTTxInterceptor.invokeInNoTx(CMTTxInterceptor.java:268) [jboss-as-ejb3.jar:7.5.3.Final-redhat-2]
        at org.jboss.as.ejb3.tx.CMTTxInterceptor.supports(CMTTxInterceptor.java:377) [jboss-as-ejb3.jar:7.5.3.Final-redhat-2]
        at org.jboss.as.ejb3.tx.CMTTxInterceptor.processInvocation(CMTTxInterceptor.java:246) [jboss-as-ejb3.jar:7.5.3.Final-redhat-2]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.2.Final-redhat-1]
        at org.jboss.as.ejb3.component.interceptors.CurrentInvocationContextInterceptor.processInvocation(CurrentInvocationContextInterceptor.java:41) [jboss-as-ejb3.jar:7.5.3.Final-redhat-2]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.2.Final-redhat-1]
        at org.jboss.as.ejb3.component.invocationmetrics.WaitTimeInterceptor.processInvocation(WaitTimeInterceptor.java:43) [jboss-as-ejb3.jar:7.5.3.Final-redhat-2]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.2.Final-redhat-1]
        at org.jboss.as.ejb3.component.interceptors.ShutDownInterceptorFactory$1.processInvocation(ShutDownInterceptorFactory.java:64) [jboss-as-ejb3.jar:7.5.3.Final-redhat-2]"

Webadmin's message is wrong:
"VM n_vm was run as stateless with one or more of disks that do not allow snapshots"

Cinder with a Ceph backend does support snapshots, so this operation should succeed.
After the operation failed I removed the VM, and Cinder still reported one redundant "leftover" snapshot.

[root@ogofen-cinder ~(keystone_admin)]# cinder snapshot-list
+--------------------------------------+--------------------------------------+-----------+--------------+------+
|                  ID                  |              Volume ID               |   Status  | Display Name | Size |
+--------------------------------------+--------------------------------------+-----------+--------------+------+
| b2a5029b-07b3-43b8-8072-bc0feee26085 | fe7b1961-a714-464a-9606-cbfa4bd785fd | available |     None     |  8   |
+--------------------------------------+--------------------------------------+-----------+--------------+------+

Version-Release number of selected component (if applicable):
rhevm-3.6-14

How reproducible:
100%

Steps to Reproduce:
1. Have a powered-off VM with one Cinder disk
2. Edit the VM to be stateless
3. Attach 2 Cinder disks to the VM and launch the VM
4. After the failure, attempt to remove the VM and its disks

Actual results:
Stateless VMs are not supported; the operation described above leaves an unused Ceph volume behind.

Expected results:
operation successful

Additional info:
Comment 1 Oved Ourfali 2015-10-12 05:34:44 EDT
Please open storage backend bugs in the BLL.NETWORK.Storage component.
Comment 2 Allon Mureinik 2015-10-12 06:32:30 EDT
*** Bug 1270726 has been marked as a duplicate of this bug. ***
Comment 3 Red Hat Bugzilla Rules Engine 2015-10-19 06:57:39 EDT
Target release should be set once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for an oVirt release.
Comment 4 Yaniv Lavi (Dary) 2015-10-29 08:40:15 EDT
In oVirt, testing is done on a single release by default. Therefore I'm removing the 4.0 flag. If you think this bug must be tested in 4.0 as well, please re-add the flag. Please note we might not have testing resources to handle the 4.0 clone.
Comment 5 Maor 2015-12-03 12:17:17 EST
It looks like the NPE occurs because the CoCo infrastructure does not pass steps to the child commands in job/ExecutionHandler.
Moving this to infra; I will open a separate bug for the stateless VM flow with a Cinder disk.
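
The failure mode described in this comment can be illustrated with a minimal, self-contained sketch. It is not ovirt-engine code: the `Step`, `ExecutionContext`, and `onSucceeded` names below are hypothetical stand-ins chosen for illustration, and the only assumption carried over from the comment is that a child command's callback dereferences a step that the parent never passed along.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT real ovirt-engine classes.
final class StepPropagationSketch {

    /** A unit of progress, with optional sub-steps. */
    static final class Step {
        final String name;
        final List<Step> children = new ArrayList<>();

        Step(String name) {
            this.name = name;
        }

        Step addChild(String childName) {
            Step child = new Step(childName);
            children.add(child);
            return child;
        }
    }

    /** Carries the current step from a parent command to its child commands. */
    static final class ExecutionContext {
        private Step step; // stays null unless someone sets it

        Step getStep() {
            return step;
        }

        void setStep(Step step) {
            this.step = step;
        }
    }

    /** Child callback: records a sub-step under whatever step the context holds. */
    static String onSucceeded(ExecutionContext ctx) {
        // Throws NullPointerException when ctx.getStep() returns null.
        return ctx.getStep().addChild("finalize snapshot").name;
    }

    /** Assumed buggy flow: the child receives a context whose step was never set. */
    static boolean runChildWithFreshContext() {
        try {
            onSucceeded(new ExecutionContext());
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    /** Assumed fixed flow: the parent stores its step in the context before calling the child. */
    static boolean runChildWithParentContext() {
        ExecutionContext ctx = new ExecutionContext();
        ctx.setStep(new Step("create snapshots"));
        try {
            onSucceeded(ctx);
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("fresh context succeeded: " + runChildWithFreshContext());
        System.out.println("parent context succeeded: " + runChildWithParentContext());
    }
}
```

In this sketch the callback only succeeds when the parent has populated the context, which appears to be the general direction of the two merged patches listed under External Trackers (updating the execution context with steps and using it when calling the parent command). The actual patches may differ in detail.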
Comment 6 Sandro Bonazzola 2015-12-23 08:44:37 EST
oVirt 3.6.2 RC1 has been released for testing, moving to ON_QA
Comment 7 Jiri Belka 2016-01-19 08:18:54 EST
(In reply to Ori Gofen from comment #0)

> Steps to Reproduce:
> 1. Have a powered-off VM with one Cinder disk
> 2. Edit the VM to be stateless
> 3. Attach 2 Cinder disks to the VM and launch the VM
> 4. After the failure, attempt to remove the VM and its disks
> 
> Actual results:
> Stateless VMs are not supported; the operation described above leaves an
> unused Ceph volume behind.
> 
> Expected results:
> operation successful

Could you please help me to understand the steps?

In step 3 it should fail; should it fail silently or with a message in the UI? The audit events show that my test VM was started, but no failure is reported even though the VM is still down.

In step 4 I should be able to remove this VM with its disks, right? I can't do that because there is a locked snapshot...

engine=# select snapshot_id,vm_id,status,creation_date from snapshots where vm_id = ( select vm_guid from vms where vm_name = 'test');                                                                              
             snapshot_id              |                vm_id                 | status |       creation_date        
--------------------------------------+--------------------------------------+--------+----------------------------
 e314cea1-5edc-43dc-85f7-df31de2af34d | 65e0f3bf-2334-417b-b810-3aa0101c8951 | OK     | 2016-01-19 13:41:22.226+01
 abdf93a4-e8d0-4e94-b704-82a61cd26001 | 65e0f3bf-2334-417b-b810-3aa0101c8951 | LOCKED | 2016-01-19 13:51:02.538+01
(2 rows)
Comment 8 Maor 2016-01-19 08:35:34 EST
The fix for [1] also fixes this issue, so I'm not sure you can still reproduce it.

I recommend verifying this bug together with https://bugzilla.redhat.com/1288157, since both of them use steps.

[1] https://bugzilla.redhat.com/1288157
