Bug 1277558

Summary: [Cinder] creation of a snapshot with multiple cinder disks is reported as failed
Product: [oVirt] ovirt-engine Reporter: Ori Gofen <ogofen>
Component: BLL.StorageAssignee: Daniel Erez <derez>
Status: CLOSED CURRENTRELEASE QA Contact: Ori Gofen <ogofen>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 3.6.0.2CC: acanan, bugs, derez, gklein, masayag, ogofen, oourfali, tnisan
Target Milestone: ovirt-3.6.1Flags: tnisan: ovirt-3.6.z?
ogofen: planning_ack?
rule-engine: devel_ack+
ogofen: testing_ack?
Target Release: 3.6.1   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: storage
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-12-16 12:16:52 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Ori Gofen 2015-11-03 14:43:47 UTC
Description of problem:

Almost Every cinder command flow is using, apparently, api from externalprovider.foreman and it's not working, like never, this cause every cinder action to fail on logs, but more importantly, this causes a failure on the engine-webadmin when attempting a creation of a snapshot of a Vm with 2 cinder virtual disks(or more).


2015-11-03 16:05:51,770 INFO  [org.ovirt.engine.core.bll.CreateAllSnapshotsFromVmCommand] (ajp-/127.0.0.1:8702-6) [3265c76c] Lock Acquired to object 'EngineLock:{exclusiveLoc
ks='[db59cd36-3e20-4771-b7ac-37fddceca45d=<VM, ACTION_TYPE_FAILED_SNAPSHOT_IS_BEING_TAKEN_FOR_VM$VmName vm_cin>]', sharedLocks='null'}'
2015-11-03 16:05:52,374 INFO  [org.ovirt.engine.core.bll.CreateAllSnapshotsFromVmCommand] (org.ovirt.thread.pool-7-thread-12) [3265c76c] Running command: CreateAllSnapshotsFr
omVmCommand internal: false. Entities affected :  ID: db59cd36-3e20-4771-b7ac-37fddceca45d Type: VMAction group MANIPULATE_VM_SNAPSHOTS with role type USER
2015-11-03 16:05:52,556 INFO  [org.ovirt.engine.core.bll.storage.CreateCinderSnapshotCommand] (pool-6-thread-2) [30a04640] Running command: CreateCinderSnapshotCommand intern
al: true. Entities affected :  ID: 00000000-0000-0000-0000-000000000000 Type: StorageAction group CONFIGURE_VM_STORAGE with role type USER
2015-11-03 16:05:52,904 ERROR [org.ovirt.engine.core.bll.host.provider.foreman.SystemProviderFinder] (ajp-/127.0.0.1:8702-2) [] Failed to find host on any provider by host na
me 'ovirt-gofen-2.scl.lab.tlv.redhat.com'
2015-11-03 16:05:53,208 ERROR [org.ovirt.engine.core.bll.host.provider.foreman.SystemProviderFinder] (ajp-/127.0.0.1:8702-3) [] Failed to find host on any provider by host na
me 'ovirt-gofen-2.scl.lab.tlv.redhat.com'
2015-11-03 16:05:53,423 INFO  [org.ovirt.engine.core.bll.storage.CreateCinderSnapshotCommand] (pool-6-thread-3) [2d0d946c] Running command: CreateCinderSnapshotCommand intern
al: true. Entities affected :  ID: 00000000-0000-0000-0000-000000000000 Type: StorageAction group CONFIGURE_VM_STORAGE with role type USER
2015-11-03 16:05:54,860 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-7-thread-12) [] Correlation ID: 3265c76c, Job ID:
aca8b20c-3a68-4711-b996-26245a7ea27b, Call Stack: null, Custom Event ID: -1, Message: Snapshot 't1' creation for VM 'vm_cin' was initiated by admin@internal.
2015-11-03 16:05:54,873 INFO  [org.ovirt.engine.core.bll.CreateAllSnapshotsFromVmCommand] (org.ovirt.thread.pool-7-thread-12) [] Lock freed to object 'EngineLock:{exclusiveLo
cks='[db59cd36-3e20-4771-b7ac-37fddceca45d=<VM, ACTION_TYPE_FAILED_SNAPSHOT_IS_BEING_TAKEN_FOR_VM$VmName vm_cin>]', sharedLocks='null'}'
2015-11-03 16:05:56,136 ERROR [org.ovirt.engine.core.bll.host.provider.foreman.SystemProviderFinder] (ajp-/127.0.0.1:8702-3) [] Failed to find host on any provider by host na
me 'ovirt-gofen-2.scl.lab.tlv.redhat.com'
2015-11-03 16:05:56,877 INFO  [org.ovirt.engine.core.bll.storage.CreateCinderSnapshotCommand] (DefaultQuartzScheduler_Worker-19) [2d0d946c] Ending command 'org.ovirt.engine.c
ore.bll.storage.CreateCinderSnapshotCommand' successfully.
2015-11-03 16:05:57,001 INFO  [org.ovirt.engine.core.bll.CreateAllSnapshotsFromVmCommand] (DefaultQuartzScheduler_Worker-19) [2d0d946c] Ending command 'org.ovirt.engine.core.
bll.CreateAllSnapshotsFromVmCommand' successfully.
2015-11-03 16:05:57,001 INFO  [org.ovirt.engine.core.bll.CreateAllSnapshotsFromVmCommand] (DefaultQuartzScheduler_Worker-19) [2d0d946c] There are still running CoCo tasks
2015-11-03 16:05:57,002 WARN  [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (DefaultQuartzScheduler_Worker-19) [2d0d946c] Trying to release exclusive lock which does n
ot exist, lock key: 'db59cd36-3e20-4771-b7ac-37fddceca45dVM'
2015-11-03 16:05:57,002 INFO  [org.ovirt.engine.core.bll.CreateAllSnapshotsFromVmCommand] (DefaultQuartzScheduler_Worker-19) [2d0d946c] Lock freed to object 'EngineLock:{excl
usiveLocks='[db59cd36-3e20-4771-b7ac-37fddceca45d=<VM, ACTION_TYPE_FAILED_SNAPSHOT_IS_BEING_TAKEN_FOR_VM$VmName vm_cin>]', sharedLocks='null'}'
2015-11-03 16:05:57,031 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-19) [2d0d946c] Correlation ID: 3265c76c, C
all Stack: null, Custom Event ID: -1, Message: Failed to complete snapshot 't1' creation for VM 'vm_cin'.
2015-11-03 16:05:58,007 ERROR [org.ovirt.engine.core.bll.host.provider.foreman.SystemProviderFinder] (ajp-/127.0.0.1:8702-9) [] Failed to find host on any provider by host name 'ovirt-gofen-2.scl.lab.tlv.redhat.com'

So to sum things up every Vm with x>1 cinder disks will report 'operation failed' when trying to create a snapshot. 

Version-Release number of selected component (if applicable):
3.6.0.2

How reproducible:
100%

Steps to Reproduce:
1.have a Vm+2 cinder disks
2.create a snapshot

Actual results:
The operation fails

Expected results:
The operation should not fail (UI wise)

Additional info:
This behavior is new and was not encountered during pre-integration, thus, marking it as a reGRession

Comment 1 Tal Nisan 2015-11-03 15:19:34 UTC
Daniel, please have a look asap.

BTW Ori, you can't mark a bug that marks a regression from pre-integration as a regression... Only official versions are in this game.

Comment 2 Daniel Erez 2015-11-03 19:57:22 UTC
@Moti:
* When SystemProviderFinder should be invoked?
* Could it affect other operations by taking locks for example?

@Ori:
* Can you please attach full engine and vdsm logs?
* Did you navigate to Errata tab in the webadmin before reproducing the issue?

Comment 3 Moti Asayag 2015-11-04 05:01:47 UTC
(In reply to Daniel Erez from comment #2)
> @Moti:
> * When SystemProviderFinder should be invoked?

It depends. In 3.6.1 the SystemProviderFinder will be called only when clicked on the 'Errata' on the system tree or calling via the api /api/katelloerrata resource.

In 3.6.0, before fixing bug 1260499, it will be called when SyncSearch() is called.

> * Could it affect other operations by taking locks for example?
> 

No, it shouldn't gain any lock.

> @Ori:
> * Can you please attach full engine and vdsm logs?
> * Did you navigate to Errata tab in the webadmin before reproducing the
> issue?

Comment 4 Oved Ourfali 2015-11-04 06:25:29 UTC
(In reply to Moti Asayag from comment #3)
> (In reply to Daniel Erez from comment #2)
> > @Moti:
> > * When SystemProviderFinder should be invoked?
> 
> It depends. In 3.6.1 the SystemProviderFinder will be called only when
> clicked on the 'Errata' on the system tree or calling via the api
> /api/katelloerrata resource.
> 
> In 3.6.0, before fixing bug 1260499, it will be called when SyncSearch() is
> called.

And this fix will only be part of 3.6.1.

> 
> > * Could it affect other operations by taking locks for example?
> > 
> 
> No, it shouldn't gain any lock.
> 
> > @Ori:
> > * Can you please attach full engine and vdsm logs?
> > * Did you navigate to Errata tab in the webadmin before reproducing the
> > issue?

Moving to storage to examine the cinder issue.
It has nothing to do with Foreman. You'll see these Foreman errors regardless of doing any operations on cinder....

Also, setting "storage | infra" whiteboard is nice.... however, it isn't supported in our queries, and that's not the way we work.

Comment 5 Daniel Erez 2015-11-04 10:21:48 UTC
To sum up, the recurring error on SystemProviderFinder seems unrelated and been already addressed by bug 1260499. The error in snapshot creation reproduces on master, however, it's merely a logging issue; i.e. operation is reported as failed and then as succeeded (suggested a patch for it). Hence, lowering severity.

Comment 6 Ori Gofen 2015-12-16 08:29:49 UTC
Verified on rhevm-3.6.1.3-0.1.el6.noarch

Comment 7 Sandro Bonazzola 2015-12-16 12:16:52 UTC
According to verification status and target milestone this issue should be fixed in oVirt 3.6.1. Closing current release.