Bug 1001584 - After restarting “ovirt-engine” service, Storage Domain enters “Inactive” mode during DetachStorageDomain command
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.3.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ovirt-3.6.0-rc3
Target Release: 3.6.0
Assignee: Liron Aravot
QA Contact: Aharon Canan
URL:
Whiteboard: storage
Depends On:
Blocks:
 
Reported: 2013-08-27 10:47 UTC by vvyazmin@redhat.com
Modified: 2016-02-10 18:11 UTC
CC List: 9 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-11-08 11:47:16 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:


Attachments
## Logs rhevm, vdsm, libvirt, thread dump, superVdsm (5.62 MB, application/x-gzip), 2013-08-27 10:47 UTC, vvyazmin@redhat.com
## Logs rhevm, vdsm, libvirt, thread dump, superVdsm (8.22 MB, application/x-gzip), 2013-09-17 08:07 UTC, vvyazmin@redhat.com
engine.log (143.10 KB, application/x-gzip), 2015-11-03 12:21 UTC, Natalie Gavrielov

Description vvyazmin@redhat.com 2013-08-27 10:47:02 UTC
Created attachment 790886 [details]
## Logs rhevm, vdsm, libvirt, thread dump, superVdsm

Description of problem:
After restarting the “ovirt-engine” service, the Storage Domain enters “Inactive” mode during the DetachStorageDomain command.

Version-Release number of selected component (if applicable):
RHEVM 3.3 - IS11 environment:

RHEVM:  rhevm-3.3.0-0.16.master.el6ev.noarch
PythonSDK:  rhevm-sdk-python-3.3.0.11-1.el6ev.noarch
VDSM:  vdsm-4.12.0-72.git287bb7e.el6ev.x86_64
LIBVIRT:  libvirt-0.10.2-18.el6_4.9.x86_64
QEMU & KVM:  qemu-kvm-rhev-0.12.1.2-2.355.el6_4.5.x86_64
SANLOCK:  sanlock-2.8-1.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Create a Data Center with one host and two Storage Domains (SDs).
2. Move the non-master SD to maintenance (DetachStorageDomain); a hedged SDK sketch of this step follows the list.
3. While the DetachStorageDomain command is running, restart the “ovirt-engine” service.
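For reference, a minimal sketch of step 2, assuming the rhevm-sdk-python (v3) package listed above; the engine URL, credentials and data center name are placeholders rather than values taken from this report:

from ovirtsdk.api import API

# Placeholder connection details (not from this report).
api = API(url='https://rhevm.example.com/api',
          username='admin@internal',
          password='PASSWORD',
          insecure=True)

dc = api.datacenters.get(name='DC1')          # placeholder data center name
sd = dc.storagedomains.get(name='SD-e-02')    # the non-master SD (name taken from the log below)

# Step 2: ask the engine to move the attached SD to maintenance; the engine.log
# excerpt below shows DetachStorageDomainFromPoolCommand running for this flow.
sd.deactivate()

# Step 3 is performed on the engine host while the operation is still running:
#     service ovirt-engine restart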

Actual results:
The SD enters “Inactive” mode.

Expected results:
The SD is moved to maintenance successfully.

Impact on user:
Moving the SD to maintenance fails.

Workaround:
Activate and then deactivate the same SD again; a hedged SDK sketch follows.
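A minimal sketch of the workaround under the same assumptions (rhevm-sdk-python v3, placeholder connection details and data center name):

from ovirtsdk.api import API

api = API(url='https://rhevm.example.com/api',   # placeholders, as in the sketch above
          username='admin@internal',
          password='PASSWORD',
          insecure=True)

sd = api.datacenters.get(name='DC1').storagedomains.get(name='SD-e-02')
sd.activate()        # bring the SD back from "Inactive"
sd.deactivate()      # then move it to maintenance again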

Additional info:

/var/log/ovirt-engine/engine.log

2013-08-26 16:26:56,238 INFO  [org.ovirt.engine.core.bll.storage.DetachStorageDomainFromPoolCommand] (pool-5-thread-47) [77ebff34] Running command: DetachStorageDomainFromPoolCommand internal: false. Entities affected :  ID: 5aa0e6b6-6969-4c81-b676-db85d548249a Type: Storage
2013-08-26 16:26:56,239 INFO  [org.ovirt.engine.core.bll.storage.DetachStorageDomainFromPoolCommand] (pool-5-thread-47) [77ebff34] Start detach storage domain
2013-08-26 16:26:56,294 INFO  [org.ovirt.engine.core.bll.storage.DetachStorageDomainFromPoolCommand] (pool-5-thread-47) [77ebff34]  Detach storage domain: before connect
2013-08-26 16:26:56,307 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (pool-5-thread-48) [77ebff34] START, ConnectStorageServerVDSCommand(HostName = tigris01.scl.lab.tlv.redhat.com, HostId = 9576d8ca-4466-46e6-bebc-ccd922075ac6, storagePoolId = 29479ada-c628-410c-8705-808beb06e92f, storageType = ISCSI, connectionList = [{ id: f7e66fe5-e840-4987-a339-03234a63d57a, connection: 10.35.160.7, iqn: iqn.2008-05.com.xtremio:001e675b8ee0, vfsType: null, mountOptions: null, nfsVersion: null, nfsRetrans: null, nfsTimeo: null };]), log id: 2e312b1a
2013-08-26 16:26:56,998 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (pool-5-thread-48) [77ebff34] FINISH, ConnectStorageServerVDSCommand, return: {f7e66fe5-e840-4987-a339-03234a63d57a=0}, log id: 2e312b1a
2013-08-26 16:26:56,999 INFO  [org.ovirt.engine.core.bll.storage.DetachStorageDomainFromPoolCommand] (pool-5-thread-47) [77ebff34]  Detach storage domain: after connect
2013-08-26 16:26:57,000 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.DetachStorageDomainVDSCommand] (pool-5-thread-47) [77ebff34] START, DetachStorageDomainVDSCommand( storagePoolId = 29479ada-c628-410c-8705-808beb06e92f, ignoreFailoverLimit = false, storageDomainId = 5aa0e6b6-6969-4c81-b676-db85d548249a, masterDomainId = 00000000-0000-0000-0000-000000000000, masterVersion = 1, force = false), log id: 1d1b473a
2013-08-26 16:26:58,710 ERROR [org.ovirt.engine.core.utils.timer.SchedulerUtilQuartzImpl] (DefaultQuartzScheduler_Worker-6) Failed to invoke scheduled method OnTimer: java.lang.reflect.InvocationTargetException
        at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source) [:1.7.0_25]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.7.0_25]
        at java.lang.reflect.Method.invoke(Method.java:606) [rt.jar:1.7.0_25]
        at org.ovirt.engine.core.utils.timer.JobWrapper.execute(JobWrapper.java:60) [scheduler.jar:]
        at org.quartz.core.JobRunShell.run(JobRunShell.java:213) [quartz.jar:]
        at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557) [quartz.jar:]
Caused by: org.jboss.as.ejb3.component.EJBComponentUnavailableException: JBAS014559: Invocation cannot proceed as component is shutting down
        at org.jboss.as.ejb3.component.interceptors.ShutDownInterceptorFactory$1.processInvocation(ShutDownInterceptorFactory.java:59) [jboss-as-ejb3.jar:7.2.0.Final-redhat-8]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
        at org.jboss.as.ejb3.component.interceptors.LoggingInterceptor.processInvocation(LoggingInterceptor.java:59) [jboss-as-ejb3.jar:7.2.0.Final-redhat-8]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
        at org.jboss.as.ee.component.NamespaceContextInterceptor.processInvocation(NamespaceContextInterceptor.java:50) [jboss-as-ee.jar:7.2.0.Final-redhat-8]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
        at org.jboss.as.ee.component.TCCLInterceptor.processInvocation(TCCLInterceptor.java:45) [jboss-as-ee.jar:7.2.0.Final-redhat-8]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
        at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61) [jboss-invocation.jar:1.1.1.Final-redhat-2]


2013-08-27 10:33:24,093 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (pool-5-thread-50) Domain 5aa0e6b6-6969-4c81-b676-db85d548249a:SD-e-02 was reported by all hosts in status UP as problematic. Moving the domain to NonOperational.

vdsClient -s 0 getStorageDomainInfo 5aa0e6b6-6969-4c81-b676-db85d548249a 

	uuid = 5aa0e6b6-6969-4c81-b676-db85d548249a
	vguuid = PGGe3n-bhe5-f4iR-uBe0-5eRf-DSmI-4eq9KP
	lver = -1
	state = OK
	version = 3
	role = Regular
	pool = ['29479ada-c628-410c-8705-808beb06e92f']
	spm_id = -1
	type = ISCSI
	class = Data
	master_ver = 0
	name = SD-e-02




/var/log/vdsm/vdsm.log

Comment 1 vvyazmin@redhat.com 2013-09-17 08:05:39 UTC
Failed, tested on RHEVM 3.3 - IS14 environment:
Tested on FCP Data Centers

Host OS: RHEL 6.5

RHEVM:  rhevm-3.3.0-0.21.master.el6ev.noarch
PythonSDK:  rhevm-sdk-python-3.3.0.13-1.el6ev.noarch
VDSM:  vdsm-4.12.0-127.gitedb88bf.el6ev.x86_64
LIBVIRT:  libvirt-0.10.2-23.el6.bz964359.eblake.1.x86_64
QEMU & KVM:  qemu-kvm-rhev-0.12.1.2-2.401.el6.x86_64
SANLOCK:  sanlock-2.8-1.el6.x86_64

Comment 2 vvyazmin@redhat.com 2013-09-17 08:07:06 UTC
Created attachment 798659 [details]
## Logs rhevm, vdsm, libvirt, thread dump, superVdsm

Comment 5 Ayal Baron 2013-12-18 09:23:33 UTC
Tal, update on this one?

Comment 7 Allon Mureinik 2015-07-07 13:46:14 UTC
Aharon - does QA have a test case for this? Does it still happen in 3.6.0?

Comment 8 Aharon Canan 2015-07-07 15:17:18 UTC
We probably do, but I do not think we have run it in the last year.
We can re-test.

Comment 9 Yaniv Lavi 2015-10-22 08:18:44 UTC
Did you retest?

Comment 10 Aharon Canan 2015-11-01 15:38:37 UTC
Natalie - please do.

Comment 11 Natalie Gavrielov 2015-11-03 12:20:31 UTC
Ran the following scenario (a few times):

1. Moving an SD (not a master) to maintenance.
2. Restarting the engine while the SD is in the "locked" state (i.e., during the maintenance operation).

Configuration:
2 hosts (one of them in "maintenance") and a few SDs; the SD that was put into maintenance was not a master.

Environment: 
rhevm-3.6.0.2-0.1.el6.noarch

Result:
After the restart, the SD was in maintenance mode.

Comment 12 Natalie Gavrielov 2015-11-03 12:21:02 UTC
Created attachment 1088953 [details]
engine.log

Comment 13 Allon Mureinik 2015-11-08 08:04:36 UTC
(In reply to Natalie Gavrielov from comment #11)
> Ran the following scenario (a few times):
> 
> 1. Moving an SD (not a master) to maintenance.
> 2. During the "locked state" (during the maintenance operation) perform
> engine restart.
> 
> Configuration: 
> 2 hosts (one is in "maintenance", a few SD's, the one that was put in
> maintenance state was not a master.
> 
> Environment: 
> rhevm-3.6.0.2-0.1.el6.noarch
> 
> Result:
> After the restart SD was in maintenance mode.

To sum up - you moved a domain to maintenance, restarted the engine, and the domain still went to maintenance.
Doesn't this mean the BZ should be VERIFIED on the version you tested it with?

Comment 14 Aharon Canan 2015-11-08 11:35:46 UTC
(In reply to Allon Mureinik from comment #13)
> (In reply to Natalie Gavrielov from comment #11)
> > Ran the following scenario (a few times):
> > 
> > 1. Moving an SD (not a master) to maintenance.
> > 2. During the "locked state" (during the maintenance operation) perform
> > engine restart.
> > 
> > Configuration: 
> > 2 hosts (one is in "maintenance", a few SD's, the one that was put in
> > maintenance state was not a master.
> > 
> > Environment: 
> > rhevm-3.6.0.2-0.1.el6.noarch
> > 
> > Result:
> > After the restart SD was in maintenance mode.
> 
> To sum up - you moved a domain to maintenance, restarted the engine, and the
> domain still went to maintenance.
> Doesn't this mean the BZ should be VERIFIED on the version you tested it
> with?

It should definitely not be VERIFIED, since there is no patch here.
It can be WORKSFORME or something similar...

Comment 15 Allon Mureinik 2015-11-08 11:47:16 UTC
(In reply to Aharon Canan from comment #14)
> (In reply to Allon Mureinik from comment #13)
> > (In reply to Natalie Gavrielov from comment #11)
> > > Ran the following scenario (a few times):
> > > 
> > > 1. Moving an SD (not a master) to maintenance.
> > > 2. During the "locked state" (during the maintenance operation) perform
> > > engine restart.
> > > 
> > > Configuration: 
> > > 2 hosts (one is in "maintenance", a few SD's, the one that was put in
> > > maintenance state was not a master.
> > > 
> > > Environment: 
> > > rhevm-3.6.0.2-0.1.el6.noarch
> > > 
> > > Result:
> > > After the restart SD was in maintenance mode.
> > 
> > To sum up - you moved a domain to maintenance, restarted the engine, and the
> > domain still went to maintenance.
> > Doesn't this mean the BZ should be VERIFIED on the version you tested it
> > with?
> 
> For sure not verified as no patch here, 
> Can be Works for me or something...
That's a more appropriate course of action, agreed.

