Created attachment 790886 [details]
## Logs rhevm, vdsm, libvirt, thread dump, superVdsm

Description of problem:
After restarting the "ovirt-engine" service while a DetachStorageDomain command is in progress, the Storage Domain enters "Inactive" mode.

Version-Release number of selected component (if applicable):
RHEVM 3.3 - IS11 environment:
RHEVM: rhevm-3.3.0-0.16.master.el6ev.noarch
PythonSDK: rhevm-sdk-python-3.3.0.11-1.el6ev.noarch
VDSM: vdsm-4.12.0-72.git287bb7e.el6ev.x86_64
LIBVIRT: libvirt-0.10.2-18.el6_4.9.x86_64
QEMU & KVM: qemu-kvm-rhev-0.12.1.2-2.355.el6_4.5.x86_64
SANLOCK: sanlock-2.8-1.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Create a Data Center with one host and 2 Storage Domains (SDs).
2. Move a non-master SD to maintenance (DetachStorageDomain).
3. While the DetachStorageDomain command is running, restart the "ovirt-engine" service.

Actual results:
The SD enters "Inactive" mode.

Expected results:
The SD is moved to maintenance successfully.

Impact on user:
Moving the SD to maintenance fails.

Workaround:
Activate and then deactivate the same SD again (a hedged SDK sketch of this follows the logs below).

Additional info:
/var/log/ovirt-engine/engine.log

2013-08-26 16:26:56,238 INFO  [org.ovirt.engine.core.bll.storage.DetachStorageDomainFromPoolCommand] (pool-5-thread-47) [77ebff34] Running command: DetachStorageDomainFromPoolCommand internal: false. Entities affected : ID: 5aa0e6b6-6969-4c81-b676-db85d548249a Type: Storage
2013-08-26 16:26:56,239 INFO  [org.ovirt.engine.core.bll.storage.DetachStorageDomainFromPoolCommand] (pool-5-thread-47) [77ebff34] Start detach storage domain
2013-08-26 16:26:56,294 INFO  [org.ovirt.engine.core.bll.storage.DetachStorageDomainFromPoolCommand] (pool-5-thread-47) [77ebff34] Detach storage domain: before connect
2013-08-26 16:26:56,307 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (pool-5-thread-48) [77ebff34] START, ConnectStorageServerVDSCommand(HostName = tigris01.scl.lab.tlv.redhat.com, HostId = 9576d8ca-4466-46e6-bebc-ccd922075ac6, storagePoolId = 29479ada-c628-410c-8705-808beb06e92f, storageType = ISCSI, connectionList = [{ id: f7e66fe5-e840-4987-a339-03234a63d57a, connection: 10.35.160.7, iqn: iqn.2008-05.com.xtremio:001e675b8ee0, vfsType: null, mountOptions: null, nfsVersion: null, nfsRetrans: null, nfsTimeo: null };]), log id: 2e312b1a
2013-08-26 16:26:56,998 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (pool-5-thread-48) [77ebff34] FINISH, ConnectStorageServerVDSCommand, return: {f7e66fe5-e840-4987-a339-03234a63d57a=0}, log id: 2e312b1a
2013-08-26 16:26:56,999 INFO  [org.ovirt.engine.core.bll.storage.DetachStorageDomainFromPoolCommand] (pool-5-thread-47) [77ebff34] Detach storage domain: after connect
2013-08-26 16:26:57,000 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.DetachStorageDomainVDSCommand] (pool-5-thread-47) [77ebff34] START, DetachStorageDomainVDSCommand( storagePoolId = 29479ada-c628-410c-8705-808beb06e92f, ignoreFailoverLimit = false, storageDomainId = 5aa0e6b6-6969-4c81-b676-db85d548249a, masterDomainId = 00000000-0000-0000-0000-000000000000, masterVersion = 1, force = false), log id: 1d1b473a
2013-08-26 16:26:58,710 ERROR [org.ovirt.engine.core.utils.timer.SchedulerUtilQuartzImpl] (DefaultQuartzScheduler_Worker-6) Failed to invoke scheduled method OnTimer: java.lang.reflect.InvocationTargetException
	at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source) [:1.7.0_25]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.7.0_25]
	at java.lang.reflect.Method.invoke(Method.java:606) [rt.jar:1.7.0_25]
	at org.ovirt.engine.core.utils.timer.JobWrapper.execute(JobWrapper.java:60) [scheduler.jar:]
	at org.quartz.core.JobRunShell.run(JobRunShell.java:213) [quartz.jar:]
	at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557) [quartz.jar:]
Caused by: org.jboss.as.ejb3.component.EJBComponentUnavailableException: JBAS014559: Invocation cannot proceed as component is shutting down
	at org.jboss.as.ejb3.component.interceptors.ShutDownInterceptorFactory$1.processInvocation(ShutDownInterceptorFactory.java:59) [jboss-as-ejb3.jar:7.2.0.Final-redhat-8]
	at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
	at org.jboss.as.ejb3.component.interceptors.LoggingInterceptor.processInvocation(LoggingInterceptor.java:59) [jboss-as-ejb3.jar:7.2.0.Final-redhat-8]
	at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
	at org.jboss.as.ee.component.NamespaceContextInterceptor.processInvocation(NamespaceContextInterceptor.java:50) [jboss-as-ee.jar:7.2.0.Final-redhat-8]
	at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
	at org.jboss.as.ee.component.TCCLInterceptor.processInvocation(TCCLInterceptor.java:45) [jboss-as-ee.jar:7.2.0.Final-redhat-8]
	at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
	at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61) [jboss-invocation.jar:1.1.1.Final-redhat-2]

2013-08-27 10:33:24,093 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (pool-5-thread-50) Domain 5aa0e6b6-6969-4c81-b676-db85d548249a:SD-e-02 was reported by all hosts in status UP as problematic. Moving the domain to NonOperational.

vdsClient -s 0 getStorageDomainInfo 5aa0e6b6-6969-4c81-b676-db85d548249a
	uuid = 5aa0e6b6-6969-4c81-b676-db85d548249a
	vguuid = PGGe3n-bhe5-f4iR-uBe0-5eRf-DSmI-4eq9KP
	lver = -1
	state = OK
	version = 3
	role = Regular
	pool = ['29479ada-c628-410c-8705-808beb06e92f']
	spm_id = -1
	type = ISCSI
	class = Data
	master_ver = 0
	name = SD-e-02

/var/log/vdsm/vdsm.log
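For reference, the workaround (activate the stuck domain, then deactivate it again) can be scripted against the engine. This is a minimal sketch, assuming the 3.x ovirtsdk Python bindings listed in the environment and that attached storage domains expose activate()/deactivate() actions; the engine URL, credentials, and data-center name are placeholders, not values from this setup.

# Hedged sketch of the workaround using the ovirtsdk 3.x Python bindings.
# URL, credentials and the data-center name below are placeholders.
from ovirtsdk.api import API

api = API(url='https://engine.example.com/api',   # placeholder engine address
          username='admin@internal',
          password='password',                    # placeholder password
          insecure=True)

dc = api.datacenters.get(name='dc1')              # placeholder DC name
sd = dc.storagedomains.get(name='SD-e-02')        # the domain stuck in Inactive

sd.activate()     # bring the domain back to Active
sd.deactivate()   # retry the move to maintenance, after which detach can be retried

api.disconnect()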
Failed, tested on RHEVM 3.3 - IS14 environment:

Tested on FCP Data Centers
Host OS: RHEL 6.5
RHEVM: rhevm-3.3.0-0.21.master.el6ev.noarch
PythonSDK: rhevm-sdk-python-3.3.0.13-1.el6ev.noarch
VDSM: vdsm-4.12.0-127.gitedb88bf.el6ev.x86_64
LIBVIRT: libvirt-0.10.2-23.el6.bz964359.eblake.1.x86_64
QEMU & KVM: qemu-kvm-rhev-0.12.1.2-2.401.el6.x86_64
SANLOCK: sanlock-2.8-1.el6.x86_64
Created attachment 798659 [details] ## Logs rhevm, vdsm, libvirt, thread dump, superVdsm
Tal, update on this one?
Aharon - does QA have a test case for this? Does it still happen in 3.6.0?
We probably do, but I don't think we have run it in the last year. We can re-test.
Did you retest?
Natalie - please do.
Ran the following scenario (a few times):

1. Moving an SD (not a master) to maintenance.
2. During the "locked" state (while the maintenance operation is in progress), perform an engine restart.

Configuration:
2 hosts (one in "maintenance"), a few SDs; the one that was put into maintenance was not a master.

Environment:
rhevm-3.6.0.2-0.1.el6.noarch

Result:
After the restart the SD was in maintenance mode (a hedged SDK status-check sketch follows).
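For completeness, the status check after the engine restart can be scripted as well. A minimal sketch, assuming the 3.x ovirtsdk bindings and the usual status accessor on attached domains; engine URL, credentials, and names are placeholders.

# Hedged sketch: check the storage domain state after the engine restart.
# Engine URL, credentials, data-center and domain names are placeholders.
from ovirtsdk.api import API

api = API(url='https://engine.example.com/api',
          username='admin@internal',
          password='password',
          insecure=True)

dc = api.datacenters.get(name='dc1')
sd = dc.storagedomains.get(name='non_master_sd')

# Expected result from the scenario above: 'maintenance' rather than 'inactive'.
print(sd.get_status().get_state())

api.disconnect()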
Created attachment 1088953 [details] engine.log
(In reply to Natalie Gavrielov from comment #11)
> Ran the following scenario (a few times):
>
> 1. Moving an SD (not a master) to maintenance.
> 2. During the "locked" state (while the maintenance operation is in progress), perform an engine restart.
>
> Configuration:
> 2 hosts (one in "maintenance"), a few SDs; the one that was put into maintenance was not a master.
>
> Environment:
> rhevm-3.6.0.2-0.1.el6.noarch
>
> Result:
> After the restart the SD was in maintenance mode.

To sum up - you moved a domain to maintenance, restarted the engine, and the domain still went to maintenance.
Doesn't this mean the BZ should be VERIFIED on the version you tested it with?
(In reply to Allon Mureinik from comment #13)
> (In reply to Natalie Gavrielov from comment #11)
> > Ran the following scenario (a few times):
> >
> > 1. Moving an SD (not a master) to maintenance.
> > 2. During the "locked" state (while the maintenance operation is in progress), perform an engine restart.
> >
> > Configuration:
> > 2 hosts (one in "maintenance"), a few SDs; the one that was put into maintenance was not a master.
> >
> > Environment:
> > rhevm-3.6.0.2-0.1.el6.noarch
> >
> > Result:
> > After the restart the SD was in maintenance mode.
>
> To sum up - you moved a domain to maintenance, restarted the engine, and the domain still went to maintenance.
> Doesn't this mean the BZ should be VERIFIED on the version you tested it with?

Definitely not VERIFIED, since there is no patch here.
It could be closed as WORKSFORME or something similar...
(In reply to Aharon Canan from comment #14)
> (In reply to Allon Mureinik from comment #13)
> > (In reply to Natalie Gavrielov from comment #11)
> > > Ran the following scenario (a few times):
> > >
> > > 1. Moving an SD (not a master) to maintenance.
> > > 2. During the "locked" state (while the maintenance operation is in progress), perform an engine restart.
> > >
> > > Configuration:
> > > 2 hosts (one in "maintenance"), a few SDs; the one that was put into maintenance was not a master.
> > >
> > > Environment:
> > > rhevm-3.6.0.2-0.1.el6.noarch
> > >
> > > Result:
> > > After the restart the SD was in maintenance mode.
> >
> > To sum up - you moved a domain to maintenance, restarted the engine, and the domain still went to maintenance.
> > Doesn't this mean the BZ should be VERIFIED on the version you tested it with?
>
> Definitely not VERIFIED, since there is no patch here.
> It could be closed as WORKSFORME or something similar...

That's a more appropriate course of action, agreed.