Bug 1256841 - Storage domain remains in 'preparing for maintenance' when there is a non operational host
Status: CLOSED CURRENTRELEASE
Product: ovirt-engine
Classification: oVirt
Component: BLL.Storage
Version: 3.6.0
Hardware: Unspecified Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ovirt-3.6.2
Target Release: 3.6.2
Assigned To: Liron Aravot
QA Contact: Elad
Depends On:
Blocks:
Reported: 2015-08-25 10:59 EDT by Liron Aravot
Modified: 2016-03-10 10:17 EST

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1247957
Environment:
Last Closed: 2016-02-18 06:01:49 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
rule-engine: ovirt‑3.6.z+
ylavi: planning_ack+
amureini: devel_ack+
rule-engine: testing_ack+


External Trackers
Tracker | ID | Priority | Status | Summary | Last Updated
oVirt gerrit | 45322 | master | MERGED | core: support domains maintenance on more hosts statuses | Never
oVirt gerrit | 48921 | ovirt-engine-3.6 | MERGED | core: IrsProxyData.updateVdsDomainsData() - separation to methods | Never
oVirt gerrit | 48922 | ovirt-engine-3.6 | MERGED | core: VdsDao - adding getAllForStoragePoolAndStatuses() | Never
oVirt gerrit | 48923 | ovirt-engine-3.6 | MERGED | core: moving reportingVdsStatus constant from IrsBrokerCommand | Never
oVirt gerrit | 48924 | ovirt-engine-3.6 | MERGED | core: decide whether to process domain monitoring in IrsProxyData | Never
oVirt gerrit | 48925 | ovirt-engine-3.6 | MERGED | core: support domains maintenance on more hosts statuses | Never
oVirt gerrit | 48997 | ovirt-engine-3.6 | MERGED | core: DeactivateStorageDomainCommand - refresh all hosts | Never
oVirt gerrit | 49005 | ovirt-engine-3.6 | MERGED | core: ReconstructMasterDomain - connect only hosts in status UP | Never
oVirt gerrit | 49011 | master | MERGED | core: DeactivateStorageDomainCommand - refresh all hosts | Never
oVirt gerrit | 49012 | master | MERGED | core: ReconstructMasterDomain - connect only hosts in status UP | Never

Description Liron Aravot 2015-08-25 10:59:58 EDT
+++ This bug was initially created as a clone of Bug #1247957 +++

Description of problem:
The job for deactivating a storage domain passes, but the storage domain stays in status 'preparing for maintenance'. This seems to be a false positive.

This happens because one of the hosts is in a non-operational state. I'm not sure whether this is also a different bug (that the storage domain should be deactivated properly when a host is non-operational).

Version-Release number of selected component (if applicable):
ovirt-engine-3.6.0-0.0.master.20150726172446.git65db93d.el6.noarch

How reproducible:
100%

Steps to Reproduce:
1. Have a setup with multiple hosts and storage domains, type doesn't matter
2. One of the hosts is non-operational (in my case there's an issue accessing one of the gluster domains)
3. Try to deactivate any of the storage domains from the data center

Actual results:
The job for deactivating a storage domain seems to finish properly (PASS), but the storage domains remain in 'preparing for maintenance' status.

Expected results:
The job for deactivating a storage domain should have status FAILED (or, if storage domains should be deactivated even when one host is non-operational, the storage domain should end up in the proper state).

Additional info:
engine RHEL 6.7
hosts RHEL 7.1

--- Additional comment from Carlos Mestre González on 2015-07-29 06:49:57 EDT ---



--- Additional comment from Carlos Mestre González on 2015-07-29 06:50:47 EDT ---

Posting part of engine.log for a quick look:

2015-07-29 13:21:32,562 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DisconnectStorageServerVDSCommand] (org.ovirt.thread.pool-8-thread-10) [24576b6f] FINISH, DisconnectStorageServerVDSCommand, return: {88bca80f-7ce0-4768-8283-de6387e24464=0}, log id: 483509d0
2015-07-29 13:21:32,564 INFO  [org.ovirt.engine.core.bll.storage.DeactivateStorageDomainCommand] (org.ovirt.thread.pool-8-thread-10) [24576b6f] Domain 'e7843c77-73bf-4866-af2e-5fb1ebe8d4b4' will remain in 'PreparingForMaintenance' status until deactivated on all hosts
2015-07-29 13:21:32,569 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-8-thread-10) [24576b6f] Correlation ID: 2931eb50, Job ID: d67655d9-cd35-4cbf-852a-80d7aa33f563, Call Stack: null, Custom Event ID: -1, Message: Storage Domain test_4831_exp (Data Center golden_env_mixed) was deactivated.
2015-07-29 13:21:32,580 WARN  [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (org.ovirt.thread.pool-8-thread-10) [24576b6f] Trying to release exclusive lock which does not exist, lock key: 'e7843c77-73bf-4866-af2e-5fb1ebe8d4b4STORAGE'
2015-07-29 13:21:32,580 INFO  [org.ovirt.engine.core.bll.storage.DeactivateStorageDomainWithOvfUpdateCommand] (org.ovirt.thread.pool-8-thread-10) [24576b6f] Lock freed to object 'EngineLock:{exclusiveLocks='[e7843c77-73bf-4866-af2e-5fb1ebe8d4b4=<STORAGE, ACTION_TYPE_FAILED_OBJECT_LOCKED>]', sharedLocks='null'}'
2015-07-29 13:21:32,882 ERROR [org.ovirt.engine.core.dao.jpa.TransactionalInterceptor] (default task-10) [] Failed to run operation in a new transaction: javax.persistence.PersistenceException: org.hibernate.HibernateException: A collection with cascade="all-delete-orphan" was no longer referenced by the owning entity instance: org.ovirt.engine.core.common.job.Job.steps
    at org.hibernate.jpa.spi.AbstractEntityManagerImpl.convert(AbstractEntityManagerImpl.java:1763) [hibernate-entitymanager-4.3.7.Final.jar:4.3.7.Final]
    at org.hibernate.jpa.spi.AbstractEntityManagerImpl.convert(AbstractEntityManagerImpl.java:1677) [hibernate-entitymanager-4.3.7.Final.jar:4.3.7.Final]
    at org.hibernate.jpa.internal.QueryImpl.getResultList(QueryImpl.java:458) [hibernate-entitymanager-4.3.7.Final.jar:4.3.7.Final]
    at org.ovirt.engine.core.dao.jpa.AbstractJpaDao.multipleResults(AbstractJpaDao.java:89) [dal.jar:]
    at org.ovirt.engine.core.dao.JobDaoImpl$Proxy$_$$_WeldSubclass.multipleResults(Unknown Source) [dal.jar:]
    at org.ovirt.engine.core.dao.JobDaoImpl.getJobsByOffsetAndPageSize(JobDaoImpl.java:41) [dal.jar:]
    at org.ovirt.engine.core.dao.JobDaoImpl$Proxy$_$$_WeldSubclass.getJobsByOffsetAndPageSize(Unknown Source) [dal.jar:]
    at sun.reflect.GeneratedMethodAccessor875.invoke(Unknown Source) [:1.7.0_79]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.7.0_79]
    at java.lang.reflect.Method.invoke(Method.java:606) [rt.jar:1.7.0_79]
    at org.jboss.weld.interceptor.proxy.SimpleInterceptionChain.interceptorChainCompleted(SimpleInterceptionChain.java:51) [weld-core-impl-2.2.6.Final.jar:2014-10-03 10:05]
    at org.jboss.weld.interceptor.chain.AbstractInterceptionChain.finish(AbstractInterceptio

--- Additional comment from Allon Mureinik on 2015-07-29 10:19:10 EDT ---

Liron, could you take a look please?

--- Additional comment from Liron Aravot on 2015-07-30 04:59:28 EDT ---

The implemented behavior is that when a host is non-operational, its reporting data won't be collected, and therefore the domain will remain in Preparing For Maintenance.

Two possible action items here (perhaps item 2 can be handled in a different BZ):
1. Add an audit log message when the domain moves to "Preparing for maintenance", and use the current one when the domain moves to Maintenance status.

2. Look into improving the current situation: if a host is non-op (a host can be non-op for various reasons) but has a domain report, use that report for moving unseen domains to maintenance.
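
A minimal sketch of the idea in item 2, under stated assumptions: it is not the engine code, and every name in it (DomainMaintenanceSketch, HostStatus, HostReport, REPORTING_STATUSES, canMoveToMaintenance) is hypothetical. It assumes that hosts in status Up or NonOperational may deliver a domain report, that hosts without a usable report are ignored, and that a domain counts as "unseen" when no usable report lists it.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; none of these names come from ovirt-engine.
final class DomainMaintenanceSketch {

    enum HostStatus { UP, NON_OPERATIONAL, MAINTENANCE, DOWN }

    // Assumption: reports from hosts in these statuses are taken into account.
    static final Set<HostStatus> REPORTING_STATUSES =
            EnumSet.of(HostStatus.UP, HostStatus.NON_OPERATIONAL);

    static final class HostReport {
        final String hostName;
        final HostStatus status;
        // IDs of the domains the host still sees; null means no report was collected.
        final Set<String> visibleDomainIds;

        HostReport(String hostName, HostStatus status, Set<String> visibleDomainIds) {
            this.hostName = hostName;
            this.status = status;
            this.visibleDomainIds = visibleDomainIds;
        }
    }

    // Returns true when no host with a usable report still sees the domain,
    // i.e. the domain could leave 'Preparing for maintenance'.
    static boolean canMoveToMaintenance(String domainId, List<HostReport> hosts) {
        for (HostReport host : hosts) {
            boolean usable = REPORTING_STATUSES.contains(host.status)
                    && host.visibleDomainIds != null;
            if (usable && host.visibleDomainIds.contains(domainId)) {
                return false; // at least one reporting host still accesses the domain
            }
        }
        return true;
    }
}
```

With this sketch, a non-op host whose report no longer lists the domain does not block the move, while a non-op host that still lists it does.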
Comment 1 Yaniv Lavi (Dary) 2015-09-06 10:38:04 EDT
Should this be in POST?
Comment 2 Sandro Bonazzola 2015-10-26 08:29:54 EDT
This is an automated message. oVirt 3.6.0 RC3 has been released and GA is targeted to next week, Nov 4th 2015.
Please review this bug and if not a blocker, please postpone to a later release.
All bugs not postponed on GA release will be automatically re-targeted to

- 3.6.1 if severity >= high
- 4.0 if severity < high
Comment 3 Red Hat Bugzilla Rules Engine 2015-10-29 06:05:01 EDT
This bug is not marked for z-stream, yet the milestone is for a z-stream version, therefore the milestone has been reset.
Please set the correct milestone or add the z-stream flag.
Comment 4 Sandro Bonazzola 2015-12-23 08:40:55 EST
oVirt 3.6.2 RC1 has been released for testing, moving to ON_QA
Comment 5 Elad 2016-01-10 04:50:59 EST
Liron, should I verify according to the steps written in the description?
Comment 6 Liron Aravot 2016-01-14 09:32:05 EST
Elad, yes - but it doesn't have to be a gluster domain.
Comment 7 Elad 2016-01-17 11:08:50 EST
Storage domain deactivation, while having a non-operational host in the DC due to storage issues, succeeds. The domain changes its status to maintenance.

Tested the following scenario:
- 2 hosts in DC, 2 active domains (iSCSI and Gluster)
- Blocked one of the hosts from accessing the Gluster domain. Host status changed to non-op
- Deactivated the Gluster domain

Domain moved to maintenance successfully; the event log showed the following:

Storage Domain data4 (Data Center Default) was deactivated and has moved to 'Preparing for maintenance' until it will no longer be accessed by any Host of the Data Center.

- Activated the Gluster domain and deactivated the iSCSI domain (while the Gluster domain is still blocked from one of the hosts)

iSCSI domain moved to maintenance successfully.


Verified using:
rhevm-3.6.2.5-0.1.el6.noarch
vdsm-4.17.17-0.el7ev.noarch
