Bug 807687 - vdsm: hsm becomes non-operational after activation if changes were made to master domain or its version while host was in maintenance
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: vdsm
Version: 6.3
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Eduardo Warszawski
QA Contact: Jakub Libosvar
URL:
Whiteboard: storage
Depends On:
Blocks:
Reported: 2012-03-28 13:56 UTC by Dafna Ron
Modified: 2022-07-09 05:34 UTC
11 users

Fixed In Version: vdsm-4.9.6-10
Doc Type: Bug Fix
Doc Text:
Previously, due to an issue with pool metadata not refreshing correctly, attempting to put the HSM host into maintenance mode while reconstructing the master domain would result in the changes not being updated in the HSM. The pool metadata issue has been solved so that any changes and updates applied to the HSM in maintenance mode will be retained when it is reactivated.
Clone Of:
Environment:
Last Closed: 2012-12-04 18:56:56 UTC
Target Upstream Version:
Embargoed:


Attachments
logs (790.20 KB, application/x-gzip)
2012-03-28 13:57 UTC, Dafna Ron
no flags


Links
System: Red Hat Product Errata
ID: RHSA-2012:1508
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: rhev-3.1.0 vdsm security, bug fix, and enhancement update
Last Updated: 2012-12-04 23:48:05 UTC

Description Dafna Ron 2012-03-28 13:56:27 UTC
Description of problem:

If the HSM host is put into maintenance while the master domain is being reconstructed, the changes are not propagated to the HSM.

This can be triggered simply by putting the master domain into maintenance while the HSM is also in maintenance.

If the master domain ends up being the same domain as before (so only the master version changes), the version is not updated either, and the HSM gets the wrong master domain or version.

The backend sends disconnectStorageServer, so the domain should be disconnected.

Version-Release number of selected component (if applicable):

vdsm-4.9.6-4.5.x86_64

How reproducible:

100%

Steps to Reproduce:
1. In a two-host cluster, add two storage domains.
2. Put the HSM into maintenance, and put the master domain into maintenance as well (so that the second domain becomes master).
3. Activate the HSM -> the host becomes non-operational with a "can't find master domain" error.
4. Put the HSM into maintenance again.
5. Put the new master domain into maintenance so that the old domain becomes master again.
6. Activate the HSM -> the host becomes non-operational with a "wrong master version" error.
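The failure sequence in the steps above can be modelled with a toy Python sketch. It assumes, for illustration only, that the HSM validates the requested master domain and master version against a metadata snapshot taken at its last connection; `SharedStorage`, `HsmHost`, `MasterNotFound`, `WrongMaster` and `refresh_on_connect` are hypothetical names, not vdsm's API. With a stale snapshot, the two activations fail the same way as steps 3 and 6 (step 6 expects version 3 but sees version 1, as in the log below). Re-reading the metadata on connect, roughly the behaviour the Doc Text describes after the fix, lets both activations succeed.

```python
class MasterNotFound(Exception):
    """Hypothetical stand-in for the "can't find master domain" failure."""


class WrongMaster(Exception):
    """Hypothetical stand-in for "Wrong Master domain or its version"."""


class SharedStorage:
    """What is actually on the storage: which domain is master, and its version."""

    def __init__(self, master):
        self.master = master
        self.version = 1

    def reconstruct(self, new_master):
        # Putting the current master into maintenance moves the master role
        # to another domain and bumps the master version.
        self.master = new_master
        self.version += 1

    def read(self):
        return {"master": self.master, "version": self.version}


class HsmHost:
    def __init__(self, storage, refresh_on_connect):
        self.storage = storage
        self.refresh_on_connect = refresh_on_connect
        self.cache = storage.read()  # snapshot taken at the last connection

    def connect_pool(self, msd_uuid, master_version):
        if self.refresh_on_connect:
            self.cache = self.storage.read()  # re-read instead of trusting the snapshot
        if self.cache["master"] != msd_uuid:
            raise MasterNotFound(msd_uuid)
        if self.cache["version"] != master_version:
            raise WrongMaster(
                f"expected version {master_version}, it is version {self.cache['version']}"
            )
        return "connected"


def activate(host, storage):
    """The engine asks the HSM to connect with the current master and version."""
    try:
        return host.connect_pool(storage.master, storage.version)
    except (MasterNotFound, WrongMaster) as err:
        return f"non-operational: {type(err).__name__}"


def reproduce(refresh_on_connect):
    storage = SharedStorage(master="A")
    hsm = HsmHost(storage, refresh_on_connect)
    results = []
    storage.reconstruct("B")  # step 2: HSM and master A in maintenance, B becomes master
    results.append(activate(hsm, storage))  # step 3
    storage.reconstruct("A")  # steps 4-5: HSM and master B in maintenance, A is master again
    results.append(activate(hsm, storage))  # step 6
    return results


print(reproduce(refresh_on_connect=False))
# ['non-operational: MasterNotFound', 'non-operational: WrongMaster']
print(reproduce(refresh_on_connect=True))
# ['connected', 'connected']
```

The model deliberately ignores storage connections and locking; it only shows why a snapshot that is never refreshed produces first a missing master and then a version mismatch.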
  
Actual results:

The HSM is not updated with changes made to the master domain while it is disconnected from the pool.
When we activate the HSM after the master has moved to a different domain, we get "can't find master"; when the master version has changed (because the master was put into maintenance), we get "wrong master version".

Expected results:

The HSM should be updated with these changes when it is activated.

Additional info: full logs from both hosts and the backend will be attached.

Thread-350::INFO::2012-03-28 15:11:09,899::logUtils::37::dispatcher::(wrapper) Run and protect: disconnectStoragePool(spUUID='8ed78e50-b61e-4b84-a5b7-7c17f76f16a5', hostID=2, scsiKey='8ed78e50-b61e-4b84-a5b7-7c17f76f16a5', remove=False, options=None)

Thread-351::INFO::2012-03-28 15:11:09,954::logUtils::37::dispatcher::(wrapper) Run and protect: disconnectStorageServer(domType=3, spUUID='8ed78e50-b61e-4b84-a5b7-7c17f76f16a5', conList=[{'connection': '10.35.64.106', 'iqn': 'iqn.1986-03.com.sun:02:dafna112713222714816', 'portal': '1', 'user': '', 'password': '******', 'id': '7c2518d4-f9b6-493f-b32e-fcf1668264b3', 'port': '3260'}, {'connection': '10.35.64.10', 'iqn': 'Dafna-big', 'portal': '1', 'user': '', 'password': '******', 'id': 'b1f1a8b2-f42d-4a33-8bf5-80f39f3d04a7', 'port': '3260'}], options=None)


Thread-347::ERROR::2012-03-28 15:07:51,354::sp::1456::Storage.StoragePool::(getMasterDomain) Requested master domain 83a46a9e-dac2-4513-bb21-a33ff76a495a does not have expected version 3 it is version 1
Thread-347::DEBUG::2012-03-28 15:07:51,355::resourceManager::538::ResourceManager::(releaseResource) Trying to release resource 'Storage.8ed78e50-b61e-4b84-a5b7-7c17f76f16a5'
Thread-347::DEBUG::2012-03-28 15:07:51,355::resourceManager::553::ResourceManager::(releaseResource) Released resource 'Storage.8ed78e50-b61e-4b84-a5b7-7c17f76f16a5' (0 active users)
Thread-347::DEBUG::2012-03-28 15:07:51,356::resourceManager::558::ResourceManager::(releaseResource) Resource 'Storage.8ed78e50-b61e-4b84-a5b7-7c17f76f16a5' is free, finding out if anyone is waiting for it.
Thread-347::DEBUG::2012-03-28 15:07:51,356::resourceManager::565::ResourceManager::(releaseResource) No one is waiting for resource 'Storage.8ed78e50-b61e-4b84-a5b7-7c17f76f16a5', Clearing records.
Thread-347::ERROR::2012-03-28 15:07:51,357::task::853::TaskManager.Task::(_setError) Task=`8c7b0f3f-4a37-403b-8d67-4ab0f0b44b13`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 861, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 38, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 813, in connectStoragePool
    return self._connectStoragePool(spUUID, hostID, scsiKey, msdUUID, masterVersion, options)
  File "/usr/share/vdsm/storage/hsm.py", line 855, in _connectStoragePool
    res = pool.connect(hostID, scsiKey, msdUUID, masterVersion)
  File "/usr/share/vdsm/storage/sp.py", line 641, in connect
    self.__rebuild(msdUUID=msdUUID, masterVersion=masterVersion)
  File "/usr/share/vdsm/storage/sp.py", line 1107, in __rebuild
    self.masterDomain = self.getMasterDomain(msdUUID=msdUUID, masterVersion=masterVersion)
  File "/usr/share/vdsm/storage/sp.py", line 1457, in getMasterDomain
    raise se.StoragePoolWrongMaster(self.spUUID, msdUUID)
StoragePoolWrongMaster: Wrong Master domain or its version: 'SD=83a46a9e-dac2-4513-bb21-a33ff76a495a, pool=8ed78e50-b61e-4b84-a5b7-7c17f76f16a5'
Thread-347::DEBUG::2012-03-28 15:07:51,358::task::872::TaskManager.Task::(_run) Task=`8c7b0f3f-4a37-403b-8d67-4ab0f0b44b13`::Task._run: 8c7b0f3f-4a37-403b-8d67-4ab0f0b44b13 ('8ed78e50-b61e-4b84-a5b7-7c17f76f16a5', 2, '8ed78e50-b61e-4b84-a5b7-7c17f76f16a5', '83a46a9e-dac2-4513-bb21-a33ff76a495a', 3) {} failed - stopping task
Thread-347::DEBUG::2012-03-28 15:07:51,358::task::1199::TaskManager.Task::(stop) Task=`8c7b0f3f-4a37-403b-8d67-4ab0f0b44b13`::stopping in state preparing (force False)
Thread-347::DEBUG::2012-03-28 15:07:51,359::task::978::TaskManager.Task::(_decref) Task=`8c7b0f3f-4a37-403b-8d67-4ab0f0b44b13`::ref 1 aborting True
Thread-347::INFO::2012-03-28 15:07:51,359::task::1157::TaskManager.Task::(prepare) Task=`8c7b0f3f-4a37-403b-8d67-4ab0f0b44b13`::aborting: Task is aborted: 'Wrong Master domain or its version' - code 324
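When triaging logs like the excerpt above, it can help to split each vdsm line into its fields and pull out the errors. The sketch below assumes the `Thread::LEVEL::timestamp::module::lineno::logger::(function) message` layout seen in this excerpt; it is not an official vdsm tool, and `parse_vdsm_line`, `errors` and `VdsmLogRecord` are hypothetical names. Lines without that layout, such as traceback lines, are skipped.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class VdsmLogRecord:
    thread: str
    level: str
    timestamp: datetime
    module: str
    lineno: int
    logger: str
    func: str
    message: str


def parse_vdsm_line(line: str) -> Optional[VdsmLogRecord]:
    """Split one log line laid out as
    Thread-N::LEVEL::YYYY-MM-DD HH:MM:SS,mmm::module::lineno::logger::(func) message
    Returns None for lines without that layout (e.g. traceback lines)."""
    # maxsplit=6 keeps any '::' inside the message (e.g. "Task=`...`::Unexpected error")
    parts = line.rstrip("\n").split("::", 6)
    if len(parts) != 7:
        return None
    thread, level, ts, module, lineno, logger, rest = parts
    close = rest.find(")")
    if not rest.startswith("(") or close == -1 or not lineno.isdigit():
        return None
    try:
        timestamp = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S,%f")
    except ValueError:
        return None
    return VdsmLogRecord(thread, level, timestamp, module, int(lineno), logger,
                         func=rest[1:close], message=rest[close + 1:].strip())


def errors(lines: List[str]) -> List[VdsmLogRecord]:
    """Keep only the ERROR records, in log order."""
    records = (parse_vdsm_line(line) for line in lines)
    return [r for r in records if r is not None and r.level == "ERROR"]


sample = [
    "Thread-347::ERROR::2012-03-28 15:07:51,354::sp::1456::Storage.StoragePool::(getMasterDomain) "
    "Requested master domain 83a46a9e-dac2-4513-bb21-a33ff76a495a does not have expected version 3 it is version 1",
    '  File "/usr/share/vdsm/storage/sp.py", line 1457, in getMasterDomain',
]
for rec in errors(sample):
    print(rec.func, "->", rec.message)
```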

Comment 1 Dafna Ron 2012-03-28 13:57:32 UTC
Created attachment 573357 [details]
logs

Comment 2 Eduardo Warszawski 2012-05-02 13:42:38 UTC
The pool metadata is not refreshed on the HSM because of an sdc (storage domain cache) issue.

In addition:
The host reaches this situation after many misleading operations by the engine, such as disconnecting the storage and then trying to connect the pool.
The 2nd SD in the pool, 78627e5c-87f7-4492-bc41-a832c5955492, was unreachable throughout the whole log.
In spite of that, it was chosen as master, and an attempt was made to connect this HSM to this unreachable master.
The flow should be revised.
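The comment above only says the flow should be revised; it does not say how. As one hypothetical illustration, a reconstruct step could refuse to elect a domain that is not reachable from every host, such as 78627e5c-87f7-4492-bc41-a832c5955492 here. `choose_new_master`, the "Active" status string and the reachability map are assumptions for illustration, not engine or vdsm code.

```python
from typing import Dict, Optional


def choose_new_master(current_master: str,
                      domain_status: Dict[str, str],
                      reachable_by_all_hosts: Dict[str, bool]) -> Optional[str]:
    """Hypothetical election rule: only an active domain that every host can
    reach may replace the current master. Returns None if there is no candidate."""
    candidates = sorted(
        sd for sd, status in domain_status.items()
        if sd != current_master
        and status == "Active"
        and reachable_by_all_hosts.get(sd, False)
    )
    # Sorting only makes the choice deterministic; it is not a priority rule.
    return candidates[0] if candidates else None


unreachable = "78627e5c-87f7-4492-bc41-a832c5955492"  # unreachable throughout the log
status = {"old-master": "Maintenance", unreachable: "Active"}
reachable = {unreachable: False}
print(choose_new_master("old-master", status, reachable))  # None: do not reconstruct onto it
```

Returning None leaves the decision of what to do next (for example, refusing the maintenance request) to the caller, rather than silently picking a domain that hosts cannot connect to.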

http://gerrit.ovirt.org/#change,4085

Comment 4 Jakub Libosvar 2012-05-09 16:22:49 UTC
Verified using vdsm-4.9.6-10.el6.x86_64

Comment 7 errata-xmlrpc 2012-12-04 18:56:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2012-1508.html

