Bug 825732
| Summary: | 3.1 - [ovirt] [storage] unable to recover pool in case meta-data is corrupted | ||||||
|---|---|---|---|---|---|---|---|
| Product: | [Retired] oVirt | Reporter: | Haim <hateya> | ||||
| Component: | vdsm | Assignee: | Eduardo Warszawski <ewarszaw> | ||||
| Status: | CLOSED CURRENTRELEASE | QA Contact: | |||||
| Severity: | high | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | unspecified | CC: | abaron, acathrow, bazulay, dyasny, ewarszaw, iheim, mgoldboi, mkublin, yeylon, ykaul | ||||
| Target Milestone: | --- | Keywords: | TestBlocker | ||||
| Target Release: | 3.2 | ||||||
| Hardware: | x86_64 | ||||||
| OS: | Linux | ||||||
| Whiteboard: | storage | ||||||
| Fixed In Version: | v4.9.6-26.0 | Doc Type: | Bug Fix | ||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2013-02-15 06:46:47 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Attachments: |
|
||||||
Created attachment 587199 [details]
vdsm log
BTW - if i restart vdsm service on host, everything goes back to normal and reconstruct master succeeds As said on BZ#835824: https://bugzilla.redhat.com/show_bug.cgi?id=825732 The getDomains() call during the disconnectStoragePool() was removed in Change-Id: I7c33ecdc2cd21c5afb445a339f400bc990b93998 http://gerrit.ovirt.org/#/c/3929/ Since the MSD domain MD was corrupted the disconnect failed. disconnectStoragePool should success with the actual code. getSpmStatus should induce reconstruct if the MSD MD is broken: http://gerrit.ovirt.org/#/c/6083/3 |
Description of problem: story: - 1 host - 2 storage domains, data, 1 is master - master storage domain meta-data is corrupted - engine tries to reconstruct master domain to second domain * engine send spmStop command to vdsm, vdsm catches exception and returns None * engine send getSpmStatus command to vdsm, vdsm fails with meta-data seal is broken * engine send disconnectStoragePool to vdsm, vdsm fails again with same error as above this flow continues on a loop for ever. Thread-202130::INFO::2012-05-28 12:01:29,487::logUtils::37::dispatcher::(wrapper) Run and protect: getSpmStatus(spUUID='c504b880-23a2-4503-aa11-6cfdf50c98f1', options=None) Thread-202130::DEBUG::2012-05-28 12:01:29,488::lvm::464::OperationMutex::(_invalidatevgs) Operation 'lvm invalidate operation' got the operation mutex Thread-202130::DEBUG::2012-05-28 12:01:29,488::lvm::466::OperationMutex::(_invalidatevgs) Operation 'lvm invalidate operation' released the operation mutex Thread-202130::DEBUG::2012-05-28 12:01:29,488::lvm::475::OperationMutex::(_invalidatelvs) Operation 'lvm invalidate operation' got the operation mutex Thread-202130::DEBUG::2012-05-28 12:01:29,489::lvm::487::OperationMutex::(_invalidatelvs) Operation 'lvm invalidate operation' released the operation mutex Thread-202130::DEBUG::2012-05-28 12:01:29,489::lvm::349::OperationMutex::(_reloadvgs) Operation 'lvm reload operation' got the operation mutex Thread-202130::DEBUG::2012-05-28 12:01:29,490::__init__::1164::Storage.Misc.excCmd::(_log) '/usr/bin/sudo -n /sbin/lvm vgs --config " devices { preferred_names = [\\"^/dev/mapper/\\"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ \\"a%1hateya-upstream-lun101|360014059a3d79162a7c45c8ac96537ba%\\", \\"r%.*%\\" ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 } backup { retain_min = 50 retain_days = 0 } " --noheadings --units b --nosuffix --separator | -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free 795a9f86-d1ef-46e1-b338-288cc725d59f' (cwd None) Thread-202130::DEBUG::2012-05-28 12:01:29,759::__init__::1164::Storage.Misc.excCmd::(_log) SUCCESS: <err> = ''; <rc> = 0 Thread-202130::DEBUG::2012-05-28 12:01:29,762::lvm::376::OperationMutex::(_reloadvgs) Operation 'lvm reload operation' released the operation mutex Thread-202130::DEBUG::2012-05-28 12:01:29,762::persistentDict::226::Storage.PersistentDict::(refresh) read lines (VGTagMetadataRW)=[u'PV0=pv:mpathaj,uuid:5rDU3L-gJ0X-bIl3-RNzZ-A3oW-dLeY-nP2qNQ,pestart:0,pecount:397,mapoffset:0', 'DESCRIPTION=lion-pool', 'TYPE=ISCSI', 'VGUUID=J71m7n-32EB-2dXo-CEWu-2FHV-3e21-V8vKKG', 'LOGBLKSIZE=512', 'LEASERETRIES=3', 'SDUUID=795a9f86-d1ef-46e1-b338-288cc725d59f', 'LOCKRENEWALINTERVALSEC=5', 'PHYBLKSIZE=512', 'VERSION=2', 'CLASS=Data', 'LOCKPOLICY=', 'LEASETIMESEC=60', 'IOOPTIMEOUTSEC=10', 'POOL_DESCRIPTION=new_pool', 'ROLE=Master', 'POOL_UUID=c504b880-23a2-4503-aa11-6cfdf50c98f1', 'MASTER_VERSION=1', u'POOL_DOMAINS=795a9f86-d1ef-46e1-b338-288cc725d59f:Active,86473814-49c4-4259-b5f2-37f1d72f5e90:Active', 'POOL_SPM_ID=1', 'POOL_SPM_LVER=3', '_SHA_CKSUM=e3a9cafa572573a6122352c5as'] Thread-202130::WARNING::2012-05-28 12:01:29,763::persistentDict::264::Storage.PersistentDict::(refresh) data seal is broken metadata declares `e3a9cafa572573a6122352c5as` should be `e3a9cafa572573a6122352c563947a6659f5bd46` (lines={u'PV0': u'pv:mpathaj,uuid:5rDU3L-gJ0X-bIl3-RNzZ-A3oW-dLeY-nP2qNQ,pestart:0,pecount:397,mapoffset:0', 'VERSION': '2', 'LEASETIMESEC': '60', 'DESCRIPTION': 'lion-pool', 'MASTER_VERSION': '1', 'LOCKRENEWALINTERVALSEC': '5', 'VGUUID': 'J71m7n-32EB-2dXo-CEWu-2FHV-3e21-V8vKKG', 'LEASERETRIES': '3', 'POOL_DESCRIPTION': 'new_pool', 'POOL_SPM_LVER': '3', 'SDUUID': '795a9f86-d1ef-46e1-b338-288cc725d59f', 'PHYBLKSIZE': '512', 'ROLE': 'Master', 'LOGBLKSIZE': '512', 'LOCKPOLICY': '', u'POOL_DOMAINS': u'795a9f86-d1ef-46e1-b338-288cc725d59f:Active,86473814-49c4-4259-b5f2-37f1d72f5e90:Active', 'POOL_UUID': 'c504b880-23a2-4503-aa11-6cfdf50c98f1', 'TYPE': 'ISCSI', 'CLASS': 'Data', 'IOOPTIMEOUTSEC': '10', 'POOL_SPM_ID': '1'}) Thread-202130::ERROR::2012-05-28 12:01:29,764::hsm::534::Storage.HSM::(getSpmStatus) Unexpected error Traceback (most recent call last): File "/usr/share/vdsm/storage/hsm.py", line 529, in getSpmStatus status = {'spmStatus':pool.getSpmRole(), 'spmLver': pool.getSpmLver(), 'spmId':pool.getSpmId()} File "/usr/share/vdsm/storage/sp.py", line 133, in getSpmLver return self.getMetaParam(PMDK_LVER) File "/usr/share/vdsm/storage/sp.py", line 1434, in getMetaParam return self._metadata[key] File "/usr/share/vdsm/storage/persistentDict.py", line 85, in __getitem__ return dec(self._dict[key]) File "/usr/share/vdsm/storage/persistentDict.py", line 193, in __getitem__ with self._accessWrapper(): File "/usr/lib64/python2.6/contextlib.py", line 16, in __enter__ return self.gen.next() File "/usr/share/vdsm/storage/persistentDict.py", line 147, in _accessWrapper self.refresh() File "/usr/share/vdsm/storage/persistentDict.py", line 265, in refresh raise se.MetaDataSealIsBroken(declaredChecksum, computedChecksum) MetaDataSealIsBroken: Meta Data seal is broken (checksum mismatch): 'cksum = e3a9cafa572573a6122352c5as, computed_cksum = e3a9cafa572573a6122352c563947a6659f5bd46' Thread-202130::ERROR::2012-05-28 12:01:29,764::task::853::TaskManager.Task::(_setError) Task=`3dbd6b05-d21a-457d-ac9e-deb8d78ae94d`::Unexpected error Traceback (most recent call last): File "/usr/share/vdsm/storage/task.py", line 861, in _run return fn(*args, **kargs) File "/usr/share/vdsm/logUtils.py", line 38, in wrapper res = f(*args, **kwargs) File "/usr/share/vdsm/storage/hsm.py", line 529, in getSpmStatus status = {'spmStatus':pool.getSpmRole(), 'spmLver': pool.getSpmLver(), 'spmId':pool.getSpmId()} File "/usr/share/vdsm/storage/sp.py", line 133, in getSpmLver return self.getMetaParam(PMDK_LVER) File "/usr/share/vdsm/storage/sp.py", line 1434, in getMetaParam return self._metadata[key] File "/usr/share/vdsm/storage/persistentDict.py", line 85, in __getitem__ return dec(self._dict[key]) File "/usr/share/vdsm/storage/persistentDict.py", line 193, in __getitem__ with self._accessWrapper(): File "/usr/lib64/python2.6/contextlib.py", line 16, in __enter__ return self.gen.next() File "/usr/share/vdsm/storage/persistentDict.py", line 147, in _accessWrapper self.refresh() File "/usr/share/vdsm/storage/persistentDict.py", line 265, in refresh raise se.MetaDataSealIsBroken(declaredChecksum, computedChecksum) MetaDataSealIsBroken: Meta Data seal is broken (checksum mismatch): 'cksum = e3a9cafa572573a6122352c5as, computed_cksum = e3a9cafa572573a6122352c563947a6659f5bd46'