| Summary: | [vdsm] Domain is not unmounted if attach fails due to metadata failure | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Jakub Libosvar <jlibosva> | ||||
| Component: | vdsm | Assignee: | Federico Simoncelli <fsimonce> | ||||
| Status: | CLOSED NOTABUG | QA Contact: | yeylon <yeylon> | ||||
| Severity: | medium | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | 6.1 | CC: | abaron, bazulay, fsimonce, iheim, smizrahi, srevivo, ykaul | ||||
| Target Milestone: | rc | ||||||
| Target Release: | --- | ||||||
| Hardware: | All | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2011-06-20 16:13:38 UTC | Type: | --- | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Attachments: |
|
||||||
This is not reproducible on vdsm-4.9-75.el6.x86_64.
A MetaDataSealIsBroken exception is raised and StorageDomainDoesNotExist is correctly returned to rhev-m:
Thread-73::WARNING::2011-06-17 06:16:42,736::persistentDict::242::Storage.PersistentDict::(refresh) data seal is broken metadata declares `2da1e24ba793d69596096cbd21066960c28303a` should be `2da1e24ba793d69596096cbd21066960c28303ae` (lines={'VERSION': '0', 'LEASETIMESEC': '5', 'DESCRIPTION': 'domain 2', 'LOCKPOLICY': '', 'LEASERETRIES': '3', 'SDUUID': '7218f329-b9c1-44e3-a960-964fc89a3aff', 'REMOTE_PATH': 'vm-rhdev1:/srv/nfs/ruthexp1', 'MASTER_VERSION': '0', 'IOOPTIMEOUTSEC': '1', 'ROLE': 'Regular', 'LOCKRENEWALINTERVALSEC': '5', 'POOL_UUID': 'f5a10a36-525e-403d-8169-2ec82c1b4a56', 'TYPE': 'NFS', 'CLASS': 'Data'})
Thread-73::ERROR::2011-06-17 06:16:42,736::sdc::105::Storage.StorageDomainCache::(_findDomain) Error while looking for domain `7218f329-b9c1-44e3-a960-964fc89a3aff`
Traceback (most recent call last):
File "/usr/share/vdsm/storage/sdc.py", line 101, in _findDomain
return mod.findDomain(sdUUID)
File "/usr/share/vdsm/storage/nfsSD.py", line 130, in findDomain
return NfsStorageDomain(NfsStorageDomain.findDomainPath(sdUUID))
File "/usr/share/vdsm/storage/fileSD.py", line 77, in __init__
sdUUID = metadata[sd.DMDK_SDUUID]
File "/usr/share/vdsm/storage/persistentDict.py", line 63, in __getitem__
return dec(self._dict[key])
File "/usr/share/vdsm/storage/persistentDict.py", line 171, in __getitem__
with self._accessWrapper():
File "/usr/lib64/python2.6/contextlib.py", line 16, in __enter__
return self.gen.next()
File "/usr/share/vdsm/storage/persistentDict.py", line 125, in _accessWrapper
self.refresh()
File "/usr/share/vdsm/storage/persistentDict.py", line 243, in refresh
raise se.MetaDataSealIsBroken(declaredChecksum, computedChecksum)
MetaDataSealIsBroken: Meta Data seal is broken (checksum mismatch): 'cksum = 2da1e24ba793d69596096cbd21066960c28303a, computed_cksum = 2da1e24ba793d69596096cbd21066960c28303ae'
Thread-73::ERROR::2011-06-17 06:16:42,740::task::865::TaskManager.Task::(_setError) Unexpected error
Traceback (most recent call last):
File "/usr/share/vdsm/storage/task.py", line 873, in _run
return fn(*args, **kargs)
File "/usr/share/vdsm/storage/spm.py", line 115, in run
return self.func(*args, **kwargs)
File "/usr/share/vdsm/storage/spm.py", line 1128, in public_attachStorageDomain
hsm.HSM.validateSdUUID(sdUUID)
File "/usr/share/vdsm/storage/hsm.py", line 98, in validateSdUUID
SDF.produce(sdUUID=sdUUID).validate()
File "/usr/share/vdsm/storage/sdf.py", line 30, in produce
newSD = cls.__sdc.lookup(sdUUID)
File "/usr/share/vdsm/storage/sdc.py", line 83, in lookup
dom = self._findDomain(sdUUID)
File "/usr/share/vdsm/storage/sdc.py", line 107, in _findDomain
raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: ('7218f329-b9c1-44e3-a960-964fc89a3aff',)
Thread-73::DEBUG::2011-06-17 06:16:42,741::task::492::TaskManager.Task::(_debug) Task 7eb64e10-fe84-4e25-be32-81844757ad79: Task._run: 7eb64e10-fe84-4e25-be32-81844757ad79 ('7218f329-b9c1-44e3-a960-964fc89a3aff', 'f5a10a36-525e-403d-8169-2ec82c1b4a56') {} failed - stopping task
There are no additional looping "Storage domain does not exist" messages of the kind that bug 705058 might have caused.
Since vdsm is returning the correct error message to rhev-m:
{'status': {'message': "Storage domain does not exist: ('7218f329-b9c1-44e3-a960-964fc89a3aff',)", 'code': 358}}
if we expect the storage domain to be unmounted, I suggest moving this bug to the backend.
If this issue is solved in rhev-m, the fix should take bug 694408 into account, as the returned error code may change.
We shouldn't umount, by design. We never did umount. Unmounting may also cause issues if this was the master domain: the md file has become corrupt, but the lease file is still in use. And what about running VMs? |
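The error path discussed in the comments above can be sketched in a few lines of Python. This is an illustrative reconstruction based only on the tracebacks and the response shown here, not vdsm's actual code: `find_domain`, `attach_storage_domain`, the `finders` list and the success response are hypothetical names and assumptions. It shows why the backend sees only code 358: any failure while looking up the domain (including a broken metadata seal) is translated into StorageDomainDoesNotExist, and nothing in this path unmounts anything.

```python
class StorageDomainDoesNotExist(Exception):
    """Stand-in for the vdsm exception of the same name (assumed shape)."""
    code = 358  # error code seen in the response quoted above

    def __init__(self, sd_uuid):
        super().__init__("Storage domain does not exist: %r" % ((sd_uuid,),))
        self.sd_uuid = sd_uuid


def find_domain(sd_uuid, finders):
    """Try each finder in turn; any lookup failure is hidden from the caller.

    `finders` is a hypothetical list of callables, one per storage type.
    """
    for finder in finders:
        try:
            return finder(sd_uuid)
        except Exception:
            # The original error (e.g. a broken metadata seal) is only
            # logged; the caller never sees it.
            continue
    raise StorageDomainDoesNotExist(sd_uuid)


def attach_storage_domain(sd_uuid, finders):
    """Validate the domain and build the response returned to the caller."""
    try:
        find_domain(sd_uuid, finders)
    except StorageDomainDoesNotExist as exc:
        # Note: no unmount is attempted here; only the error is reported.
        return {"status": {"message": str(exc), "code": exc.code}}
    # Assumed success format, for illustration only.
    return {"status": {"message": "OK", "code": 0}}
```

Under these assumptions, a caller that wants the mount cleaned up has to act on code 358 itself, which is consistent with the suggestion to handle this in the backend.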
Created attachment 501798: Backend + vdsm logs

Description of problem:
If one attempts to attach a storage domain that has corrupted metadata (i.e. it is already in use or has a wrong checksum), the relevant error is shown but the domain remains attached to the host. Tested on NFS with an export domain. This leads to vdsm refreshing the domain:
Thread-2915::ERROR::2011-05-30 14:49:16,309::sp::107::Storage.StatsThread::(run) Unexpected error
Traceback (most recent call last):
File "/usr/share/vdsm/storage/sp.py", line 104, in run
self._domain = SDF.produce(self._sdUUID)
File "/usr/share/vdsm/storage/sdf.py", line 32, in produce
raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: ('866d6426-f13a-4cfb-ace5-8ca74a8d477a',)

Version-Release number of selected component (if applicable):
vdsm-4.9-70.el6.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Have an export domain and corrupt its metadata (I erased the last byte of SDUUID)
2. Attach this domain to some data center
3. Wait until the error occurs

Actual results:
Domain is not unmounted from the host

Expected results:
Domain is unmounted

Additional info:
vdsm + backend logs attached
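For readers unfamiliar with the "seal" mentioned in the logs, the following is a minimal sketch of how a checksum seal over key/value metadata can be verified. It is an assumption-laden illustration, not vdsm's persistentDict implementation: the key name `_SHA_CKSUM`, the `KEY=VALUE` line format, the sorted key order and the helper names are all hypothetical. The only detail taken from the log is that the checksums are 40 hex characters long, which matches SHA-1.

```python
import hashlib

CKSUM_KEY = "_SHA_CKSUM"  # hypothetical name for the stored checksum field


class MetaDataSealIsBroken(Exception):
    """Stand-in for the vdsm exception of the same name (assumed shape)."""

    def __init__(self, declared, computed):
        super().__init__(
            "cksum = %s, computed_cksum = %s" % (declared, computed))
        self.declared = declared
        self.computed = computed


def compute_seal(fields):
    """SHA-1 over the KEY=VALUE lines in sorted key order (assumed format)."""
    digest = hashlib.sha1()
    for key in sorted(fields):
        digest.update(("%s=%s" % (key, fields[key])).encode("utf-8"))
    return digest.hexdigest()


def seal(fields):
    """Return a copy of the metadata with its checksum stored alongside."""
    sealed = dict(fields)
    sealed[CKSUM_KEY] = compute_seal(fields)
    return sealed


def verify_seal(sealed):
    """Recompute the checksum and compare it with the declared one."""
    fields = {k: v for k, v in sealed.items() if k != CKSUM_KEY}
    declared = sealed.get(CKSUM_KEY)
    computed = compute_seal(fields)
    if declared != computed:
        raise MetaDataSealIsBroken(declared, computed)
    return fields
```

With this sketch, dropping the last character of the stored checksum (as in the log above, where the declared value is one character shorter than the computed one) or changing any field such as SDUUID makes `verify_seal` raise MetaDataSealIsBroken.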