Created attachment 501798
Backend + vdsm logs

Description of problem:
If one attempts to attach a storage domain whose metadata is corrupted (i.e. it is already in use or has a wrong checksum), the relevant error is shown but the domain remains mounted on the host. Tested on NFS with an export domain.

As a result, vdsm keeps refreshing the domain:

Thread-2915::ERROR::2011-05-30 14:49:16,309::sp::107::Storage.StatsThread::(run) Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/sp.py", line 104, in run
    self._domain = SDF.produce(self._sdUUID)
  File "/usr/share/vdsm/storage/sdf.py", line 32, in produce
    raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: ('866d6426-f13a-4cfb-ace5-8ca74a8d477a',)

Version-Release number of selected component (if applicable):
vdsm-4.9-70.el6.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Have an export domain and corrupt its metadata (I erased the last byte of SDUUID).
2. Attach this domain to some data center.
3. Wait until the error occurs.

Actual results:
The domain is not unmounted from the host.

Expected results:
The domain is unmounted.

Additional info:
vdsm + backend logs attached.
This is not reproducible on vdsm-4.9-75.el6.x86_64. A MetaDataSealIsBroken exception is raised and StorageDomainDoesNotExist is correctly returned to rhev-m:

Thread-73::WARNING::2011-06-17 06:16:42,736::persistentDict::242::Storage.PersistentDict::(refresh) data seal is broken metadata declares `2da1e24ba793d69596096cbd21066960c28303a` should be `2da1e24ba793d69596096cbd21066960c28303ae` (lines={'VERSION': '0', 'LEASETIMESEC': '5', 'DESCRIPTION': 'domain 2', 'LOCKPOLICY': '', 'LEASERETRIES': '3', 'SDUUID': '7218f329-b9c1-44e3-a960-964fc89a3aff', 'REMOTE_PATH': 'vm-rhdev1:/srv /nfs/ruthexp1', 'MASTER_VERSION': '0', 'IOOPTIMEOUTSEC': '1', 'ROLE': 'Regular', 'LOCKRENEWALINTERVALSEC': '5', 'POOL_UUID': 'f5a10a36-525e-403d-8169-2ec82c1b4a56', 'TYPE': 'NFS', 'CLASS': 'Data'})
Thread-73::ERROR::2011-06-17 06:16:42,736::sdc::105::Storage.StorageDomainCache::(_findDomain) Error while looking for domain `7218f329-b9c1-44e3-a960-964fc89a3aff`
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/sdc.py", line 101, in _findDomain
    return mod.findDomain(sdUUID)
  File "/usr/share/vdsm/storage/nfsSD.py", line 130, in findDomain
    return NfsStorageDomain(NfsStorageDomain.findDomainPath(sdUUID))
  File "/usr/share/vdsm/storage/fileSD.py", line 77, in __init__
    sdUUID = metadata[sd.DMDK_SDUUID]
  File "/usr/share/vdsm/storage/persistentDict.py", line 63, in __getitem__
    return dec(self._dict[key])
  File "/usr/share/vdsm/storage/persistentDict.py", line 171, in __getitem__
    with self._accessWrapper():
  File "/usr/lib64/python2.6/contextlib.py", line 16, in __enter__
    return self.gen.next()
  File "/usr/share/vdsm/storage/persistentDict.py", line 125, in _accessWrapper
    self.refresh()
  File "/usr/share/vdsm/storage/persistentDict.py", line 243, in refresh
    raise se.MetaDataSealIsBroken(declaredChecksum, computedChecksum)
MetaDataSealIsBroken: Meta Data seal is broken (checksum mismatch): 'cksum = 2da1e24ba793d69596096cbd21066960c28303a, computed_cksum = 2da1e24ba793d69596096cbd21066960c28303ae'
Thread-73::ERROR::2011-06-17 06:16:42,740::task::865::TaskManager.Task::(_setError) Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/storage/spm.py", line 115, in run
    return self.func(*args, **kwargs)
  File "/usr/share/vdsm/storage/spm.py", line 1128, in public_attachStorageDomain
    hsm.HSM.validateSdUUID(sdUUID)
  File "/usr/share/vdsm/storage/hsm.py", line 98, in validateSdUUID
    SDF.produce(sdUUID=sdUUID).validate()
  File "/usr/share/vdsm/storage/sdf.py", line 30, in produce
    newSD = cls.__sdc.lookup(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 83, in lookup
    dom = self._findDomain(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 107, in _findDomain
    raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: ('7218f329-b9c1-44e3-a960-964fc89a3aff',)
Thread-73::DEBUG::2011-06-17 06:16:42,741::task::492::TaskManager.Task::(_debug) Task 7eb64e10-fe84-4e25-be32-81844757ad79: Task._run: 7eb64e10-fe84-4e25-be32-81844757ad79 ('7218f329-b9c1-44e3-a960-964fc89a3aff', 'f5a10a36-525e-403d-8169-2ec82c1b4a56') {} failed - stopping task

There are no additional looping "Storage domain does not exist" messages of the kind that bug 705058 might have caused. vdsm returns the correct error to rhev-m:

{'status': {'message': "Storage domain does not exist: ('7218f329-b9c1-44e3-a960-964fc89a3aff',)", 'code': 358}}

So if the storage domain is expected to be unmounted, I suggest moving this bug to the backend.
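To illustrate the seal check behind MetaDataSealIsBroken, here is a minimal Python sketch. It is not vdsm's code. It assumes the seal is a SHA-1 hex digest computed over the sorted KEY=VALUE metadata lines, excluding the checksum entry itself. The 40-hex-digit computed value in the log is consistent with SHA-1, but the line format, the ordering, and the `_SHA_CKSUM` key name used here are assumptions. `compute_seal`, `seal` and `verify_seal` are hypothetical helpers. The demo drops the last character of the declared checksum, which mirrors the log above, where the declared value is one character shorter than the computed one.

```python
import hashlib

# Assumed name of the metadata key that stores the seal (hypothetical).
CHECKSUM_KEY = "_SHA_CKSUM"


class MetaDataSealIsBroken(Exception):
    """Local stand-in for the vdsm exception of the same name."""

    def __init__(self, declared, computed):
        super().__init__(
            "Meta Data seal is broken (checksum mismatch): "
            "'cksum = %s, computed_cksum = %s'" % (declared, computed))
        self.declared = declared
        self.computed = computed


def compute_seal(metadata):
    """SHA-1 over sorted KEY=VALUE lines, skipping the checksum entry (assumed format)."""
    digest = hashlib.sha1()
    for key in sorted(metadata):
        if key == CHECKSUM_KEY:
            continue
        digest.update(("%s=%s" % (key, metadata[key])).encode("utf-8"))
    return digest.hexdigest()


def seal(metadata):
    """Return a copy of the metadata with the seal stored under CHECKSUM_KEY."""
    sealed = dict(metadata)
    sealed[CHECKSUM_KEY] = compute_seal(metadata)
    return sealed


def verify_seal(metadata):
    """Raise MetaDataSealIsBroken if the declared seal differs from the computed one."""
    declared = metadata.get(CHECKSUM_KEY, "")
    computed = compute_seal(metadata)
    if declared != computed:
        raise MetaDataSealIsBroken(declared, computed)
    return metadata


if __name__ == "__main__":
    md = seal({"SDUUID": "7218f329-b9c1-44e3-a960-964fc89a3aff",
               "ROLE": "Regular", "CLASS": "Data"})
    verify_seal(md)  # intact metadata passes
    md[CHECKSUM_KEY] = md[CHECKSUM_KEY][:-1]  # truncate the declared seal
    try:
        verify_seal(md)
    except MetaDataSealIsBroken as exc:
        print(exc)
```

In this model, changing any metadata value or the stored checksum breaks the seal. That is why a domain with damaged metadata cannot be loaded and the lookup surfaces as StorageDomainDoesNotExist.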
If this issue is fixed in rhev-m, the fix should take bug 694408 into account, since the returned error code may change.
By design, we shouldn't unmount, and we never did. Unmounting could also cause issues if this were the master domain: the metadata file has become corrupt, but the lease file is still in use. And what about running VMs?