+++ This bug is a downstream clone. The original bug is: +++
+++   bug 1446492 +++
======================================================================

Description of problem:

If the customer's LVM was manually restored without the --metadatacopies switch, it will fail to activate in RHV 4.1 because of a new check added as part of the StorageDomain.getInfo flow (https://gerrit.ovirt.org/#/c/64433/), where vdsm reports to the engine which PV holds the active LVM metadata. Since the default number of metadata copies in LVM is 1, it fails with the error below.

=====
2017-04-28 01:55:51,276+0530 ERROR (jsonrpc/1) [storage.StoragePool] Couldn't read from master domain (sp:1393)
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/sp.py", line 1391, in getInfo
    msdInfo = self.masterDomain.getInfo()
  File "/usr/share/vdsm/storage/blockSD.py", line 1243, in getInfo
    info['vgMetadataDevice'] = self._manifest.getVgMetadataDevice()
  File "/usr/share/vdsm/storage/blockSD.py", line 487, in getVgMetadataDevice
    return os.path.basename(lvm.getVgMetadataPv(self.sdUUID))
  File "/usr/share/vdsm/storage/lvm.py", line 1430, in getVgMetadataPv
    (vgName, pvs))
UnexpectedVolumeGroupMetadata: Volume Group metadata isn't as expected: "reason=Expected one metadata pv in vg: 3cb67522-1df1-47e3-8e85-2c116e500590, vg pvs: [PV(uuid='s8jUp5-qetw-bvg9-3pyY-YWfp-qGGb-K4NyIQ', name='/dev/mapper/360014054ae1fdf75b074e488e9d803bd', size='12482248704', vg_name='3cb67522-1df1-47e3-8e85-2c116e500590', vg_uuid='vCWEue-hbwr-KL0X-D214-gUoQ-RCex-LtkGJA', pe_start='135266304', pe_count='93', pe_alloc_count='0', mda_count='1', dev_size='12884901888', mda_used_count='1', guid='360014054ae1fdf75b074e488e9d803bd'), PV(uuid='z3YlYS-sSfq-cRQF-Qb23-IJbh-LxAR-AmXuqJ', name='/dev/mapper/360014053f404fa44d844d9198cfee437', size='52210696192', vg_name='3cb67522-1df1-47e3-8e85-2c116e500590', vg_uuid='vCWEue-hbwr-KL0X-D214-gUoQ-RCex-LtkGJA', pe_start='135266304', pe_count='389', pe_alloc_count='47', mda_count='1', dev_size='52613349376', mda_used_count='1', guid='360014053f404fa44d844d9198cfee437')]"

Relevant code:

def getVgMetadataPv(vgName):
    pvs = _lvminfo.getPvs(vgName)
    mdpvs = [pv for pv in pvs
             if not isinstance(pv, Stub) and _isMetadataPv(pv)]
    if len(mdpvs) != 1:
        raise se.UnexpectedVolumeGroupMetadata(
            "Expected one metadata pv in vg: %s, vg pvs: %s" % (vgName, pvs))
    return mdpvs[0].name

def _isMetadataPv(pv):
    return pv.mda_used_count == '2'
====

Since the number of metadata copies cannot be increased after a PV is created, the only option is to restore the metadata again with the correct options, which requires a complete downtime of the VMs. Also, vdsm expects exactly one PV with active metadata; metadata on the other PVs must be disabled with "pvchange --metadataignore y", or the same error as above is raised even if the PVs were created with 2 metadata copies.

Version-Release number of selected component (if applicable):
vdsm-4.19.10.1-1.el7ev.x86_64

How reproducible:
100%

Steps to Reproduce:
Restore the LVM metadata manually without --metadatacopies.

Actual results:
The storage domain goes offline if the LVM metadata was restored manually.

Expected results:
Don't fail the storage domain if mda_used_count is 1 for the PV.

Additional info:

(Originally by Nijin Ashok)
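[Editorial note, not part of the original comment] The layout vdsm expects, per the code quoted above, is exactly one PV whose two metadata areas are in use, with metadata ignored on all other PVs. A minimal sketch of how an administrator could check an existing VG against that expectation, assuming the lvm2 report fields pv_mda_count and pv_mda_used_count (the same counters shown as mda_count/mda_used_count in the PV tuples above):

import subprocess
import sys

def metadata_pvs(vg_name):
    """Return (pv_name, mda_count, mda_used_count) for the PVs of
    vg_name whose metadata areas are in use."""
    out = subprocess.check_output([
        "pvs", "--noheadings", "--separator", "|",
        "-o", "pv_name,vg_name,pv_mda_count,pv_mda_used_count",
    ]).decode()
    result = []
    for line in out.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) != 4:
            continue
        pv_name, vg, mda_count, mda_used = fields
        if vg == vg_name and mda_used != "0":
            result.append((pv_name, int(mda_count), int(mda_used)))
    return result

if __name__ == "__main__":
    pvs = metadata_pvs(sys.argv[1])
    for name, count, used in pvs:
        print("%s mda_count=%d mda_used_count=%d" % (name, count, used))
    # getVgMetadataPv() requires exactly one such PV, and _isMetadataPv()
    # requires mda_used_count == 2 on it.
    ok = len(pvs) == 1 and pvs[0][2] == 2
    print("matches vdsm's expected layout: %s" % ok)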
(In reply to nijin ashok from comment #0)
> Description of problem:
>
> If the customer's LVM was manually restored without --metadatacopies switch,
> then it will fail to activate in RHV 4.1 as we are having new check as a
> part of StorageDomain.getInfo flow ( https://gerrit.ovirt.org/#/c/64433/ )
> where vdsm reports PV which is having active LVM metadata to engine.

This is expected. RHV supports only the RHV storage domain format, and this storage domain is not in the correct format.

> As we can't increase the number of metadata copies after creating PV, then
> the only option will be to restore the metadata again with correct options
> which needs a complete downtime of the VMs.

Yes, this is what should be done: recreate the VG with the correct options.

> Also vdsm is only expecting 1 PV with active metadata and others should be
> disabled with pvchange --metadataignore y or else it will fail with same
> error as above even if it's created with 2 metadata copies.

Right, this is also part of the format.

> Expected results:
>
> Don't fail the storage domain if mda_used_count is 1 for the PV.

I don't think this expectation is feasible. We never tested this format, and we cannot guarantee that anything will work with such a storage domain. The best thing we can do is to fail to activate this storage domain.

The new checks are required for removing PVs from a storage domain (a new feature in 4.1). We can consider not failing to activate such a storage domain and disabling this feature instead, but I don't see how we can support such a system.

Liron, can the engine treat the new metadata keys as optional, and disable removing PVs (with a warning) if the info is not available?

(Originally by Nir Soffer)
(In reply to Nir Soffer from comment #6)
> Liron, can engine treat the new metadata keys as optional, and disable
> removing pvs (with a warning) if the info is not available?

A quick review of the engine's code seems to show that this is indeed the behavior. I think we're OK from the engine's side.

(Originally by Allon Mureinik)
I reproduced this issue by modifying vdsm not to use the --metadatacopies and --metadataignore options when creating a storage domain, so it creates an invalid storage domain with the same configuration as a badly restored storage domain.

This is how the vdsm log looks now when we find such a storage domain:

1. Starting getStorageDomainInfo request:

2017-05-08 15:24:56,054+0300 INFO (jsonrpc/5) [dispatcher] Run and protect: getStorageDomainInfo(sdUUID=u'e529906a-f20f-4ad7-99e6-20242678a58e', options=None) (logUtils:51)

2. Warning about the unsupported storage domain:

2017-05-08 15:24:56,258+0300 WARN (jsonrpc/5) [storage.StorageDomain] Cannot get metadata device, this storage domain is unsupported: Volume Group metadata isn't as expected: u"reason=Expected one metadata pv in vg: e529906a-f20f-4ad7-99e6-20242678a58e, vg pvs: [PV(uuid='CTcOe0-uqQJ-c3lk-02SV-PYGf-cPvS-fPBjZ8', name='/dev/mapper/360014052e489b6f5ed34881ac5ed27fd', size='53418655744', vg_name='e529906a-f20f-4ad7-99e6-20242678a58e', vg_uuid='CXMs11-xiGr-JzAU-5PtP-PW1j-bkjO-2jKlgc', pe_start='142606336', pe_count='398', pe_alloc_count='39', mda_count='1', dev_size='53687091200', mda_used_count='1', guid='360014052e489b6f5ed34881ac5ed27fd'), PV(uuid='hGmFCu-i6Hm-LylO-EFLX-I5p2-hxWW-GyoTfZ', name='/dev/mapper/360014052de8e2b8007944a1a93a82c40', size='53418655744', vg_name='e529906a-f20f-4ad7-99e6-20242678a58e', vg_uuid='CXMs11-xiGr-JzAU-5PtP-PW1j-bkjO-2jKlgc', pe_start='142606336', pe_count='398', pe_alloc_count='0', mda_count='1', dev_size='53687091200', mda_used_count='1', guid='360014052de8e2b8007944a1a93a82c40'), PV(uuid='bWau5Z-C5Vb-Vt0G-Wjwp-2gYP-8X6m-Unocp3', name='/dev/mapper/36001405bfc49549313946daa720cb1ba', size='53418655744', vg_name='e529906a-f20f-4ad7-99e6-20242678a58e', vg_uuid='CXMs11-xiGr-JzAU-5PtP-PW1j-bkjO-2jKlgc', pe_start='142606336', pe_count='398', pe_alloc_count='0', mda_count='1', dev_size='53687091200', mda_used_count='1', guid='36001405bfc49549313946daa720cb1ba')]" (blockSD:1243)

3. Returning response without the vgMetadataDevice key:

2017-05-08 15:24:56,258+0300 INFO (jsonrpc/5) [dispatcher] Run and protect: getStorageDomainInfo, Return response: {'info': {'uuid': u'e529906a-f20f-4ad7-99e6-20242678a58e', 'vguuid': 'CXMs11-xiGr-JzAU-5PtP-PW1j-bkjO-2jKlgc', 'metadataDevice': '360014052e489b6f5ed34881ac5ed27fd', 'state': 'OK', 'version': '4', 'role': 'Regular', 'type': 'ISCSI', 'class': 'Data', 'pool': [], 'name': 'bad-sd'}} (logUtils:54)

(Originally by Nir Soffer)
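[Editorial note, not part of the original comment] The warning followed by a trimmed response suggests the fix simply treats vgMetadataDevice as an optional key in getInfo. A minimal sketch of that pattern; the stub exception and domain_info() are illustrative only, not the actual vdsm patch:

import logging

log = logging.getLogger("storage.StorageDomain")


class UnexpectedVolumeGroupMetadata(Exception):
    """Stand-in for vdsm's se.UnexpectedVolumeGroupMetadata."""


def domain_info(manifest, info):
    """Add the optional vgMetadataDevice key to a getInfo result,
    tolerating domains whose PV metadata layout is not the RHV format."""
    try:
        info["vgMetadataDevice"] = manifest.getVgMetadataDevice()
    except UnexpectedVolumeGroupMetadata as e:
        # Domain restored without --metadatacopies 2 / pvchange
        # --metadataignore y: warn and omit the key instead of failing
        # the whole StorageDomain.getInfo call.
        log.warning("Cannot get metadata device, this storage domain is "
                    "unsupported: %s", e)
    return info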
Testing this change:

1. Create an invalid storage domain with one used metadata area on every PV
2. Run StorageDomain getInfo:

# vdsm-client StorageDomain getInfo storagedomainID=e529906a-f20f-4ad7-99e6-20242678a58e
{
    "uuid": "e529906a-f20f-4ad7-99e6-20242678a58e",
    "vguuid": "CXMs11-xiGr-JzAU-5PtP-PW1j-bkjO-2jKlgc",
    "metadataDevice": "360014052e489b6f5ed34881ac5ed27fd",
    "state": "OK",
    "version": "4",
    "role": "Regular",
    "type": "ISCSI",
    "class": "Data",
    "pool": [
        "6c99f4e5-8588-46f5-a818-e11151c1d19c"
    ],
    "name": "bad-sd"
}

The call should succeed, not returning the vgMetadataDevice key. Here is the same request for a good sd:

# vdsm-client StorageDomain getInfo storagedomainID=aed577ea-d1ca-4ebe-af80-f852c7ce59bb
{
    "uuid": "aed577ea-d1ca-4ebe-af80-f852c7ce59bb",
    "type": "ISCSI",
    "vguuid": "7T9sFi-okfz-JZON-xDUK-n0vH-OpyH-L7IjKO",
    "metadataDevice": "360014052761af2654a94a70a60a7ee3f",
    "state": "OK",
    "version": "4",
    "role": "Master",
    "vgMetadataDevice": "360014052761af2654a94a70a60a7ee3f",
    "class": "Data",
    "pool": [
        "6c99f4e5-8588-46f5-a818-e11151c1d19c"
    ],
    "name": "dumbo-iscsi-01"
}

(Originally by Nir Soffer)
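[Editorial note, not part of the original comment] On the consumer side, the comments above indicate the engine already treats the key as optional and only disables PV removal when it is missing. A minimal sketch of that handling in Python (the real engine is Java; supports_pv_removal() and the warning text are hypothetical):

def supports_pv_removal(sd_info):
    """Decide whether PV removal (new 4.1 feature) can be offered for a
    domain, based on a StorageDomain.getInfo result."""
    device = sd_info.get("vgMetadataDevice")
    if device is None:
        # Key missing: domain metadata layout is unsupported, so only
        # this feature is disabled; activation still succeeds.
        print("warning: vgMetadataDevice not reported; disabling PV "
              "removal for domain %s" % sd_info["uuid"])
        return False
    return True


good_sd = {"uuid": "aed577ea-d1ca-4ebe-af80-f852c7ce59bb",
           "vgMetadataDevice": "360014052761af2654a94a70a60a7ee3f"}
bad_sd = {"uuid": "e529906a-f20f-4ad7-99e6-20242678a58e"}

assert supports_pv_removal(good_sd)
assert not supports_pv_removal(bad_sd)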
WARN: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[Found clone flags: ['rhevm-4.1.z', 'rhevm-4.2-ga'], ]

For more info please contact: rhv-devops

(Originally by rhev-integ)
Tal, this bug should be ready for QA, but the bot is complaining about the flags, see comment 16. Can you help with this?

(Originally by Nir Soffer)
Verified with the following code:
------------------------------------------
ovirt-engine-4.1.2.2-0.1.el7.noarch
rhevm-4.1.2.2-0.1.el7.noarch
vdsm-4.19.14-1.el7ev.x86_64

Verified with the following scenario:
-----------------------------------------
1. Created an invalid storage domain with one used metadata area on every PV
2. Created a VM on the storage domain
3. Set the storage domain to maintenance
4. Activated the storage domain
5. Started the VM previously created
6. Created a new VM on the storage domain

Moving to VERIFIED!
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:1281