Description of problem:

I believe StorageDomain.dump() should return as much information as possible for volumes with corrupted or missing metadata, to make it easier to identify the volume and correct what is wrong.

However, if a key is missing from the storage metadata, dump() does not return all the known information about the volume. This happens when from_lines() raises MetaDataKeyNotFoundError while reading the metadata from storage. The exception is raised when a particular key is missing:

https://github.com/oVirt/vdsm/blob/master/lib/vdsm/storage/volumemetadata.py#L123

For example, here the metadata is missing PUUID but contains everything else (e.g. CAPACITY):

2020-08-21 09:24:04,426+1000 WARN (jsonrpc/3) [storage.StorageDomain] Failed to parse metadata slot 2 offset=8192: Meta Data key not found error: ("key='PUUID' lines=[bytearray(b'CAP=1073741824'), bytearray(b'CTIME=1597965175'), bytearray(b'DESCRIPTION=ytearray(b'DISKTYPE=DATA'), bytearray(b'DOMAIN=82e9c212-e4c5-4560-9ae0-bcb7b7521065'), bytearray(b'FORMAT=COW'), bytearray(b'GEN=0'), bytearray(b'IMAGE=30034319-21b1-40c6-83d3-886bd6f1490e'), bytearray(b'LEGALITY=LEGAL'), bytearray(b'TYPE=SPARSE'), byteab'VOLTYPE=LEAF'), bytearray(b'EOF')]",) (blockSD:1793)

But the StorageDomain.dump() output does not contain several of the known keys for this volume:

"e19689e8-df8f-491b-bd9e-5dde535e440b": {
    "apparentsize": 1073741824,
    "image": "30034319-21b1-40c6-83d3-886bd6f1490e",
    "mdslot": 2,
    "parent": null,
    "status": "INVALID",
    "truesize": 1073741824
}

Can we make it return all known key values when the exception is raised?

Version-Release number of selected component (if applicable):
vdsm-4.40.24-1.gitd177ff577.el8.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Wipe one metadata key from the storage metadata of the volume
2. Run vdsm-client StorageDomain dump

Actual results:
Some known metadata keys are missing

Expected results:
Retrieve all known metadata
VolumeMetadata.from_lines() intentionally raises if the metadata is invalid, to avoid using a volume with invalid metadata. This makes it impossible to use the metadata incorrectly.

But when dumping metadata we want to report everything regardless of the status, so we need to make this code more flexible. We can add something like:

    md = VolumeMetadata.from_lines(lines, allow_invalid=True)

to be used during dump, keeping the strict behavior for normal usage.

This will not be an easy change, since VolumeMetadata.__init__ tries hard to reject invalid values. So we need to make it possible to create an instance with invalid metadata.
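To illustrate the idea, here is a minimal sketch of a lenient parsing mode. It is a simplified stand-in and not vdsm's actual VolumeMetadata code: the key names are taken from the log line in the description, while the CONVERTERS table, the parse_lines() helper, and the (values, errors) return shape are assumptions made for this example only.

```python
# Hypothetical sketch, NOT vdsm's real implementation.
# Key names come from the log line quoted in the bug description;
# the converter for each key is an assumption.
CONVERTERS = {
    "CAP": int,
    "CTIME": int,
    "DESCRIPTION": str,
    "DISKTYPE": str,
    "DOMAIN": str,
    "FORMAT": str,
    "GEN": int,
    "IMAGE": str,
    "LEGALITY": str,
    "PUUID": str,
    "TYPE": str,
    "VOLTYPE": str,
}


def parse_lines(lines):
    """Split KEY=VALUE byte lines into a dict, skipping the EOF marker."""
    raw = {}
    for line in lines:
        text = bytes(line).decode("utf-8")
        if "=" not in text:
            continue  # e.g. the trailing EOF line
        key, value = text.split("=", 1)
        raw[key.strip()] = value.strip()
    return raw


def from_lines(lines, allow_invalid=False):
    """
    Return (values, errors).

    In strict mode (the default) any missing key or invalid value raises
    ValueError. With allow_invalid=True the valid keys are returned and
    the problems are collected in errors instead.
    """
    raw = parse_lines(lines)
    values = {}
    errors = []
    for key, convert in CONVERTERS.items():
        if key not in raw:
            errors.append("missing key %r" % key)
            continue
        try:
            values[key] = convert(raw[key])
        except ValueError:
            errors.append("invalid value for %r: %r" % (key, raw[key]))
    if errors and not allow_invalid:
        raise ValueError("; ".join(errors))
    return values, errors


# Usage: metadata with a missing PUUID and a non-integer GEN.
lines = [
    bytearray(b"CAP=1073741824"),
    bytearray(b"GEN=not_integer"),
    bytearray(b"FORMAT=COW"),
    bytearray(b"EOF"),
]
values, errors = from_lines(lines, allow_invalid=True)
print(values)
print(errors)
```

With this shape, dump could report whatever ends up in values and mark the volume INVALID whenever errors is non-empty, while every other caller keeps the strict default.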
Created attachment 1792040 metadata-helper

Storage domain dump can now display all available metadata regardless of missing or invalid values.

There is no simple way of verifying this, so I'll attach a simple script that allows rewriting/removing metadata. Please note that the script uses the vdsm API, so vdsm needs to be installed on the machine where the script will be executed.

The following examples should demonstrate how the dump behavior changed and help verify the patch:

================BEFORE FIX========================

Volume with correct metadata:

# vdsm-client StorageDomain dump sd_id=40b0c8e1-e646-4e56-8420-827252743f93 | grep -A 16 191fda25-3953-48f5-9993-1f8f598a21f5
"191fda25-3953-48f5-9993-1f8f598a21f5": {
    "apparentsize": 1073741824,
    "capacity": 1073741824,
    "ctime": 1623926112,
    "description": "{\"DiskAlias\":\"test2.raw\",\"DiskDescription\":\"test2.raw\"}",
    "disktype": "ISOF",
    "format": "RAW",
    "generation": 2,
    "image": "4fd3b55a-c63e-4988-a9a2-f7f1220b6556",
    "legality": "LEGAL",
    "mdslot": 10,
    "parent": "00000000-0000-0000-0000-000000000000",
    "status": "OK",
    "truesize": 1073741824,
    "type": "PREALLOCATED",
    "voltype": "LEAF"
},

Corrupt the metadata by writing an invalid value. The value for 'generation' must be convertible to an integer, so writing a string results in an invalid value being written to the metadata volume:

# ./change-metadata.py --sd_id=40b0c8e1-e646-4e56-8420-827252743f93 --vol_id=191fda25-3953-48f5-9993-1f8f598a21f5 --write generation="not_integer"

Dump the domain info and observe that all other keys, which are still valid, are missing:

# vdsm-client StorageDomain dump sd_id=40b0c8e1-e646-4e56-8420-827252743f93 | grep -A 7 191fda25-3953-48f5-9993-1f8f598a21f5
"191fda25-3953-48f5-9993-1f8f598a21f5": {
    "apparentsize": 1073741824,
    "image": "4fd3b55a-c63e-4988-a9a2-f7f1220b6556",
    "mdslot": 10,
    "parent": "00000000-0000-0000-0000-000000000000",
    "status": "INVALID",
    "truesize": 1073741824
},

================AFTER FIX========================

+++++++++++++CASE 1: missing key+++++++++++++++++

Original metadata:

# vdsm-client StorageDomain dump sd_id=40b0c8e1-e646-4e56-8420-827252743f93 | grep -A 16 cbb9e9eb-747e-41e9-9f78-a1db35795e52
"cbb9e9eb-747e-41e9-9f78-a1db35795e52": {
    "apparentsize": 1073741824,
    "capacity": 1073741824,
    "ctime": 1623930774,
    "description": "{\"DiskAlias\":\"test2.raw\",\"DiskDescription\":\"test2.raw\"}",
    "disktype": "ISOF",
    "format": "RAW",
    "generation": 0,
    "image": "2732d8fc-6f1c-4e40-b92c-731a53ad1b27",
    "legality": "ILLEGAL",
    "mdslot": 10,
    "parent": "00000000-0000-0000-0000-000000000000",
    "status": "OK",
    "truesize": 1073741824,
    "type": "PREALLOCATED",
    "voltype": "LEAF"
},

Remove the 'description' key:

# ./change-metadata.py --sd_id=40b0c8e1-e646-4e56-8420-827252743f93 --vol_id=cbb9e9eb-747e-41e9-9f78-a1db35795e52 --write description=""

Check the metadata: the status is 'INVALID' and the 'description' key is missing, while all remaining keys with valid values are present:

# vdsm-client StorageDomain dump sd_id=40b0c8e1-e646-4e56-8420-827252743f93 | grep -A 16 cbb9e9eb-747e-41e9-9f78-a1db35795e52
"cbb9e9eb-747e-41e9-9f78-a1db35795e52": {
    "apparentsize": 1073741824,
    "capacity": 1073741824,
    "ctime": 1623930774,
    "disktype": "ISOF",
    "format": "RAW",
    "generation": 0,
    "image": "2732d8fc-6f1c-4e40-b92c-731a53ad1b27",
    "legality": "ILLEGAL",
    "mdslot": 10,
    "parent": "00000000-0000-0000-0000-000000000000",
    "status": "INVALID",
    "truesize": 1073741824,
    "type": "PREALLOCATED",
    "voltype": "LEAF"
},

+++++++++++++CASE 2: invalid value+++++++++++++++++

Original metadata:

# vdsm-client StorageDomain dump sd_id=40b0c8e1-e646-4e56-8420-827252743f93 | grep -A 16 cbb9e9eb-747e-41e9-9f78-a1db35795e52
"cbb9e9eb-747e-41e9-9f78-a1db35795e52": {
    "apparentsize": 1073741824,
    "capacity": 1073741824,
    "ctime": 1623930774,
    "description": "{\"DiskAlias\":\"test2.raw\",\"DiskDescription\":\"test2.raw\"}",
    "disktype": "ISOF",
    "format": "RAW",
    "generation": 0,
    "image": "2732d8fc-6f1c-4e40-b92c-731a53ad1b27",
    "legality": "ILLEGAL",
    "mdslot": 10,
    "parent": "00000000-0000-0000-0000-000000000000",
    "status": "OK",
    "truesize": 1073741824,
    "type": "PREALLOCATED",
    "voltype": "LEAF"
},

Write an invalid value (same as the example above, rewriting 'generation' to a string):

# ./change-metadata.py --sd_id=40b0c8e1-e646-4e56-8420-827252743f93 --vol_id=cbb9e9eb-747e-41e9-9f78-a1db35795e52 --write generation="not_integer"

Check the metadata: the status is 'INVALID' and the 'generation' key is missing, while all remaining keys with valid values are present:

# vdsm-client StorageDomain dump sd_id=40b0c8e1-e646-4e56-8420-827252743f93 | grep -A 15 cbb9e9eb-747e-41e9-9f78-a1db35795e52
"cbb9e9eb-747e-41e9-9f78-a1db35795e52": {
    "apparentsize": 1073741824,
    "capacity": 1073741824,
    "ctime": 1623930774,
    "description": "{\"DiskAlias\":\"test2.raw\",\"DiskDescription\":\"test2.raw\"}",
    "disktype": "ISOF",
    "format": "RAW",
    "image": "2732d8fc-6f1c-4e40-b92c-731a53ad1b27",
    "legality": "ILLEGAL",
    "mdslot": 10,
    "parent": "00000000-0000-0000-0000-000000000000",
    "status": "INVALID",
    "truesize": 1073741824,
    "type": "PREALLOCATED",
    "voltype": "LEAF"
},
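For anyone who would rather not compare the grep output by eye, the check above can be sketched in a few lines of Python. This is only an illustration: EXPECTED_KEYS is simply the set of keys seen in the healthy entries quoted in this comment, missing_keys() is a hypothetical helper, and how the per-volume entries are extracted from the full dump output is left to the caller.

```python
# Keys present in a healthy volume entry, as seen in the dump output
# quoted above. This list is an assumption based on those examples only.
EXPECTED_KEYS = frozenset([
    "apparentsize", "capacity", "ctime", "description", "disktype",
    "format", "generation", "image", "legality", "mdslot", "parent",
    "status", "truesize", "type", "voltype",
])


def missing_keys(volumes):
    """
    Map volume id -> sorted list of expected keys that are absent,
    for volumes whose status is INVALID.

    volumes is a dict of volume id -> entry dict, shaped like the
    entries shown in the dump output above.
    """
    report = {}
    for vol_id, info in volumes.items():
        if info.get("status") == "INVALID":
            report[vol_id] = sorted(EXPECTED_KEYS - set(info))
    return report


# Usage with the CASE 2 entry after the fix.
volumes = {
    "cbb9e9eb-747e-41e9-9f78-a1db35795e52": {
        "apparentsize": 1073741824,
        "capacity": 1073741824,
        "ctime": 1623930774,
        "description": '{"DiskAlias":"test2.raw","DiskDescription":"test2.raw"}',
        "disktype": "ISOF",
        "format": "RAW",
        "image": "2732d8fc-6f1c-4e40-b92c-731a53ad1b27",
        "legality": "ILLEGAL",
        "mdslot": 10,
        "parent": "00000000-0000-0000-0000-000000000000",
        "status": "INVALID",
        "truesize": 1073741824,
        "type": "PREALLOCATED",
        "voltype": "LEAF",
    },
}
print(missing_keys(volumes))
```

After the fix, an INVALID volume should report only the key that was actually removed or corrupted. Before the fix, the same check would list every metadata-derived key.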
Verified successfully.

versions:
vdsm-4.40.70.5-1.el8ev.x86_64
ovirt-engine-4.4.7.5-0.9.el8ev.noarch

Steps to reproduce:

Used the script change-metadata.py that was attached by Roman to modify the data.

1) Before the modification:

vdsm-client StorageDomain dump sd_id=98308030-5d49-4dc4-a7ed-c5f61bf0b08c | grep -A 16 6e821824-8af6-40d8-a5e4-6ffc3f26ebdd
"6e821824-8af6-40d8-a5e4-6ffc3f26ebdd": {
    "apparentsize": 3221225472,
    "capacity": 3221225472,
    "ctime": 1624866571,
    "description": "{\"DiskAlias\":\"vm_Bug870887_Disk1\",\"DiskDescription\":\"\"}",
    "disktype": "DATA",
    "format": "RAW",
    "generation": 0,
    "image": "8023bc35-2292-4e81-8487-034934fee8d9",
    "legality": "LEGAL",
    "mdslot": 7,
    "parent": "00000000-0000-0000-0000-000000000000",
    "status": "OK",
    "truesize": 3221225472,
    "type": "PREALLOCATED",
    "voltype": "LEAF"
},

2) Write an invalid value:

./change-metadata.py --sd_id=98308030-5d49-4dc4-a7ed-c5f61bf0b08c --vol_id=6e821824-8af6-40d8-a5e4-6ffc3f26ebdd --write generation="not_integer"

3) After the modification (generation does not appear):

vdsm-client StorageDomain dump sd_id=98308030-5d49-4dc4-a7ed-c5f61bf0b08c | grep -A 16 6e821824-8af6-40d8-a5e4-6ffc3f26ebdd
"6e821824-8af6-40d8-a5e4-6ffc3f26ebdd": {
    "apparentsize": 3221225472,
    "capacity": 3221225472,
    "ctime": 1624866571,
    "description": "{\"DiskAlias\":\"vm_Bug870887_Disk1\",\"DiskDescription\":\"\"}",
    "disktype": "DATA",
    "format": "RAW",
    "image": "8023bc35-2292-4e81-8487-034934fee8d9",
    "legality": "LEGAL",
    "mdslot": 7,
    "parent": "00000000-0000-0000-0000-000000000000",
    "status": "INVALID",
    "truesize": 3221225472,
    "type": "PREALLOCATED",
    "voltype": "LEAF"
},

4) Restore the correct value:

./change-metadata.py --sd_id=98308030-5d49-4dc4-a7ed-c5f61bf0b08c --vol_id=6e821824-8af6-40d8-a5e4-6ffc3f26ebdd --write generation=0

5) After restoring the correct value (generation is back in the data):

vdsm-client StorageDomain dump sd_id=98308030-5d49-4dc4-a7ed-c5f61bf0b08c | grep -A 16 6e821824-8af6-40d8-a5e4-6ffc3f26ebdd
"6e821824-8af6-40d8-a5e4-6ffc3f26ebdd": {
    "apparentsize": 3221225472,
    "capacity": 3221225472,
    "ctime": 1624866571,
    "description": "{\"DiskAlias\":\"vm_Bug870887_Disk1\",\"DiskDescription\":\"\"}",
    "disktype": "DATA",
    "format": "RAW",
    "generation": 0,
    "image": "8023bc35-2292-4e81-8487-034934fee8d9",
    "legality": "LEGAL",
    "mdslot": 7,
    "parent": "00000000-0000-0000-0000-000000000000",
    "status": "OK",
    "truesize": 3221225472,
    "type": "PREALLOCATED",
    "voltype": "LEAF"
},
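The before/after comparison in the steps above could also be done mechanically. The following is a rough sketch, assuming the two volume entries have already been loaded as Python dicts; diff_entries() is a hypothetical helper written for this comment and is not part of vdsm or of the attached script.

```python
def diff_entries(before, after):
    """
    Compare two dump entries for the same volume.

    Returns (removed, added, changed): keys that disappeared, keys that
    appeared, and a dict of key -> (old, new) for values that differ.
    """
    removed = sorted(set(before) - set(after))
    added = sorted(set(after) - set(before))
    changed = {
        key: (before[key], after[key])
        for key in sorted(set(before) & set(after))
        if before[key] != after[key]
    }
    return removed, added, changed


# Usage with a subset of the keys from steps 1 and 3 above.
before = {
    "capacity": 3221225472,
    "generation": 0,
    "legality": "LEGAL",
    "mdslot": 7,
    "status": "OK",
    "voltype": "LEAF",
}
after = {
    "capacity": 3221225472,
    "legality": "LEGAL",
    "mdslot": 7,
    "status": "INVALID",
    "voltype": "LEAF",
}
removed, added, changed = diff_entries(before, after)
print(removed, added, changed)
```

For a successful verification, only the corrupted key should be removed and only status should change. Comparing step 1 with step 5 should show no differences at all.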
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (RHV RHEL Host (ovirt-host) [ovirt-4.4.7]), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:2864