Bug 1870887

Summary: StorageDomain.dump() missing several keys for volume if one key is missing.
Product: Red Hat Enterprise Virtualization Manager Reporter: Germano Veit Michel <gveitmic>
Component: vdsm Assignee: Roman Bednář <rbednar>
Status: CLOSED ERRATA QA Contact: sshmulev
Severity: medium Docs Contact:
Priority: unspecified    
Version: 4.4.1 CC: eshenitz, lsurette, sfishbai, srevivo, tnisan, ycui
Target Milestone: ovirt-4.4.7   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-07-22 15:08:31 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1839444    
Attachments:
Description Flags
metadata-helper none

Description Germano Veit Michel 2020-08-20 23:38:11 UTC
Description of problem:

I believe the method should return as much info as possible for volumes with some sort of corrupted or missing metadata, in order to make it easier to identify it and correct what is wrong.

However, if a key is missing in the storage metadata, the dump() method does not return all known info about the volume. This happens when from_lines() raises the MetaDataKeyNotFoundError exception while reading the metadata from storage. The exception is raised when a particular key is missing.

https://github.com/oVirt/vdsm/blob/master/lib/vdsm/storage/volumemetadata.py#L123

For example, here the metadata is missing PUUID, while containing everything else (e.g. CAPACITY):

2020-08-21 09:24:04,426+1000 WARN  (jsonrpc/3) [storage.StorageDomain] Failed to parse metadata slot 2 offset=8192: Meta Data key not found error: ("key='PUUID' lines=[bytearray(b'CAP=1073741824'), bytearray(b'CTIME=1597965175'), bytearray(b'DESCRIPTION=ytearray(b'DISKTYPE=DATA'), bytearray(b'DOMAIN=82e9c212-e4c5-4560-9ae0-bcb7b7521065'), bytearray(b'FORMAT=COW'), bytearray(b'GEN=0'), bytearray(b'IMAGE=30034319-21b1-40c6-83d3-886bd6f1490e'), bytearray(b'LEGALITY=LEGAL'), bytearray(b'TYPE=SPARSE'), byteab'VOLTYPE=LEAF'), bytearray(b'EOF')]",) (blockSD:1793)

But the StorageDomain.dump() does not contain several of the known keys for this volume:

        "e19689e8-df8f-491b-bd9e-5dde535e440b": {
            "apparentsize": 1073741824,
            "image": "30034319-21b1-40c6-83d3-886bd6f1490e",
            "mdslot": 2,
            "parent": null,
            "status": "INVALID",
            "truesize": 1073741824
        }

Can we make it return all known key values when the exception is raised?
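
The failure mode described above can be sketched in a few lines of standalone Python. This is only an illustration, not the vdsm code: MissingKeyError, REQUIRED_KEYS, parse_strict() and dump_volume() are hypothetical names, and the required-key list is an assumed subset based on the log line above. Because the parser raises on the first missing key, the caller never receives the keys that were parsed successfully.

```python
class MissingKeyError(Exception):
    """Hypothetical stand-in for vdsm's MetaDataKeyNotFoundError."""


# Assumed subset of required keys, based on the log line in this report.
REQUIRED_KEYS = ("CAP", "CTIME", "FORMAT", "GEN", "IMAGE", "PUUID", "VOLTYPE")


def parse_strict(lines):
    """Parse b"KEY=VALUE" lines; raise on the first missing required key."""
    raw = {}
    for line in lines:
        if line == b"EOF":
            break
        key, sep, value = line.partition(b"=")
        if sep:
            raw[key.decode()] = value.decode()
    for key in REQUIRED_KEYS:
        if key not in raw:
            raise MissingKeyError(key)
    return raw


def dump_volume(lines, mdslot):
    """Report a volume; after a parse error only the slot number survives."""
    info = {"mdslot": mdslot}
    try:
        info.update(parse_strict(lines))
        info["status"] = "OK"
    except MissingKeyError:
        # Everything parsed so far is lost together with the exception.
        info["status"] = "INVALID"
    return info


# Same situation as in the log: every key is present except PUUID.
lines = [b"CAP=1073741824", b"CTIME=1597965175", b"FORMAT=COW", b"GEN=0",
         b"IMAGE=30034319-21b1-40c6-83d3-886bd6f1490e", b"VOLTYPE=LEAF",
         b"EOF"]
print(dump_volume(lines, mdslot=2))
```

In this sketch the result contains only "mdslot" and "status", even though six of the seven keys were readable.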

Version-Release number of selected component (if applicable):
vdsm-4.40.24-1.gitd177ff577.el8.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Wipe one metadata key from the storage metadata of the volume
2. Run vdsm-client StorageDomain dump

Actual results:
Some known metadata keys are missing

Expected results:
Retrieve all known metadata

Comment 1 Nir Soffer 2020-08-21 00:16:12 UTC
VolumeMetadata.from_lines() intentionally raises if metadata
is invalid to avoid using a volume with invalid metadata. This makes
it impossible to use the metadata incorrectly.

But when dumping metadata we want to report everything regardless
of the status, so we need to make this code more flexible.

We can add something like:

   md = VolumeMetadata.from_lines(lines, allow_invalid=True)

To be used during dump, keeping the strict behavior for normal usage.

This will not be an easy change since VolumeMetadata.__init__ tries hard
to reject invalid values. So we need to make it possible to create an
instance with invalid metadata.
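
A minimal sketch of the allow_invalid idea, assuming a simplified key/converter table. It is not the VolumeMetadata implementation: FIELDS, InvalidMetadata and this standalone from_lines() are hypothetical, and the function returns a plain dict instead of an instance. Strict mode raises on the first problem, while lenient mode skips missing or unconvertible keys and marks the result INVALID.

```python
# Hypothetical field table: metadata key -> (output name, converter).
FIELDS = {
    "CAP": ("capacity", int),
    "CTIME": ("ctime", int),
    "FORMAT": ("format", str),
    "GEN": ("generation", int),
    "PUUID": ("parent", str),
    "VOLTYPE": ("voltype", str),
}


class InvalidMetadata(Exception):
    """Hypothetical error type for a missing key or an invalid value."""


def from_lines(lines, allow_invalid=False):
    """Parse b"KEY=VALUE" lines into a dict of converted values."""
    raw = {}
    for line in lines:
        if line == b"EOF":
            break
        key, sep, value = line.partition(b"=")
        if sep:
            raw[key.decode()] = value.decode()

    result = {"status": "OK"}
    for key, (name, convert) in FIELDS.items():
        try:
            result[name] = convert(raw[key])
        except (KeyError, ValueError) as e:
            if not allow_invalid:
                # Strict behavior for normal usage: reject the metadata.
                raise InvalidMetadata(f"{key}: {e!r}") from e
            # Lenient behavior for dump: drop the key, keep the rest.
            result["status"] = "INVALID"
    return result


# GEN is not an integer and PUUID is missing.
lines = [b"CAP=1073741824", b"CTIME=1597965175", b"FORMAT=COW",
         b"GEN=not_integer", b"VOLTYPE=LEAF", b"EOF"]
print(from_lines(lines, allow_invalid=True))
```

Keeping allow_invalid off by default preserves the property described above: normal callers can never obtain metadata that failed validation.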

Comment 3 Roman Bednář 2021-06-18 11:59:47 UTC
Created attachment 1792040 [details]
metadata-helper

Storage domain dump can now display all available metadata regardless of missing or invalid values.

There is no simple way of verifying this, so I'll attach a simple script that allows rewriting/removing metadata.
Please note that the script uses the vdsm API, so vdsm needs to be installed on the machine where the script will be executed.

The following examples should demonstrate how the dump behavior changed and help verify the patch:

================BEFORE FIX========================

Volume with correct metadata:

# vdsm-client StorageDomain dump sd_id=40b0c8e1-e646-4e56-8420-827252743f93 | grep -A 16 191fda25-3953-48f5-9993-1f8f598a21f5
        "191fda25-3953-48f5-9993-1f8f598a21f5": {
            "apparentsize": 1073741824,
            "capacity": 1073741824,
            "ctime": 1623926112,
            "description": "{\"DiskAlias\":\"test2.raw\",\"DiskDescription\":\"test2.raw\"}",
            "disktype": "ISOF",
            "format": "RAW",
            "generation": 2,
            "image": "4fd3b55a-c63e-4988-a9a2-f7f1220b6556",
            "legality": "LEGAL",
            "mdslot": 10,
            "parent": "00000000-0000-0000-0000-000000000000",
            "status": "OK",
            "truesize": 1073741824,
            "type": "PREALLOCATED",
            "voltype": "LEAF"
        },

Corrupt the metadata by writing an invalid value. The value for 'generation' must be convertible to an integer, so writing a string results in an invalid value being written to the metadata volume:

# ./change-metadata.py --sd_id=40b0c8e1-e646-4e56-8420-827252743f93 --vol_id=191fda25-3953-48f5-9993-1f8f598a21f5 --write generation="not_integer"

Dump domain info and observe that all other keys that are still valid are missing:

# vdsm-client StorageDomain dump sd_id=40b0c8e1-e646-4e56-8420-827252743f93 | grep -A 7 191fda25-3953-48f5-9993-1f8f598a21f5
        "191fda25-3953-48f5-9993-1f8f598a21f5": {
            "apparentsize": 1073741824,
            "image": "4fd3b55a-c63e-4988-a9a2-f7f1220b6556",
            "mdslot": 10,
            "parent": "00000000-0000-0000-0000-000000000000",
            "status": "INVALID",
            "truesize": 1073741824
        },

================AFTER FIX========================

+++++++++++++CASE 1: missing key+++++++++++++++++

Original metadata:

# vdsm-client StorageDomain dump sd_id=40b0c8e1-e646-4e56-8420-827252743f93 | grep -A 16 cbb9e9eb-747e-41e9-9f78-a1db35795e52
        "cbb9e9eb-747e-41e9-9f78-a1db35795e52": {
            "apparentsize": 1073741824,
            "capacity": 1073741824,
            "ctime": 1623930774,
            "description": "{\"DiskAlias\":\"test2.raw\",\"DiskDescription\":\"test2.raw\"}",
            "disktype": "ISOF",
            "format": "RAW",
            "generation": 0,
            "image": "2732d8fc-6f1c-4e40-b92c-731a53ad1b27",
            "legality": "ILLEGAL",
            "mdslot": 10,
            "parent": "00000000-0000-0000-0000-000000000000",
            "status": "OK",
            "truesize": 1073741824,
            "type": "PREALLOCATED",
            "voltype": "LEAF"
        },

Removing 'description' key:

# ./change-metadata.py --sd_id=40b0c8e1-e646-4e56-8420-827252743f93 --vol_id=cbb9e9eb-747e-41e9-9f78-a1db35795e52 --write description=""

Check metadata: status is 'INVALID' and the 'description' key is missing; all remaining keys with valid values are present:

# vdsm-client StorageDomain dump sd_id=40b0c8e1-e646-4e56-8420-827252743f93 | grep -A 16 cbb9e9eb-747e-41e9-9f78-a1db35795e52
        "cbb9e9eb-747e-41e9-9f78-a1db35795e52": {
            "apparentsize": 1073741824,
            "capacity": 1073741824,
            "ctime": 1623930774,
            "disktype": "ISOF",
            "format": "RAW",
            "generation": 0,
            "image": "2732d8fc-6f1c-4e40-b92c-731a53ad1b27",
            "legality": "ILLEGAL",
            "mdslot": 10,
            "parent": "00000000-0000-0000-0000-000000000000",
            "status": "INVALID",
            "truesize": 1073741824,
            "type": "PREALLOCATED",
            "voltype": "LEAF"
        },


+++++++++++++CASE 2: invalid value+++++++++++++++++

Original metadata:

# vdsm-client StorageDomain dump sd_id=40b0c8e1-e646-4e56-8420-827252743f93 | grep -A 16 cbb9e9eb-747e-41e9-9f78-a1db35795e52
        "cbb9e9eb-747e-41e9-9f78-a1db35795e52": {
            "apparentsize": 1073741824,
            "capacity": 1073741824,
            "ctime": 1623930774,
            "description": "{\"DiskAlias\":\"test2.raw\",\"DiskDescription\":\"test2.raw\"}",
            "disktype": "ISOF",
            "format": "RAW",
            "generation": 0,
            "image": "2732d8fc-6f1c-4e40-b92c-731a53ad1b27",
            "legality": "ILLEGAL",
            "mdslot": 10,
            "parent": "00000000-0000-0000-0000-000000000000",
            "status": "OK",
            "truesize": 1073741824,
            "type": "PREALLOCATED",
            "voltype": "LEAF"
        },

Write invalid value (same as example above - rewriting 'generation' to string):

# ./change-metadata.py --sd_id=40b0c8e1-e646-4e56-8420-827252743f93 --vol_id=cbb9e9eb-747e-41e9-9f78-a1db35795e52 --write generation="not_integer"

Check metadata: status is 'INVALID' and the 'generation' key is missing; all remaining keys with valid values are present:

# vdsm-client StorageDomain dump sd_id=40b0c8e1-e646-4e56-8420-827252743f93 | grep -A 15 cbb9e9eb-747e-41e9-9f78-a1db35795e52
        "cbb9e9eb-747e-41e9-9f78-a1db35795e52": {
            "apparentsize": 1073741824,
            "capacity": 1073741824,
            "ctime": 1623930774,
            "description": "{\"DiskAlias\":\"test2.raw\",\"DiskDescription\":\"test2.raw\"}",
            "disktype": "ISOF",
            "format": "RAW",
            "image": "2732d8fc-6f1c-4e40-b92c-731a53ad1b27",
            "legality": "ILLEGAL",
            "mdslot": 10,
            "parent": "00000000-0000-0000-0000-000000000000",
            "status": "INVALID",
            "truesize": 1073741824,
            "type": "PREALLOCATED",
            "voltype": "LEAF"
        },
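
The before/after comparison in the cases above can also be done mechanically instead of by eye. The following is a small sketch with a hypothetical helper, compare_entries(); the two dicts are a hand-copied subset of the Case 2 output above, not a full dump.

```python
import json

# Subset of the Case 2 volume entry before and after the corruption.
before = json.loads("""{
    "capacity": 1073741824,
    "format": "RAW",
    "generation": 0,
    "legality": "ILLEGAL",
    "status": "OK"
}""")
after = json.loads("""{
    "capacity": 1073741824,
    "format": "RAW",
    "legality": "ILLEGAL",
    "status": "INVALID"
}""")


def compare_entries(before, after):
    """Return (keys missing from `after`, keys whose value changed)."""
    missing = sorted(before.keys() - after.keys())
    changed = {
        key: (before[key], after[key])
        for key in sorted(before.keys() & after.keys())
        if before[key] != after[key]
    }
    return missing, changed


missing, changed = compare_entries(before, after)
print("missing:", missing)
print("changed:", changed)
```

With the fix in place, the only missing key should be the one that was corrupted, and the only changed value should be "status".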

Comment 7 sshmulev 2021-06-28 08:15:35 UTC
Verified successfully.

versions:
vdsm-4.40.70.5-1.el8ev.x86_64
ovirt-engine-4.4.7.5-0.9.el8ev.noarch

Steps to reproduce:
Used the script change-metadata.py that was attached by Roman to modify the data.

1) Before the modification:
vdsm-client StorageDomain dump sd_id=98308030-5d49-4dc4-a7ed-c5f61bf0b08c | grep -A 16 6e821824-8af6-40d8-a5e4-6ffc3f26ebdd
        "6e821824-8af6-40d8-a5e4-6ffc3f26ebdd": {
            "apparentsize": 3221225472,
            "capacity": 3221225472,
            "ctime": 1624866571,
            "description": "{\"DiskAlias\":\"vm_Bug870887_Disk1\",\"DiskDescription\":\"\"}",
            "disktype": "DATA",
            "format": "RAW",
            "generation": 0,
            "image": "8023bc35-2292-4e81-8487-034934fee8d9",
            "legality": "LEGAL",
            "mdslot": 7,
            "parent": "00000000-0000-0000-0000-000000000000",
            "status": "OK",
            "truesize": 3221225472,
            "type": "PREALLOCATED",
            "voltype": "LEAF"
        },

2) ./change-metadata.py --sd_id=98308030-5d49-4dc4-a7ed-c5f61bf0b08c --vol_id=6e821824-8af6-40d8-a5e4-6ffc3f26ebdd --write generation="not_integer"

3) After modification: (generation does not appear)
vdsm-client StorageDomain dump sd_id=98308030-5d49-4dc4-a7ed-c5f61bf0b08c | grep -A 16 6e821824-8af6-40d8-a5e4-6ffc3f26ebdd
        "6e821824-8af6-40d8-a5e4-6ffc3f26ebdd": {
            "apparentsize": 3221225472,
            "capacity": 3221225472,
            "ctime": 1624866571,
            "description": "{\"DiskAlias\":\"vm_Bug870887_Disk1\",\"DiskDescription\":\"\"}",
            "disktype": "DATA",
            "format": "RAW",
            "image": "8023bc35-2292-4e81-8487-034934fee8d9",
            "legality": "LEGAL",
            "mdslot": 7,
            "parent": "00000000-0000-0000-0000-000000000000",
            "status": "INVALID",
            "truesize": 3221225472,
            "type": "PREALLOCATED",
            "voltype": "LEAF"
        },

4) ./change-metadata.py --sd_id=98308030-5d49-4dc4-a7ed-c5f61bf0b08c --vol_id=6e821824-8af6-40d8-a5e4-6ffc3f26ebdd --write generation=0

5) After restoring the correct value (generation is back in the data):
vdsm-client StorageDomain dump sd_id=98308030-5d49-4dc4-a7ed-c5f61bf0b08c | grep -A 16 6e821824-8af6-40d8-a5e4-6ffc3f26ebdd
        "6e821824-8af6-40d8-a5e4-6ffc3f26ebdd": {
            "apparentsize": 3221225472,
            "capacity": 3221225472,
            "ctime": 1624866571,
            "description": "{\"DiskAlias\":\"vm_Bug870887_Disk1\",\"DiskDescription\":\"\"}",
            "disktype": "DATA",
            "format": "RAW",
            "generation": 0,
            "image": "8023bc35-2292-4e81-8487-034934fee8d9",
            "legality": "LEGAL",
            "mdslot": 7,
            "parent": "00000000-0000-0000-0000-000000000000",
            "status": "OK",
            "truesize": 3221225472,
            "type": "PREALLOCATED",
            "voltype": "LEAF"
        },
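
As an alternative to `grep -A 16`, which depends on how many keys the entry happens to contain, the volume entry could be extracted from the JSON output directly. This is a sketch under assumptions: find_entry() is a hypothetical helper, and the sample wrapper layout (a "volumes" mapping) is assumed for illustration only. The search is recursive precisely so that it does not rely on the exact structure of the dump.

```python
import json


def find_entry(node, key):
    """Depth-first search for `key` in nested dicts/lists.

    Returns the value stored under the first match, or None.
    """
    if isinstance(node, dict):
        if key in node:
            return node[key]
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_entry(child, key)
        if found is not None:
            return found
    return None


# Sample input; the outer "volumes" layout is an assumption.
sample = json.loads("""{
    "volumes": {
        "6e821824-8af6-40d8-a5e4-6ffc3f26ebdd": {
            "capacity": 3221225472,
            "mdslot": 7,
            "status": "INVALID"
        }
    }
}""")

entry = find_entry(sample, "6e821824-8af6-40d8-a5e4-6ffc3f26ebdd")
print(json.dumps(entry, indent=4, sort_keys=True))
```

In practice the input would come from the saved output of the dump command rather than from an inline string.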

Comment 12 errata-xmlrpc 2021-07-22 15:08:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (RHV RHEL Host (ovirt-host) [ovirt-4.4.7]), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2864