Bug 1870435

Summary: StorageDomain.dump() can return {"key" : None} if metadata is missing
Product: Red Hat Enterprise Virtualization Manager
Reporter: Germano Veit Michel <gveitmic>
Component: vdsm
Assignee: Roman Bednář <rbednar>
Status: CLOSED ERRATA
QA Contact: Evelina Shames <eshames>
Severity: low
Docs Contact:
Priority: unspecified
Version: 4.4.1
CC: eshenitz, lsurette, nsoffer, sfishbai, shubha.kulkarni, srevivo, tnisan, ycui
Target Milestone: ovirt-4.4.5
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-04-14 11:38:43 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1839444

Description Germano Veit Michel 2020-08-20 05:24:44 UTC
Description of problem:

See below: the keys 'parent' and 'image' are present but have the value 'None':

# vdsm-client StorageDomain dump sd_id=82e9c212-e4c5-4560-9ae0-bcb7b7521065
{
....
    "volumes": {
        "2ebcc4f9-5a1e-463e-9ee2-f478509c14a1": {
            "apparentsize": 1073741824,
            "image": null,        <-----
            "mdslot": 3,
            "parent": null,       <-----
            "status": "INVALID",
            "truesize": 1073741824
        },


According to our discussion in https://gerrit.ovirt.org/#/c/109325/ VDSM should not return None.
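For context, the "null" values in the dump output are simply Python None values serialized to JSON. A minimal illustration (the dict literal below is made up to mirror the volume entry shown above, it is not vdsm code):

import json

# Missing metadata fields end up as None in the volume dict, and json
# serializes None as null -- exactly what the dump output above shows.
volume = {
    "apparentsize": 1073741824,
    "image": None,
    "mdslot": 3,
    "parent": None,
    "status": "INVALID",
    "truesize": 1073741824,
}

print(json.dumps(volume, indent=4, sort_keys=True))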

Metadata:

# lvs -o lv_name,tags | egrep 'LV|2ebcc'
  LV                                   LV Tags                                                                             
  2ebcc4f9-5a1e-463e-9ee2-f478509c14a1 MD_3  

# dd if=/dev/82e9c212-e4c5-4560-9ae0-bcb7b7521065/metadata bs=8k count=1 skip=131
CAP=10737418240
CTIME=1596691868
DESCRIPTION=
DISKTYPE=DATA
DOMAIN=82e9c212-e4c5-4560-9ae0-bcb7b7521065
FORMAT=COW
GEN=0
LEGALITY=LEGAL
PUUID=
TYPE=SPARSE
VOLTYPE=LEAF
EOF
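A minimal sketch of parsing the key=value block above (not the actual vdsm parser): PUUID is present but empty and there is no IMAGE key at all, so a plain dict lookup yields an empty string or None, which is what ends up as "null" in the dump.

# Parse the metadata block shown above; stop at the EOF marker.
RAW = """\
CAP=10737418240
CTIME=1596691868
DESCRIPTION=
DISKTYPE=DATA
DOMAIN=82e9c212-e4c5-4560-9ae0-bcb7b7521065
FORMAT=COW
GEN=0
LEGALITY=LEGAL
PUUID=
TYPE=SPARSE
VOLTYPE=LEAF
EOF
"""

md = {}
for line in RAW.splitlines():
    if line == "EOF":
        break
    key, _, value = line.partition("=")
    md[key] = value

print(repr(md.get("PUUID")))   # '' -> empty parent
print(repr(md.get("IMAGE")))   # None -> no IMAGE key in this block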

Comment 1 Germano Veit Michel 2020-08-20 05:26:00 UTC
Version was: vdsm-4.40.24-1.gitd177ff577.el8.x86_64

Comment 2 Nir Soffer 2020-08-21 00:17:53 UTC
Did you hit this issue with real storage, or is it a result
of modifying good metadata?

Comment 3 Germano Veit Michel 2020-08-21 00:36:03 UTC
(In reply to Nir Soffer from comment #2)
> Did you hit this issue with real storage, or is it a result
> of modifying good metadata?

That's me modifying good metadata while testing dump-volume-chains for missing and corrupted keys.

Comment 5 Roman Bednář 2021-01-26 13:17:24 UTC
The reproducer is not obvious, so I'm leaving a few notes here for future reference; they might come in handy when verifying.

When dumping information about a block storage domain, vdsm first attempts to read the parent, image and metadata slot number from
the metadata lv. If some of those are missing, it tries to use the lvm tags on the data lv instead (prefixed with PU_, IU_, MD_).
If both lookups fail, you see a 'null' value where the value is missing.
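A minimal sketch of that lookup order (not vdsm's actual code; md and tags stand for the parsed metadata dict and the lv tag list):

# 1) metadata lv first, 2) PU_ tag on the data lv, 3) give up -> None,
#    which before the fix ended up as "null" in the dump output.
def lookup_parent(md, tags):
    parent = md.get("PUUID") or None
    if parent:
        return parent
    for tag in tags:
        if tag.startswith("PU_"):
            return tag[len("PU_"):] or None
    return None

print(lookup_parent({}, ["MD_3"]))  # None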

So to reproduce, the metadata lv has to be corrupted first (harsh, but it works):

#dd if=/dev/random of=/dev/<VG_NAME>/metadata

Then remove some of the tags (parent uuid in this case), overriding lvm filter:

#lvchange --config 'devices/filter=[ "a|.*|" ]' --deltag "PU_00000000-0000-0000-0000-000000000000" <VG_NAME>/<LV_NAME>

Finally, dump the domain info; you should see a parent value of 'null':

#vdsm-client StorageDomain dump sd_id=<VG_NAME>

Example output:
...
    "volumes": {
        "0bf8695a-0d18-44b0-a704-d84c664cf0f7": {
            "apparentsize": 1073741824,
            "image": "13d6169e-37d2-498c-9c3d-b752d7358861",
            "mdslot": 3,
            "parent": null,
            "status": "INVALID",
            "truesize": 1073741824
        },
...
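For verification it may be handy to scan the dump for null values programmatically. A minimal sketch that wraps the same vdsm-client command shown above (replace <SD_UUID> with the real domain uuid):

import json
import subprocess

# Run the dump and list volumes that still contain null (None) values.
out = subprocess.run(
    ["vdsm-client", "StorageDomain", "dump", "sd_id=<SD_UUID>"],
    capture_output=True, text=True, check=True,
)
dump = json.loads(out.stdout)

for name, vol in dump.get("volumes", {}).items():
    missing = [k for k, v in vol.items() if v is None]
    if missing:
        print(name, "has null values for:", missing)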

Comment 6 Roman Bednář 2021-02-04 10:22:06 UTC
I found an easier and less destructive way of reproducing the issue that can be used to verify the fix. Instead of destroying the metadata lv using dd, create a new lv in the domain/vg: vdsm does not know about it, which means the volume information can't be found in the metadata lv nor in the lv tags of the new lv. The example below shows how this is done by simply creating an lvm snapshot:

BEFORE FIX:

# lvcreate -s -L128m -n metadata_backup1 1e475c8a-8d1b-4eb6-ba2a-a95b26d568d6/metadata --config='devices/filter = ["a|.*|"]'
  Logical volume "metadata_backup1" created.
[root@host-vm ~]# vdsm-client StorageDomain dump sd_id=1e475c8a-8d1b-4eb6-ba2a-a95b26d568d6
{
    "metadata": {
        "alignment": 1048576,
        "block_size": 512,
        "class": "Data",
        "metadataDevice": "36001405dfd35bbfd15a472089cecec0c",
        "name": "block_iscsi_domain",
        "pool": [
            "4fc1c8aa-6245-11eb-83ae-525400ea2c38"
        ],
        "role": "Master",
        "state": "OK",
        "type": "ISCSI",
        "uuid": "1e475c8a-8d1b-4eb6-ba2a-a95b26d568d6",
        "version": "5",
        "vgMetadataDevice": "36001405dfd35bbfd15a472089cecec0c",
        "vguuid": "7iZgSK-GtaD-ZX8P-H364-S4aq-dqSe-3IWmjF"
    },
    "volumes": {
        "metadata_backup1": {
            "image": null,
            "parent": null,
            "status": "INVALID"
        }
    }
}


AFTER FIX:

#Restart vdsmd after applying the patch:
[root@host-vm ~]# systemctl restart vdsmd

#Missing values (image and parent) now don't show up in volumes dump:
[root@host-vm ~]# vdsm-client StorageDomain dump sd_id=1e475c8a-8d1b-4eb6-ba2a-a95b26d568d6
{
    "metadata": {
        "alignment": 1048576,
        "block_size": 512,
        "class": "Data",
        "metadataDevice": "36001405dfd35bbfd15a472089cecec0c",
        "name": "block_iscsi_domain",
        "pool": [
            "4fc1c8aa-6245-11eb-83ae-525400ea2c38"
        ],
        "role": "Master",
        "state": "OK",
        "type": "ISCSI",
        "uuid": "1e475c8a-8d1b-4eb6-ba2a-a95b26d568d6",
        "version": "5",
        "vgMetadataDevice": "36001405dfd35bbfd15a472089cecec0c",
        "vguuid": "7iZgSK-GtaD-ZX8P-H364-S4aq-dqSe-3IWmjF"
    },
    "volumes": {
        "metadata_backup1": {
            "status": "INVALID"
        }
    }
}
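The change behind the "AFTER FIX" output amounts to not emitting keys whose value could not be determined. A minimal sketch of that idea (not the actual patch):

# Drop keys whose value is unknown instead of reporting them as null.
def clean_volume_info(info):
    return {k: v for k, v in info.items() if v is not None}

before = {"image": None, "parent": None, "status": "INVALID"}
print(clean_volume_info(before))  # {'status': 'INVALID'}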

Comment 7 Roman Bednář 2021-02-22 11:34:40 UTC
Patches have been merged to master and should be available in the next vdsm build: 4.40.50.7

Comment 13 Evelina Shames 2021-03-03 12:05:34 UTC
(In reply to Roman Bednář from comment #6)

Verified on vdsm-4.40.50.7-1.el8ev.x86_64 with the steps from comment 6.

Moving to 'VERIFIED'.

Comment 18 errata-xmlrpc 2021-04-14 11:38:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: RHV RHEL Host (ovirt-host) 4.4.z [ovirt-4.4.5] security, bug fix, enhancement), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:1184