Bug 1713724 - When a storage domain is updated to V5 during a DC upgrade, if there are volumes with metadata that has been reset then the upgrade fails
Summary: When a storage domain is updated to V5 during a DC upgrade, if there are volu...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 4.3.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ovirt-4.4.0
Target Release: ---
Assignee: Nir Soffer
QA Contact: Evelina Shames
URL:
Whiteboard:
Depends On:
Blocks: 1714154
Reported: 2019-05-24 15:25 UTC by Gordon Watson
Modified: 2020-09-30 08:28 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, converting a storage domain to the V5 format failed when, following an unsuccessful delete volume operation, partly-deleted volumes with cleared metadata remained in the storage domain. The current release fixes this issue. Converting a storage domain succeeds even when partly-deleted volumes with cleared metadata remain in the storage domain.
Clone Of:
: 1714154 (view as bug list)
Environment:
Last Closed: 2020-08-04 13:27:06 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:
izuckerm: testing_plan_complete+


Attachments
script for analyzing metadata area (632 bytes, text/plain)
2019-05-28 15:55 UTC, Nir Soffer


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Knowledge Base (Solution) | 4172251 | 0 | None | None | Updating the Data Center compatibility level to 4.3 can result in a non-responsive DC and no SPM | 2019-05-24 21:21:08 UTC
Red Hat Product Errata | RHEA-2020:3246 | 0 | None | None | None | 2020-08-04 13:27:38 UTC
oVirt gerrit | 100286 | 0 | 'None' | MERGED | upgrade: Skip volumes with cleared metadata | 2021-02-21 00:58:53 UTC
oVirt gerrit | 100287 | 0 | 'None' | MERGED | tests: Test converting volume with cleared metadata | 2021-02-21 00:58:53 UTC
oVirt gerrit | 100324 | 0 | 'None' | MERGED | storage.exception: Fix cleared metadata regressions | 2021-02-21 00:58:53 UTC
oVirt gerrit | 100325 | 0 | 'None' | MERGED | upgrade: Handle volumes with invalid metadata | 2021-02-21 00:58:53 UTC
oVirt gerrit | 100326 | 0 | 'None' | MERGED | blockVolume: Don't remove metadata during delete | 2021-02-21 00:58:53 UTC

Description Gordon Watson 2019-05-24 15:25:59 UTC
Description of problem:

If a storage domain has volume metadata that has been reset (contains "NONE=######...."), then the attempt to upgrade the DC will fail. This appears to leave the DC in a non-responsive state with no SPM, so no new VMs can be started, etc.

VDSM reports "MetaDataKeyNotFoundError" for each volume metadata area that is in this state.

The volume metadata can perhaps be in this state as a result of a failed live merge at some time in the past, where the volume should have been removed, the metadata was reset, but the removal failed.
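
To illustrate why a reset slot can produce a missing-key error, here is a minimal sketch, not VDSM's actual parser. It assumes the metadata slot is a block of KEY=VALUE lines ending with an EOF line; the function names and the sample keys (DOMAIN, VOLTYPE) are used for illustration only.

```python
CLEARED_PREFIX = b"NONE="


def parse_metadata(block):
    """Parse KEY=VALUE lines up to the EOF marker into a dict (illustrative)."""
    md = {}
    for line in block.split(b"\n"):
        if line.strip() == b"EOF":
            break
        key, sep, value = line.partition(b"=")
        if sep:
            md[key.strip().decode("ascii")] = value.strip().decode("ascii", "replace")
    return md


def is_cleared(block):
    """A reset slot starts with NONE= followed by '#' padding."""
    return block.startswith(CLEARED_PREFIX)


# A reset slot as described in this report.
cleared = b"NONE=" + b"#" * 502 + b"\nEOF\n"
# A made-up valid slot with two sample keys.
valid = b"DOMAIN=example-sd\nVOLTYPE=LEAF\nEOF\n"

cleared_md = parse_metadata(cleared)
valid_md = parse_metadata(valid)

# The reset slot parses, but none of the expected keys exist, so any
# code that looks one up without checking will fail.
print(is_cleared(cleared), "DOMAIN" in cleared_md, "DOMAIN" in valid_md)
```

Checking for the NONE= prefix before looking up keys lets a caller tell a reset slot apart from genuinely corrupt metadata.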


Version-Release number of selected component (if applicable):

RHV 4.3.3


How reproducible:

I assume 100%, but I am going to try to reproduce this myself.


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 8 Nir Soffer 2019-05-25 01:25:13 UTC
Gordon, can you attach:
- lvm commands output from sosreport?
- copy of the first MiB of the metadata LV from a domain with this issue.

The attached patch should fix this issue, but it is not tested yet. I can test
it on Sunday, but we can save time if you can test it during the weekend.

Comment 9 Nir Soffer 2019-05-25 11:54:40 UTC
## Testing the fix

1. Create a DC/cluster with compatibility version 4.2
2. Create new iSCSI/FC V4 storage domain
3. Add some floating disks
4. Clear the metadata of one or more disks (see "How to clear volume metadata" below)
5. Upgrade the cluster and DC to version 4.3

Expected result:
- Storage domain upgraded to V5 successfully
- Warning about volume with cleared metadata logged for the volume with this issue

Should be tested with 4.3.3, reproducing this issue, and then with
a build including this fix, verifying that upgrading the storage domain to V5 succeeds.
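
The expected behavior (upgrade succeeds, warning logged per affected volume) can be sketched roughly as below. This is not the merged patch; `convert_volumes`, the `convert` callback and the logger name are hypothetical, and the only detail taken from this bug is that cleared metadata starts with "NONE=".

```python
import logging

log = logging.getLogger("upgrade-sketch")  # hypothetical logger name


def convert_volumes(volumes, convert):
    """Convert every volume except those with cleared metadata.

    volumes: iterable of (volume_id, metadata_block_bytes) pairs.
    convert: callable doing the real conversion (hypothetical here).
    Returns (converted_ids, skipped_ids).
    """
    converted, skipped = [], []
    for vol_id, block in volumes:
        if block.startswith(b"NONE="):
            # Partly deleted volume: warn and continue instead of failing
            # the whole domain upgrade.
            log.warning("Skipping volume %s with cleared metadata", vol_id)
            skipped.append(vol_id)
            continue
        convert(vol_id, block)
        converted.append(vol_id)
    return converted, skipped


volumes = [
    ("vol-a", b"DOMAIN=example-sd\nEOF\n"),
    ("vol-b", b"NONE=" + b"#" * 502 + b"\nEOF\n"),
]
done, skipped = convert_volumes(volumes, lambda vol_id, block: None)
print(done, skipped)
```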


## How to clear volume metadata

This has not been possible since 4.2.5, and it cannot be reproduced with older
versions, so we have to clear the metadata manually:

1. Find the volume metadata slot using:

  lvs -o vg_name,lv_name,tags

  The MD_N tag of the volume LV gives the slot number, for example MD_42
  means slot 42.

2. Write cleared metadata to the volume metadata area

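    import os

    # Slot 42 (from the MD_42 tag); each slot is 512 bytes.
    with open("/dev/vg-name/metadata", "rb+") as f:
        f.seek(42 * 512)
        # "NONE=" + 502 "#" + "\nEOF\n" fills the 512-byte slot exactly.
        f.write(b"NONE=" + (b"#" * 502) + b"\nEOF\n")
        # Flush Python's buffer before asking the kernel to sync.
        f.flush()
        os.fsync(f.fileno())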
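
To confirm the write landed where intended, the slot can be read back. The following is a minimal sketch that runs against a scratch file standing in for the metadata LV; `SLOT_SIZE`, `cleared_block` and `read_slot` are names chosen here, and the 512-byte slot size is an assumption taken from the `42 * 512` offset in the snippet above.

```python
import os
import tempfile

SLOT_SIZE = 512  # assumption: slot size implied by the 42 * 512 offset


def cleared_block():
    """Return the 512-byte pattern written by the snippet above."""
    return b"NONE=" + b"#" * 502 + b"\nEOF\n"


def read_slot(path, slot):
    """Read one metadata slot from a metadata LV or an image of it."""
    with open(path, "rb") as f:
        f.seek(slot * SLOT_SIZE)
        return f.read(SLOT_SIZE)


# Demo on a scratch file standing in for /dev/vg-name/metadata.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "metadata.img")
    with open(path, "wb") as f:
        f.write(b"\0" * (64 * SLOT_SIZE))
    with open(path, "rb+") as f:
        f.seek(42 * SLOT_SIZE)
        f.write(cleared_block())
    block = read_slot(path, 42)
    print(len(block), block[:8])
```

On a real host, `read_slot("/dev/vg-name/metadata", 42)` would be the equivalent read-only check.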

Comment 10 Nir Soffer 2019-05-25 11:56:44 UTC
Trying to target 4.3.4, since this issue breaks upgrades to 4.3 and does not
have an easy workaround.

Comment 17 Avihai 2019-05-27 06:16:36 UTC
Also tested installing the temp fix vdsm-4.30.16-2.gitfb7cdef.el7.x86_64 before upgrading the DC to 4.3, and it looks good.
SD/DC upgraded to V5/4.3 without issues and all hosts are up.

Comment 18 Nir Soffer 2019-05-27 21:27:06 UTC
Turns out we had the same issue 6.5 years ago, see bug 902838.

Comment 19 Nir Soffer 2019-05-27 21:56:20 UTC
Cleaning after bad gerrit script, adding unrelated patches.

Comment 20 Nir Soffer 2019-05-27 22:14:11 UTC
Cleaning again after bad gerrit hook.

Comment 21 Nir Soffer 2019-05-28 15:55:07 UTC
Looking at the metadata LV content from attachment 1573377, we see 4 volumes
with cleared metadata out of 53 volumes.

volumes: 53

slot 10 is cleared
    lv: 793df8d3-6853-4408-afa0-abe414f652ee
  size: 35.00g
  tags: IU_da733dfb-491b-4d59-a2df-b2008e36de16,MD_10,PU_00000000-0000-0000-0000-000000000000

slot 11 is cleared
    lv: 772c45c4-35b9-4133-b603-4eee1c33208d
  size: 35.00g
  tags: IU_aaf92ffa-2344-4617-be60-e7ecf5baf2e8,MD_11,PU_00000000-0000-0000-0000-000000000000

slot 12 is cleared
    lv: bdc029ac-c858-418c-ae90-072bc8f1ddaf
  size: 35.00g
  tags: IU_e7c4f50c-465e-476d-9360-762f2fea73d9,MD_12,PU_00000000-0000-0000-0000-000000000000

slot 14 is cleared
    lv: a25e9312-71a3-4d06-8890-5034c56fd895
  size: 35.00g
  tags: IU_8683a8e2-2af2-4ae2-8c2d-8fd2aa7d4cdf,MD_14,PU_00000000-0000-0000-0000-000000000000
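
A report like the one above could be produced along these lines. This is a sketch only, not the attached script: it assumes a copy of the metadata LV is available as bytes, that slots are 512 bytes each, and that the slot number comes from the MD_N tag reported by lvs. The function names are hypothetical.

```python
import re

SLOT_SIZE = 512  # assumption: 512-byte metadata slots
MD_TAG = re.compile(r"^MD_(\d+)$")


def slot_from_tags(tags):
    """Extract the slot number from a comma-separated LV tag string."""
    for tag in tags.split(","):
        match = MD_TAG.match(tag.strip())
        if match:
            return int(match.group(1))
    return None


def cleared_slots(area, slots):
    """Return the slots whose metadata block starts with NONE=."""
    found = []
    for slot in slots:
        block = area[slot * SLOT_SIZE:(slot + 1) * SLOT_SIZE]
        if block.startswith(b"NONE="):
            found.append(slot)
    return found


# Tag string copied from the slot 10 entry above.
tags = ("IU_da733dfb-491b-4d59-a2df-b2008e36de16,MD_10,"
        "PU_00000000-0000-0000-0000-000000000000")
slot = slot_from_tags(tags)

# Synthetic 16-slot metadata area with slot 10 cleared.
area = bytearray(16 * SLOT_SIZE)
area[10 * SLOT_SIZE:11 * SLOT_SIZE] = b"NONE=" + b"#" * 502 + b"\nEOF\n"
print(slot, cleared_slots(bytes(area), range(16)))
```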

Comment 22 Nir Soffer 2019-05-28 15:55:55 UTC
Created attachment 1574397 [details]
script for analyzing metadata area

Comment 25 Daniel Gur 2019-08-28 13:13:37 UTC
sync2jira

Comment 27 RHV bug bot 2019-10-22 17:25:35 UTC
WARN: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops

Comment 33 RHV bug bot 2019-12-13 13:17:17 UTC
WARN: Bug status (ON_QA) wasn't changed but the following should be fixed:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops

Comment 35 Evelina Shames 2020-01-02 14:55:28 UTC
Verified (steps in comment #9) on engine-4.4.0-0.13.master.el7

Comment 36 RHV bug bot 2020-01-08 14:49:53 UTC
WARN: Bug status (VERIFIED) wasn't changed but the following should be fixed:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops

Comment 46 errata-xmlrpc 2020-08-04 13:27:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (RHV RHEL Host (ovirt-host) 4.4), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:3246

