Bug 1852344
| Summary: | [Scale] Using --select for lvm cache reloads induces transient VG metadata corruption | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Amit Bawer <abawer> |
| Component: | vdsm | Assignee: | Amit Bawer <abawer> |
| Status: | CLOSED ERRATA | QA Contact: | David Vaanunu <dvaanunu> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | high | ||
| Version: | 4.3.11 | CC: | bugs, lleistne, lsurette, michal.skrivanek, nsoffer, pelauter, sfishbai, srevivo, tnisan, ycui |
| Target Milestone: | ovirt-4.3.11 | Keywords: | Performance, Reopened, ZStream |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | No Doc Update | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2020-09-30 10:09:52 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
Amit Bawer
2020-06-30 08:13:10 UTC
Tal, this is the ticket Nir wanted specifically for the 4.3 backports of the fix for the lvm transient error on reloads (whose root cause is pending a fix from lvm in el7.9). Please advise what we need for 4.3 (in 4.4 the corresponding patches are already merged and we are not waiting for the lvm fix, so we don't need a special ticket there as we do here for 4.3).

is it required with lvm

bah, this already exists.

*** This bug has been marked as a duplicate of bug 1849595 ***

Michal, this is not a duplicate of bug 1849595. We have two issues:
- LVM bug (bug 1842600) - the fix is to require an lvm version with the fix.
- Using --select in lvm commands, which makes the LVM bug 10 times more likely and causes performance regressions. This bug is about that fix.

This bug does not depend on the lvm fix; it is needed regardless of the fix.

Thanks for clarification Nir, since all attached patches here are merged, is this MODIFIED?

Moved to modified.

Regarding verification: if this ticket is tested without the lvm2 fix (pending response in https://bugzilla.redhat.com/1842600#c23), I suggest a no-regression test. Otherwise, verification could wait for the resolution of https://bugzilla.redhat.com/1849595

(In reply to Michal Skrivanek from comment #6)
> Thanks for clarification Nir, since all attached patches here are merged, is
> this MODIFIED?

Right, not sure why it did not move automatically, maybe CI hooks are broken in 4.3?

Tested Flow:
1. Create a VM with 13 disks (each disk on a different SD)
2. Create a VM snapshot 10 times
3. Delete the VM

The flow was run 20 times with 2 concurrent users.

Also, the test was run twice:
1. RHV 4.3.9-7 - Reproduce the problem. (vdsm-4.30.42 & lvm2-2.02.186-7)
2. RHV 4.3.11-7 - Verify the fix. (vdsm-4.30.50 & lvm2-2.02.187-6)

During the 1st test, many errors like the following appeared:

2020-08-02 00:47:06,400+0000 ERROR (monitor/d7e3b45) [storage.LVM] Reloading VGs failed vgs=['d7e3b455-ae7b-4bf9-a997-f25adc5f19b8'] rc=5 out=[' cqe0ji-XYxI-bejZ-AG2d-Vv1J-izNU-7Lvyfb|d7e3b455-ae7b-4bf9-a997-f25adc5f19b8|wz--n-|536602476544|516067164160|134217728|3998|3845|MDT_ALIGNMENT=1048576,MDT_BLOCK_SIZE=512,MDT_CLASS=Data,MDT_DESCRIPTION=MAXLUNS_3600a098038304437415d4b6a59685973,MDT_IOOPTIMEOUTSEC=10,MDT_LEASERETRIES=3,MDT_LEASETIMESEC=60,MDT_LOCKPOLICY=,MDT_LOCKRENEWALINTERVALSEC=5,MDT_POOL_UUID=3fd7f5a9-738f-44c3-8e3b-760a43954106,MDT_PV0=pv:3600a098038304437415d4b6a59685973&44&uuid:3Omcg8-31Sd-SNUd-rtmP-EIwf-aCxB-Ne6r34&44&pestart:0&44&pecount:3998&44&mapoffset:0,MDT_ROLE=Regular,MDT_SDUUID=d7e3b455-ae7b-4bf9-a997-f25adc5f19b8,MDT_TYPE=FCP,MDT_VERSION=5,MDT_VGUUID=cqe0ji-XYxI-bejZ-AG2d-Vv1J-izNU-7Lvyfb,MDT__SHA_CKSUM=9d455c6d1a9668abed2896fe76c126358f3461ad,RHAT_storage_domain|134217728|67097088|23|1|/dev/mapper/3600a098038304437415d4b6a59685973'] err=[' Metadata on /dev/mapper/3600a098038304437415d4b6a59685976 at 536848107520 has wrong VG name "" expected 44cfdf47-9464-447d-aab8-6f58f8b218d0.', ' Metadata on /dev/mapper/3600a098038304437415d4b6a59685976 at 536848107520 has wrong VG name "" expected 44cfdf47-9464-447d-aab8-6f58f8b218d0.', ' Not repairing metadata for VG 44cfdf47-9464-447d-aab8-6f58f8b218d0.', ' Recovery of volume group "44cfdf47-9464-447d-aab8-6f58f8b218d0" failed.', ' Cannot process volume group 44cfdf47-9464-447d-aab8-6f58f8b218d0'] (lvm:576)

During the 2nd test there were no errors in the vdsm.log files.

VDSM log files (both versions): https://drive.google.com/drive/folders/1klyclDkmUloD21pfle-giDpG7n9GUFZq

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat Virtualization RHEL Host (ovirt-host) 4.3.11), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4113 |
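
The verification in this report compares vdsm.log between two builds by checking for "Reloading VGs failed" errors. A minimal sketch of such a scan follows. The regular expression is modeled only on the single log line quoted above, so treating it as valid for other vdsm versions is an assumption; `count_reload_failures` and `SAMPLE` are hypothetical names, not part of vdsm, and the sample line has its `out=` and `err=` payloads truncated for brevity.

```python
import re
from collections import Counter

# Pattern modeled on the one log line quoted in this report (assumption:
# other vdsm versions may format this message differently).
RELOAD_FAILED = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}[+-]\d{4}) ERROR "
    r".*\[storage\.LVM\] Reloading VGs failed "
    r"vgs=\[(?P<vgs>[^\]]*)\] rc=(?P<rc>\d+)"
)


def count_reload_failures(lines):
    """Return a Counter mapping VG name to the number of failed reloads."""
    failures = Counter()
    for line in lines:
        match = RELOAD_FAILED.search(line)
        if match is None:
            continue
        # vgs=[...] holds a Python-style list of quoted VG names.
        for vg in re.findall(r"'([^']+)'", match.group("vgs")):
            failures[vg] += 1
    return failures


# Shortened copy of the quoted log line; out=/err= contents are truncated.
SAMPLE = (
    "2020-08-02 00:47:06,400+0000 ERROR (monitor/d7e3b45) [storage.LVM] "
    "Reloading VGs failed vgs=['d7e3b455-ae7b-4bf9-a997-f25adc5f19b8'] "
    "rc=5 out=[...] err=[...] (lvm:576)"
)

print(count_reload_failures([SAMPLE]))
```

To compare two runs, the same function can be fed the lines of each vdsm.log (for example via `open(path, errors="replace")`); an empty counter for the newer build would match the "no errors" result reported above.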