Bug 1849595 - [scale] LVM metadata reload failures are breaking volume creation and deletion [RHV clone - 4.3.11]
Summary: [scale] LVM metadata reload failures are breaking volume creation and deletio...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 4.3.9
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: high
Target Milestone: ovirt-4.3.11
Target Release: ---
Assignee: Amit Bawer
QA Contact: David Vaanunu
URL:
Whiteboard:
Depends On: 1837199 1842600
Blocks:
 
Reported: 2020-06-22 11:03 UTC by RHV bug bot
Modified: 2023-12-15 18:14 UTC (History)
16 users

Fixed In Version: 4.30.50
Doc Type: No Doc Update
Doc Text:
Clone Of: 1837199
Environment:
Last Closed: 2020-09-30 10:13:00 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 5100381 0 None None None 2020-06-22 11:10:32 UTC
Red Hat Product Errata RHSA-2020:4114 0 None None None 2020-09-30 10:13:37 UTC
oVirt gerrit 109342 0 master MERGED tests: Add stress test for lvm reloads 2021-02-10 07:59:56 UTC
oVirt gerrit 109343 0 master MERGED tests: reload: Add option to disable udev 2021-02-10 07:59:56 UTC
oVirt gerrit 109352 0 master MERGED tests: reload: Fix reload counting 2021-02-10 07:59:56 UTC
oVirt gerrit 109353 0 master MERGED tests: reload: Extract LVMRunner class 2021-02-10 07:59:56 UTC
oVirt gerrit 109354 0 master MERGED tests: reload: Add option to disable --select 2021-02-10 07:59:58 UTC
oVirt gerrit 109355 0 master MERGED tests: reload: Add option to control lvm verbosity 2021-02-10 07:59:57 UTC
oVirt gerrit 109359 0 master MERGED tests: reload: Add option to run read-only pvs/vgs/lvs commands 2021-02-10 07:59:58 UTC
oVirt gerrit 109366 0 master MERGED tests: reload: Add option for setting number of retries in lvm reload 2021-02-10 07:59:58 UTC
oVirt gerrit 109383 0 master MERGED tests: reload: Move reloader statistics into ReloaderStats class 2021-02-10 07:59:58 UTC
oVirt gerrit 109384 0 master MERGED tests: reload: Use common reload function for pvs/lvs/vgs reloaders 2021-02-10 07:59:58 UTC
oVirt gerrit 109385 0 master MERGED tests: reload: Add failures counter and amend failure logs 2021-02-10 07:59:59 UTC
oVirt gerrit 109402 0 master MERGED tests: reload: Separate write and read delay options 2021-02-10 07:59:59 UTC
oVirt gerrit 109462 0 master MERGED tests: reload: Fix --read-only default value 2021-02-10 07:59:59 UTC
oVirt gerrit 109551 0 ovirt-4.3 MERGED lvm: Remove --select usage from reload pvs 2021-02-10 07:59:59 UTC
oVirt gerrit 109552 0 ovirt-4.3 ABANDONED lvm: Remove --select usage from reload vgs 2021-02-10 07:59:59 UTC
oVirt gerrit 109555 0 ovirt-4.3 ABANDONED lvm: Remove --select usage from reload lvs 2021-02-10 08:00:00 UTC
oVirt gerrit 109556 0 ovirt-4.3 ABANDONED lvm: Remove wants_output check from cmd method 2021-02-10 08:00:01 UTC
oVirt gerrit 109811 0 ovirt-4.3 MERGED logutils: Add Head class for shortening item lists in logs 2021-02-10 08:00:01 UTC
oVirt gerrit 109812 0 ovirt-4.3 MERGED lvm_test: Add test for reload of invalidated pv 2021-02-10 08:00:01 UTC
oVirt gerrit 109813 0 ovirt-4.3 MERGED lvm: Revert change for returned vgs entries upon reload error 2021-02-10 08:00:01 UTC
oVirt gerrit 109814 0 ovirt-4.3 MERGED lvm: Remove --select usage from reload vgs 2021-02-10 08:00:01 UTC
oVirt gerrit 109815 0 ovirt-4.3 MERGED lvm: Remove --select usage from reload lvs 2021-02-10 08:00:01 UTC
oVirt gerrit 109816 0 ovirt-4.3 MERGED lvm: Remove wants_output check from cmd method 2021-02-10 08:00:02 UTC
oVirt gerrit 109817 0 ovirt-4.3 MERGED lvm: Add log warning in case all cmd retries have failed 2021-02-10 08:00:02 UTC
oVirt gerrit 110178 0 ovirt-4.3 MERGED spec: Require lvm2-2.02.187-6.el7 for x86_64 rhel dists 2021-02-10 08:00:02 UTC

Description RHV bug bot 2020-06-22 11:03:38 UTC
+++ This bug is a downstream clone. The original bug is: +++
+++   bug 1837199 +++
======================================================================

Description of problem:

LVM reloading is randomly failing on the SPM, breaking volume creation and deletion.

In the occurrences we have seen so far, it seems to happen when there are parallel volume creations/deletions on the SPM.
This is odd, since vdsm is using locking_type=1. Below you will see two examples.
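vdsm runs these reloads as external lvm commands, and the eventual fix retried failed reloads and logged a warning once every retry had failed (gerrit 109366, 109817 in the links above). A minimal sketch of that shape, with hypothetical names (not vdsm's actual lvm.py code):

```python
import logging
import subprocess

log = logging.getLogger("storage.LVM")

def run_with_retries(cmd, retries=3):
    """Run an LVM reload command, retrying on failure.

    Hypothetical sketch: vdsm's real retry logic lives in
    storage/lvm.py and differs in detail.
    """
    for attempt in range(1, retries + 1):
        proc = subprocess.run(cmd, capture_output=True, text=True)
        if proc.returncode == 0:
            return proc.stdout
        log.debug("Reload attempt %d/%d failed: rc=%d err=%r",
                  attempt, retries, proc.returncode, proc.stderr)
    # In the spirit of gerrit 109817: warn when all retries failed.
    log.warning("All %d reload attempts failed: cmd=%r", retries, cmd)
    raise RuntimeError("reload failed after %d attempts" % retries)
```

Retries only help for transient races, which is why the 4.3 fix also had to address the --select scanning behavior itself.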

Volume Creation
---------------

2020-05-12 18:02:23,014+1000 INFO  (jsonrpc/3) [vdsm.api] START createVolume(sdUUID=u'23b7f41e-9e5f-4213-940f-7beba85d550a', spUUID=u'a0785083-a04e-4635-a4e3-feed14aae231', imgUUID=u'7dc1f9a5-3221-4465-bc8b-5c3cfc766acb', size=u'1172526071808', volFormat=4, preallocate=2, diskType=u'DATA', volUUID=u'3019fde6-5d9e-422a-98bb-0e8216a4fa11', desc=u'', srcImgUUID=u'7dc1f9a5-3221-4465-bc8b-5c3cfc766acb', srcVolUUID=u'e753c9a8-e418-4426-93ec-92a273102913', initialSize=None) from=::ffff:<removed>,44378, flow_id=551210_12282_1589270542, task_id=cd7ca685-3202-4e0c-9631-51fb587deb52 (api:48)

We get to this point, when tags are replaced:

2020-05-12 18:02:27,970+1000 INFO  (tasks/8) [storage.LVM] Change LV tags (vg=23b7f41e-9e5f-4213-940f-7beba85d550a, lv=3019fde6-5d9e-422a-98bb-0e8216a4fa11, delTags=['OVIRT_VOL_INITIALIZING'], addTags=['MD_3', u'PU_e753c9a8-e418-4426-93ec-92a273102913', u'IU_7dc1f9a5-3221-4465-bc8b-5c3cfc766acb']) (lvm:1666)
2020-05-12 18:02:28,298+1000 INFO  (tasks/8) [storage.LVM] Deactivating lvs: vg=23b7f41e-9e5f-4213-940f-7beba85d550a lvs=[u'3019fde6-5d9e-422a-98bb-0e8216a4fa11'] (lvm:1621)
2020-05-12 18:02:28,782+1000 INFO  (tasks/8) [storage.ThreadPool.WorkerThread] FINISH task cd7ca685-3202-4e0c-9631-51fb587deb52 (threadPool:210)
2020-05-12 18:02:28,782+1000 INFO  (tasks/8) [storage.ThreadPool.WorkerThread] START task 28aacf22-abfa-47b3-8012-a6061067aef8 (cmd=<bound method Task.commit of <vdsm.storage.task.Task instance at 0x7fe340655170>>, args=None) (threadPool:208)

And reload fails:

2020-05-12 18:02:29,447+1000 ERROR (tasks/8) [storage.LVM] Reloading LVs failed vg='a489622c-a817-4e2c-b33d-2e790279578d' lvs=[] rc=5 out=['  HeFo9S-B5lM-1jID-I27s-vsEB-RJ0o-Tvgf65|680a6641-a717-4c0c-b4f2-36535c2ff6ce|a489622c-a817-4e2c-b33d-2e790279578d|-wi-a-----|1073741824|0|/dev/mapper/3600a0980383135736e3f4d70794a4f49(61249)|OVIRT_VOL_INITIALIZING', '  2d7Tgp-Rd2T-bskO-MwH9-wznM-mhP6-WXiVQT|6f872c08-aba5-43ca-8c60-3951211d869b|a489622c-a817-4e2c-b33d-2e790279578d|-wi-------|1073741824|0|/dev/mapper/3600a0980383135736e3f4d70794a4f49(61241)|IU_abfa235a-66c6-4682-9b8a-f8860f00552e,MD_2,PU_d5413f26-89bd-49e5-aa69-583f07bc3c89', '  5fHjwJ-Ld40-ohqj-ZInE-KaBb-Ziu1-G69bI8|7186f983-182a-4650-9465-9fa8c4101e56|a489622c-a817-4e2c-b33d-2e790279578d|-wi-------|1172526071808|0|/dev/mapper/3600a0980383135736e3f4d70794a4f49(17561)|IU_c0601a53-6c7c-479e-905c-96df48b99d55,MD_7,PU_00000000-0000-0000-0000-000000000000', '  pKsJSz-CctB-8ny2-jXr7-Q3Tk-Et2I-hTnIrj|8dc31e7b-ca31-4c52-963f-0dd6627a068f|a489622c-a817-4e2c-b33d-2e790279578d|-wi-------|1172526071808|0|/dev/mapper/3600a0980383135736e3f4d70794a4f49(52505)|IU_6f7323c5-97cc-451a-8ee4-377deef58fb6,MD_11,PU_00000000-0000-0000-0000-000000000000', '  RAh1Tq-VowV-zetA-5s1a-YHcA-9HvI-lRDtfJ|994c7f74-3d99-45bf-8196-5fd15e04151d|a489622c-a817-4e2c-b33d-2e790279578d|-wi-a-----|1073741824|0|/dev/mapper/3600a0980383135736e3f4d70794a4f49(61257)|OVIRT_VOL_INITIALIZING', '  vMl0im-5Wx2-6CvW-9b76-QbLM-MxNC-UQC1fk|9bf9b153-8510-4114-beaa-42e6fb55cfc3|a489622c-a817-4e2c-b33d-2e790279578d|-wi-------|134217728|0|/dev/mapper/3600a0980383135736e3f4d70794a4f49(39)|IU_14995816-5e7e-453f-b0ba-5c6455075446,MD_4,PU_00000000-0000-0000-0000-000000000000', '  T6ynMJ-PYrj-SaNz-n2wV-zV3A-TCI3-Ruys82|b355616d-14ae-4343-a9ea-e41b92f8182c|a489622c-a817-4e2c-b33d-2e790279578d|-wi-------|1172526071808|0|/dev/mapper/3600a0980383135736e3f4d70794a4f49(41)|IU_615b78dd-e036-4cf9-afba-b591ade6f3a2,MD_1,PU_00000000-0000-0000-0000-000000000000', '  
uUUlgj-OtDb-glzS-sPK4-757D-Rod2-fPzypS|bea071e9-2876-440c-bcc3-0fe1180ab0da|a489622c-a817-4e2c-b33d-2e790279578d|-wi-------|134217728|0|/dev/mapper/3600a0980383135736e3f4d70794a4f49(40)|IU_831ed560-cff5-403f-9378-bb239011eaca,MD_5,PU_00000000-0000-0000-0000-000000000000', '  H1OTMy-ZPyr-RXTX-Nrxx-tzU1-iIeU-SxLudS|ca4dc576-5051-42ca-b0d7-404bdf2da76e|a489622c-a817-4e2c-b33d-2e790279578d|-wi-a-----|1073741824|0|/dev/mapper/3600a0980383135736e3f4d70794a4f49(61265)|OVIRT_VOL_INITIALIZING', '  Yc4NI1-svE9-oVKV-vqyO-a99z-vBh4-gm14Mq|d5413f26-89bd-49e5-aa69-583f07bc3c89|a489622c-a817-4e2c-b33d-2e790279578d|-wi-------|1172526071808|0|/dev/mapper/3600a0980383135736e3f4d70794a4f49(35033)|IU_abfa235a-66c6-4682-9b8a-f8860f00552e,MD_9,PU_00000000-0000-0000-0000-000000000000', '  spn5mb-NCiB-NNdp-N1fO-0LUP-AjBz-sQ3PaT|ef20b99d-981d-48cc-a00c-675e860b8937|a489622c-a817-4e2c-b33d-2e790279578d|-wi-a-----|1073741824|0|/dev/mapper/3600a0980383135736e3f4d70794a4f49(61273)|OVIRT_VOL_INITIALIZING', '  wjjmw6-Pf3u-jSib-3IQ1-bwHR-aM3n-lcocCW|f15ba2c2-ca1f-42a3-a055-1ac4c6377a62|a489622c-a817-4e2c-b33d-2e790279578d|-wi-------|1172526071808|0|/dev/mapper/3600a0980383135736e3f4d70794a4f49(26297)|IU_d6860305-f1fa-45d1-8e60-e1113cffe384,MD_8,PU_00000000-0000-0000-0000-000000000000', '  SJS9iv-eWUe-A63p-jTtU-GKPT-WiTV-WWnSjz|f6c18a8a-bb3c-4000-8c83-6c951af6429d|a489622c-a817-4e2c-b33d-2e790279578d|-wi-------|1172526071808|0|/dev/mapper/3600a0980383135736e3f4d70794a4f49(43769)|IU_dd44dd57-857b-4590-953c-3a3cbd9fe07c,MD_10,PU_00000000-0000-0000-0000-000000000000', '  gmfjsD-nXFq-CjTA-Sgh0-WPe5-lemv-ClcaXr|ids|a489622c-a817-4e2c-b33d-2e790279578d|-wi-ao----|134217728|0|/dev/mapper/3600a0980383135736e3f4d70794a4f49(29)|', '  Rkw57G-rHWI-G0kp-QrLx-wNd9-jcx5-vdp1a6|inbox|a489622c-a817-4e2c-b33d-2e790279578d|-wi-a-----|134217728|0|/dev/mapper/3600a0980383135736e3f4d70794a4f49(30)|', '  
1bE5v1-poXJ-CrIu-3Rke-39tS-sHZM-sLOXuf|leases|a489622c-a817-4e2c-b33d-2e790279578d|-wi-a-----|2147483648|0|/dev/mapper/3600a0980383135736e3f4d70794a4f49(13)|', '  ODuI0b-iX7P-3lNy-FWrh-3VGW-3Yp8-3dpmCw|master|a489622c-a817-4e2c-b33d-2e790279578d|-wi-a-----|1073741824|0|/dev/mapper/3600a0980383135736e3f4d70794a4f49(31)|', '  nP42Q1-fD6m-6fJ2-eRyi-p5dn-oMyS-jqThZ0|metadata|a489622c-a817-4e2c-b33d-2e790279578d|-wi-a-----|536870912|0|/dev/mapper/3600a0980383135736e3f4d70794a4f49(0)|', '  HG0p2j-vLNI-YC7h-w8Nx-rrsS-nm3N-6Hu7HP|outbox|a489622c-a817-4e2c-b33d-2e790279578d|-wi-a-----|134217728|0|/dev/mapper/3600a0980383135736e3f4d70794a4f49(4)|', '  HjXP5G-1lHk-8xMb-bvLF-UEXa-angY-AWCrxr|xleases|a489622c-a817-4e2c-b33d-2e790279578d|-wi-a-----|1073741824|0|/dev/mapper/3600a0980383135736e3f4d70794a4f49(5)|'] err=['  Metadata on /dev/mapper/3600a09803831357370244d6d7662784f at 8795987775488 has wrong VG name "riped"', 'stripe_count = 1', '', 'stripes = [', '"pv0", 56001', ']', '}', 'segment51 {', 'start_extent = 456', 'extent_count = 16', '', 'type = "striped"', 'stripe_" expected 23b7f41e-9e5f-4213-940f-7beba85d550a.', '  Metadata on /dev/mapper/3600a09803831357370244d6d7662784f at 8795987775488 has wrong VG name "riped"', 'stripe_count = 1', '', 'stripes = [', '"pv0", 56001', ']', '}', 'segment51 {', 'start_extent = 456', 'extent_count = 16', '', 'type = "striped"', 'stripe_" expected 23b7f41e-9e5f-4213-940f-7beba85d550a.', '  Not repairing metadata for VG 23b7f41e-9e5f-4213-940f-7beba85d550a.', '  Recovery of volume group "23b7f41e-9e5f-4213-940f-7beba85d550a" failed.', '  Cannot process volume group 23b7f41e-9e5f-4213-940f-7beba85d550a'] (lvm:645)
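For reference, the pipe-separated fields in the out= lines above follow the -o option order used by the reload command: uuid,name,vg_name,attr,size,seg_start_pe,devices,tags. A small parser sketch for one such line (hypothetical helper, not vdsm's code):

```python
# Field order matches the reload command's
# -o uuid,name,vg_name,attr,size,seg_start_pe,devices,tags
LVS_FIELDS = ("uuid", "name", "vg_name", "attr", "size",
              "seg_start_pe", "devices", "tags")

def parse_lvs_line(line):
    # lvs is run with --noheadings and --separator '|', so each
    # line is one LV with exactly these 8 fields.
    record = dict(zip(LVS_FIELDS, line.strip().split("|")))
    # Tags are comma-separated within the last field; may be empty.
    record["tags"] = tuple(t for t in record["tags"].split(",") if t)
    return record
```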

Possible commands in parallel to reload are flows from 2 other volume creations on the same SD:

2020-05-12 18:02:28,489+1000 INFO  (tasks/7) [storage.LVM] Creating LV (vg=23b7f41e-9e5f-4213-940f-7beba85d550a, lv=30aec81b-eebc-4f0f-a90a-771624f53b99, size=1024m, activate=True, contiguous=False, initialTags=('OVIRT_VOL_INITIALIZING',), device=None, read_only=None) (lvm:1448)

2020-05-12 18:02:28,915+1000 INFO  (tasks/5) [storage.LVM] Change LV tags (vg=23b7f41e-9e5f-4213-940f-7beba85d550a, lv=1d0596dc-983a-4e92-8848-a4595e4f897f, delTags=['OVIRT_VOL_INITIALIZING'], addTags=['MD_7', u'PU_357c7605-8857-49a1-aba4-0f957d82845d', u'IU_996f0cff-97da-4c1d-a1ac-06dfea30ef0c']) (lvm:1666)

And the creation failed:

2020-05-12 18:02:29,449+1000 WARN  (tasks/8) [storage.ResourceManager] Resource factory failed to create resource '01_img_a489622c-a817-4e2c-b33d-2e790279578d.d6860305-f1fa-45d1-8e60-e1113cffe384'. Canceling request. (resourceManager:544)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/resourceManager.py", line 540, in registerResource
    obj = namespaceObj.factory.createResource(name, lockType)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/resourceFactories.py", line 193, in createResource
    lockType)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/resourceFactories.py", line 122, in __getResourceCandidatesList
    imgUUID=resourceName)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/image.py", line 185, in getChain
    uuidlist = volclass.getImageVolumes(sdUUID, imgUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line 1559, in getImageVolumes
    return cls.manifestClass.getImageVolumes(sdUUID, imgUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/blockVolume.py", line 316, in getImageVolumes
    lvs = lvm.lvsByTag(sdUUID, "%s%s" % (sc.TAG_PREFIX_IMAGE, imgUUID))
  File "/usr/lib/python2.7/site-packages/vdsm/storage/lvm.py", line 1811, in lvsByTag
    return [lv for lv in getLV(vgName) if tag in lv.tags]
  File "/usr/lib/python2.7/site-packages/vdsm/storage/lvm.py", line 1215, in getLV
    raise se.LogicalVolumeDoesNotExistError("%s/%s" % (vgName, lvName))
LogicalVolumeDoesNotExistError: Logical volume does not exist: ('a489622c-a817-4e2c-b33d-2e790279578d/None',)
2020-05-12 18:02:29,450+1000 WARN  (tasks/8) [storage.ResourceManager.Request] (ResName='01_img_a489622c-a817-4e2c-b33d-2e790279578d.d6860305-f1fa-45d1-8e60-e1113cffe384', ReqID='a02e5fc4-d4cb-411c-af89-44f08cd7f7cd') Tried to cancel a processed request (resourceManager:188)
2020-05-12 18:02:29,451+1000 ERROR (tasks/8) [storage.TaskManager.Task] (Task='28aacf22-abfa-47b3-8012-a6061067aef8') Unexpected error (task:875)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
    return fn(*args, **kargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 336, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 79, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 1946, in createVolume
    with rm.acquireResource(img_ns, imgUUID, rm.EXCLUSIVE):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/resourceManager.py", line 1026, in acquireResource
    return _manager.acquireResource(namespace, name, lockType, timeout=timeout)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/resourceManager.py", line 476, in acquireResource
    raise se.ResourceAcqusitionFailed()
ResourceAcqusitionFailed: Could not acquire resource. Probably resource factory threw an exception.: ()
2020-05-12 18:02:29,523+1000 INFO  (tasks/8) [storage.ThreadPool.WorkerThread] FINISH task 28aacf22-abfa-47b3-8012-a6061067aef8 (threadPool:210)



Volume Deletion
---------------

2020-05-14 03:41:38,005+1000 INFO  (jsonrpc/0) [vdsm.api] START deleteVolume(sdUUID=u'a489622c-a817-4e2c-b33d-2e790279578d', spUUID=u'a0785083-a04e-4635-a4e3-feed14aae231', imgUUID=u'abfa235a-66c6-4682-9b8a-f8860f00552e', volumes=[u'ceed0744-ee48-42c1-b6ed-1a343a5a1b71'], postZero=u'false', force=u'false', discard=False) from=::ffff:<removed>,44378, flow_id=491fdad0-85de-414b-b0c9-87431cd2caed, task_id=ec19945b-407f-461f-8b69-aff142cf5b29 (api:48)

We get to this point:

2020-05-14 03:41:40,504+1000 INFO  (tasks/0) [storage.VolumeManifest] sdUUID=a489622c-a817-4e2c-b33d-2e790279578d imgUUID=abfa235a-66c6-4682-9b8a-f8860f00552e volUUID = ceed0744-ee48-42c1-b6ed-1a343a5a1b71 legality = ILLEGAL  (volume:387)
2020-05-14 03:41:40,523+1000 INFO  (tasks/0) [storage.LVM] Replacing LV tag (vg=a489622c-a817-4e2c-b33d-2e790279578d, lv=ceed0744-ee48-42c1-b6ed-1a343a5a1b71, deltag=PU_d5413f26-89bd-49e5-aa69-583f07bc3c89, addtag=PU_00000000-0000-0000-0000-000000000000) (lvm:1824)

And reload fails:

2020-05-14 03:41:42,763+1000 ERROR (tasks/0) [storage.LVM] Reloading LVs failed vg='a489622c-a817-4e2c-b33d-2e790279578d' lvs=[] rc=5 out=['  6wuNOq-5Mm6-YhSF-tu4j-UWxz-BDbF-X6LO5Y|0e982653-7c25-4bcd-9800-8b4d93a1febb|a489622c-a817-4e2c-b33d-2e790279578d|-wi-a-----|1073741824|0|/dev/mapper/3600a0980383135736e3f4d70794a4f49(61281)|IU__remove_me_d6860305-f1fa-45d1-8e60-e1113cffe384,MD_14,PU_00000000-0000-0000-0000-000000000000', '  Or9ezd-y2ur-nkcP-wIKe-gtPe-dXwV-Qc9Anf|368dd998-af43-4070-8603-4c77288ead0f|a489622c-a817-4e2c-b33d-2e790279578d|-wi-------|1073741824|0|/dev/mapper/3600a0980383135736e3f4d70794a4f49(61265)|IU_615b78dd-e036-4cf9-afba-b591ade6f3a2,MD_12,PU_b355616d-14ae-4343-a9ea-e41b92f8182c', '  1iB4Nt-ITO7-FyAx-Jlp6-aKcw-yOKO-ztqk7E|5e6ab4bf-2755-450e-bcff-c7afba631e33|a489622c-a817-4e2c-b33d-2e790279578d|-wi-a-----|1073741824|0|/dev/mapper/3600a0980383135736e3f4d70794a4f49(61273)|IU__remove_me_6f7323c5-97cc-451a-8ee4-377deef58fb6,MD_13,PU_00000000-0000-0000-0000-000000000000', '  bMOtMc-woPI-YouY-baHP-9NYd-uaWI-vOC3V0|6942127f-6c6b-472b-977d-43762e5a4749|a489622c-a817-4e2c-b33d-2e790279578d|-wi-------|1073741824|0|/dev/mapper/3600a0980383135736e3f4d70794a4f49(61257)|IU_c0601a53-6c7c-479e-905c-96df48b99d55,MD_6,PU_00000000-0000-0000-0000-000000000000', '  5fHjwJ-Ld40-ohqj-ZInE-KaBb-Ziu1-G69bI8|7186f983-182a-4650-9465-9fa8c4101e56|a489622c-a817-4e2c-b33d-2e790279578d|-wi-------|1172526071808|0|/dev/mapper/3600a0980383135736e3f4d70794a4f49(17561)|IU_c0601a53-6c7c-479e-905c-96df48b99d55,MD_7,PU_00000000-0000-0000-0000-000000000000', '  pKsJSz-CctB-8ny2-jXr7-Q3Tk-Et2I-hTnIrj|8dc31e7b-ca31-4c52-963f-0dd6627a068f|a489622c-a817-4e2c-b33d-2e790279578d|-wi-------|1172526071808|0|/dev/mapper/3600a0980383135736e3f4d70794a4f49(52505)|IU_6f7323c5-97cc-451a-8ee4-377deef58fb6,MD_11,PU_00000000-0000-0000-0000-000000000000', '  
vMl0im-5Wx2-6CvW-9b76-QbLM-MxNC-UQC1fk|9bf9b153-8510-4114-beaa-42e6fb55cfc3|a489622c-a817-4e2c-b33d-2e790279578d|-wi-------|134217728|0|/dev/mapper/3600a0980383135736e3f4d70794a4f49(39)|IU_14995816-5e7e-453f-b0ba-5c6455075446,MD_4,PU_00000000-0000-0000-0000-000000000000', '  T6ynMJ-PYrj-SaNz-n2wV-zV3A-TCI3-Ruys82|b355616d-14ae-4343-a9ea-e41b92f8182c|a489622c-a817-4e2c-b33d-2e790279578d|-wi-------|1172526071808|0|/dev/mapper/3600a0980383135736e3f4d70794a4f49(41)|IU_615b78dd-e036-4cf9-afba-b591ade6f3a2,MD_1,PU_00000000-0000-0000-0000-000000000000', '  uUUlgj-OtDb-glzS-sPK4-757D-Rod2-fPzypS|bea071e9-2876-440c-bcc3-0fe1180ab0da|a489622c-a817-4e2c-b33d-2e790279578d|-wi-------|134217728|0|/dev/mapper/3600a0980383135736e3f4d70794a4f49(40)|IU_831ed560-cff5-403f-9378-bb239011eaca,MD_5,PU_00000000-0000-0000-0000-000000000000', '  Gtv5mM-ST2b-0Dab-Xk32-FyYs-6xu1-V3FRE4|ceed0744-ee48-42c1-b6ed-1a343a5a1b71|a489622c-a817-4e2c-b33d-2e790279578d|-wi-------|1073741824|0|/dev/mapper/3600a0980383135736e3f4d70794a4f49(61241)|IU_abfa235a-66c6-4682-9b8a-f8860f00552e,MD_2,PU_00000000-0000-0000-0000-000000000000', '  Yc4NI1-svE9-oVKV-vqyO-a99z-vBh4-gm14Mq|d5413f26-89bd-49e5-aa69-583f07bc3c89|a489622c-a817-4e2c-b33d-2e790279578d|-wi-------|1172526071808|0|/dev/mapper/3600a0980383135736e3f4d70794a4f49(35033)|IU_abfa235a-66c6-4682-9b8a-f8860f00552e,MD_9,PU_00000000-0000-0000-0000-000000000000', '  wjjmw6-Pf3u-jSib-3IQ1-bwHR-aM3n-lcocCW|f15ba2c2-ca1f-42a3-a055-1ac4c6377a62|a489622c-a817-4e2c-b33d-2e790279578d|-wi-------|1172526071808|0|/dev/mapper/3600a0980383135736e3f4d70794a4f49(26297)|IU_d6860305-f1fa-45d1-8e60-e1113cffe384,MD_8,PU_00000000-0000-0000-0000-000000000000', '  SJS9iv-eWUe-A63p-jTtU-GKPT-WiTV-WWnSjz|f6c18a8a-bb3c-4000-8c83-6c951af6429d|a489622c-a817-4e2c-b33d-2e790279578d|-wi-------|1172526071808|0|/dev/mapper/3600a0980383135736e3f4d70794a4f49(43769)|IU_dd44dd57-857b-4590-953c-3a3cbd9fe07c,MD_10,PU_00000000-0000-0000-0000-000000000000', '  
gmfjsD-nXFq-CjTA-Sgh0-WPe5-lemv-ClcaXr|ids|a489622c-a817-4e2c-b33d-2e790279578d|-wi-ao----|134217728|0|/dev/mapper/3600a0980383135736e3f4d70794a4f49(29)|', '  Rkw57G-rHWI-G0kp-QrLx-wNd9-jcx5-vdp1a6|inbox|a489622c-a817-4e2c-b33d-2e790279578d|-wi-a-----|134217728|0|/dev/mapper/3600a0980383135736e3f4d70794a4f49(30)|', '  1bE5v1-poXJ-CrIu-3Rke-39tS-sHZM-sLOXuf|leases|a489622c-a817-4e2c-b33d-2e790279578d|-wi-a-----|2147483648|0|/dev/mapper/3600a0980383135736e3f4d70794a4f49(13)|', '  ODuI0b-iX7P-3lNy-FWrh-3VGW-3Yp8-3dpmCw|master|a489622c-a817-4e2c-b33d-2e790279578d|-wi-a-----|1073741824|0|/dev/mapper/3600a0980383135736e3f4d70794a4f49(31)|', '  nP42Q1-fD6m-6fJ2-eRyi-p5dn-oMyS-jqThZ0|metadata|a489622c-a817-4e2c-b33d-2e790279578d|-wi-a-----|536870912|0|/dev/mapper/3600a0980383135736e3f4d70794a4f49(0)|', '  HG0p2j-vLNI-YC7h-w8Nx-rrsS-nm3N-6Hu7HP|outbox|a489622c-a817-4e2c-b33d-2e790279578d|-wi-a-----|134217728|0|/dev/mapper/3600a0980383135736e3f4d70794a4f49(4)|', '  HjXP5G-1lHk-8xMb-bvLF-UEXa-angY-AWCrxr|xleases|a489622c-a817-4e2c-b33d-2e790279578d|-wi-a-----|1073741824|0|/dev/mapper/3600a0980383135736e3f4d70794a4f49(5)|'] err=['  Metadata on /dev/mapper/3600a09803831357370244d6d7662784f at 27269120 has wrong VG name "ent2 {', 'start_extent = 8', 'extent_count = 8', '', 'type = "striped"', 'stripe_count = 1', '', 'stripes = [', '"pv0", 52521', ']', '}', 'segment3 {', 'start_exten" expected 23b7f41e-9e5f-4213-940f-7beba85d550a.', '  Metadata on /dev/mapper/3600a09803831357370244d6d7662784f at 27269120 has wrong VG name "ent2 {', 'start_extent = 8', 'extent_count = 8', '', 'type = "striped"', 'stripe_count = 1', '', 'stripes = [', '"pv0", 52521', ']', '}', 'segment3 {', 'start_exten" expected 23b7f41e-9e5f-4213-940f-7beba85d550a.', '  Not repairing metadata for VG 23b7f41e-9e5f-4213-940f-7beba85d550a.', '  Recovery of volume group "23b7f41e-9e5f-4213-940f-7beba85d550a" failed.', '  Cannot process volume group 23b7f41e-9e5f-4213-940f-7beba85d550a'] (lvm:645)

2020-05-14 03:41:42,766+1000 WARN  (tasks/0) [storage.Volume] cannot finalize parent volume d5413f26-89bd-49e5-aa69-583f07bc3c89 (blockVolume:617)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/blockVolume.py", line 613, in delete
    pvol.recheckIfLeaf()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line 1471, in recheckIfLeaf
    return self._manifest.recheckIfLeaf()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line 363, in recheckIfLeaf
    childrenNum = len(self.getChildren())
  File "/usr/lib/python2.7/site-packages/vdsm/storage/blockVolume.py", line 180, in getChildren
    "%s%s" % (sc.TAG_PREFIX_PARENT, self.volUUID))
  File "/usr/lib/python2.7/site-packages/vdsm/storage/lvm.py", line 1811, in lvsByTag
    return [lv for lv in getLV(vgName) if tag in lv.tags]
  File "/usr/lib/python2.7/site-packages/vdsm/storage/lvm.py", line 1215, in getLV
    raise se.LogicalVolumeDoesNotExistError("%s/%s" % (vgName, lvName))
LogicalVolumeDoesNotExistError: Logical volume does not exist: ('a489622c-a817-4e2c-b33d-2e790279578d/None',)

In the example above this leaves the parent marked INTERNAL, so the image is left broken with no LEAF.

Again, other LVM operations were running in parallel, quite close to the failure, for example:
2020-05-14 03:41:42,407+1000 INFO  (tasks/3) [storage.LVM] Removing LVs (vg=a489622c-a817-4e2c-b33d-2e790279578d, lvs=[u'5e6ab4bf-2755-450e-bcff-c7afba631e33'], read_only=None) (lvm:1492)

Version-Release number of selected component (if applicable):
rhvh-4.3.9.2-0.20200324.0
  vdsm-4.30.44-1.el7ev.x86_64
  lvm2-2.02.186-7.el7_8.1.x86_64
  kernel-3.10.0-1127.el7.x86_64
  
How reproducible:
Unknown yet, but we will attempt to reproduce soon. The idea is:
* Use the latest version with the fix for BZ1553133
* Run snapshot operations on a VM with several disks (e.g. 10+) on block storage
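The upstream stress tests added for this bug (gerrit 109342 and its follow-ups) exercise exactly this pattern: metadata writers racing with reloaders. A storage-free sketch of the same shape, using an in-memory list in place of VG metadata (hypothetical harness, not the actual test code):

```python
import threading

# In-memory stand-in for VG metadata; meta_lock plays the role of
# LVM's VG lock (locking_type=1).
metadata = {"tags": []}
meta_lock = threading.Lock()
failures = 0

def writer(n):
    # One "SPM task" adding tags, like the Change LV tags flow above.
    for i in range(n):
        with meta_lock:
            metadata["tags"].append("MD_%d" % i)

def reloader(n):
    # Concurrent "lvs reload": snapshot the metadata and verify it is
    # internally consistent; a torn read would count as a failure.
    global failures
    for _ in range(n):
        with meta_lock:
            tags = list(metadata["tags"])
        if tags != sorted(tags, key=lambda t: int(t.split("_")[1])):
            failures += 1

threads = [threading.Thread(target=writer, args=(100,))]
threads += [threading.Thread(target=reloader, args=(100,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With proper locking the failure count stays zero; the bug report shows that on real storage the external lvs processes were not so protected.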

(Originally by Germano Veit Michel)

Comment 5 RHV bug bot 2020-06-22 11:03:54 UTC
Customer reproduced with VDSM debug logs:

After a snapshot merge (VM with 13 disks), 2 disks failed to adjust the metadata of the remaining volume due to a failed lvs reload.

1. Request to delete previous leaf:

2020-05-20 12:48:01,827+1000 INFO  (jsonrpc/5) [vdsm.api] START deleteVolume(sdUUID=u'23b7f41e-9e5f-4213-940f-7beba85d550a', spUUID=u'a0785083-a04e-4635-a4e3-feed14aae231', imgUUID=u'7dc1f9a5-3221-4465-bc8b-5c3cfc766acb', volumes=[u'81224d51-feba-49f2-9362-331383b44626'], postZero=u'false', force=u'false', discard=False) from=::ffff:<REMOVED>,44378, flow_id=372380d1-25b6-4a01-b5a6-e35adb4c1a67, task_id=eeadd3bd-18e6-429b-aef2-bc4ef2ad8b6e (api:48)

2. Get to the stage of replacing LV tag, which works:

2020-05-20 12:48:06,354+1000 INFO  (tasks/4) [storage.LVM] Replacing LV tag (vg=23b7f41e-9e5f-4213-940f-7beba85d550a, lv=81224d51-feba-49f2-9362-331383b44626, deltag=PU_e753c9a8-e418-4426-93ec-92a273102913, addtag=PU_00000000-0000-0000-0000-000000000000) (lvm:1824)

2020-05-20 12:48:06,354+1000 DEBUG (tasks/4) [storage.Misc.excCmd] /usr/bin/taskset --cpu-list 0-47 /usr/bin/sudo -n /sbin/lvm lvchange --config 'devices {  preferred_names=["^/dev/mapper/"]  ignore_suspended_devices=1  write_cache_state=0  disable_after_error_count=3  filter=["a|^/dev/mapper/3600a098038304272725d4b3147746b2d$|^/dev/mapper/3600a0980383135736e3f4d70794a4f42$|^/dev/mapper/3600a0980383135736e3f4d70794a4f43$|^/dev/mapper/3600a0980383135736e3f4d70794a4f44$|^/dev/mapper/3600a0980383135736e3f4d70794a4f45$|^/dev/mapper/3600a0980383135736e3f4d70794a4f46$|^/dev/mapper/3600a0980383135736e3f4d70794a4f47$|^/dev/mapper/3600a0980383135736e3f4d70794a4f48$|^/dev/mapper/3600a0980383135736e3f4d70794a4f49$|^/dev/mapper/3600a0980383135736e3f4d70794a4f4a$|^/dev/mapper/3600a0980383135736e3f4d70794a4f4b$|^/dev/mapper/3600a0980383135736e3f4d70794a4f4c$|^/dev/mapper/3600a0980383135736e3f4d70794a502f$|^/dev/mapper/3600a0980383135736e3f4d70794a5059$|^/dev/mapper/3600a0980383135736e3f4d70794a505a$|^/dev/mapper/3600a0980383135736e3f4d70794a5061$|^/dev/mapper/3600a0980383135736e3f4d70794a5062$|^/dev/mapper/3600a09803831357370244d6d76627842$|^/dev/mapper/3600a09803831357370244d6d76627843$|^/dev/mapper/3600a09803831357370244d6d76627844$|^/dev/mapper/3600a09803831357370244d6d76627845$|^/dev/mapper/3600a09803831357370244d6d76627846$|^/dev/mapper/3600a09803831357370244d6d76627847$|^/dev/mapper/3600a09803831357370244d6d76627848$|^/dev/mapper/3600a09803831357370244d6d76627849$|^/dev/mapper/3600a09803831357370244d6d7662784a$|^/dev/mapper/3600a09803831357370244d6d7662784b$|^/dev/mapper/3600a09803831357370244d6d7662784c$|^/dev/mapper/3600a09803831357370244d6d7662784d$|^/dev/mapper/3600a09803831357370244d6d7662784e$|^/dev/mapper/3600a09803831357370244d6d7662784f$|^/dev/mapper/3600a09803831357370244d6d76627850$|^/dev/mapper/3600a09803831357370244d6d76627851$|^/dev/mapper/3600a09803831357370244d6d76627852$|^/dev/mapper/3600a09803831357370244d6d7662792f$|^/dev/mapper/3600a09803831357370244d6d7
662794e$|^/dev/mapper/3600a09803831357370244d6d7662794f$|^/dev/mapper/3600a09803831357370244d6d76627959$|^/dev/mapper/3600a09803831357370244d6d7662795a$|^/dev/mapper/3600a09803831357370244d6d76627961$|^/dev/mapper/3600a09803831357370244d6d76627962$|^/dev/mapper/3618e728372d2c3d0219cf9300ad72509$|", "r|.*|"] } global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 } backup {  retain_min=50  retain_days=0 }' --autobackup n --deltag PU_e753c9a8-e418-4426-93ec-92a273102913 --addtag PU_00000000-0000-0000-0000-000000000000 23b7f41e-9e5f-4213-940f-7beba85d550a/81224d51-feba-49f2-9362-331383b44626 (cwd None) (commands:198)

2020-05-20 12:48:06,733+1000 DEBUG (tasks/4) [storage.Misc.excCmd] SUCCESS: <err> = '  WARNING: This metadata update is NOT backed up.\n'; <rc> = 0 (commands:219)

3. But then reload after the tag change fails:

2020-05-20 12:48:06,733+1000 DEBUG (tasks/4) [storage.Misc.excCmd] /usr/bin/taskset --cpu-list 0-47 /usr/bin/sudo -n /sbin/lvm lvs --config 'devices {  preferred_names=["^/dev/mapper/"]  ignore_suspended_devices=1  write_cache_state=0  disable_after_error_count=3  filter=["a|^/dev/mapper/3600a098038304272725d4b3147746b2d$|^/dev/mapper/3600a0980383135736e3f4d70794a4f42$|^/dev/mapper/3600a0980383135736e3f4d70794a4f43$|^/dev/mapper/3600a0980383135736e3f4d70794a4f44$|^/dev/mapper/3600a0980383135736e3f4d70794a4f45$|^/dev/mapper/3600a0980383135736e3f4d70794a4f46$|^/dev/mapper/3600a0980383135736e3f4d70794a4f47$|^/dev/mapper/3600a0980383135736e3f4d70794a4f48$|^/dev/mapper/3600a0980383135736e3f4d70794a4f49$|^/dev/mapper/3600a0980383135736e3f4d70794a4f4a$|^/dev/mapper/3600a0980383135736e3f4d70794a4f4b$|^/dev/mapper/3600a0980383135736e3f4d70794a4f4c$|^/dev/mapper/3600a0980383135736e3f4d70794a502f$|^/dev/mapper/3600a0980383135736e3f4d70794a5059$|^/dev/mapper/3600a0980383135736e3f4d70794a505a$|^/dev/mapper/3600a0980383135736e3f4d70794a5061$|^/dev/mapper/3600a0980383135736e3f4d70794a5062$|^/dev/mapper/3600a09803831357370244d6d76627842$|^/dev/mapper/3600a09803831357370244d6d76627843$|^/dev/mapper/3600a09803831357370244d6d76627844$|^/dev/mapper/3600a09803831357370244d6d76627845$|^/dev/mapper/3600a09803831357370244d6d76627846$|^/dev/mapper/3600a09803831357370244d6d76627847$|^/dev/mapper/3600a09803831357370244d6d76627848$|^/dev/mapper/3600a09803831357370244d6d76627849$|^/dev/mapper/3600a09803831357370244d6d7662784a$|^/dev/mapper/3600a09803831357370244d6d7662784b$|^/dev/mapper/3600a09803831357370244d6d7662784c$|^/dev/mapper/3600a09803831357370244d6d7662784d$|^/dev/mapper/3600a09803831357370244d6d7662784e$|^/dev/mapper/3600a09803831357370244d6d7662784f$|^/dev/mapper/3600a09803831357370244d6d76627850$|^/dev/mapper/3600a09803831357370244d6d76627851$|^/dev/mapper/3600a09803831357370244d6d76627852$|^/dev/mapper/3600a09803831357370244d6d7662792f$|^/dev/mapper/3600a09803831357370244d6d766279
4e$|^/dev/mapper/3600a09803831357370244d6d7662794f$|^/dev/mapper/3600a09803831357370244d6d76627959$|^/dev/mapper/3600a09803831357370244d6d7662795a$|^/dev/mapper/3600a09803831357370244d6d76627961$|^/dev/mapper/3600a09803831357370244d6d76627962$|^/dev/mapper/3618e728372d2c3d0219cf9300ad72509$|", "r|.*|"] } global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 } backup {  retain_min=50  retain_days=0 }' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,vg_name,attr,size,seg_start_pe,devices,tags --select 'vg_name = 23b7f41e-9e5f-4213-940f-7beba85d550a' (cwd None) (commands:198)

2020-05-20 12:48:07,553+1000 DEBUG (tasks/4) [storage.Misc.excCmd] FAILED: <err> = '  Metadata on /dev/mapper/3600a09803831357370244d6d7662784f at 30543872 has wrong VG name "triped"\nstripe_count = 1\n\nstripes = [\n"pv0", 60545\n]\n}\nsegment142 {\nstart_extent = 1736\nextent_count = 16\n\ntype = "striped"\nstri" expected 23b7f41e-9e5f-4213-940f-7beba85d550a.\n  Metadata on /dev/mapper/3600a09803831357370244d6d7662784f at 30543872 has wrong VG name "triped"\nstripe_count = 1\n\nstripes = [\n"pv0", 60545\n]\n}\nsegment142 {\nstart_extent = 1736\nextent_count = 16\n\ntype = "striped"\nstri" expected 23b7f41e-9e5f-4213-940f-7beba85d550a.\n  Not repairing metadata for VG 23b7f41e-9e5f-4213-940f-7beba85d550a.\n  Recovery of volume group "23b7f41e-9e5f-4213-940f-7beba85d550a" failed.\n  Cannot process volume group 23b7f41e-9e5f-4213-940f-7beba85d550a\n'; <rc> = 5 (commands:219)
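The 4.3 fix (gerrit 109551, 109814, 109815) removed --select from the reload commands, naming the VG positionally instead, so that lvs reads only the named VG's metadata rather than scanning (and attempting to repair) every visible VG, as the errors above show it doing. A sketch of the two command shapes (hypothetical builder, not vdsm's code, and omitting the long --config argument):

```python
def build_lvs_cmd(vg_name, use_select=False):
    """Build the lvs reload argv (vdsm also prepends taskset/sudo
    and a --config clause, omitted here)."""
    cmd = ["lvm", "lvs", "--noheadings", "--units", "b", "--nosuffix",
           "--separator", "|", "--ignoreskippedcluster",
           "-o", "uuid,name,vg_name,attr,size,seg_start_pe,devices,tags"]
    if use_select:
        # Old form: lvm scans every visible VG, then filters.
        cmd += ["--select", "vg_name = %s" % vg_name]
    else:
        # Fixed form: lvm opens only the named VG's metadata.
        cmd.append(vg_name)
    return cmd
```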

---------------------------------------------------

At the same time we are doing the exact same thing for another disk:

2020-05-20 12:48:01,515+1000 INFO  (jsonrpc/6) [vdsm.api] START deleteVolume(sdUUID=u'23b7f41e-9e5f-4213-940f-7beba85d550a', spUUID=u'a0785083-a04e-4635-a4e3-feed14aae231', imgUUID=u'dad7a2b9-04c6-4478-9e82-7a59e1d55a6b', volumes=[u'94ce6d60-964e-41e1-b34d-802396f4f51f'], postZero=u'false', force=u'false', discard=False) from=::ffff:<REMOVED>,44378, flow_id=372380d1-25b6-4a01-b5a6-e35adb4c1a67, task_id=9379bd34-81a3-4f5d-9410-1cff58242ef5 (api:48)

This reload failed:

2020-05-20 12:48:06,913+1000 DEBUG (tasks/6) [storage.Misc.excCmd] /usr/bin/taskset --cpu-list 0-47 /usr/bin/sudo -n /sbin/lvm lvs --config 'devices {  preferred_names=["^/dev/mapper/"]  ignore_suspended_devices=1  write_cache_state=0  disable_after_error_count=3  filter=["a|^/dev/mapper/3600a098038304272725d4b3147746b2d$|^/dev/mapper/3600a0980383135736e3f4d70794a4f42$|^/dev/mapper/3600a0980383135736e3f4d70794a4f43$|^/dev/mapper/3600a0980383135736e3f4d70794a4f44$|^/dev/mapper/3600a0980383135736e3f4d70794a4f45$|^/dev/mapper/3600a0980383135736e3f4d70794a4f46$|^/dev/mapper/3600a0980383135736e3f4d70794a4f47$|^/dev/mapper/3600a0980383135736e3f4d70794a4f48$|^/dev/mapper/3600a0980383135736e3f4d70794a4f49$|^/dev/mapper/3600a0980383135736e3f4d70794a4f4a$|^/dev/mapper/3600a0980383135736e3f4d70794a4f4b$|^/dev/mapper/3600a0980383135736e3f4d70794a4f4c$|^/dev/mapper/3600a0980383135736e3f4d70794a502f$|^/dev/mapper/3600a0980383135736e3f4d70794a5059$|^/dev/mapper/3600a0980383135736e3f4d70794a505a$|^/dev/mapper/3600a0980383135736e3f4d70794a5061$|^/dev/mapper/3600a0980383135736e3f4d70794a5062$|^/dev/mapper/3600a09803831357370244d6d76627842$|^/dev/mapper/3600a09803831357370244d6d76627843$|^/dev/mapper/3600a09803831357370244d6d76627844$|^/dev/mapper/3600a09803831357370244d6d76627845$|^/dev/mapper/3600a09803831357370244d6d76627846$|^/dev/mapper/3600a09803831357370244d6d76627847$|^/dev/mapper/3600a09803831357370244d6d76627848$|^/dev/mapper/3600a09803831357370244d6d76627849$|^/dev/mapper/3600a09803831357370244d6d7662784a$|^/dev/mapper/3600a09803831357370244d6d7662784b$|^/dev/mapper/3600a09803831357370244d6d7662784c$|^/dev/mapper/3600a09803831357370244d6d7662784d$|^/dev/mapper/3600a09803831357370244d6d7662784e$|^/dev/mapper/3600a09803831357370244d6d7662784f$|^/dev/mapper/3600a09803831357370244d6d76627850$|^/dev/mapper/3600a09803831357370244d6d76627851$|^/dev/mapper/3600a09803831357370244d6d76627852$|^/dev/mapper/3600a09803831357370244d6d7662792f$|^/dev/mapper/3600a09803831357370244d6d766279
4e$|^/dev/mapper/3600a09803831357370244d6d7662794f$|^/dev/mapper/3600a09803831357370244d6d76627959$|^/dev/mapper/3600a09803831357370244d6d7662795a$|^/dev/mapper/3600a09803831357370244d6d76627961$|^/dev/mapper/3600a09803831357370244d6d76627962$|^/dev/mapper/3618e728372d2c3d0219cf9300ad72509$|", "r|.*|"] } global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 } backup {  retain_min=50  retain_days=0 }' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,vg_name,attr,size,seg_start_pe,devices,tags --select 'vg_name = 23b7f41e-9e5f-4213-940f-7beba85d550a' (cwd None) (commands:198)

2020-05-20 12:48:07,747+1000 DEBUG (tasks/6) [storage.Misc.excCmd] FAILED: <err> = '  Metadata on /dev/mapper/3600a09803831357370244d6d7662784f at 30543872 has wrong VG name "triped"\nstripe_count = 1\n\nstripes = [\n"pv0", 60545\n]\n}\nsegment142 {\nstart_extent = 1736\nextent_count = 16\n\ntype = "striped"\nstri" expected 23b7f41e-9e5f-4213-940f-7beba85d550a.\n  Metadata on /dev/mapper/3600a09803831357370244d6d7662784f at 30543872 has wrong VG name "triped"\nstripe_count = 1\n\nstripes = [\n"pv0", 60545\n]\n}\nsegment142 {\nstart_extent = 1736\nextent_count = 16\n\ntype = "striped"\nstri" expected 23b7f41e-9e5f-4213-940f-7beba85d550a.\n  Not repairing metadata for VG 23b7f41e-9e5f-4213-940f-7beba85d550a.\n  Recovery of volume group "23b7f41e-9e5f-4213-940f-7beba85d550a" failed.\n  Cannot process volume group 23b7f41e-9e5f-4213-940f-7beba85d550a\n'; <rc> = 5 (commands:219)

If we extract just the LVM commands from the SPM logs, we see this:

Note: the 2 disks above are tasks/4 and tasks/6; the FAILED lvs reloads are marked below.

12:48:06,469+1000 (tasks/8) /usr/bin/taskset --cpu-list 0-47 /usr/bin/sudo -n /sbin/lvm lvremove
12:48:06,537+1000 (tasks/0) /usr/bin/taskset --cpu-list 0-47 /usr/bin/sudo -n /sbin/lvm lvchange
12:48:06,561+1000 (tasks/9) /usr/bin/taskset --cpu-list 0-47 /usr/bin/sudo -n /sbin/lvm lvs
12:48:06,569+1000 (tasks/3) SUCCESS: <err> = ' WARNING: This metadata
12:48:06,571+1000 (tasks/3) /usr/bin/taskset --cpu-list 0-47 /usr/bin/sudo -n /sbin/lvm lvremove
12:48:06,683+1000 (tasks/1) /usr/bin/taskset --cpu-list 0-47 /usr/bin/sudo -n /sbin/lvm lvs
12:48:06,715+1000 (tasks/5) /usr/bin/taskset --cpu-list 0-47 /usr/bin/sudo -n /sbin/lvm lvs
12:48:06,733+1000 (tasks/4) SUCCESS: <err> = ' WARNING: This metadata
12:48:06,733+1000 (tasks/4) /usr/bin/taskset --cpu-list 0-47 /usr/bin/sudo -n /sbin/lvm lvs               <---- task/4 reload
12:48:06,773+1000 (tasks/2) SUCCESS: <err> = ''; <rc> = 0
12:48:06,784+1000 (tasks/6) SUCCESS: <err> = ''; <rc> = 0
12:48:06,842+1000 (tasks/8) SUCCESS: <err> = ' WARNING: This metadata
12:48:06,886+1000 (tasks/2) /usr/bin/taskset --cpu-list 0-47 /usr/bin/sudo -n /sbin/lvm lvchange
12:48:06,913+1000 (tasks/6) /usr/bin/taskset --cpu-list 0-47 /usr/bin/sudo -n /sbin/lvm lvs               <---- task/6 reload
12:48:07,186+1000 (tasks/3) SUCCESS: <err> = ' WARNING: This metadata
12:48:07,300+1000 (tasks/0) SUCCESS: <err> = ' WARNING: This metadata
12:48:07,301+1000 (tasks/0) /usr/bin/taskset --cpu-list 0-47 /usr/bin/sudo -n /sbin/lvm lvremove
12:48:07,406+1000 (tasks/9) SUCCESS: <err> = ''; <rc> = 0
12:48:07,417+1000 (tasks/1) SUCCESS: <err> = ''; <rc> = 0
12:48:07,438+1000 (tasks/5) SUCCESS: <err> = ''; <rc> = 0
12:48:07,472+1000 (tasks/9) /usr/bin/taskset --cpu-list 0-47 /usr/bin/sudo -n /sbin/lvm lvchange
12:48:07,521+1000 (tasks/5) /usr/bin/taskset --cpu-list 0-47 /usr/bin/sudo -n /sbin/lvm lvs
12:48:07,541+1000 (tasks/1) /usr/bin/taskset --cpu-list 0-47 /usr/bin/sudo -n /sbin/lvm lvchange
12:48:07,553+1000 (tasks/4) FAILED: <err> = ' Metadata on /dev/mapper/3600a09803831357370244d6d7662784f   <---- task/4 reload fails
12:48:07,633+1000 (tasks/2) SUCCESS: <err> = ' WARNING: This metadata
12:48:07,638+1000 (tasks/2) /usr/bin/taskset --cpu-list 0-47 /usr/bin/sudo -n /sbin/lvm lvremove
12:48:07,651+1000 (tasks/4) /usr/bin/taskset --cpu-list 0-47 /usr/bin/sudo -n /sbin/lvm lvchange
12:48:07,705+1000 (tasks/0) SUCCESS: <err> = ' WARNING: This metadata
12:48:07,747+1000 (tasks/6) FAILED: <err> = ' Metadata on /dev/mapper/3600a09803831357370244d6d7662784f   <---- task/6 reload fails
12:48:07,777+1000 (tasks/6) /usr/bin/taskset --cpu-list 0-47 /usr/bin/sudo -n /sbin/lvm lvchange
12:48:08,086+1000 (tasks/9) SUCCESS: <err> = ' WARNING: This metadata
12:48:08,087+1000 (tasks/9) /usr/bin/taskset --cpu-list 0-47 /usr/bin/sudo -n /sbin/lvm lvs
12:48:08,158+1000 (tasks/1) SUCCESS: <err> = ' Scan of VG
12:48:08,159+1000 (tasks/1) /usr/bin/taskset --cpu-list 0-47 /usr/bin/sudo -n /sbin/lvm lvs
12:48:08,197+1000 (tasks/2) SUCCESS: <err> = ' WARNING: This metadata
12:48:08,359+1000 (tasks/5) SUCCESS: <err> = ''; <rc> = 0

Given that it seems to be reading garbage during lvs, it looks like a read racing with something else that is updating the metadata, but I don't see clear signs that locking_type 1 is not being respected.
I even tried to loop lvcreate and lvs, but it did not reproduce.

As a "bonus", SD monitoring also fails sometimes with the same problem:

2020-05-20 12:46:18,430+1000 ERROR (monitor/23b7f41) [storage.LVM] Reloading VGs failed vgs=['23b7f41e-9e5f-4213-940f-7beba85d550a'] rc=5 out=[] err=['  Metadata on /dev/mapper/3600a09803831357370244d6d7662784f at 30412288 has wrong VG name "egment68 {', 'start_extent = 672', 'extent_count = 8', '', 'type = "striped"', 'stripe_count = 1', '', 'stripes = [', '"pv0", 56481', ']', '}', 'segment69 {', 'star" expected 23b7f41e-9e5f-4213-940f-7beba85d550a.', '  Metadata on /dev/mapper/3600a09803831357370244d6d7662784f at 30412288 has wrong VG name "egment68 {', 'start_extent = 672', 'extent_count = 8', '', 'type = "striped"', 'stripe_count = 1', '', 'stripes = [', '"pv0", 56481', ']', '}', 'segment69 {', 'star" expected 23b7f41e-9e5f-4213-940f-7beba85d550a.', '  Not repairing metadata for VG 23b7f41e-9e5f-4213-940f-7beba85d550a.', '  Recovery of volume group "23b7f41e-9e5f-4213-940f-7beba85d550a" failed.', '  Cannot process volume group 23b7f41e-9e5f-4213-940f-7beba85d550a'] (lvm:576)

(Originally by Germano Veit Michel)

Comment 7 RHV bug bot 2020-06-22 11:04:04 UTC
I was trying to reproduce this in my lab (currently on RHV 4.4); I used lvm2-2.03.08-3.el8.x86_64 on RHEL 8.2.
I think I might have reproduced something similar...

It seems to happen if we do lvchange with activation + a change of tags (like VDSM does), which also produces this warning:
~~~
  WARNING: Combining activation change with other commands is not advised.
~~~

Preparation:
dd if=/dev/zero of=/root/diskimage bs=1M count=1000
losetup /dev/loop0 /root/diskimage
pvcreate /dev/loop0
vgcreate myvg /dev/loop0

In one terminal, run this:
~~~
# cat create_change_activate.sh 
#!/bin/sh
i="0"

while [ $i -lt 200 ]; do 
  lvcreate --config 'global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 }' -L 4M myvg
  i=$[$i+1]
done

i="0"
while [ $i -lt 200 ]; do 
  lvchange --config 'global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 }' --autobackup n -a y --addtag AA myvg/lvol$i &
  i=$[$i+1]
done

i="0"
while [ $i -lt 200 ]; do 
  lvchange --config 'global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 }' --autobackup n -a y --addtag BB --deltag AA myvg/lvol$i &
  i=$[$i+1]
done

i="0"
while [ $i -lt 200 ]; do 
  lvchange --config 'global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 }' --autobackup n -a y --deltag BB myvg/lvol$i &
  i=$[$i+1]
done
~~~

In another, run this:
~~~
# cat read.sh 
#!/bin/sh
while true; do 
  lvs --config 'global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 }' --select 'vg_name = myvg'
done
~~~

Upon a few repetitions, I got a few of these:
~~~
  /dev/loop0: Checksum error at offset 280064
  WARNING: invalid metadata text from /dev/loop0 at 280064.
  WARNING: metadata on /dev/loop0 at 280064 has invalid summary for VG.
  WARNING: bad metadata text on /dev/loop0 in mda1
  WARNING: scanning /dev/loop0 mda1 failed to read metadata summary.
  WARNING: repair VG metadata on /dev/loop0 with vgck --updatemetadata.
  WARNING: scan failed to get metadata summary from /dev/loop0 PVID tcqQQGeqIlNCW9wXdph0MaigovnXHwUR
~~~

And some of these at other times, which seem similar to what VDSM gets (unable to see the VG):
~~~
 Volume group "myvg" not found.
 Cannot process volume group myvg
~~~

I assume this means locking_type=1 is not working well when combining activation with other changes?

(Originally by Germano Veit Michel)

Comment 8 RHV bug bot 2020-06-22 11:04:06 UTC
(In reply to Germano Veit Michel from comment #7)
> I assume this means the locking_type=1 is not working well when combining
> activation and changes?

David, is this expected behavior in lvm2 2.02? We know that combining activation
with other changes is not advised (and we fixed this in 4.4), but can it really
cause corruption of vg metadata when lvm runs on one host with locking_type=1?

This looks very bad, and affects every LVM user. This is not the special RHV
setup with shared storage accessed by multiple hosts.

(Originally by Nir Soffer)

Comment 9 RHV bug bot 2020-06-22 11:04:10 UTC
(In reply to Germano Veit Michel from comment #7)
> I was trying to reproducing this on my labs (currently on RHV 4.4), I used
> lvm2-2.03.08-3.el8.x86_64 on RHEL 8.2.
> I think I might have reproduced something similar...
> 
> It seems to happen if we do lvchange with activation + a change of tags
> (like VDSM does), which also produces this warning:
> ~~~
>   WARNING: Combining activation change with other commands is not advised.
> ~~~

Vdsm is not mixing activation with other commands in 4.4.


>   lvcreate --config 'global { locking_type=1 prioritise_write_locks=1
> wait_for_locks=1 use_lvmetad=0 }' -L 4M myvg

This is not the configuration used by vdsm on RHEL 8. You are missing many
settings, so any issue you had is not relevant to RHV.


> while [ $i -lt 200 ]; do 
>   lvchange --config 'global { locking_type=1 prioritise_write_locks=1
> wait_for_locks=1 use_lvmetad=0 }' --autobackup n -a y --addtag AA
> myvg/lvol$i &
>   i=$[$i+1]
> done

This starts 200 concurrent commands...

> i="0"
> while [ $i -lt 200 ]; do 
>   lvchange --config 'global { locking_type=1 prioritise_write_locks=1
> wait_for_locks=1 use_lvmetad=0 }' --autobackup n -a y --addtag BB --deltag
> AA myvg/lvol$i &
>   i=$[$i+1]
> done

This starts another 200 concurrent commands before the previous ones have finished.

It is possible that the commands will run in a different order, so a command may
try to remove a tag that has not been added yet.

> i="0"
> while [ $i -lt 200 ]; do 
>   lvchange --config 'global { locking_type=1 prioritise_write_locks=1
> wait_for_locks=1 use_lvmetad=0 }' --autobackup n -a y --deltag BB
> myvg/lvol$i &
>   i=$[$i+1]
> done

You start another 200 concurrent commands with the same unpredictable ordering.

So you run up to 800 concurrent lvm commands - on an RHV host vdsm runs at most
10 concurrent lvm commands, so this scale is not relevant to RHV.

> Upon a few repetitions, I got a few of these:
> ~~~
>   /dev/loop0: Checksum error at offset 280064
>   WARNING: invalid metadata text from /dev/loop0 at 280064.
>   WARNING: metadata on /dev/loop0 at 280064 has invalid summary for VG.
>   WARNING: bad metadata text on /dev/loop0 in mda1
>   WARNING: scanning /dev/loop0 mda1 failed to read metadata summary.
>   WARNING: repair VG metadata on /dev/loop0 with vgck --updatemetadata.
>   WARNING: scan failed to get metadata summary from /dev/loop0 PVID
> tcqQQGeqIlNCW9wXdph0MaigovnXHwUR

I think we need an LVM bug for this - even if LVM is not used with the best settings,
it should never corrupt its own metadata when running on a single host.

(Originally by Nir Soffer)

Comment 10 RHV bug bot 2020-06-22 11:04:14 UTC
Created attachment 1690350 [details]
reproduce script for 4.4. on rhel 8

I created an improved reproducer for RHEL 8 that:
- does not mix activation with other commands (this is vdsm behavior in 4.4)
- uses the same pv and vg configuration as vdsm (128m extent size, 2 metadata areas)
- uses the same config as vdsm
- runs 100 concurrent lvm commands instead of 800
- runs lvm commands in the correct order so they cannot fail by trying to modify
  something that was not created yet
- tests a vg with 1000 lvs instead of 200
- needs no separate shell with an lvs command; lvs commands are run from
  the child processes

To use this reproducer, use a 1 TiB device, for example:

    $ truncate -s 1t backing
    $ losetup /dev/loop0 backing
    $ rep-4.4.sh /dev/loop0 > log 2>&1

We need to create a similar version for 4.3 that:
- mixes activation and other commands (the fix was not backported to 4.3)
- uses lvm2 2.02 options (e.g. removes hints=none)
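For illustration, a 4.3-style step might look like this (a sketch, not vdsm's actual command line: the --config is abbreviated, the VG/LV names and AA/BB tags are placeholders, and the block is guarded so it is a no-op without lvm and root):

```shell
# Sketch (assumption): 4.3 combines activation (-a y) with tag changes in one
# lvchange, which the 4.4 reproducer deliberately avoids. Placeholder names.
if command -v lvm >/dev/null 2>&1 && [ "$(id -u)" -eq 0 ]; then
    lvm lvchange \
        --config 'global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 }' \
        --autobackup n -a y \
        --deltag AA --addtag BB \
        myvg/lvol0
else
    echo "SKIP: needs lvm and root"
fi
```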

(Originally by Nir Soffer)

Comment 11 RHV bug bot 2020-06-22 11:04:17 UTC
Created attachment 1690351 [details]
Log from reproducer run on Fedora 31

I ran the reproducer scripts on Fedora 31 with 
lvm2-2.03.09-1.fc31.x86_64

And could not reproduce the issue after more than an hour.

We need to run the same reproducer on RHEL 8.2 to be sure we don't
have an issue on 4.4.

(Originally by Nir Soffer)

Comment 12 RHV bug bot 2020-06-22 11:04:20 UTC
>   /dev/loop0: Checksum error at offset 280064
>   WARNING: invalid metadata text from /dev/loop0 at 280064.
>   WARNING: metadata on /dev/loop0 at 280064 has invalid summary for VG.
>   WARNING: bad metadata text on /dev/loop0 in mda1
>   WARNING: scanning /dev/loop0 mda1 failed to read metadata summary.
>   WARNING: repair VG metadata on /dev/loop0 with vgck --updatemetadata.
>   WARNING: scan failed to get metadata summary from /dev/loop0 PVID
> tcqQQGeqIlNCW9wXdph0MaigovnXHwUR

After this appears, could you stop all the commands and run 'lvs'?  I suspect it would report everything without any errors.  I believe the quoted issue may be a transient error, caused by a scanning optimization in which we sometimes allow unlocked scan data to be used for reporting.  A lot of concurrent commands is exactly the point at which this optimization can lead to this kind of issue.  The intent is that the optimization can report data from either just before or just after a concurrent command that's modifying metadata, in exchange for reducing the number of disk reads by half. 

This change (Nov 2019) made this optimization more accurate/strict and may resolve the problem if it's what I suspect:
https://sourceware.org/git/?p=lvm2.git;a=commit;h=0c1316cda876849d5d1375d40e8cdc08db37c2b5
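Concretely, the check suggested above might look like this (a sketch; "myvg" is a placeholder VG name, and the block is guarded so it does nothing without lvm and root):

```shell
# 1. Stop all concurrent lvm commands first.
# 2. With no writers active, a plain report should come back clean (rc=0,
#    no checksum/summary warnings) if the errors were transient scan noise.
# 3. Only if the warnings persist is the on-disk metadata actually bad; the
#    repair lvm's own warning suggests is vgck --updatemetadata.
if command -v lvm >/dev/null 2>&1 && [ "$(id -u)" -eq 0 ]; then
    lvm lvs --select 'vg_name = myvg' || lvm vgck --updatemetadata myvg
else
    echo "SKIP: needs lvm and root"
fi
```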

(Originally by David Teigland)

Comment 13 RHV bug bot 2020-06-22 11:04:23 UTC
David, what about the issue mentioned in comment 0?
This looks like real data corruption on a version that uses locking_type=4.

(Originally by Nir Soffer)

Comment 14 RHV bug bot 2020-06-22 11:04:26 UTC
Germano, is it possible that this DC has some old hosts running RHV < 4.3.9
that do not include the fix?

Can you collect output of "rpm -q lvm2 vdsm" from all hosts in this setup?

(Originally by Nir Soffer)

Comment 15 RHV bug bot 2020-06-22 11:04:30 UTC
(In reply to Nir Soffer from comment #13)
> David, what about the issue mentioned in comment 0?
> This looks like real data corruption on a version that use locking_type=4.

Very hard to see the exact lvm command+output so I might have missed the relevant bit, but I'm just seeing the known "Recovery failed" output which is what we want from locking_type 4.

(Originally by David Teigland)

Comment 16 RHV bug bot 2020-06-22 11:04:33 UTC
(In reply to Nir Soffer from comment #13)
> David, what about the issue mentioned in comment 0?
> This looks like real data corruption on a version that use locking_type=4.

Very hard to see the exact lvm command+output so I might have missed the relevant bit, but I'm just seeing the known "Recovery failed" output which is what we want from locking_type 4.

(Originally by David Teigland)

Comment 17 RHV bug bot 2020-06-22 11:04:35 UTC
(In reply to David Teigland from comment #16)
> (In reply to Nir Soffer from comment #13)
> Very hard to see the exact lvm command+output so I might have missed the
> relevant bit, but I'm just seeing the known "Recovery failed" output which
> is what we want from locking_type 4.

Right, we don't see the command arguments in comment 0; they are available
only in DEBUG logs, and are not logged for all errors (we need to improve logging).

But we have another reproduction with more details in comment 5:

Command:

/usr/bin/taskset --cpu-list 0-47 /usr/bin/sudo -n /sbin/lvm lvs --config 'devices {  preferred_names=["^/dev/mapper/"]  ignore_suspended_devices=1  write_cache_state=0  disable_after_error_count=3  filter=["a|^/dev/mapper/3600a098038304272725d4b3147746b2d$|^/dev/mapper/3600a0980383135736e3f4d70794a4f42$|^/dev/mapper/3600a0980383135736e3f4d70794a4f43$|^/dev/mapper/3600a0980383135736e3f4d70794a4f44$|^/dev/mapper/3600a0980383135736e3f4d70794a4f45$|^/dev/mapper/3600a0980383135736e3f4d70794a4f46$|^/dev/mapper/3600a0980383135736e3f4d70794a4f47$|^/dev/mapper/3600a0980383135736e3f4d70794a4f48$|^/dev/mapper/3600a0980383135736e3f4d70794a4f49$|^/dev/mapper/3600a0980383135736e3f4d70794a4f4a$|^/dev/mapper/3600a0980383135736e3f4d70794a4f4b$|^/dev/mapper/3600a0980383135736e3f4d70794a4f4c$|^/dev/mapper/3600a0980383135736e3f4d70794a502f$|^/dev/mapper/3600a0980383135736e3f4d70794a5059$|^/dev/mapper/3600a0980383135736e3f4d70794a505a$|^/dev/mapper/3600a0980383135736e3f4d70794a5061$|^/dev/mapper/3600a0980383135736e3f4d70794a5062$|^/dev/mapper/3600a09803831357370244d6d76627842$|^/dev/mapper/3600a09803831357370244d6d76627843$|^/dev/mapper/3600a09803831357370244d6d76627844$|^/dev/mapper/3600a09803831357370244d6d76627845$|^/dev/mapper/3600a09803831357370244d6d76627846$|^/dev/mapper/3600a09803831357370244d6d76627847$|^/dev/mapper/3600a09803831357370244d6d76627848$|^/dev/mapper/3600a09803831357370244d6d76627849$|^/dev/mapper/3600a09803831357370244d6d7662784a$|^/dev/mapper/3600a09803831357370244d6d7662784b$|^/dev/mapper/3600a09803831357370244d6d7662784c$|^/dev/mapper/3600a09803831357370244d6d7662784d$|^/dev/mapper/3600a09803831357370244d6d7662784e$|^/dev/mapper/3600a09803831357370244d6d7662784f$|^/dev/mapper/3600a09803831357370244d6d76627850$|^/dev/mapper/3600a09803831357370244d6d76627851$|^/dev/mapper/3600a09803831357370244d6d76627852$|^/dev/mapper/3600a09803831357370244d6d7662792f$|^/dev/mapper/3600a09803831357370244d6d7662794e$|^/dev/mapper/3600a09803831357370244d6d7662794f$|^/dev/mapper/36
00a09803831357370244d6d76627959$|^/dev/mapper/3600a09803831357370244d6d7662795a$|^/dev/mapper/3600a09803831357370244d6d76627961$|^/dev/mapper/3600a09803831357370244d6d76627962$|^/dev/mapper/3618e728372d2c3d0219cf9300ad72509$|", "r|.*|"] } global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 } backup {  retain_min=50  retain_days=0 }' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,vg_name,attr,size,seg_start_pe,devices,tags --select 'vg_name = 23b7f41e-9e5f-4213-940f-7beba85d550a'

exit code:

5

stderr:

  Metadata on /dev/mapper/3600a09803831357370244d6d7662784f at 30543872 has wrong VG name "triped"\nstripe_count = 1\n\nstripes = [\n"pv0", 60545\n]\n}\nsegment142 {\nstart_extent = 1736\nextent_count = 16\n\ntype = "striped"\nstri" expected 23b7f41e-9e5f-4213-940f-7beba85d550a.\n  Metadata on /dev/mapper/3600a09803831357370244d6d7662784f at 30543872 has wrong VG name "triped"\nstripe_count = 1\n\nstripes = [\n"pv0", 60545\n]\n}\nsegment142 {\nstart_extent = 1736\nextent_count = 16\n\ntype = "striped"\nstri" expected 23b7f41e-9e5f-4213-940f-7beba85d550a.\n  Not repairing metadata for VG 23b7f41e-9e5f-4213-940f-7beba85d550a.\n  Recovery of volume group "23b7f41e-9e5f-4213-940f-7beba85d550a" failed.\n  Cannot process volume group 23b7f41e-9e5f-4213-940f-7beba85d550a\n

This command ran on the SPM (part of the delete volume flow, which runs only on the SPM),
so we use locking_type=1.

Other hosts in this setup are using only locking_type=4 for all commands, except
special cases, like creating a new storage domain that no other host is accessing
yet.

Note that the previous command was lvchange - adding a tag *and* activating the lv.
In 4.4 this was changed so activation and tag changes are not mixed, but in 4.3 this
is not fixed yet.

/usr/bin/taskset --cpu-list 0-47 /usr/bin/sudo -n /sbin/lvm lvchange --config ... --autobackup n --deltag PU_e753c9a8-e418-4426-93ec-92a273102913 --addtag PU_00000000-0000-0000-0000-000000000000 23b7f41e-9e5f-4213-940f-7beba85d550a/81224d51-feba-49f2-9362-331383b44626

(trimmed --config part, see comment 5 for full command)

Is it possible that activation + tag changes will cause data corruption when running
concurrently with other commands?

(Originally by Nir Soffer)

Comment 18 RHV bug bot 2020-06-22 11:04:38 UTC
It looks like this is just an issue with lvm 2.02 then?  A combination like that in lvchange is not known to be a corruption problem, but combined commands are largely an unknown area.  Is the resulting metadata problem a persistent state on disk?  A reproducer script like those mentioned above sounds like the best way to try to nail down whether there's a problem.  Concurrent commands on the same host, both using locking, should never be able to create bad metadata.

(Originally by David Teigland)

Comment 19 RHV bug bot 2020-06-22 11:04:41 UTC
(In reply to Nir Soffer from comment #9)
> Vdsm is not mixing activation with other commands in 4.4.

Right, this makes that test kind of useless: lvm from RHEL 8 with the VDSM logic used on RHEL 7.

> This is not the configuration used by vdsm on RHEL 8. You are missing many
> settings, so any issue you had is not relevant to RHV.

I've added the ones I thought would matter, but I'm currently working on a better reproducer anyway.

> So you run up to 800 concurrent lvm command - on RHV host we can have only
> 10 concurrent lvm commands run by vdsm, so this scale is not relevant to 
> RHV.

600 actually, as the first 200 wait for return. Yes, it's at a much higher scale, to
attempt to reproduce the problem more easily.

The order is not important to me, as it will give specific LVM errors if something
executes out of order. I ignore those errors, as I only care about being unable to read
LVM metadata; failing to delete a tag that is not present I just ignore, since the
command read the current metadata correctly.

> I think we need LVM bug for this - even if LVM was used used with best
> settings,
> it should never corrupt its own metadata when running on a single host.

I forgot to mention it did not get corrupted; those were all transient errors when reading.
Same on the customer side: no corruption.

> We need to create a similar version for 4.3, that:
> - mix activation and other commands (fix was not backported to 4.3)
> - use lvm2 2.02 options (.e.g remove hints=none)

I'm currently working on a python vdsm api client that will create volumes, snapshot
a volume and delete it. This should be usable on any VDSM version, so we can use the same
script everywhere, and replace lvm packages to see if there is any difference.

(In reply to David Teigland from comment #12)
> After this appears, could you stop all the commands and run 'lvs'?  I
> suspect it would report everything without any errors.  I believe the quoted
> issue may be a transient error, caused by a scanning optimization in which
> we sometimes allow unlocked scan data to be used for reporting.  A lot of
> concurrent commands is exactly the point at which this optimization can lead
> to this kind of issue.  The intent is that the optimization can report data
> from either just before or just after a concurrent command that's modifying
> metadata, in exchange for reducing the number of disk reads by half. 
> 
> This change (Nov 2019) made this optimization more accurate/strict and may
> resolve the problem if it's what I suspect:
> https://sourceware.org/git/?p=lvm2.git;a=commit;
> h=0c1316cda876849d5d1375d40e8cdc08db37c2b5

Correct, it's a transient error. The errors above happen and go away.
It's a specific read that sometimes, randomly, fails.

(In reply to Nir Soffer from comment #14)
> Germano, is it possible that this DC has some old hosts running RHV < 4.3.9
> that do not include the fix?
> 
> Can you collect output of "rpm -q lvm2 vdsm" from all hosts in this setup?
All are the same RHVH 4.3.9 with the fix; see comment #4.

Once I finish my reproducer with the VDSM API that can be used on both 4.3 and 4.4, I'll
run again and update this BZ with the results.

(Originally by Germano Veit Michel)

Comment 20 RHV bug bot 2020-06-22 11:04:43 UTC
Germano, did you see comment 10? We have a reproducer.

You can quickly modify it for 4.3 - we need to:
- use the configuration from:
  https://github.com/oVirt/vdsm/blob/e8bc03fcad919d489954b5c711b423084e051b7e/lib/vdsm/storage/lvm.py#L126
- merge the lvchange commands to mix activation and tag changes

Using vdsm to reproduce is not a good idea; we want something the LVM developers
can try and understand easily.

(Originally by Nir Soffer)

Comment 21 RHV bug bot 2020-06-22 11:04:46 UTC
(In reply to Nir Soffer from comment #20)
> Germano, did you see comment 10? We have a reproducer.
> 
> You can modify it quickly to 4.3 - we need:
> - use the configuration from:
>  
> https://github.com/oVirt/vdsm/blob/e8bc03fcad919d489954b5c711b423084e051b7e/
> lib/vdsm/storage/lvm.py#L126
> - merge the lvchange commands to mix activation and tag changes

Yes, I saw this and have already modified it for 4.3.
It has been running here in the background on RHVH 4.3.9 for about 30 minutes, no errors so far.

> Using vdsm to reproduce is not a good idea, we want something the LVM
> developers can try and understand easily.

Good point.

(Originally by Germano Veit Michel)

Comment 23 RHV bug bot 2020-06-22 11:04:53 UTC
With the version at comment #22, the most I get is a bunch of these:
~~~
  Scan of VG bz1837199 from /dev/loop0 found metadata seqno 2639 vs previous 2638.
~~~
But no corruption, failed reloads, etc.

Looking at the logs in comment #5, I've noticed:
* VDSM adds filters for all PVs when doing some LVM commands, including PVs from
  other SDs not related to the operation
* Monitoring does some vgck
Plus:
* Nir's reproducer creates all LVs upfront. This is not exactly what VDSM is doing
  when the error happens; it happens around the times when we delete or create LVs.

So I modified the script to:
* Run vgck too
* Do addtag and deltag at the same time
* Create and delete the LVs in the main loop
* Allow running 2 scripts in parallel:
  ./rep-4.3_new.sh [DEV] [VG_NAME] [DEV OF OTHER SCRIPT]
  ./rep-4.3_new.sh /dev/loop0 loop0vg /dev/loop1
  ./rep-4.3_new.sh /dev/loop1 loop1vg /dev/loop0

These 2 ran as follows:
~~~
using device: /dev/loop0
using config: devices { ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter=[ "a|^/dev/loop0$|^/dev/loop1$|", "r|.*|" ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min=50 retain_days=0 }
creating pv /dev/loop0...
  Physical volume "/dev/loop0" successfully created.
creating vg loop0vg...
  Volume group "loop0vg" successfully created
~~~

~~~
using device: /dev/loop1
using config: devices { ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter=[ "a|^/dev/loop1$|^/dev/loop0$|", "r|.*|" ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min=50 retain_days=0 }
creating pv /dev/loop1...
  Physical volume "/dev/loop1" successfully created.
creating vg loop1vg...
  Volume group "loop1vg" successfully created
~~~

Eventually the first loop, which only does operations on loop0, failed while reading loop1:
~~~
  Incorrect checksum in metadata area header on /dev/loop1 at 1099377410048
  Failed to read mda header from /dev/loop1
  Failed to scan VG from /dev/loop1
~~~

I've tried to understand the locking that locking_type=1 does, but I couldn't reach a conclusion. Is this one lock for each VG?
# ll -h /var/lock/lvm/
total 0
-rwx------. 1 root root 0 May 21 11:51 V_loop0vg
-rwx------. 1 root root 0 May 21 11:51 V_loop0vg:aux
-rwx------. 1 root root 0 May 21 11:51 V_loop1vg
-rwx------. 1 root root 0 May 21 11:51 V_loop1vg:aux
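The V_<vgname> files above suggest one advisory lock file per VG (plus an :aux file). Assuming these behave like ordinary flock locks, shared readers can overlap while a writer holds the lock exclusively; this can be demonstrated without lvm at all, using flock(1) on a scratch file (a sketch that only illustrates flock semantics, not lvm's actual locking code):

```shell
# Demonstrate advisory flock semantics on a scratch file, analogous to
# taking V_<vgname> shared for reads and exclusive for metadata writes.
lock=$(mktemp)
flock -s "$lock" -c 'echo reader 1 holds shared lock' &
flock -s "$lock" -c 'echo reader 2 holds shared lock' &
wait
# An exclusive (writer) lock is granted only once no shared holders remain.
flock -x "$lock" -c 'echo writer holds exclusive lock'
rm -f "$lock"
```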

I'm now also getting other interesting things, for example:

A) loop1 logs about loop0vg backup
~~~
  /etc/lvm/archive/loop0vg_27203-768988408.vg: stat failed: No such file or directory
  /etc/lvm/backup/loop0vg.tmp: rename to /etc/lvm/backup/loop0vg failed: No such file or directory
  Backup of volume group loop0vg metadata failed.
~~~

B) loop1 logs about loop0vg seqno
~~~
  Scan of VG loop0vg from /dev/loop0 found metadata seqno 5996 vs previous 5995.
~~~

Would the following be possible?
1. An lvm command from loop0 sees both loop0vg and loop1vg due to the filters
2. The lvm command from loop0 reads some garbage from loop1vg while loop1 is updating it
3. The lvm command from loop0 does some modification on loop1vg (like recovery?) without locking

Note the customer has several VGs on the same storage pool:

PV                                               VG
/dev/mapper/3600a0980383135736e3f4d70794a5062    02868dfb-01ad-4da7-8a12-e4fd45b70e40
/dev/mapper/3600a0980383135736e3f4d70794a4f4c    0619e67b-24d8-46a5-bc87-f6dbc4aebc13
/dev/mapper/3600a0980383135736e3f4d70794a4f46    08f533cb-4dca-4cdb-b9aa-6b4f363e9f6f
/dev/mapper/3600a09803831357370244d6d7662784d    0d647957-73f6-44eb-a93a-45a3509820ba
/dev/mapper/3600a09803831357370244d6d7662784c    0faa64d2-04ed-4354-b584-e23a8d71eb90
/dev/mapper/3600a09803831357370244d6d7662794e    10ca2b4d-28b1-4eb5-9b74-96e05dee4c43
/dev/mapper/3600a09803831357370244d6d76627849    1273d9b4-a656-4f84-a5a8-231be2c3c5c2
/dev/mapper/3600a09803831357370244d6d7662784f    23b7f41e-9e5f-4213-940f-7beba85d550a
/dev/mapper/3600a0980383135736e3f4d70794a4f4a    30ddb95d-8640-4fe7-a046-f4a47a85fb9f
/dev/mapper/3600a09803831357370244d6d7662792f    3461def3-db0c-466e-ae2c-b23e526f0abd
/dev/mapper/3600a0980383135736e3f4d70794a5059    35afeebd-4599-49f2-a7f6-1c0d46f70631
/dev/mapper/3600a0980383135736e3f4d70794a5061    396d31de-5407-44de-bdc9-3c962a841c8b
/dev/mapper/3600a0980383135736e3f4d70794a4f4b    4fa8be9f-c6e6-4c4c-b4ce-3d642c30f5de
/dev/mapper/3600a0980383135736e3f4d70794a4f47    5b6b69c5-677b-43e4-a64a-d9bb139f6006
/dev/mapper/3600a0980383135736e3f4d70794a502f    6619de7c-57db-436a-a71d-85b9377e7833
/dev/mapper/3600a09803831357370244d6d76627852    66905389-2643-4f05-aff7-dcfbd27f3835
/dev/mapper/3600a09803831357370244d6d76627847    708806a7-dfdf-41ba-bad4-d41880c8c655
/dev/mapper/3600a09803831357370244d6d76627851    76e07fce-fa1b-44e3-98e2-59802a6f692d
/dev/mapper/3600a09803831357370244d6d7662784e    7eb99b8a-8898-400d-855e-b2561995b608
/dev/mapper/3600a0980383135736e3f4d70794a4f48    819a3d94-7a43-4851-8c30-e4c7334189c0
/dev/mapper/3600a0980383135736e3f4d70794a4f42    8454ba20-ca97-4d94-8a6c-12cb60afbd03
/dev/mapper/3600a0980383135736e3f4d70794a4f45    8efe3074-8e0c-4ef7-9348-c470967b5e63
/dev/mapper/3600a09803831357370244d6d76627962    98e40f41-02ff-49ba-9eb2-bc66b3521ad1
/dev/mapper/3600a0980383135736e3f4d70794a505a    98ec4f64-e173-4aeb-8bbf-6b2789e3625e
/dev/mapper/3600a09803831357370244d6d76627844    997eea11-6308-4bc3-be4d-f8ef2f53465e
/dev/mapper/3600a09803831357370244d6d76627845    9a9a83f8-da8b-4081-b49f-d3c4dae54e65
/dev/mapper/3600a098038304272725d4b3147746b2d    a0e22bba-3d50-462c-9657-e115854136e8
/dev/mapper/3600a0980383135736e3f4d70794a4f49    a489622c-a817-4e2c-b33d-2e790279578d
/dev/mapper/3600a09803831357370244d6d76627843    b4ce7bcb-2dbf-4f31-b38b-f4415cd5f679
/dev/mapper/3600a09803831357370244d6d76627850    b8d415a4-9c4d-4b1d-8a40-8d5fb5660ceb
/dev/mapper/3600a09803831357370244d6d7662795a    b942a8aa-6071-4dea-bb3f-6fbb4816e7b2
/dev/mapper/3600a0980383135736e3f4d70794a4f43    b988631e-9453-4bd2-b77a-4aa496ebc30b
/dev/mapper/3600a0980383135736e3f4d70794a4f44    c0f62ee7-e5df-44ea-aa10-d73600cb3ff1
/dev/mapper/3600a09803831357370244d6d7662784b    c2997c15-f6d8-48e0-9850-ce8c4ff36472
/dev/mapper/3600a09803831357370244d6d76627961    c5d2ab76-9317-4d68-8dc7-18073ac2d04f
/dev/mapper/3600a09803831357370244d6d76627959    ea086bb6-3d2b-4475-8d31-e118177c6da6
/dev/mapper/3600a09803831357370244d6d76627842    ee1d5620-eea6-4c58-b278-dc931b55ddb4
/dev/mapper/3600a09803831357370244d6d76627846    efe12b01-690c-4ee1-9034-8e950bbc4231
/dev/mapper/3600a09803831357370244d6d7662784a    f51894d8-5783-4975-b90a-1d6ddfa20c21
/dev/mapper/3600a09803831357370244d6d76627848    facd02e6-5381-4ac8-b3e6-e89e6babfc37
/dev/mapper/3600a09803831357370244d6d7662794f    fd2853bc-b5e1-44ff-b840-28e9ebbffe46

Look, for example, at this lvs command. It uses a filter with all LUNs, but it is operating only on VG a489622c-a817-4e2c-b33d-2e790279578d from LUN 3600a0980383135736e3f4d70794a4f49.

2020-05-20 12:48:07,521+1000 DEBUG (tasks/5) [storage.Misc.excCmd] /usr/bin/taskset --cpu-list 0-47 /usr/bin/sudo -n /sbin/lvm lvs --config 'devices {  preferred_names=["^/dev/mapper/"]  ignore_suspended_devices=1  write_cache_state=0  disable_after_error_count=3  filter=["a|^/dev/mapper/3600a098038304272725d4b3147746b2d$|^/dev/mapper/3600a0980383135736e3f4d70794a4f42$|^/dev/mapper/3600a0980383135736e3f4d70794a4f43$|^/dev/mapper/3600a0980383135736e3f4d70794a4f44$|^/dev/mapper/3600a0980383135736e3f4d70794a4f45$|^/dev/mapper/3600a0980383135736e3f4d70794a4f46$|^/dev/mapper/3600a0980383135736e3f4d70794a4f47$|^/dev/mapper/3600a0980383135736e3f4d70794a4f48$|^/dev/mapper/3600a0980383135736e3f4d70794a4f49$|^/dev/mapper/3600a0980383135736e3f4d70794a4f4a$|^/dev/mapper/3600a0980383135736e3f4d70794a4f4b$|^/dev/mapper/3600a0980383135736e3f4d70794a4f4c$|^/dev/mapper/3600a0980383135736e3f4d70794a502f$|^/dev/mapper/3600a0980383135736e3f4d70794a5059$|^/dev/mapper/3600a0980383135736e3f4d70794a505a$|^/dev/mapper/3600a0980383135736e3f4d70794a5061$|^/dev/mapper/3600a0980383135736e3f4d70794a5062$|^/dev/mapper/3600a09803831357370244d6d76627842$|^/dev/mapper/3600a09803831357370244d6d76627843$|^/dev/mapper/3600a09803831357370244d6d76627844$|^/dev/mapper/3600a09803831357370244d6d76627845$|^/dev/mapper/3600a09803831357370244d6d76627846$|^/dev/mapper/3600a09803831357370244d6d76627847$|^/dev/mapper/3600a09803831357370244d6d76627848$|^/dev/mapper/3600a09803831357370244d6d76627849$|^/dev/mapper/3600a09803831357370244d6d7662784a$|^/dev/mapper/3600a09803831357370244d6d7662784b$|^/dev/mapper/3600a09803831357370244d6d7662784c$|^/dev/mapper/3600a09803831357370244d6d7662784d$|^/dev/mapper/3600a09803831357370244d6d7662784e$|^/dev/mapper/3600a09803831357370244d6d7662784f$|^/dev/mapper/3600a09803831357370244d6d76627850$|^/dev/mapper/3600a09803831357370244d6d76627851$|^/dev/mapper/3600a09803831357370244d6d76627852$|^/dev/mapper/3600a09803831357370244d6d7662792f$|^/dev/mapper/3600a09803831357370244d6d766279
4e$|^/dev/mapper/3600a09803831357370244d6d7662794f$|^/dev/mapper/3600a09803831357370244d6d76627959$|^/dev/mapper/3600a09803831357370244d6d7662795a$|^/dev/mapper/3600a09803831357370244d6d76627961$|^/dev/mapper/3600a09803831357370244d6d76627962$|^/dev/mapper/3618e728372d2c3d0219cf9300ad72509$|", "r|.*|"] } global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 } backup {  retain_min=50  retain_days=0 }' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,vg_name,attr,size,seg_start_pe,devices,tags --select 'vg_name = a489622c-a817-4e2c-b33d-2e790279578d' (cwd None) (commands:198)

Could this interfere with the other VGs, with locking_type=1? Does this make any sense?

(Originally by Germano Veit Michel)

Comment 24 RHV bug bot 2020-06-22 11:04:57 UTC
(In reply to Germano Veit Michel from comment #23)
> A) loop1 logs about loop0vg backup
> ~~~
>   /etc/lvm/archive/loop0vg_27203-768988408.vg: stat failed: No such file or
> directory
>   /etc/lvm/backup/loop0vg.tmp: rename to /etc/lvm/backup/loop0vg failed: No
> such file or directory
>   Backup of volume group loop0vg metadata failed.
> ~~~

Additional comment on this.

It seems VDSM 4.3 only adds --autobackup n on a few select commands that modify the metadata.

def createLV(vgName, lvName, size, activate=True, contiguous=False,
    cmd.extend(LVM_NOBACKUP)
def removeLVs(vgName, lvNames, read_only=None):
    cmd.extend(LVM_NOBACKUP)
def extendLV(vgName, lvName, size_mb):
    cmd = ("lvextend",) + LVM_NOBACKUP
def reduceLV(vgName, lvName, size_mb, force=False):
    cmd = ("lvreduce",) + LVM_NOBACKUP
def renameLV(vg, oldlv, newlv):
    cmd = ("lvrename",) + LVM_NOBACKUP + (vg, oldlv, newlv)
def addtag(vg, lv, tag):
    cmd = ("lvchange",) + LVM_NOBACKUP + ("--addtag", tag) + (lvname,)
def changeLVTags(vg, lv, delTags=(), addTags=()):
    cmd.extend(LVM_NOBACKUP)

So could those backup attempts come from lvs/vgs/vgck (which don't pass --autobackup n) while operating on another VG?
I'm not sure how this works with locking, or whether it could be dangerous.
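As an illustrative shell sketch (the helper names here are made up; vdsm actually builds these command lines in Python), the pattern above splices the no-backup flags only into modifying commands, while the reload commands carry no --autobackup flag at all, which may explain the backup attempts above:

```shell
#!/bin/bash
# Sketch only: modifying commands get "--autobackup n"; read-only reloads
# (lvs/vgs/vgck) do not, so lvm may still attempt an autobackup from them.
LVM_NOBACKUP="--autobackup n"

build_lvcreate() {
    # Builds the command line only; nothing is executed here.
    local vg=$1 lv=$2 size_mb=$3
    echo "lvcreate $LVM_NOBACKUP --size ${size_mb}m --name $lv $vg"
}

build_lvs() {
    # Reload command: no --autobackup flag is added.
    local vg=$1
    echo "lvs --noheadings $vg"
}
```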

(Originally by Germano Veit Michel)

Comment 25 RHV bug bot 2020-06-22 11:05:00 UTC
(In reply to Germano Veit Michel from comment #24)
> (In reply to Germano Veit Michel from comment #23)
> > A) loop1 logs about loop0vg backup
> > ~~~
> >   /etc/lvm/archive/loop0vg_27203-768988408.vg: stat failed: No such file or
> > directory
> >   /etc/lvm/backup/loop0vg.tmp: rename to /etc/lvm/backup/loop0vg failed: No
> > such file or directory
> >   Backup of volume group loop0vg metadata failed.
> > ~~~
> 
> Additional comment on this.
> 
> It seems VDSM 4.3 only adds --autobackup n on a few select commands that
> modify the metadata.
...
This is the same in 4.4, we did not change anything in this area.

The reproducer I uploaded was correct, using "--autobackup n" for lvchange
command.

If you added new commands, use the same options used by vdsm so that we test
only the relevant use case here.

(Originally by Nir Soffer)

Comment 26 RHV bug bot 2020-06-22 11:05:04 UTC
(In reply to Nir Soffer from comment #25)
> > It seems VDSM 4.3 only adds --autobackup n on a few select commands that
> > modify the metadata.
> ...
> This is the same in 4.4, we did not change anything in this area.

Right.

> The reproducer I uploaded was correct, using "--autobackup n" for lvchange
> command.

Yes it is, but it was not reproducing the issue even on RHEL 7.
So I created a version that adds some more commands that do not pass "--autobackup n",
and that also creates and removes LVs, which is what seems to trigger the issue for the customer.

> If you added new commands, use the same options used by vdsm to so we test
> only relevant use case here.

That's what I did. But I'm not sure we are reproducing the customer's problem; I'm afraid we are not.

I've also double checked with the customer: all hosts on RHVH 4.3.9 and no other machine has access to the LUNs.

(Originally by Germano Veit Michel)

Comment 27 RHV bug bot 2020-06-22 11:05:08 UTC
(In reply to Germano Veit Michel from comment #23)
> With the version at comment #22, the most I get is a bunch of these:
> ~~~
>   Scan of VG bz1837199 from /dev/loop0 found metadata seqno 2639 vs previous
> 2638.
> ~~~
> But not corruption or failed reloads etc.

Good, this proves that we don't have an issue with mixing activation and other
commands, which was our first guess.
 
> Looking at the logs in comment #5, I've noticed:
> * VDSM adds filters for all PVs when doing some LVM commands, including from
> other SDs that the operation is not related

Right, some operations, like reloading pvs and vgs, always use a wide filter including
all the devices on the host. This is required to discover pvs that were added to a vg
on another host.

Most operations use a narrow filter with only the pvs belonging to the vg. This is very
important for performance. However, if a command fails, we refresh the device list and
build a wider filter. If the new filter is different, we run the command again with the
wider filter. So even these commands can run with a wide filter.

In 4.3.9 we added a warning when retrying a command with a wider filter:

 487                 log.warning(
 488                     "Command with specific filter failed or returned no data, "
 489                     "retrying with a wider filter, cmd=%r rc=%r out=%r err=%r",
 490                     full_cmd, rc, out, err)    
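The retry flow around that warning can be modeled with a small shell sketch (a simplified model, not vdsm's actual Python; the function name and the stub runner used in the test are hypothetical):

```shell
#!/bin/bash
# Simplified model of the reload flow: try the command with the narrow
# per-VG filter; if it fails or returns nothing, warn and run the command
# once more with the wider filter built from the refreshed device list.
run_with_filter_retry() {
    local narrow=$1 wide=$2; shift 2
    local out
    if out=$("$@" "$narrow") && [ -n "$out" ]; then
        printf '%s\n' "$out"
        return 0
    fi
    echo "WARNING: command with specific filter failed or returned no data," \
         "retrying with a wider filter" >&2
    "$@" "$wide"
}
```

With a stub runner that only returns data for the wide filter, the function logs the warning and falls back exactly once; with a runner that succeeds on the narrow filter, no retry happens.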

> * Monitoring does some vgck
> Plus:
> * Nir's reproducer creates all LVs upfront. This is not exactly what VDSM is
> doing when the error happens. 
>   It happens around times when we delete or create VLs.
> 
> So I modified the script to:
> * Do vgck too

Makes sense

> * Does addtag and deltag at the same time

Vdsm does both --addtag and --addtag --deltag, and --deltag. I think we already 
had all combinations.

> * Create and delete the LVs in the main loop

Makes sense

> * Able to run 2 scripts in parallel
>   ./rep-4.3_new.sh [DEV] [VG_NAME] [DEV OF OTHER SCRIPT]
>   ./rep-4.3_new.sh /dev/loop0 loop0vg /dev/loop1
>   ./rep-4.3_new.sh /dev/loop1 loop1vg /dev/loop0

I don't think this is needed; we can just create 2 pvs and 2 vgs and run concurrent
commands on all of them, but if this is easier, why not.

...
> Eventually the first loop, that only does operations on loop0, failed while
> reading loop1:
> ~~~
>   Incorrect checksum in metadata area header on /dev/loop1 at 1099377410048
>   Failed to read mda header from /dev/loop1
>   Failed to scan VG from /dev/loop1

I think this happens because all lvm commands in the test use a wide filter with
all devices, and we use --select in the lvs command. --select reads all
the pvs on the host, getting all lvm metadata from all vgs, and finally
filtering the metadata for display purposes.

On the other hand, locking is done per vg. I guess the lvs command is reading
data from other vgs without locks, and may see inconsistent metadata if another
command modifies another vg at the same time.

> I'm now also getting other interesting things, for example:
> 
> A) loop1 logs about loop0vg backup
> ~~~
>   /etc/lvm/archive/loop0vg_27203-768988408.vg: stat failed: No such file or
> directory
>   /etc/lvm/backup/loop0vg.tmp: rename to /etc/lvm/backup/loop0vg failed: No
> such file or directory
>   Backup of volume group loop0vg metadata failed.
> ~~~

Which command is logging this?

> B) loop1 logs about loop0vg seqno
> ~~~
>   Scan of VG loop0vg from /dev/loop0 found metadata seqno 5996 vs previous
> 5995.
> ~~~

Which command logged this?

This means that some other command modified the vg metadata while the first command
was using it. This is common when running commands on other hosts with
locking_type=4.
 
> Would the following be possible?
> 1. lvm command from loop0 sees both loop0vg and loop1vg due to the filters

Right

> 2. lvm command from loop0 reads some garbage from loop1vg, as loop1 is
> updating it

Right, probably lvs does not take any locks, since it should be read-only,
or it tries to minimize locking, which may lead to reading inconsistent
metadata.

> 3. lvm command from loop0 does some modification on loop1vg (like recovery?)
> without locking

This must use locking, otherwise the vg would be corrupted.

Did you reproduce command failures during the run, or only warnings?

I think the root cause of this is that part of the fix for bug 1553133 was to 
use:

   lvs --select 'vg_name = xxx-yyy'
   lvs --select 'vg_name = xxx-yyy && (lv_name = yyy-zzz || lv_name = zzz-aaa)'
   vgs --select 'vg_name = xxx-yyy || vg_name = yyy-zzz'
   pvs --select 'pv_name = /dev/mapper/xxx || pv_name = /dev/mapper/yyy'

Instead of:

   lvs xxx-yyy
   lvs xxx-yyy/yyy-zzz xxx-yyy/zzz-aaa
   vgs xxx-yyy yyy-zzz
   pvs /dev/mapper/xxx /dev/mapper/yyy

The reason for this change is to avoid failures when an lvm entity does not exist.
A typical case is running:

   vgs no-such-vg

This is the way vdsm checks if a vg exists when creating a new storage domain. For example,
when creating a new NFS domain, we first run vgs to find out if the domain already exists.
This is probably not needed, but it has existed in vdsm since 2009.

Before we added support for locking_type=4, such a command would fail, logging a bogus
traceback (the vg does not exist, but the code does not know that this is expected).
Then we would run the command again with a wider filter (see above), and of course
it would fail again.

This sounds bad, but when running with locking_type=4, commands like lvs/vgs/pvs do
fail randomly when they find inconsistent metadata. When they fail, other flows may
fail, as we see in this bug. So we added a retry mechanism, running the command in
read-only mode up to 4 times, with a backoff mechanism to ensure that we slow down
each time a command fails.

With this retry mechanism, commands like "vgs no-such-vg" would fail 6 times, creating
a lot of noise and delaying the flow for no reason.

When using "vgs --select no-such-vg", the command does not fail, but we get empty
output, which is exactly what we want. This tells us that there is no vg named no-such-vg.
We still retry with a wider filter to make sure we don't have stale info about the vg
in case it was modified on another host. The command with the wider filter will also not
fail, so we complete the check quickly, without logging any bogus errors.

Same thing for lvs, when we check if an lv exists.

So I think what happens is:
- We start to use --select instead of specifying entities
- So lvs/vgs/pvs started to always read all pvs
- So these commands are more likely to see inconsistent metadata
- On the SPM we have only one retry (and only when we use narrow filter)
- So lvs/vgs/pvs are more likely to fail on the SPM.

I think --select is bad for us; we need to return to specifying pvs/vgs/lvs
when running the commands, instead of using --select.

We need to add special handling for calls that are meant to check whether an entity
exists, like "vgs no-such-vg" or "lvs vg-name/lv-name". It looks like the only way
to do this is to run a command like:

 vgs test missing; echo $?
  Volume group "missing" not found.
  Cannot process volume group missing
  VG   #PV #LV #SN Attr   VSize     VFree    
  test   1   1   0 wz--n- <1024.00g <1023.00g
5

Drop stderr, since we don't know if this is an error:

 vgs test missing 2>/dev/null; echo $?
  VG   #PV #LV #SN Attr   VSize     VFree    
  test   1   1   0 wz--n- <1024.00g <1023.00g
5

And ignore exit code 5:

 vgs test missing 2>/dev/null
  VG   #PV #LV #SN Attr   VSize     VFree    
  test   1   1   0 wz--n- <1024.00g <1023.00g

Assuming that the lvm command did not fail because of another error, we can see
that "missing" was not reported in the output, so we can tell that it does not
exist.
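A hedged sketch of that check (the helper names are invented here; the string predicate is split out so the parsing logic is separate from the lvm invocation, which needs real devices to run):

```shell
#!/bin/bash
# Sketch of "check if a VG exists" without --select: name the VG on the
# command line, drop stderr, treat lvm exit code 5 as "may not exist",
# and decide by whether the name appears in the output.

# Pure string check, separated out so it can be exercised without lvm.
output_has_vg() {
    local out=$1 vg=$2
    printf '%s\n' "$out" | tr -d ' ' | grep -qx "$vg"
}

vg_exists() {
    local vg=$1 out rc=0
    out=$(vgs --noheadings -o vg_name "$vg" 2>/dev/null) || rc=$?
    # rc 5 is the "not found" code seen above; any other non-zero rc
    # is a real error (e.g. temporary storage access failure).
    if [ "$rc" -ne 0 ] && [ "$rc" -ne 5 ]; then
        return 2
    fi
    output_has_vg "$out" "$vg"
}
```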

On the host running this command I have:

# pvs
  PV                                                    VG     Fmt  Attr PSize     PFree    
  /dev/loop0                                            test   lvm2 a--  <1024.00g <1023.00g
  /dev/mapper/luks-a010e40c-d99a-465e-b481-7da059cbd7b5 fedora lvm2 a--   <237.47g        0 

So the vgs command above will not look at pv
/dev/mapper/luks-a010e40c-d99a-465e-b481-7da059cbd7b5

And will not be affected by inconsistent metadata when this pv is modified by other
commands.

David, can you confirm my theory?

Do we have a better way to run lvm command for the purpose of checking if
a pv, vg, or lv exists, replacing --select?

(Originally by Nir Soffer)

Comment 28 RHV bug bot 2020-06-22 11:05:12 UTC
Germano, if you replace --select with explicit entities in the
reproducer, does it eliminate the failures?

Instead of this:

    lvs --config "$config" --noheadings --select "vg_name = $vg_name && lv_name = $lv_name"

Use:

    lvs --config "$config" --noheadings $vg_name/$lv_name

(Originally by Nir Soffer)

Comment 29 RHV bug bot 2020-06-22 11:05:15 UTC
Germano, actually lvs is a special case where we can use --select in a better way.
If we use:

    lvs --config "$config" --noheadings --select "lv_name = $lv_name" $vg_name

Does it eliminate the failures?

This should read only vg-name metadata, and apply the selection on the results
from this vg, so it may not be affected by concurrent changes in other vgs.

(Originally by Nir Soffer)

Comment 30 RHV bug bot 2020-06-22 11:05:19 UTC
Hi Nir,

I need to run more tests, over a longer period, to be more sure of this, but so far the data indicates that:

(In reply to Nir Soffer from comment #27)
> > A) loop1 logs about loop0vg backup
> > ~~~
> >   /etc/lvm/archive/loop0vg_27203-768988408.vg: stat failed: No such file or
> > directory
> >   /etc/lvm/backup/loop0vg.tmp: rename to /etc/lvm/backup/loop0vg failed: No
> > such file or directory
> >   Backup of volume group loop0vg metadata failed.
> > ~~~
> 
> Which command is logging this?

From the set I'm testing (lvcreate, lvremove, lvchange, lvs, vgs) only vgs seems to do it.

$ vgs --config 'devices { ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter=[ "a|^/dev/loop1$|^/dev/loop0$|", "r|.*|" ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min=50 retain_days=0 }' --noheadings --select 'vg_name = loop1vg'

If done like below, passing the vg name instead of selecting on it, the backups of the other (loop0) vg seem to stop.

vgs --config 'devices { ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter=[ "a|^/dev/loop1$|^/dev/loop0$|", "r|.*|" ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min=50 retain_days=0 }' --noheadings loop1vg

> > B) loop1 logs about loop0vg seqno
> > ~~~
> >   Scan of VG loop0vg from /dev/loop0 found metadata seqno 5996 vs previous
> > 5995.
> > ~~~
> 
> Which command logged this?

All of them, as long as there is another command running that modifies the metadata and increases the seqno.
It's reproducible with a single loop on RHEL7 (it works OK on RHEL8):

  Scan of VG loop1vg from /dev/loop1 found metadata seqno 18 vs previous 17.
  Scan of VG loop1vg from /dev/loop1 found metadata seqno 36 vs previous 35.

The loop to reproduce this can be as simple as follows:

  lvcreate --config "$config" --autobackup n -L 1g -n lv_${i}_${j} -an $vg_name
  lvs --config "$config" --noheadings --select "lv_name = $lv_name" $vg_name
  lvremove --config "$config" --autobackup n $vg_name/lv_${i}_${j}

> Germano, actually lvs is a special case where we can use --select in a
> better way.
> If we use:
> 
>     lvs --config "$config" --noheadings --select "lv_name = $lv_name"
> $vg_name
> 
> Does it eliminate the failures?

It continues to show the "found metadata seqno ... vs previous" warnings. I haven't found
a way to get rid of those yet; they appear everywhere on RHEL7.

I think the biggest cause of the other errors/warnings is vgs, also done from SD monitoring. 
Note the customer has around 40 SDs. I still suspect this is somehow related to the high
number of SDs. 

Hope this helps in the meantime. I'll do a better and longer test over the weekend.

(Originally by Germano Veit Michel)

Comment 31 RHV bug bot 2020-06-22 11:05:23 UTC
(In reply to Nir Soffer from comment #27)
> # pvs
>   PV                                                    VG     Fmt  Attr
> PSize     PFree    
>   /dev/loop0                                            test   lvm2 a-- 
> <1024.00g <1023.00g
>   /dev/mapper/luks-a010e40c-d99a-465e-b481-7da059cbd7b5 fedora lvm2 a--  
> <237.47g        0 
> 
> So the vgs command above will not look at pv
> /dev/mapper/luks-a010e40c-d99a-465e-b481-7da059cbd7b5
> 
> And will not be affected by inconsistent metadata when this pv is modified
> by other commands.

It depends on which inconsistency issue we're talking about. Every command includes two phases that read disks:

1. scanning phase, in which all devices that pass the filter are scanned for lvm metadata, without any locks.
In this phase you can get some warnings about inconsistencies if the devs are being modified at the same time from other hosts, but I don't think you would get any command failures from this phase.  This phase figures out which devices need to be read for a given VG name, i.e. vg "foo" exists on dev1 and dev2.

2. processing phase.  If the command has specified one VG name on the command line, i.e. "vgs foo", then this phase will lock foo, reread the devs for foo, i.e. dev1 and dev2, and then either report info about foo or modify foo (depending on the specific command).  This phase will not look at other devs that are not in vg foo.  If another host modifies VG foo during this phase, then you'll probably get a command failure (I don't remember which off hand, and this phase has been rewritten for rhel8, although the basic idea is the same.)

If you use "vgs --select vg_name=foo", then step 2 is repeated for every VG that was found during step 1.  Of course, only VG foo will be displayed as a result, but since step 2 is repeated for every visible VG, there is a much higher chance you'll hit an error due to one of those VGs being modified somewhere while you're reading it.


> David, can you confirm my theory?
> 
> Do we have a better way to run lvm command for the purpose of checking if
> a pv, vg, or lv exists, replacing --select?

To check if a specific device is a PV, pvck is probably the most efficient, and it should only read that single device.

To check if a specific VG name exists somewhere, then you have to read every device, and 'vgs <name>' is probably best.

(Originally by David Teigland)

Comment 32 RHV bug bot 2020-06-22 11:05:28 UTC
>   Scan of VG loop1vg from /dev/loop1 found metadata seqno 18 vs previous 17.
>   Scan of VG loop1vg from /dev/loop1 found metadata seqno 36 vs previous 35.

Don't worry about these, they are harmless and are printed by the scanning phase mentioned above, and are expected if the given vg is being modified at the same time by another host.

(Originally by David Teigland)

Comment 33 RHV bug bot 2020-06-22 11:05:31 UTC
There is an interesting fact, or coincidence:

* Environment 40+ Block Storage Domains
* Several VMs and Disks
* This problem happened 3 times already. But always with a particular VM, never with any of the other VMs.

The VM where this happens during snapshots (which trigger several lvcreate/lvremove/lvchange operations) has the highest disk count in the environment (13), spread across 2 storage domains.
The next VM with the most disks has 11, but all of its disks are on the same storage domain. That VM has not been affected yet.

The customer will be moving some of those 11 disks of the non-affected VM to another Storage Domain this week; it will be interesting to see if it starts hitting the problem too.

(Originally by Germano Veit Michel)

Comment 34 RHV bug bot 2020-06-22 11:05:40 UTC
(In reply to David Teigland from comment #31)
> If you use "vgs --select vg_name=foo', then step 2 is repeated for every VG
> that was found during step 1.  Of course, only VG foo will be displayed as a
> result, but since step 2 is being repeated for every visible VG, there is
> much more chance you'll hit an error due to one of those VGs being modified
> somewhere while you're reading it.

This sounds bad; it looks like we cannot use --select if the vg name is not specified.

But how about:

    lvs --select 'lv_name = <lv-name>' vg-name

Is this safe, never failing because another vg was modified by a concurrent
lvm command on the same host?

This has the important property of not failing if <lv-name> does not
exist, instead returning no output. The caller can use this to check
whether lv-name exists.
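As a sketch of that existence check (invented helper names; it assumes the lvs invocation above, with --select scoped to a single vg so only that vg's pvs are processed):

```shell
#!/bin/bash
# Sketch: check if an LV exists using --select scoped to one VG. Empty
# output means "does not exist"; a non-zero exit code means the command
# itself failed (--select does not fail for a missing LV).

# String predicate kept separate so it can be tested without lvm.
output_has_lv() {
    local out=$1 lv=$2
    printf '%s\n' "$out" | tr -d ' ' | grep -qx "$lv"
}

lv_exists() {
    local vg=$1 lv=$2 out
    out=$(lvs --noheadings -o lv_name --select "lv_name = $lv" "$vg") || return 2
    output_has_lv "$out" "$lv"
}
```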

> > Do we have a better way to run lvm command for the purpose of checking if
> > a pv, vg, or lv exists, replacing --select?
> 
> To check if a specific device is a PV, pvck is probably the most efficient,
> and it should only read that single device.

Noted
 
> To check if a specific VG name exists somewhere, then you have to read every
> device, and 'vgs <name>' is probably best.

But when this command fails, we don't know if the issue is a non-existing vg
or a temporary error accessing storage. This is not a usable command for us.

I need more info on pvs command - we have 2 use cases:

- get all pvs on the system

    pvs

This command accesses all pvs in the system. Can it fail if some vgs were
modified while the command was running?

If it does, we must add locking to ensure that no modification is done
while we run such command.

- get certain pvs

  pvs --select 'pv_name = <pv-name1> || pv_name = <pv-name1>'


For this I assume that we have the same issue we have with vgs, and this
command will access all pvs and may fail if some vg is modified?

Should vdsm add its own locking to avoid running lvs/vgs/pvs while lvm
commands that modify the vg are running? I expect lvm to handle this
locking if needed, but if this is not solved in RHEL 7, we can add our
own locking to avoid such issues.

(Originally by Nir Soffer)

Comment 35 RHV bug bot 2020-06-22 11:05:44 UTC
I made some reproducer attempts:

1. Initiate 10 vgs
2. Loops 1000 times:
    2.1. For each vg (done in background to parallelize operations with the other vgs):
    2.1.1   Create 10 lvs
    2.1.2   For each lv:
              # NOTE: vdsm mixes activation with other changes in 4.3.
                lvchange --config "${config}" --autobackup n -ay --addtag tag1 ${lv_path}
                lvs --config "${config}" --noheadings --select "vg_name = ${vg_name} && lv_name = ${lv_name}"
                lvchange --config "${config}" --autobackup n -an --addtag tag2 --deltag tag1 ${lv_path}
                lvs --config "${config}" --noheadings --select "vg_name = ${vg_name} && lv_name = ${lv_name}"
                lvchange --config "${config}" --autobackup n -ay --deltag tag2 ${lv_path}
                lvs --config "${config}" --noheadings --select "vg_name = ${vg_name} && lv_name = ${lv_name}"
                lvchange --config "${config}" --autobackup n -an ${lv_path}

    2.1.3  Delete the 10 lvs
    2.2. Wait for current trial to finish


For the RHEL8.2 run, the lv activations were separated:

           # NOTE: vdsm does not mix activation with other changes in 4.4.
            lvchange --config "${config}" --autobackup n -ay ${lv_path}
            lvchange --config "${config}" --autobackup n --addtag tag1 ${lv_path}
            lvs --config "$config" --noheadings --select "vg_name = ${vg_name} && lv_name = ${lv_name}"
            lvchange --config "${config}" --autobackup n -an ${lv_path}
            lvchange --config "${config}" --autobackup n --addtag tag2 --deltag tag1 ${lv_path}
            lvs --config "$config" --noheadings --select "vg_name = ${vg_name} && lv_name = ${lv_name}"
            lvchange --config "${config}" --autobackup n -ay ${lv_path}
            lvchange --config "${config}" --autobackup n --deltag tag2 ${lv_path}
            lvs --config "${config}" --noheadings --select "vg_name = ${vg_name} && lv_name = ${lv_name}"
            lvchange --config "${config}" --autobackup n -an ${lv_path}





1 - RHEL8.2 with lvm2-2.03.08-3.el8.x86_64, attached script and log under test_el8.2

It seems to complete with no mixups, apart from dm issues like "/sys/dev/block/253:5/dm/uuid: fopen failed: No such file or directory",
even though the test uses only loop devices with filter=[ \"a|^/dev/loop[0-9]|\", \"r|.*|\" ].
All other operations seem to go smoothly.



2 - CentOS 7.8 with lvm2-2.02.186-7.el7_8.2.x86_64, attached script and log under test_el7.8

It also had the same dm issues, but additionally had VG scan issues like:

  Scan of VG bz1837199_7 from /dev/loop6 found metadata seqno 2 vs previous 1.

And later it had failures to remove lvs:

removing bz1837199_4/lv_4_1 bz1837199_4/lv_4_2 bz1837199_4/lv_4_3 bz1837199_4/lv_4_4 bz1837199_4/lv_4_5 bz1837199_4/lv_4_6 bz1837199_4/lv_4_7 bz1837199_4/lv_4_8 bz1837199_4/lv_4_9 bz1837199_4/lv_4_10 ...
  WARNING: Not using lvmetad because config setting use_lvmetad=0.
  WARNING: To avoid corruption, rescan devices to make changes visible (pvscan --cache).
  Failed to find logical volume "bz1837199_4/lv_4_1"
  Failed to find logical volume "bz1837199_4/lv_4_2"
  Failed to find logical volume "bz1837199_4/lv_4_3"
  Failed to find logical volume "bz1837199_4/lv_4_4"
  Failed to find logical volume "bz1837199_4/lv_4_5"
  Failed to find logical volume "bz1837199_4/lv_4_6"
  Failed to find logical volume "bz1837199_4/lv_4_7"
  Failed to find logical volume "bz1837199_4/lv_4_8"
  Failed to find logical volume "bz1837199_4/lv_4_9"
  Failed to find logical volume "bz1837199_4/lv_4_10"


So the next trials failed to create the same lvs again, since they already existed
in the vg, and finished prematurely:

creating bz1837199_4/lv_4_1...
  WARNING: Not using lvmetad because config setting use_lvmetad=0.
  WARNING: To avoid corruption, rescan devices to make changes visible (pvscan --cache).
  Logical Volume "lv_4_1" already exists in volume group "bz1837199_4"

(Originally by Amit Bawer)

Comment 36 RHV bug bot 2020-06-22 11:05:48 UTC
Created attachment 1691692 [details]
test run on el7.8 per comment #35

(Originally by Amit Bawer)

Comment 37 RHV bug bot 2020-06-22 11:05:51 UTC
Created attachment 1691693 [details]
test run on el8.2 per comment #35

(Originally by Amit Bawer)

Comment 38 RHV bug bot 2020-06-22 11:05:56 UTC
(In reply to Amit Bawer from comment #35)

> And later had failures to remove lvs
> 
> removing bz1837199_4/lv_4_1 bz1837199_4/lv_4_2 bz1837199_4/lv_4_3
> bz1837199_4/lv_4_4 bz1837199_4/lv_4_5 bz1837199_4/lv_4_6 bz1837199_4/lv_4_7
> bz1837199_4/lv_4_8 bz1837199_4/lv_4_9 bz1837199_4/lv_4_10 ...
>   WARNING: Not using lvmetad because config setting use_lvmetad=0.
>   WARNING: To avoid corruption, rescan devices to make changes visible
> (pvscan --cache).
>   Failed to find logical volume "bz1837199_4/lv_4_1"
>   Failed to find logical volume "bz1837199_4/lv_4_2"
>   Failed to find logical volume "bz1837199_4/lv_4_3"
>   Failed to find logical volume "bz1837199_4/lv_4_4"
>   Failed to find logical volume "bz1837199_4/lv_4_5"
>   Failed to find logical volume "bz1837199_4/lv_4_6"
>   Failed to find logical volume "bz1837199_4/lv_4_7"
>   Failed to find logical volume "bz1837199_4/lv_4_8"
>   Failed to find logical volume "bz1837199_4/lv_4_9"
>   Failed to find logical volume "bz1837199_4/lv_4_10"
> 
> 
> So next trials were failing to create same vg/lv again since they already
> exist for the vg
> and finised prematurely:
> 
> creating bz1837199_4/lv_4_1...
>   WARNING: Not using lvmetad because config setting use_lvmetad=0.
>   WARNING: To avoid corruption, rescan devices to make changes visible
> (pvscan --cache).
>   Logical Volume "lv_4_1" already exists in volume group "bz1837199_4"
I was missing passing --config to lvremove, will retry.

(Originally by Amit Bawer)

Comment 39 RHV bug bot 2020-06-22 11:05:59 UTC
Created attachment 1691919 [details]
second run on el7.8

(Originally by Amit Bawer)

Comment 41 RHV bug bot 2020-06-22 11:06:09 UTC
(In reply to Amit Bawer from comment #39)
> Created attachment 1691919 [details]
> second run on el7.8

No special errors except the usual ones for the second attempt with lvm2-2.02.186-7.el7_8.2.x86_64:


 Scan of VG bz1837199_5 from /dev/loop4 found metadata seqno 50001 vs previous 49998.
  WARNING: Not using lvmetad because config setting use_lvmetad=0.
  WARNING: To avoid corruption, rescan devices to make changes visible (pvscan --cache).
removing bz1837199_10/lv_10_1 bz1837199_10/lv_10_2 bz1837199_10/lv_10_3 bz1837199_10/lv_10_4 bz1837199_10/lv_10_5 bz1837199_10/lv_10_6 bz1837199_10/lv_10_7 bz1837199_10/lv_10_8 bz1837199_10/lv_10_9 bz1837199_10/lv_10_10 ...
  WARNING: Not using lvmetad because config setting use_lvmetad=0.
  WARNING: To avoid corruption, rescan devices to make changes visible (pvscan --cache).
removing bz1837199_7/lv_7_1 bz1837199_7/lv_7_2 bz1837199_7/lv_7_3 bz1837199_7/lv_7_4 bz1837199_7/lv_7_5 bz1837199_7/lv_7_6 bz1837199_7/lv_7_7 bz1837199_7/lv_7_8 bz1837199_7/lv_7_9 bz1837199_7/lv_7_10 ...
  WARNING: This metadata update is NOT backed up.

(Originally by Amit Bawer)

Comment 42 RHV bug bot 2020-06-22 11:06:12 UTC
> But how about:
> 
>     lvs --select 'lv_name = <lv-name>' vg-name
> 
> Is this safe, never failing because another vg was modified by a concurrent
> lvm command on the same host?

This only processes the named vg, so that shouldn't be a problem.

> > To check if a specific VG name exists somewhere, then you have to read every
> > device, and 'vgs <name>' is probably best.
> 
> But when this command fails, we don't know if the issue is a non-existing vg
> or a temporary error accessing storage. This is not a usable command.

I agree.  The frustrating part is that most of these things are actually fixed in the newest versions, but you are using older versions.

> - get all pvs on the system
> 
>     pvs
> 
> This command accesses all pvs in the system. Can it fail if some vgs were
> modified while running the command?

Yes, I believe it would, because internally it's implemented like 'vgs', by reading each VG on the system and reporting each PV in each VG.  And the problematic issue is the failure when reading a VG if another host is changing that VG at the same time (correct me if I'm wrong, I'm losing track of exactly which command failure we're studying.)

> If it does, we must add locking to ensure that no modification is done
> while we run such command.
> 
> - get certain pvs
> 
>   pvs --select 'pv_name = <pv-name1> || pv_name = <pv-name2>'
> 
> 
> For this I assume that we have the same issue we have with vgs, and this
> command will access all pvs and may fail if some vg is modified?

yes
 
> Should vdsm add its own locking to avoid running lvs/vgs/pvs while lvm
> commands that modify the vg are running? I expect lvm to handle this 
> locking if needed, but if this is not solved in rhel 7, we can add our
> own locking to avoid such issues.

In RHEL8 lvs/vgs/pvs will not fail if another host is changing the VG concurrently, so you would not need locking there to avoid the failure.  That's a part of the rewrite of scanning/reading that we keep coming back to, and which solves so many of the lvm problems RHV has run into.  You could certainly try adding your own locking; that sounds like a fairly big project (lvmlockd uses distributed locks like this to coordinate reading/writing VGs among hosts.)

(Originally by David Teigland)

Comment 43 RHV bug bot 2020-06-22 11:06:20 UTC
Thanks for the inputs so far,

For running lvm commands we have the following retry flow:

1. Try the lvm command with a specific device filter (cached /dev/mapper/* devices list).
2. If (1) finished with rc=0, and we got some output in the case of an lvs/pvs/vgs --select command, return rc and output.
3. Otherwise, retry steps (1-2) with a fresh device filter (read the devices list from multipath first).
4. If the current host is running in read-only mode (locking_type=4), retry the failed lvm command 4 times
   with increasing delays of 0.1s, 0.2s, 0.4s, 0.8s between retries, until the first attempt succeeds with rc=0.
5. Return the last rc, err, warn if nothing helped.
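
The backoff in steps (4)-(5) can be sketched in Python as follows. This is a minimal illustration only; run_with_retries and run_cmd are hypothetical names, not vdsm's actual API:

```python
import time

def run_with_retries(run_cmd, retries=4, initial_delay=0.1, sleep=time.sleep):
    # run_cmd is a callable returning (rc, out, err), e.g. a wrapper
    # around an "lvs"/"vgs"/"pvs" invocation.  After the first failure
    # we retry up to `retries` times, doubling the delay each time:
    # 0.1s, 0.2s, 0.4s, 0.8s by default.
    rc, out, err = run_cmd()
    delay = initial_delay
    for _ in range(retries):
        if rc == 0:
            return rc, out, err
        sleep(delay)
        delay *= 2
        rc, out, err = run_cmd()
    # Step (5): nothing helped, return the last result.
    return rc, out, err
```

The sleep parameter is injected only to make the flow easy to test without actually waiting.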

In the case shown here, the lvs command is failing for locking_type=1, so maybe we can apply fallback step (4) regardless of read-only mode.

As for other points:

 - Don't use --select for lvm calls where we cannot or do not specify a vg-name.
 - I think we don't have a case where we only check that a pv exists; we usually want the extra pv information, so "pvs pv-name" should probably do.
 - We do use vgck in case of block domain monitoring to see if it is still intact.

Does it sound right?

(Originally by Amit Bawer)

Comment 44 RHV bug bot 2020-06-22 11:06:24 UTC
Created attachment 1692673 [details]
reproducer script for 4.4 on rhel 8, version 2

This reproducer is more complete, simulating more of vdsm lvm usage, and more
flexible.

Usage example:

mkdir fedora-31

# Run the test in one shell 
cd fedora-31
../reproducer-4.4-v2.sh setup
../reproducer-4.4-v2.sh run

# In another shell, run the reloads
cd fedora-31
../reproducer-4.4-v2.sh reload

To clean up the environment after the test run:
../reproducer-4.4-v2.sh teardown

I could not reproduce any error yet on Fedora 31, running about 40 trials.

(Originally by Nir Soffer)

Comment 45 RHV bug bot 2020-06-22 11:06:27 UTC
(In reply to Amit Bawer from comment #42)
> Thanks for the inputs so far,
> 
> For running lvm commands we have the following retry flow mentioned:
...
I don't think this is the right direction. We learned that --select is bad
for our use case, and we don't want to run more commands with --select.

> As for other points:
> 
>  - Don't use --select for lvm calls where we cannot or do not specify a
> vg-name.
>  - I think we don't have a case where we only check on a pv, we usually want
> the pv extra information, so "pvs pv-name" should probably do.

Agree

>  - We do use vgck in case of block domain monitoring to see if it is still
> intact.

Yes, no change needed in vgck.

I think what we need to do:

- Use attachment 1692673 [details] to test on rhel 8.2, on a host with many cpus
  We really need to have parallel lvm commands. Maybe we can use a host
  in the scale lab?

- Port reproducer-4.4-v2.sh to 4.3 (new config, mix activation with tag changes)
 
- Run same test on rhel 7 host.

If we cannot reproduce, we need to go back to the logs attached to this bug
and understand what is missing to reproduce this.

If we cannot reproduce using lvm commands, we need to write a test script using
the SDK, reproducing the flows in comment 0 - running vms with many disks on
multiple storage domains, creating snapshots and deleting snapshots.

(Originally by Nir Soffer)

Comment 46 RHV bug bot 2020-06-22 11:06:31 UTC
Mordechai, can we use a host in the scale lab for reproducing?
I think we need a host with many cpus to simulate parallel usage.

We have a reproducer script (attachment 1692673 [details]) that may reproduce
the issue pretty quickly on a host with many cpus. The only requirement
is hosts running rhel 8.2 and 7.8. Based on current tests and input
from the LVM team, this issue may not be reproducible on rhel 8.2.

(Originally by Nir Soffer)

Comment 47 RHV bug bot 2020-06-22 11:06:35 UTC
Created attachment 1692716 [details]
variation of reproducer-4.4-v2 for 4.3

Adding reproducer-4.3-v2.sh for 4.3

Diff from the 4.4 script:

$ diff reproducer-4.3-v2.sh reproducer-4.4-v2.sh 
5c5
< trials=10000
---
> trials=1
11,12c11,12
< # Based on vdsm configuration on RHEL7.
< config="devices { preferred_names=[\"^/dev/mapper/\"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter=[ \"a|^/dev/loop[0-9]|\", \"r|.*|\" ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min=50 retain_days=0 }"
---
> # Based on vdsm configuration on RHEL8.
> config="devices { ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter=[ \"a|^/dev/loop[0-9]|\", \"r|.*|\" ] hints=\"none\" } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min=50 retain_days=0 }"
21c21
<         vgremove --config "$config" -f ${vg_name}
---
>         vgremove --config "$config" -f $vg_name
91,92c91,92
<                     echo "Changing lv ${lv_path} tags mixed with activation"
<                     lvchange --config "${config}" --autobackup n -ay --addtag image_id --addtag parent_none ${lv_path}
---
>                     echo "Changing lv ${lv_path} tags"
>                     lvchange --config "${config}" --autobackup n --addtag image_id --addtag parent_none ${lv_path}
102,103c102,103
<                     #echo "Activating lv ${lv_path}"
<                     #lvchange --config "${config}" --autobackup n -ay "${lv_path}"
---
>                     echo "Activating lv ${lv_path}"
>                     lvchange --config "${config}" --autobackup n -ay "${lv_path}"
111,112c111,112
<                     echo "Changing lv ${lv_path} tags mixed with deactivation"
<                     lvchange --config "${config}" --autobackup n -an --deltag parent_none --addtag parent_id "${lv_path}"
---
>                     echo "Changing lv ${lv_path} tags"
>                     lvchange --config "${config}" --autobackup n --deltag parent_none --addtag parent_id "${lv_path}"
114,115c114,115
<                     #echo "Deactivaing lv ${lv_path}"
<                     #lvchange --config "${config}" --autobackup n -an "${lv_path}"
---
>                     echo "Deactivaing lv ${lv_path}"
>                     lvchange --config "${config}" --autobackup n -an "${lv_path}"
124,125c124,125
<                     lvchange --config "${config}" --autobackup n -ay --deltag image_id --addtag remove_me ${lv_path}
<                     lvchange --config "${config}" --autobackup n -an --deltag parent_id --addtag parent_none ${lv_path}
---
>                     lvchange --config "${config}" --autobackup n --deltag image_id --addtag remove_me ${lv_path}
>                     lvchange --config "${config}" --autobackup n --deltag parent_id --addtag parent_none ${lv_path}

(Originally by Amit Bawer)

Comment 48 RHV bug bot 2020-06-22 11:06:38 UTC
(In reply to Amit Bawer from comment #46)
> Created attachment 1692716 [details]
> variation of reproducer4.4-v2  for 4.3
> 
> Adding reproducer-4.3-v2.sh for 4.3
> 
> Diff from the 4.4 script 
> 
> $ diff reproducer-4.3-v2.sh reproducer-4.4-v2.sh 

I cannot read this diff, it is good for machines.

$ diff -u reproducer-4.4-v2.sh reproducer-4.3-v2.sh 
--- reproducer-4.4-v2.sh	2020-05-27 14:43:15.642771197 +0300
+++ reproducer-4.3-v2.sh	2020-05-27 23:04:52.635226641 +0300
@@ -2,14 +2,14 @@
 
 set -e
 
-trials=1
+trials=10000

This looks a bit too much :-)

 concurrent=10
 vg_lvs=20
 vg_prefix="bz1837199"
 pv_size="50g"
 
-# Based on vdsm configuration on RHEL8.
-config="devices { ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter=[ \"a|^/dev/loop[0-9]|\", \"r|.*|\" ] hints=\"none\" } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min=50 retain_days=0 }"
+# Based on vdsm configuration on RHEL7.
+config="devices { preferred_names=[\"^/dev/mapper/\"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter=[ \"a|^/dev/loop[0-9]|\", \"r|.*|\" ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min=50 retain_days=0 }"

preferred_names is not relevant since we use loop devices. Otherwise looks good.

 
 # Clean previous setup.
 teardown() {
@@ -18,7 +18,7 @@
         vgchange --config "$config" -an ${vg_name}
 
         echo "Removing vg ${vg_name}"
-        vgremove --config "$config" -f $vg_name
+        vgremove --config "$config" -f ${vg_name}

OK

     done
 
     for devlink in $(ls loop_*); do
@@ -88,8 +88,8 @@
                     echo "Creating lv ${lv_path}"
                     lvcreate --config "${config}" --size 1g --name ${lv_name} -an ${vg_name}
 
-                    echo "Changing lv ${lv_path} tags"
-                    lvchange --config "${config}" --autobackup n --addtag image_id --addtag parent_none ${lv_path}
+                    echo "Changing lv ${lv_path} tags mixed with activation"
+                    lvchange --config "${config}" --autobackup n -ay --addtag image_id --addtag parent_none ${lv_path}

OK

 
                 done
 
@@ -99,8 +99,8 @@
                     lv_path="${vg_name}/${lv_name}"
                     dev_path="/dev/${lv_path}"
 
-                    echo "Activating lv ${lv_path}"
-                    lvchange --config "${config}" --autobackup n -ay "${lv_path}"
+                    #echo "Activating lv ${lv_path}"
+                    #lvchange --config "${config}" --autobackup n -ay "${lv_path}"

Can be deleted

 
                     echo "Doing some I/O with $dev_path"
                     dd if=/dev/zero of="${dev_path}" bs=1M count=32 oflag=direct conv=fsync
@@ -108,11 +108,11 @@
                     echo "Extending lv ${lv_path}"
                     lvextend --config "$config" --size +1g "${lv_path}"
 
-                    echo "Changing lv ${lv_path} tags"
-                    lvchange --config "${config}" --autobackup n --deltag parent_none --addtag parent_id "${lv_path}"
+                    echo "Changing lv ${lv_path} tags mixed with deactivation"
+                    lvchange --config "${config}" --autobackup n -an --deltag parent_none --addtag parent_id "${lv_path}"

OK
 
-                    echo "Deactivaing lv ${lv_path}"
-                    lvchange --config "${config}" --autobackup n -an "${lv_path}"
+                    #echo "Deactivaing lv ${lv_path}"
+                    #lvchange --config "${config}" --autobackup n -an "${lv_path}"

Can be removed.

                 done
 
                 # Prepare lvs for removal.
@@ -121,8 +121,8 @@
                     lv_path="${vg_name}/${lv_name}"
 
                     echo "Preparing ${lv_path} for removal"
-                    lvchange --config "${config}" --autobackup n --deltag image_id --addtag remove_me ${lv_path}
-                    lvchange --config "${config}" --autobackup n --deltag parent_id --addtag parent_none ${lv_path}
+                    lvchange --config "${config}" --autobackup n -ay --deltag image_id --addtag remove_me ${lv_path}
+                    lvchange --config "${config}" --autobackup n -an --deltag parent_id --addtag parent_none ${lv_path}

In the real flow we activate the lv when we set the remove_me tag.

                 done
 
                 # Discard and remove lvs.

Additional change needed - since we activate the lv when marking it for
removal, we don't need to activate it before discarding, and we also do
not deactivate lvs before removal. This is why we see these warnings:

    Removing active volume %s/%s

@@ -131,14 +131,12 @@
                     lv_path="${vg_name}/${lv_name}"
                     dev_path="/dev/${lv_path}"
 
-                    echo "Activating lv ${lv_path}"
-                    lvchange --config "${config}" --autobackup n -ay "${lv_path}"
+                    # In 4.3 the lv is activate beofre when marked for removal.
 
                     echo "Discarding lv ${lv_path}"
                     blkdiscard --step 32m "${dev_path}"
 
-                    echo "Deactivaing lv ${lv_path}"
-                    lvchange --config "${config}" --autobackup n -an "${lv_path}"
+                    # In 4.3 we remove activa lv.
 
                     echo "Removing ${lv_path}"
                     lvremove --config "${config}" -f "${lv_path}"

(Originally by Nir Soffer)

Comment 49 RHV bug bot 2020-06-22 11:06:41 UTC
Reproduced it quite easily now

Modified the script from comment #46 with:
* Add dm-delay devices on top of the loop device
* In reload do 'vgs' too

In the reload loop I suddenly started getting tons of these, then it stopped for a while:
~~~
  Metadata on /dev/mapper/slow_25 at 1182720 has wrong VG name "" expected slow_25.
  Metadata on /dev/mapper/slow_25 at 1182720 has wrong VG name "" expected slow_25.
  Not repairing metadata for VG slow_25.
  Recovery of volume group "slow_25" failed.
  Cannot process volume group slow_25
~~~

~~~
  Metadata on /dev/mapper/slow_27 at 1049088 has wrong VG name "" expected slow_27.
  Metadata on /dev/mapper/slow_27 at 1049088 has wrong VG name "" expected slow_27.
  Not repairing metadata for VG slow_27.
  Recovery of volume group "slow_27" failed.
  Cannot process volume group slow_27
~~~

Specs:
* 32 CPUs (2x Intel(R) Xeon(R) Silver 4110 CPU @ 2.10GHz)
* 64GB RAM

Script is just a hack on top of comment #46, teardown is *totally broken*

$ diff -u reproducer-4.3-v2.sh reproducer-4.3-v2_delay.sh 
--- reproducer-4.3-v2.sh	2020-05-28 09:41:56.059120489 +1000
+++ reproducer-4.3-v2_delay.sh	2020-05-28 11:08:47.423025124 +1000
@@ -3,22 +3,25 @@
 set -e
 
 trials=10000
-concurrent=10
+concurrent=40
 vg_lvs=20
-vg_prefix="bz1837199"
+vg_prefix="slow"
 pv_size="50g"
 
 # Based on vdsm configuration on RHEL7.
-config="devices { preferred_names=[\"^/dev/mapper/\"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter=[ \"a|^/dev/loop[0-9]|\", \"r|.*|\" ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min=50 retain_days=0 }"
+config="devices { preferred_names=[\"^/dev/mapper/\"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter=[ \"a|^/dev/mapper/slow_[0-9]+|\", \"r|.*|\" ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min=50 retain_days=0 }"
 
 # Clean previous setup.
 teardown() {
-    for vg_name in $(vgs --noheadings --select 'vg_name =~ ^bz1837199_[0-9]+' -o vg_name); do
+    for vg_name in $(vgs --noheadings --select 'vg_name =~ ^slow_[0-9]+' -o vg_name); do
         echo "Deactivating lvs in ${vg_name}"
         vgchange --config "$config" -an ${vg_name}
 
         echo "Removing vg ${vg_name}"
         vgremove --config "$config" -f ${vg_name}
+
+        echo "Removing dm-slow ${devlink}"
+        dmsetup remove ${vg_name} 
     done
 
     for devlink in $(ls loop_*); do
@@ -57,8 +60,12 @@
         dev=$(losetup --find --show "${backing}")
         echo "Created loop device ${dev}"
 
+        sectors=$(blockdev --getsize "${dev}")
+        echo "Creating dm-slow on ${dev}"
+        echo "0 ${sectors} delay ${dev} 0 3" | dmsetup create ${vg_name}
+
         echo "Creating devlink ${devlink}"
-        ln -s "${dev}" "${devlink}"
+        ln -s "/dev/mapper/${vg_name}" "${devlink}"
 
         echo "Creating pv ${dev}..."
         # Using vdsm configuration.
@@ -66,7 +73,7 @@
 
         echo "Creating vg ${vg_name}"
         # Using vdsm configution.
-        vgcreate --physicalextentsize 128m "${vg_name}" "${devlink}"
+        vgcreate --config "$config" --physicalextentsize 128m "${vg_name}" "${devlink}"
     done
 }
 
@@ -163,7 +170,7 @@
             vg_name="${vg_prefix}_${vg}"
 
             while true; do
-                echo "Reloading lvs in $vg_name"
+                vgs --config "${config}" --noheadings --select "vg_name = ${vg_name}" > /dev/null
                 lvs --config "${config}" --noheadings --select "vg_name = ${vg_name}" > /dev/null
             done
         ) &

(Originally by Germano Veit Michel)
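
The dm-delay setup in the diff above pipes a device-mapper table line of the form "start length delay device offset delay_ms" into dmsetup create. A tiny helper to build that table line; delay_table is an illustrative name, not part of the script:

```python
def delay_table(sectors, backing_dev, delay_ms=3):
    # Map the whole backing device (sectors long) through the "delay"
    # target, delaying each I/O by delay_ms milliseconds.
    # Table format: <start> <length> delay <device> <offset> <delay_ms>
    return "0 %d delay %s 0 %d" % (sectors, backing_dev, delay_ms)
```

Piping the result to `dmsetup create <name>` creates the slow device, as the script does.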

Comment 50 RHV bug bot 2020-06-22 11:06:44 UTC
(In reply to Germano Veit Michel from comment #48)
> Reproduced it quite easily now
> 
> Modified the script from comment #46 with:
> * Add dm-delay devices on top of the loop device

Thanks! this was very helpful.

> * In reload do 'vgs' too
> 
> In the reload loop I suddenly started getting tons of these, then it stopped
> for a while:
> ~~~
>   Metadata on /dev/mapper/slow_25 at 1182720 has wrong VG name "" expected
> slow_25.
>   Metadata on /dev/mapper/slow_25 at 1182720 has wrong VG name "" expected
> slow_25.
>   Not repairing metadata for VG slow_25.
>   Recovery of volume group "slow_25" failed.
>   Cannot process volume group slow_25
> ~~~

I can reproduce these as well.

> Specs:
> * 32 CPUs (2x Intel(R) Xeon(R) Silver 4110 CPU @ 2.10GHz)
> * 64GB RAM

This looks like a great machine to reproduce with, but with a delay device and
some tuning, I could reproduce reliably on a VM with 8 CPUs.

> Script is just a hack on top of comment #46, teardown is *totally broken*

No problem, I fixed all the issues in the new version.

I rewrote the script in python since it became too complicated to use and did
not provide useful results as is.

I posted the new version as a new storage stress test here:
https://gerrit.ovirt.org/c/109342/

Here is example help:

$ tests/storage/stress/reload.py -h
usage: reload.py [-h] [--trials TRIALS] [--vg-count VG_COUNT]
                 [--lv-count LV_COUNT] [--pv-size PV_SIZE]
                 [--delay-msec DELAY_MSEC] [--debug]
                 {setup,teardown,run}

positional arguments:
  {setup,teardown,run}

optional arguments:
  -h, --help            show this help message and exit
  --trials TRIALS       Number of trials
  --vg-count VG_COUNT   Number of vgs
  --lv-count LV_COUNT   Number of lvs per vg
  --pv-size PV_SIZE     Size of pv in GiB
  --delay-msec DELAY_MSEC
                        Number of milliseconds to delay I/O
  --debug               Show debug logs

The defaults are good enough to reproduce, but you may find the options useful
for getting faster results or for testing changes in the tool.

See the top of the script for more info on how to run this.

If you want to suggest changes or share modified version, please post
a patch on top of this to gerrit.

(Originally by Nir Soffer)

Comment 51 RHV bug bot 2020-06-22 11:06:48 UTC
Created attachment 1693574 [details]
Log using reproducer from https://gerrit.ovirt.org/c/109342/ on Fedora 31

This is a run log on Fedora 31 using 
lvm2-2.03.09-1.fc31.x86_64

No errors after 91 minutes and 9,692 successful reloads.

2020-05-30 01:33:25,981 INFO    (reload/pv) Stats: reloads=3540 errors=0 error_rate=0.00% avg_time=1.558 med_time=1.510 min_time=0.312 max_time=7.323
2020-05-30 01:33:25,981 INFO    (reload/lv) Stats: reloads=3319 errors=0 error_rate=0.00% avg_time=1.660 med_time=1.722 min_time=0.304 max_time=7.375
2020-05-30 01:33:25,998 INFO    (reload/vg) Stats: reloads=2833 errors=0 error_rate=0.00% avg_time=1.947 med_time=1.904 min_time=0.328 max_time=10.210

(Originally by Nir Soffer)

Comment 52 RHV bug bot 2020-06-22 11:06:52 UTC
Created attachment 1693575 [details]
Log using reproducer from https://gerrit.ovirt.org/c/109342/ on CentOS 7.8

Log from CentOS 7.8 using
lvm2-2.02.186-7.el7_8.1.x86_64

Reproduced 573 errors in 12,366 reloads.

2020-05-30 01:26:25,346 INFO    (reload/vg) Stats: reloads=3455 errors=170 error_rate=4.92% avg_time=1.369 med_time=1.216 min_time=0.148 max_time=9.041
2020-05-30 01:26:25,583 INFO    (reload/lv) Stats: reloads=4155 errors=198 error_rate=4.77% avg_time=1.140 med_time=1.092 min_time=0.147 max_time=6.240
2020-05-30 01:26:25,622 INFO    (reload/pv) Stats: reloads=4756 errors=205 error_rate=4.31% avg_time=0.990 med_time=0.925 min_time=0.147 max_time=5.961

Example errors:

2020-05-30 01:13:35,685 ERROR   (reload/pv) Reloading pv failed: Command ('pvs', '--config', 'devices { preferred_names=["^/dev/mapper/"] ignore_suspended_devices=1 write_ca
che_state=0 disable_after_error_count=3 filter=["a|^/dev/mapper/delay[0-9]+$|", "r|.*|"] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0} b
ackup { retain_min=50 retain_days=0}', '--noheadings', '--select', 'pv_name = /dev/mapper/delay0000000000000000000000000006') failed rc=5 out='/dev/mapper/delay0000000000000
000000000000006 bz1837199-000000000000000000000-0006 lvm2 a--  <2.00t 1.46t' err='Metadata on /dev/mapper/delay0000000000000000000000000000 at 128238592 has wrong VG name " 
= "host1"\nsegment_count = 2\n\nsegment1 {\nstart_extent = 0\nextent_count = 8\n\ntype = "striped"\nstripe_count = 1\n\nstripes = [\n"pv0"" expected bz1837199-00000000000000
0000000-0000.\n  Metadata on /dev/mapper/delay0000000000000000000000000000 at 128238592 has wrong VG name " = "host1"\nsegment_count = 2\n\nsegment1 {\nstart_extent = 0\next
ent_count = 8\n\ntype = "striped"\nstripe_count = 1\n\nstripes = [\n"pv0"" expected bz1837199-000000000000000000000-0000.\n  Not repairing metadata for VG bz1837199-00000000
0000000000000-0000.\n  Recovery of volume group "bz1837199-000000000000000000000-0000" failed.\n  Cannot process volume group bz1837199-000000000000000000000-0000'

2020-05-30 01:13:35,900 ERROR   (reload/vg) Reloading vg failed: Command ('vgs', '--config', 'devices { preferred_names=["^/dev/mapper/"] ignore_suspended_devices=1 write_ca
che_state=0 disable_after_error_count=3 filter=["a|^/dev/mapper/delay[0-9]+$|", "r|.*|"] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0} b
ackup { retain_min=50 retain_days=0}', '--noheadings', '--select', 'vg_name = bz1837199-000000000000000000000-0002') failed rc=5 out='bz1837199-000000000000000000000-0002   
1 272   0 wz--n- <2.00t <1.47t' err='Scan of VG bz1837199-000000000000000000000-0009 from /dev/mapper/delay0000000000000000000000000009 found metadata seqno 2220 vs previous
 2219.\n  Metadata on /dev/mapper/delay0000000000000000000000000000 at 128238592 has wrong VG name " = "host1"\nsegment_count = 2\n\nsegment1 {\nstart_extent = 0\nextent_cou
nt = 8\n\ntype = "striped"\nstripe_count = 1\n\nstripes = [\n"pv0"" expected bz1837199-000000000000000000000-0000.\n  Metadata on /dev/mapper/delay00000000000000000000000000
00 at 128238592 has wrong VG name " = "host1"\nsegment_count = 2\n\nsegment1 {\nstart_extent = 0\nextent_count = 8\n\ntype = "striped"\nstripe_count = 1\n\nstripes = [\n"pv0
"" expected bz1837199-000000000000000000000-0000.\n  Not repairing metadata for VG bz1837199-000000000000000000000-0000.\n  Recovery of volume group "bz1837199-0000000000000
00000000-0000" failed.\n  Cannot process volume group bz1837199-000000000000000000000-0000'

2020-05-30 01:13:36,117 ERROR   (reload/lv) Reloading lv failed: Command ('lvs', '--config', 'devices { preferred_names=["^/dev/mapper/"] ignore_suspended_devices=1 write_ca
che_state=0 disable_after_error_count=3 filter=["a|^/dev/mapper/delay[0-9]+$|", "r|.*|"] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0} b
ackup { retain_min=50 retain_days=0}', '--noheadings', '--select', 'vg_name = bz1837199-000000000000000000000-0006 && lv_name = lv-0000000000000000000000000000-0353') failed
 rc=5 out='lv-0000000000000000000000000000-0353 bz1837199-000000000000000000000-0006 -wi------- 2.00g' err='Scan of VG bz1837199-000000000000000000000-0009 from /dev/mapper/
delay0000000000000000000000000009 found metadata seqno 2220 vs previous 2219.\n  Metadata on /dev/mapper/delay0000000000000000000000000000 at 128238592 has wrong VG name " =
 "host1"\nsegment_count = 2\n\nsegment1 {\nstart_extent = 0\nextent_count = 8\n\ntype = "striped"\nstripe_count = 1\n\nstripes = [\n"pv0"" expected bz1837199-000000000000000
000000-0000.\n  Metadata on /dev/mapper/delay0000000000000000000000000000 at 128238592 has wrong VG name " = "host1"\nsegment_count = 2\n\nsegment1 {\nstart_extent = 0\nexte
nt_count = 8\n\ntype = "striped"\nstripe_count = 1\n\nstripes = [\n"pv0"" expected bz1837199-000000000000000000000-0000.\n  Not repairing metadata for VG bz1837199-000000000
000000000000-0000.\n  Recovery of volume group "bz1837199-000000000000000000000-0000" failed.\n  Cannot process volume group bz1837199-000000000000000000000-0000'

(Originally by Nir Soffer)

Comment 53 RHV bug bot 2020-06-22 11:06:56 UTC
Please use latest version of the stress test for testing:
https://gerrit.ovirt.org/c/109355

(Originally by Nir Soffer)

Comment 54 RHV bug bot 2020-06-22 11:06:59 UTC
Adding results from CentOS 7.8:

# time ../reload.py run 2>run.log

2020-05-31 21:51:17,783 INFO    (reload/pv) Stats: reloads=5263 errors=200 error_rate=3.80% avg_time=0.851 med_time=0.824 min_time=0.140 max_time=2.422
2020-05-31 21:51:17,902 INFO    (reload/vg) Stats: reloads=4196 errors=172 error_rate=4.10% avg_time=1.068 med_time=1.037 min_time=0.135 max_time=3.533
2020-05-31 21:51:17,780 INFO    (reload/lv) Stats: reloads=4789 errors=198 error_rate=4.13% avg_time=0.935 med_time=0.942 min_time=0.136 max_time=2.633

real 74m41.284s
user 72m12.264s
sys 27m3.581s

# time ../reload.py run --no-udev
...
2020-05-31 20:09:15,257 INFO    (reload/pv) Stats: reloads=5192 errors=192 error_rate=3.70% avg_time=0.862 med_time=0.829 min_time=0.132 max_time=2.407
2020-05-31 20:09:15,461 INFO    (reload/vg) Stats: reloads=4178 errors=157 error_rate=3.76% avg_time=1.071 med_time=1.020 min_time=0.132 max_time=5.709
2020-05-31 20:09:15,262 INFO    (reload/lv) Stats: reloads=4736 errors=177 error_rate=3.74% avg_time=0.945 med_time=0.953 min_time=0.133 max_time=2.393

real 74m37.137s
user 73m5.567s
sys 25m14.252s

# time ../reload.py run --no-udev --no-select

2020-05-31 23:08:25,646 INFO    (reload/pv) Stats: reloads=6311 errors=38 error_rate=0.60% avg_time=0.699 med_time=0.648 min_time=0.130 max_time=2.107
2020-05-31 23:08:25,490 INFO    (reload/vg) Stats: reloads=7163 errors=26 error_rate=0.36% avg_time=0.616 med_time=0.619 min_time=0.130 max_time=2.366
2020-05-31 23:08:25,522 INFO    (reload/lv) Stats: reloads=7580 errors=19 error_rate=0.25% avg_time=0.582 med_time=0.596 min_time=0.129 max_time=1.184

real 73m33.318s
user 52m53.458s
sys 21m37.100s

We can see that disabling obtain_device_list_from_udev gives a very small
improvement - error rate 3.7% vs 4.1%.

Replacing usage of --select with explicit pv or vg names is a huge
improvement. Reloads are much faster and more consistent, we do
almost 2X more reloads (21,054 vs 14,106) and the error rate is
10 times lower (0.39% vs 3.7%).

I think it is very clear that removing the usage of --select is the
right way.
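
The difference between the two command forms can be sketched as follows; build_lvs_cmd is a hypothetical helper for illustration, not vdsm's actual command builder:

```python
def build_lvs_cmd(vg_name, use_select=False):
    # With --select, lvm scans every VG on the system to evaluate the
    # selection, so the command may fail if an unrelated VG is being
    # modified concurrently.  With an explicit VG name, only the named
    # VG is read.
    cmd = ["lvs", "--noheadings"]
    if use_select:
        cmd += ["--select", "vg_name = %s" % vg_name]
    else:
        cmd.append(vg_name)
    return cmd
```

In the real commands the --config argument from the test configuration would be passed as well.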

However we still have errors - I think we need to also enable retries
for reload commands when using locking_type=1. We can add a retry= flag
to enable retries for certain commands, instead of retrying only read-only
commands.

The easiest way to validate this is to add a retry mechanism to the stress
test.

Regardless, I think the root cause is an LVM bug and it should be fixed in LVM.

(Originally by Nir Soffer)

Comment 55 RHV bug bot 2020-06-22 11:07:01 UTC
Created attachment 1693953 [details]
run.log - run log with defaults, using udev and --select

(Originally by Nir Soffer)

Comment 56 RHV bug bot 2020-06-22 11:07:04 UTC
Created attachment 1693954 [details]
run-no-udev.log - run log without udev and with --select

(Originally by Nir Soffer)

Comment 57 RHV bug bot 2020-06-22 11:07:09 UTC
Created attachment 1693955 [details]
run-no-udev-no-select.log - run logs without udev or --select

(Originally by Nir Soffer)

Comment 58 RHV bug bot 2020-06-22 11:07:13 UTC
The root cause is LVM bug 1842600, but we learned that using --select in
vdsm is very bad, and we need to fix it regardless of the lvm bug.

This also exposed an issue with udev in rhel 8.2 (bug 1837199); bug 1842053
tracks the needed change in vdsm to mitigate it.

(Originally by Nir Soffer)

Comment 59 RHV bug bot 2020-06-22 11:07:15 UTC
No reload errors were encountered during reproducer runs with the current brew lvm2 build containing the fix for bz1842600:

https://bugzilla.redhat.com/show_bug.cgi?id=1842600#c8

(Originally by Amit Bawer)

Comment 61 RHV bug bot 2020-06-22 11:07:21 UTC
Amit, can you please provide a clear verification scenario?

(Originally by Avihai Efrat)

Comment 62 RHV bug bot 2020-06-22 11:07:25 UTC
Hi Avihai,

Per your question, please note the following:

1. The root cause is to be fixed in an lvm2 rpm (official version to be determined, see bz1842600).
2. The current Vdsm fixes attached to this ticket are not acked yet, and in any case they only reduce how often the issue occurs, by removing the usage of --select from lvm commands in Vdsm.
3. The reproducer script modified multiple VGs (add LV, change LV tags, extend LV, write to LV, remove LV) while lvm pvs, vgs, lvs commands were executed, all in parallel. The reproduction used 10 VGs with 500 LVs each undergoing this process, on a RHEL 7.8 machine with 8 CPUs (it did not reproduce with fewer than that). You can also see the reproduction test parameters in https://bugzilla.redhat.com/show_bug.cgi?id=1842600#c8

So given the above, I'd suggest testing this with the proper lvm2 rpm, when it is available for verification. If you want to test Vdsm before the fix to see if it reproduces in the QE environment, then I think it's more of a scale issue, since we need multiple VGs (domains) with disks being added and removed while lvm reloads occur as part of the operations.

(Originally by Amit Bawer)

Comment 65 Avihai 2020-07-02 05:49:26 UTC
According to Amit's last remark, this is a scale bug that should be tested by the scale QE team.
Changing the QA contact to Mordechai.

Comment 66 Michal Skrivanek 2020-07-07 06:36:02 UTC
*** Bug 1852344 has been marked as a duplicate of this bug. ***

Comment 67 Nir Soffer 2020-07-07 12:06:16 UTC
Amit, this bug requires the LVM fix from bug 1842600 - according
to that bug, we must require lvm2-2.02.187-6.el7.

Moving to NEW since we do not require that version yet and no
patch was posted with this requirement.

Comment 68 Nir Soffer 2020-07-07 12:06:54 UTC
See comment 67

Comment 69 Amit Bawer 2020-07-07 12:24:39 UTC
Added https://gerrit.ovirt.org/#/c/110178/

Comment 75 David Vaanunu 2020-08-03 10:26:55 UTC
Tested Flow:
1. Create a VM with 13 disks (each disk on a different SD)
2. Create a VM snapshot 10 times
3. Delete the VM

The flow was run 20 times with 2 concurrent users.

Also, the test was run twice:
1. RHV 4.3.9-7 - reproduced the problem (vdsm-4.30.42 & lvm2-2.02.186-7)
2. RHV 4.3.11-7 - verified the fix (vdsm-4.30.50 & lvm2-2.02.187-6)


During the 1st test there were many errors:
:2020-08-02 00:47:06,400+0000 ERROR (monitor/d7e3b45) [storage.LVM] Reloading VGs failed vgs=['d7e3b455-ae7b-4bf9-a997-f25adc5f19b8'] rc=5 out=['  cqe0ji-XYxI-bejZ-AG2d-Vv1J-izNU-7Lvyfb|d7e3b455-ae7b-4bf9-a997-f25adc5f19b8|wz--n-|536602476544|516067164160|134217728|3998|3845|MDT_ALIGNMENT=1048576,MDT_BLOCK_SIZE=512,MDT_CLASS=Data,MDT_DESCRIPTION=MAXLUNS_3600a098038304437415d4b6a59685973,MDT_IOOPTIMEOUTSEC=10,MDT_LEASERETRIES=3,MDT_LEASETIMESEC=60,MDT_LOCKPOLICY=,MDT_LOCKRENEWALINTERVALSEC=5,MDT_POOL_UUID=3fd7f5a9-738f-44c3-8e3b-760a43954106,MDT_PV0=pv:3600a098038304437415d4b6a59685973&44&uuid:3Omcg8-31Sd-SNUd-rtmP-EIwf-aCxB-Ne6r34&44&pestart:0&44&pecount:3998&44&mapoffset:0,MDT_ROLE=Regular,MDT_SDUUID=d7e3b455-ae7b-4bf9-a997-f25adc5f19b8,MDT_TYPE=FCP,MDT_VERSION=5,MDT_VGUUID=cqe0ji-XYxI-bejZ-AG2d-Vv1J-izNU-7Lvyfb,MDT__SHA_CKSUM=9d455c6d1a9668abed2896fe76c126358f3461ad,RHAT_storage_domain|134217728|67097088|23|1|/dev/mapper/3600a098038304437415d4b6a59685973'] err=['  Metadata on /dev/mapper/3600a098038304437415d4b6a59685976 at 536848107520 has wrong VG name "" expected 44cfdf47-9464-447d-aab8-6f58f8b218d0.', '  Metadata on /dev/mapper/3600a098038304437415d4b6a59685976 at 536848107520 has wrong VG name "" expected 44cfdf47-9464-447d-aab8-6f58f8b218d0.', '  Not repairing metadata for VG 44cfdf47-9464-447d-aab8-6f58f8b218d0.', '  Recovery of volume group "44cfdf47-9464-447d-aab8-6f58f8b218d0" failed.', '  Cannot process volume group 44cfdf47-9464-447d-aab8-6f58f8b218d0'] (lvm:576)

During the 2nd test there were no errors in the vdsm.log files.


VDSM log files (Both versions):
https://drive.google.com/drive/folders/1klyclDkmUloD21pfle-giDpG7n9GUFZq
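A quick way to confirm the result during verification is to count
occurrences of the reload-failure message in the vdsm.log files; a
minimal sketch (the sample lines below are abbreviated from the error
output above, and reading the actual log files is left to the reader):

```python
def count_reload_failures(lines):
    """Count vdsm log lines reporting a failed VG reload."""
    return sum(1 for line in lines if "Reloading VGs failed" in line)

# Abbreviated sample: one reload failure, one unrelated line.
sample = [
    "2020-08-02 00:47:06,400+0000 ERROR (monitor/d7e3b45) [storage.LVM] "
    "Reloading VGs failed vgs=['d7e3b455-ae7b-4bf9-a997-f25adc5f19b8'] rc=5",
    "2020-08-02 00:47:07,000+0000 INFO  (jsonrpc/3) [api.host] ...",
]
```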

Comment 79 errata-xmlrpc 2020-09-30 10:13:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Virtualization security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:4114

