Bug 1693075 - After live merge, top volume still has INTERNAL volume type because LVM metadata could not be read properly
Keywords:
Status: CLOSED DUPLICATE of bug 1553133
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 4.2.8
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ovirt-4.4.1
Assignee: Nir Soffer
QA Contact: Avihai
URL:
Whiteboard:
Depends On:
Blocks: gss_rhv_4_3_4 1715026
 
Reported: 2019-03-27 06:39 UTC by Germano Veit Michel
Modified: 2023-10-06 18:12 UTC
CC: 12 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1715026 (view as bug list)
Environment:
Last Closed: 2020-03-21 15:37:27 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:
lsvaty: testing_plan_complete-




Links
System ID Status Summary Last Updated
oVirt gerrit 100381 MERGED lvm: Log LVM warnings for successful commands 2021-02-08 17:44:50 UTC
oVirt gerrit 100410 MERGED lvm: Suppress unhelpful warnings 2021-02-08 17:44:50 UTC

Description Germano Veit Michel 2019-03-27 06:39:48 UTC
Description of problem:

Current chain:     xxx < 20d6814c < 7a5d99a4 (top)
Live Merge result: xxx < 20d6814c (top)
Problem: 20d6814c VOLTYPE is set to INTERNAL instead of LEAF.

Checking the logs, it seems this is a result of the deleteVolume on the SPM failing at [1]:

2019-03-22 07:24:32,251+1300 INFO  (jsonrpc/1) [vdsm.api] START deleteVolume(sdUUID=u'0b0107b2-9d2a-4871-b4d6-76bdffba84cd', spUUID=u'58117215-0261-0167-0311-0000000002b6', imgUUID=u'778f6bf5-77e8-4f75-a41d-a1d5341f0c3d', volumes=[u'7a5d99a4-ad44-41f1-94cd-1d67b3d8c13c'], postZero=u'false', force=u'false', discard=False) from=::ffff:10.1.224.10,59366, flow_id=dcdde73b-3b1f-4259-89c1-3b63ff21a67a, task_id=478b1e32-aea5-4710-850a-8b525e5c165f (api:46)

2019-03-22 07:24:56,108+1300 WARN  (tasks/2) [storage.Volume] cannot finalize parent volume 20d6814c-1234-480d-8128-17fee729927d (blockVolume:625)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/blockVolume.py", line 617, in delete
    self.setParent(sc.BLANK_UUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line 1494, in setParent
    self.setParentMeta(puuid)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/blockVolume.py", line 756, in setParentMeta
    return self._manifest.setParentMeta(puuid)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/blockVolume.py", line 270, in setParentMeta
    self.setMetaParam(sc.PUUID, puuid)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line 327, in setMetaParam
    meta = self.getMetadata()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/blockVolume.py", line 100, in getMetadata
    metaId = self.getMetadataId()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/blockVolume.py", line 81, in getMetadataId
    return (self.sdUUID, self.getMetaOffset())
  File "/usr/lib/python2.7/site-packages/vdsm/storage/blockVolume.py", line 85, in getMetaOffset
    md = getVolumeTag(self.sdUUID, self.volUUID, sc.TAG_PREFIX_MD)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/blockVolume.py", line 773, in getVolumeTag
    tags = lvm.getLV(sdUUID, volUUID).tags
  File "/usr/lib/python2.7/site-packages/vdsm/storage/lvm.py", line 970, in getLV
    raise se.LogicalVolumeDoesNotExistError("%s/%s" % (vgName, lvName))
LogicalVolumeDoesNotExistError: Logical volume does not exist: (u'0b0107b2-9d2a-4871-b4d6-76bdffba84cd/7a5d99a4-ad44-41f1-94cd-1d67b3d8c13c',)

This happened because the preceding "lvm lvs" command failed, and it failed with a very weird error:

2019-03-22 07:24:56,107+1300 WARN  (tasks/2) [storage.LVM] lvm lvs failed: 5 [] ['  Metadata on /dev/mapper/36000d31000e8a600000000000000000c at 27487763185152 has wrong VG name "= [', '"pv1", 167317', ']', '}', 'segment9 {', 'start_extent = 88', 'extent_count = 8', '', 'type = "striped"', 'stripe_count = 1', '', 'stripes = [', '"pv1", 16751" expected 0b0107b2-9d2a-4871-b4d6-76bdffba84cd.', '  Volume group "0b0107b2-9d2a-4871-b4d6-76bdffba84cd" not found', '  Cannot process volume group 0b0107b2-9d2a-4871-b4d6-76bdffba84cd'] (lvm:431)

2019-03-22 07:24:56,107+1300 WARN  (tasks/2) [storage.LVM] lv: 7a5d99a4-ad44-41f1-94cd-1d67b3d8c13c not found in lvs vg: 0b0107b2-9d2a-4871-b4d6-76bdffba84cd response (lvm:627)


So the VG name was read as "= [', '"pv1", 167317', ']', '}', 'segm.....", which is actually a random piece of LVM metadata on that PV.

* The LVM metadata for this SD is 1.6MB (1300 LVs, high fragmentation of LV segments)
* There are several snapshots created/deleted around these times (Commvault)

Unlike BZ1553133, here the VG is not getting corrupted; it just fails to read the metadata properly for short periods of time.

It looks like there is some improper locking of LVM commands, or the metadata is being read while something else is still writing it.

[1] https://github.com/oVirt/vdsm/blob/master/lib/vdsm/storage/blockVolume.py#L616
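
To catch the transient read failure outside vdsm, here is a minimal watcher sketch (not a vdsm tool; the VG UUID is the one from this bug, and the field list is a simplification of what vdsm queries). It polls "lvm lvs" and prints any run that fails or writes to stderr:

import subprocess
import time

# VG (storage domain) UUID from this bug; replace as needed.
VG = "0b0107b2-9d2a-4871-b4d6-76bdffba84cd"

while True:
    # Read-only query similar to what vdsm runs; a transient metadata read
    # problem shows up as a non-zero exit code or as stderr noise.
    p = subprocess.run(
        ["lvm", "lvs", "--noheadings", "-o", "lv_name,tags", VG],
        capture_output=True, text=True)
    if p.returncode != 0 or p.stderr.strip():
        print(time.strftime("%Y-%m-%d %H:%M:%S"),
              "rc=%s" % p.returncode, p.stderr.strip())
    time.sleep(1)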

Version-Release number of selected component (if applicable):
vdsm-4.20.46-1.el7ev.x86_64
kernel-3.10.0-957.1.3.el7.x86_64
lvm2-2.02.180-10.el7_6.2.x86_64

How reproducible:
Not yet

Comment 1 Germano Veit Michel 2019-03-27 06:45:53 UTC
This also seems to have caused the metadata not being written at all for a newly created volume. I'll update with more details tomorrow when I finish investigating that one.

Comment 3 Germano Veit Michel 2019-03-27 06:55:08 UTC
(In reply to Germano Veit Michel from comment #0)
> Unlike BZ1553133, here the VG is not getting corrupted; it just fails to
> read the metadata properly for short periods of time.

Not entirely sure, actually. But here the metadata seqno is sequential; there are no jumps. Still, there are not that many copies in the first 128MB, as each one is 1.6MB.

Comment 5 Tal Nisan 2019-04-01 08:12:52 UTC
Nir, this seems like the same root cause as in bug 1553133, although the outcome is a bit different (no corruption). The solution should fix both since we change the locking mechanism, right?

Comment 6 Nir Soffer 2019-05-28 07:20:36 UTC
This seems like the temporary random failures reported by QE.

David, is this a result of lvm on another host trying to "fix" the metadata during lvchange --refresh or another command that should not change the metadata?

Comment 7 David Teigland 2019-05-28 15:40:39 UTC
(In reply to Nir Soffer from comment #6)
> This seems like the temporary random failures reported by QE.
> 
> David, is this a result of lvm on another host trying to "fix" the metadata
> during lvchange --refresh or another command that should not change the
> metadata?

While one host is writing VG metadata (as a normal change, not fixing anything), if another host tries to read the VG (e.g. lvs) at the same time, the lvs may see bad VG metadata and report errors.  The locking_type 4 change from bug 1553133 should prevent the 'lvs' from "fixing" any transient problem it sees.

It shouldn't be difficult to test this with lvm directly.  On one host run a loop that does repeated 'lvcreate' in the VG, and on the second host run a loop that does repeated 'lvs' in the same VG.  I expect you may see the 'lvs' report errors periodically, especially when the VG metadata grows large.
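
For example, a minimal sketch of those two loops using Python's subprocess (the VG name, LV size, and counts are placeholders; run the "a" loop on host A and the "b" loop on host B, and clean up the stress LVs with lvremove afterwards):

import subprocess
import sys

VG = "testvg"  # placeholder: a shared VG visible to both hosts

def create_loop(count=500):
    # Host A: keep adding small LVs so the VG metadata keeps growing.
    for i in range(count):
        subprocess.run(
            ["lvcreate", "--name", "stress%04d" % i, "--size", "4m", VG],
            check=True)

def read_loop(count=100000):
    # Host B: keep reading the VG; report any error or stderr warning.
    for _ in range(count):
        p = subprocess.run(["lvs", VG], capture_output=True, text=True)
        if p.returncode != 0 or p.stderr.strip():
            print("rc=%s" % p.returncode, p.stderr.strip(), file=sys.stderr)

if __name__ == "__main__":
    create_loop() if sys.argv[1] == "a" else read_loop()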

Comment 8 Nir Soffer 2019-05-28 17:07:45 UTC
(In reply to David Teigland from comment #7)
> (In reply to Nir Soffer from comment #6)
In this case we see an error in an lvs command on the SPM - the only host adding,
removing, or extending LVs.

Is it possible that we had:

1. host A: create/extend/delete/change tags -> writing new metadata
2. host B: run lvs/vgs/pvs/lvchange --refresh -> try fix inconsistent metadata
3. host A: run lvs -> see inconsistent metadata from bad fix by host B, fix it again

Step 1 and 2 may happen in parallel.

Comment 9 David Teigland 2019-05-28 17:20:56 UTC
> Is it possible that we had:
> 
> 1. host A: create/extend/delete/change tags -> writing new metadata
> 2. host B: run lvs/vgs/pvs/lvchange --refresh -> try fix inconsistent
> metadata
> 3. host A: run lvs -> see inconsistent metadata from bad fix by host B, fix
> it again
> 
> Step 1 and 2 may happen in parallel.

If host B was able to modify the metadata, then host A could report errors and try to fix things again.  If the metadata buffer is large enough, you might be able to see evidence of those changes from previous copies of the metadata in the buffer.

Comment 10 Nir Soffer 2019-05-28 19:21:50 UTC
David, do we have a way to detect when lvm tries to fix inconsistent metadata?
Can we enable specific debug logs that will reveal when this happens?

We already collect anything lvm writes to stderr. Currently this goes to a
debug log that is disabled by default, but we can change it to log warnings
if this can help diagnose such issues.

Comment 11 David Teigland 2019-05-28 19:35:24 UTC
Is the code not using the locking_type 4 change from bug 1553133 which should prevent the unwanted repair?

If not, then you would probably see this:
WARNING: Inconsistent metadata found for VG %s - updating to use version %u"

Comment 12 Nir Soffer 2019-05-28 19:38:31 UTC
(In reply to David Teigland from comment #11)
> Is the code not using the locking_type 4 change from bug 1553133 which
> should prevent the unwanted repair?

Not yet, integrating locking_type 4 is not that easy.

> If not, then you would probably see this:
> WARNING: Inconsistent metadata found for VG %s - updating to use version %u"

Sounds good, I'll change the logging to log lvm stderr as warnings.
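
Something along these lines, a minimal sketch of the logging change (not the actual patch on gerrit 100381): when an lvm command exits 0 but still wrote to stderr, log the stderr lines as a warning instead of hiding them in the disabled debug log.

import logging
import subprocess

log = logging.getLogger("storage.LVM")

def run_lvm(args):
    # Keep stderr of *successful* commands visible as warnings, so lines like
    # "WARNING: Inconsistent metadata found for VG ..." show up in vdsm.log.
    p = subprocess.run(["lvm"] + list(args), capture_output=True, text=True)
    errors = [line for line in p.stderr.splitlines() if line.strip()]
    if p.returncode != 0:
        log.warning("lvm %s failed: %s %r", args[0], p.returncode, errors)
    elif errors:
        log.warning("Command %r succeeded with warnings: %r", args, errors)
    return p.returncode, p.stdout, p.stderr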

Comment 13 Nir Soffer 2019-05-28 21:53:47 UTC
I think the attached patch is the best we can do for now. With the improved
log, we will see warnings when lvm tries to fix metadata.

I think we should deliver the improved log in 4.3.5, and reopen this bug when
we have more data from users.

Avihai, I think your team filed one or more bugs about the same issue, which
may be a duplicate of this bug.

Comment 15 Nir Soffer 2019-05-28 22:31:19 UTC
Possible duplicate: bug 1637405

Comment 16 Avihai 2019-05-29 10:20:30 UTC
(In reply to Nir Soffer from comment #13)
> I think the attached patch is the best we can do for now. With the improved
> log, we will see warnings when lvm tries to fix metadata.
> 
> I think we should deliver the improved log in 4.3.5, and reopen this bug when
> we have more data from users.
> 
> Avihai, I think your team filed one or more bugs about the same issue, which
> may be a duplicate of this bug.

In the bug you mentioned, the deleteVolume VDSM error looks different, so I'm not sure about its similarity.

VDSM:
2019-05-04 21:55:30,237+0300 ERROR (tasks/7) [storage.TaskManager.Task] (Task='3f8954bd-093f-4dff-a69e-266965267a33') Unexpected error (task:875)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
    return fn(*args, **kargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 336, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 79, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 1967, in deleteVolume
    vol.delete(postZero=postZero, force=force, discard=discard)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/blockVolume.py", line 588, in delete
    self.validateDelete()

However, bug 1666795, which was resolved in 4.3.2, shows similar errors in the VDSM log (VolumeDoesNotExist); maybe this is more related?

Comment 18 Nir Soffer 2019-05-29 18:30:21 UTC
(In reply to Avihai from comment #16)

The error in vdsm log does not matter. It is a result of the actual issue, not
the root cause.

When we run the automation tests again with the fix, we may get warnings
from LVM showing the root cause.

The expected result is:

1. Random flow fail on the SPM, trying to access LVM with a message about 
   bad LVM metadata, like the one from comment 0:

    WARN  (tasks/2) [storage.LVM] lvm lvs failed: 5 [] ['  Metadata on
    /dev/mapper/36000d31000e8a600000000000000000c at 27487763185152 has
    wrong VG name "= [', '"pv1", 167317', ']', '}', 'segment9 {',...

2. At the same time, on another host, WARN log message with:

    WARN [storage.LVM] Command [...] succeeded with warnings: ['  WARNING:
    Inconsistent metadata found for VG xxx-yyy-zzz - updating to use version yy']

In this case we know that LVM command running on that host tried to fix
the VG metadata, which may corrupt the VG metadata - bug 1553133.
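
To correlate the two sides after a run, here is a throwaway helper one could run on each host (assuming the default log path /var/log/vdsm/vdsm.log); order the matches by timestamp across hosts:

import re
import sys

# The two signatures described above: a failing read on the SPM, and a
# successful command with "fix" warnings on another host.
PATTERNS = [
    re.compile(r"lvm \w+ failed: .*wrong VG name"),
    re.compile(r"succeeded with warnings:.*Inconsistent metadata found for VG"),
]

path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/vdsm/vdsm.log"
with open(path, errors="replace") as f:
    for line in f:
        if any(p.search(line) for p in PATTERNS):
            print(line.rstrip())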

Comment 19 Avihai 2019-05-30 06:01:11 UTC
(In reply to Nir Soffer from comment #18)
> (In reply to Avihai from comment #16)
> 
> The error in vdsm log does not matter. It is a result of the actual issue,
> not
> the root cause.
> 
> When we run the automation tests again with the fix, we may get warnings
> from LVM showing the root cause.
> 
> The expected result is:
> 
> 1. Random flow fail on the SPM, trying to access LVM with a message about 
>    bad LVM metadata, like the one from comment 0:
> 
>     WARN  (tasks/2) [storage.LVM] lvm lvs failed: 5 [] ['  Metadata on
>     /dev/mapper/36000d31000e8a600000000000000000c at 27487763185152 has
>     wrong VG name "= [', '"pv1", 167317', ']', '}', 'segment9 {',...
> 
> 2. At the same time, on another host, WARN log message with:
> 
>     WARN [storage.LVM] Command [...] succeeded with warnings: ['  WARNING:
>     Inconsistent metadata found for VG xxx-yyy-zzz - updating to use version
> yy']
> 
> In this case we know that LVM command running on that host tried to fix
> the VG metadata, which may corrupt the VG metadata - bug 1553133.

Moved the discussion to the 4.3 clone, bug 1693075; please update all info there.
Also see my comment in that bug about retargeting this bug to 4.3.6.

Comment 20 Nir Soffer 2019-07-04 16:56:57 UTC
David, we see these LVM warnings in oVirt system tests:

2019-07-04 12:11:45,478-0400 WARN  (jsonrpc/5) [storage.LVM] Command ['/sbin/lvm', 'lvs', '--config', 'devices {  preferred_names=["^/dev/mapper/"]  ignore_suspended_devices=1  write_cache_state=0  disable_after_error_count=3  filter=["a|^/dev/mapper/3600140525911f82f9ce4e34b70c73bc2$|^/dev/mapper/3600140541592ee797ff458aba08a28a4$|", "r|.*|"] } global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 } backup {  retain_min=50  retain_days=0 }', '--noheadings', '--units', 'b', '--nosuffix', '--separator', '|', '--ignoreskippedcluster', '-o', 'uuid,name,vg_name,attr,size,seg_start_pe,devices,tags', '05a08319-1d06-41a1-a7ad-fccd5fa3768e'] succeeded with warnings: ['  WARNING: Cannot find matching striped segment for 05a08319-1d06-41a1-a7ad-fccd5fa3768e/97fec0b7-b94c-46d1-a47d-494b4f0ccdcc.'] (lvm:391)

This warning was repeated 35 times on a host that is not the SPM.

What is the meaning of this warning?

Comment 21 David Teigland 2019-07-08 16:08:49 UTC
I think it means that this host has an active LV that has been modified (e.g. extended) from another host, but has not been refreshed on this host. So, lvm prints this warning when it sees that the metadata for the LV does not match the LV active in device mapper.
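
If that is what happened, refreshing the LV on the host that printed the warning should make it disappear. A hypothetical check (not verified in this bug), using the VG/LV from the warning quoted above:

import subprocess

# Refresh the possibly stale LV on the host printing the warning, then
# re-read it; the "Cannot find matching striped segment" warning should be
# gone if the only problem was an extend that was not refreshed here.
LV = "05a08319-1d06-41a1-a7ad-fccd5fa3768e/97fec0b7-b94c-46d1-a47d-494b4f0ccdcc"

subprocess.run(["lvchange", "--refresh", LV], check=True)
p = subprocess.run(["lvs", LV], capture_output=True, text=True)
print(p.stderr.strip() or "no warnings")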

Comment 22 Nir Soffer 2019-07-13 00:04:58 UTC
(In reply to David Teigland from comment #21)
> I think it means that this host has an active LV that has been modified
> (e.g. extended) from another host, but has not been refreshed on this host.
> So, lvm prints this warning when it sees that the metadata for the LV does
> not match the LV active in device mapper.

This may happen during the automatic extend flow.

We see these warnings now in our system tests (49 warnings per run):

2019-07-12 15:18:34,034-0400 WARN  (jsonrpc/7) [storage.LVM] Command ['/sbin/lvm', 'lvchange', '--config', 'devices {  preferred_names=["^/dev/mapper/"]  ignore_suspended_devices=1  write_cache_state=0  disable_after_error_count=3  filter=["a|^/dev/mapper/3600140510698bd4b8b84143b0f7dbbc7$|^/dev/mapper/360014053544df693e4b4f63a6ba4a428$|", "r|.*|"] } global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 } backup {  retain_min=50  retain_days=0 }', '--autobackup', 'n', '-a', 'y', '--deltag', u'IU_651741de-1549-4659-88bb-37d117044df9', '--addtag', u'IU__remove_me_651741de-1549-4659-88bb-37d117044df9', '91047f09-f2d7-4453-8e11-483fed09556d/fdd994be-e45e-43b2-a614-9b9626cb0aef'] succeeded with warnings: ['  Scan of VG 91047f09-f2d7-4453-8e11-483fed09556d from /dev/mapper/360014053544df693e4b4f63a6ba4a428 found metadata seqno 72 vs previous 71.', '  Scan of VG 91047f09-f2d7-4453-8e11-483fed09556d from /dev/mapper/360014053544df693e4b4f63a6ba4a428 found metadata seqno 72 vs previous 71.'] (lvm:391)

This should run only on the SPM host, so I wonder how lvm can see inconsistent
metadata if this is the only host changing the VG/LVs?

Does lvm try to fix the metadata in this flow?

Comment 23 Nir Soffer 2019-07-13 00:09:53 UTC
This is the complete flow when this error happens:

2019-07-12 15:11:56,157-0400 DEBUG (jsonrpc/3) [jsonrpc.JsonRpcServer] Calling 'Image.delete' in bridge with {u'force': u'false', u'imageID': u'cba6f922-1def-40b7-b197-c65a1088847d', u'postZero': u'false', u'storagepoolID': u'5eddd851-93f5-482d-b0c7-7245d98e66ea', u'storagedomainID': u'91047f09-f2d7-4453-8e11-483fed09556d', u'discard': True} (__init__:329)

2019-07-12 15:11:56,174-0400 INFO  (jsonrpc/3) [vdsm.api] START deleteImage(sdUUID=u'91047f09-f2d7-4453-8e11-483fed09556d', spUUID=u'5eddd851-93f5-482d-b0c7-7245d98e66ea', imgUUID=u'cba6f922-1def-40b7-b197-c65a1088847d', postZero=u'false', force=u'false', discard=True) from=::ffff:192.168.201.4,59046, flow_id=7240e365, task_id=457a38dc-b1a5-4735-816f-65e6840ada4a (api:48)

2019-07-12 15:11:56,543-0400 WARN  (jsonrpc/3) [storage.LVM] Command ['/sbin/lvm', 'lvchange', '--config', 'devices {  preferred_names=["^/dev/mapper/"]  ignore_suspended_devices=1  write_cache_state=0  disable_after_error_count=3  filter=["a|^/dev/mapper/3600140510698bd4b8b84143b0f7dbbc7$|^/dev/mapper/360014053544df693e4b4f63a6ba4a428$|", "r|.*|"] } global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 } backup {  retain_min=50  retain_days=0 }', '--autobackup', 'n', '-a', 'y', '--deltag', u'IU_cba6f922-1def-40b7-b197-c65a1088847d', '--addtag', u'IU__remove_me_cba6f922-1def-40b7-b197-c65a1088847d', '91047f09-f2d7-4453-8e11-483fed09556d/034748fe-3c34-4d05-88d4-00a2a3856513'] succeeded with warnings: ['  Scan of VG 91047f09-f2d7-4453-8e11-483fed09556d from /dev/mapper/360014053544df693e4b4f63a6ba4a428 found metadata seqno 31 vs previous 30.', '  Scan of VG 91047f09-f2d7-4453-8e11-483fed09556d from /dev/mapper/360014053544df693e4b4f63a6ba4a428 found metadata seqno 31 vs previous 30.'] (lvm:391)

2019-07-12 15:11:56,567-0400 INFO  (jsonrpc/3) [vdsm.api] FINISH deleteImage return=None from=::ffff:192.168.201.4,59046, flow_id=7240e365, task_id=457a38dc-b1a5-4735-816f-65e6840ada4a (api:54)

This flow runs only on the SPM.

Tested with latest CentOS 7.6.

Comment 24 Nir Soffer 2019-07-13 00:44:15 UTC
On the other host in this setup, we see these LVM commands related to this VG
before this warning on the SPM host:

2019-07-12 15:02:30,610-0400 INFO  (jsonrpc/4) [storage.LVM] Activating lvs: vg=91047f09-f2d7-4453-8e11-483fed09556d lvs=['metadata'] (lvm:1448)
2019-07-12 15:02:30,904-0400 INFO  (jsonrpc/4) [storage.LVM] Activating lvs: vg=91047f09-f2d7-4453-8e11-483fed09556d lvs=['ids'] (lvm:1448)
2019-07-12 15:02:31,092-0400 INFO  (jsonrpc/4) [storage.LVM] Activating lvs: vg=91047f09-f2d7-4453-8e11-483fed09556d lvs=['leases'] (lvm:1448)
2019-07-12 15:02:31,452-0400 INFO  (jsonrpc/4) [storage.LVM] Activating lvs: vg=91047f09-f2d7-4453-8e11-483fed09556d lvs=['inbox', 'outbox', 'xleases', 'master'] (lvm:1448)
2019-07-12 15:02:39,893-0400 INFO  (jsonrpc/2) [storage.LVM] Activating lvs: vg=91047f09-f2d7-4453-8e11-483fed09556d lvs=['c8ab8b74-dfe7-4306-9924-bc970d4a1074'] (lvm:1448)
2019-07-12 15:02:40,718-0400 INFO  (jsonrpc/4) [storage.LVM] Deactivating lvs: vg=91047f09-f2d7-4453-8e11-483fed09556d lvs=['c8ab8b74-dfe7-4306-9924-bc970d4a1074'] (lvm:1456)
2019-07-12 15:02:41,415-0400 INFO  (jsonrpc/0) [storage.LVM] Activating lvs: vg=91047f09-f2d7-4453-8e11-483fed09556d lvs=['122328ea-bf51-43a6-8b10-fb0800aaa832'] (lvm:1448)
2019-07-12 15:02:42,159-0400 INFO  (jsonrpc/3) [storage.LVM] Deactivating lvs: vg=91047f09-f2d7-4453-8e11-483fed09556d lvs=['122328ea-bf51-43a6-8b10-fb0800aaa832'] (lvm:1456)
2019-07-12 15:04:13,009-0400 INFO  (jsonrpc/0) [storage.LVM] Activating lvs: vg=91047f09-f2d7-4453-8e11-483fed09556d lvs=['c8ab8b74-dfe7-4306-9924-bc970d4a1074'] (lvm:1448)
2019-07-12 15:04:14,238-0400 INFO  (jsonrpc/5) [storage.LVM] Deactivating lvs: vg=91047f09-f2d7-4453-8e11-483fed09556d lvs=['c8ab8b74-dfe7-4306-9924-bc970d4a1074'] (lvm:1456)
2019-07-12 15:04:38,777-0400 INFO  (jsonrpc/5) [storage.LVM] Activating lvs: vg=91047f09-f2d7-4453-8e11-483fed09556d lvs=['122328ea-bf51-43a6-8b10-fb0800aaa832'] (lvm:1448)
2019-07-12 15:04:40,551-0400 INFO  (jsonrpc/4) [storage.LVM] Deactivating lvs: vg=91047f09-f2d7-4453-8e11-483fed09556d lvs=['122328ea-bf51-43a6-8b10-fb0800aaa832'] (lvm:1456)
2019-07-12 15:08:37,610-0400 INFO  (jsonrpc/4) [storage.LVM] Activating lvs: vg=91047f09-f2d7-4453-8e11-483fed09556d lvs=['b798074f-c891-459f-aafe-fb54ce6d0134', 'c8ab8b74-dfe7-4306-9924-bc970d4a1074'] (lvm:1448)
2019-07-12 15:08:38,313-0400 INFO  (jsonrpc/0) [storage.LVM] Deactivating lvs: vg=91047f09-f2d7-4453-8e11-483fed09556d lvs=['b798074f-c891-459f-aafe-fb54ce6d0134'] (lvm:1456)
2019-07-12 15:09:40,533-0400 INFO  (vm/762f2d6f) [storage.LVM] Activating lvs: vg=91047f09-f2d7-4453-8e11-483fed09556d lvs=['122328ea-bf51-43a6-8b10-fb0800aaa832'] (lvm:1448)
2019-07-12 15:10:34,554-0400 INFO  (mailbox-hsm/2) [storage.LVM] Refreshing LVs (vg=91047f09-f2d7-4453-8e11-483fed09556d, lvs=('122328ea-bf51-43a6-8b10-fb0800aaa832',)) (lvm:1472)
2019-07-12 15:11:20,133-0400 INFO  (migsrc/762f2d6f) [storage.LVM] Activating lvs: vg=91047f09-f2d7-4453-8e11-483fed09556d lvs=['034748fe-3c34-4d05-88d4-00a2a3856513'] (lvm:1448)
2019-07-12 15:11:34,255-0400 INFO  (migsrc/762f2d6f) [storage.LVM] Deactivating lvs: vg=91047f09-f2d7-4453-8e11-483fed09556d lvs=['034748fe-3c34-4d05-88d4-00a2a3856513'] (lvm:1456)
2019-07-12 15:11:34,491-0400 INFO  (migsrc/762f2d6f) [storage.LVM] Activating lvs: vg=91047f09-f2d7-4453-8e11-483fed09556d lvs=['89750db5-703b-4da3-a971-a831422c319e'] (lvm:1448)
2019-07-12 15:11:35,045-0400 INFO  (migsrc/762f2d6f) [storage.LVM] Deactivating lvs: vg=91047f09-f2d7-4453-8e11-483fed09556d lvs=['89750db5-703b-4da3-a971-a831422c319e'] (lvm:1456)
2019-07-12 15:11:35,397-0400 INFO  (jsonrpc/4) [storage.LVM] Deactivating lvs: vg=91047f09-f2d7-4453-8e11-483fed09556d lvs=['122328ea-bf51-43a6-8b10-fb0800aaa832'] (lvm:1456)
2019-07-12 15:11:41,532-0400 INFO  (jsonrpc/0) [storage.LVM] Activating lvs: vg=91047f09-f2d7-4453-8e11-483fed09556d lvs=['89750db5-703b-4da3-a971-a831422c319e'] (lvm:1448)
2019-07-12 15:11:42,054-0400 INFO  (jsonrpc/0) [storage.LVM] Deactivating lvs: vg=91047f09-f2d7-4453-8e11-483fed09556d lvs=['89750db5-703b-4da3-a971-a831422c319e'] (lvm:1456)
2019-07-12 15:11:42,629-0400 INFO  (vm/762f2d6f) [storage.LVM] Activating lvs: vg=91047f09-f2d7-4453-8e11-483fed09556d lvs=['122328ea-bf51-43a6-8b10-fb0800aaa832'] (lvm:1448)
2019-07-12 15:11:44,411-0400 INFO  (vm/762f2d6f) [storage.LVM] Activating lvs: vg=91047f09-f2d7-4453-8e11-483fed09556d lvs=['034748fe-3c34-4d05-88d4-00a2a3856513'] (lvm:1448)
2019-07-12 15:11:55,847-0400 INFO  (vm/762f2d6f) [storage.LVM] Deactivating lvs: vg=91047f09-f2d7-4453-8e11-483fed09556d lvs=['034748fe-3c34-4d05-88d4-00a2a3856513'] (lvm:1456)

Is it possible that activating or deactivating LVs, or refreshing another LV
from another host, caused LVM to try to fix the VG metadata without logging
any warnings?

Comment 25 David Teigland 2019-07-15 14:49:00 UTC
"found metadata seqno 31 vs previous 30" -- these messages usually mean that the metadata has changed between scanning consecutive metadata areas for the VG.  This is during the label scan phase where VG locks are not held, so it could happen legitimately from time to time if you're reading a VG while it's being written from another host.  LVM will not try to repair when it sees this.  I think I should look at a completely debug logs of the LVM commands running during one of these events, to check exactly what's happening.  It would be best to log lvm commands from all hosts during this event in case there are any changes being made that we don't expect.

Comment 26 Daniel Gur 2019-08-28 13:15:07 UTC
sync2jira

Comment 27 Daniel Gur 2019-08-28 13:20:09 UTC
sync2jira

Comment 28 Marina Kalinin 2020-03-21 14:22:23 UTC
Hi Nir,

Lots of work has been done on this bug so far, but there is no fix yet.
With the work done on bug 1553133, could there be any progress on this bz as well?

Comment 29 Nir Soffer 2020-03-21 15:37:27 UTC
This looks like a duplicate of bug 1553133.

*** This bug has been marked as a duplicate of bug 1553133 ***

Comment 30 Nir Soffer 2020-05-20 23:09:00 UTC
Removing stale need infos.

