Description of problem:
lvchange --refresh is periodically called on system RHV internal LVs (ids, metadata, ...)

Version-Release number of selected component (if applicable):
RHEV 3.6, 4.0, 4.1

How reproducible:
100%

Steps to Reproduce:
1. Create at least one SD and establish SPM
2. Run the following on the SPM:
   grep -E 'lvchange.*--refresh.*(metadata|ids|leases|master|inbox|outbox)' /var/log/vdsm/vdsm.log | wc -l

Actual results:
In the specific env (40 SDs), in 30 min:
grep -E 'lvchange.*--refresh.*(metadata|ids|leases|master|inbox|outbox)' /var/log/vdsm/vdsm.log | wc -l
1176

Expected results:
The refresh is not called on these LVs

Additional info:
This seems to have been introduced with
Bug 1358348 - VM qcow2 disk got corrupted after live migration
https://bugzilla.redhat.com/show_bug.cgi?id=1358348
(In reply to Roman Hodain from comment #0)
> Version-Release number of selected component (if applicable):
> RHEV 3.6,4.0,4.1

Does it affect 4.1 as well? 4.0.7?

Guy, can you share the result when you do a similar test on your current setup?
On 4.0.7, with a single domain, ~1000 disks:

[root@ucs1-b420-2 ~]# lvs | wc -l
1110
[root@ucs1-b420-2 ~]# lvs | grep -c metadata
3
[root@ucs1-b420-2 ~]# grep -E 'lvchange.*--refresh.*(metadata|ids|leases|master|inbox|outbox)' /var/log/vdsm/vdsm.log | wc -l
24

(In ~15 minutes.) So close to the number above.

I guess they have so many SDs because of the limit of disks per SD? Well, in 4.0.7 (as seen above) 1000 or so disks in a single SD work well, so perhaps that would help.
(In reply to Roman Hodain from comment #0)
> Expected results:
> The refresh is not called on these LVs

Some of these lvs may change and need to be refreshed. These refreshes come from code trying to activate the lvs. If the lvs are already active, we refresh them.

This logic was added after we had corrupted disks caused by an lv left active, modified on the spm host, and used again without refreshing the lv, leading to corruption of the qemu image by reading behind the end of the lv.

I think the solution for this bug is proper storage domain life cycle management. I started to work on this here:
https://gerrit.ovirt.org/56876

Once we have this, we only need to refresh the lvs that may change on another host. This may be triggered by the domain monitor during periodic domain refreshes.

We can also refresh on demand, for example when trying to access an offset which is after the end of an lv.
(In reply to Nir Soffer from comment #7)
> Some of these lvs may change and need to be refreshed. These refreshes come
> from code trying to activate the lvs. If the lvs are already active, we
> refresh them.

I thought that these devices are static in size, and as we do not extend them I do not see the reason for the refresh. I may be missing something.

Can you give me some more background about this?
(In reply to Roman Hodain from comment #8)
> I thought that these devices are static in size, and as we do not extend
> them I do not see the reason for the refresh. I may be missing something.
>
> Can you give me some more background about this?

Most of these lvs never change:

- ids      - 8MiB, never extended
- leases   - 2048MiB, not extended in current code, but we consider extending it to support more than 1900 disks per storage domain.
- xleases  - 1024MiB, not extended in current code, but it should be.
- inbox    - 16MiB, never extended
- outbox   - 16MiB, never extended
- metadata - at least 512MiB, may be extended when extending a vg or resizing a pv; all hosts access this volume to get volume metadata
- master   - 1024MiB, never extended

All sizes are rounded up to the lvm extent size (128MiB).
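To illustrate the last point, a minimal Python sketch (not vdsm code) of rounding an LV size up to whole 128MiB extents:

```python
EXTENT_MIB = 128  # lvm extent size used for these storage domains

def rounded_size_mib(size_mib):
    """Round a requested LV size up to a whole number of lvm extents."""
    extents = -(-size_mib // EXTENT_MIB)  # ceiling division
    return extents * EXTENT_MIB

# The 8MiB ids lv still occupies one full 128MiB extent:
print(rounded_size_mib(8))     # 128
print(rounded_size_mib(2048))  # 2048
```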
All of these are activated in one lvchange command in many places. If some of the special lvs are already active, we refresh them instead.

We may improve this by adding a refresh option to lvm.activateLVs, and when activating the special lvs, perform one call for the static lvs without refresh, and one call for the dynamic lvs with refresh.
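A rough sketch of that split, assuming a hypothetical `activation_commands` helper and a static/dynamic classification of the special lvs (vdsm's real `lvm.activateLVs` differs):

```python
# LVs whose size never changes vs. LVs another host may extend.
# Classification is an assumption based on the sizes discussed above.
STATIC_LVS = frozenset(["ids", "inbox", "outbox", "master"])
DYNAMIC_LVS = frozenset(["metadata", "leases", "xleases"])

def activation_commands(vg_name, lv_names, refresh=True):
    """Return the lvchange command lines needed to activate lv_names,
    refreshing only the lvs that may change on another host."""
    static = [lv for lv in lv_names if lv not in DYNAMIC_LVS]
    dynamic = [lv for lv in lv_names if lv in DYNAMIC_LVS]
    cmds = []
    if static:
        # One call activates all static lvs, no refresh ever needed.
        cmds.append(["lvchange", "--available", "y"]
                    + ["%s/%s" % (vg_name, lv) for lv in static])
    if dynamic:
        # One extra call only for lvs that may have been extended
        # on the SPM while this host had them active.
        flag = ["--refresh"] if refresh else ["--available", "y"]
        cmds.append(["lvchange"] + flag
                    + ["%s/%s" % (vg_name, lv) for lv in dynamic])
    return cmds
```

This replaces the 5 separate lvm calls seen in the logs with at most 2 per activation.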
Adding bad example during connectStoragePool:

2017-03-23 16:49:54,237-0400 INFO (jsonrpc/5) [dispatcher] Run and protect: connectStoragePool(spUUID=u'd117bf29-20c2-4b66-b95a-e8391fb1d216', hostID=1, msdUUID=u'ab1eece5-dd95-4082-8e0f-3a887cde2519', masterVersion=1, domainsMap={u'ab1eece5-dd95-4082-8e0f-3a887cde2519': u'active'}, options=None) (logUtils:51)
2017-03-23 16:49:54,238-0400 INFO (jsonrpc/5) [storage.StoragePoolMemoryBackend] new storage pool master version 1 and domains map {u'ab1eece5-dd95-4082-8e0f-3a887cde2519': u'Active'} (spbackends:450)
2017-03-23 16:49:54,512-0400 INFO (periodic/3) [dispatcher] Run and protect: repoStats(options=None) (logUtils:51)
2017-03-23 16:49:54,513-0400 INFO (periodic/3) [dispatcher] Run and protect: repoStats, Return response: {u'ab1eece5-dd95-4082-8e0f-3a887cde2519': {'code': 0, 'actual': True, 'version': 4, 'acquired': True, 'delay': '0.00124888', 'lastCheck': '3.5', 'valid': True}} (logUtils:54)
2017-03-23 16:49:54,544-0400 INFO (jsonrpc/5) [storage.LVM] Refreshing lvs: vg=ab1eece5-dd95-4082-8e0f-3a887cde2519 lvs=['metadata'] (lvm:1291)
2017-03-23 16:49:54,679-0400 INFO (jsonrpc/5) [storage.LVM] Refreshing lvs: vg=ab1eece5-dd95-4082-8e0f-3a887cde2519 lvs=['ids'] (lvm:1291)
2017-03-23 16:49:54,741-0400 INFO (jsonrpc/5) [storage.LVM] Refreshing lvs: vg=ab1eece5-dd95-4082-8e0f-3a887cde2519 lvs=['leases'] (lvm:1291)
2017-03-23 16:49:54,944-0400 INFO (jsonrpc/5) [storage.LVM] Refreshing lvs: vg=ab1eece5-dd95-4082-8e0f-3a887cde2519 lvs=['metadata', 'leases', 'ids', 'inbox', 'outbox', 'xleases', 'master'] (lvm:1291)
2017-03-23 16:49:55,331-0400 INFO (jsonrpc/5) [storage.LVM] Refreshing lvs: vg=ab1eece5-dd95-4082-8e0f-3a887cde2519 lvs=['metadata', 'leases', 'ids', 'inbox', 'outbox', 'xleases',

In the same flow:
- we refresh metadata, ids, and leases 3 times
- we refresh inbox, outbox, xleases, and master 2 times
- ids, inbox, outbox, and master should never be refreshed
- we do 5 lvm calls instead of one call for the lvs that need to be refreshed
4.1.4 is planned as a minimal, fast, z-stream version to fix any open issues we may have in supporting the upcoming EL 7.4. Pushing out anything unrelated, although if there's a minimal/trivial, SAFE fix that's ready on time, we can consider introducing it in 4.1.4.
Nir, was that bug solved as well as a part of the LVM filter work?
(In reply to Tal Nisan from comment #16)
> Nir, was that bug solved as well as a part of the LVM filter work?

No, this is not related to lvm filter.
It is not clear what the value of trying to optimize the refreshes is.

I think the first thing to do is to measure the load generated by the refreshes, for example by disabling all refreshes. Then we can estimate the possible improvement that we can make.
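As a starting point for such a measurement, a small Python sketch that counts refresh calls per special lv in a vdsm.log; the helper name is illustrative and the sample lines in the usage note are not vdsm's exact log format:

```python
import re
from collections import Counter

SPECIAL = ("metadata", "xleases", "leases", "master", "ids", "inbox", "outbox")
# \b keeps "leases" from matching inside "xleases".
PATTERN = re.compile(r"lvchange.*--refresh.*\b(%s)\b" % "|".join(SPECIAL))

def count_refreshes(lines):
    """Count lvchange --refresh calls per special lv in vdsm.log lines."""
    counts = Counter()
    for line in lines:
        m = PATTERN.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts
```

Running this over two vdsm.log snapshots (with and without the refreshes disabled) would quantify both the call rate and which lvs dominate it.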
(In reply to Nir Soffer from comment #18)
> I think the first thing to do is to measure the load generated by the
> refreshes, for example by disabling all refreshes.

Guy, I believe you've done exactly that in the past? Can you share your experience?
Issue should be fixed by the posted patches.

1. We now log only once per refresh; previously we logged twice for each refresh, increasing the noise.
2. The special lvs are never refreshed when activated.
3. The special lvs are never refreshed during vdsm startup.
4. The metadata lv is now refreshed only when trying to read or write after the end of the lv.

When we start to extend the leases and xleases volumes, we will have to refresh them like the metadata volume is refreshed.
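The on-demand behavior in point 4 can be sketched as follows; `read_metadata`, `refresh_lv`, and the device object are hypothetical names illustrating the idea, not vdsm's actual API:

```python
def read_metadata(device, offset, length, lv_size, refresh_lv):
    """Sketch: refresh the lv only when an access falls past its known end.

    refresh_lv() stands in for an lvchange --refresh call and returns the
    lv's new size; device is any object with a read(offset, length) method.
    """
    if offset + length > lv_size:
        # Another host (the SPM) may have extended the lv since we
        # activated it; pick up the new size before giving up.
        lv_size = refresh_lv()
        if offset + length > lv_size:
            raise ValueError("access beyond the end of the lv")
    return device.read(offset, length)
```

The refresh thus happens only on the rare path where the cached size is stale, instead of on every activation.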
Verified with the following code:
----------------------------------------
ovirt-engine-4.2.1.2-0.1.el7.noarch
vdsm-4.20.14-22.git543a886.el7.centos.x86_64

Verified with the following scenario:
----------------------------------------
Steps to Reproduce:
1. Create at least one SD and establish SPM
2. Run the following on the SPM:
   grep -E 'lvchange.*--refresh.*(metadata|ids|leases|master|inbox|outbox)' /var/log/vdsm/vdsm.log | wc -l

Actual results:
In the specific env (40 SDs), in 30 min:
grep -E 'lvchange.*--refresh.*(metadata|ids|leases|master|inbox|outbox)' /var/log/vdsm/vdsm.log | wc -l
The refresh was not called on these LVs.

Moving to VERIFIED
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:1489