Bug 1432039 - lvchange --refresh generates unneeded load on lvm
Summary: lvchange --refresh generates unneeded load on lvm
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.6.10
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ovirt-4.2.1
Assignee: Nir Soffer
QA Contact: Kevin Alon Goldblatt
URL:
Whiteboard:
Depends On:
Blocks: 1386732
 
Reported: 2017-03-14 12:02 UTC by Roman Hodain
Modified: 2020-06-11 13:24 UTC
CC: 12 users

Fixed In Version: vdsm v4.20.14
Doc Type: Bug Fix
Doc Text:
Previously, VDSM was refreshing active logical volumes that did not change (or never change) and do not need refresh, increasing the load on the storage server, delaying other LVM operations, and adding noise to the logs. Now, VDSM only refreshes logical volumes that have been changed, so there are no more useless refresh operations.
Clone Of:
Environment:
Last Closed: 2018-05-15 17:51:25 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:




Links
- Red Hat Product Errata RHEA-2018:1489 (last updated 2018-05-15 17:53:40 UTC)
- oVirt gerrit 85887, master, MERGED: "lvm: Do not refresh lvs static lvs" (last updated 2021-02-16 15:02:17 UTC)
- oVirt gerrit 85913, master, MERGED: "blockSD: Refresh the metadata lv only if needed" (last updated 2021-02-16 15:02:17 UTC)
- oVirt gerrit 85914, master, MERGED: "lvm: Do not refresh special lvs during lvm bootstrap" (last updated 2021-02-16 15:02:17 UTC)
- oVirt gerrit 85915, master, MERGED: "lvm: Log only once when we refresh active lvs" (last updated 2021-02-16 15:02:17 UTC)

Description Roman Hodain 2017-03-14 12:02:12 UTC
Description of problem:
     lvchange --refresh is periodically called on system RHV internal LVs (ids, metadata, ...)

Version-Release number of selected component (if applicable):
     RHEV 3.6,4.0,4.1

How reproducible:
     100%

Steps to Reproduce:
     1. Create at least one SD and establish SPM
     2. Run the following on the SPM:
          grep -E 'lvchange.*--refresh.*(metadata|ids|leases|master|inbox|outbox)' /var/log/vdsm/vdsm.log | wc -l

Actual results:
    In the specific env (40SDs) in 30 min
         grep -E 'lvchange.*--refresh.*(metadata|ids|leases|master|inbox|outbox)' /var/log/vdsm/vdsm.log | wc -l
         1176

Expected results:
    The refresh is not called on these LVs

Additional info:
    This seems to be introduced with 
         Bug 1358348 - VM qcow2 disk got corrupted after live migration
         https://bugzilla.redhat.com/show_bug.cgi?id=1358348

Comment 2 Yaniv Kaul 2017-03-14 12:15:38 UTC
(In reply to Roman Hodain from comment #0)
> Description of problem:
>      lvchange --refresh is periodically called on system RHV internal LVs
> (ids,metadata, ....)
> 
> Version-Release number of selected component (if applicable):
>      RHEV 3.6,4.0,4.1

Does it affect 4.1 as well? 4.0.7?
Guy, can you share the result when you do a similar test on your current setup?

> 
> How reproducible:
>      100%
> 
> Steps to Reproduce:
>      1.Create at least one SD and establish SPM
>      2. Run the following on the SPM 
>           grep -E
> 'lvchange.*--refresh.*(metadata|ids|leases|master|inbox|outbox)'
> /var/log/vdsm/vdsm.log | wc -l
> 
> Actual results:
>     In the specific env (40SDs) in 30 min
>          grep -E
> 'lvchange.*--refresh.*(metadata|ids|leases|master|inbox|outbox)'
> /var/log/vdsm/vdsm.log | wc -l
>          1176
> 
> Expected results:
>     The refresh is not called on these LVs
> 
> Additional info:
>     This seems to be introduced with 
>          Bug 1358348 - VM qcow2 disk got corrupted after live migration
>          https://bugzilla.redhat.com/show_bug.cgi?id=1358348

Comment 3 Yaniv Kaul 2017-03-14 12:22:32 UTC
On 4.0.7, with a single domain, ~1000 disks:
[root@ucs1-b420-2 ~]# lvs | wc -l
1110

[root@ucs1-b420-2 ~]# lvs |grep -c metadata
3

[root@ucs1-b420-2 ~]# grep -E 'lvchange.*--refresh.*(metadata|ids|leases|master|inbox|outbox)' /var/log/vdsm/vdsm.log |wc -l
24

(In ~15 minutes.) So, scaled to 40 SDs, that is close to the number above.


I guess they have so many SDs because of the limit of disks per SD? Well, in 4.0.7 (as seen above) 1000 or so disks in a single SD work well, so perhaps that would help.

Comment 7 Nir Soffer 2017-03-14 17:50:34 UTC
(In reply to Roman Hodain from comment #0)
> Description of problem:
>      lvchange --refresh is periodically called on system RHV internal LVs
> (ids,metadata, ....)
> Expected results:
>     The refresh is not called on these LVs

Some of these lvs may change and need to be refreshed. These refreshes come from
code trying to activate the lvs. If the lvs are already active, we refresh them.

This logic was added after we had corrupted disks, caused by an lv left active,
modified on the SPM host, and used again without refreshing the lv, leading to
corruption of the qemu image by reading past the end of the lv.

I think the solution for this bug is proper storage domain life cycle management,
I started to work on this here:
https://gerrit.ovirt.org/56876

Once we have this, we only need to refresh the lv that may change on another host.
This may be triggered by the domain monitor during periodic domain refreshes.

We can also refresh on demand, for example when trying to access an offset which
is after the end of an lv.

Comment 8 Roman Hodain 2017-03-16 10:53:23 UTC
(In reply to Nir Soffer from comment #7)
> (In reply to Roman Hodain from comment #0)
> > Description of problem:
> >      lvchange --refresh is periodically called on system RHV internal LVs
> > (ids,metadata, ....)
> > Expected results:
> >     The refresh is not called on these LVs
> 
> Some of these lvs may change and need to be refresh. These refreshes come
> from 
> code trying to activate the lvs. If the lvs are already active, we refresh
> them.
> 
> This logic was added after we had corrupted disks caused by lv left active,
> and 
> modified on the spm host, and use again without refreshing the lv, leading
> to corruption of the qemu image, reading behind the end of the lv.
> 
> I think the solution for this bug is proper storage domain life cycle
> management,
> I started to work on this here:
> https://gerrit.ovirt.org/56876
> 
> Once we have this, we only need to refresh the lv that may change on another
> host.
> This may be triggered by the domain monitor during periodic domain refreshes.
> 
> We can also refresh on demand, for example when trying to access an offset
> which
> is after the end of an lv.

I thought that these devices are static in size, and since we do not extend them I do not see the reason for the refresh. I may be missing something.

Can you give me some more background about this?

Comment 9 Nir Soffer 2017-03-16 12:01:06 UTC
(In reply to Roman Hodain from comment #8)
> (In reply to Nir Soffer from comment #7)
> > (In reply to Roman Hodain from comment #0)
> > > Description of problem:
> > >      lvchange --refresh is periodically called on system RHV internal LVs
> > > (ids,metadata, ....)
> > > Expected results:
> > >     The refresh is not called on these LVs
> > 
> > Some of these lvs may change and need to be refresh. These refreshes come
> > from 
> > code trying to activate the lvs. If the lvs are already active, we refresh
> > them.
> > 
> > This logic was added after we had corrupted disks caused by lv left active,
> > and 
> > modified on the spm host, and use again without refreshing the lv, leading
> > to corruption of the qemu image, reading behind the end of the lv.
> > 
> > I think the solution for this bug is proper storage domain life cycle
> > management,
> > I started to work on this here:
> > https://gerrit.ovirt.org/56876
> > 
> > Once we have this, we only need to refresh the lv that may change on another
> > host.
> > This may be triggered by the domain monitor during periodic domain refreshes.
> > 
> > We can also refresh on demand, for example when trying to access an offset
> > which
> > is after the end of an lv.
> 
> I thought that these devices are static in size and as we do not extend them
> then I do not see the reason for the refresh. I may be missing something.
> 
> Can you give me some more backround about this?

Most of these lvs never change.

- ids - 8MiB, never extended
- leases - 2048MiB, not extended in current code, but we consider extending 
  it to support more than 1900 disks per storage domain.
- xleases - 1024MiB, not extended in current code, but it should be.
- inbox - 16MiB, never extended
- outbox - 16MiB, never extended
- metadata - at least 512MiB, may be extended when extending a vg or resizing a pv;
  all hosts access this volume to get volume metadata
- master - 1024MiB, never extended

All sizes are rounded up to lvm extent size (128MiB).
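
As a quick illustration of that rounding (a sketch only; the helper name is hypothetical, and `EXTENT_MB` matches the 128MiB extent size stated above):

```python
EXTENT_MB = 128  # lvm extent size for these vgs, per the comment above

def rounded_lv_size_mb(requested_mb):
    """Round a requested lv size up to a whole number of extents."""
    extents = -(-requested_mb // EXTENT_MB)  # ceiling division
    return extents * EXTENT_MB

# e.g. the 8MiB ids lv actually occupies one full 128MiB extent
```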

All of these are activated in one lvchange command in many places. If some of
the special lvs are already active, we refresh them instead.

We may improve this by adding a refresh option to lvm.activateLVs, and when 
activating the special lvs, perform one call for the static lvs without refresh,
and one call for the dynamic lvs with refresh.
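
A minimal sketch of that split (illustrative only; `activate_special_lvs` and the injected `activate`/`refresh` callables are hypothetical stand-ins for vdsm's lvm helpers, and the static/dynamic classification follows the list in this comment):

```python
# Special lvs that never change size (per the list above) vs lvs that
# may be extended and therefore may need "lvchange --refresh".
STATIC_LVS = frozenset(["ids", "inbox", "outbox", "master"])

def activate_special_lvs(vg_name, lv_names, activate, refresh):
    """Activate lvs, refreshing only those that may change.

    activate(vg, lvs) and refresh(vg, lvs) are injected stand-ins for
    the real helpers issuing "lvchange -ay" and "lvchange --refresh".
    Returns the (static, dynamic) split for inspection.
    """
    static = [lv for lv in lv_names if lv in STATIC_LVS]
    dynamic = [lv for lv in lv_names if lv not in STATIC_LVS]
    if static:
        activate(vg_name, static)   # one call, no refresh needed
    if dynamic:
        refresh(vg_name, dynamic)   # one call for lvs that may change
    return static, dynamic
```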

Comment 14 Nir Soffer 2017-03-23 23:39:55 UTC
Adding a bad example during connectStoragePool:

2017-03-23 16:49:54,237-0400 INFO  (jsonrpc/5) [dispatcher] Run and protect: connectStoragePool(spUUID=u'd117bf29-20c2-4b66-b95a-e8391fb1d216', hostID=1, msdUUID=u'ab1eece5-dd95-4082-8e0f-3a887cde2519', masterVersion=1, domainsMap={u'ab1eece5-dd95-4082-8e0f-3a887cde2519': u'active'}, options=None) (logUtils:51)
2017-03-23 16:49:54,238-0400 INFO  (jsonrpc/5) [storage.StoragePoolMemoryBackend] new storage pool master version 1 and domains map {u'ab1eece5-dd95-4082-8e0f-3a887cde2519': u'Active'} (spbackends:450)
2017-03-23 16:49:54,512-0400 INFO  (periodic/3) [dispatcher] Run and protect: repoStats(options=None) (logUtils:51)
2017-03-23 16:49:54,513-0400 INFO  (periodic/3) [dispatcher] Run and protect: repoStats, Return response: {u'ab1eece5-dd95-4082-8e0f-3a887cde2519': {'code': 0, 'actual': True, 'version': 4, 'acquired': True, 'delay': '0.00124888', 'lastCheck': '3.5', 'valid': True}} (logUtils:54)
2017-03-23 16:49:54,544-0400 INFO  (jsonrpc/5) [storage.LVM] Refreshing lvs: vg=ab1eece5-dd95-4082-8e0f-3a887cde2519 lvs=['metadata'] (lvm:1291)
2017-03-23 16:49:54,679-0400 INFO  (jsonrpc/5) [storage.LVM] Refreshing lvs: vg=ab1eece5-dd95-4082-8e0f-3a887cde2519 lvs=['ids'] (lvm:1291)
2017-03-23 16:49:54,741-0400 INFO  (jsonrpc/5) [storage.LVM] Refreshing lvs: vg=ab1eece5-dd95-4082-8e0f-3a887cde2519 lvs=['leases'] (lvm:1291)
2017-03-23 16:49:54,944-0400 INFO  (jsonrpc/5) [storage.LVM] Refreshing lvs: vg=ab1eece5-dd95-4082-8e0f-3a887cde2519 lvs=['metadata', 'leases', 'ids', 'inbox', 'outbox', 'xleases', 'master'] (lvm:1291)
2017-03-23 16:49:55,331-0400 INFO  (jsonrpc/5) [storage.LVM] Refreshing lvs: vg=ab1eece5-dd95-4082-8e0f-3a887cde2519 lvs=['metadata', 'leases', 'ids', 'inbox', 'outbox', 'xleases', 

In the same flow:

- we refresh metadata and ids and leases 3 times
- we refresh inbox, outbox, xleases, and master 2 times
- ids, inbox, outbox, and master should never be refreshed
- we do 5 lvm calls instead of one call for the lvs that need to be refreshed

Comment 15 Allon Mureinik 2017-07-02 20:38:19 UTC
4.1.4 is planned as a minimal, fast, z-stream version to fix any open issues we may have in supporting the upcoming EL 7.4.

Pushing out anything unrelated, although if there's a minimal/trival, SAFE fix that's ready on time, we can consider introducing it in 4.1.4.

Comment 16 Tal Nisan 2017-12-27 13:17:36 UTC
Nir, was that bug solved as well as a part of the LVM filter work?

Comment 17 Nir Soffer 2018-01-01 23:35:41 UTC
(In reply to Tal Nisan from comment #16)
> Nir, was that bug solved as well as a part of the LVM filter work?

No, this is not related to lvm filter.

Comment 18 Nir Soffer 2018-01-01 23:45:27 UTC
It is not clear what the value of optimizing the refreshes is.

I think the first thing to do is to measure the load generated by the refreshes,
for example by disabling all refreshes. Then we can estimate the possible
improvement we can make.

Comment 19 Yaniv Kaul 2018-01-02 07:35:43 UTC
(In reply to Nir Soffer from comment #18)
> It is not clear what is the value of trying to optimizing the refreshes.
> 
> I think the first thing to do is to measure what is the load generated by
> the 
> refreshes, for example by disable all refreshes. Then we can estimate what
> is the
> possible improvement that we can make.

Guy, I believe you've done exactly that in the past? Can you share your experience?

Comment 20 Nir Soffer 2018-01-02 19:32:42 UTC
The issue should be fixed by the posted patches:

1. We now log only once per refresh; previously we logged twice for each refresh,
   increasing the noise.

2. The special lvs are never refreshed when activated.

3. The special lvs are never refreshed during vdsm startup.

4. The metadata lv is now refreshed only when trying to read or write past the
   end of the lv.

When we start to extend the leases and xleases volumes, we will have to refresh
them like the metadata volume is refreshed.
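
Point 4 can be sketched roughly as follows (a hypothetical helper, not vdsm's actual code; `refresh` stands in for the real "lvchange --refresh" call and is assumed to return the new lv size):

```python
def ensure_within_lv(offset, length, known_size, refresh):
    """Refresh the lv only when an access falls past the size we know.

    Returns (size, refreshed): the possibly updated lv size and
    whether a refresh was performed.
    """
    end = offset + length
    if end <= known_size:
        return known_size, False          # lv known to be big enough
    new_size = refresh()                  # lv may have grown on the SPM
    if end > new_size:
        raise ValueError(f"access beyond lv end: {end} > {new_size}")
    return new_size, True
```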

Comment 21 Kevin Alon Goldblatt 2018-01-25 14:35:59 UTC
Verified with the following code:
----------------------------------------
ovirt-engine-4.2.1.2-0.1.el7.noarch
vdsm-4.20.14-22.git543a886.el7.centos.x86_64

Verified with the following scenario:
----------------------------------------

Steps to Reproduce:
     1. Create at least one SD and establish SPM
     2. Run the following on the SPM:
          grep -E 'lvchange.*--refresh.*(metadata|ids|leases|master|inbox|outbox)' /var/log/vdsm/vdsm.log | wc -l

Actual results:
    In the specific env (40 SDs) in 30 min:
         grep -E 'lvchange.*--refresh.*(metadata|ids|leases|master|inbox|outbox)' /var/log/vdsm/vdsm.log | wc -l

The refresh was not called on these LVs.


Moving to VERIFIED

Comment 26 errata-xmlrpc 2018-05-15 17:51:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:1489

Comment 27 Franta Kust 2019-05-16 13:07:10 UTC
BZ<2>Jira Resync

