Bug 1379365 - Performance regression when manipulating larger metadata
Summary: Performance regression when manipulating larger metadata
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: lvm2
Version: 7.3
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Peter Rajnoha
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On: 1377984
Blocks:
 
Reported: 2016-09-26 13:24 UTC by Peter Rajnoha
Modified: 2016-11-04 04:19 UTC (History)

Fixed In Version: lvm2-2.02.166-1.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1377984
Environment:
Last Closed: 2016-11-04 04:19:09 UTC
Target Upstream Version:




Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2016:1445 normal SHIPPED_LIVE lvm2 bug fix and enhancement update 2016-11-03 13:46:41 UTC

Description Peter Rajnoha 2016-09-26 13:24:42 UTC
I'm creating a clone of the original upstream bug report and proposing it as a blocker for RHEL 7.3, so that the fix for the performance regression when processing LVM metadata with a high number of LVs can still get into 7.3.

+++ This bug was initially created as a clone of Bug #1377984 +++

Description of problem:

The lvm2 code now has O(n^2) complexity when dealing with lvm2 metadata.
This was introduced with commit: 687029cbbd5b97d545363b4f7448b5a1fe71f3c5
With this patch, operations on dm_config_tree are significantly slower, since each operation now searches the whole list of dm_config_nodes for duplicate entries.

This patch was introduced in version 2.02.113
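
To illustrate why a per-insert duplicate search becomes quadratic, here is a minimal sketch. It is not lvm2 code: a plain Python list stands in for the sibling list of config nodes, and the function name build_with_duplicate_scan is hypothetical. It only counts how many name comparisons are made while building the list.

```python
def build_with_duplicate_scan(names):
    """Append each name only after scanning every existing entry for a duplicate.

    Returns the resulting list and the number of name comparisons performed.
    This is an illustration of the access pattern, not lvm2's implementation.
    """
    nodes = []
    comparisons = 0
    for name in names:
        duplicate = False
        for existing in nodes:  # linear scan on every single insert
            comparisons += 1
            if existing == name:
                duplicate = True
                break
        if not duplicate:
            nodes.append(name)
    return nodes, comparisons


def unique_names(count):
    """Generate `count` distinct, hypothetical LV names."""
    return [f"lv{i}" for i in range(count)]
```

With n distinct names the scan performs n*(n-1)/2 comparisons, so doubling the number of entries roughly quadruples the work.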



Version-Release number of selected component (if applicable):
2.02.165

How reproducible:


Steps to Reproduce:
1. Create large metadata
2. Run e.g. the lvs command
3. Double the metadata size
4. Compare the time of the lvs command; in a correct version this should scale linearly
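
The comparison in steps 2 to 4 could be scripted roughly as follows. This is a sketch under assumptions: the helper names median_runtime and growth_factor are hypothetical, and taking the median of several runs is my choice to smooth out noise, not something the report prescribes.

```python
import statistics
import subprocess
import time


def median_runtime(cmd, repeats=5):
    """Run `cmd` `repeats` times and return the median wall-clock time in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        # Discard the command's output; only the elapsed time matters here.
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def growth_factor(time_small, time_large):
    """Return how many times slower the run on the larger metadata was."""
    return time_large / time_small


# Intended use (needs a host with LVM and sufficient privileges):
#   before = median_runtime(["lvs"])
#   ... double the metadata size ...
#   after = median_runtime(["lvs"])
#   print(growth_factor(before, after))
```

After doubling the metadata, a factor near 2 would be consistent with linear scaling and a factor near 4 with quadratic scaling, although fixed startup costs can blur this for small volume groups.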


Actual results:
O(n^2) complexity

Expected results:
O(n) complexity

Additional info:

--- Additional comment from Peter Rajnoha on 2016-09-23 15:47:39 CEST ---

These patches avoid checking for duplicate config nodes in metadata config trees (which causes the complexity mentioned in comment #0). When constructing metadata config trees, duplicate nodes are already ruled out by direct metadata validation before such metadata is written to disk.
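
The idea can be sketched as "validate once, then append without searching". The code below is only an illustration of that idea, not what the patches do internally: the function names are hypothetical, and using a hash set for the one-time validation is my assumption.

```python
def find_duplicates(names):
    """Return names that occur more than once, using a single pass and a set."""
    seen = set()
    duplicates = []
    for name in names:
        if name in seen:
            duplicates.append(name)
        else:
            seen.add(name)
    return duplicates


def validate_before_write(names):
    """Stand-in for the validation step done before metadata is written out."""
    duplicates = find_duplicates(names)
    if duplicates:
        raise ValueError(f"duplicate node names: {duplicates}")
    return list(names)


def build_tree_unchecked(validated_names):
    """Build the node list by plain appends, trusting the earlier validation."""
    nodes = []
    for name in validated_names:
        nodes.append(name)  # no per-insert duplicate search
    return nodes
```

Because the validation is a single pass and each append is constant time, the total work grows linearly with the number of nodes.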

The patches:

https://git.fedorahosted.org/cgit/lvm2.git/commit/?id=7563e69cf13126af5889de147c092b9d2e490648

https://git.fedorahosted.org/cgit/lvm2.git/commit/?id=e40fbd08c8e3da43d07aabe58bd5549105056908

https://git.fedorahosted.org/cgit/lvm2.git/commit/?id=045772aa30ba1c5ab9dff730ce25ca6346c7150a

https://git.fedorahosted.org/cgit/lvm2.git/commit/?id=f1cad4c7103ac0edc0b724217b50d062326deb43

--- Additional comment from Peter Rajnoha on 2016-09-23 15:48:26 CEST ---

Scheduled for lvm2 v2.02.166 upstream release.

--- Additional comment from Zdenek Kabelac on 2016-09-26 15:16:22 CEST ---

Some performance numbers taken on a T61 (2.2GHz CPU, 4G RAM)
with kernel: 4.7.2-201.fc24.x86_64

(These are for a 'stable set of metadata LV' I keep for performance testing, which I use whenever I'm doing regression testing.
The PV sits on a loop device with a tmpfs backend file, so there are no disk I/O latencies.)


Plain 'time lvs' on a VG with:

----- 1700LV -----

non-lvmetad: 0.6s
lvmetad:     0.3s   (0.8s with initial scan)

fixed_non-lvmetad: 0.4s
fixed_lvmetad:     0.2s  (0.5s with initial scan)

----- 7200LV  -----

non-lvmetad: 5.0s
lvmetad:     2.7s        (7.6s with initial scan)
(CPU time consumed by lvmetad by 2 'lvs' queries: 1.3s)


fixed_non-lvmetad: 1.0s
fixed_lvmetad:     0.7s      (1.5s with initial scan)
(CPU time consumed by lvmetad by 2 'lvs' queries: 0.4s)





As one can immediately see, the patches from comment 1 have a major impact and already improve things a lot.
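
As a rough worked check of the numbers above, the following sketch estimates an empirical scaling exponent and the speedup from the reported timings. The helper names are hypothetical, and with only two data points per configuration (and fixed startup costs included in each timing) the result is an indication, not a measurement.

```python
import math

# Timings in seconds copied from the figures above: {LV count: seconds}.
TIMINGS = {
    "non-lvmetad": {1700: 0.6, 7200: 5.0},
    "lvmetad": {1700: 0.3, 7200: 2.7},
    "fixed_non-lvmetad": {1700: 0.4, 7200: 1.0},
    "fixed_lvmetad": {1700: 0.2, 7200: 0.7},
}


def scaling_exponent(samples):
    """Estimate k in time ~ n^k from exactly two (LV count, seconds) points."""
    (n1, t1), (n2, t2) = sorted(samples.items())
    return math.log(t2 / t1) / math.log(n2 / n1)


def speedup(before, after, lv_count):
    """Return how many times faster `after` is than `before` at `lv_count` LVs."""
    return TIMINGS[before][lv_count] / TIMINGS[after][lv_count]


for name, samples in TIMINGS.items():
    print(f"{name}: exponent ~ {scaling_exponent(samples):.2f}")
print(f"non-lvmetad speedup at 7200 LVs: "
      f"{speedup('non-lvmetad', 'fixed_non-lvmetad', 7200):.1f}x")
```

An exponent clearly above 1 for the unfixed builds and below 1 for the fixed builds matches the direction of the change described in this report.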

Another thing to note: we still have some other 'minor' regressions even with the 'fixed' version, as the timings for the 'lvmetad' case are 'too good' compared with the non-lvmetad case.

However, these need more clever structures to be used, so they are outside of the 7.3 range.

Comment 3 Corey Marthaler 2016-09-29 15:01:35 UTC
Marking verified sanityonly. The lvs times seem too inconsistent to draw any firm conclusions from such a small sample size; that said, none took as long as originally reported.

[root@host-116 ~]# lvs -a -o +devices | wc -l
1473
[root@host-116 ~]# time lvs
real    0m3.266s
user    0m0.499s
sys     0m1.392s


[root@host-116 ~]# lvs -a -o +devices | wc -l
1660
[root@host-116 ~]# time lvs
real    0m0.783s
user    0m0.203s
sys     0m0.234s


[root@host-116 ~]# lvs -a -o +devices | wc -l
3797
[root@host-116 ~]# time lvs
real    0m1.022s
user    0m0.442s
sys     0m0.522s



3.10.0-510.el7.x86_64
lvm2-2.02.166-1.el7    BUILT: Wed Sep 28 02:26:52 CDT 2016
lvm2-libs-2.02.166-1.el7    BUILT: Wed Sep 28 02:26:52 CDT 2016
lvm2-cluster-2.02.166-1.el7    BUILT: Wed Sep 28 02:26:52 CDT 2016
device-mapper-1.02.135-1.el7    BUILT: Wed Sep 28 02:26:52 CDT 2016
device-mapper-libs-1.02.135-1.el7    BUILT: Wed Sep 28 02:26:52 CDT 2016
device-mapper-event-1.02.135-1.el7    BUILT: Wed Sep 28 02:26:52 CDT 2016
device-mapper-event-libs-1.02.135-1.el7    BUILT: Wed Sep 28 02:26:52 CDT 2016
device-mapper-persistent-data-0.6.3-1.el7    BUILT: Fri Jul 22 05:29:13 CDT 2016

Comment 5 errata-xmlrpc 2016-11-04 04:19:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-1445.html

