Bug 1432385 - 'imgbase layout' takes very long time to complete on rhv-h hypervisor with a large number of LUNs attached
Summary: 'imgbase layout' takes very long time to complete on rhv-h hypervisor with a ...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: imgbased
Version: 4.0.6
Hardware: All
OS: Linux
medium
high
Target Milestone: ovirt-4.1.1-1
: ---
Assignee: Ryan Barry
QA Contact: dguo
URL:
Whiteboard: node
Depends On:
Blocks: 1437505
TreeView+ depends on / blocked
 
Reported: 2017-03-15 09:39 UTC by Igor Netkachev
Modified: 2020-05-14 15:49 UTC (History)
17 users (show)

Fixed In Version: imgbased-0.9.19-0.1.el7ev
Doc Type: Bug Fix
Doc Text:
Previously, running imgbase layout took a very long time to complete on Red Hat Virtualization Hosts (RHVH) that had a very large number of LUNs or other LVM devices attached, due to the fact that it did not filter LVM commands with known tags. In this release, the LVM devices are filtered, which dramatically improves the performance for systems with a large number of LUNs.
Clone Of:
: 1437505 (view as bug list)
Environment:
Last Closed: 2017-04-20 19:04:41 UTC
oVirt Team: Node
Target Upstream Version:
Embargoed:
rbarry: needinfo-


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2017:1114 0 normal SHIPPED_LIVE redhat-virtualization-host bug fix and enhancement update 2017-04-20 22:57:46 UTC
oVirt gerrit 74720 0 master MERGED imgbase|lvm: filter LVs always 2020-06-18 11:34:09 UTC
oVirt gerrit 74889 0 ovirt-4.1 MERGED imgbase|lvm: filter LVs always 2020-06-18 11:34:09 UTC
oVirt gerrit 75103 0 ovirt-4.0 MERGED imgbase|lvm: filter LVs always 2020-06-18 11:34:09 UTC
oVirt gerrit 75312 0 master MERGED lvm: don't overwrite cmd when filtering 2020-06-18 11:34:09 UTC
oVirt gerrit 75313 0 ovirt-4.1 MERGED lvm: don't overwrite cmd when filtering 2020-06-18 11:34:09 UTC
oVirt gerrit 75314 0 ovirt-4.0 MERGED lvm: don't overwrite cmd when filtering 2020-06-18 11:34:08 UTC

Description Igor Netkachev 2017-03-15 09:39:11 UTC
Description of problem:
If there's a large number of LUNs attached to the hypervisor then 'imgbase layout' takes very long time to complete, at the same time 'imgbase check' and basic LVM commands like 'lvs', 'vgs', 'pvs' do not have such problem.

Version-Release number of selected component (if applicable):
Red Hat Virtualization Host 4.0 (el7.3)
Kernel 3.10.0-514.2.2.el7.x86_64 #1 SMP Wed Nov 16 13:15:13 EST 2016 x86_64 x86_64 x86_64 GNU/Linux
imgbased-0.8.11-0.1.el7ev.noarch


How reproducible:
100%

Steps to Reproduce:
1. Install latest RHV-H host
2. Attach 250+ LUNs
3. run 'imgbase layout'

Actual results:
Command takes ages to complete (most probably because it scans the devices which it should not scan)

Expected results:
Command completes in shorter time

Additional info:

note the difference between 'imbase check' and 'imgbase layout' (the latter takes ~70 min to complete):

#time imgbase check
Status: OK
Bootloader ... OK
  Layer boot entries ... OK
  Valid boot entries ... OK
Mount points ... OK
  Separate /var ... OK
  Discard is used ... OK
Basic storage ... OK
  Initialized VG ... OK
  Initialized Thin Pool ... OK
  Initialized LVs ... OK
Thin storage ... OK
  Checking available space in thinpool ... OK
  Checking thinpool auto-extend ... OK

real    0m13.138s
user    0m6.414s
sys     0m0.812s


#time imgbase layout
rhvh-4.0-0.20170104.0
 +- rhvh-4.0-0.20170104.0+1
rhvh-4.0-0.20170201.0
 +- rhvh-4.0-0.20170201.0+1

real    70m19.711s
user    5m42.503s
sys     11m15.482s


It also has been noted by the customer that 'yum' command also runs with huge delay, should we open a separate bug on that?

Comment 3 Sandro Bonazzola 2017-03-16 10:22:51 UTC
Can you please re-test with 4.0.7 which has been released yesterday?
It should have improvements in this area

Comment 4 Igor Netkachev 2017-03-21 12:08:05 UTC
Sandro,

Customer has just re-tested with the latest RHV-H and the problem is still there:

# time imgbase check
Status: OK
Bootloader ... OK
  Layer boot entries ... OK
  Valid boot entries ... OK
Mount points ... OK
  Separate /var ... OK
  Discard is used ... OK
Basic storage ... OK
  Initialized VG ... OK
  Initialized Thin Pool ... OK
  Initialized LVs ... OK
Thin storage ... OK
  Checking available space in thinpool ... OK
  Checking thinpool auto-extend ... OK

real    0m16.509s
user    0m8.052s
sys     0m1.818s


# time imgbase layout

rhvh-4.0-0.20170104.0
 +- rhvh-4.0-0.20170104.0+1
rhvh-4.0-0.20170201.0
 +- rhvh-4.0-0.20170201.0+1
rhvh-4.0-0.20170307.0
 +- rhvh-4.0-0.20170307.0+1

real    95m14.719s
user    10m0.454s
sys     24m36.242s


--
Kind Regards,
Igor Netkachev
Technical Support Engineer
Red Hat Global Support Services

Comment 5 Ryan Barry 2017-03-21 22:08:14 UTC
Imgbased should just be checking "lvs -oname, tags" and filtering. No scanning of lvm beyond tags is done, though the code may not be very optimized for large sets.

What is the runtime when checking tags? Is there a test system I can look at?

Comment 6 dguo 2017-03-23 08:30:01 UTC
QE attempted to reproduce this issue in-house, the details steps are as follow, and we will keep the environment to Ryan by email for further investigation and debug

Since we don't have an environment as complicated as the customer's, we figured out a way to simulate the customer's situation, which is to create 1000 lvs, and then run command "time imgbase layout". 

Test versions:
redhat-virtualization-host-4.0-20170307.1
imgbased-0.8.16-0.1.el7ev.noarch

Steps:
1. Install RHVH on the local disk of a machine which has 21 iSCSI luns.
2. Register to engine, and attach the 21 iSCSI luns in one storage domain.
3. Run a script to create 1000 lvs by "lvcreate -L 1500M 78c869fe-4447-4a49-9a8d-5a132c7cb6a1"
4. Run "time imgbase layout"

Results:
1. After step4, the cmd failed with errors, took a long time:

[root@bootp-73-75-11 ~]# time imgbase layout
Traceback (most recent call last):
  File "/usr/lib64/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/lib/python2.7/site-packages/imgbased/__main__.py", line 51, in <module>
    CliApplication()
  File "/usr/lib/python2.7/site-packages/imgbased/__init__.py", line 82, in CliApplication
    app.hooks.emit("post-arg-parse", args)
  File "/usr/lib/python2.7/site-packages/imgbased/hooks.py", line 120, in emit
    cb(self.context, *args)
  File "/usr/lib/python2.7/site-packages/imgbased/plugins/core.py", line 172, in post_argparse
    print(layout.dumps())
  File "/usr/lib/python2.7/site-packages/imgbased/plugins/core.py", line 200, in dumps
    return self.app.imgbase.layout()
  File "/usr/lib/python2.7/site-packages/imgbased/imgbase.py", line 138, in layout
    return self.naming.layout()
  File "/usr/lib/python2.7/site-packages/imgbased/naming.py", line 109, in layout
    tree = self.tree(lvs)
  File "/usr/lib/python2.7/site-packages/imgbased/naming.py", line 205, in tree
    names = datasource()
  File "/usr/lib/python2.7/site-packages/imgbased/imgbase.py", line 91, in list_our_lv_names
    our_lvs = [lv for lv in lvs if has_our_tag(lv)]
  File "/usr/lib/python2.7/site-packages/imgbased/imgbase.py", line 89, in has_our_tag
    for tag in our_tags)
  File "/usr/lib/python2.7/site-packages/imgbased/imgbase.py", line 89, in <genexpr>
    for tag in our_tags)
  File "/usr/lib/python2.7/site-packages/imgbased/lvm.py", line 239, in tags
    self.lvm_name]).split(",")
  File "/usr/lib/python2.7/site-packages/imgbased/utils.py", line 331, in lvs
    return self.call(["lvs"] + args, **kwargs)
  File "/usr/lib/python2.7/site-packages/imgbased/utils.py", line 385, in call
    return super(LvmBinary, self).call(*args, stderr=DEVNULL, **kwargs)
  File "/usr/lib/python2.7/site-packages/imgbased/utils.py", line 326, in call
    stdout = call(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/imgbased/utils.py", line 120, in call
    return subprocess.check_output(*args, **kwargs).strip()
  File "/usr/lib64/python2.7/subprocess.py", line 575, in check_output
    raise CalledProcessError(retcode, cmd, output=output)
subprocess.CalledProcessError: Command '['lvs', '--noheadings', '-olv_tags', u'78c869fe-4447-4a49-9a8d-5a132c7cb6a1/lvol779']' returned non-zero exit status 5

real    81m44.297s
user    2m56.766s
sys    6m51.057s

2. Try to run "imgbase layout" again, this time it succeed without errors, and total time is shorter:

[root@bootp-73-75-11 ~]# time imgbase layout
rhvh-4.0-0.20170307.0
 +- rhvh-4.0-0.20170307.0+1

real    13m43.746s
user    1m27.120s
sys    0m54.498s

Additional info:
1. When running "time imgbase layout", cmd "lvs --noheadings -olv_tags vgname/lvname" will be called for each lv.

[root@bootp-73-75-11 ~]# ps -ef |grep lvs
root      4489  4346  0 14:51 pts/0    00:00:00 lvs --noheadings -olv_tags 78c869fe-4447-4a49-9a8d-5a132c7cb6a1/lvol111
root      4491 50102  0 14:51 pts/1    00:00:00 grep --color=auto lvs


2. The runtime of "lvs --noheadings -olv_tags vgname/lvname" while running "time imgbase layout" for the first time is:

[root@bootp-73-75-11 ~]# time lvs --noheadings -olv_tags 78c869fe-4447-4a49-9a8d-5a132c7cb6a1/lvol139
  WARNING: Not using lvmetad because config setting use_lvmetad=0.
  WARNING: To avoid corruption, rescan devices to make changes visible (pvscan --cache).
        

real    0m3.081s
user    0m0.090s
sys    0m0.393s

3. The runtime of "lvs --noheadings -olv_tags vgname/lvname" after "time imgbase layout" finished is:

[root@bootp-73-75-11 ~]# time lvs --noheadings -olv_tags 78c869fe-4447-4a49-9a8d-5a132c7cb6a1/lvol139
  WARNING: Not using lvmetad because config setting use_lvmetad=0.
  WARNING: To avoid corruption, rescan devices to make changes visible (pvscan --cache).
        

real    0m0.252s
user    0m0.041s
sys    0m0.026s

We can see from the results that the time of "lvs --noheadings -olv_tags vgname/lvname" before "time imgbase layout" finished is much longer than the one after "time imgbase layout" finished, and this may be why the first time running "time imgbase layout" consumed much more time than the second time.

Comment 12 Sandro Bonazzola 2017-04-07 07:46:34 UTC
All patches merged, moving to modified, pending build

Comment 13 Emma Heftman 2017-04-09 11:24:25 UTC
Hi Ryan
I'm working on the doc text and it's not clear to me what the --select was used for in this context as I only see it mentioned in the doc text and not in the description of the bug itself. Could you please clarify or reword in a way that makes it clear to users what the issue was.
Thanks a lot. Emma

Comment 14 dguo 2017-04-17 08:10:39 UTC
Just tested on rhvh-4.1-0.20170413.0 with the same step as in comment6, got a reasonable result:

[root@localhost ~]# lvs |wc -l
  WARNING: Not using lvmetad because config setting use_lvmetad=0.
  WARNING: To avoid corruption, rescan devices to make changes visible (pvscan --cache).
1016

[root@localhost ~]# time imgbase w
[INFO] You are on rhvh-4.1-0.20170413.0+1

real	0m3.107s
user	0m0.253s
sys	0m0.312s
[root@localhost ~]# rpm -q imgbased
imgbased-0.9.22-0.1.el7ev.noarch
[root@localhost ~]# time imgbase w
[INFO] You are on rhvh-4.1-0.20170413.0+1

real	0m3.088s
user	0m0.263s
sys	0m0.300s
[root@localhost ~]# time imgbase layout
rhvh-4.1-0.20170413.0
 +- rhvh-4.1-0.20170413.0+1

real	0m16.014s
user	0m0.649s
sys	0m1.434s

So after the bug's status is updated to ON_QA, QE will verified this bug.

Comment 16 dguo 2017-04-20 07:04:05 UTC
Verified base on comment 14

Comment 17 errata-xmlrpc 2017-04-20 19:04:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1114


Note You need to log in before you can comment on or make changes to this bug.