Bug 1432385
Summary: | 'imgbase layout' takes very long time to complete on rhv-h hypervisor with a large number of LUNs attached | |||
---|---|---|---|---|
Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Igor Netkachev <inetkach> | |
Component: | imgbased | Assignee: | Ryan Barry <rbarry> | |
Status: | CLOSED ERRATA | QA Contact: | dguo | |
Severity: | high | Docs Contact: | ||
Priority: | medium | |||
Version: | 4.0.6 | CC: | cshao, dfediuck, dguo, eheftman, gklein, huzhao, inetkach, jiawu, mgoldboi, qiyuan, rbarry, rhodain, sbonazzo, weiwang, yaniwang, ycui, yzhao | |
Target Milestone: | ovirt-4.1.1-1 | Keywords: | Performance, ZStream | |
Target Release: | --- | Flags: | rbarry:
needinfo-
|
|
Hardware: | All | |||
OS: | Linux | |||
Whiteboard: | node | |||
Fixed In Version: | imgbased-0.9.19-0.1.el7ev | Doc Type: | Bug Fix | |
Doc Text: |
Previously, running imgbase layout took a very long time to complete on Red Hat Virtualization Hosts (RHVH) that had a very large number of LUNs or other LVM devices attached, due to the fact that it did not filter LVM commands with known tags. In this release, the LVM devices are filtered, which dramatically improves the performance for systems with a large number of LUNs.
|
Story Points: | --- | |
Clone Of: | ||||
: | 1437505 (view as bug list) | Environment: | ||
Last Closed: | 2017-04-20 19:04:41 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | Node | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1437505 |
Description
Igor Netkachev
2017-03-15 09:39:11 UTC
Can you please re-test with 4.0.7 which has been released yesterday? It should have improvements in this area Sandro, Customer has just re-tested with the latest RHV-H and the problem is still there: # time imgbase check Status: OK Bootloader ... OK Layer boot entries ... OK Valid boot entries ... OK Mount points ... OK Separate /var ... OK Discard is used ... OK Basic storage ... OK Initialized VG ... OK Initialized Thin Pool ... OK Initialized LVs ... OK Thin storage ... OK Checking available space in thinpool ... OK Checking thinpool auto-extend ... OK real 0m16.509s user 0m8.052s sys 0m1.818s # time imgbase layout rhvh-4.0-0.20170104.0 +- rhvh-4.0-0.20170104.0+1 rhvh-4.0-0.20170201.0 +- rhvh-4.0-0.20170201.0+1 rhvh-4.0-0.20170307.0 +- rhvh-4.0-0.20170307.0+1 real 95m14.719s user 10m0.454s sys 24m36.242s -- Kind Regards, Igor Netkachev Technical Support Engineer Red Hat Global Support Services Imgbased should just be checking "lvs -oname, tags" and filtering. No scanning of lvm beyond tags is done, though the code may not be very optimized for large sets. What is the runtime when checking tags? Is there a test system I can look at? QE attempted to reproduce this issue in-house, the details steps are as follow, and we will keep the environment to Ryan by email for further investigation and debug Since we don't have an environment as complicated as the customer's, we figured out a way to simulate the customer's situation, which is to create 1000 lvs, and then run command "time imgbase layout". Test versions: redhat-virtualization-host-4.0-20170307.1 imgbased-0.8.16-0.1.el7ev.noarch Steps: 1. Install RHVH on the local disk of a machine which has 21 iSCSI luns. 2. Register to engine, and attach the 21 iSCSI luns in one storage domain. 3. Run a script to create 1000 lvs by "lvcreate -L 1500M 78c869fe-4447-4a49-9a8d-5a132c7cb6a1" 4. Run "time imgbase layout" Results: 1. After step4, the cmd failed with errors, took a long time: [root@bootp-73-75-11 ~]# time imgbase layout Traceback (most recent call last): File "/usr/lib64/python2.7/runpy.py", line 162, in _run_module_as_main "__main__", fname, loader, pkg_name) File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code exec code in run_globals File "/usr/lib/python2.7/site-packages/imgbased/__main__.py", line 51, in <module> CliApplication() File "/usr/lib/python2.7/site-packages/imgbased/__init__.py", line 82, in CliApplication app.hooks.emit("post-arg-parse", args) File "/usr/lib/python2.7/site-packages/imgbased/hooks.py", line 120, in emit cb(self.context, *args) File "/usr/lib/python2.7/site-packages/imgbased/plugins/core.py", line 172, in post_argparse print(layout.dumps()) File "/usr/lib/python2.7/site-packages/imgbased/plugins/core.py", line 200, in dumps return self.app.imgbase.layout() File "/usr/lib/python2.7/site-packages/imgbased/imgbase.py", line 138, in layout return self.naming.layout() File "/usr/lib/python2.7/site-packages/imgbased/naming.py", line 109, in layout tree = self.tree(lvs) File "/usr/lib/python2.7/site-packages/imgbased/naming.py", line 205, in tree names = datasource() File "/usr/lib/python2.7/site-packages/imgbased/imgbase.py", line 91, in list_our_lv_names our_lvs = [lv for lv in lvs if has_our_tag(lv)] File "/usr/lib/python2.7/site-packages/imgbased/imgbase.py", line 89, in has_our_tag for tag in our_tags) File "/usr/lib/python2.7/site-packages/imgbased/imgbase.py", line 89, in <genexpr> for tag in our_tags) File "/usr/lib/python2.7/site-packages/imgbased/lvm.py", line 239, in tags self.lvm_name]).split(",") File "/usr/lib/python2.7/site-packages/imgbased/utils.py", line 331, in lvs return self.call(["lvs"] + args, **kwargs) File "/usr/lib/python2.7/site-packages/imgbased/utils.py", line 385, in call return super(LvmBinary, self).call(*args, stderr=DEVNULL, **kwargs) File "/usr/lib/python2.7/site-packages/imgbased/utils.py", line 326, in call stdout = call(*args, **kwargs) File "/usr/lib/python2.7/site-packages/imgbased/utils.py", line 120, in call return subprocess.check_output(*args, **kwargs).strip() File "/usr/lib64/python2.7/subprocess.py", line 575, in check_output raise CalledProcessError(retcode, cmd, output=output) subprocess.CalledProcessError: Command '['lvs', '--noheadings', '-olv_tags', u'78c869fe-4447-4a49-9a8d-5a132c7cb6a1/lvol779']' returned non-zero exit status 5 real 81m44.297s user 2m56.766s sys 6m51.057s 2. Try to run "imgbase layout" again, this time it succeed without errors, and total time is shorter: [root@bootp-73-75-11 ~]# time imgbase layout rhvh-4.0-0.20170307.0 +- rhvh-4.0-0.20170307.0+1 real 13m43.746s user 1m27.120s sys 0m54.498s Additional info: 1. When running "time imgbase layout", cmd "lvs --noheadings -olv_tags vgname/lvname" will be called for each lv. [root@bootp-73-75-11 ~]# ps -ef |grep lvs root 4489 4346 0 14:51 pts/0 00:00:00 lvs --noheadings -olv_tags 78c869fe-4447-4a49-9a8d-5a132c7cb6a1/lvol111 root 4491 50102 0 14:51 pts/1 00:00:00 grep --color=auto lvs 2. The runtime of "lvs --noheadings -olv_tags vgname/lvname" while running "time imgbase layout" for the first time is: [root@bootp-73-75-11 ~]# time lvs --noheadings -olv_tags 78c869fe-4447-4a49-9a8d-5a132c7cb6a1/lvol139 WARNING: Not using lvmetad because config setting use_lvmetad=0. WARNING: To avoid corruption, rescan devices to make changes visible (pvscan --cache). real 0m3.081s user 0m0.090s sys 0m0.393s 3. The runtime of "lvs --noheadings -olv_tags vgname/lvname" after "time imgbase layout" finished is: [root@bootp-73-75-11 ~]# time lvs --noheadings -olv_tags 78c869fe-4447-4a49-9a8d-5a132c7cb6a1/lvol139 WARNING: Not using lvmetad because config setting use_lvmetad=0. WARNING: To avoid corruption, rescan devices to make changes visible (pvscan --cache). real 0m0.252s user 0m0.041s sys 0m0.026s We can see from the results that the time of "lvs --noheadings -olv_tags vgname/lvname" before "time imgbase layout" finished is much longer than the one after "time imgbase layout" finished, and this may be why the first time running "time imgbase layout" consumed much more time than the second time. All patches merged, moving to modified, pending build Hi Ryan I'm working on the doc text and it's not clear to me what the --select was used for in this context as I only see it mentioned in the doc text and not in the description of the bug itself. Could you please clarify or reword in a way that makes it clear to users what the issue was. Thanks a lot. Emma Just tested on rhvh-4.1-0.20170413.0 with the same step as in comment6, got a reasonable result: [root@localhost ~]# lvs |wc -l WARNING: Not using lvmetad because config setting use_lvmetad=0. WARNING: To avoid corruption, rescan devices to make changes visible (pvscan --cache). 1016 [root@localhost ~]# time imgbase w [INFO] You are on rhvh-4.1-0.20170413.0+1 real 0m3.107s user 0m0.253s sys 0m0.312s [root@localhost ~]# rpm -q imgbased imgbased-0.9.22-0.1.el7ev.noarch [root@localhost ~]# time imgbase w [INFO] You are on rhvh-4.1-0.20170413.0+1 real 0m3.088s user 0m0.263s sys 0m0.300s [root@localhost ~]# time imgbase layout rhvh-4.1-0.20170413.0 +- rhvh-4.1-0.20170413.0+1 real 0m16.014s user 0m0.649s sys 0m1.434s So after the bug's status is updated to ON_QA, QE will verified this bug. Verified base on comment 14 Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:1114 |