Bug 1155679 - virt-who performance issue for large ESX installation
Summary: virt-who performance issue for large ESX installation
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: virt-who
Version: 6.6
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Radek Novacek
QA Contact: gaoshang
URL: http://projects.theforeman.org/issues...
Whiteboard:
Depends On:
Blocks: 1075802 1154684 1168221 1197596
Reported: 2014-10-22 15:32 UTC by Thom Carlin
Modified: 2019-07-11 08:17 UTC
CC List: 18 users

Fixed In Version: virt-who-0.12-4.el6
Doc Type: Bug Fix
Doc Text:
Previously, the virt-who agent was slow to read host-guest associations from VMware ESX systems. As a consequence, with large ESX (or vCenter) deployments, sending updates about virtual guests to Subscription Asset Manager (SAM) and Red Hat Satellite took a long time. With this update, virt-who uses an improved method to obtain host-guest associations, which speeds up this process.
Clone Of:
Clones: 1197596
Environment:
Last Closed: 2015-07-06 08:42:35 UTC


Attachments (Terms of Use)
virt-who.0-12-esx-rhsm.log (223.79 KB, text/plain)
2015-04-02 16:36 UTC, William


Links
Red Hat Product Errata RHBA-2015:1377 (Priority: normal; Status: SHIPPED_LIVE): virt-who bug fix and enhancement update. Last updated 2015-07-20 17:58:35 UTC

Description Thom Carlin 2014-10-22 15:32:41 UTC
Description of problem:

The customer has around 100 ESX hypervisors in one vCenter. Running virt-who --one-shot takes approximately one hour.


Version-Release number of selected component (if applicable):

6.6

How reproducible:

Every time

Steps to Reproduce:
1. virt-who --debug --one-shot --esx [and esx parameters]

Actual results:

The run takes approximately one hour.

Expected results:

The run completes in under a minute.

Additional info:

virt-who is required for Satellite 6 Unlimited Guest licensing.

Comment 2 Tom McKay 2014-11-05 12:57:11 UTC
Created redmine issue http://projects.theforeman.org/issues/8279 from this bug

Comment 10 Radek Novacek 2015-02-26 09:46:31 UTC
This bug has been addressed upstream and will be fixed by a rebase in RHEL 6.7.

Comment 11 Radek Novacek 2015-02-27 19:35:32 UTC
Fixed by a rebase to virt-who-0.12-1.el6.

Comment 14 xingge 2015-03-10 09:52:01 UTC
Hi,
  We want to verify this bug, but we don't have an environment for performance testing; we only have two ESX machines. We researched how to do performance testing with only two machines but did not succeed. Could someone else help us with the performance testing?

Comment 15 Jan Kurik 2015-03-10 10:26:58 UTC
Performance testing is typically done by the Performance Engineering team https://home.corp.redhat.com/wiki/performance-engineering-rhel . I am not sure whether they can do the sort of testing you need, but we can at least ask Thomas.

@Thomas: could you please help xingge with the testing, if possible?

Comment 16 Tom Tracy 2015-03-10 12:25:06 UTC
Jan
     We do not have the infrastructure to test an environment like this. I do not know who has ESX machines capable of this testing.

Tom

Comment 17 William 2015-04-02 16:36:36 UTC
Created attachment 1010250 [details]
virt-who.0-12-esx-rhsm.log

Tested in the customer environment. The logs now contain a large amount of noise.

Comment 18 Radek Novacek 2015-04-07 05:55:09 UTC
Setting VIRTWHO_DEBUG=0 in /etc/sysconfig/virt-who should reduce the noise. If not, I can remove some of the logging messages.

Comment 19 Radek Novacek 2015-04-07 08:06:56 UTC
I've investigated the error messages more deeply, and they appear to be caused by ESXi/vCenter splitting the guest list into several parts when it is large enough. virt-who reads only the first part and therefore produces an incomplete host/guest association. This probably corrects itself during the next iteration, but it should still be fixed.

I'll make an updated version for testing. Moving the bug to ASSIGNED for now.
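The failure mode described in Comment 19 (a result split into several parts, with only the first part read) is a general paging problem. The following is a minimal Python sketch, not virt-who's actual code: `fetch_page`, the `(entries, token)` page shape, and the fake two-page server are all hypothetical stand-ins. It shows why the client must keep requesting parts until no continuation token remains. The vSphere API exposes a comparable token-based continuation (RetrievePropertiesEx followed by ContinueRetrievePropertiesEx), but this bug does not state which call the virt-who fix changed.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical page shape: a list of (host_uuid, [guest_uuid, ...]) entries
# plus a continuation token that is None on the last page.
Entry = Tuple[str, List[str]]
Page = Tuple[List[Entry], Optional[str]]


def read_all_pages(fetch_page: Callable[[Optional[str]], Page]) -> Dict[str, List[str]]:
    """Merge every page into one host -> guests mapping.

    Stopping after the first page would silently drop hosts and guests
    that only arrive in later pages (an incomplete association).
    """
    mapping: Dict[str, List[str]] = {}
    token: Optional[str] = None
    while True:
        entries, token = fetch_page(token)
        for host, guests in entries:
            # A host may appear on more than one page: extend, don't overwrite.
            mapping.setdefault(host, []).extend(guests)
        if token is None:
            return mapping


# Hypothetical stand-in for the hypervisor API: two pages linked by token "t1".
FAKE_PAGES: Dict[Optional[str], Page] = {
    None: ([("host-a", ["vm-1", "vm-2"])], "t1"),
    "t1": ([("host-a", ["vm-3"]), ("host-b", [])], None),
}


def fake_fetch(token: Optional[str]) -> Page:
    return FAKE_PAGES[token]


first_page_only = dict(FAKE_PAGES[None][0])
print(first_page_only)             # {'host-a': ['vm-1', 'vm-2']}
print(read_all_pages(fake_fetch))  # {'host-a': ['vm-1', 'vm-2', 'vm-3'], 'host-b': []}
```

Reading only the first page loses `vm-3` and the guest-less `host-b` entirely. This matches the "incomplete host/guest association" symptom, and it also explains why the next full iteration might look correct again if the split happens to fall differently.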

Comment 20 Radek Novacek 2015-04-07 10:22:06 UTC
Fixed in virt-who-0.12-4.el6.

Please try the following build; it should fix the "noise" in the logs:

https://brewweb.devel.redhat.com/taskinfo?taskID=8949195

Comment 22 Liushihui 2015-06-12 02:40:57 UTC
Hi William,

Would you mind helping us verify it again in the customer environment, since we still don't have an environment for performance testing? Thanks.

Liushihui

Comment 25 xingge 2015-06-25 11:52:02 UTC
As we do not have an environment to test this bug, I'll close it. If the bug appears again in the customer's environment, please reopen it.

Comment 26 xingge 2015-06-25 13:08:46 UTC
Thom Carlin reached out to the RHC project managers so they can contact the two customers to verify the bug, so I'm reopening it to wait for the results.

Comment 27 xingge 2015-07-01 07:54:40 UTC
Hi Thom,
  This bug is listed in the virt-who errata [1], which needs to be pushed out as soon as possible along with the RHEL 6.7 release, and it is currently the only unverified bug in that errata. Could you please help coordinate verification at your earliest convenience so that we can push out the errata on time? Thanks!

Comment 28 Liushihui 2015-07-02 04:31:51 UTC
Summary:
As we don't have an environment with many hosts/guests, we only checked it with three hosts and one guest. We only verified the time virt-who takes to get data from vCenter: the response time of the new version, virt-who-0.12-10.el6.noarch, is much shorter than that of the old version, virt-who-0.10-8.el6.noarch.
Verifying the redundant log noise problem still requires Thom Carlin's help.

In the older version, virt-who-0.10-8.el6.noarch, virt-who takes about 8 s to get data from vCenter; see the following log:
2015-07-02 11:15:22,370 [INFO]  @virtwho.py:442 - Using virt-who configuration: virt-who
2015-07-02 11:15:22,370 [DEBUG]  @virtwho.py:170 - Starting infinite loop with 3600 seconds interval
2015-07-02 11:15:22,519 [INFO]  @esx.py:134 - start scan
2015-07-02 11:15:27,898 [INFO]  @esx.py:164 - start get comput resource
2015-07-02 11:15:28,149 [INFO]  @esx.py:196 - start get host
2015-07-02 11:15:28,378 [INFO]  @esx.py:218 - start get vm
2015-07-02 11:15:30,875 [INFO]  @esx.py:325 - end scan
2015-07-02 11:15:30,875 [INFO]  @esx.py:327 - ==== time 8 
========================the response time is 8s============================
2015-07-02 11:15:30,913 [INFO]  @subscriptionmanager.py:119 - Sending update in hosts-to-guests mapping: {564d4b40-300e-98fd-fc79-5ae3c1547e2d: [], aee4ff00-8c33-11e2-994a-6c3be51d959a: [564dab7d-3b72-51a1-eeda-586036106892, 4227d611-0abc-fc5e-7538-07ebd83fa9ba, 421aa84b-a49e-e01c-fa72-0b570372dd9d, 564daa52-c518-b2f0-3c05-a343285910e1]}

In the new version, virt-who-0.12-10.el6.noarch, virt-who takes less than 1 s to get data from vCenter; see the following log:
2015-07-02 12:17:41,300 [INFO]  @virtwho.py:563 - Using configuration "env/cmdline" ("esx" mode)
2015-07-02 12:17:41,324 [DEBUG]  @virtwho.py:151 - Starting infinite loop with 3600 seconds interval
2015-07-02 12:17:41,463 [DEBUG]  @esx.py:56 - Log into ESX
2015-07-02 12:17:42,554 [DEBUG]  @esx.py:59 - Creating ESX event filter
2015-07-02 12:17:42,903 [INFO]  @esx.py:186 - end scan
2015-07-02 12:17:42,912 [INFO]  @esx.py:138 - ==== time 0 
========================the response time is 0s============================
2015-07-02 12:17:42,912 [DEBUG]  @esx.py:142 - Waiting for ESX changes
2015-07-02 12:17:42,923 [INFO]  @subscriptionmanager.py:124 - Sending update in hosts-to-guests mapping: {564d4b40-300e-98fd-fc79-5ae3c1547e2d: [], aee4ff00-8c33-11e2-994a-6c3be51d959a: [564dab7d-3b72-51a1-eeda-586036106892, 4227d611-0abc-fc5e-7538-07ebd83fa9ba, 421aa84b-a49e-e01c-fa72-0b570372dd9d, 564daa52-c518-b2f0-3c05-a343285910e1], 564d9e7a-4128-92b6-7284-6335f6b399be: []}
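The `==== time N` lines in these logs appear to be ad-hoc instrumentation that reports whole seconds. The following is a minimal Python sketch for recomputing intervals directly from the millisecond timestamps that every virt-who log line starts with. It assumes that timestamp layout, as seen above. The helper names are hypothetical, and the marker pair used for the new version ("Creating ESX event filter" to "end scan") is an assumption, because that log has no "start scan" line and the bug does not say exactly which interval the tester measured.

```python
from datetime import datetime
from typing import Optional

# Assumed layout of the log prefix, e.g. "2015-07-02 11:15:22,519 [INFO] ..."
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
TS_LENGTH = len("2015-07-02 11:15:22,519")  # 23 characters


def line_time(line: str) -> datetime:
    """Parse the leading timestamp of a log line (%f zero-pads '519' to 519000 us)."""
    return datetime.strptime(line[:TS_LENGTH], TS_FORMAT)


def elapsed_between(log_text: str, start_marker: str, end_marker: str) -> Optional[float]:
    """Seconds from the first line containing start_marker to the first later
    line containing end_marker, or None if either marker is missing."""
    start: Optional[datetime] = None
    for line in log_text.splitlines():
        if start is None and start_marker in line:
            start = line_time(line)
        elif start is not None and end_marker in line:
            return (line_time(line) - start).total_seconds()
    return None


# Excerpts copied from the logs above.
OLD_LOG = """2015-07-02 11:15:22,519 [INFO]  @esx.py:134 - start scan
2015-07-02 11:15:30,875 [INFO]  @esx.py:325 - end scan"""

NEW_LOG = """2015-07-02 12:17:42,554 [DEBUG]  @esx.py:59 - Creating ESX event filter
2015-07-02 12:17:42,903 [INFO]  @esx.py:186 - end scan"""

print(round(elapsed_between(OLD_LOG, "start scan", "end scan"), 3))                 # 8.356
print(round(elapsed_between(NEW_LOG, "Creating ESX event filter", "end scan"), 3))  # 0.349
```

Working from timestamps rather than the truncated `time N` figure makes it easier to compare runs against a larger vCenter, where differences of a fraction of a second per host add up.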

Comment 29 Eko 2015-07-06 08:42:35 UTC
Because we need to push the errata according to the schedule, I'm changing this bug to CLOSED status; we will test the performance issue on RHEL 7.2.

Comment 30 Eko 2015-07-06 09:25:46 UTC
We don't have an environment to test the performance issue, but we analyzed the source code, and it contains enhancements that address this issue. Closing this bug; we will continue to verify this issue on RHEL 7.2.

