Bug 1396050 - fence_vmware_soap makes high CPU usage
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: fence-agents
Version: 7.2
Hardware: x86_64 Linux
Priority: unspecified   Severity: urgent
Target Milestone: rc
Target Release: ---
Assigned To: Oyvind Albrigtsen
QA Contact: Miroslav Lisik
Depends On:
Blocks: 1420851

Reported: 2016-11-17 05:59 EST by Runming Long
Modified: 2018-05-10 17:39 EDT (History)
CC List: 9 users

See Also:
Fixed In Version: fence-agents-4.0.11-82.el7
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-04-10 08:13:34 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---




External Trackers
Tracker                            ID              Priority  Status  Summary  Last Updated
Red Hat Knowledge Base (Solution)  3158731         None      None    None     2017-08-21 15:07 EDT
Red Hat Knowledge Base (Article)   3409381         None      None    None     2018-04-12 09:40 EDT
Red Hat Product Errata             RHBA-2018:0758  None      None    None     2018-04-10 08:15 EDT

Description Runming Long 2016-11-17 05:59:33 EST
Description of problem:

Please help analyze why fence_vmware_soap causes such high CPU usage. Is this normal, or is it a bug?

~~~
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                                                            

55694 root      20   0  386244 229636   5360 R  96.4  1.4   0:15.80 fence_vmware_so                                                                                                                    

55701 root      20   0  382160 225400   5360 R  96.4  1.4   0:15.23 fence_vmware_so
~~~

The CPU usage is very high, but only for about 20 seconds at a time, at one-minute intervals.

I tested this in my own experiment environment, and the result was the same.

I updated fence-agents-vmware-soap from 4.0.11-27.el7.x86_64 to 4.0.11-47.el7.x86_64; the result was the same.

Version-Release number of selected component (if applicable):

Red Hat Enterprise Linux Server release 7.2 (Maipo)
Linux 3.10.0-327.el7.x86_64 #1 SMP Thu Oct 29 17:29:29 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux
pacemaker-1.1.13-10.el7.x86_64 
corosync-2.3.4-7.el7.x86_64
pcs-0.9.143-15.el7.x86_64 

How reproducible:

It always shows up while the agent is running.

Steps to Reproduce:
1. Run top

Actual results:

fence_vmware_soap causes high CPU usage.

Expected results:


Additional info:
Comment 2 Marek Grac 2016-11-18 03:17:55 EST
There are not many loops in fence_vmware_soap, so this should not happen; it looks like a bug.

Can you send me a verbose output? (verbose=1 or -v on command line)

Does your VMware host a lot of virtual machines?
Comment 3 Runming Long 2016-11-18 05:17:33 EST
Thank you for your attention. I had an experiment environment a few days ago, but I no longer have it.

It's easy to reproduce. The spike lasts about 20 seconds at a time, at one-minute intervals, and you can see the very high CPU usage during that window.

The verbose output was a long log. I kept it in my experiment environment, but that environment is gone, so I can't send you the log.

I will rebuild a new experiment environment if I can find a VMware platform.

(In reply to Marek Grac from comment #2)
> There are not too many of loops in fence_vmware_soap, so this should not
> happend and it looks like a bug.
> 
> Can you send me a verbose output? (verbose=1 or -v on command line)
> 
> Does your VMWare hosts a lot of virtual machines?
Comment 4 Marek Grac 2016-11-18 05:21:17 EST
If it runs every minute, I suspect it is the monitoring action, which lists all available VMs. That can take a while on the VMware side, but without logs I can't tell whether we are doing something wrong.
Comment 5 Runming Long 2016-11-21 01:00:24 EST
Hi Marek

>>I will suspect that is monitoring action which lists all available VM

Exactly. In my test environment we have only a few VMs. The fence monitor configuration looks like the following; the monitor checks the fence status every 60 s by default.
~~~
...
<op id="vmware_fence1-monitor-interval-60s" interval="60s" name="monitor"/>
...
<op id="vmware_fence2-monitor-interval-60s" interval="60s" name="monitor"/>
...
~~~

We also check the fence status manually; here is the command:

~~~
# fence_vmware_soap -o status
~~~
Executing the above command makes the VM spend a lot of CPU resources. (We allocated 2 cores and one socket to this VM, the same as the customer's environment. Watching with "top" while running "fence_vmware_soap -o status", the CPU utilization rises to 80%-100% immediately, and each process continues for about 20 s.)

Even worse, our test environment has been torn down, so we can't reproduce this issue. Do you have a vCenter environment to test it?
Comment 6 Marek Grac 2016-11-23 08:11:55 EST
Hi,

I have tested this issue on our vCenter 5.5 and I can confirm it. 

I was able to track the issue down to a specific line that opens the connection and logs the user in to VMware.

conn = Client(url + "/vimService.wsdl", location=url, transport=RequestsTransport(verify=verify), headers=headers)

This line took the majority of the execution time. The bug might be in the python-suds package that we are using, but I'm not sure about it. When the fence agent is executed with the verbose flag (-v / verbose=1), the communication log is more than 80 MB, which might be an issue on the VMware side.
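
A minimal way to see where the time goes, assuming python-suds is installed (the vCenter URL is a placeholder, and a self-signed certificate may need the same verification handling the agent's RequestsTransport provides):

~~~
# Sketch only: time how long suds takes to build the Client,
# i.e. to download and parse the vimService WSDL.
import time
from suds.client import Client

url = "https://vcenter.example.com/sdk"

start = time.time()
conn = Client(url + "/vimService.wsdl", location=url)
print("Client construction took %.1f s" % (time.time() - start))
~~~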
Comment 17 John Beranek 2017-09-13 09:18:21 EDT
We're seeing this in our VMware-hosted Pacemaker clusters too, and the CPU usage is seriously high.

In our case the fence agent is contacting a vCenter which hosts 892 VMs.

It takes around 13 seconds to run "stonith_admin -Q vmware_fence".
Comment 18 John Beranek 2017-09-13 09:27:49 EDT
I don't quite see why the fence agent would need to list all VMs, and not just the VMs in the cluster...
Comment 19 Marek Grac 2017-09-13 09:54:00 EDT
@John:

AFAIK the problem is in the login process, which takes a really long time (and pulls about 60 MB of data from VMware). If you have worked with their API (in any language) and have tips on how to improve it, I would be glad to implement them.
Comment 20 John Beranek 2017-09-13 13:50:43 EDT
@Marek - I may take you up on that, as I've used the VMware API, mostly using vmware's Perl modules, and not seen this shirt off slowdown/CPU hit.
Comment 21 John Beranek 2017-09-13 13:52:13 EDT
Hmm, using BZ with phone autocorrect not advisable... "sort of" instead of "shirt off"!
Comment 22 John Beranek 2017-09-13 15:08:31 EDT
An optional use of vCenter 6.5's REST API would be nice and much simpler, but I guess that would be fence-agents-vmware-vcenter-rest, and not fence-agents-vmware-soap.

Just trying it here, I can:

* Get a session token in about half a second
* Find a VM by name in under a second
* Fetch full details of a VM in under a second.

As for the SOAP API, with an old Perl script of mine I can fetch datastore utilisation for all datastores in a datacenter in under a second.
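
A rough sketch of those three REST calls with the Python requests library (the vCenter address, credentials and VM name are placeholders; this is not the actual agent code):

~~~
# Sketch only: session token, VM lookup, power state via the vCenter 6.5 REST API.
import requests

VCENTER = "vcenter.example.com"

s = requests.Session()
s.verify = False  # equivalent of --ssl-insecure; avoid in production

# 1. Get a session token
r = s.post("https://%s/rest/com/vmware/cis/session" % VCENTER,
           auth=("vcenter_user", "password"))
s.headers["vmware-api-session-id"] = r.json()["value"]

# 2. Find a VM by name
r = s.get("https://%s/rest/vcenter/vm" % VCENTER,
          params={"filter.names": "vm-name"})
vm_id = r.json()["value"][0]["vm"]

# 3. Fetch its power state
r = s.get("https://%s/rest/vcenter/vm/%s/power" % (VCENTER, vm_id))
print(r.json()["value"]["state"])  # e.g. "POWERED_ON"
~~~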
Comment 23 John Beranek 2017-09-13 19:44:46 EDT
For something more comparable, sample pyvmomi scripts can do useful work in under a second too. pyvmomi appears to use its own SOAP implementation and not suds though.

With a bit of debugging in fence_vmware_soap I determined that it is indeed fetching all VMs with the VMware API, and then attempting to match the fence "plug".

In our environment we are "lucky" in that, of the >800 VMs in the vCenter, the user we use to query vCenter only has access to 36 VMs, but running "status" still takes 10 s.

If I supply a username that can read all VMs, the time only goes up to 12 s, which shows it's at least not the VM power-status fetching that's slow, but something more in the connection/login phase, as you said.
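
For comparison, a minimal pyvmomi sketch of the same kind of power-status check (placeholder host, credentials and VM name; not the agent's code):

~~~
# Sketch only: fetch one VM's power state via pyvmomi.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # like --ssl-insecure
si = SmartConnect(host="vcenter.example.com", user="vcenter_user",
                  pwd="password", sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    for vm in view.view:
        if vm.name == "vm-name":
            print(vm.runtime.powerState)  # e.g. "poweredOn"
            break
    view.Destroy()
finally:
    Disconnect(si)
~~~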
Comment 24 Oyvind Albrigtsen 2017-10-20 10:20:46 EDT
https://github.com/ClusterLabs/fence-agents/pull/153
Comment 25 John Beranek 2017-10-21 04:57:55 EDT
Oyvind: Oh, thank you for the new agent!

We shall have to give it a go, once I figure out how to add a new fence agent.
Comment 26 John Beranek 2017-10-23 05:54:55 EDT
OK, first feedback on fence_vmware_rest.py ... first I tried it on a CentOS 6 machine, and it didn't work for me, at least when run the way I can run fence_vmware_soap. This is presumably because it relies upon something new in the common fence agent library:

./fence_vmware_rest.py --ssl --ssl-insecure -a vcenter.example.com -l vcenter_user -p password -o status -n vm-name
Traceback (most recent call last):
  File "./fence_vmware_rest.py", line 183, in <module>
    main()
  File "./fence_vmware_rest.py", line 175, in main
    conn = connect(options)
  File "./fence_vmware_rest.py", line 78, in connect
    logging.debug("Failed: {}".format(e))
ValueError: zero length field name in format
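
(For reference, that ValueError is what Python 2.6, which CentOS 6 ships, raises for a "{}" placeholder with no field index; a quick way to see the difference:)

~~~
# Python 2.6 needs an explicit index in str.format(); "{0}" works on 2.6 and later.
try:
    print("Failed: {}".format("example"))
except ValueError as err:
    print("Failed: {0}".format(err))
~~~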

I then tried it on a CentOS 7 machine and it works:

time ./fence_vmware_rest.py --ssl --ssl-insecure -a vcenter.example.com -l vcenter_user -p password -o status -n vm-name
0.14s user 0.12s system 10% cpu 2.502 total

So, a decent time, and not a lot of system time or CPU...a massive improvement over fence_vmware_soap!
Comment 27 Oyvind Albrigtsen 2017-10-23 06:11:28 EDT
Great.

This bz is for RHEL7, so I didn't test it on RHEL6.
Comment 33 errata-xmlrpc 2018-04-10 08:13:34 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0758
