Bug 600141 - network interface functions cause very high CPU utilization in libvirtd
Product: Fedora
Classification: Fedora
Component: augeas
Hardware: All Linux
Priority: low, Severity: medium
Assigned To: David Lutterkort
QA Contact: Fedora Extras Quality Assurance
Depends On:
Blocks: 609228
Reported: 2010-06-03 22:15 EDT by Zachary Amsden
Modified: 2013-04-30 19:41 EDT

See Also:
Fixed In Version: augeas-0.7.2-3.fc13
Doc Type: Bug Fix
Clone Of:
: 609214 609228 (view as bug list)
Last Closed: 2010-07-22 22:30:07 EDT
Description Zachary Amsden 2010-06-03 22:15:32 EDT
Description of problem:

top - 16:07:01 up 11 days,  4:56,  6 users,  load average: 0.14, 0.11, 0.05
Tasks: 201 total,   1 running, 200 sleeping,   0 stopped,   0 zombie
Cpu(s): 11.7%us,  2.9%sy,  0.0%ni, 83.1%id,  2.3%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   1982808k total,  1942568k used,    40240k free,    98920k buffers
Swap:  3964920k total,    69992k used,  3894928k free,   621348k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
20194 root      20   0  608m  41m 3512 S 13.3  2.1   0:06.98 libvirtd 

Version-Release number of selected component (if applicable):

I've seen this with all versions of libvirt, on RHEL 5 and 6 and on Fedora, on different machines and CPUs.  Filing upstream because that is where it should be fixed.

How reproducible: 100%

Steps to Reproduce:
1. Start virt-manager
2. run top
3. ... profit?
Actual results:

Libvirtd sits around consuming 13-20% of CPU even when completely idle and no VMs are running.  Modern software shouldn't be doing whatever kind of continuous polling it is using to rack up such CPU time.  If I leave the system alone, overnight, libvirtd will rack up more CPU than X, firefox, and all other idle processes combined.

Expected results:

Additional info:
Comment 1 Cole Robinson 2010-06-09 11:24:27 EDT
It's because virt-manager is running. virt-manager polls libvirt to detect VM/network/storage state changes, and this polling causes lots of CPU churn. Libvirt supports async lifecycle events for domains, but virt-manager doesn't use those APIs yet. The other objects (network, storage, interface, host devices) don't have an async API.

So the solution is:

1) Add async lifecycle API for all libvirt objects
2) Have virt-manager actually use those APIs
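
To make the polling vs. async distinction concrete, here is a rough toy model in Python. Everything in it (FakeHypervisor, PollingClient, EventClient, simulate) is a hypothetical name made up for illustration; none of it is libvirt or virt-manager API. It only counts how many full "list" requests the daemon has to serve under each approach.

```python
# Toy model only: these classes are hypothetical stand-ins, not libvirt
# or virt-manager APIs.


class FakeHypervisor:
    """Stands in for a daemon that tracks object state."""

    def __init__(self):
        self.state = {}        # object name -> state string
        self.list_calls = 0    # how many full listings were served
        self._callbacks = []   # subscribers for async lifecycle events

    def list_all(self):
        # Every call costs the daemon a full listing.
        self.list_calls += 1
        return dict(self.state)

    def register_callback(self, callback):
        self._callbacks.append(callback)

    def set_state(self, name, new_state):
        self.state[name] = new_state
        for callback in self._callbacks:
            callback(name, new_state)


class PollingClient:
    """Asks for the full list on every tick, changed or not."""

    def __init__(self, hypervisor):
        self.hypervisor = hypervisor
        self.view = {}

    def tick(self):
        self.view = self.hypervisor.list_all()


class EventClient:
    """Lists once, then updates its view only when notified."""

    def __init__(self, hypervisor):
        self.view = hypervisor.list_all()
        hypervisor.register_callback(self._on_event)

    def _on_event(self, name, new_state):
        self.view[name] = new_state


def simulate(ticks=100):
    """Return (list calls caused by polling, list calls caused by events)."""
    polled = FakeHypervisor()
    poller = PollingClient(polled)
    for i in range(ticks):
        if i == ticks // 2:
            polled.set_state("eth0", "active")
        poller.tick()

    evented = FakeHypervisor()
    listener = EventClient(evented)
    evented.set_state("eth0", "active")

    # Both clients end up with the same picture of the world.
    assert poller.view == listener.view
    return polled.list_calls, evented.list_calls
```

With one state change over 100 ticks, the polling client triggers 100 listings and the event client triggers one. The real cost per listing depends on what the daemon does behind each call.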
Comment 2 Daniel Berrange 2010-06-25 07:23:05 EDT
libvirtd shows about 8% CPU usage when virt-manager is running on my system. Evidence is better than guesswork, so I ran oprofile against libvirtd.

Augeas comes out on top by a country mile, followed by loads of malloc/memcpy stuff which I bet is all due to augeas too:

samples  %        image name               symbol name
31677    41.0255      /usr/lib64/
14469    18.7391           wcscoll_l
5401      6.9949           wcscmp
4913      6.3629           _int_free
4020      5.2064           _int_malloc
3780      4.8955           memcpy
2486      3.2197           malloc
1676      2.1706           malloc_consolidate
1446      1.8727           memset
1359      1.7601           wcscoll
1299      1.6824           realloc
1097      1.4207           calloc
714       0.9247           free
517       0.6696           _int_realloc
278       0.3600           __GI___strncmp_ssse3
253       0.3277           __strstr_sse2
174       0.2254           __strlen_sse2
116       0.1502           vfprintf
111       0.1438           strnlen
106       0.1373     pthread_mutex_lock
95        0.1230  libvirtd                 virLogMessage
73        0.0945     pthread_mutex_unlock
66        0.0855           __strchr_sse2
64        0.0829           __GI___strcmp_ssse3
61        0.0790           __ctype_b_loc
45        0.0583           btowc
39        0.0505           _IO_default_xsputn
35        0.0453           strcat
30        0.0389           strndup
30        0.0389  libvirtd                 virEventRunOnce

After editing the virt-manager code to disable the 'update_interfaces' method and re-running the oprofile test, libvirtd now consumes < 1% CPU and oprofile shows a completely different trace:

samples  %        image name               symbol name
1519     42.7165           memset
346       9.7300           __strstr_sse2
214       6.0180     pthread_mutex_lock
136       3.8245           __strchr_sse2
130       3.6558  libvirtd                 virLogMessage
117       3.2902     pthread_mutex_unlock
77        2.1654           vfprintf
57        1.6029           _int_malloc
53        1.4904  libvirtd                 virEventRunOnce
45        1.2655           calloc
40        1.1249           _int_free
25        0.7030           malloc_consolidate
21        0.5906           _IO_default_xsputn
20        0.5624           __strlen_sse2
20        0.5624           poll
20        0.5624  libvirtd                 nodeListDevices
19        0.5343     pthread_cond_wait@@GLIBC_2.3.2
19        0.5343         virHashComputeKey
19        0.5343  libvirtd                 qemudDispatchClientEvent
18        0.5062  libvirtd                 remoteDispatchClientCall

So the problem appears to be that augeas (and netcf by implication) has exceedingly high CPU utilization :-(  Need advice from David on how to proceed here. Maybe augeas simply hasn't had any performance optimization work done on it yet? Otherwise, maybe we need to change netcf to not use augeas+xslt for converting the ifcfg files into XML format.
Comment 3 Daniel Berrange 2010-06-25 07:59:18 EDT
A more targeted profile this time, from just running ncftool directly:

samples  %        image name               symbol name
85058    20.7509           wcscoll_l
29644     7.2320           wcscmp
26191     6.3896           _int_malloc
25622     6.2508      parse_expression
18443     4.4994         /usr/lib64/
16999     4.1471           memcpy
13354     3.2579           _int_free
9402      2.2937      set_regs
9019      2.2003           wcscoll
8788      2.1439           malloc_consolidate
8457      2.0632      re_node_set_add_intersect
8062      1.9668           re_as_string
7832      1.9107           cset_contains
7343      1.7914      re_node_set_contains
6031      1.4713      re_acquire_state
5113      1.2474      sift_states_backward
4680      1.1417      re_node_set_insert_last
4680      1.1417      re_search_internal
4587      1.1191      re_node_set_insert
4451      1.0859           calloc
4315      1.0527           free
4189      1.0220           malloc
Comment 4 Cole Robinson 2010-06-25 10:08:25 EDT
I think every interface 'list' operation causes netcf to parse /etc/sysconfig/network-scripts/*, which would explain things.
Comment 5 Zachary Amsden 2010-06-25 17:07:26 EDT
I'm more than happy to run some profiling here as well.  On my particular system (F13 with encrypted disk on dual core Turion64) it's enough to make the system unresponsive and barely usable with a 2-VCPU VM.

If network is implicated, especially network scripts, well, mine are pretty complex: I'm running split DNS, wired/wireless bonding, and a vpnc tunnel.  It's possible one of those things exaggerates the cost of this quite a bit.
Comment 6 David Lutterkort 2010-06-28 20:30:48 EDT
Cole is right: every netcf operation causes the network scripts to be parsed again. There's not enough change tracking to guard against other programs modifying those files in between netcf operations.

There are a few ways to address that, varying in complexity and hackishness:

(1) Instead of reparsing the files on every netcf operation, only reparse after some small amount of time
(2) Within netcf, watch pertinent files with inotify and reread them only upon actual changes (or complain if both netcf and some outside program try to make changes)
(3) Do the same within augeas

Of course, another option would be to reduce how often virt-manager does an ncf_list.
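
As a rough illustration of option (1), a time-based guard could look something like the Python sketch below. TimedCache and its parameters are hypothetical names, not part of netcf or augeas, and the one-second default is an arbitrary assumption. The clock is injectable so the behaviour can be checked without sleeping.

```python
import time


class TimedCache:
    """Re-run an expensive loader at most once per `max_age` seconds.

    Hypothetical helper for illustration; not netcf or augeas code.
    """

    def __init__(self, loader, max_age=1.0, clock=time.monotonic):
        self._loader = loader      # e.g. "parse all network scripts"
        self._max_age = max_age    # seconds a cached result stays valid
        self._clock = clock
        self._value = None
        self._loaded_at = None
        self.loads = 0             # how many times the loader actually ran

    def get(self):
        now = self._clock()
        stale = (self._loaded_at is None
                 or now - self._loaded_at >= self._max_age)
        if stale:
            self._value = self._loader()
            self._loaded_at = now
            self.loads += 1
        return self._value
```

The trade-off is the one already implied above: changes made by another program within the `max_age` window go unnoticed until the cached result expires.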
Comment 7 David Lutterkort 2010-06-29 21:36:31 EDT
This can be alleviated by teaching augeas to be smarter about when to reparse a file and when not.
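
One way to picture "smarter about when to reparse" is a check of the file's modification time before parsing. The Python sketch below is an assumption about the general idea only, not a description of what the augeas patches actually do; MtimeCache and parse_ifcfg are made-up names, and the parser handles only plain KEY=value lines.

```python
import os


class MtimeCache:
    """Reparse a file only when its (mtime, size) stamp has changed.

    Hypothetical helper for illustration; not augeas code.
    """

    def __init__(self, parse):
        self._parse = parse
        self._entries = {}   # path -> ((mtime_ns, size), parsed result)
        self.parses = 0      # how many times a file was actually parsed

    def load(self, path):
        st = os.stat(path)
        stamp = (st.st_mtime_ns, st.st_size)
        cached = self._entries.get(path)
        if cached is not None and cached[0] == stamp:
            return cached[1]          # unchanged on disk: skip the parse
        with open(path, encoding="utf-8") as handle:
            parsed = self._parse(handle.read())
        self._entries[path] = (stamp, parsed)
        self.parses += 1
        return parsed


def parse_ifcfg(text):
    """Parse simple KEY=value lines; comments and blank lines are skipped."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        result[key.strip()] = value.strip().strip('"')
    return result
```

A stat call per file is still needed on every operation, but that is far cheaper than running the file through a parser again. Including the size in the stamp is a small hedge against coarse timestamp granularity.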

I have posted patches upstream on augeas-devel (review much welcome) and built augeas-0.7.2-2 based on these. You need to get the packages out of koji for now from

It would be great if somebody could independently verify that these patches address the virt-manager issues (and do not introduce any regressions in libvirt's network config handling)
Comment 8 Laine Stump 2010-06-30 05:04:46 EDT
On my F13 system with virt-manager running, prior to this fix CPU usage by libvirtd was over 11%. With the new augeas installed, that drops to 2.3%, so there is definitely a huge difference!

Functional testing will take a bit more time ;-)
Comment 9 Fedora Update System 2010-06-30 18:26:47 EDT
augeas-0.7.2-2.fc13 has been submitted as an update for Fedora 13.
Comment 10 Fedora Update System 2010-07-01 14:46:45 EDT
augeas-0.7.2-2.fc13 has been pushed to the Fedora 13 testing repository.  If problems still persist, please make note of it in this bug report.
 If you want to test the update, you can install it with 
 su -c 'yum --enablerepo=updates-testing update augeas'.  You can provide feedback for this update here:
Comment 11 Fedora Update System 2010-07-02 04:58:17 EDT
augeas-0.7.2-3.fc13 has been submitted as an update for Fedora 13.
Comment 12 Fedora Update System 2010-07-05 18:06:36 EDT
augeas-0.7.2-3.fc13 has been pushed to the Fedora 13 testing repository.  If problems still persist, please make note of it in this bug report.
 If you want to test the update, you can install it with 
 su -c 'yum --enablerepo=updates-testing update augeas'.  You can provide feedback for this update here:
Comment 13 Aaron Faanes 2010-07-10 23:18:28 EDT
Using augeas-0.7.2-3.fc13 showed no excessive CPU load for me.
Comment 14 Fedora Update System 2010-07-22 22:29:52 EDT
augeas-0.7.2-3.fc13 has been pushed to the Fedora 13 stable repository.  If problems still persist, please make note of it in this bug report.
