Description of problem:

top - 16:07:01 up 11 days,  4:56,  6 users,  load average: 0.14, 0.11, 0.05
Tasks: 201 total,   1 running, 200 sleeping,   0 stopped,   0 zombie
Cpu(s): 11.7%us,  2.9%sy,  0.0%ni, 83.1%id,  2.3%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   1982808k total,  1942568k used,    40240k free,    98920k buffers
Swap:  3964920k total,    69992k used,  3894928k free,   621348k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
20194 root      20   0  608m  41m 3512 S 13.3  2.1   0:06.98 libvirtd

Version-Release number of selected component (if applicable):
I've seen this with all versions of libvirt, on RHEL 5, RHEL 6, and Fedora, on different machines and CPUs. Filing upstream because that is where it should be fixed.

How reproducible:
100%

Steps to Reproduce:
1. Start virt-manager
2. Run top
3. ... profit?

Actual results:
libvirtd sits around consuming 13-20% of CPU even when completely idle and no VMs are running. Modern software shouldn't be doing whatever kind of continuous polling it is using to rack up such CPU time. If I leave the system alone overnight, libvirtd will rack up more CPU than X, Firefox, and all other idle processes combined.

Expected results:

Additional info:
It's because virt-manager is running. virt-manager polls libvirt to detect VM/network/storage state changes, and this polling causes lots of CPU churn. libvirt supports async lifecycle events for domains, but virt-manager doesn't use those APIs yet. The other objects (network, storage, interface, host devices) don't have async APIs at all.

So the solution is:

1) Add async lifecycle APIs for all libvirt objects
2) Have virt-manager actually use those APIs
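For illustration, here is a minimal sketch of the event-driven alternative using the libvirt Python bindings. It assumes a libvirt new enough to provide the default event loop implementation, and only shows the domain lifecycle event, since the other objects currently lack async APIs:

    import libvirt

    # The event loop implementation must be registered before opening the connection
    libvirt.virEventRegisterDefaultImpl()

    conn = libvirt.open("qemu:///system")

    def lifecycle_cb(conn, dom, event, detail, opaque):
        # Called by libvirt when a domain starts, stops, is suspended, etc.
        print("domain %s: event=%d detail=%d" % (dom.name(), event, detail))

    # Ask libvirtd to push lifecycle events for all domains instead of polling
    conn.domainEventRegisterAny(None, libvirt.VIR_DOMAIN_EVENT_ID_LIFECYCLE,
                                lifecycle_cb, None)

    while True:
        # Dispatch pending events; a GUI like virt-manager would integrate this
        # with its main loop rather than spinning in a dedicated loop
        libvirt.virEventRunDefaultImpl()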
libvirtd shows about 8% CPU usage when virt-manager is running on my system. Evidence is better than guesswork, so I ran oprofile against libvirtd. Augeas comes out on top by a country mile, followed by loads of malloc/memcpy stuff which I bet is all due to augeas too:

samples  %        image name            symbol name
31677    41.0255  libaugeas.so.0.10.2   /usr/lib64/libaugeas.so.0.10.2
14469    18.7391  libc-2.11.2.so        wcscoll_l
 5401     6.9949  libc-2.11.2.so        wcscmp
 4913     6.3629  libc-2.11.2.so        _int_free
 4020     5.2064  libc-2.11.2.so        _int_malloc
 3780     4.8955  libc-2.11.2.so        memcpy
 2486     3.2197  libc-2.11.2.so        malloc
 1676     2.1706  libc-2.11.2.so        malloc_consolidate
 1446     1.8727  libc-2.11.2.so        memset
 1359     1.7601  libc-2.11.2.so        wcscoll
 1299     1.6824  libc-2.11.2.so        realloc
 1097     1.4207  libc-2.11.2.so        calloc
  714     0.9247  libc-2.11.2.so        free
  517     0.6696  libc-2.11.2.so        _int_realloc
  278     0.3600  libc-2.11.2.so        __GI___strncmp_ssse3
  253     0.3277  libc-2.11.2.so        __strstr_sse2
  174     0.2254  libc-2.11.2.so        __strlen_sse2
  116     0.1502  libc-2.11.2.so        vfprintf
  111     0.1438  libc-2.11.2.so        strnlen
  106     0.1373  libpthread-2.11.2.so  pthread_mutex_lock
   95     0.1230  libvirtd              virLogMessage
   73     0.0945  libpthread-2.11.2.so  pthread_mutex_unlock
   66     0.0855  libc-2.11.2.so        __strchr_sse2
   64     0.0829  libc-2.11.2.so        __GI___strcmp_ssse3
   61     0.0790  libc-2.11.2.so        __ctype_b_loc
   45     0.0583  libc-2.11.2.so        btowc
   39     0.0505  libc-2.11.2.so        _IO_default_xsputn
   35     0.0453  libc-2.11.2.so        strcat
   30     0.0389  libc-2.11.2.so        strndup
   30     0.0389  libvirtd              virEventRunOnce

After editing the virt-manager code to disable the 'update_interfaces' method in connection.py and re-running the oprofile test, libvirtd now consumes < 1% CPU and oprofile shows a completely different trace:

samples  %        image name            symbol name
 1519    42.7165  libc-2.11.2.so        memset
  346     9.7300  libc-2.11.2.so        __strstr_sse2
  214     6.0180  libpthread-2.11.2.so  pthread_mutex_lock
  136     3.8245  libc-2.11.2.so        __strchr_sse2
  130     3.6558  libvirtd              virLogMessage
  117     3.2902  libpthread-2.11.2.so  pthread_mutex_unlock
   77     2.1654  libc-2.11.2.so        vfprintf
   57     1.6029  libc-2.11.2.so        _int_malloc
   53     1.4904  libvirtd              virEventRunOnce
   45     1.2655  libc-2.11.2.so        calloc
   40     1.1249  libc-2.11.2.so        _int_free
   25     0.7030  libc-2.11.2.so        malloc_consolidate
   21     0.5906  libc-2.11.2.so        _IO_default_xsputn
   20     0.5624  libc-2.11.2.so        __strlen_sse2
   20     0.5624  libc-2.11.2.so        poll
   20     0.5624  libvirtd              nodeListDevices
   19     0.5343  libpthread-2.11.2.so  pthread_cond_wait@@GLIBC_2.3.2
   19     0.5343  libvirt.so.0.8.1      virHashComputeKey
   19     0.5343  libvirtd              qemudDispatchClientEvent
   18     0.5062  libvirtd              remoteDispatchClientCall

So the problem appears to be that augeas (and netcf by implication) has exceedingly high CPU utilization :-( Need advice from David on how to proceed here. Maybe augeas simply hasn't had any performance optimization work done on it yet? Otherwise maybe we need to change netcf to not use augeas+xslt for converting the ifcfg files into XML format.
A more targeted profile this time, from just running ncftool directly:

samples  %        image name            symbol name
85058    20.7509  libc-2.11.2.so        wcscoll_l
29644     7.2320  libc-2.11.2.so        wcscmp
26191     6.3896  libc-2.11.2.so        _int_malloc
25622     6.2508  libaugeas.so.0.11.0   parse_expression
18443     4.4994  libxml2.so.2.7.6      /usr/lib64/libxml2.so.2.7.6
16999     4.1471  libc-2.11.2.so        memcpy
13354     3.2579  libc-2.11.2.so        _int_free
 9402     2.2937  libaugeas.so.0.11.0   set_regs
 9019     2.2003  libc-2.11.2.so        wcscoll
 8788     2.1439  libc-2.11.2.so        malloc_consolidate
 8457     2.0632  libaugeas.so.0.11.0   re_node_set_add_intersect
 8062     1.9668  libfa.so.1.3.1        re_as_string
 7832     1.9107  libfa.so.1.3.1        cset_contains
 7343     1.7914  libaugeas.so.0.11.0   re_node_set_contains
 6031     1.4713  libaugeas.so.0.11.0   re_acquire_state
 5113     1.2474  libaugeas.so.0.11.0   sift_states_backward
 4680     1.1417  libaugeas.so.0.11.0   re_node_set_insert_last
 4680     1.1417  libaugeas.so.0.11.0   re_search_internal
 4587     1.1191  libaugeas.so.0.11.0   re_node_set_insert
 4451     1.0859  libc-2.11.2.so        calloc
 4315     1.0527  libc-2.11.2.so        free
 4189     1.0220  libc-2.11.2.so        malloc
I think every interface 'list' operation causes netcf to parse /etc/sysconfig/network-scripts/*, which would explain things.
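For reference, this is the kind of polling that triggers the reparse. The loop below is a hypothetical sketch in the spirit of virt-manager's periodic tick, not its actual code; both list calls are served by netcf on the libvirtd side:

    import time
    import libvirt

    conn = libvirt.open("qemu:///system")

    while True:
        # Each call goes through netcf, which (before any caching fix) reparses
        # /etc/sysconfig/network-scripts/* from scratch every time
        active = conn.listInterfaces()
        defined = conn.listDefinedInterfaces()
        time.sleep(1)   # polling interval is illustrative only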
I'm more than happy to run some profiling here as well. On my particular system (F13 with an encrypted disk on a dual-core Turion64) it's enough to make the system unresponsive and barely usable with a 2-VCPU VM. If the network is implicated, especially the network scripts, well, mine are pretty complex: I'm running split DNS, wired/wireless bonding, and a vpnc tunnel. It's possible one of those things exaggerates the cost of this quite a bit.
Cole is right: every netcf operation causes the network scripts to be parsed again. There's not enough change tracking to guard against other programs modifying those files in between netcf operations.

There are a few ways to address that, varying in complexity and hackishness:

(1) Instead of reparsing the files on every netcf operation, only reparse after some small amount of time has passed
(2) Within netcf, watch the pertinent files with inotify and reread them only upon actual changes (or complain if both netcf and some outside program try to make changes)
(3) Do the same within augeas

Of course, another option would be to reduce how often virt-manager does an ncf_list.
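A rough sketch of the "only reparse when something actually changed" idea behind options (2)/(3), written in Python purely for brevity (the real change belongs in netcf/augeas, which are C; parse_ifcfg() is a hypothetical stand-in for the expensive parse):

    import glob
    import os

    _cache = {"stamp": None, "result": None}

    def _snapshot(pattern="/etc/sysconfig/network-scripts/ifcfg-*"):
        # (path, mtime, size) per file acts as a cheap change-detection stamp
        stamp = []
        for path in sorted(glob.glob(pattern)):
            st = os.stat(path)
            stamp.append((path, st.st_mtime, st.st_size))
        return tuple(stamp)

    def list_interfaces():
        stamp = _snapshot()
        if stamp != _cache["stamp"]:
            # Only pay the expensive parse when the files have changed;
            # parse_ifcfg() is a hypothetical placeholder for netcf's parsing
            _cache["result"] = parse_ifcfg(stamp)
            _cache["stamp"] = stamp
        return _cache["result"]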
This can be alleviated by teaching augeas to be smarter about when to reparse a file and when not. I have posted patches upstream on augeas-devel (review much welcome) and built augeas-0.7.2-2 based on them. For now you need to get the packages out of Koji, from http://koji.fedoraproject.org/koji/packageinfo?packageID=6131

It would be great if somebody could independently verify that these patches address the virt-manager issues (and do not introduce any regressions in libvirt's network config handling).
On my F13 system with virt-manager running, prior to this fix CPU usage by libvirtd was over 11%. With the new augeas installed, that drops to 2.3%, so there is definitely a huge difference! Functional testing will take a bit more time ;-)
augeas-0.7.2-2.fc13 has been submitted as an update for Fedora 13. http://admin.fedoraproject.org/updates/augeas-0.7.2-2.fc13
augeas-0.7.2-2.fc13 has been pushed to the Fedora 13 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with su -c 'yum --enablerepo=updates-testing update augeas'. You can provide feedback for this update here: http://admin.fedoraproject.org/updates/augeas-0.7.2-2.fc13
augeas-0.7.2-3.fc13 has been submitted as an update for Fedora 13. http://admin.fedoraproject.org/updates/augeas-0.7.2-3.fc13
augeas-0.7.2-3.fc13 has been pushed to the Fedora 13 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with su -c 'yum --enablerepo=updates-testing update augeas'. You can provide feedback for this update here: http://admin.fedoraproject.org/updates/augeas-0.7.2-3.fc13
Using augeas-0.7.2-3.fc13 showed no excessive CPU load for me.
augeas-0.7.2-3.fc13 has been pushed to the Fedora 13 stable repository. If problems still persist, please make note of it in this bug report.