Red Hat Bugzilla – Bug 1035966
Start autostarted virtual networks in background
Last modified: 2015-03-05 02:28:41 EST
Description of problem:
With a large number of autostarted virtual networks defined, libvirtd should
still respond quickly to user input after the daemon is restarted.

Version-Release number of selected component (if applicable):
libvirt-1.1.1-12.el7.x86_64
qemu-kvm-1.5.3-19.el7.x86_64
kernel-3.10.0-48.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Enable the limits below.

[root@ibm-x3650m3-03 ~]# cat /etc/libvirt/libvirtd.conf | grep ^[^#]
max_clients = 204800
max_queued_clients = 1000000
max_workers = 204800
max_requests = 204800
max_client_requests = 204800
log_level = 1
log_outputs="1:file:/var/log/libvirt/libvirtd.log"
log_filters="1:qemu 1:libvirt 1:conf 1:json"
[root@ibm-x3850x5-01 ~]# cat /usr/lib/systemd/system/libvirtd.service | grep LimitNOFILE
LimitNOFILE=204800
[root@ibm-x3850x5-01 ~]#

2. Define more than 256 autostarted virtual networks.

[root@ibm-x3650m3-03 ~]# virsh net-list | grep active | wc -l
256
[root@ibm-x3650m3-03 ~]# virsh net-list --all
 Name                 State      Autostart     Persistent
----------------------------------------------------------
 default              active     yes           yes
 network1             active     yes           yes
 network10            active     yes           yes
 ...

3. Restart libvirtd.

[root@ibm-x3650m3-03 ~]# service libvirtd restart
Redirecting to /bin/systemctl restart libvirtd.service
[root@ibm-x3650m3-03 ~]# service libvirtd status
Redirecting to /bin/systemctl status libvirtd.service
libvirtd.service - Virtualization daemon
   Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; enabled)
   Active: active (running) since Fri 2013-11-22 12:59:02 CST; 11min ago
 Main PID: 15579 (libvirtd)
   CGroup: /system.slice/libvirtd.service
           ├─ 2737 /sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf
           ├─15579 /usr/sbin/libvirtd
           ├─27413 /sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/network1.conf
           ├─27468 /sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/network2.conf
           ├─27523 /sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/network3.conf
           ├─27580 /sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/network4.conf
           ├─27635 /sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/network5.conf
           ├─27692 /sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/network6.conf
           ...

4. Run a virsh command and check how long it takes.

[root@ibm-x3650m3-03 ~]# time virsh list --all
 Id    Name                           State
----------------------------------------------------
 5     r7                             running
 -     r6                             shut off

real    19m5.009s
user    0m0.010s
sys     0m0.011s
[root@ibm-x3650m3-03 ~]#
[root@ibm-x3650m3-03 ~]# virsh net-list --all | grep active | wc -l
256

5. While step 4 is running, run the commands below.

[root@ibm-x3650m3-03 ~]# ltrace -p `pidof libvirtd`
--- SIGCHLD (Child exited) ---
--- SIGCHLD (Child exited) ---
--- SIGCHLD (Child exited) ---
--- SIGCHLD (Child exited) ---
--- SIGCHLD (Child exited) ---
^C
[root@ibm-x3650m3-03 ~]# ltrace virsh list --all
__libc_start_main(0x7f6a068ad2b0, 3, 0x7fff2225c118, 0x7f6a068da1d0 <unfinished ...>
setlocale(LC_ALL, "") = "LC_CTYPE=en_US.utf8;LC_NUMERIC=e"...
bindtextdomain("libvirt", "/usr/share/locale") = "/usr/share/locale"
textdomain("libvirt") = "libvirt"
virMutexInit(0x7fff2225bfc0, 1, 1, 0x7472697662696c) = 0
virInitialize(0x7fff2225bda0, 0x7f6a068da86a, 5, 0) = 0
strrchr("virsh", '/') = nil
virGetEnvBlockSUID(0x7f6a068da8a2, 9, 7, 5) = 0
virGetEnvAllowSUID(0x7f6a068da380, 9, 11, 34) = 0
virGetEnvBlockSUID(0x7f6a068da6f6, 9, 11, 0) = 0
getopt_long(3, 0x7fff2225c118, "+:d:hqtc:vVrl:e:", 0x7fff2225be20, -1) = -1
virStrdup(0x7fff2225bcc8, 0x7fff2225c69f, 1, 0) = 1
strcmp("attach-device", "list") = -11
strcmp("attach-disk", "list") = -11
strcmp("attach-interface", "list") = -11
... (long run of strcmp() calls comparing each virsh command name against "list") ...
strcmp("domstate", "list") = -8
strcmp("list", "list") = 0
virFree(0x7fff2225bd80, 0x7fff2225bd78, 0x7fff2225bd7c, 0) = 0x7f6a08090c90
virStrdup(0x7fff2225bcc8, 0x7fff2225c6a4, 1, 0) = 1
strchr("all", '=') = nil
strcmp("all", "help") = -7
strcmp("inactive", "all") = 8
strcmp("all", "all") = 0
virFree(0x7fff2225bd80, 0, 3, 1) = 0x7f6a08090c90
virAllocN(0x7fff2225bce8, 1, 24, 1) = 0
dcgettext(0x7f6a068da89a, 0x7f6a068da782, 5, 3) = 0x7f6a068da782
dcgettext(0x7f6a068da89a, 0x7f6a068da791, 5, 2) = 0x7f6a068da791
virFree(0x7fff2225bd80, 0x7fff2225be00, 0x7fff2225bd80, 0x7f6a068dbbe0) = 0
virAllocN(0x7fff2225bce8, 1, 24, 1) = 0
virGetEnvAllowSUID(0x7f6a068da380, 0x7f6a068dc4ae, 0x7f6a0801d420, 4) = 0
virGetEnvBlockSUID(0x7f6a068da6f6, 0x7f6a068dc4ae, 11, 0) = 0
virSetErrorFunc(0, 0x7f6a068aecc0, 4, 0xffff8095f9725946) = 0x7f6a0666a230
virEventRegisterDefaultImpl(0, 0x7f6a068aecc0, 4, 0xffff8095f9725946) = 0
virThreadCreate(0x7fff2225bfb8, 1, 0x7f6a068b0360, 0x7fff2225bf60) = 0
virConnectOpenAuth(0, 0x7f6a066676e0, 0, -1^C <no return ...>
--- SIGINT (Interrupt) ---
+++ killed by SIGINT +++

Actual results:
The example above uses 256 autostarted virtual networks; the more networks
there are, the longer the user waits for a response from libvirtd. With
network autostart turned off, the response time drops considerably and is
acceptable.

1. Another example (765 autostarted virtual networks):

[root@ibm-x3850x5-01 ~]# time virsh list --all
 Id    Name                           State
----------------------------------------------------

real    79m39.460s
user    0m0.015s
sys     0m0.022s
[root@ibm-x3850x5-01 ~]# virsh net-list --all | grep active | wc -l
765
[root@ibm-x3850x5-01 ~]#

2. With the number of autostarted virtual networks reduced to 51, the same
command took less than 3 minutes:

[root@ibm-x3650m3-03 ~]# service libvirtd restart
Redirecting to /bin/systemctl restart libvirtd.service
Warning: Unit file of libvirtd.service changed on disk, 'systemctl daemon-reload' recommended.
[root@ibm-x3650m3-03 ~]#
[root@ibm-x3650m3-03 ~]# time virsh list --all
 Id    Name                           State
----------------------------------------------------
 5     r7                             running
 -     r6                             shut off

real    2m53.779s
user    0m0.013s
sys     0m0.011s
[root@ibm-x3650m3-03 ~]# virsh net-list --all | grep active | wc -l
51

Expected results:
libvirt should handle this situation and reply to the user quickly even
while processing a large number of events.
More questions on this issue:

1. Virsh commands (such as "virsh list --all", "virsh net-list --all", etc.)
are blocked by libvirtd while the daemon is starting a lot of virtual
networks in the background. Once libvirtd has handled all the background
events (virtual network startup in this case), virsh commands respond to the
user quickly again. I think libvirtd should always give foreground input
higher priority. What do you think?

2. Are there some risk conditions between user input and the libvirt daemon,
or among libvirtd's child threads?

Thanks.
(In reply to Hu Jianwei from comment #1)
> 2. Are there some risk conditions between user input and the libvirt daemon,

s/risk/race/

Jianwei, please attach the relevant log files and make sure you haven't hit
any of the configured limits.
Created attachment 831471 [details]
libvirtd.log and system messages log

I can't find anything useful in the attached logs; no errors are recorded.
This is basically a request to start virtual networks in background so that a starting libvirtd is not blocked until all networks are set up. While this is a nice idea, I'm not convinced it's worth implementing. Is there any real reason one would need to deploy lots of autostarted virtual networks on a single host?
Before we even look at doing clever things with autostarting networks, we should get clear data on what is slow about starting them serially. If we can make starting individual networks faster, then that is a better win overall than focusing on playing tricks at libvirtd startup.
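(One way to collect that data, sketched below under the assumption that the
test networks from the reproducer are already defined: time each net-start
individually and look for outliers, or for a per-network cost that grows
with the number of already-active networks.)

#!/bin/bash
# Sketch: time each network start separately to see where the serial
# startup cost goes. Assumes the autostarted test networks already exist.
for net in $(virsh -q net-list --all | awk '{print $1}'); do
    virsh net-destroy "$net" >/dev/null 2>&1    # ensure it is inactive
    start=$(date +%s.%N)
    virsh net-start "$net" >/dev/null
    end=$(date +%s.%N)
    echo "$net $(echo "$end - $start" | bc)"
done | sort -k2 -rn | head    # slowest networks first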
(In reply to Jiri Denemark from comment #4)
> This is basically a request to start virtual networks in background so that
> a starting libvirtd is not blocked until all networks are set up. While this
> is a nice idea, I'm not convinced it's worth implementing. Is there any real
> reason one would need to deploy lots of autostarted virtual networks on a
> single host?

There is no customer requirement for a real deployment, only a testing
scenario, but I think we could do some optimization to get a better user
experience, if possible.
Dan Berrange has just pushed patches upstream that have a very significant effect on network start time when firewalld is enabled on the system. If the testing done above was on a system with firewalld enabled, it would be useful to try it again with a build of the latest upstream source to see if the situation is now acceptable.
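(For anyone re-testing, it is worth recording whether firewalld was active
during each run; a quick check using standard systemd commands, nothing
libvirt-specific:)

# Is firewalld running? The timings differ a lot depending on the answer.
systemctl is-active firewalld

# Quick variant of the timing loop from the upstream mail: one
# start/stop cycle of a single network as a baseline.
time sh -c 'virsh net-start default; virsh net-destroy default'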
Yes, the speed of starting virtual networks when firewalld is running has improved dramatically:

https://www.redhat.com/archives/libvir-list/2014-April/msg00335.html

"timing the network driver

$ for i in `seq 1 10` ; do virsh net-start default; virsh net-destroy default ; done

Direct iptables:   3 seconds
Via firewall-cmd: 42 seconds"

So on a firewalld-enabled host we cut the time to start 10 networks from 42 seconds down to 3 seconds, putting it on a par with non-firewalld hosts.
As Daniel said, this should be dramatically improved by not using firewall-cmd anymore.
Compared with the earlier results, libvirt shows a great improvement:

For 249 virtual networks, virsh takes 2m4.517s
For 504 virtual networks, virsh takes 5m10.606s
For 759 virtual networks, virsh takes 8m45.235s

(On the old libvirt: 256 networks took 19m5.009s; 765 networks took 79m39.460s.)

Detailed steps:

[root@localhost 1035966]# rpm -q libvirt kernel
libvirt-1.2.7-1.el7.x86_64
kernel-3.10.0-138.el7.x86_64

For 249 virtual networks:

[root@localhost 1035966]# virsh net-list | grep active | wc -l
249
[root@localhost 1035966]# service libvirtd restart
Redirecting to /bin/systemctl restart libvirtd.service
[root@localhost 1035966]# time virsh list --all
 Id    Name                           State
----------------------------------------------------
 20    r7_latest                      running
 -     r7                             shut off
 -     r7_1                           shut off
 -     r7_n                           shut off

real    2m4.517s
user    0m0.007s
sys     0m0.004s
[root@localhost 1035966]# time virsh list --all
 Id    Name                           State
----------------------------------------------------
 20    r7_latest                      running
 -     r7                             shut off
 -     r7_1                           shut off
 -     r7_n                           shut off

real    0m0.015s
user    0m0.007s
sys     0m0.003s

For 504 virtual networks:

[root@localhost 1035966]# virsh net-list | grep active | wc -l
504
[root@localhost 1035966]# time virsh list --all
 Id    Name                           State
----------------------------------------------------
 20    r7_latest                      running
 -     r7                             shut off
 -     r7_1                           shut off
 -     r7_n                           shut off

real    0m0.016s
user    0m0.005s
sys     0m0.005s
[root@localhost 1035966]# service libvirtd restart
Redirecting to /bin/systemctl restart libvirtd.service
[root@localhost 1035966]# time virsh list --all
 Id    Name                           State
----------------------------------------------------
 20    r7_latest                      running
 -     r7                             shut off
 -     r7_1                           shut off
 -     r7_n                           shut off

real    5m10.606s
user    0m0.006s
sys     0m0.006s
[root@localhost 1035966]# time virsh list --all
 Id    Name                           State
----------------------------------------------------
 20    r7_latest                      running
 -     r7                             shut off
 -     r7_1                           shut off
 -     r7_n                           shut off

real    0m0.017s
user    0m0.007s
sys     0m0.005s

For 759 virtual networks:

[root@localhost 1035966]# time virsh list --all
 Id    Name                           State
----------------------------------------------------
 20    r7_latest                      running
 -     r7                             shut off
 -     r7_1                           shut off
 -     r7_n                           shut off

real    0m0.020s
user    0m0.005s
sys     0m0.007s
[root@localhost 1035966]# virsh net-list | grep active | wc -l
759
[root@localhost 1035966]# service libvirtd restart
Redirecting to /bin/systemctl restart libvirtd.service
[root@localhost 1035966]# time virsh list --all
 Id    Name                           State
----------------------------------------------------
 20    r7_latest                      running
 -     r7                             shut off
 -     r7_1                           shut off
 -     r7_n                           shut off

real    8m45.235s
user    0m0.005s
sys     0m0.006s
[root@localhost 1035966]# time virsh list --all
 Id    Name                           State
----------------------------------------------------
 20    r7_latest                      running
 -     r7                             shut off
 -     r7_1                           shut off
 -     r7_n                           shut off

real    0m0.018s
user    0m0.004s
sys     0m0.008s
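(Each measurement above follows the same pattern; as a sketch, the loop
below restates it. The restart step and the double "time virsh list --all"
mirror the transcript; the rest is illustrative.)

#!/bin/bash
# Sketch of the measurement pattern used above: after a libvirtd restart,
# the first virsh call is blocked until network autostart completes; a
# second call shows the daemon is responsive once startup has finished.
systemctl restart libvirtd
time virsh list --all    # blocked while autostarted networks come up
time virsh list --all    # near-instant once startup is done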
According to comment 11, the results are acceptable; changing the status to Verified.
I found a machine whose libvirtd replies much more quickly than the other machines I used before:

251 networks (real 0m21.877s)
510 networks (real 1m10.441s)
765 networks (real 2m25.811s)

Software environment:

[root@intel-e31225-8-3 network]# rpm -q libvirt kernel
libvirt-1.2.8-12.el7.x86_64
kernel-3.10.0-221.el7.x86_64
[root@intel-e31225-8-3 network]# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    1
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 42
Model name:            Intel(R) Xeon(R) CPU E31225 @ 3.10GHz
Stepping:              7
CPU MHz:               2928.410
BogoMIPS:              6184.19
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              6144K
NUMA node0 CPU(s):     0-3
[root@intel-e31225-8-3 network]# free -g
              total        used        free      shared  buff/cache   available
Mem:              7           2           0           0           4           4
Swap:             7           0           7
[root@intel-e31225-8-3 network]# uptime
 16:25:46 up  5:45,  2 users,  load average: 0.01, 0.06, 10.21
[root@intel-e31225-8-3 network]# service libvirtd restart
Redirecting to /bin/systemctl restart libvirtd.service
[root@intel-e31225-8-3 network]# time virsh -q net-list | wc -l
765

real    2m20.285s
user    0m0.069s
sys     0m0.050s

Same software environment on another machine:

[root@hp-dl385g7-08 network]# rpm -q libvirt kernel
libvirt-1.2.8-12.el7.x86_64
kernel-3.10.0-221.el7.x86_64
[root@hp-dl385g7-08 network]# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                32
On-line CPU(s) list:   0-31
Thread(s) per core:    2
Core(s) per socket:    8
Socket(s):             2
NUMA node(s):          4
Vendor ID:             AuthenticAMD
CPU family:            21
Model:                 1
Model name:            AMD Opteron(TM) Processor 6272
Stepping:              2
CPU MHz:               2100.010
BogoMIPS:              4199.77
Virtualization:        AMD-V
L1d cache:             16K
L1i cache:             64K
L2 cache:              2048K
L3 cache:              6144K
NUMA node0 CPU(s):     0,2,4,6,8,10,12,14
NUMA node1 CPU(s):     16,18,20,22,24,26,28,30
NUMA node2 CPU(s):     1,3,5,7,9,11,13,15
NUMA node3 CPU(s):     17,19,21,23,25,27,29,31
[root@hp-dl385g7-08 network]# free -g
              total        used        free      shared  buff/cache   available
Mem:             62           3          47           0          11          58
Swap:            31           0          31
[root@hp-dl385g7-08 network]# service libvirtd restart
Redirecting to /bin/systemctl restart libvirtd.service
[root@hp-dl385g7-08 network]# time virsh net-list | wc -l
768

real    20m54.478s
user    0m0.178s
sys     0m0.308s
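(When comparing machines like this, capturing the host details next to each
timing makes the runs directly comparable; a minimal sketch, with an
illustrative log file name:)

#!/bin/bash
# Sketch: record host details alongside the timing in one log file.
{
    rpm -q libvirt kernel
    lscpu | grep -E 'Model name|^CPU\(s\)|NUMA node\(s\)'
    free -g
    systemctl restart libvirtd
    time virsh -q net-list | wc -l
} 2>&1 | tee "timing-$(hostname).log"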
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2015-0323.html