| Summary: | Start autostarted virtual networks in background | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Hu Jianwei <jiahu> | ||||
| Component: | libvirt | Assignee: | Laine Stump <laine> | ||||
| Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> | ||||
| Severity: | low | Docs Contact: | |||||
| Priority: | low | ||||||
| Version: | 7.0 | CC: | ajia, berrange, dyuan, honzhang, jdenemar, mzhan, rbalakri | ||||
| Target Milestone: | rc | Keywords: | TestOnly | ||||
| Target Release: | --- | ||||||
| Hardware: | x86_64 | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | libvirt-1.2.7-1.el7 | Doc Type: | Bug Fix | ||||
| Doc Text: |
Restarting libvirtd with a large number of virtual networks configured took a very long time on systems with firewalld enabled, and libvirtd's services were unavailable in the interim. This was because libvirt repeatedly exec'd firewall-cmd to add the relevant rules to the host's iptables chains, which was extremely slow. libvirt now manipulates firewalld via its D-Bus interface, which tests show to be over 10x faster, greatly reducing downtime during libvirtd restarts.
|
| Story Points: | --- | ||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2015-03-05 07:28:41 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Attachments: | |
More questions on this issue:

1. Virsh commands (such as "virsh list --all", "virsh net-list --all", etc.) are blocked by libvirtd while the daemon is starting a large number of virtual networks in the background. Once libvirtd has handled all of the background events (here, virtual network startup), virsh commands respond quickly again. I think libvirtd should handle foreground input with higher priority at all times. What do you think?
2. Are there any risk conditions between user input and the libvirt daemon, or among libvirtd's child threads? Thanks.

(In reply to Hu Jianwei from comment #1)
> 2. Are there any risk conditions between user input and the libvirt daemon, or

s/risk/race/. Jianwei, please attach the relevant log files and make sure you haven't hit any limits.

Created attachment 831471 [details]
libvirtd.log and system messages log.

I can't find anything useful in the attached logs; no errors were recorded.
This is basically a request to start virtual networks in the background so that a starting libvirtd is not blocked until all networks are set up. While this is a nice idea, I'm not convinced it's worth implementing. Is there any real reason one would need to deploy lots of autostarted virtual networks on a single host?

Before we even look at doing clever things with autostarting networks, we should get clear data on what is slow about starting them serially. If we can make starting individual networks faster, that is a better win overall than playing tricks at libvirtd startup.

(In reply to Jiri Denemark from comment #4)
> This is basically a request to start virtual networks in the background so
> that a starting libvirtd is not blocked until all networks are set up. While
> this is a nice idea, I'm not convinced it's worth implementing. Is there any
> real reason one would need to deploy lots of autostarted virtual networks on
> a single host?

No customer requirement for a real deployment, only a testing scenario, but I think we could do some optimization to get a better user experience, if possible.

Dan Berrange has just pushed patches upstream that have a very significant effect on network start time when firewalld is enabled on the system. If the testing done above was on a system with firewalld enabled, it would be useful to try it again with a build of the latest upstream source to see if the situation is now acceptable.

Yes, the speed of starting virtual networks when firewalld is running has improved dramatically:
https://www.redhat.com/archives/libvir-list/2014-April/msg00335.html

"timing the network driver

$ for i in `seq 1 10` ; do virsh net-start default; virsh net-destroy default ; done

Direct iptables: 3 seconds
Via firewall-cmd: 42 seconds"

So on a firewalld-enabled host we cut the time to start 10 networks from 42 seconds down to 3 seconds, putting it on a par with non-firewalld hosts.
As Daniel said, this should be dramatically improved by no longer using firewall-cmd.

Compared with the former results, libvirt shows a great improvement:

For 249 virtual networks, virsh takes 2m4.517s
For 504 virtual networks, virsh takes 5m10.606s
For 759 virtual networks, virsh takes 8m45.235s

(On the old libvirt:
256 networks took 19m5.009s
765 networks took 79m39.460s)

Detailed steps:

[root@localhost 1035966]# rpm -q libvirt kernel
libvirt-1.2.7-1.el7.x86_64
kernel-3.10.0-138.el7.x86_64

For 249 virtual networks:

[root@localhost 1035966]# virsh net-list | grep active | wc -l
249
[root@localhost 1035966]# service libvirtd restart
Redirecting to /bin/systemctl restart libvirtd.service
[root@localhost 1035966]# time virsh list --all
 Id    Name                           State
----------------------------------------------------
 20    r7_latest                      running
 -     r7                             shut off
 -     r7_1                           shut off
 -     r7_n                           shut off

real    2m4.517s
user    0m0.007s
sys     0m0.004s
[root@localhost 1035966]# time virsh list --all
 Id    Name                           State
----------------------------------------------------
 20    r7_latest                      running
 -     r7                             shut off
 -     r7_1                           shut off
 -     r7_n                           shut off

real    0m0.015s
user    0m0.007s
sys     0m0.003s

For 504 virtual networks:

[root@localhost 1035966]# virsh net-list | grep active | wc -l
504
[root@localhost 1035966]# time virsh list --all
 Id    Name                           State
----------------------------------------------------
 20    r7_latest                      running
 -     r7                             shut off
 -     r7_1                           shut off
 -     r7_n                           shut off

real    0m0.016s
user    0m0.005s
sys     0m0.005s
[root@localhost 1035966]# service libvirtd restart
Redirecting to /bin/systemctl restart libvirtd.service
[root@localhost 1035966]# time virsh list --all
 Id    Name                           State
----------------------------------------------------
 20    r7_latest                      running
 -     r7                             shut off
 -     r7_1                           shut off
 -     r7_n                           shut off

real    5m10.606s
user    0m0.006s
sys     0m0.006s
[root@localhost 1035966]# time virsh list --all
 Id    Name                           State
----------------------------------------------------
 20    r7_latest                      running
 -     r7                             shut off
 -     r7_1                           shut off
 -     r7_n                           shut off

real    0m0.017s
user    0m0.007s
sys     0m0.005s

For 759 virtual networks:

[root@localhost 1035966]# time virsh list --all
 Id    Name                           State
----------------------------------------------------
 20    r7_latest                      running
 -     r7                             shut off
 -     r7_1                           shut off
 -     r7_n                           shut off

real    0m0.020s
user    0m0.005s
sys     0m0.007s
[root@localhost 1035966]# virsh net-list | grep active | wc -l
759
[root@localhost 1035966]# service libvirtd restart
Redirecting to /bin/systemctl restart libvirtd.service
[root@localhost 1035966]# time virsh list --all
 Id    Name                           State
----------------------------------------------------
 20    r7_latest                      running
 -     r7                             shut off
 -     r7_1                           shut off
 -     r7_n                           shut off

real    8m45.235s
user    0m0.005s
sys     0m0.006s
[root@localhost 1035966]# time virsh list --all
 Id    Name                           State
----------------------------------------------------
 20    r7_latest                      running
 -     r7                             shut off
 -     r7_1                           shut off
 -     r7_n                           shut off

real    0m0.018s
user    0m0.004s
sys     0m0.008s

According to comment 11, this is acceptable; changing to Verified.

I found a machine whose libvirtd replies more quickly than the other machines I used before:
251 networks (real 0m21.877s)
510 networks (real 1m10.441s)
765 networks (real 2m25.811s)
Software environment:
[root@intel-e31225-8-3 network]# rpm -q libvirt kernel
libvirt-1.2.8-12.el7.x86_64
kernel-3.10.0-221.el7.x86_64
[root@intel-e31225-8-3 network]# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 42
Model name: Intel(R) Xeon(R) CPU E31225 @ 3.10GHz
Stepping: 7
CPU MHz: 2928.410
BogoMIPS: 6184.19
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 6144K
NUMA node0 CPU(s): 0-3
[root@intel-e31225-8-3 network]# free -g
total used free shared buff/cache available
Mem: 7 2 0 0 4 4
Swap: 7 0 7
[root@intel-e31225-8-3 network]# uptime
16:25:46 up 5:45, 2 users, load average: 0.01, 0.06, 10.21
[root@intel-e31225-8-3 network]# service libvirtd restart
Redirecting to /bin/systemctl restart libvirtd.service
[root@intel-e31225-8-3 network]# time virsh -q net-list |wc -l
765
real 2m20.285s
user 0m0.069s
sys 0m0.050s
Same software environment on another machine:
[root@hp-dl385g7-08 network]# rpm -q libvirt kernel
libvirt-1.2.8-12.el7.x86_64
kernel-3.10.0-221.el7.x86_64
[root@hp-dl385g7-08 network]# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 2
NUMA node(s): 4
Vendor ID: AuthenticAMD
CPU family: 21
Model: 1
Model name: AMD Opteron(TM) Processor 6272
Stepping: 2
CPU MHz: 2100.010
BogoMIPS: 4199.77
Virtualization: AMD-V
L1d cache: 16K
L1i cache: 64K
L2 cache: 2048K
L3 cache: 6144K
NUMA node0 CPU(s): 0,2,4,6,8,10,12,14
NUMA node1 CPU(s): 16,18,20,22,24,26,28,30
NUMA node2 CPU(s): 1,3,5,7,9,11,13,15
NUMA node3 CPU(s): 17,19,21,23,25,27,29,31
[root@hp-dl385g7-08 network]# free -g
total used free shared buff/cache available
Mem: 62 3 47 0 11 58
Swap: 31 0 31
[root@hp-dl385g7-08 network]# service libvirtd restart
Redirecting to /bin/systemctl restart libvirtd.service
[root@hp-dl385g7-08 network]# time virsh net-list | wc -l
768
real 20m54.478s
user 0m0.178s
sys 0m0.308s
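The measurements on both machines above follow the same pattern: restart libvirtd, then time how long the first virsh call takes to answer (subsequent calls return almost instantly). A minimal sketch of that pattern is below; the `time_cmd` helper is generic and the commented systemctl/virsh lines are the host-specific part, shown only as an illustration since they assume a system running libvirtd.

```shell
#!/bin/sh
# Sketch of the restart-responsiveness measurement used in this report.
# time_cmd prints the wall-clock seconds a command takes to complete.

time_cmd() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
}

# On a libvirt host the verification runs above correspond to:
#   systemctl restart libvirtd
#   echo "first call:  $(time_cmd virsh list --all)s"  # blocked until networks start
#   echo "second call: $(time_cmd virsh list --all)s"  # should be near-instant
```

The first call blocks while libvirtd brings up the autostarted networks, so its duration is the interesting number; the second call serves as a baseline.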
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0323.html
Description of problem:
With a large number of autostarted virtual networks configured, libvirtd should respond quickly to user input after the daemon is restarted.

Version-Release number of selected component (if applicable):
libvirt-1.1.1-12.el7.x86_64
qemu-kvm-1.5.3-19.el7.x86_64
kernel-3.10.0-48.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Enable the limits below.

[root@ibm-x3650m3-03 ~]# cat /etc/libvirt/libvirtd.conf | grep ^[^#]
max_clients = 204800
max_queued_clients = 1000000
max_workers = 204800
max_requests = 204800
max_client_requests = 204800
log_level = 1
log_outputs="1:file:/var/log/libvirt/libvirtd.log"
log_filters="1:qemu 1:libvirt 1:conf 1:json"
[root@ibm-x3850x5-01 ~]# cat /usr/lib/systemd/system/libvirtd.service | grep LimitNOFILE
LimitNOFILE=204800
[root@ibm-x3850x5-01 ~]#

2. Define more than 256 autostarted virtual networks.

[root@ibm-x3650m3-03 ~]# virsh net-list | grep active | wc -l
256
[root@ibm-x3650m3-03 ~]# virsh net-list --all
 Name                 State      Autostart     Persistent
----------------------------------------------------------
 default              active     yes           yes
 network1             active     yes           yes
 network10            active     yes           yes
...

3.
Restart libvirtd.

[root@ibm-x3650m3-03 ~]# service libvirtd restart
Redirecting to /bin/systemctl restart libvirtd.service
[root@ibm-x3650m3-03 ~]# service libvirtd status
Redirecting to /bin/systemctl status libvirtd.service
libvirtd.service - Virtualization daemon
   Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; enabled)
   Active: active (running) since Fri 2013-11-22 12:59:02 CST; 11min ago
 Main PID: 15579 (libvirtd)
   CGroup: /system.slice/libvirtd.service
           ├─ 2737 /sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf
           ├─15579 /usr/sbin/libvirtd
           ├─27413 /sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/network1.conf
           ├─27468 /sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/network2.conf
           ├─27523 /sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/network3.conf
           ├─27580 /sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/network4.conf
           ├─27635 /sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/network5.conf
           ├─27692 /sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/network6.conf
           ...

4. Execute a virsh command and check how long it takes.

[root@ibm-x3650m3-03 ~]# time virsh list --all
 Id    Name                           State
----------------------------------------------------
 5     r7                             running
 -     r6                             shut off

real    19m5.009s
user    0m0.010s
sys     0m0.011s
[root@ibm-x3650m3-03 ~]#
[root@ibm-x3650m3-03 ~]# virsh net-list --all | grep active | wc -l
256

5. Run the commands below while step 4 is running.

[root@ibm-x3650m3-03 ~]# ltrace -p `pidof libvirtd`
--- SIGCHLD (Child exited) ---
--- SIGCHLD (Child exited) ---
--- SIGCHLD (Child exited) ---
--- SIGCHLD (Child exited) ---
--- SIGCHLD (Child exited) ---
^C
[root@ibm-x3650m3-03 ~]# ltrace virsh list --all
__libc_start_main(0x7f6a068ad2b0, 3, 0x7fff2225c118, 0x7f6a068da1d0 <unfinished ...>
setlocale(LC_ALL, "") = "LC_CTYPE=en_US.utf8;LC_NUMERIC=e"...
bindtextdomain("libvirt", "/usr/share/locale") = "/usr/share/locale"
textdomain("libvirt") = "libvirt"
virMutexInit(0x7fff2225bfc0, 1, 1, 0x7472697662696c) = 0
virInitialize(0x7fff2225bda0, 0x7f6a068da86a, 5, 0) = 0
strrchr("virsh", '/') = nil
virGetEnvBlockSUID(0x7f6a068da8a2, 9, 7, 5) = 0
virGetEnvAllowSUID(0x7f6a068da380, 9, 11, 34) = 0
virGetEnvBlockSUID(0x7f6a068da6f6, 9, 11, 0) = 0
getopt_long(3, 0x7fff2225c118, "+:d:hqtc:vVrl:e:", 0x7fff2225be20, -1) = -1
virStrdup(0x7fff2225bcc8, 0x7fff2225c69f, 1, 0) = 1
strcmp("attach-device", "list") = -11
strcmp("attach-disk", "list") = -11
strcmp("attach-interface", "list") = -11
...
strcmp("dominfo", "list") = -8
strcmp("dommemstat", "list") = -8
strcmp("domstate", "list") = -8
strcmp("list", "list") = 0
virFree(0x7fff2225bd80, 0x7fff2225bd78, 0x7fff2225bd7c, 0) = 0x7f6a08090c90
virStrdup(0x7fff2225bcc8, 0x7fff2225c6a4, 1, 0) = 1
strchr("all", '=') = nil
strcmp("all", "help") = -7
strcmp("inactive", "all") = 8
strcmp("all", "all") = 0
virFree(0x7fff2225bd80, 0, 3, 1) = 0x7f6a08090c90
virAllocN(0x7fff2225bce8, 1, 24, 1) = 0
dcgettext(0x7f6a068da89a, 0x7f6a068da782, 5, 3) = 0x7f6a068da782
dcgettext(0x7f6a068da89a, 0x7f6a068da791, 5, 2) = 0x7f6a068da791
virFree(0x7fff2225bd80, 0x7fff2225be00, 0x7fff2225bd80, 0x7f6a068dbbe0) = 0
virAllocN(0x7fff2225bce8, 1, 24, 1) = 0
virGetEnvAllowSUID(0x7f6a068da380, 0x7f6a068dc4ae, 0x7f6a0801d420, 4) = 0
virGetEnvBlockSUID(0x7f6a068da6f6, 0x7f6a068dc4ae, 11, 0) = 0
virSetErrorFunc(0, 0x7f6a068aecc0, 4, 0xffff8095f9725946) = 0x7f6a0666a230
virEventRegisterDefaultImpl(0, 0x7f6a068aecc0, 4, 0xffff8095f9725946) = 0
virThreadCreate(0x7fff2225bfb8, 1, 0x7f6a068b0360, 0x7fff2225bf60) = 0
virConnectOpenAuth(0, 0x7f6a066676e0, 0, -1^C <no return ...>
--- SIGINT (Interrupt) ---
+++ killed by SIGINT +++

Actual results:

The example above uses 256 autostarted virtual networks; if you increase the number of networks, the user waits even longer for a response from libvirtd. With network autostart turned off, the response time is far lower and acceptable.

1. Another example (765 autostarted virtual networks):

[root@ibm-x3850x5-01 ~]# time virsh list --all
 Id    Name                           State
----------------------------------------------------

real    79m39.460s
user    0m0.015s
sys     0m0.022s
[root@ibm-x3850x5-01 ~]# virsh net-list --all | grep active | wc -l
765
[root@ibm-x3850x5-01 ~]#

2. When the number of autostarted virtual networks is reduced to 51, the first command takes less than 3 minutes:

[root@ibm-x3650m3-03 ~]# service libvirtd restart
Redirecting to /bin/systemctl restart libvirtd.service
Warning: Unit file of libvirtd.service changed on disk, 'systemctl daemon-reload' recommended.
[root@ibm-x3650m3-03 ~]#
[root@ibm-x3650m3-03 ~]# time virsh list --all
 Id    Name                           State
----------------------------------------------------
 5     r7                             running
 -     r6                             shut off

real    2m53.779s
user    0m0.013s
sys     0m0.011s
[root@ibm-x3650m3-03 ~]# virsh net-list --all | grep active | wc -l
51

Expected results:
libvirt should handle this situation and reply quickly to the user even while processing a large number of events.
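Step 2 of the reproduction steps needs hundreds of networks named network1, network2, and so on. A sketch of how such definitions could be generated is below; the network names match the report, but the bridge names and subnet layout are assumptions chosen only for illustration, and the virsh lines are shown as comments because they require a running libvirtd.

```shell
#!/bin/sh
# Generate N libvirt network XML definitions (network1..networkN).
# N and OUTDIR default to 3 and ./netxml; override via environment.

N=${N:-3}
outdir=${OUTDIR:-netxml}
mkdir -p "$outdir"

i=1
while [ "$i" -le "$N" ]; do
    # Write one NAT-less bridge network per file; bridge name and
    # subnet are illustrative assumptions, not from the report.
    cat > "$outdir/network$i.xml" <<EOF
<network>
  <name>network$i</name>
  <bridge name='virbr$((i + 100))'/>
  <ip address='192.168.$((i % 150 + 100)).1' netmask='255.255.255.0'/>
</network>
EOF
    # On a libvirt host, each definition would then be loaded and
    # marked autostart:
    #   virsh net-define "$outdir/network$i.xml"
    #   virsh net-autostart "network$i"
    #   virsh net-start "network$i"
    i=$((i + 1))
done
```

Running it with, e.g., N=256 reproduces the scale of the original setup; the per-network subnets must not overlap, which is why each network gets its own third octet here.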