Bug 852984
Summary: virsh start command will be hung with openvswitch network interface
Product: Red Hat Enterprise Linux 6
Component: libvirt
Version: 6.4
Hardware: x86_64
OS: Linux
Status: CLOSED ERRATA
Severity: high
Priority: high
Target Milestone: rc
Reporter: Alex Jia <ajia>
Assignee: Laine Stump <laine>
QA Contact: Virtualization Bugs <virt-bugs>
CC: acathrow, dyasny, dyuan, mzhan, veillard, ydu
Fixed In Version: libvirt-0.10.2-0rc1.el6
Doc Type: Bug Fix
Type: Bug
Last Closed: 2013-02-21 07:22:36 UTC
Description
Alex Jia
2012-08-30 07:45:49 UTC
Created attachment 608065: backtrace of hung libvirtd
Created attachment 608066: backtrace of hung libvirtd
Should be fixed upstream now:

    commit 5e465df6be8bcb00f0b4bff831e91f4042fae272
    Author: Kyle Mestery <kmestery>
    Date:   Wed Aug 29 14:44:36 2012 -0400

        Fix a crash when using Open vSwitch virtual ports

        Fixup buffer usage when handling VLANs. Also fix the logic used to
        determine if the virNetDevVlanPtr is valid or not. Fixes crashes in
        the latest code when using Open vSwitch virtualports.

        Signed-off-by: Kyle Mestery <kmestery>

Ah, hum, no, looking at the backtrace it's likely to be a different problem...

Daniel

(In reply to comment #5)
> Ah, hum, no, looking at the backtrace it's likely to be a different problem
> ...
>
> Daniel

Yes, it's a different issue, so I'm moving the bug back to NEW. I simply configured OVS and could reproduce the issue:

# ps -ef | grep --color -E "[O|o]vs"
nobody 23555 1 0 17:47 ? 00:00:00 /usr/sbin/dnsmasq --strict-order --bind-interfaces --local=// --domain-needed --filterwin2k --pid-file=/var/run/libvirt/network/ovs-net.pid --conf-file= --except-interface lo --listen-address 192.168.100.1 --dhcp-range 192.168.100.2,192.168.122.254 --dhcp-leasefile=/var/lib/libvirt/dnsmasq/ovs-net.leases --dhcp-lease-max=5885 --dhcp-no-override
root 23581 21962 0 17:47 ? 00:00:00 ovs-vsctl -- --may-exist add-port ovsbr0 vnet0 -- set Interface vnet0 external-ids:attached-mac="52:54:00:79:50:4F" -- set Interface vnet0 external-ids:iface-id="9189f2ed-9aba-0e0f-35ad-83050282a3c7" -- set Interface vnet0 external-ids:vm-id="af57deb3-98a3-6cb7-b611-d63bcde2d7d5" -- set Interface vnet0 external-ids:iface-status=active

The problem is that, by default, ovs-vsctl waits forever for ovs-vswitchd to respond, even if ovs-vswitchd isn't running. The ovs-vsctl manpage recommends adding "--no-wait" to the ovs-vsctl command line if you know that ovs-vswitchd isn't running, but of course we *don't* know that (and anyway, when I tested --no-wait just now, I found that it had absolutely no effect on the ovs-vsctl add-port command).
An alternative is to set a timeout for the command with "--timeout=n" (all of this information is from the ovs-vsctl manpage). In this case, ovs-vsctl *will* exit with SIGALRM after the specified number of seconds. So the only trick is figuring out the optimal setting: a shorter time means a greater likelihood of false failure, but a longer time means a much longer stderr message in the logs. My preference is for longer messages but greater reliability.

Pushed the following patch upstream:

    commit 98e732fc34a47ad9dfdb64aa4207623ee4c1ebcd
    Author: Laine Stump <laine>
    Date:   Tue Sep 4 15:26:29 2012 -0400

        network: prevent infinite hang if ovs-vswitchd isn't running

        This fixes https://bugzilla.redhat.com/show_bug.cgi?id=852984

        If a network or interface is configured to use Open vSwitch, but
        ovs-vswitchd (the Open vSwitch database service) isn't running, the
        ovs-vsctl add-port/del-port commands will hang indefinitely rather
        than returning an error. There is a --nowait option, but that appears
        to have no effect on add-port and del-port commands, so instead we add
        a --timeout=5 to the commands - they will retry for up to 5 seconds,
        then fail if there is no response.

Tested with libvirt-0.10.2-0rc1.el6.x86_64; the virsh hang is fixed. Following the steps in comment 0, starting the guest now reports an error, but virsh no longer hangs:
# virsh start aaa
error: Failed to start domain aaa
error: Unable to add port vnet0 to OVS bridge ovsbr0: Operation not permitted

libvirtd.log
------
2012-09-20 10:08:16.957+0000: 22402: error : virCommandWait:2345 : internal error Child process (ovs-vsctl --timeout=5 -- --may-exist add-port ovsbr0 vnet0 -- set Interface vnet0 'external-ids:attached-mac="52:54:00:CA:B4:39"' -- set Interface vnet0 'external-ids:iface-id="3e092f81-7e82-fc95-47a1-8f5a5d3c389d"' -- set Interface vnet0 'external-ids:vm-id="59fa20eb-dfb1-7dc8-2158-7888dbbec13b"' -- set Interface vnet0 external-ids:iface-status=active) unexpected fatal signal 14: 2012-09-20T10:08:10Z|00002|stream_unix|ERR|/var/run/openvswitch/db.sock: connection failed (No such file or directory)
2012-09-20T10:08:10Z|00003|reconnect|WARN|unix:/var/run/openvswitch/db.sock: connection attempt failed (No such file or directory)
2012-09-20T10:08:11Z|00004|stream_unix|ERR|/var/run/openvswitch/db.sock: connection failed (No such file or directory)
2012-09-20T10:08:11Z|00005|reconnect|WARN|unix:/var/run/openvswitch/db.sock: connection attempt failed (No such file or directory)
2012-09-20T10:08:13Z|00006|stream_unix|ERR|/var/run/openvswitch/db.sock: connection failed (No such file or directory)
2012
2012-09-20 10:08:16.957+0000: 22402: error : virNetDevOpenvswitchAddPort:136 : Unable to add port vnet0 to OVS bridge ovsbr0: Operation not permitted
2012-09-20 10:08:17.007+0000: 22402: error : virCommandWait:2345 : internal error Child process (ovs-vsctl --timeout=5 -- --if-exists del-port) unexpected exit status 1: ovs-vsctl: 'del-port' command requires at least 1 arguments
2012-09-20 10:08:17.007+0000: 22402: error : virNetDevOpenvswitchRemovePort:173 : Unable to delete port (null) from OVS: Operation not permitted
------

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0276.html
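The fix in commit 98e732fc comes down to two things: bounding every ovs-vsctl call with `--timeout=5`, and recognizing that a child killed by signal 14 (SIGALRM on Linux) timed out rather than failed for some other reason. Below is a minimal Python sketch of both, assuming a POSIX system. libvirt itself does this in C (virNetDevOpenvswitchAddPort); the helpers `build_add_port_argv`, `describe_exit` and `add_port` are hypothetical, and the argument layout only mirrors the ovs-vsctl command line recorded in libvirtd.log.

```python
import signal
import subprocess

# Value chosen in commit 98e732fc: ovs-vsctl retries for up to 5 seconds,
# then exits on SIGALRM instead of waiting forever.
OVS_VSCTL_TIMEOUT = 5


def build_add_port_argv(bridge, ifname, mac, iface_id, vm_id,
                        timeout=OVS_VSCTL_TIMEOUT):
    """Hypothetical helper: build an ovs-vsctl add-port command line
    bounded by --timeout, mirroring the one seen in libvirtd.log."""
    argv = ["ovs-vsctl", f"--timeout={timeout}",
            "--", "--may-exist", "add-port", bridge, ifname]
    # Each external-ids value is set by a separate "-- set Interface" command;
    # the double quotes are part of the argument, as in the ps output.
    for key, value in (("attached-mac", mac),
                       ("iface-id", iface_id),
                       ("vm-id", vm_id)):
        argv += ["--", "set", "Interface", ifname,
                 f'external-ids:{key}="{value}"']
    argv += ["--", "set", "Interface", ifname,
             "external-ids:iface-status=active"]
    return argv


def describe_exit(returncode):
    """Hypothetical helper: classify a subprocess return code.
    Python reports death by signal N as returncode -N."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        sig = -returncode
        if sig == signal.SIGALRM:
            return "timed out (SIGALRM): Open vSwitch did not respond"
        return f"killed by signal {sig}"
    return f"exit status {returncode}"


def add_port(bridge, ifname, mac, iface_id, vm_id):
    """Hypothetical helper: run the command (requires ovs-vsctl installed).
    It relies on --timeout alone to guarantee the call returns."""
    argv = build_add_port_argv(bridge, ifname, mac, iface_id, vm_id)
    proc = subprocess.run(argv, capture_output=True, text=True)
    return describe_exit(proc.returncode), proc.stderr
```

In the log above, virCommandWait reported "unexpected fatal signal 14"; `describe_exit(-14)` would classify that as a timeout rather than an ordinary failure. The timeout lives in ovs-vsctl itself, as in the patch, so the caller needs no timer of its own.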