Description of problem:
When starting a new virtual machine (RHEL-6 on a RHEL-7 host), it cannot connect to the libvirt virtual network. NetworkManager inside the virtual machine keeps asking for the connection until it times out.

Information from the host:
==========================
matej@wycliff: ~$ brctl show
bridge name	bridge id		STP enabled	interfaces
virbr0		8000.5254006988bd	yes		virbr0-nic
matej@wycliff: ~$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether f0:de:f1:99:b2:5a brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.8/24 scope global dynamic em1
       valid_lft 255121sec preferred_lft 255121sec
    inet6 fe80::f2de:f1ff:fe99:b25a/64 scope link
       valid_lft forever preferred_lft forever
3: wlp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 24:77:03:1e:b3:58 brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.7/24 scope global dynamic wlp3s0
       valid_lft 255124sec preferred_lft 255124sec
    inet6 fe80::2677:3ff:fe1e:b358/64 scope link
       valid_lft forever preferred_lft forever
9: sit0: <NOARP> mtu 1480 qdisc noop state DOWN
    link/sit 0.0.0.0 brd 0.0.0.0
10: aiccu: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1280 qdisc pfifo_fast state UNKNOWN qlen 500
    link/none
    inet6 2a01:8c00:ff00:1e8::2/64 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::8c00:ff00:1e8:2/64 scope link
       valid_lft forever preferred_lft forever
16: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 500
    link/ether fe:54:00:f7:cb:ee brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc54:ff:fef7:cbee/64 scope link
       valid_lft forever preferred_lft forever
18: vnet1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 500
    link/ether fe:54:00:d8:e8:1e brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc54:ff:fed8:e81e/64 scope link
       valid_lft forever preferred_lft forever
19: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN
    link/ether 52:54:00:69:88:bd brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0
       valid_lft forever preferred_lft forever
20: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast master virbr0 state DOWN qlen 500
    link/ether 52:54:00:69:88:bd brd ff:ff:ff:ff:ff:ff
21: tun0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1360 qdisc pfifo_fast state UNKNOWN qlen 100
    link/none
    inet 10.36.116.76/22 scope global tun0
       valid_lft forever preferred_lft forever
matej@wycliff: ~$

Information from the guest:
===========================
Nov 28 18:13:03 santiago NetworkManager[1529]: <info> (eth0): device state change: 5 -> 7 (reason 0)
Nov 28 18:13:03 santiago NetworkManager[1529]: <info> Activation (eth0) Beginning DHCPv4 transaction
Nov 28 18:13:03 santiago NetworkManager[1529]: <info> dhclient started with pid 4474
Nov 28 18:13:03 santiago NetworkManager[1529]: <info> Activation (eth0) DHCPv4 will time out in 45 seconds
Nov 28 18:13:03 santiago NetworkManager[1529]: <info> Activation (eth0) Beginning IP6 addrconf.
Nov 28 18:13:03 santiago avahi-daemon[1541]: Withdrawing address record for fe80::5054:ff:fed8:e81e on eth0.
Nov 28 18:13:03 santiago NetworkManager[1529]: <info> Activation (eth0) Stage 3 of 5 (IP Configure Start) complete.
Nov 28 18:13:03 santiago dhclient[4474]: Internet Systems Consortium DHCP Client 4.1.1-P1
Nov 28 18:13:03 santiago dhclient[4474]: Copyright 2004-2010 Internet Systems Consortium.
Nov 28 18:13:03 santiago dhclient[4474]: All rights reserved.
Nov 28 18:13:03 santiago dhclient[4474]: For info, please visit https://www.isc.org/software/dhcp/
Nov 28 18:13:03 santiago dhclient[4474]:
Nov 28 18:13:03 santiago NetworkManager[1529]: <info> (eth0): DHCPv4 state changed nbi -> preinit
Nov 28 18:13:03 santiago dhclient[4474]: Listening on LPF/eth0/52:54:00:d8:e8:1e
Nov 28 18:13:03 santiago dhclient[4474]: Sending on   LPF/eth0/52:54:00:d8:e8:1e
Nov 28 18:13:03 santiago dhclient[4474]: Sending on   Socket/fallback
Nov 28 18:13:03 santiago dhclient[4474]: DHCPREQUEST on eth0 to 255.255.255.255 port 67 (xid=0x22dd7fa7)
Nov 28 18:13:04 santiago avahi-daemon[1541]: Registering new address record for fe80::5054:ff:fed8:e81e on eth0.*.
Nov 28 18:13:10 santiago dhclient[4474]: DHCPREQUEST on eth0 to 255.255.255.255 port 67 (xid=0x22dd7fa7)
Nov 28 18:13:19 santiago dhclient[4474]: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 6 (xid=0x255ee06e)
Nov 28 18:13:23 santiago NetworkManager[1529]: <info> (eth0): IP6 addrconf timed out or failed.
Nov 28 18:13:23 santiago NetworkManager[1529]: <info> Activation (eth0) Stage 4 of 5 (IP6 Configure Timeout) scheduled...
Nov 28 18:13:23 santiago NetworkManager[1529]: <info> Activation (eth0) Stage 4 of 5 (IP6 Configure Timeout) started...
Nov 28 18:13:23 santiago NetworkManager[1529]: <info> (eth0): device state change: 7 -> 9 (reason 5)
Nov 28 18:13:23 santiago NetworkManager[1529]: <warn> Activation (eth0) failed.
Nov 28 18:13:23 santiago NetworkManager[1529]: <info> Activation (eth0) Stage 4 of 5 (IP6 Configure Timeout) complete.
Nov 28 18:13:23 santiago NetworkManager[1529]: <info> (eth0): device state change: 9 -> 3 (reason 0)
Nov 28 18:13:23 santiago NetworkManager[1529]: <info> (eth0): deactivating device (reason: 0).
Nov 28 18:13:23 santiago NetworkManager[1529]: <info> (eth0): canceled DHCP transaction, DHCP client pid 4474

santiago:~ # ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 52:54:00:d8:e8:1e brd ff:ff:ff:ff:ff:ff
    inet6 fe80::5054:ff:fed8:e81e/64 scope link
       valid_lft forever preferred_lft forever
santiago:~ # rpm -qf /etc/redhat-release
redhat-release-workstation-6Workstation-6.4.0.4.el6.i686
santiago:~ #

tcpdump -vv on the host doesn't see any traffic when the guest NetworkManager asks for an IP address.

Version-Release number of selected component (if applicable):
NetworkManager-0.9.9.0-25.git20131108.el7.x86_64
libvirt-1.1.1-13.el7.x86_64

How reproducible:
Unfortunately 100% ... I haven't managed to have network on either of my virtual machines (the other runs RHEL-7) for two days (and I need them for work).
Created attachment 830331 [details] output of journalctl (as root)
(In reply to Matěj Cepl from comment #1)
> Created attachment 830331 [details]
> output of journalctl (as root)

On the host computer, that is.
No surprise you don't have a network in your guest, since virbr0 has no network interfaces attached (virbr0-nic is there only to make sure virbr0 has a stable MAC address). Could you attach the domain XML you used? And the log from journalctl is not helpful at all (libvirtd[30331]: [74B blob data]). Please tell journalctl to show all messages even though this brain-dead tool thinks they contain unprintable characters. Also, could you try to get debug logs from libvirtd while you start the domain? See http://libvirt.org/logging.html for instructions.
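For convenience, the instructions on that page amount to two settings in /etc/libvirt/libvirtd.conf plus a daemon restart (a sketch based on the linked documentation; exact values and filter strings may differ between libvirt versions):

```
# /etc/libvirt/libvirtd.conf
log_level = 1                                        # 1 = debug, 2 = info, 3 = warning, 4 = error
log_outputs = "1:file:/var/log/libvirt/libvirtd.log" # send debug and above to a file
```

After editing the file, restart the daemon (systemctl restart libvirtd on RHEL-7), reproduce the problem, and attach /var/log/libvirt/libvirtd.log.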
Created attachment 830427 [details] virsh dumpxml santiago
Created attachment 830428 [details] output of journalctl --all --full --boot
That's a bit better now. Something is wrong with your system, as neither user nor group 107 (qemu) exists. I guess reinstalling the libvirt-daemon package could solve it. In any case, what is the output of the "getent passwd qemu" and "getent group qemu" commands on the host? However, I'm not sure this is related to the network issues in any way. Thus, I'll repeat my request for debug logs: could you try to get debug logs from libvirtd while you start the domain? See http://libvirt.org/logging.html for instructions.
Created attachment 830647 [details]
/var/log/libvirt/libvirtd.log

(In reply to Jiri Denemark from comment #7)
> That's a bit better now. Something is wrong with your system, as neither user
> nor group 107 (qemu) exists. I guess reinstalling the libvirt-daemon
> package could solve it. In any case, what is the output of the "getent passwd
> qemu" and "getent group qemu" commands on the host? However, I'm not sure
> this is related to the network issues in any way.

matej@wycliff: bubuntu$ getent passwd qemu
qemu:x:107:107:qemu user:/:/sbin/nologin
matej@wycliff: bubuntu$ getent group qemu
qemu:x:107:
matej@wycliff: bubuntu$

> Thus, I'll repeat my request for debug logs: could you try to get debug logs
> from libvirtd while you start the domain? See
> http://libvirt.org/logging.html for instructions.
Created attachment 830657 [details] /var/log/libvirt/libvirtd.log
Created attachment 830680 [details] ip a on host computer
Created attachment 830681 [details] `brctl show` on the host computer
Created attachment 830682 [details] `virsh net-dumpxml default` on the host computer
Created attachment 830686 [details] `virsh dumpxml santiago` (RHEL-6 machine where network does NOT work)
Created attachment 830687 [details] `virsh dumpxml jenkinsEL7` (RHEL-7 machine where network DOES work)
Created attachment 830688 [details] `virsh dumpxml debian` (network DOES work)
Created attachment 830689 [details] /var/log/libvirt/libvirtd.log
OK, so the virbr0 bridge had no interfaces attached, most likely because the default libvirt network was restarted while the domains were running. When starting from a clean state (comment 10 and on), all virtual interfaces are correctly attached to the virbr0 bridge. And since the RHEL-7 and Debian guests work fine, I think it's a guest issue.
(In reply to Jiri Denemark from comment #17)
> OK, so the virbr0 bridge had no interfaces attached, most likely because the
> default libvirt network was restarted while the domains were running.

Why should that be a problem? I would think the guest has a defined set of interfaces and the networking configuration on the host side can be pretty much dynamic.

> When starting from a clean state (comment 10 and on), all virtual interfaces
> are correctly attached to the virbr0 bridge. And since the RHEL-7 and Debian
> guests work fine, I think it's a guest issue.

I can't imagine how an issue of a guest operating system could result in the host operating system not adding the host endpoints of the virtual devices into bridges.
(In reply to Pavel Šimerda from comment #18)
> > OK, so the virbr0 bridge had no interfaces attached, most likely because the
> > default libvirt network was restarted while the domains were running.
>
> Why should that be a problem? I would think the guest has a defined set of
> interfaces and the networking configuration on the host side can be pretty
> much dynamic.

Libvirt adds virtual NICs to virbr0 only when a domain starts. When you do "virsh net-destroy default" followed by "virsh net-start default", all running domains lose network connectivity because their virtual NICs are removed from virbr0.

> > When starting from a clean state (comment 10 and on), all virtual interfaces
> > are correctly attached to the virbr0 bridge. And since the RHEL-7 and Debian
> > guests work fine, I think it's a guest issue.
>
> I can't imagine how an issue of a guest operating system could result in the
> host operating system not adding the host endpoints of the virtual devices
> into bridges.

That was the original issue. Once all domains were killed and libvirtd and the default network were restarted, all domains started since then had their virtual NICs attached to virbr0.
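The sequence described above can be sketched as a short command transcript (a sketch for a host with the default libvirt network; the domain name `santiago` is taken from this report):

```shell
# Restarting the network while domains are running detaches their vnics:
virsh net-destroy default   # tears down virbr0; vnet* taps lose their bridge
virsh net-start default     # recreates virbr0, but does NOT re-attach vnet* taps

# Recovery, as observed in this report: stop the guests first, restart the
# network, then start the guests again so libvirt re-attaches their taps.
virsh shutdown santiago
virsh net-destroy default
virsh net-start default
virsh start santiago

# Verify the tap devices are back on the bridge:
brctl show virbr0
```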
Anyway, I don't think there is a real bug in libvirt, so I'm closing this bug. Feel free to reopen if you disagree and have any data that proves there is a bug.
(In reply to Jiri Denemark from comment #20)
> Anyway, I don't think there is a real bug in libvirt, so I'm closing this
> bug. Feel free to reopen if you disagree and have any data that proves there
> is a bug.

Awesome! So with our very enterprise system I have to periodically destroy and rebuild my setup, because otherwise I lose networking to the virtual machines. And that's called NOTABUG. Lesson learned.
Come on! If you tell us how you got to the state with no network, we'll be happy to look at it and try to fix anything that is not working as expected. The only known way of getting virtual NICs detached from virbr0 is restarting libvirt's network, which is a known limitation of how networking works in libvirt and may eventually be improved in the future. But since there should be no reason to restart the network, we don't see this as a bug. If you find another way to get into this state, that could indeed be a bug, but there is no evidence of it in this report.

And I'm sorry, but when you start three domains with mostly identical XML definitions, all three get their virtual NICs created and attached to virbr0, and networking does not work in one of them while it works just fine in the others, it's only logical to start looking inside that domain rather than blaming libvirt. If you have no time to investigate why the networking doesn't work inside the domain and prefer to just kill it and reinstall from scratch, that's your choice and we can't really do much about it. There's no sense in keeping a bug open with no useful data in it. As I already said, it should be reopened if the issue appears again or you have more data to show us.
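For reference, the attachment state being discussed can be checked directly on the host (a sketch using the tools shown earlier in this report; on newer hosts `bridge link` replaces the deprecated `brctl`):

```shell
# For each running domain, list its interfaces: target tap device (vnetX),
# source network, and MAC address.
for dom in $(virsh list --name); do
    echo "== $dom =="
    virsh domiflist "$dom"
done

# Independently confirm that the vnetX taps are enslaved to virbr0.
brctl show virbr0
```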
I'm not going to comment on Matěj's specific problem, as I don't have enough knowledge about it (though I usually slightly prefer NEEDINFO over CLOSED/NOTABUG, especially when the issue seems to cause the reporter serious problems during his work and he appears ready to supply the necessary information when requested).

What makes me curious is why you are treating the fact that starting (or restarting) one of libvirt's network configurations breaks networking in running guests as less than a bug. If it's a known limitation, I'm curious where and how it is documented and what the official way around it is. I'm also quite surprised by the claim that restarting a network configuration is never needed. In that case I would expect such an operation (claimed not to be needed) either not to be available at all or to issue a warning so that the administrator knows he's doing something unsupported. I'd rather forget about the "guest operating system issue" thing.
(In reply to Jiri Denemark from comment #22)
> Come on! If you tell us how you got to the state with no network,

Two points:

a) It is not my job to debug bugs; it is actually your job to tell me what I should do to provide you with more information. I can understand that you can fail in this task (it happened to me many times with Xorg), but it is YOUR job, not mine.

b) It happened again. Just after restarting the virtual machine (and perhaps some suspend/resume cycles on my host, which is actually a notebook), I don't have network in the virtual machines. I have tried the 'virsh net-destroy default && virsh net-start default' dance with running virtual machines, but it didn't help. I had to shut down all virtual machines, do the dance, and restart the VMs; only then did I have network there. This is a completely unacceptable user experience. Reopening.

And yes, I don't know if the problem is in libvirt or elsewhere, but the whole chain (the VM is RHEL-6) is Red Hat's, so somebody should find out what's going on. I don't care who it is, but part of your job is to find that out.
> b) It happened again. Just after restarting the virtual machine (and perhaps
> some suspend/resume cycles on my host, which is actually a notebook), I
> don't have network in the virtual machines.

Perfect, we're finally getting somewhere. What do you mean by "after restarting the virtual machine"? How did you restart it, from inside the VM or using virt-manager/virsh?

> I have tried the 'virsh net-destroy default && virsh net-start default'
> dance with running virtual machines, but it didn't help.

As I already said, this is the way to break networking in all running virtual machines and should be avoided. In other words, it's expected not to help at all. BTW, it is tracked by bug 1014554 and bug 1022042.
(In reply to Jiri Denemark from comment #25)
> BTW, it is tracked by bug 1014554 and bug 1022042.

Thanks for the links.
(In reply to Jiri Denemark from comment #25)
> Perfect, we're finally getting somewhere. What do you mean by "after
> restarting the virtual machine"? How did you restart it, from inside the VM
> or using virt-manager/virsh?

Always from inside, usually "sudo poweroff".

> As I already said, this is the way to break networking in all running
> virtual machines and should be avoided. In other words, it's expected not to
> help at all. BTW, it is tracked by bug 1014554 and bug 1022042.

OK, good to know. I misunderstood you and thought just the opposite: that it could be a way to fix the networking. Somebody else also suggested a couple of virsh detach/attach-interface calls; could that be more helpful?
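For the record, the detach/attach idea mentioned above would look roughly like this (a sketch only; the domain name and MAC address are taken from this report's `ip a` output, and the `--model virtio` choice is an assumption, since the actual NIC model is in the attached domain XML, not quoted here):

```shell
# Hot-unplug the guest's NIC and plug it back in, so libvirt creates a
# fresh host-side tap device and attaches it to the bridge of the
# 'default' network (virbr0).
virsh detach-interface santiago network --mac 52:54:00:d8:e8:1e
virsh attach-interface santiago network default \
    --mac 52:54:00:d8:e8:1e --model virtio
```

Whether this actually restores connectivity here is untested; it depends on the guest re-running DHCP on the re-plugged interface.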
So the issue was identified as a bug in NetworkManager, which messes up bridges created by libvirt. See bug 1038158. I'm closing this one.