Bug 1035873 - I don't have a network inside of the guest computer
Summary: I don't have a network inside of the guest computer
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt
Version: 7.0
Hardware: Unspecified
OS: Unspecified
Target Milestone: rc
Assignee: Libvirt Maintainers
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2013-11-28 17:18 UTC by Matěj Cepl
Modified: 2013-12-19 06:32 UTC
CC: 8 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-12-04 14:38:20 UTC
Target Upstream Version:


Attachments
output of journalctl (as root) (199.83 KB, text/plain)
2013-11-28 17:20 UTC, Matěj Cepl
virsh dumpxml santiago (2.69 KB, text/plain)
2013-11-29 00:15 UTC, Matěj Cepl
output of journalctl --all --full --boot (7.52 MB, text/plain)
2013-11-29 00:18 UTC, Matěj Cepl
/var/log/libvirt/libvirtd.log (1.47 MB, application/x-gzip)
2013-11-29 13:16 UTC, Matěj Cepl
/var/log/libvirt/libvirtd.log (3.73 MB, application/x-bz2)
2013-11-29 13:37 UTC, Matěj Cepl
ip a on host computer (2.28 KB, text/plain)
2013-11-29 14:29 UTC, Matěj Cepl
`brctl show` on the host computer (127 bytes, text/plain)
2013-11-29 14:30 UTC, Matěj Cepl
`virsh net-dumpxml default` on the host computer (610 bytes, text/plain)
2013-11-29 14:31 UTC, Matěj Cepl
`virsh dumpxml santiago` (RHEL-6 machine where network does NOT work) (3.53 KB, text/plain)
2013-11-29 14:35 UTC, Matěj Cepl
`virsh dumpxml jenkinsEL7` (RHEL-7 machine where network DOES work) (4.51 KB, text/plain)
2013-11-29 14:36 UTC, Matěj Cepl
`virsh dumpxml debian` (network DOES work) (3.51 KB, text/plain)
2013-11-29 14:38 UTC, Matěj Cepl
/var/log/libvirt/libvirtd.log (4.77 MB, application/x-bzip2)
2013-11-29 14:43 UTC, Matěj Cepl

Description Matěj Cepl 2013-11-28 17:18:05 UTC
Description of problem:
When starting a new virtual machine (a RHEL-6 guest on a RHEL-7 host), it cannot connect to the libvirt virtual network. NetworkManager inside the virtual machine just keeps asking for a connection endlessly until it times out.

Information from the host:
==========================
matej@wycliff: ~$ brctl show
bridge name	bridge id		STP enabled	interfaces
virbr0		8000.5254006988bd	yes		virbr0-nic
matej@wycliff: ~$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether f0:de:f1:99:b2:5a brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.8/24 scope global dynamic em1
       valid_lft 255121sec preferred_lft 255121sec
    inet6 fe80::f2de:f1ff:fe99:b25a/64 scope link 
       valid_lft forever preferred_lft forever
3: wlp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 24:77:03:1e:b3:58 brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.7/24 scope global dynamic wlp3s0
       valid_lft 255124sec preferred_lft 255124sec
    inet6 fe80::2677:3ff:fe1e:b358/64 scope link 
       valid_lft forever preferred_lft forever
9: sit0: <NOARP> mtu 1480 qdisc noop state DOWN 
    link/sit 0.0.0.0 brd 0.0.0.0
10: aiccu: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1280 qdisc pfifo_fast state UNKNOWN qlen 500
    link/none 
    inet6 2a01:8c00:ff00:1e8::2/64 scope global 
       valid_lft forever preferred_lft forever
    inet6 fe80::8c00:ff00:1e8:2/64 scope link 
       valid_lft forever preferred_lft forever
16: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 500
    link/ether fe:54:00:f7:cb:ee brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc54:ff:fef7:cbee/64 scope link 
       valid_lft forever preferred_lft forever
18: vnet1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 500
    link/ether fe:54:00:d8:e8:1e brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc54:ff:fed8:e81e/64 scope link 
       valid_lft forever preferred_lft forever
19: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN 
    link/ether 52:54:00:69:88:bd brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0
       valid_lft forever preferred_lft forever
20: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast master virbr0 state DOWN qlen 500
    link/ether 52:54:00:69:88:bd brd ff:ff:ff:ff:ff:ff
21: tun0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1360 qdisc pfifo_fast state UNKNOWN qlen 100
    link/none 
    inet 10.36.116.76/22 scope global tun0
       valid_lft forever preferred_lft forever
matej@wycliff: ~$ 

Information from the guest:
===========================

Nov 28 18:13:03 santiago NetworkManager[1529]: <info> (eth0): device state change: 5 -> 7 (reason 0)
Nov 28 18:13:03 santiago NetworkManager[1529]: <info> Activation (eth0) Beginning DHCPv4 transaction
Nov 28 18:13:03 santiago NetworkManager[1529]: <info> dhclient started with pid 4474
Nov 28 18:13:03 santiago NetworkManager[1529]: <info> Activation (eth0) DHCPv4 will time out in 45 seconds
Nov 28 18:13:03 santiago NetworkManager[1529]: <info> Activation (eth0) Beginning IP6 addrconf.
Nov 28 18:13:03 santiago avahi-daemon[1541]: Withdrawing address record for fe80::5054:ff:fed8:e81e on eth0.
Nov 28 18:13:03 santiago NetworkManager[1529]: <info> Activation (eth0) Stage 3 of 5 (IP Configure Start) complete.
Nov 28 18:13:03 santiago dhclient[4474]: Internet Systems Consortium DHCP Client 4.1.1-P1
Nov 28 18:13:03 santiago dhclient[4474]: Copyright 2004-2010 Internet Systems Consortium.
Nov 28 18:13:03 santiago dhclient[4474]: All rights reserved.
Nov 28 18:13:03 santiago dhclient[4474]: For info, please visit https://www.isc.org/software/dhcp/
Nov 28 18:13:03 santiago dhclient[4474]: 
Nov 28 18:13:03 santiago NetworkManager[1529]: <info> (eth0): DHCPv4 state changed nbi -> preinit
Nov 28 18:13:03 santiago dhclient[4474]: Listening on LPF/eth0/52:54:00:d8:e8:1e
Nov 28 18:13:03 santiago dhclient[4474]: Sending on   LPF/eth0/52:54:00:d8:e8:1e
Nov 28 18:13:03 santiago dhclient[4474]: Sending on   Socket/fallback
Nov 28 18:13:03 santiago dhclient[4474]: DHCPREQUEST on eth0 to 255.255.255.255 port 67 (xid=0x22dd7fa7)
Nov 28 18:13:04 santiago avahi-daemon[1541]: Registering new address record for fe80::5054:ff:fed8:e81e on eth0.*.
Nov 28 18:13:10 santiago dhclient[4474]: DHCPREQUEST on eth0 to 255.255.255.255 port 67 (xid=0x22dd7fa7)
Nov 28 18:13:19 santiago dhclient[4474]: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 6 (xid=0x255ee06e)
Nov 28 18:13:23 santiago NetworkManager[1529]: <info> (eth0): IP6 addrconf timed out or failed.
Nov 28 18:13:23 santiago NetworkManager[1529]: <info> Activation (eth0) Stage 4 of 5 (IP6 Configure Timeout) scheduled...
Nov 28 18:13:23 santiago NetworkManager[1529]: <info> Activation (eth0) Stage 4 of 5 (IP6 Configure Timeout) started...
Nov 28 18:13:23 santiago NetworkManager[1529]: <info> (eth0): device state change: 7 -> 9 (reason 5)
Nov 28 18:13:23 santiago NetworkManager[1529]: <warn> Activation (eth0) failed.
Nov 28 18:13:23 santiago NetworkManager[1529]: <info> Activation (eth0) Stage 4 of 5 (IP6 Configure Timeout) complete.
Nov 28 18:13:23 santiago NetworkManager[1529]: <info> (eth0): device state change: 9 -> 3 (reason 0)
Nov 28 18:13:23 santiago NetworkManager[1529]: <info> (eth0): deactivating device (reason: 0).
Nov 28 18:13:23 santiago NetworkManager[1529]: <info> (eth0): canceled DHCP transaction, DHCP client pid 4474
santiago:~ # ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 52:54:00:d8:e8:1e brd ff:ff:ff:ff:ff:ff
    inet6 fe80::5054:ff:fed8:e81e/64 scope link 
       valid_lft forever preferred_lft forever
santiago:~ # rpm -qf /etc/redhat-release 
redhat-release-workstation-6Workstation-6.4.0.4.el6.i686
santiago:~ # 

tcpdump -vv on the host doesn't capture any traffic when the guest's NetworkManager asks for an IP address.
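For reference, a hedged sketch of how the DHCP exchange could be watched on the host side (the interface names match this report's `ip a` output; adjust them for a different setup, and note this needs root and a live libvirt host):

```shell
# Watch DHCP requests/replies crossing the libvirt default bridge.
# virbr0 is the default libvirt bridge; ports 67/68 are DHCP server/client.
tcpdump -vv -n -i virbr0 port 67 or port 68

# Alternatively, capture on the guest's tap device directly
# (vnet1 is the interface libvirt created for this guest, per `ip a` above):
tcpdump -vv -n -i vnet1 port 67 or port 68
```

If packets appear on vnet1 but not on virbr0, that points at the tap device not being attached to the bridge rather than at the guest.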

Version-Release number of selected component (if applicable):
NetworkManager-0.9.9.0-25.git20131108.el7.x86_64
libvirt-1.1.1-13.el7.x86_64

How reproducible:
Unfortunately, 100%. I haven't managed to get networking in either of my virtual machines (the other runs RHEL-7) for two days (and I need them for work).

Comment 1 Matěj Cepl 2013-11-28 17:20:29 UTC
Created attachment 830331 [details]
output of journalctl (as root)

Comment 2 Matěj Cepl 2013-11-28 17:21:11 UTC
(In reply to Matěj Cepl from comment #1)
> Created attachment 830331 [details]
> output of journalctl (as root)

on host computer, that is.

Comment 4 Jiri Denemark 2013-11-28 22:24:33 UTC
No surprise you don't have a network in your guest, since virbr0 has no network interfaces attached (virbr0-nic is there just to make sure virbr0 has a stable MAC address). Could you attach the domain XML you used? And the log from journalctl is not helpful at all (libvirtd[30331]: [74B blob data]). Please tell journalctl to show all messages even though this brain-dead tool thinks they contain unprintable characters. Also, could you try to get debug logs from libvirtd while you start the domain? See http://libvirt.org/logging.html for instructions.
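The logging instructions referenced above amount to editing /etc/libvirt/libvirtd.conf and restarting the daemon before reproducing the problem. A minimal sketch (the particular filter categories are an example, not a prescription from this report):

```ini
# /etc/libvirt/libvirtd.conf -- debug logging sketch
# "1" is the debug level; the filter names select libvirt subsystems.
log_filters="1:qemu 1:network"
log_outputs="1:file:/var/log/libvirt/libvirtd.log"
```

After saving, restart the daemon (e.g. `systemctl restart libvirtd`) and start the domain so the failure is captured in the log file.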

Comment 5 Matěj Cepl 2013-11-29 00:15:23 UTC
Created attachment 830427 [details]
virsh dumpxml santiago

Comment 6 Matěj Cepl 2013-11-29 00:18:42 UTC
Created attachment 830428 [details]
output of journalctl --all --full --boot

Comment 7 Jiri Denemark 2013-11-29 08:29:35 UTC
That's a bit better now. Something is wrong with your system, as neither user nor group 107 (qemu) exists. I guess reinstalling the libvirt-daemon package could solve it. In any case, what is the output of the "getent passwd qemu" and "getent group qemu" commands on the host? However, I'm not sure this is related to the network issues in any way.

Thus, I'll repeat my request for debug logs, could you try to get debug logs from libvirtd while you start the domain? See http://libvirt.org/logging.html for instructions.

Comment 8 Matěj Cepl 2013-11-29 13:16:34 UTC
Created attachment 830647 [details]
/var/log/libvirt/libvirtd.log

(In reply to Jiri Denemark from comment #7)
> That's a bit better now. Something is wrong with your system as neither user
> nor group 107 (qemu) do not exist. I guess reinstalling libvirt-daemon
> package could solve it. In any case, what is the output of "getent passwd
> qemu" and "getent group qemu" commands on the host? However, I'm not sure
> this is is related to the network issues in any way.

matej@wycliff: bubuntu$ getent passwd qemu
qemu:x:107:107:qemu user:/:/sbin/nologin
matej@wycliff: bubuntu$ getent group qemu
qemu:x:107:
matej@wycliff: bubuntu$ 

> Thus, I'll repeat my request for debug logs, could you try to get debug logs
> from libvirtd while you start the domain? See
> http://libvirt.org/logging.html for instructions.

Comment 9 Matěj Cepl 2013-11-29 13:37:07 UTC
Created attachment 830657 [details]
/var/log/libvirt/libvirtd.log

Comment 10 Matěj Cepl 2013-11-29 14:29:49 UTC
Created attachment 830680 [details]
ip a on host computer

Comment 11 Matěj Cepl 2013-11-29 14:30:46 UTC
Created attachment 830681 [details]
`brctl show` on the host computer

Comment 12 Matěj Cepl 2013-11-29 14:31:37 UTC
Created attachment 830682 [details]
`virsh net-dumpxml default` on the host computer

Comment 13 Matěj Cepl 2013-11-29 14:35:43 UTC
Created attachment 830686 [details]
`virsh dumpxml santiago` (RHEL-6 machine where network does NOT work)

Comment 14 Matěj Cepl 2013-11-29 14:36:43 UTC
Created attachment 830687 [details]
`virsh dumpxml jenkinsEL7` (RHEL-7 machine where network DOES work)

Comment 15 Matěj Cepl 2013-11-29 14:38:01 UTC
Created attachment 830688 [details]
`virsh dumpxml debian` (network DOES work)

Comment 16 Matěj Cepl 2013-11-29 14:43:00 UTC
Created attachment 830689 [details]
/var/log/libvirt/libvirtd.log

Comment 17 Jiri Denemark 2013-11-29 14:55:43 UTC
OK, so the virbr0 bridge had no interfaces attached, most likely because the default libvirt network was restarted while the domains were running. When starting from a clean state (comment 10 and on), all virtual interfaces are correctly attached to the virbr0 bridge. And since the RHEL-7 and Debian guests work fine, I think it's a guest issue.

Comment 18 Pavel Šimerda (pavlix) 2013-11-29 15:22:00 UTC
(In reply to Jiri Denemark from comment #17)
> OK, so the virbr0 bridge had no interfaces attached most likely because the
> default libvirt network was restarted while the domain were running.

Why should that be a problem? I would think the guest has a defined set of interfaces and the networking configuration on the host side can be pretty much dynamic.

> When
> starting from a clean state (comment 10 and on), all virtual interfaces are
> correctly attached to virbr0 bridge. And since RHEL-7 and Debian guests work
> fine, I think it's a guest issue.

I can't imagine how an issue in a guest operating system could result in the host operating system not adding the host endpoints of the virtual devices to bridges.

Comment 19 Jiri Denemark 2013-12-02 09:34:32 UTC
(In reply to Pavel Šimerda from comment #18)
> > OK, so the virbr0 bridge had no interfaces attached most likely because the
> > default libvirt network was restarted while the domain were running.
> Why should that be a problem? I would think the guest has a defined set of
> interfaces and the networking configuration on the host side can be pretty
> much dynamic.

Libvirt adds virtual nics to virbr0 only when the domain starts. When you do "virsh net-destroy default" followed by "virsh net-start default" all running domains will lose network connectivity because their virtual nics will be removed from virbr0.
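A hedged illustration of the limitation described above, using the names from this report (this is purely a sketch — it requires a libvirt host with running guests, and running it will break their networking, which is the point):

```shell
# Guest tap devices are attached to virbr0 when each domain starts:
brctl show virbr0   # vnet0, vnet1 listed alongside virbr0-nic

# Restarting the network tears the bridge down and recreates it,
# but does NOT re-attach the running guests' tap devices:
virsh net-destroy default
virsh net-start default
brctl show virbr0   # only virbr0-nic remains; running guests lost connectivity
```

Until this limitation is lifted, guests have to be restarted (or their interfaces re-attached) after the network is restarted.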

> > When starting from a clean state (comment 10 and on), all virtual interfaces
> > are correctly attached to virbr0 bridge. And since RHEL-7 and Debian guests
> > work fine, I think it's a guest issue.
> I can't imagine how an issue of a guest operating system could result in
> host operating system not adding the host endpoints of the virtual devices
> into bridges.

That was the original issue. Once all domains were killed and libvirtd and the default network were restarted, all domains started since then had their virtual nics attached to virbr0.

Comment 20 Jiri Denemark 2013-12-02 09:45:22 UTC
Anyway, I don't think there is a real bug in libvirt, so I'm closing this bug. Feel free to reopen if you disagree and have any data that proves there is a bug.

Comment 21 Matěj Cepl 2013-12-02 13:16:25 UTC
(In reply to Jiri Denemark from comment #20)
> Anyway, I don't think there is a real bug in libvirt so I'm closing this
> bug. Feel free to reopen if disagree and have any data that prove there is a
> bug.

Awesome! So with our very enterprise system I have to periodically destroy and rebuild my setup, because otherwise I lose networking to the virtual machines. And that's called NOTABUG. Lesson learned.

Comment 22 Jiri Denemark 2013-12-02 14:07:11 UTC
Come on! If you tell us how you got into the state with no network, we'll be happy to look at it and try to fix anything that is not working as expected. The only known way of getting virtual nics detached from virbr0 is restarting libvirt's network, which is a known limitation of how networking works in libvirt and may eventually be improved in the future. But since there should be no reason to restart the network, we don't see this as a bug. If you find another way to get into this state, that could indeed be a bug, but there is no evidence of it in this report.

And I'm sorry, but when you start three domains with mostly identical XML definitions, all three get their virtual nics created and attached to virbr0, and networking does not work in one of them while it works just fine in the others, it's only logical that you should start looking inside the domain rather than blaming libvirt. And if you have no time to investigate why networking doesn't work inside the domain and prefer to just kill it and reinstall from scratch, that's your choice and we can't really do much about it.

There's no sense in keeping a bug open with no useful data in it. As I already said, it should be reopened if the issue appears again or if you have more data to show us.

Comment 23 Pavel Šimerda (pavlix) 2013-12-02 15:05:40 UTC
I'm not going to comment on Matěj's specific problem, as I don't have enough knowledge about it (though I usually slightly prefer NEEDINFO over CLOSED/NOTABUG, especially when the issue seems to cause the reporter serious problems in his work and he appears ready to supply the necessary information when requested).

But what makes me curious is why you are treating the fact that starting (or restarting) one of libvirt's network configurations breaks guest networking as less than a bug. And if it's a known limitation, I'm curious where and how it's documented and what the official way to work around the issue is.

Also, I'm quite surprised by the claim that it's never necessary to restart a network configuration. In that case I would expect such an operation (claimed not to be needed) either not to be available at all, or to issue a warning so that the administrator knows he's doing something unsupported.

I'd rather forget about the “guest operating system issue” thing.

Comment 24 Matěj Cepl 2013-12-03 08:04:28 UTC
(In reply to Jiri Denemark from comment #22)
> Come on! If you tell us how you can get to the state with no network,

Two points:

a) It is not my job to debug bugs; it is actually your job to tell me what I should do to provide you with more information. I can understand that you may fail at this task (it happened to me many times with Xorg), but it is YOUR job, not mine.
b) It happened again. Just after restarting the virtual machine (and perhaps some suspend/resume cycles on my host, which is actually a notebook), I don't have network in the virtual machines. I have tried the 'virsh net-destroy default && virsh net-start default' dance with the virtual machines running, but it didn't help. I had to shut down all virtual machines, do the dance, restart the VMs, and only then did I have network there.

This is a completely unacceptable user experience. Reopening. And yes, I don't know if the problem is in libvirt or elsewhere, but the whole chain (the VM is RHEL-6) is Red Hat's, so somebody should find out what's going on. And I don't care who it is; part of your job is to find them.

Comment 25 Jiri Denemark 2013-12-03 09:11:46 UTC
> b) It happened again. Just after restarting the virtual machine (and perhaps
> some suspend/resume cycles in my host, which is actually a notebook), I
> don't have network in virtual machines.

Perfect, we're finally getting somewhere. What do you mean by "after restarting the virtual machine"? How did you restart it, from inside the VM or using virt-manager/virsh?

> Now, I have tried to do 'virsh net-destroy default && virsh net-start default'
> dance with running virtual machines but it didn't help.

As I already said, this is the way to break networking in all running virtual machines and should be avoided. In other words, it's expected not to help at all. BTW, it is tracked by bug 1014554 and bug 1022042.

Comment 26 Pavel Šimerda (pavlix) 2013-12-03 10:04:31 UTC
(In reply to Jiri Denemark from comment #25)
> BTW, it is tracked by bug 1014554 and bug 1022042.

Thanks for the links.

Comment 27 Matěj Cepl 2013-12-03 14:14:18 UTC
(In reply to Jiri Denemark from comment #25)
> Perfect, we're finally getting somewhere. What do you mean by "after
> restarting the virtual machine"? How did you restart it, from inside the VM
> or using virt-manager/virsh?

Always from inside, usually with "sudo poweroff".

> As I already said, this is the way to break networking in all running
> virtual machines and should be avoided. In other words, it's expected not to
> help at all. BTW, it is tracked by bug 1014554 and bug 1022042.

OK, good to know. I misunderstood you and thought just the opposite: that it could be a way to fix the networking. Somebody else also suggested a couple of virsh detach-interface/attach-interface calls; could that be more helpful?
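For the record, a hedged sketch of the detach/attach approach mentioned above (the domain name and MAC address are the ones from this report; whether this actually restores connectivity in this situation is exactly the open question):

```shell
# Hot-unplug the guest's NIC from the running domain, then plug it back,
# so libvirt creates a fresh tap device and attaches it to virbr0.
virsh detach-interface santiago network --mac 52:54:00:d8:e8:1e --live
virsh attach-interface santiago network default --mac 52:54:00:d8:e8:1e --live
```

The guest will see the device disappear and reappear, so NetworkManager inside it would still need to re-run DHCP on the interface.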

Comment 28 Jiri Denemark 2013-12-04 14:38:20 UTC
So the issue was identified as a bug in NetworkManager, which messes up bridges created by libvirt. See bug 1038158. I'm closing this one.

