Bug 1259070 - cannot launch any of my Boxes created VM, qemu-bridge-helper failing
Summary: cannot launch any of my Boxes created VM, qemu-bridge-helper failing
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: libvirt
Version: 23
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Libvirt Maintainers
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2015-09-01 22:02 UTC by Jakub Steiner
Modified: 2015-11-05 00:24 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-11-05 00:24:33 UTC
Type: Bug



Description Jakub Steiner 2015-09-01 22:02:11 UTC
Description of problem:
After an update I cannot launch any VM created in Boxes, as described in upstream bug https://bugzilla.gnome.org/show_bug.cgi?id=754095

As I cannot launch it using virsh either, I assume the problem is in libvirt:

$ virsh start fedora20
error: Failed to start domain fedora20
error: failed to retrieve file descriptor for interface: Transport endpoint is not connected


Version-Release number of selected component (if applicable):
Version 1.2.13.1-2 from F23

How reproducible:
Every time.

Comment 1 Cole Robinson 2015-09-03 21:40:22 UTC
> $ virsh start fedora20
> error: Failed to start domain fedora20
> error: failed to retrieve file descriptor for interface: Transport endpoint
> is not connected
> 

There's this commit, but it makes it sound like this is just an error-reporting failure, sort of:

commit 151ba022939dad1e562c4156cb62e7a3ade6a7f5
Author: Guido Günther <agx@sigxcpu.org>
Date:   Thu Aug 13 14:19:50 2015 +0200

    Check if qemu-bridge-helper exists and is executable
    
    Otherwise the error is just
    
        error: Failed to create domain from test1.xml
        error: failed to retrieve file descriptor for interface: Transport endpoint is not connected
    
    since we don't get a sensible error after the fork.


Jakub, is qemu-common installed? Does 'sudo virsh net-list --all' show the default network as running?


> Version-Release number of selected component (if applicable):
> Version 1.2.13.1-2 from F23

F23 should be on something much newer, versions like 1.2.18-1.fc23.x86_64

Comment 2 Zeeshan Ali 2015-09-04 19:58:20 UTC
Cole, I was able to reproduce this issue too (as was at least one other person). It was on F22, but it was rare. I seriously doubt the issue is the default network, since everything worked out just fine for me on the second attempt. It seemed more like libvirt was stuck on something at autolaunch.

Comment 3 Cole Robinson 2015-09-04 21:02:28 UTC
(In reply to Zeeshan Ali from comment #2)
> Cole, I was able to reproduce this issue too (and so does at least one more
> person) and it was on F22 but it was rare. I seriously doubt the issue is
> default network since everything worked out just fine for me on second
> attempt. It seemed more like libvirt was stuck on something on autolaunch.

Okay, it wasn't mentioned in the original report that this is a one-time error. Jakub, is that actually what you are seeing? Your comment made it sound like this reproduces 100% of the time.

Comment 4 Jakub Steiner 2015-09-08 12:59:37 UTC
I get this every time.

Comment 5 Zeeshan Ali 2015-09-09 12:40:48 UTC
(In reply to Cole Robinson from comment #3)
> (In reply to Zeeshan Ali from comment #2)
> > Cole, I was able to reproduce this issue too (and so does at least one more
> > person) and it was on F22 but it was rare. I seriously doubt the issue is
> > default network since everything worked out just fine for me on second
> > attempt. It seemed more like libvirt was stuck on something on autolaunch.
> 
> Okay, it wasn't mentioned in the original report that this is a one time
> error. Jakub, is that actually what you are seeing? your comment made it
> sound like this reproduces 100% of the time

It's 100% for jimmac but not for me.

Comment 6 Zeeshan Ali 2015-09-09 12:41:53 UTC
(In reply to Jakub Steiner from comment #4)
> I get this every time.

Can you grab the info Cole asked for in comment#1 anyway?

Comment 7 Cole Robinson 2015-09-10 16:37:16 UTC
I'm guessing jimmac's issue is that virbr0 isn't available/qemu:///system default network isn't running, but it could be something else

Either way, the libvirt error reporting here stinks, since it doesn't pass up the stderr from qemu-bridge-helper, which would likely tell us the root issue. I've sent a patch to libvir-list to fix that:

https://www.redhat.com/archives/libvir-list/2015-September/msg00411.html

Comment 8 Jakub Steiner 2015-09-21 12:46:28 UTC
Sorry for missing the question. I have in the meantime updated to f23, but luckily the issue remains:

$ rpm -qa | grep qemu-common
qemu-common-2.4.0-2.fc23.x86_64
~
$ sudo virsh net-list --all
[sudo] password for jimmac: 
 Name                 State      Autostart     Persistent
----------------------------------------------------------
 default              inactive   yes           yes

Comment 9 Cole Robinson 2015-09-21 21:41:03 UTC
The default network being inactive is the problem. jimmac, please try 'sudo virsh net-start default'; maybe it's hitting some error, which would explain why it's not autostarting.
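
A minimal sketch of the check described in this comment, assuming a POSIX shell with awk. The helper name net_state is hypothetical; it only parses the table that 'virsh net-list --all' prints (as in comment 8), so it can be tried without libvirt installed.

```shell
#!/bin/sh
# net_state NAME (hypothetical helper): read `virsh net-list --all` output
# on stdin and print the State column for network NAME.
# Exits non-zero if NAME is not listed at all.
net_state() {
    awk -v name="$1" '
        $1 == name { print $2; found = 1 }
        END { exit !found }
    '
}

# Possible usage on a libvirt host (not executed here):
#   if [ "$(sudo virsh net-list --all | net_state default)" != "active" ]; then
#       sudo virsh net-start default
#   fi
```

If net-start itself fails, its error message is the useful part to report.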

Comment 10 Michael Catanzaro 2015-10-02 18:55:02 UTC
I can reproduce this 100% of the time on the first start of Boxes after booting my computer; after I close and reopen Boxes, I can see my VM properly.

qemu-common is installed.

'sudo virsh net-list --all' shows the default network as inactive and 'virsh start' always fails with the error Jakub has above. 'sudo virsh net-start default' works fine and causes 'virsh start' to work again.

Comment 11 Laine Stump 2015-10-02 19:08:55 UTC
(In reply to Michael Catanzaro from comment #10)

> 'sudo virsh net-list --all' shows the default network as inactive and 'virsh
> start' always fails with the error Jakub has above. 'sudo virsh net-start
> default' works fine and causes 'virsh start' to work again.

Does the output of 'virsh net-list --all' show Autostart "yes" for the default network? If not, run "virsh net-autostart default" so that it will be started automatically when the host system boots. If it already has autostart set, then we need to troubleshoot what fails as the system is starting up; looking for any error message logged by libvirt during system startup would be a good place to start.
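
The triage above can be sketched as a small decision helper, assuming a POSIX shell with awk. The function name next_step and the labels it prints are hypothetical; it reads 'virsh net-list --all' output on stdin and looks at the State and Autostart columns.

```shell
#!/bin/sh
# next_step NAME (hypothetical helper): read `virsh net-list --all` output
# on stdin and print a suggested next step for network NAME.
#   nothing-to-do       the network is already active
#   enable-autostart    inactive and Autostart is not "yes"
#                       (i.e. run: virsh net-autostart NAME)
#   check-startup-logs  inactive even though Autostart is "yes"
#   not-defined         NAME does not appear in the list
next_step() {
    awk -v name="$1" '
        $1 == name {
            found = 1
            if ($2 == "active") { print "nothing-to-do" }
            else if ($3 != "yes") { print "enable-autostart" }
            else { print "check-startup-logs" }
        }
        END { if (!found) print "not-defined" }
    '
}

# Possible usage on a libvirt host (not executed here):
#   sudo virsh net-list --all | next_step default
```

For the check-startup-logs case, one place to look on a systemd host is the journal for the current boot, for example 'journalctl -b -u libvirtd'; the unit name is an assumption about the local setup.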

Comment 12 Michael Catanzaro 2015-10-03 15:08:59 UTC
(In reply to Laine Stump from comment #11)
> Does the output of virsh net-list --all show Autostart "yes" for the default
> network?

Yes!

> If it is already has
> autostart set, then we need to troubleshoot what is the failure as the
> system is starting up; looking for any error message logged by libvirt
> during system startup would be a good place to start.

Erm, it's also active, and Boxes is working fine on the first run today. Well, I guess I can't reproduce 100% then... only on Thursdays and Fridays....

Looking at my journal from yesterday, the only thing I see is:

Oct 02 09:47:36 victory-road unknown[6836]: Failed to acquire pid file '/run/user/1000/libvirt/libvirtd.pid': Resource temporarily unavailable

But I also see that today, when it worked. Probably it should not be logged as important (bold red).

Comment 13 Jakub Steiner 2015-10-07 18:45:59 UTC
I am getting:

$ sudo virsh net-start default
error: Failed to start network default
error: internal error: Network is already in use by interface eno1

Comment 14 Laine Stump 2015-10-07 19:10:27 UTC
Please post the output of: netstat -nr; ifconfig eno1


Is this "host" perhaps a virtual machine running as a guest inside a physical host that also has a default network?

The solution to your problem is to edit the default network config and change it to a different subnet, e.g. change all occurrences of "122" to "123". Use the following command:

  sudo virsh net-edit default

once you have saved the file and exited, try again to start the network:

  sudo virsh net-start default
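
For anyone who prefers not to edit by hand, the same substitution can be sketched as a filter over the network XML, assuming a POSIX shell with sed. The function name shift_subnet and the file name default-123.xml are hypothetical; the filter only rewrites text, so check the result before defining it.

```shell
#!/bin/sh
# shift_subnet OLD NEW (hypothetical helper): read network XML on stdin and
# rewrite every address beginning with the OLD prefix (e.g. 192.168.122)
# so that it begins with NEW instead (e.g. 192.168.123).
shift_subnet() {
    # Escape the dots so sed matches them literally.
    old=$(printf '%s' "$1" | sed 's/\./\\./g')
    sed "s/${old}\\./$2./g"
}

# Possible usage on a libvirt host (not executed here):
#   sudo virsh net-dumpxml default | shift_subnet 192.168.122 192.168.123 > default-123.xml
#   sudo virsh net-define default-123.xml
#   sudo virsh net-start default
```

The netmask is left alone because it does not start with the old prefix.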

Comment 15 Jakub Steiner 2015-10-08 12:46:07 UTC
eno1 is my physical ethernet device, on my home network 192.168.3.0/24:

$ netstat -nr; ifconfig eno1
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
0.0.0.0         192.168.3.2     255.255.255.255 UGH       0 0          0 eno1
0.0.0.0         192.168.3.250   0.0.0.0         UG        0 0          0 eno1
10.0.0.0        0.0.0.0         255.128.0.0     U         0 0          0 tun0
10.0.0.0        0.0.0.0         255.0.0.0       U         0 0          0 tun0
10.36.7.168     0.0.0.0         255.255.255.255 UH        0 0          0 tun0
192.168.3.0     0.0.0.0         255.255.255.0   U         0 0          0 eno1
192.168.122.0   0.0.0.0         255.255.255.0   U         0 0          0 eno1
209.132.186.252 192.168.3.250   255.255.255.255 UGH       0 0          0 eno1
eno1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.3.12  netmask 255.255.255.0  broadcast 192.168.3.255
        inet6 fe80::b6b5:2fff:fee0:7f90  prefixlen 64  scopeid 0x20<link>
        ether b4:b5:2f:e0:7f:90  txqueuelen 1000  (Ethernet)
        RX packets 1202021  bytes 1627575290 (1.5 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 731498  bytes 76167347 (72.6 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device interrupt 20  memory 0xde300000-de320000  

When I changed the network from 192.168.122.0 to 192.168.123.0, it started correctly. Boxes is now able to start my VMs again.

Comment 16 Laine Stump 2015-10-08 14:57:18 UTC
(In reply to Jakub Steiner from comment #15)
> eno1 is my physical ethernet device, on my home network 192.168.3.0/24:
> 
> $ netstat -nr; ifconfig eno1
> Kernel IP routing table
> Destination     Gateway         Genmask         Flags   MSS Window  irtt
> Iface

> 192.168.122.0   0.0.0.0         255.255.255.0   U         0 0          0 eno1

Why do you have a route for 192.168.122.0/24 pointing to eno1? That is the reason libvirt wouldn't start its network.

Do you have a valid reason for that, or was it mistakenly added? (Note that libvirt wouldn't add such a route.)

Look in /etc/sysconfig/network-scripts/route-* for this route; that's the most likely place for it to be set.

> When I changed the network from 192.168.122.0 to 192.168.123, it started
> correctly. Boxes is now able to start my VMs again.

Okay, then libvirt is doing the right thing with the default network config that it has. The question then is why it was set with 192.168.122.0/24 (since libvirt will check for this when it is installed). How did you install Fedora on this system, and was libvirt a part of the initial install, or did you install it later? And when was the route for 192.168.122.0/24 via eno1 added?

For a history of this, see Bug 811967 and Bug 1146232.
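
A minimal sketch for spotting the kind of conflicting route discussed in this comment, assuming a POSIX shell with awk. The function name route_ifaces is hypothetical; it reads 'netstat -nr' output on stdin and prints the interface of each route to a given destination.

```shell
#!/bin/sh
# route_ifaces DEST (hypothetical helper): read `netstat -nr` output on
# stdin and print the interface (last column) of every route whose
# Destination column equals DEST.
route_ifaces() {
    awk -v dest="$1" '$1 == dest { print $NF }'
}

# Possible usage (not executed here):
#   netstat -nr | route_ifaces 192.168.122.0
```

With the routing table from comment 15, this prints eno1 for 192.168.122.0, which is the route that kept the default network from starting.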

Comment 17 Cole Robinson 2015-11-05 00:24:33 UTC
This bug is overloaded and going in a couple of different directions.

Since the patch I referenced in comment #7 is in f23, I'm closing this bug.

If people hit the occasional VM startup issue in the future, please file a new bug and provide the (hopefully better) libvirt error message.

If we wanna track the default network startup issue, let's make that a separate bug too

