I have two VMs configured in virt-manager on my F26 box, both with a SPICE server. I set both up identically: Type "Spice server", Address "Hypervisor default", Port "Auto", TLS port "Auto". But if I start one and then try to start the other, I get an error:

Error starting domain: internal error: qemu unexpectedly closed the monitor: (process:7985): Spice-WARNING **: reds.c:2577:reds_init_socket: listen: Address already in use
2017-03-15T23:57:46.288270Z qemu-system-x86_64: failed to initialize spice server

Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 88, in cb_wrapper
    callback(asyncjob, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 124, in tmpcb
    callback(*args, **kwargs)
  File "/usr/share/virt-manager/virtManager/libvirtobject.py", line 83, in newfn
    ret = fn(self, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/domain.py", line 1404, in startup
    self._backend.create()
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1035, in create
    if ret == -1: raise libvirtError ('virDomainCreate() failed', dom=self)
libvirtError: internal error: qemu unexpectedly closed the monitor: (process:7985): Spice-WARNING **: reds.c:2577:reds_init_socket: listen: Address already in use
2017-03-15T23:57:46.288270Z qemu-system-x86_64: failed to initialize spice server

It seems like it tries to use the same port for the SPICE server for both VMs. If I set one to 'manual' and give it port 5901 or something, I can start both successfully.

[root@adam openqa (master %)]# rpm -q virt-manager libvirt
virt-manager-1.4.0-6.fc26.noarch
libvirt-3.0.0-2.fc26.x86_64
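For reference, the 'manual' workaround corresponds to graphics XML roughly like this (the port number and listen address are just example values, not my exact config):

<graphics type='spice' port='5901' autoport='no'>
  <listen type='address' address='127.0.0.1'/>
</graphics>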
I just tried with libvirt-daemon-3.1.0-1.fc26.x86_64 and couldn't reproduce; both VMs received unique ports. Can you try that version, restart libvirtd, and test again? If you still hit the issue, please provide the output of 'sudo virsh dumpxml $vmname' for each VM.
Yeah, after a reboot with 3.1.0 it seems OK. Will re-open if it shows up again.
Confirmed in libvirt-daemon-3.2.0-1.fc26.x86_64. Deleting all of the serial ports seemed to help, but it was kinda screwy. I apologize for the lack of proper debugging, but I had to get the VMs back up.
This is not limited to F26; the problem still occurs with 3.5.0-3.fc27. I don't know if it was ever fixed in 3.1.0, but if it was, then it's a regression. It's easy to reproduce: just start several VMs, all of which have SPICE configured in their XML as below. Even with just two or three VMs running, one will ultimately fail to start.

<graphics type='spice' autoport='yes'>
  <listen type='address'/>
  <image compression='off'/>
</graphics>

The error shown in virt-manager is:

Error starting domain: internal error: process exited while connecting to monitor: 2017-07-29T16:25:52.410914Z qemu-system-x86_64: -chardev pty,id=charserial0: char device redirected to /dev/pts/8 (label charserial0)
(process:11973): Spice-WARNING **: reds.c:2577:reds_init_socket: listen: Address already in use
2017-07-29T16:25:52.412280Z qemu-system-x86_64: failed to initialize spice server

Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 88, in cb_wrapper
    callback(asyncjob, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 124, in tmpcb
    callback(*args, **kwargs)
  File "/usr/share/virt-manager/virtManager/libvirtobject.py", line 83, in newfn
    ret = fn(self, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/domain.py", line 1479, in startup
    self._backend.create()
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1039, in create
    if ret == -1: raise libvirtError ('virDomainCreate() failed', dom=self)
libvirtError: internal error: process exited while connecting to monitor: 2017-07-29T16:25:52.410914Z qemu-system-x86_64: -chardev pty,id=charserial0: char device redirected to /dev/pts/8 (label charserial0)
(process:11973): Spice-WARNING **: reds.c:2577:reds_init_socket: listen: Address already in use
2017-07-29T16:25:52.412280Z qemu-system-x86_64: failed to initialize spice server

The logged command line contained:

LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=spice /usr/bin/qemu-kvm -name guest=rhel6.9,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-12-rhel6.9/master-key.aes -machine pc-i440fx-2.9,accel=kvm,usb=off,vmport=off,dump-guest-core=off -cpu Broadwell-noTSX -m 1024 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid 16d8840e-5e03-4cc2-a381-8f2fcf76d4b8 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-12-rhel6.9/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x4.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x4 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x4.0x1 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x4.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/var/lib/libvirt/images/rhel6.9.qcow2,format=qcow2,if=none,id=drive-virtio-disk0 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive file=/var/lib/libvirt/isos/rhel-server-6.9-x86_64-dvd.iso,format=raw,if=none,id=drive-ide0-0-0,readonly=on -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -netdev tap,fd=29,id=hostnet0,vhost=on,vhostfd=31 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:d7:e2:32,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/domain-12-rhel6.9/org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -chardev spicevmc,id=charchannel1,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0 -device usb-tablet,id=input0,bus=usb.0,port=1 -spice port=5900,addr=127.0.0.1,disable-ticketing,image-compression=off,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2 -chardev spicevmc,id=charredir0,name=usbredir -device usb-redir,chardev=charredir0,id=redir0,bus=usb.0,port=2 -chardev spicevmc,id=charredir1,name=usbredir -device usb-redir,chardev=charredir1,id=redir1,bus=usb.0,port=3 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -msg timestamp=on

2017-07-29T16:25:52.410914Z qemu-system-x86_64: -chardev pty,id=charserial0: char device redirected to /dev/pts/8 (label charserial0)
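For what it's worth, a quick way to see which SPICE port each running VM actually got (the domain name 'rhel6.9' below is just my example) is:

$ sudo virsh domdisplay rhel6.9
# prints something like spice://127.0.0.1:5900
$ sudo virsh dumpxml rhel6.9 | grep -A2 "graphics type='spice'"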
This also occurs if you try to start a single VM when something else, e.g. lightdm, is already listening on port 5900. Pretty clearly libvirt is now effectively hardwired to use port 5900 and only port 5900; the autoport search is broken.

An inadequate workaround is to set the listen type on the VM to 'none'. virt-manager's viewer can still connect through libvirt, and it's also possible to manually launch virt-viewer -a. But cockpit can't connect to a VM which isn't listening, and virt-manager's new VM wizard can't create a new VM because it assumes it can create it listening on port 5900, so the creation process aborts. (Although you can check "Customize configuration before install" and set the listen type to none before continuing.) Yes, it's possible to manually configure each VM with a unique port, but what an unnecessary nuisance.

This failure occurs on F25 using 2.2.1-2.fc25.x86_64. I'd have sworn it worked on F25 before with an earlier version, but 2.2.0-1.fc25.x86_64 has the same problem. The last release where I can say for sure this worked was F23, using 1.2.18.4-1.fc23.x86_64. I never used F24, so I can't comment on that. I would gladly try 3.1.0, but that's no longer available for downgrade in the repos.
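For reference, the 'none' workaround corresponds to graphics XML roughly like this (other attributes trimmed for brevity):

<graphics type='spice'>
  <listen type='none'/>
</graphics>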
Michael Hampton, do you have something else on the host occupying one of the 590X ports? Try 'nmap localhost'.

Dennis, yes, this is a known issue. Libvirt doesn't actually track whether the port is in use, just which ports have been assigned to other VMs. If you are running another service on your host that uses port 5900, either change the service to use a different port, or edit remote_display_port_min in /etc/libvirt/qemu.conf and restart libvirtd.
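As a concrete sketch, that qemu.conf change looks like the following (the values here are just examples; pick a range that avoids whatever else is listening on your host):

# /etc/libvirt/qemu.conf
remote_display_port_min = 5910
remote_display_port_max = 65535

followed by 'systemctl restart libvirtd'.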
Cole, no, nothing else is listening on any of the 59xx ports.
Michael, so if you start from a state with no VMs running, and begin starting several VMs in sequence, the issue is 100% reproducible? If so, please reproduce and then post 'sudo virsh dumpxml $vmname' for each VM in the order you tried to start them. This will give the runtime VM state for the successfully started VMs (which will show the port they were allocated) and the offline config for the VM that failed.
So I just hit this myself. Not sure if it's exactly what other people are hitting, but this reliably reproduces:

1) Stop all VMs
2) Start a VM with spice autoport
3) systemctl restart libvirtd
4) Start a second VM with spice autoport -> failure

Seems we don't regenerate the list of occupied ports on daemon restart. I tested back to the F24 libvirt version and it was present there too, so I'm not sure if this ever worked. Not positive this is what you are seeing, Michael... the info requested in comment #8 will confirm. Note: starting a second VM between steps 2 and 3 always seems to work for me; it's the libvirtd restart that is the key.
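A minimal command-line version of those steps (the domain names vm1 and vm2 are placeholders for your own spice-autoport domains):

$ sudo virsh start vm1
$ sudo systemctl restart libvirtd
$ sudo virsh start vm2    # fails with the 'reds_init_socket: listen: Address already in use' error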
Created attachment 1311200 [details] libvirt XML config for running VM (comment #8)
Created attachment 1311201 [details] libvirt XML config for failed VM (comment #8)
Right, it appears that restarting libvirtd _while VMs are running_ causes the problem. If the VMs that were running when libvirtd was restarted are subsequently stopped, the problem then goes away again and any number of VMs can start. So, here, I start VM "rhel6.9", restart libvirtd while it's still running, then try to start VM "wordpress.example.com", which fails with the given error. The XML configs are now attached.
(In reply to Cole Robinson from comment #6)
> Michael Hampton, do you have something else on the host occupying one of
> the 590X ports? Try 'nmap localhost'.
>
> Dennis, yes, this is a known issue. Libvirt doesn't actually track whether
> the port is in use, just which ports have been assigned to other VMs. If
> you are running another service on your host that uses port 5900, either
> change the service to use a different port, or edit remote_display_port_min
> in /etc/libvirt/qemu.conf and restart libvirtd.

Cole, to some extent what you're saying here does describe the observed behavior, but even what you say is expected behavior differs from historical behavior. By that I mean libvirtd behavior in F23 and earlier, where occupied ports were detected and skipped over regardless of who/what occupied the port. For libvirtd to be aware only of the ports it has assigned itself is not useful.

There also seems to be a separate issue creating confusion. It turns out the Debian community has experienced similar problems, see https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=863266. In particular, someone in comment #53 of that thread observes that the kernel version matters:

Working: 4.9.0-3-amd64 #1 SMP Debian 4.9.30-2+deb9u2 (2017-06-26)
Broken: 4.11.0-1-amd64 #1 SMP Debian 4.11.6-1 (2017-06-19)

So let me try to put that all together. I am running F26, kernel 4.11.11-300.fc26.x86_64, libvirt 3.2.1-4.fc26.x86_64, and virt-manager 1.4.1-2.fc26.noarch. With all that in place I can confirm the behavior you describe in comment #9 and Michael describes in comment #12.

Normally I have lightdm configured to offer Xvnc service on localhost port 5900, and vino-server starts upon login on localhost port 5901. When I try to start a VM, it insists on trying to use port 5900 and fails. If I then reconfigure lightdm to not listen at all, starting a VM succeeds and it takes port 5900. Trying to start a second VM also succeeds and it takes port 5902. This seems consistent with Michael's report, with a couple of provisos:

* libvirtd does indeed detect and skip busy ports other than its own
* Detection works only if the busy port in some sense "belongs" to my login session

lightdm's use of port 5900 had nothing to do with my login session, so libvirtd didn't see it as busy and failed. vino-server on 5901 did "belong" to me, so that was noticed and the port was skipped. Restarting libvirtd or not has nothing to do with this, except that for any running VMs the restart seems to break the association between the VM and my login session. The new instance of libvirtd won't see the running VM as "mine" and fails to discover the port is busy, hence starting a new VM fails. And btw, setting SELinux to permissive doesn't change anything. Very strange.

And just to confuse matters more... I had a months-old VM kicking around with an early version of F25, and it still had kernel 4.9.9-200.fc25.x86_64 on it. So I rsynced the boot and module files for that kernel onto my bare-metal F26 system, rebuilt its initramfs and booted it up. Result: with everything exactly the same as in my earlier description excepting _only_ the older kernel, libvirtd behaves according to its F23 and earlier historical behavior. That is, it detects and skips over used ports regardless of whether they're "mine" or not. And this is actually useful.

I sure hope this helps somehow.
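In case it's useful to anyone else poking at this, something like the following shows which process (lightdm, vino-server, qemu, ...) holds each 59xx listener:

$ sudo ss -ltnp | grep ':59'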
FWIW, my initial case didn't involve any daemon restarts, I don't think. But the bug seems to have morphed a bit since then.
This bug appears to have been reported against 'rawhide' during the Fedora 27 development cycle. Changing version to '27'.
Thanks Dennis, good detective work. Indeed, your use case of lightdm on port 5900 should be working with libvirt; I didn't realize libvirt would attempt to bind() to a port first to see if it's in use. That's the behavior that has changed across kernel versions and is the root of this issue. I'll see if I can narrow it down a bit between kernel versions and maybe find a commit that explains things.
Created attachment 1314915 [details]
Program to help reproduce the issue

This bit of code helps demonstrate the issue. There's definitely something weird going on. The port check logic is adapted from libvirt's code; I can't speak to all the bits there.

In a separate terminal, run 'qemu-kvm -vnc 127.0.0.1:0' to grab port 5900. Then do this:

$ gcc bind-collision.c && ./a.out
bind: Address already in use
AF_INET check failed.
$ gcc -D CHECK_IPV6 bind-collision.c && ./a.out
AF_INET6 success
AF_INET success
$ gcc bind-collision.c && ./a.out
AF_INET success

By default the program will just check whether we can bind to IPv4 port 5900. When qemu is holding that port, this check rightfully fails. When we compile with -D CHECK_IPV6, this adds an IPv6 check first. This succeeds, and then all subsequent IPv4 checks also succeed, which seems wrong.
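Since the attachment itself isn't inlined here, below is a minimal sketch of what such a check program looks like. This is an approximation, not the exact attachment; libvirt's real check also sets IPV6_V6ONLY and handles more error cases.

/* bind-collision sketch: try to bind() port 5900, optionally on IPv6 first. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

#define PORT 5900

/* Returns 0 if we could bind (port looks free), -1 otherwise. */
static int check_bind(int family)
{
    int fd, ret, reuse = 1;
    struct sockaddr_storage ss;
    socklen_t len;

    memset(&ss, 0, sizeof(ss));
    if (family == AF_INET6) {
        struct sockaddr_in6 *a6 = (struct sockaddr_in6 *)&ss;
        a6->sin6_family = AF_INET6;
        a6->sin6_port = htons(PORT);   /* address left as in6addr_any */
        len = sizeof(*a6);
    } else {
        struct sockaddr_in *a4 = (struct sockaddr_in *)&ss;
        a4->sin_family = AF_INET;
        a4->sin_port = htons(PORT);    /* address left as INADDR_ANY */
        len = sizeof(*a4);
    }

    fd = socket(family, SOCK_STREAM, 0);
    if (fd < 0) {
        perror("socket");
        return -1;
    }
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &reuse, sizeof(reuse));

    ret = bind(fd, (struct sockaddr *)&ss, len);
    if (ret < 0)
        perror("bind");
    close(fd);
    return ret;
}

int main(void)
{
#ifdef CHECK_IPV6
    printf("AF_INET6 %s\n", check_bind(AF_INET6) == 0 ? "success" : "check failed.");
#endif
    printf("AF_INET %s\n", check_bind(AF_INET) == 0 ? "success" : "check failed.");
    return 0;
}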
Kernel 4.10.13 from F25 works. 4.11-rc1 reproduces the issue, as does 4.12.5 from F26, so this is still relevant. There are some commits in 4.11 that sound related ("inet: don't check for bind conflicts twice when searching for a port" and the commits around it). Moving this to the kernel component.
Confirmed still an issue with 4.13.0-0.rc4.git4.1.fc27.x86_64
*** Bug 1487674 has been marked as a duplicate of this bug. ***
This bug affects Fedora 26 and it should be fixed there as well, changing version to '26'.
Kernel maintainers, any advice or assistance on escalating this? We're getting a decent number of reports from virt users.
Sorry, I think this got lost between travel and Flock. I started a bisect but it's still churning. Next week is also Plumbers so if I can't finish before then I'll bother upstream and see if they have any idea.
Completed a bisect based on your standalone test case, found

commit 319554f284dda9f2737d09df82ba3610bd8ddea3
Author: Josef Bacik <jbacik>
Date:   Thu Jan 19 17:47:46 2017 -0500

    inet: don't use sk_v6_rcv_saddr directly

    When comparing two sockets we need to use inet6_rcv_saddr so we get a NULL
    sk_v6_rcv_saddr if the socket isn't AF_INET6, otherwise our comparison
    function can be wrong.

    Fixes: 637bc8b ("inet: reset tb->fastreuseport when adding a reuseport sk")
    Signed-off-by: Josef Bacik <jbacik>
    Signed-off-by: David S. Miller <davem>

Confirmed we no longer see the socket behavior when reverted. Can you test https://koji.fedoraproject.org/koji/taskinfo?taskID=21763613 when it finishes?
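For anyone wanting to repeat this, the bisect flow was roughly like the following. This is a reconstruction from the comments above, not the exact procedure used, and the kernel build/boot steps are omitted:

git bisect start v4.11-rc1 v4.10.13   # known bad, known good (per comment above)
# for each candidate: build and boot the kernel, run 'qemu-kvm -vnc 127.0.0.1:0',
# then run the bind-collision reproducer compiled with -D CHECK_IPV6
git bisect bad     # if the follow-up AF_INET check wrongly succeeds
git bisect good    # if the AF_INET check correctly fails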
Urgh, some of the other arches failed but x86_64 did complete https://koji.fedoraproject.org/koji/taskinfo?taskID=21763614
(In reply to Laura Abbott from comment #25)
> Urgh, some of the other arches failed but x86_64 did complete
> https://koji.fedoraproject.org/koji/taskinfo?taskID=21763614

I confirmed that fixes the actual libvirt test case, thanks Laura!
Upstream got back quickly with an untested but simple patch: https://koji.fedoraproject.org/koji/taskinfo?taskID=21829240. Can you give this a spin when it finishes?
(In reply to Laura Abbott from comment #27)
> Upstream got back quickly with an untested but simple patch:
> https://koji.fedoraproject.org/koji/taskinfo?taskID=21829240. Can you give
> this a spin when it finishes?

Thanks, but that build doesn't fix it. The comment #17 reproducer and the libvirt test case still trigger for me.
Upstream gave another patch for testing https://koji.fedoraproject.org/koji/taskinfo?taskID=21846472
(In reply to Laura Abbott from comment #29)
> Upstream gave another patch for testing
> https://koji.fedoraproject.org/koji/taskinfo?taskID=21846472

Still reproduces the issue. I tried my own separate build too. I'll watch the upstream thread now and reply there.
(In reply to Laura Abbott from comment #24)
> Completed a bisect based on your standalone test case, found
>
> commit 319554f284dda9f2737d09df82ba3610bd8ddea3
> Author: Josef Bacik <jbacik>
> Date:   Thu Jan 19 17:47:46 2017 -0500
>
>     inet: don't use sk_v6_rcv_saddr directly
>
> Confirmed we no longer see the socket behavior when reverted.

I can confirm that reverting this patch solves the libvirt issue, at least on a test system. However, it breaks IPv6 networking (an sshd in default configuration does not receive incoming IPv6 connections any more; IPv4 is unaffected).

Greetings
Marc
Laura and Cole brought that up in linux-kernel and netdev, and Josef provided four patches that solve the issue for me. See the thread from http://www.spinics.net/lists/netdev/msg454644.html and onwards.
kernel-4.13.4-200.fc26 has been submitted as an update to Fedora 26. https://bodhi.fedoraproject.org/updates/FEDORA-2017-c0e81a1c7a
kernel-4.13.4-200.fc26 has been pushed to the Fedora 26 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-c0e81a1c7a
kernel-4.13.4-200.fc26 has been pushed to the Fedora 26 stable repository. If problems still persist, please make note of it in this bug report.
Some more info for anyone else watching: the main upstream kernel fix seems to be

commit cbb2fb5c72f48d3029c144be0f0e61da1c7bccf7
Author: Josef Bacik <jbacik>
Date:   Fri Sep 22 20:20:06 2017 -0400

    net: set tb->fast_sk_family

It doesn't seem to be in any stable releases at the moment, but it is in 4.14.