Bug 1397440

Summary: port allocator allocates the same port for multiple guests (race condition)
Product: Red Hat Enterprise Linux 7 Reporter: Peter Krempa <pkrempa>
Component: libvirtAssignee: Pavel Hrdina <phrdina>
Status: CLOSED ERRATA QA Contact: Fangge Jin <fjin>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 7.4CC: agedosier, alexl, alon, berrange, cfergeau, clalancette, congzhangzh, dhill, dyuan, extras-qa, fjin, hdegoede, itamar, jforbes, laine, libvirt-maint, marcandre.lureau, mark, pkrempa, rbalakri, sandmann, uril, veillard, victortoso, virt-maint, xuzhang
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: libvirt-3.1.0-1.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1390413 Environment:
Last Closed: 2017-08-01 17:19:14 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1390413    
Bug Blocks: 1401400    

Description Peter Krempa 2016-11-22 14:24:16 UTC
+++ This bug was initially created as a clone of Bug #1390413 +++

Description of problem:
Once in a while, libvirtd will fail to start a VM and retrying soon after will in the end start it properly:
Oct 31 22:24:12 zappa.orion libvirtd[31784]: Unable to read from monitor: Connection reset by peer
Oct 31 22:24:12 zappa.orion libvirtd[31784]: internal error: qemu unexpectedly closed the monitor: 
                                             (process:19455): Spice-WARNING **: reds.c:2651:reds_init_socket: reds_init_socket: binding socket to 0.0.0.0:5903 failed
                                             
                                             (process:19455): Spice-WARNING **: reds.c:3650:reds_set_video_codecs: spice: unsupported video encoder gstreamer
                                             
                                             (process:19455): Spice-WARNING **: reds.c:3650:reds_set_video_codecs: spice: unsupported video encoder gstreamer
                                             
                                             (process:19455): Spice-WARNING **: reds.c:3650:reds_set_video_codecs: spice: unsupported video encoder gstreamer
                                             2016-11-01T02:24:12.646437Z qemu-system-x86_64: failed to initialize spice server


Version-Release number of selected component (if applicable):


How reproducible:
Random

Steps to Reproduce:
1. Create 10 VMs
2. Try to start them sequentially
3.

Actual results:
Sometimes it fails

Expected results:
Always succeed

Additional info:

--- Additional comment from Peter Krempa on 2016-11-01 05:35:09 CET ---

Could you please post the tail of the VM log file including the command line for qemu. The log file is located in /var/log/libvirt/qemu/$VMNAME.log.

Looks like the port allocator code in libvirt may be experiencing a race condition, which would manifest itself by qemu not being able to listen on the selected socket as other process would grab it. This should be visible in the log file.

Additionally log files of other VMs may show that a different VM attempted to use the same socket.

--- Additional comment from Peter Krempa on 2016-11-22 15:22:18 CET ---

From the logs provided:

I picked a time when multiple VMs are started and listed the port for SPICE and whether startup was successful:

control-0-rhosp8.log-20161006:2016-09-28 16:24:59.620
SPICE port: 5907
startup success

ceph-1-rhosp8.log-20161007:2016-09-28 16:24:57.822
SPICE port: 5901
startup failed

ceph-2-rhosp8.log-20161007:2016-09-28 16:24:59.155
SPICE port: 5902
startup success

control-1-rhosp8.log-20161007:2016-09-28 16:25:01.175
SPICE port: 5903
startup success

compute-0-rhosp8.log-20161007:2016-09-28 16:24:59.305
SPICE port: 5902
startup failed

control-2-rhosp8.log-20161007:2016-09-28 16:25:08.553
SPICE port: 5904
startup success

ceph-0-rhosp8.log-20161006:2016-09-28 16:24:57.443
SPICE port: 5901
startup success

From the above it's clear that the port allocator code allocated the same port for multiple VMs, which made some of them fail once they actually tried to bind to the port.

Note that the following message "(process:19455): Spice-WARNING **: reds.c:3650:reds_set_video_codecs: spice: unsupported video encoder gstreamer" is not relevant to this bug.

Comment 1 Fangge Jin 2016-11-28 02:08:45 UTC
This bug CAN NOT be reproduced with libvirt-2.0.0-10.el7_3.2.x86_64.

This bug CAN be reproduced with libvirt-2.4.0-1.fc26.x86_64:

# for i in {1..20}; do virsh create guest$i.xml & date; done
error: Domain guest2 created from guest2.xml
Failed to create domain from guest3.xml
error: 
internal error: process exited while connecting to monitor: 
(process:11025): Spice-WARNING **: reds.c:2562:reds_init_socket: reds_init_socket: binding socket to 0.0.0.0:5900 failed
2016-11-25T04:02:35.976936Z qemu-system-x86_64: failed to initialize spice server

error: Failed to create domain from guest1.xml
error: internal error: process exited while connecting to monitor: 
(process:11038): Spice-WARNING **: reds.c:2562:reds_init_socket: reds_init_socket: binding socket to 0.0.0.0:5900 failed
2016-11-25T04:02:36.072754Z qemu-system-x86_64: failed to initialize spice server

Comment 2 Pavel Hrdina 2017-02-24 09:12:01 UTC
Upstream commit:

commit c23b7b81db58c61b0adc98095dca7bfbad74fa79
Author: Pavel Hrdina <phrdina>
Date:   Fri Feb 24 02:28:40 2017 +0100

    qemu_process: spice: don't release used port

Comment 4 Fangge Jin 2017-03-10 09:10:42 UTC
I can reproduce this bug with build libvirt-2.5.0-1.el7.x86_64.

Verify PASS with build libvirt-3.1.0-2.el7.x86_64

Steps:
1.Prepare 20 guest xml files with spice autoport setting:
    <graphics type='spice' autoport='yes'>
      <listen type='address'/>
    </graphics>

2.Start 20 guests concurrently:
# for i in {1..20}; do virsh create guest$i.xml & date; done

3.Check active guests, all the 20 guests are running:
# virsh list
 Id    Name                           State
----------------------------------------------------
 8     guest1                         running
 9     guest3                         running
 10    guest4                         running
 11    guest2                         running
 12    guest5                         running
 13    guest9                         running
 14    guest10                        running
 15    guest12                        running
 16    guest13                        running
 17    guest16                        running
 18    guest6                         running
 19    guest19                        running
 20    guest20                        running
 21    guest18                        running
 22    guest7                         running
 23    guest8                         running
 24    guest11                        running
 25    guest15                        running
 26    guest17                        running
 27    guest14                        running

4.Check the display URI, port 5901~5920 are used:
# for i in {1..20}; do virsh domdisplay guest$i ; done
spice://127.0.0.1:5903
spice://127.0.0.1:5901
spice://127.0.0.1:5902
spice://127.0.0.1:5904
spice://127.0.0.1:5910
spice://127.0.0.1:5913
spice://127.0.0.1:5915
spice://127.0.0.1:5905
spice://127.0.0.1:5906
spice://127.0.0.1:5916
spice://127.0.0.1:5907
spice://127.0.0.1:5908
spice://127.0.0.1:5919
spice://127.0.0.1:5917
spice://127.0.0.1:5909
spice://127.0.0.1:5918
spice://127.0.0.1:5914
spice://127.0.0.1:5912
spice://127.0.0.1:5911
spice://127.0.0.1:5920

Comment 5 errata-xmlrpc 2017-08-01 17:19:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1846

Comment 6 errata-xmlrpc 2017-08-01 23:59:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1846