Bug 874330

Summary: First autostarted guest always has ID 1
Product: Red Hat Enterprise Linux 6
Reporter: Peter Krempa <pkrempa>
Component: libvirt
Assignee: Peter Krempa <pkrempa>
Status: CLOSED ERRATA
QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium
Priority: unspecified
Version: 6.4
CC: acathrow, bili, dallan, dyasny, dyuan, eblake, mzhan, rwu, whuang, ydu
Target Milestone: rc
Keywords: Regression
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Fixed In Version: libvirt-0.10.2-8.el6
Doc Type: Bug Fix
Last Closed: 2013-02-21 07:26:14 UTC
Type: Bug

Description Peter Krempa 2012-11-07 23:14:50 UTC
Description of problem:
When guests are selected to be autostarted on libvirtd start, the first one is always assigned ID 1, regardless of the IDs of guests that are already running. This leads to unfortunate situations like:

# virsh list
 Id    Name                           State
----------------------------------------------------
 1     guest1                         running
 1     guest2                         running


Version-Release number of selected component (if applicable):
Found in the upstream version, but the affected code hasn't been touched in ages, so recent downstream versions are affected too.


How reproducible:
100%


Steps to Reproduce:
1. Shut down or destroy all guests.
2. Restart libvirtd.
3. Start a guest; it will get ID 1.
4. Mark a different guest as autostarted.
5. Restart libvirtd.
  
Actual results:
Two guests will share ID 1.


Expected results:
Guests will have different IDs.


Additional info:
This bug only applies to the first guest started after the restart. Once that guest is running, subsequent guests continue the numbering series from the highest ID among the guests that were running at libvirtd restart.
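
To make the numbering scheme concrete, here is a minimal C sketch of a seeded ID counter as described above. The names (next_vmid, guest_start) are hypothetical illustrations, not libvirt's actual code:

#include <stdio.h>

static int next_vmid = 1;      /* seeded from the highest running ID on restart */

static int guest_start(const char *name)
{
    int id = next_vmid++;      /* each newly started guest takes the next ID */
    printf(" %d     %s     running\n", id, name);
    return id;
}

int main(void)
{
    next_vmid = 2;             /* a guest with ID 1 survived the restart */
    guest_start("guest2");     /* correctly gets ID 2, not 1 */
    return 0;
}

When the seed is computed correctly before anything else starts a guest, IDs stay unique; the bug below is about the seed being computed too late.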

Comment 2 Peter Krempa 2012-11-08 11:23:21 UTC
This is caused by a race between the thread that autostarts machines on daemon startup and the threads that re-connect to existing qemu processes. The maximum ID among the guests still running across the libvirtd restart has to be determined before the separate threads are forked. I'm working on a fix.
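
A minimal C sketch of the race, using pthreads and hypothetical names (next_vmid, reconnect_thread, autostart_thread); libvirt's actual reconnect code lives in the qemu driver and is more involved. The shared counter is deliberately unsynchronized to mirror the bug, so the outcome depends on scheduling:

#include <pthread.h>
#include <stdio.h>

static int next_vmid = 1;      /* ID counter for newly started guests */

static void *reconnect_thread(void *arg)
{
    int running_id = *(int *)arg;
    /* Buggy ordering: the seed is only raised once this thread runs... */
    if (running_id >= next_vmid)
        next_vmid = running_id + 1;
    return NULL;
}

static void *autostart_thread(void *arg)
{
    (void)arg;
    /* ...so if this thread is scheduled first, it hands out ID 1 again. */
    printf("autostarted guest got ID %d\n", next_vmid++);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    int running_id = 1;        /* a guest with ID 1 survived the restart */

    pthread_create(&t1, NULL, reconnect_thread, &running_id);
    pthread_create(&t2, NULL, autostart_thread, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}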

Comment 3 Peter Krempa 2012-11-08 13:13:25 UTC
Fix posted upstream: http://www.redhat.com/archives/libvir-list/2012-November/msg00403.html

Comment 4 Peter Krempa 2012-11-08 23:15:55 UTC
Fixed upstream:

commit 02cf57c0d0d2333dceadb7f84b08ec28a35ef540
Author: Peter Krempa <pkrempa>
Date:   Thu Nov 8 13:48:37 2012 +0100

    qemu: Fix domain ID numbering race condition
    
    When the libvirt daemon is restarted it tries to reconnect to running
    qemu domains. Since commit d38897a5d4b1880e1998394b2a37bba979bbdff1 the
    re-connection code runs in separate threads. In the original
    implementation, the maximum of the domain IDs (used as the initializer
    for numbering guests created next) was determined while libvirt was
    reconnecting to the guests.
    
    The threaded implementation opens the possibility of a race condition
    with the thread that is autostarting guests. Suppose a guest with ID 1
    is running and the daemon is restarted: if the autostart code is
    reached first, it spawns the first guest to be autostarted with ID 1
    as well. This results in the following unwanted situation:
    
     # virsh list
       Id    Name                           State
      ----------------------------------------------------
       1     guest1                         running
       1     guest2                         running
    
    This patch moves the detection code to before the point where the
    re-connection threads are started, so that the maximum ID of the
    guests being reconnected to is known in advance.
    
    The only semantic change this introduces: if the guest with the
    greatest ID quits before we are able to reconnect to it, its ID is
    still used as the greatest one, whereas without this patch the
    greatest ID of a process we could successfully reconnect to would be
    used instead.
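
The shape of the fix, as a hedged C sketch: seed the counter in a single-threaded scan before spawning any threads. All names here (next_vmid, spawn_reconnect_threads, spawn_autostart_thread, daemon_startup) are hypothetical stand-ins; the real patch modifies the qemu driver's startup path:

#include <stddef.h>
#include <stdio.h>

struct guest { int id; };

static int next_vmid = 1;

/* stubs standing in for the real thread-spawning code */
static void spawn_reconnect_threads(struct guest *g, size_t n) { (void)g; (void)n; }
static void spawn_autostart_thread(void) { }

static void daemon_startup(struct guest *running, size_t n)
{
    /* Single-threaded scan first: seed the ID counter before any
     * concurrency exists, so no thread can observe a stale value. */
    for (size_t i = 0; i < n; i++)
        if (running[i].id >= next_vmid)
            next_vmid = running[i].id + 1;

    spawn_reconnect_threads(running, n);  /* re-attach to existing qemu processes */
    spawn_autostart_thread();             /* autostarted guests now get unique IDs */
}

int main(void)
{
    struct guest running[] = { { 1 } };   /* a guest with ID 1 survived the restart */
    daemon_startup(running, 1);
    printf("next ID to hand out: %d\n", next_vmid);
    return 0;
}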

Comment 6 dyuan 2012-11-09 06:04:52 UTC
I can reproduce it with libvirt-0.10.2-7.el6.

# service libvirtd restart
Stopping libvirtd daemon:                                  [  OK  ]
Starting libvirtd daemon:                                  [  OK  ]

# virsh list
 Id    Name                           State
----------------------------------------------------

# virsh start rhel63
Domain rhel63 started

# virsh list --all
 Id    Name                           State
----------------------------------------------------
 1     rhel63                         running
 -     rhel62                         shut off

# virsh autostart rhel62
Domain rhel62 marked as autostarted

# service libvirtd restart
Stopping libvirtd daemon:                                  [  OK  ]
Starting libvirtd daemon:                                  [  OK  ]

# virsh list
 Id    Name                           State
----------------------------------------------------
 1     rhel63                         running
 1     rhel62                         running

Comment 8 EricLee 2012-11-15 08:09:26 UTC
Verified the bug with libvirt-0.10.2-8.el6:

# service libvirtd restart
Stopping libvirtd daemon:                                  [  OK  ]
Starting libvirtd daemon:                                  [  OK  ]

# virsh start raw
Domain raw started

# virsh list 
 Id    Name                           State
----------------------------------------------------
 1     raw                            running

# virsh autostart aa
Domain aa marked as autostarted

# virsh list --all --autostart
 Id    Name                           State
----------------------------------------------------
 -     aa                             shut off

# service libvirtd restart
Stopping libvirtd daemon:                                  [  OK  ]
Starting libvirtd daemon:                                  [  OK  ]

# virsh list --all
 Id    Name                           State
----------------------------------------------------
 1     raw                            running
 2     aa                             running

So moving to VERIFIED.

Comment 9 Eric Blake 2012-11-20 15:24:40 UTC
Marking this as a regression, since it was introduced in upstream commit d38897a (0.9.5); RHEL 6.1 did not have this issue.

Comment 11 errata-xmlrpc 2013-02-21 07:26:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0276.html