Bug 677864

Summary: bootindex causes qemu exit for conflict on hotplug, doesn't support removal
Product: Red Hat Enterprise Linux 6 Reporter: juzhang <juzhang>
Component: qemu-kvmAssignee: Gleb Natapov <gleb>
Status: CLOSED NOTABUG QA Contact: Virtualization Bugs <virt-bugs>
Severity: low Docs Contact:
Priority: medium    
Version: 6.1CC: alex.williamson, ddutile, knoel, michen, mkenneth, tburke, virt-maint
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-02-17 19:32:06 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On:    
Bug Blocks: 580954    

Description juzhang 2011-02-16 06:18:28 UTC
Description of problem:
Hot add two vfs with same bootindex leading to qemu-kvm process quit

Version-Release number of selected component (if applicable):
#qemu-kvm version
qemu-kvm-0.12.1.2-2.144.el6.x86_64
#uname -r
2.6.32-115.el6.x86_64
#rpm -qa | grep seabios
seabios-0.6.1.2-3.el6.x86_64


How reproducible:
100%

Steps to Reproduce:
1.Generate VF
#modprobe -r igb;modprobe igb max_vfs=7
2.Unbind two vfs from host
lspci -n | grep 09:10.2
09:10.2 0200: 8086:10ca (rev 01)

echo "8086 10ca" >/sys/bus/pci/drivers/pci-stub/new_id
echo 0000:09:10.2 >/sys/bus/pci/devices/0000\:09\:10.2/driver/unbind
echo 0000:09:10.2 >/sys/bus/pci/drivers/pci-stub/bind

#lspci -n | grep 09:10.3
09:10.3 0200: 8086:10ca (rev 01)

#echo "8086 10ca" >/sys/bus/pci/drivers/pci-stub/new_id
#echo 0000:09:10.3 >/sys/bus/pci/devices/0000\:09\:10.3/driver/unbind
#echo 0000:09:10.3 >/sys/bus/pci/drivers/pci-stub/bind
3.boot guest
/usr/libexec/qemu-kvm -m 2G -smp 2 -drive file=/root/zhangjunyi/rhel6.1-ide.qcow2,if=none,id=test,boot=on,cache=none,format=qcow2,werror=stop,rerror=stop -device virtio-blk-pci,drive=test -cpu qemu64,+sse2,+x2apic -monitor stdio -boot c -netdev tap,id=hostnet0,vhost=on -device virtio-net-pci,netdev=hostnet0,id=net0,mac=22:11:22:45:66:95 -vnc :9 -qmp tcp:0:4446,server,nowait
4.hotadd two vfs with the same bootindex
#(qemu) device_add pci-assign,host=09:10.2,id=pci1,bootindex=10
#(qemu) device_add pci-assign,host=09:10.3,id=pci3,bootindex=10

  
Actual results:
qemu-kvm process was quit with message "Two devices with same boot index 10"

Expected results:
should prevent hot-add instead of quit process,image that,if some very important jobs running in guest currently.it's very aggressive let qemu-kvm quit directly. 

Additional info:

Comment 2 Alex Williamson 2011-02-17 19:12:57 UTC
The above error has nothing to do with device assignment in particular, it will happen with any device that supports bootindex.  The new bootindex option does not work well with hotplug.  add_boot_device_path() should not call exit, it should return an error if there's a conflict.  There also appears to be no remove function, so if a device is specified with a bootindex and later removed, what happens?  Assigning back to Gleb.

Comment 4 Gleb Natapov 2011-02-17 19:32:06 UTC
Don't add two devices with same same bootindex. This does not suppose to work neither from command line nor from qemu monitor. Closing as NOTABUG. Alex is correct that we probably need to add del_boot_device_path(), but hot-unplug will not cause crash without it too, so this is not high priority.

Comment 5 Alex Williamson 2011-02-17 19:46:56 UTC
(In reply to comment #4)
> Don't add two devices with same same bootindex. This does not suppose to work
> neither from command line nor from qemu monitor. Closing as NOTABUG. Alex is
> correct that we probably need to add del_boot_device_path(), but hot-unplug
> will not cause crash without it too, so this is not high priority.

Gleb, should we at least ignore the bootindex for hotplugged device, maybe printing a warning?  I'm not a fan of leaving holes that are so easy to trigger and result in qemu calling exit() while the guest is running.

Comment 7 Gleb Natapov 2011-02-17 22:12:34 UTC
(In reply to comment #5)
> (In reply to comment #4)
> > Don't add two devices with same same bootindex. This does not suppose to work
> > neither from command line nor from qemu monitor. Closing as NOTABUG. Alex is
> > correct that we probably need to add del_boot_device_path(), but hot-unplug
> > will not cause crash without it too, so this is not high priority.
> 
> Gleb, should we at least ignore the bootindex for hotplugged device, maybe
> printing a warning?  I'm not a fan of leaving holes that are so easy to trigger
> and result in qemu calling exit() while the guest is running.
Why should we ignore bootindex for hotplugged device? User may want to hotplug a device and boot from in on the next boot for instance. About exit() I am not sure what can be done here. This is not exit that can be triggered by a guest and qemu usually exits on wrong command line (user may not notice the warning if we will print one). Also printing a warning is not appropriate response on monitor command failure since the one who issues the command may not even have access to stderr. I can change the code to not treat two devices with the same bootindex as an error, but then user may unintentionally specify two devices with the same bootindex and be surprised by a result.

Comment 9 Alex Williamson 2011-02-17 22:28:38 UTC
(In reply to comment #7)
> Why should we ignore bootindex for hotplugged device? User may want to hotplug
> a device and boot from in on the next boot for instance. About exit() I am not
> sure what can be done here. This is not exit that can be triggered by a guest
> and qemu usually exits on wrong command line (user may not notice the warning
> if we will print one). Also printing a warning is not appropriate response on
> monitor command failure since the one who issues the command may not even have
> access to stderr. I can change the code to not treat two devices with the same
> bootindex as an error, but then user may unintentionally specify two devices
> with the same bootindex and be surprised by a result.

Gleb, it sounds like you're validating why this should not be closed.  If we want to allow users to hotplug devices and boot from it on the next boot, then we need to make it such that a trivial typo on the hotplug doesn't cause the VM to exit.  It seems like a fairly simple matter to have add_boot_device_path() return an error and make the device initfn fail when this occurs.  If an initfn fails before the VM is running, I think we automatically abort.  I actually started fixing this, but decided to re-assign to you after I saw there was no del_boot_device_path().

Comment 11 Gleb Natapov 2011-02-18 06:51:48 UTC
(In reply to comment #9)
> (In reply to comment #7)
> > Why should we ignore bootindex for hotplugged device? User may want to hotplug
> > a device and boot from in on the next boot for instance. About exit() I am not
> > sure what can be done here. This is not exit that can be triggered by a guest
> > and qemu usually exits on wrong command line (user may not notice the warning
> > if we will print one). Also printing a warning is not appropriate response on
> > monitor command failure since the one who issues the command may not even have
> > access to stderr. I can change the code to not treat two devices with the same
> > bootindex as an error, but then user may unintentionally specify two devices
> > with the same bootindex and be surprised by a result.
> 
> Gleb, it sounds like you're validating why this should not be closed.  If we
> want to allow users to hotplug devices and boot from it on the next boot, then
> we need to make it such that a trivial typo on the hotplug doesn't cause the VM
> to exit.  It seems like a fairly simple matter to have add_boot_device_path()
> return an error and make the device initfn fail when this occurs.  If an initfn
> fails before the VM is running, I think we automatically abort.  I actually
> started fixing this, but decided to re-assign to you after I saw there was no
> del_boot_device_path().
Are you saying that if initfn fails during machine creation qemu aborts, but if it fails during hot plug it doesn't? If yes then we may change the behaviour for 6.2. We may make add_boot_device_path() to return token or an error. Token will be used for  del_boot_device_path().

Comment 13 Alex Williamson 2011-02-18 14:12:53 UTC
(In reply to comment #11)
> Are you saying that if initfn fails during machine creation qemu aborts, but if
> it fails during hot plug it doesn't? If yes then we may change the behaviour
> for 6.2. We may make add_boot_device_path() to return token or an error. Token
> will be used for  del_boot_device_path().

Yes, exactly.  See how device_init_func is called versus do_device_add.