Bug 719794 - Repeatedly attach/detach a VF would get Duplicate ID error
Keywords:
Status: CLOSED DUPLICATE of bug 696877
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: qemu-kvm
Version: 6.1
Hardware: x86_64
OS: Windows
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Alex Williamson
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-07-08 02:42 UTC by Chao Yang
Modified: 2013-01-10 00:02 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-12-08 21:52:48 UTC
Target Upstream Version:


Attachments
coredump and logs (2.28 MB, application/x-compressed-tar)
2011-07-08 02:42 UTC, Chao Yang

Description Chao Yang 2011-07-08 02:42:28 UTC
Created attachment 511828
coredump and logs

Description of problem:
After repeatedly attaching and detaching a VF to a 32-bit Windows XP guest (winxp-32) about 700 times, libvirtd crashed.

Version-Release number of selected component (if applicable):
# rpm -q libvirt
libvirt-0.8.7-18.el6.x86_64
# uname -r
2.6.32-131.0.15.el6.x86_64
# rpm -q virt-manager
virt-manager-0.8.6-4.el6.noarch
# rpm -q qemu-kvm
qemu-kvm-0.12.1.2-2.160.el6.x86_64


How reproducible:
1/5

Steps to Reproduce:
1. Generate some VFs on an SR-IOV capable machine.
2. Start a domain:
# virsh start winxp-32
3. Attach and detach a VF in a loop.

Actual results:
libvirtd crashed in my test after the VF had been attached and detached more than 690 times.

Expected results:
no crash

Additional info:

Comment 1 Dave Allan 2011-07-08 02:48:47 UTC
Libvirt doesn't make a distinction between VFs and other PCI devices, but most of the developers don't have SRIOV hardware so reproduction is far more difficult in that case.  Can you try to reproduce with an ordinary PCI device?  Also, what was the timing of the loop?  Are you attaching and detaching as quickly as you can, or is there a sleep in there?

Comment 2 Chao Yang 2011-07-08 03:13:55 UTC
(In reply to comment #1)
> Libvirt doesn't make a distinction between VFs and other PCI devices, but most
> of the developers don't have SRIOV hardware so reproduction is far more
> difficult in that case.  Can you try to reproduce with an ordinary PCI device?

Okay, I will try to reproduce with an ordinary PCI device. Can you tell me what kind of information I should focus on?

> Also, what was the timing of the loop?  Are you attaching and detaching as
> quickly as you can, or is there a sleep in there?

There is a 3-second break between attach and detach.

Comment 3 Dave Allan 2011-07-08 13:41:24 UTC
(In reply to comment #2)
> Okay, I will try to reproduce with an ordinary PCI device, can you tell me what
> kind of information should I focus on? 

If you do exactly what you did with the VF with an ordinary PCI device, I would expect the crash to reproduce.

> There is 3 seconds' break between attach and detach.

Is there a sleep between the detach and the next attach?  (Could you attach the script you're using?)

Comment 4 Chao Yang 2011-07-11 01:50:14 UTC
(In reply to comment #3)
> Is there a sleep between the detach and the next attach?  (Could you attach the
> script you're using?)
# cat loop-to-attach-datach.sh 
#!/bin/bash

for i in $(seq 1000)
do
echo "the $i times at `date`"
virsh attach-device winxp-32 vf.xml
sleep 3
virsh detach-device  winxp-32 vf.xml
sleep 3
done

Comment 7 Osier Yang 2011-07-29 12:42:51 UTC
I tested on an SR-IOV box and attached/detached a VF 1000 times successfully. The only difference is that my test script sleeps 6 secs both before and after "detach-device". If it sleeps only 3 secs, there will be a qemu error "Duplicate ID: hostdev0....", just as I commented in https://bugzilla.redhat.com/show_bug.cgi?id=696877.

[root@target osier]# cat vf-loop-hotplug.sh 
#!/bin/bash

for i in {1..1000}; do
	echo "[ -- $i times -- $(date) -- ]"
	virsh attach-device virtlab_test vf.xml

	echo "sleep 6 secs"
	sleep 6

	virsh detach-device virtlab_test vf.xml

	echo "sleep 6 secs"
	sleep 6
done

@chayang, could you check whether you can still reproduce it?
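
The fixed 6-second sleep above works around the timing window by waiting longer. A possible alternative, shown here only as a minimal sketch, is to retry the attach a few times when it fails. The helper name `retry_cmd` and its parameters are hypothetical, and the sketch assumes that `virsh attach-device` exits with a non-zero status when QEMU reports the "Duplicate ID" error.

```shell
# Sketch only: retry a command until it succeeds or the attempts run out.
# Usage: retry_cmd MAX_TRIES DELAY_SECS CMD [ARGS...]
# (retry_cmd is a hypothetical helper, not part of virsh or libvirt.)
retry_cmd() {
    local max_tries="$1"
    local delay="$2"
    local try=1
    shift 2

    while [ "$try" -le "$max_tries" ]; do
        # Run the command exactly as given; stop on the first success.
        if "$@"; then
            return 0
        fi
        # Wait before the next attempt, but not after the last one.
        if [ "$try" -lt "$max_tries" ]; then
            sleep "$delay"
        fi
        try=$((try + 1))
    done
    return 1
}
```

For example, `retry_cmd 5 3 virsh attach-device virtlab_test vf.xml` would try the attach up to five times, three seconds apart. Retrying only hides the race; it does not change the underlying QEMU behaviour.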

Comment 8 Chao Yang 2011-08-23 03:09:29 UTC
(In reply to comment #7)

Sorry for the late response. I will try again with a 6-second sleep. Note that this crash may not be easy to trigger.

Comment 9 Osier Yang 2011-09-26 04:42:36 UTC
<snip>
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin HOME=/root USER=root LOGNAME=root QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm -S -M rhel6.1.0 -enable-kvm -m 1024 -smp 1,sockets=1,cores=1,threads=1 -name winxp-32 -uuid 4513b1c0-d9fe-e39c-3eda-4d0528dbabcb -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/winxp-32.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime -boot c -drive file=/tmp/win-xp-i386.raw,if=none,id=drive-ide0-0-0,format=raw,cache=none,aio=threads -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -drive file=/tmp/en_windows_xp_professional_with_service_pack_3_x86_cd_x14-80428.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw,aio=threads -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,fd=28,id=hostnet0 -device rtl8139,netdev=hostnet0,id=net0,mac=52:54:00:68:e0:4e,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -usb -device usb-tablet,id=input0 -vnc 127.0.0.1:0 -vga std -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5
char device redirected to /dev/pts/2
</snip>

Above is the qemu command line from the reporter's log. It shows that both the "-device" and "-netdev" options are used, so the VF hot-unplug will use the qemu monitor command "netdev_del".

We check the result of "netdev_del" and abort "detach-device" with an error if it fails. Since "detach-device" succeeded, "netdev_del" did not fail, and the error "duplicate ID ..." came from "device_add" when trying to attach the device right after "detach-device".

So it all sounds like QEMU doesn't release the used device ID "hostdev0" promptly after "netdev_del" has executed successfully.

Since the libvirtd crash can't be reproduced, IMHO this should be marked as a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=696877.

Comment 10 Osier Yang 2011-09-28 13:59:11 UTC
Reproduced the problem easily using qemu monitor commands directly with my own network card:

# /usr/libexec/qemu-kvm -S -M pc-0.14 -enable-kvm -m 512 -smp 4,sockets=4,cores=1,threads=1 -name test -uuid 861c297d-d570-23e3-84ce-97e2bd23b211 -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/test.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -drive file=/var/lib/libvirt/images/test.img,if=none,id=drive-virtio-disk0,format=raw,cache=none,aio=threads -device virtio-blk-pci,bus=pci.0,multifunction=on,addr=0x4.0x0,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -usb -device usb-tablet,id=input0 -vnc 127.0.0.1:2 -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,multifunction=on,addr=0x5.0x0

# echo "8086 10f5" > /sys/bus/pci/drivers/pci-stub/new_id

# echo 0000:00:19.0 > /sys/bus/pci/devices/0000:00:19.0/driver/unbind

# echo 0000:00:19.0 > /sys/bus/pci/drivers/pci-stub/bind

# readlink /sys/bus/pci/devices/0000\:00\:19.0/driver/ -f
/sys/bus/pci/drivers/pci-stub
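
The three sysfs writes above can be wrapped in a small helper. This is only a sketch of the same steps: the function name `bind_to_pci_stub` and the optional sysfs-root argument (added so the function can be exercised against a scratch directory) are hypothetical and not part of any tool mentioned in this bug.

```shell
# Sketch only: rebind a PCI device to the pci-stub driver.
# Usage: bind_to_pci_stub "VENDOR DEVICE" DOMAIN:BUS:SLOT.FUNC [SYSFS_ROOT]
# (bind_to_pci_stub and SYSFS_ROOT are hypothetical, for illustration.)
bind_to_pci_stub() {
    local ids="$1"
    local bdf="$2"
    local sysfs="${3:-/sys}"
    local stub="$sysfs/bus/pci/drivers/pci-stub"
    local dev="$sysfs/bus/pci/devices/$bdf"

    [ -d "$stub" ] || { echo "pci-stub driver not available" >&2; return 1; }
    [ -e "$dev" ] || { echo "no such PCI device: $bdf" >&2; return 1; }

    # Tell pci-stub to accept this vendor/device ID pair.
    echo "$ids" > "$stub/new_id" || return 1

    # Detach the device from whatever driver currently owns it, if any.
    if [ -e "$dev/driver" ]; then
        echo "$bdf" > "$dev/driver/unbind" || return 1
    fi

    # Hand the device to pci-stub.
    echo "$bdf" > "$stub/bind" || return 1
}
```

With the values used in this comment, the call would be `bind_to_pci_stub "8086 10f5" 0000:00:19.0`, run as root.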

# nc -U /var/lib/libvirt/qemu/test.monitor 
{"QMP": {"version": {"qemu": {"micro": 0, "minor": 14, "major": 0}, "package": " (qemu-kvm-0.14.0)"}, "capabilities": []}}
{"execute": "qmp_capabilities"}
{"return": {}}
{"execute":"device_add","arguments":{"driver":"pci-assign","host":"00:19.0","id":"hostdev0","configfd":"fd-hostdev0","bus":"pci.0","multifunction":"on","addr":"0x7.0x0"},"id":"libvirt-8"}
{"id": "libvirt-8", "error": {"class": "DeviceInitFailed", "desc": "Device 'pci-assign' could not be initialized", "data": {"device": "pci-assign"}}}
{"execute":"device_add","arguments":{"driver":"pci-assign","host":"00:19.0","id":"hostdev0"}}
{"return": {}}
{"execute":"device_del","arguments":{"id":"hostdev0"},"id":"libvirt-9"}
{"return": {}, "id": "libvirt-9"}
{"execute":"device_add","arguments":{"driver":"pci-assign","host":"00:19.0","id":"hostdev0"}}
{"error": {"class": "DuplicateId", "desc": "Duplicate ID 'hostdev0' for device", "data": {"object": "device", "id": "hostdev0"}}}

Fair enough to reassign to qemu-kvm IMO.
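
When scripting a reproducer like the session above, it can help to tell the "DuplicateId" reply apart from other failures such as "DeviceInitFailed". The following is a rough sketch, not a real JSON parser: the function names `qmp_error_class` and `is_duplicate_id` are hypothetical, and the pattern assumes a single-line reply in which "class" is the first key of the "error" object, as in the replies shown in this comment.

```shell
# Sketch only: print the "class" of a QMP error reply, or nothing if the
# reply carries no error. Assumes the one-line layout seen in this bug.
qmp_error_class() {
    printf '%s\n' "$1" |
        sed -n 's/.*"error": *{"class": *"\([^"]*\)".*/\1/p'
}

# Succeeds only when the reply is a DuplicateId error.
is_duplicate_id() {
    [ "$(qmp_error_class "$1")" = "DuplicateId" ]
}
```

A reproducer could then loop on `device_add` and `device_del`, stopping as soon as `is_duplicate_id "$reply"` succeeds, where `$reply` holds one line read back from the monitor socket.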

Comment 11 Osier Yang 2011-09-29 03:14:54 UTC
Reassigning to qemu-kvm.

Comment 13 Alex Williamson 2011-12-08 21:52:48 UTC
Closing as duplicate of bug 696877. There seems to be more triage over there.

*** This bug has been marked as a duplicate of bug 696877 ***

