Bug 847193 - PCI passthrough for spacewire card cripples host system
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: qemu-kvm
6.2
Assigned To: Alex Williamson
Virtualization Bugs
Reported: 2012-08-10 00:30 EDT by wrob0123
Modified: 2013-04-04 15:26 EDT
CC: 10 users

Doc Type: Bug Fix
Last Closed: 2013-04-04 15:26:13 EDT
Type: Bug


Attachments
results of lspci -vv command (52.77 KB, text/plain), 2012-08-10 00:30 EDT, wrob0123
the jetVM2.log file (20.08 KB, text/plain), 2012-08-11 02:38 EDT, wrob0123
tree3 (4.46 KB, text/plain), 2012-08-17 02:34 EDT, wrob0123
tree1 config (4.37 KB, application/octet-stream), 2012-08-22 01:10 EDT, wrob0123
command line (1.20 KB, text/plain), 2012-08-23 22:30 EDT, Alex Williamson
Description wrob0123 2012-08-10 00:30:10 EDT
Created attachment 603411 [details]
results of lspci -vv command

Description of problem:
After starting the ubuntu guest with the PCI device (spacewire card) assigned, the host system runs slower and slower for about 10 seconds, and then stops responding entirely. Nothing ever appears on the guest console.

Version-Release number of selected component (if applicable):
Virtual Machine Manager 0.9.0, qemu-kvm 0.12.1.2-2.295.el6_3.1.x86

How reproducible:
This happens every time; the same behavior occurs when trying to assign the card to a windows7 guest.

Steps to Reproduce:
1. install ubuntu 10.04 guest 
2. assign host PCI device 03:04.0 (spacewire card)
3. using virt-manager, start the guest
  
Actual results:
guest does not appear to start, and host locks up after about 10 seconds

Expected results:
the guest machine starts and the spacewire card is usable within the guest

Additional info:
The host machine is a Dell Precision R5500 rackmount workstation with a video card in one PCIe slot and the spacewire card in another PCIe slot. The spacewire card is actually a PMC card mounted on a carrier with a Tundra 8114 P2P bridge. The spacewire card only has BAR0 requesting 4K (non-PF) memory.

# lb /sys/bus/pci/devices/0000:03:04.0/reso*
 4096 Aug  9 21:41 /sys/bus/pci/devices/0000:03:04.0/resource
 4096 Aug  9 23:19 /sys/bus/pci/devices/0000:03:04.0/resource0

# cat /sys/bus/pci/devices/0000:03:04.0/resource
0x00000000f3fff000 0x00000000f3ffffff 0x0000000000020200
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 0x0000000000000000
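For reference, each line of the sysfs resource file is "start end flags", so the BAR size can be computed directly from the first line. A minimal sketch (assuming bash and this host's device address):

```shell
# Sketch: compute BAR0's size from the sysfs resource file.
# Each line is "start end flags"; size = end - start + 1.
res=/sys/bus/pci/devices/0000:03:04.0/resource
read -r start end flags < "$res"     # first line describes BAR0
echo $(( end - start + 1 ))          # 0xf3ffffff - 0xf3fff000 + 1 = 4096
```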

see attachment "lspci-vv"
Comment 2 wrob0123 2012-08-10 14:29:06 EDT
About the hardware setup, I forgot to mention the DDC1553 card (24:00.0) in another PCIe slot. I can assign this card to a windows VM and the apps work fine.
Comment 3 Alex Williamson 2012-08-10 17:52:33 EDT
You seem to have a great ability for finding cards that don't work :-\

Can you provide the command line used to start the guest or if started by libvirt, provide /var/log/libvirt/qemu/$GUEST.log

Also, how much memory are you assigning to the guest relative to memory installed in the host?  Often a report of the machine getting slower and slower and the guest never starting means the guest is too close to the full memory size of the system.  When using device assignment all of the memory for the guest must be pinned to allow for DMA.  If there's not enough free memory for that to happen, the host can get bogged down trying to swap.
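As a quick sanity check before starting an assigned-device guest, the free host memory can be compared against the guest allocation that will be pinned. A sketch (the 4096 MB figure matches this guest, not a general default):

```shell
# Sketch: warn if free host memory is below the guest RAM that must be pinned.
guest_mb=4096
free_mb=$(awk '/^MemFree:/ { print int($2 / 1024) }' /proc/meminfo)
if [ "$free_mb" -lt "$guest_mb" ]; then
    echo "warning: only ${free_mb} MB free; pinning ${guest_mb} MB may force swapping"
fi
```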
Comment 4 wrob0123 2012-08-11 02:38:54 EDT
Created attachment 603665 [details]
the jetVM2.log file
Comment 5 wrob0123 2012-08-11 02:41:11 EDT
(In reply to comment #3)
> You seem to have a great ability for finding cards that don't work :-\
It's not hard to do. This spacewire card has been on the market with linux and windows drivers for several years now. Perhaps I only get more problems from using PMC cards on PCIe carriers. Another type of PMC card (on a PCI-X carrier) was tried once 2 weeks ago, and also locked up the system. This other PMC card is a mature commercial serial card with good windows and linux drivers also. In a few weeks when I get my hands on one of those cards again, I would like to get the PCI pass-through working with that card also.

> Can you provide the command line used to start the guest or if started by
> libvirt, provide /var/log/libvirt/qemu/$GUEST.log
I have been using virt-manager GUI to start up. When I have the spacewire card assigned to the guest, and the system goes bonkers, nothing gets written to the log file (/var/log/libvirt/qemu/$GUEST.log). Here is the latest entry in the log for when I started the VM without the PCI device assignment:

LC_ALL=C PATH=/sbin:/usr/sbin:/bin:/usr/bin QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm -S -M rhel6.2.0 -cpu Westmere,+rdtscp,+pdpe1gb,+dca,+xtpr,+tm2,+est,+vmx,+ds_cpl,+monitor,+pbe,+tm,+ht,+ss,+acpi,+ds,+vme -enable-kvm -m 4096 -smp 2,sockets=2,cores=1,threads=1 -name jetVM3 -uuid cf60d92e-8bc8-8433-fd51-3bc21da8a215 -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/jetVM3.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/var/lib/libvirt/images/jetVM3.img,if=none,id=drive-virtio-disk0,format=raw,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,fd=21,id=hostnet0,vhost=on,vhostfd=22 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:65:71:06,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0 -vnc 127.0.0.1:0 -vga cirrus -device AC97,id=sound0,bus=pci.0,addr=0x4 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
char device redirected to /dev/pts/1

I also tried to assign the device to a windows machine without much luck, but there are entries on the log file about the various problems I had when moving the spacewire card around to different slots. See the jetVM2.log file attached.

> Also, how much memory are you assigning to the guest relative to memory
> installed in the host?  Often a report of the machine getting slower and
> slower and the guest never starting means the guest is too close to the full
> memory size of the system.  When using device assignment all of the memory
> for the guest must be pinned to allow for DMA.  If there's not enough free
> memory for that to happen, the host can get bogged down trying to swap.

I have assigned 4GB to the guest and the host has 24GB installed
Comment 6 wrob0123 2012-08-11 02:51:54 EDT
I just tried assigning the P2P bridge (01:00.0) on the carrier card to the guest as well as the spacewire card (02:04.0) and failed to start the guest, with this being written to the log file:

LC_ALL=C PATH=/sbin:/usr/sbin:/bin:/usr/bin QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm -S -M rhel6.2.0 -cpu Westmere,+rdtscp,+pdpe1gb,+dca,+xtpr,+tm2,+est,+vmx,+ds_cpl,+monitor,+pbe,+tm,+ht,+ss,+acpi,+ds,+vme -enable-kvm -m 4096 -smp 2,sockets=2,cores=1,threads=1 -name jetVM3 -uuid cf60d92e-8bc8-8433-fd51-3bc21da8a215 -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/jetVM3.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/var/lib/libvirt/images/jetVM3.img,if=none,id=drive-virtio-disk0,format=raw,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,fd=24,id=hostnet0,vhost=on,vhostfd=25 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:65:71:06,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0 -vnc 127.0.0.1:0 -vga cirrus -device AC97,id=sound0,bus=pci.0,addr=0x4 -device pci-assign,host=01:00.0,id=hostdev0,configfd=26,bus=pci.0,addr=0x7 -device pci-assign,host=02:04.0,id=hostdev1,configfd=27,bus=pci.0,addr=0x8 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
char device redirected to /dev/pts/2
Device assignment only supports endpoint assignment, device type 7
qemu-kvm: -device pci-assign,host=01:00.0,id=hostdev0,configfd=26,bus=pci.0,addr=0x7: Device 'pci-assign' could not be initialized
Comment 7 wrob0123 2012-08-11 03:34:40 EDT
Just tried moving the P2P bridge to pci-stub driver:

# lspci -n -s 01:00.0
01:00.0 0604: 10e3:8114 (rev ff)
# echo 0000:01:00.0 > /sys/bus/pci/drivers/shpchp/unbind
-bash: echo: write error: No such device
# echo 10e3 8114 > /sys/bus/pci/drivers/pci-stub/new_id
# ls /sys/bus/pci/drivers/pci-stub
0000:01:00.0  bind  new_id  remove_id  uevent  unbind

Then when I tried to start the guest, got this:

Failed to assign device "hostdev0" : Operation not permitted
qemu-kvm: -device pci-assign,host=02:04.0,id=hostdev0,configfd=25,bus=pci.0,addr=0x8: 
Device 'pci-assign' could not be initialized

I was hoping to find some kind of work-around ...
Comment 8 wrob0123 2012-08-11 06:54:35 EDT
Been looking through /var/log/messages, and noticed that the last thing I see before the host locks up is: "pci-stub 0000:03:04.0: claimed by stub"

Do I need to disable hotplug and how is that done?

Looks like IRQ 11 gets used several times: (is this good?)
# lspci -vv | grep IRQ
	Interrupt: pin A routed to IRQ 11 (host bridge)
	Interrupt: pin A routed to IRQ 16
	Interrupt: pin B routed to IRQ 17
	Interrupt: pin C routed to IRQ 22
	Interrupt: pin A routed to IRQ 77
	Interrupt: pin A routed to IRQ 23
	Interrupt: pin B routed to IRQ 17
	Interrupt: pin C routed to IRQ 18
	Interrupt: pin A routed to IRQ 23
	Interrupt: pin C routed to IRQ 76
	Interrupt: pin C routed to IRQ 20
	Interrupt: pin A routed to IRQ 11 (spacewire card)
	Interrupt: pin A routed to IRQ 11 (video card)
	Interrupt: pin A routed to IRQ 78
	Interrupt: pin A routed to IRQ 79
	Interrupt: pin A routed to IRQ 11 (DDC1553 card)

VFIO?
Comment 9 Alex Williamson 2012-08-15 14:37:47 EDT
(In reply to comment #8)
> Been looking through /var/log/messages, and noticed that the last thing I
> see before the host locks up is: "pci-stub 0000:03:04.0: claimed by stub"
> 
> Do I need to disable hotplug and how is that done?

It would be a good test, I'd start with binding 3:00.0 to pci-stub.  Both this and your other bug have this Tundra P2P bridge involved.

> Looks like IRQ 11 gets used several times: (is this good?)

RHEL6 doesn't support shared legacy interrupts, what does /proc/interrupts report?

> # lspci -vv | grep IRQ
> 	Interrupt: pin A routed to IRQ 11 (host bridge)
> 	Interrupt: pin A routed to IRQ 16
> 	Interrupt: pin B routed to IRQ 17
> 	Interrupt: pin C routed to IRQ 22
> 	Interrupt: pin A routed to IRQ 77
> 	Interrupt: pin A routed to IRQ 23
> 	Interrupt: pin B routed to IRQ 17
> 	Interrupt: pin C routed to IRQ 18
> 	Interrupt: pin A routed to IRQ 23
> 	Interrupt: pin C routed to IRQ 76
> 	Interrupt: pin C routed to IRQ 20
> 	Interrupt: pin A routed to IRQ 11 (spacewire card)
> 	Interrupt: pin A routed to IRQ 11 (video card)
> 	Interrupt: pin A routed to IRQ 78
> 	Interrupt: pin A routed to IRQ 79
> 	Interrupt: pin A routed to IRQ 11 (DDC1553 card)
> 
> VFIO?

VFIO won't be supported for some time, but if you're willing to try it, I'd be interested to know if it works.  I can provide details on which trees to test if you're interested.
Comment 10 wrob0123 2012-08-15 20:47:01 EDT
(In reply to comment #9)
> (In reply to comment #8)
> > Been looking through /var/log/messages, and noticed that the last thing I
> > see before the host locks up is: "pci-stub 0000:03:04.0: claimed by stub"
> > 
> > Do I need to disable hotplug and how is that done?
> 
> It would be a good test, I'd start with binding 3:00.0 to pci-stub.  Both
> this and your other bug have this Tundra P2P bridge involved.

I have no 3:00.0 listed, the BDF for the Tundra bridge is 2:00.0. So use the steps you told me on the other bug comments?

> > Looks like IRQ 11 gets used several times: (is this good?)
> 
> RHEL6 doesn't support shared legacy interrupts, what does /proc/interrupts
> report?

Looking at /proc/interrupts I cannot tell which card is assigned to which interrupt, but there is no row for interrupt 11 listed.
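For reference, /proc/interrupts lists every driver servicing a line, comma-separated, so actual sharing (as opposed to leftover BIOS routing) can be spotted directly; a small sketch:

```shell
# Sketch: show interrupt lines serviced by more than one driver.
# In /proc/interrupts, multiple handlers on one IRQ are comma-separated.
grep ',' /proc/interrupts || echo "no shared interrupt lines in use"
```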

> VFIO won't be supported for some time, but if you're willing to try it, I'd
> be interested to know if it works.  I can provide details on which trees to
> test if you're interested.

Anything that will give me more granular control over resources will be good, but at the same time, it would be nice to still use virt-manager to start up and shut down the VMs. What would we give up (other than support) by trying to use VFIO at this point? I read some of your blogs about VFIO, and want to voice my general opinion that engineering departments like mine could benefit from something like VFIO tailored for "odd-ball" cards. This is not a hobby for me. Several departments in my company are interested in this technology, but we need to use VMs attached to several types of interface cards. It just seems that libvirt / KVM is not that flexible at this point.
Comment 11 Alex Williamson 2012-08-15 21:41:50 EDT
(In reply to comment #10)
> (In reply to comment #9)
> > (In reply to comment #8)
> > > Been looking through /var/log/messages, and noticed that the last thing I
> > > see before the host locks up is: "pci-stub 0000:03:04.0: claimed by stub"
> > > 
> > > Do I need to disable hotplug and how is that done?
> > 
> > It would be a good test, I'd start with binding 3:00.0 to pci-stub.  Both
> > this and your other bug have this Tundra P2P bridge involved.
> 
> I have no 3:00.0 listed, the BDF for the Tundra bridge is 2:00.0. So use the
> steps you told me on the other bug comments?

Sorry, yes 2:00.0; typo.  Yes, follow the steps in the other bz.

> > > Looks like IRQ 11 gets used several times: (is this good?)
> > 
> > RHEL6 doesn't support shared legacy interrupts, what does /proc/interrupts
> > report?
> 
> Looking at /proc/interrupts I cannot tell which card is assigned to what
> interrupt, but there is no row for interrupt 11 listed.

Ok, so those devices would use irq 11, but either the driver is using MSI, there is no driver, or the driver doesn't use interrupts.

> > VFIO won't be supported for some time, but if you're willing to try it, I'd
> > be interested to know if it works.  I can provide details on which trees to
> > test if you're interested.
> 
> Anything that will give me more granular control over resources will be
> good, but at the same time, it would be nice to still use virt-manager to
> start-up and shutdown the VMs. What would we give up (other than support) by
> trying to use VFIO at this point. I read some of your blogs about VFIO, and
> want to voice my general opinion that engineering departments like mine
> could benefit from something like VFIO tailored for "odd-ball" cards. This
> is not a hobby for me. Several departments in my company are interested in
> this technology but we need to use VMs attached to several types of
> interface cards. It just seems that libvirt / KVM is not that flexible at
> this point.

There is no libvirt/virt-manager support for vfio yet, but I expect it to come soon.  The current proposed upstream qemu code is also lacking legacy (INTx) interrupt support, but I have a number of prototypes around that.  I think the benefit you'll see in vfio is how it manages the tundra p2p bridge.  That will be in the same iommu group as the device behind it, so we'll program one iommu domain for both w/o needing to attempt to assign the bridge to the guest.  We're probably looking at at least Qemu 1.3 in early December for full upstream integration.
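On a VFIO-capable kernel (v3.6 or later; the hosts in this report run older kernels), the grouping can be inspected from sysfs. A hedged sketch:

```shell
# Sketch: list devices per IOMMU group; a bridge and the endpoint behind it
# should appear in the same group.
for g in /sys/kernel/iommu_groups/*; do
    [ -d "$g" ] || continue
    echo "group ${g##*/}:" $(ls "$g/devices")
done
```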
Comment 12 wrob0123 2012-08-17 02:33:07 EDT
> VFIO won't be supported for some time, but if you're willing to try it ...
I am interested, but I need some OJT from you on startup w/o virt-manager.
The other engineers here would be lost without the GUI for startup/shutdown,
but I suppose if we could put everything into shell scripts, that would work.

> Can you provide the command line used to start the guest ...
What kinds of command line are available? virsh? qemu-kvm?
Will I get more control of device resource assignments with the CLI?

> ... proposed upstream qemu code is also lacking legacy, INTx ...
On the other bz (836058) where I lucked out with a workaround, there is an entry in /proc/interrupts with "kvm_assigned_intx_device" on the right side of the row. Are you saying that qemu will not even support that in the future?

> (tundra p2p) will be in the same iommu group as the device behind it
So there is no way to force libvirt/kvm/qemu to group things manually now?
I apologize if I am asking stupid questions, but I am ignorant and confused.

Here is something I noticed about the config for the other machine with the xxx card in the expansion chassis. Remember I assigned both the P2P bridge and the xxx card to the guest, even though you did not think that was a good idea.
lspci -vv (after starting guest) shows some lines about kernel stuff:
  24:00.0 PCI bridge: Tundra Semiconductor Corp. Device 8114
	Kernel driver in use: pci-stub
	Kernel modules: shpchp
  25:0f.0 Unassigned class [ff00]: Device xxxx:xxxx (xxx card)
        Interrupt: pin A routed to IRQ 54
	Kernel driver in use: pci-stub

Does this mean that KVM automagically moved the P2P bridge and the xxx card to the pci-stub driver in a way so that the interrupts get supported properly? 
Also note the interrupt for the xxx card got routed to IRQ 54, not IRQ 11

### Now back to the host with the spacewire card on PCIe carrier card ###
Sorry, I keep moving the cards around to different slots to see if I can get something to work, so the BDF numbers have changed. Also, tonight I am trying to assign the spacewire card to the jetVM2 windows guest (instead of the jetVM3 ubuntu guest), so here is the current config (see attachment tree3 for more info)

  24:00.0 PCI bridge: Tundra Semiconductor Corp. Device 8114
  25:04.0 Bridge: Xilinx Corporation Device 002a (spacewire card)

> > > Do I need to disable hotplug and how is that done?
> > ... start with binding (Tundra P2P bridge) to pci-stub ...
  # echo 0000:24:00.0 > /sys/bus/pci/drivers/shpchp/unbind
  -bash: echo: write error: No such device
  # echo 10e3 8114    > /sys/bus/pci/drivers/pci-stub/new_id
  # ls /sys/bus/pci/drivers/pci-stub
  0000:24:00.0  bind  new_id  remove_id  uevent  unbind

This does not help - the host still locks up after about 10 seconds.
How do I disable hotplug?  Do I need something like coldplug?
Is this a legacy-interrupt-related problem?

Note that the jetVM1 windows guest (with the DDC1553 card assigned to it) works okay, but it has an Altera PCIe bridge supporting MSI.

> > Looks like IRQ 11 gets used several times
> RHEL6 doesn't support shared legacy interrupts
The spacewire and DDC1553 cards do not have host linux drivers, only the video card does, and I assume that the nvidia driver does not use interrupts.
Comment 13 wrob0123 2012-08-17 02:34:24 EDT
Created attachment 605088 [details]
tree3
Comment 14 wrob0123 2012-08-17 03:58:40 EDT
Right after starting the guest, before the host machine dies completely, looking at /proc/interrupts, I never see a line with "kvm_assigned_intx_device" like I see on the other system (bz 836058: xxx card in expansion chassis)

Another difference in the Tundra 8114 listing in lspci -vv:
This machine with the spacewire card has "Memory behind bridge"
The other machine with the xxx card does not

The spacewire card has 4K non-prefetchable memory
The xxx card has 16M prefetchable memory space
Comment 15 wrob0123 2012-08-20 23:26:20 EDT
(In reply to comment #14)
> Right after starting the guest before the host machines dies completely,
> looking at /proc/interrupts, I never see a line with
> "kvm_assigned_intx_device" like I see on the other system (bz 836058: xxx
> card in expansion chassis)

On this system, where the DDC1553 card does work attached to a guest, when that guest is running there is a /proc/interrupts line with kvm_assigned_msi_device

> Another difference in the Tundra 8114 listing in lspci -vv:
> This machine with the spacewire card has "Memory behind bridge"
> The other machine with the xxx card does not
> 
> The spacewire card has 4K non-prefetchable memory
> The xxx card has 16M prefetchable memory space

Maybe it is not fair to compare this system to the other system from bz 836058 since the expansion chassis probably isolated some of the issues of the legacy PCI (xxx card) device. Also on this system I have loaded the latest NVIDIA graphics driver, whereas on the other system I did not bother and the console login remained a CLI. 

Is there anything I can request of the hardware vendor like MSI support? Also I have seen mention of FLR on the KVM ToDo list. Or am I fighting a lost cause since the spacewire card is basically a conventional PCI device?
Comment 16 wrob0123 2012-08-21 04:17:18 EDT
Now the graphics fb is disabled and I am using a serial console. The video card is installed but it is in the wrong slot. The spacewire card is in a different slot also. When I tried to start the guest with the card assigned, I got this error:

Domain id=1 is tainted: high-privileges
Failed to assign device "hostdev0" : Operation not permitted
qemu-kvm: -device pci-assign,host=24:04.0,id=hostdev0,configfd=25,bus=pci.0,addr=0x7: Device 'pci-assign' could not be initialized

I thought that allow_unsafe_assigned_interrupts=1 would let me do anything. Then when I look at the lspci -vv for the spacewire card, it shows:
24:04.0 Bridge: Xilinx Corporation Device 002a (rev ff) (prog-if ff)
	!!! Unknown header type 7f

Is this another useless clue?
Comment 17 wrob0123 2012-08-22 01:09:30 EDT
I should have mentioned before: I have been running 6.3 updates for the last week or so.
This prevents me from trying to assign the Tundra P2P bridge to the guest.

Today I put the video card back into the correct slot, along with another DDC 1553 card, and the spacewire card is in a different slot (see attachment tree1)

Running a windows guest with the DDC 1553 assigned, there was an entry in /var/log/messages about disabling the interrupt:
  kernel: pci-stub 0000:24:00.0: PCI INT A disabled
But then I fixed the firmware on the DDC 1553 card, and tried again:
  kernel: pci-stub 0000:24:00.0: PCI INT A -> GSI 54 (level, low) -> IRQ 54

I have seen this message a few times on the serial console recently:
  irq 16: nobody cared (try booting with the "irqpoll" option)
  handlers:
  [<ffffffff813a5940>] (usb_hcd_irq+0x0/0x90)
  Disabling IRQ #16
Should I try the irqpoll option, or is this a harmless message?

Earlier tonight I got this error again when trying to start a windows guest with the spacewire card assigned to it:
  Failed to assign device "hostdev0" : Operation not permitted
  qemu-kvm: 
  -device pci-assign,host=02:04.0,id=hostdev0,configfd=25,bus=pci.0,addr=0x8: 
  Device 'pci-assign' could not be initialized
Plus lspci -vv shows the weird info for spacewire card and the Tundra P2P:
01:00.0 PCI bridge: Tundra Semiconductor Corp. Device 8114 (rev ff) (prog-if ff)
        !!! Unknown header type 7f
02:04.0 Bridge: Xilinx Corporation Device 002a (rev ff) (prog-if ff)
        !!! Unknown header type 7f

Then I just tried it again (windows guest with spacewire card) and it started this time, but the system died within 10 seconds. Looking at /proc/interrupts before it croaks, I still do not see any kvm assigned interrupts.

I have requested some new firmware from the spacewire card vendor which configures the card to use MSI interrupts. Hopefully this will help.
Should I ask them about FLR or any other enhancements?
Comment 18 wrob0123 2012-08-22 01:10:39 EDT
Created attachment 606125 [details]
tree1 config
Comment 19 wrob0123 2012-08-22 02:10:12 EDT
Just noticed something else (another useless clue?)
Whenever I start the guest with the spacewire card assigned, 
the serial console stops working immediately.
This has got to be something to do with interrupts.

> > The current proposed upstream qemu code is also lacking legacy (INTx)
> > interrupt support, but I have a number of prototypes around that.

What does this mean? So is VFIO going to somehow bypass qemu for legacy interrupts support? Or do I need to get all of my hardware suppliers to update their cards with the latest PCI interface technology? That is probably not going to happen in some cases. All of this in pursuit of hot-plug, and I do not even need that for my case of assigned PCI cards to guest VMs. Once this type of PCI device is assigned to a guest, it will always be assigned to that guest.
Comment 20 Alex Williamson 2012-08-22 15:02:05 EDT
(In reply to comment #12)
> > VFIO won't be supported for some time, but if you're willing to try it ...
> I am interested, but I need some OJT from you on startup w/o virt-manager.
> The other engineers here would be lost without the GUI for startup/shutdown,
> but I suppose if we could put everything into shell scripts, that would work.

Note that VFIO is still in development upstream.  To use it you'll need to run the latest development Linux kernel (v3.6 pre-release) and a patched qemu.  This may be the best way to get you running, but it does have support implications.
 
> > Can you provide the command line used to start the guest ...
> What kind of command line are available? virsh? qemu-kvm?
> Will I get more control of device resource assignments with CLI?

virsh is just a wrapper for qemu-kvm and you can always find the command line used to launch a guest in /var/log/libvirt/qemu/$GUEST.log

There's not much to control with respect to device resources from the CLI.  The problem we're potentially hitting is that current KVM device assignment handles each device as if it's uniquely identifiable by the iommu, which is not always the case.
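For example, the launch command line can be pulled straight from the log ($GUEST standing in for the domain name, as above):

```shell
# Sketch: show the most recent qemu-kvm invocation for a guest.
grep '/usr/libexec/qemu-kvm' "/var/log/libvirt/qemu/$GUEST.log" | tail -n 1
```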

> > ... proposed upstream qemu code is also lacking legacy, INTx ...
> On the other bz (836058) where I lucked out with a workaround, there is an
> entry in /proc/interrupts with "kvm_assigned_intx_device" on the right side
> of the row. Are you saying that qemu will not even support that in the
> future?

It's a step in development.  We'll of course eventually support legacy PCI interrupts and I'm working through making that happen, but it's not there yet.

> > (tundra p2p) will be in the same iommu group as the device behind it
> So there is no way to force libvirt/kvm/qemu to group things manually now?
> I apologize if I am asking stupid questions, but I am ignorant and confused.
> 
> Here is something I noticed about the config for the other machine with the
> xxx card in the expansion chassis. Remember I assigned both the P2P bridge
> and the xxx card to the guest, even though you did not think that was a good
> idea.
> lspci -vv (after starting guest) shows some lines about kernel stuff:
>   24:00.0 PCI bridge: Tundra Semiconductor Corp. Device 8114
> 	Kernel driver in use: pci-stub
> 	Kernel modules: shpchp
>   25:0f.0 Unassigned class [ff00]: Device xxxx:xxxx (xxx card)
>         Interrupt: pin A routed to IRQ 54
> 	Kernel driver in use: pci-stub
> 
> Does this mean that KVM automagically moved the P2P bridge and the xxx card
> to the pci-stub driver in a way so that the interrupts get supported
> properly? 
> Also note the interrupt for the xxx card got routed to IRQ 54, not IRQ 11

libvirt moves assigned devices to pci-stub so that other host drivers can't interfere with them while assigned to a guest.  The IRQ probably moved because it actually got assigned.  IRQ 11 is probably what's left in the interrupt line register from the BIOS.

(In reply to comment #15) 
> Is there anything I can request of the hardware vendor like MSI support?
> Also I have seen mention of FLR on the KVM ToDo list. Or am I fighting a
> lost cause since the spacewire card is basically a conventional PCI device?

MSI support is generally strongly preferred.  FLR is a PCIe feature, the closest we could get on conventional PCI is a soft reset on D3hot->D0 transition, which means some degree of power management support.
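Whether a card advertises that soft reset can usually be read from lspci output: in the Power Management status line, "NoSoftRst-" means the D3hot->D0 transition does perform an internal reset. A sketch, using the spacewire card's address from earlier in this report:

```shell
# Sketch: inspect the Power Management capability; "NoSoftRst-" indicates the
# D3hot->D0 transition resets the device, which is what assignment wants.
lspci -vv -s 03:04.0 | grep -E 'Power Management|NoSoftRst'
```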

(In reply to comment #17)
> Should have mentioned before: updates running 6.3 for the last week or so.
> This prevents me from trying to assign the Tundra P2P bridge to the guest.
> 
> Today I put the video card back into the correct slot; with another DDC 1553
> card, and the spacewire card is in a different slot (see attachment tree1)
> 
> Running a windows guest with DDC 1553 assigned was an entry in
> /var/log/messages about disabling the interrupt:
>   kernel: pci-stub 0000:24:00.0: PCI INT A disabled
> But then I fixed the firmware on the DDC 1553 card, and tried again:
>   kernel: pci-stub 0000:24:00.0: PCI INT A -> GSI 54 (level, low) -> IRQ 54
> 
> I have seen this message a few times on the serial console recently:
>   irq 16: nobody cared (try booting with the "irqpoll" option)
>   handlers:
>   [<ffffffff813a5940>] (usb_hcd_irq+0x0/0x90)
>   Disabling IRQ #16
> Should I try the irqpoll option, or is this a harmless message?

This means that a device pulled IRQ16 but none of the drivers registered to IRQ16 serviced the interrupt.  When this happens enough, Linux disables the interrupt.  It may be an indication that a device isn't actually using the interrupt we think it's using.  I would not recommend switching to irqpoll.

> Earlier tonight I got this error again when trying to start a windows guest
> with the spacewire card assigned to it:
>   Failed to assign device "hostdev0" : Operation not permitted
>   qemu-kvm: 
>   -device
> pci-assign,host=02:04.0,id=hostdev0,configfd=25,bus=pci.0,addr=0x8: 
>   Device 'pci-assign' could not be initialized
> Plus lspci -vv shows the weird info for spacewire card and the Tundra P2P:
> 01:00.0 PCI bridge: Tundra Semiconductor Corp. Device 8114 (rev ff) (prog-if
> ff)
>         !!! Unknown header type 7f
> 02:04.0 Bridge: Xilinx Corporation Device 002a (rev ff) (prog-if ff)
>         !!! Unknown header type 7f

This means PCI config space for the device is inaccessible; this often happens when a device is powered down, or when the bus is reset and configuration is not restored.

> Then I just tried it again (windows guest with spacewire card) and it
> started this time, but the system died within 10 seconds. Looking at
> /proc/interrupts before it croaks, I still do not see any kvm assigned
> interrupts.
> 
> I have requested some new firmware from the spacewire card vendor which
> configures the card to use MSI interrupts. Hopefully this will help.
> Should I ask them about FLR or any other enhancements?

Some sort of reset mechanism would hopefully solve a lot of these problems.  I strongly suspect we're still being bitten by the secondary bus reset on the tundra device.  As mentioned above, for conventional PCI, I think the best we can do is power management support indicating a soft reset on D3hot->D0 transition.

(In reply to comment #19)
> Just noticed something else (another useless clue?)
> Whenever I start the guest with the spacewire card assigned, 
> the serial console stops working immediately.
> This has got to be something to do with interrupts.
> 
> > > The current proposed upstream qemu code is also lacking legacy, INTx,
> > > interrupt support, but I have a number of prototypes around that.
> 
> What does this mean? So is VFIO going to somehow bypass qemu for legacy
> interrupts support? Or do I need to get all of my hardware suppliers to
> update their cards with the latest PCI interface technology? That is
> probably not going to happen in some cases. All of this in pursuit of
> hot-plug, and I do not even need that for my case of assigned PCI cards to
> guest VMs. Once this type of PCI device is assigned to a guest, it will
> always be assigned to that guest.

The INTx bypass mechanism simply means that upon asserting INTx, we notify KVM to inject an interrupt into the guest.  An EOI path is then created between KVM and VFIO to allow the interrupt to be re-asserted.  It's just an implementation detail towards support of legacy PCI interrupts.

If you can provide the log file from one of your guests (/var/log/libvirt/qemu/$GUEST.log) I can help you reduce it to a qemu command line that you can execute by hand, which will avoid the secondary bus reset on the tundra device, and may end up working better.
Comment 21 wrob0123 2012-08-22 18:56:52 EDT
Alex, thank you very much for your responses; I am slowly catching on.
Unfortunately now I will not have a spacewire card for about a month.
It has to be shipped back for an update to the FPGA to support MSI.
Perhaps I can try VFIO with another odd-ball interface card.
I will probably create another bz soon.

> > Should I ask them about FLR or any other enhancements?
> Some sort of reset ... bitten by the secondary bus reset on the tundra ...
So if the spacewire card had FLR, then KVM would not be using the tundra SBR?
I am in discussion with the card vendor, and have a custom procedure; but I told them that the Linux KVM wants to use standardized PCI options like FLR. Is that true, or is VFIO going to support customized reset mechanisms?

> > This has got to be something to do with interrupts
> VFIO ... support of legacy PCI interrupts ...
On the other bz (836058) I believe you were right about the main problem being the tundra P2P device, and if you remember (in that config) the system would hang immediately. On this config for the spacewire card, it takes about 10 seconds for the system to become totally unresponsive. I am hoping the MSI support on the spacewire will solve this.

> If you can provide the log file from one of your guests ...
See attachment jetVM2.log
Comment 22 wrob0123 2012-08-23 21:41:16 EDT
> conventional PCI (reset) means some degree of power management support ...
Okay, now I understand what you were saying, but I am still confused about how the lack of power management capabilities would end up looking like an interrupt-related problem. Well, maybe it was both reset and interrupt problems.

Today I put another PMC card on the same PCIe carrier card into this system, and had no problems. This ADT PMC1553 card does not look much different from the spacewire card, but it does have some basic power management capabilities.

For the ADT card, pin A gets routed to IRQ 24, not IRQ 11 (mystery?)

Section from lspci -vv:

02:04.0 Unassigned class [ff00]: Alta Data Technologies LLC Device 0010
        Subsystem: Alta Data Technologies LLC Device 0010
        Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Interrupt: pin A routed to IRQ 24
        Region 0: Memory at f7eff000 (32-bit, non-prefetchable) [size=512]
        Region 1: I/O ports at dc00 [size=256]
        Region 2: Memory at f7000000 (32-bit, non-prefetchable) [size=8M]
        Expansion ROM at <unassigned>
        Capabilities: [40] Power Management version 1
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [48] #00 [0000]
        Capabilities: [4c] Vital Product Data
                Unknown small resource type 00, will not decode more

Guess I do not have to file another bug report today ;D
Comment 23 Alex Williamson 2012-08-23 22:14:25 EDT
(In reply to comment #22)
> > conventional PCI (reset) means some degree of power management support ...
> Okay, now I understand what you were saying, but I am still confused about
> how the lack of power management capabilities would end up looking like an
> interrupt related problem. Well, maybe it was both reset and interrupt
> problems.

The lack of power management, specifically of the soft reset on the D3hot->D0 transition, means that we escalate to a secondary bus reset, which resets everything on that bus.

> Today I put another PMC card on the same PCIe carrier card into this system,
> and had no problems. This ADT PMC1553 card does not look much different from
> the spacewire card, but it does have some basic power management
> capabilities.
> 
> For the ADT card, pin A gets routed to IRQ 24, not IRQ 11 (mystery?)
> 
> Section from lspci -vv:
> 
> 02:04.0 Unassigned class [ff00]: Alta Data Technologies LLC Device 0010
>         Subsystem: Alta Data Technologies LLC Device 0010
>         Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr-
> Stepping- SERR- FastB2B- DisINTx-
>         Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
> <TAbort- <MAbort- >SERR- <PERR- INTx-
>         Interrupt: pin A routed to IRQ 24
>         Region 0: Memory at f7eff000 (32-bit, non-prefetchable) [size=512]
>         Region 1: I/O ports at dc00 [size=256]
>         Region 2: Memory at f7000000 (32-bit, non-prefetchable) [size=8M]
>         Expansion ROM at <unassigned>
>         Capabilities: [40] Power Management version 1
>                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
> PME(D0-,D1-,D2-,D3hot-,D3cold-)
>                 Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
                             ^^^^^^^^^^

This is exactly the power management feature we need (it's inverse logic, so NoSoftRst- means that the device does perform a soft reset).  The fact that this device works sure seems to point to the secondary bus reset being the root of the problems.  If you run qemu-kvm by hand or from a script it won't force that reset; it's libvirt that does that.  I'll take a look at your log and give you something to try by hand.
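The bit lspci is decoding here is No_Soft_Reset, PMCSR bit 3 in the PCI Power Management capability (defined in PM spec 1.2; reserved-zero in earlier versions, which lspci also prints as NoSoftRst-). A minimal decode sketch, assuming a raw PMCSR value is already in hand:

```python
NO_SOFT_RESET = 1 << 3  # PMCSR bit 3, PCI Power Management spec 1.2

def d3hot_to_d0_resets(pmcsr: int) -> bool:
    """True if the device performs an internal soft reset on the
    D3hot->D0 transition (lspci shows this as NoSoftRst-), which is the
    per-device reset that avoids escalating to a secondary bus reset."""
    return not (pmcsr & NO_SOFT_RESET)

print(d3hot_to_d0_resets(0x0000))  # True:  NoSoftRst- (usable as a reset)
print(d3hot_to_d0_resets(0x0008))  # False: NoSoftRst+ (state preserved)
```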
Comment 24 Alex Williamson 2012-08-23 22:30:05 EDT
Created attachment 606733 [details]
command line

Here's a command line to run jetVM2 by hand.  You'll need to run this as root.  In the shell where it runs you should get a (qemu) prompt.  When you see that, from another terminal run "vncviewer 127.0.0.1:0" to connect to the graphics head of the VM (vncviewer is in the tigervnc package).  Then enter 'c' at the (qemu) prompt to continue the VM, much like a debugger.  Hopefully the VM will start then.
Comment 25 wrob0123 2012-08-27 23:26:21 EDT
Okay, I got the spacewire card back tonight. It has not been shipped out yet for the MSI interrupt update to the PCI interface portion of the FPGA. The system is still getting crippled when I start the guest VM from the qemu command line.

On the serial console, I am getting these messages:

BUG: soft lockup - CPU#5 stuck for 67s! [qemu-kvm:2680]
INFO: task jbd2/dm-0-8:439 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
INFO: task packagekitd:2478 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

Does this eliminate the SBR theory? Please advise soon if you want me to try something else before the spacewire cards get shipped out.
Comment 26 wrob0123 2012-09-06 02:34:48 EDT
Tried another type of card tonight - it is a multiprotocol serial (PMC) card, and I mounted it on the same carrier card used for the spacewire and the PMC1553 cards as noted in my previous comments. With this card, it does not take long for the system to become unresponsive. This one has power management:

02:04.0 Bridge: PLX Technology, Inc. PCI9056 32-bit 66MHz PCI <-> IOBus Bridge (rev ac)
	Subsystem: PLX Technology, Inc. Device 3198
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
	Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 64, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 11
	Region 0: Memory at f7edf000 (32-bit, non-prefetchable) [size=512]
	Region 1: I/O ports at dc00 [size=256]
	Region 2: Memory at f7ee0000 (32-bit, non-prefetchable) [size=128K]
	Expansion ROM at <unassigned>
	Capabilities: [40] Power Management version 7
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold-)
		Status: D0 NoSoftRst- PME-Enable+ DSel=15 DScale=0 PME-
	Capabilities: [48] CompactPCI hot-swap <?>
	Capabilities: [4c] Vital Product Data
		Unknown large resource type 35, will not decode more.

I really would like to get this card to work as a PCI device assigned to a guest VM. Shall I submit a separate bug report?
Comment 27 wrob0123 2012-09-06 13:10:29 EDT
I forgot to mention that I saw this message on the serial console when booting the host, which may indicate that this is a kernel problem:

  pci 0000:02:04.0: unsupported PM cap regs version (7)

Also, I tried binding the Tundra 8114 P2P bridge to pci-stub, and starting the guest from the command line. The system seemed to stay alive a little longer, but ended up locking up within 10 seconds or so. 

Here is the resource info for the serial PMC card:

# ll /sys/bus/pci/devices/0000:02:04.0/res*
--w--w---- 1   4096 Sep  6 12:02 /sys/bus/pci/devices/0000:02:04.0/rescan
--w------- 1   4096 Sep  6 12:02 /sys/bus/pci/devices/0000:02:04.0/reset
-r--r--r-- 1   4096 Sep  6 11:26 /sys/bus/pci/devices/0000:02:04.0/resource
-rw------- 1    512 Sep  6 12:02 /sys/bus/pci/devices/0000:02:04.0/resource0
-rw------- 1    256 Sep  6 12:02 /sys/bus/pci/devices/0000:02:04.0/resource1
-rw------- 1 131072 Sep  6 12:02 /sys/bus/pci/devices/0000:02:04.0/resource2

# cat /sys/bus/pci/devices/0000:02:04.0/resource
0x00000000f7edf000 0x00000000f7edf1ff 0x0000000000020200
0x000000000000dc00 0x000000000000dcff 0x0000000000020101
0x00000000f7ee0000 0x00000000f7efffff 0x0000000000020200
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 0x0000000000000000

Notice there is /sys/bus/pci/devices/0000:02:04.0/reset, but the kernel reported that it does not support power management version 7.
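For reference, each line of the sysfs resource file is a start address, end address, and flags word in hex, one line per resource slot, and the BAR sizes fall out directly. A short parsing sketch using the data quoted above (only the first four of the file's lines are repeated here):

```python
def parse_resources(text):
    """Parse sysfs PCI 'resource' lines into (index, start, size, flags),
    skipping all-zero (unused) slots."""
    bars = []
    for i, line in enumerate(text.strip().splitlines()):
        start, end, flags = (int(x, 16) for x in line.split())
        if start or end or flags:
            bars.append((i, start, end - start + 1, flags))
    return bars

resource = """\
0x00000000f7edf000 0x00000000f7edf1ff 0x0000000000020200
0x000000000000dc00 0x000000000000dcff 0x0000000000020101
0x00000000f7ee0000 0x00000000f7efffff 0x0000000000020200
0x0000000000000000 0x0000000000000000 0x0000000000000000
"""
for idx, start, size, flags in parse_resources(resource):
    print(idx, hex(start), size)
# 0 0xf7edf000 512
# 1 0xdc00 256
# 2 0xf7ee0000 131072
```

The sizes match the lspci output for the card: 512 B and 256 B for the small BARs and 128 KB for the third region.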
Comment 28 wrob0123 2012-09-18 14:11:15 EDT
> ... may indicate that this is a kernel problem:
  pci 0000:02:04.0: unsupported PM cap regs version (7)

After some research, I found this was a problem with the serial PMC card, and worked with the vendor to get it fixed (see bz 856891) so that the Power Management capability now reports as version 2. Hoping to retest with the serial card on the carrier card with the Tundra 8114 P2P bridge soon.

Seems like IRQ 16 is a problem child on my host. See comment 17 above and bz 856891. How can I tell qemu-kvm and/or the kernel not to use IRQ 16 for any cards assigned to my guest VMs?
Comment 29 wrob0123 2012-11-15 19:55:47 EST
Life is good now - the serial card works great, and I finally got back my spacewire card with the updated firmware. I need to get an updated OS and drivers on the guest, but I am happy to report that the host and guest run okay without any of the problems reported earlier.

Here is the lspci -vv output for the spacewire card:

07:04.0 Bridge: Xilinx Corporation Device 002a (rev 09)
        Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Interrupt: pin A routed to IRQ 16
        Region 0: Memory at f3dff000 (32-bit, non-prefetchable) [size=4K]
        Capabilities: [40] Power Management version 1
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Comment 30 Alex Williamson 2012-11-15 22:09:58 EST
(In reply to comment #29)
> Life is good now - the serial card works great, and I finally got back my
> spacewire card with the updated firmware. Need to get updated OS and drivers
> on the guest, but I am happy to report that the host and guest run okay
> without any known problems as reported earlier.
> 
> Here is the lspci -vv output for the spacewire card:
> 
> 07:04.0 Bridge: Xilinx Corporation Device 002a (rev 09)
>         Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr-
> Stepping- SERR- FastB2B- DisINTx-
>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort-
> <TAbort- <MAbort- >SERR- <PERR- INTx-
>         Interrupt: pin A routed to IRQ 16
>         Region 0: Memory at f3dff000 (32-bit, non-prefetchable) [size=4K]
>         Capabilities: [40] Power Management version 1
>                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
> PME(D0-,D1-,D2-,D3hot-,D3cold-)
>                 Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-

Great news!  So the primary change is the addition of the power management allowing a device reset rather than a secondary bus reset?  Should this bug be closed now?  What about bug 836058 and bug 856891?  Thanks for the report.
Comment 32 Alex Williamson 2013-04-04 15:26:13 EDT
I believe this is now working with the updated hardware, closing.