Red Hat Bugzilla – Bug 1294677
Reboot with SR-IOV devices confuses IOMMU
Last modified: 2016-06-09 09:59:03 EDT
Description of problem:
We hit an issue with PCI passthrough: after we reboot the VM, the IOMMU mappings are incorrect and devices access invalid memory.
The issue is quite easy to reproduce. We were using NICs (e.g. Intel 40 Gigabit Ethernet NICs hit the issue reliably), but you could probably just pass through any PCI device that does a bit of DMA.
Steps to Reproduce:
With NICs we do the following to reproduce:
- configure the NIC for SR-IOV passthrough;
- create two standard VMs;
- configure VMs with 4GB current allocation and 15GB maximum allocation of memory (my machines have 32 or 64GB total);
- pass a VF to each machine;
Note1: the current/maximum allocation of memory seems to play a role here. I'm not 100% sure, however, whether it causes the bug or just makes it more likely to be triggered.
Note2: we leave <on_reboot>restart</on_reboot> so that VMs can reboot.
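For reference, the memory and reboot settings described above can be sketched as a libvirt domain XML fragment. This is only an illustration generated with Python's standard library, not the exact XML from the affected machines: the helper name `domain_memory_fragment` is made up for this example, and the rest of the domain definition (disks, the VF hostdev entry, and so on) is omitted.

```python
import xml.etree.ElementTree as ET


def domain_memory_fragment(current_gib=4, maximum_gib=15):
    """Build a partial libvirt <domain> with memory and reboot policy only.

    Hypothetical helper for illustration; values default to the 4GB current
    and 15GB maximum allocation mentioned in the report.
    """
    domain = ET.Element("domain", type="kvm")
    # <memory> is the maximum allocation of the guest.
    memory = ET.SubElement(domain, "memory", unit="GiB")
    memory.text = str(maximum_gib)
    # <currentMemory> is the current allocation of the guest.
    current = ET.SubElement(domain, "currentMemory", unit="GiB")
    current.text = str(current_gib)
    # Keep the reboot policy at "restart" so the guest can reboot.
    on_reboot = ET.SubElement(domain, "on_reboot")
    on_reboot.text = "restart"
    return domain


fragment = domain_memory_fragment()
print(ET.tostring(fragment, encoding="unicode"))
```

The gap between the two values is the part that seemed to matter in Note1, so both are kept as parameters to make it easy to try other combinations.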
I was able to reproduce it easily on 3 distinct machines (dual CPU Haswell E, single CPU Haswell E, single Sandy Bridge EP).
With the VMs created above do the following:
(2) configure VF interfaces;
(3) run ping -c30 to confirm they can communicate;
(4) run iperf -P4 -t30 between the machines;
(6) goto 2;
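The traffic checks in the steps above could be scripted roughly as follows. This is a sketch under assumptions: the peer address `192.0.2.2` is a placeholder, the other VM is assumed to be running `iperf -s`, and the `-c` option is added here to point the iperf client at the peer. The function only builds the command lines and does not run anything.

```python
import shlex


def traffic_check_commands(peer_ip):
    """Return the ping and iperf command lines used to check VF connectivity.

    peer_ip is the address configured on the other VM's VF interface
    (placeholder value in the example below).
    """
    return [
        # 30 echo requests to confirm basic connectivity.
        ["ping", "-c30", peer_ip],
        # 4 parallel streams for 30 seconds; assumes "iperf -s" on the peer.
        ["iperf", "-c", peer_ip, "-P4", "-t30"],
    ]


for command in traffic_check_commands("192.0.2.2"):
    print(shlex.join(command))
```

To actually run the checks inside a guest, each list can be passed to `subprocess.run(command, check=True)`.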
The first time (fresh after boot), ping and iperf should work fine. After the first reboot, there should already be communication problems. From traffic inspection with tcpdump, it appears that the VFs receive zeroed packets. Only some of the packets are zeroed, so depending on your luck the communication may work for a while. Usually it breaks down when an ARP packet or an important TCP segment gets placed in an area that the device reads as zero. A reboot will not fix this condition; a shutdown/start will.
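A quick way to quantify the "zeroed packets" symptom is to count how many received payloads consist entirely of zero bytes. The sketch below assumes the payloads have already been extracted from a capture as `bytes` objects; the helper names and the sample data are made up for illustration and are not taken from the actual tcpdump output.

```python
def is_zeroed(payload: bytes) -> bool:
    """True if the payload is non-empty and every byte is zero."""
    return len(payload) > 0 and not any(payload)


def zeroed_fraction(payloads) -> float:
    """Fraction of payloads that are entirely zero (0.0 for no payloads)."""
    payloads = list(payloads)
    if not payloads:
        return 0.0
    return sum(is_zeroed(p) for p in payloads) / len(payloads)


# Synthetic sample data, not captured traffic.
sample = [bytes(64), b"\x45\x00\x00\x54", bytes(128), b"\x00\x00\x01\x00"]
print(zeroed_fraction(sample))
```

A payload with even one non-zero byte is not counted, so this only catches fully zeroed packets, which is what the capture appeared to show.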
Intel NIC i40e (configured for SR-IOV)
Machine with any of the following CPU configurations: dual CPU Haswell E, single CPU Haswell E, single Sandy Bridge EP
Linux Ubuntu 14.04 LTS
Linux kernel linux-next.git 4ef76753
Qemu git 38a762fe
Reproduced easily on fully up-to-date CentOS 7 with ixgbe (Intel's 10gig cards):
CentOS Linux release 7.2.1511 (Core)
# rpm -q libvirt qemu-kvm kernel
Ivy Bridge CPU
Postings to libvir-list that may (or may not) contain more information: