Description of problem:
I'm not sure if AMD's SEV is supposed to work with vhost-user - I know we have tricks to use unencrypted areas for kernel vhost; what happens with vhost-user? Please test.

Version-Release number of selected component (if applicable):
8.2

How reproducible:
?

Steps to Reproduce:
1. Set up a SEV VM
2. Add a vhost-user NIC
3.

Actual results:
Unsure!

Expected results:
Unsure!

Additional info:
QEMU has recently been split into sub-components, and as a one-time operation to avoid breaking tools, we are setting the QEMU sub-component of this BZ to "General". Please review and change the sub-component if necessary the next time you review this BZ. Thanks.
Testing update: The SEV guest can boot with vhost-user. Next I'll test whether vhost-user can receive MoonGen packets. (MoonGen tests need a back-to-back NIC connection, and the NUMA node where the NICs are located currently has no memory, so I have asked IT to re-cable the machines to satisfy these specific hardware requirements. I'll update with final results soon.)
Testing summary: SEV doesn't support vhost-user well.

== Testing results:

1. Sometimes the guest can boot successfully. However, starting DPDK's testpmd in the guest causes the guest to reboot. With the testing steps below, the guest always reboots after step 8 in this case.

2. Sometimes the guest fails to boot with the error below. With the testing steps below, the guest fails to boot at step 7.

qemu-kvm: -object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/dev/hugepages,share=yes,size=4G,host-nodes=5,policy=bind: cannot bind memory to host NUMA nodes: Input/output error

== Testing highlights:

1. For vhost-user testing, hugepages are a must, so we have to reserve hugepages for this testing.

2. We need to use memory and cores from the NUMA node where the NICs are located. In our setup NUMA node 5 satisfies this requirement, so we use NUMA node 5.

3. Without SEV, vhost-user works very well: no errors at all, and MoonGen receives packets from the guest DPDK testpmd fine. So I think we can confirm that SEV is the key factor causing the two results above.

== Testing steps:

1. Add "amd_iommu=on iommu=pt default_hugepagesz=1G" to the host kernel command line and reboot the host.

# cat /proc/cmdline
BOOT_IMAGE=(hd0,msdos1)/vmlinuz-4.18.0-184.el8.x86_64 root=/dev/mapper/rhel_hp--dl385g10--10-root ro crashkernel=auto resume=/dev/mapper/rhel_hp--dl385g10--10-swap rd.lvm.lv=rhel_hp-dl385g10-10/root rd.lvm.lv=rhel_hp-dl385g10-10/swap console=ttyS0,115200n81 amd_iommu=on iommu=pt default_hugepagesz=1G

2. Enable SEV in kvm_amd

# modprobe -r kvm_amd
# modprobe kvm_amd sev=1

3. Check which NUMA node the NICs are in; in this setup, it's NUMA node 5.

# hwloc-ls
Machine (126GB total)
  Package L#0
    NUMANode L#0 (P#0 31GB)
...
    NUMANode L#5 (P#5 31GB)
      L3 L#10 (4096KB) + L2 L#10 (512KB) + L1d L#10 (32KB) + L1i L#10 (64KB) + Core L#10
        PU L#20 (P#10)
        PU L#21 (P#26)
      L3 L#11 (4096KB) + L2 L#11 (512KB) + L1d L#11 (32KB) + L1i L#11 (64KB) + Core L#11
        PU L#22 (P#11)
        PU L#23 (P#27)
    HostBridge L#5
      PCIBridge
        2 x { PCI 8086:1528 }
...

4.
Reserve hugepages from NUMA node 5

# echo 20 > /sys/devices/system/node/node5/hugepages/hugepages-1048576kB/nr_hugepages
# echo 10 > /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages

5. Bind the NICs to vfio-pci

# modprobe vfio
# modprobe vfio-pci
# dpdk-devbind --bind=vfio-pci 0000:a3:00.0
# dpdk-devbind --bind=vfio-pci 0000:a3:00.1
# dpdk-devbind --status
Network devices using DPDK-compatible driver
============================================
0000:a3:00.0 'Ethernet Controller 10-Gigabit X540-AT2 1528' drv=vfio-pci unused=ixgbe
0000:a3:00.1 'Ethernet Controller 10-Gigabit X540-AT2 1528' drv=vfio-pci unused=ixgbe

6. Boot OVS, reserving hugepages from NUMA node 5. The full boot script will be attached in the next comment.

...
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem="1024,0,0,0,0,1024"
...

# ovs-vsctl show
ee576f79-4200-49cc-8a47-3548d489ffa8
    Bridge ovsbr0
        datapath_type: netdev
        Port dpdk0
            Interface dpdk0
                type: dpdk
                options: {dpdk-devargs="0000:a3:00.0", n_rxq="1"}
        Port ovsbr0
            Interface ovsbr0
                type: internal
        Port vhost-user0
            Interface vhost-user0
                type: dpdkvhostuserclient
                options: {vhost-server-path="/tmp/vhostuser0.sock"}
    Bridge ovsbr1
        datapath_type: netdev
        Port ovsbr1
            Interface ovsbr1
                type: internal
        Port dpdk1
            Interface dpdk1
                type: dpdk
                options: {dpdk-devargs="0000:a3:00.1", n_rxq="1"}
        Port vhost-user1
            Interface vhost-user1
                type: dpdkvhostuserclient
                options: {vhost-server-path="/tmp/vhostuser1.sock"}

7.
Boot QEMU with 2 vhost-user ports, enabling SEV:

/usr/libexec/qemu-kvm \
    -enable-kvm \
    -cpu EPYC \
    -smp 4 \
    -m 4G \
    -object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/dev/hugepages,share=yes,size=4G,host-nodes=5,policy=bind \
    -numa node,nodeid=0,cpus=0-3,memdev=ram-node0 \
    -object sev-guest,id=sev0,cbitpos=47,reduced-phys-bits=1 \
    -machine q35,memory-encryption=sev0 \
    -drive if=pflash,format=raw,unit=0,file=/usr/share/edk2/ovmf/sev/OVMF_CODE.secboot.fd,readonly \
    -drive if=pflash,format=raw,unit=1,file=/usr/share/edk2/ovmf/sev/OVMF_VARS.fd \
    -device pcie-root-port,id=root.1,chassis=1 \
    -device pcie-root-port,id=root.2,chassis=2 \
    -device pcie-root-port,id=root.3,chassis=3 \
    -device pcie-root-port,id=root.4,chassis=4 \
    -device pcie-root-port,id=root.5,chassis=5 \
    -device virtio-scsi-pci,iommu_platform=on,id=scsi0,bus=root.1,addr=0x0 \
    -drive file=/home/sev_guest.qcow2,format=qcow2,if=none,id=drive-scsi0-0-0-0 \
    -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scssi0-0-0-0,bootindex=1 \
    -netdev tap,id=hostnet0,vhost=off \
    -device virtio-net-pci,netdev=hostnet0,id=net0,mac=18:66:da:57:dd:12,bus=root.2,iommu_platform=on \
    -vnc :0 \
    -monitor stdio \
    -chardev socket,id=charnet1,path=/tmp/vhostuser0.sock,server \
    -netdev vhost-user,chardev=charnet1,id=hostnet1 \
    -device virtio-net-pci,netdev=hostnet1,id=net1,mac=18:66:da:5f:dd:02,bus=root.3,iommu_platform=on \
    -chardev socket,id=charnet2,path=/tmp/vhostuser1.sock,server \
    -netdev vhost-user,chardev=charnet2,id=hostnet2 \
    -device virtio-net-pci,netdev=hostnet2,id=net2,mac=18:66:da:5f:dd:03,bus=root.4,iommu_platform=on

8. In the guest, start testpmd.
# modprobe vfio enable_unsafe_noiommu_mode=Y
# modprobe vfio-pci
# cat /sys/module/vfio/parameters/enable_unsafe_noiommu_mode
Y
# dpdk-devbind --bind=vfio-pci 0000:03:00.0
# dpdk-devbind --bind=vfio-pci 0000:04:00.0
# echo 1 > /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages
# /usr/bin/testpmd \
    -l 1,2,3 \
    -n 4 \
    -d /usr/lib64/librte_pmd_virtio.so \
    -w 0000:03:00.0 -w 0000:04:00.0 \
    -- \
    --nb-cores=2 \
    -i \
    --disable-rss \
    --rxd=512 --txd=512 \
    --rxq=1 --txq=1
Created attachment 1670725 [details]
ovs boot script

More info:

Testing versions:
4.18.0-184.el8.x86_64
qemu-kvm-4.2.0-12.module+el8.2.0+5858+afd073bc.x86_64
openvswitch2.13-2.13.0-0.20200117git8ae6a5f.el8fdp.1.x86_64
dpdk-19.11-4.el8.x86_64

Testing server:
hp-dl385g10-10.lab.eng.pek2.redhat.com

Testing NICs:
X540-AT2 (10G, ixgbe)
Hi David,

Could you check the testing in Comment 5 and Comment 6, please? Do you have any comments?

Also, if we plan to support vhost-user with SEV in the future, I think we need to file new bugs to track the issues in Comment 5. Please let me know if we should, and I can file them.

Thank you.

Best regards,

Pei
(In reply to Pei Zhang from comment #7)
> Hi David,
>
> Could you check the testings in Comment 5 and Comment 6, please? Do you have
> any Comment?
>
> Also, if we plan to support vhost-user with SEV in future, I think we need
> to file new bugs to track the issues in Comment 5. Please let me know if we
> need to file and I can file them.
>
> Thank you.
>
> Best regards,
>
> Pei

This test case combines lots of different things; it has vhost-user, VFIO, and IOMMUs. I think we probably need to try something simpler: just vhost-user in one test, and then VFIO/IOMMUs in another test, to see which one causes a problem. However, it would be good to file a bug for the case where the guest reboots.

About the 'cannot bind memory to host NUMA nodes' error: after you do

echo 20 > /sys/devices/system/node/node5/hugepages/hugepages-1048576kB/nr_hugepages

please do

cat /sys/devices/system/node/node5/hugepages/hugepages-1048576kB/nr_hugepages

and check it says 20; sometimes it takes some time, and sometimes it can't move pages to do it. If that doesn't help, please file a separate bug on that (please check first whether it is really SEV-only).

I suggest you try your current host configuration and qemu+ovs, but in the guest just run normal Linux network tests to send packets over the vhost-user network rather than using testpmd. That will tell us if normal vhost-user networking works, without worrying about VFIO.

Dave
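The allocation check Dave describes can be scripted as a small polling loop. A minimal sketch follows; note that NODE_DIR defaults to a scratch directory here purely so the snippet runs anywhere — on the real host it would be /sys/devices/system/node/node5/hugepages/hugepages-1048576kB, where the kernel may satisfy the request late, only partially, or not at all:

```shell
#!/bin/sh
# Request a hugepage count, then poll the same file until it reports
# the requested value (or give up after a few attempts).
# NODE_DIR stands in for the real sysfs hugepage directory.
NODE_DIR=${NODE_DIR:-$(mktemp -d)}
wanted=20

echo "$wanted" > "$NODE_DIR/nr_hugepages"

got=0
for attempt in 1 2 3 4 5; do
  got=$(cat "$NODE_DIR/nr_hugepages")
  [ "$got" -eq "$wanted" ] && break
  sleep 1
done

if [ "$got" -eq "$wanted" ]; then
  echo "allocated $got hugepages"
else
  # On real hardware this is the case Dave mentions: pages could not be
  # moved/compacted, so the request was silently rounded down.
  echo "only $got of $wanted hugepages allocated" >&2
  exit 1
fi
```

Against a plain file the write always reads back as written; the failure branch only matters on real sysfs.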
(In reply to Dr. David Alan Gilbert from comment #8)
...
> This test case combines lots of different things; it has both vhost-user
> and vfio and iommu's;
> I think we probably need to try something simpler just with vhost-user in
> one test and then
> vfio/iommu's in another test to see which ones cause a problem.
> However, it would be good to file a bug for the case where the guest reboots.
> The error about 'cannot bind memory to host NUMA nodes', after you do:
>
> echo 20 >
> /sys/devices/system/node/node5/hugepages/hugepages-1048576kB/nr_hugepages
> please do:
> cat
> /sys/devices/system/node/node5/hugepages/hugepages-1048576kB/nr_hugepages
> and check it says 20; sometimes it takes some time, and sometimes it can't
> move pages to do it.
> If that doesn't help please file a separate bug on that (please check first
> if it is really SEV only).

I've filed Bug 1814502 to track the hugepage issue. It's not related to vhost-user; SEV plus hugepages alone is enough to trigger it.

> I suggest you try your current host configuration and qemu+ovs, but in the
> guest just run normal Linux network tests
> to send packets over the vhost-user network rather than using testpmd. That
> will tell us if normal vhost-user networking works
> without worrying about VFIO.

I've filed Bug 1814509 to track the guest reboot issue. Next, I'll run normal Linux network tests over vhost-user and update with the results.
Verified with qemu-kvm-4.2.0-15.module+el8.2.0+6029+618ef2ec.x86_64:

With the kernel driver, the SEV vhost-user network works well. The steps below test via libvirt, as this is the way it will be covered in QE testing.

1. Add "amd_iommu=on iommu=pt default_hugepagesz=1G" to the host kernel command line and reboot the host.

# cat /proc/cmdline
BOOT_IMAGE=(hd0,msdos1)/vmlinuz-4.18.0-191.el8.x86_64 root=/dev/mapper/rhel_hp--dl385g10--10-root ro crashkernel=auto resume=/dev/mapper/rhel_hp--dl385g10--10-swap rd.lvm.lv=rhel_hp-dl385g10-10/root rd.lvm.lv=rhel_hp-dl385g10-10/swap console=ttyS0,115200n81 iommu=pt amd_iommu=on default_hugepagesz=1G

2. Enable SEV in kvm_amd

# modprobe -r kvm_amd
# modprobe kvm_amd sev=1

3. Check which NUMA node the NICs are in; in this setup, it's NUMA node 5.

# hwloc-ls
Machine (126GB total)
  Package L#0
    NUMANode L#0 (P#0 31GB)
...
    NUMANode L#5 (P#5 31GB)
      L3 L#10 (4096KB) + L2 L#10 (512KB) + L1d L#10 (32KB) + L1i L#10 (64KB) + Core L#10
        PU L#20 (P#10)
        PU L#21 (P#26)
      L3 L#11 (4096KB) + L2 L#11 (512KB) + L1d L#11 (32KB) + L1i L#11 (64KB) + Core L#11
        PU L#22 (P#11)
        PU L#23 (P#27)
    HostBridge L#5
      PCIBridge
        2 x { PCI 8086:1528 }
...

4. Reserve hugepages from NUMA node 5

# echo 20 > /sys/devices/system/node/node5/hugepages/hugepages-1048576kB/nr_hugepages
# echo 10 > /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages

5. Bind the NIC to vfio-pci

# modprobe vfio
# modprobe vfio-pci
# dpdk-devbind --bind=vfio-pci 0000:a3:00.0
# dpdk-devbind --status
Network devices using DPDK-compatible driver
============================================
0000:a3:00.0 'Ethernet Controller 10-Gigabit X540-AT2 1528' drv=vfio-pci unused=ixgbe
...

6. Boot OVS, reserving hugepages from NUMA node 5.
# cat boot_ovs_client.sh
#!/bin/bash
set -e

echo "killing old ovs process"
pkill -f ovs-vswitchd || true
sleep 5
pkill -f ovsdb-server || true

echo "probing ovs kernel module"
modprobe -r openvswitch || true
modprobe openvswitch

echo "clean env"
DB_FILE=/etc/openvswitch/conf.db
rm -rf /var/run/openvswitch
mkdir /var/run/openvswitch
rm -f $DB_FILE

echo "init ovs db and boot db server"
export DB_SOCK=/var/run/openvswitch/db.sock
ovsdb-tool create /etc/openvswitch/conf.db /usr/share/openvswitch/vswitch.ovsschema
ovsdb-server --remote=punix:$DB_SOCK --remote=db:Open_vSwitch,Open_vSwitch,manager_options --pidfile --detach --log-file
ovs-vsctl --no-wait init

echo "start ovs vswitch daemon"
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem="1024,0,0,0,0,1024"
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask="0x1"
ovs-vsctl --no-wait set Open_vSwitch . other_config:vhost-iommu-support=true
ovs-vswitchd unix:$DB_SOCK --pidfile --detach --log-file=/var/log/openvswitch/ovs-vswitchd.log

echo "creating bridge and ports"
ovs-vsctl --if-exists del-br ovsbr0
ovs-vsctl add-br ovsbr0 -- set bridge ovsbr0 datapath_type=netdev
ovs-vsctl add-port ovsbr0 dpdk0 -- set Interface dpdk0 type=dpdk options:dpdk-devargs=0000:a3:00.0
ovs-vsctl add-port ovsbr0 vhost-user0 -- set Interface vhost-user0 type=dpdkvhostuserclient options:vhost-server-path=/tmp/vhostuser0.sock
ovs-ofctl del-flows ovsbr0
ovs-ofctl add-flow ovsbr0 "in_port=1,idle_timeout=0 actions=output:2"
ovs-ofctl add-flow ovsbr0 "in_port=2,idle_timeout=0 actions=output:1"
ovs-vsctl set Open_vSwitch . other_config={}
ovs-vsctl set Open_vSwitch . other_config:dpdk-lcore-mask=0x1
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0xC00
ovs-vsctl set Interface dpdk0 options:n_rxq=1

echo "all done"

# ovs-vsctl show
4051b115-51b7-44fc-a1ea-5d54296876f9
    Bridge ovsbr0
        datapath_type: netdev
        Port ovsbr0
            Interface ovsbr0
                type: internal
        Port vhost-user0
            Interface vhost-user0
                type: dpdkvhostuserclient
                options: {vhost-server-path="/tmp/vhostuser0.sock"}
        Port dpdk0
            Interface dpdk0
                type: dpdk
                options: {dpdk-devargs="0000:a3:00.0", n_rxq="1"}

7. Boot a QEMU SEV guest with 1 vhost-user port. The full XML will be attached in the next comment.

8. In the guest, set a temporary IP on the vhost-user NIC.

# ifconfig enp6s0 192.168.1.1/24

9. Do a ping test with another host. (In this setup, the two hosts are connected back-to-back.) Ping works; however, there is 30% packet loss. This issue also exists without SEV, so it is not an SEV issue. I'll confirm and possibly file another new bug to track the packet loss.

# ping 192.168.1.2 -c 20 -i 0.1
PING 192.168.1.2 (192.168.1.2) 56(84) bytes of data.
64 bytes from 192.168.1.2: icmp_seq=2 ttl=64 time=0.118 ms
64 bytes from 192.168.1.2: icmp_seq=3 ttl=64 time=0.149 ms
64 bytes from 192.168.1.2: icmp_seq=5 ttl=64 time=0.141 ms
64 bytes from 192.168.1.2: icmp_seq=6 ttl=64 time=0.147 ms
64 bytes from 192.168.1.2: icmp_seq=7 ttl=64 time=0.107 ms
64 bytes from 192.168.1.2: icmp_seq=9 ttl=64 time=0.140 ms
64 bytes from 192.168.1.2: icmp_seq=10 ttl=64 time=0.145 ms
64 bytes from 192.168.1.2: icmp_seq=11 ttl=64 time=0.150 ms
64 bytes from 192.168.1.2: icmp_seq=13 ttl=64 time=0.137 ms
64 bytes from 192.168.1.2: icmp_seq=14 ttl=64 time=0.147 ms
64 bytes from 192.168.1.2: icmp_seq=16 ttl=64 time=0.124 ms
64 bytes from 192.168.1.2: icmp_seq=17 ttl=64 time=0.088 ms
64 bytes from 192.168.1.2: icmp_seq=18 ttl=64 time=0.108 ms
64 bytes from 192.168.1.2: icmp_seq=20 ttl=64 time=0.134 ms

--- 192.168.1.2 ping statistics ---
20 packets transmitted, 14 received, 30% packet loss, time 979ms
rtt min/avg/max/mdev = 0.088/0.131/0.150/0.019 ms

Other versions info:
4.18.0-191.el8.x86_64
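As a sanity check on the statistics line, the loss percentage ping reports is simply (transmitted - received) / transmitted; a quick recomputation from the summary line above:

```shell
#!/bin/sh
# Recompute the loss percentage from ping's own summary line.
summary="20 packets transmitted, 14 received, 30% packet loss, time 979ms"
tx=$(echo "$summary" | awk '{print $1}')   # transmitted count (field 1)
rx=$(echo "$summary" | awk '{print $4}')   # received count (field 4)
loss=$(( (tx - rx) * 100 / tx ))
echo "${loss}% packet loss"                # prints: 30% packet loss
```

So 6 of 20 ICMP requests (sequence numbers 1, 4, 8, 12, 15, 19) never got a reply.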
qemu-kvm-4.2.0-15.module+el8.2.0+6029+618ef2ec.x86_64
python3-libvirt-6.0.0-1.module+el8.2.0+5453+31b2b136.x86_64
openvswitch2.13-2.13.0-9.el8fdp.x86_64
Created attachment 1673074 [details]
SEV guest with vhost-user
Per Comment 10, I would like to move this bug to VERIFIED. David, please let me know if you have any concerns about the verification in Comment 10.
Yep, that looks good to me - thanks! We've got a couple of good test failures out of this; just not the ones I was expecting!
(In reply to Pei Zhang from comment #10)
[...]
>
> 9. Do ping testing with another host.(In this setup, two hosts are connected
> back-to-back). Ping works. Hoever there are 30% packets loss. But without
> SEV, this issue also exists. So this is not SEV issue. I'll confirm and
> possibly file another new bug to track this packets loss issue.

Update: This ping loss is expected in the scenario "vhost-user + vIOMMU + kernel virtio-net driver in guest". Maxime explained this in Bug 1572879#c13:

"This issue happens when using vhost-user with vIOMMU enabled and with the kernel virtio-net driver in the guest. This combination is not recommended for performance reasons[0], but the problem is real and should be fixed in the future. I think we can put a low priority on this bug.

[0]: This setup is not recommended because, when using the kernel driver in the guest, performance with vIOMMU enabled is very bad: the kernel driver's dynamic mapping creates huge overhead on the vhost-user backend side, as every packet results in an IOTLB cache miss. Using vIOMMU with the vhost-user backend is recommended when using the DPDK virtio PMD in the guest; in that case it provides the guest with user-application/kernel isolation, while the overhead is close to nil."
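Maxime's point about per-packet IOTLB misses can be illustrated with a toy model. This is not OVS or vhost code, just a sketch of why a static mapping amortizes translations across packets while a dynamic mapping misses on nearly every one:

```shell
#!/usr/bin/env bash
# Toy IOTLB: count translation-cache misses for two guest mapping patterns.
declare -A iotlb
misses=0

lookup() {   # one guest-physical page translation per packet
  local page=$1
  if [[ -z ${iotlb[$page]} ]]; then
    iotlb[$page]=1              # miss: backend must fetch the mapping via the slow path
    misses=$((misses + 1))
  fi
}

# Static mapping (DPDK virtio PMD in guest): packets reuse a small pool of
# hugepage-backed buffers, so translations are cached after the first use.
for i in $(seq 1 100); do lookup $((i % 4)); done
static_misses=$misses
echo "static mapping:  $static_misses misses / 100 packets"

# Dynamic mapping (kernel virtio-net in guest): each packet lands in a
# freshly mapped page, so almost every lookup is a miss.
unset iotlb; declare -A iotlb; misses=0
for i in $(seq 1 100); do lookup "$i"; done
dynamic_misses=$misses
echo "dynamic mapping: $dynamic_misses misses / 100 packets"
```

The 4-page pool and 100-packet counts are arbitrary illustration values; the real backend cost per miss is a round trip to QEMU for the translation, which is what makes the kernel-driver case slow.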
After an email discussion with David: since there is packet loss in the testing in Comment 10, we should fail ON_QA. Moving back to ASSIGNED status.
OK, given that SEV has some problems with vhost-user due to the IOMMU behaviour, I'm turning this from a test-only bug into a full bug. However, I'm not sure we need it to work with vhost-user imminently, so let's leave this bug on the backlog.
Based on comment 16, adding Triaged so that it's really in the backlog
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.