Bug 1797058
| Summary: | AMD/SEV: vhost-user support | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux Advanced Virtualization | Reporter: | Dr. David Alan Gilbert <dgilbert> |
| Component: | qemu-kvm | Assignee: | Virtualization Maintenance <virt-maint> |
| qemu-kvm sub component: | Devices | QA Contact: | Pei Zhang <pezhang> |
| Status: | CLOSED WONTFIX | Docs Contact: | |
| Severity: | low | | |
| Priority: | low | CC: | ailan, chayang, coli, jinzhao, juzhang, maxime.coquelin, virt-maint, yanghliu |
| Version: | 8.2 | Keywords: | Triaged |
| Target Milestone: | rc | Flags: | pm-rhel: mirror+ |
| Target Release: | 8.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-07-31 07:27:16 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Description
Dr. David Alan Gilbert
2020-01-31 20:00:57 UTC
QEMU has recently been split into sub-components and, as a one-time operation to avoid breaking tools, we are setting the QEMU sub-component of this BZ to "General". Please review and change the sub-component if necessary the next time you review this BZ. Thanks.

Testing update: the SEV guest can boot with vhost-user. Next I'll test whether vhost-user can receive MoonGen packets. (The MoonGen tests need a back-to-back NIC connection, and the NUMA node where the NICs are located currently has no memory, so I have asked IT to re-cable the machines to satisfy these specific hardware requirements. I'll update the final results soon.)

Testing summary: SEV does not support vhost-user well.
== Testing result:
1. Sometimes the guest boots successfully, but starting DPDK's testpmd in the guest makes the guest reboot. With the testing steps below, the guest always reboots after step 8 in this case.
2. Sometimes the guest fails to boot with the error below. With the testing steps below, the guest fails to boot at step 7.
qemu-kvm: -object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/dev/hugepages,share=yes,size=4G,host-nodes=5,policy=bind: cannot bind memory to host NUMA nodes: Input/output error
== Testing highlight:
1. Hugepages are a must for vhost-user testing, so we have to reserve hugepages for this testing.
2. We need to use memory and cores from the NUMA node where the NICs are located. In our setup, NUMA node 5 satisfies this requirement, so we use NUMA node 5 (a quick way to check is sketched after this list).
3. Without SEV, vhost-user works very well: there are no errors and MoonGen receives packets from the guest's DPDK testpmd without problems. So we can be confident that SEV is the key factor behind the two results above.
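Besides hwloc-ls (used in step 3 of the testing steps), a quick way to confirm which NUMA node a NIC sits on is a generic sysfs query; the PCI address here is the one from this setup and should report node 5:
# cat /sys/bus/pci/devices/0000:a3:00.0/numa_node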
== Testing steps:
1. Add "amd_iommu=on iommu=pt default_hugepagesz=1G" in host kernel and reboot host.
# cat /proc/cmdline
BOOT_IMAGE=(hd0,msdos1)/vmlinuz-4.18.0-184.el8.x86_64 root=/dev/mapper/rhel_hp--dl385g10--10-root ro crashkernel=auto resume=/dev/mapper/rhel_hp--dl385g10--10-swap rd.lvm.lv=rhel_hp-dl385g10-10/root rd.lvm.lv=rhel_hp-dl385g10-10/swap console=ttyS0,115200n81 amd_iommu=on iommu=pt default_hugepagesz=1G
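One way to add these arguments persistently on RHEL 8 is via grubby; this is a sketch and assumes grubby manages your boot entries:
# grubby --update-kernel=ALL --args="amd_iommu=on iommu=pt default_hugepagesz=1G"
# reboot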
2. Enable SEV in kvm_amd
# modprobe -r kvm_amd
# modprobe kvm_amd sev=1
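To confirm SEV really is enabled after reloading the module, the parameter can be read back; depending on the kernel the value is reported as 1 or Y:
# cat /sys/module/kvm_amd/parameters/sev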
3. Check which NUMA node the NICs are in; in this setup, they are in NUMA node 5.
# hwloc-ls
Machine (126GB total)
Package L#0
NUMANode L#0 (P#0 31GB)
...
NUMANode L#5 (P#5 31GB)
L3 L#10 (4096KB) + L2 L#10 (512KB) + L1d L#10 (32KB) + L1i L#10 (64KB) + Core L#10
PU L#20 (P#10)
PU L#21 (P#26)
L3 L#11 (4096KB) + L2 L#11 (512KB) + L1d L#11 (32KB) + L1i L#11 (64KB) + Core L#11
PU L#22 (P#11)
PU L#23 (P#27)
HostBridge L#5
PCIBridge
2 x { PCI 8086:1528 }
...
4. Reserve hugepages on NUMA node 5 (and node 0)
# echo 20 > /sys/devices/system/node/node5/hugepages/hugepages-1048576kB/nr_hugepages
# echo 10 > /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages
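It is worth reading the counters back to make sure the pages were really allocated (this is the same check David suggests later in this bug); allocation can take a moment or fail when memory is fragmented, so the values should come back as 20 and 10:
# cat /sys/devices/system/node/node5/hugepages/hugepages-1048576kB/nr_hugepages
# cat /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages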
5. Bind NICs to VFIO
# modprobe vfio
# modprobe vfio-pci
# dpdk-devbind --bind=vfio-pci 0000:a3:00.0
# dpdk-devbind --bind=vfio-pci 0000:a3:00.1
# dpdk-devbind --status
Network devices using DPDK-compatible driver
============================================
0000:a3:00.0 'Ethernet Controller 10-Gigabit X540-AT2 1528' drv=vfio-pci unused=ixgbe
0000:a3:00.1 'Ethernet Controller 10-Gigabit X540-AT2 1528' drv=vfio-pci unused=ixgbe
6. Boot OVS and reserve hugepages from NUMA node 5. The full boot script is attached in the next comment.
...
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem="1024,0,0,0,0,1024"
...
# ovs-vsctl show
ee576f79-4200-49cc-8a47-3548d489ffa8
Bridge ovsbr0
datapath_type: netdev
Port dpdk0
Interface dpdk0
type: dpdk
options: {dpdk-devargs="0000:a3:00.0", n_rxq="1"}
Port ovsbr0
Interface ovsbr0
type: internal
Port vhost-user0
Interface vhost-user0
type: dpdkvhostuserclient
options: {vhost-server-path="/tmp/vhostuser0.sock"}
Bridge ovsbr1
datapath_type: netdev
Port ovsbr1
Interface ovsbr1
type: internal
Port dpdk1
Interface dpdk1
type: dpdk
options: {dpdk-devargs="0000:a3:00.1", n_rxq="1"}
Port vhost-user1
Interface vhost-user1
type: dpdkvhostuserclient
options: {vhost-server-path="/tmp/vhostuser1.sock"}
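Note on the socket direction: the ports above are of type dpdkvhostuserclient, so OVS is the vhost-user client and connects to /tmp/vhostuser0.sock and /tmp/vhostuser1.sock, while QEMU creates those sockets as the server (hence the "server" option on the chardevs in step 7). After QEMU starts, a simple listing confirms both server sockets exist:
# ls -l /tmp/vhostuser0.sock /tmp/vhostuser1.sock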
7. Boot QEMU with 2 vhost-user ports and SEV enabled
/usr/libexec/qemu-kvm \
-enable-kvm \
-cpu EPYC \
-smp 4 \
-m 4G \
-object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/dev/hugepages,share=yes,size=4G,host-nodes=5,policy=bind \
-numa node,nodeid=0,cpus=0-3,memdev=ram-node0 \
-object sev-guest,id=sev0,cbitpos=47,reduced-phys-bits=1 \
-machine q35,memory-encryption=sev0 \
-drive if=pflash,format=raw,unit=0,file=/usr/share/edk2/ovmf/sev/OVMF_CODE.secboot.fd,readonly \
-drive if=pflash,format=raw,unit=1,file=/usr/share/edk2/ovmf/sev/OVMF_VARS.fd \
-device pcie-root-port,id=root.1,chassis=1 \
-device pcie-root-port,id=root.2,chassis=2 \
-device pcie-root-port,id=root.3,chassis=3 \
-device pcie-root-port,id=root.4,chassis=4 \
-device pcie-root-port,id=root.5,chassis=5 \
-device virtio-scsi-pci,iommu_platform=on,id=scsi0,bus=root.1,addr=0x0 \
-drive file=/home/sev_guest.qcow2,format=qcow2,if=none,id=drive-scsi0-0-0-0 \
-device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scssi0-0-0-0,bootindex=1 \
-netdev tap,id=hostnet0,vhost=off \
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=18:66:da:57:dd:12,bus=root.2,iommu_platform=on \
-vnc :0 \
-monitor stdio \
-chardev socket,id=charnet1,path=/tmp/vhostuser0.sock,server \
-netdev vhost-user,chardev=charnet1,id=hostnet1 \
-device virtio-net-pci,netdev=hostnet1,id=net1,mac=18:66:da:5f:dd:02,bus=root.3,iommu_platform=on \
-chardev socket,id=charnet2,path=/tmp/vhostuser1.sock,server \
-netdev vhost-user,chardev=charnet2,id=hostnet2 \
-device virtio-net-pci,netdev=hostnet2,id=net2,mac=18:66:da:5f:dd:03,bus=root.4,iommu_platform=on
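If the guest does come up, one way to confirm that SEV is actually active inside it is to look for the memory-encryption message in the guest kernel log; the exact wording varies between kernel versions:
# dmesg | grep -i "memory encryption"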
8. In guest, start testpmd.
# modprobe vfio enable_unsafe_noiommu_mode=Y
# modprobe vfio-pci
# cat /sys/module/vfio/parameters/enable_unsafe_noiommu_mode
Y
# dpdk-devbind --bind=vfio-pci 0000:03:00.0
# dpdk-devbind --bind=vfio-pci 0000:04:00.0
# echo 1 > /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages
# /usr/bin/testpmd \
-l 1,2,3 \
-n 4 \
-d /usr/lib64/librte_pmd_virtio.so \
-w 0000:03:00.0 -w 0000:04:00.0 \
-- \
--nb-cores=2 \
-i \
--disable-rss \
--rxd=512 --txd=512 \
--rxq=1 --txq=1
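For reference, once testpmd reaches its interactive prompt, the usual follow-up is to start forwarding and read the port counters; this is standard testpmd usage rather than anything specific to this bug:
testpmd> start
testpmd> show port stats all
testpmd> stop
testpmd> quit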
Created attachment 1670725
ovs boot script
More info:
Testing versions:
4.18.0-184.el8.x86_64
qemu-kvm-4.2.0-12.module+el8.2.0+5858+afd073bc.x86_64
openvswitch2.13-2.13.0-0.20200117git8ae6a5f.el8fdp.1.x86_64
dpdk-19.11-4.el8.x86_64
Testing servers:
hp-dl385g10-10.lab.eng.pek2.redhat.com
Testing NICs:
X540-AT2 (10G, ixgbe)
Hi David,

Could you check the testing in Comment 5 and Comment 6, please? Do you have any comments?

Also, if we plan to support vhost-user with SEV in the future, I think we need to file new bugs to track the issues in Comment 5. Please let me know if we need to file them and I can do so.

Thank you.

Best regards,
Pei

(In reply to Pei Zhang from comment #7)
> Hi David,
>
> Could you check the testing in Comment 5 and Comment 6, please? Do you have
> any comments?
>
> Also, if we plan to support vhost-user with SEV in the future, I think we
> need to file new bugs to track the issues in Comment 5. Please let me know
> if we need to file them and I can do so.
>
> Thank you.
>
> Best regards,
>
> Pei

This test case combines lots of different things; it has vhost-user, VFIO and IOMMUs. I think we probably need to try something simpler, with just vhost-user in one test and VFIO/IOMMU in another test, to see which ones cause a problem. However, it would be good to file a bug for the case where the guest reboots.

For the error about 'cannot bind memory to host NUMA nodes', after you do:

echo 20 > /sys/devices/system/node/node5/hugepages/hugepages-1048576kB/nr_hugepages

please do:

cat /sys/devices/system/node/node5/hugepages/hugepages-1048576kB/nr_hugepages

and check it says 20; sometimes it takes some time, and sometimes it can't move pages to do it. If that doesn't help, please file a separate bug on that (please check first whether it is really SEV-only).

I suggest you try your current host configuration and qemu+ovs, but in the guest just run normal Linux network tests to send packets over the vhost-user network rather than using testpmd. That will tell us if normal vhost-user networking works without worrying about VFIO.

Dave

(In reply to Dr. David Alan Gilbert from comment #8)
...
> This test case combines lots of different things; it has vhost-user, VFIO
> and IOMMUs. I think we probably need to try something simpler, with just
> vhost-user in one test and VFIO/IOMMU in another test, to see which ones
> cause a problem. However, it would be good to file a bug for the case where
> the guest reboots.
>
> For the error about 'cannot bind memory to host NUMA nodes', after you do:
>
> echo 20 > /sys/devices/system/node/node5/hugepages/hugepages-1048576kB/nr_hugepages
>
> please do:
>
> cat /sys/devices/system/node/node5/hugepages/hugepages-1048576kB/nr_hugepages
>
> and check it says 20; sometimes it takes some time, and sometimes it can't
> move pages to do it. If that doesn't help, please file a separate bug on
> that (please check first whether it is really SEV-only).

I've filed Bug 1814502 to track the hugepage issue. It is not related to vhost-user; SEV plus hugepages alone can cause it.

> I suggest you try your current host configuration and qemu+ovs, but in the
> guest just run normal Linux network tests to send packets over the
> vhost-user network rather than using testpmd. That will tell us if normal
> vhost-user networking works without worrying about VFIO.

I've filed Bug 1814509 to track the guest reboot issue. Next, I'll run normal Linux network tests over vhost-user. The testing results will be updated.

Verified with qemu-kvm-4.2.0-15.module+el8.2.0+6029+618ef2ec.x86_64:
With the kernel virtio-net driver, the SEV vhost-user network works well. The steps below were tested through libvirt, as this is the way QE testing will cover it.
1. Add "amd_iommu=on iommu=pt default_hugepagesz=1G" in host kernel and reboot host.
# cat /proc/cmdline
BOOT_IMAGE=(hd0,msdos1)/vmlinuz-4.18.0-191.el8.x86_64 root=/dev/mapper/rhel_hp--dl385g10--10-root ro crashkernel=auto resume=/dev/mapper/rhel_hp--dl385g10--10-swap rd.lvm.lv=rhel_hp-dl385g10-10/root rd.lvm.lv=rhel_hp-dl385g10-10/swap console=ttyS0,115200n81 iommu=pt amd_iommu=on default_hugepagesz=1G
2. Enable SEV in kvm_amd
# modprobe -r kvm_amd
# modprobe kvm_amd sev=1
3. Check which NUMA node the NICs are in; in this setup, they are in NUMA node 5.
# hwloc-ls
Machine (126GB total)
Package L#0
NUMANode L#0 (P#0 31GB)
...
NUMANode L#5 (P#5 31GB)
L3 L#10 (4096KB) + L2 L#10 (512KB) + L1d L#10 (32KB) + L1i L#10 (64KB) + Core L#10
PU L#20 (P#10)
PU L#21 (P#26)
L3 L#11 (4096KB) + L2 L#11 (512KB) + L1d L#11 (32KB) + L1i L#11 (64KB) + Core L#11
PU L#22 (P#11)
PU L#23 (P#27)
HostBridge L#5
PCIBridge
2 x { PCI 8086:1528 }
...
4. Reserve hugepages on NUMA node 5 (and node 0)
# echo 20 > /sys/devices/system/node/node5/hugepages/hugepages-1048576kB/nr_hugepages
# echo 10 > /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages
5. Bind NICs to VFIO
# modprobe vfio
# modprobe vfio-pci
# dpdk-devbind --bind=vfio-pci 0000:a3:00.0
# dpdk-devbind --status
Network devices using DPDK-compatible driver
============================================
0000:a3:00.0 'Ethernet Controller 10-Gigabit X540-AT2 1528' drv=vfio-pci unused=ixgbe
...
6. Boot OVS and reserve hugepages from NUMA node 5.
# cat boot_ovs_client.sh
#!/bin/bash
set -e
echo "killing old ovs process"
pkill -f ovs-vswitchd || true
sleep 5
pkill -f ovsdb-server || true
echo "probing ovs kernel module"
modprobe -r openvswitch || true
modprobe openvswitch
echo "clean env"
DB_FILE=/etc/openvswitch/conf.db
rm -rf /var/run/openvswitch
mkdir /var/run/openvswitch
rm -f $DB_FILE
echo "init ovs db and boot db server"
export DB_SOCK=/var/run/openvswitch/db.sock
ovsdb-tool create /etc/openvswitch/conf.db /usr/share/openvswitch/vswitch.ovsschema
ovsdb-server --remote=punix:$DB_SOCK --remote=db:Open_vSwitch,Open_vSwitch,manager_options --pidfile --detach --log-file
ovs-vsctl --no-wait init
echo "start ovs vswitch daemon"
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem="1024,0,0,0,0,1024"
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask="0x1"
ovs-vsctl --no-wait set Open_vSwitch . other_config:vhost-iommu-support=true
ovs-vswitchd unix:$DB_SOCK --pidfile --detach --log-file=/var/log/openvswitch/ovs-vswitchd.log
echo "creating bridge and ports"
ovs-vsctl --if-exists del-br ovsbr0
ovs-vsctl add-br ovsbr0 -- set bridge ovsbr0 datapath_type=netdev
ovs-vsctl add-port ovsbr0 dpdk0 -- set Interface dpdk0 type=dpdk options:dpdk-devargs=0000:a3:00.0
ovs-vsctl add-port ovsbr0 vhost-user0 -- set Interface vhost-user0 type=dpdkvhostuserclient options:vhost-server-path=/tmp/vhostuser0.sock
ovs-ofctl del-flows ovsbr0
ovs-ofctl add-flow ovsbr0 "in_port=1,idle_timeout=0 actions=output:2"
ovs-ofctl add-flow ovsbr0 "in_port=2,idle_timeout=0 actions=output:1"
ovs-vsctl set Open_vSwitch . other_config={}
ovs-vsctl set Open_vSwitch . other_config:dpdk-lcore-mask=0x1
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0xC00
ovs-vsctl set Interface dpdk0 options:n_rxq=1
echo "all done"
# ovs-vsctl show
4051b115-51b7-44fc-a1ea-5d54296876f9
Bridge ovsbr0
datapath_type: netdev
Port ovsbr0
Interface ovsbr0
type: internal
Port vhost-user0
Interface vhost-user0
type: dpdkvhostuserclient
options: {vhost-server-path="/tmp/vhostuser0.sock"}
Port dpdk0
Interface dpdk0
type: dpdk
options: {dpdk-devargs="0000:a3:00.0", n_rxq="1"}
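Before booting the guest it can be useful to confirm that the PMD threads picked up the dpdk0 receive queue, and, once QEMU is running, that the vhost-user interface connected; both are standard OVS-DPDK commands, shown here as a sketch:
# ovs-appctl dpif-netdev/pmd-rxq-show
# ovs-vsctl get Interface vhost-user0 status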
7. Boot the QEMU SEV guest with 1 vhost-user port. The full XML will be attached in the next comment.
8. In the guest, set a temporary IP on the vhost-user NIC (the iproute2 equivalent is sketched below).
# ifconfig enp6s0 192.168.1.1/24
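ifconfig comes from the legacy net-tools package; on a minimal RHEL 8 guest the iproute2 equivalent may be needed instead (a sketch, assuming the same interface name):
# ip link set enp6s0 up
# ip addr add 192.168.1.1/24 dev enp6s0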
9. Do ping testing with another host (in this setup, the two hosts are connected back-to-back). Ping works; however, there is 30% packet loss. This issue also exists without SEV, so it is not an SEV issue. I'll confirm and possibly file another new bug to track this packet loss issue.
# ping 192.168.1.2 -c 20 -i 0.1
PING 192.168.1.2 (192.168.1.2) 56(84) bytes of data.
64 bytes from 192.168.1.2: icmp_seq=2 ttl=64 time=0.118 ms
64 bytes from 192.168.1.2: icmp_seq=3 ttl=64 time=0.149 ms
64 bytes from 192.168.1.2: icmp_seq=5 ttl=64 time=0.141 ms
64 bytes from 192.168.1.2: icmp_seq=6 ttl=64 time=0.147 ms
64 bytes from 192.168.1.2: icmp_seq=7 ttl=64 time=0.107 ms
64 bytes from 192.168.1.2: icmp_seq=9 ttl=64 time=0.140 ms
64 bytes from 192.168.1.2: icmp_seq=10 ttl=64 time=0.145 ms
64 bytes from 192.168.1.2: icmp_seq=11 ttl=64 time=0.150 ms
64 bytes from 192.168.1.2: icmp_seq=13 ttl=64 time=0.137 ms
64 bytes from 192.168.1.2: icmp_seq=14 ttl=64 time=0.147 ms
64 bytes from 192.168.1.2: icmp_seq=16 ttl=64 time=0.124 ms
64 bytes from 192.168.1.2: icmp_seq=17 ttl=64 time=0.088 ms
64 bytes from 192.168.1.2: icmp_seq=18 ttl=64 time=0.108 ms
64 bytes from 192.168.1.2: icmp_seq=20 ttl=64 time=0.134 ms
--- 192.168.1.2 ping statistics ---
20 packets transmitted, 14 received, 30% packet loss, time 979ms
rtt min/avg/max/mdev = 0.088/0.131/0.150/0.019 ms
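Beyond ping, a simple throughput check over the same link can be done with iperf3, in the spirit of the "normal Linux network tests" suggested earlier; this is a generic illustration rather than part of the recorded verification, and 192.168.1.2 is the peer host from the ping test above.
On the peer host:
# iperf3 -s
In the guest:
# iperf3 -c 192.168.1.2 -t 30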
Other versions info:
4.18.0-191.el8.x86_64
qemu-kvm-4.2.0-15.module+el8.2.0+6029+618ef2ec.x86_64
python3-libvirt-6.0.0-1.module+el8.2.0+5453+31b2b136.x86_64
openvswitch2.13-2.13.0-9.el8fdp.x86_64
Created attachment 1673074
SEV guest with vhost-user
As in Comment 10, I would like to move this bug to 'VERIFIED'. David, please let me know if you have any concerns about the verification in Comment 10.

Yep, that looks good to me - thanks! We've got a couple of good test failures out of this; just not the ones I was expecting!

(In reply to Pei Zhang from comment #10)
[...]
> 9. Do ping testing with another host (in this setup, the two hosts are
> connected back-to-back). Ping works; however, there is 30% packet loss.
> This issue also exists without SEV, so it is not an SEV issue. I'll confirm
> and possibly file another new bug to track this packet loss issue.

Update: This ping loss is expected with the scenario "vhost-user + vIOMMU + kernel virtio-net driver in guest". This was explained by Maxime in Bug 1572879#c13:

"This issue happens when using vhost-user with vIOMMU enabled and with the kernel virtio-net driver in the guest. This combination is not recommended for performance reasons[0], but the problem is real and should be fixed in the future. I think we can put a low priority on this bug.

[0]: This setup is not recommended because, when using the kernel driver in the guest, performance with vIOMMU enabled is very bad: the kernel driver does dynamic mapping, which creates huge overhead on the vhost-user backend side, as every packet results in an IOTLB cache miss. The use of vIOMMU with a vhost-user backend is recommended when using the DPDK virtio PMD in the guest; in that case it provides the guest with user-application and kernel isolation, while the overhead is close to nil."

After a mail discussion with David, since there is packet loss during the testing in Comment 10, we should fail ON_QA. Moving back to Assigned status.

OK, given that SEV has some problems with vhost-user due to the IOMMU behaviour, I'm turning this from a test-only bug into a full bug. However, I'm not sure we need it to work with vhost-user imminently, so let's leave this bug on the backlog.

Based on comment 16, adding Triaged so that it's really in the backlog.

After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.