Bug 1531543
| Field | Value |
|---|---|
| Summary | [RFE] add iommu support to virtio-gpu |
| Product | Red Hat Enterprise Linux 8 |
| Component | qemu-kvm |
| Version | 8.0 |
| Status | CLOSED ERRATA |
| Severity | medium |
| Priority | medium |
| Reporter | yafu <yafu> |
| Assignee | Gerd Hoffmann <kraxel> |
| QA Contact | Guo, Zhiyi <zhguo> |
| CC | areis, brijesh.singh, chayang, ddutile, dyuan, fjin, hpopal, jinzhao, juzhang, knoel, kraxel, lersek, marcandre.lureau, pezhang, qzhang, rbalakri, thomas.lendacky, virt-maint, wchadwic, zhguo, zpeng |
| Target Milestone | pre-dev-freeze |
| Target Release | 8.1 |
| Keywords | FutureFeature |
| Hardware | Unspecified |
| OS | Unspecified |
| Fixed In Version | qemu-kvm-2.12.0-81.module+el8.1.0+3619+dfe1ae01 |
| Clones | 1540181 (view as bug list) |
| Last Closed | 2019-11-05 20:46:46 UTC |
| Type | Feature Request |
| Bug Depends On | 1685552, 1738631, 1739291 |
Description (yafu, 2018-01-05 13:16:04 UTC)

Created attachment 1377489 [details]: domain xml
The kernel-side fix is in drm-misc (should land in the next merge window, i.e. 4.20, unless that'll be 5.0 ...). A pull request is pending for the QEMU-side fix; it should make it into QEMU 3.1.

---

Refer to https://bugzilla.redhat.com/show_bug.cgi?id=1501618#c20: AMD SEV needs virtio-vga to support IOMMU too. Without IOMMU, an SEV guest suffers repeated call traces and becomes unresponsive.

---

Scratch build (still building atm), please test: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=21632356

---

I guess we cannot check the behavior of this bug until the drm rebase is finished. Gerd, could you help to build a scratch build after the drm rebase is done? Thanks!

BR/
Zhiyi

---

(In reply to Guo, Zhiyi from comment #20)
> I guess we cannot check the behavior of this bug until drm rebase is
> finished..

RHEL-8, yes. Testing with a Fedora guest should work though; they have a new enough kernel.

---

Some regression seems to happen on the latest Fedora kernel, 5.1.5-300.fc30.x86_64. Using qemu-kvm-2.12.0-63.el8_0.1.bz1531543.3.x86_64 with sev options enabled, two issues are found:

1. Guest kernel call trace:

[ 2.721928] ? device_driver_attach+0x60/0x60
[ 2.721929] bus_for_each_dev+0x78/0xc0
[ 2.721931] bus_add_driver+0x14a/0x1e0
[ 2.721932] driver_register+0x6c/0xb0
[ 2.721933] ? 0xffffffffc03fe000
[ 2.721936] do_one_initcall+0x46/0x1c4
[ 2.721938] ? free_unref_page_commit+0x95/0x110
[ 2.721941] ? _cond_resched+0x15/0x30
[ 2.721943] ? kmem_cache_alloc_trace+0x154/0x1c0
[ 2.721945] ? do_init_module+0x23/0x210
[ 2.721946] do_init_module+0x5c/0x210
[ 2.721947] load_module+0x23de/0x2910
[ 2.721948] ? _cond_resched+0x15/0x30
[ 2.721950] ? __do_sys_init_module+0x162/0x190
[ 2.721951] __do_sys_init_module+0x162/0x190
[ 2.721953] do_syscall_64+0x5b/0x170
[ 2.721955] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 2.721956] RIP: 0033:0x7f6068e0ebae
[ 2.721959] Code: 48 8b 0d dd 42 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d aa 42 0c 00 f7 d8 64 89 01 48
[ 2.721959] RSP: 002b:00007ffdb3b4b068 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
[ 2.721960] RAX: ffffffffffffffda RBX: 00005642a1a89480 RCX: 00007f6068e0ebae
[ 2.721961] RDX: 00007f6068a6384d RSI: 000000000001d9be RDI: 00005642a2328a50
[ 2.721961] RBP: 00005642a2328a50 R08: 00005642a1a89e50 R09: 0000000000000006
[ 2.721962] R10: 0000000000000007 R11: 0000000000000246 R12: 00007f6068a6384d
[ 2.721962] R13: 0000000000000001 R14: 00005642a1a7c1e0 R15: 00005642a1a7e390
[ 2.721964] ---[ end trace f77acc239227cf53 ]---
[ 2.775462] fbcon: Deferring console take-over
[ 2.776320] virtio_gpu virtio0: fb0: DRM emulated frame buffer device

2.
Errors are printed repeatedly in dmesg:

[ 7.922289] fbcon: Taking over console
[ 7.923279] Console: switching to colour frame buffer device 128x48
[ 7.925859] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1203 (command 0x105)
[ 7.927510] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1203 (command 0x105)
[ 7.929083] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1203 (command 0x105)
[ 7.944523] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1203 (command 0x105)
[ 7.946081] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1203 (command 0x105)
[ 7.947606] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1203 (command 0x105)
[ 7.956789] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1203 (command 0x105)
[ 7.960513] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1203 (command 0x105)
[ 8.129292] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1203 (command 0x105)
[ 8.335154] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1203 (command 0x105)
[ 8.543169] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1203 (command 0x105)
[ 8.751152] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1203 (command 0x105)

The issues cannot be reproduced after removing the sev option from the guest.

QEMU command line used:

/usr/libexec/qemu-kvm \
-name guest=Fedora30,debug-threads=on \
-S \
-object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-15-Fedora30/master-key.aes \
-machine pc-q35-rhel7.6.0,accel=kvm,usb=off,smm=on,dump-guest-core=off,memory-encryption=sev0 \
-cpu EPYC-IBPB,x2apic=on,tsc-deadline=on,hypervisor=on,tsc_adjust=on,cmp_legacy=on,perfctr_core=on,virt-ssbd=on,monitor=off,svm=off \
-global driver=cfi.pflash01,property=secure,value=on \
-drive file=/usr/share/OVMF/OVMF_CODE.secboot.fd,if=pflash,format=raw,unit=0,readonly=on \
-drive file=/var/lib/libvirt/qemu/nvram/Fedora30_VARS.fd,if=pflash,format=raw,unit=1 \
-m 8192 \
-realtime mlock=off \
-smp 2,sockets=1,cores=1,threads=2 \
-uuid 6dcdf378-3786-40bf-9d8e-e53522b95822 \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,fd=29,server,nowait \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=utc,driftfix=slew \
-global kvm-pit.lost_tick_policy=delay \
-no-hpet \
-no-shutdown \
-global ICH9-LPC.disable_s3=1 \
-global ICH9-LPC.disable_s4=1 \
-boot strict=on \
-device pcie-root-port,port=0x10,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x2 \
-device pcie-root-port,port=0x11,chassis=2,id=pci.2,bus=pcie.0,addr=0x2.0x1 \
-device pcie-root-port,port=0x12,chassis=3,id=pci.3,bus=pcie.0,addr=0x2.0x2 \
-device pcie-root-port,port=0x13,chassis=4,id=pci.4,bus=pcie.0,addr=0x2.0x3 \
-device pcie-root-port,port=0x14,chassis=5,id=pci.5,bus=pcie.0,addr=0x2.0x4 \
-device pcie-root-port,port=0x15,chassis=6,id=pci.6,bus=pcie.0,addr=0x2.0x5 \
-device pcie-root-port,port=0x16,chassis=7,id=pci.7,bus=pcie.0,addr=0x2.0x6 \
-device qemu-xhci,p2=15,p3=15,id=usb,bus=pci.3,addr=0x0 \
-device virtio-scsi-pci,iommu_platform=on,id=scsi0,bus=pci.2,addr=0x0 \
-device virtio-serial-pci,id=virtio-serial0,iommu_platform=on,bus=pci.4,addr=0x0 \
-drive file=/home/fedora30.qcow2,format=qcow2,if=none,id=drive-scsi0-0-0-0,cache=none \
-device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1,write-cache=on \
-netdev tap,fd=30,id=hostnet0 \
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:85:41:b0,bus=pci.1,addr=0x0,romfile=,iommu_platform=on \
-chardev socket,id=charserial0,fd=31,server,nowait \
-device isa-serial,chardev=charserial0,id=serial0 \
-chardev socket,id=charchannel0,fd=32,server,nowait \
-device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 \
-device usb-tablet,id=input0,bus=usb.0,port=1 \
-vnc 0.0.0.0:0 \
-device virtio-vga,id=video0,max_outputs=1,bus=pcie.0,addr=0x1,iommu_platform=on,ats=on \
-device virtio-balloon-pci,id=balloon0,bus=pci.5,addr=0x0,iommu_platform=on \
-object rng-random,id=objrng0,filename=/dev/urandom \
-device virtio-rng-pci,rng=objrng0,id=rng0,iommu_platform=on,bus=pci.6,addr=0x0 \
-object sev-guest,id=sev0,cbitpos=47,reduced-phys-bits=1,policy=0x1 \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on

---

Hmm, the call trace for issue 1 is not complete; please see this one:

[ 2.661728] [drm] pci: virtio-vga detected at 0000:00:01.0
[ 2.665214] fb0: switching to virtiodrmfb from EFI VGA
[ 2.668179] virtio-pci 0000:00:01.0: vgaarb: deactivate vga console
[ 2.672353] [drm] virgl 3d acceleration not supported by host
[ 2.682343] [TTM] Zone kernel: Available graphics memory: 4074364 kiB
[ 2.683466] [TTM] Zone dma32: Available graphics memory: 2097152 kiB
[ 2.684484] [TTM] Initializing pool allocator
[ 2.694486] [TTM] Initializing DMA pool allocator
[ 2.696680] [drm] number of scanouts: 1
[ 2.697911] [drm] number of cap sets: 0
[ 2.699238] [drm] Initialized virtio_gpu 0.1.0 0 for virtio0 on minor 0
[ 2.701683] virtio-pci 0000:00:01.0: swiotlb buffer is full (sz: 2097152 bytes)
[ 2.703004] virtio_net virtio1 enp1s0: renamed from eth0
[ 2.703163] virtio-pci 0000:00:01.0: overflow 0x000080026ea00000+2097152 of DMA mask ffffffffffffffff bus mask 0
[ 2.705797] WARNING: CPU: 0 PID: 429 at kernel/dma/direct.c:43 report_addr+0x33/0x60
[ 2.707117] Modules linked in: crc32c_intel virtio_gpu(+) drm_kms_helper serio_raw ttm virtio_console virtio_net drm virtio_scsi net_failover failover qemu_fw_cfg
[ 2.709563] CPU: 0 PID: 429 Comm: systemd-udevd Tainted: G W 5.1.5-300.fc30.x86_64 #1
[ 2.711108] Hardware name: Red Hat KVM, BIOS 0.0.0 02/06/2015
[ 2.712117] RIP: 0010:report_addr+0x33/0x60
[ 2.712824] Code: 48 8b 87 30 02 00 00 48 89 34 24 48 85 c0 74 2d 4c 8b 00 b8 fe ff ff ff 49 39 c0 76 14 80 3d 6b cd 21 01 00 0f 84 b9 06 00 00
<0f> 0b 48 83 c4 08 c3 48 83 bf 40 02 00 00 00 74 ef eb e0 80 3d 4c
[ 2.716006] RSP: 0018:ffffa53b010bb8d0 EFLAGS: 00010246
[ 2.716904] RAX: 0000000000000000 RBX: 0000000000000101 RCX: 0000000000000000
[ 2.718147] RDX: ffff948bb7a1ce80 RSI: ffff948bb7a168c8 RDI: ffff948bb7a168c8
[ 2.719353] RBP: ffff948bb750b0b0 R08: ffff948bb7a168c8 R09: 00000000000002be
[ 2.720576] R10: ffffffffb09cc158 R11: ffffa53b010bb635 R12: 0000000000200000
[ 2.721842] R13: 0000000000000001 R14: 0000000000000000 R15: ffff948bae869000
[ 2.721845] FS: 00007f6067e0d940(0000) GS:ffff948bb7a00000(0000) knlGS:0000000000000000
[ 2.721846] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2.721847] CR2: 00007f6068c21fe0 CR3: 0000800270202000 CR4: 00000000003406f0
[ 2.721849] Call Trace:
[ 2.721855] dma_direct_map_page+0xdf/0xf0
[ 2.721858] dma_direct_map_sg+0x67/0xb0
[ 2.721866] virtio_gpu_object_attach+0x1fb/0x250 [virtio_gpu]
[ 2.721872] virtio_gpu_mode_dumb_create+0xaa/0xd0 [virtio_gpu]
[ 2.721884] drm_client_framebuffer_create+0xa1/0x220 [drm]
[ 2.721893] drm_fb_helper_generic_probe+0x4d/0x1f0 [drm_kms_helper]
[ 2.721899] __drm_fb_helper_initial_config_and_unlock+0x29b/0x430 [drm_kms_helper]
[ 2.721906] drm_fbdev_client_hotplug+0xea/0x160 [drm_kms_helper]
[ 2.721911] drm_fbdev_generic_setup+0x93/0x120 [drm_kms_helper]
[ 2.721915] virtio_gpu_probe+0xe8/0x100 [virtio_gpu]
[ 2.721918] virtio_dev_probe+0x142/0x1e0
[ 2.721921] really_probe+0xf9/0x3a0
[ 2.721923] driver_probe_device+0xb6/0x100
[ 2.721925] device_driver_attach+0x55/0x60
[ 2.721926] __driver_attach+0x8a/0x150
[ 2.721928] ? device_driver_attach+0x60/0x60
[ 2.721929] bus_for_each_dev+0x78/0xc0
[ 2.721931] bus_add_driver+0x14a/0x1e0
[ 2.721932] driver_register+0x6c/0xb0
[ 2.721933] ? 0xffffffffc03fe000
[ 2.721936] do_one_initcall+0x46/0x1c4
[ 2.721938] ? free_unref_page_commit+0x95/0x110
[ 2.721941] ? _cond_resched+0x15/0x30
[ 2.721943] ? kmem_cache_alloc_trace+0x154/0x1c0
[ 2.721945] ? do_init_module+0x23/0x210
[ 2.721946] do_init_module+0x5c/0x210
[ 2.721947] load_module+0x23de/0x2910
[ 2.721948] ? _cond_resched+0x15/0x30
[ 2.721950] ? __do_sys_init_module+0x162/0x190
[ 2.721951] __do_sys_init_module+0x162/0x190
[ 2.721953] do_syscall_64+0x5b/0x170
[ 2.721955] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 2.721956] RIP: 0033:0x7f6068e0ebae
[ 2.721959] Code: 48 8b 0d dd 42 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d aa 42 0c 00 f7 d8 64 89 01 48
[ 2.721959] RSP: 002b:00007ffdb3b4b068 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
[ 2.721960] RAX: ffffffffffffffda RBX: 00005642a1a89480 RCX: 00007f6068e0ebae
[ 2.721961] RDX: 00007f6068a6384d RSI: 000000000001d9be RDI: 00005642a2328a50
[ 2.721961] RBP: 00005642a2328a50 R08: 00005642a1a89e50 R09: 0000000000000006
[ 2.721962] R10: 0000000000000007 R11: 0000000000000246 R12: 00007f6068a6384d
[ 2.721962] R13: 0000000000000001 R14: 00005642a1a7c1e0 R15: 00005642a1a7e390
[ 2.721964] ---[ end trace f77acc239227cf53 ]---
[ 2.775462] fbcon: Deferring console take-over
[ 2.776320] virtio_gpu virtio0: fb0: DRM emulated frame buffer device

---

Fresh brew build: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=22016408

---

(In reply to Gerd Hoffmann from comment #24)
> fresh brew build
> https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=22016408

The build failed:

Error:
Problem: package systemtap-4.1-2.el8.x86_64 requires systemtap-client = 4.1-2.el8, but none of the providers can be installed
- package systemtap-client-4.1-2.el8.x86_64 requires systemtap-runtime = 4.1-2.el8, but none of the providers can be installed
- conflicting requests
- nothing provides libdyninstAPI.so.9.3()(64bit) needed by systemtap-runtime-4.1-2.el8.x86_64

---

[ 2.701683] virtio-pci 0000:00:01.0: swiotlb buffer is full (sz: 2097152 bytes)
[ 2.703163] virtio-pci 0000:00:01.0: overflow 0x000080026ea00000+2097152 of DMA mask
ffffffffffffffff bus mask 0
[ 2.705797] WARNING: CPU: 0 PID: 429 at kernel/dma/direct.c:43 report_addr+0x33/0x60
[ 2.721849] Call Trace:
[ 2.721855] dma_direct_map_page+0xdf/0xf0
[ 2.721858] dma_direct_map_sg+0x67/0xb0
[ 2.721866] virtio_gpu_object_attach+0x1fb/0x250 [virtio_gpu]
[ 2.721872] virtio_gpu_mode_dumb_create+0xaa/0xd0 [virtio_gpu]
[ 2.721884] drm_client_framebuffer_create+0xa1/0x220 [drm]

It falls over while trying to map the framebuffer for the console, which is 1024x768 @ 32bpp by default -> 3145728 bytes. That is larger than the whole swiotlb buffer size (see the first line). There is the "swiotlb=<size>" command line argument to configure the size. Does it help to make it larger?

---

From https://github.com/AMDESE/AMDSEV, here is a description of the swiotlb size:

"When SEV is enabled, all the DMA operations inside the guest are performed on the shared memory. Linux kernel uses SWIOTLB bounce buffer for DMA operations inside SEV guest. A guest panic will occur if kernel runs out of the SWIOTLB pool. Linux kernel default to 64MB SWIOTLB pool. It is recommended to increase the swiotlb pool size to 512MB. The swiotlb pool size can be increased in guest by appending the following in the grub.cfg file

Append the following in /etc/defaults/grub
GRUB_CMDLINE_LINUX_DEFAULT="....
swiotlb=262144"

Changing the size to 1GB (swiotlb=1048576) always causes a kernel panic (I tried 5 times):

[ 0.205306] WARNING: CPU: 0 PID: 0 at arch/x86/include/asm/pgalloc.h:146 phys_pud_init+0x31e/0x396
[ 0.205307] Modules linked in:
[ 0.205310] CPU: 0 PID: 0 Comm: swapper Not tainted 5.1.5-300.fc30.x86_64 #1
[ 0.205311] Hardware name: Red Hat KVM, BIOS 0.0.0 02/06/2015
[ 0.205313] RIP: 0010:phys_pud_init+0x31e/0x396
[ 0.205315] Code: 2b 3d e1 2d 7f 00 48 01 ef 48 0b 3d 2f 19 8a 00 48 83 cf 67 ff 14 25 10 e4 22 8d 48 8b 3b 48 89 c6 e8 5d 16 6d ff 85 c0 75 02 <0f> 0b 48 8b 3d 7c 91 87 00 4d 85 ff 75 0e 48 c7 c7 00 00 00 80 48
[ 0.205316] RSP: 0000:ffffffff8d203e10 EFLAGS: 00010046 ORIG_RAX: 0000000000000000
[ 0.205318] RAX: 0000000000000000 RBX: ffff89f906e01f68 RCX: ffffffffffffffff
[ 0.205319] RDX: 80008002400001e3 RSI: 0000800006e06067 RDI: 0000800006e06067
[ 0.205320] RBP: ffff89f986e06000 R08: 8000800000000163 R09: 0000000000000092
[ 0.205321] R10: ffffffff8d9c05b4 R11: 0000000000000007 R12: 0000000280000000
[ 0.205322] R13: 0000000280000000 R14: 0000000000000004 R15: 0000000000000000
[ 0.205328] FS: 0000000000000000(0000) GS:ffff89fb77a00000(0000) knlGS:0000000000000000
[ 0.205329] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.205330] CR2: ffff89fb7ffff000 CR3: 000080000620e000 CR4: 00000000000406b0
[ 0.205335] Call Trace:
[ 0.205340] kernel_physical_mapping_init+0xc9/0x259
[ 0.205344] early_set_memory_enc_dec+0x122/0x176
[ 0.205348] kvm_smp_prepare_boot_cpu+0x65/0x95
[ 0.205351] start_kernel+0x1cf/0x512
[ 0.205355] secondary_startup_64+0xa4/0xb0
[ 0.205358] ---[ end trace 372224d2e1643db5 ]---
[ 0.205369] WARNING: CPU: 0 PID: 0 at arch/x86/include/asm/pgalloc.h:87 phys_pmd_init+0x309/0x380
[ 0.205370] Modules linked in:
[ 0.205371] CPU: 0 PID: 0 Comm: swapper Tainted: G W 5.1.5-300.fc30.x86_64 #1
[ 0.205372] Hardware name: Red Hat KVM, BIOS 0.0.0 02/06/2015
[ 0.205373] RIP: 0010:phys_pmd_init+0x309/0x380
[ 0.205375] Code: 2b 3d 76 31 7f 00 4c 01 ef 48 0b 3d c4 1c 8a 00 48 83 cf 67 ff 14 25 00 e4 22 8d 48 8b 3b 48 89 c6 e8 d4 19 6d ff 85 c0 75 02 <0f> 0b 48 8b 3d 11 95 87 00 48 85 ed 75 0e 48 c7 c7 00 00 00 80 48
[ 0.205376] RSP: 0000:ffffffff8d203da8 EFLAGS: 00010046 ORIG_RAX: 0000000000000000
[ 0.205377] RAX: 0000000000000000 RBX: ffff89f906e06de8 RCX: ffffffffffffffff
[ 0.205378] RDX: 8000800277a001e3 RSI: 0000800006e07067 RDI: 0000800006e07067
[ 0.205379] RBP: 0000000000000000 R08: ffff89f906e07000 R09: 00000000000000a9
[ 0.205379] R10: ffffffff8d9c0c54 R11: 0000000000000007 R12: 0000000277c00000
[ 0.205380] R13: ffff89f986e07000 R14: 0000000277c00000 R15: 0000000277c00000
[ 0.205385] FS: 0000000000000000(0000) GS:ffff89fb77a00000(0000) knlGS:0000000000000000
[ 0.205386] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.205387] CR2: ffff89fb7ffff000 CR3: 000080000620e000 CR4: 00000000000406b0
[ 0.205390] Call Trace:
[ 0.205392] phys_pud_init+0x165/0x396
[ 0.205394] kernel_physical_mapping_init+0xc9/0x259
[ 0.205396] early_set_memory_enc_dec+0x122/0x176
[ 0.205398] kvm_smp_prepare_boot_cpu+0x65/0x95
[ 0.205400] start_kernel+0x1cf/0x512
[ 0.205402] secondary_startup_64+0xa4/0xb0
[ 0.205404] ---[ end trace 372224d2e1643db6 ]---
[ 0.205464] KVM setup async PF for cpu 0
[ 0.205474] kvm-stealtime: cpu 0, msr 277a23040
[ 0.205480] Built 1 zonelists, mobility grouping on. Total pages: 2058073
[ 0.205481] Policy zone: Normal
[ 0.205484] Kernel command line: BOOT_IMAGE=(hd0,gpt2)/vmlinuz-5.1.5-300.fc30.x86_64 root=/dev/mapper/fedora_bootp--73--227--160-root ro resume=/dev/mapper/fedora_bootp--73--227--160-swap rd.lvm.lv=fedora_bootp-73-227-160/root rd.lvm.lv=fedora_bootp-73-227-160/swap console=ttyS0,115200 swiotlb=1048576
[ 0.205604] software IO TLB: Cannot allocate buffer
...
[ 1.486031] Kernel panic - not syncing: Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer
[ 1.486652] CPU: 1 PID: 1 Comm: swapper/0 Tainted: G W 5.1.5-300.fc30.x86_64 #1
[ 1.486652] Hardware name: Red Hat KVM, BIOS 0.0.0 02/06/2015
[ 1.486652] Call Trace:
[ 1.486652] dump_stack+0x5c/0x80
[ 1.486652] panic+0x101/0x2a7
[ 1.486652] swiotlb_tbl_map_single.cold+0x28/0x28
[ 1.486652] swiotlb_map+0x65/0x190
[ 1.486652] ? vp_synchronize_vectors+0x60/0x60
[ 1.486652] ? virtrng_scan+0x30/0x30
[ 1.486652] dma_direct_map_page+0xbe/0xf0
[ 1.486652] virtqueue_add_inbuf+0x21a/0x670
[ 1.486652] virtio_read+0xae/0xd0
[ 1.486652] add_early_randomness+0x4f/0xc0
[ 1.486652] set_current_rng+0x4c/0x140
[ 1.486652] hwrng_register+0x161/0x180
[ 1.486652] virtrng_scan+0x15/0x30
[ 1.486652] virtio_dev_probe+0x174/0x1e0
[ 1.486652] really_probe+0xf9/0x3a0
[ 1.486652] driver_probe_device+0xb6/0x100
[ 1.486652] device_driver_attach+0x55/0x60
[ 1.486652] __driver_attach+0x8a/0x150
[ 1.486652] ? device_driver_attach+0x60/0x60
[ 1.486652] bus_for_each_dev+0x78/0xc0
[ 1.486652] bus_add_driver+0x14a/0x1e0
[ 1.486652] driver_register+0x6c/0xb0
[ 1.486652] ? hwrng_modinit+0x82/0x82
[ 1.486652] do_one_initcall+0x46/0x1c4
[ 1.486652] kernel_init_freeable+0x1a6/0x24d
[ 1.486652] ? rest_init+0xaa/0xaa
[ 1.486652] kernel_init+0xa/0x106
[ 1.486652] ret_from_fork+0x22/0x40
[ 1.486652] Kernel Offset: 0xb000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 1.486652] ---[ end Kernel panic - not syncing: Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer ]---

Changing the size to 512MB (swiotlb=524288) gives two results. In 5 trials, a kernel panic (logs identical to the 1GB swiotlb pool) happened 2 times.
For the rest of the trials, the behavior is the same as in comment 22.

Changing the size to 768MB (swiotlb=786432) behaves the same as 512MB.

---

(In reply to Guo, Zhiyi from comment #27)
> From the https://github.com/AMDESE/AMDSEV, here is a description about
> swiotlb size:
> "When SEV is enabled, all the DMA operations inside the guest are performed
> on the shared memory. Linux kernel uses SWIOTLB bounce buffer for DMA
> operations inside SEV guest. A guest panic will occur if kernel runs out of
> the SWIOTLB pool. Linux kernel default to 64MB SWIOTLB pool.

That seems not to be the case with the Fedora kernel, though; the error message (comment 26) indicates a 2MB pool.

> Change the size to 1GB(swiotlb=1048576) will always cause kernel panic(I
> tried 5 times):

1G is a bit excessive.

> Change the size to 512MB(swiotlb=524288) has two results. In 5 trials,
> kernel panic(logs are same as 1GB swiotlb pool) happen 2 times.
> For the rest of trials, the behaviors are same as comment 22

Hmm. Something seems wrong with the Fedora kernel; it is ignoring the request for a larger swiotlb ...

I guess we should wait for the 8.1 drm rebase then, instead of debugging Fedora.

> Building is failed:
> Error:
> Problem: package systemtap-4.1-2.el8.x86_64 requires systemtap-client =
> 4.1-2.el8, but none of the providers can be installed
> - package systemtap-client-4.1-2.el8.x86_64 requires systemtap-runtime =
> 4.1-2.el8, but none of the providers can be installed
> - conflicting requests
> - nothing provides libdyninstAPI.so.9.3()(64bit) needed by
> systemtap-runtime-4.1-2.el8.x86_64
Doesn't look related to my patches.
Guess I'll just try again tomorrow.
> Doesn't look related to my patches.
> Guess I'll just try again tomorrow.
Grr, now the build fails somewhere in gluster ...
(In reply to Gerd Hoffmann from comment #30)
> > Doesn't look related to my patches.
> > Guess I just try again tomorrow.
>
> Grr, now the build fails somewhere in gluster ...

Do we have a scratch build now?

BR/
Zhiyi, Guo

---

No, the build still fails in block/gluster.c (https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=22222441).

---

Gluster fixed, finally worked: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=22367789

---

(In reply to Gerd Hoffmann from comment #36)
> (In reply to Gerd Hoffmann from comment #34)
> > No, build still fails in block/gluster.c
> > (https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=22222441).
>
> Gluster fixed, finally worked:
> https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=22367789

Hmm... I forgot to pick the pkgs; can you help to rebuild them? Thanks!

BR/
Zhiyi, Guo

---

> Hmm..I forgot to pick the pkgs, can you help to rebuild them? Thanks!

https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=22489228
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=22490549
(one patch added).

---

Created attachment 1589386 [details]
ovmf log
Hmm... enabling SEV & IOMMU for all of the virtio devices I used, the RHEL 8.1 VM gets stuck in OVMF and cannot proceed.
Removing the sev option doesn't help here.
Uploading the OVMF log.
Hello Zhiyi,

(1) The log in comment 44 indicates that SEV was not enabled:

> InstallProtocolInterface: F8775D50-8ABD-4ADF-92AC-853E51F6C8DC 0

The GUID on that line stands for:

> InstallProtocolInterface: [IoMmuAbsentProtocol] 0

This is done by AmdSevDxe in OVMF if SEV is not enabled.

(2) When you use a virtio-vga device with OVMF, OVMF will not drive the device via virtio; it will drive the "VGA" part. That too should work with SEV, of course; I just wanted to clarify this.

If you specifically want to test VirtioGpuDxe in OVMF, you'll have to request the virtio-gpu-pci device on the QEMU cmdline. (Note that when using libvirt (with x86 guests), this is not possible; with x86 guests, "virtio" in the domain XML's video element maps to the "virtio-vga" device model only.)

(3) The log in comment 44 ends with:

> InstallProtocolInterface: [VirtioDeviceProtocol] 7E02ED20
> InstallProtocolInterface: [EfiSimpleNetworkProtocol] 7E02D028
> InstallProtocolInterface: [EfiDevicePathProtocol] 7E02F198
> InstallProtocolInterface: [EfiVlanConfigProtocol] 7E02D930
> InstallProtocolInterface: [EfiManagedNetworkServiceBindingProtocol] 7DCBBBC0
> InstallProtocolInterface: [EfiDevicePathProtocol] 7DCBBE18
> InstallProtocolInterface: [EfiHiiConfigAccessProtocol] 7DCBB120
> InstallProtocolInterface: [VlanConfigDxe] 7DCBB118
> InstallProtocolInterface: [EfiManagedNetworkProtocol] 7DCBB2C0

implying that the problem is related to the virtio-net NIC. The most recent QEMU command line is ~1 month old, from comment 22. Can you please include the cmdline that you used in comment 44?

Anyway, please note that we've discussed this stuff before:

- https://bugzilla.redhat.com/show_bug.cgi?id=1361286#c77
- https://bugzilla.redhat.com/show_bug.cgi?id=1361286#c79

Therefore, can you please ensure the following:

(3.1) The virtio devices should have the "disable-legacy=on" property.
(Spelling this out should not be necessary as long as the device is located in a PCI Express Root Port, or PCIe switch downstream port -- because in that case, virtio-0.9.5 compat is disabled automatically; i.e., the device becomes modern-only automatically.)

(3.2) The virtio devices should have the "iommu_platform=true" property.

(3.3) "-netdev tap" should have the "vhost=off" property.

Thanks.

---

(3.4) Please also add "romfile=''" to "-device virtio-net-pci". See:

- https://bugzilla.redhat.com/show_bug.cgi?id=1361286#c81
- https://bugzilla.redhat.com/show_bug.cgi?id=1361286#c84
- https://bugzilla.redhat.com/show_bug.cgi?id=1361286#c88

---

host:
- edk2-20190308git89910a39dcfd-5.el8 [buildID=923901]
- glusterfs-6.0-7.el8 [buildID=920471]
- kernel-4.18.0-114.el8 [buildID=928444]
- libseccomp-2.4.1-1.el8 [buildID=909913]
- libvirt-4.5.0-30.module+el8.1.0+3574+3a63752b [buildID=924456]
- qemu-kvm-2.12.0-81.module+el8.1.0+3619+dfe1ae01 [buildID=926959]
- spice-0.14.2-1.el8 [buildID=899194]

host config:
- add "options kvm_amd sev=1" to "/etc/modprobe.d/kvm.conf"
- run "dracut -f"
- reboot, or manually reload "kvm_amd"

guest config:
- see subsequent attached domain XML
- and subsequent generated QEMU command line

guest workload:
- RHEL-8.1.0-20190701.0 (= RHEL-8.1.0-Beta-1.1) installer ISO
- implies guest kernel 4.18.0-107.el8
- in grub2, extend the kernel command line with "swiotlb=262144" (512MB)
- also append: "ignore_loglevel console=tty console=ttyS0,115200n8"

Result: OVMF works fine, but then I get identical results to comment 27 / comment 28. The large swiotlb is allocated successfully, but then 2MB allocations cannot be satisfied:

> [ 6.136727] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
> [ 6.137670] software IO TLB: mapped [mem 0x5afbe000-0x7afbe000] (512MB)
> ...
> [ 6.278992] software IO TLB: SEV is active and system is using DMA bounce buffers
> ...
> [ 8.359505] virtio-pci 0000:00:02.0: swiotlb buffer is full (sz: 2097152 bytes)
> [ 8.360620] virtio-pci 0000:00:02.0: overflow 0x00008001e6200000+2097152 of DMA mask ffffffffffffffff bus mask 0
> [ 8.362130] WARNING: CPU: 2 PID: 852 at kernel/dma/direct.c:43 dma_direct_map_page+0xfb/0x160
> ...
> [ 8.420552] Console: switching to colour frame buffer device 128x48
> [ 8.423468] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1203 (command 0x105)
> ...

Because of this, the guest kernel can write to the serial console, but not to the graphical display.

---

Created attachment 1589703 [details]: domain XML for comment 47

Created attachment 1589704 [details]: QEMU cmdline (by libvirt) for comment 47

---

(In reply to Laszlo Ersek from comment #45)
> (2) when you use a virtio-vga device with OVMF, OVMF will not drive
> the device via virtio; it will drive the "VGA" part. That too should
> work with SEV, of course; I just wanted to clarify this.
>
> If you specifically want to test VirtioGpuDxe in OVMF, you'll have to
> request the virtio-gpu-pci device on the QEMU cmdline. (Note that when
> using libvirt (with x86 guests), this is not possible; with x86
> guests, "virtio" in the domain XML's video element maps to the
> "virtio-vga" device model only.)

The following (unsupported) domain XML snippet can force the virtio-gpu-pci device:

<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
  <qemu:commandline>
    <qemu:arg value='-set'/>
    <qemu:arg value='device.video0.driver=virtio-gpu-pci'/>
  </qemu:commandline>
</domain>

(Note that the xmlns:qemu attribute (namespace definition) in the root element is required for the <qemu:commandline> element to work.)

This is useful for unit testing the fix for this RHBZ, with libvirt, until the guest kernel issue described in comment 47 is corrected. When using the raw QEMU cmdline, simply change "-device virtio-vga" to "-device virtio-gpu-pci".
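For orientation, the recommendations from comment 45 (items 3.1 through 3.4) plus the virtio-vga to virtio-gpu-pci swap can be collected into one cmdline sketch. This is only an illustrative fragment, not the cmdline from the attachments: the ids, bus placements, and omitted options (disks, serial, etc.) are placeholders.

```
# Sketch only: consolidates (3.1)-(3.4); ids and values are placeholders.
/usr/libexec/qemu-kvm \
  -machine q35,memory-encryption=sev0 \
  -object sev-guest,id=sev0,cbitpos=47,reduced-phys-bits=1,policy=0x1 \
  -device virtio-gpu-pci,id=video0,disable-legacy=on,iommu_platform=on \
  -netdev tap,id=net0,vhost=off \
  -device virtio-net-pci,netdev=net0,disable-legacy=on,iommu_platform=on,romfile=
```

Note that "disable-legacy=on" is redundant when the device sits in a PCIe root port (the device is modern-only there automatically, per comment 45), but spelling it out is harmless.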
With the above tweak, I managed to verify "qemu-kvm-2.12.0-81.module+el8.1.0+3619+dfe1ae01": OVMF produces graphical output through VirtioGpuDxe, so the device model works fine.

(If we also add

  <qemu:arg value='-global'/>
  <qemu:arg value='isa-debugcon.iobase=0x402'/>
  <qemu:arg value='-debugcon'/>
  <qemu:arg value='file:/tmp/ovmf.rhel8.sev.q35.log'/>

to the domain XML, then "/tmp/ovmf.rhel8.sev.q35.log" will contain the key message

> VirtioGpuDriverBindingStart: produced GOP while binding VirtIo=7DE859A0

FWIW, note that enabling the QEMU debug console in a SEV guest has a serious impact on firmware boot duration -- the impact is more severe than without SEV.)

---

(1) The swiotlb machinery limits the maximum contiguous allocation size (i.e., individual bounce buffer size) to 256 KB. From "include/linux/swiotlb.h":

> /*
>  * Maximum allowable number of contiguous slabs to map,
>  * must be a power of 2. What is the appropriate value ?
>  * The complexity of {map,unmap}_single is linearly dependent on this value.
>  */
> #define IO_TLB_SEGSIZE 128
>
> /*
>  * log of the size of each IO TLB slab. The number of slabs is command line
>  * controllable.
>  */
> #define IO_TLB_SHIFT 11

And from swiotlb_init_with_tbl() [kernel/dma/swiotlb.c]:

> /*
>  * Allocate and initialize the free list array. This array is used
>  * to find contiguous free memory regions of size up to IO_TLB_SEGSIZE
>  * between io_tlb_start and io_tlb_end.
>  */

(2) If I add "tp_printk trace_event=swiotlb_bounced" to the guest kernel command line, then every bounce buffer allocation *request* is logged (regardless of outcome):

dma_direct_map_sg() [kernel/dma/direct.c]
  dma_direct_map_page() [kernel/dma/direct.c]
    swiotlb_map() [kernel/dma/swiotlb.c]
      trace_swiotlb_bounced() <-- LOGGED HERE

Such mapping requests work fine initially (from multiple virtio devices); the requested buffer sizes are minimal -- there is nothing larger than a single page (4KB).
Even the virtio-gpu driver submits a number of such small (and successful) requests: > swiotlb_bounced: dev_name: 0000:00:02.0 dma_mask=ffffffffffffffff > dev_addr=8001e898e7c8 size=24 FORCE > swiotlb_bounced: dev_name: 0000:00:02.0 dma_mask=ffffffffffffffff > dev_addr=8001e8b7ee00 size=408 FORCE > swiotlb_bounced: dev_name: 0000:00:02.0 dma_mask=ffffffffffffffff > dev_addr=8001eb0acd20 size=32 FORCE > swiotlb_bounced: dev_name: 0000:00:02.0 dma_mask=ffffffffffffffff > dev_addr=8001e80afc48 size=40 FORCE > swiotlb_bounced: dev_name: 0000:00:02.0 dma_mask=ffffffffffffffff > dev_addr=8001e80afc70 size=24 FORCE > swiotlb_bounced: dev_name: 0000:00:02.0 dma_mask=ffffffffffffffff > dev_addr=8001e80e2f20 size=32 FORCE ( Side comment: please ignore the FORCE parameter at the end. That is *wrong*. I think it is an issue with the tracing itself. Because, FORCE would correspond to "swiotlb=force" ("force using of bounce buffers even if they wouldn't be automatically used by the kernel"), and that was set neither on the kernel command line, nor programmatically in the kernel. ) (3) Some time after these small bounce buffer allocations, virtio-gpu requests a 2MB bounce buffer: [a] > swiotlb_bounced: dev_name: 0000:00:02.0 dma_mask=ffffffffffffffff > dev_addr=8001e5c00000 size=2097152 FORCE This cannot succeed regardless of the full swiotlb size, because it exceeds the individual contiguous bounce buffer limit (which is 256KB). The call tree and the kernel messages related to this allocation failure are listed below. The limit is exceeded in the innermost swiotlb_tbl_map_single() function. 
virtio_gpu_mode_dumb_create() [drivers/gpu/drm/virtio/virtgpu_gem.c] virtio_gpu_object_attach() [drivers/gpu/drm/virtio/virtgpu_vq.c] dma_map_sg() [include/linux/dma-mapping.h] dma_map_sg_attrs() [include/linux/dma-mapping.h] dma_direct_map_sg() [kernel/dma/direct.c] dma_direct_map_page() [kernel/dma/direct.c] swiotlb_map() [kernel/dma/swiotlb.c] trace_swiotlb_bounced() LOGS [a] swiotlb_tbl_map_single() [kernel/dma/swiotlb.c] LOGS [b] report_addr() [kernel/dma/direct.c] LOGS [c] [b] > virtio-pci 0000:00:02.0: swiotlb buffer is full (sz: 2097152 bytes) [c] > virtio-pci 0000:00:02.0: overflow 0x00008001e5c00000+2097152 of DMA > mask ffffffffffffffff bus mask 0 ( Some side comments on the log messages. First, the address 8001e5c00000 -- shown in both messages [a] and [c] -- has bit#47 set, which stands for "encrypted" (the "C" bit of SEV). Second, messages [b] and [c] are misleading -- we don't run out of swiotlb space, we exceed the individual bounce buffer limit instead. ) ( Another side comment: the bounce buffer allocation failure is propagated out of dma_map_sg(), as return value zero. However, virtio_gpu_object_attach() does not check, or propagate further, that return value. Ideally, the virtio-gpu driver should fail the initialization at this point. ) (4) In the upstream kernel, this issue has been identified and fixed, although for virtio-blk only (not virtio-gpu). Please refer to the following commits, part of v5.1: 1 abe420bfae52 swiotlb: Introduce swiotlb_max_mapping_size() 2 492366f7b423 swiotlb: Add is_swiotlb_active() function 3 133d624b1cee dma: Introduce dma_max_mapping_size() 4 e6d6dd6c875e virtio: Introduce virtio_max_dma_size() 5 fd1068e1860e virtio-blk: Consider virtio_max_dma_size() for maximum segment size ( Side comment: the documentation introduced in commit 133d624b1cee (i.e. 
patch#3) has been touched-up later in commit 99d2b9386729 ("Documentation: DMA-API: fix a function name of max_mapping_size", 2019-06-07), which isn't part of any released kernel yet. But, that's just a simple typo fix and not related to the core issue. ) In the upstream kernel, this work specifically targeted SEV guests: > Looks good. Booted and tested using an SEV guest without any issues. > > Tested-by: Tom Lendacky <thomas.lendacky> (Archived at: - https://lkml.org/lkml/2019/1/30/816 - https://www.mail-archive.com/virtualization@lists.linux-foundation.org/msg33545.html ) (5) So I *speculate* the following kernel patch (on top of upstream v5.2-7765-g964a4eacef67) might fix the issue (not even build tested yet): > commit 9821816e5f960b59232d23911556e7e2662ddc48 > Author: Laszlo Ersek <lersek> > Date: Sat Jul 13 18:47:05 2019 +0200 > > drm/virtio: limit each scatterlist node by virtio_max_dma_size() > > SWIOTLB limits individual bounce buffers to 256KB in size. This limit can > interfere with virtio drivers requesting larger (contiguous) DMA mappings, > for example when they run in SEV guests. > > The issue was previously solved for virtio-blk in commit range > 1c163f4c7b3f..fd1068e1860e. > > Fix the problem for virtio-gpu as well, by limiting each scatterlist node > to virtio_max_dma_size(), in virtio_gpu_object_get_sg_table(). 
> > Cc: Daniel Vetter <daniel> > Cc: David Airlie <airlied> > Cc: Gerd Hoffmann <kraxel> > Cc: Joerg Roedel <jroedel> > Cc: Tom Lendacky <thomas.lendacky> > Cc: dri-devel.org > Cc: linux-kernel.org > Cc: virtualization.org > Ref: https://bugzilla.redhat.com/show_bug.cgi?id=1531543 > Signed-off-by: Laszlo Ersek <lersek> > > diff --git a/drivers/gpu/drm/virtio/virtgpu_object.c b/drivers/gpu/drm/virtio/virtgpu_object.c > index b2da31310d24..524c783cffd4 100644 > --- a/drivers/gpu/drm/virtio/virtgpu_object.c > +++ b/drivers/gpu/drm/virtio/virtgpu_object.c > @@ -201,25 +201,32 @@ int virtio_gpu_object_get_sg_table(struct virtio_gpu_device *qdev, > struct page **pages = bo->tbo.ttm->pages; > int nr_pages = bo->tbo.num_pages; > struct ttm_operation_ctx ctx = { > .interruptible = false, > .no_wait_gpu = false > }; > + size_t max_segment; > > /* wtf swapping */ > if (bo->pages) > return 0; > > if (bo->tbo.ttm->state == tt_unpopulated) > bo->tbo.ttm->bdev->driver->ttm_tt_populate(bo->tbo.ttm, &ctx); > bo->pages = kmalloc(sizeof(struct sg_table), GFP_KERNEL); > if (!bo->pages) > goto out; > > - ret = sg_alloc_table_from_pages(bo->pages, pages, nr_pages, 0, > - nr_pages << PAGE_SHIFT, GFP_KERNEL); > + max_segment = virtio_max_dma_size(); > + max_segment &= ~(size_t)(PAGE_SIZE - 1); > + if (max_segment > SCATTERLIST_MAX_SEGMENT) { > + max_segment = SCATTERLIST_MAX_SEGMENT; > + } > + ret = __sg_alloc_table_from_pages(bo->pages, pages, nr_pages, 0, > + nr_pages << PAGE_SHIFT, > + (unsigned)max_segment, GFP_KERNEL); > if (ret) > goto out; > return 0; > out: > kfree(bo->pages); > bo->pages = NULL; Should be max_segment = virtio_max_dma_size(qdev->vdev); but even with this typo fixed, the guest kernel driver doesn't work. The error messages are gone, but the screen stays blank. 
Other stuff I tried over the weekend:

* Adding "tp_printk trace_event=swiotlb_bounced" to the command line of the guest kernel, from comment 51 bullet (5), confirms that the 2MB buffer from TTM is mapped by 8 bounce buffers, handed out by swiotlb, each 256KB in size.

* Built & booted upstream kernels v4.19 and v4.20 in the guest. Neither works. Strange for two reasons: (a) the presentation titled "AMD SEV Update / Linux Security Summit 2018" [AMD-Encrypted-Virtualization-Update-David-Kaplan-AMD.pdf] says VirtIO-GPU support is available in Linux 4.19/QEMU 3.1 (slide 6); (b) kernel commit 8f44ca223345 ("drm/virtio: add dma sync for dma mapped virtio gpu framebuffer pages", 2018-09-19) is part of v4.20. At this point I feel compelled to think that (virtio-gpu + SEV) must never have worked in the Linux guest. O_o

* Logged the virtio-gpu trace points in qemu-kvm (see version in comment 47) with SystemTap. The guest kernel's operations from comment 51 bullet (5) looked valid in the trace.

* Turned on guest error logging in qemu-kvm ("-d guest_errors"). Nothing was logged.

* Disabled SEV, but kept "iommu_platform" enabled in the domain XML. Also kept the half-gig swiotlb in the guest kernel. This way the guest driver would continue using the DMA API, but those DMA ops wouldn't be backed by SEV logic. Result: everything worked fine. Confusing: that suggests the bug is in the SEV code in the guest -- but in that case, why didn't I get garbage (= encrypted data), instead of a blank screen, when SEV was enabled?

* Re: side remark in comment 51 bullet (2): learned from code inspection that SEV enablement does force swiotlb (see "arch/x86/mm/mem_encrypt.c"), so the "swiotlb_bounced" tracepoint didn't lie after all.
Hi Laszlo,

Thanks for the debugging progress in comment 45 ~ comment 53 :)

I'm using a Beaker job to configure and test SEV automatically; if you're interested, please refer to a task that can do these steps automatically:
http://pkgs.devel.redhat.com/cgit/tests/kernel/tree/virt/sev/guest-sev-setup

And here is the reference Beaker job XML for using this task:
https://beaker.engineering.redhat.com/jobs/3666425 (slow train)
https://beaker.engineering.redhat.com/jobs/3617539 (fast train)

Regarding comment 44: my changes had broken the task /kernel/virt/sev/guest-sev-setup. :( After fixing the problem, I can boot a SEV guest with virtio devices that have iommu enabled. I can also confirm the kernel hits a call trace and prints error messages repeatedly: [ 6.388096] [drm] Initialized virtio_gpu 0.1.0 0 for virtio0 on minor 0 [ 6.390362] virtio-pci 0000:00:01.0: swiotlb buffer is full (sz: 2097152 bytes) [ 6.393213] virtio-pci 0000:00:01.0: overflow 0x0000800272400000+2097152 of DMA mask ffffffffffffffff bus mask 0 [ 6.395180] WARNING: CPU: 0 PID: 791 at kernel/dma/direct.c:43 dma_direct_map_page+0xfb/0x160 [ 6.396163] Modules linked in: crct10dif_pclmul snd_hda_codec_generic(+) crc32_pclmul ledtrig_audio snd_hda_intel snd_hda_codec virtio_gpu(+) ttm ghash_clmulni_intel snd_hda_core snd_hwdep drm_kms_helper snd_pcm pcspkr snd_timer syscopyarea snd sysfillrect i2c_i801 sysimgblt fb_sys_fops sg drm soundcore lpc_ich virtio_input virtio_balloon xfs libcrc32c sd_mod ahci libahci libata crc32c_intel serio_raw virtio_blk virtio_console virtio_net virtio_scsi net_failover failover dm_mirror dm_region_hash dm_log dm_mod [ 6.401174] CPU: 0 PID: 791 Comm: systemd-udevd Tainted: G W --------- - - 4.18.0-115.el8.x86_64 #1 [ 6.401174] Hardware name: Red Hat KVM, BIOS 0.0.0 02/06/2015 [ 6.401174] RIP: 0010:dma_direct_map_page+0xfb/0x160 [ 6.401174] Code: 83 78 02 00 00 48 85 c0 74 30 4c 8b 00 b8 fe ff ff ff 49 39 c0 77 0a 48 83 bb 88 02 00 00 00 74 09 80 3d 6b 83 2d 01 00 74 31 <0f> 0b 48 c7 c0 ff ff ff ff eb 8d 48 89 
ca eb 83 80 3d 53 83 2d 01 [ 6.401174] RSP: 0018:ffffa72101217940 EFLAGS: 00010246 [ 6.401174] RAX: 0000000000000000 RBX: ffff8fb507c380b0 RCX: 0000000000000000 [ 6.401174] RDX: ffff8fb677a1ee00 RSI: ffff8fb677a16a08 RDI: ffff8fb677a16a08 [ 6.401174] RBP: 0000000000200000 R08: 0000000000000325 R09: ffff8fb675030f00 [ 6.401174] R10: 0720072007200720 R11: 0720072007200720 R12: ffff8fb507c380b0 [ 6.401174] R13: 0000000000000001 R14: 0000000000000000 R15: ffff8fb673866000 [ 6.401174] FS: 00007f10e4939940(0000) GS:ffff8fb677a00000(0000) knlGS:0000000000000000 [ 6.401174] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 6.401174] CR2: 000056288f57f4c0 CR3: 0000800274e28000 CR4: 00000000003406f0 [ 6.401174] Call Trace: [ 6.401174] dma_direct_map_sg+0x64/0xb0 [ 6.401174] virtio_gpu_object_attach+0x1f1/0x230 [virtio_gpu] [ 6.401174] virtio_gpu_mode_dumb_create+0xb1/0xd0 [virtio_gpu] [ 6.401174] drm_client_framebuffer_create+0xb3/0x220 [drm] [ 6.401174] drm_fb_helper_generic_probe+0x4d/0x210 [drm_kms_helper] [ 6.401174] __drm_fb_helper_initial_config_and_unlock+0x27e/0x540 [drm_kms_helper] [ 6.401174] drm_fbdev_client_hotplug+0xed/0x170 [drm_kms_helper] [ 6.401174] drm_fbdev_generic_setup+0xa3/0x110 [drm_kms_helper] [ 6.401174] virtio_gpu_probe+0xe9/0x100 [virtio_gpu] [ 6.401174] virtio_dev_probe+0x170/0x230 [ 6.401174] driver_probe_device+0x12d/0x460 [ 6.401174] __driver_attach+0xe0/0x110 [ 6.401174] ? driver_probe_device+0x460/0x460 [ 6.401174] bus_for_each_dev+0x77/0xc0 [ 6.401174] ? klist_add_tail+0x57/0x70 [ 6.401174] bus_add_driver+0x155/0x230 [ 6.401174] ? 0xffffffffc05fc000 [ 6.401174] driver_register+0x6b/0xb0 [ 6.401174] ? 0xffffffffc05fc000 [ 6.401174] do_one_initcall+0x46/0x1c3 [ 6.401174] ? free_unref_page_commit+0x91/0x100 [ 6.401174] ? _cond_resched+0x15/0x30 [ 6.401174] ? kmem_cache_alloc_trace+0x151/0x1d0 [ 6.401174] do_init_module+0x5a/0x210 [ 6.401174] load_module+0x1440/0x17d0 [ 6.401174] ? __do_sys_init_module+0x13d/0x180 [ 6.401174] ? 
_cond_resched+0x15/0x30 [ 6.401174] __do_sys_init_module+0x13d/0x180 [ 6.401174] do_syscall_64+0x5b/0x1b0 [ 6.401174] entry_SYSCALL_64_after_hwframe+0x65/0xca [ 6.401174] RIP: 0033:0x7f10e352deae [ 6.401174] Code: 48 8b 0d dd ff 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d aa ff 2b 00 f7 d8 64 89 01 48 [ 6.401174] RSP: 002b:00007ffe7e9a6ef8 EFLAGS: 00000246 ORIG_RAX: 00000000000000af [ 6.401174] RAX: ffffffffffffffda RBX: 000056288f591bd0 RCX: 00007f10e352deae [ 6.401174] RDX: 00007f10e409980d RSI: 0000000000020818 RDI: 000056288ff46e70 [ 6.401174] RBP: 00007f10e409980d R08: 000056288f5890b0 R09: 0000000000000005 [ 6.401174] R10: 0000000000000006 R11: 0000000000000246 R12: 000056288ff46e70 [ 6.401174] R13: 000056288f585830 R14: 0000000000020000 R15: 0000000000000000 [ 6.401174] ---[ end trace 54a2bb0b7387a604 ]--- [ 6.844794] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1203 (command 0x105) [ 6.844799] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1203 (command 0x105) [ 6.845313] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1203 (command 0x105) [ 6.848662] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1203 (command 0x105) [ 6.850341] virtio_gpu virtio0: fb0: DRM emulated frame buffer device [ 6.862270] [drm] pci: virtio-gpu-pci detected at 0000:0a:00.0 [ 6.864260] [drm] virgl 3d acceleration not supported by host [ 6.870313] [drm] number of scanouts: 1 [ 6.871247] [drm] number of cap sets: 0 [ 6.875634] [drm] Initialized virtio_gpu 0.1.0 0 for virtio9 on minor 1 [ 6.877367] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1203 (command 0x105) [ 6.878964] virtio-pci 0000:0a:00.0: swiotlb buffer is full (sz: 2097152 bytes) [ 6.879849] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1203 (command 0x105) [ 6.886328] virtio_gpu virtio9: fb1: DRM 
emulated frame buffer device [ 6.891610] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1203 (command 0x105) [ 6.897540] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1203 (command 0x105) [ 6.899783] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1203 (command 0x105) [ 6.901890] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1203 (command 0x105) [ 6.904215] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1203 (command 0x105) [ 6.906725] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1203 (command 0x105) [ 6.909514] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1203 (command 0x105) [ 6.911828] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1203 (command 0x105) .... (repeat..)

BR/
Zhiyi, Guo

Created attachment 1591309 [details]
VM xml

The RHEL 8.1 VM XML used in comment 54 configures virtio as both the primary and the secondary video device:

  ...
  <video>
    <driver iommu='on' ats='on'/>
    <model type='virtio' heads='1' primary='yes'/>
    <alias name='video0'/>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
  </video>
  <video>
    <driver iommu='on' ats='on'/>
    <model type='virtio' heads='1'/>
    <alias name='video1'/>
    <address type='pci' domain='0x0000' bus='0x0a' slot='0x00' function='0x0'/>
  </video>
  ...
qtree info: dev: q35-pcihost, id "" MCFG = 2952790016 (0xb0000000) pci-hole64-size = 34359738368 (32 GiB) short_root_bus = 0 (0x0) below-4g-mem-size = 2147483648 (2 GiB) above-4g-mem-size = 6442450944 (6 GiB) x-pci-hole64-fix = true bus: pcie.0 type PCIE dev: ich9-intel-hda, id "sound0" debug = 0 (0x0) msi = "auto" old_msi_addr = false addr = 1b.0 romfile = "" rombar = 1 (0x1) multifunction = false command_serr_enable = true x-pcie-lnksta-dllla = true x-pcie-extcap-init = true class Audio controller, addr 00:1b.0, pci id 8086:293e (sub 1af4:1100) bar 0: mem at 0xc1c10000 [0xc1c13fff] bus: sound0.0 type HDA dev: hda-duplex, id "sound0-codec0" debug = 0 (0x0) mixer = true cad = 0 (0x0) dev: virtio-vga, id "video0" ioeventfd = false vectors = 3 (0x3) virtio-pci-bus-master-bug-migration = false disable-legacy = "on" disable-modern = false migrate-extra = true modern-pio-notify = false x-disable-pcie = false page-per-vq = false x-ignore-backend-features = false ats = true x-pcie-deverr-init = true x-pcie-lnkctl-init = true x-pcie-pm-init = true addr = 01.0 romfile = "vgabios-virtio.bin" rombar = 1 (0x1) multifunction = false command_serr_enable = true x-pcie-lnksta-dllla = true x-pcie-extcap-init = true class VGA controller, addr 00:01.0, pci id 1af4:1050 (sub 1af4:1100) bar 0: mem at 0xc0000000 [0xc07fffff] bar 2: mem at 0x800900000 [0x800903fff] bar 4: mem at 0xc1c1f000 [0xc1c1ffff] bar 6: mem at 0xffffffffffffffff [0xfffe] bus: virtio-bus type virtio-pci-bus dev: virtio-gpu-device, id "" max_outputs = 1 (0x1) max_hostmem = 268435456 (256 MiB) xres = 1024 (0x400) yres = 768 (0x300) indirect_desc = true event_idx = true notify_on_empty = true any_layout = true iommu_platform = true dev: pcie-root-port, id "pci.10" x-migrate-msix = true bus-reserve = 4294967295 (0xffffffff) io-reserve = 18446744073709551615 (16 EiB) mem-reserve = 18446744073709551615 (16 EiB) pref32-reserve = 18446744073709551615 (16 EiB) pref64-reserve = 18446744073709551615 (16 EiB) 
power_controller_present = true chassis = 10 (0xa) slot = 0 (0x0) port = 25 (0x19) aer_log_max = 8 (0x8) addr = 03.1 romfile = "" rombar = 1 (0x1) multifunction = false command_serr_enable = true x-pcie-lnksta-dllla = true x-pcie-extcap-init = true class PCI bridge, addr 00:03.1, pci id 1b36:000c (sub 0008:0000) bar 0: mem at 0xc1c15000 [0xc1c15fff] bus: pci.10 type PCIE dev: virtio-gpu-pci, id "video1" ioeventfd = false vectors = 3 (0x3) virtio-pci-bus-master-bug-migration = false disable-legacy = "on" disable-modern = false migrate-extra = true modern-pio-notify = false x-disable-pcie = false page-per-vq = false x-ignore-backend-features = false ats = true x-pcie-deverr-init = true x-pcie-lnkctl-init = true x-pcie-pm-init = true addr = 00.0 romfile = "" rombar = 1 (0x1) multifunction = false command_serr_enable = true x-pcie-lnksta-dllla = true x-pcie-extcap-init = true class Display controller, addr 0a:00.0, pci id 1af4:1050 (sub 1af4:1100) bar 1: mem at 0xc0800000 [0xc0800fff] bar 4: mem at 0x800800000 [0x800803fff] bus: virtio-bus type virtio-pci-bus dev: virtio-gpu-device, id "" max_outputs = 1 (0x1) max_hostmem = 268435456 (256 MiB) xres = 1024 (0x400) yres = 768 (0x300) indirect_desc = true event_idx = true notify_on_empty = true any_layout = true iommu_platform = true FailQA per comment 47 ~ 54 Configure qxl-vga as primary video device and virtio-gpu-pci as secondary video device: ... <video> <model type='qxl' heads='1' primary='yes'/> <alias name='video0'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/> </video> <video> <driver iommu='on' ats='on'/> <model type='virtio' heads='1'/> <alias name='video1'/> <address type='pci' domain='0x0000' bus='0x0a' slot='0x00' function='0x0'/> </video> ... 
And configure virtio-gpu-pci as the fb console device:

# cat /proc/cmdline
BOOT_IMAGE=(hd0,gpt2)/vmlinuz-4.18.0-115.el8.x86_64 root=/dev/mapper/rhel_bootp--73--225--25-root ro crashkernel=auto resume=/dev/mapper/rhel_bootp--73--225--25-swap rd.lvm.lv=rhel_bootp-73-225-25/root rd.lvm.lv=rhel_bootp-73-225-25/swap console=ttyS0,115200 fbcon=map:1

The kernel call trace and error messages can also be reproduced against virtio-gpu-pci.

Zhiyi, re: "After fixing the problem, I can boot sev guest + virtio devices with iommu enabled", from comment 54; that's great -- in my case, all virtio devices *except* virtio-gpu-pci / virtio-vga seem to work fine. The problem is specifically with virtio-gpu-pci / virtio-vga.

That said, I think it is still too early to move this BZ from ON_QA to ASSIGNED (and to set FailedQA). That's because we simply don't have enough information at this point -- our results are inconclusive in *either* direction. We don't have evidence whether the qemu-kvm backport is correct *or* incorrect. We have *some* positive evidence (the VirtioGpuDxe driver in OVMF works fine), which suggests that the qemu-kvm backport is correct. However, the Linux guest driver is a *lot* more sophisticated and demanding, so that is the ultimate test for the host-side backport.

Basically, I'm suggesting to move this BZ back to ON_QA status, and to clear the FailedQA mark. In addition, file a separate RHBZ for the guest kernel issue. Then, the guest kernel RHBZ should be marked as blocking the present RHBZ.

Meanwhile I've contacted Tom & Brijesh @ AMD -- they've confirmed that the guest kernel driver for virtio-gpu certainly worked at some point. So, in the new guest kernel RHBZ to file, we should first establish a working baseline -- because I've been unable to do even that! See my comments above. Having a functional guest kernel baseline will then let us do two things:
- bisect the guest kernel regression,
- validate the present BZ for good.

Thanks.
Regarding the warning / stack trace in comment 54: that's *exactly* what my guest kernel patch in comment 51 / comment 52 fixes (written for the upstream kernel). With the patch applied, the error messages are gone. However, the display *still* doesn't work. That's the real problem: we have a deeper problem in the guest kernel than simply requesting bounce buffers that are too large. Even when the bounce buffers are suitably sized, the display *still* doesn't work.

Moving back to ON_QA; the blocker is Bug 1731046.

I was able to bisect the commit which caused the black screen:

commit 55897af63091ebc2c3f239c6a6666f748113ac50 (HEAD, refs/bisect/bad)
Author: Christoph Hellwig <hch>
Date: Mon Dec 3 11:43:54 2018 +0100

    dma-direct: merge swiotlb_dma_ops into the dma_direct code

    While the dma-direct code is (relatively) clean and simple we actually
    have to use the swiotlb ops for the mapping on many architectures due
    to devices with addressing limits. Instead of keeping two
    implementations around this commit allows the dma-direct
    implementation to call the swiotlb bounce buffering functions and
    thus share the guts of the mapping implementation. This also
    simplified the dma-mapping setup on a few architectures where we
    don't have to differentiate which implementation to use.

    Signed-off-by: Christoph Hellwig <hch>
    Acked-by: Jesper Dangaard Brouer <brouer>
    Tested-by: Jesper Dangaard Brouer <brouer>
    Tested-by: Tony Luck <tony.luck>

As we know, SEV needs swiotlb to bounce the virtio buffers; it appears that the above commit introduced the regression when buffers are synced from CPU to device. Some other folks have also reported a similar issue in non-SEV environments (e.g. 32-bit devices on bare-metal systems). Optionally, you can reproduce the issue in a non-SEV guest if you pass "swiotlb=force" on the command line. This will force the non-SEV guest to also use the swiotlb, and you will get the same black screen.
Here is the patch which fixes the issue: https://lkml.org/lkml/2019/7/19/672

Thank you, Brijesh! I'll copy this information to bug 1731046.

> > commit 9821816e5f960b59232d23911556e7e2662ddc48
> > Author: Laszlo Ersek <lersek>
> > Date: Sat Jul 13 18:47:05 2019 +0200
> >
> > drm/virtio: limit each scatterlist node by virtio_max_dma_size()

What is the state here? Does that help (together with the comment 62 fix)? Can you submit the patch upstream (looks sane, with the typos fixed of course)?

Hi Gerd, the patch I hacked up didn't help by itself (comment 52). I didn't test the patch linked by Brijesh in comment 62. Don has a backport queued for that (bug 1731046 comment 7), but even if it fixes the dma-direct regression in RHEL8, virtio-gpu still needs a backport that puts DMA to use in some more spots (bug 1731046 comment 6).

So basically I ran out of steam on this. The delta between RHEL8 and upstream is very large (for each of DMA and virtio-gpu, independently); I'm totally unfamiliar with these subsystems (I don't even know what DRM shorthands such as "TTM" stand for), and experimenting with bleeding-edge upstream Linux ultimately exceeded my enthusiasm.

Hi Gerd, I suppose we also need a kernel bug to track the calltrace in comment 54? It seems Laszlo has already done the investigation for this part in comment 51.

BR/
Zhiyi

https://patchwork.freedesktop.org/patch/322726/
5.3-rc3 + that patch works with swiotlb=force.

> I suppose we also need a kernel bug to track the calltrace in comment 54?
Yes.
(In reply to Gerd Hoffmann from comment #69)
> > I suppose we also need a kernel bug to track the calltrace in comment 54?
>
> Yes.

The fix is tracked by Bug 1739291 - Kernel calltrace when using virtio-vga + sev.

Per https://bugzilla.redhat.com/show_bug.cgi?id=1739291#c15, iommu support for the virtio-vga device works well now.

Verified per comment 71.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:3345