Bug 1531543
| Field | Value |
|---|---|
| Summary | [RFE] add iommu support to virtio-gpu |
| Product | Red Hat Enterprise Linux 8 |
| Component | qemu-kvm |
| Version | 8.0 |
| Status | CLOSED ERRATA |
| Severity | medium |
| Priority | medium |
| Reporter | yafu <yafu> |
| Assignee | Gerd Hoffmann <kraxel> |
| QA Contact | Guo, Zhiyi <zhguo> |
| CC | areis, brijesh.singh, chayang, ddutile, dyuan, fjin, hpopal, jinzhao, juzhang, knoel, kraxel, lersek, marcandre.lureau, pezhang, qzhang, rbalakri, thomas.lendacky, virt-maint, wchadwic, zhguo, zpeng |
| Target Milestone | pre-dev-freeze |
| Target Release | 8.1 |
| Keywords | FutureFeature |
| Hardware | Unspecified |
| OS | Unspecified |
| Fixed In Version | qemu-kvm-2.12.0-81.module+el8.1.0+3619+dfe1ae01 |
| Clones | 1540181 (view as bug list) |
| Last Closed | 2019-11-05 20:46:46 UTC |
| Type | Feature Request |
| Bug Depends On | 1685552, 1738631, 1739291 |
Description (yafu, 2018-01-05 13:16:04 UTC)

Created attachment 1377489 [details]: domain xml
The kernel-side fix is in drm-misc (should land in the next merge window, i.e. 4.20, unless that'll be 5.0 ...). A pull request is pending for the QEMU-side fix; it should make it into QEMU 3.1.

---

Refer to https://bugzilla.redhat.com/show_bug.cgi?id=1501618#c20: AMD SEV needs virtio-vga to support IOMMU too. Without IOMMU, an SEV guest suffers repeated call traces and becomes unresponsive.

---

Scratch build (still building atm), please test: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=21632356

---

I guess we cannot check the behavior of this bug until the drm rebase is finished. Gerd, could you help to build a scratch build after the drm rebase is done? Thanks!

BR/
Zhiyi

---

(In reply to Guo, Zhiyi from comment #20)
> I guess we cannot check the behavior of this bug until drm rebase is
> finished..

RHEL-8, yes. Testing with a Fedora guest should work though; they have a new enough kernel.

---

Some regression seems to happen on the latest Fedora kernel, 5.1.5-300.fc30.x86_64. Using qemu-kvm-2.12.0-63.el8_0.1.bz1531543.3.x86_64 with sev options enabled, two issues are found:

1. Guest kernel call trace:

[ 2.721928] ? device_driver_attach+0x60/0x60
[ 2.721929] bus_for_each_dev+0x78/0xc0
[ 2.721931] bus_add_driver+0x14a/0x1e0
[ 2.721932] driver_register+0x6c/0xb0
[ 2.721933] ? 0xffffffffc03fe000
[ 2.721936] do_one_initcall+0x46/0x1c4
[ 2.721938] ? free_unref_page_commit+0x95/0x110
[ 2.721941] ? _cond_resched+0x15/0x30
[ 2.721943] ? kmem_cache_alloc_trace+0x154/0x1c0
[ 2.721945] ? do_init_module+0x23/0x210
[ 2.721946] do_init_module+0x5c/0x210
[ 2.721947] load_module+0x23de/0x2910
[ 2.721948] ? _cond_resched+0x15/0x30
[ 2.721950] ? __do_sys_init_module+0x162/0x190
[ 2.721951] __do_sys_init_module+0x162/0x190
[ 2.721953] do_syscall_64+0x5b/0x170
[ 2.721955] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 2.721956] RIP: 0033:0x7f6068e0ebae
[ 2.721959] Code: 48 8b 0d dd 42 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d aa 42 0c 00 f7 d8 64 89 01 48
[ 2.721959] RSP: 002b:00007ffdb3b4b068 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
[ 2.721960] RAX: ffffffffffffffda RBX: 00005642a1a89480 RCX: 00007f6068e0ebae
[ 2.721961] RDX: 00007f6068a6384d RSI: 000000000001d9be RDI: 00005642a2328a50
[ 2.721961] RBP: 00005642a2328a50 R08: 00005642a1a89e50 R09: 0000000000000006
[ 2.721962] R10: 0000000000000007 R11: 0000000000000246 R12: 00007f6068a6384d
[ 2.721962] R13: 0000000000000001 R14: 00005642a1a7c1e0 R15: 00005642a1a7e390
[ 2.721964] ---[ end trace f77acc239227cf53 ]---
[ 2.775462] fbcon: Deferring console take-over
[ 2.776320] virtio_gpu virtio0: fb0: DRM emulated frame buffer device

2.
Errors are printed repeatedly in dmesg:

[ 7.922289] fbcon: Taking over console
[ 7.923279] Console: switching to colour frame buffer device 128x48
[ 7.925859] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1203 (command 0x105)
[ 7.927510] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1203 (command 0x105)
[ 7.929083] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1203 (command 0x105)
[ 7.944523] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1203 (command 0x105)
[ 7.946081] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1203 (command 0x105)
[ 7.947606] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1203 (command 0x105)
[ 7.956789] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1203 (command 0x105)
[ 7.960513] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1203 (command 0x105)
[ 8.129292] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1203 (command 0x105)
[ 8.335154] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1203 (command 0x105)
[ 8.543169] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1203 (command 0x105)
[ 8.751152] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1203 (command 0x105)

The issues cannot be reproduced after removing the sev option from the guest.

QEMU command line used:

/usr/libexec/qemu-kvm \
-name guest=Fedora30,debug-threads=on \
-S \
-object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-15-Fedora30/master-key.aes \
-machine pc-q35-rhel7.6.0,accel=kvm,usb=off,smm=on,dump-guest-core=off,memory-encryption=sev0 \
-cpu EPYC-IBPB,x2apic=on,tsc-deadline=on,hypervisor=on,tsc_adjust=on,cmp_legacy=on,perfctr_core=on,virt-ssbd=on,monitor=off,svm=off \
-global driver=cfi.pflash01,property=secure,value=on \
-drive file=/usr/share/OVMF/OVMF_CODE.secboot.fd,if=pflash,format=raw,unit=0,readonly=on \
-drive file=/var/lib/libvirt/qemu/nvram/Fedora30_VARS.fd,if=pflash,format=raw,unit=1 \
-m 8192 \
-realtime mlock=off \
-smp 2,sockets=1,cores=1,threads=2 \
-uuid 6dcdf378-3786-40bf-9d8e-e53522b95822 \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,fd=29,server,nowait \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=utc,driftfix=slew \
-global kvm-pit.lost_tick_policy=delay \
-no-hpet \
-no-shutdown \
-global ICH9-LPC.disable_s3=1 \
-global ICH9-LPC.disable_s4=1 \
-boot strict=on \
-device pcie-root-port,port=0x10,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x2 \
-device pcie-root-port,port=0x11,chassis=2,id=pci.2,bus=pcie.0,addr=0x2.0x1 \
-device pcie-root-port,port=0x12,chassis=3,id=pci.3,bus=pcie.0,addr=0x2.0x2 \
-device pcie-root-port,port=0x13,chassis=4,id=pci.4,bus=pcie.0,addr=0x2.0x3 \
-device pcie-root-port,port=0x14,chassis=5,id=pci.5,bus=pcie.0,addr=0x2.0x4 \
-device pcie-root-port,port=0x15,chassis=6,id=pci.6,bus=pcie.0,addr=0x2.0x5 \
-device pcie-root-port,port=0x16,chassis=7,id=pci.7,bus=pcie.0,addr=0x2.0x6 \
-device qemu-xhci,p2=15,p3=15,id=usb,bus=pci.3,addr=0x0 \
-device virtio-scsi-pci,iommu_platform=on,id=scsi0,bus=pci.2,addr=0x0 \
-device virtio-serial-pci,id=virtio-serial0,iommu_platform=on,bus=pci.4,addr=0x0 \
-drive file=/home/fedora30.qcow2,format=qcow2,if=none,id=drive-scsi0-0-0-0,cache=none \
-device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1,write-cache=on \
-netdev tap,fd=30,id=hostnet0 \
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:85:41:b0,bus=pci.1,addr=0x0,romfile=,iommu_platform=on \
-chardev socket,id=charserial0,fd=31,server,nowait \
-device isa-serial,chardev=charserial0,id=serial0 \
-chardev socket,id=charchannel0,fd=32,server,nowait \
-device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 \
-device usb-tablet,id=input0,bus=usb.0,port=1 \
-vnc 0.0.0.0:0 \
-device virtio-vga,id=video0,max_outputs=1,bus=pcie.0,addr=0x1,iommu_platform=on,ats=on \
-device virtio-balloon-pci,id=balloon0,bus=pci.5,addr=0x0,iommu_platform=on \
-object rng-random,id=objrng0,filename=/dev/urandom \
-device virtio-rng-pci,rng=objrng0,id=rng0,iommu_platform=on,bus=pci.6,addr=0x0 \
-object sev-guest,id=sev0,cbitpos=47,reduced-phys-bits=1,policy=0x1 \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on

---

Hmm, the call trace for issue 1 is not complete; please see this one:

[ 2.661728] [drm] pci: virtio-vga detected at 0000:00:01.0
[ 2.665214] fb0: switching to virtiodrmfb from EFI VGA
[ 2.668179] virtio-pci 0000:00:01.0: vgaarb: deactivate vga console
[ 2.672353] [drm] virgl 3d acceleration not supported by host
[ 2.682343] [TTM] Zone kernel: Available graphics memory: 4074364 kiB
[ 2.683466] [TTM] Zone dma32: Available graphics memory: 2097152 kiB
[ 2.684484] [TTM] Initializing pool allocator
[ 2.694486] [TTM] Initializing DMA pool allocator
[ 2.696680] [drm] number of scanouts: 1
[ 2.697911] [drm] number of cap sets: 0
[ 2.699238] [drm] Initialized virtio_gpu 0.1.0 0 for virtio0 on minor 0
[ 2.701683] virtio-pci 0000:00:01.0: swiotlb buffer is full (sz: 2097152 bytes)
[ 2.703004] virtio_net virtio1 enp1s0: renamed from eth0
[ 2.703163] virtio-pci 0000:00:01.0: overflow 0x000080026ea00000+2097152 of DMA mask ffffffffffffffff bus mask 0
[ 2.705797] WARNING: CPU: 0 PID: 429 at kernel/dma/direct.c:43 report_addr+0x33/0x60
[ 2.707117] Modules linked in: crc32c_intel virtio_gpu(+) drm_kms_helper serio_raw ttm virtio_console virtio_net drm virtio_scsi net_failover failover qemu_fw_cfg
[ 2.709563] CPU: 0 PID: 429 Comm: systemd-udevd Tainted: G W 5.1.5-300.fc30.x86_64 #1
[ 2.711108] Hardware name: Red Hat KVM, BIOS 0.0.0 02/06/2015
[ 2.712117] RIP: 0010:report_addr+0x33/0x60
[ 2.712824] Code: 48 8b 87 30 02 00 00 48 89 34 24 48 85 c0 74 2d 4c 8b 00 b8 fe ff ff ff 49 39 c0 76 14 80 3d 6b cd 21 01 00 0f 84 b9 06 00 00
<0f> 0b 48 83 c4 08 c3 48 83 bf 40 02 00 00 00 74 ef eb e0 80 3d 4c
[ 2.716006] RSP: 0018:ffffa53b010bb8d0 EFLAGS: 00010246
[ 2.716904] RAX: 0000000000000000 RBX: 0000000000000101 RCX: 0000000000000000
[ 2.718147] RDX: ffff948bb7a1ce80 RSI: ffff948bb7a168c8 RDI: ffff948bb7a168c8
[ 2.719353] RBP: ffff948bb750b0b0 R08: ffff948bb7a168c8 R09: 00000000000002be
[ 2.720576] R10: ffffffffb09cc158 R11: ffffa53b010bb635 R12: 0000000000200000
[ 2.721842] R13: 0000000000000001 R14: 0000000000000000 R15: ffff948bae869000
[ 2.721845] FS: 00007f6067e0d940(0000) GS:ffff948bb7a00000(0000) knlGS:0000000000000000
[ 2.721846] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2.721847] CR2: 00007f6068c21fe0 CR3: 0000800270202000 CR4: 00000000003406f0
[ 2.721849] Call Trace:
[ 2.721855] dma_direct_map_page+0xdf/0xf0
[ 2.721858] dma_direct_map_sg+0x67/0xb0
[ 2.721866] virtio_gpu_object_attach+0x1fb/0x250 [virtio_gpu]
[ 2.721872] virtio_gpu_mode_dumb_create+0xaa/0xd0 [virtio_gpu]
[ 2.721884] drm_client_framebuffer_create+0xa1/0x220 [drm]
[ 2.721893] drm_fb_helper_generic_probe+0x4d/0x1f0 [drm_kms_helper]
[ 2.721899] __drm_fb_helper_initial_config_and_unlock+0x29b/0x430 [drm_kms_helper]
[ 2.721906] drm_fbdev_client_hotplug+0xea/0x160 [drm_kms_helper]
[ 2.721911] drm_fbdev_generic_setup+0x93/0x120 [drm_kms_helper]
[ 2.721915] virtio_gpu_probe+0xe8/0x100 [virtio_gpu]
[ 2.721918] virtio_dev_probe+0x142/0x1e0
[ 2.721921] really_probe+0xf9/0x3a0
[ 2.721923] driver_probe_device+0xb6/0x100
[ 2.721925] device_driver_attach+0x55/0x60
[ 2.721926] __driver_attach+0x8a/0x150
[ 2.721928] ? device_driver_attach+0x60/0x60
[ 2.721929] bus_for_each_dev+0x78/0xc0
[ 2.721931] bus_add_driver+0x14a/0x1e0
[ 2.721932] driver_register+0x6c/0xb0
[ 2.721933] ? 0xffffffffc03fe000
[ 2.721936] do_one_initcall+0x46/0x1c4
[ 2.721938] ? free_unref_page_commit+0x95/0x110
[ 2.721941] ? _cond_resched+0x15/0x30
[ 2.721943] ? kmem_cache_alloc_trace+0x154/0x1c0
[ 2.721945] ? do_init_module+0x23/0x210
[ 2.721946] do_init_module+0x5c/0x210
[ 2.721947] load_module+0x23de/0x2910
[ 2.721948] ? _cond_resched+0x15/0x30
[ 2.721950] ? __do_sys_init_module+0x162/0x190
[ 2.721951] __do_sys_init_module+0x162/0x190
[ 2.721953] do_syscall_64+0x5b/0x170
[ 2.721955] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 2.721956] RIP: 0033:0x7f6068e0ebae
[ 2.721959] Code: 48 8b 0d dd 42 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d aa 42 0c 00 f7 d8 64 89 01 48
[ 2.721959] RSP: 002b:00007ffdb3b4b068 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
[ 2.721960] RAX: ffffffffffffffda RBX: 00005642a1a89480 RCX: 00007f6068e0ebae
[ 2.721961] RDX: 00007f6068a6384d RSI: 000000000001d9be RDI: 00005642a2328a50
[ 2.721961] RBP: 00005642a2328a50 R08: 00005642a1a89e50 R09: 0000000000000006
[ 2.721962] R10: 0000000000000007 R11: 0000000000000246 R12: 00007f6068a6384d
[ 2.721962] R13: 0000000000000001 R14: 00005642a1a7c1e0 R15: 00005642a1a7e390
[ 2.721964] ---[ end trace f77acc239227cf53 ]---
[ 2.775462] fbcon: Deferring console take-over
[ 2.776320] virtio_gpu virtio0: fb0: DRM emulated frame buffer device

---

Fresh brew build: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=22016408

---

(In reply to Gerd Hoffmann from comment #24)
> fresh brew build
> https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=22016408

The build failed:

Error:
Problem: package systemtap-4.1-2.el8.x86_64 requires systemtap-client = 4.1-2.el8, but none of the providers can be installed
- package systemtap-client-4.1-2.el8.x86_64 requires systemtap-runtime = 4.1-2.el8, but none of the providers can be installed
- conflicting requests
- nothing provides libdyninstAPI.so.9.3()(64bit) needed by systemtap-runtime-4.1-2.el8.x86_64

---

[ 2.701683] virtio-pci 0000:00:01.0: swiotlb buffer is full (sz: 2097152 bytes)
[ 2.703163] virtio-pci 0000:00:01.0: overflow 0x000080026ea00000+2097152 of DMA mask
ffffffffffffffff bus mask 0
[ 2.705797] WARNING: CPU: 0 PID: 429 at kernel/dma/direct.c:43 report_addr+0x33/0x60
[ 2.721849] Call Trace:
[ 2.721855] dma_direct_map_page+0xdf/0xf0
[ 2.721858] dma_direct_map_sg+0x67/0xb0
[ 2.721866] virtio_gpu_object_attach+0x1fb/0x250 [virtio_gpu]
[ 2.721872] virtio_gpu_mode_dumb_create+0xaa/0xd0 [virtio_gpu]
[ 2.721884] drm_client_framebuffer_create+0xa1/0x220 [drm]

It falls over while trying to map the framebuffer for the console, which is 1024x768 @ 32bpp by default -> 3145728 bytes. That is larger than the whole swiotlb buffer size (see the first line). There is the "swiotlb=<size>" command line argument to configure the size. Does it help to make it larger?

---

From https://github.com/AMDESE/AMDSEV, here is a description of the swiotlb size:

"When SEV is enabled, all the DMA operations inside the guest are performed on the shared memory. Linux kernel uses SWIOTLB bounce buffer for DMA operations inside SEV guest. A guest panic will occur if kernel runs out of the SWIOTLB pool. Linux kernel default to 64MB SWIOTLB pool. It is recommended to increase the swiotlb pool size to 512MB. The swiotlb pool size can be increased in guest by appending the following in the grub.cfg file

Append the following in /etc/defaults/grub
GRUB_CMDLINE_LINUX_DEFAULT="....
swiotlb=262144"

Changing the size to 1GB (swiotlb=1048576) always causes a kernel panic (I tried 5 times):

[ 0.205306] WARNING: CPU: 0 PID: 0 at arch/x86/include/asm/pgalloc.h:146 phys_pud_init+0x31e/0x396
[ 0.205307] Modules linked in:
[ 0.205310] CPU: 0 PID: 0 Comm: swapper Not tainted 5.1.5-300.fc30.x86_64 #1
[ 0.205311] Hardware name: Red Hat KVM, BIOS 0.0.0 02/06/2015
[ 0.205313] RIP: 0010:phys_pud_init+0x31e/0x396
[ 0.205315] Code: 2b 3d e1 2d 7f 00 48 01 ef 48 0b 3d 2f 19 8a 00 48 83 cf 67 ff 14 25 10 e4 22 8d 48 8b 3b 48 89 c6 e8 5d 16 6d ff 85 c0 75 02 <0f> 0b 48 8b 3d 7c 91 87 00 4d 85 ff 75 0e 48 c7 c7 00 00 00 80 48
[ 0.205316] RSP: 0000:ffffffff8d203e10 EFLAGS: 00010046 ORIG_RAX: 0000000000000000
[ 0.205318] RAX: 0000000000000000 RBX: ffff89f906e01f68 RCX: ffffffffffffffff
[ 0.205319] RDX: 80008002400001e3 RSI: 0000800006e06067 RDI: 0000800006e06067
[ 0.205320] RBP: ffff89f986e06000 R08: 8000800000000163 R09: 0000000000000092
[ 0.205321] R10: ffffffff8d9c05b4 R11: 0000000000000007 R12: 0000000280000000
[ 0.205322] R13: 0000000280000000 R14: 0000000000000004 R15: 0000000000000000
[ 0.205328] FS: 0000000000000000(0000) GS:ffff89fb77a00000(0000) knlGS:0000000000000000
[ 0.205329] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.205330] CR2: ffff89fb7ffff000 CR3: 000080000620e000 CR4: 00000000000406b0
[ 0.205335] Call Trace:
[ 0.205340] kernel_physical_mapping_init+0xc9/0x259
[ 0.205344] early_set_memory_enc_dec+0x122/0x176
[ 0.205348] kvm_smp_prepare_boot_cpu+0x65/0x95
[ 0.205351] start_kernel+0x1cf/0x512
[ 0.205355] secondary_startup_64+0xa4/0xb0
[ 0.205358] ---[ end trace 372224d2e1643db5 ]---
[ 0.205369] WARNING: CPU: 0 PID: 0 at arch/x86/include/asm/pgalloc.h:87 phys_pmd_init+0x309/0x380
[ 0.205370] Modules linked in:
[ 0.205371] CPU: 0 PID: 0 Comm: swapper Tainted: G W 5.1.5-300.fc30.x86_64 #1
[ 0.205372] Hardware name: Red Hat KVM, BIOS 0.0.0 02/06/2015
[ 0.205373] RIP: 0010:phys_pmd_init+0x309/0x380
[ 0.205375] Code: 2b 3d 76 31 7f 00 4c 01 ef 48 0b 3d c4 1c 8a 00 48 83 cf 67 ff 14 25 00 e4 22 8d 48 8b 3b 48 89 c6 e8 d4 19 6d ff 85 c0 75 02 <0f> 0b 48 8b 3d 11 95 87 00 48 85 ed 75 0e 48 c7 c7 00 00 00 80 48
[ 0.205376] RSP: 0000:ffffffff8d203da8 EFLAGS: 00010046 ORIG_RAX: 0000000000000000
[ 0.205377] RAX: 0000000000000000 RBX: ffff89f906e06de8 RCX: ffffffffffffffff
[ 0.205378] RDX: 8000800277a001e3 RSI: 0000800006e07067 RDI: 0000800006e07067
[ 0.205379] RBP: 0000000000000000 R08: ffff89f906e07000 R09: 00000000000000a9
[ 0.205379] R10: ffffffff8d9c0c54 R11: 0000000000000007 R12: 0000000277c00000
[ 0.205380] R13: ffff89f986e07000 R14: 0000000277c00000 R15: 0000000277c00000
[ 0.205385] FS: 0000000000000000(0000) GS:ffff89fb77a00000(0000) knlGS:0000000000000000
[ 0.205386] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.205387] CR2: ffff89fb7ffff000 CR3: 000080000620e000 CR4: 00000000000406b0
[ 0.205390] Call Trace:
[ 0.205392] phys_pud_init+0x165/0x396
[ 0.205394] kernel_physical_mapping_init+0xc9/0x259
[ 0.205396] early_set_memory_enc_dec+0x122/0x176
[ 0.205398] kvm_smp_prepare_boot_cpu+0x65/0x95
[ 0.205400] start_kernel+0x1cf/0x512
[ 0.205402] secondary_startup_64+0xa4/0xb0
[ 0.205404] ---[ end trace 372224d2e1643db6 ]---
[ 0.205464] KVM setup async PF for cpu 0
[ 0.205474] kvm-stealtime: cpu 0, msr 277a23040
[ 0.205480] Built 1 zonelists, mobility grouping on. Total pages: 2058073
[ 0.205481] Policy zone: Normal
[ 0.205484] Kernel command line: BOOT_IMAGE=(hd0,gpt2)/vmlinuz-5.1.5-300.fc30.x86_64 root=/dev/mapper/fedora_bootp--73--227--160-root ro resume=/dev/mapper/fedora_bootp--73--227--160-swap rd.lvm.lv=fedora_bootp-73-227-160/root rd.lvm.lv=fedora_bootp-73-227-160/swap console=ttyS0,115200 swiotlb=1048576
[ 0.205604] software IO TLB: Cannot allocate buffer
...
[ 1.486031] Kernel panic - not syncing: Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer
[ 1.486652] CPU: 1 PID: 1 Comm: swapper/0 Tainted: G W 5.1.5-300.fc30.x86_64 #1
[ 1.486652] Hardware name: Red Hat KVM, BIOS 0.0.0 02/06/2015
[ 1.486652] Call Trace:
[ 1.486652] dump_stack+0x5c/0x80
[ 1.486652] panic+0x101/0x2a7
[ 1.486652] swiotlb_tbl_map_single.cold+0x28/0x28
[ 1.486652] swiotlb_map+0x65/0x190
[ 1.486652] ? vp_synchronize_vectors+0x60/0x60
[ 1.486652] ? virtrng_scan+0x30/0x30
[ 1.486652] dma_direct_map_page+0xbe/0xf0
[ 1.486652] virtqueue_add_inbuf+0x21a/0x670
[ 1.486652] virtio_read+0xae/0xd0
[ 1.486652] add_early_randomness+0x4f/0xc0
[ 1.486652] set_current_rng+0x4c/0x140
[ 1.486652] hwrng_register+0x161/0x180
[ 1.486652] virtrng_scan+0x15/0x30
[ 1.486652] virtio_dev_probe+0x174/0x1e0
[ 1.486652] really_probe+0xf9/0x3a0
[ 1.486652] driver_probe_device+0xb6/0x100
[ 1.486652] device_driver_attach+0x55/0x60
[ 1.486652] __driver_attach+0x8a/0x150
[ 1.486652] ? device_driver_attach+0x60/0x60
[ 1.486652] bus_for_each_dev+0x78/0xc0
[ 1.486652] bus_add_driver+0x14a/0x1e0
[ 1.486652] driver_register+0x6c/0xb0
[ 1.486652] ? hwrng_modinit+0x82/0x82
[ 1.486652] do_one_initcall+0x46/0x1c4
[ 1.486652] kernel_init_freeable+0x1a6/0x24d
[ 1.486652] ? rest_init+0xaa/0xaa
[ 1.486652] kernel_init+0xa/0x106
[ 1.486652] ret_from_fork+0x22/0x40
[ 1.486652] Kernel Offset: 0xb000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 1.486652] ---[ end Kernel panic - not syncing: Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer ]---

Changing the size to 512MB (swiotlb=524288) gives two results. In 5 trials, a kernel panic (logs identical to the 1GB swiotlb pool) happened 2 times.
For the rest of the trials, the behavior is the same as in comment 22.

Changing the size to 768MB (swiotlb=786432) behaves the same as 512MB.

---

(In reply to Guo, Zhiyi from comment #27)
> From the https://github.com/AMDESE/AMDSEV, here is a description about
> swiotlb size:
> "When SEV is enabled, all the DMA operations inside the guest are performed
> on the shared memory. Linux kernel uses SWIOTLB bounce buffer for DMA
> operations inside SEV guest. A guest panic will occur if kernel runs out of
> the SWIOTLB pool. Linux kernel default to 64MB SWIOTLB pool.

That seems not to be the case with the Fedora kernel, though; the error message (comment 26) indicates a 2MB pool.

> Change the size to 1GB(swiotlb=1048576) will always cause kernel panic(I
> tried 5 times):

1G is a bit excessive.

> Change the size to 512MB(swiotlb=524288) has two results. In 5 trials,
> kernel panic(logs are same as 1GB swiotlb pool) happen 2 times.
> For the rest of trials, the behaviors are same as comment 22

Hmm. Something seems wrong with the Fedora kernel; it is ignoring the request for a larger swiotlb ...

I guess we should wait for the 8.1 drm rebase then, instead of debugging Fedora.

> Building is failed:
> Error:
> Problem: package systemtap-4.1-2.el8.x86_64 requires systemtap-client =
> 4.1-2.el8, but none of the providers can be installed
> - package systemtap-client-4.1-2.el8.x86_64 requires systemtap-runtime =
> 4.1-2.el8, but none of the providers can be installed
> - conflicting requests
> - nothing provides libdyninstAPI.so.9.3()(64bit) needed by
> systemtap-runtime-4.1-2.el8.x86_64
Doesn't look related to my patches.
Guess I'll just try again tomorrow.
> Doesn't look related to my patches.
> Guess I'll just try again tomorrow.
Grr, now the build fails somewhere in gluster ...
(In reply to Gerd Hoffmann from comment #30)
> > Doesn't look related to my patches.
> > Guess I just try again tomorrow.
>
> Grr, now the build fails somewhere in gluster ...

Do we have a scratch build now?

BR/
Zhiyi, Guo

---

No, the build still fails in block/gluster.c (https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=22222441).

---

Gluster fixed, finally worked: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=22367789

---

(In reply to Gerd Hoffmann from comment #36)
> (In reply to Gerd Hoffmann from comment #34)
> > No, build still fails in block/gluster.c
> > (https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=22222441).
>
> Gluster fixed, finally worked:
> https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=22367789

Hmm... I forgot to pick the pkgs; can you help to rebuild them? Thanks!

BR/
Zhiyi, Guo

---

> Hmm..I forgot to pick the pkgs, can you help to rebuild them? Thanks!

https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=22489228
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=22490549
(one patch added).

---

Created attachment 1589386 [details]
ovmf log
Hmm... enabling SEV & IOMMU for all of the virtio devices I used, the RHEL 8.1 VM gets stuck in OVMF and cannot proceed.
Removing the sev option doesn't help here.
Uploading the OVMF log.
Hello Zhiyi,

(1) The log in comment 44 indicates that SEV was not enabled:

> InstallProtocolInterface: F8775D50-8ABD-4ADF-92AC-853E51F6C8DC 0

The GUID on that line stands for:

> InstallProtocolInterface: [IoMmuAbsentProtocol] 0

This is done by AmdSevDxe in OVMF if SEV is not enabled.

(2) When you use a virtio-vga device with OVMF, OVMF will not drive the device via virtio; it will drive the "VGA" part. That too should work with SEV, of course; I just wanted to clarify this.

If you specifically want to test VirtioGpuDxe in OVMF, you'll have to request the virtio-gpu-pci device on the QEMU cmdline. (Note that when using libvirt (with x86 guests), this is not possible; with x86 guests, "virtio" in the domain XML's video element maps to the "virtio-vga" device model only.)

(3) The log in comment 44 ends with:

> InstallProtocolInterface: [VirtioDeviceProtocol] 7E02ED20
> InstallProtocolInterface: [EfiSimpleNetworkProtocol] 7E02D028
> InstallProtocolInterface: [EfiDevicePathProtocol] 7E02F198
> InstallProtocolInterface: [EfiVlanConfigProtocol] 7E02D930
> InstallProtocolInterface: [EfiManagedNetworkServiceBindingProtocol] 7DCBBBC0
> InstallProtocolInterface: [EfiDevicePathProtocol] 7DCBBE18
> InstallProtocolInterface: [EfiHiiConfigAccessProtocol] 7DCBB120
> InstallProtocolInterface: [VlanConfigDxe] 7DCBB118
> InstallProtocolInterface: [EfiManagedNetworkProtocol] 7DCBB2C0

implying that the problem is related to the virtio-net NIC. The most recent QEMU command line is ~1 month old, from comment 22. Can you please include the cmdline that you used in comment 44?

Anyway, please note that we've discussed this stuff before:

- https://bugzilla.redhat.com/show_bug.cgi?id=1361286#c77
- https://bugzilla.redhat.com/show_bug.cgi?id=1361286#c79

Therefore, can you please ensure the following:

(3.1) The virtio devices should have the "disable-legacy=on" property.
(Spelling this out should not be necessary as long as the device is located in a PCI Express Root Port, or PCIe switch downstream port -- because in that case, virtio-0.9.5 compat is disabled automatically; i.e., the device becomes modern-only automatically.)

(3.2) The virtio devices should have the "iommu_platform=true" property.

(3.3) "-netdev tap" should have the "vhost=off" property.

Thanks.

---

(3.4) Please also add "romfile=''" to "-device virtio-net-pci". See:

- https://bugzilla.redhat.com/show_bug.cgi?id=1361286#c81
- https://bugzilla.redhat.com/show_bug.cgi?id=1361286#c84
- https://bugzilla.redhat.com/show_bug.cgi?id=1361286#c88

---

host:
- edk2-20190308git89910a39dcfd-5.el8 [buildID=923901]
- glusterfs-6.0-7.el8 [buildID=920471]
- kernel-4.18.0-114.el8 [buildID=928444]
- libseccomp-2.4.1-1.el8 [buildID=909913]
- libvirt-4.5.0-30.module+el8.1.0+3574+3a63752b [buildID=924456]
- qemu-kvm-2.12.0-81.module+el8.1.0+3619+dfe1ae01 [buildID=926959]
- spice-0.14.2-1.el8 [buildID=899194]

host config:
- add "options kvm_amd sev=1" to "/etc/modprobe.d/kvm.conf"
- run "dracut -f"
- reboot, or manually reload "kvm_amd"

guest config:
- see subsequent attached domain XML
- and subsequent generated QEMU command line

guest workload:
- RHEL-8.1.0-20190701.0 (= RHEL-8.1.0-Beta-1.1) installer ISO
- implies guest kernel 4.18.0-107.el8
- in grub2, extend the kernel command line with "swiotlb=262144" (512MB)
- also append: "ignore_loglevel console=tty console=ttyS0,115200n8"

Result: OVMF works fine, but then I get identical results to comment 27 / comment 28. The large swiotlb is allocated successfully, but then 2MB allocations cannot be satisfied:

> [ 6.136727] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
> [ 6.137670] software IO TLB: mapped [mem 0x5afbe000-0x7afbe000] (512MB)
> ...
> [ 6.278992] software IO TLB: SEV is active and system is using DMA bounce buffers
> ...
> [ 8.359505] virtio-pci 0000:00:02.0: swiotlb buffer is full (sz: 2097152 bytes)
> [ 8.360620] virtio-pci 0000:00:02.0: overflow 0x00008001e6200000+2097152 of DMA mask ffffffffffffffff bus mask 0
> [ 8.362130] WARNING: CPU: 2 PID: 852 at kernel/dma/direct.c:43 dma_direct_map_page+0xfb/0x160
> ...
> [ 8.420552] Console: switching to colour frame buffer device 128x48
> [ 8.423468] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1203 (command 0x105)
> ...

Because of this, the guest kernel can write to the serial console, but not to the graphical display.

---

Created attachment 1589703 [details]: domain XML for comment 47

Created attachment 1589704 [details]: QEMU cmdline (by libvirt) for comment 47

---

(In reply to Laszlo Ersek from comment #45)
> (2) when you use a virtio-vga device with OVMF, OVMF will not drive
> the device via virtio; it will drive the "VGA" part. That too should
> work with SEV, of course; I just wanted to clarify this.
>
> If you specifically want to test VirtioGpuDxe in OVMF, you'll have to
> request the virtio-gpu-pci device on the QEMU cmdline. (Note that when
> using libvirt (with x86 guests), this is not possible; with x86
> guests, "virtio" in the domain XML's video element maps to the
> "virtio-vga" device model only.)

The following (unsupported) domain XML snippet can force the virtio-gpu-pci device:

<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
  <qemu:commandline>
    <qemu:arg value='-set'/>
    <qemu:arg value='device.video0.driver=virtio-gpu-pci'/>
  </qemu:commandline>
</domain>

(Note that the xmlns:qemu attribute (namespace definition) in the root element is required for the <qemu:commandline> element to work.)

This is useful for unit testing the fix for this RHBZ, with libvirt, until the guest kernel issue described in comment 47 is corrected. When using the raw QEMU cmdline, simply change "-device virtio-vga" to "-device virtio-gpu-pci".
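For orientation, the recommendations from comment 45 (items 3.1 through 3.4) plus the virtio-vga to virtio-gpu-pci swap can be collected into one cmdline sketch. This is only an illustrative fragment, not the cmdline from the attachments: the ids, bus placements, and omitted options (disks, serial, etc.) are placeholders.

```
# Sketch only: consolidates (3.1)-(3.4); ids and values are placeholders.
/usr/libexec/qemu-kvm \
  -machine q35,memory-encryption=sev0 \
  -object sev-guest,id=sev0,cbitpos=47,reduced-phys-bits=1,policy=0x1 \
  -device virtio-gpu-pci,id=video0,disable-legacy=on,iommu_platform=on \
  -netdev tap,id=net0,vhost=off \
  -device virtio-net-pci,netdev=net0,disable-legacy=on,iommu_platform=on,romfile=
```

Note that "disable-legacy=on" is redundant when the device sits in a PCIe root port (the device is modern-only there automatically, per comment 45), but spelling it out is harmless.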
With the above tweak, I managed to verify "qemu-kvm-2.12.0-81.module+el8.1.0+3619+dfe1ae01": OVMF produces graphical output through VirtioGpuDxe, so the device model works fine.

(If we also add

  <qemu:arg value='-global'/>
  <qemu:arg value='isa-debugcon.iobase=0x402'/>
  <qemu:arg value='-debugcon'/>
  <qemu:arg value='file:/tmp/ovmf.rhel8.sev.q35.log'/>

to the domain XML, then "/tmp/ovmf.rhel8.sev.q35.log" will contain the key message

> VirtioGpuDriverBindingStart: produced GOP while binding VirtIo=7DE859A0

FWIW, note that enabling the QEMU debug console in a SEV guest has a serious impact on firmware boot duration -- the impact is more severe than without SEV.)

---

(1) The swiotlb machinery limits the maximum contiguous allocation size (i.e., individual bounce buffer size) to 256 KB. From "include/linux/swiotlb.h":

> /*
>  * Maximum allowable number of contiguous slabs to map,
>  * must be a power of 2. What is the appropriate value ?
>  * The complexity of {map,unmap}_single is linearly dependent on this value.
>  */
> #define IO_TLB_SEGSIZE 128
>
> /*
>  * log of the size of each IO TLB slab. The number of slabs is command line
>  * controllable.
>  */
> #define IO_TLB_SHIFT 11

And from swiotlb_init_with_tbl() [kernel/dma/swiotlb.c]:

> /*
>  * Allocate and initialize the free list array. This array is used
>  * to find contiguous free memory regions of size up to IO_TLB_SEGSIZE
>  * between io_tlb_start and io_tlb_end.
>  */

(2) If I add "tp_printk trace_event=swiotlb_bounced" to the guest kernel command line, then every bounce buffer allocation *request* is logged (regardless of outcome):

dma_direct_map_sg() [kernel/dma/direct.c]
  dma_direct_map_page() [kernel/dma/direct.c]
    swiotlb_map() [kernel/dma/swiotlb.c]
      trace_swiotlb_bounced() <-- LOGGED HERE

Such mapping requests work fine initially (from multiple virtio devices); the requested buffer sizes are minimal -- there is nothing larger than a single page (4KB).
Even the virtio-gpu driver submits a number of such small (and successful) requests: > swiotlb_bounced: dev_name: 0000:00:02.0 dma_mask=ffffffffffffffff > dev_addr=8001e898e7c8 size=24 FORCE > swiotlb_bounced: dev_name: 0000:00:02.0 dma_mask=ffffffffffffffff > dev_addr=8001e8b7ee00 size=408 FORCE > swiotlb_bounced: dev_name: 0000:00:02.0 dma_mask=ffffffffffffffff > dev_addr=8001eb0acd20 size=32 FORCE > swiotlb_bounced: dev_name: 0000:00:02.0 dma_mask=ffffffffffffffff > dev_addr=8001e80afc48 size=40 FORCE > swiotlb_bounced: dev_name: 0000:00:02.0 dma_mask=ffffffffffffffff > dev_addr=8001e80afc70 size=24 FORCE > swiotlb_bounced: dev_name: 0000:00:02.0 dma_mask=ffffffffffffffff > dev_addr=8001e80e2f20 size=32 FORCE ( Side comment: please ignore the FORCE parameter at the end. That is *wrong*. I think it is an issue with the tracing itself. Because, FORCE would correspond to "swiotlb=force" ("force using of bounce buffers even if they wouldn't be automatically used by the kernel"), and that was set neither on the kernel command line, nor programmatically in the kernel. ) (3) Some time after these small bounce buffer allocations, virtio-gpu requests a 2MB bounce buffer: [a] > swiotlb_bounced: dev_name: 0000:00:02.0 dma_mask=ffffffffffffffff > dev_addr=8001e5c00000 size=2097152 FORCE This cannot succeed regardless of the full swiotlb size, because it exceeds the individual contiguous bounce buffer limit (which is 256KB). The call tree and the kernel messages related to this allocation failure are listed below. The limit is exceeded in the innermost swiotlb_tbl_map_single() function. 
virtio_gpu_mode_dumb_create() [drivers/gpu/drm/virtio/virtgpu_gem.c] virtio_gpu_object_attach() [drivers/gpu/drm/virtio/virtgpu_vq.c] dma_map_sg() [include/linux/dma-mapping.h] dma_map_sg_attrs() [include/linux/dma-mapping.h] dma_direct_map_sg() [kernel/dma/direct.c] dma_direct_map_page() [kernel/dma/direct.c] swiotlb_map() [kernel/dma/swiotlb.c] trace_swiotlb_bounced() LOGS [a] swiotlb_tbl_map_single() [kernel/dma/swiotlb.c] LOGS [b] report_addr() [kernel/dma/direct.c] LOGS [c] [b] > virtio-pci 0000:00:02.0: swiotlb buffer is full (sz: 2097152 bytes) [c] > virtio-pci 0000:00:02.0: overflow 0x00008001e5c00000+2097152 of DMA > mask ffffffffffffffff bus mask 0 ( Some side comments on the log messages. First, the address 8001e5c00000 -- shown in both messages [a] and [c] -- has bit#47 set, which stands for "encrypted" (the "C" bit of SEV). Second, messages [b] and [c] are misleading -- we don't run out of swiotlb space, we exceed the individual bounce buffer limit instead. ) ( Another side comment: the bounce buffer allocation failure is propagated out of dma_map_sg(), as return value zero. However, virtio_gpu_object_attach() does not check, or propagate further, that return value. Ideally, the virtio-gpu driver should fail the initialization at this point. ) (4) In the upstream kernel, this issue has been identified and fixed, although for virtio-blk only (not virtio-gpu). Please refer to the following commits, part of v5.1: 1 abe420bfae52 swiotlb: Introduce swiotlb_max_mapping_size() 2 492366f7b423 swiotlb: Add is_swiotlb_active() function 3 133d624b1cee dma: Introduce dma_max_mapping_size() 4 e6d6dd6c875e virtio: Introduce virtio_max_dma_size() 5 fd1068e1860e virtio-blk: Consider virtio_max_dma_size() for maximum segment size ( Side comment: the documentation introduced in commit 133d624b1cee (i.e. 
patch#3) has been touched-up later in commit 99d2b9386729 ("Documentation: DMA-API: fix a function name of max_mapping_size", 2019-06-07), which isn't part of any released kernel yet. But, that's just a simple typo fix and not related to the core issue. ) In the upstream kernel, this work specifically targeted SEV guests: > Looks good. Booted and tested using an SEV guest without any issues. > > Tested-by: Tom Lendacky <thomas.lendacky> (Archived at: - https://lkml.org/lkml/2019/1/30/816 - https://www.mail-archive.com/virtualization@lists.linux-foundation.org/msg33545.html ) (5) So I *speculate* the following kernel patch (on top of upstream v5.2-7765-g964a4eacef67) might fix the issue (not even build tested yet): > commit 9821816e5f960b59232d23911556e7e2662ddc48 > Author: Laszlo Ersek <lersek> > Date: Sat Jul 13 18:47:05 2019 +0200 > > drm/virtio: limit each scatterlist node by virtio_max_dma_size() > > SWIOTLB limits individual bounce buffers to 256KB in size. This limit can > interfere with virtio drivers requesting larger (contiguous) DMA mappings, > for example when they run in SEV guests. > > The issue was previously solved for virtio-blk in commit range > 1c163f4c7b3f..fd1068e1860e. > > Fix the problem for virtio-gpu as well, by limiting each scatterlist node > to virtio_max_dma_size(), in virtio_gpu_object_get_sg_table(). 
> > Cc: Daniel Vetter <daniel> > Cc: David Airlie <airlied> > Cc: Gerd Hoffmann <kraxel> > Cc: Joerg Roedel <jroedel> > Cc: Tom Lendacky <thomas.lendacky> > Cc: dri-devel.org > Cc: linux-kernel.org > Cc: virtualization.org > Ref: https://bugzilla.redhat.com/show_bug.cgi?id=1531543 > Signed-off-by: Laszlo Ersek <lersek> > > diff --git a/drivers/gpu/drm/virtio/virtgpu_object.c b/drivers/gpu/drm/virtio/virtgpu_object.c > index b2da31310d24..524c783cffd4 100644 > --- a/drivers/gpu/drm/virtio/virtgpu_object.c > +++ b/drivers/gpu/drm/virtio/virtgpu_object.c > @@ -201,25 +201,32 @@ int virtio_gpu_object_get_sg_table(struct virtio_gpu_device *qdev, > struct page **pages = bo->tbo.ttm->pages; > int nr_pages = bo->tbo.num_pages; > struct ttm_operation_ctx ctx = { > .interruptible = false, > .no_wait_gpu = false > }; > + size_t max_segment; > > /* wtf swapping */ > if (bo->pages) > return 0; > > if (bo->tbo.ttm->state == tt_unpopulated) > bo->tbo.ttm->bdev->driver->ttm_tt_populate(bo->tbo.ttm, &ctx); > bo->pages = kmalloc(sizeof(struct sg_table), GFP_KERNEL); > if (!bo->pages) > goto out; > > - ret = sg_alloc_table_from_pages(bo->pages, pages, nr_pages, 0, > - nr_pages << PAGE_SHIFT, GFP_KERNEL); > + max_segment = virtio_max_dma_size(); > + max_segment &= ~(size_t)(PAGE_SIZE - 1); > + if (max_segment > SCATTERLIST_MAX_SEGMENT) { > + max_segment = SCATTERLIST_MAX_SEGMENT; > + } > + ret = __sg_alloc_table_from_pages(bo->pages, pages, nr_pages, 0, > + nr_pages << PAGE_SHIFT, > + (unsigned)max_segment, GFP_KERNEL); > if (ret) > goto out; > return 0; > out: > kfree(bo->pages); > bo->pages = NULL; Should be max_segment = virtio_max_dma_size(qdev->vdev); but even with this typo fixed, the guest kernel driver doesn't work. The error messages are gone, but the screen stays blank. 
Other stuff I tried over the weekend:

* Adding "tp_printk trace_event=swiotlb_bounced" to the command line of the guest kernel, from comment 51 bullet (5), confirms that the 2MB buffer from TTM is mapped by 8 bounce buffers, handed out by swiotlb, each 256KB in size.

* Built & booted upstream kernels v4.19 and v4.20 in the guest. Neither works. Strange for two reasons: (a) the presentation titled "AMD SEV Update / Linux Security Summit 2018" [AMD-Encrypted-Virtualization-Update-David-Kaplan-AMD.pdf] says VirtIO-GPU support is available in Linux 4.19/QEMU 3.1 (slide 6); (b) kernel commit 8f44ca223345 ("drm/virtio: add dma sync for dma mapped virtio gpu framebuffer pages", 2018-09-19) is part of v4.20. At this point I feel compelled to think that (virtio-gpu + SEV) must never have worked in the Linux guest. O_o

* Logged the virtio-gpu trace points in qemu-kvm (see version in comment 47) with SystemTap. The guest kernel's operations from comment 51 bullet (5) looked valid in the trace.

* Turned on guest error logging in qemu-kvm ("-d guest_errors"). Nothing was logged.

* Disabled SEV, but kept "iommu_platform" enabled in the domain XML. Also kept the half-gig swiotlb in the guest kernel. This way the guest driver would continue using the DMA API, but those DMA ops wouldn't be backed by SEV logic. Result: everything worked fine. Confusing: that suggests the bug is in the SEV code in the guest -- but in that case, why didn't I get garbage (= encrypted data), instead of a blank screen, when SEV was enabled?

* Re: side remark in comment 51 bullet (2): learned from code inspection that SEV enablement does force swiotlb (see "arch/x86/mm/mem_encrypt.c"), so the "swiotlb_bounced" tracepoint didn't lie after all.
Hi Laszlo,

Thanks for the debugging progress in comment 45 ~ comment 53 :)

I'm using a Beaker job to configure and test SEV automatically; if you're interested, please refer to a task that can do these steps automatically:
http://pkgs.devel.redhat.com/cgit/tests/kernel/tree/virt/sev/guest-sev-setup

And here is the reference Beaker job XML for using this task:
https://beaker.engineering.redhat.com/jobs/3666425 (slow train)
https://beaker.engineering.redhat.com/jobs/3617539 (fast train)

Regarding comment 44: my changes had broken the task /kernel/virt/sev/guest-sev-setup. :( After fixing the problem, I can boot a SEV guest with virtio devices that have iommu enabled. I can also confirm the kernel hits a call trace and prints error messages repeatedly: [ 6.388096] [drm] Initialized virtio_gpu 0.1.0 0 for virtio0 on minor 0 [ 6.390362] virtio-pci 0000:00:01.0: swiotlb buffer is full (sz: 2097152 bytes) [ 6.393213] virtio-pci 0000:00:01.0: overflow 0x0000800272400000+2097152 of DMA mask ffffffffffffffff bus mask 0 [ 6.395180] WARNING: CPU: 0 PID: 791 at kernel/dma/direct.c:43 dma_direct_map_page+0xfb/0x160 [ 6.396163] Modules linked in: crct10dif_pclmul snd_hda_codec_generic(+) crc32_pclmul ledtrig_audio snd_hda_intel snd_hda_codec virtio_gpu(+) ttm ghash_clmulni_intel snd_hda_core snd_hwdep drm_kms_helper snd_pcm pcspkr snd_timer syscopyarea snd sysfillrect i2c_i801 sysimgblt fb_sys_fops sg drm soundcore lpc_ich virtio_input virtio_balloon xfs libcrc32c sd_mod ahci libahci libata crc32c_intel serio_raw virtio_blk virtio_console virtio_net virtio_scsi net_failover failover dm_mirror dm_region_hash dm_log dm_mod [ 6.401174] CPU: 0 PID: 791 Comm: systemd-udevd Tainted: G W --------- - - 4.18.0-115.el8.x86_64 #1 [ 6.401174] Hardware name: Red Hat KVM, BIOS 0.0.0 02/06/2015 [ 6.401174] RIP: 0010:dma_direct_map_page+0xfb/0x160 [ 6.401174] Code: 83 78 02 00 00 48 85 c0 74 30 4c 8b 00 b8 fe ff ff ff 49 39 c0 77 0a 48 83 bb 88 02 00 00 00 74 09 80 3d 6b 83 2d 01 00 74 31 <0f> 0b 48 c7 c0 ff ff ff ff eb 8d 48 89 
ca eb 83 80 3d 53 83 2d 01 [ 6.401174] RSP: 0018:ffffa72101217940 EFLAGS: 00010246 [ 6.401174] RAX: 0000000000000000 RBX: ffff8fb507c380b0 RCX: 0000000000000000 [ 6.401174] RDX: ffff8fb677a1ee00 RSI: ffff8fb677a16a08 RDI: ffff8fb677a16a08 [ 6.401174] RBP: 0000000000200000 R08: 0000000000000325 R09: ffff8fb675030f00 [ 6.401174] R10: 0720072007200720 R11: 0720072007200720 R12: ffff8fb507c380b0 [ 6.401174] R13: 0000000000000001 R14: 0000000000000000 R15: ffff8fb673866000 [ 6.401174] FS: 00007f10e4939940(0000) GS:ffff8fb677a00000(0000) knlGS:0000000000000000 [ 6.401174] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 6.401174] CR2: 000056288f57f4c0 CR3: 0000800274e28000 CR4: 00000000003406f0 [ 6.401174] Call Trace: [ 6.401174] dma_direct_map_sg+0x64/0xb0 [ 6.401174] virtio_gpu_object_attach+0x1f1/0x230 [virtio_gpu] [ 6.401174] virtio_gpu_mode_dumb_create+0xb1/0xd0 [virtio_gpu] [ 6.401174] drm_client_framebuffer_create+0xb3/0x220 [drm] [ 6.401174] drm_fb_helper_generic_probe+0x4d/0x210 [drm_kms_helper] [ 6.401174] __drm_fb_helper_initial_config_and_unlock+0x27e/0x540 [drm_kms_helper] [ 6.401174] drm_fbdev_client_hotplug+0xed/0x170 [drm_kms_helper] [ 6.401174] drm_fbdev_generic_setup+0xa3/0x110 [drm_kms_helper] [ 6.401174] virtio_gpu_probe+0xe9/0x100 [virtio_gpu] [ 6.401174] virtio_dev_probe+0x170/0x230 [ 6.401174] driver_probe_device+0x12d/0x460 [ 6.401174] __driver_attach+0xe0/0x110 [ 6.401174] ? driver_probe_device+0x460/0x460 [ 6.401174] bus_for_each_dev+0x77/0xc0 [ 6.401174] ? klist_add_tail+0x57/0x70 [ 6.401174] bus_add_driver+0x155/0x230 [ 6.401174] ? 0xffffffffc05fc000 [ 6.401174] driver_register+0x6b/0xb0 [ 6.401174] ? 0xffffffffc05fc000 [ 6.401174] do_one_initcall+0x46/0x1c3 [ 6.401174] ? free_unref_page_commit+0x91/0x100 [ 6.401174] ? _cond_resched+0x15/0x30 [ 6.401174] ? kmem_cache_alloc_trace+0x151/0x1d0 [ 6.401174] do_init_module+0x5a/0x210 [ 6.401174] load_module+0x1440/0x17d0 [ 6.401174] ? __do_sys_init_module+0x13d/0x180 [ 6.401174] ? 
_cond_resched+0x15/0x30 [ 6.401174] __do_sys_init_module+0x13d/0x180 [ 6.401174] do_syscall_64+0x5b/0x1b0 [ 6.401174] entry_SYSCALL_64_after_hwframe+0x65/0xca [ 6.401174] RIP: 0033:0x7f10e352deae [ 6.401174] Code: 48 8b 0d dd ff 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d aa ff 2b 00 f7 d8 64 89 01 48 [ 6.401174] RSP: 002b:00007ffe7e9a6ef8 EFLAGS: 00000246 ORIG_RAX: 00000000000000af [ 6.401174] RAX: ffffffffffffffda RBX: 000056288f591bd0 RCX: 00007f10e352deae [ 6.401174] RDX: 00007f10e409980d RSI: 0000000000020818 RDI: 000056288ff46e70 [ 6.401174] RBP: 00007f10e409980d R08: 000056288f5890b0 R09: 0000000000000005 [ 6.401174] R10: 0000000000000006 R11: 0000000000000246 R12: 000056288ff46e70 [ 6.401174] R13: 000056288f585830 R14: 0000000000020000 R15: 0000000000000000 [ 6.401174] ---[ end trace 54a2bb0b7387a604 ]--- [ 6.844794] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1203 (command 0x105) [ 6.844799] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1203 (command 0x105) [ 6.845313] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1203 (command 0x105) [ 6.848662] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1203 (command 0x105) [ 6.850341] virtio_gpu virtio0: fb0: DRM emulated frame buffer device [ 6.862270] [drm] pci: virtio-gpu-pci detected at 0000:0a:00.0 [ 6.864260] [drm] virgl 3d acceleration not supported by host [ 6.870313] [drm] number of scanouts: 1 [ 6.871247] [drm] number of cap sets: 0 [ 6.875634] [drm] Initialized virtio_gpu 0.1.0 0 for virtio9 on minor 1 [ 6.877367] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1203 (command 0x105) [ 6.878964] virtio-pci 0000:0a:00.0: swiotlb buffer is full (sz: 2097152 bytes) [ 6.879849] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1203 (command 0x105) [ 6.886328] virtio_gpu virtio9: fb1: DRM 
emulated frame buffer device [ 6.891610] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1203 (command 0x105) [ 6.897540] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1203 (command 0x105) [ 6.899783] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1203 (command 0x105) [ 6.901890] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1203 (command 0x105) [ 6.904215] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1203 (command 0x105) [ 6.906725] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1203 (command 0x105) [ 6.909514] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1203 (command 0x105) [ 6.911828] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1203 (command 0x105) .... (repeat..)

BR/
Zhiyi, Guo

Created attachment 1591309 [details]
VM xml

The RHEL 8.1 VM XML used in comment 54 configures virtio as both the primary and the secondary video device:

  ...
  <video>
    <driver iommu='on' ats='on'/>
    <model type='virtio' heads='1' primary='yes'/>
    <alias name='video0'/>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
  </video>
  <video>
    <driver iommu='on' ats='on'/>
    <model type='virtio' heads='1'/>
    <alias name='video1'/>
    <address type='pci' domain='0x0000' bus='0x0a' slot='0x00' function='0x0'/>
  </video>
  ...
qtree info: dev: q35-pcihost, id "" MCFG = 2952790016 (0xb0000000) pci-hole64-size = 34359738368 (32 GiB) short_root_bus = 0 (0x0) below-4g-mem-size = 2147483648 (2 GiB) above-4g-mem-size = 6442450944 (6 GiB) x-pci-hole64-fix = true bus: pcie.0 type PCIE dev: ich9-intel-hda, id "sound0" debug = 0 (0x0) msi = "auto" old_msi_addr = false addr = 1b.0 romfile = "" rombar = 1 (0x1) multifunction = false command_serr_enable = true x-pcie-lnksta-dllla = true x-pcie-extcap-init = true class Audio controller, addr 00:1b.0, pci id 8086:293e (sub 1af4:1100) bar 0: mem at 0xc1c10000 [0xc1c13fff] bus: sound0.0 type HDA dev: hda-duplex, id "sound0-codec0" debug = 0 (0x0) mixer = true cad = 0 (0x0) dev: virtio-vga, id "video0" ioeventfd = false vectors = 3 (0x3) virtio-pci-bus-master-bug-migration = false disable-legacy = "on" disable-modern = false migrate-extra = true modern-pio-notify = false x-disable-pcie = false page-per-vq = false x-ignore-backend-features = false ats = true x-pcie-deverr-init = true x-pcie-lnkctl-init = true x-pcie-pm-init = true addr = 01.0 romfile = "vgabios-virtio.bin" rombar = 1 (0x1) multifunction = false command_serr_enable = true x-pcie-lnksta-dllla = true x-pcie-extcap-init = true class VGA controller, addr 00:01.0, pci id 1af4:1050 (sub 1af4:1100) bar 0: mem at 0xc0000000 [0xc07fffff] bar 2: mem at 0x800900000 [0x800903fff] bar 4: mem at 0xc1c1f000 [0xc1c1ffff] bar 6: mem at 0xffffffffffffffff [0xfffe] bus: virtio-bus type virtio-pci-bus dev: virtio-gpu-device, id "" max_outputs = 1 (0x1) max_hostmem = 268435456 (256 MiB) xres = 1024 (0x400) yres = 768 (0x300) indirect_desc = true event_idx = true notify_on_empty = true any_layout = true iommu_platform = true dev: pcie-root-port, id "pci.10" x-migrate-msix = true bus-reserve = 4294967295 (0xffffffff) io-reserve = 18446744073709551615 (16 EiB) mem-reserve = 18446744073709551615 (16 EiB) pref32-reserve = 18446744073709551615 (16 EiB) pref64-reserve = 18446744073709551615 (16 EiB) 
power_controller_present = true chassis = 10 (0xa) slot = 0 (0x0) port = 25 (0x19) aer_log_max = 8 (0x8) addr = 03.1 romfile = "" rombar = 1 (0x1) multifunction = false command_serr_enable = true x-pcie-lnksta-dllla = true x-pcie-extcap-init = true class PCI bridge, addr 00:03.1, pci id 1b36:000c (sub 0008:0000) bar 0: mem at 0xc1c15000 [0xc1c15fff] bus: pci.10 type PCIE dev: virtio-gpu-pci, id "video1" ioeventfd = false vectors = 3 (0x3) virtio-pci-bus-master-bug-migration = false disable-legacy = "on" disable-modern = false migrate-extra = true modern-pio-notify = false x-disable-pcie = false page-per-vq = false x-ignore-backend-features = false ats = true x-pcie-deverr-init = true x-pcie-lnkctl-init = true x-pcie-pm-init = true addr = 00.0 romfile = "" rombar = 1 (0x1) multifunction = false command_serr_enable = true x-pcie-lnksta-dllla = true x-pcie-extcap-init = true class Display controller, addr 0a:00.0, pci id 1af4:1050 (sub 1af4:1100) bar 1: mem at 0xc0800000 [0xc0800fff] bar 4: mem at 0x800800000 [0x800803fff] bus: virtio-bus type virtio-pci-bus dev: virtio-gpu-device, id "" max_outputs = 1 (0x1) max_hostmem = 268435456 (256 MiB) xres = 1024 (0x400) yres = 768 (0x300) indirect_desc = true event_idx = true notify_on_empty = true any_layout = true iommu_platform = true FailQA per comment 47 ~ 54 Configure qxl-vga as primary video device and virtio-gpu-pci as secondary video device: ... <video> <model type='qxl' heads='1' primary='yes'/> <alias name='video0'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/> </video> <video> <driver iommu='on' ats='on'/> <model type='virtio' heads='1'/> <alias name='video1'/> <address type='pci' domain='0x0000' bus='0x0a' slot='0x00' function='0x0'/> </video> ... 
And configure virtio-gpu-pci as the fb console device:

# cat /proc/cmdline
BOOT_IMAGE=(hd0,gpt2)/vmlinuz-4.18.0-115.el8.x86_64 root=/dev/mapper/rhel_bootp--73--225--25-root ro crashkernel=auto resume=/dev/mapper/rhel_bootp--73--225--25-swap rd.lvm.lv=rhel_bootp-73-225-25/root rd.lvm.lv=rhel_bootp-73-225-25/swap console=ttyS0,115200 fbcon=map:1

The kernel call trace and error messages can also be reproduced against virtio-gpu-pci.

Zhiyi, re: "After fixing the problem, I can boot sev guest + virtio devices with iommu enabled", from comment 54; that's great -- in my case, all virtio devices *except* virtio-gpu-pci / virtio-vga seem to work fine. The problem is specifically with virtio-gpu-pci / virtio-vga.

That said, I think it is still too early to move this BZ from ON_QA to ASSIGNED (and to set FailedQA). That's because we simply don't have enough information at this point -- our results are inconclusive in *either* direction. We don't have evidence whether the qemu-kvm backport is correct *or* incorrect. We have *some* positive evidence (the VirtioGpuDxe driver in OVMF works fine), which suggests that the qemu-kvm backport is correct. However, the Linux guest driver is a *lot* more sophisticated and demanding, so that is the ultimate test for the host-side backport.

Basically, I'm suggesting to move this BZ back to ON_QA status, and to clear the FailedQA mark. In addition, file a separate RHBZ for the guest kernel issue. Then, the guest kernel RHBZ should be marked as blocking the present RHBZ.

Meanwhile I've contacted Tom & Brijesh @ AMD -- they've confirmed that the guest kernel driver for virtio-gpu certainly worked at some point. So, in the new guest kernel RHBZ to file, we should first establish a working baseline -- because I've been unable to do even that! See my comments above. Having a functional guest kernel baseline will then let us do two things:
- bisect the guest kernel regression,
- validate the present BZ for good.

Thanks.
Regarding the warning / stack trace in comment 54: that's *exactly* what my guest kernel patch in comment 51 / comment 52 fixes (written for the upstream kernel). With the patch applied, the error messages are gone. However, the display *still* doesn't work. That's the real problem: we have a deeper problem in the guest kernel than simply requesting bounce buffers that are too large. Even when the bounce buffers are suitably sized, the display *still* doesn't work.

Moving back to ON_QA; the blocker is Bug 1731046.

I was able to bisect the commit which caused the black screen:

commit 55897af63091ebc2c3f239c6a6666f748113ac50 (HEAD, refs/bisect/bad)
Author: Christoph Hellwig <hch>
Date: Mon Dec 3 11:43:54 2018 +0100

    dma-direct: merge swiotlb_dma_ops into the dma_direct code

    While the dma-direct code is (relatively) clean and simple we actually
    have to use the swiotlb ops for the mapping on many architectures due
    to devices with addressing limits. Instead of keeping two
    implementations around this commit allows the dma-direct
    implementation to call the swiotlb bounce buffering functions and
    thus share the guts of the mapping implementation. This also
    simplified the dma-mapping setup on a few architectures where we
    don't have to differentiate which implementation to use.

    Signed-off-by: Christoph Hellwig <hch>
    Acked-by: Jesper Dangaard Brouer <brouer>
    Tested-by: Jesper Dangaard Brouer <brouer>
    Tested-by: Tony Luck <tony.luck>

As we know, SEV needs swiotlb to bounce the virtio buffers; it appears that the above commit introduced the regression when buffers are synced from CPU to device. Some other folks have also reported a similar issue in non-SEV environments (e.g. 32-bit devices on bare-metal systems). Optionally, you can reproduce the issue in a non-SEV guest if you pass "swiotlb=force" on the command line. This will force the non-SEV guest to also use the swiotlb, and you will get the same black screen.
Here is the patch which fixes the issue: https://lkml.org/lkml/2019/7/19/672

Thank you, Brijesh! I'll copy this information to bug 1731046.

> > commit 9821816e5f960b59232d23911556e7e2662ddc48
> > Author: Laszlo Ersek <lersek>
> > Date: Sat Jul 13 18:47:05 2019 +0200
> >
> > drm/virtio: limit each scatterlist node by virtio_max_dma_size()

What is the state here? Does that help (together with the comment 62 fix)? Can you submit the patch upstream (looks sane, with the typos fixed of course)?

Hi Gerd, the patch I hacked up didn't help by itself (comment 52). I didn't test the patch linked by Brijesh in comment 62. Don has a backport queued for that (bug 1731046 comment 7), but even if it fixes the dma-direct regression in RHEL8, virtio-gpu still needs a backport that puts DMA to use in some more spots (bug 1731046 comment 6).

So basically I ran out of steam on this. The delta between RHEL8 and upstream is very large (for each of DMA and virtio-gpu, independently); I'm totally unfamiliar with these subsystems (I don't even know what DRM shorthands such as "TTM" stand for), and experimenting with bleeding-edge upstream Linux ultimately exceeded my enthusiasm.

Hi Gerd, I suppose we also need a kernel bug to track the calltrace in comment 54? It seems Laszlo has already done the investigation for this part in comment 51.

BR/
Zhiyi

https://patchwork.freedesktop.org/patch/322726/
5.3-rc3 + that patch works with swiotlb=force.

> I suppose we also need a kernel bug to track the calltrace in comment 54?
Yes.
(In reply to Gerd Hoffmann from comment #69)
> > I suppose we also need a kernel bug to track the calltrace in comment 54?
>
> Yes.

The fix is tracked by Bug 1739291 - Kernel calltrace when using virtio-vga + sev.

Per https://bugzilla.redhat.com/show_bug.cgi?id=1739291#c15, iommu support for the virtio-vga device works well now.

Verified per comment 71.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:3345