Bug 1965145
| Summary: | [RHEL9] BUG: KASAN: stack-out-of-bounds in kvm_make_vcpus_request_mask+0x174/0x440 [kvm] on AMD host | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | liunana <nanliu> |
| Component: | kernel | Assignee: | Vitaly Kuznetsov <vkuznets> |
| kernel sub component: | KVM | QA Contact: | liunana <nanliu> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | high | ||
| Priority: | high | CC: | bdas, dgilbert, hkrzesin, nilal, pbonzini, virt-maint, vkuznets |
| Version: | 9.0 | Keywords: | Triaged |
| Target Milestone: | beta | Flags: | pm-rhel: mirror+ |
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | kernel-5.14.0-14.el9 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2022-05-17 15:38:18 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
liunana
2021-05-27 01:46:27 UTC
dmesg log:

[root@amd-daytona-08 ~]#
[ 806.530983] FS-Cache: Loaded
[ 806.698971] FS-Cache: Netfs 'nfs' registered for caching
[ 806.727147] Key type dns_resolver registered
[ 807.013959] NFS: Registering the id_resolver key type
[ 807.019062] Key type id_resolver registered
[ 807.023265] Key type id_legacy registered
[ 807.779459] mount.nfs (4334) used greatest stack depth: 21776 bytes left
[ 829.319478] Bluetooth: Core ver 2.22
[ 829.323285] NET: Registered protocol family 31
[ 829.327752] Bluetooth: HCI device and connection manager initialized
[ 829.334233] Bluetooth: HCI socket layer initialized
[ 829.339130] Bluetooth: L2CAP socket layer initialized
[ 829.344254] Bluetooth: SCO socket layer initialized
[ 844.813454] tun: Universal TUN/TAP device driver, 1.6
[ 844.828593] switch: port 2(t0-HuClnA) entered blocking state
[ 844.834288] switch: port 2(t0-HuClnA) entered disabled state
[ 844.840440] device t0-HuClnA entered promiscuous mode
[ 844.858586] switch: port 2(t0-HuClnA) entered blocking state
[ 844.864296] switch: port 2(t0-HuClnA) entered forwarding state
[ 919.564489] ==================================================================
[ 919.572015] BUG: KASAN: stack-out-of-bounds in kvm_make_vcpus_request_mask+0x174/0x440 [kvm]
[ 919.580491] Read of size 8 at addr ffffc9001364f638 by task qemu-kvm/4798
[ 919.587279]
[ 919.588780] CPU: 0 PID: 4798 Comm: qemu-kvm Tainted: G X --------- --- 5.12.0-1.el9.x86_64+debug #1
[ 919.599116] Hardware name: AMD Corporation DAYTONA_X/DAYTONA_X, BIOS RYM0081C 07/13/2020
[ 919.607203] Call Trace:
[ 919.609660] dump_stack+0xa5/0xe6
[ 919.612986] print_address_description.constprop.0+0x18/0x130
[ 919.618738] ? kvm_make_vcpus_request_mask+0x174/0x440 [kvm]
[ 919.624433] __kasan_report.cold+0x7f/0x114
[ 919.628623] ? kvm_make_vcpus_request_mask+0x174/0x440 [kvm]
[ 919.634316] kasan_report+0x38/0x50
[ 919.637806] kasan_check_range+0xf5/0x1d0
[ 919.641819] kvm_make_vcpus_request_mask+0x174/0x440 [kvm]
[ 919.647349] kvm_make_scan_ioapic_request_mask+0x84/0xc0 [kvm]
[ 919.653217] ? kvm_arch_exit+0x110/0x110 [kvm]
[ 919.657694] ? sched_clock+0x5/0x10
[ 919.661196] ioapic_write_indirect+0x59f/0x9e0 [kvm]
[ 919.666196] ? static_obj+0xc0/0xc0
[ 919.669692] ? __lock_acquired+0x1d2/0x8c0
[ 919.673790] ? kvm_ioapic_eoi_inject_work+0x120/0x120 [kvm]
[ 919.679422] ? __lock_contended+0x910/0x910
[ 919.683608] ? do_raw_spin_trylock+0xb5/0x180
[ 919.687974] ? ioapic_mmio_write+0xe9/0x1e0 [kvm]
[ 919.692712] ioapic_mmio_write+0xff/0x1e0 [kvm]
[ 919.697280] __kvm_io_bus_write+0x1d1/0x450 [kvm]
[ 919.702018] ? check_prev_add+0x20f0/0x20f0
[ 919.706207] kvm_io_bus_write+0x105/0x1f0 [kvm]
[ 919.710772] ? kvm_stat_data_get+0x380/0x380 [kvm]
[ 919.715600] ? __lock_acquire+0xb69/0x18e0
[ 919.719705] write_mmio+0x13b/0x3a0 [kvm]
[ 919.723761] emulator_read_write_onepage+0x168/0x470 [kvm]
[ 919.729275] ? vcpu_mmio_gva_to_gpa+0x5e0/0x5e0 [kvm]
[ 919.734364] ? decode_imm+0x7d0/0x7d0 [kvm]
[ 919.738585] emulator_read_write+0x157/0x550 [kvm]
[ 919.743409] ? decode_operand+0xb68/0x2cf0 [kvm]
[ 919.748059] segmented_write.isra.0+0xc9/0x110 [kvm]
[ 919.753055] ? segmented_read.isra.0+0x330/0x330 [kvm]
[ 919.758228] writeback+0x6a7/0x8b0 [kvm]
[ 919.762181] ? emulator_task_switch+0x2b0/0x2b0 [kvm]
[ 919.767261] ? em_loop+0x530/0x530 [kvm]
[ 919.771222] ? mmio_info_in_cache+0x32c/0x410 [kvm]
[ 919.776139] x86_emulate_insn+0x1a0c/0x3cf0 [kvm]
[ 919.780876] ? kvm_mmu_reset_context+0x20/0x20 [kvm]
[ 919.785874] ? rcu_read_unlock+0x40/0x40
[ 919.789832] x86_emulate_instruction+0x5e5/0x1180 [kvm]
[ 919.795096] vcpu_enter_guest+0x1ae5/0x39c0 [kvm]
[ 919.799829] ? lock_acquire+0x1ca/0x490
[ 919.803671] ? kvm_vcpu_reload_apic_access_page+0x60/0x60 [kvm]
[ 919.809631] ? rcu_read_unlock+0x40/0x40
[ 919.813557] ? mark_lock_irq+0x1d00/0x1d00
[ 919.817656] ? kvm_vcpu_ioctl+0x153/0xac0 [kvm]
[ 919.822227] ? kvm_get_linear_rip+0x12c/0x260 [kvm]
[ 919.827142] ? vcpu_run+0x144/0x7f0 [kvm]
[ 919.831185] vcpu_run+0x144/0x7f0 [kvm]
[ 919.835057] kvm_arch_vcpu_ioctl_run+0x23b/0xd10 [kvm]
[ 919.840226] kvm_vcpu_ioctl+0x384/0xac0 [kvm]
[ 919.844616] ? __lock_release+0x494/0xa40
[ 919.848632] ? install_new_memslots+0x270/0x270 [kvm]
[ 919.853718] ? generic_block_fiemap+0x60/0x60
[ 919.858085] ? insert_inode_locked+0x1de/0x4f0
[ 919.862532] ? selinux_inode_getsecctx+0x80/0x80
[ 919.867162] ? __fget_files+0x1bf/0x2d0
[ 919.871007] __x64_sys_ioctl+0x127/0x190
[ 919.874935] do_syscall_64+0x33/0x40
[ 919.878522] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 919.883583] RIP: 0033:0x7fbf581dd21b
[ 919.887164] Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 25 bc 0c 00 f7 d8 64 89 01 48
[ 919.905906] RSP: 002b:00007fbf54a04588 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 919.913475] RAX: ffffffffffffffda RBX: 0000560e95142520 RCX: 00007fbf581dd21b
[ 919.920605] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000019
[ 919.927738] RBP: 00007fbf58b22000 R08: 0000560e94830b90 R09: 00000000000000ff
[ 919.934871] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000001
[ 919.942006] R13: 0000000000000001 R14: 00000000000003f9 R15: 0000000000000000
[ 919.949150]
[ 919.950646]
[ 919.952144] addr ffffc9001364f638 is located in stack of task qemu-kvm/4798 at offset 40 in frame:
[ 919.961097] ioapic_write_indirect+0x0/0x9e0 [kvm]
[ 919.965924]
[ 919.967424] this frame has 2 objects:
[ 919.971091] [32, 40) 'vcpu_bitmap'
[ 919.971094] [64, 88) 'irq'
[ 919.974582]
[ 919.978872] Memory state around the buggy address:
[ 919.983665] ffffc9001364f500: 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 f3
[ 919.990884] ffffc9001364f580: f3 f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 919.998104] >ffffc9001364f600: 00 00 f1 f1 f1 f1 00 f2 f2 f2 00 00 00 f3 f3 f3
[ 920.005323] ^
[ 920.010376] ffffc9001364f680: f3 f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 920.017604] ffffc9001364f700: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 920.024848] ==================================================================
[ 920.032068] Disabling lock debugging due to kernel taint

Assigned to Amnon for initial triage per bz process and age of bug created or assigned to virt-maint without triage.

I tried this with kernel-5.14.0-0.rc2.23.el9.x86_64 on dell-per6525-01.dell2.lab.eng.bos.redhat.com which (I think) is a Zen3.
I see no trace launching a guest. Is there a specific guest config that causes this ? Could you also try a more recent build ?

(In reply to Bandan Das from comment #4)
> I tried this with kernel-5.14.0-0.rc2.23.el9.x86_64 on
> dell-per6525-01.dell2.lab.eng.bos.redhat.com which (I think) is a Zen3.
> I see no trace launching a guest.

It occurs in the machine's use, I didn't meet the issue at first while doing my test.
But when I met it once, it is easy to reproduce.

> Is there a specific guest config that
> causes this ? Could you also try a more recent build ?

Ok, I will try it again using the recent build, and will update the result.

Best regards
Liu Nana

(In reply to liunana from comment #5)
> (In reply to Bandan Das from comment #4)
> > I tried this with kernel-5.14.0-0.rc2.23.el9.x86_64 on
> > dell-per6525-01.dell2.lab.eng.bos.redhat.com which (I think) is a Zen3.
> > I see no trace launching a guest.
>
> It occurs in the machine's use, I didn't meet the issue at first while doing
> my test.
> But when I met it once, it is easy to reproduce.
>
> > Is there a specific guest config that
> > causes this ? Could you also try a more recent build ?
>
> Ok, I will try it again using the recent build, and will update the result.
>
> Best regards
> Liu Nana

Liu: Just a guess, but how big are your VMs? What's the command line you're using for the qemu?

(In reply to Dr. David Alan Gilbert from comment #6)
> (In reply to liunana from comment #5)
> > (In reply to Bandan Das from comment #4)
> > > I tried this with kernel-5.14.0-0.rc2.23.el9.x86_64 on
> > > dell-per6525-01.dell2.lab.eng.bos.redhat.com which (I think) is a Zen3.
> > > I see no trace launching a guest.
> >
> > It occurs in the machine's use, I didn't meet the issue at first while doing
> > my test.
> > But when I met it once, it is easy to reproduce.
> >
> > > Is there a specific guest config that
> > > causes this ? Could you also try a more recent build ?
> >
> > Ok, I will try it again using the recent build, and will update the result.
> >
> > Best regards
> > Liu Nana
>
> Liu: Just a guess, but how big are your VMs? What's the command line you're
> using for the qemu?

About 6 VMs, including windows guests and RHEL guests. I installed them with avocado automatically.

And I still can reproduce this bug with the latest kernel at the first VM installation (RHEL.9.0) this time. Please check the command line follows 'QEMU command line [1]'. And the guest is installed successfully.
Test Environments:
amd-milan-07.khw1.lab.eng.bos.redhat.com
5.14.0-0.rc4.35.el9.x86_64+debug
qemu-kvm-6.0.0-10.el9.x86_64

QEMU command line [1]
/usr/libexec/qemu-kvm \
    -S \
    -name 'avocado-vt-vm1' \
    -sandbox on \
    -machine pc,memory-backend=mem-machine_mem \
    -nodefaults \
    -device VGA,bus=pci.0,addr=0x2 \
    -m 105472 \
    -object memory-backend-ram,size=105472M,id=mem-machine_mem \
    -smp 128,maxcpus=128,cores=64,threads=1,dies=1,sockets=2 \
    -cpu 'EPYC-Milan',+kvm_pv_unhalt \
    -chardev socket,path=/tmp/avocado_w2tu9r_3/monitor-qmpmonitor1-20210803-225107-89L3F5Lz,wait=off,server=on,id=qmp_id_qmpmonitor1 \
    -mon chardev=qmp_id_qmpmonitor1,mode=control \
    -chardev socket,path=/tmp/avocado_w2tu9r_3/monitor-catch_monitor-20210803-225107-89L3F5Lz,wait=off,server=on,id=qmp_id_catch_monitor \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=id6fT7W7 \
    -chardev socket,path=/tmp/avocado_w2tu9r_3/serial-serial0-20210803-225107-89L3F5Lz,wait=off,server=on,id=chardev_serial0 \
    -device isa-serial,id=serial0,chardev=chardev_serial0 \
    -chardev socket,id=seabioslog_id_20210803-225107-89L3F5Lz,path=/tmp/avocado_w2tu9r_3/seabios-20210803-225107-89L3F5Lz,server=on,wait=off \
    -device isa-debugcon,chardev=seabioslog_id_20210803-225107-89L3F5Lz,iobase=0x402 \
    -device ich9-usb-ehci1,id=usb1,addr=0x1d.0x7,multifunction=on,bus=pci.0 \
    -device ich9-usb-uhci1,id=usb1.0,multifunction=on,masterbus=usb1.0,addr=0x1d.0x0,firstport=0,bus=pci.0 \
    -device ich9-usb-uhci2,id=usb1.1,multifunction=on,masterbus=usb1.0,addr=0x1d.0x2,firstport=2,bus=pci.0 \
    -device ich9-usb-uhci3,id=usb1.2,multifunction=on,masterbus=usb1.0,addr=0x1d.0x4,firstport=4,bus=pci.0 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x3 \
    -blockdev node-name=file_image1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/kvm_autotest_root/images/rhel900-64-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_image1 \
    -device scsi-hd,id=image1,drive=drive_image1,write-cache=on \
    -device virtio-net-pci,mac=9a:3a:9a:e5:09:d4,id=idEsbHMX,netdev=idznZlxd,bus=pci.0,addr=0x4 \
    -netdev tap,id=idznZlxd,vhost=on,vhostfd=18,fd=15 \
    -blockdev node-name=file_cd1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/kvm_autotest_root/iso/linux/RHEL-9.0.0-20210707.2-x86_64-dvd1.iso,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_cd1,driver=raw,read-only=on,cache.direct=on,cache.no-flush=off,file=file_cd1 \
    -device scsi-cd,id=cd1,drive=drive_cd1,write-cache=on \
    -blockdev node-name=file_unattended,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/kvm_autotest_root/images/rhel900-64/ks.iso,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_unattended,driver=raw,read-only=on,cache.direct=on,cache.no-flush=off,file=file_unattended \
    -device scsi-cd,id=unattended,drive=drive_unattended,write-cache=on \
    -kernel '/home/kvm_autotest_root/images/rhel900-64/vmlinuz' \
    -append 'inst.sshd ksdevice=link inst.repo=cdrom inst.ks=cdrom:/ks.cfg nicdelay=60 biosdevname=0 net.ifnames=0 console=ttyS0,115200 console=tty0' \
    -initrd '/home/kvm_autotest_root/images/rhel900-64/initrd.img' \
    -vnc :0 \
    -rtc base=utc,clock=host,driftfix=slew \
    -boot menu=off,order=cdn,once=d,strict=off \
    -no-shutdown \
    -enable-kvm

error log:

[ 671.917354] ==================================================================
[ 671.924778] BUG: KASAN: stack-out-of-bounds in kvm_make_vcpus_request_mask+0x174/0x440 [kvm]
[ 671.933976] Read of size 8 at addr ffffc90010fe75e0 by task qemu-kvm/4844
[ 671.940759]
[ 671.942262] CPU: 58 PID: 4844 Comm: qemu-kvm Not tainted 5.14.0-0.rc4.35.el9.x86_64+debug #1
[ 671.950700] Hardware name: AMD Corporation DAYTONA_X/DAYTONA_X, BIOS RYM0092C 11/03/2020
[ 671.958787] Call Trace:
[ 671.961241] dump_stack_lvl+0x57/0x7d
[ 671.965922] print_address_description.constprop.0+0x1f/0x140
[ 671.971674] ? kvm_make_vcpus_request_mask+0x174/0x440 [kvm]
[ 671.977369] __kasan_report.cold+0x7f/0x11e
[ 671.981559] ? kvm_make_vcpus_request_mask+0x174/0x440 [kvm]
[ 671.987251] kasan_report+0x38/0x50
[ 671.990741] kasan_check_range+0xf5/0x1d0
[ 671.994754] kvm_make_vcpus_request_mask+0x174/0x440 [kvm]
[ 672.001289] kvm_make_scan_ioapic_request_mask+0x84/0xc0 [kvm]
[ 672.007165] ? inject_pending_event+0x1080/0x1080 [kvm]
[ 672.012421] ioapic_write_indirect+0x59f/0x9e0 [kvm]
[ 672.017414] ? static_obj+0x40/0xc0
[ 672.020911] ? __lock_acquired+0x1d2/0x8c0
[ 672.025009] ? kvm_ioapic_eoi_inject_work+0x120/0x120 [kvm]
[ 672.031612] ? __lock_contended+0x910/0x910
[ 672.035798] ? do_raw_spin_trylock+0xb5/0x180
[ 672.040163] ? ioapic_mmio_write+0xe9/0x1e0 [kvm]
[ 672.044902] ioapic_mmio_write+0xff/0x1e0 [kvm]
[ 672.049468] __kvm_io_bus_write+0x1d1/0x450 [kvm]
[ 672.054203] kvm_io_bus_write+0xfe/0x1d0 [kvm]
[ 672.058677] ? check_prev_add+0x20f0/0x20f0
[ 672.063549] ? __bpf_trace_kvm_test_age_hva+0xb0/0xb0 [kvm]
[ 672.069166] write_mmio+0x13b/0x3a0 [kvm]
[ 672.073218] emulator_read_write_onepage+0x167/0x4b0 [kvm]
[ 672.078736] ? vcpu_mmio_gva_to_gpa+0x5b0/0x5b0 [kvm]
[ 672.083811] ? decode_register+0xf1/0x400 [kvm]
[ 672.088369] ? fetch_possible_mmx_operand.part.0+0x120/0x120 [kvm]
[ 672.095501] emulator_read_write+0x157/0x550 [kvm]
[ 672.100331] ? decode_operand+0x9a9/0x2920 [kvm]
[ 672.104996] segmented_write.isra.0+0xc9/0x110 [kvm]
[ 672.109993] ? segmented_read.isra.0+0x380/0x380 [kvm]
[ 672.115165] writeback+0x6a5/0x8c0 [kvm]
[ 672.119119] ? emulator_task_switch+0x2b0/0x2b0 [kvm]
[ 672.124196] ? em_rdmsr+0x420/0x420 [kvm]
[ 672.129147] x86_emulate_insn+0x1a0c/0x3cf0 [kvm]
[ 672.133888] ? ept_invlpg+0xc0/0xc0 [kvm]
[ 672.137932] ? rcu_read_unlock+0x40/0x40
[ 672.141864] x86_emulate_instruction+0x5e5/0x1190 [kvm]
[ 672.147129] vcpu_enter_guest+0x1af3/0x3ac0 [kvm]
[ 672.151861] ? lock_acquire+0x1ca/0x570
[ 672.155702] ? kvm_vcpu_reload_apic_access_page+0x50/0x50 [kvm]
[ 672.162530] ? rcu_read_unlock+0x40/0x40
[ 672.166453] ? mark_lock_irq+0xda0/0xda0
[ 672.170371] ? __mutex_lock+0xb77/0x1170
[ 672.174298] ? mark_lock+0xd3/0xae0
[ 672.177794] ? kvm_get_linear_rip+0x12c/0x260 [kvm]
[ 672.182710] ? vcpu_run+0x144/0x7f0 [kvm]
[ 672.186751] vcpu_run+0x144/0x7f0 [kvm]
[ 672.191567] kvm_arch_vcpu_ioctl_run+0x23d/0xf40 [kvm]
[ 672.196854] kvm_vcpu_ioctl+0x42c/0xb20 [kvm]
[ 672.201243] ? __bpf_trace_kvm_age_hva+0xe0/0xe0 [kvm]
[ 672.206410] ? __lock_release+0x494/0xa40
[ 672.210424] ? lock_downgrade+0x110/0x110
[ 672.214432] ? __lock_contended+0x4de/0x910
[ 672.218620] ? selinux_inode_getsecctx+0x80/0x80
[ 672.224286] ? lock_acquire+0x80/0x570
[ 672.228044] ? __fget_files+0x189/0x2f0
[ 672.231887] ? security_file_ioctl+0x50/0x90
[ 672.236164] __x64_sys_ioctl+0x127/0x190
[ 672.240090] do_syscall_64+0x3b/0x90
[ 672.243666] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 672.248719] RIP: 0033:0x7f648c3253eb
[ 672.252297] Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 0d 2a 0f 00 f7 d8 64 89 01 48
[ 672.271989] RSP: 002b:00007f64889f3548 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 672.279555] RAX: ffffffffffffffda RBX: 00005613af6071e0 RCX: 00007f648c3253eb
[ 672.286688] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000019
[ 672.294719] RBP: 00007f648cc06000 R08: 00005613acedd210 R09: 00000000000000ff
[ 672.301852] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000001
[ 672.308984] R13: 0000000000000001 R14: 00000000000003f9 R15: 0000000000000000
[ 672.316124]
[ 672.317616]
[ 672.319109] addr ffffc90010fe75e0 is located in stack of task qemu-kvm/4844 at offset 40 in frame:
[ 672.328781] ioapic_write_indirect+0x0/0x9e0 [kvm]
[ 672.333609]
[ 672.335106] this frame has 2 objects:
[ 672.338774] [32, 40) 'vcpu_bitmap'
[ 672.338776] [64, 88) 'irq'
[ 672.342265]
[ 672.346547] Memory state around the buggy address:
[ 672.351341] ffffc90010fe7480: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f1
[ 672.359382] ffffc90010fe7500: f1 f1 f1 00 f3 f3 f3 00 00 00 00 00 00 00 00 00
[ 672.366600] >ffffc90010fe7580: 00 00 00 00 00 00 00 f1 f1 f1 f1 00 f2 f2 f2 00
[ 672.373821] ^
[ 672.380171] ffffc90010fe7600: 00 00 f3 f3 f3 f3 f3 00 00 00 00 00 00 00 00 00
[ 672.388164] ffffc90010fe7680: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 672.395381] ==================================================================
[ 672.402595] Disabling lock debugging due to kernel taint
[ 673.811651] hrtimer: interrupt took 722589 ns

Host kernel line info:
# cat /proc/cmdline
BOOT_IMAGE=(hd0,gpt2)/vmlinuz-5.14.0-0.rc4.35.el9.x86_64+debug root=/dev/mapper/rhel_amd--milan--07-root ro resume=/dev/mapper/rhel_amd--milan--07-swap rd.lvm.lv=rhel_amd-milan-07/root rd.lvm.lv=rhel_amd-milan-07/swap console=ttyS0,115200n81 crashkernel=auto

Hi, would you please help to check this? Thanks.

Best regards
Liu Nana

(In reply to liunana from comment #7)
> (In reply to Dr. David Alan Gilbert from comment #6)
> > (In reply to liunana from comment #5)
> > > (In reply to Bandan Das from comment #4)
> > > > I tried this with kernel-5.14.0-0.rc2.23.el9.x86_64 on
> > > > dell-per6525-01.dell2.lab.eng.bos.redhat.com which (I think) is a Zen3.
> > > > I see no trace launching a guest.
> > >
> > > It occurs in the machine's use, I didn't meet the issue at first while doing
> > > my test.
> > > But when I met it once, it is easy to reproduce.
> > >
> > > > Is there a specific guest config that
> > > > causes this ? Could you also try a more recent build ?
> > >
> > > Ok, I will try it again using the recent build, and will update the result.
> > >
> > > Best regards
> > > Liu Nana
> >
> > Liu: Just a guess, but how big are your VMs? What's the command line you're
> > using for the qemu?
>
> About 6 VMs, including windows guests and RHEL guests. I installed them with
> avocado automatically.

Thanks for confirming that it's still there. I need to reproduce this on my setup. Can you either
- give me instructions on how you set this up using avocado ? Or
- Give me setup instructions using qemu ? I know you posted the qemu command line but is that enough to reproduce ?

Do I have to run 6 guests to reproduce this ? Should the guests be idle or should I run something for the trace to occur ?

Dave, you asked about large guests. Are you aware of any known issue with Zen3 with the address sanitizer ?

(In reply to Bandan Das from comment #8)
> Dave, you asked about large guests. Are you aware of any known issue with
> Zen3 with the address
> sanitizer ?

No I'm not; but since the backtrace showed 'kvm_make_vcpus_request_mask' I wondered if that mask was dependent on number of vcpus and something was going wrong in the size calculations or numbering there.

Dave
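To make the size-calculation hypothesis concrete, here is a minimal sketch of the arithmetic. It is not the kernel code. It assumes the bitmap is scanned one 64-bit word at a time, starting at the reported 'vcpu_bitmap' object, for as many words as the vCPU count requires. The helper names bits_to_longs and words_outside_object are made up for this illustration; the numbers ([32, 40), offset 40, -smp 128) are taken from the KASAN report and the QEMU command line in this bug.

```python
BITS_PER_LONG = 64  # assumption: x86_64 host, as in this report


def bits_to_longs(nbits):
    """Number of 64-bit words needed to hold nbits bits (ceiling division)."""
    return (nbits + BITS_PER_LONG - 1) // BITS_PER_LONG


def words_outside_object(nr_vcpus, obj_start, obj_end):
    """Frame offsets of bitmap words that do not fit inside [obj_start, obj_end).

    Hypothetical model: one bit per vCPU, read in 8-byte words starting at
    obj_start. This only illustrates the arithmetic, not what KVM does.
    """
    word_bytes = BITS_PER_LONG // 8
    offsets = [obj_start + i * word_bytes for i in range(bits_to_longs(nr_vcpus))]
    return [off for off in offsets if off + word_bytes > obj_end]


if __name__ == "__main__":
    # 'vcpu_bitmap' occupies frame offsets [32, 40); the guest uses -smp 128.
    print(bits_to_longs(128))
    print(words_outside_object(128, 32, 40))
    print(words_outside_object(64, 32, 40))
```

Under these assumptions, a 128-vCPU guest needs a second word at frame offset 40, which is where KASAN flagged the 8-byte read, while a guest with 64 or fewer vCPUs would stay inside the object. This only shows that the report is consistent with an undersized bitmap; it does not by itself confirm the root cause.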
> > > > > > Test Environments: > > amd-milan-07.khw1.lab.eng.bos.redhat.com > > 5.14.0-0.rc4.35.el9.x86_64+debug > > qemu-kvm-6.0.0-10.el9.x86_64 > > > > > > QEMU command line [1] > > /usr/libexec/qemu-kvm \ > > -S \ > > -name 'avocado-vt-vm1' \ > > -sandbox on \ > > -machine pc,memory-backend=mem-machine_mem \ > > -nodefaults \ > > -device VGA,bus=pci.0,addr=0x2 \ > > -m 105472 \ > > -object memory-backend-ram,size=105472M,id=mem-machine_mem \ > > -smp 128,maxcpus=128,cores=64,threads=1,dies=1,sockets=2 \ > > -cpu 'EPYC-Milan',+kvm_pv_unhalt \ > > -chardev > > socket,path=/tmp/avocado_w2tu9r_3/monitor-qmpmonitor1-20210803-225107- > > 89L3F5Lz,wait=off,server=on,id=qmp_id_qmpmonitor1 \ > > -mon chardev=qmp_id_qmpmonitor1,mode=control \ > > -chardev > > socket,path=/tmp/avocado_w2tu9r_3/monitor-catch_monitor-20210803-225107- > > 89L3F5Lz,wait=off,server=on,id=qmp_id_catch_monitor \ > > -mon chardev=qmp_id_catch_monitor,mode=control \ > > -device pvpanic,ioport=0x505,id=id6fT7W7 \ > > -chardev > > socket,path=/tmp/avocado_w2tu9r_3/serial-serial0-20210803-225107-89L3F5Lz, > > wait=off,server=on,id=chardev_serial0 \ > > -device isa-serial,id=serial0,chardev=chardev_serial0 \ > > -chardev > > socket,id=seabioslog_id_20210803-225107-89L3F5Lz,path=/tmp/avocado_w2tu9r_3/ > > seabios-20210803-225107-89L3F5Lz,server=on,wait=off \ > > -device > > isa-debugcon,chardev=seabioslog_id_20210803-225107-89L3F5Lz,iobase=0x402 \ > > -device ich9-usb-ehci1,id=usb1,addr=0x1d.0x7,multifunction=on,bus=pci.0 \ > > -device > > ich9-usb-uhci1,id=usb1.0,multifunction=on,masterbus=usb1.0,addr=0x1d.0x0, > > firstport=0,bus=pci.0 \ > > -device > > ich9-usb-uhci2,id=usb1.1,multifunction=on,masterbus=usb1.0,addr=0x1d.0x2, > > firstport=2,bus=pci.0 \ > > -device > > ich9-usb-uhci3,id=usb1.2,multifunction=on,masterbus=usb1.0,addr=0x1d.0x4, > > firstport=4,bus=pci.0 \ > > -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \ > > -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x3 \ > > 
-blockdev > > node-name=file_image1,driver=file,auto-read-only=on,discard=unmap, > > aio=threads,filename=/home/kvm_autotest_root/images/rhel900-64-virtio-scsi. > > qcow2,cache.direct=on,cache.no-flush=off \ > > -blockdev > > node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=on,cache.no- > > flush=off,file=file_image1 \ > > -device scsi-hd,id=image1,drive=drive_image1,write-cache=on \ > > -device > > virtio-net-pci,mac=9a:3a:9a:e5:09:d4,id=idEsbHMX,netdev=idznZlxd,bus=pci.0, > > addr=0x4 \ > > -netdev tap,id=idznZlxd,vhost=on,vhostfd=18,fd=15 \ > > -blockdev > > node-name=file_cd1,driver=file,auto-read-only=on,discard=unmap,aio=threads, > > filename=/home/kvm_autotest_root/iso/linux/RHEL-9.0.0-20210707.2-x86_64-dvd1. > > iso,cache.direct=on,cache.no-flush=off \ > > -blockdev > > node-name=drive_cd1,driver=raw,read-only=on,cache.direct=on,cache.no- > > flush=off,file=file_cd1 \ > > -device scsi-cd,id=cd1,drive=drive_cd1,write-cache=on \ > > -blockdev > > node-name=file_unattended,driver=file,auto-read-only=on,discard=unmap, > > aio=threads,filename=/home/kvm_autotest_root/images/rhel900-64/ks.iso,cache. 
> > direct=on,cache.no-flush=off \ > > -blockdev > > node-name=drive_unattended,driver=raw,read-only=on,cache.direct=on,cache.no- > > flush=off,file=file_unattended \ > > -device scsi-cd,id=unattended,drive=drive_unattended,write-cache=on \ > > -kernel '/home/kvm_autotest_root/images/rhel900-64/vmlinuz' \ > > -append 'inst.sshd ksdevice=link inst.repo=cdrom inst.ks=cdrom:/ks.cfg > > nicdelay=60 biosdevname=0 net.ifnames=0 console=ttyS0,115200 console=tty0' \ > > -initrd '/home/kvm_autotest_root/images/rhel900-64/initrd.img' \ > > -vnc :0 \ > > -rtc base=utc,clock=host,driftfix=slew \ > > -boot menu=off,order=cdn,once=d,strict=off \ > > -no-shutdown \ > > -enable-kvm > > > > > > > > error log: > > > > [ 671.917354] > > ================================================================== > > [ 671.924778] BUG: KASAN: stack-out-of-bounds in > > kvm_make_vcpus_request_mask+0x174/0x440 [kvm] > > [ 671.933976] Read of size 8 at addr ffffc90010fe75e0 by task qemu-kvm/4844 > > [ 671.940759] > > [ 671.942262] CPU: 58 PID: 4844 Comm: qemu-kvm Not tainted > > 5.14.0-0.rc4.35.el9.x86_64+debug #1 > > [ 671.950700] Hardware name: AMD Corporation DAYTONA_X/DAYTONA_X, BIOS > > RYM0092C 11/03/2020 > > [ 671.958787] Call Trace: > > [ 671.961241] dump_stack_lvl+0x57/0x7d > > [ 671.965922] print_address_description.constprop.0+0x1f/0x140 > > [ 671.971674] ? kvm_make_vcpus_request_mask+0x174/0x440 [kvm] > > [ 671.977369] __kasan_report.cold+0x7f/0x11e > > [ 671.981559] ? kvm_make_vcpus_request_mask+0x174/0x440 [kvm] > > [ 671.987251] kasan_report+0x38/0x50 > > [ 671.990741] kasan_check_range+0xf5/0x1d0 > > [ 671.994754] kvm_make_vcpus_request_mask+0x174/0x440 [kvm] > > [ 672.001289] kvm_make_scan_ioapic_request_mask+0x84/0xc0 [kvm] > > [ 672.007165] ? inject_pending_event+0x1080/0x1080 [kvm] > > [ 672.012421] ioapic_write_indirect+0x59f/0x9e0 [kvm] > > [ 672.017414] ? static_obj+0x40/0xc0 > > [ 672.020911] ? __lock_acquired+0x1d2/0x8c0 > > [ 672.025009] ? 
kvm_ioapic_eoi_inject_work+0x120/0x120 [kvm] > > [ 672.031612] ? __lock_contended+0x910/0x910 > > [ 672.035798] ? do_raw_spin_trylock+0xb5/0x180 > > [ 672.040163] ? ioapic_mmio_write+0xe9/0x1e0 [kvm] > > [ 672.044902] ioapic_mmio_write+0xff/0x1e0 [kvm] > > [ 672.049468] __kvm_io_bus_write+0x1d1/0x450 [kvm] > > [ 672.054203] kvm_io_bus_write+0xfe/0x1d0 [kvm] > > [ 672.058677] ? check_prev_add+0x20f0/0x20f0 > > [ 672.063549] ? __bpf_trace_kvm_test_age_hva+0xb0/0xb0 [kvm] > > [ 672.069166] write_mmio+0x13b/0x3a0 [kvm] > > [ 672.073218] emulator_read_write_onepage+0x167/0x4b0 [kvm] > > [ 672.078736] ? vcpu_mmio_gva_to_gpa+0x5b0/0x5b0 [kvm] > > [ 672.083811] ? decode_register+0xf1/0x400 [kvm] > > [ 672.088369] ? fetch_possible_mmx_operand.part.0+0x120/0x120 [kvm] > > [ 672.095501] emulator_read_write+0x157/0x550 [kvm] > > [ 672.100331] ? decode_operand+0x9a9/0x2920 [kvm] > > [ 672.104996] segmented_write.isra.0+0xc9/0x110 [kvm] > > [ 672.109993] ? segmented_read.isra.0+0x380/0x380 [kvm] > > [ 672.115165] writeback+0x6a5/0x8c0 [kvm] > > [ 672.119119] ? emulator_task_switch+0x2b0/0x2b0 [kvm] > > [ 672.124196] ? em_rdmsr+0x420/0x420 [kvm] > > [ 672.129147] x86_emulate_insn+0x1a0c/0x3cf0 [kvm] > > [ 672.133888] ? ept_invlpg+0xc0/0xc0 [kvm] > > [ 672.137932] ? rcu_read_unlock+0x40/0x40 > > [ 672.141864] x86_emulate_instruction+0x5e5/0x1190 [kvm] > > [ 672.147129] vcpu_enter_guest+0x1af3/0x3ac0 [kvm] > > [ 672.151861] ? lock_acquire+0x1ca/0x570 > > [ 672.155702] ? kvm_vcpu_reload_apic_access_page+0x50/0x50 [kvm] > > [ 672.162530] ? rcu_read_unlock+0x40/0x40 > > [ 672.166453] ? mark_lock_irq+0xda0/0xda0 > > [ 672.170371] ? __mutex_lock+0xb77/0x1170 > > [ 672.174298] ? mark_lock+0xd3/0xae0 > > [ 672.177794] ? kvm_get_linear_rip+0x12c/0x260 [kvm] > > [ 672.182710] ? vcpu_run+0x144/0x7f0 [kvm] > > [ 672.186751] vcpu_run+0x144/0x7f0 [kvm] > > [ 672.191567] kvm_arch_vcpu_ioctl_run+0x23d/0xf40 [kvm] > > [ 672.196854] kvm_vcpu_ioctl+0x42c/0xb20 [kvm] > > [ 672.201243] ? 
__bpf_trace_kvm_age_hva+0xe0/0xe0 [kvm] > > [ 672.206410] ? __lock_release+0x494/0xa40 > > [ 672.210424] ? lock_downgrade+0x110/0x110 > > [ 672.214432] ? __lock_contended+0x4de/0x910 > > [ 672.218620] ? selinux_inode_getsecctx+0x80/0x80 > > [ 672.224286] ? lock_acquire+0x80/0x570 > > [ 672.228044] ? __fget_files+0x189/0x2f0 > > [ 672.231887] ? security_file_ioctl+0x50/0x90 > > [ 672.236164] __x64_sys_ioctl+0x127/0x190 > > [ 672.240090] do_syscall_64+0x3b/0x90 > > [ 672.243666] entry_SYSCALL_64_after_hwframe+0x44/0xae > > [ 672.248719] RIP: 0033:0x7f648c3253eb > > [ 672.252297] Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 > > e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> > > 3d 01 f0 ff ff 73 01 c3 48 8b 0d 0d 2a 0f 00 f7 d8 64 89 01 48 > > [ 672.271989] RSP: 002b:00007f64889f3548 EFLAGS: 00000246 ORIG_RAX: > > 0000000000000010 > > [ 672.279555] RAX: ffffffffffffffda RBX: 00005613af6071e0 RCX: > > 00007f648c3253eb > > [ 672.286688] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: > > 0000000000000019 > > [ 672.294719] RBP: 00007f648cc06000 R08: 00005613acedd210 R09: > > 00000000000000ff > > [ 672.301852] R10: 0000000000000001 R11: 0000000000000246 R12: > > 0000000000000001 > > [ 672.308984] R13: 0000000000000001 R14: 00000000000003f9 R15: > > 0000000000000000 > > [ 672.316124] > > [ 672.317616] > > [ 672.319109] addr ffffc90010fe75e0 is located in stack of task > > qemu-kvm/4844 at offset 40 in frame: > > [ 672.328781] ioapic_write_indirect+0x0/0x9e0 [kvm] > > [ 672.333609] > > [ 672.335106] this frame has 2 objects: > > [ 672.338774] [32, 40) 'vcpu_bitmap' > > [ 672.338776] [64, 88) 'irq' > > [ 672.342265] > > [ 672.346547] Memory state around the buggy address: > > [ 672.351341] ffffc90010fe7480: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 f1 > > [ 672.359382] ffffc90010fe7500: f1 f1 f1 00 f3 f3 f3 00 00 00 00 00 00 00 > > 00 00 > > [ 672.366600] >ffffc90010fe7580: 00 00 00 00 00 00 00 f1 f1 f1 f1 00 f2 f2 > > f2 
00 > > [ 672.373821] ^ > > [ 672.380171] ffffc90010fe7600: 00 00 f3 f3 f3 f3 f3 00 00 00 00 00 00 00 > > 00 00 > > [ 672.388164] ffffc90010fe7680: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 00 00 > > [ 672.395381] > > ================================================================== > > [ 672.402595] Disabling lock debugging due to kernel taint > > [ 673.811651] hrtimer: interrupt took 722589 ns > > > > > > > > Host kernel line info: > > # cat /proc/cmdline > > BOOT_IMAGE=(hd0,gpt2)/vmlinuz-5.14.0-0.rc4.35.el9.x86_64+debug > > root=/dev/mapper/rhel_amd--milan--07-root ro > > resume=/dev/mapper/rhel_amd--milan--07-swap rd.lvm.lv=rhel_amd-milan-07/root > > rd.lvm.lv=rhel_amd-milan-07/swap console=ttyS0,115200n81 crashkernel=auto > > > > > > > > Hi, would you please help to check this? Thanks. > > > > > > > > Best regards > > Liu Nana (In reply to Bandan Das from comment #8) > (In reply to liunana from comment #7) > > (In reply to Dr. David Alan Gilbert from comment #6) > > > (In reply to liunana from comment #5) > > > > (In reply to Bandan Das from comment #4) > > > > > I tried this with kernel-5.14.0-0.rc2.23.el9.x86_64 on > > > > > dell-per6525-01.dell2.lab.eng.bos.redhat.com which (I think) is a Zen3. > > > > > I see no trace launching a guest. > > > > > > > > It occurs in the machine's use, I didn't meet the issue at first while doing > > > > my test. > > > > But when I met it once, it is easy to reproduce. > > > > > > > > > > > > Is there a specific guest config that > > > > > causes this ? Could you also try a more recent build ? > > > > > > > > Ok, I will try it again using the recent build, and will update the result. > > > > > > > > > > > > > > > > > > > > Best regards > > > > Liu Nana > > > > > > Liu: Just a guess, but how big are your VMs? What's the command line you're > > > using for the qemu? > > > > > > About 6 VMs, including windows guests and RHEL guests. I installed them with > > avocado automatically. 
> Thanks for confirming that it's still there.
> I need to reproduce this on my setup.

Hi, sorry for the late reply, the Milan machine keeps being used.

> Can you either
> - give me instructions on how you set this up using avocado ?
> Or
> - Give me setup instructions using qemu ? I know you posted the qemu command
> line but is that
> enough to reproduce ?

Would you please try the debug kernel packages? It is easy to reproduce this bug, and it seems I can only reproduce it with the debug kernel.

Test Env:
# rpm -qa | grep kernel
kernel-tools-libs-5.14.0-0.rc4.35.el9.x86_64
kernel-core-5.14.0-0.rc4.35.el9.x86_64
kernel-modules-5.14.0-0.rc4.35.el9.x86_64
kernel-5.14.0-0.rc4.35.el9.x86_64
kernel-tools-5.14.0-0.rc4.35.el9.x86_64
kernel-headers-5.14.0-0.rc4.35.el9.x86_64
kernel-srpm-macros-1.0-7.el9.noarch
kernel-debug-core-5.14.0-0.rc4.35.el9.x86_64
kernel-debug-modules-5.14.0-0.rc4.35.el9.x86_64
kernel-debug-devel-5.14.0-0.rc4.35.el9.x86_64
kernel-debug-5.14.0-0.rc4.35.el9.x86_64

Reproduce steps:
1. Install the kernel-debug packages, then reboot and choose the kernel '5.14.0-0.rc4.35.el9.x86_64+debug'.
2. Boot a guest. You will see the call trace in the console soon, or you can check the dmesg log.

And I guess you can reproduce this bug with your own boot script.
This is a simple qemu command line:

/usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1' \
    -sandbox on \
    -machine pc \
    -nodefaults \
    -m 104448 \
    -smp 128,maxcpus=128,cores=64,threads=1,dies=1,sockets=2 \
    -cpu 'EPYC-Milan',+kvm_pv_unhalt \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x3 \
    -blockdev node-name=file_image1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/kvm_autotest_root/images/rhel850-64-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_image1 \
    -device scsi-hd,id=image1,drive=drive_image1,write-cache=on \
    -vnc :0 \
    -enable-kvm \
    -monitor stdio

> Do I have to run 6 guests to reproduce this ?

No need, I can reproduce this bug easily on the guest's first boot with the new kernel.

> Should the guests be idle or should I run
> something for the trace to occur ?

No need. I just boot a guest after rebooting the host.

Would you please check if this can reproduce the bug? Thanks.

Best regards
Liu Nana

Yep, I can reproduce that on another box (amd-epyc3-milan-7713-2s.tpb):

/usr/libexec/qemu-kvm -sandbox on -machine pc -nodefaults -m 200G -smp 128,maxcpus=128,cores=64,threads=1,dies=1,sockets=2 -cpu 'EPYC-Milan',+kvm_pv_unhalt -drive if=virtio,file=./rhel-guest-image-8.6-157.x86_64.qcow2 -nographic -enable-kvm

For me it's fine at smp 32 or 64, but 128 triggers it.

I'm pretty sure the problem here is that kvm_make_vcpus_request_mask is being called with vcpu_bitmap being a single long on the stack
of ioapic_write_indirect, and kvm_make_vcpus_request_mask has a:
	kvm_for_each_vcpu(i, vcpu, kvm) {
		if ((vcpu_bitmap && !test_bit(i, vcpu_bitmap)) ||
		    vcpu == except)
			continue;
which dutifully tests all 128 bits of the 64-bit vcpu_bitmap;
I can see the 'i' index go over 100 in pr_info.
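
To make the indexing problem described above concrete, here is a minimal user-space sketch (not kernel code). It assumes the usual bitmap layout where bit nr lives in word nr / BITS_PER_LONG; the helpers bit_word(), bit_in_bounds() and test_bit_checked() are hypothetical names used only for illustration. With a 64-bit long, any vCPU index of 64 or more maps to word 1 or later, which lies outside a bitmap that is a single unsigned long.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Bits per unsigned long, mirroring the kernel constant of the same name. */
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Index of the unsigned long word that holds bit nr. */
static size_t bit_word(size_t nr)
{
	return nr / BITS_PER_LONG;
}

/* True if reading bit nr stays inside a bitmap made of nlongs words. */
static bool bit_in_bounds(size_t nr, size_t nlongs)
{
	return bit_word(nr) < nlongs;
}

/*
 * Hypothetical bounds-checked variant of test_bit(): unlike the kernel
 * helper, it takes the bitmap length and reports out-of-range bits as
 * clear instead of reading past the end of the array.
 */
static bool test_bit_checked(size_t nr, const unsigned long *map,
			     size_t nlongs)
{
	if (!bit_in_bounds(nr, nlongs))
		return false;
	return (map[bit_word(nr)] >> (nr % BITS_PER_LONG)) & 1UL;
}
```

For a one-word bitmap, bit_in_bounds(127, 1) is false, which corresponds to the stack read that KASAN flags once the loop index passes the first word.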
While it seems that ioapic_write_indirect() can't set bits above 64, it is indeed
illegal to call kvm_make_vcpus_request_mask() with a truncated vcpu mask, as we don't
pass a 'length' parameter (so KVM_MAX_VCPUS bits is assumed). The following
(compile and smoke-tested only) should help, I believe:
diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
index ff005fe738a4..58829358224c 100644
--- a/arch/x86/kvm/ioapic.c
+++ b/arch/x86/kvm/ioapic.c
@@ -319,7 +319,7 @@ static void ioapic_write_indirect(struct kvm_ioapic *ioapic, u32 val)
unsigned index;
bool mask_before, mask_after;
union kvm_ioapic_redirect_entry *e;
- unsigned long vcpu_bitmap;
+ unsigned long vcpu_bitmap[BITS_TO_LONGS(KVM_MAX_VCPUS)];
int old_remote_irr, old_delivery_status, old_dest_id, old_dest_mode;
switch (ioapic->ioregsel) {
@@ -384,9 +384,9 @@ static void ioapic_write_indirect(struct kvm_ioapic *ioapic, u32 val)
irq.shorthand = APIC_DEST_NOSHORT;
irq.dest_id = e->fields.dest_id;
irq.msi_redir_hint = false;
- bitmap_zero(&vcpu_bitmap, 16);
+ bitmap_zero(vcpu_bitmap, 16);
kvm_bitmap_or_dest_vcpus(ioapic->kvm, &irq,
- &vcpu_bitmap);
+ vcpu_bitmap);
if (old_dest_mode != e->fields.dest_mode ||
old_dest_id != e->fields.dest_id) {
/*
@@ -399,10 +399,10 @@ static void ioapic_write_indirect(struct kvm_ioapic *ioapic, u32 val)
kvm_lapic_irq_dest_mode(
!!e->fields.dest_mode);
kvm_bitmap_or_dest_vcpus(ioapic->kvm, &irq,
- &vcpu_bitmap);
+ vcpu_bitmap);
}
kvm_make_scan_ioapic_request_mask(ioapic->kvm,
- &vcpu_bitmap);
+ vcpu_bitmap);
} else {
kvm_make_scan_ioapic_request(ioapic->kvm);
}
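
As a rough check of how much stack the enlarged array in the patch above takes, the following user-space sketch mirrors the rounding done by BITS_TO_LONGS(). The macro definitions here are simplified stand-ins for the kernel ones, and bitmap_bytes() is a hypothetical helper, not a kernel function. The limits of 288 and 2048 vCPUs are the values mentioned in this thread.

```c
#include <limits.h>
#include <stddef.h>

/* Simplified user-space stand-ins for the kernel macros of the same name. */
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Bytes occupied by an unsigned long array able to hold nbits bits. */
static size_t bitmap_bytes(size_t nbits)
{
	return BITS_TO_LONGS(nbits) * sizeof(unsigned long);
}
```

With a 64-bit long, bitmap_bytes(288) is 40 (five words) and bitmap_bytes(2048) is 256, compared with 8 bytes for the original single unsigned long.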
Hmm, that'll get quite big these days I guess. Is there a reason that bitmap_zero doesn't have to cover the whole of the bitmap?

(In reply to Dr. David Alan Gilbert from comment #14)
> Hmm that'll get quite big these days I guess.

We need 1 bit per vCPU and KVM_MAX_VCPUS is 288 upstream and 2048 downstream, so we will need 256 bytes max. We can live with that I guess.

> Is there a reason that bitmap_zero doesn't have to cover the whole of the
> bitmap?

Nitesh (Cc:) should know exactly:

commit 7ee30bc132c683d06a6d9e360e39e483e3990708
Author: Nitesh Narayan Lal <nitesh>
Date: Thu Nov 7 07:53:43 2019 -0500

    KVM: x86: deliver KVM IOAPIC scan request to target vCPUs

commit 9a2ae9f6b6bbd3ef05d5e5977ace854e9b8f04b5
Author: Nitesh Narayan Lal <nitesh>
Date: Wed Nov 20 07:12:24 2019 -0500

    KVM: x86: Zero the IOAPIC scan request dest vCPUs bitmap

(In reply to Vitaly Kuznetsov from comment #15)
> (In reply to Dr. David Alan Gilbert from comment #14)
> > Hmm that'll get quite big these days I guess.
>
> We need 1 bit per vCPU and KVM_MAX_VCPUS is 288 upstream and 2048 downstream
> so we will need 256 bytes max. We can live with that I guess.
>
> > Is there a reason that bitmap_zero doesn't have to cover the whole of the
> > bitmap?
>
> Nitesh (Cc:) should know exactly:

Thanks, Vitaly, for adding me. I agree, it looks like the code should have been using KVM_MAX_VCPUS in the first place. I will dig further into the code to see if I can share any other useful information.

Thanks, Nitesh! I suspect kvm_bitmap_or_dest_vcpus() has the same issue, as it passes an unsigned long bitmap to kvm_apic_map_get_dest_lapic. I think we should enlarge it too. (I didn't spend much time on the code yet though, may be wrong)

I reproduced it on an older host; as expected it also triggers, so it is not Milan specific (not even sure it's AMD specific).

(In reply to Vitaly Kuznetsov from comment #18)
> Thanks, Nitesh! I suspect kvm_bitmap_or_dest_vcpus() has the same issue as
> it passes
> unsigned long bitmap to kvm_apic_map_get_dest_lapic. I think we should
> enlarge it too.
> (I didn't spend much time on the code yet though, may be wrong)

Yes, that should also be fixed along with its first for_each_set_bit loop.

I think the same change should also be made in kvm_irq_delivery_to_apic_fast and kvm_intr_is_single_vcpu_fast, shouldn't it?

Thanks!

(In reply to Nitesh Narayan Lal from comment #20)
>
> I think the same change should also be made in kvm_irq_delivery_to_apic_fast
> and kvm_intr_is_single_vcpu_fast, isn't it?

kvm_apic_map_get_logical_dest() doesn't seem to set more than 16 bits in 'bitmap' (and its interface says "u16 *bitmap"); the same goes for kvm_apic_map_get_dest_lapic(), so kvm_irq_delivery_to_apic_fast() and kvm_intr_is_single_vcpu_fast() should be safe I believe. I may have missed something of course...

I've sent https://lore.kernel.org/kvm/20210820124354.582222-2-vkuznets@redhat.com/ upstream.

Hi Vitaly,

I can't reproduce this bug without the debug kernel, so it seems I need to pre-verify this with the debug kernel. But I didn't see the debug kernel packages at the MR link above (Comment 24 & Comment 25); could you help check this?

Thanks.

Best regards
Liu Nana

Test Env:
kernel-5.14.0-14.el9.x86_64+debug
qemu-kvm-6.1.0-6.el9.x86_64
amd-milan-04.khw1.lab.eng.bos.redhat.com

Test scenarios:
1. Installation of RHEL9/Win11/Win2022 guests: PASS
2. Sanity test of cpu model: PASS

Existing bugs:
Bug 1959421 - [RHEL9]Host hung with log "BUG: soft lockup - CPU#29 stuck for 22s! [systemd:1]" on Milan machine

Hi Vitaly,

Would you please help to take a look at the existing bug?
This is the latest comment, which was reproduced while testing the current bug: https://bugzilla.redhat.com/show_bug.cgi?id=1959421#c14

If that has nothing to do with this bug, I think we can move this bug to VERIFIED. Thanks!

Best regards
Liu Nana

There is another existing issue:
Bug 2024063 - [RHEL.9.0.0] Host outputs Call Trace messages while booting windows guests on Milan host.

Besides, there is a new KASAN error now. I booted a Windows 2019 guest with huge memory and then rebooted it. The VM then got stuck and the host reported a Call Trace. But I only met this issue once.

After sending a quit command to the VM, qemu reports this error log:
(qemu) q
qemu:cpus_kick_thread: Invalid argument[60276.403712] switch: port 2(tap0) entered disabled state

This seems to be a new issue, so I created a bug to track it:
Bug 2024058 - [RHEL9] KASAN: null-ptr-deref in range [0x0000000000000130-0x0000000000000137]

Could you please also help to check whether the above two bugs are related to the current bug? If not, we can move this bug to VERIFIED. Thanks!

Best
Liu Nana

This one is always kvm_make_vcpus_request_mask, whereas 2024058 is a scarier one in gup_pte_range.

Both 2024058 and 2024063 are complaining a lot about 'cacheline tracking' - first time I've seen that.

(In reply to liunana from comment #36)
>
> Would you please help to take a look of the existed bug?
> This is the latest comment Which is reproduced while testing the current
> bug: https://bugzilla.redhat.com/show_bug.cgi?id=1959421#c14

(In reply to liunana from comment #37)
>
> Seems this is a new issue, I create one bug to track this: Bug 2024058 -
> [RHEL9] KASAN: null-ptr-deref in range
> [0x0000000000000130-0x0000000000000137]
>

Thanks for reporting these! From skimming through them I don't think they're directly related to this BZ. It would also be great to retest with the KVM rebase (https://bugzilla.redhat.com/show_bug.cgi?id=2009338).

Move this bz to verified according to Comment 43. Thanks.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (new packages: kernel), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:3907