Bug 2151878
| Field | Value |
|---|---|
| Summary | KVM/nested traceback on AMD CPU when deploying CRC |
| Product | Red Hat Enterprise Linux 9 |
| Component | qemu-kvm |
| qemu-kvm sub component | CPU Models |
| Version | 9.3 |
| Status | CLOSED MIGRATED |
| Severity | medium |
| Priority | medium |
| Reporter | dpawlik <dpawlik> |
| Assignee | Bandan Das <bdas> |
| QA Contact | yduan |
| CC | apevec, bdas, bdobreli, bstinson, cfergeau, coli, gveitmic, jinzhao, juzhang, jwboyer, nilal, qcheng, virt-maint, vkuznets, xuwei, yduan, zhguo |
| Keywords | MigratedToJIRA, Triaged |
| Flags | zhguo: needinfo-; pm-rhel: mirror+ |
| Target Milestone | rc |
| Target Release | --- |
| Hardware | x86_64 |
| OS | Linux |
| Doc Type | If docs needed, set a value |
| Type | Bug |
| Last Closed | 2023-09-22 13:40:30 UTC |
Description

dpawlik, 2022-12-08 12:50:01 UTC

**Comment #4** (Dr. David Alan Gilbert):

I'd bet disabling the PMU would help. (But I have a vague memory of a recent bug where `-perf` didn't do it on AMD, and I can't find the reference to it.)

**Comment #5** (dpawlik):

(In reply to Dr. David Alan Gilbert from comment #4)

Not really, but it raises a longer traceback. Steps used to disable the PMU:

    virsh dumpxml crc > crc.xml
    virsh stop crc
    virsh destroy crc
    virsh undefine crc
    # add <pmu state='off'/> to the <features> section of crc.xml
    virsh define crc.xml
    virsh start crc

Current traceback:

    [Thu Dec 15 02:58:34 2022] Code: 73 01 c3 48 8b 0d b5 b1 1b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 85 b1 1b 00 f7 d8 64 89 01 48
    [Thu Dec 15 02:58:34 2022] RSP: 002b:00007f4e46ffc4a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
    [Thu Dec 15 02:58:34 2022] RAX: ffffffffffffffda RBX: 00007f4e477fee50 RCX: 00007f4e5163ec6b
    [Thu Dec 15 02:58:34 2022] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000019
    [Thu Dec 15 02:58:34 2022] RBP: 0000000000000000 R08: 0000000000000000 R09: 00000000000000ff
    [Thu Dec 15 02:58:34 2022] R10: 00007f4be82b8790 R11: 0000000000000246 R12: 00005574ce88b870
    [Thu Dec 15 02:58:34 2022] R13: 00007f4e477feff0 R14: 696833cf082bd200 R15: 00007f4e477fee48
    [Thu Dec 15 02:58:34 2022] </TASK>
    [Thu Dec 15 02:58:34 2022] Call Trace:
    [Thu Dec 15 02:58:34 2022] <TASK>
    [Thu Dec 15 02:58:34 2022] x86_pmu_stop+0x50/0xb0
    [Thu Dec 15 02:58:34 2022] x86_pmu_del+0x73/0x190
    [Thu Dec 15 02:58:34 2022] event_sched_out.part.0+0x7a/0x1f0
    [Thu Dec 15 02:58:34 2022] group_sched_out.part.0+0x93/0xf0
    [Thu Dec 15 02:58:34 2022] __perf_event_disable+0xdc/0x1a0
    [Thu Dec 15 02:58:34 2022] event_function+0x91/0xe0
    [Thu Dec 15 02:58:34 2022] ? group_sched_out.part.0+0xf0/0xf0
    [Thu Dec 15 02:58:34 2022] remote_function+0x47/0x50
    [Thu Dec 15 02:58:34 2022] generic_exec_single+0x78/0xb0
    [Thu Dec 15 02:58:34 2022] smp_call_function_single+0xeb/0x130
    [Thu Dec 15 02:58:34 2022] ? sw_perf_event_destroy+0x60/0x60
    [Thu Dec 15 02:58:34 2022] ? __hrtimer_start_range_ns+0x215/0x300
    [Thu Dec 15 02:58:34 2022] event_function_call+0x9c/0x160
    [Thu Dec 15 02:58:34 2022] ? group_sched_out.part.0+0xf0/0xf0
    [Thu Dec 15 02:58:34 2022] ? perf_kprobe_event_init+0x90/0x90
    [Thu Dec 15 02:58:34 2022] perf_event_pause+0x58/0xa0
    [Thu Dec 15 02:58:34 2022] reprogram_counter+0x7b/0x320 [kvm]
    [Thu Dec 15 02:58:34 2022] amd_pmu_set_msr+0x106/0x170 [kvm_amd]
    [Thu Dec 15 02:58:34 2022] ? __svm_vcpu_run+0x67/0x110 [kvm_amd]
    [Thu Dec 15 02:58:34 2022] ? get_gp_pmc_amd+0x129/0x200 [kvm_amd]
    [Thu Dec 15 02:58:34 2022] __kvm_set_msr+0x7f/0x1c0 [kvm]
    [Thu Dec 15 02:58:34 2022] kvm_emulate_wrmsr+0x52/0x1b0 [kvm]
    [Thu Dec 15 02:58:34 2022] vcpu_enter_guest+0x667/0x1010 [kvm]
    [Thu Dec 15 02:58:34 2022] ? kvm_vcpu_kick+0x13/0xb0 [kvm]
    [Thu Dec 15 02:58:34 2022] ? __apic_accept_irq+0xe0/0x300 [kvm]
    [Thu Dec 15 02:58:34 2022] vcpu_run+0x33/0x250 [kvm]
    [Thu Dec 15 02:58:34 2022] kvm_arch_vcpu_ioctl_run+0x104/0x620 [kvm]
    [Thu Dec 15 02:58:34 2022] kvm_vcpu_ioctl+0x271/0x670 [kvm]
    [Thu Dec 15 02:58:34 2022] ? __seccomp_filter+0x45/0x470
    [Thu Dec 15 02:58:34 2022] ? security_file_ioctl+0x32/0x50
    [Thu Dec 15 02:58:34 2022] __x64_sys_ioctl+0x8a/0xc0
    [Thu Dec 15 02:58:34 2022] do_syscall_64+0x5c/0x90
    [Thu Dec 15 02:58:34 2022] ? security_file_ioctl+0x32/0x50
    [Thu Dec 15 02:58:34 2022] ? kvm_on_user_return+0x82/0x90 [kvm]
    [Thu Dec 15 02:58:34 2022] ? fire_user_return_notifiers+0x41/0x60
    [Thu Dec 15 02:58:34 2022] ? exit_to_user_mode_prepare+0xca/0x100
    [Thu Dec 15 02:58:34 2022] ? syscall_exit_to_user_mode+0x12/0x30
    [Thu Dec 15 02:58:34 2022] ? do_syscall_64+0x69/0x90
    [Thu Dec 15 02:58:34 2022] ? syscall_exit_work+0x11a/0x150
    [Thu Dec 15 02:58:34 2022] ? syscall_exit_to_user_mode+0x12/0x30
    [Thu Dec 15 02:58:34 2022] ? do_syscall_64+0x69/0x90
    [Thu Dec 15 02:58:34 2022] ? do_syscall_64+0x69/0x90
    [Thu Dec 15 02:58:34 2022] ? sysvec_apic_timer_interrupt+0x3c/0x90
    [Thu Dec 15 02:58:34 2022] entry_SYSCALL_64_after_hwframe+0x63/0xcd
    [Thu Dec 15 02:58:34 2022] RIP: 0033:0x7f4e5163ec6b
    [Thu Dec 15 02:58:34 2022] Code: 73 01 c3 48 8b 0d b5 b1 1b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 85 b1 1b 00 f7 d8 64 89 01 48
    [Thu Dec 15 02:58:34 2022] RSP: 002b:00007f4e46ffc4a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
    [Thu Dec 15 02:58:34 2022] RAX: ffffffffffffffda RBX: 00007f4e477fee50 RCX: 00007f4e5163ec6b
    [Thu Dec 15 02:58:34 2022] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000019
    [Thu Dec 15 02:58:34 2022] RBP: 0000000000000000 R08: 0000000000000000 R09: 00000000000000ff
    [Thu Dec 15 02:58:34 2022] R10: 00007f4be82b8790 R11: 0000000000000246 R12: 00005574ce88b870
    [Thu Dec 15 02:58:34 2022] R13: 00007f4e477feff0 R14: 696833cf082bd200 R15: 00007f4e477fee48
    [Thu Dec 15 02:58:34 2022] </TASK>

**Comment #6** (Dr. David Alan Gilbert):

(In reply to dpawlik from comment #5)

Well, that's the thing about the recent bug, which I think is:
https://marc.info/?l=kvm&m=166886061623174&w=2

If I read that right, then passing `-perf` might not work, and the guest still thinks it has counters.

But please go back a step. On re-reading I notice this is nested; I don't think I quite understand it, though:

- L0 (comment 1): RHEL 8.4 OpenStack
- L1 (comment 2): CentOS 9 Stream (QEMU 7.1!)
- L?? (comment 3): ??? That says L3, but surely you mean L2?

Are those errors in the L1 dmesg? That QEMU 7.1 RPM in CentOS Stream 9 is *very* untested at the moment.

**Comment #7** (dpawlik):

(In reply to Dr. David Alan Gilbert from comment #6)

> That says L3, but surely you mean L2?

Sorry for my mistake; yes, it should be L2, the CRC instance.

> Are those errors in the L1 dmesg?

Yes. I don't have access to L0.
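For reference, the `<pmu state='off'/>` tweak mentioned in the reproduction steps sits under `<features>` in the libvirt domain XML. A minimal sketch (element names follow libvirt's domain XML schema; the rest of the domain definition shown here is illustrative, not taken from the bug report):

```xml
<domain type='kvm'>
  <name>crc</name>
  <!-- memory, vcpu, devices, etc. elided -->
  <features>
    <acpi/>
    <apic/>
    <!-- hide the virtual performance monitoring unit from the guest -->
    <pmu state='off'/>
  </features>
</domain>
```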
The details described earlier were from L1. Worth mentioning about L1:

- On CentOS 8 Stream without nested virtualization enabled: WORKS
- On CentOS 8 Stream with nested virtualization enabled (`options kvm_amd nested=1` in /etc/modprobe.d/kvm.conf, then reboot the VM): DOES NOT WORK
- On CentOS 9 Stream without nested virtualization enabled: DOES NOT WORK
- On CentOS 9 Stream with nested virtualization enabled (same setting as above): DOES NOT WORK

Now I see that the logs from comment #5 are not correct, because nested virtualization was enabled there (I rebuilt the host, but it seems that did not take effect). "Fresh" logs from L1, with nested virtualization disabled and the PMU disabled, look like:

    [Fri Dec 16 02:42:09 2022] RIP: 0033:0x7efeaee3ec6b
    [Fri Dec 16 02:42:09 2022] Code: 73 01 c3 48 8b 0d b5 b1 1b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 85 b1 1b 00 f7 d8 64 89 01 48
    [Fri Dec 16 02:42:09 2022] RSP: 002b:00007efc5d7f84a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
    [Fri Dec 16 02:42:09 2022] RAX: ffffffffffffffda RBX: 00007efc5dffae50 RCX: 00007efeaee3ec6b
    [Fri Dec 16 02:42:09 2022] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 000000000000001c
    [Fri Dec 16 02:42:09 2022] RBP: 0000000000000000 R08: 0000000000000000 R09: 00000000000000ff
    [Fri Dec 16 02:42:09 2022] R10: 00007efc5068d830 R11: 0000000000000246 R12: 00005606d0903610
    [Fri Dec 16 02:42:09 2022] R13: 00007efc5dffaff0 R14: edb821d8ed042100 R15: 00007efc5dffae48
    [Fri Dec 16 02:42:09 2022] </TASK>
    [Fri Dec 16 02:42:09 2022] Call Trace:
    [Fri Dec 16 02:42:09 2022] <TASK>
    [Fri Dec 16 02:42:09 2022] x86_pmu_stop+0x50/0xb0
    [Fri Dec 16 02:42:09 2022] x86_pmu_del+0x73/0x190
    [Fri Dec 16 02:42:09 2022] event_sched_out.part.0+0x7a/0x1f0
    [Fri Dec 16 02:42:09 2022] group_sched_out.part.0+0x93/0xf0
    [Fri Dec 16 02:42:09 2022] ctx_sched_out+0x124/0x2a0
    [Fri Dec 16 02:42:09 2022] perf_event_context_sched_out+0x1a5/0x460
    [Fri Dec 16 02:42:09 2022] __perf_event_task_sched_out+0x50/0x170
    [Fri Dec 16 02:42:09 2022] ? pick_next_task+0x51/0x940
    [Fri Dec 16 02:42:09 2022] prepare_task_switch+0xbd/0x2a0
    [Fri Dec 16 02:42:09 2022] __schedule+0x1cb/0x620
    [Fri Dec 16 02:42:09 2022] ? kvm_complete_insn_gp+0x60/0x80 [kvm]
    [Fri Dec 16 02:42:09 2022] schedule+0x5a/0xc0
    [Fri Dec 16 02:42:09 2022] xfer_to_guest_mode_handle_work+0xac/0xe0
    [Fri Dec 16 02:42:09 2022] vcpu_run+0x1f5/0x250 [kvm]
    [Fri Dec 16 02:42:09 2022] kvm_arch_vcpu_ioctl_run+0x104/0x620 [kvm]
    [Fri Dec 16 02:42:09 2022] kvm_vcpu_ioctl+0x271/0x670 [kvm]
    [Fri Dec 16 02:42:09 2022] ? __seccomp_filter+0x45/0x470
    [Fri Dec 16 02:42:09 2022] ? security_file_ioctl+0x32/0x50
    [Fri Dec 16 02:42:09 2022] __x64_sys_ioctl+0x8a/0xc0
    [Fri Dec 16 02:42:09 2022] do_syscall_64+0x5c/0x90
    [Fri Dec 16 02:42:09 2022] ? exit_to_user_mode_prepare+0xca/0x100
    [Fri Dec 16 02:42:09 2022] ? syscall_exit_to_user_mode+0x12/0x30
    [Fri Dec 16 02:42:09 2022] ? do_syscall_64+0x69/0x90
    [Fri Dec 16 02:42:09 2022] ? syscall_exit_to_user_mode+0x12/0x30
    [Fri Dec 16 02:42:09 2022] ? do_syscall_64+0x69/0x90
    [Fri Dec 16 02:42:09 2022] ? syscall_exit_work+0x11a/0x150
    [Fri Dec 16 02:42:09 2022] ? syscall_exit_to_user_mode+0x12/0x30
    [Fri Dec 16 02:42:09 2022] ? do_syscall_64+0x69/0x90
    [Fri Dec 16 02:42:09 2022] ? do_syscall_64+0x69/0x90
    [Fri Dec 16 02:42:09 2022] entry_SYSCALL_64_after_hwframe+0x63/0xcd
    [Fri Dec 16 02:42:09 2022] RIP: 0033:0x7efeaee3ec6b
    [Fri Dec 16 02:42:09 2022] Code: 73 01 c3 48 8b 0d b5 b1 1b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 85 b1 1b 00 f7 d8 64 89 01 48
    [Fri Dec 16 02:42:09 2022] RSP: 002b:00007efc5d7f84a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
    [Fri Dec 16 02:42:09 2022] RAX: ffffffffffffffda RBX: 00007efc5dffae50 RCX: 00007efeaee3ec6b
    [Fri Dec 16 02:42:09 2022] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 000000000000001c
    [Fri Dec 16 02:42:09 2022] RBP: 0000000000000000 R08: 0000000000000000 R09: 00000000000000ff
    [Fri Dec 16 02:42:09 2022] R10: 00007efc5068d830 R11: 0000000000000246 R12: 00005606d0903610
    [Fri Dec 16 02:42:09 2022] R13: 00007efc5dffaff0 R14: edb821d8ed042100 R15: 00007efc5dffae48
    [Fri Dec 16 02:42:09 2022] </TASK>
    [Fri Dec 16 02:42:09 2022] Call Trace:
    [Fri Dec 16 02:42:09 2022] <TASK>
    [Fri Dec 16 02:42:09 2022] amd_pmu_enable_all+0x44/0x60
    [Fri Dec 16 02:42:09 2022] __perf_event_task_sched_in+0x24e/0x290
    [Fri Dec 16 02:42:09 2022] ? __perf_event_task_sched_out+0x60/0x170
    [Fri Dec 16 02:42:09 2022] finish_task_switch.isra.0+0x1fa/0x2a0
    [Fri Dec 16 02:42:09 2022] __schedule+0x250/0x620
    [Fri Dec 16 02:42:09 2022] ? kvm_complete_insn_gp+0x60/0x80 [kvm]
    [Fri Dec 16 02:42:09 2022] schedule+0x5a/0xc0
    [Fri Dec 16 02:42:09 2022] xfer_to_guest_mode_handle_work+0xac/0xe0
    [Fri Dec 16 02:42:09 2022] vcpu_run+0x1f5/0x250 [kvm]
    [Fri Dec 16 02:42:09 2022] kvm_arch_vcpu_ioctl_run+0x104/0x620 [kvm]
    [Fri Dec 16 02:42:09 2022] kvm_vcpu_ioctl+0x271/0x670 [kvm]
    [Fri Dec 16 02:42:09 2022] ? __seccomp_filter+0x45/0x470
    [Fri Dec 16 02:42:09 2022] ? security_file_ioctl+0x32/0x50
    [Fri Dec 16 02:42:09 2022] __x64_sys_ioctl+0x8a/0xc0
    [Fri Dec 16 02:42:09 2022] do_syscall_64+0x5c/0x90
    [Fri Dec 16 02:42:09 2022] ? exit_to_user_mode_prepare+0xca/0x100
    [Fri Dec 16 02:42:09 2022] ? syscall_exit_to_user_mode+0x12/0x30
    [Fri Dec 16 02:42:09 2022] ? do_syscall_64+0x69/0x90
    [Fri Dec 16 02:42:09 2022] ? syscall_exit_to_user_mode+0x12/0x30
    [Fri Dec 16 02:42:09 2022] ? do_syscall_64+0x69/0x90
    [Fri Dec 16 02:42:09 2022] ? syscall_exit_work+0x11a/0x150
    [Fri Dec 16 02:42:09 2022] ? syscall_exit_to_user_mode+0x12/0x30
    [Fri Dec 16 02:42:09 2022] ? do_syscall_64+0x69/0x90
    [Fri Dec 16 02:42:09 2022] ? do_syscall_64+0x69/0x90
    [Fri Dec 16 02:42:09 2022] entry_SYSCALL_64_after_hwframe+0x63/0xcd

Details about L2 (the CRC instance): Red Hat Enterprise Linux CoreOS 411.86.202211291109-0

    [core@crc-pbwlw-master-0 ~]$ rpm -qa | grep -Ei 'qemu|libvirt|kernel|release' | sort
    kernel-4.18.0-372.32.1.el8_6.x86_64
    kernel-core-4.18.0-372.32.1.el8_6.x86_64
    kernel-modules-4.18.0-372.32.1.el8_6.x86_64
    kernel-modules-extra-4.18.0-372.32.1.el8_6.x86_64
    qemu-guest-agent-6.2.0-11.module+el8.6.0+16538+01ea313d.6.x86_64
    redhat-release-8.6-0.1.el8.x86_64
    redhat-release-eula-8.6-0.1.el8.x86_64

    [core@crc-pbwlw-master-0 ~]$ uname -a
    Linux crc-pbwlw-master-0 4.18.0-372.32.1.el8_6.x86_64 #1 SMP Fri Oct 7 12:35:10 EDT 2022 x86_64 x86_64 x86_64 GNU/Linux

dmesg: added as an attachment.

**Comment**:

I can not reproduce this issue on a RHEL 8.4 host:

Host: RHEL 8.4
- kernel: 4.18.0-305.25.1.el8_4.x86_64
- qemu-kvm: qemu-kvm-5.2.0-16.module+el8.4.0+16683+63f410ce.18.x86_64
- libvirt: libvirt-daemon-kvm-7.0.0-14.8.module+el8.4.0+15255+f7eff4dd.x86_64
- CPU model: AMD EPYC 7502 32-Core Processor

L1 guest (CPU: host-passthrough), CentOS Stream 9:
- kernel: 5.14.0-283.el9.x86_64
- qemu-kvm: qemu-kvm-7.2.0-11.el9.x86_64

L2 guest: same as L1.

There is no traceback when starting the L2 guest.

**Comment** (dpawlik):

Hi, this makes the bug more interesting. Some of the VMs we use for CI testing run at our cloud provider, where we cannot ask them to change the L0 system to a different one.
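When comparing the long dmesg excerpts above, it helps to strip the timestamps and pull out just the call-trace frame names. A small helper sketch (the function name and parsing heuristics here are my own, not part of the bug report):

```python
import re

def extract_frames(dmesg_text):
    """Return call-trace function names from flattened dmesg output.

    Removes "[Thu Dec 15 02:58:34 2022]"-style timestamp prefixes, then
    keeps only tokens shaped like stack frames, e.g. "x86_pmu_stop+0x50/0xb0".
    """
    # Replace timestamp brackets (they always contain a 4-digit year).
    text = re.sub(r"\[[^\]]*\d{4}\]", "\n", dmesg_text)
    frames = []
    for token in text.split():
        token = token.lstrip("?")  # "? frame" marks unreliable entries
        if re.fullmatch(r"[A-Za-z_][\w.]*\+0x[0-9a-f]+/0x[0-9a-f]+", token):
            frames.append(token.split("+")[0])  # keep the symbol name only
    return frames

sample = (
    "[Thu Dec 15 02:58:34 2022] x86_pmu_stop+0x50/0xb0 "
    "[Thu Dec 15 02:58:34 2022] x86_pmu_del+0x73/0x190 "
    "[Thu Dec 15 02:58:34 2022] ? group_sched_out.part.0+0xf0/0xf0"
)
print(extract_frames(sample))
# → ['x86_pmu_stop', 'x86_pmu_del', 'group_sched_out.part.0']
```

Running the same helper over both excerpts makes the difference obvious: the first trace goes through `amd_pmu_set_msr`/`reprogram_counter`, the second through the scheduler's `amd_pmu_enable_all` path.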
According to a comment on the CRC project issue [1], it also fails on other cloud providers' VMs, which can be a problem for future CRC users. I made a table with the test cases:

| L0 system | L0 kernel | L0 libvirt | L0 QEMU | L0 CPU model | L0 libvirt CPU model | L1 system | L1 nested virt enabled? | L1 kernel | L1 libvirt | L1 QEMU | Does it work? |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Ubuntu 20.04.3 | 5.13.0-30-generic | 6.0.0-0ubuntu8.15 | 1:4.2-3ubuntu6.19 | AMD EPYC 7402 | host-passthrough | CentOS 8 Stream | yes | 4.18.0-448.el8.x86_64 | 8.0.0-14 | 6.2.0-28 | yes |
| Ubuntu 20.04.3 | 5.13.0-30-generic | 6.0.0-0ubuntu8.15 | 1:4.2-3ubuntu6.19 | AMD EPYC 7402 | host-passthrough | CentOS 8 Stream | no | 4.18.0-448.el8.x86_64 | 8.0.0-14 | 6.2.0-28 | yes |
| Ubuntu 20.04.3 | 5.13.0-30-generic | 6.0.0-0ubuntu8.15 | 1:4.2-3ubuntu6.19 | AMD EPYC 7402 | host-passthrough | CentOS 9 Stream | yes | 5.14.0-289.el9.x86_64 | 9.0.0-7.el9.x86_64 | 7.2.0-11.el9 | no |
| Ubuntu 20.04.3 | 5.13.0-30-generic | 6.0.0-0ubuntu8.15 | 1:4.2-3ubuntu6.19 | AMD EPYC 7402 | host-passthrough | CentOS 9 Stream | no | 5.14.0-289.el9.x86_64 | 9.0.0-7.el9.x86_64 | 7.2.0-11.el9 | no |
| Ubuntu 20.04.3 | 5.13.0-30-generic | 6.0.0-0ubuntu8.15 | 1:4.2-3ubuntu6.19 | AMD EPYC 7402 | host-passthrough | CentOS 9 Stream | yes | 6.2.7-1.el9.elrepo | 9.0.0-7.el9.x86_64 | 7.2.0-11.el9 | yes |
| Ubuntu 20.04.3 | 5.13.0-30-generic | 6.0.0-0ubuntu8.15 | 1:4.2-3ubuntu6.19 | AMD EPYC 7402 | host-passthrough | CentOS 9 Stream | no | 6.2.7-1.el9.elrepo | 9.0.0-7.el9.x86_64 | 7.2.0-11.el9 | yes |
| RHEL 8.4 | 4.18.0-305.25.1.el8_4.x86_64 | 7.0.0-14.8.module+el8.4.0+15255+f7eff4dd.x86_64 | 5.2.0-16.module+el8.4.0+16683+63f410ce.18.x86_64 | AMD EPYC 7502 | host-passthrough | CentOS 9 Stream | ? | 5.14.0-283.el9.x86_64 | ? | 7.2.0-11.el9 | yes |

Dan

[1] https://github.com/crc-org/crc/issues/3446#issuecomment-1342923882

Issue migration from Bugzilla to Jira is in process at this time. This will be the last message in Jira copied from the Bugzilla bug.

This BZ has been automatically migrated to the issues.redhat.com Red Hat Issue Tracker. All future work related to this report will be managed there.

Due to differences in account names between systems, some fields were not replicated. Be sure to add yourself to the Jira issue's "Watchers" field to continue receiving updates, and add others to the "Need Info From" field to continue requesting information.

To find the migrated issue, look in the "Links" section for a direct link to the new issue location. The issue key will have an icon of two footprints next to it and begin with "RHEL-" followed by an integer. You can also find the issue by visiting https://issues.redhat.com/issues/?jql= and searching the "Bugzilla Bug" field for this BZ's number, e.g. a search like: "Bugzilla Bug" = 1234567

In the event you have trouble locating or viewing the issue, you can file an issue by sending mail to rh-issues. You can also visit https://access.redhat.com/articles/7032570 for general account information.