Bug 1674254 - kernel warning in epyc/ryzen nested virtualization
Summary: kernel warning in epyc/ryzen nested virtualization
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 29
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-02-10 14:22 UTC by Hetz Ben Hamo
Modified: 2021-06-17 08:49 UTC
CC List: 18 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2019-09-17 20:11:13 UTC
Type: Bug
Embargoed:
chris_db: needinfo-


Attachments

Description Hetz Ben Hamo 2019-02-10 14:22:46 UTC
I'm using the latest Fedora 29 kernel (4.20.6-200.fc29.x86_64), and when I try to run any OS as a nested virtualization guest, I get the following kernel warning:

[31192.883525] WARNING: CPU: 14 PID: 14476 at arch/x86/kvm/mmu.c:2066 nonpaging_update_pte+0x5/0x10 [kvm]
[31192.883531] Modules linked in: nls_utf8 isofs loop vhost_net vhost tap fuse xt_CHECKSUM ipt_MASQUERADE tun devlink nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ip6table_nat nf_nat_ipv6 bridge ip6table_mangle ip6table_raw stp llc ip6table_security iptable_nat nf_nat_ipv4 nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables binfmt_misc nct6775 hwmon_vid sunrpc vfat fat xfs libcrc32c edac_mce_amd kvm_amd nvidia_drm(POE) nvidia_modeset(POE) nvidia_uvm(OE) snd_hda_codec_hdmi kvm nvidia(POE) snd_hda_codec_realtek snd_hda_codec_generic irqbypass snd_hda_intel snd_usb_audio snd_hda_codec uvcvideo drm_kms_helper ppdev snd_hda_core snd_usbmidi_lib videobuf2_vmalloc videobuf2_memops crct10dif_pclmul snd_rawmidi videobuf2_v4l2 snd_hwdep crc32_pclmul videobuf2_common snd_seq drm ghash_clmulni_intel snd_seq_device videodev
[31192.883579]  snd_pcm media snd_timer snd sp5100_tco ipmi_devintf ccp soundcore i2c_piix4 k10temp wmi_bmof ipmi_msghandler parport_pc parport gpio_amdpt gpio_generic pcc_cpufreq acpi_cpufreq hid_logitech_hidpp igb dca crc32c_intel hid_logitech_dj i2c_algo_bit wmi pinctrl_amd
[31192.883606] CPU: 14 PID: 14476 Comm: CPU 5/KVM Tainted: P        W  OE     4.20.6-200.fc29.x86_64 #1
[31192.883609] Hardware name: Micro-Star International Co., Ltd. MS-7B79/X470 GAMING PLUS (MS-7B79), BIOS A.40 06/28/2018
[31192.883643] RIP: 0010:nonpaging_update_pte+0x5/0x10 [kvm]
[31192.883647] Code: 00 00 00 00 00 0f 1f 44 00 00 31 c0 c3 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 <0f> 0b c3 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f0 48 85 c9
[31192.883651] RSP: 0018:ffffad4c8ebdfa90 EFLAGS: 00010206
[31192.883655] RAX: ffffffffc17bf120 RBX: 0000000000000701 RCX: ffffad4c8ebdfac0
[31192.883657] RDX: ffff9ba471e1e000 RSI: ffff9ba2f7b90a00 RDI: ffff9ba429b50000
[31192.883660] RBP: ffff9ba2f7b90a00 R08: ffff9ba471e1e000 R09: 0000000000000000
[31192.883663] R10: 0000000000000008 R11: 0000000000000007 R12: 0000000000000000
[31192.883665] R13: ffff9ba471e1e000 R14: ffff9ba429b50000 R15: ffffad4c8ebdfac8
[31192.883669] FS:  00007f35c2ffd700(0000) GS:ffff9ba49ed80000(0000) knlGS:0000000000000000
[31192.883672] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[31192.883674] CR2: 000000000060291d CR3: 0000000f973b8000 CR4: 00000000003406e0
[31192.883677] Call Trace:
[31192.883713]  kvm_mmu_pte_write+0x485/0x490 [kvm]
[31192.883747]  kvm_page_track_write+0x7c/0xa0 [kvm]
[31192.883780]  emulator_write_phys+0x36/0x50 [kvm]
[31192.883811]  emulator_read_write_onepage+0xef/0x310 [kvm]
[31192.883842]  emulator_read_write+0xc8/0x180 [kvm]
[31192.883875]  segmented_write+0x5d/0x80 [kvm]
[31192.883908]  writeback+0xf4/0x260 [kvm]
[31192.883940]  ? em_in+0x13a/0x240 [kvm]
[31192.883972]  x86_emulate_insn+0x7b0/0x10a0 [kvm]
[31192.884001]  ? kvm_set_irq+0xa1/0x130 [kvm]
[31192.884032]  x86_emulate_instruction+0x32f/0x700 [kvm]
[31192.884064]  complete_emulated_pio+0x3a/0x60 [kvm]
[31192.884096]  kvm_arch_vcpu_ioctl_run+0x186c/0x1b20 [kvm]
[31192.884124]  kvm_vcpu_ioctl+0x22f/0x5e0 [kvm]
[31192.884131]  ? __switch_to_asm+0x34/0x70
[31192.884135]  ? __switch_to_asm+0x40/0x70
[31192.884140]  ? __switch_to_xtra+0x51d/0x590
[31192.884144]  ? __switch_to_asm+0x40/0x70
[31192.884150]  do_vfs_ioctl+0xa4/0x630
[31192.884156]  ? syscall_trace_enter+0x192/0x2c0
[31192.884160]  ksys_ioctl+0x60/0x90
[31192.884164]  __x64_sys_ioctl+0x16/0x20
[31192.884168]  do_syscall_64+0x5b/0x160
[31192.884173]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[31192.884177] RIP: 0033:0x7f35dd1f309b
[31192.884180] Code: 0f 1e fa 48 8b 05 ed bd 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d bd bd 0c 00 f7 d8 64 89 01 48
[31192.884183] RSP: 002b:00007f35c2ffc6c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[31192.884187] RAX: ffffffffffffffda RBX: 00007f35da585008 RCX: 00007f35dd1f309b
[31192.884189] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000023
[31192.884191] RBP: 0000000000000002 R08: 0000557d508fca50 R09: 0000000000000004
[31192.884194] R10: 0000000000000001 R11: 0000000000000246 R12: 0000557d508df9c0
[31192.884196] R13: 0000000000000000 R14: 00007f35da584000 R15: 0000557d51702880
[31192.884200] ---[ end trace e3de1b4753b8b39b ]---

I'm using a Ryzen 2700 CPU, and the host virtualization is oVirt 4.3.0 (which I'm running on top of Virt-Manager).

At the moment, if I want to run any OS as a nested guest, I have to select host CPU pass-through for the VM inside oVirt; otherwise the nested VM cannot start (it complains about a missing "monitor").
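For readers retracing this setup, the two preconditions involved here can be checked with a short script: whether nested SVM is enabled in kvm_amd on the physical host, and whether the `svm` and `monitor` flags are visible in /proc/cpuinfo of the VM that runs oVirt (the "monitor" complaint presumably refers to the CPU flag of that name). This is a minimal sketch, assuming the standard Linux paths /sys/module/kvm_amd/parameters/nested and /proc/cpuinfo; the helper names are hypothetical and not part of oVirt, libvirt, or any tool mentioned in this bug.

```python
from __future__ import annotations

from pathlib import Path

# Standard Linux locations (assumed; absent if kvm_amd is not loaded).
NESTED_PARAM = Path("/sys/module/kvm_amd/parameters/nested")
CPUINFO = Path("/proc/cpuinfo")


def parse_cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the union of all 'flags' entries found in /proc/cpuinfo text."""
    flags: set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only the plain "flags" key; lines such as "vmx flags" are skipped.
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def nested_enabled(param_text: str) -> bool:
    """Integer module params print '1', boolean ones print 'Y'; accept both."""
    return param_text.strip() in {"1", "Y", "y"}


def main() -> None:
    # Meaningful on the physical host.
    if NESTED_PARAM.exists():
        print("kvm_amd nested:", nested_enabled(NESTED_PARAM.read_text()))
    else:
        print("kvm_amd not loaded (or not an AMD host)")
    # Meaningful inside the VM that runs oVirt.
    if CPUINFO.exists():
        flags = parse_cpu_flags(CPUINFO.read_text())
        print("svm flag:", "svm" in flags)
        print("monitor flag:", "monitor" in flags)


if __name__ == "__main__":
    main()
```

Run it once on the physical host (for the nested parameter) and once inside the oVirt VM (for the CPU flags).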

Comment 1 Laura Abbott 2019-04-09 20:46:47 UTC
We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 29 kernel bugs.
 
Fedora 29 has now been rebased to 5.0.6. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.
 
If you have moved on to Fedora 30, and are still experiencing this issue, please change the version to Fedora 30.
 
If you experience different issues, please open a new bug report for those.

Comment 2 Justin M. Forbes 2019-09-17 20:11:13 UTC
*********** MASS BUG UPDATE **************
This bug is being closed with INSUFFICIENT_DATA as there has not been a response in 3 weeks. If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running and any data that might have been requested previously.

Comment 3 Cristian Vladescu 2020-06-02 08:11:24 UTC
I also get the same kernel warning and stack trace on Proxmox 6.2-1 with nested virtualization on AMD Ryzen.

Linux version 5.4.34-1-pve (build@pve) (gcc version 8.3.0 (Debian 8.3.0-6)) #1 SMP PVE 5.4.34-2 (Thu, 07 May 2020 10:02:02 +0200) ()
root@pve2:~# kvm --version
QEMU emulator version 5.0.0 (pve-qemu-kvm_5.0.0)
Copyright (c) 2003-2020 Fabrice Bellard and the QEMU Project developers


Jun  2 05:11:45 pve2 kernel: [106043.123346] ------------[ cut here ]------------
Jun  2 05:11:45 pve2 kernel: [106043.123361] WARNING: CPU: 30 PID: 2113 at arch/x86/kvm/mmu.c:2237 nonpaging_update_pte+0x9/0x10 [kvm]
Jun  2 05:11:45 pve2 kernel: [106043.123361] Modules linked in: veth(E) ebtable_filter(E) ebtables(E) ip_set(E) ip6table_raw(E) iptable_raw(E) ip6table_filter(E) ip6_tables(E) sctp(E) iptable_filter(E) bpfilter(E) softdog(E) nfnetlink_log(E) nfnetlink(E) edac_mce_amd(E) kvm_amd(E) kvm(E) zfs(POE) zunicode(POE) zlua(POE) zavl(POE) icp(POE) snd_hda_codec_realtek(E) snd_hda_codec_generic(E) ledtrig_audio(E) nouveau(E) crct10dif_pclmul(E) crc32_pclmul(E) snd_hda_intel(E) ghash_clmulni_intel(E) snd_intel_dspcfg(E) ttm(E) snd_hda_codec(E) drm_kms_helper(E) snd_hda_core(E) aesni_intel(E) drm(E) snd_hwdep(E) snd_pcm(E) fb_sys_fops(E) crypto_simd(E) syscopyarea(E) eeepc_wmi(E) joydev(E) sysfillrect(E) cryptd(E) snd_timer(E) sysimgblt(E) asus_wmi(E) glue_helper(E) input_leds(E) snd(E) sparse_keymap(E) video(E) pcspkr(E) soundcore(E) mxm_wmi(E) wmi_bmof(E) ccp(E) k10temp(E) mac_hid(E) zcommon(POE) znvpair(POE) spl(OE) vhost_net(E) vhost(E) tap(E) ib_iser(E) rdma_cm(E) iw_cm(E) ib_cm(E) sunrpc(E) ib_core(E) iscsi_tcp(E)
Jun  2 05:11:45 pve2 kernel: [106043.123380]  libiscsi_tcp(E) libiscsi(E) scsi_transport_iscsi(E) ip_tables(E) x_tables(E) autofs4(E) btrfs(E) xor(E) zstd_compress(E) raid6_pq(E) dm_thin_pool(E) dm_persistent_data(E) dm_bio_prison(E) dm_bufio(E) libcrc32c(E) hid_logitech_hidpp(E) hid_logitech_dj(E) usbmouse(E) usbkbd(E) hid_generic(E) usbhid(E) hid(E) i2c_piix4(E) qla2xxx(E) nvme_fc(E) nvme_fabrics(E) scsi_transport_fc(E) ahci(E) libahci(E) igb(E) i2c_algo_bit(E) dca(E) wmi(E)
Jun  2 05:11:45 pve2 kernel: [106043.123390] CPU: 30 PID: 2113 Comm: kvm Tainted: P        W  OE     5.4.34-1-pve #1
Jun  2 05:11:45 pve2 kernel: [106043.123391] Hardware name: System manufacturer System Product Name/PRIME X570-PRO, BIOS 1407 04/02/2020
Jun  2 05:11:45 pve2 kernel: [106043.123400] RIP: 0010:nonpaging_update_pte+0x9/0x10 [kvm]
Jun  2 05:11:45 pve2 kernel: [106043.123400] Code: 00 0f 1f 44 00 00 55 31 c0 48 89 e5 5d c3 0f 1f 00 0f 1f 44 00 00 55 48 89 e5 5d c3 0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5 <0f> 0b 5d c3 0f 1f 00 0f 1f 44 00 00 31 f6 48 8b 04 77 48 63 54 37
Jun  2 05:11:45 pve2 kernel: [106043.123401] RSP: 0018:ffffa93fc21e3a78 EFLAGS: 00010206
Jun  2 05:11:45 pve2 kernel: [106043.123402] RAX: ffffffffc12091d0 RBX: 0000000000000701 RCX: ffffa93fc21e3ac0
Jun  2 05:11:45 pve2 kernel: [106043.123402] RDX: ffff949b99a74000 RSI: ffff949e1ff29500 RDI: ffff94a2e89f8000
Jun  2 05:11:45 pve2 kernel: [106043.123402] RBP: ffffa93fc21e3a78 R08: 00000000008125f3 R09: ffff949b99a74000
Jun  2 05:11:45 pve2 kernel: [106043.123403] R10: 0000000000000000 R11: 0000000000001960 R12: ffff94a2e89f8000
Jun  2 05:11:45 pve2 kernel: [106043.123403] R13: 0000000000000000 R14: ffff949b99a74000 R15: ffffa93fc21e3ac8
Jun  2 05:11:45 pve2 kernel: [106043.123404] FS:  00007f396d5ff700(0000) GS:ffff94a36ef80000(0000) knlGS:0000000000000000
Jun  2 05:11:45 pve2 kernel: [106043.123404] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun  2 05:11:45 pve2 kernel: [106043.123405] CR2: ffffffffff600400 CR3: 0000000771fea000 CR4: 0000000000340ee0
Jun  2 05:11:45 pve2 kernel: [106043.123405] Call Trace:
Jun  2 05:11:45 pve2 kernel: [106043.123415]  kvm_mmu_pte_write+0x421/0x430 [kvm]
Jun  2 05:11:45 pve2 kernel: [106043.123424]  kvm_page_track_write+0x82/0xc0 [kvm]
Jun  2 05:11:45 pve2 kernel: [106043.123433]  emulator_write_phys+0x3b/0x50 [kvm]
Jun  2 05:11:45 pve2 kernel: [106043.123441]  write_emulate+0xe/0x10 [kvm]
Jun  2 05:11:45 pve2 kernel: [106043.123448]  emulator_read_write_onepage+0xfc/0x320 [kvm]
Jun  2 05:11:45 pve2 kernel: [106043.123456]  emulator_read_write+0xd6/0x190 [kvm]
Jun  2 05:11:45 pve2 kernel: [106043.123463]  emulator_write_emulated+0x15/0x20 [kvm]
Jun  2 05:11:45 pve2 kernel: [106043.123471]  segmented_write+0x5d/0x80 [kvm]
Jun  2 05:11:45 pve2 kernel: [106043.123479]  writeback+0x203/0x2e0 [kvm]
Jun  2 05:11:45 pve2 kernel: [106043.123487]  x86_emulate_insn+0x983/0x1040 [kvm]
Jun  2 05:11:45 pve2 kernel: [106043.123494]  x86_emulate_instruction+0x350/0x720 [kvm]
Jun  2 05:11:45 pve2 kernel: [106043.123502]  complete_emulated_pio+0x3f/0x70 [kvm]
Jun  2 05:11:45 pve2 kernel: [106043.123509]  kvm_arch_vcpu_ioctl_run+0x4cb/0x570 [kvm]
Jun  2 05:11:45 pve2 kernel: [106043.123517]  kvm_vcpu_ioctl+0x24b/0x610 [kvm]
Jun  2 05:11:45 pve2 kernel: [106043.123519]  ? do_futex+0xc7/0xc50
Jun  2 05:11:45 pve2 kernel: [106043.123521]  ? apic_timer_interrupt+0xa/0x20
Jun  2 05:11:45 pve2 kernel: [106043.123523]  do_vfs_ioctl+0xa9/0x640
Jun  2 05:11:45 pve2 kernel: [106043.123524]  ksys_ioctl+0x67/0x90
Jun  2 05:11:45 pve2 kernel: [106043.123525]  __x64_sys_ioctl+0x1a/0x20
Jun  2 05:11:45 pve2 kernel: [106043.123527]  do_syscall_64+0x57/0x190
Jun  2 05:11:45 pve2 kernel: [106043.123527]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jun  2 05:11:45 pve2 kernel: [106043.123528] RIP: 0033:0x7f4183922427
Jun  2 05:11:45 pve2 kernel: [106043.123529] Code: 00 00 90 48 8b 05 69 aa 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 39 aa 0c 00 f7 d8 64 89 01 48
Jun  2 05:11:45 pve2 kernel: [106043.123529] RSP: 002b:00007f396d5fa3b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Jun  2 05:11:45 pve2 kernel: [106043.123530] RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007f4183922427
Jun  2 05:11:45 pve2 kernel: [106043.123530] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000021
Jun  2 05:11:45 pve2 kernel: [106043.123531] RBP: 0000000000000000 R08: 00005558156d3350 R09: 0000000000000000
Jun  2 05:11:45 pve2 kernel: [106043.123531] R10: 0000000000000001 R11: 0000000000000246 R12: 00007f4172c98e80
Jun  2 05:11:45 pve2 kernel: [106043.123531] R13: 000055581569d2e0 R14: 00007f4176ebc000 R15: 0000000000000000
Jun  2 05:11:45 pve2 kernel: [106043.123532] ---[ end trace 1d4cbcac7f24c177 ]---
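Since comment 3 describes this as the same issue, a small parser makes the comparison explicit. This sketch extracts the source location from each `WARNING:` line; the line format is assumed from the two logs in this bug (`WARNING: CPU: N PID: N at FILE:LINE FUNC+OFF/LEN [MODULE]`), and the names `parse_warning` and `KernelWarning` are hypothetical. Applied to both traces, it shows both warnings fire in nonpaging_update_pte in arch/x86/kvm/mmu.c of the kvm module; only the source line (2066 on 4.20.6, 2237 on 5.4.34) and the instruction offset differ, presumably because mmu.c changed between those kernel releases.

```python
import re
from typing import NamedTuple, Optional

# Pattern assumed from the WARNING lines quoted in this bug. search() is used,
# so a syslog prefix such as "Jun  2 05:11:45 pve2 kernel:" is skipped over.
WARN_RE = re.compile(
    r"WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) at "
    r"(?P<file>\S+):(?P<line>\d+) "
    r"(?P<func>\w+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?: \[(?P<module>\w+)\])?"
)


class KernelWarning(NamedTuple):
    file: str
    line: int
    func: str
    module: Optional[str]  # None for code built into the kernel image


def parse_warning(log_line: str) -> Optional[KernelWarning]:
    """Extract where a kernel WARNING fired, or None if the line is not one."""
    m = WARN_RE.search(log_line)
    if m is None:
        return None
    return KernelWarning(m["file"], int(m["line"]), m["func"], m["module"])


if __name__ == "__main__":
    fedora = ("[31192.883525] WARNING: CPU: 14 PID: 14476 at arch/x86/kvm/mmu.c:2066 "
              "nonpaging_update_pte+0x5/0x10 [kvm]")
    proxmox = ("Jun  2 05:11:45 pve2 kernel: [106043.123361] WARNING: CPU: 30 PID: 2113 "
               "at arch/x86/kvm/mmu.c:2237 nonpaging_update_pte+0x9/0x10 [kvm]")
    a, b = parse_warning(fedora), parse_warning(proxmox)
    print(a)
    print(b)
    print("same function:", a.func == b.func and a.file == b.file)
```

Matching on function and file rather than line number is the deliberate choice here, since line numbers are not stable across kernel versions.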

Comment 4 Hetz Ben Hamo 2020-06-02 08:13:38 UTC
Cristian, this bug is marked as "closed" (I don't know why; ask the Red Hat teams), so your info will be ignored.

I would suggest that you open a new bug.

Comment 5 Cristian Vladescu 2020-06-02 08:19:37 UTC
It might be related to KSM (Kernel Samepage Merging).
When this happened, KSM was sharing 20 GiB.
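To check a figure like "20 GiB shared" on another host, the kernel exposes KSM counters under /sys/kernel/mm/ksm/; per the kernel's KSM documentation, pages_sharing counts how many additional sites share deduplicated pages, i.e. how much memory is saved. The sketch below converts that counter to GiB, assuming the page size reported by os.sysconf; the helper names are hypothetical, and the script says nothing about whether KSM actually triggers this warning.

```python
import os
from pathlib import Path

KSM_DIR = Path("/sys/kernel/mm/ksm")


def ksm_saved_gib(pages_sharing: int, page_size: int = 4096) -> float:
    """Approximate memory saved by KSM: pages_sharing times page size, in GiB."""
    return pages_sharing * page_size / 2**30


def read_counter(name: str) -> int:
    """Read one integer KSM counter (e.g. 'pages_sharing', 'run') from sysfs."""
    return int((KSM_DIR / name).read_text().strip())


if __name__ == "__main__":
    if KSM_DIR.exists():
        page_size = os.sysconf("SC_PAGE_SIZE")
        sharing = read_counter("pages_sharing")
        print(f"KSM run state: {read_counter('run')}")
        print(f"KSM saving: {ksm_saved_gib(sharing, page_size):.1f} GiB")
    else:
        print("KSM sysfs interface not available")
```

For example, 5,242,880 shared 4 KiB pages correspond to exactly 20 GiB.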

Comment 6 Hetz Ben Hamo 2020-06-02 08:20:56 UTC
I understand, but again, this bug was opened against Fedora 29, which has been end-of-life for quite a long time. You'll need to create a new bug; no one will work on this one.

Comment 7 Cristian Vladescu 2020-06-02 08:21:41 UTC
(In reply to Hetz Ben Hamo from comment #4)
> Cristian, this bug is marked as "closed" (donno why, ask the Red Hat teams),
> so your info will be ignored.
> 
> I would suggest for you to open a new bug.

I just wanted to confirm the issue the OP had. The bug can always be reopened.
You guys do whatever you want.

Comment 8 Dr. David Alan Gilbert 2021-06-17 08:49:15 UTC
If any of you are still seeing this, I'd try it on a very recent host kernel; a whole bunch of AMD nesting fixes went in over the last few months.

