1876123 – KVM VM crashes, auto-reboot and hangs frequently when VMWARE Workstation VM are both running (FC32)

Summary: KVM VM crashes, auto-reboot and hangs frequently when VMWARE Workstation VM a...

Keywords:
Status:	CLOSED EOL
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	36
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	Kernel Maintainer List
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2020-09-05 17:32 UTC by Kappa
Modified:	2023-05-25 17:00 UTC
CC List:	27 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2023-05-25 17:00:23 UTC
Type:	Bug
Embargoed:
Dependent Products:

Attachments
virsh dump config for the fc32 vm in comment #9 (11.39 KB, text/plain) 2020-09-06 19:50 UTC, Kappa	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Bugzilla	1876589	0	unspecified	CLOSED	Crash utility cannot analyse a core dump image from virsh	2021-06-17 01:51:04 UTC

Description Kappa 2020-09-05 17:32:26 UTC

Description of problem:
On Fedora 32, running host kernels 5.6 through 5.8.
I run both VMware Workstation and QEMU on the host machine.
I mostly use VMware Workstation. A few days ago, I started running some VMs under QEMU (with KVM acceleration),
but many of the KVM guests hang or crash frequently.

I found that whenever VMware Workstation is running its own guests,
the KVM guests crash or freeze frequently.

KVM guests do not crash under these conditions:
1) VMware Workstation is stopped and only KVM VMs are running.
2) VMware Workstation is running and the QEMU guests run without KVM acceleration. However, the QEMU guests are then very slow.
3) VMware Workstation is running and the QEMU guests run with KVM acceleration, as long as they do not start docker or cri-o.
If those processes are started, the KVM guest is likely to crash.

Version-Release number of selected component (if applicable):
qemu 4.2.1-1
VMware workstation 15.5.6

How reproducible:
Run several VMs under KVM and VMware Workstation at the same time.

Steps to Reproduce:
Install five CentOS 7.8 VMs with virsh. All VMs are updated with the latest RPMs.
The guest VMs run kernel 3.10.0-1127.19.1.el7.x86_64.

The guest VMs are intended for a Kubernetes installation, so they have no swap.

Actual results:
The VMs crash and reboot randomly. In some cases I had not installed any Kubernetes packages yet; the VM just crashed and rebooted.

One of the backtraces:
[30929.767857] BUG: unable to handle kernel paging request at ffffffffa9384750
[30929.767857] IP: [<ffffffffa9384750>] 0xffffffffa9384750
[30929.767857] PGD 167614067 PUD 167615063 PMD 0
[30929.767857] Oops: 0010 [#1] SMP
[30929.767857] Modules linked in: sunrpc dm_mirror dm_region_hash dm_log dm_mod i2c_i801 iTCO_wdt pcspkr iTCO_vendor_support crc32_pclmul ghash_clmulni_intel sg joydev lpc_ich aesni_intel lrw ppdev gf128mul glue_helper ablk_helper snd_hda_codec_generic cryptd snd_hda_intel snd_hda_codec snd_hda_core virtio_balloon snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore virtio_rng parport_pc parport binfmt_misc ip_tables xfs libcrc32c sr_mod cdrom qxl ahci virtio_console virtio_blk libahci drm_kms_helper crct10dif_pclmul crct10dif_common libata crc32c_intel serio_raw syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm virtio_net net_failover failover virtio_pci virtio_ring virtio drm_panel_orientation_quirks ptp_kvm ptp pps_core
[30929.767857] CPU: 0 PID: 0 Comm: swapper/0 Kdump: loaded Not tainted 3.10.0-1127.19.1.el7.x86_64 #1
[30929.767857] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
[30929.767857] task: ffffffffb0618480 ti: ffffffffb0600000 task.ti: ffffffffb0600000
[30929.767857] RIP: 0010:[<ffffffffa9384750>]  [<ffffffffa9384750>] 0xffffffffa9384750
[30929.767857] RSP: 0018:ffff9d218623a1e0  EFLAGS: 00010082
[30929.767857] RAX: ffffffffb0187c40 RBX: ffffffffb075db40 RCX: ffff9d2186406290
[30929.767857] RDX: 0000000000000000 RSI: ffffffffb0603e28 RDI: ffffffffb0603e08
[30929.767857] RBP: ffffffffb0603eb0 R08: 0000000000000000 R09: 0000000000000001
[30929.767857] R10: 0000000000000000 R11: 00001c850bdefa40 R12: 0000000000000000
[30929.767857] R13: ffffffffb0600000 R14: ffffffffb0600000 R15: ffffffffb0600000
[30929.767857] FS:  0000000000000000(0000) GS:ffff9d2186400000(0000) knlGS:0000000000000000
[30929.767857] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[30929.767857] CR2: ffffffffa9384750 CR3: 000000017a3a6000 CR4: 0000000000340ff0
[30929.767857] Call Trace:
[30929.767857] Code:  Bad RIP value.
[30929.767857] RIP  [<ffffffffa9384750>] 0xffffffffa9384750
[30929.767857]  RSP <ffff9d218623a1e0>
[30929.767857] CR2: ffffffffa9384750

Expected results:
Should not crash.

Additional info:
I edited this description after finding that the problem occurs when VMware Workstation and KVM run at the same time.

My questions:
1) I suspect that the QEMU processes may be corrupted when both hypervisors are running. Why is no problem logged on the host? Is there any mechanism that could prevent this problem or alert the user to it?
2) Should QEMU refuse to start a VM with KVM if it detects that another hypervisor is already running?
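Regarding question 2, one way to check a host for this situation is to look for both hypervisors' kernel modules in /proc/modules. The following is a minimal Python sketch, not an existing QEMU or libvirt feature. It assumes that VMware Workstation's monitor module is named vmmon and that KVM appears as kvm plus kvm_intel or kvm_amd. A positive result only shows that both are loaded, not that they are interfering with each other.

```python
from __future__ import annotations

from pathlib import Path

# Assumed module names: vmmon is VMware Workstation's monitor module,
# kvm/kvm_intel/kvm_amd are the in-kernel KVM modules.
VMWARE_MODULES = {"vmmon"}
KVM_MODULES = {"kvm", "kvm_intel", "kvm_amd"}


def loaded_modules(proc_modules_text: str) -> set[str]:
    """Return module names (first column) from /proc/modules-formatted text."""
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def hypervisors_coexist(proc_modules_text: str) -> bool:
    """True when at least one VMware module and one KVM module are loaded."""
    mods = loaded_modules(proc_modules_text)
    return bool(mods & VMWARE_MODULES) and bool(mods & KVM_MODULES)


if __name__ == "__main__":
    try:
        text = Path("/proc/modules").read_text()
    except OSError:
        text = ""  # not Linux, or /proc unavailable
    if hypervisors_coexist(text):
        print("warning: VMware (vmmon) and KVM modules are both loaded")
    else:
        print("no VMware/KVM module overlap detected")
```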

Comment 1 Kappa 2020-09-05 17:37:12 UTC

Related to the first post #c0 (https://bugzilla.redhat.com/show_bug.cgi?id=1876123#c0)

crash> kmem -i
                 PAGES        TOTAL      PERCENTAGE
    TOTAL MEM   740154       2.8 GB         ----
         FREE   524986         2 GB   70% of TOTAL MEM
         USED   215168     840.5 MB   29% of TOTAL MEM
       SHARED    57014     222.7 MB    7% of TOTAL MEM
      BUFFERS      517         2 MB    0% of TOTAL MEM
       CACHED   137675     537.8 MB   18% of TOTAL MEM
         SLAB    30150     117.8 MB    4% of TOTAL MEM

   TOTAL HUGE        0            0         ----
    HUGE FREE        0            0    0% of TOTAL HUGE

   TOTAL SWAP        0            0         ----
    SWAP USED        0            0    0% of TOTAL SWAP
    SWAP FREE        0            0    0% of TOTAL SWAP

 COMMIT LIMIT   370077       1.4 GB         ----
    COMMITTED    39836     155.6 MB   10% of TOTAL LIMIT

No swap is configured.
However, the VM was idle and still had 2 GB of free RAM when it crashed.
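The kmem -i figures above can be cross-checked by converting page counts to sizes. A minimal sketch, assuming 4 KiB base pages (the x86_64 default) and that crash truncates percentages to whole numbers, which matches the output above:

```python
PAGE_SIZE = 4096  # assumed 4 KiB base page size (x86_64 default)


def pages_to_gib(pages: int) -> float:
    """Convert a page count to GiB."""
    return pages * PAGE_SIZE / 2**30


def pages_to_mib(pages: int) -> float:
    """Convert a page count to MiB."""
    return pages * PAGE_SIZE / 2**20


def percent_of(part: int, whole: int) -> int:
    """Whole-number percentage, truncated like the kmem -i output."""
    return 100 * part // whole


# Page counts copied from the kmem -i output in comment 1.
TOTAL, FREE, USED = 740154, 524986, 215168

if __name__ == "__main__":
    print(f"total: {pages_to_gib(TOTAL):.2f} GiB")
    print(f"free:  {pages_to_gib(FREE):.2f} GiB ({percent_of(FREE, TOTAL)}%)")
    print(f"used:  {pages_to_mib(USED):.1f} MiB ({percent_of(USED, TOTAL)}%)")
```

This prints 2.82 GiB total, 2.00 GiB free (70%) and 840.5 MiB used (29%), agreeing with the dump.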

Comment 2 Kappa 2020-09-05 17:43:04 UTC

I added swap to some of the VMs right away.

Just now, however, two of the VMs hung and are consuming 200% CPU on the host.
I collected core dumps with virsh. The VMs are running CentOS 7.8.

1)
crash> bt
PID: 3424   TASK: ffff8db3dd9a9070  CPU: 1   COMMAND: "calico-node"
 #0 [ffff8db551507ef8] die at ffffffffadc30a68
 #1 [ffff8db551507f28] do_double_fault at ffffffffadc2d802
 #2 [ffff8db551507f50] double_fault at ffffffffae396298
    [exception RIP: do_async_page_fault+5]
    RIP: ffffffffae38cf85  RSP: ffff8db3f05df000  RFLAGS: 00010006
    RAX: ffff8db551506090  RBX: 0000000000000001  RCX: ffff8db551506290
    RDX: ffffffffadd1fae8  RSI: 0000000000000000  RDI: ffff8db3f05df008
    RBP: ffff8db3f05df0d8   R8: 0000000000030001   R9: 0000000000000000
    R10: ffffffffffffffff  R11: 000000000000b8ca  R12: ffffffffadd1fae8
    R13: fffffffffffffff8  R14: ffff8db3dd9a9070  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <DOUBLEFAULT exception stack> ---
 #3 [ffff8db3f05df000] do_async_page_fault at ffffffffae38cf85
bt: cannot transition from exception stack to current process stack:
    exception stack pointer: ffff8db551507ef8
      process stack pointer: ffff8db3f05df008
         current stack base: ffff8db3f066c000

crash> kmem -i
                 PAGES        TOTAL      PERCENTAGE
    TOTAL MEM  1298232         5 GB         ----
         FREE   799861       3.1 GB   61% of TOTAL MEM
         USED   498371       1.9 GB   38% of TOTAL MEM
       SHARED   111894     437.1 MB    8% of TOTAL MEM
      BUFFERS      518         2 MB    0% of TOTAL MEM
       CACHED   239668     936.2 MB   18% of TOTAL MEM
         SLAB    78199     305.5 MB    6% of TOTAL MEM

   TOTAL HUGE        0            0         ----
    HUGE FREE        0            0    0% of TOTAL HUGE

   TOTAL SWAP  3145727        12 GB         ----
    SWAP USED        0            0    0% of TOTAL SWAP
    SWAP FREE  3145727        12 GB  100% of TOTAL SWAP

 COMMIT LIMIT  3794843      14.5 GB         ----
    COMMITTED   967960       3.7 GB   25% of TOTAL LIMIT

2) Another node that is consuming 200% CPU:
PID: 0      TASK: ffffffff87818480  CPU: 0   COMMAND: "swapper/0"
    [exception RIP: native_queued_spin_lock_slowpath+29]
    RIP: ffffffff86d17edd  RSP: ffff9747d1403458  RFLAGS: 00000093
    RAX: 0000000000000001  RBX: 00000000ffffffff  RCX: 0000000000000001
    RDX: 0000000000000001  RSI: 0000000000000001  RDI: ffffffff87d038d0
    RBP: ffff9747d1403458   R8: ffffffff8766eeb8   R9: ffff9747d1403508
    R10: 000000003b9aca00  R11: 000002afc3e979c0  R12: 0000000000000000
    R13: 0000000000000000  R14: 0000000000000000  R15: 0000000000000010
    CS: 0010  SS: 0018
 #0 [ffff9747d1403460] queued_spin_lock_slowpath at ffffffff8737a024
 #1 [ffff9747d1403470] _raw_spin_lock at ffffffff873886d0
 #2 [ffff9747d1403480] vprintk_emit at ffffffff86c9e8e3
 #3 [ffff9747d14034f0] vprintk_default at ffffffff86c9ef79
 #4 [ffff9747d1403500] printk at ffffffff873796d8
 #5 [ffff9747d1403560] no_context at ffffffff86c75e42
 #6 [ffff9747d14035b0] __bad_area_nosemaphore at ffffffff86c76042
 #7 [ffff9747d1403600] bad_area_nosemaphore at ffffffff86c76164
 #8 [ffff9747d1403610] __do_page_fault at ffffffff8738d750
 #9 [ffff9747d1403680] trace_do_page_fault at ffffffff8738da26
#10 [ffff9747d14036c0] do_async_page_fault at ffffffff8738cfa2
#11 [ffff9747d14036e0] async_page_fault at ffffffff873897a8
    [exception RIP: unknown or invalid address]
    RIP: 00007fefbe826960  RSP: ffff9747d1403790  RFLAGS: 00010093
    RAX: 00007fefbe826960  RBX: ffffffff87c02fc0  RCX: 000000000000001f
    RDX: 0000000000000000  RSI: ffff9747d14037b8  RDI: ffffffff8766eeb8
    RBP: ffff9747d14037f0   R8: ffffffff8766eeb8   R9: ffff9747d1403898
    R10: 000000003b9aca00  R11: 000002afc3e979c0  R12: ffffffff87c033a0
    R13: ffff9747d1403898  R14: ffffffff8766eed7  R15: ffffffff8766eeb8
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#12 [ffff9747d14037f8] vscnprintf at ffffffff86f9296d
#13 [ffff9747d1403810] vprintk_emit at ffffffff86c9e91b
#14 [ffff9747d1403880] vprintk_default at ffffffff86c9ef79
#15 [ffff9747d1403890] printk at ffffffff873796d8
#16 [ffff9747d14038f0] no_context at ffffffff86c75e42
#17 [ffff9747d1403940] __bad_area_nosemaphore at ffffffff86c76042
#18 [ffff9747d1403990] bad_area_nosemaphore at ffffffff86c76164
#19 [ffff9747d14039a0] __do_page_fault at ffffffff8738d750
#20 [ffff9747d1403a10] trace_do_page_fault at ffffffff8738da26
#21 [ffff9747d1403a50] do_async_page_fault at ffffffff8738cfa2
#22 [ffff9747d1403a70] async_page_fault at ffffffff873897a8
    [exception RIP: unknown or invalid address]
    RIP: 00007fefbe826960  RSP: ffff9747d1403b20  RFLAGS: 00010093
    RAX: 00007fefbe826960  RBX: ffffffff87c02fc0  RCX: 000000000000001f
    RDX: 0000000000000000  RSI: ffff9747d1403b48  RDI: ffffffff8766eeb8
    RBP: ffff9747d1403b80   R8: ffffffff8766eeb8   R9: ffff9747d1403c28
    R10: 000000003b9aca00  R11: 000002afc3e979c0  R12: ffffffff87c033a0
    R13: ffff9747d1403c28  R14: ffffffff8766eed7  R15: ffffffff8766eeb8
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#23 [ffff9747d1403b88] vscnprintf at ffffffff86f9296d
#24 [ffff9747d1403ba0] vprintk_emit at ffffffff86c9e91b
#25 [ffff9747d1403c10] vprintk_default at ffffffff86c9ef79
#26 [ffff9747d1403c20] printk at ffffffff873796d8
#27 [ffff9747d1403c80] no_context at ffffffff86c75e42
#28 [ffff9747d1403cd0] __bad_area_nosemaphore at ffffffff86c76042
#29 [ffff9747d1403d20] bad_area_nosemaphore at ffffffff86c76164
#30 [ffff9747d1403d30] __do_page_fault at ffffffff8738d750
#31 [ffff9747d1403da0] trace_do_page_fault at ffffffff8738da26
#32 [ffff9747d1403de0] do_async_page_fault at ffffffff8738cfa2
#33 [ffff9747d1403e00] async_page_fault at ffffffff873897a8
    [exception RIP: unknown or invalid address]
    RIP: 0000000000000000  RSP: ffff9747d1403eb0  RFLAGS: 00010002
    RAX: 0000000000000000  RBX: ffff97466ecfda34  RCX: 0000000000000000
    RDX: 0000000000000010  RSI: 0000000000000000  RDI: ffff97466ecfd230
    RBP: ffff9747d1403ef8   R8: 0000000000000000   R9: 00000000000009de
    R10: 000000003b9aca00  R11: 000002afc3e979c0  R12: 0000000000000000
    R13: ffff97466ecfd230  R14: 0000000000000000  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#34 [ffff9747d1403eb0] try_to_wake_up at ffffffff86cdb617
#35 [ffff9747d1403f00] wake_up_process at ffffffff86cdb8e5
#36 [ffff9747d1403f10] hrtimer_wakeup at ffffffff86cca2f2
#37 [ffff9747d1403f20] __hrtimer_run_queues at ffffffff86ccaa8e
#38 [ffff9747d1403f78] hrtimer_interrupt at ffffffff86ccafef
#39 [ffff9747d1403fc0] local_apic_timer_interrupt at ffffffff86c5ccfb
#40 [ffff9747d1403fd8] smp_apic_timer_interrupt at ffffffff873979c3
#41 [ffff9747d1403ff0] apic_timer_interrupt at ffffffff87393efa
--- <IRQ stack> ---
#42 [ffffffff87803e00] apic_timer_interrupt at ffffffff87393efa
    RIP: ffffffff86d58cfd  RSP: 0000000000000000  RFLAGS: ffff9747d14161e0
    RAX: 7fffffffffffffff  RBX: 0000024c1723eb40  RCX: 0000024c15de15f9
    RDX: ffff9747d1411200  RSI: 7fffffffffffffff  RDI: ed63a856222aa83a
    RBP: ffff9747d1415f80   R8: ffffffff86d10ed0   R9: ffffffff87803ea0
    R10: ffffffff86cca972  R11: ffffffff87803e40  R12: 0000024c1723eb40
    R13: 0000000000000000  R14: 0000024c15d439c0  R15: ed63a856222aa83a
    ORIG_RAX: ffffffff87803ea8  CS: 24c15de15f9  SS: ffffffffffffffed
bt: WARNING: possibly bogus exception frame


crash> kmem -i
                 PAGES        TOTAL      PERCENTAGE
    TOTAL MEM  1298232         5 GB         ----
         FREE   572562       2.2 GB   44% of TOTAL MEM
         USED   725670       2.8 GB   55% of TOTAL MEM
       SHARED   135119     527.8 MB   10% of TOTAL MEM
      BUFFERS      518         2 MB    0% of TOTAL MEM
       CACHED   263000         1 GB   20% of TOTAL MEM
         SLAB   265527         1 GB   20% of TOTAL MEM

   TOTAL HUGE        0            0         ----
    HUGE FREE        0            0    0% of TOTAL HUGE

   TOTAL SWAP  3145727        12 GB         ----
    SWAP USED        0            0    0% of TOTAL SWAP
    SWAP FREE  3145727        12 GB  100% of TOTAL SWAP

 COMMIT LIMIT  3794843      14.5 GB         ----
    COMMITTED  1049288         4 GB   27% of TOTAL LIMIT
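The COMMIT LIMIT values in these dumps are consistent with the CommitLimit formula documented for /proc/meminfo: (RAM pages minus huge pages) times vm.overcommit_ratio / 100, plus swap pages. A minimal sketch, assuming the default overcommit_ratio of 50 and no huge pages (TOTAL HUGE is 0 in every dump). In all three dumps COMMITTED stays well below the limit; note that the kernel only enforces this limit when vm.overcommit_memory is 2.

```python
def commit_limit(total_pages: int, swap_pages: int,
                 overcommit_ratio: int = 50, huge_pages: int = 0) -> int:
    """CommitLimit in pages: (RAM - huge pages) * ratio / 100 + swap.

    overcommit_ratio=50 is the assumed default of vm.overcommit_ratio.
    """
    return (total_pages - huge_pages) * overcommit_ratio // 100 + swap_pages


# (TOTAL MEM pages, TOTAL SWAP pages, COMMITTED pages) from the kmem -i dumps.
DUMPS = {
    "comment 1": (740154, 0, 39836),
    "comment 2, dump 1": (1298232, 3145727, 967960),
    "comment 2, dump 2": (1298232, 3145727, 1049288),
}

if __name__ == "__main__":
    for name, (total, swap, committed) in DUMPS.items():
        limit = commit_limit(total, swap)
        print(f"{name}: limit {limit} pages, committed {100 * committed // limit}%")
```

The computed limits are 370077 and 3794843 pages, and the committed percentages are 10%, 25% and 27%, matching the COMMIT LIMIT and COMMITTED lines.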

Comment 3 Kappa 2020-09-05 21:21:16 UTC

The crash still occurs in a VM running an older kernel (CentOS 7.5).

Guest kernel version 3.10.0-1062.9.1.el7.x86_64 #1 SMP Fri Dec 6

[ 8850.226379] BUG: unable to handle kernel paging request at ffffdc8d8184f2c0
[ 8850.226990] IP: [<ffffffffa7c47345>] __check_object_size+0x195/0x250
[ 8850.227332] PGD 1d0f85067 PUD ffffffffa8600000
[ 8850.227332] Oops: 0000 [#1] SMP
[ 8850.227332] Modules linked in: xt_multiport ipt_rpfilter xt_set iptable_raw ip_set_hash_ip ip_set_hash_net ip_set ipip tunnel4 ip_tunnel veth ipt_REJECT nf_reject_ipv4 nf_conntrack_netlink nfnetlink xt_nat xt_statistic ip_vs_sh ip_vs_wrr ip_vs_rr ip6_tables iptable_mangle xt_comment xt_mark ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter xt_conntrack nf_nat ip_vs nf_conntrack overlay(T) sunrpc dm_mirror dm_region_hash dm_log dm_mod snd_hda_codec_generic crc32_pclmul ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device aesni_intel lrw gf128mul glue_helper ablk_helper snd_pcm sg iTCO_wdt joydev cryptd iTCO_vendor_support ppdev pcspkr i2c_i801 snd_timer snd virtio_rng lpc_ich soundcore
[ 8850.227332]  parport_pc parport binfmt_misc br_netfilter bridge stp llc ip_tables xfs libcrc32c sr_mod cdrom qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm virtio_console virtio_blk ahci drm libahci crct10dif_pclmul crct10dif_common libata crc32c_intel serio_raw virtio_net virtio_pci virtio_ring virtio drm_panel_orientation_quirks ptp_kvm ptp pps_core
[ 8850.227332] CPU: 1 PID: 11853 Comm: calico-node Kdump: loaded Tainted: G               ------------ T 3.10.0-1062.9.1.el7.x86_64 #1
[ 8850.227332] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
[ 8850.227332] task: ffff8a1047afa0e0 ti: ffff8a0f213c8000 task.ti: ffff8a0f213c8000
[ 8850.227332] RIP: 0010:[<ffffffffa7c47345>]  [<ffffffffa7c47345>] __check_object_size+0x195/0x250
[ 8850.227332] RSP: 0018:ffff8a0f213cbef0  EFLAGS: 00010286
[ 8850.227332] RAX: ffffdc8d8184f2c0 RBX: ffff8a0f213cbf20 RCX: 000000000000000c
[ 8850.227332] RDX: 000075f0c0000000 RSI: ffff8a1091787000 RDI: ffff8a0fa13cbf20
[ 8850.227332] RBP: ffff8a0f213cbf10 R08: 0000080c9a4eff5c R09: 0000000000002292
[ 8850.227332] R10: 000000000000003c R11: 0000000000000212 R12: 0000000000000010
[ 8850.227332] R13: 0000000000000000 R14: ffff8a0f213cbf30 R15: 0000000000000000
[ 8850.227332] FS:  00007f9ade33b700(0000) GS:ffff8a1091500000(0000) knlGS:0000000000000000
[ 8850.227332] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8850.227332] CR2: ffffdc8d8184f2c0 CR3: 000000007656e000 CR4: 0000000000340fe0
[ 8850.227332] Call Trace:
[ 8850.227332]  [<ffffffffa7acb195>] SyS_nanosleep+0x35/0xb0
[ 8850.227332]  [<ffffffffa818dede>] system_call_fastpath+0x25/0x2a
[ 8850.227332] Code: 8b 15 f0 0c 9d 00 48 01 d8 72 0e 48 c7 c2 00 00 00 80 48 2b 15 8d e2 9f 00 48 01 d0 48 c1 e8 0c 48 c1 e0 06 48 03 05 6b e2 9f 00 <48> 8b 10 80 e6 80 0f 85 95 00 00 00 48 89 c2 48 8b 02 a8 80 74
[ 8850.227332] RIP  [<ffffffffa7c47345>] __check_object_size+0x195/0x250
[ 8850.227332]  RSP <ffff8a0f213cbef0>
[ 8850.227332] CR2: ffffdc8d8184f2c0

Comment 4 Kappa 2020-09-05 21:37:21 UTC

I tried kernel 4.4 from ELRepo.

Guest VM kernel version 4.4.235-1.el7.elrepo.x86_64

The guest still crashed, but with a different error.
No debug kernel is provided for this version, so I cannot get a backtrace for this one.

[ 9205.717160] nscd[870]: segfault at 0 ip 00007fae3fb77eb3 sp 00007ffe4e8b17a0 error 6 in libc-2.17.so[7fae3fa79000+1c3000]
[ 9205.719072] general protection fault: 0000 [#1] SMP
[ 9205.719847] Modules linked in: xt_conntrack ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink xt_addrtype iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack overlay sunrpc dm_mirror dm_region_hash dm_log dm_mod iTCO_wdt iTCO_vendor_support crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm aesni_intel lrw gf128mul glue_helper ablk_helper cryptd input_leds pcspkr i2c_i801 snd_timer sg snd lpc_ich virtio_rng mfd_core virtio_balloon joydev soundcore 8250_fintek shpchp binfmt_misc ip_tables xfs libcrc32c sr_mod cdrom virtio_blk virtio_console virtio_net crc32c_intel ahci libahci serio_raw libata qxl drm_kms_helper syscopyarea sysfillrect sysimgblt
[ 9205.720015]  fb_sys_fops ttm virtio_pci virtio_ring virtio drm
[ 9205.720015] CPU: 0 PID: 4 Comm: kworker/0:0 Not tainted 4.4.235-1.el7.elrepo.x86_64 #1
[ 9205.720015] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
[ 9205.720015] Workqueue: events qxl_fb_work [qxl]
[ 9205.720015] task: ffff88018094c2c0 ti: ffff8801809dc000 task.ti: ffff8801809dc000
[ 9205.720015] RIP: 0010:[<ffffffff811f4d3b>]  [<ffffffff811f4d3b>] kmem_cache_alloc+0x7b/0x1e0
[ 9205.720015] RSP: 0018:ffff8801809dfa58  EFLAGS: 00010286
[ 9205.720015] RAX: 0000000000000000 RBX: 00000000024000c0 RCX: 000000000001cf81
[ 9205.720015] RDX: 000000000001cf80 RSI: 00000000024000c0 RDI: 0000000000019fa0
[ 9205.720015] RBP: ffff8801809dfa88 R08: ffff880186419fa0 R09: ffff88017f088400
[ 9205.720015] R10: ffff88017d9bc468 R11: ffff8801809dfa80 R12: ffff7228e8de8948
[ 9205.720015] R13: 00000000024000c0 R14: ffff880181885d00 R15: ffff880181885d00
[ 9205.720015] FS:  0000000000000000(0000) GS:ffff880186400000(0000) knlGS:0000000000000000
[ 9205.720015] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9205.720015] CR2: 0000000000000000 CR3: 000000017f48f000 CR4: 00000000003406f0
[ 9205.720015] Stack:
[ 9205.720015]  ffffffff81231315 ffff8801809dfae8 ffff880181815800 ffff880181815800
[ 9205.720015]  0000000000000000 ffff88017f088400 ffff8801809dfab0 ffffffff81231315
[ 9205.720015]  0000000000200000 ffff880181815800 0000000000021000 ffff8801809dfac0
[ 9205.720015] Call Trace:
[ 9205.720015]  [<ffffffff81231315>] ? __d_alloc+0x25/0x180
[ 9205.720015]  [<ffffffff81231315>] __d_alloc+0x25/0x180
[ 9205.720015]  [<ffffffff8123155e>] d_alloc_pseudo+0xe/0x10
[ 9205.720015]  [<ffffffff811b0420>] __shmem_file_setup.part.0+0x70/0x210
[ 9205.720015]  [<ffffffff811b07cb>] shmem_file_setup+0x2b/0x30
[ 9205.720015]  [<ffffffffa00083fb>] drm_gem_object_init+0x2b/0x40 [drm]
[ 9205.720015]  [<ffffffffa0062a8c>] qxl_bo_create+0x7c/0x190 [qxl]
[ 9205.720015]  [<ffffffffa006760e>] ? qxl_release_list_add+0x5e/0xa0 [qxl]
[ 9205.720015]  [<ffffffffa0064024>] qxl_alloc_bo_reserved+0x44/0xb0 [qxl]
[ 9205.720015]  [<ffffffffa0064ecc>] qxl_image_alloc_objects+0xac/0x140 [qxl]
[ 9205.720015]  [<ffffffffa006552d>] qxl_draw_opaque_fb+0xbd/0x3e0 [qxl]
[ 9205.720015]  [<ffffffff810b96d2>] ? account_entity_dequeue+0xb2/0xd0
[ 9205.720015]  [<ffffffffa0061d31>] qxl_fb_dirty_flush+0x181/0x230 [qxl]
[ 9205.720015]  [<ffffffff8172f820>] ? __schedule+0x270/0x840
[ 9205.720015]  [<ffffffffa0061df9>] qxl_fb_work+0x19/0x20 [qxl]
[ 9205.720015]  [<ffffffff8109cde4>] process_one_work+0x194/0x470
[ 9205.720015]  [<ffffffff8109d219>] worker_thread+0x159/0x510
[ 9205.720015]  [<ffffffff8109d0c0>] ? process_one_work+0x470/0x470
[ 9205.720015]  [<ffffffff810a3679>] kthread+0xd9/0xf0
[ 9205.720015]  [<ffffffff8172f84d>] ? __schedule+0x29d/0x840
[ 9205.720015]  [<ffffffff810a35a0>] ? kthread_create_on_node+0x1c0/0x1c0
[ 9205.720015]  [<ffffffff817348e2>] ret_from_fork+0x42/0x80
[ 9205.720015]  [<ffffffff810a35a0>] ? kthread_create_on_node+0x1c0/0x1c0
[ 9205.720015] Code: 08 65 4c 03 05 17 84 e1 7e 49 83 78 10 00 4d 8b 20 0f 84 28 01 00 00 4d 85 e4 0f 84 1f 01 00 00 49 63 46 20 49 8b 3e 48 8d 4a 01 <49> 8b 1c 04 4c 89 e0 65 48 0f c7 0f 0f 94 c0 84 c0 74 bb 49 63
[ 9205.720015] RIP  [<ffffffff811f4d3b>] kmem_cache_alloc+0x7b/0x1e0
[ 9205.720015]  RSP <ffff8801809dfa58>

Comment 5 Kappa 2020-09-05 22:41:30 UTC

Got another crash, with a different backtrace.

[  267.404272] BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
[  267.405200] IP: [<ffffffffad2db801>] try_to_wake_up+0x2c1/0x390
[  267.405200] PGD 6cd8c067 PUD 6cd92067 PMD 0
[  267.405200] Oops: 0000 [#1] SMP
[  267.405200] Modules linked in: xt_multiport ipt_rpfilter xt_set iptable_raw ip_set_hash_ip ip_set_hash_net ip_set ipip tunnel4 ip_tunnel veth ipt_REJECT nf_reject_ipv4 nf_conntrack_netlink nfnetlink xt_nat xt_statistic ip_vs_sh ip_vs_wrr ip_vs_rr ip6_tables iptable_mangle xt_comment xt_mark ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter xt_conntrack nf_nat ip_vs nf_conntrack overlay(T) sunrpc dm_mirror dm_region_hash dm_log dm_mod snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device iTCO_wdt iTCO_vendor_support ppdev snd_pcm sg pcspkr virtio_rng joydev snd_timer snd soundcore lpc_ich i2c_i801 i6300esb parport_pc parport binfmt_misc br_netfilter bridge stp llc ip_tables sr_mod cdrom
[  267.405200]  xfs libcrc32c qxl drm_kms_helper ahci syscopyarea sysfillrect sysimgblt fb_sys_fops libahci ttm serio_raw libata drm virtio_net virtio_blk virtio_console net_failover failover virtio_pci virtio_ring virtio drm_panel_orientation_quirks ptp_kvm ptp pps_core
[  267.405200] CPU: 0 PID: 4008 Comm: kube-apiserver Kdump: loaded Tainted: G               ------------ T 3.10.0-1127.19.1.el7.x86_64 #1
[  267.405200] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
[  267.405200] task: ffff94460a32c1c0 ti: ffff944608778000 task.ti: ffff944608778000
[  267.405200] RIP: 0010:[<ffffffffad2db801>]  [<ffffffffad2db801>] try_to_wake_up+0x2c1/0x390
[  267.405200] RSP: 0018:ffff94460877bd60  EFLAGS: 00010002
[  267.405200] RAX: ffff9445bd43c1c0 RBX: ffff9444b5d2eaa4 RCX: 0000000000000018
[  267.405200] RDX: ffff94461151b6e0 RSI: ffff9444b5d2e2c0 RDI: ffff9444b5d2e2c0
[  267.404228] ------------[ cut here ]------------
[  267.404228] WARNING: CPU: 1 PID: 0 at include/linux/uaccess.h:17 is_prefetch.isra.23+0x2a9/0x2e0
[  267.404228] Modules linked in: xt_multiport ipt_rpfilter xt_set iptable_raw ip_set_hash_ip ip_set_hash_net ip_set ipip tunnel4 ip_tunnel veth ipt_REJECT nf_reject_ipv4 nf_conntrack_netlink nfnetlink xt_nat xt_statistic ip_vs_sh ip_vs_wrr ip_vs_rr ip6_tables iptable_mangle xt_comment xt_mark ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter xt_conntrack nf_nat ip_vs nf_conntrack overlay(T) sunrpc dm_mirror dm_region_hash dm_log dm_mod snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device iTCO_wdt iTCO_vendor_support ppdev snd_pcm sg pcspkr virtio_rng joydev snd_timer snd soundcore lpc_ich i2c_i801 i6300esb parport_pc parport binfmt_misc br_netfilter bridge stp llc ip_tables sr_mod cdrom
[  267.404228]  xfs libcrc32c qxl drm_kms_helper ahci syscopyarea sysfillrect sysimgblt fb_sys_fops libahci ttm serio_raw libata drm virtio_net virtio_blk virtio_console net_failover failover virtio_pci virtio_ring virtio drm_panel_orientation_quirks ptp_kvm ptp pps_core
[  267.404228] CPU: 1 PID: 0 Comm: \x80\x97\x98\xad\xff\xff\xff\xff\x10 Kdump: loaded Tainted: G               ------------ T 3.10.0-1127.19.1.el7.x86_64 #1
[  267.404228] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
[  267.404228] Call Trace:
[  267.404228] ---[ end trace f55d3283bccbb6f4 ]---

Comment 6 Kappa 2020-09-06 08:48:22 UTC

One more crash in the VM

[ 5570.272849] BUG: unable to handle kernel paging request at 000000014f07b025
[ 5570.272849] IP: [<ffffffff83276cc7>] search_extable+0x27/0x50
[ 5570.272849] PGD 17b3b0067 PUD 0
[ 5570.272849] Oops: 0000 [#1] SMP
[ 5570.272849] Modules linked in: ipt_rpfilter xt_set iptable_raw ip_set_hash_net ip_set_hash_ip ip_set ipip tunnel4 ip_tunnel xt_nat xt_statistic ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs iptable_mangle xt_comment xt_mark xt_conntrack ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink xt_addrtype iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack br_netfilter bridge stp llc overlay(T) sunrpc dm_mirror dm_region_hash dm_log dm_mod snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm sg iTCO_wdt snd_timer iTCO_vendor_support pcspkr ppdev joydev snd soundcore i2c_i801 lpc_ich i6300esb virtio_rng virtio_balloon parport_pc parport binfmt_misc ip_tables xfs libcrc32c sr_mod cdrom qxl drm_kms_helper syscopyarea
[ 5570.272849]  sysfillrect sysimgblt fb_sys_fops ttm virtio_blk ahci libahci virtio_console drm virtio_net net_failover libata failover serio_raw drm_panel_orientation_quirks virtio_pci virtio_ring virtio ptp_kvm ptp pps_core
[ 5570.272849] CPU: 0 PID: 0 Comm: swapper/0 Kdump: loaded Tainted: G               ------------ T 3.10.0-1127.19.1.el7.x86_64 #1
[ 5570.272849] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
[ 5570.272849] task: ffffffff83e18480 ti: ffffffff83e00000 task.ti: ffffffff83e00000
[ 5570.272849] RIP: 0010:[<ffffffff83276cc7>]  [<ffffffff83276cc7>] search_extable+0x27/0x50
[ 5570.272849] RSP: 0018:ffff8e7e059fc808  EFLAGS: 00010056
[ 5570.272849] RAX: 000000014f07b025 RBX: ffffffffc06fc100 RCX: ffffffff8327ef01
[ 5570.272849] RDX: ffffffff83276cc7 RSI: 000000014f07b025 RDI: 000000014f07b025
[ 5570.272849] RBP: ffff8e7e059fc808 R08: 0000000000030001 R09: 00000000000015c3
[ 5570.272849] R10: 000000003b9aca00 R11: 000005749bbfe840 R12: ffffffff83276cc7
[ 5570.272849] R13: 000000014f07b025 R14: ffffffff83e18480 R15: 0000000000000000
[ 5570.272849] FS:  0000000000000000(0000) GS:ffff8e7e06400000(0000) knlGS:0000000000000000
[ 5570.272849] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 5570.272849] CR2: 000000014f07b025 CR3: 000000017b706000 CR4: 00000000000007f0
[ 5570.272849] Call Trace:
[ 5570.272849] Code: c3 0f 1f 00 66 66 66 66 90 55 48 39 f7 48 89 e5 76 0b eb 2d 48 8d 78 08 48 39 fe 72 24 48 89 f0 48 29 f8 48 c1 f8 04 48 8d 04 c7 <48> 63 08 48 01 c1 48 39 ca 77 de 73 0b 48 8d 70 f8 48 39 fe 73
[ 5570.272849] RIP  [<ffffffff83276cc7>] search_extable+0x27/0x50
[ 5570.272849]  RSP <ffff8e7e059fc808>
[ 5570.272849] CR2: 000000014f07b025

Comment 7 Kappa 2020-09-06 12:35:59 UTC

Upgraded the host kernel to 5.7.17-200.fc32.

One of the nodes was upgraded to kernel 5.8.6, but it still crashes:

[ 3492.933383] BUG: unable to handle page fault for address: 00007f5e545f1ca8
[ 3492.934146] #PF: supervisor read access in kernel mode
[ 3492.934678] #PF: error_code(0x0000) - not-present page
[ 3492.935232] PGD 11b3a0067 P4D 11b3a0067 PUD 0
[ 3492.935695] Oops: 0000 [#1] SMP NOPTI
[ 3492.936081] CPU: 1 PID: 0 Comm: swapper/1 Kdump: loaded Not tainted 5.8.6-2.el7.elrepo.x86_64 #1
[ 3492.937055] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
[ 3492.937947] RIP: 0010:rb_insert_color+0x14/0x120
[ 3492.938442] Code: c0 75 eb 4c 89 c0 c3 45 31 c0 eb f7 66 2e 0f 1f 84 00 00 00 00 00 48 8b 07 48 85 c0 0f 84 b0 00 00 00 48 8b 10 f6 c2 01 75 5b <48> 8b 4a 08 48 39 c1 74 53 48 85 c9 74 05 f6 01 01 74 72 48 8b 48
[ 3492.940378] RSP: 0018:ffffc90000073dc0 EFLAGS: 00010046
[ 3492.940924] RAX: ffffc90000c6fd18 RBX: ffff88818231f440 RCX: ffffc90000c6fd20
[ 3492.941483] RDX: 00007f5e545f1ca0 RSI: ffff88818231f460 RDI: ffff88818231f960
[ 3492.942139] RBP: ffffc90000073dd0 R08: ffff88818231f460 R09: 7fffffffffffffff
[ 3492.942709] R10: 0000032d40ffd98f R11: 00000000001f2e14 R12: 0000000000000000
[ 3492.943290] R13: 0000000000000001 R14: 000000000000000a R15: ffff88818231f440
[ 3492.943869] FS:  0000000000000000(0000) GS:ffff888182300000(0000) knlGS:0000000000000000
[ 3492.944518] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3492.944993] CR2: 00007f5e545f1ca8 CR3: 000000011b3da000 CR4: 00000000000006e0
[ 3492.945552] Call Trace:
[ 3492.945783]  ? timerqueue_add+0x68/0xb0
[ 3492.946122]  enqueue_hrtimer+0x3d/0x90
[ 3492.946433]  hrtimer_start_range_ns+0x196/0x310
[ 3492.946808]  ? hrtimer_try_to_cancel+0x2c/0x110
[ 3492.947193]  tick_nohz_restart_sched_tick+0xa9/0xc0
[ 3492.947585]  tick_nohz_idle_exit+0xac/0xd0
[ 3492.947927]  do_idle+0x156/0x270
[ 3492.948205]  cpu_startup_entry+0x20/0x30
[ 3492.948606]  start_secondary+0x159/0x1a0
[ 3492.949040]  secondary_startup_64+0xa4/0xb0
[ 3492.949469] Modules linked in: xt_multiport ipt_rpfilter xt_set iptable_raw ip_set_hash_net ip_set_hash_ip ip_set veth ipip tunnel4 ip_tunnel xt_nat xt_statistic ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs iptable_mangle ip6_tables xt_comment xt_mark xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 overlay sunrpc dm_mirror dm_region_hash dm_log dm_mod snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm i2c_i801 snd_timer snd soundcore iTCO_wdt iTCO_vendor_support lpc_ich sg i6300esb i2c_smbus pcspkr mfd_core input_leds virtio_rng virtio_balloon joydev qemu_fw_cfg binfmt_misc ip_tables xfs libcrc32c sr_mod cdrom serio_raw virtio_blk virtio_console virtio_net net_failover failover ahci libahci libata qxl drm_ttm_helper ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops virtio_pci drm virtio_ring virtio ptp_kvm
[ 3492.949509]  ptp pps_core
[ 3492.960266] CR2: 00007f5e545f1ca8

This VM does not have swap configured, but there was still free RAM just before it crashed.

08:15:01 PM   1740616   1103972     38.81      2068    733808   2769832     97.37    469208    466536       100
08:16:01 PM   1740680   1103908     38.81      2068    733824   2767844     97.30    469220    466552       208
08:17:01 PM   1740648   1103940     38.81      2068    733860   2767844     97.30    469260    466588       204
08:18:01 PM   1741320   1103268     38.78      2068    733872   2767844     97.30    469284    466596       100
08:19:01 PM   1741096   1103492     38.79      2068    733888   2767844     97.30    469320    466612       204
08:20:01 PM   1740064   1104524     38.83      2068    734004   2770600     97.40    469344    466728       104
08:21:01 PM   1740080   1104508     38.83      2068    734044   2768100     97.31    469344    466768       200

08:21:01 PM kbmemfree kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit  kbactive   kbinact   kbdirty
08:22:02 PM   1740080   1104508     38.83      2068    734060   2768100     97.31    469520    466720       200
Average:      1753835   1090753     38.34      2068    727251   2747707     96.59    463350    463245       158

08:22:51 PM       LINUX RESTART

08:23:01 PM kbmemfree kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit  kbactive   kbinact   kbdirty
08:24:01 PM   1888068    956520     33.63      2068    679436   1751044     61.56    380352    445008       200
08:25:01 PM   1792372   1052216     36.99      2068    713184   2862964    100.65    447760    449488       100
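The sar -r figures above can be cross-checked directly against each other. Below is a small Python sketch that recomputes the percentages. It assumes, as the numbers themselves indicate, that in this sysstat version kbmemfree + kbmemused equals total RAM, and that with no swap configured %commit is kbcommit relative to RAM alone. `recompute` is a hypothetical helper. Under those assumptions it reproduces the reported columns: about 39% of RAM was in use, while committed memory was about 97% of RAM, and 100.65% in the first sample after the restart.

```python
# Rows copied from the sar -r output above: 08:15:01, 08:22:02 and 08:25:01.
# Columns: kbmemfree kbmemused %memused kbbuffers kbcached kbcommit %commit
#          kbactive kbinact kbdirty
ROWS = [
    "1740616 1103972 38.81 2068 733808 2769832 97.37 469208 466536 100",
    "1740080 1104508 38.83 2068 734060 2768100 97.31 469520 466720 200",
    "1792372 1052216 36.99 2068 713184 2862964 100.65 447760 449488 100",
]


def recompute(row: str) -> tuple[float, float, float, float]:
    """Return (reported %memused, recomputed, reported %commit, recomputed)."""
    f = row.split()
    free, used, commit = int(f[0]), int(f[1]), int(f[5])
    # Assumption: total RAM = kbmemfree + kbmemused, and with no swap the
    # %commit denominator is RAM alone.
    total = free + used
    return float(f[2]), 100 * used / total, float(f[6]), 100 * commit / total


for row in ROWS:
    mem_rep, mem_calc, com_rep, com_calc = recompute(row)
    print(f"%memused {mem_rep} vs {mem_calc:.2f}   %commit {com_rep} vs {com_calc:.2f}")
```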

Comment 8 Kappa 2020-09-06 14:17:57 UTC

Another VM crash; this VM was running kernel 5.8.6.

[ 8727.211401] BUG: unable to handle page fault for address: ffff888182317bc0
[ 8727.212058] #PF: supervisor read access in kernel mode
[ 8727.212497] #PF: error_code(0x0009) - reserved bit violation
[ 8727.212954] PGD 2c01067 P4D 2c01067 PUD 2c04067 PMD 80000001822001e3
[ 8727.213478] Thread overran stack, or stack corrupted
[ 8727.213875] Oops: 0009 [#1] SMP NOPTI
[ 8727.214182] CPU: 1 PID: 0 Comm: swapper/1 Kdump: loaded Not tainted 5.8.6-2.el7.elrepo.x86_64 #1
[ 8727.214873] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
[ 8727.215585] RIP: 0010:exc_page_fault+0x1d/0x160
[ 8727.215967] Code: 70 ff cc cc cc cc cc cc cc cc cc cc cc 55 48 89 e5 41 57 41 56 49 89 f6 41 55 41 54 49 89 fc 53 0f 20 d0 66 66 66 90 49 89 c5 <65> 48 8b 04 25 c0 7b 01 00 48 8b 80 f8 07 00 00 0f 0d 48 78 e9 5a
[ 8727.217449] RSP: 0018:fffffe000003ad60 EFLAGS: 00010083
[ 8727.217900] RAX: ffff888182317bc0 RBX: 0000000000000000 RCX: ffffffff81a00fc7
[ 8727.218551] RDX: 0000000000000000 RSI: 0000000000000009 RDI: fffffe000003ad98
[ 8727.219133] RBP: fffffe000003ad88 R08: 0000000000000000 R09: 0000000000000000
[ 8727.219710] R10: 0000000000000000 R11: 0000000000000000 R12: fffffe000003ad98
[ 8727.220293] R13: ffff888182317bc0 R14: 0000000000000009 R15: 0000000000000000
[ 8727.220883] FS:  0000000000000000(0000) GS:ffff888182300000(0000) knlGS:0000000000000000
[ 8727.221532] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8727.221998] CR2: ffff888182317bc0 CR3: 000000011a7d4000 CR4: 00000000000006e0
[ 8727.222579] Call Trace:
[ 8727.222785]  <#DF>
[ 8727.222966]  asm_exc_page_fault+0x1e/0x30
[ 8727.223300] RIP: 0010:exc_page_fault+0x1d/0x160
[ 8727.223663] Code: 70 ff cc cc cc cc cc cc cc cc cc cc cc 55 48 89 e5 41 57 41 56 49 89 f6 41 55 41 54 49 89 fc 53 0f 20 d0 66 66 66 90 49 89 c5 <65> 48 8b 04 25 c0 7b 01 00 48 8b 80 f8 07 00 00 0f 0d 48 78 e9 5a
[ 8727.225238] RSP: 0018:fffffe000003ae40 EFLAGS: 00010093
[ 8727.225781] RAX: ffff888182317bc0 RBX: 0000000000000000 RCX: ffffffff81a00fc7
[ 8727.226503] RDX: 0000000000000000 RSI: 0000000000000009 RDI: fffffe000003ae78
[ 8727.227249] RBP: fffffe000003ae68 R08: 0000000000000000 R09: 0000000000000000
[ 8727.227985] R10: 0000000000000000 R11: 0000000000000000 R12: fffffe000003ae78
[ 8727.228591] R13: ffff888182317bc0 R14: 0000000000000009 R15: 0000000000000000
[ 8727.229176]  ? native_iret+0x7/0x7
[ 8727.229459]  asm_exc_page_fault+0x1e/0x30
[ 8727.229794] RIP: 0010:exc_double_fault+0x11/0x160
[ 8727.230191] Code: ea 4c 89 e6 48 c7 c7 11 5d 0d 82 e8 59 a7 6a ff eb 92 0f 1f 80 00 00 00 00 55 48 89 e5 41 56 41 55 49 89 f5 41 54 49 89 fc 53 <65> 48 8b 1c 25 c0 7b 01 00 0f 20 d0 66 66 66 90 49 89 c6 48 8b 87
[ 8727.232755] RSP: 0018:fffffe000003af28 EFLAGS: 00010086
[ 8727.233636] RAX: 0000000082300000 RBX: 0000000000000001 RCX: 00000000c0000101
[ 8727.234660] RDX: 00000000ffff8881 RSI: 0000000000000000 RDI: fffffe000003af58
[ 8727.235670] RBP: fffffe000003af48 R08: 0000000000000000 R09: 0000000000000000
[ 8727.236677] R10: 0000000000000000 R11: 0000000000000000 R12: fffffe000003af58
[ 8727.237687] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 8727.238685]  asm_exc_double_fault+0x1e/0x30
[ 8727.239457] RIP: 0010:error_entry+0xc/0xf0
[ 8727.240228] Code: 0c 65 48 0f b3 1c 25 96 b8 02 00 eb 05 49 0f ba ee 3f 41 0f 22 de e9 2a fd ff ff 0f 1f 00 fc 56 48 8b 74 24 08 48 89 7c 24 08 <52> 51 50 41 50 41 51 41 52 41 53 53 55 41 54 41 55 41 56 41 57 56
[ 8727.242711] RSP: 0018:fffffe000003a000 EFLAGS: 00010083
[ 8727.243847] RAX: ffff888182317bc0 RBX: 0000000000000000 RCX: ffffffff81a00fc7
[ 8727.245204] RDX: 0000000000000000 RSI: ffffffff81a00ac8 RDI: fffffe000003a078
[ 8727.246312] RBP: fffffe000003a068 R08: 0000000000000000 R09: 0000000000000000
[ 8727.247332] R10: 0000000000000000 R11: 0000000000000000 R12: fffffe000003a078
[ 8727.248335] R13: ffff888182317bc0 R14: 0000000000000009 R15: 0000000000000000
[ 8727.249354]  ? native_iret+0x7/0x7
[ 8727.250066]  ? asm_exc_page_fault+0x8/0x30
[ 8727.250834]  </#DF>
[ 8727.251448] WARNING: stack recursion on stack type 5
[ 8727.251448] Modules linked in: xt_multiport ipt_rpfilter xt_set iptable_raw ip_set_hash_ip ip_set_hash_net ip_set veth ipip tunnel4 ip_tunnel xt_nat xt_statistic ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs iptable_mangle xt_comment xt_mark xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 overlay sunrpc dm_mirror dm_region_hash dm_log dm_mod snd_hda_codec_generic ledtrig_audio iTCO_wdt iTCO_vendor_support snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm pcspkr input_leds sg snd_timer snd soundcore i2c_i801 lpc_ich mfd_core i2c_smbus virtio_rng i6300esb virtio_balloon joydev qemu_fw_cfg binfmt_misc ip_tables xfs libcrc32c sr_mod cdrom virtio_blk virtio_console ahci virtio_net net_failover failover libahci serio_raw libata qxl drm_ttm_helper ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm virtio_pci virtio_ring virtio ptp_kvm ptp
[ 8727.251496]  pps_core
[ 8727.264222] CR2: ffff888182317bc0
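Each oops in this report prints two #PF lines that are decoded from the x86 page-fault error code. The Python sketch below reproduces that decoding. It assumes the architectural bit layout (bit 0 present/protection, bit 1 write, bit 2 user, bit 3 reserved bit, bit 4 instruction fetch, bit 5 protection key) and is modelled on, not copied from, the message logic in the kernel's arch/x86/mm/fault.c. `describe_pf` is a hypothetical helper. The trailing "in kernel mode" part of the message comes from the saved registers, not from the error code, so it is not reproduced. For the codes seen here (0x0009, 0x0010, 0x0019, 0x0000), the output matches the #PF lines in the logs.

```python
# Bit positions of the x86 page-fault error code.
PF_PROT = 1 << 0   # 0 = page not present, 1 = protection/reserved-bit fault
PF_WRITE = 1 << 1  # access was a write
PF_USER = 1 << 2   # access came from user mode
PF_RSVD = 1 << 3   # reserved bit set in a paging-structure entry
PF_INSTR = 1 << 4  # fault was an instruction fetch
PF_PK = 1 << 5     # protection-keys violation


def describe_pf(error_code: int) -> tuple[str, str]:
    """Return (access, cause) roughly as the oops prints them on its #PF lines."""
    who = "user" if error_code & PF_USER else "supervisor"
    if error_code & PF_INSTR:
        what = "instruction fetch"
    elif error_code & PF_WRITE:
        what = "write access"
    else:
        what = "read access"
    if not error_code & PF_PROT:
        cause = "not-present page"
    elif error_code & PF_RSVD:
        cause = "reserved bit violation"
    elif error_code & PF_PK:
        cause = "protection keys violation"
    else:
        cause = "permissions violation"
    return f"{who} {what}", cause


# Error codes that appear in the oopses in this report.
for code in (0x0009, 0x0010, 0x0019, 0x0000):
    access, cause = describe_pf(code)
    print(f"error_code(0x{code:04x}): {access}, {cause}")
```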

Comment 9 Kappa 2020-09-06 16:30:47 UTC

Another crash.
I added a VM running Fedora 32, kernel 5.8.4-200.fc32.x86_64.
I have no idea which part is causing the problem :(

[ 2263.148807] BUG: kernel NULL pointer dereference, address: 000000000000007d
[ 2263.149716] #PF: supervisor instruction fetch in kernel mode
[ 2263.150648] #PF: error_code(0x0010) - not-present page
[ 2263.151302] PGD 597ef067 P4D 597ef067 PUD 597ee067 PMD 0
[ 2263.151989] Oops: 0010 [#1] SMP NOPTI
[ 2263.152456] CPU: 1 PID: 0 Comm: swapper/1 Kdump: loaded Not tainted 5.8.4-200.fc32.x86_64 #1
[ 2263.153241] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
[ 2263.153945] RIP: 0010:0x7d
[ 2263.154214] Code: Bad RIP value.
[ 2263.154549] RSP: 0018:ffffbbf6000a8f18 EFLAGS: 00010002
[ 2263.155094] RAX: 0000000000000000 RBX: ffff9bfc84b1d580 RCX: 0000000000000007
[ 2263.155824] RDX: 0000000000000001 RSI: 0000000000000002 RDI: ffffbbf600c1bcc0
[ 2263.156528] RBP: ffff9bfc84b1d540 R08: 0000000000000191 R09: 0000000000000078
[ 2263.157170] R10: 0000000000000000 R11: 0000000000000000 R12: 000000000000007d
[ 2263.157778] R13: ffff9bfc84b1d540 R14: ffff9bfc84b1d638 R15: ffffbbf600c1bcc0
[ 2263.158429] FS:  0000000000000000(0000) GS:ffff9bfc84b00000(0000) knlGS:0000000000000000
[ 2263.159259] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2263.159865] CR2: 000000000000007d CR3: 00000000597e2000 CR4: 0000000000340ee0
[ 2263.160613] Call Trace:
[ 2263.160856]  <IRQ>
[ 2263.161063]  ? __hrtimer_run_queues+0x118/0x280
[ 2263.161547]  ? hrtimer_interrupt+0x10e/0x280
[ 2263.162001]  ? __sysvec_apic_timer_interrupt+0x61/0x100
[ 2263.162549]  ? asm_call_on_stack+0x12/0x20
[ 2263.162980]  </IRQ>
[ 2263.163210]  ? sysvec_apic_timer_interrupt+0x6f/0x90
[ 2263.163616]  ? asm_sysvec_apic_timer_interrupt+0x12/0x20
[ 2263.164076]  ? __sched_text_end+0x3/0x3
[ 2263.164472]  ? native_safe_halt+0xe/0x10
[ 2263.164810]  ? default_idle+0x1a/0x140
[ 2263.165150]  ? do_idle+0x1f3/0x2a0
[ 2263.165453]  ? cpu_startup_entry+0x19/0x20
[ 2263.165791]  ? start_secondary+0x144/0x170
[ 2263.166143]  ? secondary_startup_64+0xb6/0xc0
[ 2263.166542] Modules linked in: xt_multiport xt_set iptable_filter ipt_rpfilter iptable_mangle iptable_nat iptable_raw ip_set_hash_net ip_set_hash_ip ip_set veth ipip tunnel4 ip_tunnel nf_conntrack_netlink xt_addrtype xt_nat xt_statistic ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs xt_MASQUERADE xt_conntrack xt_comment nft_counter xt_mark nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink rfkill iTCO_wdt intel_pmc_bxt iTCO_vendor_support crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl i2c_i801 joydev i2c_smbus snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hda_core virtio_balloon snd_hwdep pvpanic lpc_ich i6300esb snd_pcm snd_timer snd soundcore sunrpc br_netfilter bridge stp llc overlay ip_tables xfs qxl drm_ttm_helper ttm drm_kms_helper cec drm crc32c_intel serio_raw virtio_blk virtio_console xhci_pci xhci_pci_renesas virtio_net net_failover failover qemu_fw_cfg ptp_kvm fuse
[ 2263.173371] CR2: 000000000000007d

Comment 10 Kappa 2020-09-06 19:50:52 UTC

Created attachment 1713892 [details]
virsh dump config for the fc32 vm in comment #9

virsh dumpxml config for the FC32 VM in comment #9.

The other VM (CentOS 7.8) uses similar settings.

Comment 11 Kappa 2020-09-06 20:05:32 UTC

Similar to comment #8.

These are two different VMs; neither has swap configured.

1)
[10499.651778] BUG: unable to handle page fault for address: ffffffff91c00ac0
[10499.652391] #PF: supervisor instruction fetch in kernel mode
[10499.652846] #PF: error_code(0x0010) - not-present page
[10499.653256] PGD 240e067 P4D 240e067 PUD 240f063 PMD 0
[10499.653665] Thread overran stack, or stack corrupted
[10499.654063] Oops: 0010 [#1] SMP NOPTI
[10499.654379] CPU: 0 PID: 0 Comm: swapper/0 Kdump: loaded Not tainted 5.8.6-2.el7.elrepo.x86_64 #1
[10499.655119] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
[10499.655814] RIP: 0010:0xffffffff91c00ac0
[10499.656148] Code: Bad RIP value.
[10499.656431] WARNING: kernel stack frame pointer at 00000000984c746a in swapper/0:0 has bad value 0000000000000000
[10499.656432] unwind stack type:0 next_sp:0000000000000000 mask:0x20 graph_idx:0
[10499.656432] 00000000984c746a: 0000000000000000 ...
[10499.656470] BUG: kernel NULL pointer dereference, address: 0000000000000000
[10499.656471] #PF: supervisor instruction fetch in kernel mode
[10499.656471] #PF: error_code(0x0010) - not-present page
[10499.656471] PGD 0 P4D 0
[10499.656472] Thread overran stack, or stack corrupted
[10499.656472] Oops: 0010 [#2] SMP NOPTI
[10499.656472] CPU: 0 PID: 0 Comm: swapper/0 Kdump: loaded Not tainted 5.8.6-2.el7.elrepo.x86_64 #1
[10499.656473] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
[10499.656473] RIP: 0010:0x0
[10499.656473] Code: Bad RIP value.
[10499.656473] RSP: 0018:fffffe00000097b8 EFLAGS: 00010092
[10499.656474] RAX: 0000000000001000 RBX: 0000000000000000 RCX: 0000000000000008
[10499.656474] RDX: ffffc900005d6460 RSI: ffff88817f5409c2 RDI: 0000000000aaaaaa
[10499.656474] RBP: 0000000000000000 R08: 0000000000000000 R09: ffffc900005d6460
[10499.656474] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
[10499.656474] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[10499.656475] FS:  0000000000000000(0000) GS:ffff888182200000(0000) knlGS:0000000000000000
[10499.656475] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[10499.656477] CR2: ffffffffffffffd6 CR3: 000000017adf0000 CR4: 00000000000006f0
[10499.656477] Call Trace:
[10499.656477]  <#DF>
[10499.656478]  </#DF>
[10499.656478] Modules linked in: xt_multiport ipt_REJECT nf_reject_ipv4 ipt_rpfilter xt_set iptable_raw ip_set_hash_net ip_set_hash_ip ip_set veth ipip tunnel4 ip_tunnel xt_nat xt_statistic ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs ip6_tables iptable_mangle xt_comment xt_mark xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 overlay sunrpc dm_mirror dm_region_hash dm_log dm_mod snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm pcspkr iTCO_wdt iTCO_vendor_support snd_timer snd i6300esb soundcore virtio_balloon input_leds i2c_i801 i2c_smbus virtio_rng joydev sg lpc_ich mfd_core qemu_fw_cfg binfmt_misc ip_tables xfs libcrc32c sr_mod cdrom virtio_blk virtio_console virtio_net net_failover failover ahci libahci serio_raw libata qxl drm_ttm_helper ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm virtio_pci
[10499.656487]  virtio_ring virtio ptp_kvm ptp pps_core
[10499.656487] CR2: 0000000000000000


2)
[ 2791.577029] BUG: unable to handle page fault for address: ffffffff91c00ac0
[ 2791.577864] #PF: supervisor instruction fetch in kernel mode
[ 2791.578548] #PF: error_code(0x0010) - not-present page
[ 2791.579092] PGD 240e067 P4D 240e067 PUD 240f063 PMD 0
[ 2791.579508] Thread overran stack, or stack corrupted
[ 2791.579922] Oops: 0010 [#1] SMP NOPTI
[ 2791.580236] CPU: 1 PID: 0 Comm: swapper/1 Kdump: loaded Not tainted 5.8.6-2.el7.elrepo.x86_64 #1
[ 2791.580924] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
[ 2791.581607] RIP: 0010:0xffffffff91c00ac0
[ 2791.581926] Code: Bad RIP value.
[ 2791.582198] RSP: 0018:fffffe000003ad00 EFLAGS: 00010093
[ 2791.582614] RAX: 0000000091c00fe7 RBX: 0000000000000000 RCX: ffffffff91c00fe7
[ 2791.583247] RDX: 0000000000000000 RSI: 0000000000000010 RDI: fffffe000003a798
[ 2791.583812] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[ 2791.584433] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 2791.584998] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 2791.585600] FS:  0000000000000000(0000) GS:ffff888182300000(0000) knlGS:0000000000000000
[ 2791.586254] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2791.586757] CR2: ffffffff91c00a96 CR3: 00000001128e4000 CR4: 00000000000006e0
[ 2791.587344] Call Trace:
[ 2791.587549]  <#DF>
[ 2791.587722]  </#DF>
[ 2791.587922] WARNING: stack recursion on stack type 5
[ 2791.587923] Modules linked in: ipt_REJECT nf_reject_ipv4 ipt_rpfilter xt_multiport xt_set iptable_raw ip_set_hash_ip ip_set_hash_net ip_set ipip tunnel4 ip_tunnel veth xt_nat xt_statistic ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs ip6_tables iptable_mangle xt_comment xt_mark xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 overlay sunrpc dm_mirror dm_region_hash dm_log dm_mod snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hda_core snd_hwdep snd_seq iTCO_wdt iTCO_vendor_support snd_seq_device snd_pcm i2c_i801 pcspkr snd_timer snd soundcore i2c_smbus input_leds sg lpc_ich mfd_core i6300esb virtio_rng virtio_balloon qemu_fw_cfg joydev binfmt_misc ip_tables xfs libcrc32c sr_mod cdrom virtio_blk virtio_console virtio_net net_failover failover ahci libahci serio_raw libata qxl drm_ttm_helper virtio_pci ttm virtio_ring virtio drm_kms_helper syscopyarea sysfillrect
[ 2791.587956]  sysimgblt fb_sys_fops drm ptp_kvm ptp pps_core
[ 2791.595729] CR2: ffffffff91c00ac0

Comment 12 Kappa 2020-09-06 20:21:55 UTC

These crashes are from the FC32 VM (the same VM as in comment #9).
This VM does not have swap configured.

1)
[  724.402477] unable to execute userspace code (SMEP?) (uid: 0)
[  724.403082] BUG: unable to handle page fault for address: ffffffff8ec00ac0
[  724.403678] #PF: supervisor instruction fetch in kernel mode
[  724.404126] #PF: error_code(0x0019) - reserved bit violation
[  724.404586] PGD 13fa0f067 P4D 13fa0f067 PUD 13fa10063 PMD 13ec001e1
[  724.405087] Thread overran stack, or stack corrupted
[  724.405496] Oops: 0019 [#1] SMP NOPTI
[  724.405797] CPU: 1 PID: 0 Comm: swapper/1 Kdump: loaded Not tainted 5.8.4-200.fc32.x86_64 #1
[  724.406481] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
[  724.407204] RIP: 0010:asm_exc_page_fault+0x0/0x30
[  724.407605] Code: 24 28 ff 74 24 28 ff 74 24 28 ff 74 24 28 e8 d7 07 00 00 48 89 e7 e8 bf 09 f6 ff e9 ba 08 00 00 66 2e 0f 1f 84 00 00 00 00 00 <0f> 01 ca e8 b8 07 00 00 48 89 e7 48 8b 74 24 78 48 c7 44 24 78 ff
[  724.409157] RSP: 0018:fffffe000003af40 EFLAGS: 00010046
[  724.409588] RAX: ffffffff8eb71700 RBX: 0000000000000001 RCX: 7fffffffffffffff
[  724.410147] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff943804b1daa0
[  724.410709] RBP: 0000000000000001 R08: 000000cd42e4dffb R09: 0000000000000201
[  724.411268] R10: 000000000000036c R11: 0000000000000000 R12: 0000000000000000
[  724.411830] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[  724.412392] FS:  0000000000000000(0000) GS:ffff943804b00000(0000) knlGS:0000000000000000
[  724.413031] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  724.413485] CR2: ffffffff8ec00ac0 CR3: 000000013d544000 CR4: 0000000000340ee0
[  724.414049] Call Trace:
[  724.414263]  <#DF>
[  724.414431] RIP: 0010:asm_exc_page_fault+0x0/0x30
[  724.414813] Code: 24 28 ff 74 24 28 ff 74 24 28 ff 74 24 28 e8 d7 07 00 00 48 89 e7 e8 bf 09 f6 ff e9 ba 08 00 00 66 2e 0f 1f 84 00 00 00 00 00 <0f> 01 ca e8 b8 07 00 00 48 89 e7 48 8b 74 24 78 48 c7 44 24 78 ff
[  724.416276] RSP: 0018:fffffe000003af70 EFLAGS: 00010046
[  724.416281] RIP: 0010:asm_exc_page_fault+0x0/0x30
[  724.417096] Code: 24 28 ff 74 24 28 ff 74 24 28 ff 74 24 28 e8 d7 07 00 00 48 89 e7 e8 bf 09 f6 ff e9 ba 08 00 00 66 2e 0f 1f 84 00 00 00 00 00 <0f> 01 ca e8 b8 07 00 00 48 89 e7 48 8b 74 24 78 48 c7 44 24 78 ff
[  724.418595] RSP: 0018:fffffe000003afa0 EFLAGS: 00010046
[  724.418597] RIP: 0010:asm_exc_double_fault+0x0/0x30
[  724.419397] Code: e8 55 0c f6 ff e9 00 08 00 00 0f 01 ca 6a ff e8 06 07 00 00 48 89 e7 e8 2e ff f5 ff e9 e9 07 00 00 66 0f 1f 84 00 00 00 00 00 <0f> 01 ca e8 e8 05 00 00 48 89 e7 48 8b 74 24 78 48 c7 44 24 78 ff
[  724.420856] RSP: 0018:fffffe000003afd0 EFLAGS: 00010046
[  724.420857] WARNING: stack going in the wrong direction? at asm_xenpv_exc_debug+0x20/0x20
[  724.420864]  ? asm_exc_int3+0x40/0x40
[  724.422210]  </#DF>
[  724.422383] WARNING: stack recursion on stack type 5
[  724.422384] Modules linked in: ipt_rpfilter xt_multiport xt_set iptable_filter iptable_mangle iptable_nat iptable_raw ip_set_hash_ip ip_set_hash_net ip_set veth ipip tunnel4 ip_tunnel nf_conntrack_netlink xt_addrtype xt_nat xt_statistic ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs xt_MASQUERADE xt_conntrack xt_comment nft_counter xt_mark nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink rfkill crct10dif_pclmul crc32_pclmul ghash_clmulni_intel iTCO_wdt intel_pmc_bxt sunrpc iTCO_vendor_support i2c_i801 i2c_smbus lpc_ich virtio_balloon joydev i6300esb pvpanic br_netfilter bridge stp llc overlay ip_tables xfs crc32c_intel qxl drm_ttm_helper ttm drm_kms_helper serio_raw cec virtio_console drm virtio_blk xhci_pci xhci_pci_renesas virtio_net net_failover failover qemu_fw_cfg ptp_kvm fuse
[  724.430263] CR2: ffffffff8ec00ac0

2)
[ 2294.420100] general protection fault, probably for non-canonical address 0xebaa27cde9932d16: 0000 [#1] SMP NOPTI
[ 2294.421608] CPU: 0 PID: 0 Comm: swapper/0 Kdump: loaded Not tainted 5.8.4-200.fc32.x86_64 #1
[ 2294.422572] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
[ 2294.423573] RIP: 0010:__x86_indirect_thunk_rax+0x3/0x5
[ 2294.424112] Code: c0 e9 f1 dc d5 ff 31 c0 e9 f3 dc d5 ff cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 0f ae e8 <ff> e0 e8 07 00 00 00 f3 90 0f ae e8 eb f9 48 89 04 24 c3 66 2e 0f
[ 2294.425599] RSP: 0018:ffffb9c480003f68 EFLAGS: 00010202
[ 2294.426021] RAX: ebaa27cde9932d16 RBX: ffff9184fbc61198 RCX: 0000000000000004
[ 2294.426671] RDX: ffffb9c480003f70 RSI: ffff9184fbc61198 RDI: ffff9184fbc61140
[ 2294.427234] RBP: ffff9184fbc61140 R08: ffffb9c480003f70 R09: ffff918504a28290
[ 2294.427803] R10: 0000000000000000 R11: 0000000000000000 R12: ffffb9c480003f70
[ 2294.428382] R13: 0000000000000100 R14: 0000000000000004 R15: 0000000000000010
[ 2294.428978] FS:  0000000000000000(0000) GS:ffff918504a00000(0000) knlGS:0000000000000000
[ 2294.429675] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2294.430177] CR2: 00007fdc763a5000 CR3: 00000001411be000 CR4: 0000000000340ef0
[ 2294.430770] Call Trace:
[ 2294.430999]  <IRQ>
[ 2294.431175]  blk_done_softirq+0x91/0xb0
[ 2294.431503]  __do_softirq+0xd9/0x2c4
[ 2294.431793]  asm_call_on_stack+0x12/0x20
[ 2294.432132]  </IRQ>
[ 2294.432313]  do_softirq_own_stack+0x39/0x50
[ 2294.432665]  irq_exit_rcu+0xc2/0x100
[ 2294.433007]  sysvec_call_function_single+0x34/0x90
[ 2294.433411]  asm_sysvec_call_function_single+0x12/0x20
[ 2294.433822] RIP: 0010:native_safe_halt+0xe/0x10
[ 2294.434187] Code: 02 20 48 8b 00 a8 08 75 c4 e9 7b ff ff ff cc cc cc cc cc cc cc cc cc cc cc cc cc cc e9 07 00 00 00 0f 00 2d 36 70 49 00 fb f4 <c3> 90 e9 07 00 00 00 0f 00 2d 26 70 49 00 f4 c3 cc cc 0f 1f 44 00
[ 2294.435693] RSP: 0018:ffffffff95a03ea0 EFLAGS: 00000246
[ 2294.436169] RAX: ffffffff94b71700 RBX: 0000000000000000 RCX: 0000000000000001
[ 2294.436763] RDX: 0000000000000000 RSI: 0000000000000083 RDI: 0000000000000000
[ 2294.437339] RBP: 0000000000000000 R08: 000006c7fe924906 R09: 0000000000000000
[ 2294.437943] R10: 00000000000303ce R11: 0000000000000000 R12: 0000000000000000
[ 2294.438551] R13: 0000000000000000 R14: 0000000000000101 R15: 0000000000000000
[ 2294.439143]  ? __sched_text_end+0x3/0x3
[ 2294.439485]  default_idle+0x1a/0x140
[ 2294.439784]  do_idle+0x1f3/0x2a0
[ 2294.440070]  cpu_startup_entry+0x19/0x20
[ 2294.440402]  start_kernel+0x7f4/0x804
[ 2294.440739]  ? x86_family+0x5/0x20
[ 2294.441029]  secondary_startup_64+0xb6/0xc0
[ 2294.441400] Modules linked in: ipt_REJECT nf_reject_ipv4 ipt_rpfilter xt_multiport xt_set iptable_filter iptable_mangle iptable_raw iptable_nat ip_set_hash_ip ip_set_hash_net ip_set veth ipip tunnel4 ip_tunnel nf_conntrack_netlink xt_addrtype xt_nat xt_statistic ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs xt_MASQUERADE xt_conntrack xt_comment nft_counter xt_mark nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink rfkill crct10dif_pclmul crc32_pclmul ghash_clmulni_intel sunrpc iTCO_wdt intel_pmc_bxt iTCO_vendor_support i2c_i801 i2c_smbus lpc_ich virtio_balloon joydev i6300esb pvpanic br_netfilter bridge stp llc overlay ip_tables xfs qxl drm_ttm_helper ttm drm_kms_helper cec drm crc32c_intel serio_raw virtio_console virtio_blk xhci_pci xhci_pci_renesas virtio_net net_failover failover qemu_fw_cfg ptp_kvm fuse
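Case 2 above is reported as a "general protection fault, probably for non-canonical address 0xebaa27cde9932d16", and RAX holds exactly that value in the register dump. A minimal Python check of what "non-canonical" means, assuming 4-level paging with 48-bit virtual addresses by default (`va_bits=57` covers 5-level paging). `is_canonical` is a hypothetical helper. An address is canonical when its upper bits are a sign extension of the highest implemented address bit; the GPF address fails this under either paging mode, while the kernel and user addresses seen elsewhere in these logs pass.

```python
def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """True if addr is canonical for a CPU with va_bits-bit virtual addresses.

    Canonical means bits 63..(va_bits - 1) are all 0 or all 1, i.e. the
    upper bits are a sign extension of bit va_bits - 1.
    """
    addr &= (1 << 64) - 1
    upper = addr >> (va_bits - 1)          # the top 64 - va_bits + 1 bits
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return upper == 0 or upper == all_ones


# The GPF address from case 2, a kernel text address and a user address
# taken from the logs in this report.
for a in (0xEBAA27CDE9932D16, 0xFFFFFFFF91C00AC0, 0x00007F5E545F1CA8):
    print(hex(a), is_canonical(a), is_canonical(a, va_bits=57))
```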

3)
[ 2292.163247] BUG: unable to handle page fault for address: ffff9b02ff7be870
[ 2292.164364] BUG: unable to handle page fault for address: ffff9b02ff625f38
[ 2292.164368] #PF: supervisor read access in kernel mode
[ 2292.164369] #PF: error_code(0x0000) - not-present page
[ 2292.164369] PGD 11a801067 P4D 11a801067 PUD 144154063 PMD 13f605063
[ 2292.164370] BAD
[ 2292.164371] Oops: 0000 [#1] SMP NOPTI
[ 2292.164371] CPU: 1 PID: 0 Comm: swapper/1 Kdump: loaded Not tainted 5.8.4-200.fc32.x86_64 #1
[ 2292.164372] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
[ 2292.164372] RIP: 0010:insert_work+0x9a/0xc0
[ 2292.164373] Code: 87 00 03 00 00 85 c0 74 0b 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 57 38 49 8d 47 38 48 39 c2 74 e8 49 8b 47 38 48 85 c0 74 df <48> 8b 78 38 5b 5d 41 5c 41 5d 41 5e 41 5f e9 c3 6f 01 00 0f 0b eb
[ 2292.164373] RSP: 0018:ffffaa00800a8a30 EFLAGS: 00010086
[ 2292.164374] RAX: ffff9b02ff625f00 RBX: ffff9b02fd9794d0 RCX: ffff9b0304b2f705
[ 2292.164375] RDX: ffff9b02ff625f00 RSI: ffff9b0304b29c60 RDI: ffff9b02fd9794d8
[ 2292.164375] RBP: ffff9b0304b2f700 R08: ffff9b0304b29c60 R09: ffff9b0304b29c60
[ 2292.164376] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9b0304b29c60
[ 2292.164376] R13: ffff9b0304b29c60 R14: ffff9b02fd9794d8 R15: ffff9b0304b29c40
[ 2292.164376] FS:  0000000000000000(0000) GS:ffff9b0304b00000(0000) knlGS:0000000000000000
[ 2292.164377] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2292.164377] CR2: ffff9b02ff605128 CR3: 0000000140f32000 CR4: 0000000000340ee0
[ 2292.164378] Call Trace:
[ 2292.164378]  <IRQ>
[ 2292.164378]  __queue_work+0x1e0/0x410
[ 2292.164378]  queue_work_on+0x36/0x40
[ 2292.164379]  soft_cursor+0x1a7/0x230
[ 2292.164379]  bit_cursor+0x3b4/0x5a0
[ 2292.164379]  ? cursor_timer_handler+0x1/0x50
[ 2292.164380]  ? fbcon_cursor+0xfb/0x180
[ 2292.164380]  ? bit_putcs+0x510/0x510
[ 2292.164380]  hide_cursor+0x2a/0x90
[ 2292.164381]  vt_console_print+0x3c2/0x3d0
[ 2292.164381]  console_unlock+0x39d/0x590
[ 2292.164381]  vprintk_emit+0x164/0x280
[ 2292.164382]  printk+0x48/0x4a
[ 2292.164385]  ? psi_task_change+0x91/0xc0
[ 2292.164386]  no_context.cold+0x1c/0x21b
[ 2292.164386]  ? __netif_receive_skb_list_core+0x253/0x2b0
[ 2292.164386]  exc_page_fault+0xe9/0x1a0
[ 2292.164387]  asm_exc_page_fault+0x1e/0x30
[ 2292.164387] RIP: 0010:psi_task_change+0x90/0xc0
[ 2292.164388] Code: df 48 81 c7 b0 03 00 00 74 ad 45 89 f8 44 89 e1 89 ea 44 89 f6 41 83 e0 01 e8 2c fb ff ff 48 85 db 75 bc 49 8b 85 f8 0c 00 00 <48> 8b 58 70 48 85 db 75 c1 48 c7 c3 c0 da a5 89 48 89 df eb cb 41
[ 2292.164388] RSP: 0018:ffffaa00800a8e90 EFLAGS: 00010046
[ 2292.164389] RAX: ffff9b02ff7be800 RBX: 0000000000000000 RCX: 0000000000002800
[ 2292.164389] RDX: 0000000000000004 RSI: 0000000000000000 RDI: ffff9b0301d626c0
[ 2292.164390] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000004
[ 2292.164390] R10: 0000000000000000 R11: 000000000000706b R12: 0000000000000004
[ 2292.164391] R13: ffff9b0301d626c0 R14: 0000000000000001 R15: 0000000000000001
[ 2292.164391]  try_to_wake_up+0x529/0x5c0
[ 2292.164391]  ? update_load_avg+0x7a/0x610
[ 2292.164392]  ? __hrtimer_init+0xd0/0xd0
[ 2292.164392]  hrtimer_wakeup+0x1e/0x30
[ 2292.164392]  __hrtimer_run_queues+0x118/0x280
[ 2292.164393]  hrtimer_interrupt+0x10e/0x280
[ 2292.164393]  __sysvec_apic_timer_interrupt+0x61/0x100
[ 2292.164393]  asm_call_on_stack+0x12/0x20
[ 2292.164394]  </IRQ>
[ 2292.164394]  sysvec_apic_timer_interrupt+0x6f/0x90
[ 2292.164394]  asm_sysvec_apic_timer_interrupt+0x12/0x20
[ 2292.164395] RIP: 0010:native_safe_halt+0xe/0x10
[ 2292.164396] Code: 02 20 48 8b 00 a8 08 75 c4 e9 7b ff ff ff cc cc cc cc cc cc cc cc cc cc cc cc cc cc e9 07 00 00 00 0f 00 2d 36 70 49 00 fb f4 <c3> 90 e9 07 00 00 00 0f 00 2d 26 70 49 00 f4 c3 cc cc 0f 1f 44 00
[ 2292.164396] RSP: 0018:ffffaa0080073ed0 EFLAGS: 00000246
[ 2292.164397] RAX: ffffffff88b71700 RBX: 0000000000000001 RCX: 0000000000000001
[ 2292.164397] RDX: 0000000000000001 RSI: 0000000000000083 RDI: 0000000000000001
[ 2292.164397] RBP: 0000000000000001 R08: ffff9b0304b1d5a0 R09: 0000000000000400
[ 2292.164398] R10: 00000000000000e4 R11: 0000000000000000 R12: 0000000000000000
[ 2292.164398] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 2292.164399]  ? __sched_text_end+0x3/0x3
[ 2292.164399]  default_idle+0x1a/0x140
[ 2292.164399]  do_idle+0x1f3/0x2a0
[ 2292.164400]  cpu_startup_entry+0x19/0x20
[ 2292.164400]  start_secondary+0x144/0x170
[ 2292.164400]  secondary_startup_64+0xb6/0xc0
[ 2292.164400] Modules linked in: ipt_rpfilter xt_multiport iptable_mangle xt_set iptable_raw iptable_filter iptable_nat ip_set_hash_ip ip_set_hash_net ip_set veth ipip tunnel4 ip_tunnel nf_conntrack_netlink xt_addrtype xt_nat xt_statistic ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs xt_MASQUERADE xt_conntrack xt_comment nft_counter xt_mark nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink rfkill crct10dif_pclmul crc32_pclmul ghash_clmulni_intel iTCO_wdt intel_pmc_bxt iTCO_vendor_support sunrpc i2c_i801 i2c_smbus joyd
[ 2292.164410] Lost 31 message(s)!

Comment 13 Kappa 2020-09-06 20:59:14 UTC

I added a worker node (CentOS, running in KVM) to the Kubernetes cluster.

The node whose crash is shown below runs on VMware Workstation; KVM and VMware Workstation run on the same host machine.
This node has swap, and I had been using it before without problems.

After I added the KVM worker node to the Kubernetes cluster, this node (Fedora 32) crashed after a while.

[ 4233.816610] Oops: 0000 [#1] SMP NOPTI
[ 4233.816636] CPU: 1 PID: 139909 Comm: systemd-userwor Kdump: loaded Not tainted 5.8.4-200.fc32.x86_64 #1
[ 4233.816702] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/29/2019
[ 4233.816754] RIP: 0010:__rb_erase_color+0x99/0x240
[ 4233.816776] Code: 48 89 c5 4c 8b 65 08 49 39 d4 75 a1 4c 8b 65 10 49 8b 5c 24 08 41 f6 04 24 01 0f 84 f8 00 00 00 49 8b 44 24 10 48 85 c0 74 05 <f6> 00 01 74 3f 48 85 db 74 aa f6 03 01 75 a5 48 8b 43 10 49 89 44
[ 4233.816863] RSP: 0018:ffffbad204327d60 EFLAGS: 00010202
[ 4233.817973] RAX: 0000000000000010 RBX: 0000000000000000 RCX: 0000000000000006
[ 4233.819110] RDX: ffff8f0592bcf188 RSI: ffff8f05b5852480 RDI: ffff8f0592bcf188
[ 4233.820190] RBP: ffff8f059661ea80 R08: 0000000000000000 R09: 0000000000000000
[ 4233.821266] R10: ffff8f05b3a2df40 R11: ffff8f059acf7700 R12: ffff8f0594d2fae8
[ 4233.822334] R13: ffffffffb5291710 R14: ffff8f05b5852480 R15: 0000000000000000
[ 4233.823232] FS:  0000000000000000(0000) GS:ffff8f05b9e40000(0000) knlGS:0000000000000000
[ 4233.823753] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4233.824266] CR2: 0000000000000010 CR3: 0000000135698000 CR4: 00000000003406e0
[ 4233.824851] Call Trace:
[ 4233.825341]  unlink_file_vma+0x3d/0x60
[ 4233.825833]  free_pgtables+0x92/0xf0
[ 4233.826295]  exit_mmap+0xa6/0x170
[ 4233.826763]  mmput+0x61/0x140
[ 4233.827212]  do_exit+0x2fc/0xaf0
[ 4233.827724]  ? syscall_trace_enter+0x14a/0x290
[ 4233.828174]  do_group_exit+0x33/0xa0
[ 4233.828617]  __x64_sys_exit_group+0x14/0x20
[ 4233.829125]  do_syscall_64+0x52/0x90
[ 4233.829561]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 4233.830032] RIP: 0033:0x7f31eaf373c1
[ 4233.830455] Code: Bad RIP value.
[ 4233.830874] RSP: 002b:00007fffbc769e18 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
[ 4233.831279] RAX: ffffffffffffffda RBX: 00007f31eb02e470 RCX: 00007f31eaf373c1
[ 4233.831696] RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000000
[ 4233.832089] RBP: 0000000000000000 R08: ffffffffffffff80 R09: 0000000000000000
[ 4233.832493] R10: 00007f31eabb284e R11: 0000000000000246 R12: 00007f31eb02e470
[ 4233.832899] R13: 0000000000000002 R14: 00007f31eb02e948 R15: 0000000000000000
[ 4233.833286] Modules linked in: ipt_rpfilter xt_set iptable_raw ip_set_hash_ip ip_set_hash_net ip_set veth ipip tunnel4 ip_tunnel nf_conntrack_netlink nfnetlink xt_addrtype xt_nat xt_statistic ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs xt_comment xt_mark rfkill vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock ipt_REJECT nf_reject_ipv4 xt_conntrack iptable_filter xt_CHECKSUM iptable_mangle xt_MASQUERADE iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 sunrpc kvm_amd ccp pktcdvd kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl vmw_balloon pcspkr joydev i2c_piix4 vmw_vmci br_netfilter bridge stp binfmt_misc llc overlay ip_tables raid1 vmwgfx drm_kms_helper cec ttm drm mptsas crc32c_intel scsi_transport_sas mptscsih serio_raw mptbase vmxnet3 ata_generic pata_acpi target_core_mod fuse vhost_net tun tap vhost vhost_iotlb [last unloaded: cfg80211]
[ 4233.837381] CR2: 0000000000000010

Comment 14 Kappa 2020-09-07 06:41:14 UTC

New finding

When the VM hangs without crashing, it consumes all of its CPU (200% if two vCPUs are assigned).
When this happens, I can get a crash dump with virsh dump,
but the crash utility cannot analyse the dump.

gdb ../../vmlinux-5.8.4 
GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

WARNING: kernel relocated [1699MB]: patching 126940 gdb minimal_symbol values

<readmem: ffffffff18083cc8, KVADDR, "__pgtable_l5_enabled", 4, (FOE|Q), 7ffceb752218>
<read_kdump: addr: ffffffff18083cc8 paddr: 77ff18083cc8 cnt: 4>
crash: read error: kernel virtual address: ffffffff18083cc8  type: "__pgtable_l5_enabled"

The behavior is the same for VMs running on KVM or on VMware Workstation.

Comment 15 Kappa 2020-09-07 16:26:42 UTC

Sometimes the guest VM hangs and consumes a lot of CPU instead of crashing and rebooting.
Are these problems in QEMU/KVM or in the crash utility? (The crash utility issue is reported in https://bugzilla.redhat.com/show_bug.cgi?id=1876589)

Comment 16 Kappa 2020-09-07 21:24:47 UTC

All previous tests used KVM acceleration.
I switched a VM that crashes frequently to QEMU with TCG.
It ran without problems for two hours, but that VM is very slow.

Comment 17 Kappa 2020-09-07 21:52:41 UTC

When a VM freezes (without crashing), it consumes 100% CPU on the host.
Here is a pstack of the QEMU process:

# pstack 31258
Thread 5 (Thread 0x7f5e66dff700 (LWP 31719)):
#0  0x00007f5fcb549aaf in poll () from target:/lib64/libc.so.6
#1  0x00007f5fcc69cace in g_main_context_iterate.constprop () from target:/lib64/libglib-2.0.so.0
#2  0x00007f5fcc69ce53 in g_main_loop_run () from target:/lib64/libglib-2.0.so.0
#3  0x00007f5fcbf612db in red_worker_main () from target:/lib64/libspice-server.so.1
#4  0x00007f5fcb626432 in start_thread () from target:/lib64/libpthread.so.0
#5  0x00007f5fcb554913 in clone () from target:/lib64/libc.so.6
Thread 4 (Thread 0x7f5fc61ff700 (LWP 31276)):
#0  0x00007f5fcb54b3bb in ioctl () from target:/lib64/libc.so.6
#1  0x000055b8588b9519 in kvm_vcpu_ioctl ()
#2  0x000055b8588b95d9 in kvm_cpu_exec ()
#3  0x000055b85889daac in qemu_kvm_cpu_thread_fn ()
#4  0x000055b858c95683 in qemu_thread_start ()
#5  0x00007f5fcb626432 in start_thread () from target:/lib64/libpthread.so.0
#6  0x00007f5fcb554913 in clone () from target:/lib64/libc.so.6
Thread 3 (Thread 0x7f5fc6f8e700 (LWP 31275)):
#0  0x00007f5fcb549aaf in poll () from target:/lib64/libc.so.6
#1  0x00007f5fcc69cace in g_main_context_iterate.constprop () from target:/lib64/libglib-2.0.so.0
#2  0x00007f5fcc69ce53 in g_main_loop_run () from target:/lib64/libglib-2.0.so.0
#3  0x000055b8589c0fb1 in iothread_run ()
#4  0x000055b858c95683 in qemu_thread_start ()
#5  0x00007f5fcb626432 in start_thread () from target:/lib64/libpthread.so.0
#6  0x00007f5fcb554913 in clone () from target:/lib64/libc.so.6
Thread 2 (Thread 0x7f5fc93ff700 (LWP 31267)):
#0  0x00007f5fcb54f37d in syscall () from target:/lib64/libc.so.6
#1  0x000055b858c95fd2 in qemu_event_wait ()
#2  0x000055b858ca88c2 in call_rcu_thread ()
#3  0x000055b858c95683 in qemu_thread_start ()
#4  0x00007f5fcb626432 in start_thread () from target:/lib64/libpthread.so.0
#5  0x00007f5fcb554913 in clone () from target:/lib64/libc.so.6
Thread 1 (Thread 0x7f5fc9c84700 (LWP 31258)):
#0  0x00007f5fcb549bae in ppoll () from target:/lib64/libc.so.6
#1  0x000055b858c91255 in qemu_poll_ns ()
#2  0x000055b858c92615 in main_loop_wait ()
#3  0x000055b8589c72af in main_loop ()
#4  0x000055b85884e79c in main ()
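To confirm which QEMU thread is burning the CPU during such a freeze (in the trace above, Thread 4, LWP 31276, is the vCPU thread sitting in `kvm_vcpu_ioctl()`), per-thread CPU time can be read from procfs. This is a minimal sketch, assuming the Linux `/proc/<pid>/task/<tid>/stat` layout described in proc(5), where utime and stime are fields 14 and 15 in clock ticks; the helper names `cpu_ticks` and `busiest_threads` are hypothetical.

```python
import os
from pathlib import Path


def cpu_ticks(stat_line: str) -> int:
    """Return utime + stime (clock ticks) from a /proc/<pid>/task/<tid>/stat line."""
    # The comm field is wrapped in parentheses and may contain spaces or ')'
    # (QEMU names vCPU threads like "CPU 0/KVM"), so split after the LAST ')'.
    rest = stat_line.rpartition(")")[2].split()
    # rest[0] is field 3 (state), so field N lives at rest[N - 3].
    utime, stime = int(rest[14 - 3]), int(rest[15 - 3])
    return utime + stime


def busiest_threads(pid: int, top: int = 3) -> list[tuple[int, str, int]]:
    """List (tid, thread name, CPU ticks) for one process, busiest first."""
    results = []
    for task in Path(f"/proc/{pid}/task").iterdir():
        try:
            stat = (task / "stat").read_text()
            name = (task / "comm").read_text().strip()
        except OSError:  # the thread exited while we were scanning
            continue
        results.append((int(task.name), name, cpu_ticks(stat)))
    results.sort(key=lambda item: item[2], reverse=True)
    return results[:top]


if __name__ == "__main__" and Path("/proc/self/task").exists():
    # Demo on this script's own process; pass the hung QEMU PID instead.
    hz = os.sysconf("SC_CLK_TCK")
    for tid, name, ticks in busiest_threads(os.getpid()):
        print(tid, name, f"{ticks / hz:.2f}s")
```

Sampling `busiest_threads()` twice a few seconds apart and comparing the tick counts shows which thread is spinning right now, rather than which one has accumulated the most time overall.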

Comment 18 Kappa 2020-09-08 07:22:22 UTC

These are two conditions under which the KVM guests do not crash:
1) VMware Workstation is stopped, and only KVM is used.
2) VMware Workstation is running, and the QEMU guests run without KVM acceleration. However, the QEMU guests are then very slow.

Here are my questions:
1) I suspect the QEMU processes may get corrupted when both hypervisors are running. Why is no problem logged on the host? Is there any mechanism that could prevent the problem or alert the user to it?
2) Should QEMU refuse to start a VM with KVM if it detects that another hypervisor is already running?
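Question 2 could, at least on the host side, be approximated by a small user-space check run before starting a KVM guest. This is a minimal sketch, not code taken from QEMU or libvirt: it assumes that seeing VMware's `vmmon` module (named in the `lsmod` output in comment 23) together with `kvm`, `kvm_amd` or `kvm_intel` in `/proc/modules` means both hypervisors are loaded. The module-name sets and function names are hypothetical.

```python
from pathlib import Path

# Hypothetical module-name sets, taken from the lsmod output in this report.
KVM_MODULES = {"kvm", "kvm_amd", "kvm_intel"}
VMWARE_MODULES = {"vmmon"}  # VMware Workstation's out-of-tree monitor module


def loaded_modules(proc_modules_text: str) -> set[str]:
    """Return the module names (first field of each line) from /proc/modules text."""
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def both_hypervisors_loaded(names: set[str]) -> bool:
    """True if a KVM module and VMware's vmmon are loaded at the same time."""
    return bool(names & KVM_MODULES) and bool(names & VMWARE_MODULES)


if __name__ == "__main__":
    proc_modules = Path("/proc/modules")
    if proc_modules.exists():
        names = loaded_modules(proc_modules.read_text())
        if both_hypervisors_loaded(names):
            print("warning: KVM and VMware vmmon are both loaded on this host")
        else:
            print("ok: at most one hypervisor module family is loaded")
```

The sketch only warns rather than refusing to start anything, since a loaded module does not by itself prove that VMware guests are actually running.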

Comment 19 Kappa 2020-09-08 23:05:57 UTC

These are some conditions under which the KVM guests do not crash:
1) VMware Workstation is stopped, and only KVM is used to run VMs.
2) VMware Workstation is running, and the QEMU guests run without KVM acceleration. However, the QEMU guests are then very slow.
3) VMware Workstation is running, and the QEMU guests run with KVM acceleration, but those guests
cannot start Docker or CRI-O. If those processes are started, the KVM guest is likely to crash.

Note the third condition: I suspect it is related to hardware virtualization used by Docker and CRI-O.
While VMware Workstation is running, I can use KVM to install guest operating systems without problems. Once I start the Docker or CRI-O daemon inside a KVM guest, those VMs start crashing.

Comment 20 Richard W.M. Jones 2020-09-09 09:23:37 UTC

Assuming the host is VMware and therefore the KVM host is L1, it's likely
using nested KVM but VMware isn't emulating/implementing nested virt correctly.
In any case there's not very much we can do to deal with closed source software.

Comment 21 Kappa 2020-09-09 12:56:56 UTC

The host is Fedora 32 Linux. So there are two Type 2 hypervisors:

1) VMware Workstation (not ESX; ESX would have to be installed as the host itself)
2) QEMU with KVM acceleration, provided by a Linux kernel module

The strange thing is that VMs can be started and run on both hypervisors at the same time.
But if container runtimes (Docker or CRI-O) are running in the KVM guests while VMware Workstation is running, the KVM guest can easily crash.
The VMware Workstation guests only need to run a plain OS (not necessarily Docker or CRI-O) to trigger a crash of the KVM guest.

Comment 22 Richard W.M. Jones 2020-09-09 13:08:44 UTC

> This node is running on VMware workstation. The host machine of KVM and VMware workstation is the same machine.

I'm having a really hard time understanding what the configuration is.
"VMware workstation" is some kind of proprietary product.  Does it
load proprietary kernel modules?  If so, then this is immediately NOTABUG -
go and ask VMware for help.  Does the host (the baremetal bit) run Fedora?
What version of Fedora?  What guests are you running?  What is running
in the guests?

Comment 23 Kappa 2020-09-09 13:58:52 UTC

(In reply to Richard W.M. Jones from comment #22)
> > This node is running on VMware workstation. The host machine of KVM and VMware workstation is the same machine.
> 
> I'm having a really hard time understanding what the configuration is.
> "VMware workstation" is some kind of proprietary product.  Does it
> load proprietary kernel modules?  If so, then this is immediately NOTABUG -
> go and ask VMware for help.  Does the host (the baremetal bit) run Fedora?
> What version of Fedora?  What guests are you running?  What is running
> in the guests?

'This node' refers to a Kubernetes worker node. That node/VM runs under VMware Workstation.

There is only one physical machine (bare metal), and it runs Fedora Linux version 32.
There are two Type 2 hypervisors on it: VMware Workstation and KVM.
VMware Workstation loads kernel modules. Those kernel modules are GPL, not proprietary.

Here are the related modules:

$ lsmod |grep vm | sort
ccp                   106496  1 kvm_amd
irqbypass              16384  19 kvm
kvm                   823296  61 kvm_amd
kvm_amd               114688  8
vmmon                 131072  8
vmnet                  65536  63
vmw_vmci               90112  9 vmw_vsock_vmci_transport
vmw_vsock_vmci_transport    32768  0
vsock                  49152  1 vmw_vsock_vmci_transport

vmw_vmci, vmw_vsock_vmci_transport and vsock are provided by the Fedora 32 host OS.
vmmon and vmnet are provided by VMware. For VMware Workstation 15.5, these modules need to be patched to work with kernel 5.7/5.8.
I use the patched modules from GitHub:
https://github.com/mkubecek/vmware-host-modules/tree/player-15.5.6/vmmon-only
https://github.com/mkubecek/vmware-host-modules/tree/player-15.5.6/vmnet-only

I performed further tests with VMware Workstation running guests.
Any KVM guest crashes randomly (even a plain OS without Docker/CRI-O/K8s).
The KVM guests I run include CentOS 7.8 and Fedora 32.

Comment 24 Daniel Berrangé 2020-09-09 15:07:10 UTC

Re-assigning to the kernel, since all signs point towards a bad interaction between the VMware kernel modules and the main kernel. Debugging VMware kmods is really outside the scope of Fedora, though: whether they are open source or not, they are still out-of-tree kmods.
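Whether a module such as vmmon is out-of-tree (or proprietary) can be checked from the kernel's own taint bookkeeping. This is a hedged sketch, assuming a kernel that exposes `/sys/module/<name>/taint` for loadable modules; the letters decoded here (`P` proprietary, `O` out-of-tree, `E` unsigned, `C` staging, `F` force-loaded) follow the kernel's "Tainted kernels" documentation, and only that subset is decoded. The helper names are hypothetical.

```python
from pathlib import Path

# A subset of the kernel's module taint letters; others are reported as unknown.
TAINT_FLAGS = {
    "P": "proprietary license",
    "O": "out-of-tree",
    "E": "unsigned",
    "C": "staging",
    "F": "force-loaded",
}


def decode_taint(flags: str) -> list[str]:
    """Translate a taint string such as 'OE' into human-readable descriptions."""
    return [TAINT_FLAGS.get(ch, f"unknown flag {ch!r}") for ch in flags.strip()]


def module_taint(name: str, sysfs: Path = Path("/sys/module")) -> list[str]:
    """Read a module's taint flags; [] if it is untainted, built in, or not loaded."""
    taint_file = sysfs / name / "taint"
    if not taint_file.exists():
        return []
    return decode_taint(taint_file.read_text())


if __name__ == "__main__":
    for mod in ("vmmon", "vmnet", "kvm", "kvm_amd"):
        print(mod, module_taint(mod) or "no taint flags (or not loaded)")
```

On the reporter's host one would expect `module_taint("vmmon")` to include "out-of-tree", since vmmon is built from the mkubecek tree rather than shipped by Fedora; that is an expectation, not output captured in this report.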

Comment 25 Vitaly Kuznetsov 2020-09-10 07:34:29 UTC

Running two different hypervisors at the same time is a really, really bad idea. As
one of these hypervisors is proprietary, VMware is likely the only one who can
make this work (if they want to, of course). I don't see what can be done in
Fedora/upstream.

Comment 26 Kappa 2020-09-10 09:42:26 UTC

I found some old posts from about ten years ago about KVM and VMware Workstation :)
(I am using an AMD CPU)
https://communities.vmware.com/thread/188067

Maybe these two hypervisors are not ready to run concurrently.

Comment 27 Richard W.M. Jones 2020-09-10 10:28:20 UTC

The best way forward here is to work with VMware and the community
to get the vmmon and vmnet modules upstream.  (Apparently other
VMware-related modules have gone upstream).  When everything is
upstream it should be possible to fix the kernel either so it
doesn't let you load kvm.ko and vmmon at the same time, or even
better so the hypervisor state can be shared in some way.

Until that time there's not much Fedora can do about this, so
I am closing this bug.

Comment 28 Kappa 2021-08-02 18:11:05 UTC

Greetings,

I am using Fedora 34 and kernel 5.13.4-200.fc34.x86_64.

Since VMware Workstation 16.1.2 does not support kernel 5.13.4, I used the patched modules from

https://github.com/mkubecek/vmware-host-modules/tree/workstation-16.1.2

I have now been running VMware Workstation and KVM together for about two weeks without crashes, so some of the patches appear to have fixed the problem. Thanks.

Comment 29 Ben Cotton 2022-05-12 15:37:43 UTC

This message is a reminder that Fedora Linux 34 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora Linux 34 on 2022-06-07.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
'version' of '34'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, change the 'version' 
to a later Fedora Linux version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora Linux 34 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora Linux, you are encouraged to change the 'version' to a later version
prior to this bug being closed.

Comment 30 Ben Cotton 2023-04-25 16:40:36 UTC

This message is a reminder that Fedora Linux 36 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora Linux 36 on 2023-05-16.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
'version' of '36'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, change the 'version' 
to a later Fedora Linux version. Note that the version field may be hidden.
Click the "Show advanced fields" button if you do not see it.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora Linux 36 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora Linux, you are encouraged to change the 'version' to a later version
prior to this bug being closed.

Comment 31 Ludek Smid 2023-05-25 17:00:23 UTC

Fedora Linux 36 entered end-of-life (EOL) status on 2023-05-16.

Fedora Linux 36 is no longer maintained, which means that it
will not receive any further security or bug fix updates. As a result we
are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora Linux
please feel free to reopen this bug against that version. Note that the version
field may be hidden. Click the "Show advanced fields" button if you do not see
the version field.

If you are unable to reopen this bug, please file a new report against an
active release.

Thank you for reporting this bug and we are sorry it could not be fixed.


acaringi
airlied
berrange
bskeggs
cfergeau
hdegoede
ichavero
itamar
jarodwilson
jeremy
jglisse
john.j5live
jonathan
josef
kernel-maint
lgoncalv
linville
masami256
mchehab
mjg59
ondrejj
pbonzini
philmd
rjones
steved
virt-maint
vkuznets