Description of problem:
The host runs Fedora 32 with kernels from 5.6 to 5.8. Both VMware Workstation and QEMU are used on the host; I mostly use VMware Workstation. A few days ago I started running some VMs under QEMU with KVM acceleration, and many of the KVM guests hang or crash frequently. I found that the KVM guests crash or freeze frequently whenever VMware Workstation is running its own guests at the same time.

The KVM guests do not crash under these conditions:
1) VMware Workstation is stopped and only KVM guests are running.
2) VMware Workstation is running and the QEMU guests run without KVM acceleration. The QEMU guests are then very slow.
3) VMware Workstation is running and the QEMU guests run with KVM acceleration, but docker and cri-o are not started inside those guests. Starting either of them is likely to crash the KVM guest.

Version-Release number of selected component (if applicable):
qemu 4.2.1-1
VMware Workstation 15.5.6

How reproducible:
Start some VMs under both KVM and VMware Workstation.

Steps to Reproduce:
Install five CentOS 7.8 guests with virsh. All VMs are patched with the latest RPMs and run kernel 3.10.0-1127.19.1.el7.x86_64. The guests are meant for Kubernetes, so no swap is configured.

Actual results:
The VMs crash and reboot randomly. In some cases no Kubernetes packages had been installed yet; the guest simply crashed and rebooted.
One of the backtraces:

[30929.767857] BUG: unable to handle kernel paging request at ffffffffa9384750
[30929.767857] IP: [<ffffffffa9384750>] 0xffffffffa9384750
[30929.767857] PGD 167614067 PUD 167615063 PMD 0
[30929.767857] Oops: 0010 [#1] SMP
[30929.767857] Modules linked in: sunrpc dm_mirror dm_region_hash dm_log dm_mod i2c_i801 iTCO_wdt pcspkr iTCO_vendor_support crc32_pclmul ghash_clmulni_intel sg joydev lpc_ich aesni_intel lrw ppdev gf128mul glue_helper ablk_helper snd_hda_codec_generic cryptd snd_hda_intel snd_hda_codec snd_hda_core virtio_balloon snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore virtio_rng parport_pc parport binfmt_misc ip_tables xfs libcrc32c sr_mod cdrom qxl ahci virtio_console virtio_blk libahci drm_kms_helper crct10dif_pclmul crct10dif_common libata crc32c_intel serio_raw syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm virtio_net net_failover failover virtio_pci virtio_ring virtio drm_panel_orientation_quirks ptp_kvm ptp pps_core
[30929.767857] CPU: 0 PID: 0 Comm: swapper/0 Kdump: loaded Not tainted 3.10.0-1127.19.1.el7.x86_64 #1
[30929.767857] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
[30929.767857] task: ffffffffb0618480 ti: ffffffffb0600000 task.ti: ffffffffb0600000
[30929.767857] RIP: 0010:[<ffffffffa9384750>] [<ffffffffa9384750>] 0xffffffffa9384750
[30929.767857] RSP: 0018:ffff9d218623a1e0 EFLAGS: 00010082
[30929.767857] RAX: ffffffffb0187c40 RBX: ffffffffb075db40 RCX: ffff9d2186406290
[30929.767857] RDX: 0000000000000000 RSI: ffffffffb0603e28 RDI: ffffffffb0603e08
[30929.767857] RBP: ffffffffb0603eb0 R08: 0000000000000000 R09: 0000000000000001
[30929.767857] R10: 0000000000000000 R11: 00001c850bdefa40 R12: 0000000000000000
[30929.767857] R13: ffffffffb0600000 R14: ffffffffb0600000 R15: ffffffffb0600000
[30929.767857] FS: 0000000000000000(0000) GS:ffff9d2186400000(0000) knlGS:0000000000000000
[30929.767857] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[30929.767857] CR2: ffffffffa9384750 CR3: 000000017a3a6000 CR4: 0000000000340ff0
[30929.767857] Call Trace:
[30929.767857] Code: Bad RIP value.
[30929.767857] RIP [<ffffffffa9384750>] 0xffffffffa9384750
[30929.767857] RSP <ffff9d218623a1e0>
[30929.767857] CR2: ffffffffa9384750

Expected results:
The guests should not crash.

Additional info:
This description was edited after I found out that the problem comes from running VMware Workstation and KVM side by side. My questions:
1) I suspect that the QEMU processes are being corrupted while both hypervisors are running. Why is nothing logged on the host? Is there any mechanism that could prevent the problem or alert the user to it?
2) Should QEMU refuse to start a VM with KVM when it detects that another hypervisor is already running?
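Regarding question 2, a user can at least check by hand whether both hypervisors have kernel modules loaded before starting a KVM guest. The following is a minimal sketch, not something QEMU or libvirt does: it assumes the usual module names (`kvm`, `kvm_intel`, `kvm_amd` for KVM; `vmmon`, `vmnet` for VMware Workstation) and only reads `/proc/modules`. The helper names `hypervisor_modules` and `check_host` are made up for this illustration. A loaded module does not prove that a guest is actually running, so this is a rough warning at best.

```python
from pathlib import Path

# Assumed module names; adjust if your installation differs.
KVM_MODULES = {"kvm", "kvm_intel", "kvm_amd"}
VMWARE_MODULES = {"vmmon", "vmnet"}


def hypervisor_modules(proc_modules_text):
    """Return (kvm, vmware) lists of loaded module names.

    proc_modules_text is the content of /proc/modules, where the first
    whitespace-separated field of every line is the module name.
    """
    loaded = {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }
    return sorted(loaded & KVM_MODULES), sorted(loaded & VMWARE_MODULES)


def check_host(path="/proc/modules"):
    """Print a warning when modules of both hypervisors are loaded."""
    kvm, vmware = hypervisor_modules(Path(path).read_text())
    if kvm and vmware:
        print("warning: KVM (%s) and VMware (%s) modules are both loaded"
              % (", ".join(kvm), ", ".join(vmware)))
    return bool(kvm and vmware)
```

Calling `check_host()` on the host before `virsh start` would show whether the combination described in this report is present.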
Related to the first post, #c0 (https://bugzilla.redhat.com/show_bug.cgi?id=1876123#c0):

crash> kmem -i
                 PAGES        TOTAL      PERCENTAGE
    TOTAL MEM   740154       2.8 GB         ----
         FREE   524986         2 GB   70% of TOTAL MEM
         USED   215168     840.5 MB   29% of TOTAL MEM
       SHARED    57014     222.7 MB    7% of TOTAL MEM
      BUFFERS      517         2 MB    0% of TOTAL MEM
       CACHED   137675     537.8 MB   18% of TOTAL MEM
         SLAB    30150     117.8 MB    4% of TOTAL MEM

   TOTAL HUGE        0            0         ----
    HUGE FREE        0            0    0% of TOTAL HUGE

   TOTAL SWAP        0            0         ----
    SWAP USED        0            0    0% of TOTAL SWAP
    SWAP FREE        0            0    0% of TOTAL SWAP

 COMMIT LIMIT   370077       1.4 GB         ----
    COMMITTED    39836     155.6 MB   10% of TOTAL LIMIT

No swap is configured, but the VM was idle and had 2 GB of free RAM when it crashed.
On some of the VMs I added swap right away. Even so, two of the VMs have just hung, and they are consuming 200% CPU on the host. I collected core dumps with virsh. The VMs are running CentOS 7.8.

1)
crash> bt
PID: 3424   TASK: ffff8db3dd9a9070  CPU: 1   COMMAND: "calico-node"
 #0 [ffff8db551507ef8] die at ffffffffadc30a68
 #1 [ffff8db551507f28] do_double_fault at ffffffffadc2d802
 #2 [ffff8db551507f50] double_fault at ffffffffae396298
    [exception RIP: do_async_page_fault+5]
    RIP: ffffffffae38cf85  RSP: ffff8db3f05df000  RFLAGS: 00010006
    RAX: ffff8db551506090  RBX: 0000000000000001  RCX: ffff8db551506290
    RDX: ffffffffadd1fae8  RSI: 0000000000000000  RDI: ffff8db3f05df008
    RBP: ffff8db3f05df0d8   R8: 0000000000030001   R9: 0000000000000000
    R10: ffffffffffffffff  R11: 000000000000b8ca  R12: ffffffffadd1fae8
    R13: fffffffffffffff8  R14: ffff8db3dd9a9070  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <DOUBLEFAULT exception stack> ---
 #3 [ffff8db3f05df000] do_async_page_fault at ffffffffae38cf85
bt: cannot transition from exception stack to current process stack:
    exception stack pointer: ffff8db551507ef8
      process stack pointer: ffff8db3f05df008
         current stack base: ffff8db3f066c000

crash> kmem -i
                 PAGES        TOTAL      PERCENTAGE
    TOTAL MEM  1298232         5 GB         ----
         FREE   799861       3.1 GB   61% of TOTAL MEM
         USED   498371       1.9 GB   38% of TOTAL MEM
       SHARED   111894     437.1 MB    8% of TOTAL MEM
      BUFFERS      518         2 MB    0% of TOTAL MEM
       CACHED   239668     936.2 MB   18% of TOTAL MEM
         SLAB    78199     305.5 MB    6% of TOTAL MEM

   TOTAL HUGE        0            0         ----
    HUGE FREE        0            0    0% of TOTAL HUGE

   TOTAL SWAP  3145727        12 GB         ----
    SWAP USED        0            0    0% of TOTAL SWAP
    SWAP FREE  3145727        12 GB  100% of TOTAL SWAP

 COMMIT LIMIT  3794843      14.5 GB         ----
    COMMITTED   967960       3.7 GB   25% of TOTAL LIMIT

2) Another node that is consuming 200% CPU:
PID: 0      TASK: ffffffff87818480  CPU: 0   COMMAND: "swapper/0"
    [exception RIP: native_queued_spin_lock_slowpath+29]
    RIP: ffffffff86d17edd  RSP: ffff9747d1403458  RFLAGS: 00000093
    RAX: 0000000000000001  RBX: 00000000ffffffff  RCX: 0000000000000001
    RDX: 0000000000000001  RSI: 0000000000000001  RDI: ffffffff87d038d0
    RBP: ffff9747d1403458   R8: ffffffff8766eeb8   R9: ffff9747d1403508
    R10: 000000003b9aca00  R11: 000002afc3e979c0  R12: 0000000000000000
    R13: 0000000000000000  R14: 0000000000000000  R15: 0000000000000010
    CS: 0010  SS: 0018
 #0 [ffff9747d1403460] queued_spin_lock_slowpath at ffffffff8737a024
 #1 [ffff9747d1403470] _raw_spin_lock at ffffffff873886d0
 #2 [ffff9747d1403480] vprintk_emit at ffffffff86c9e8e3
 #3 [ffff9747d14034f0] vprintk_default at ffffffff86c9ef79
 #4 [ffff9747d1403500] printk at ffffffff873796d8
 #5 [ffff9747d1403560] no_context at ffffffff86c75e42
 #6 [ffff9747d14035b0] __bad_area_nosemaphore at ffffffff86c76042
 #7 [ffff9747d1403600] bad_area_nosemaphore at ffffffff86c76164
 #8 [ffff9747d1403610] __do_page_fault at ffffffff8738d750
 #9 [ffff9747d1403680] trace_do_page_fault at ffffffff8738da26
#10 [ffff9747d14036c0] do_async_page_fault at ffffffff8738cfa2
#11 [ffff9747d14036e0] async_page_fault at ffffffff873897a8
    [exception RIP: unknown or invalid address]
    RIP: 00007fefbe826960  RSP: ffff9747d1403790  RFLAGS: 00010093
    RAX: 00007fefbe826960  RBX: ffffffff87c02fc0  RCX: 000000000000001f
    RDX: 0000000000000000  RSI: ffff9747d14037b8  RDI: ffffffff8766eeb8
    RBP: ffff9747d14037f0   R8: ffffffff8766eeb8   R9: ffff9747d1403898
    R10: 000000003b9aca00  R11: 000002afc3e979c0  R12: ffffffff87c033a0
    R13: ffff9747d1403898  R14: ffffffff8766eed7  R15: ffffffff8766eeb8
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#12 [ffff9747d14037f8] vscnprintf at ffffffff86f9296d
#13 [ffff9747d1403810] vprintk_emit at ffffffff86c9e91b
#14 [ffff9747d1403880] vprintk_default at ffffffff86c9ef79
#15 [ffff9747d1403890] printk at ffffffff873796d8
#16 [ffff9747d14038f0] no_context at ffffffff86c75e42
#17 [ffff9747d1403940] __bad_area_nosemaphore at ffffffff86c76042
#18 [ffff9747d1403990] bad_area_nosemaphore at ffffffff86c76164
#19 [ffff9747d14039a0] __do_page_fault at ffffffff8738d750
#20 [ffff9747d1403a10] trace_do_page_fault at ffffffff8738da26
#21 [ffff9747d1403a50] do_async_page_fault at ffffffff8738cfa2
#22 [ffff9747d1403a70] async_page_fault at ffffffff873897a8
    [exception RIP: unknown or invalid address]
    RIP: 00007fefbe826960  RSP: ffff9747d1403b20  RFLAGS: 00010093
    RAX: 00007fefbe826960  RBX: ffffffff87c02fc0  RCX: 000000000000001f
    RDX: 0000000000000000  RSI: ffff9747d1403b48  RDI: ffffffff8766eeb8
    RBP: ffff9747d1403b80   R8: ffffffff8766eeb8   R9: ffff9747d1403c28
    R10: 000000003b9aca00  R11: 000002afc3e979c0  R12: ffffffff87c033a0
    R13: ffff9747d1403c28  R14: ffffffff8766eed7  R15: ffffffff8766eeb8
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#23 [ffff9747d1403b88] vscnprintf at ffffffff86f9296d
#24 [ffff9747d1403ba0] vprintk_emit at ffffffff86c9e91b
#25 [ffff9747d1403c10] vprintk_default at ffffffff86c9ef79
#26 [ffff9747d1403c20] printk at ffffffff873796d8
#27 [ffff9747d1403c80] no_context at ffffffff86c75e42
#28 [ffff9747d1403cd0] __bad_area_nosemaphore at ffffffff86c76042
#29 [ffff9747d1403d20] bad_area_nosemaphore at ffffffff86c76164
#30 [ffff9747d1403d30] __do_page_fault at ffffffff8738d750
#31 [ffff9747d1403da0] trace_do_page_fault at ffffffff8738da26
#32 [ffff9747d1403de0] do_async_page_fault at ffffffff8738cfa2
#33 [ffff9747d1403e00] async_page_fault at ffffffff873897a8
    [exception RIP: unknown or invalid address]
    RIP: 0000000000000000  RSP: ffff9747d1403eb0  RFLAGS: 00010002
    RAX: 0000000000000000  RBX: ffff97466ecfda34  RCX: 0000000000000000
    RDX: 0000000000000010  RSI: 0000000000000000  RDI: ffff97466ecfd230
    RBP: ffff9747d1403ef8   R8: 0000000000000000   R9: 00000000000009de
    R10: 000000003b9aca00  R11: 000002afc3e979c0  R12: 0000000000000000
    R13: ffff97466ecfd230  R14: 0000000000000000  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#34 [ffff9747d1403eb0] try_to_wake_up at ffffffff86cdb617
#35 [ffff9747d1403f00] wake_up_process at ffffffff86cdb8e5
#36 [ffff9747d1403f10] hrtimer_wakeup at ffffffff86cca2f2
#37 [ffff9747d1403f20] __hrtimer_run_queues at ffffffff86ccaa8e
#38 [ffff9747d1403f78] hrtimer_interrupt at ffffffff86ccafef
#39 [ffff9747d1403fc0] local_apic_timer_interrupt at ffffffff86c5ccfb
#40 [ffff9747d1403fd8] smp_apic_timer_interrupt at ffffffff873979c3
#41 [ffff9747d1403ff0] apic_timer_interrupt at ffffffff87393efa
--- <IRQ stack> ---
#42 [ffffffff87803e00] apic_timer_interrupt at ffffffff87393efa
    RIP: ffffffff86d58cfd  RSP: 0000000000000000  RFLAGS: ffff9747d14161e0
    RAX: 7fffffffffffffff  RBX: 0000024c1723eb40  RCX: 0000024c15de15f9
    RDX: ffff9747d1411200  RSI: 7fffffffffffffff  RDI: ed63a856222aa83a
    RBP: ffff9747d1415f80   R8: ffffffff86d10ed0   R9: ffffffff87803ea0
    R10: ffffffff86cca972  R11: ffffffff87803e40  R12: 0000024c1723eb40
    R13: 0000000000000000  R14: 0000024c15d439c0  R15: ed63a856222aa83a
    ORIG_RAX: ffffffff87803ea8  CS: 24c15de15f9  SS: ffffffffffffffed
bt: WARNING: possibly bogus exception frame

crash> kmem -i
                 PAGES        TOTAL      PERCENTAGE
    TOTAL MEM  1298232         5 GB         ----
         FREE   572562       2.2 GB   44% of TOTAL MEM
         USED   725670       2.8 GB   55% of TOTAL MEM
       SHARED   135119     527.8 MB   10% of TOTAL MEM
      BUFFERS      518         2 MB    0% of TOTAL MEM
       CACHED   263000         1 GB   20% of TOTAL MEM
         SLAB   265527         1 GB   20% of TOTAL MEM

   TOTAL HUGE        0            0         ----
    HUGE FREE        0            0    0% of TOTAL HUGE

   TOTAL SWAP  3145727        12 GB         ----
    SWAP USED        0            0    0% of TOTAL SWAP
    SWAP FREE  3145727        12 GB  100% of TOTAL SWAP

 COMMIT LIMIT  3794843      14.5 GB         ----
    COMMITTED  1049288         4 GB   27% of TOTAL LIMIT
A VM running an older kernel (CentOS 7.5) still crashes.

Guest kernel version 3.10.0-1062.9.1.el7.x86_64 #1 SMP Fri Dec 6

[ 8850.226379] BUG: unable to handle kernel paging request at ffffdc8d8184f2c0
[ 8850.226990] IP: [<ffffffffa7c47345>] __check_object_size+0x195/0x250
[ 8850.227332] PGD 1d0f85067 PUD ffffffffa8600000
[ 8850.227332] Oops: 0000 [#1] SMP
[ 8850.227332] Modules linked in: xt_multiport ipt_rpfilter xt_set iptable_raw ip_set_hash_ip ip_set_hash_net ip_set ipip tunnel4 ip_tunnel veth ipt_REJECT nf_reject_ipv4 nf_conntrack_netlink nfnetlink xt_nat xt_statistic ip_vs_sh ip_vs_wrr ip_vs_rr ip6_tables iptable_mangle xt_comment xt_mark ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter xt_conntrack nf_nat ip_vs nf_conntrack overlay(T) sunrpc dm_mirror dm_region_hash dm_log dm_mod snd_hda_codec_generic crc32_pclmul ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device aesni_intel lrw gf128mul glue_helper ablk_helper snd_pcm sg iTCO_wdt joydev cryptd iTCO_vendor_support ppdev pcspkr i2c_i801 snd_timer snd virtio_rng lpc_ich soundcore
[ 8850.227332] parport_pc parport binfmt_misc br_netfilter bridge stp llc ip_tables xfs libcrc32c sr_mod cdrom qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm virtio_console virtio_blk ahci drm libahci crct10dif_pclmul crct10dif_common libata crc32c_intel serio_raw virtio_net virtio_pci virtio_ring virtio drm_panel_orientation_quirks ptp_kvm ptp pps_core
[ 8850.227332] CPU: 1 PID: 11853 Comm: calico-node Kdump: loaded Tainted: G ------------ T 3.10.0-1062.9.1.el7.x86_64 #1
[ 8850.227332] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
[ 8850.227332] task: ffff8a1047afa0e0 ti: ffff8a0f213c8000 task.ti: ffff8a0f213c8000
[ 8850.227332] RIP: 0010:[<ffffffffa7c47345>] [<ffffffffa7c47345>] __check_object_size+0x195/0x250
[ 8850.227332] RSP: 0018:ffff8a0f213cbef0 EFLAGS: 00010286
[ 8850.227332] RAX: ffffdc8d8184f2c0 RBX: ffff8a0f213cbf20 RCX: 000000000000000c
[ 8850.227332] RDX: 000075f0c0000000 RSI: ffff8a1091787000 RDI: ffff8a0fa13cbf20
[ 8850.227332] RBP: ffff8a0f213cbf10 R08: 0000080c9a4eff5c R09: 0000000000002292
[ 8850.227332] R10: 000000000000003c R11: 0000000000000212 R12: 0000000000000010
[ 8850.227332] R13: 0000000000000000 R14: ffff8a0f213cbf30 R15: 0000000000000000
[ 8850.227332] FS: 00007f9ade33b700(0000) GS:ffff8a1091500000(0000) knlGS:0000000000000000
[ 8850.227332] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8850.227332] CR2: ffffdc8d8184f2c0 CR3: 000000007656e000 CR4: 0000000000340fe0
[ 8850.227332] Call Trace:
[ 8850.227332] [<ffffffffa7acb195>] SyS_nanosleep+0x35/0xb0
[ 8850.227332] [<ffffffffa818dede>] system_call_fastpath+0x25/0x2a
[ 8850.227332] Code: 8b 15 f0 0c 9d 00 48 01 d8 72 0e 48 c7 c2 00 00 00 80 48 2b 15 8d e2 9f 00 48 01 d0 48 c1 e8 0c 48 c1 e0 06 48 03 05 6b e2 9f 00 <48> 8b 10 80 e6 80 0f 85 95 00 00 00 48 89 c2 48 8b 02 a8 80 74
[ 8850.227332] RIP [<ffffffffa7c47345>] __check_object_size+0x195/0x250
[ 8850.227332] RSP <ffff8a0f213cbef0>
[ 8850.227332] CR2: ffffdc8d8184f2c0
I tried a 4.4 kernel from ELRepo.

Guest VM kernel version 4.4.235-1.el7.elrepo.x86_64

The guest crashed again, but with a different error. No debug kernel is provided, so I can't get a backtrace for this one.

[ 9205.717160] nscd[870]: segfault at 0 ip 00007fae3fb77eb3 sp 00007ffe4e8b17a0 error 6 in libc-2.17.so[7fae3fa79000+1c3000]
[ 9205.719072] general protection fault: 0000 [#1] SMP
[ 9205.719847] Modules linked in: xt_conntrack ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink xt_addrtype iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack overlay sunrpc dm_mirror dm_region_hash dm_log dm_mod iTCO_wdt iTCO_vendor_support crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm aesni_intel lrw gf128mul glue_helper ablk_helper cryptd input_leds pcspkr i2c_i801 snd_timer sg snd lpc_ich virtio_rng mfd_core virtio_balloon joydev soundcore 8250_fintek shpchp binfmt_misc ip_tables xfs libcrc32c sr_mod cdrom virtio_blk virtio_console virtio_net crc32c_intel ahci libahci serio_raw libata qxl drm_kms_helper syscopyarea sysfillrect sysimgblt
[ 9205.720015] fb_sys_fops ttm virtio_pci virtio_ring virtio drm
[ 9205.720015] CPU: 0 PID: 4 Comm: kworker/0:0 Not tainted 4.4.235-1.el7.elrepo.x86_64 #1
[ 9205.720015] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
[ 9205.720015] Workqueue: events qxl_fb_work [qxl]
[ 9205.720015] task: ffff88018094c2c0 ti: ffff8801809dc000 task.ti: ffff8801809dc000
[ 9205.720015] RIP: 0010:[<ffffffff811f4d3b>] [<ffffffff811f4d3b>] kmem_cache_alloc+0x7b/0x1e0
[ 9205.720015] RSP: 0018:ffff8801809dfa58 EFLAGS: 00010286
[ 9205.720015] RAX: 0000000000000000 RBX: 00000000024000c0 RCX: 000000000001cf81
[ 9205.720015] RDX: 000000000001cf80 RSI: 00000000024000c0 RDI: 0000000000019fa0
[ 9205.720015] RBP: ffff8801809dfa88 R08: ffff880186419fa0 R09: ffff88017f088400
[ 9205.720015] R10: ffff88017d9bc468 R11: ffff8801809dfa80 R12: ffff7228e8de8948
[ 9205.720015] R13: 00000000024000c0 R14: ffff880181885d00 R15: ffff880181885d00
[ 9205.720015] FS: 0000000000000000(0000) GS:ffff880186400000(0000) knlGS:0000000000000000
[ 9205.720015] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9205.720015] CR2: 0000000000000000 CR3: 000000017f48f000 CR4: 00000000003406f0
[ 9205.720015] Stack:
[ 9205.720015] ffffffff81231315 ffff8801809dfae8 ffff880181815800 ffff880181815800
[ 9205.720015] 0000000000000000 ffff88017f088400 ffff8801809dfab0 ffffffff81231315
[ 9205.720015] 0000000000200000 ffff880181815800 0000000000021000 ffff8801809dfac0
[ 9205.720015] Call Trace:
[ 9205.720015] [<ffffffff81231315>] ? __d_alloc+0x25/0x180
[ 9205.720015] [<ffffffff81231315>] __d_alloc+0x25/0x180
[ 9205.720015] [<ffffffff8123155e>] d_alloc_pseudo+0xe/0x10
[ 9205.720015] [<ffffffff811b0420>] __shmem_file_setup.part.0+0x70/0x210
[ 9205.720015] [<ffffffff811b07cb>] shmem_file_setup+0x2b/0x30
[ 9205.720015] [<ffffffffa00083fb>] drm_gem_object_init+0x2b/0x40 [drm]
[ 9205.720015] [<ffffffffa0062a8c>] qxl_bo_create+0x7c/0x190 [qxl]
[ 9205.720015] [<ffffffffa006760e>] ? qxl_release_list_add+0x5e/0xa0 [qxl]
[ 9205.720015] [<ffffffffa0064024>] qxl_alloc_bo_reserved+0x44/0xb0 [qxl]
[ 9205.720015] [<ffffffffa0064ecc>] qxl_image_alloc_objects+0xac/0x140 [qxl]
[ 9205.720015] [<ffffffffa006552d>] qxl_draw_opaque_fb+0xbd/0x3e0 [qxl]
[ 9205.720015] [<ffffffff810b96d2>] ? account_entity_dequeue+0xb2/0xd0
[ 9205.720015] [<ffffffffa0061d31>] qxl_fb_dirty_flush+0x181/0x230 [qxl]
[ 9205.720015] [<ffffffff8172f820>] ? __schedule+0x270/0x840
[ 9205.720015] [<ffffffffa0061df9>] qxl_fb_work+0x19/0x20 [qxl]
[ 9205.720015] [<ffffffff8109cde4>] process_one_work+0x194/0x470
[ 9205.720015] [<ffffffff8109d219>] worker_thread+0x159/0x510
[ 9205.720015] [<ffffffff8109d0c0>] ? process_one_work+0x470/0x470
[ 9205.720015] [<ffffffff810a3679>] kthread+0xd9/0xf0
[ 9205.720015] [<ffffffff8172f84d>] ? __schedule+0x29d/0x840
[ 9205.720015] [<ffffffff810a35a0>] ? kthread_create_on_node+0x1c0/0x1c0
[ 9205.720015] [<ffffffff817348e2>] ret_from_fork+0x42/0x80
[ 9205.720015] [<ffffffff810a35a0>] ? kthread_create_on_node+0x1c0/0x1c0
[ 9205.720015] Code: 08 65 4c 03 05 17 84 e1 7e 49 83 78 10 00 4d 8b 20 0f 84 28 01 00 00 4d 85 e4 0f 84 1f 01 00 00 49 63 46 20 49 8b 3e 48 8d 4a 01 <49> 8b 1c 04 4c 89 e0 65 48 0f c7 0f 0f 94 c0 84 c0 74 bb 49 63
[ 9205.720015] RIP [<ffffffff811f4d3b>] kmem_cache_alloc+0x7b/0x1e0
[ 9205.720015] RSP <ffff8801809dfa58>
Got another crash with a different backtrace.

[ 267.404272] BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
[ 267.405200] IP: [<ffffffffad2db801>] try_to_wake_up+0x2c1/0x390
[ 267.405200] PGD 6cd8c067 PUD 6cd92067 PMD 0
[ 267.405200] Oops: 0000 [#1] SMP
[ 267.405200] Modules linked in: xt_multiport ipt_rpfilter xt_set iptable_raw ip_set_hash_ip ip_set_hash_net ip_set ipip tunnel4 ip_tunnel veth ipt_REJECT nf_reject_ipv4 nf_conntrack_netlink nfnetlink xt_nat xt_statistic ip_vs_sh ip_vs_wrr ip_vs_rr ip6_tables iptable_mangle xt_comment xt_mark ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter xt_conntrack nf_nat ip_vs nf_conntrack overlay(T) sunrpc dm_mirror dm_region_hash dm_log dm_mod snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device iTCO_wdt iTCO_vendor_support ppdev snd_pcm sg pcspkr virtio_rng joydev snd_timer snd soundcore lpc_ich i2c_i801 i6300esb parport_pc parport binfmt_misc br_netfilter bridge stp llc ip_tables sr_mod cdrom
[ 267.405200] xfs libcrc32c qxl drm_kms_helper ahci syscopyarea sysfillrect sysimgblt fb_sys_fops libahci ttm serio_raw libata drm virtio_net virtio_blk virtio_console net_failover failover virtio_pci virtio_ring virtio drm_panel_orientation_quirks ptp_kvm ptp pps_core
[ 267.405200] CPU: 0 PID: 4008 Comm: kube-apiserver Kdump: loaded Tainted: G ------------ T 3.10.0-1127.19.1.el7.x86_64 #1
[ 267.405200] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
[ 267.405200] task: ffff94460a32c1c0 ti: ffff944608778000 task.ti: ffff944608778000
[ 267.405200] RIP: 0010:[<ffffffffad2db801>] [<ffffffffad2db801>] try_to_wake_up+0x2c1/0x390
[ 267.405200] RSP: 0018:ffff94460877bd60 EFLAGS: 00010002
[ 267.405200] RAX: ffff9445bd43c1c0 RBX: ffff9444b5d2eaa4 RCX: 0000000000000018
[ 267.405200] RDX: ffff94461151b6e0 RSI: ffff9444b5d2e2c0 RDI: ffff9444b5d2e2c0
[ 267.404228] ------------[ cut here ]------------
[ 267.404228] WARNING: CPU: 1 PID: 0 at include/linux/uaccess.h:17 is_prefetch.isra.23+0x2a9/0x2e0
[ 267.404228] Modules linked in: xt_multiport ipt_rpfilter xt_set iptable_raw ip_set_hash_ip ip_set_hash_net ip_set ipip tunnel4 ip_tunnel veth ipt_REJECT nf_reject_ipv4 nf_conntrack_netlink nfnetlink xt_nat xt_statistic ip_vs_sh ip_vs_wrr ip_vs_rr ip6_tables iptable_mangle xt_comment xt_mark ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter xt_conntrack nf_nat ip_vs nf_conntrack overlay(T) sunrpc dm_mirror dm_region_hash dm_log dm_mod snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device iTCO_wdt iTCO_vendor_support ppdev snd_pcm sg pcspkr virtio_rng joydev snd_timer snd soundcore lpc_ich i2c_i801 i6300esb parport_pc parport binfmt_misc br_netfilter bridge stp llc ip_tables sr_mod cdrom
[ 267.404228] xfs libcrc32c qxl drm_kms_helper ahci syscopyarea sysfillrect sysimgblt fb_sys_fops libahci ttm serio_raw libata drm virtio_net virtio_blk virtio_console net_failover failover virtio_pci virtio_ring virtio drm_panel_orientation_quirks ptp_kvm ptp pps_core
[ 267.404228] CPU: 1 PID: 0 Comm: \x80\x97\x98\xad\xff\xff\xff\xff\x10 Kdump: loaded Tainted: G ------------ T 3.10.0-1127.19.1.el7.x86_64 #1
[ 267.404228] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
[ 267.404228] Call Trace:
[ 267.404228] ---[ end trace f55d3283bccbb6f4 ]---
One more crash in the VM:

[ 5570.272849] BUG: unable to handle kernel paging request at 000000014f07b025
[ 5570.272849] IP: [<ffffffff83276cc7>] search_extable+0x27/0x50
[ 5570.272849] PGD 17b3b0067 PUD 0
[ 5570.272849] Oops: 0000 [#1] SMP
[ 5570.272849] Modules linked in: ipt_rpfilter xt_set iptable_raw ip_set_hash_net ip_set_hash_ip ip_set ipip tunnel4 ip_tunnel xt_nat xt_statistic ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs iptable_mangle xt_comment xt_mark xt_conntrack ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink xt_addrtype iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack br_netfilter bridge stp llc overlay(T) sunrpc dm_mirror dm_region_hash dm_log dm_mod snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm sg iTCO_wdt snd_timer iTCO_vendor_support pcspkr ppdev joydev snd soundcore i2c_i801 lpc_ich i6300esb virtio_rng virtio_balloon parport_pc parport binfmt_misc ip_tables xfs libcrc32c sr_mod cdrom qxl drm_kms_helper syscopyarea
[ 5570.272849] sysfillrect sysimgblt fb_sys_fops ttm virtio_blk ahci libahci virtio_console drm virtio_net net_failover libata failover serio_raw drm_panel_orientation_quirks virtio_pci virtio_ring virtio ptp_kvm ptp pps_core
[ 5570.272849] CPU: 0 PID: 0 Comm: swapper/0 Kdump: loaded Tainted: G ------------ T 3.10.0-1127.19.1.el7.x86_64 #1
[ 5570.272849] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
[ 5570.272849] task: ffffffff83e18480 ti: ffffffff83e00000 task.ti: ffffffff83e00000
[ 5570.272849] RIP: 0010:[<ffffffff83276cc7>] [<ffffffff83276cc7>] search_extable+0x27/0x50
[ 5570.272849] RSP: 0018:ffff8e7e059fc808 EFLAGS: 00010056
[ 5570.272849] RAX: 000000014f07b025 RBX: ffffffffc06fc100 RCX: ffffffff8327ef01
[ 5570.272849] RDX: ffffffff83276cc7 RSI: 000000014f07b025 RDI: 000000014f07b025
[ 5570.272849] RBP: ffff8e7e059fc808 R08: 0000000000030001 R09: 00000000000015c3
[ 5570.272849] R10: 000000003b9aca00 R11: 000005749bbfe840 R12: ffffffff83276cc7
[ 5570.272849] R13: 000000014f07b025 R14: ffffffff83e18480 R15: 0000000000000000
[ 5570.272849] FS: 0000000000000000(0000) GS:ffff8e7e06400000(0000) knlGS:0000000000000000
[ 5570.272849] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 5570.272849] CR2: 000000014f07b025 CR3: 000000017b706000 CR4: 00000000000007f0
[ 5570.272849] Call Trace:
[ 5570.272849] Code: c3 0f 1f 00 66 66 66 66 90 55 48 39 f7 48 89 e5 76 0b eb 2d 48 8d 78 08 48 39 fe 72 24 48 89 f0 48 29 f8 48 c1 f8 04 48 8d 04 c7 <48> 63 08 48 01 c1 48 39 ca 77 de 73 0b 48 8d 70 f8 48 39 fe 73
[ 5570.272849] RIP [<ffffffff83276cc7>] search_extable+0x27/0x50
[ 5570.272849] RSP <ffff8e7e059fc808>
[ 5570.272849] CR2: 000000014f07b025
The host machine was upgraded to kernel 5.7.17-200.fc32.

One of the nodes was upgraded to kernel 5.8.6 but still crashes:

[ 3492.933383] BUG: unable to handle page fault for address: 00007f5e545f1ca8
[ 3492.934146] #PF: supervisor read access in kernel mode
[ 3492.934678] #PF: error_code(0x0000) - not-present page
[ 3492.935232] PGD 11b3a0067 P4D 11b3a0067 PUD 0
[ 3492.935695] Oops: 0000 [#1] SMP NOPTI
[ 3492.936081] CPU: 1 PID: 0 Comm: swapper/1 Kdump: loaded Not tainted 5.8.6-2.el7.elrepo.x86_64 #1
[ 3492.937055] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
[ 3492.937947] RIP: 0010:rb_insert_color+0x14/0x120
[ 3492.938442] Code: c0 75 eb 4c 89 c0 c3 45 31 c0 eb f7 66 2e 0f 1f 84 00 00 00 00 00 48 8b 07 48 85 c0 0f 84 b0 00 00 00 48 8b 10 f6 c2 01 75 5b <48> 8b 4a 08 48 39 c1 74 53 48 85 c9 74 05 f6 01 01 74 72 48 8b 48
[ 3492.940378] RSP: 0018:ffffc90000073dc0 EFLAGS: 00010046
[ 3492.940924] RAX: ffffc90000c6fd18 RBX: ffff88818231f440 RCX: ffffc90000c6fd20
[ 3492.941483] RDX: 00007f5e545f1ca0 RSI: ffff88818231f460 RDI: ffff88818231f960
[ 3492.942139] RBP: ffffc90000073dd0 R08: ffff88818231f460 R09: 7fffffffffffffff
[ 3492.942709] R10: 0000032d40ffd98f R11: 00000000001f2e14 R12: 0000000000000000
[ 3492.943290] R13: 0000000000000001 R14: 000000000000000a R15: ffff88818231f440
[ 3492.943869] FS: 0000000000000000(0000) GS:ffff888182300000(0000) knlGS:0000000000000000
[ 3492.944518] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3492.944993] CR2: 00007f5e545f1ca8 CR3: 000000011b3da000 CR4: 00000000000006e0
[ 3492.945552] Call Trace:
[ 3492.945783] ? timerqueue_add+0x68/0xb0
[ 3492.946122] enqueue_hrtimer+0x3d/0x90
[ 3492.946433] hrtimer_start_range_ns+0x196/0x310
[ 3492.946808] ? hrtimer_try_to_cancel+0x2c/0x110
[ 3492.947193] tick_nohz_restart_sched_tick+0xa9/0xc0
[ 3492.947585] tick_nohz_idle_exit+0xac/0xd0
[ 3492.947927] do_idle+0x156/0x270
[ 3492.948205] cpu_startup_entry+0x20/0x30
[ 3492.948606] start_secondary+0x159/0x1a0
[ 3492.949040] secondary_startup_64+0xa4/0xb0
[ 3492.949469] Modules linked in: xt_multiport ipt_rpfilter xt_set iptable_raw ip_set_hash_net ip_set_hash_ip ip_set veth ipip tunnel4 ip_tunnel xt_nat xt_statistic ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs iptable_mangle ip6_tables xt_comment xt_mark xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 overlay sunrpc dm_mirror dm_region_hash dm_log dm_mod snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm i2c_i801 snd_timer snd soundcore iTCO_wdt iTCO_vendor_support lpc_ich sg i6300esb i2c_smbus pcspkr mfd_core input_leds virtio_rng virtio_balloon joydev qemu_fw_cfg binfmt_misc ip_tables xfs libcrc32c sr_mod cdrom serio_raw virtio_blk virtio_console virtio_net net_failover failover ahci libahci libata qxl drm_ttm_helper ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops virtio_pci drm virtio_ring virtio ptp_kvm
[ 3492.949509] ptp pps_core
[ 3492.960266] CR2: 00007f5e545f1ca8

This VM does not have swap configured, but it still had free RAM just before it crashed.
08:15:01 PM   1740616   1103972     38.81      2068    733808   2769832     97.37    469208    466536       100
08:16:01 PM   1740680   1103908     38.81      2068    733824   2767844     97.30    469220    466552       208
08:17:01 PM   1740648   1103940     38.81      2068    733860   2767844     97.30    469260    466588       204
08:18:01 PM   1741320   1103268     38.78      2068    733872   2767844     97.30    469284    466596       100
08:19:01 PM   1741096   1103492     38.79      2068    733888   2767844     97.30    469320    466612       204
08:20:01 PM   1740064   1104524     38.83      2068    734004   2770600     97.40    469344    466728       104
08:21:01 PM   1740080   1104508     38.83      2068    734044   2768100     97.31    469344    466768       200
08:21:01 PM kbmemfree kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit  kbactive   kbinact   kbdirty
08:22:02 PM   1740080   1104508     38.83      2068    734060   2768100     97.31    469520    466720       200
Average:      1753835   1090753     38.34      2068    727251   2747707     96.59    463350    463245       158

08:22:51 PM       LINUX RESTART

08:23:01 PM kbmemfree kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit  kbactive   kbinact   kbdirty
08:24:01 PM   1888068    956520     33.63      2068    679436   1751044     61.56    380352    445008       200
08:25:01 PM   1792372   1052216     36.99      2068    713184   2862964    100.65    447760    449488       100
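To read the memory state right before the reboot out of a longer sar log, a small parser can help. This is only a sketch under a few assumptions: the input uses the same ten memory columns and 12-hour timestamps as the output above, and the function name `last_sample_before_restart` is made up for this illustration.

```python
def last_sample_before_restart(sar_text):
    """Return the last data row seen before the first 'LINUX RESTART' line.

    Assumes rows of the form
    HH:MM:SS AM|PM kbmemfree kbmemused %memused kbbuffers kbcached
                   kbcommit %commit kbactive kbinact kbdirty
    Header rows and the 'Average:' row are skipped.
    Returns None when there is no restart marker or no data row before it.
    """
    last = None
    for line in sar_text.splitlines():
        if "LINUX RESTART" in line:
            return last
        fields = line.split()
        # A data row has 2 timestamp fields + 10 numeric columns.
        if len(fields) == 12 and fields[2].isdigit():
            last = {
                "time": " ".join(fields[:2]),
                "kbmemfree": int(fields[2]),
                "kbmemused": int(fields[3]),
                "pct_memused": float(fields[4]),
                "pct_commit": float(fields[8]),
            }
    return None
```

Applied to the output above, it picks the 08:22:02 PM row, which shows about 1.7 GB free (kbmemfree 1740080) less than a minute before the restart.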
Another VM crash; this VM runs kernel 5.8.6:

[ 8727.211401] BUG: unable to handle page fault for address: ffff888182317bc0
[ 8727.212058] #PF: supervisor read access in kernel mode
[ 8727.212497] #PF: error_code(0x0009) - reserved bit violation
[ 8727.212954] PGD 2c01067 P4D 2c01067 PUD 2c04067 PMD 80000001822001e3
[ 8727.213478] Thread overran stack, or stack corrupted
[ 8727.213875] Oops: 0009 [#1] SMP NOPTI
[ 8727.214182] CPU: 1 PID: 0 Comm: swapper/1 Kdump: loaded Not tainted 5.8.6-2.el7.elrepo.x86_64 #1
[ 8727.214873] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
[ 8727.215585] RIP: 0010:exc_page_fault+0x1d/0x160
[ 8727.215967] Code: 70 ff cc cc cc cc cc cc cc cc cc cc cc 55 48 89 e5 41 57 41 56 49 89 f6 41 55 41 54 49 89 fc 53 0f 20 d0 66 66 66 90 49 89 c5 <65> 48 8b 04 25 c0 7b 01 00 48 8b 80 f8 07 00 00 0f 0d 48 78 e9 5a
[ 8727.217449] RSP: 0018:fffffe000003ad60 EFLAGS: 00010083
[ 8727.217900] RAX: ffff888182317bc0 RBX: 0000000000000000 RCX: ffffffff81a00fc7
[ 8727.218551] RDX: 0000000000000000 RSI: 0000000000000009 RDI: fffffe000003ad98
[ 8727.219133] RBP: fffffe000003ad88 R08: 0000000000000000 R09: 0000000000000000
[ 8727.219710] R10: 0000000000000000 R11: 0000000000000000 R12: fffffe000003ad98
[ 8727.220293] R13: ffff888182317bc0 R14: 0000000000000009 R15: 0000000000000000
[ 8727.220883] FS: 0000000000000000(0000) GS:ffff888182300000(0000) knlGS:0000000000000000
[ 8727.221532] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8727.221998] CR2: ffff888182317bc0 CR3: 000000011a7d4000 CR4: 00000000000006e0
[ 8727.222579] Call Trace:
[ 8727.222785] <#DF>
[ 8727.222966] asm_exc_page_fault+0x1e/0x30
[ 8727.223300] RIP: 0010:exc_page_fault+0x1d/0x160
[ 8727.223663] Code: 70 ff cc cc cc cc cc cc cc cc cc cc cc 55 48 89 e5 41 57 41 56 49 89 f6 41 55 41 54 49 89 fc 53 0f 20 d0 66 66 66 90 49 89 c5 <65> 48 8b 04 25 c0 7b 01 00 48 8b 80 f8 07 00 00 0f 0d 48 78 e9 5a
[ 8727.225238] RSP: 0018:fffffe000003ae40 EFLAGS: 00010093
[ 8727.225781] RAX: ffff888182317bc0 RBX: 0000000000000000 RCX: ffffffff81a00fc7
[ 8727.226503] RDX: 0000000000000000 RSI: 0000000000000009 RDI: fffffe000003ae78
[ 8727.227249] RBP: fffffe000003ae68 R08: 0000000000000000 R09: 0000000000000000
[ 8727.227985] R10: 0000000000000000 R11: 0000000000000000 R12: fffffe000003ae78
[ 8727.228591] R13: ffff888182317bc0 R14: 0000000000000009 R15: 0000000000000000
[ 8727.229176] ? native_iret+0x7/0x7
[ 8727.229459] asm_exc_page_fault+0x1e/0x30
[ 8727.229794] RIP: 0010:exc_double_fault+0x11/0x160
[ 8727.230191] Code: ea 4c 89 e6 48 c7 c7 11 5d 0d 82 e8 59 a7 6a ff eb 92 0f 1f 80 00 00 00 00 55 48 89 e5 41 56 41 55 49 89 f5 41 54 49 89 fc 53 <65> 48 8b 1c 25 c0 7b 01 00 0f 20 d0 66 66 66 90 49 89 c6 48 8b 87
[ 8727.232755] RSP: 0018:fffffe000003af28 EFLAGS: 00010086
[ 8727.233636] RAX: 0000000082300000 RBX: 0000000000000001 RCX: 00000000c0000101
[ 8727.234660] RDX: 00000000ffff8881 RSI: 0000000000000000 RDI: fffffe000003af58
[ 8727.235670] RBP: fffffe000003af48 R08: 0000000000000000 R09: 0000000000000000
[ 8727.236677] R10: 0000000000000000 R11: 0000000000000000 R12: fffffe000003af58
[ 8727.237687] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 8727.238685] asm_exc_double_fault+0x1e/0x30
[ 8727.239457] RIP: 0010:error_entry+0xc/0xf0
[ 8727.240228] Code: 0c 65 48 0f b3 1c 25 96 b8 02 00 eb 05 49 0f ba ee 3f 41 0f 22 de e9 2a fd ff ff 0f 1f 00 fc 56 48 8b 74 24 08 48 89 7c 24 08 <52> 51 50 41 50 41 51 41 52 41 53 53 55 41 54 41 55 41 56 41 57 56
[ 8727.242711] RSP: 0018:fffffe000003a000 EFLAGS: 00010083
[ 8727.243847] RAX: ffff888182317bc0 RBX: 0000000000000000 RCX: ffffffff81a00fc7
[ 8727.245204] RDX: 0000000000000000 RSI: ffffffff81a00ac8 RDI: fffffe000003a078
[ 8727.246312] RBP: fffffe000003a068 R08: 0000000000000000 R09: 0000000000000000
[ 8727.247332] R10: 0000000000000000 R11: 0000000000000000 R12: fffffe000003a078
[ 8727.248335] R13: ffff888182317bc0 R14: 0000000000000009 R15: 0000000000000000
[ 8727.249354] ? native_iret+0x7/0x7
[ 8727.250066] ? asm_exc_page_fault+0x8/0x30
[ 8727.250834] </#DF>
[ 8727.251448] WARNING: stack recursion on stack type 5
[ 8727.251448] Modules linked in: xt_multiport ipt_rpfilter xt_set iptable_raw ip_set_hash_ip ip_set_hash_net ip_set veth ipip tunnel4 ip_tunnel xt_nat xt_statistic ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs iptable_mangle xt_comment xt_mark xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 overlay sunrpc dm_mirror dm_region_hash dm_log dm_mod snd_hda_codec_generic ledtrig_audio iTCO_wdt iTCO_vendor_support snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm pcspkr input_leds sg snd_timer snd soundcore i2c_i801 lpc_ich mfd_core i2c_smbus virtio_rng i6300esb virtio_balloon joydev qemu_fw_cfg binfmt_misc ip_tables xfs libcrc32c sr_mod cdrom virtio_blk virtio_console ahci virtio_net net_failover failover libahci serio_raw libata qxl drm_ttm_helper ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm virtio_pci virtio_ring virtio ptp_kvm ptp
[ 8727.251496] pps_core
[ 8727.264222] CR2: ffff888182317bc0
Another crash. I had add a VM running Fedora 32, kernel 5.8.4-200.fc32.x86_64. I have no idea about which part is causing the problem :( [ 2263.148807] BUG: kernel NULL pointer dereference, address: 000000000000007d [ 2263.149716] #PF: supervisor instruction fetch in kernel mode [ 2263.150648] #PF: error_code(0x0010) - not-present page [ 2263.151302] PGD 597ef067 P4D 597ef067 PUD 597ee067 PMD 0 [ 2263.151989] Oops: 0010 [#1] SMP NOPTI [ 2263.152456] CPU: 1 PID: 0 Comm: swapper/1 Kdump: loaded Not tainted 5.8.4-200.fc32.x86_64 #1 [ 2263.153241] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014 [ 2263.153945] RIP: 0010:0x7d [ 2263.154214] Code: Bad RIP value. [ 2263.154549] RSP: 0018:ffffbbf6000a8f18 EFLAGS: 00010002 [ 2263.155094] RAX: 0000000000000000 RBX: ffff9bfc84b1d580 RCX: 0000000000000007 [ 2263.155824] RDX: 0000000000000001 RSI: 0000000000000002 RDI: ffffbbf600c1bcc0 [ 2263.156528] RBP: ffff9bfc84b1d540 R08: 0000000000000191 R09: 0000000000000078 [ 2263.157170] R10: 0000000000000000 R11: 0000000000000000 R12: 000000000000007d [ 2263.157778] R13: ffff9bfc84b1d540 R14: ffff9bfc84b1d638 R15: ffffbbf600c1bcc0 [ 2263.158429] FS: 0000000000000000(0000) GS:ffff9bfc84b00000(0000) knlGS:0000000000000000 [ 2263.159259] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 2263.159865] CR2: 000000000000007d CR3: 00000000597e2000 CR4: 0000000000340ee0 [ 2263.160613] Call Trace: [ 2263.160856] <IRQ> [ 2263.161063] ? __hrtimer_run_queues+0x118/0x280 [ 2263.161547] ? hrtimer_interrupt+0x10e/0x280 [ 2263.162001] ? __sysvec_apic_timer_interrupt+0x61/0x100 [ 2263.162549] ? asm_call_on_stack+0x12/0x20 [ 2263.162980] </IRQ> [ 2263.163210] ? sysvec_apic_timer_interrupt+0x6f/0x90 [ 2263.163616] ? asm_sysvec_apic_timer_interrupt+0x12/0x20 [ 2263.164076] ? __sched_text_end+0x3/0x3 [ 2263.164472] ? native_safe_halt+0xe/0x10 [ 2263.164810] ? default_idle+0x1a/0x140 [ 2263.165150] ? do_idle+0x1f3/0x2a0 [ 2263.165453] ? 
cpu_startup_entry+0x19/0x20 [ 2263.165791] ? start_secondary+0x144/0x170 [ 2263.166143] ? secondary_startup_64+0xb6/0xc0 [ 2263.166542] Modules linked in: xt_multiport xt_set iptable_filter ipt_rpfilter iptable_mangle iptable_nat iptable_raw ip_set_hash_net ip_set_hash_ip ip_set veth ipip tunnel4 ip_tunnel nf_conntrack_netlink xt_addrtype xt_nat xt_statistic ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs xt_MASQUERADE xt_conntrack xt_comment nft_counter xt_mark nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink rfkill iTCO_wdt intel_pmc_bxt iTCO_vendor_support crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl i2c_i801 joydev i2c_smbus snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hda_core virtio_balloon snd_hwdep pvpanic lpc_ich i6300esb snd_pcm snd_timer snd soundcore sunrpc br_netfilter bridge stp llc overlay ip_tables xfs qxl drm_ttm_helper ttm drm_kms_helper cec drm crc32c_intel serio_raw virtio_blk virtio_console xhci_pci xhci_pci_renesas virtio_net net_failover failover qemu_fw_cfg ptp_kvm fuse [ 2263.173371] CR2: 000000000000007d
Created attachment 1713892: virsh dumpxml config for the FC32 VM in comment #9. The other VMs (CentOS 7.8) use similar settings.
Similar to comment #8 These two VM do not have swap configured, they are different VM 1) [10499.651778] BUG: unable to handle page fault for address: ffffffff91c00ac0 [10499.652391] #PF: supervisor instruction fetch in kernel mode [10499.652846] #PF: error_code(0x0010) - not-present page [10499.653256] PGD 240e067 P4D 240e067 PUD 240f063 PMD 0 [10499.653665] Thread overran stack, or stack corrupted [10499.654063] Oops: 0010 [#1] SMP NOPTI [10499.654379] CPU: 0 PID: 0 Comm: swapper/0 Kdump: loaded Not tainted 5.8.6-2.el7.elrepo.x86_64 #1 [10499.655119] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014 [10499.655814] RIP: 0010:0xffffffff91c00ac0 [10499.656148] Code: Bad RIP value. [10499.656431] WARNING: kernel stack frame pointer at 00000000984c746a in swapper/0:0 has bad value 0000000000000000 [10499.656432] unwind stack type:0 next_sp:0000000000000000 mask:0x20 graph_idx:0 [10499.656432] 00000000984c746a: 0000000000000000 ... [10499.656470] BUG: kernel NULL pointer dereference, address: 0000000000000000 [10499.656471] #PF: supervisor instruction fetch in kernel mode [10499.656471] #PF: error_code(0x0010) - not-present page [10499.656471] PGD 0 P4D 0 [10499.656472] Thread overran stack, or stack corrupted [10499.656472] Oops: 0010 [#2] SMP NOPTI [10499.656472] CPU: 0 PID: 0 Comm: swapper/0 Kdump: loaded Not tainted 5.8.6-2.el7.elrepo.x86_64 #1 [10499.656473] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014 [10499.656473] RIP: 0010:0x0 [10499.656473] Code: Bad RIP value. 
[10499.656473] RSP: 0018:fffffe00000097b8 EFLAGS: 00010092 [10499.656474] RAX: 0000000000001000 RBX: 0000000000000000 RCX: 0000000000000008 [10499.656474] RDX: ffffc900005d6460 RSI: ffff88817f5409c2 RDI: 0000000000aaaaaa [10499.656474] RBP: 0000000000000000 R08: 0000000000000000 R09: ffffc900005d6460 [10499.656474] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000 [10499.656474] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [10499.656475] FS: 0000000000000000(0000) GS:ffff888182200000(0000) knlGS:0000000000000000 [10499.656475] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [10499.656477] CR2: ffffffffffffffd6 CR3: 000000017adf0000 CR4: 00000000000006f0 [10499.656477] Call Trace: [10499.656477] <#DF> [10499.656478] </#DF> [10499.656478] Modules linked in: xt_multiport ipt_REJECT nf_reject_ipv4 ipt_rpfilter xt_set iptable_raw ip_set_hash_net ip_set_hash_ip ip_set veth ipip tunnel4 ip_tunnel xt_nat xt_statistic ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs ip6_tables iptable_mangle xt_comment xt_mark xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 overlay sunrpc dm_mirror dm_region_hash dm_log dm_mod snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm pcspkr iTCO_wdt iTCO_vendor_support snd_timer snd i6300esb soundcore virtio_balloon input_leds i2c_i801 i2c_smbus virtio_rng joydev sg lpc_ich mfd_core qemu_fw_cfg binfmt_misc ip_tables xfs libcrc32c sr_mod cdrom virtio_blk virtio_console virtio_net net_failover failover ahci libahci serio_raw libata qxl drm_ttm_helper ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm virtio_pci [10499.656487] virtio_ring virtio ptp_kvm ptp pps_core [10499.656487] CR2: 0000000000000000 2) [ 2791.577029] BUG: unable to handle page fault for address: ffffffff91c00ac0 [ 2791.577864] #PF: supervisor instruction fetch 
in kernel mode [ 2791.578548] #PF: error_code(0x0010) - not-present page [ 2791.579092] PGD 240e067 P4D 240e067 PUD 240f063 PMD 0 [ 2791.579508] Thread overran stack, or stack corrupted [ 2791.579922] Oops: 0010 [#1] SMP NOPTI [ 2791.580236] CPU: 1 PID: 0 Comm: swapper/1 Kdump: loaded Not tainted 5.8.6-2.el7.elrepo.x86_64 #1 [ 2791.580924] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014 [ 2791.581607] RIP: 0010:0xffffffff91c00ac0 [ 2791.581926] Code: Bad RIP value. [ 2791.582198] RSP: 0018:fffffe000003ad00 EFLAGS: 00010093 [ 2791.582614] RAX: 0000000091c00fe7 RBX: 0000000000000000 RCX: ffffffff91c00fe7 [ 2791.583247] RDX: 0000000000000000 RSI: 0000000000000010 RDI: fffffe000003a798 [ 2791.583812] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 [ 2791.584433] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 [ 2791.584998] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ 2791.585600] FS: 0000000000000000(0000) GS:ffff888182300000(0000) knlGS:0000000000000000 [ 2791.586254] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 2791.586757] CR2: ffffffff91c00a96 CR3: 00000001128e4000 CR4: 00000000000006e0 [ 2791.587344] Call Trace: [ 2791.587549] <#DF> [ 2791.587722] </#DF> [ 2791.587922] WARNING: stack recursion on stack type 5 [ 2791.587923] Modules linked in: ipt_REJECT nf_reject_ipv4 ipt_rpfilter xt_multiport xt_set iptable_raw ip_set_hash_ip ip_set_hash_net ip_set ipip tunnel4 ip_tunnel veth xt_nat xt_statistic ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs ip6_tables iptable_mangle xt_comment xt_mark xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 overlay sunrpc dm_mirror dm_region_hash dm_log dm_mod snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hda_core snd_hwdep snd_seq iTCO_wdt iTCO_vendor_support snd_seq_device snd_pcm i2c_i801 pcspkr snd_timer snd 
soundcore i2c_smbus input_leds sg lpc_ich mfd_core i6300esb virtio_rng virtio_balloon qemu_fw_cfg joydev binfmt_misc ip_tables xfs libcrc32c sr_mod cdrom virtio_blk virtio_console virtio_net net_failover failover ahci libahci serio_raw libata qxl drm_ttm_helper virtio_pci ttm virtio_ring virtio drm_kms_helper syscopyarea sysfillrect [ 2791.587956] sysimgblt fb_sys_fops drm ptp_kvm ptp pps_core [ 2791.595729] CR2: ffffffff91c00ac0
These crashes are from FC32 VM, VM same as comment #9. This VM do not have swap configured. 1) [ 724.402477] unable to execute userspace code (SMEP?) (uid: 0) [ 724.403082] BUG: unable to handle page fault for address: ffffffff8ec00ac0 [ 724.403678] #PF: supervisor instruction fetch in kernel mode [ 724.404126] #PF: error_code(0x0019) - reserved bit violation [ 724.404586] PGD 13fa0f067 P4D 13fa0f067 PUD 13fa10063 PMD 13ec001e1 [ 724.405087] Thread overran stack, or stack corrupted [ 724.405496] Oops: 0019 [#1] SMP NOPTI [ 724.405797] CPU: 1 PID: 0 Comm: swapper/1 Kdump: loaded Not tainted 5.8.4-200.fc32.x86_64 #1 [ 724.406481] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014 [ 724.407204] RIP: 0010:asm_exc_page_fault+0x0/0x30 [ 724.407605] Code: 24 28 ff 74 24 28 ff 74 24 28 ff 74 24 28 e8 d7 07 00 00 48 89 e7 e8 bf 09 f6 ff e9 ba 08 00 00 66 2e 0f 1f 84 00 00 00 00 00 <0f> 01 ca e8 b8 07 00 00 48 89 e7 48 8b 74 24 78 48 c7 44 24 78 ff [ 724.409157] RSP: 0018:fffffe000003af40 EFLAGS: 00010046 [ 724.409588] RAX: ffffffff8eb71700 RBX: 0000000000000001 RCX: 7fffffffffffffff [ 724.410147] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff943804b1daa0 [ 724.410709] RBP: 0000000000000001 R08: 000000cd42e4dffb R09: 0000000000000201 [ 724.411268] R10: 000000000000036c R11: 0000000000000000 R12: 0000000000000000 [ 724.411830] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ 724.412392] FS: 0000000000000000(0000) GS:ffff943804b00000(0000) knlGS:0000000000000000 [ 724.413031] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 724.413485] CR2: ffffffff8ec00ac0 CR3: 000000013d544000 CR4: 0000000000340ee0 [ 724.414049] Call Trace: [ 724.414263] <#DF> [ 724.414431] RIP: 0010:asm_exc_page_fault+0x0/0x30 [ 724.414813] Code: 24 28 ff 74 24 28 ff 74 24 28 ff 74 24 28 e8 d7 07 00 00 48 89 e7 e8 bf 09 f6 ff e9 ba 08 00 00 66 2e 0f 1f 84 00 00 00 00 00 <0f> 01 ca e8 b8 07 00 00 48 89 e7 48 8b 74 24 78 48 c7 44 24 78 ff [ 
724.416276] RSP: 0018:fffffe000003af70 EFLAGS: 00010046 [ 724.416281] RIP: 0010:asm_exc_page_fault+0x0/0x30 [ 724.417096] Code: 24 28 ff 74 24 28 ff 74 24 28 ff 74 24 28 e8 d7 07 00 00 48 89 e7 e8 bf 09 f6 ff e9 ba 08 00 00 66 2e 0f 1f 84 00 00 00 00 00 <0f> 01 ca e8 b8 07 00 00 48 89 e7 48 8b 74 24 78 48 c7 44 24 78 ff [ 724.418595] RSP: 0018:fffffe000003afa0 EFLAGS: 00010046 [ 724.418597] RIP: 0010:asm_exc_double_fault+0x0/0x30 [ 724.419397] Code: e8 55 0c f6 ff e9 00 08 00 00 0f 01 ca 6a ff e8 06 07 00 00 48 89 e7 e8 2e ff f5 ff e9 e9 07 00 00 66 0f 1f 84 00 00 00 00 00 <0f> 01 ca e8 e8 05 00 00 48 89 e7 48 8b 74 24 78 48 c7 44 24 78 ff [ 724.420856] RSP: 0018:fffffe000003afd0 EFLAGS: 00010046 [ 724.420857] WARNING: stack going in the wrong direction? at asm_xenpv_exc_debug+0x20/0x20 [ 724.420864] ? asm_exc_int3+0x40/0x40 [ 724.422210] </#DF> [ 724.422383] WARNING: stack recursion on stack type 5 [ 724.422384] Modules linked in: ipt_rpfilter xt_multiport xt_set iptable_filter iptable_mangle iptable_nat iptable_raw ip_set_hash_ip ip_set_hash_net ip_set veth ipip tunnel4 ip_tunnel nf_conntrack_netlink xt_addrtype xt_nat xt_statistic ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs xt_MASQUERADE xt_conntrack xt_comment nft_counter xt_mark nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink rfkill crct10dif_pclmul crc32_pclmul ghash_clmulni_intel iTCO_wdt intel_pmc_bxt sunrpc iTCO_vendor_support i2c_i801 i2c_smbus lpc_ich virtio_balloon joydev i6300esb pvpanic br_netfilter bridge stp llc overlay ip_tables xfs crc32c_intel qxl drm_ttm_helper ttm drm_kms_helper serio_raw cec virtio_console drm virtio_blk xhci_pci xhci_pci_renesas virtio_net net_failover failover qemu_fw_cfg ptp_kvm fuse [ 724.430263] CR2: ffffffff8ec00ac0 2) [ 2294.420100] general protection fault, probably for non-canonical address 0xebaa27cde9932d16: 0000 [#1] SMP NOPTI [ 2294.421608] CPU: 0 PID: 0 Comm: swapper/0 Kdump: loaded Not tainted 5.8.4-200.fc32.x86_64 #1 [ 
2294.422572] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014 [ 2294.423573] RIP: 0010:__x86_indirect_thunk_rax+0x3/0x5 [ 2294.424112] Code: c0 e9 f1 dc d5 ff 31 c0 e9 f3 dc d5 ff cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 0f ae e8 <ff> e0 e8 07 00 00 00 f3 90 0f ae e8 eb f9 48 89 04 24 c3 66 2e 0f [ 2294.425599] RSP: 0018:ffffb9c480003f68 EFLAGS: 00010202 [ 2294.426021] RAX: ebaa27cde9932d16 RBX: ffff9184fbc61198 RCX: 0000000000000004 [ 2294.426671] RDX: ffffb9c480003f70 RSI: ffff9184fbc61198 RDI: ffff9184fbc61140 [ 2294.427234] RBP: ffff9184fbc61140 R08: ffffb9c480003f70 R09: ffff918504a28290 [ 2294.427803] R10: 0000000000000000 R11: 0000000000000000 R12: ffffb9c480003f70 [ 2294.428382] R13: 0000000000000100 R14: 0000000000000004 R15: 0000000000000010 [ 2294.428978] FS: 0000000000000000(0000) GS:ffff918504a00000(0000) knlGS:0000000000000000 [ 2294.429675] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 2294.430177] CR2: 00007fdc763a5000 CR3: 00000001411be000 CR4: 0000000000340ef0 [ 2294.430770] Call Trace: [ 2294.430999] <IRQ> [ 2294.431175] blk_done_softirq+0x91/0xb0 [ 2294.431503] __do_softirq+0xd9/0x2c4 [ 2294.431793] asm_call_on_stack+0x12/0x20 [ 2294.432132] </IRQ> [ 2294.432313] do_softirq_own_stack+0x39/0x50 [ 2294.432665] irq_exit_rcu+0xc2/0x100 [ 2294.433007] sysvec_call_function_single+0x34/0x90 [ 2294.433411] asm_sysvec_call_function_single+0x12/0x20 [ 2294.433822] RIP: 0010:native_safe_halt+0xe/0x10 [ 2294.434187] Code: 02 20 48 8b 00 a8 08 75 c4 e9 7b ff ff ff cc cc cc cc cc cc cc cc cc cc cc cc cc cc e9 07 00 00 00 0f 00 2d 36 70 49 00 fb f4 <c3> 90 e9 07 00 00 00 0f 00 2d 26 70 49 00 f4 c3 cc cc 0f 1f 44 00 [ 2294.435693] RSP: 0018:ffffffff95a03ea0 EFLAGS: 00000246 [ 2294.436169] RAX: ffffffff94b71700 RBX: 0000000000000000 RCX: 0000000000000001 [ 2294.436763] RDX: 0000000000000000 RSI: 0000000000000083 RDI: 0000000000000000 [ 2294.437339] RBP: 0000000000000000 R08: 
000006c7fe924906 R09: 0000000000000000 [ 2294.437943] R10: 00000000000303ce R11: 0000000000000000 R12: 0000000000000000 [ 2294.438551] R13: 0000000000000000 R14: 0000000000000101 R15: 0000000000000000 [ 2294.439143] ? __sched_text_end+0x3/0x3 [ 2294.439485] default_idle+0x1a/0x140 [ 2294.439784] do_idle+0x1f3/0x2a0 [ 2294.440070] cpu_startup_entry+0x19/0x20 [ 2294.440402] start_kernel+0x7f4/0x804 [ 2294.440739] ? x86_family+0x5/0x20 [ 2294.441029] secondary_startup_64+0xb6/0xc0 [ 2294.441400] Modules linked in: ipt_REJECT nf_reject_ipv4 ipt_rpfilter xt_multiport xt_set iptable_filter iptable_mangle iptable_raw iptable_nat ip_set_hash_ip ip_set_hash_net ip_set veth ipip tunnel4 ip_tunnel nf_conntrack_netlink xt_addrtype xt_nat xt_statistic ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs xt_MASQUERADE xt_conntrack xt_comment nft_counter xt_mark nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink rfkill crct10dif_pclmul crc32_pclmul ghash_clmulni_intel sunrpc iTCO_wdt intel_pmc_bxt iTCO_vendor_support i2c_i801 i2c_smbus lpc_ich virtio_balloon joydev i6300esb pvpanic br_netfilter bridge stp llc overlay ip_tables xfs qxl drm_ttm_helper ttm drm_kms_helper cec drm crc32c_intel serio_raw virtio_console virtio_blk xhci_pci xhci_pci_renesas virtio_net net_failover failover qemu_fw_cfg ptp_kvm fuse 3) [ 2292.163247] BUG: unable to handle page fault for address: ffff9b02ff7be870 [ 2292.164364] BUG: unable to handle page fault for address: ffff9b02ff625f38 [ 2292.164368] #PF: supervisor read access in kernel mode [ 2292.164369] #PF: error_code(0x0000) - not-present page [ 2292.164369] PGD 11a801067 P4D 11a801067 PUD 144154063 PMD 13f605063 [ 2292.164370] BAD [ 2292.164371] Oops: 0000 [#1] SMP NOPTI [ 2292.164371] CPU: 1 PID: 0 Comm: swapper/1 Kdump: loaded Not tainted 5.8.4-200.fc32.x86_64 #1 [ 2292.164372] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014 [ 2292.164372] RIP: 0010:insert_work+0x9a/0xc0 [ 2292.164373] 
Code: 87 00 03 00 00 85 c0 74 0b 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 57 38 49 8d 47 38 48 39 c2 74 e8 49 8b 47 38 48 85 c0 74 df <48> 8b 78 38 5b 5d 41 5c 41 5d 41 5e 41 5f e9 c3 6f 01 00 0f 0b eb [ 2292.164373] RSP: 0018:ffffaa00800a8a30 EFLAGS: 00010086 [ 2292.164374] RAX: ffff9b02ff625f00 RBX: ffff9b02fd9794d0 RCX: ffff9b0304b2f705 [ 2292.164375] RDX: ffff9b02ff625f00 RSI: ffff9b0304b29c60 RDI: ffff9b02fd9794d8 [ 2292.164375] RBP: ffff9b0304b2f700 R08: ffff9b0304b29c60 R09: ffff9b0304b29c60 [ 2292.164376] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9b0304b29c60 [ 2292.164376] R13: ffff9b0304b29c60 R14: ffff9b02fd9794d8 R15: ffff9b0304b29c40 [ 2292.164376] FS: 0000000000000000(0000) GS:ffff9b0304b00000(0000) knlGS:0000000000000000 [ 2292.164377] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 2292.164377] CR2: ffff9b02ff605128 CR3: 0000000140f32000 CR4: 0000000000340ee0 [ 2292.164378] Call Trace: [ 2292.164378] <IRQ> [ 2292.164378] __queue_work+0x1e0/0x410 [ 2292.164378] queue_work_on+0x36/0x40 [ 2292.164379] soft_cursor+0x1a7/0x230 [ 2292.164379] bit_cursor+0x3b4/0x5a0 [ 2292.164379] ? cursor_timer_handler+0x1/0x50 [ 2292.164380] ? fbcon_cursor+0xfb/0x180 [ 2292.164380] ? bit_putcs+0x510/0x510 [ 2292.164380] hide_cursor+0x2a/0x90 [ 2292.164381] vt_console_print+0x3c2/0x3d0 [ 2292.164381] console_unlock+0x39d/0x590 [ 2292.164381] vprintk_emit+0x164/0x280 [ 2292.164382] printk+0x48/0x4a [ 2292.164385] ? psi_task_change+0x91/0xc0 [ 2292.164386] no_context.cold+0x1c/0x21b [ 2292.164386] ? 
__netif_receive_skb_list_core+0x253/0x2b0 [ 2292.164386] exc_page_fault+0xe9/0x1a0 [ 2292.164387] asm_exc_page_fault+0x1e/0x30 [ 2292.164387] RIP: 0010:psi_task_change+0x90/0xc0 [ 2292.164388] Code: df 48 81 c7 b0 03 00 00 74 ad 45 89 f8 44 89 e1 89 ea 44 89 f6 41 83 e0 01 e8 2c fb ff ff 48 85 db 75 bc 49 8b 85 f8 0c 00 00 <48> 8b 58 70 48 85 db 75 c1 48 c7 c3 c0 da a5 89 48 89 df eb cb 41 [ 2292.164388] RSP: 0018:ffffaa00800a8e90 EFLAGS: 00010046 [ 2292.164389] RAX: ffff9b02ff7be800 RBX: 0000000000000000 RCX: 0000000000002800 [ 2292.164389] RDX: 0000000000000004 RSI: 0000000000000000 RDI: ffff9b0301d626c0 [ 2292.164390] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000004 [ 2292.164390] R10: 0000000000000000 R11: 000000000000706b R12: 0000000000000004 [ 2292.164391] R13: ffff9b0301d626c0 R14: 0000000000000001 R15: 0000000000000001 [ 2292.164391] try_to_wake_up+0x529/0x5c0 [ 2292.164391] ? update_load_avg+0x7a/0x610 [ 2292.164392] ? __hrtimer_init+0xd0/0xd0 [ 2292.164392] hrtimer_wakeup+0x1e/0x30 [ 2292.164392] __hrtimer_run_queues+0x118/0x280 [ 2292.164393] hrtimer_interrupt+0x10e/0x280 [ 2292.164393] __sysvec_apic_timer_interrupt+0x61/0x100 [ 2292.164393] asm_call_on_stack+0x12/0x20 [ 2292.164394] </IRQ> [ 2292.164394] sysvec_apic_timer_interrupt+0x6f/0x90 [ 2292.164394] asm_sysvec_apic_timer_interrupt+0x12/0x20 [ 2292.164395] RIP: 0010:native_safe_halt+0xe/0x10 [ 2292.164396] Code: 02 20 48 8b 00 a8 08 75 c4 e9 7b ff ff ff cc cc cc cc cc cc cc cc cc cc cc cc cc cc e9 07 00 00 00 0f 00 2d 36 70 49 00 fb f4 <c3> 90 e9 07 00 00 00 0f 00 2d 26 70 49 00 f4 c3 cc cc 0f 1f 44 00 [ 2292.164396] RSP: 0018:ffffaa0080073ed0 EFLAGS: 00000246 [ 2292.164397] RAX: ffffffff88b71700 RBX: 0000000000000001 RCX: 0000000000000001 [ 2292.164397] RDX: 0000000000000001 RSI: 0000000000000083 RDI: 0000000000000001 [ 2292.164397] RBP: 0000000000000001 R08: ffff9b0304b1d5a0 R09: 0000000000000400 [ 2292.164398] R10: 00000000000000e4 R11: 0000000000000000 R12: 0000000000000000 [ 
2292.164398] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ 2292.164399] ? __sched_text_end+0x3/0x3 [ 2292.164399] default_idle+0x1a/0x140 [ 2292.164399] do_idle+0x1f3/0x2a0 [ 2292.164400] cpu_startup_entry+0x19/0x20 [ 2292.164400] start_secondary+0x144/0x170 [ 2292.164400] secondary_startup_64+0xb6/0xc0 [ 2292.164400] Modules linked in: ipt_rpfilter xt_multiport iptable_mangle xt_set iptable_raw iptable_filter iptable_nat ip_set_hash_ip ip_set_hash_net ip_set veth ipip tunnel4 ip_tunnel nf_conntrack_netlink xt_addrtype xt_nat xt_statistic ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs xt_MASQUERADE xt_conntrack xt_comment nft_counter xt_mark nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink rfkill crct10dif_pclmul crc32_pclmul ghash_clmulni_intel iTCO_wdt intel_pmc_bxt iTCO_vendor_support sunrpc i2c_i801 i2c_smbus joyd [ 2292.164410] Lost 31 message(s)!
I had add a worker node to the Kubernetes cluster (Centos OS in KVM). This node is running on VMware workstation. The host machine of KVM and VMware workstation is the same machine. The node have swap. I have been using this node before without problem. After I add the worker node to the Kubernetes cluster, the node (Fedora 32) crashed after a while. [ 4233.816610] Oops: 0000 [#1] SMP NOPTI [ 4233.816636] CPU: 1 PID: 139909 Comm: systemd-userwor Kdump: loaded Not tainted 5.8.4-200.fc32.x86_64 #1 [ 4233.816702] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/29/2019 [ 4233.816754] RIP: 0010:__rb_erase_color+0x99/0x240 [ 4233.816776] Code: 48 89 c5 4c 8b 65 08 49 39 d4 75 a1 4c 8b 65 10 49 8b 5c 24 08 41 f6 04 24 01 0f 84 f8 00 00 00 49 8b 44 24 10 48 85 c0 74 05 <f6> 00 01 74 3f 48 85 db 74 aa f6 03 01 75 a5 48 8b 43 10 49 89 44 [ 4233.816863] RSP: 0018:ffffbad204327d60 EFLAGS: 00010202 [ 4233.817973] RAX: 0000000000000010 RBX: 0000000000000000 RCX: 0000000000000006 [ 4233.819110] RDX: ffff8f0592bcf188 RSI: ffff8f05b5852480 RDI: ffff8f0592bcf188 [ 4233.820190] RBP: ffff8f059661ea80 R08: 0000000000000000 R09: 0000000000000000 [ 4233.821266] R10: ffff8f05b3a2df40 R11: ffff8f059acf7700 R12: ffff8f0594d2fae8 [ 4233.822334] R13: ffffffffb5291710 R14: ffff8f05b5852480 R15: 0000000000000000 [ 4233.823232] FS: 0000000000000000(0000) GS:ffff8f05b9e40000(0000) knlGS:0000000000000000 [ 4233.823753] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 4233.824266] CR2: 0000000000000010 CR3: 0000000135698000 CR4: 00000000003406e0 [ 4233.824851] Call Trace: [ 4233.825341] unlink_file_vma+0x3d/0x60 [ 4233.825833] free_pgtables+0x92/0xf0 [ 4233.826295] exit_mmap+0xa6/0x170 [ 4233.826763] mmput+0x61/0x140 [ 4233.827212] do_exit+0x2fc/0xaf0 [ 4233.827724] ? 
syscall_trace_enter+0x14a/0x290 [ 4233.828174] do_group_exit+0x33/0xa0 [ 4233.828617] __x64_sys_exit_group+0x14/0x20 [ 4233.829125] do_syscall_64+0x52/0x90 [ 4233.829561] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 4233.830032] RIP: 0033:0x7f31eaf373c1 [ 4233.830455] Code: Bad RIP value. [ 4233.830874] RSP: 002b:00007fffbc769e18 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7 [ 4233.831279] RAX: ffffffffffffffda RBX: 00007f31eb02e470 RCX: 00007f31eaf373c1 [ 4233.831696] RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000000 [ 4233.832089] RBP: 0000000000000000 R08: ffffffffffffff80 R09: 0000000000000000 [ 4233.832493] R10: 00007f31eabb284e R11: 0000000000000246 R12: 00007f31eb02e470 [ 4233.832899] R13: 0000000000000002 R14: 00007f31eb02e948 R15: 0000000000000000 [ 4233.833286] Modules linked in: ipt_rpfilter xt_set iptable_raw ip_set_hash_ip ip_set_hash_net ip_set veth ipip tunnel4 ip_tunnel nf_conntrack_netlink nfnetlink xt_addrtype xt_nat xt_statistic ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs xt_comment xt_mark rfkill vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock ipt_REJECT nf_reject_ipv4 xt_conntrack iptable_filter xt_CHECKSUM iptable_mangle xt_MASQUERADE iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 sunrpc kvm_amd ccp pktcdvd kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl vmw_balloon pcspkr joydev i2c_piix4 vmw_vmci br_netfilter bridge stp binfmt_misc llc overlay ip_tables raid1 vmwgfx drm_kms_helper cec ttm drm mptsas crc32c_intel scsi_transport_sas mptscsih serio_raw mptbase vmxnet3 ata_generic pata_acpi target_core_mod fuse vhost_net tun tap vhost vhost_iotlb [last unloaded: cfg80211] [ 4233.837381] CR2: 0000000000000010
New finding: when a VM hangs instead of crashing, it consumes all of its CPU (200% if two CPUs are assigned). When that happens, I can take a crash dump with virsh dump, but the crash utility cannot analyse the resulting file:

gdb ../../vmlinux-5.8.4
GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...
WARNING: kernel relocated [1699MB]: patching 126940 gdb minimal_symbol values
<readmem: ffffffff18083cc8, KVADDR, "__pgtable_l5_enabled", 4, (FOE|Q), 7ffceb752218>
<read_kdump: addr: ffffffff18083cc8 paddr: 77ff18083cc8 cnt: 4>
crash: read error: kernel virtual address: ffffffff18083cc8 type: "__pgtable_l5_enabled"

The behavior is the same for VMs running under KVM and under VMware Workstation.
Sometimes the guest VM hangs and consumes a lot of CPU instead of crashing and rebooting. Is this a problem in QEMU/KVM or in the crash utility? (Reported separately as https://bugzilla.redhat.com/show_bug.cgi?id=1876589)
All previous tests used KVM acceleration. I switched one VM that crashed frequently to QEMU with TCG. It ran without problems for two hours, but it was very slow.
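For guests managed through libvirt (as with virsh in this report), the choice between KVM and TCG is visible in the type attribute of the <domain> element in the virsh dumpxml output: kvm means hardware acceleration, qemu means plain emulation (TCG). The following is a minimal sketch that reads that attribute from an XML string. The helper name accel_mode is made up for illustration and is not part of libvirt.

```python
import xml.etree.ElementTree as ET


def accel_mode(domain_xml: str) -> str:
    """Report the acceleration mode of a libvirt domain XML string.

    Returns "KVM" for type='kvm', "TCG" for type='qemu', the raw type
    for anything else, or "unknown" when the attribute is missing.
    """
    root = ET.fromstring(domain_xml)
    dom_type = root.get("type", "")
    # Hypothetical mapping for this sketch; only the two driver types
    # discussed in this thread are translated.
    return {"kvm": "KVM", "qemu": "TCG"}.get(dom_type, dom_type or "unknown")
```

It could be fed the text produced by virsh dumpxml for a guest, to confirm which mode a test run actually used before comparing crash behaviour.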
When a VM freezes (without crashing), it consumes 100% of a CPU on the host. Here is a pstack of the qemu process.

# pstack 31258
Thread 5 (Thread 0x7f5e66dff700 (LWP 31719)):
#0 0x00007f5fcb549aaf in poll () from target:/lib64/libc.so.6
#1 0x00007f5fcc69cace in g_main_context_iterate.constprop () from target:/lib64/libglib-2.0.so.0
#2 0x00007f5fcc69ce53 in g_main_loop_run () from target:/lib64/libglib-2.0.so.0
#3 0x00007f5fcbf612db in red_worker_main () from target:/lib64/libspice-server.so.1
#4 0x00007f5fcb626432 in start_thread () from target:/lib64/libpthread.so.0
#5 0x00007f5fcb554913 in clone () from target:/lib64/libc.so.6
Thread 4 (Thread 0x7f5fc61ff700 (LWP 31276)):
#0 0x00007f5fcb54b3bb in ioctl () from target:/lib64/libc.so.6
#1 0x000055b8588b9519 in kvm_vcpu_ioctl ()
#2 0x000055b8588b95d9 in kvm_cpu_exec ()
#3 0x000055b85889daac in qemu_kvm_cpu_thread_fn ()
#4 0x000055b858c95683 in qemu_thread_start ()
#5 0x00007f5fcb626432 in start_thread () from target:/lib64/libpthread.so.0
#6 0x00007f5fcb554913 in clone () from target:/lib64/libc.so.6
Thread 3 (Thread 0x7f5fc6f8e700 (LWP 31275)):
#0 0x00007f5fcb549aaf in poll () from target:/lib64/libc.so.6
#1 0x00007f5fcc69cace in g_main_context_iterate.constprop () from target:/lib64/libglib-2.0.so.0
#2 0x00007f5fcc69ce53 in g_main_loop_run () from target:/lib64/libglib-2.0.so.0
#3 0x000055b8589c0fb1 in iothread_run ()
#4 0x000055b858c95683 in qemu_thread_start ()
#5 0x00007f5fcb626432 in start_thread () from target:/lib64/libpthread.so.0
#6 0x00007f5fcb554913 in clone () from target:/lib64/libc.so.6
Thread 2 (Thread 0x7f5fc93ff700 (LWP 31267)):
#0 0x00007f5fcb54f37d in syscall () from target:/lib64/libc.so.6
#1 0x000055b858c95fd2 in qemu_event_wait ()
#2 0x000055b858ca88c2 in call_rcu_thread ()
#3 0x000055b858c95683 in qemu_thread_start ()
#4 0x00007f5fcb626432 in start_thread () from target:/lib64/libpthread.so.0
#5 0x00007f5fcb554913 in clone () from target:/lib64/libc.so.6
Thread 1 (Thread 0x7f5fc9c84700 (LWP 31258)):
#0 0x00007f5fcb549bae in ppoll () from target:/lib64/libc.so.6
#1 0x000055b858c91255 in qemu_poll_ns ()
#2 0x000055b858c92615 in main_loop_wait ()
#3 0x000055b8589c72af in main_loop ()
#4 0x000055b85884e79c in main ()
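A stack snapshot like the one above shows where each thread is, but not which thread is burning the CPU. One way to narrow that down is to read the per-thread CPU counters from /proc/<pid>/task/<tid>/stat twice, a few seconds apart, and compare them. The sketch below is only an illustration: the function names are made up here, and it assumes a Linux host with /proc mounted.

```python
import os


def parse_stat(line: str):
    """Split one /proc/<pid>/task/<tid>/stat line into (comm, utime, stime).

    The comm field is wrapped in parentheses and may itself contain
    spaces, so the line is split around the last ')'.
    """
    head, tail = line.rsplit(")", 1)
    comm = head.split("(", 1)[1]
    fields = tail.split()
    # fields[0] is stat field 3 (state); utime and stime are fields 14
    # and 15, both counted in clock ticks.
    return comm, int(fields[11]), int(fields[12])


def thread_cpu_ticks(pid: int):
    """Return {tid: (comm, utime + stime)} for every thread of pid."""
    result = {}
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/stat") as fh:
                comm, utime, stime = parse_stat(fh.read())
        except OSError:
            continue  # the thread exited while it was being read
        result[int(tid)] = (comm, utime + stime)
    return result
```

Calling thread_cpu_ticks(31258) twice and subtracting the tick counts per thread ID would show whether the time is spent in a vCPU thread (such as LWP 31276 above) or elsewhere in the process.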
These two conditions do not crash KVM:
1) VMware Workstation is stopped and only KVM guests are running.
2) VMware Workstation is running and the Qemu guests run without KVM acceleration. The Qemu guests are then very slow.

Here are my questions:
1) I suspect that the Qemu processes get corrupted when both hypervisors are running. Why is no problem logged on the host? Is there any mechanism that could prevent the problem or alert the user about it?
2) Should Qemu refuse to start a VM with KVM if it detects that another hypervisor is already running?
These are some conditions that do not crash KVM:
1) VMware Workstation is stopped and only KVM guests are running.
2) VMware Workstation is running and the Qemu guests run without KVM acceleration. The Qemu guests are then very slow.
3) VMware Workstation is running and the Qemu guests run with KVM acceleration, but those guests do not start docker or cri-o. If those processes are started, the KVM guest is likely to crash.

Note the third condition: it may be related to the hardware virtualization used by docker and cri-o. While VMware Workstation is running, I can use KVM to install some guest OSes without problems. Once I start the docker or cri-o daemon inside a KVM guest, those VMs start crashing.
Assuming the host is VMware and therefore the KVM host is L1, it's likely using nested KVM but VMware isn't emulating/implementing nested virt correctly. In any case there's not very much we can do to deal with closed source software.
The host is Fedora 32 Linux, so there are two Type 2 hypervisors:
1) VMware Workstation (it's not ESX; ESX would need to be installed on the host itself)
2) Qemu with KVM acceleration, provided by loading the Linux kernel module

The strange thing is that it seems possible to start and run VMs with both hypervisors at the same time. But if container applications (Docker or cri-o) are running in KVM guests while VMware Workstation is running, the KVM guests crash easily. The VMware Workstation guests only need to run a plain OS (not necessarily Docker or cri-o) to trigger crashes of the KVM guests.
> This node is running on VMware workstation. The host machine of KVM and VMware workstation is the same machine.

I'm having a really hard time understanding what the configuration is.
"VMware workstation" is some kind of proprietary product. Does it
load proprietary kernel modules? If so, then this is immediately NOTABUG -
go and ask VMware for help. Does the host (the baremetal bit) run Fedora?
What version of Fedora? What guests are you running? What is running
in the guests?
(In reply to Richard W.M. Jones from comment #22)
> > This node is running on VMware workstation. The host machine of KVM and VMware workstation is the same machine.
>
> I'm having a really hard time understanding what the configuration is.
> "VMware workstation" is some kind of proprietary product. Does it
> load proprietary kernel modules? If so, then this is immediately NOTABUG -
> go and ask VMware for help. Does the host (the baremetal bit) run Fedora?
> What version of Fedora? What guests are you running? What is running
> in the guests?

'This node' refers to a Kubernetes worker node. That node/VM runs under VMware Workstation.

There is only one physical machine (bare metal), and it runs Fedora Linux version 32. There are two Type 2 hypervisors on it: one is VMware Workstation and the other is KVM.

VMware Workstation loads kernel modules. Those kernel modules are GPL, not proprietary. Here are the related modules:

$ lsmod |grep vm | sort
ccp 106496 1 kvm_amd
irqbypass 16384 19 kvm
kvm 823296 61 kvm_amd
kvm_amd 114688 8
vmmon 131072 8
vmnet 65536 63
vmw_vmci 90112 9 vmw_vsock_vmci_transport
vmw_vsock_vmci_transport 32768 0
vsock 49152 1 vmw_vsock_vmci_transport

vmw_vmci, vmw_vsock_vmci_transport and vsock are provided by the Fedora 32 host OS. vmmon and vmnet are provided by VMware. For VMware Workstation 15.5, these modules need to be patched for kernel 5.7/5.8. I use the modules/patches from GitHub:
https://github.com/mkubecek/vmware-host-modules/tree/player-15.5.6/vmmon-only
https://github.com/mkubecek/vmware-host-modules/tree/player-15.5.6/vmnet-only

I performed further tests with VMware Workstation running guests. Any KVM guest crashes randomly (even a plain OS without docker/cri-o/K8S). The KVM guests I run are CentOS 7.8 and Fedora 32.
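The lsmod listing above shows kvm/kvm_amd and vmmon loaded at the same time, which is the combination this report is about. Below is a minimal Python sketch, assuming lsmod-style text where the first column is the module name, that flags this combination. The function names are hypothetical, and kvm_intel is included only as the Intel counterpart of kvm_amd; it does not appear in the listing. Loaded modules alone do not prove that guests of both hypervisors are running, so this is only a rough indicator.

```python
# Module names from the lsmod listing in this report; kvm_intel is an
# assumed addition for Intel hosts and is not in the listing.
KVM_MODULES = {"kvm", "kvm_amd", "kvm_intel"}
VMWARE_MONITOR_MODULES = {"vmmon"}


def loaded_modules(lsmod_text):
    """Return the set of module names (first column) from lsmod-style text."""
    names = set()
    for line in lsmod_text.splitlines():
        fields = line.split()
        if fields and fields[0] != "Module":  # skip the lsmod header row
            names.add(fields[0])
    return names


def both_hypervisors_loaded(lsmod_text):
    """True if a KVM module and the VMware monitor module are both loaded."""
    mods = loaded_modules(lsmod_text)
    return bool(mods & KVM_MODULES) and bool(mods & VMWARE_MONITOR_MODULES)


# Part of the listing from this report.
sample = """\
ccp 106496 1 kvm_amd
irqbypass 16384 19 kvm
kvm 823296 61 kvm_amd
kvm_amd 114688 8
vmmon 131072 8
vmnet 65536 63
"""
print(both_hypervisors_loaded(sample))
```

On a real host the same functions could be fed the contents of /proc/modules, whose first column is also the module name.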
Re-assigning to the kernel, since all signs point towards a bad interaction between the VMware kernel modules and the main kernel. Debugging VMware kmods is really outside the scope of Fedora though: whether they're open source or not, they are still out-of-tree kmods.
Running two different hypervisors at the same time is a really, really bad idea. As one of these hypervisors is proprietary, VMware is likely the only one who can make this work, if they want to, of course. I don't see what can be done in Fedora/Upstream.
I checked some old posts from ten years ago about KVM and VMware Workstation :) (I am using an AMD CPU)
https://communities.vmware.com/thread/188067
Maybe these two hypervisors are not ready to run concurrently.
The best way forward here is to work with VMware and the community to get the vmmon and vmnet modules upstream. (Apparently other VMware-related modules have gone upstream). When everything is upstream it should be possible to fix the kernel either so it doesn't let you load kvm.ko and vmmon at the same time, or even better so the hypervisor state can be shared in some way. Until that time there's not much Fedora can do about this, so I am closing this bug.
Greetings, I am using Fedora 34 and kernel 5.13.4-200.fc34.x86_64. Since VMware Workstation 16.1.2 does not support kernel 5.13.4, I used the patched modules from https://github.com/mkubecek/vmware-host-modules/tree/workstation-16.1.2

I have now been running VMware Workstation and KVM together without any crashes for about two weeks, so some of the patches may have fixed the problem. Thanks.
This message is a reminder that Fedora Linux 34 is nearing its end of life. Fedora will stop maintaining and issuing updates for Fedora Linux 34 on 2022-06-07. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a 'version' of '34'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, change the 'version' to a later Fedora Linux version. Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora Linux 34 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora Linux, you are encouraged to change the 'version' to a later version prior to this bug being closed.
This message is a reminder that Fedora Linux 36 is nearing its end of life. Fedora will stop maintaining and issuing updates for Fedora Linux 36 on 2023-05-16. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a 'version' of '36'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, change the 'version' to a later Fedora Linux version. Note that the version field may be hidden. Click the "Show advanced fields" button if you do not see it. Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora Linux 36 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora Linux, you are encouraged to change the 'version' to a later version prior to this bug being closed.
Fedora Linux 36 entered end-of-life (EOL) status on 2023-05-16. Fedora Linux 36 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora Linux please feel free to reopen this bug against that version. Note that the version field may be hidden. Click the "Show advanced fields" button if you do not see the version field. If you are unable to reopen this bug, please file a new report against an active release. Thank you for reporting this bug and we are sorry it could not be fixed.