Created attachment 1710503 [details]
guest trace

Description of problem:
Coldplug 256 memory devices and start the guest; the guest crashes with "Kernel panic - not syncing: System is deadlocked on memory".

Version-Release number of selected component (if applicable):
libvirt-6.5.0-1.module+el8.3.0+7323+d54bb644.x86_64
qemu-kvm-5.0.0-2.module+el8.3.0+7379+0505d6ca.x86_64
Guest kernel version: 4.18.0-221.el8.x86_64 on an x86_64

How reproducible:
100%

Steps:
1. Define a domain that includes the XML below.

  <maxMemory slots='256' unit='KiB'>138412032</maxMemory>
  <memory unit='KiB'>1548288</memory>
  <currentMemory unit='KiB'>1548288</currentMemory>
  ....
  <cpu mode='host-model' check='partial'>
    <numa>
      <cell id='0' cpus='0-1' memory='512000' unit='KiB'/>
      <cell id='1' cpus='2-3' memory='512000' unit='KiB'/>
    </numa>
  </cpu>
  ....
  <memory model='dimm' discard='no'>
    <target>
      <size unit='KiB'>524288</size>
      <node>0</node>
    </target>
    <alias name='ua-27ab1a68-b716-45e4-8430-9aad07908da8'/>
    <address type='dimm' slot='0'/>
  </memory>

2. Cold plug 255 memory devices to the domain.

  for i in `seq 1 254`; do virsh attach-device avocado-vt-vm1 mem.xml --config; done

  mem.xml:
  <memory model='dimm' discard='no'>
    <target>
      <size unit='KiB'>524288</size>
      <node>0</node>
    </target>
  </memory>

3. Try to start the domain; it crashes.

CPU: 0 PID: 2 Comm: kthreadd Tainted: G W --------- - - 4.18.0-193.el8.x86_64 #1
[ 1.425000] Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.13.0-2.module+el8.2.1+7284+aa32a2c4 04/01/2014
[ 1.425000] Call Trace:
[ 1.425000]  dump_stack+0x5c/0x80
[ 1.425000]  panic+0xe7/0x2a9
[ 1.425000]  out_of_memory.cold.32+0x5e/0x87
[ 1.425000]  __alloc_pages_slowpath+0xc18/0xd00
[ 1.425000]  __alloc_pages_nodemask+0x245/0x280
[ 1.425000]  __vmalloc_node_range+0x11d/0x230
[ 1.425000]  copy_process.part.34+0x8dd/0x1910
[ 1.425000]  ? _do_fork+0xe3/0x3a0
[ 1.425000]  ? kthread_flush_work_fn+0x10/0x10
[ 1.425000]  ? __switch_to_asm+0x35/0x70
[ 1.425000]  _do_fork+0xe3/0x3a0
[ 1.425000]  ? __set_cpus_allowed_ptr+0xa6/0x200
[ 1.425000]  kernel_thread+0x25/0x30
[ 1.425000]  kthreadd+0x2ae/0x300
[ 1.425000]  ? kthread_create_on_cpu+0xa0/0xa0
[ 1.425000]  ret_from_fork+0x35/0x40
[ 1.425000] ---[ end Kernel panic - not syncing: System is deadlocked on memory
[ 1.425000] ]---

The whole trace log is attached.

Expected result:
The domain can be started.

Additional info:
Hot plugging 256 memory devices works fine; the domain keeps running.
I tried two guest kernel versions, 4.18.0-193.el8.x86_64 and 4.18.0-221.el8.x86_64, and hit the same issue with both. Thanks!
Can you please attach the generated qemu command line? All in all I do not think libvirt has anything to do with the issue. Passing down the stack.
Created attachment 1711148 [details] qemu_cmd
Amnon - passing this on to you so someone familiar with memory device management can triage.
I tried adding 256 DIMMs on the qemu command line and booting the guest, but didn't hit the issue; the guest booted up without error.

qemu-kvm-5.1.0-2.module+el8.3.0+7652+b30e6901
guest kernel: 4.18.0-228.el8.x86_64
host kernel: 4.18.0-227.el8.x86_64

QEMU cli:
# /usr/libexec/qemu-kvm \
 -m 4G,maxmem=40G,slots=256 \
 -object memory-backend-ram,id=m0,size=2G \
 -object memory-backend-ram,id=m1,size=2G \
 -numa node,memdev=m0 \
 -numa node,memdev=m1 \
 -object memory-backend-ram,id=mem0,size=128M \
 -device pc-dimm,id=dimm0,memdev=mem0 \
 ...
 -object memory-backend-ram,id=mem255,size=128M \
 -device pc-dimm,id=dimm255,memdev=mem255

Hi Jing,
Can you reproduce with qemu?
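For reference, the elided -object/-device pairs above can be generated with a small loop (a sketch; the ids and sizes match the command line above):

```shell
# Build the 256 repeated -object/-device argument pairs for the qemu command line
args=""
for i in $(seq 0 255); do
  args="$args -object memory-backend-ram,id=mem$i,size=128M"
  args="$args -device pc-dimm,id=dimm$i,memdev=mem$i"
done
echo "$args" | wc -w    # 1024 words: two options plus two values per DIMM
```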
Yes, I can reproduce it.

Version:
qemu-kvm-5.1.0-2.module+el8.3.0+7652+b30e6901.x86_64
libvirt-6.6.0-2.module+el8.3.0+7567+dc41c0a9.x86_64
Host & guest kernel: 4.18.0-232.el8.x86_64

Same steps as in the bug description; part of the guest stack trace:

[   40.063643] [ pid ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[   40.066447] Out of memory and no killable processes...
[   40.068053] Kernel panic - not syncing: System is deadlocked on memory
[   40.068053]
[   40.069047] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W --------- - - 4.18.0-232.el8.x86_64 #1
[   40.069047] Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.14.0-1.module+el8.3.0+7638+07cf13d2 04/01/2014
[   40.069047] Call Trace:
[   40.069047]  dump_stack+0x5c/0x80
[   40.069047]  panic+0xe7/0x2a9
[   40.069047]  out_of_memory.cold.31+0x5e/0x89
[   40.069047]  __alloc_pages_slowpath+0xc24/0xd40
[   40.069047]  __alloc_pages_nodemask+0x245/0x280
[   40.069047]  alloc_page_interleave+0x13/0x70
[   40.069047]  new_slab+0x3e3/0x9e0
[   40.069047]  ___slab_alloc+0x3b6/0x580
[   40.069047]  ? __d_alloc+0x22/0x1d0
[   40.069047]  ? __d_alloc+0x22/0x1d0
[   40.069047]  __slab_alloc+0x1c/0x30
[   40.069047]  kmem_cache_alloc+0x183/0x1b0
[   40.069047]  __d_alloc+0x22/0x1d0
[   40.069047]  d_alloc+0x1b/0xa0
[   40.069047]  d_alloc_parallel+0x54/0x4a0
[   40.069047]  __lookup_slow+0x6e/0x150
[   40.069047]  lookup_one_len+0x73/0x80
[   40.069047]  start_creating+0x66/0xf0
[   40.069047]  tracefs_create_file+0x2e/0x140
[   40.069047]  trace_create_file+0xd/0x20
[   40.069047]  event_create_dir+0x207/0x510
[   40.069047]  event_trace_init+0x241/0x2b3
[   40.069047]  ? do_early_param+0x91/0x91
[   40.069047]  tracer_init_tracefs+0x6e/0x1c1
[   40.069047]  ? register_tracer+0x1b4/0x1b4
[   40.069047]  do_one_initcall+0x46/0x1c3
[   40.069047]  ? do_early_param+0x91/0x91
[   40.069047]  kernel_init_freeable+0x1b4/0x25d
[   40.069047]  ? rest_init+0xaa/0xaa
[   40.069047]  kernel_init+0xa/0xfa
[   40.069047]  ret_from_fork+0x35/0x40
[   40.069047] ---[ end Kernel panic - not syncing: System is deadlocked on memory
[   40.069047] ]---
The issue may be that the total memory size, including the hotplugged memory devices, is too large for the host (the same host as in comment 7). I tried to start a domain with a memory size of 134717440 KiB and no NUMA cell configuration, as below.

<memory unit='KiB'>134717440</memory>
<currentMemory unit='KiB'>134717440</currentMemory>
...
<cpu mode='host-model' check='partial'>
  <feature policy='disable' name='vmx'/>
</cpu>

# virsh start avocado-vt-vm1
error: Failed to start domain avocado-vt-vm1
error: internal error: qemu unexpectedly closed the monitor: 2020-08-18T02:49:13.313425Z qemu-kvm: cannot set up guest memory 'pc.ram': Cannot allocate memory
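For a rough sense of scale, that request asks the host to back about 128 GiB of guest RAM up front (a back-of-the-envelope sketch using the value from the XML above):

```shell
# Convert the requested <memory> size from KiB to GiB (integer math)
boot_kib=134717440
echo "$((boot_kib / 1024 / 1024)) GiB of guest RAM requested"   # 128 GiB
```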
Hi Igor, Would you please help review this bug? Looks like an expected OOM to me, what do you think?
Simpler reproducer: /usr/libexec/qemu-kvm -m 256M,maxmem=40G,slots=256 -nographic -object memory-backend-ram,id=mem0,size=2G -device pc-dimm,id=dimm0,memdev=mem0 rhel8_disk.img
So, we start a VM with 1548288 KiB (1.5 GiB) of memory and want to coldplug 256 * 512 MiB (128 GiB). That never worked.

When booting up, ACPI code will detect that memory and add it to Linux, which allocates memory for metadata. The memmap (metadata) for the 128 GiB alone needs 2 GiB. At some point adding memory will fail and the guest will continue booting. As there is not a lot of free memory left, the guest will crash while booting up.

1. (likely) One issue might be that memory is not onlined immediately and automatically when coldplugging. You could try forcing immediate onlining via "memhp_default_state=online" on the kernel cmdline.

2. (unlikely) Memory is getting onlined to ZONE_MOVABLE instead of ZONE_NORMAL. 128 GiB vs. 1.5 GiB would be a very bad zone ratio.

Note that the setup in comment 6 is different from the original report. When hotplugging the DIMMs instead of coldplugging them, it behaves a little differently, because there we at least already have a running system where udev rules can online hotplugged memory automatically. BUT, you will run into the exact same issue during reboots.

What is the target use case here? Or is it just playing around with DIMM configurations?
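The 2 GiB memmap figure follows from the per-page metadata; a sketch assuming 64 bytes per struct page and 4 KiB pages (typical for x86-64, though sizeof(struct page) depends on the kernel configuration):

```shell
# struct page metadata needed for 256 coldplugged 512 MiB DIMMs
hotplug_bytes=$((256 * 512 * 1024 * 1024))     # 128 GiB of DIMMs
memmap_bytes=$((hotplug_bytes / 4096 * 64))    # one 64-byte struct page per 4 KiB page
echo "$((memmap_bytes / 1024 / 1024 / 1024)) GiB of memmap"   # 2 GiB
```

That 2 GiB has to come out of the 1.5 GiB of boot memory, which is why the guest OOMs before any of the coldplugged memory becomes usable.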
Hi Jing, Could you please check above comment and answer the questions? Thanks.
Thanks for the detailed explanation. The issue was found while running a memory test case: it hotplugs memory devices to a running domain until the number of memory devices reaches max_slots (256), randomly rebooting the VM during this period.
Okay, so I assume this is a new test case, because it couldn't ever really have worked - right? You should tweak your test case to supply more boot memory - Something around 4 GiB should be okay for the huge amount of memory you're planning on hotplugging.
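A minimal sketch of that tweak against the XML from the description — boot and current memory raised to 4 GiB (4194304 KiB), with the two NUMA cells grown to match; the even split between the cells is an assumption, only the totals matter here:

```xml
<maxMemory slots='256' unit='KiB'>138412032</maxMemory>
<memory unit='KiB'>4194304</memory>
<currentMemory unit='KiB'>4194304</currentMemory>
...
<cpu mode='host-model' check='partial'>
  <numa>
    <cell id='0' cpus='0-1' memory='2097152' unit='KiB'/>
    <cell id='1' cpus='2-3' memory='2097152' unit='KiB'/>
  </numa>
</cpu>
```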
(In reply to David Hildenbrand from comment #15)
> Okay, so I assume this is a new test case, because it couldn't ever really
> have worked - right?

It's not a new test case, but it usually hit other issues before reaching this point.

> You should tweak your test case to supply more boot memory - Something
> around 4 GiB should be okay for the huge amount of memory you're planning on
> hotplugging.

OK, I'll try it later.
Hi, did you have a chance to try? We'd like to close this BZ. Thanks!
I tried it with libvirt-7.3.0-1.module+el8.5.0+11004+f4810536.x86_64. It passes with 4 GiB as the initial memory size. Thanks!
Thanks! Closing as per comment 18.