Bug 1866360 - Kernel panic - not syncing: System is deadlocked on memory
Summary: Kernel panic - not syncing: System is deadlocked on memory
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: qemu-kvm
Version: 8.3
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: 8.3
Assignee: David Hildenbrand
QA Contact: Yumei Huang
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-08-05 12:08 UTC by Jing Qi
Modified: 2021-06-02 01:44 UTC
6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-05-19 07:59:03 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments
guest trace (26.90 KB, text/plain)
2020-08-05 12:08 UTC, Jing Qi
qemu_cmd (63.78 KB, text/plain)
2020-08-12 06:54 UTC, Jing Qi

Description Jing Qi 2020-08-05 12:08:22 UTC
Created attachment 1710503 [details]
guest trace

Description of problem:

Cold plug 256 memory devices and start the guest; the guest crashes with "Kernel panic - not syncing: System is deadlocked on memory".

Version-Release number of selected component (if applicable):


libvirt-6.5.0-1.module+el8.3.0+7323+d54bb644.x86_64
qemu-kvm-5.0.0-2.module+el8.3.0+7379+0505d6ca.x86_64
Guest Kernel version:  4.18.0-221.el8.x86_64 on an x86_64

How reproducible:
100%

Steps:
1. Define a domain with the XML below included.
  <maxMemory slots='256' unit='KiB'>138412032</maxMemory>
  <memory unit='KiB'>1548288</memory>
  <currentMemory unit='KiB'>1548288</currentMemory>
  ....
<cpu mode='host-model' check='partial'>
    <numa>
      <cell id='0' cpus='0-1' memory='512000' unit='KiB'/>
      <cell id='1' cpus='2-3' memory='512000' unit='KiB'/>
    </numa>
  </cpu>
....
 <memory model='dimm' discard='no'>
      <target>
        <size unit='KiB'>524288</size>
        <node>0</node>
      </target>
      <alias name='ua-27ab1a68-b716-45e4-8430-9aad07908da8'/>
      <address type='dimm' slot='0'/>
    </memory>

2. Cold plug 255 memory devices to the domain.
for i in `seq 1 254`; do virsh attach-device avocado-vt-vm1 mem.xml --config; done;
   mem.xml -
<memory model='dimm' discard='no'>
      <target>
        <size unit='KiB'>524288</size>
        <node>0</node>
      </target>
</memory>
3. Start the domain; it crashes.
 CPU: 0 PID: 2 Comm: kthreadd Tainted: G        W        --------- -  - 4.18.0-193.el8.x86_64 #1
[    1.425000] Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.13.0-2.module+el8.2.1+7284+aa32a2c4 04/01/2014
[    1.425000] Call Trace:
[    1.425000]  dump_stack+0x5c/0x80
[    1.425000]  panic+0xe7/0x2a9
[    1.425000]  out_of_memory.cold.32+0x5e/0x87
[    1.425000]  __alloc_pages_slowpath+0xc18/0xd00
[    1.425000]  __alloc_pages_nodemask+0x245/0x280
[    1.425000]  __vmalloc_node_range+0x11d/0x230
[    1.425000]  copy_process.part.34+0x8dd/0x1910
[    1.425000]  ? _do_fork+0xe3/0x3a0
[    1.425000]  ? kthread_flush_work_fn+0x10/0x10
[    1.425000]  ? __switch_to_asm+0x35/0x70
[    1.425000]  _do_fork+0xe3/0x3a0
[    1.425000]  ? __set_cpus_allowed_ptr+0xa6/0x200
[    1.425000]  kernel_thread+0x25/0x30
[    1.425000]  kthreadd+0x2ae/0x300
[    1.425000]  ? kthread_create_on_cpu+0xa0/0xa0
[    1.425000]  ret_from_fork+0x35/0x40
[    1.425000] ---[ end Kernel panic - not syncing: System is deadlocked on memory
[    1.425000]  ]---

The whole trace log is attached.

Expected result :
The domain can be started.

Additional info:
When hot plugging 256 memory devices instead, the domain works well.

Comment 1 Jing Qi 2020-08-10 05:47:50 UTC
Both guest kernel versions 4.18.0-193.el8.x86_64 and 4.18.0-221.el8.x86_64 were tried, and the same issue occurs. Thanks!

Comment 2 Jaroslav Suchanek 2020-08-12 06:26:42 UTC
Can you please attach the generated qemu command line? All in all I do not think libvirt has anything to do with the issue. Passing down the stack.

Comment 3 Jing Qi 2020-08-12 06:54:42 UTC
Created attachment 1711148 [details]
qemu_cmd

Comment 4 John Ferlan 2020-08-13 14:35:37 UTC
Amnon - passing onto you for someone with memory device mgmt to triage.

Comment 6 Yumei Huang 2020-08-17 06:35:15 UTC
I tried adding 256 DIMMs on the QEMU command line and booting the guest, but didn't hit the issue; the guest booted up without error.

qemu-kvm-5.1.0-2.module+el8.3.0+7652+b30e6901
guest kernel: 4.18.0-228.el8.x86_64
host kernel: 4.18.0-227.el8.x86_64

QEMU cli:
# /usr/libexec/qemu-kvm \
    -m 4G,maxmem=40G,slots=256  \
    -object memory-backend-ram,id=m0,size=2G \
    -object memory-backend-ram,id=m1,size=2G \
    -numa node,memdev=m0 \
    -numa node,memdev=m1 \
    -object memory-backend-ram,id=mem0,size=128M \
    -device pc-dimm,id=dimm0,memdev=mem0 \
    ...
    -object memory-backend-ram,id=mem255,size=128M \
    -device pc-dimm,id=dimm255,memdev=mem255


Hi Jing,

Can you reproduce with qemu?

Comment 7 Jing Qi 2020-08-18 01:52:29 UTC
Yes, I can reproduce it.
Version:
qemu-kvm-5.1.0-2.module+el8.3.0+7652+b30e6901.x86_64
libvirt-6.6.0-2.module+el8.3.0+7567+dc41c0a9.x86_64
Host & Guest kernel:
4.18.0-232.el8.x86_64 

The same step in the bug description and part of guest stack trace -
  
[   40.063643] [ pid ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[   40.066447] Out of memory and no killable processes...
[   40.068053] Kernel panic - not syncing: System is deadlocked on memory
[   40.068053] 
[   40.069047] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W        --------- -  - 4.18.0-232.el8.x86_64 #1
[   40.069047] Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.14.0-1.module+el8.3.0+7638+07cf13d2 04/01/2014
[   40.069047] Call Trace:
[   40.069047]  dump_stack+0x5c/0x80
[   40.069047]  panic+0xe7/0x2a9
[   40.069047]  out_of_memory.cold.31+0x5e/0x89
[   40.069047]  __alloc_pages_slowpath+0xc24/0xd40
[   40.069047]  __alloc_pages_nodemask+0x245/0x280
[   40.069047]  alloc_page_interleave+0x13/0x70
[   40.069047]  new_slab+0x3e3/0x9e0
[   40.069047]  ___slab_alloc+0x3b6/0x580
[   40.069047]  ? __d_alloc+0x22/0x1d0
[   40.069047]  ? __d_alloc+0x22/0x1d0
[   40.069047]  __slab_alloc+0x1c/0x30
[   40.069047]  kmem_cache_alloc+0x183/0x1b0
[   40.069047]  __d_alloc+0x22/0x1d0
[   40.069047]  d_alloc+0x1b/0xa0
[   40.069047]  d_alloc_parallel+0x54/0x4a0
[   40.069047]  __lookup_slow+0x6e/0x150
[   40.069047]  lookup_one_len+0x73/0x80
[   40.069047]  start_creating+0x66/0xf0
[   40.069047]  tracefs_create_file+0x2e/0x140
[   40.069047]  trace_create_file+0xd/0x20
[   40.069047]  event_create_dir+0x207/0x510
[   40.069047]  event_trace_init+0x241/0x2b3
[   40.069047]  ? do_early_param+0x91/0x91
[   40.069047]  tracer_init_tracefs+0x6e/0x1c1
[   40.069047]  ? register_tracer+0x1b4/0x1b4
[   40.069047]  do_one_initcall+0x46/0x1c3
[   40.069047]  ? do_early_param+0x91/0x91
[   40.069047]  kernel_init_freeable+0x1b4/0x25d
[   40.069047]  ? rest_init+0xaa/0xaa
[   40.069047]  kernel_init+0xa/0xfa
[   40.069047]  ret_from_fork+0x35/0x40
[   40.069047] ---[ end Kernel panic - not syncing: System is deadlocked on memory
[   40.069047]  ]---

Comment 8 Jing Qi 2020-08-18 02:53:48 UTC
The issue may be that the total memory size, including the hotplugged memory devices, is too large for the host (the same host as used in comment 7).
I tried to start a domain with a memory size of 134717440 KiB and no NUMA cell configuration, as below.


<memory unit='KiB'>134717440</memory>
  <currentMemory unit='KiB'>134717440</currentMemory>
...
<cpu mode='host-model' check='partial'>
    <feature policy='disable' name='vmx'/>
  </cpu>

# virsh start avocado-vt-vm1
error: Failed to start domain avocado-vt-vm1
error: internal error: qemu unexpectedly closed the monitor: 2020-08-18T02:49:13.313425Z qemu-kvm: cannot set up guest memory 'pc.ram': Cannot allocate memory

Comment 10 Yumei Huang 2021-03-05 09:40:43 UTC
Hi Igor,

Would you please help review this bug? Looks like an expected OOM to me, what do you think?

Comment 11 Igor Mammedov 2021-03-16 12:19:17 UTC
Simpler reproducer:

/usr/libexec/qemu-kvm -m 256M,maxmem=40G,slots=256 -nographic -object memory-backend-ram,id=mem0,size=2G -device pc-dimm,id=dimm0,memdev=mem0 rhel8_disk.img

Comment 12 David Hildenbrand 2021-03-16 12:31:45 UTC
So, we start a VM with 1548288 KiB (1.5 GiB) of memory and want to coldplug 256 * 512 MiB (128 GiB).

That never worked. When booting up, ACPI code will detect and add that memory to Linux, which allocates memory for metadata.
The memmap (metadata) for the 128 GiB alone needs 2 GiB. At some point, adding memory will fail and the guest will continue booting up. As there is not much free memory left, the guest will crash while booting up.
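For reference, the 2 GiB figure above can be reproduced with back-of-the-envelope arithmetic: Linux keeps one struct page (64 bytes on typical x86_64 configs) per 4 KiB page of managed memory, so the memmap costs roughly 1/64 of the plugged size. A sketch, under those assumptions:

```shell
# memmap overhead ~ one 64-byte struct page per 4 KiB page (x86_64 assumption)
plugged_bytes=$((128 * 1024 * 1024 * 1024))   # 256 DIMMs x 512 MiB = 128 GiB
pages=$((plugged_bytes / 4096))
overhead_bytes=$((pages * 64))
echo "$((overhead_bytes / 1024 / 1024 / 1024)) GiB"   # prints "2 GiB"
```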

1. (likely) One issue might be that memory is not onlined immediately and automatically when coldplugged.

You could try forcing immediate onlining via "memhp_default_state=online" on the kernel cmdline.

2. (unlikely) Memory is getting onlined to ZONE_MOVABLE instead of ZONE_NORMAL. 128 GiB vs. 1.5 GiB would be a very bad zone ratio.
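For anyone investigating offline blocks, the standard Linux memory-hotplug sysfs interface can be used to inspect and online a block manually; the block number below is hypothetical:

```shell
# Hypothetical block number; blocks appear as /sys/devices/system/memory/memoryN
block=32
state_file="/sys/devices/system/memory/memory${block}/state"
echo "$state_file"                    # path of the block's state file
# cat "$state_file"                   # shows "online" or "offline"
# echo online > "$state_file"         # online it manually (needs root)
# Or online everything automatically at boot via the kernel cmdline:
#   memhp_default_state=online
```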

Note that the setup in comment 6 is different than the original report.

When hotplugging the DIMMs instead of coldplugging them, it behaves a little bit differently, because there we at least already have a running system where udev rules can online hotplugged memory automatically. BUT, you will run into the exact same issue during reboots.


What is the target use case here? Or is it just playing around with DIMM configurations?

Comment 13 Yumei Huang 2021-03-19 03:31:58 UTC
Hi Jing,

Could you please check above comment and answer the questions? Thanks.

Comment 14 Jing Qi 2021-03-19 07:33:38 UTC
Thanks for the detailed explanation.
The issue was found when we ran a memory test case. The test case hot plugs memory devices to a running domain until the device count reaches max_slots (256), and randomly reboots the VM during this period.

Comment 15 David Hildenbrand 2021-03-19 09:37:58 UTC
Okay, so I assume this is a new test case, because it couldn't ever really have worked - right?

You should tweak your test case to supply more boot memory. Something around 4 GiB should be okay for the huge amount of memory you're planning on hotplugging.
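The 4 GiB suggestion is consistent with a simple sizing rule (an assumption for illustration, not an official formula): reserve the memmap cost of the hotplug target, roughly 1/64 of it, on top of a base allowance for the running system:

```shell
hotplug_gib=128                      # planned hotplug: 256 x 512 MiB DIMMs
metadata_gib=$((hotplug_gib / 64))   # memmap at ~1/64 of the hotplugged size
base_gib=2                           # rough allowance for kernel + userspace
echo "$((base_gib + metadata_gib)) GiB"   # prints "4 GiB"
```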

Comment 16 Jing Qi 2021-03-19 10:30:09 UTC
(In reply to David Hildenbrand from comment #15)
> Okay, so I assume this is a new test case, because it couldn't ever really
> have worked - right?
> 
It's not a new test case, but it often hit other issues before.

> You should tweak your test case to supply more boot memory - Something
> around 4 GiB should be okay for the huge amount of memory you're planning on
> hotplugging.

OK, I'll try it later.

Comment 17 David Hildenbrand 2021-05-14 13:41:23 UTC
Hi,

did you have a chance to try? We'd like to close this BZ.

Thanks!

Comment 18 Jing Qi 2021-05-18 01:24:25 UTC
I tried it with libvirt-7.3.0-1.module+el8.5.0+11004+f4810536.x86_64 & libvirt-7.3.0-1.module+el8.5.0+11004+f4810536.x86_64.

It passes with 4G as the initial memory size. Thanks!

Comment 19 David Hildenbrand 2021-05-19 07:59:03 UTC
Thanks! Closing as per comment 18.

