Bug 1507149
| Field | Value |
|---|---|
| Summary | [LLNL 7.5 Bug] slab leak causing a crash when using kmem control group |
| Product | Red Hat Enterprise Linux 7 |
| Component | kernel |
| Kernel sub component | Control Groups |
| Version | 7.4 |
| Status | CLOSED ERRATA |
| Severity | high |
| Priority | urgent |
| Keywords | ZStream |
| Reporter | Ben Woodard <woodard> |
| Assignee | Aristeu Rozanski <arozansk> |
| QA Contact | Chao Ye <cye> |
| Docs Contact | |
| CC | aaron_wilk, afox, aleksander.lukasz, amdas, anand, aquini, arozansk, arunmk, atomlin, atragler, bjarolim, cfernandez, chaithco, charles.wright, chenxu198511, codyja, cye, daniel.kuffner, dhoward, dwd, foraker1, fwissing, hannsj_uhl, ikulkarn, jennyw, jentrena, jijia, joaquin.lopez, jpirko, jpmenil, julio.valcarcel, jwinn, kolyshkin, kyoshida, lining916740672, liwan, mikelley, mmilgram, mschibli, mystery.wd, nmurray, pasik, perobins, plazonic, qguo, quackmaster, rbobek, rcheerla, redhat-bugs, redtex, R.Eggermont, ribarry, riehecky, rmonther, sakulkar, shuwang, sman, smeisner, sroza, subhat, sukulkar, svarada, tgraf, tgummels, uobergfe, vagrawal, vamsee, woodard, wredtex, xyhao06, yoliynyk, zhenghc00 |
| Flags | jpmenil: needinfo-, jpmenil: needinfo- |
| Target Milestone | rc |
| Target Release | --- |
| Hardware | Unspecified |
| OS | Unspecified |
| Whiteboard | |
| Fixed In Version | kernel-3.10.0-1075.el7 |
| Doc Type | If docs needed, set a value |
| Doc Text | |
| Story Points | --- |
| Clone Of | |
| Clones | 1748234, 1748236, 1748237, 1752421 |
| Environment | |
| Last Closed | 2020-03-31 19:12:49 UTC |
| Type | Bug |
| Regression | --- |
| Mount Type | --- |
| Documentation | --- |
| CRM | |
| Verified Versions | |
| Category | --- |
| oVirt Team | --- |
| RHEL 7.3 requirements from Atomic Host | |
| Cloudforms Team | --- |
| Target Upstream Version | |
| Embargoed | |
| Bug Depends On | |
| Bug Blocks | 1392283, 1420851, 1549423, 1599298, 1649189, 1689150, 1713152, 1748234, 1748236, 1748237, 1752421, 1754591 |
| Attachments | |
Description (Ben Woodard, 2017-10-27 20:06:02 UTC)
We found that when we set ConstrainKmemSpace=no in the Slurm configuration there were no crashes over the weekend, so the kmem control group is strongly implicated as the source of the problem. I asked LLNL's kernel engineer to apply a44cb9449182fd7b25bf5f1cc38b7f19e0b96f6d, which could deal with at least part of the problem. ref: https://bugzilla.redhat.com/show_bug.cgi?id=1442618

From the reporter: I did some testing on a 7.4 node today, and I was able to trivially recreate the behavior where setting a kmem limit in a cgroup causes a number of slab caches to get created that don't go away when the cgroup is removed. The steps were:

1. `mkdir /sys/fs/cgroup/memory/kmem_test`
2. `echo 137438953472 > /sys/fs/cgroup/memory/kmem_test/memory.kmem.max_usage_in_bytes` (128GB, the total RAM in the node)
3. Spawn a shell and echo its pid into /sys/fs/cgroup/memory/kmem_test/tasks to put the shell into the cgroup.
4. Do a little work in the shell. Pinging the management node seemed to be enough.
5. `ls /sys/kernel/slab | grep kmem_test` will show a bunch of new slab caches.
6. Exit the shell. When I tried this, it took the shell 30-60 seconds to exit. Not sure what was up there.
7. `rmdir /sys/fs/cgroup/memory/kmem_test`
8. `ls /sys/kernel/slab | grep kmem_test` shows the slabs are still there. No panics/messages to syslog though, and no broken symlinks in /sys/kernel/slab.

I'm not rightly certain whether all of the child slab caches are supposed to go away when the cgroup is removed or not, but letting them build up into the hundreds of thousands probably isn't good behavior.

----------

Is that the expected behavior, or do we have our reproducer here? What is supposed to reap these now-abandoned slab caches? It seems like the build-up of these slab caches is what ultimately crashes the node, so the reproducer that ultimately leads to the crash is probably something like what is described above, iterated many times.

Good morning, I hope you don't mind if I butt in. We have a cluster of 80 Intel OPA/PSM2 nodes running 7.4, Slurm 17.04 with memory enforcement, and with hfi1 drivers updated to Intel's latest 10.6.x.x that is experiencing similar issues. For example, after a while we see lots of:

```
[684578.150960] cache_from_obj: Wrong slab cache. kmem_cache_node but object is from kmalloc-64(6281:step_0)
[684578.162326] cache_from_obj: Wrong slab cache. kmem_cache_node but object is from kmalloc-64(6281:step_0)
[684578.173546] cache_from_obj: Wrong slab cache. kmem_cache but object is from kmalloc-256(6281:step_0)
[684578.186536] cache_from_obj: Wrong slab cache. kmem_cache_node but object is from kmalloc-64(6281:step_0)
[684578.197869] cache_from_obj: Wrong slab cache. kmem_cache_node but object is from kmalloc-64(6281:step_0)
[684578.210447] cache_from_obj: Wrong slab cache. kmem_cache but object is from kmalloc-256(6281:step_0)
```

in the logs, and we see huge /sys/kernel/slab directories (e.g. 120k entries). No node crashes, but we do get weird program crashes that we cannot easily explain any other way. Anyway, I just wanted to report that we tried applying a44cb9449182fd7b25bf5f1cc38b7f19e0b96f6d to 3.10.0-693.5.2.el7 and, while it does not fix the problem fully, it does help. We repeatedly ran a simple MPI program (report rank), and on the node without the patch `ls /sys/kernel/slab | grep step | wc -l` grew at twice the rate of the node with the patch (after an hour, 2043 vs 965). We'd be happy to try more things if that might help.

Thanks, Josko
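The crash reproducer hinted at above (the manual steps iterated many times) can be sketched as a small loop. This is an illustration only, not a script from the original reports: the cgroup path and the 128 GB value come from the manual steps, the iteration count and the `cat` used as "a little work" are arbitrary, and it writes memory.kmem.limit_in_bytes (the knob used by the scripts later in this bug) rather than the max_usage_in_bytes file quoted above.

```bash
#!/bin/bash
# Sketch of the iterated reproducer: create a kmem-limited memory cgroup,
# run a short-lived process inside it, remove the cgroup, and count the
# per-cgroup slab caches left behind in /sys/kernel/slab.
# Assumes the cgroup v1 memory controller is mounted at /sys/fs/cgroup/memory.
CG=/sys/fs/cgroup/memory/kmem_test

for i in $(seq 1 1000); do
    mkdir "$CG"
    echo 137438953472 > "$CG/memory.kmem.limit_in_bytes"   # 128 GB, as in the manual steps
    # Short-lived shell that joins the cgroup and does a little work.
    sh -c "echo \$\$ > $CG/tasks; cat /proc/self/status > /dev/null"
    rmdir "$CG"
    echo "iteration $i: $(ls /sys/kernel/slab | grep -c kmem_test) leaked kmem_test caches"
done
```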
Hi, can you both try this patch and report back its effect on the issue:

```
commit f57947d27711451a7739a25bba6cddc8a385e438
Author: Li Zefan <lizefan>
Date:   Tue Jun 18 18:41:53 2013 +0800

    cgroup: fix memory leak in cgroup_rm_cftypes()

    The memory allocated in cgroup_add_cftypes() should be freed. The effect
    of this bug is we leak a bit memory everytime we unload cfq-iosched
    module if blkio cgroup is enabled.
```

I was advised the above might help. Travis

Travis, before we ask LLNL to test random patches, can we do some testing to make sure that we don't continue to see evidence of the problem using the reproducer provided?

Good afternoon, I just finished testing with the 2nd patch as well, and there isn't a huge difference. After 60 MPI runs I get 3175 items in /sys/kernel/slab unpatched, 1285 with the first patch, and 1290 with both. I also kept a listing of the /sys/kernel/slab directory after every test, in case you want to compare what's happening, and can put it up somewhere. Thanks for the suggestion; if it can help, keep them coming.

Encountered this issue with nvidia-docker (cgroup open) and 3.10.0-514.el7.x86_64 on CentOS:

```
[6728212.703168] [<ffffffff811c35c4>] __kmem_cache_shutdown+0x14/0x80
[6728212.703196] [<ffffffff8118ceaf>] kmem_cache_destroy+0x3f/0xe0
[6728212.703222] [<ffffffff811d4949>] kmem_cache_destroy_memcg_children+0x89/0xb0
[6728212.703252] [<ffffffff8118ce84>] kmem_cache_destroy+0x14/0xe0
[6728212.703284] [<ffffffffa167a507>] deinit_chunk_split_cache+0x77/0xa0 [nvidia_uvm]
[6728212.703321] [<ffffffffa167c11e>] uvm_pmm_gpu_deinit+0x3e/0x70 [nvidia_uvm]
[6728212.703355] [<ffffffffa1650b10>] remove_gpu+0x220/0x300 [nvidia_uvm]
[6728212.703387] [<ffffffffa1650e31>] uvm_gpu_release_locked+0x21/0x30 [nvidia_uvm]
[6728212.703423] [<ffffffffa1654588>] uvm_va_space_destroy+0x348/0x3b0 [nvidia_uvm]
[6728212.703458] [<ffffffffa164a401>] uvm_release+0x11/0x20 [nvidia_uvm]
[6728212.703486] [<ffffffff811e0329>] __fput+0xe9/0x270
[6728212.703509] [<ffffffff811e05ee>] ____fput+0xe/0x10
[6728212.703532] [<ffffffff810a22f4>] task_work_run+0xc4/0xe0
[6728212.703557] [<ffffffff810815eb>] do_exit+0x2cb/0xa60
[6728212.703580] [<ffffffff810e1f22>] ? __unqueue_futex+0x32/0x70
[6728212.703605] [<ffffffff810e2f7d>] ? futex_wait+0x11d/0x280
[6728212.704619] [<ffffffff81081dff>] do_group_exit+0x3f/0xa0
[6728212.705598] [<ffffffff81092c10>] get_signal_to_deliver+0x1d0/0x6d0
[6728212.706559] [<ffffffff81014417>] do_signal+0x57/0x6c0
[6728212.707495] [<ffffffff8110b796>] ? __audit_syscall_exit+0x1e6/0x280
[6728212.708407] [<ffffffff81014adf>] do_notify_resume+0x5f/0xb0
```

Ben, can you please try the attached patches (non obsolete)? The following commits from Linus' tree are included:

- 363a044 memcg, slab: kmem_cache_create_memcg(): fix memleak on fail path
- 1aa1325 memcg, slab: clean up memcg cache initialization/destruction
- 2edefe1 memcg, slab: fix races in per-memcg cache creation/destruction
- b852990 memcg, slab: do not destroy children caches if parent has aliases

Hullo, I am not Ben but I gave it a try nevertheless. The patches apply cleanly to the 693.17.1 kernel but don't seem to help at all. The number of entries in /sys/kernel/slab increases at almost exactly the same rate as for a clean 693.17.1 kernel. Josko

There are a few reports of the same thing happening when running Docker here: https://github.com/moby/moby/issues/35446 The kernel versions reported there are all different, but they look recent.
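For anyone comparing patched and unpatched nodes the way the MPI tests above were compared, a simple growth tracker can be left running on each node. This is an informal sketch, not part of the original testing; the `step` pattern matches the Slurm job-step cgroup names seen in this bug, so substitute whatever naming your workload uses.

```bash
#!/bin/bash
# Log a timestamp and the number of per-job slab cache entries once a minute,
# mirroring the "ls /sys/kernel/slab | grep step | wc -l" comparison above.
while true; do
    printf '%s %s\n' "$(date +%s)" "$(ls /sys/kernel/slab | grep -c step)"
    sleep 60
done
```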
```bash
#!/bin/bash
mkdir -p pages
for x in `seq 1280000`; do
    [ $((x % 1000)) -eq 0 ] && echo $x
    mkdir /sys/fs/cgroup/memory/foo
    # echo 1M > /sys/fs/cgroup/memory/foo/memory.limit_in_bytes
    echo 100M > /sys/fs/cgroup/memory/foo/memory.kmem.limit_in_bytes
    echo $$ >/sys/fs/cgroup/memory/foo/cgroup.procs
    memhog 4K &>/dev/null
    echo trex>pages/$x
    echo $$ >/sys/fs/cgroup/memory/cgroup.procs
    rmdir /sys/fs/cgroup/memory/foo
done
```

```
[root@localhost ~]# cat /proc/mounts
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
devtmpfs /dev devtmpfs rw,nosuid,size=4077588k,nr_inodes=1019397,mode=755 0 0
securityfs /sys/kernel/security securityfs rw,nosuid,nodev,noexec,relatime 0 0
tmpfs /dev/shm tmpfs rw,nosuid,nodev 0 0
devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
tmpfs /run tmpfs rw,nosuid,nodev,mode=755 0 0
tmpfs /sys/fs/cgroup tmpfs rw,nosuid,nodev,noexec,mode=755 0 0
cgroup /sys/fs/cgroup/systemd cgroup rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd 0 0
pstore /sys/fs/pstore pstore rw,nosuid,nodev,noexec,relatime 0 0
cgroup /sys/fs/cgroup/cpuset cgroup rw,nosuid,nodev,noexec,relatime,cpuset 0 0
cgroup /sys/fs/cgroup/cpu,cpuacct cgroup rw,nosuid,nodev,noexec,relatime,cpu,cpuacct 0 0
cgroup /sys/fs/cgroup/blkio cgroup rw,nosuid,nodev,noexec,relatime,blkio 0 0
cgroup /sys/fs/cgroup/memory cgroup rw,nosuid,nodev,noexec,relatime,memory 0 0
cgroup /sys/fs/cgroup/devices cgroup rw,nosuid,nodev,noexec,relatime,devices 0 0
cgroup /sys/fs/cgroup/freezer cgroup rw,nosuid,nodev,noexec,relatime,freezer 0 0
cgroup /sys/fs/cgroup/net_cls,net_prio cgroup rw,nosuid,nodev,noexec,relatime,net_cls,net_prio 0 0
cgroup /sys/fs/cgroup/perf_event cgroup rw,nosuid,nodev,noexec,relatime,perf_event 0 0
cgroup /sys/fs/cgroup/hugetlb cgroup rw,nosuid,nodev,noexec,relatime,hugetlb 0 0
cgroup /sys/fs/cgroup/pids cgroup rw,nosuid,nodev,noexec,relatime,pids 0 0
configfs /sys/kernel/config configfs rw,relatime 0 0
/dev/mapper/centos-root / xfs rw,relatime,attr2,inode64,noquota 0 0
systemd-1 /proc/sys/fs/binfmt_misc autofs rw,relatime,fd=31,pgrp=1,timeout=300,minproto=5,maxproto=5,direct 0 0
debugfs /sys/kernel/debug debugfs rw,relatime 0 0
hugetlbfs /dev/hugepages hugetlbfs rw,relatime 0 0
mqueue /dev/mqueue mqueue rw,relatime 0 0
/dev/vda1 /boot xfs rw,relatime,attr2,inode64,noquota 0 0
```

I encountered the same issue using the script above. I got this script from https://github.com/torvalds/linux/commit/73f576c04b9410ed19660f74f97521bee6e1c546 and made some changes to reproduce "Docker leaking cgroups causing no space left on device? #29638" (https://github.com/moby/moby/issues/29638), but unfortunately ran into this bug. On longterm stable 4.4.121 I did not see the same behavior. FYI.

We are encountering this same issue with CentOS 7, kernel 3.10.0-693.21.1.el7.x86_64, and docker-ce-17.12.1.ce-1.el7.centos.x86_64. The behavior we see is that after a period of heavy container usage on a host, the /sys/kernel/slab directory will have over 1 million files/dirs and new containers won't start up anymore. Also, the following is logged heavily:

```
May 14 19:50:06 hostname kernel: cache_from_obj: Wrong slab cache. kmalloc-256 but object is from kmem_cache(5071:9934c4c3821bcb474f5a113d5a1822c276fcf196d6c2e31b1b8dd969df9d50d2)
May 14 19:50:06 hostname kernel: cache_from_obj: Wrong slab cache. kmalloc-256 but object is from kmem_cache(5071:9934c4c3821bcb474f5a113d5a1822c276fcf196d6c2e31b1b8dd969df9d50d2)
May 14 19:50:06 hostname kernel: cache_from_obj: Wrong slab cache. kmalloc-256 but object is from kmem_cache(5071:9934c4c3821bcb474f5a113d5a1822c276fcf196d6c2e31b1b8dd969df9d50d2)
May 14 19:50:06 hostname kernel: cache_from_obj: Wrong slab cache. kmalloc-256 but object is from kmem_cache(5071:9934c4c3821bcb474f5a113d5a1822c276fcf196d6c2e31b1b8dd969df9d50d2)
```

It looks like Kubernetes users are also experiencing this issue, where the recent change of enabling the kmem limit in Kubernetes causes cgroups to leak. For details please refer to https://github.com/kubernetes/kubernetes/issues/61937.

Is this just pending someone testing the patches? 1442618, referenced up near the top, is a private bug -- can the relevant parts be copied here (or has that already been done), or is there another bug that has them?

This is biting us and at least one of our customers via the Kubernetes issue linked above. The following script reproduces it reliably on a Kubernetes node running kernel 3.10.0-862.9.1.el7.x86_64:

```bash
#!/bin/bash
run_some_pods () {
    for j in {1..20}
    do
        kubectl run -i -t busybox-${j} --image=busybox --restart=Never -- echo "hi" &> /dev/null &
    done
    sleep 1
    wait
    for j in {1..20}
    do
        kubectl delete pods busybox-${j} --grace-period=0 --force &> /dev/null || true
    done
}

for i in {1..3279}
do
    echo "Running pod batch $i"
    run_some_pods
done
```

We'd be happy to test any patches.

We were able to work around the issue in the context of Kubernetes 1.8.12 by doing the following on startup: before running the v1.8 kubelet, we run a v1.6 kubelet with suitable arguments and verify that it completes enough of its startup process to create /sys/fs/cgroup/memory/kubepods. Then we start up the Kubernetes 1.8.12 cluster as normal.

We tried creating those cgroup folders manually as a simpler version of the above workaround, using the following script:

```bash
#!/bin/bash
for thing in memory blkio "cpu,cpuacct" cpuset devices freezer hugetlb memory net_cls perf_event systemd
do
    mkdir -p /sys/fs/cgroup/${thing}/kubepods
    mkdir -p /sys/fs/cgroup/${thing}/kubepods/burstable
    mkdir -p /sys/fs/cgroup/${thing}/kubepods/besteffort
done
```

However, this simpler attempt did not prevent the repro script above from showing the issue. What would we need to add to this simpler workaround to prevent the issue the way starting a v1.6 kubelet does?
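To tell whether a node is already affected, or whether one of the workarounds above is holding, the leak indicators discussed in this bug can be checked with a short script. This is an informal sketch, not something from the original reports; the thresholds are whatever you consider alarming for your fleet.

```bash
#!/bin/bash
# Rough health check for the leak described in this bug:
# count slab cache entries and live memory cgroups.
slab_dirs=$(ls /sys/kernel/slab | wc -l)
memcg_count=$(awk '$1 == "memory" {print $3}' /proc/cgroups)   # num_cgroups column

echo "slab cache entries under /sys/kernel/slab: ${slab_dirs}"
echo "memory cgroups reported by /proc/cgroups:  ${memcg_count}"

# Hundreds of thousands of slab entries, or a memory cgroup count creeping
# toward the 65535 ID limit discussed later in this bug, point to the leak.
```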
Posting stats from one of the machines that shows this problem: Problem: > [root@vd0916 ~]# mkdir /sys/fs/cgroup/memory/b6c438c7ba469d5c3bad89c4a81d732a68a375b096d3791c4847cd517f8eee7c > mkdir: cannot create directory ‘/sys/fs/cgroup/memory/b6c438c7ba469d5c3bad89c4a81d732a68a375b096d3791c4847cd517f8eee7c’: No space left on device Some metrics: > [root@vd0916 ~]# ll /sys/kernel/slab | wc -l > 2587635 > [root@vd0916 ~]# slabtop -s -c Active / Total Objects (% used) : 87188577 / 222858888 (39.1%) Active / Total Slabs (% used) : 4716565 / 4716565 (100.0%) Active / Total Caches (% used) : 77 / 118 (65.3%) Active / Total Size (% used) : 17769913.34K / 63916938.41K (27.8%) Minimum / Average / Maximum Object : 0.01K / 0.29K / 8.00K OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME 29129856 15635925 53% 0.19K 693568 42 5548544K dentry 23371439 5516568 23% 0.57K 417348 56 13355136K radix_tree_node 22468896 22386411 99% 0.11K 624136 36 2496544K sysfs_dir_cache 19747455 13294931 67% 0.10K 506345 39 2025380K buffer_head 17578139 5788599 32% 0.15K 331663 53 2653304K xfs_ili 15482112 88288 0% 0.25K 241908 64 3870528K kmalloc-256 11236544 44493 0% 0.50K 175571 64 5618272K kmalloc-512 11229792 1416428 12% 0.38K 267376 42 4278016K blkdev_requests 11110932 5855107 52% 0.09K 264546 42 1058184K kmalloc-96 9687120 1467586 15% 1.06K 322904 30 10332928K xfs_inode 9019200 3112167 34% 0.12K 140925 64 1127400K kmalloc-128 8103040 894521 11% 0.06K 126610 64 506440K kmalloc-64 7442190 2117485 28% 0.19K 177195 42 1417560K kmalloc-192 7131390 32130 0% 0.23K 101877 70 1630032K cfq_queue 4258608 753366 17% 0.66K 88721 48 2839072K shmem_inode_cache 3663990 2814407 76% 0.58K 66618 55 2131776K inode_cache 3575296 382122 10% 0.01K 6983 512 27932K kmalloc-8 3378048 2483023 73% 0.03K 26391 128 105564K kmalloc-32 1172800 1168793 99% 0.06K 18325 64 73300K kmem_cache_node 660992 152980 23% 0.02K 2582 256 10328K kmalloc-16 588736 587894 99% 0.25K 9199 64 147184K kmem_cache 570556 336689 59% 0.64K 11644 49 372608K proc_inode_cache 527904 28209 5% 1.00K 16497 32 527904K kmalloc-1024 308160 6622 2% 1.56K 15408 20 493056K mm_struct 225215 4247 1% 1.03K 7265 31 232480K nfs_inode_cache 201952 133963 66% 4.00K 25244 8 807808K kmalloc-4096 132600 3240 2% 0.39K 3315 40 53040K xfs_efd_item 123368 123368 100% 0.07K 2203 56 8812K Acpi-ParseExt 110048 37821 34% 2.00K 6878 16 220096K kmalloc-2048 71055 3090 4% 2.06K 4737 15 151584K idr_layer_cache 51684 50881 98% 0.05K 708 73 2832K fanotify_event_info 51000 50933 99% 0.12K 750 68 6000K dnotify_mark 37504 36803 98% 0.06K 586 64 2344K anon_vma 33303 33303 100% 0.08K 653 51 2612K selinux_inode_security 28083 27187 96% 0.21K 759 37 6072K vm_area_struct 27534 18839 68% 0.81K 706 39 22592K task_xstate 26214 21624 82% 0.04K 257 102 1028K Acpi-Namespace 26163 25143 96% 0.31K 513 51 8208K nf_conntrack_ffffffff81a26d80 25959 25808 99% 0.62K 509 51 16288K sock_inode_cache 25530 25312 99% 0.09K 555 46 2220K configfs_dir_cache 24565 24225 98% 0.05K 289 85 1156K shared_policy_node 23205 22976 99% 0.10K 595 39 2380K blkdev_ioc 20416 20197 98% 0.18K 464 44 3712K xfs_log_ticket > cat /proc/slabinfo <<<< See file attached >>>> Created attachment 1479631 [details]
Output of /proc/slabinfo on the machine showing the issue
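As a side note, when a machine is in this state it can help to see which parent caches the leaked per-cgroup entries belong to. The one-liner below is a sketch, not an official tool; the naming pattern it relies on (entries of the form `<cache>(<id>:<cgroup>)`) is taken from the cache_from_obj messages and /sys/kernel/slab listings quoted in this bug.

```bash
# Group per-cgroup slab cache entries by their parent cache name and show
# the most common ones, e.g. "kmalloc-256(5071:<container id>)" -> "kmalloc-256".
ls /sys/kernel/slab | sed -n 's/([^)]*)$//p' | sort | uniq -c | sort -rn | head -20
```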
Kernel version of the host on which I was able to reproduce the issue:

```
[root@vd0916 ~]# uname -r
3.10.0-327.36.3.el7.x86_64
```

Reproducibility steps: I was running a script similar to what @Anand has posted in https://bugzilla.redhat.com/show_bug.cgi?id=1507149#c44, starting a lot of Kubernetes pods and killing them in a tight loop.

The source of the strange workaround in comment 45 was this article: http://www.linuxfly.org/kubernetes-19-conflict-with-centos7/? which one of our colleagues translated for us. His summary was: the reason that upgrading k8s from 1.6 to 1.9 alone does not reproduce the cgroup memory leak issue is that, without restarting the node, kubelet will re-use the /sys/fs/cgroup/memory/kubepods directory that was created by the k8s 1.6 kubelet, which has the cgroup kernel memory limit disabled by default. Restarting the nodes will result in the /sys/fs/cgroup/memory/kubepods directory being re-created by the k8s 1.9 kubelet, and hence it will have the cgroup kernel memory limit enabled by default. Tomorrow I'll start a test to double-check the workaround on a second node.

I tried the script in Comment 32 and was able to hit the same issue (where a docker run fails due to ENOSPC). The script is copied below for ease of reading:

```bash
#!/bin/bash
mkdir -p pages
for x in `seq 1280000`; do
    [ $((x % 1000)) -eq 0 ] && echo $x
    mkdir /sys/fs/cgroup/memory/foo
    # echo 1M > /sys/fs/cgroup/memory/foo/memory.limit_in_bytes
    echo 100M > /sys/fs/cgroup/memory/foo/memory.kmem.limit_in_bytes
    echo $$ >/sys/fs/cgroup/memory/foo/cgroup.procs
    memhog 4K &>/dev/null
    echo trex>pages/$x
    echo $$ >/sys/fs/cgroup/memory/cgroup.procs
    rmdir /sys/fs/cgroup/memory/foo
done
```

I suspected that this could be due to short-lived processes (our use case potentially has many short-lived containers in an error-handling path).
I modified the script to the following, with the sleeper being a compiled C program that prints its pid (getpid()) and then sleeps for a second:

```bash
#!/bin/bash
mkdir -p pages
for x in `seq 1280000`; do
    [ $((x % 1000)) -eq 0 ] && echo $x
    mkdir /sys/fs/cgroup/memory/test2
    # echo 1M > /sys/fs/cgroup/memory/test2/memory.limit_in_bytes
    echo 100M > /sys/fs/cgroup/memory/test2/memory.kmem.limit_in_bytes
    ./sleeper >/sys/fs/cgroup/memory/test2/cgroup.procs
    memhog 4K &>/dev/null
    echo trex>pages/$x
    echo $$ >/sys/fs/cgroup/memory/cgroup.procs
    rmdir /sys/fs/cgroup/memory/test2
done
```

On a machine without the ENOSPC:

```
ls -lrt /sys/kernel/slab/ | wc -l
251

slabtop -s l -o
 Active / Total Objects (% used)    : 533404 / 535929 (99.5%)
 Active / Total Slabs (% used)      : 18399 / 18399 (100.0%)
 Active / Total Caches (% used)     : 65 / 95 (68.4%)
 Active / Total Size (% used)       : 182723.21K / 183691.12K (99.5%)
 Minimum / Average / Maximum Object : 0.01K / 0.34K / 8.00K
   OBJS  ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
 124131  123889  99%    0.19K   5911       21     23644K dentry
  89908   89908 100%    0.15K   3458       26     13832K xfs_ili
  95280   95280 100%    1.06K   3176       30    101632K xfs_inode
  89427   89427 100%    0.10K   2293       39      9172K buffer_head
  16335   16147  98%    0.58K    605       27      9680K inode_cache
  18072   18072 100%    0.11K    502       36      2008K sysfs_dir_cache
```

On a machine with the ENOSPC:

```
ls -lrt /sys/kernel/slab/ | wc -l
2491573

sudo slabtop -s l -o
 Active / Total Objects (% used)    : 99818530 / 209669755 (47.6%)
 Active / Total Slabs (% used)      : 4532638 / 4532638 (100.0%)
 Active / Total Caches (% used)     : 125 / 161 (77.6%)
 Active / Total Size (% used)       : 21303475.11K / 59862027.02K (35.6%)
 Minimum / Average / Maximum Object : 0.01K / 0.29K / 8.00K
     OBJS   ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
 30771153 17915952  58%    0.19K 732647       42   5861176K dentry
 22472028 22454051  99%    0.11K 624223       36   2496892K sysfs_dir_cache
 22912383 19079451  83%    0.10K 587497       39   2349988K buffer_head
 18623542  3876899  20%    0.57K 333069       56  10658208K radix_tree_node
  8240850   837650  10%    1.06K 274695       30   8790240K xfs_inode
 10468080  6422369  61%    0.09K 249240       42    996960K kmalloc-96
 10072062   677935   6%    0.38K 239811       42   3836976K blkdev_requests
 13732880   415473   3%    0.25K 214577       64   3433232K kmalloc-256
```

I can see that with a slightly long-lived process (as shown below), the issue has not been hit even though we have run through 90k iterations. The number of active slabs stays at 18k for the 1-second processes, while for the short-lived process the active slab count was at 400k at a similar point.

```
time ./sleeper
4427

real    0m1.001s
user    0m0.000s
sys     0m0.001s

time echo $$
29978

real    0m0.000s
user    0m0.000s
sys     0m0.000s
```

Based on a superficial understanding of the comments in the bug and the test results, it seems that the 'leak' is exercised in cases where there are many short-lived processes/containers. Is this assumption right? We can implement this as a workaround if it reduces the frequency of the leak.

In continuation to Comment 50, I just tried to run the script with the short-lived process on the side. The 'Active / Total Slabs (% used)' jumped from 18k to 27k in about 1k iterations of the script and does not show any signs of going down. So I strongly suspect that a quickly exiting process could somehow lead to this state. Could someone please corroborate this based on a better understanding of the issue and the patch? We could implement a workaround based on this.
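The `sleeper` binary above is described but not attached to this bug. For anyone repeating the comparison, a shell stand-in with the same observable behaviour (print its own PID so the caller can redirect it into cgroup.procs, then stay alive for about a second) is sketched below; this is a reconstruction under that assumption, not the original C program.

```bash
#!/bin/bash
# Stand-in for the compiled "sleeper" used in the test above: write this
# process's PID to stdout (the caller redirects it into cgroup.procs) and
# then stay alive for roughly one second so the process is not short-lived.
echo $$
sleep 1
```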
I cannot view External Bug ID: Red Hat Knowledge Base (Solution) 3562061.

I cannot see that KB article either. Can you please update this?

(In reply to Vamsee Yarlagadda from comment #48)
> Kernel version of the host that I was able to reproduce the issue.
> > [root@vd0916 ~]# uname -r
> > 3.10.0-327.36.3.el7.x86_64

As per Bug 1502818, can you attempt to reproduce the reported issue under kernel-3.10.0-862.el7?

We seem to be experiencing this issue with short-lived containers on kernel 3.10.0-862.9.1.el7.x86_64 and Docker version 18.03.1-ce, build 9ee9f40.

So, as far as I know the kernel memory controller in the RHEL7 kernel is very old, useless*, and buggy, and it either needs to be disabled entirely or fixed by backporting changes from 4.x kernels (where it was basically rewritten from scratch). A workaround is NOT to enable/use kmem; I am working on that in https://github.com/opencontainers/runc/pull/1921 (or https://github.com/opencontainers/runc/pull/1920). (A sketch of the boot-parameter form of this workaround appears after the attachment notes below.)

* It is useless because there is no reclamation of kernel memory, i.e. dcache and icache entries are not reclaimed once we run out of kernel memory in this cgroup.

The RHEL 7.6 kernel should have version number 3.10.0-957 -- can anyone check if the bug is fixed in there?

The issue is a bit different, but the symptoms are the same. Upstream, the slab caches aren't freed immediately, only when memory pressure occurs. Because of changes present upstream, it's possible to free the memcg id as soon as it goes offline, so the slab caches and the corresponding memcg keep existing until everything else is freed. Two things are happening in RHEL7: kmem caches aren't being emptied even during memory pressure, and the memcg id isn't being freed, so the -ENOMEM and the failure to create new memcgs come from the fact that there are 65535 memory cgroups still holding an id while waiting for their slab caches to be freed. This has not been fixed in 7.6 and I'm working on the fix. It's possible that just solving the first problem will address both, but I'm not sure yet.

Test kernel: http://people.redhat.com/~arozansk/a18a3f62f17ae367d23647cad92c05f0b5ca39dc/

Known issues: dentry isn't being accounted; moving a process while it's allocating kmem with existing or new slab caches will leak both the cache and the old memory cgroup. In practical terms, this test kernel should significantly reduce the kmem cache and memory cgroup leaks in short-lived containers where the processes running in the container finish before the cgroup is removed. I'll keep working on the remaining issues.

Aristeu, is there any decent way to see which backports you have applied (other than downloading the src.rpm and trying to figure it out)? In general, all related upstream fixes should be from Vladimir Davydov (vdavydov and vdavydov.dev).

Created attachment 1578367 [details]
Test patches.
Attached.
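As a practical illustration of the "do not enable/use kmem" workaround mentioned above: on RHEL 7 the kernel-memory controller can be switched off at boot with the `cgroup.memory=nokmem` parameter, the same switch that is suggested and confirmed to help later in this bug. The grubby invocation below is only a sketch of how that parameter is typically added; verify it against your own boot configuration before relying on it.

```bash
# Sketch: disable kernel-memory accounting at boot so container runtimes
# cannot enable per-cgroup kmem limits (the trigger for this leak).
grubby --update-kernel=ALL --args="cgroup.memory=nokmem"
reboot

# After the reboot, the parameter should show up on the kernel command line:
grep -o 'cgroup.memory=nokmem' /proc/cmdline
```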
Folks, can I get some feedback on the test kernel please?

Created attachment 1591753 [details]
Slab active/Total Graph
Created attachment 1591754 [details]
Slab active/Total Graph data
Patch(es) committed on kernel-3.10.0-1075.el7

Quick question about the kernel the update is committed on... I see that 7.7 was released with kernel 3.10.0-1062.el7 earlier this month, and in https://bugzilla.redhat.com/show_bug.cgi?id=1507149#c101 Jan Stancek mentioned the patch was included in kernel 3.10.0-1075.el7. RHEL 8.0 starts with the new 4.18.0 kernel. Typically the kernel updates are 1062.x.x releases, i.e. the latest kernel for 7.6 is 3.10.0-957.27.4.el7. My question is, which release of RHEL is 3.10.0-1075.el7 slated to be included in? 7.8?

(In reply to Aaron from comment #105)
> Quick question about the kernel the update is committed on... I see that 7.7
> was released with kernel 3.10.0-1062.el7 earlier this month and in
> https://bugzilla.redhat.com/show_bug.cgi?id=1507149#c101 Jan Stancek
> mentioned the patch was included in kernel 3.10.0-1075.el7. RHEL 8.0 starts
> with the new 4.18.0 kernel. Typically the kernel updates are 1062.x.x
> releases, i.e. the latest kernel for 7.6 is 3.10.0-957.27.4.el7.
>
> My question is, which release of RHEL is 3.10.0-1075.el7 slated to be
> included in? 7.8?

7.8. These fixes are likely to be present in a future zstream release.

*** Bug 1739145 has been marked as a duplicate of this bug. ***

https://bugzilla.redhat.com/show_bug.cgi?id=1507149#c106

Thanks @Aristeu, I see https://bugzilla.redhat.com/show_bug.cgi?id=1748237 was created for a z-stream update to 7.5 with status of "ON_QA". Since 7.8 has not yet been released, is there a z-stream 3.10.0-1062.x release in the works as well?

Hi Aaron,

(In reply to Aaron from comment #118)
> https://bugzilla.redhat.com/show_bug.cgi?id=1507149#c106
>
> Thanks @Aristeu, I see https://bugzilla.redhat.com/show_bug.cgi?id=1748237
> was created for a z-stream update to 7.5 with status of "ON_QA". Since 7.8
> has not yet been released, is there a z-stream 3.10.0-1062.x release in the
> works as well?

Yes, there's a BZ to include it in the 7.7 kernel as well (1752421).

Verified with kernel-3.10.0-1098.el7:

```
============================================
slableakv1 - ppc64le:
......
Wait for SUnreclaim stabilized
SUnreclaim stabilized now
One exec per cpu on 8 - Iterations: 4000 - Before: SUnreclaim: 164672 kB - After: SUnreclaim: 190400 kB
Total Leak: 25728 kB - Leak per execute: .8040 kB - Total time: 5863s
Kernel: 3.10.0-1098.el7.ppc64le - podman: podman version 1.4.4

slableakv2 - ppc64le:
......
Wait for SUnreclaim stabilized
SUnreclaim stabilized now
One exec per cpu on 8 - Iterations: 4000 - Before: SUnreclaim: 168384 kB - After: SUnreclaim: 199488 kB
Total Leak: 31104 kB - Leak per execute: .9720 kB - Total time: 5172s
Kernel: 3.10.0-1098.el7.ppc64le - podman: podman version 1.4.4

slableakv1 - x86_64:
......
Wait for SUnreclaim stabilized
SUnreclaim stabilized now
One exec per cpu on 12 - Iterations: 6000 - Before: SUnreclaim: 96400 kB - After: SUnreclaim: 158772 kB
Total Leak: 62372 kB - Leak per execute: .8662 kB - Total time: 10442s
Kernel: 3.10.0-1098.el7.x86_64 - podman: podman version 1.4.4

slableakv2 - x86_64:
......
Wait for SUnreclaim stabilized
SUnreclaim stabilized now
One exec per cpu on 12 - Iterations: 6000 - Before: SUnreclaim: 82584 kB - After: SUnreclaim: 115444 kB
Total Leak: 32860 kB - Leak per execute: .4563 kB - Total time: 5274s
Kernel: 3.10.0-1098.el7.x86_64 - podman: podman version 1.4.4
```

Both slableakv1 (podman exec) and slableakv2 (short-lived containers) tests show good results. Move to VERIFIED.
Beaker Jobs: https://beaker.engineering.redhat.com/jobs/3807009 *** Bug 1392283 has been marked as a duplicate of this bug. *** ------- Comment From mjwolf.com 2017-05-17 17:25 EDT------- This should not be with Gustavo on the IBM side. Re-assigning the bug ------- Comment From gmgaydos.com 2017-05-26 14:06 EDT------- Red Hat: This bug has been reassigned on May 25 in order to improve response time. ------- Comment From chavez.com 2017-05-30 18:21 EDT------- Hello, How much RAM should we have installed in the system? Also, does this problem happen with PowerVM, KVM and/or bare metal? Any information you can give us on the config will help us set up a similar environment for a recreate. Thanks. ------- Comment From chavez.com 2017-06-01 18:37 EDT------- (In reply to comment #10) > Hello, > How much RAM should we have installed in the system? Also, does this problem > happen with PowerVM, KVM and/or bare metal? Any information you can give us > on the config will help us set up a similar environment for a recreate. > Thanks. Just to clarify, this question was addressed to the Red Hat reporter, Wang Shu. The needinfo flag in the Red Hat bug was already set accordingly. ------- Comment From diegodo.com 2017-06-13 16:10 EDT------- Mauricio did a check on the LPAR used to reproduce the problem. This system has 16 CPUs & 2677 MB RAM. I was able to reproduce this problem using my local system (KVM guest): Red Hat Enterprise Linux Server 7.4 Beta (Maipo) [root@rhel74snapshot2 ~]# lscpu Architecture: ppc64le Byte Order: Little Endian CPU(s): 32 On-line CPU(s) list: 0-31 Thread(s) per core: 8 Core(s) per socket: 4 Socket(s): 1 NUMA node(s): 1 Model: 2.0 (pvr 004d 0200) Model name: POWER8 (raw), altivec supported Hypervisor vendor: KVM Virtualization type: para L1d cache: 64K L1i cache: 32K NUMA node0 CPU(s): 0-31 [root@rhel74snapshot2 ~]# free -m total used free shared buff/cache available Mem: 498 193 25 4 278 20 Swap: 8192 136 8056 I wasn't able to reproduce with less cpus. The steps were the same as documented in the first comment: [root@rhel74snapshot2 ~]# systemctl start cgconfig [root@rhel74snapshot2 ~]# for i in $(seq 1 1000); do mkdir /sys/fs/cgroup/cpuacct/cgroup_$i; done ------- Comment From diegodo.com 2017-06-23 11:41 EDT------- UPDATE I couldn't reproduce this problem on kernel >= 4.1. I was able to increase the "for" execution in 10x and nothing happened. Now I'm trying to find the problem and creating a backport to kernel 3.10. ------- Comment From diegodo.com 2017-06-28 14:09 EDT------- UPDATE: I did some research using a basic "free -m" command, and there is a huge difference between kernel 3.10 and kernel >= 4.1 - the creation of 1000 dirs in 3.10 consumes about 100MB, while in 4.1 consumes 20MB. Because there are a lot of changes between these kernels in this area of cgroup, I'm trying to figure out what exactly did the difference in this size and trying to create a backport to this improvement. Please feel free to suggest/ask something. 
------- Comment From diegodo.com 2017-07-11 12:59 EDT------- UPDATE I did some tests on x86 arch and I figured out that the creation of 1000 cgroups have almost the same memory consumption (100MB) as we can see below: ------------------------------------------------------------------------------- Before: total used free shared buff/cache available Mem: 570 126 290 4 154 293 Swap: 0 0 0 After: total used free shared buff/cache available Mem: 570 125 185 4 259 193 Swap: 0 0 0 [root@localhost ~]# lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 32 On-line CPU(s) list: 0-31 Thread(s) per core: 1 Core(s) per socket: 1 Socket(s): 32 NUMA node(s): 1 Vendor ID: GenuineIntel CPU family: 6 Model: 94 Model name: Intel Core Processor (Skylake) Stepping: 3 CPU MHz: 2496.000 BogoMIPS: 4992.00 Hypervisor vendor: KVM Virtualization type: full L1d cache: 32K L1i cache: 32K L2 cache: 4096K NUMA node0 CPU(s): 0-31 [root@localhost ~]# ------------------------------------------------------------------------------- Before: total used free shared buff/cache available Mem: 14560 313 13774 12 472 13947 Swap: 8192 0 8192 After: total used free shared buff/cache available Mem: 14560 329 13659 12 571 13836 Swap: 8192 0 8192 Architecture: ppc64le Byte Order: Little Endian CPU(s): 32 On-line CPU(s) list: 0-31 Thread(s) per core: 8 Core(s) per socket: 4 Socket(s): 1 NUMA node(s): 1 Model: 2.0 (pvr 004d 0200) Model name: POWER8 (raw), altivec supported Hypervisor vendor: KVM Virtualization type: para L1d cache: 64K L1i cache: 32K NUMA node0 CPU(s): 0-31 --------------------------------------------- So, it seems that memory consumption are almost the same for both architectures. ------- Comment From diegodo.com 2017-11-22 14:30 EDT------- (In reply to comment #21) > It seems that the bug can be reproduced only on systems where the first NUMA > node has no CPUs assigned: > $ uname -r > 3.10.0-693.el7.ppc64le > > $ cat test.sh > echo 3 > /proc/sys/vm/drop_caches > grep ^MemAvail /proc/meminfo > for i in $(seq 10); do > for j in $(seq 100); do > mkdir /sys/fs/cgroup/cpuacct/cg_${i}_${j} > done > grep ^MemAvail /proc/meminfo > done > > === First system === > > $ lscpu > Architecture: ppc64le > Byte Order: Little Endian > CPU(s): 16 > On-line CPU(s) list: 0-15 > Thread(s) per core: 8 > Core(s) per socket: 1 > Socket(s): 2 > NUMA node(s): 1 > Model: 2.1 (pvr 004b 0201) > Model name: POWER8 (architected), altivec supported > Hypervisor vendor: (null) > Virtualization type: full > L1d cache: 64K > L1i cache: 32K > L2 cache: 512K > L3 cache: 8192K > NUMA node0 CPU(s): 0-15 > > $ ./test.sh > MemAvailable: 16756096 kB > MemAvailable: 16715200 kB > MemAvailable: 16668544 kB > MemAvailable: 16627712 kB > MemAvailable: 16580992 kB > MemAvailable: 16528448 kB > MemAvailable: 16475840 kB > MemAvailable: 16423360 kB > MemAvailable: 16379648 kB > MemAvailable: 16327040 kB > MemAvailable: 16280384 kB > > === Second system === > > $ lscpu > Architecture: ppc64le > Byte Order: Little Endian > CPU(s): 16 > On-line CPU(s) list: 0-15 > Thread(s) per core: 8 > Core(s) per socket: 1 > Socket(s): 2 > NUMA node(s): 2 > Model: 2.1 (pvr 004b 0201) > Model name: POWER8 (architected), altivec supported > Hypervisor vendor: (null) > Virtualization type: full > L1d cache: 64K > L1i cache: 32K > NUMA node0 CPU(s): 0-15 > NUMA node3 CPU(s): > > $ ./test.sh > MemAvailable: 16766720 kB > MemAvailable: 16725888 kB > MemAvailable: 16679168 kB > MemAvailable: 16618304 kB > MemAvailable: 16571712 kB > MemAvailable: 
16519296 kB > MemAvailable: 16466816 kB > MemAvailable: 16420160 kB > MemAvailable: 16367680 kB > MemAvailable: 16318144 kB > MemAvailable: 16265664 kB > > === Third system === > > $ lscpu > Architecture: ppc64le > Byte Order: Little Endian > CPU(s): 16 > On-line CPU(s) list: 0-15 > Thread(s) per core: 8 > Core(s) per socket: 1 > Socket(s): 2 > NUMA node(s): 2 > Model: 2.1 (pvr 004b 0201) > Model name: POWER8 (architected), altivec supported > Hypervisor vendor: (null) > Virtualization type: full > L1d cache: 64K > L1i cache: 32K > L2 cache: 512K > L3 cache: 8192K > NUMA node0 CPU(s): > NUMA node2 CPU(s): 0-15 > > $ uname -r > 3.10.0-693.el7.ppc64le > > $ ./test.sh > MemAvailable: 22156736 kB > MemAvailable: 16632896 kB > MemAvailable: 11104064 kB > MemAvailable: 5566976 kB > MemAvailable: 34112 kB > !!! OOM !!! I was not able to reproduce the result with Numa Node0 with no Cpu using qemu: [root@localhost ~]# lscpu Architecture: ppc64le Byte Order: Little Endian CPU(s): 16 On-line CPU(s) list: 0-15 Thread(s) per core: 1 Core(s) per socket: 1 Socket(s): 16 NUMA node(s): 2 Model: 2.0 (pvr 004d 0200) Model name: POWER8 (raw), altivec supported Hypervisor vendor: KVM Virtualization type: para L1d cache: 64K L1i cache: 32K NUMA node0 CPU(s): NUMA node1 CPU(s): 0-15 [root@localhost ~]# ./test.sh MemAvailable: 86848 kB MemAvailable: 31168 kB MemAvailable: 56704 kB MemAvailable: 51328 kB MemAvailable: 44608 kB MemAvailable: 40576 kB MemAvailable: 37632 kB MemAvailable: 27328 kB MemAvailable: 29440 kB MemAvailable: 30720 kB MemAvailable: 32064 kB ------- Comment From diegodo.com 2018-02-28 06:56 EDT------- Hi Li Wang, I need help to reproduce this problem. I wasn't able to reproduce this problem how you can see in Comment #24. Can anyone help me to reproduce this? Thank you Guys, any news when it will be backported to 7.7? Hi Jean-Philippe, (In reply to Jean-Philippe Menil from comment #124) > Guys, > any news when it will be backported to 7.7? It was backported to RHEL-7.7 in BZ 1752421. 
Upgrade to kernel-3.10.0-1062.4.1.el7 from RHSA-2019:3055 https://access.redhat.com/errata/RHSA-2019:3055 Yeah i guess it was fixed: https://github.com/docker/for-linux/issues/841 cat /proc/cmdline BOOT_IMAGE=/vmlinuz-3.10.0-1062.7.1.el7.x86_64 root=/dev/mapper/rootvg-lvroot ro crashkernel=auto rd.lvm.lv=rootvg/lvroot rd.lvm.lv=rootvg/lvswap rhgb quiet LANG=en_US.UTF-8 namespace.unpriv_enable=1 user_namespace.enable=1 uname -a Linux hsv5234s 3.10.0-1062.7.1.el7.x86_64 #1 SMP Wed Nov 13 08:44:42 EST 2019 x86_64 x86_64 x86_64 GNU/Linux Dec 18 08:20:41 hsv5234s kernel: [1080660.051671] runc:[1:CHILD]: page allocation failure: order:7, mode:0xc0d0 Dec 18 08:20:41 hsv5234s kernel: [1080660.051679] CPU: 17 PID: 16064 Comm: runc:[1:CHILD] Kdump: loaded Tainted: G ------------ T 3.10.0-1062.7.1.el7.x86_64 #1 Dec 18 08:20:41 hsv5234s kernel: [1080660.051682] Hardware name: LENOVO System x3650 M5 -[8871AC1]-/01KN179, BIOS -[TCE140D-2.90]- 02/13/2019 Dec 18 08:20:41 hsv5234s kernel: [1080660.051684] Call Trace: Dec 18 08:20:41 hsv5234s kernel: [1080660.051696] [<ffffffffbcb7ac23>] dump_stack+0x19/0x1b Dec 18 08:20:41 hsv5234s kernel: [1080660.051704] [<ffffffffbc5c3e50>] warn_alloc_failed+0x110/0x180 Dec 18 08:20:41 hsv5234s kernel: [1080660.051708] [<ffffffffbc5c8a5f>] __alloc_pages_nodemask+0x9df/0xbe0 Dec 18 08:20:41 hsv5234s kernel: [1080660.051714] [<ffffffffbc616c08>] alloc_pages_current+0x98/0x110 Dec 18 08:20:41 hsv5234s kernel: [1080660.051719] [<ffffffffbc5e3c08>] kmalloc_order+0x18/0x40 Dec 18 08:20:41 hsv5234s kernel: [1080660.051723] [<ffffffffbc622136>] kmalloc_order_trace+0x26/0xa0 Dec 18 08:20:41 hsv5234s kernel: [1080660.051727] [<ffffffffbc6266f1>] __kmalloc+0x211/0x230 Dec 18 08:20:41 hsv5234s kernel: [1080660.051732] [<ffffffffbc63ee41>] memcg_alloc_cache_params+0x81/0xb0 Dec 18 08:20:41 hsv5234s kernel: [1080660.051735] [<ffffffffbc5e38b4>] do_kmem_cache_create+0x74/0xf0 Dec 18 08:20:41 hsv5234s kernel: [1080660.051738] [<ffffffffbc5e3a32>] kmem_cache_create+0x102/0x1b0 Dec 18 08:20:41 hsv5234s kernel: [1080660.051753] [<ffffffffc06b4dc1>] nf_conntrack_init_net+0xf1/0x260 [nf_conntrack] Dec 18 08:20:41 hsv5234s kernel: [1080660.051759] [<ffffffffc06b56c4>] nf_conntrack_pernet_init+0x14/0x150 [nf_conntrack] Dec 18 08:20:41 hsv5234s kernel: [1080660.051764] [<ffffffffbca44054>] ops_init+0x44/0x150 Dec 18 08:20:41 hsv5234s kernel: [1080660.051767] [<ffffffffbca44203>] setup_net+0xa3/0x160 Dec 18 08:20:41 hsv5234s kernel: [1080660.051770] [<ffffffffbca449a5>] copy_net_ns+0xb5/0x180 Dec 18 08:20:41 hsv5234s kernel: [1080660.051774] [<ffffffffbc4cb469>] create_new_namespaces+0xf9/0x180 Dec 18 08:20:41 hsv5234s kernel: [1080660.051777] [<ffffffffbc4cb6aa>] unshare_nsproxy_namespaces+0x5a/0xc0 Dec 18 08:20:41 hsv5234s kernel: [1080660.051781] [<ffffffffbc49ae8b>] SyS_unshare+0x1cb/0x340 Dec 18 08:20:41 hsv5234s kernel: [1080660.051785] [<ffffffffbcb8dede>] system_call_fastpath+0x25/0x2a Dec 18 08:20:41 hsv5234s kernel: [1080660.051787] Mem-Info: Dec 18 08:20:41 hsv5234s kernel: [1080660.051800] active_anon:603062 inactive_anon:658360 isolated_anon:0 Dec 18 08:20:41 hsv5234s kernel: [1080660.051800] active_file:2900635 inactive_file:2830989 isolated_file:4 Dec 18 08:20:41 hsv5234s kernel: [1080660.051800] unevictable:0 dirty:1534 writeback:0 unstable:0 Dec 18 08:20:41 hsv5234s kernel: [1080660.051800] slab_reclaimable:341029 slab_unreclaimable:390332 Dec 18 08:20:41 hsv5234s kernel: [1080660.051800] mapped:56516 shmem:134670 pagetables:12006 bounce:0 Dec 18 08:20:41 hsv5234s kernel: 
[1080660.051800] free:46786 free_pcp:2175 free_cma:0 Dec 18 08:20:41 hsv5234s kernel: [1080660.051807] Node 0 DMA free:15880kB min:44kB low:52kB high:64kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15980kB managed:15896kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:16kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes Dec 18 08:20:41 hsv5234s kernel: [1080660.051815] lowmem_reserve[]: 0 971 15057 15057 Dec 18 08:20:41 hsv5234s kernel: [1080660.051820] Node 0 DMA32 free:60060kB min:2808kB low:3508kB high:4212kB active_anon:94876kB inactive_anon:149864kB active_file:233272kB inactive_file:218016kB unevictable:0kB isolated(anon):0kB isolated(file):16kB present:1226660kB managed:995204kB mlocked:0kB dirty:112kB writeback:0kB mapped:14792kB shmem:16016kB slab_reclaimable:112532kB slab_unreclaimable:89812kB kernel_stack:2080kB pagetables:2928kB unstable:0kB bounce:0kB free_pcp:1384kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:23 all_unreclaimable? no Dec 18 08:20:41 hsv5234s kernel: [1080660.051828] lowmem_reserve[]: 0 0 14085 14085 Dec 18 08:20:41 hsv5234s kernel: [1080660.051831] Node 0 Normal free:51656kB min:40708kB low:50884kB high:61060kB active_anon:1375020kB inactive_anon:1401180kB active_file:4987756kB inactive_file:4918788kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:14680064kB managed:14423212kB mlocked:0kB dirty:3076kB writeback:0kB mapped:94496kB shmem:141648kB slab_reclaimable:612400kB slab_unreclaimable:685176kB kernel_stack:10704kB pagetables:32204kB unstable:0kB bounce:0kB free_pcp:4540kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no Dec 18 08:20:41 hsv5234s kernel: [1080660.051839] lowmem_reserve[]: 0 0 0 0 Dec 18 08:20:41 hsv5234s kernel: [1080660.051843] Node 1 Normal free:59548kB min:46548kB low:58184kB high:69820kB active_anon:942352kB inactive_anon:1082396kB active_file:6381128kB inactive_file:6187152kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:16777216kB managed:16495992kB mlocked:0kB dirty:2948kB writeback:0kB mapped:116776kB shmem:381016kB slab_reclaimable:639184kB slab_unreclaimable:786324kB kernel_stack:9344kB pagetables:12892kB unstable:0kB bounce:0kB free_pcp:2744kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:64 all_unreclaimable? 
no Dec 18 08:20:41 hsv5234s kernel: [1080660.051851] lowmem_reserve[]: 0 0 0 0 Dec 18 08:20:41 hsv5234s kernel: [1080660.051854] Node 0 DMA: 0*4kB 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15880kB Dec 18 08:20:41 hsv5234s kernel: [1080660.051866] Node 0 DMA32: 667*4kB (UEM) 594*8kB (UEM) 263*16kB (UEM) 222*32kB (UEM) 258*64kB (UEM) 132*128kB (UEM) 31*256kB (UEM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 60076kB Dec 18 08:20:41 hsv5234s kernel: [1080660.051878] Node 0 Normal: 8586*4kB (UEM) 2023*8kB (UM) 6*16kB (M) 6*32kB (M) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 50816kB Dec 18 08:20:41 hsv5234s kernel: [1080660.051888] Node 1 Normal: 3685*4kB (UEM) 5529*8kB (UEM) 24*16kB (UM) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 59356kB Dec 18 08:20:41 hsv5234s kernel: [1080660.051899] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB Dec 18 08:20:41 hsv5234s kernel: [1080660.051901] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB Dec 18 08:20:41 hsv5234s kernel: [1080660.051903] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB Dec 18 08:20:41 hsv5234s kernel: [1080660.051905] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB Dec 18 08:20:41 hsv5234s kernel: [1080660.051906] 5869920 total pagecache pages Dec 18 08:20:41 hsv5234s kernel: [1080660.051910] 3487 pages in swap cache Dec 18 08:20:41 hsv5234s kernel: [1080660.051913] Swap cache stats: add 38962826, delete 38956860, find 14019018/18441754 Dec 18 08:20:41 hsv5234s kernel: [1080660.051914] Free swap = 8253408kB Dec 18 08:20:41 hsv5234s kernel: [1080660.051915] Total swap = 8388604kB Dec 18 08:20:41 hsv5234s kernel: [1080660.051917] 8174980 pages RAM Dec 18 08:20:41 hsv5234s kernel: [1080660.051918] 0 pages HighMem/MovableOnly Dec 18 08:20:41 hsv5234s kernel: [1080660.051920] 192404 pages reserved Dec 18 08:20:41 hsv5234s kernel: [1080660.051923] kmem_cache_create(nf_conntrack_ffff98b468a20000) failed with error -12 Dec 18 08:20:41 hsv5234s kernel: [1080660.051926] CPU: 17 PID: 16064 Comm: runc:[1:CHILD] Kdump: loaded Tainted: G ------------ T 3.10.0-1062.7.1.el7.x86_64 #1 Dec 18 08:20:41 hsv5234s kernel: [1080660.051928] Hardware name: LENOVO System x3650 M5 -[8871AC1]-/01KN179, BIOS -[TCE140D-2.90]- 02/13/2019 Dec 18 08:20:41 hsv5234s kernel: [1080660.051930] Call Trace: Dec 18 08:20:41 hsv5234s kernel: [1080660.051933] [<ffffffffbcb7ac23>] dump_stack+0x19/0x1b Dec 18 08:20:41 hsv5234s kernel: [1080660.051936] [<ffffffffbc5e3ab7>] kmem_cache_create+0x187/0x1b0 Dec 18 08:20:41 hsv5234s kernel: [1080660.051942] [<ffffffffc06b4dc1>] nf_conntrack_init_net+0xf1/0x260 [nf_conntrack] Dec 18 08:20:41 hsv5234s kernel: [1080660.051947] [<ffffffffc06b56c4>] nf_conntrack_pernet_init+0x14/0x150 [nf_conntrack] Dec 18 08:20:41 hsv5234s kernel: [1080660.051950] [<ffffffffbca44054>] ops_init+0x44/0x150 Dec 18 08:20:41 hsv5234s kernel: [1080660.051953] [<ffffffffbca44203>] setup_net+0xa3/0x160 Dec 18 08:20:41 hsv5234s kernel: [1080660.051956] [<ffffffffbca449a5>] copy_net_ns+0xb5/0x180 Dec 18 08:20:41 hsv5234s kernel: [1080660.051959] [<ffffffffbc4cb469>] create_new_namespaces+0xf9/0x180 Dec 18 08:20:41 hsv5234s kernel: [1080660.051961] [<ffffffffbc4cb6aa>] unshare_nsproxy_namespaces+0x5a/0xc0 Dec 18 08:20:41 hsv5234s kernel: [1080660.051965] [<ffffffffbc49ae8b>] SyS_unshare+0x1cb/0x340 Dec 18 08:20:41 hsv5234s kernel: 
[1080660.051967] [<ffffffffbcb8dede>] system_call_fastpath+0x25/0x2a Dec 18 08:20:41 hsv5234s kernel: [1080660.051970] Unable to create nf_conn slab cache Dec 18 08:20:41 hsv5234s kernel: [1080660.119124] runc:[1:CHILD]: page allocation failure: order:7, mode:0xc0d0 Dec 18 08:20:41 hsv5234s kernel: [1080660.119135] CPU: 4 PID: 16094 Comm: runc:[1:CHILD] Kdump: loaded Tainted: G ------------ T 3.10.0-1062.7.1.el7.x86_64 #1 Dec 18 08:20:41 hsv5234s kernel: [1080660.119139] Hardware name: LENOVO System x3650 M5 -[8871AC1]-/01KN179, BIOS -[TCE140D-2.90]- 02/13/2019 Dec 18 08:20:41 hsv5234s kernel: [1080660.119142] Call Trace: Dec 18 08:20:41 hsv5234s kernel: [1080660.119156] [<ffffffffbcb7ac23>] dump_stack+0x19/0x1b Dec 18 08:20:41 hsv5234s kernel: [1080660.119167] [<ffffffffbc5c3e50>] warn_alloc_failed+0x110/0x180 Dec 18 08:20:41 hsv5234s kernel: [1080660.119174] [<ffffffffbc5c8a5f>] __alloc_pages_nodemask+0x9df/0xbe0 Dec 18 08:20:41 hsv5234s kernel: [1080660.119182] [<ffffffffbc616c08>] alloc_pages_current+0x98/0x110 Dec 18 08:20:41 hsv5234s kernel: [1080660.119190] [<ffffffffbc5e3c08>] kmalloc_order+0x18/0x40 Dec 18 08:20:41 hsv5234s kernel: [1080660.119196] [<ffffffffbc622136>] kmalloc_order_trace+0x26/0xa0 Dec 18 08:20:41 hsv5234s kernel: [1080660.119202] [<ffffffffbc6266f1>] __kmalloc+0x211/0x230 Dec 18 08:20:41 hsv5234s kernel: [1080660.119209] [<ffffffffbc63ee41>] memcg_alloc_cache_params+0x81/0xb0 Dec 18 08:20:41 hsv5234s kernel: [1080660.119215] [<ffffffffbc5e38b4>] do_kmem_cache_create+0x74/0xf0 Dec 18 08:20:41 hsv5234s kernel: [1080660.119221] [<ffffffffbc5e3a32>] kmem_cache_create+0x102/0x1b0 Dec 18 08:20:41 hsv5234s kernel: [1080660.119238] [<ffffffffc06b4dc1>] nf_conntrack_init_net+0xf1/0x260 [nf_conntrack] Dec 18 08:20:41 hsv5234s kernel: [1080660.119249] [<ffffffffc06b56c4>] nf_conntrack_pernet_init+0x14/0x150 [nf_conntrack] Dec 18 08:20:41 hsv5234s kernel: [1080660.119256] [<ffffffffbca44054>] ops_init+0x44/0x150 Dec 18 08:20:41 hsv5234s kernel: [1080660.119260] [<ffffffffbca44203>] setup_net+0xa3/0x160 Dec 18 08:20:41 hsv5234s kernel: [1080660.119265] [<ffffffffbca449a5>] copy_net_ns+0xb5/0x180 Dec 18 08:20:41 hsv5234s kernel: [1080660.119271] [<ffffffffbc4cb469>] create_new_namespaces+0xf9/0x180 Dec 18 08:20:41 hsv5234s kernel: [1080660.119276] [<ffffffffbc4cb6aa>] unshare_nsproxy_namespaces+0x5a/0xc0 Dec 18 08:20:41 hsv5234s kernel: [1080660.119286] [<ffffffffbc49ae8b>] SyS_unshare+0x1cb/0x340 Dec 18 08:20:41 hsv5234s kernel: [1080660.119293] [<ffffffffbcb8dede>] system_call_fastpath+0x25/0x2a Dec 18 08:20:41 hsv5234s kernel: [1080660.119296] Mem-Info: Dec 18 08:20:41 hsv5234s kernel: [1080660.119309] active_anon:606064 inactive_anon:659529 isolated_anon:0 Dec 18 08:20:41 hsv5234s kernel: [1080660.119309] active_file:2897344 inactive_file:2822214 isolated_file:4 Dec 18 08:20:41 hsv5234s kernel: [1080660.119309] unevictable:0 dirty:2854 writeback:0 unstable:0 Dec 18 08:20:41 hsv5234s kernel: [1080660.119309] slab_reclaimable:340962 slab_unreclaimable:390372 Dec 18 08:20:41 hsv5234s kernel: [1080660.119309] mapped:56516 shmem:134670 pagetables:12006 bounce:0 Dec 18 08:20:41 hsv5234s kernel: [1080660.119309] free:55605 free_pcp:512 free_cma:0 Dec 18 08:20:41 hsv5234s kernel: [1080660.119318] Node 0 DMA free:15880kB min:44kB low:52kB high:64kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15980kB managed:15896kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
slab_unreclaimable:16kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes Dec 18 08:20:41 hsv5234s kernel: [1080660.119331] lowmem_reserve[]: 0 971 15057 15057 Dec 18 08:20:41 hsv5234s kernel: [1080660.119338] Node 0 DMA32 free:62628kB min:2808kB low:3508kB high:4212kB active_anon:94796kB inactive_anon:149968kB active_file:233240kB inactive_file:216416kB unevictable:0kB isolated(anon):0kB isolated(file):16kB present:1226660kB managed:995204kB mlocked:0kB dirty:112kB writeback:0kB mapped:14792kB shmem:16016kB slab_reclaimable:112628kB slab_unreclaimable:89972kB kernel_stack:2080kB pagetables:2928kB unstable:0kB bounce:0kB free_pcp:316kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:66 all_unreclaimable? no Dec 18 08:20:41 hsv5234s kernel: [1080660.119351] lowmem_reserve[]: 0 0 14085 14085 Dec 18 08:20:41 hsv5234s kernel: [1080660.119357] Node 0 Normal free:61096kB min:40708kB low:50884kB high:61060kB active_anon:1381536kB inactive_anon:1404520kB active_file:4987048kB inactive_file:4901144kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:14680064kB managed:14423212kB mlocked:0kB dirty:3780kB writeback:0kB mapped:94496kB shmem:141648kB slab_reclaimable:612400kB slab_unreclaimable:685176kB kernel_stack:10704kB pagetables:32204kB unstable:0kB bounce:0kB free_pcp:1144kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:76 all_unreclaimable? no Dec 18 08:20:41 hsv5234s kernel: [1080660.119369] lowmem_reserve[]: 0 0 0 0 Dec 18 08:20:41 hsv5234s kernel: [1080660.119375] Node 1 Normal free:82816kB min:46548kB low:58184kB high:69820kB active_anon:947596kB inactive_anon:1084024kB active_file:6369088kB inactive_file:6170932kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:16777216kB managed:16495992kB mlocked:0kB dirty:7524kB writeback:0kB mapped:116776kB shmem:381016kB slab_reclaimable:638820kB slab_unreclaimable:786324kB kernel_stack:9344kB pagetables:12892kB unstable:0kB bounce:0kB free_pcp:584kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? 
no Dec 18 08:20:41 hsv5234s kernel: [1080660.119386] lowmem_reserve[]: 0 0 0 0 Dec 18 08:20:41 hsv5234s kernel: [1080660.119391] Node 0 DMA: 0*4kB 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15880kB Dec 18 08:20:41 hsv5234s kernel: [1080660.119412] Node 0 DMA32: 1166*4kB (UEM) 677*8kB (UEM) 266*16kB (UEM) 225*32kB (UEM) 258*64kB (UEM) 131*128kB (UEM) 31*256kB (UEM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 62752kB Dec 18 08:20:41 hsv5234s kernel: [1080660.119432] Node 0 Normal: 9639*4kB (UEM) 2269*8kB (UEM) 251*16kB (UEM) 83*32kB (UEM) 3*64kB (M) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 63572kB Dec 18 08:20:41 hsv5234s kernel: [1080660.119450] Node 1 Normal: 5940*4kB (UEM) 7125*8kB (UEM) 221*16kB (UEM) 30*32kB (UEM) 1*64kB (M) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 85320kB Dec 18 08:20:41 hsv5234s kernel: [1080660.119470] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB Dec 18 08:20:41 hsv5234s kernel: [1080660.119474] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB Dec 18 08:20:41 hsv5234s kernel: [1080660.119477] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB Dec 18 08:20:41 hsv5234s kernel: [1080660.119481] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB Dec 18 08:20:41 hsv5234s kernel: [1080660.119483] 5858268 total pagecache pages Dec 18 08:20:41 hsv5234s kernel: [1080660.119488] 3481 pages in swap cache Dec 18 08:20:41 hsv5234s kernel: [1080660.119492] Swap cache stats: add 38962827, delete 38956867, find 14019018/18441754 Dec 18 08:20:41 hsv5234s kernel: [1080660.119494] Free swap = 8253408kB Dec 18 08:20:41 hsv5234s kernel: [1080660.119496] Total swap = 8388604kB Dec 18 08:20:41 hsv5234s kernel: [1080660.119499] 8174980 pages RAM Dec 18 08:20:41 hsv5234s kernel: [1080660.119516] 0 pages HighMem/MovableOnly Dec 18 08:20:41 hsv5234s kernel: [1080660.119518] 192404 pages reserved Dec 18 08:20:41 hsv5234s kernel: [1080660.119523] kmem_cache_create(nf_conntrack_ffff98b287b81480) failed with error -12 Dec 18 08:20:41 hsv5234s kernel: [1080660.119535] CPU: 4 PID: 16094 Comm: runc:[1:CHILD] Kdump: loaded Tainted: G ------------ T 3.10.0-1062.7.1.el7.x86_64 #1 Dec 18 08:20:41 hsv5234s kernel: [1080660.119552] Hardware name: LENOVO System x3650 M5 -[8871AC1]-/01KN179, BIOS -[TCE140D-2.90]- 02/13/2019 Dec 18 08:20:41 hsv5234s kernel: [1080660.119567] Call Trace: Dec 18 08:20:41 hsv5234s kernel: [1080660.119590] [<ffffffffbcb7ac23>] dump_stack+0x19/0x1b Dec 18 08:20:41 hsv5234s kernel: [1080660.119611] [<ffffffffbc5e3ab7>] kmem_cache_create+0x187/0x1b0 Dec 18 08:20:41 hsv5234s kernel: [1080660.119630] [<ffffffffc06b4dc1>] nf_conntrack_init_net+0xf1/0x260 [nf_conntrack] Dec 18 08:20:41 hsv5234s kernel: [1080660.119663] [<ffffffffc06b56c4>] nf_conntrack_pernet_init+0x14/0x150 [nf_conntrack] Dec 18 08:20:41 hsv5234s kernel: [1080660.119680] [<ffffffffbca44054>] ops_init+0x44/0x150 Dec 18 08:20:41 hsv5234s kernel: [1080660.119706] [<ffffffffbca44203>] setup_net+0xa3/0x160 Dec 18 08:20:41 hsv5234s kernel: [1080660.119721] [<ffffffffbca449a5>] copy_net_ns+0xb5/0x180 Dec 18 08:20:41 hsv5234s kernel: [1080660.119735] [<ffffffffbc4cb469>] create_new_namespaces+0xf9/0x180 Dec 18 08:20:41 hsv5234s kernel: [1080660.119752] [<ffffffffbc4cb6aa>] unshare_nsproxy_namespaces+0x5a/0xc0 Dec 18 08:20:41 hsv5234s kernel: [1080660.119772] [<ffffffffbc49ae8b>] SyS_unshare+0x1cb/0x340 Dec 18 
Dec 18 08:20:41 hsv5234s kernel: [1080660.119790] [<ffffffffbcb8dede>] system_call_fastpath+0x25/0x2a
Dec 18 08:20:41 hsv5234s kernel: [1080660.119807] Unable to create nf_conn slab cache

Do you want me to open a new bug for the same issue?

Hello Jean-Philippe Menil, are you using ip_vs in your environment? There is a known issue with ip_vs + user ns + net ns, BZ#1540585. Thanks, Chao

nope:

lsmod | grep ip_vs

returns nothing :)

By the way, BZ#1540585 is restricted ...

(In reply to Jean-Philippe Menil from comment #128)
> nope:
>
> lsmod | grep ip_vs
>
> return nothing :)
>
> By the way, BZ#1540585 is restricted ...

Thank you, Jean-Philippe Menil. Could you try booting with 'cgroup.memory=nokmem' and see if this issue still exists? Also, can you try to run the container without nf_conntrack loaded? I'll check with our network QE and see if we can reproduce this in house. Many thanks! Chao

Chao, I will try the cgroup.memory=nokmem parameter; it could indeed be a good candidate. Just to let you know, the issue is intermittent and it took us around a week to trigger it. There is absolutely no way that we can run anything without nf_conntrack ...

Per the log snippets pasted in comment #126, this doesn't look like the same issue that was dealt with in this ticket, but a severe memory fragmentation case instead. There is no panic, just page allocation failure warnings for high-order blocks:

...
Dec 18 08:20:41 hsv5234s kernel: [1080660.051671] runc:[1:CHILD]: page allocation failure: order:7, mode:0xc0d0
...

The reason is that there are no order-7 (128kB) blocks available in the zones the request could grab memory from (per the mode flags), and no blocks larger than order-7 are available for buddy splits, as we can observe in the provided log splats:

...
Dec 18 08:20:41 hsv5234s kernel: [1080660.119432] Node 0 Normal: 9639*4kB (UEM) 2269*8kB (UEM) 251*16kB (UEM) 83*32kB (UEM) 3*64kB (M) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 63572kB
Dec 18 08:20:41 hsv5234s kernel: [1080660.119450] Node 1 Normal: 5940*4kB (UEM) 7125*8kB (UEM) 221*16kB (UEM) 30*32kB (UEM) 1*64kB (M) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 85320kB
...

Also, the meminfo data doesn't suggest bloated slab usage, but bloated page-cache usage instead:

...
Dec 18 08:20:41 hsv5234s kernel: [1080660.119296] Mem-Info:
Dec 18 08:20:41 hsv5234s kernel: [1080660.119309] active_anon:606064 inactive_anon:659529 isolated_anon:0
Dec 18 08:20:41 hsv5234s kernel: [1080660.119309] active_file:2897344 inactive_file:2822214 isolated_file:4
Dec 18 08:20:41 hsv5234s kernel: [1080660.119309] unevictable:0 dirty:2854 writeback:0 unstable:0
Dec 18 08:20:41 hsv5234s kernel: [1080660.119309] slab_reclaimable:340962 slab_unreclaimable:390372
Dec 18 08:20:41 hsv5234s kernel: [1080660.119309] mapped:56516 shmem:134670 pagetables:12006 bounce:0
Dec 18 08:20:41 hsv5234s kernel: [1080660.119309] free:55605 free_pcp:512 free_cma:0
...

-- Rafael

So, what you suggest is just to increase min_free_kbytes?

*** Bug 1489799 has been marked as a duplicate of this bug. ***

Chao, the cgroup.memory=nokmem parameter seems to do the trick. Thanks again and best regards, Jean-Philippe

Hi Chao, I have a customer [case: 02605103] facing the same problem.
~~~~~
Mar 10 10:44:45 prd-k8s-worker101 kubelet: E0310 10:44:45.452884 9223 pod_workers.go:190] Error syncing pod 101ce83b-60eb-4212-9ab2-08bf2d933054 ("kafka-connector-healer-1583854800-2gb8s_tenet(101ce83b-60eb-4212-9ab2-08bf2d933054)"), skipping: failed to ensure that the pod: 101ce83b-60eb-4212-9ab2-08bf2d933054 cgroups exist and are correctly applied: failed to create container for [kubepods burstable pod101ce83b-60eb-4212-9ab2-08bf2d933054] : mkdir /sys/fs/cgroup/memory/kubepods/burstable/pod101ce83b-60eb-4212-9ab2-08bf2d933054: cannot allocate memory
~~~~~

Messages like the snippet above are flooding the logs. He is already running 3.10.0-1062.12.1.el7.x86_64. I grabbed an strace while the problem was present.

[root@prd-k8s-worker101 psjoberg]# strace -f -o /root/strace_cgroup2.out cgcreate -g memory:kubepods/burstable/IODINE
cgcreate: can't create cgroup kubepods/burstable/IODINE: Cgroup, operation not allowed

[root@prd-k8s-worker101 psjoberg]# cat /root/strace_cgroup2.out
132066 execve("/bin/cgcreate", ["cgcreate", "-g", "memory:kubepods/burstable/IODINE"], [/* 22 vars */]) = 0
132066 brk(NULL) = 0x555645f29000
132066 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f51d43b8000
132066 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
132066 open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
132066 fstat(3, {st_mode=S_IFREG|0644, st_size=26713, ...}) = 0
132066 mmap(NULL, 26713, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f51d43b1000
132066 close(3) = 0
132066 open("/lib64/libcgroup.so.1", O_RDONLY|O_CLOEXEC) = 3
132066 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\240D\0\0\0\0\0\0"..., 832) = 832
132066 fstat(3, {st_mode=S_IFREG|0755, st_size=104224, ...}) = 0
132066 mmap(NULL, 4665984, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f51d3d24000
132066 mprotect(0x7f51d3d3b000, 2097152, PROT_NONE) = 0
132066 mmap(0x7f51d3f3b000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x17000) = 0x7f51d3f3b000
132066 mmap(0x7f51d3f3d000, 2466432, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f51d3f3d000
132066 close(3) = 0
132066 open("/lib64/libpthread.so.0", O_RDONLY|O_CLOEXEC) = 3
132066 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\200m\0\0\0\0\0\0"..., 832) = 832
132066 fstat(3, {st_mode=S_IFREG|0755, st_size=142144, ...}) = 0
132066 mmap(NULL, 2208904, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f51d3b08000
132066 mprotect(0x7f51d3b1f000, 2093056, PROT_NONE) = 0
132066 mmap(0x7f51d3d1e000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x16000) = 0x7f51d3d1e000
132066 mmap(0x7f51d3d20000, 13448, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f51d3d20000
132066 close(3) = 0
132066 open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
132066 read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0P&\2\0\0\0\0\0"..., 832) = 832
132066 fstat(3, {st_mode=S_IFREG|0755, st_size=2156072, ...}) = 0
132066 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f51d43b0000
132066 mmap(NULL, 3985888, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f51d373a000
132066 mprotect(0x7f51d38fd000, 2097152, PROT_NONE) = 0
132066 mmap(0x7f51d3afd000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1c3000) = 0x7f51d3afd000
132066 mmap(0x7f51d3b03000, 16864, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f51d3b03000
132066 close(3) = 0
132066 mmap(NULL, 4096,
PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f51d43af000 132066 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f51d43ad000 132066 arch_prctl(ARCH_SET_FS, 0x7f51d43adb80) = 0 132066 mprotect(0x7f51d3afd000, 16384, PROT_READ) = 0 132066 mprotect(0x7f51d3d1e000, 4096, PROT_READ) = 0 132066 mprotect(0x7f51d3f3b000, 4096, PROT_READ) = 0 132066 mprotect(0x555645ac1000, 4096, PROT_READ) = 0 132066 mprotect(0x7f51d43b9000, 4096, PROT_READ) = 0 132066 munmap(0x7f51d43b1000, 26713) = 0 132066 set_tid_address(0x7f51d43ade50) = 132066 132066 set_robust_list(0x7f51d43ade60, 24) = 0 132066 rt_sigaction(SIGRTMIN, {0x7f51d3b0e860, [], SA_RESTORER|SA_SIGINFO, 0x7f51d3b17630}, NULL, 8) = 0 132066 rt_sigaction(SIGRT_1, {0x7f51d3b0e8f0, [], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0x7f51d3b17630}, NULL, 8) = 0 132066 rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0 132066 getrlimit(RLIMIT_STACK, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0 132066 brk(NULL) = 0x555645f29000 132066 brk(0x555645f4a000) = 0x555645f4a000 132066 brk(NULL) = 0x555645f4a000 132066 open("/proc/cgroups", O_RDONLY|O_CLOEXEC) = 3 132066 fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0 132066 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f51d43b7000 132066 read(3, "#subsys_name\thierarchy\tnum_cgrou"..., 1024) = 230 132066 read(3, "", 1024) = 0 132066 open("/proc/mounts", O_RDONLY|O_CLOEXEC) = 4 132066 fstat(4, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0 132066 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f51d43b6000 132066 read(4, "rootfs / rootfs rw 0 0\nsysfs /sy"..., 1024) = 1024 132066 read(4, "l,nosuid,nodev,noexec,relatime,p"..., 1024) = 1024 132066 read(4, "ebugfs rw,relatime 0 0\nhugetlbfs"..., 1024) = 1024 132066 read(4, "proc rw,nosuid,nodev,noexec,rela"..., 1024) = 1024 132066 read(4, "25fee5d1d08edf87aefb18dd7a5c144f"..., 1024) = 1024 132066 read(4, "abel,relatime 0 0\ntmpfs /var/lib"..., 1024) = 1024 132066 read(4, "7f3cd7f4557a51bff97846504d968e27"..., 1024) = 1024 132066 read(4, ".io~secret/ssot-processor-token-"..., 1024) = 1024 132066 read(4, "ir=/var/lib/docker/overlay/4022a"..., 1024) = 1024 132066 read(4, "bef704f791045ca20906025d914/moun"..., 1024) = 1024 132066 read(4, "mpfs rw,seclabel,relatime 0 0\ntm"..., 1024) = 1024 132066 read(4, "e78bce0b/merged overlay rw,secla"..., 1024) = 1024 132066 read(4, "roc rw,nosuid,nodev,noexec,relat"..., 1024) = 1024 132066 read(4, "cker/containers/3594e77e3c3b353e"..., 1024) = 1024 132066 read(4, ".io~secret/ssot-processor-token-"..., 1024) = 1024 132066 read(4, "latime,lowerdir=/var/lib/docker/"..., 1024) = 1024 132066 read(4, "147daf/volumes/kubernetes.io~sec"..., 1024) = 1024 132066 read(4, "1d84afbaef3dd73e500fd212cfefb4e5"..., 1024) = 1024 132066 read(4, "bel,nosuid,nodev,noexec,relatime"..., 1024) = 1024 132066 read(4, "kubelet/pods/5229f250-0bb5-4615-"..., 1024) = 1024 132066 read(4, "rkdir=/var/lib/docker/overlay/ec"..., 1024) = 1024 132066 read(4, "owerdir=/var/lib/docker/overlay/"..., 1024) = 1024 132066 read(4, "mounts/shm tmpfs rw,seclabel,nos"..., 1024) = 1024 132066 read(4, "per,workdir=/var/lib/docker/over"..., 1024) = 1024 132066 read(4, "nodev,noexec,relatime 0 0\ntmpfs "..., 1024) = 1024 132066 read(4, "var/lib/docker/overlay/ac0ad3d37"..., 1024) = 1024 132066 read(4, "-4bc6-9c5b-1a0c2ef092dd/volumes/"..., 1024) = 1024 132066 read(4, "55e99237c7d32a8e85b4f2f47c5b2d4d"..., 1024) = 1024 132066 read(4, " /var/lib/docker/containers/3e46"..., 
1024) = 1024 132066 read(4, "ar/lib/docker/overlay/708310bef3"..., 1024) = 1024 132066 read(4, "/var/lib/docker/overlay/3a7bc822"..., 1024) = 1024 132066 read(4, "10abe71d4cfa4506fffa59471a5a54e4"..., 1024) = 1024 132066 read(4, "100183f1c4cda2efc169622d625662/w"..., 1024) = 1024 132066 read(4, "b76b/merged overlay rw,seclabel,"..., 1024) = 1024 132066 read(4, "e13b53/root,upperdir=/var/lib/do"..., 1024) = 1024 132066 read(4, "/docker/overlay/3004b765e83e7c5b"..., 1024) = 1024 132066 read(4, "73da8084d3646be98da1ebf7bf32e039"..., 1024) = 1024 132066 read(4, "7071f5387761c2d09186dcb877d50ac8"..., 1024) = 1024 132066 read(4, "dbdc16d7dd5add9575579534180/uppe"..., 1024) = 1024 132066 read(4, "ork 0 0\noverlay /var/lib/docker/"..., 1024) = 1024 132066 read(4, "relatime,lowerdir=/var/lib/docke"..., 1024) = 1024 132066 read(4, "cker/overlay/3f7aee3450f473d4c94"..., 1024) = 1024 132066 read(4, "3d14643f5fb12fe46d1243ab246298dc"..., 1024) = 1024 132066 read(4, "13e047b643fc0f48a46dbb/merged ov"..., 1024) = 1024 132066 read(4, "eb6bf8d8b956019dbf8587ab/root,up"..., 1024) = 1024 132066 read(4, "r,workdir=/var/lib/docker/overla"..., 1024) = 1024 132066 read(4, "overlay/b5c2a757796e0a00e5a6b2b1"..., 1024) = 1024 132066 read(4, "r/overlay/3a7bc82268e1e432147f4a"..., 1024) = 1024 132066 read(4, "149df10503bf62d621f098e7a765e36a"..., 1024) = 1024 132066 read(4, "2f029e2616d6d67d/work 0 0\noverla"..., 1024) = 1024 132066 read(4, "erlay rw,seclabel,relatime,lower"..., 1024) = 1024 132066 read(4, "perdir=/var/lib/docker/overlay/2"..., 1024) = 1024 132066 read(4, "y/5501fd0719bf03746d78240bedd47a"..., 1024) = 1024 132066 read(4, "81a143658ac3da9afed334682ccf18a6"..., 1024) = 1024 132066 read(4, "144c26b4d72d1a0023b6def921366f54"..., 1024) = 1024 132066 read(4, "ea91e64f67495/upper,workdir=/var"..., 1024) = 1024 132066 read(4, "y /var/lib/docker/overlay/3ab88f"..., 1024) = 1024 132066 read(4, "l,relatime 0 0\ntmpfs /var/lib/ku"..., 1024) = 1024 132066 read(4, "overlay/d02e11e096cca9176e98b00d"..., 1024) = 1024 132066 read(4, "b/docker/overlay/86ea03d933cbb06"..., 1024) = 1024 132066 read(4, "f91d54c proc rw,nosuid,nodev,noe"..., 1024) = 1024 132066 read(4, "rdir=/var/lib/docker/overlay/1c5"..., 1024) = 1024 132066 read(4, "da131c59a5af78d91c476cf64c99ced1"..., 1024) = 1024 132066 read(4, "e18708ae244f60e55945e97842f1504d"..., 1024) = 1024 132066 read(4, "44a2b1e09f182b20a7502af60a08571a"..., 1024) = 1024 132066 read(4, "/overlay/732d75ed02b7f3cd7f4557a"..., 1024) = 1024 132066 read(4, " rw,seclabel,relatime 0 0\noverla"..., 1024) = 1024 132066 read(4, "18c91aa4a50241d06a2bf72820ee/upp"..., 1024) = 1024 132066 read(4, "69868c0615d4a4cd3482c36a0e40/upp"..., 1024) = 1024 132066 read(4, "ns/817eb67a936c proc rw,nosuid,n"..., 1024) = 1024 132066 read(4, "686b97c79d3b919bef603/merged ove"..., 1024) = 1024 132066 read(4, "fc5ce34c296f7f590a3598b/root,upp"..., 1024) = 1024 132066 read(4, ",workdir=/var/lib/docker/overlay"..., 1024) = 1024 132066 read(4, "6d7cd22/upper,workdir=/var/lib/d"..., 1024) = 1024 132066 read(4, "8d8b956019dbf8587ab/root,upperdi"..., 1024) = 1024 132066 read(4, "kdir=/var/lib/docker/overlay/48e"..., 1024) = 1024 132066 read(4, "/netns/5ea3bbc2414d proc rw,nosu"..., 1024) = 1024 132066 read(4, "5d79e30ba4afd6c68dc0c8752/merged"..., 1024) = 1024 132066 read(4, "023b6def921366f540a9a99897d/root"..., 1024) = 1024 132066 read(4, "/docker/netns/d243aeae8026 proc "..., 1024) = 1024 132066 read(4, "c40a0c8c1bf834bf515fe56086877a3d"..., 1024) = 1024 132066 read(4, "04cbfe0388fa7c1ba942aec189949896"..., 
1024) = 1024
132066 read(4, "ffd3959467c515ec5fb3d857a8e09dda"..., 1024) = 1024
132066 read(4, "b968701cd3a58b18d790c5601/merged"..., 1024) = 1024
132066 read(4, "60a0a54b1608f500d0bd66e4ab28bd6d"..., 1024) = 1024
132066 read(4, "7d9d2d37/merged overlay rw,secla"..., 1024) = 1024
132066 read(4, "b8faf9667fc2a0bf755eb57bf4315bd/"..., 1024) = 1024
132066 read(4, "0272c09b34b6a59894bec081972aff77"..., 1024) = 1024
132066 read(4, "latime,lowerdir=/var/lib/docker/"..., 1024) = 1024
132066 read(4, "tmpfs rw,seclabel,relatime 0 0\no"..., 1024) = 1024
132066 read(4, "6ae83c0f426419532642913841d367d7"..., 1024) = 1024
132066 read(4, "61b4/work 0 0\noverlay /var/lib/d"..., 1024) = 1024
132066 read(4, "4d5e5678e4ebb09334056cf4d4bbc37e"..., 1024) = 1024
132066 read(4, "d30890fe8d9d7733/upper,workdir=/"..., 1024) = 1024
132066 read(4, "cker/overlay/07ef1c31ef2e2a4bdfa"..., 1024) = 1024
132066 read(4, "069c86989d3ed346e2bcfa3e59305ecd"..., 1024) = 1024
132066 read(4, " 0\nshm /var/lib/docker/container"..., 1024) = 1024
132066 read(4, "29db4a56add2f6f4a31cbe6302a8a96a"..., 1024) = 1024
132066 read(4, "y/4502a0aaf969ad02344062e2c30a94"..., 1024) = 1024
132066 read(4, "869ff7cfce56845562753ed8b206cef3"..., 1024) = 1024
132066 read(4, "71aca264fe7dbedf11c74df4a9c19f73"..., 1024) = 1024
132066 read(4, "cd41cdc5e1467/upper,workdir=/var"..., 1024) = 1024
132066 read(4, "y /var/lib/docker/overlay/171cc7"..., 1024) = 1024
132066 read(4, "dir=/var/lib/docker/overlay/d75f"..., 1024) = 1024
132066 read(4, "4b99d3bd80356131daca1e52fc6ae34d"..., 1024) = 1024
132066 read(4, "34ce2c5b3f0c50884d0c235b513211d5"..., 1024) = 1024
132066 read(4, "620515ad/merged overlay rw,secla"..., 1024) = 1024
132066 read(4, "b8b984f9a6735c10f02ca98161f6dfaf"..., 1024) = 1024
132066 read(4, "verlay/8eac9bf965261c6606bace18a"..., 1024) = 1024
132066 read(4, "0c6c127fbd685955f6ea262d21a89dfb"..., 1024) = 1024
132066 read(4, "1991d3d8176ad2cd/merged overlay "..., 1024) = 1024
132066 read(4, "f2d7-6bea-4300-b34b-658f0ff15f82"..., 1024) = 507
132066 read(4, "", 1024) = 0
132066 close(3) = 0
132066 munmap(0x7f51d43b7000, 4096) = 0
132066 close(4) = 0
132066 munmap(0x7f51d43b6000, 4096) = 0
132066 open("/proc/mounts", O_RDONLY|O_CLOEXEC) = 3
132066 fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
132066 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f51d43b7000
132066 read(3, "rootfs / rootfs rw 0 0\nsysfs /sy"..., 1024) = 1024
132066 close(3) = 0
132066 munmap(0x7f51d43b7000, 4096) = 0
132066 mkdir("/sys", 0775) = -1 EEXIST (File exists)
132066 mkdir("/sys/fs", 0775) = -1 EEXIST (File exists)
132066 mkdir("/sys/fs/cgroup", 0775) = -1 EEXIST (File exists)
132066 mkdir("/sys/fs/cgroup/memory", 0775) = -1 EEXIST (File exists)
132066 mkdir("/sys/fs/cgroup/memory/kubepods", 0775) = -1 EEXIST (File exists)
132066 mkdir("/sys/fs/cgroup/memory/kubepods/burstable", 0775) = -1 EEXIST (File exists)
132066 mkdir("/sys/fs/cgroup/memory/kubepods/burstable/IODINE", 0775) = -1 ENOMEM (Cannot allocate memory) <<=----------------
132066 stat("/sys/fs/cgroup/memory/kubepods/burstable/IODINE", 0x7ffea363eac0) = -1 ENOENT (No such file or directory)
132066 write(2, "cgcreate: can't create cgroup ku"..., 87) = 87
132066 exit_group(50007) = ?
132066 +++ exited with 87 +++

I do see that 'ip_vs' is loaded, but I think the problem here is different from bug 1540585. Could you confirm and give suggestions?
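A quick way to check for this state on a node, independent of kubelet and runc, is to attempt the same operation the strace shows cgcreate making: a plain mkdir under the memory controller. The following is only an illustrative sketch, assuming the cgroup v1 memory hierarchy is mounted at the usual path; the group name kmem_probe is arbitrary and not part of the case above.

~~~~~
#!/bin/bash
# Minimal probe: try to create (and immediately remove) a memory cgroup.
# Run as root on the node under test; "kmem_probe" is an arbitrary test name.
d=/sys/fs/cgroup/memory/kmem_probe
if mkdir "$d"; then
    echo "memory cgroup created OK"
    rmdir "$d"
else
    echo "mkdir failed -- on affected nodes this is ENOMEM (cannot allocate memory)"
fi
~~~~~

On a node in the bad state, the mkdir fails with "cannot allocate memory" even though plenty of RAM is free, matching the trace above.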
I am asking the customer to test 1062.18.1.el7 because of bug 1768386 [Sanitize MM backported code for RHEL7 [rhel-7.7.z]]. I will share the result if he is able to test. He does not have a reproducer, but it happens almost twice a week on a production Kubernetes server.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:1016

Hello, I am still hitting this bug on 3.10.0-1062.18.1.el7.x86_64.

Nov 5 09:19:21 server-with-a-problem dockerd: time="2020-11-05T09:19:21.562680365-05:00" level=error msg="Handler for POST /containers/4e00783f74ad9033b146fb5ef7a2c2499ffb74e25cf7800fe6e479a1385180c5/start returned error: OCI runtime create failed: container_linux.go:349: starting container process caused \"process_linux.go:297: applying cgroup configuration for process caused \\\"mkdir /sys/fs/cgroup/memory/kubepods/burstable/pod5e1a5f41-c6ce-47db-b640-e9db42f79b37/4e00783f74ad9033b146fb5ef7a2c2499ffb74e25cf7800fe6e479a1385180c5: cannot allocate memory\\\"\": unknown"

I am ***not using cgroup.memory=nokmem*** as I thought the bug was fixed in 1062.18.1. Has anyone tested 1062.18.1? I have obviously not experienced the bug with cgroup.memory=nokmem.

Thanks, Doug
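For completeness, a minimal sketch of how one might verify whether the cgroup.memory=nokmem workaround discussed earlier is actually in effect on a node, and how it is typically added on RHEL 7 with grubby. The exact boot-loader setup may differ per site, so treat this as an illustration rather than the supported procedure.

~~~~~
# Is the workaround present on the running kernel's command line?
grep -o 'cgroup\.memory=[^ ]*' /proc/cmdline || echo "cgroup.memory is not set"

# Add it to all installed kernel entries; it only takes effect after a reboot.
grubby --update-kernel=ALL --args='cgroup.memory=nokmem'
~~~~~

If /proc/cmdline does not show cgroup.memory=nokmem, kernel memory accounting is still active and the slab/cgroup leak described in this bug can still be triggered on unfixed kernels.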