Bug 1881829 - luksFormat, luksAddKey etc. cause OOM crash
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: cryptsetup
Version: CentOS Stream
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: 8.0
Assignee: Ondrej Kozina
QA Contact: guazhang@redhat.com
Reported: 2020-09-23 07:34 UTC by Martin Pitt
Modified: 2021-06-03 00:22 UTC (History)

Last Closed: 2020-09-23 07:53:40 UTC
Type: Bug



Description Martin Pitt 2020-09-23 07:34:31 UTC
Description of problem: In current CentOS 8 stream, cryptsetup keeps triggering the OOM killer. I noticed this in our Cockpit integration tests when moving from CentOS 8 to -Stream [1].

Our test VMs only have 1.1 GiB of RAM, but more than 500 MiB is still available, which certainly ought to be enough for simple operations:

# free -h
              total        used        free      shared  buff/cache   available
Mem:          934Mi       259Mi       355Mi       0,0Ki       320Mi       539Mi


Version-Release number of selected component (if applicable):

cryptsetup-2.3.3-2.el8.x86_64
kernel-4.18.0-236.el8.x86_64

How reproducible: Always


Steps to Reproduce:
1. dd if=/dev/zero of=/var/tmp/img bs=1M count=500
2. cryptsetup luksFormat /var/tmp/img

Actual results:

Aborts with "Killed". dmesg shows

cryptsetup invoked oom-killer: gfp_mask=0x6280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), order=0, oom_score_adj=0
CPU: 0 PID: 4150 Comm: cryptsetup Not tainted 4.18.0-236.el8.x86_64 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
Call Trace:
 dump_stack+0x5c/0x80
 dump_header+0x51/0x308
 ? virtballoon_oom_notify+0x25/0x70 [virtio_balloon]
 oom_kill_process.cold.28+0xb/0x10
 out_of_memory+0x1c1/0x4b0
 __alloc_pages_slowpath+0xc24/0xd40
 ? __alloc_pages_slowpath+0xcfc/0xd40
 ? page_counter_try_charge+0x57/0xc0
 __alloc_pages_nodemask+0x245/0x280
 alloc_pages_vma+0x74/0x1d0
 do_anonymous_page+0x91/0x360
 __handle_mm_fault+0x77c/0x7c0
 handle_mm_fault+0xc2/0x1d0
 __get_user_pages+0x260/0x7a0
 populate_vma_page_range+0x6d/0x70
 __mm_populate+0x9d/0x140
 vm_mmap_pgoff+0x110/0x120
 ksys_mmap_pgoff+0x59/0x270
 do_syscall_64+0x5b/0x1a0
 entry_SYSCALL_64_after_hwframe+0x65/0xca
RIP: 0033:0x7ff358a46967
Code: 54 41 89 d4 55 48 89 fd 53 4c 89 cb 48 85 ff 74 52 49 89 d9 45 89 f8 45 89 f2 44 89 e2 4c 89 ee 48 89 ef b8 09 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 79 5b 5d 41 5c 41 5d 41 5e 41 5f c3 66 0f 1f
RSP: 002b:00007ffca3a63428 EFLAGS: 00000246 ORIG_RAX: 0000000000000009
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ff358a46967
RDX: 0000000000000003 RSI: 000000001d36f000 RDI: 0000000000000000
RBP: 0000000000000000 R08: 00000000ffffffff R09: 0000000000000000
R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000000003
R13: 000000001d36f000 R14: 0000000000000022 R15: 00000000ffffffff
Mem-Info:
active_anon:40430 inactive_anon:67 isolated_anon:0
 active_file:4 inactive_file:4 isolated_file:0
 unevictable:154863 dirty:0 writeback:0 unstable:0
 slab_reclaimable:6757 slab_unreclaimable:14491
 mapped:55893 shmem:153 pagetables:3595 bounce:0
 free:11894 free_pcp:14 free_cma:0
Node 0 active_anon:161720kB inactive_anon:268kB active_file:16kB inactive_file:16kB unevictable:619452kB isolated(anon):0kB isolated(file):0kB mapped:223572kB dirty:0kB writeback:0kB shmem:612kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 270336kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
Node 0 DMA free:4168kB min:760kB low:948kB high:1136kB active_anon:464kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:10248kB writepending:0kB present:15992kB managed:15360kB mlocked:10248kB kernel_stack:0kB pagetables:40kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 853 853 853 853
Node 0 DMA32 free:43408kB min:43432kB low:54288kB high:65144kB active_anon:161256kB inactive_anon:268kB active_file:16kB inactive_file:16kB unevictable:609192kB writepending:28kB present:1163116kB managed:941936kB mlocked:609192kB kernel_stack:3136kB pagetables:14340kB bounce:0kB free_pcp:56kB local_pcp:56kB free_cma:0kB
lowmem_reserve[]: 0 0 0 0 0
Node 0 DMA: 30*4kB (UM) 18*8kB (UM) 4*16kB (U) 6*32kB (U) 3*64kB (U) 3*128kB (U) 0*256kB 2*512kB (UM) 2*1024kB (UM) 0*2048kB 0*4096kB = 4168kB
Node 0 DMA32: 1012*4kB (UME) 812*8kB (UME) 386*16kB (UME) 110*32kB (UE) 44*64kB (UE) 19*128kB (UME) 10*256kB (UME) 2*512kB (UM) 14*1024kB (M) 0*2048kB 0*4096kB = 43408kB
Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
56045 total pagecache pages
0 pages in swap cache
Swap cache stats: add 0, delete 0, find 0/0
Free swap  = 0kB
Total swap = 0kB
294777 pages RAM
0 pages HighMem/MovableOnly
55453 pages reserved
0 pages hwpoisoned
[ pid ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[  751]     0   751    29668     1129   278528        0             0 systemd-journal
[  775]     0   775    29729     1484   237568        0         -1000 systemd-udevd
[  859]    32   859    16786      847   172032        0             0 rpcbind
[  870]   998   870   409089     2554   339968        0             0 polkitd
[  875]     0   875   106146     1276   450560        0             0 sssd
[  885]    81   885    18009      743   176128        0          -900 dbus-daemon
[  888]   993   888    32229      546   139264        0             0 chronyd
[  895]     0   895    20915      459   208896        0             0 systemd-machine
[  911]   991   911    40029      980   208896        0             0 rngd
[  920]     0   920   107800      704   458752        0             0 sssd_be
[  924]     0   924    58982      376    77824        0             0 ksmtuned
[  934]     0   934   108685     1128   466944        0             0 sssd_nss
[  988]     0   988    24715      723   229376        0             0 systemd-logind
[ 1109]     0  1109   150482     1113   405504        0             0 NetworkManager
[ 1121]     0  1121   159089     4510   458752        0             0 tuned
[ 1125]     0  1125    62888      133   114688        0             0 rhsmcertd
[ 1126]     0  1126    77883      610   188416        0             0 gssproxy
[ 1217]     0  1217   345575     3509   786432        0             0 libvirtd
[ 1220]     0  1220    54409      770   188416        0             0 rsyslogd
[ 1222]     0  1222    23073      985   212992        0         -1000 sshd
[ 1223]     0  1223    56581      277    65536        0             0 agetty
[ 1224]     0  1224    56491      398    77824        0             0 agetty
[ 1226]     0  1226    58267      446   110592        0             0 crond
[ 1343]     0  1343    40926     1003   323584        0             0 sshd
[ 1373]     0  1373    23478     1007   221184        0             0 systemd
[ 1378]   983  1378    18340      185   147456        0             0 dnsmasq
[ 1381]     0  1381    18315      108   139264        0             0 dnsmasq
[ 1388]     0  1388    60801     1250   315392        0             0 (sd-pam)
[ 1404]     0  1404    40926      858   311296        0             0 sshd
[ 1407]     0  1407    57413      414    73728        0             0 bash
[ 1476]     0  1476   196648     1969   471040        0             0 udisksd
[ 1588]   994  1588    22612      425   135168        0             0 cockpit-tls
[ 1591]   992  1591    77956      597   245760        0             0 cockpit-ws
[ 1596]     0  1596    38985      856   327680        0             0 cockpit-session
[ 1601]  1000  1601     6851      130    86016        0             0 ssh-agent
[ 1604]  1000  1604    23478      363   217088        0             0 systemd
[ 1607]  1000  1607    64420     1319   323584        0             0 (sd-pam)
[ 1626]  1000  1626   174419     2784   307200        0             0 cockpit-bridge
[ 1631]     0  1631    91747      962   303104        0             0 sudo
[ 1636]     0  1636   118695     3020   274432        0             0 cockpit-bridge
[ 1655]  1000  1655    57888     1182   344064        0             0 cockpit-pcp
[ 1666]     0  1666    72154      322   200704        0             0 timedatex
[ 1668]     0  1668    86934     2138   299008        0             0 realmd
[ 1707]     0  1707   206808     1922   569344        0             0 packagekitd
[ 1754]     0  1754    78680     2947   241664        0             0 platform-python
[ 1768]     0  1768    66981     1426   147456        0             0 platform-python
[ 3181]     0  3181    40926     1402   335872        0             0 sshd
[ 3184]     0  3184    40926      921   323584        0             0 sshd
[ 3185]     0  3185    59001      667   102400        0             0 bash
[ 3610]     0  3610    66357     2167   139264        0             0 platform-python
[ 3614]     0  3614    22464     1016   208896        0             0 udevadm
[ 4028]     0  4028    78181      891   233472        0             0 journalctl
[ 4057]     0  4057    66357     2060   135168        0             0 platform-python
[ 4061]     0  4061    22464     1557   208896        0             0 udevadm
[ 4073]     0  4073    40926     1508   335872        0             0 sshd
[ 4076]     0  4076    40926      780   319488        0             0 sshd
[ 4077]     0  4077    59004      755    94208        0             0 bash
[ 4148]     0  4148    54262      198    73728        0             0 sleep
[ 4150]     0  4150   187176   154839  1380352        0             0 cryptsetup
oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-0.slice/session-9.scope,task=cryptsetup,pid=4150,uid=0
Out of memory: Killed process 4150 (cryptsetup) total-vm:748704kB, anon-rss:395800kB, file-rss:223556kB, shmem-rss:0kB, UID:0


Expected results: cryptsetup succeeds.


Additional info:

This happens on "real" drives too, not just an image file. That's just the easiest way to reproduce this.


[1] https://github.com/cockpit-project/bots/pull/1250

Comment 1 Ondrej Kozina 2020-09-23 07:53:40 UTC
The LUKS2 format uses a memory-hard PBKDF. By default it aims to use at most half of system memory (or 1 GB, whichever is hit first), but your system may not have that much to spare. You can lower the memory footprint with the --pbkdf-memory option (unit is KiB).

For example, "cryptsetup luksFormat --pbkdf-memory 32768" will create a device whose keyslot has its PBKDF memory cost set to 32 MiB.

Or you may increase RAM in your VM. I think 2GB would be enough.
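
The default described in this comment can be sketched as a small calculation. This is only an illustration of the rule as stated here (half of system memory, capped at 1 GiB); the function name default_pbkdf_kib is hypothetical and the code is not taken from cryptsetup's source.

```shell
#!/bin/sh
# Sketch of the rule described above: target = min(half of RAM, 1 GiB), in KiB.
# Assumption: this mirrors the comment's wording, not cryptsetup's actual code.
default_pbkdf_kib() {
    total_kib=$1
    half_kib=$((total_kib / 2))
    cap_kib=$((1024 * 1024))    # 1 GiB expressed in KiB
    if [ "$half_kib" -lt "$cap_kib" ]; then
        echo "$half_kib"
    else
        echo "$cap_kib"
    fi
}

# The reporter's VM shows 934 MiB total in `free -h`.
default_pbkdf_kib $((934 * 1024))
```

Under this rule the reporter's VM would target roughly 467 MiB, which is close to the 539 MiB that `free -h` reported as available.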

Comment 2 Martin Pitt 2020-09-23 08:59:55 UTC
Respectfully, I disagree about "not a bug" -- this is something that works in RHEL 8.3, Fedora, Debian, and everywhere else. Requiring 1 GB of RAM by default to do such an operation sounds quite ridiculous -- if --pbkdf-memory 32768 works, why isn't that the default then? It's not that hard to check how much memory is available. Such "unbreak my software" options are highly non-discoverable and quite pointless.

Comment 3 Ondrej Kozina 2020-09-23 09:48:30 UTC
Increasing the memory footprint of the PBKDF is basically the point of memory-hard functions. I don't think this bug is the proper place to discuss the theory (and papers) behind it, so, with all respect, I'm going to completely ignore all remarks about it being ridiculous.

It seems to me you run a VM with available system memory below the recommended minimum values (see https://wiki.centos.org/About/Product, which is basically a copy/paste from https://access.redhat.com/articles/rhel-limits). The --pbkdf-memory argument exists exactly for small embedded systems or low-memory VMs. Or you may switch back to PBKDF2 with "--pbkdf pbkdf2". I'm not sure what the VM is supposed to test and what purpose cryptsetup has in it. But we don't plan to change general defaults for custom-tailored VMs.
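
The two lower-memory variants mentioned in this comment can be written out as follows. This is a dry-run sketch that only prints the command lines; the helper name print_low_memory_variants is hypothetical, and the device path and the 32768 KiB value are taken from earlier in this report purely for illustration. Note that luksFormat destroys existing data on the target.

```shell
#!/bin/sh
# Print (do not run) the two lower-memory luksFormat variants mentioned above.
# The helper name is hypothetical; the default of 32768 KiB is the value
# used as an example in comment 1, not a recommendation.
print_low_memory_variants() {
    device=$1
    mem_kib=${2:-32768}
    # Keep the default memory-hard PBKDF but cap its memory cost (KiB).
    echo "cryptsetup luksFormat --pbkdf-memory $mem_kib $device"
    # Fall back to PBKDF2, as used by LUKS1.
    echo "cryptsetup luksFormat --pbkdf pbkdf2 $device"
}

print_low_memory_variants /var/tmp/img
```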

Comment 4 Milan Broz 2020-09-23 10:16:15 UTC
I do not think the discussion needs to be heated here. It is all about the trade-off between usability and security.

We started to use memory-hard key derivation functions to better protect long-term keys in disk encryption, mainly because the existing algorithms are not acceptable against massive GPU cracking power etc. (See all the papers about PBKDF2 problems.)

That said, there is always a way to downgrade the requested parameters, or even to switch to PBKDF2, which was used in LUKS1 - but that is up to the user's decision.

As a default, resources are tested on minimal distro specs. (And there are some limits, based on experience.)
But still, there are situations where this does not work - the main problem is with the awful design of OOM behavior and various systems like ballooning and overcommitting that are designed for systems where you allocate a lot of memory, but will never use that in reality.

In the memory-hard function, you WILL use that memory, and it must be physical memory (otherwise, run time increases almost exponentially).

Comment 5 Martin Pitt 2020-09-23 10:54:21 UTC
Thanks for the detailed explanations! I don't want to bake a --pbkdf-memory option into cockpit, as that really shouldn't be the place to decide about what's a good default policy. But running with 1.5 GiB RAM works, and the document about the minimum RAM requirement makes sense then. I sent a PR to run the LUKS tests with more memory instead.

Comment 6 Mai Ling 2021-06-02 20:27:48 UTC
I believe this is still a bug in the sense that it should first check for available memory before triggering OOM - i.e. refuse to run with an explanation message instead of ending up having the kernel invoke OOM.

btw, I get OOM even on these numbers:

[root@localhost-live ~]# free -mh
               total        used        free      shared  buff/cache   available
Mem:           1.9Gi       236Mi       932Mi       0.0Ki       733Mi       919Mi
Swap:          5.8Gi       114Mi       5.7Gi

is that expected?
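
A pre-flight check of the kind suggested in this comment could look roughly like the sketch below, written as a wrapper a caller might run before invoking cryptsetup. The function names mem_available_kib and check_pbkdf_memory are hypothetical, and this is not something cryptsetup is known to do itself. MemAvailable in /proc/meminfo is only a kernel estimate, and it can change between the check and the allocation, so a check like this reduces the chance of an OOM kill rather than ruling it out.

```shell
#!/bin/sh
# Hypothetical pre-flight check: refuse to start when MemAvailable is below
# the requested PBKDF memory cost. The meminfo path is a parameter so the
# check can be exercised against a sample file.
mem_available_kib() {
    awk '/^MemAvailable:/ { print $2 }' "${1:-/proc/meminfo}"
}

check_pbkdf_memory() {
    need_kib=$1
    avail_kib=$(mem_available_kib "$2")
    if [ -z "$avail_kib" ]; then
        echo "cannot determine available memory" >&2
        return 2
    fi
    if [ "$avail_kib" -lt "$need_kib" ]; then
        echo "refusing to run: need ${need_kib} KiB, only ${avail_kib} KiB available" >&2
        return 1
    fi
    return 0
}

# Usage (not executed here):
#   check_pbkdf_memory 1048576 && cryptsetup luksFormat /var/tmp/img
```

If the half-of-RAM rule from comment 1 applies, a machine with 1.9 GiB total would target roughly 0.95 GiB, which is more than the 919 MiB shown as available above, so a check like this would refuse to run on those numbers.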

