Bug 1921923 - Kernel under-reporting or hiding about a third of available memory in "debug" 5.11 series due to use of CONFIG_KASAN
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: All
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: openqa
Depends On:
Blocks:
Reported: 2021-01-28 21:05 UTC by Adam Williamson
Modified: 2022-02-11 21:17 UTC
CC List: 25 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-02-11 21:17:04 UTC
Type: Bug
Embargoed:


Attachments
dmesg 5.11.0-0.rc5.134.fc34.x86_64+debug (142.52 KB, text/plain)
2021-01-30 18:35 UTC, Chris Murphy
config 5.11.0-0.rc5.134.fc34.x86_64+debug (226.96 KB, text/plain)
2021-01-30 18:38 UTC, Chris Murphy
diff config-5.11debug config-5.10debug (30.55 KB, text/plain)
2021-01-30 19:11 UTC, Chris Murphy

Description Adam Williamson 2021-01-28 21:05:27 UTC
Since Fedora-Rawhide-20210127.n.1 (when kernel 5.11.0-0.rc5.20210127git2ab38c17aac1.136.fc34 landed), Rawhide seems to be under-reporting or hiding or reserving or doing...something with about a third of the RAM available to the system. This is affecting openQA tests run in qemu, and cmurf reports it affecting some other systems too; not sure about bare metal yet.

openQA VMs usually have 2GB of RAM. With rc4 and earlier kernels, 'free' shows around 2GB of total memory; with the affected kernel, it shows around 1.4GB. We also see these messages during early boot:

mem auto-init: stack:off, heap alloc:off, heap free:off
Memory: 1370176K/2096608K available (43019K kernel code, 11036K rwdata, 27184K rodata, 5036K init, 31780K bss, 726176K reserved, 0K cma-reserved)

With an rc4 kernel, it looked like this:

mem auto-init: stack:byref_all(zero), heap alloc:off, heap free:off
Memory: 1982588K/2096604K available (14345K kernel code, 3467K rwdata, 9732K rodata, 2548K init, 5516K bss, 113756K reserved, 0K cma-reserved)

It seems like that line is logging a bunch of reservations which all suddenly increased significantly in size.
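
A minimal sketch of the arithmetic behind the two "Memory:" lines quoted above, assuming the line format stays as shown. The function name parse_memory_line and the regular expression are illustrative and not part of any kernel tooling. It pulls out the available, total and reserved figures and prints each as a share of total memory.

```python
import re

# Pattern for the early-boot "Memory:" line; only the fields used below are captured.
MEMORY_LINE = re.compile(
    r"Memory: (?P<available>\d+)K/(?P<total>\d+)K available .*?(?P<reserved>\d+)K reserved"
)

def parse_memory_line(line):
    """Return (available, total, reserved) in KiB from a kernel 'Memory:' line."""
    match = MEMORY_LINE.search(line)
    if match is None:
        raise ValueError("not a kernel Memory: line")
    return tuple(int(match.group(name)) for name in ("available", "total", "reserved"))

# The two lines quoted in the description (affected kernel first, then rc4).
affected = ("Memory: 1370176K/2096608K available (43019K kernel code, 11036K rwdata, "
            "27184K rodata, 5036K init, 31780K bss, 726176K reserved, 0K cma-reserved)")
rc4 = ("Memory: 1982588K/2096604K available (14345K kernel code, 3467K rwdata, "
       "9732K rodata, 2548K init, 5516K bss, 113756K reserved, 0K cma-reserved)")

for label, line in (("affected", affected), ("rc4", rc4)):
    available, total, reserved = parse_memory_line(line)
    print(f"{label}: {reserved / total:.1%} reserved, {available / total:.1%} available")
```

On these figures the affected kernel reserves a little over a third of the 2GB, against roughly a twentieth for rc4.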

Comment 1 Chris Murphy 2021-01-28 21:12:25 UTC
I'm reproducing it in any "debug" kernel going back to rc3.


5.11.0-0.rc3.122.fc34.x86_64+debug
Jan 28 13:59:35 flap.local kernel: Memory: 6561012K/8276388K available (43019K kernel code, 10978K rwdata, 14408K rodata, 5028K init, 31864K bss, 1715120K reserved, 0K cma-reserved)

5.11.0-0.rc4.129.fc34.x86_64+debug
Jan 28 13:57:40 flap.local kernel: Memory: 6561008K/8276388K available (43019K kernel code, 11031K rwdata, 14488K rodata, 5032K init, 31788K bss, 1715124K reserved, 0K cma-reserved)

5.11.0-0.rc5.134.fc34.x86_64+debug
Jan 28 13:56:08 flap.local kernel: Memory: 6547356K/8276392K available (43019K kernel code, 11036K rwdata, 27184K rodata, 5036K init, 31780K bss, 1728780K reserved, 0K cma-reserved)

5.11.0-0.rc5.134.fc34.x86_64
Jan 28 13:54:19 flap.local kernel: Memory: 7915056K/8276392K available (16393K kernel code, 3469K rwdata, 26796K rodata, 2544K init, 5484K bss, 361076K reserved, 0K cma-reserved)


The reserved allocation is huge in the debug kernels.

Comment 2 Chris Murphy 2021-01-28 22:08:17 UTC
5.10.11-200.fc33.x86_64
Jan 28 15:05:33 flap.local kernel: Memory: 7935620K/8276392K available (14345K kernel code, 3465K rwdata, 9704K rodata, 2536K init, 5504K bss, 340512K reserved, 0K cma-reserved)

5.10.11-200.fc33.x86_64+debug
Jan 28 15:04:32 flap.local kernel: Memory: 7907936K/8276392K available (16394K kernel code, 4567K rwdata, 10024K rodata, 4328K init, 20748K bss, 368196K reserved, 0K cma-reserved)

Looks like it is both 5.11 and debug specific.
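
A rough check of that conclusion, using only the "reserved" figures from the log lines in comments 1 and 2. The dictionary layout and the helper name debug_overhead_kib are made up for this sketch. It compares how much more memory the debug build reserves than the non-debug build in each series.

```python
# Reserved memory in KiB, copied from the "Memory:" lines in comments 1 and 2.
reserved_kib = {
    "5.10.11": {"non-debug": 340512, "debug": 368196},
    "5.11.0-rc5": {"non-debug": 361076, "debug": 1728780},
}

def debug_overhead_kib(series):
    """Extra reserved memory of the debug build over the non-debug build."""
    builds = reserved_kib[series]
    return builds["debug"] - builds["non-debug"]

for series in reserved_kib:
    extra = debug_overhead_kib(series)
    print(f"{series}: debug reserves {extra} KiB ({extra / 1024:.0f} MiB) more")
```

The debug overhead is tens of MiB in 5.10 and well over a GiB in 5.11 on the same machine, which is why neither "5.11" nor "debug" alone explains it.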

Comment 3 Chris Murphy 2021-01-28 22:11:49 UTC
Could it be GCC 11.0.0 related, triggered by one of the kernel debug options? All the 5.10 kernels are made with GCC 10.2.1, and 5.11 kernels are made with GCC 11.0.0.

Comment 4 Justin M. Forbes 2021-01-29 21:56:45 UTC
To see if it is gcc related, this is yesterday's kernel built with f33 if anyone wants to test when it is done:

https://koji.fedoraproject.org/koji/taskinfo?taskID=60846680

Comment 5 Chris Murphy 2021-01-30 08:54:56 UTC
> https://koji.fedoraproject.org/koji/taskinfo?taskID=60846680

The x86_64 build appears to be stuck.

Comment 6 Chris Murphy 2021-01-30 18:35:42 UTC
Created attachment 1752459
dmesg 5.11.0-0.rc5.134.fc34.x86_64+debug

Comment 7 Chris Murphy 2021-01-30 18:38:41 UTC
Created attachment 1752461
config 5.11.0-0.rc5.134.fc34.x86_64+debug

Comment 8 Chris Murphy 2021-01-30 19:11:59 UTC
Created attachment 1752470
diff config-5.11debug config-5.10debug

$ diff -up config-5.10.11-200.fc33.x86_64+debug config-5.11.0-0.rc5.134.fc34.x86_64+debug
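
A unified diff of two kernel configs is noisy, so one possible way to narrow it down is to compare only the options that are actually set. The sketch below is an assumption about how that could be done and is not what produced the attachment. The helper names read_options and newly_enabled are hypothetical, and the two fragments are made-up stand-ins for the real files.

```python
def read_options(config_text):
    """Map option name to value for every 'CONFIG_FOO=value' line.

    Lines such as '# CONFIG_FOO is not set' are comments and are skipped,
    so an unset option is simply absent from the result.
    """
    options = {}
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options

def newly_enabled(old_text, new_text):
    """Options set in the new config but absent from the old one."""
    old, new = read_options(old_text), read_options(new_text)
    return {name: new[name] for name in sorted(new.keys() - old.keys())}

# Tiny made-up fragments standing in for the two real config files.
old_fragment = "CONFIG_SLUB_DEBUG=y\n# CONFIG_KASAN is not set\n"
new_fragment = "CONFIG_SLUB_DEBUG=y\nCONFIG_KASAN=y\nCONFIG_KASAN_GENERIC=y\n"
print(newly_enabled(old_fragment, new_fragment))
```

With the real files, the two text arguments would be the contents of config-5.10.11-200.fc33.x86_64+debug and config-5.11.0-0.rc5.134.fc34.x86_64+debug, read for example with pathlib.Path(...).read_text().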

Comment 9 Chris Murphy 2021-01-30 19:44:47 UTC
There's a bunch of extra KASAN options enabled in the 5.11 debug kernel. Generic KASAN dedicates 1/8th of kernel memory to its shadow memory, which would account for about 1/2 of what's missing, but I've got no idea whether it shows up in "reserved" or whether it reduces MemTotal in /proc/meminfo.
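
A back-of-the-envelope version of that estimate, using the 2GB openQA VM figures from the description. It assumes the shadow memory can be approximated as 1/8 of total RAM and that it would be counted in "reserved"; the second point is exactly the open question here, so treat the result as a rough guide only.

```python
# Figures in KiB from the 2 GB openQA VM in the bug description.
total_kib = 2096608
reserved_affected_kib = 726176   # affected 5.11 kernel
reserved_rc4_kib = 113756        # earlier rc4 kernel

# Assumption: shadow memory is approximated as 1/8 of total RAM.
shadow_estimate_kib = total_kib // 8
missing_kib = reserved_affected_kib - reserved_rc4_kib
share = shadow_estimate_kib / missing_kib

print(f"estimated shadow: {shadow_estimate_kib} KiB")
print(f"extra reserved:   {missing_kib} KiB")
print(f"shadow share of extra reserved: {share:.0%}")
```

Under those assumptions the shadow estimate covers a bit less than half of the extra reserved memory, so the rest would have to come from something else.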

Comment 10 Chris Murphy 2021-01-30 23:12:22 UTC
I compiled v5.11-rc5 from kernel.org git using config-5.11.0-0.rc5.134.fc34.x86_64+debug and gcc 11.0.0, and I get this:

Jan 30 15:13:02 fmac.local kernel: Memory: 10083432K/12493424K available (40971K kernel code, 10760K rwdata, 26332K rodata, 5000K init, 30200K bss, 2409736K reserved, 0K cma-reserved)

Next, I disabled these options, which are enabled in 5.11 debug but not 5.10 debug:

-CONFIG_KASAN_SHADOW_OFFSET=0xdffffc0000000000
-CONFIG_KASAN=y
-CONFIG_KASAN_GENERIC=y
-CONFIG_KASAN_INLINE=y
-CONFIG_KASAN_STACK=1
-CONFIG_KASAN_VMALLOC=y

make -j8

Initialize kernel stack variables at function entry
> 1. no automatic initialization (weakest) (INIT_STACK_NONE)
  2. zero-init structs marked for userspace (weak) (GCC_PLUGIN_STRUCTLEAK_USER)
  3. zero-init structs passed by reference (strong) (GCC_PLUGIN_STRUCTLEAK_BYREF) (NEW)
  4. zero-init anything passed by reference (very strong) (GCC_PLUGIN_STRUCTLEAK_BYREF_ALL) (NEW)
choice[1-4?]: 1
Poison kernel stack before returning from syscalls (GCC_PLUGIN_STACKLEAK) [N/y/?] n
Enable heap memory zeroing on allocation by default (INIT_ON_ALLOC_DEFAULT_ON) [N/y/?] n
Enable heap memory zeroing on free by default (INIT_ON_FREE_DEFAULT_ON) [N/y/?] n
*
* KCSAN: dynamic data race detector
*
KCSAN: dynamic data race detector (KCSAN) [N/y/?] (NEW) n
*
* KASAN: runtime memory debugger
*
KASAN: runtime memory debugger (KASAN) [N/y/?] (NEW) n

And now I get this:
Jan 30 16:01:34 fmac.local kernel: Memory: 11955664K/12493424K available (16394K kernel code, 4384K rwdata, 22168K rodata, 4316K init, 21020K bss, 537500K reserved, 0K cma-reserved)

So it looks like it's strictly the newly enabled CONFIG_KASAN options.

Comment 11 Adam Williamson 2021-01-31 17:58:19 UTC
Thanks a lot for working that out, Chris!

Comment 12 Justin M. Forbes 2021-02-01 22:29:19 UTC
This makes sense, and some of those options don't exist without gcc 11. Security specifically requested those be turned on.

Comment 13 Chris Murphy 2021-02-01 22:58:41 UTC
I think this is an exceptional memory hit for Rawhide users though, and anecdotally it seems to slow things down too. So...is it reasonable to enable KASAN in only a subset of kernels? I'm not sure what strategy makes sense: all the rc0 and rc1 kernels? Or the first kernel built for each rc? But all debug kernels from now on? Eek.

Comment 14 Justin M. Forbes 2021-02-01 23:09:42 UTC
The first kernels built for each rc are actually non-debug kernels, so that people have a chance to test weekly updates without debuginfo.

Comment 15 Josh Poimboeuf 2021-02-01 23:28:00 UTC
KASAN was enabled on the Fedora debug kernel in the interest of consistency with ARK.  I believe that was a suggestion which came up in review (https://gitlab.com/cki-project/kernel-ark/-/merge_requests/786).

But maybe the use cases for debug kernels are different between Fedora and ARK.  Shall I disable KASAN on Fedora?

Comment 16 Adam Williamson 2021-02-01 23:54:59 UTC
Not sure how it is for that project, but the significant thing for Fedora is that most official kernel builds for Rawhide are debug kernels. That means that Rawhide testers and test systems which test Rawhide are using those kernels, so it can be a problem if the debugging stuff is so heavy it's overwhelming the system.

Comment 17 Justin M. Forbes 2021-02-01 23:56:05 UTC
Yes, so debug kernels get run by regular users of Fedora quite frequently. While in theory it seemed okay, in practice, it seems to be too heavy for regular use. I am happy to turn it off unless someone can give me a really good reason not to.

Comment 18 Adam Williamson 2021-02-01 23:58:47 UTC
I think we'd definitely get more consistent results out of openQA if Rawhide usually ran non-debug kernels. In the rare event we actually do hit a kernel bug and need debugging data, we'd probably be able to edit a test to switch to a debug kernel and get it, I'd think. Most folks running Rawhide would be happy too, I think. So the question is only whether we're actually getting any useful data out of having Rawhide run debug kernels almost all the time...

Comment 19 Alexander Bokovoy 2021-02-02 15:47:20 UTC
With the latest Rawhide compose 20210201.n.1, FreeIPA related tests succeed with kernel 5.11.0-rc6.141.fc34:

https://openqa.fedoraproject.org/tests/767587#dependencies

Comment 20 Justin M. Forbes 2021-02-02 16:11:23 UTC
5.11.0-rc6.141.fc34 is not a debug kernel.

Comment 21 Chris Murphy 2022-02-11 21:17:04 UTC
KASAN was turned off and this hasn't been a problem since.

