Bug 1796780
Summary: | kernel-5.6.0-0.rc0.git1.1.fc32.x86_64 panics on boot: Kernel stack is corrupted in: start_secondary+0x1b9/0x1c0 | |
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Petr Pisar <ppisar>
Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint>
Status: | CLOSED CURRENTRELEASE | QA Contact: | Fedora Extras Quality Assurance <extras-qa>
Severity: | unspecified | Priority: | unspecified
Version: | rawhide | CC: | airlied, atu, bskeggs, extras-qa, hdegoede, hvtaifwkbgefbaei, ichavero, itamar, jakub, jarodwilson, jeremy, jforbes, jglisse, john.j5live, jonathan, josef, jpazdziora, j, kernel-maint, linville, masami256, mchehab, mikhail.v.gavrilov, mjg59, mliska, omosnace, pbrobinson, rjones, steved, terje.rosten, vashirov, yaneti
Hardware: | Unspecified | OS: | Unspecified
Last Closed: | 2021-06-12 15:26:53 UTC | Type: | Bug
Description
Petr Pisar
2020-01-31 08:19:45 UTC
Tried a 5.5.y built with gcc10 on rawhide and with CC_HAS_SANE_STACKPROTECTOR off, and it seems to work OK in the qemu test.

Narrowed it down to CONFIG_STACKPROTECTOR_STRONG: with that turned off, a rawhide gcc10-built 5.6.0-0.rc0.git1.1.fc32.x86_64 works for me.

Today I learned about earlycon=efifb and can confirm that the failure is the same on real hardware.

*** Bug 1797413 has been marked as a duplicate of this bug. ***

Adding Jakub to the CC as this is exclusive to GCC 10 and works fine in F31.

Given that the start_secondary function calls boot_init_stack_canary, I'd say that is a clear kernel bug: any function for which the stack canary can change between its start and end (in the kernel's case, boot_init_stack_canary and anything that calls it) needs to have stack-protector disabled, either from the compiler command line options (-fno-stack-protector) or e.g. using the optimize attribute __attribute__((optimize ("no-stack-protector"))) (though it seems that only works with GCC 7 or later). In the past you could just be lucky that nothing was inlined into the start_secondary function that would trigger the use of a stack canary in there.

If somebody attaches the preprocessed smpboot.i and the full gcc command line used to compile it, I can have a quick look at what changed in the inlining decisions, or what other reasons there are for it now having a stack canary.

I can't boot VMs with kernels after 5.5.7-200.fc31: 5.6.0-0.rc3.git0.1 and 5.6.0-0.rc4.git0.1 hangs and dies. This is under Xen 4.4 hypervisor.

(In reply to Terje Røsten from comment #7)
> I can't boot VMs with kernels after 5.5.7-200.fc31: 5.6.0-0.rc3.git0.1 and
> 5.6.0-0.rc4.git0.1 hangs and dies.
> This is under Xen 4.4 hypervisor.

The same stack trace? I believe you experience a different bug because Fedora 31 does not use GCC 10 for building the kernel.

Created attachment 1673338 [details]
pre-processed source file
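The stack-protector reasoning given earlier in this thread (a function whose canary changes between entry and exit fails its own epilogue check) can be modelled in plain user-space C. This is only a sketch under stated assumptions, not kernel code: `canary`, `init_canary`, `protected_call`, `harmless` and `self_check` are hypothetical names, the global variable stands in for the per-CPU canary slot, and the explicit compare stands in for the check the compiler emits before calling `__stack_chk_fail`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-CPU stack canary slot. */
uint64_t canary = 0x1111111111111111ULL;

/*
 * Stand-in for boot_init_stack_canary(): installs a new canary value.
 * The real function uses a random value; flipping every bit is an
 * assumption made here so the value is guaranteed to change.
 */
void init_canary(void)
{
    canary = ~canary;
}

/* A body that leaves the canary alone. */
void harmless(void)
{
}

/*
 * Models a stack-protected function: the "prologue" copies the canary
 * into the frame, the "epilogue" compares the copy with the live value.
 * Returns 1 if the check passes, 0 where real code would report
 * stack corruption.
 */
int protected_call(void (*body)(void))
{
    uint64_t saved = canary;   /* prologue: stash canary in the frame */
    body();                    /* may call init_canary() */
    return saved == canary;    /* epilogue: mismatch means "corrupted" */
}

/* Ties the two cases together. */
void self_check(void)
{
    assert(protected_call(harmless) == 1);     /* canary untouched: passes */
    assert(protected_call(init_canary) == 0);  /* canary replaced: fails   */
}
```

In this model `protected_call(init_canary)` plays the role of a protected start_secondary: nothing on the stack was overwritten, yet the check fails because the reference value moved. That is why the comment above recommends building such callers without stack protection.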
I see the same on openSUSE kernel-default (5.5.11-5). The command line used for the file is:
gcc -Wp,-MD,arch/x86/kernel/.smpboot.o.d -nostdinc -isystem /usr/local/lib64/gcc/x86_64-pc-linux-gnu/10.0.1/include -I../arch/x86/include -I./arch/x86/include/generated -I../include -I./include -I../arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I../include/uapi -I./include/generated/uapi -include ../include/linux/kconfig.h -include ../include/linux/compiler_types.h -D__KERNEL__ -Wall -Wundef -Werror=strict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -fshort-wchar -fno-PIE -Werror=implicit-function-declaration -Werror=implicit-int -Wno-format-security -std=gnu89 -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -m64 -falign-jumps=1 -falign-loops=1 -mno-80387 -mno-fp-ret-in-387 -mpreferred-stack-boundary=3 -mskip-rax-setup -mtune=generic -mno-red-zone -mcmodel=kernel -DCONFIG_X86_X32_ABI -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -DCONFIG_AS_CFI_SECTIONS=1 -DCONFIG_AS_SSSE3=1 -DCONFIG_AS_AVX=1 -DCONFIG_AS_AVX2=1 -DCONFIG_AS_AVX512=1 -DCONFIG_AS_SHA1_NI=1 -DCONFIG_AS_SHA256_NI=1 -Wno-sign-compare -fno-asynchronous-unwind-tables -mindirect-branch=thunk-extern -mindirect-branch-register -fno-jump-tables -fno-delete-null-pointer-checks -Wno-frame-address -Wno-format-truncation -Wno-format-overflow -Wno-address-of-packed-member -O2 -Wframe-larger-than=2048 -fstack-protector-strong -Wno-unused-but-set-variable -Wimplicit-fallthrough -Wno-unused-const-variable -fno-var-tracking-assignments -g -gdwarf-4 -pg -mrecord-mcount -mfentry -DCC_USING_FENTRY -fno-inline-functions-called-once -flive-patching=inline-clone -Wdeclaration-after-statement -Wvla -Wno-pointer-sign -Wno-stringop-truncation -fno-strict-overflow -fno-merge-all-constants -fmerge-constants -fno-stack-check -fconserve-stack -Werror=date-time -Werror=incompatible-pointer-types -Werror=designated-init -fmacro-prefix-map=../= -fcf-protection=none -Wno-packed-not-aligned -I ../arch/x86/kernel -I ./arch/x86/kernel -DKBUILD_BASENAME='"smpboot"' -DKBUILD_MODNAME='"smpboot"' -c smpboot.i
The significant difference is that now with GCC 10 we do not inline:
call smp_callin
I can see usage of the %gs:xyz segment register to access some data, but I don't see how the register itself is modified.
Created attachment 1673339 [details]
Assembly for start_secondary with GCC 9
Created attachment 1673340 [details]
Assembly for start_secondary with GCC 10
https://lkml.org/lkml/2020/3/17/746 contains details on what exactly is going on.

So did I understand correctly that the fix was made on Mar 17, and it is still not in 4.19.121, released on May 6?