Bug 199138
Summary: | kernel panic during install bootup from create_gate_table | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Doug Chapman <dchapman> |
Component: | gcc | Assignee: | Kernel Maintainer List <kernel-maint> |
Status: | CLOSED RAWHIDE | QA Contact: | Brian Brock <bbrock> |
Severity: | urgent | Docs Contact: | |
Priority: | urgent | ||
Version: | rawhide | CC: | jakub, prarit, wtogami |
Target Milestone: | --- | Keywords: | Regression |
Target Release: | --- | ||
Hardware: | ia64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2006-08-31 18:57:34 UTC | Type: | --- |
Bug Blocks: | 163350, 199595, 199634 |
Description
Doug Chapman
2006-07-17 14:31:58 UTC
Same results on kernel-2.6.17-1.2391.fc6.

Same on 2.6.17-1.2405.fc6 from the rawhide-20060717 tree as well. I discovered that if I build the kernel from source on a RHEL4.4 system, I can boot it there. If I install the kernel rpm binary on that RHEL4.4 system, I see the same panic as above. So this appears to be related to the build environment somehow.

This appears to be a problem introduced with gcc-4.1.1.7. If I boot a kernel that was built with 4.1.1.6 (last kernel built was 2372), I do not see a panic. If I boot a kernel built with 4.1.1.7 (first kernel built was 2391), I get an oops.

I installed FC5 "unofficial" ia64 and built a kernel using gcc 4.1.1. I then yum updated all the packages on the system to rawhide latest and installed the 4.1.1 kernel to get a rawhide latest box. I compiled the kernel using 4.1.1.8 (which is the latest gcc) and the kernel panics as above. Doug is looking closely at the panic, while I'm searching through gcc to see if we can narrow down the problem. P.

I have some more details from looking at this from the kernel side. The reason for the panic is that at arch/ia64/kernel/unwind.c:2179

    end = (struct unw_table_entry *) ((char *) start + punw->p_memsz);
                                                       ^^^^^^^^^^^^^

the value of punw->p_memsz is wrong. With the recent compilers this is 0x7c, while with either an older FC6 or a RHEL4 compiler it is always 0x48. Note that the address where this lives is based on some constants, and I have verified that punw as well as &punw->p_memsz is the same regardless of the compiler version, so it appears we are looking in the right location. So now I need to determine where the value for punw->p_memsz gets initialized. It appears that either it is being initialized wrong or something is overwriting it.

Another bit of useful info: I get the same panic if I compile 2.6.17 without any of the Red Hat patches. We should discuss this with ia64-list.

Er ... when you're compiling you're using the RH gcc?
What happens if you compile 2.6.17 + no RH patches + "trunk" gcc? Still not settled that it is a kernel issue ;) P.

punw->p_memsz comes from the unwind info for the ELF header. This gets plugged into the kernel via a linker script: arch/ia64/kernel/gate.lds.S. The linker calls this "structure" .IA_64.unwind_info. I assume this is generated by the compiler, but it might be the assembler or even the linker itself. If we find the code that generates this, then I bet we have our culprit. By using kdb I was able to determine that the rest of the structure pointed to by punw (which is of type Elf64_Phdr) looks good except for p_memsz and p_filesz. Both are the same (incorrect) value.

I've left both the linker and the assembler as constants during the tests. So I'm leaning toward gcc for now ... Still testing ... P.

I ran a few tests:

I built a kernel with the 20060711 upstream RH version of gcc and the kernel boots without any issues.

I built a kernel with the 20060711 RH RPM version of gcc and the kernel does not boot.

I can flip between gcc builds on my system by setting an alias for one or another. By switching between them I can generate kernels that do boot and kernels that do not.

I also tried building gcc from the RPM sources using the configure options reported by gcc -v:

[root@altix3 ~]# /usr/bin/gcc -v
Using built-in specs.
Target: ia64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-libgcj-multifile --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --with-java-home=/usr/lib/jvm/java-1.4.2-gcj-1.4.2.0/jre --host=ia64-redhat-linux
Thread model: posix
gcc version 4.1.1 20060711 (Red Hat 4.1.1-7)

I used this self-built RPM version to build a kernel and the resultant kernel booted without any issues.
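For readers following the Elf64_Phdr discussion in this thread, here is a minimal sketch of how the unwind segment's program header can be located in an in-memory ELF64 image so that its p_memsz and p_filesz can be compared between compiler versions. It is not the kernel's code: find_phdr is a hypothetical helper, it relies on the glibc <elf.h> header, and the fallback definition of PT_IA_64_UNWIND as PT_LOPROC + 1 is there only in case the header does not provide it.

```c
#include <assert.h>
#include <elf.h>      /* Elf64_Ehdr, Elf64_Phdr (glibc, Linux-specific) */
#include <stddef.h>
#include <string.h>

#ifndef PT_IA_64_UNWIND
#define PT_IA_64_UNWIND (PT_LOPROC + 1)   /* ia64 unwind-table segment */
#endif

/*
 * Hypothetical helper (not kernel code): copy the first program header of
 * the requested type from an in-memory ELF64 image into *out.
 * Returns 1 if found, 0 if the image is malformed or has no such segment.
 */
static int find_phdr(const unsigned char *image, size_t len,
                     Elf64_Word type, Elf64_Phdr *out)
{
    Elf64_Ehdr ehdr;

    assert(image != NULL && out != NULL);

    if (len < sizeof(ehdr))
        return 0;
    memcpy(&ehdr, image, sizeof(ehdr));

    /* Accept only 64-bit ELF images with the expected header entry size. */
    if (memcmp(ehdr.e_ident, ELFMAG, SELFMAG) != 0 ||
        ehdr.e_ident[EI_CLASS] != ELFCLASS64 ||
        ehdr.e_phentsize != sizeof(Elf64_Phdr))
        return 0;

    /* Make sure the whole program header table lies inside the buffer. */
    if (ehdr.e_phoff > len ||
        (len - ehdr.e_phoff) / sizeof(Elf64_Phdr) < ehdr.e_phnum)
        return 0;

    for (unsigned i = 0; i < ehdr.e_phnum; i++) {
        Elf64_Phdr ph;

        /* memcpy avoids assuming the buffer is suitably aligned. */
        memcpy(&ph, image + ehdr.e_phoff + i * sizeof(ph), sizeof(ph));
        if (ph.p_type == type) {
            *out = ph;
            return 1;
        }
    }
    return 0;
}
```

Calling find_phdr(image, len, PT_IA_64_UNWIND, &ph) on the gate DSO produced by each compiler and printing ph.p_memsz and ph.p_filesz would show whether the bad size is already present in the build output or only appears at run time.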
It clearly looks like gcc is the culprit, or at least some mismatch of gcc and libraries. Jakub, any ideas on what else to try? P.

This problem has since been resolved and verified.
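As a footnote to the p_memsz values reported in this thread (0x48 with the older compilers, 0x7c with the newer ones), the sketch below shows one way to sanity-check such a size before it is used to compute the end of the unwind table. It assumes that an ia64 unwind table entry is three 64-bit offsets (24 bytes); that layout is an assumption to verify against the header of the tree under test. The struct name unw_table_entry_sketch and the function unw_entry_count are hypothetical and do not exist in the kernel.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed layout of one ia64 unwind table entry: three 64-bit offsets.
 * Illustrative only; verify against the kernel tree under test.
 */
struct unw_table_entry_sketch {
    uint64_t start_offset;
    uint64_t end_offset;
    uint64_t info_offset;
};

static_assert(sizeof(struct unw_table_entry_sketch) == 24,
              "sketch assumes 24-byte unwind table entries");

/*
 * Hypothetical check: return the number of whole entries described by
 * p_memsz, or -1 if p_memsz is zero or not an exact multiple of the
 * entry size.
 */
static long unw_entry_count(uint64_t p_memsz)
{
    const uint64_t entry_size = sizeof(struct unw_table_entry_sketch);

    if (p_memsz == 0 || p_memsz % entry_size != 0)
        return -1;
    return (long)(p_memsz / entry_size);
}
```

Under that assumed layout, 0x48 (72 bytes) describes exactly three entries, while 0x7c (124 bytes) is not a whole number of entries, so end = start + p_memsz would point into the middle of an entry. This is an observation from the sketch, not something stated by the reporters.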