Red Hat Bugzilla – Bug 199138
kernel panic during install bootup from create_gate_table
Last modified: 2007-11-30 17:11:38 EST
Description of problem:
Booting the kernel for installation on my HP Integrity servers I get the
following panic. Looks like something in create_gate_table. I will try
installing an older rev and upgrade to this kernel to see if I get the same thing.
IP route cache hash table entries: 1048576 (order: 9, 8388608 bytes)
TCP established hash table entries: 4194304 (order: 13, 134217728 bytes)
TCP bind hash table entries: 65536 (order: 7, 2097152 bytes)
TCP: Hash tables configured (established 4194304 bind 65536)
TCP reno registered
perfmon: version 2.0 IRQ 238
perfmon: Itanium 2 PMU detected, 16 PMCs, 18 PMDs, 4 counters (47 bits)
kernel unaligned access to 0xa000000000000634, ip=0xa000000100039eb0
Unable to handle kernel paging request at virtual address a010000600002682
swapper: Oops 8813272891392 
Modules linked in:
Pid: 1, CPU 0, comm: swapper
psr : 00001010085a6010 ifs : 8000000000000590 ip : [<a0000001007082f0>] Not tainted
ip is at create_gate_table+0x150/0x380
unat: 0000000000000000 pfs : 0000000000000590 rsc : 0000000000000003
rnat: 0000000000000000 bsps: 0000000000000000 pr : 0000000000009541
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a74433f
csd : 0000000000000000 ssd : 0000000000000000
b0 : a0000001007082b0 b6 : a0000001007081a0 b7 : a00000010021a5c0
f6 : 1003e0000000000000040 f7 : 0ffdd8000000000000000
f8 : 10004ffffd27000000000 f9 : 10005b000000000000000
f10 : 1003e0000000000000038 f11 : 1003e0000000000000118
r1 : a000000100b6e160 r2 : 0000000000000000 r3 : e0000040fe569004
r8 : e0000040fd5ed900 r9 : a0000001009725e8 r10 : 0000000000000001
r11 : a000000100972610 r12 : e0000040fe56fd30 r13 : e0000040fe568000
r14 : 0000000000004000 r15 : a000000000000638 r16 : 00000000ffffffff
r17 : a000000000000000 r18 : 00000000000000b0 r19 : 0000000000000090
r20 : 0000000000000012 r21 : 0001000000000012 r22 : a010000600002682
r23 : 0010000600002682 r24 : a000000000000610 r25 : 000000000000007c
r26 : e0000040fd5ed910 r27 : e0000040fd5ed908 r28 : a00000010093ddf0
r29 : 0000000000000000 r30 : a00000010093ddf8 r31 : e0000040fe569004
<0>Kernel panic - not syncing: Attempted to kill init!
Version-Release number of selected component (if applicable):
Only tried 1 time so far, will try other hosts.
Steps to Reproduce:
1. boot the install kernel on ia64
Same results on kernel-2.6.17-1.2391.fc6
Same on 2.6.17-1.2405.fc6 from the rawhide-20060717 tree as well.
I discovered that if I build the kernel from source on a RHEL4.4 system I can
boot it there. If I install the kernel rpm binary on that RHEL4.4 system I see
the same panic as above. So it appears to be related to the build environment.
This appears to be a problem introduced with gcc-184.108.40.206. If I boot a kernel
that was built with 220.127.116.11 (last kernel built was 2372) I do not see a panic.
If I boot a kernel built with 18.104.22.168 (first kernel built was 2391) I get an oops.
I installed FC5 "unofficial" ia64 and built a kernel using gcc 4.1.1 .
I then yum updated all the packages on the system to rawhide latest and installed
the 4.1.1 kernel to get a rawhide latest box.
I compiled the kernel using 22.214.171.124 (which is the latest gcc) and the kernel
panics as above.
Doug is looking closely at the panic, while I'm searching through gcc to see
if we can narrow down the problem.
I have some more details from looking at this from the kernel side. The reason
for the panic is -
end = (struct unw_table_entry *) ((char *) start + punw->p_memsz);
the value of punw->p_memsz is wrong. With the recent compilers this is 0x7c
while with either an older FC6 or an RHEL4 compiler it is always 0x48. Note
that the address where this lives is based on some constants and I have verified
that punw as well as &punw->p_memsz is the same regardless of the compiler
version so it appears we are looking in the right location.
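To make the two sizes concrete, here is a minimal sketch of the arithmetic behind the `end` computation. It assumes an ia64 unwind table entry is three 64-bit fields (24 bytes); the field names and the two helper functions are illustrative, not copied from the kernel source. Under that assumption 0x48 covers exactly three entries, while 0x7c does not land on an entry boundary, so a walk bounded by `end` would run past the real table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout of one unwind table entry: three 64-bit offsets. */
struct unw_table_entry {
    uint64_t start_offset;
    uint64_t end_offset;
    uint64_t info_offset;
};

/* The sketch relies on the entry being 24 bytes. */
static_assert(sizeof(struct unw_table_entry) == 24,
              "sketch assumes a 24-byte entry");

/* Hypothetical helper: number of whole entries that p_memsz bytes cover. */
static size_t whole_entries(uint64_t p_memsz)
{
    return (size_t)(p_memsz / sizeof(struct unw_table_entry));
}

/* Hypothetical helper: bytes left over after the last whole entry. */
static size_t leftover_bytes(uint64_t p_memsz)
{
    return (size_t)(p_memsz % sizeof(struct unw_table_entry));
}
```

Calling `leftover_bytes(0x48)` and `leftover_bytes(0x7c)` shows the difference: only the first size is a whole number of entries, which is consistent with 0x48 being the good value.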
So, now I need to determine where the value for punw->p_memsz gets initialized.
It appears that either it is being initialized wrong or something is overwriting it.
Another bit of useful info. I get the same panic if I compile 2.6.17 without
any of the redhat patches. We should discuss this with ia64-list.
Er ... when you're compiling you're using the RH gcc? What happens if you
compile 2.6.17 + no RH patches + "trunk" gcc?
Still not settled that it is a kernel issue ;)
punw->p_memsz comes from the unwind info for the ELF header; this gets plugged
into the kernel via a linker script: arch/ia64/kernel/gate.lds.S.
The linker calls this "structure" .IA_64.unwind_info. I assume this is
generated by the compiler but it might be the assembler or even the linker
itself. If we find the code that generates this then I bet we have our culprit.
By using kdb I was able to determine that the rest of the structure pointed to
by punw (which is of type Elf64_Phdr) looks good except for p_memsz and
p_filesz. Both are the same (incorrect) value.
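For anyone who wants to repeat this check outside kdb, the following is a rough user-space sketch, not the kernel's own code. It walks the program headers of an ELF image already loaded in memory, picks out the PT_IA_64_UNWIND header, and tests whether its sizes look sane. The function names and the `entry_size` parameter are made up for illustration, and the "multiple of the entry size" rule is an assumption about what a good value looks like.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Fallback in case this elf.h lacks the ia64 segment type. */
#ifndef PT_IA_64_UNWIND
#define PT_IA_64_UNWIND (PT_LOPROC + 1)
#endif

/* Return the unwind program header of an in-memory ELF image, or NULL. */
static const Elf64_Phdr *find_unwind_phdr(const void *image)
{
    assert(image != NULL);
    const Elf64_Ehdr *ehdr = image;
    const Elf64_Phdr *phdr =
        (const Elf64_Phdr *)((const char *)image + ehdr->e_phoff);

    for (unsigned i = 0; i < ehdr->e_phnum; ++i, ++phdr)
        if (phdr->p_type == PT_IA_64_UNWIND)
            return phdr;
    return NULL;
}

/*
 * Hypothetical sanity check: file and memory sizes agree and the size is
 * a whole number of table entries (entry_size is supplied by the caller).
 */
static int unwind_size_plausible(const Elf64_Phdr *punw, size_t entry_size)
{
    return punw != NULL && entry_size != 0 &&
           punw->p_filesz == punw->p_memsz &&
           punw->p_memsz % entry_size == 0;
}
```

With a 24-byte entry size, a header carrying 0x48 in both fields passes this check and one carrying 0x7c in both fields fails it, even though p_filesz and p_memsz agree with each other in both cases.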
I've left both the linker and the assembler as constants during the tests. So
I'm leaning toward gcc for now ...
Still testing ...
I ran a few tests:
I built a kernel with the 20060711 upstream RH version of gcc and the kernel
boots without any issues.
I built a kernel with the 20060711 RH RPM version of gcc and the kernel does not
boot.
I can flip between gcc's on my system by setting an alias for one or another.
By switching between gcc's I can generate kernels that do boot and kernels that
do not.
I also tried building gcc from the RPM sources using the ./configure options
shown in this output:
[root@altix3 ~]# /usr/bin/gcc -v
Using built-in specs.
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man
--infodir=/usr/share/info --enable-shared --enable-threads=posix
--enable-checking=release --with-system-zlib --enable-__cxa_atexit
Thread model: posix
gcc version 4.1.1 20060711 (Red Hat 4.1.1-7)
I used this self-built RPM version to build a kernel and the resultant kernel
booted without any issues.
It clearly looks like gcc is the culprit, or at least some mismatch of gcc and
libraries. Jakub, any ideas on what else to try?
This problem has since been resolved and verified.