Bug 175616
Summary: [RHEL 4 U2] kernel panic on EM64T with long cmdline args
Product: Red Hat Enterprise Linux 4
Reporter: Brian Long <brilong>
Component: kernel
Assignee: Jim Paradis <jparadis>
Status: CLOSED ERRATA
QA Contact: Brian Brock <bbrock>
Severity: high
Priority: medium
Version: 4.0
CC: jbaron, k.georgiou, peterm, rick.beldin, tao
Hardware: x86_64
OS: Linux
Fixed In Version: RHSA-2006-0575
Doc Type: Bug Fix
Last Closed: 2006-08-10 21:44:45 UTC
Bug Blocks: 181409, 185624

Description
Brian Long
2005-12-13 13:45:51 UTC
Created attachment 122178
HP DL360 G4 during kickstart of RHEL 4 U2

Created attachment 122179
HP DL380 G4 GRUB boot with 2.6.9-22.0.1.EL UP

This may be related to https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=164643, which was never resolved from U1.

Created attachment 122183
DL585 Quad Opteron panic

As it turns out, I am able to duplicate this problem on an HP DL585 quad-Opteron
server when trying to kickstart with the x86_64 UP kernel. However, other
Opterons (Sun V20z and V40z) do not have this problem.
Unlike the EM64T systems, though, once the DL585 has the x86_64 OS installed,
I am able to boot the UP kernel with a 255-character command line and it does
not crash.

I don't think it's related to Bug 164643; that one is a NULL pointer dereference. This one is different. Disassembling the kernel and looking at the faulting address, we see that start_kernel() is trying to call late_time_init(). late_time_init() is in fact a pointer to a routine, so it's stored in kernel data space. Immediately preceding it is "saved_command_line". For some reason, although the main command line buffer has been boosted to 2048 on x86_64, saved_command_line is still at 256. If you look carefully, the RAX register (which is supposed to contain the "late_time_init" routine pointer or NULL) seems to contain ASCII data. Should be a straightforward fix.

I saw the same thing and actually have it in the Issue Tracker. Sorry I didn't update the BZ as well. It seems we are crashing in "init/main.c" line 544 (address 0xffffffff80539668) because 'late_time_init' is set to a bogus value (0x656c6f736e6f6320, which is not NULL) stored in register %rax. Here is the vmlinux disassembled.

    ffffffff80539649:  00 00 00 00
    ffffffff8053964d:  e8 be 16 01 00        callq  ffffffff8054ad10 <vfs_caches_init_early>
    ffffffff80539652:  e8 87 e9 00 00        callq  ffffffff80547fde <mem_init>
    ffffffff80539657:  e8 08 0d 01 00        callq  ffffffff8054a364 <kmem_cache_init>
    ffffffff8053965c:  48 8b 05 bd d1 f6 ff  mov    -601667(%rip),%rax  # ffffffff804a6820 <late_time_init>
    ffffffff80539663:  48 85 c0              test   %rax,%rax
    ffffffff80539666:  74 02                 je     ffffffff8053966a <start_kernel+0x1ef>
    ffffffff80539668:  ff d0                 callq  *%eax
    ffffffff8053966a:  e8 91 29 bd ff        callq  ffffffff8010c000 <calibrate_delay>
    ffffffff8053966f:  e8 d7 f8 00 00        callq  ffffffff80548f4b <pidmap_init>
    ffffffff80539674:  e8 b5 0c 01 00        callq  ffffffff8054a32e <prio_tree_init>
    ffffffff80539679:  e8 09 10 01 00        callq  ffffffff8054a687 <anon_vma_init>

Here is the source that matches it from "init/main.c"

    vfs_caches_init_early();
    mem_init();
    kmem_cache_init();
    numa_policy_init();
    if (late_time_init)
        late_time_init();
    calibrate_delay();
    pidmap_init();
    pgtable_cache_init();
    prio_tree_init();
    anon_vma_init();

When I dumped all the sections of the kernel, I found that 'late_time_init' sits right after 'saved_command_line'.

    ffffffff804a6700 g O .bss 0000000000000004 system_state
    ffffffff804a6720 g O .bss 0000000000000100 saved_command_line
    ffffffff804a6820 g O .bss 0000000000000008 late_time_init
    ffffffff804a6828 l O .bss 0000000000000008 execute_command

The data in late_time_init looks like the continuation of the ASCII string from saved_command_line, so we should consider lengthening that buffer, or find some other way to prevent more than 256 bytes from being copied into it.

So I understand the U3 kernel is frozen with regard to features, but is this bug bad enough to warrant being fixed in the U3 candidate (i.e. U3 beta)? Thanks!

The value 0x656c6f736e6f6320 turns out to be 'console' (backwards - endianness?). If this is from the command line above

    ro root=LABEL=/ quiet quiet quiet quiet quiet quiet quiet quiet quiet quiet quiet quiet quiet quiet quiet quiet quiet quiet quiet quiet quiet quiet quiet quiet quiet quiet quiet quiet quiet quiet quiet quiet quiet quiet quiet quiet quiet quiet console=tty

That line is 256 chars long, which means that 'console' is overwriting at about 245... Odd...

Remove a few of the quiet words and the kernel will still oops. It will oops all the way down to 176 characters.

The patch in Comment 15 has been posted to rhkernel-list on 16 Dec 2005 and is a candidate for inclusion in U4.

This issue is on Red Hat Engineering's list of planned work items for the upcoming Red Hat Enterprise Linux 4.4 release. Engineering resources have been assigned and barring unforeseen circumstances, Red Hat intends to include this item in the 4.4 release.

Committed in stream U4 build 34.28. A test kernel with this patch is available from http://people.redhat.com/~jbaron/rhel4/

An advisory has been issued which should help the problem described in this bug report.
This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHSA-2006-0575.html

*** Bug 185524 has been marked as a duplicate of this bug. ***
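
For readers following the analysis in this report, here is a minimal user-space sketch of the failure mode: a 256-byte buffer sitting directly in front of an 8-byte pointer slot, an unchecked copy that spills into the slot, and a bounded copy that does not. It is an illustration only, not the patch that was committed. The struct bss_sketch and the helpers decode_le64(), unchecked_copy() and bounded_copy() are hypothetical names invented for this example; in the real kernel, saved_command_line and late_time_init are separate globals that happen to be adjacent in .bss, and the pointer slot is modelled here as a raw 64-bit integer.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SAVED_LEN 256  /* saved_command_line size from the symbol dump (0x100) */

/* Hypothetical stand-in for the two adjacent .bss objects. */
struct bss_sketch {
    char saved_command_line[SAVED_LEN];
    uint64_t late_time_init;  /* pointer slot, kept as raw bits (assumption) */
};

/* Render a 64-bit value as the 8 bytes it occupies in little-endian memory,
 * lowest byte first; non-printable bytes are shown as '.'. */
void decode_le64(uint64_t value, char out[9])
{
    assert(out != NULL);
    for (int i = 0; i < 8; i++) {
        unsigned char byte = (unsigned char)(value >> (8 * i));
        out[i] = isprint(byte) ? (char)byte : '.';
    }
    out[8] = '\0';
}

/* Simulate a copy that ignores the buffer size: bytes past SAVED_LEN land in
 * the pointer slot. The copy is clamped to the struct so the demo itself
 * never writes outside its own object. */
void unchecked_copy(struct bss_sketch *b, const char *cmdline)
{
    size_t n = strlen(cmdline) + 1;
    if (n > sizeof(*b))
        n = sizeof(*b);
    memset(b, 0, sizeof(*b));
    memcpy(b, cmdline, n);
}

/* Bounded alternative: truncate to the buffer and always NUL-terminate,
 * leaving the pointer slot untouched (still zero, i.e. NULL). */
void bounded_copy(struct bss_sketch *b, const char *cmdline)
{
    memset(b, 0, sizeof(*b));
    snprintf(b->saved_command_line, sizeof(b->saved_command_line), "%s", cmdline);
}
```

Calling decode_le64(0x656c6f736e6f6320, buf) fills buf with " console" (with a leading space), which is why the register value reads as 'console' backwards on a little-endian machine. Lengthening the buffer, the other option raised above, would avoid the truncation that bounded_copy() accepts.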