Bug 175616 - [RHEL 4 U2] kernel panic on EM64T with long cmdline args
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: high
Assigned To: Jim Paradis
QA Contact: Brian Brock
Blocks: 181409 185624
Reported: 2005-12-13 08:45 EST by Brian Long
Modified: 2013-08-05 21:17 EDT
CC List: 5 users

Fixed In Version: RHSA-2006-0575
Doc Type: Bug Fix
Last Closed: 2006-08-10 17:44:45 EDT


Attachments
HP DL360 G4 during kickstart of RHEL 4 U2 (94.36 KB, image/png)
2005-12-13 08:46 EST, Brian Long
HP DL380 G4 GRUB boot with 2.6.9-22.0.1.EL UP (79.00 KB, image/png)
2005-12-13 08:47 EST, Brian Long
DL585 Quad Opteron panic (78.90 KB, image/png)
2005-12-13 11:58 EST, Brian Long

Description Brian Long 2005-12-13 08:45:51 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20050921 Red Hat/1.0.7-1.4.1 Firefox/1.0.7

Description of problem:
I am kickstarting from the extracted RHEL 4 U2 ISO images, although this problem happens outside of kickstart as well.  The kernel normally allows up to 255 characters of boot parameters.  In kickstart, these boot parameters are passed to loader and Anaconda.  On any system except the HP DL360 G4 and DL380 G4 (both dual Xeon EM64T), the following kernel command line is acceptable (from syslinux.cfg or pxelinux.cfg):

append initrd=pxe-tftp.esl.cisco.com::images/releases/5.00b4-4/x86_64/initrd.img ks=http://wwwin-kickstart-dev/cgi-bin/pxe/getkscfg?macaddr=00:12:79:3d:f7:d8 ramdisk_size=8192 ksdevice=00:12:79:3d:f7:d8  nostorage

As you can see, the kernel append line is 207 characters.  The x86_64 kickstart environment crashes the kernel with this append line.  The i686 kickstart works fine.  If I shorten the append line to 176 characters or less, the kickstart on x86_64 completes successfully.

This appears to be an x86_64 kernel issue only on the EM64T platform running the UP kernel.  If I pass a long command line to the SMP x86_64 kernel, the host boots without problems.

Please note, I can reproduce this on a running system as well.  If I configure GRUB to boot the 2.6.9-22.EL UP or 2.6.9-22.0.1.EL UP kernel and specify a long command line, it will fail.  For example, if I specify:

ro root=LABEL=/ quiet quiet quiet quiet quiet quiet quiet quiet quiet quiet quiet quiet quiet quiet quiet quiet quiet quiet quiet quiet quiet quiet quiet quiet quiet quiet quiet quiet quiet quiet quiet quiet quiet quiet quiet quiet quiet quiet console=tty

The kernel will crash.  If I shorten it to 176 characters or fewer, the kernel will boot properly.

Version-Release number of selected component (if applicable):
kernel-2.6.9-22.EL

How reproducible:
Always

Steps to Reproduce:
1. Boot UP kernel on EM64T host
2. Specify command line 177 characters or longer
3. Kernel oops
  

Actual Results:  Kernel oops (screenshot attached)

Expected Results:  Kernel should boot fine.

Additional info:

This also happens in the 2.6.9-22.0.1.EL errata kernel.
Comment 1 Brian Long 2005-12-13 08:46:50 EST
Created attachment 122178 [details]
HP DL360 G4 during kickstart of RHEL 4 U2
Comment 2 Brian Long 2005-12-13 08:47:42 EST
Created attachment 122179 [details]
HP DL380 G4 GRUB boot with 2.6.9-22.0.1.EL UP
Comment 3 Brian Long 2005-12-13 08:59:51 EST
This may be related to
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=164643 which was never
resolved from U1.
Comment 4 Brian Long 2005-12-13 11:58:07 EST
Created attachment 122183 [details]
DL585 Quad Opteron panic

As it turns out, I am able to duplicate this problem on an HP DL585 quad
Opteron server when kickstarting with the x86_64 UP kernel.  However, other
Opterons (Sun V20z and V40z) do not have this problem.

However, unlike the EM64T systems, once the DL585 has the x86_64 OS installed,
I am able to boot the UP kernel with a 255-character command-line and it does
not crash.
Comment 5 Jim Paradis 2005-12-15 23:01:22 EST
I don't think it's related to Bug 164643; that one is a NULL pointer
dereference.  This one is different.

Disassembling the kernel and looking at the faulting address, we see that
start_kernel() is trying to call late_time_init().  late_time_init() is in fact
a pointer to a routine, so it's stored in kernel data space.  Immediately
preceding it is "saved_command_line".  For some reason, although the main
command line buffer has been boosted to 2048 on x86_64, saved_command_line is
still at 256.  If you look carefully, the RAX register (which is supposed to
contain the "late_time_init" routine pointer or NULL) contains ASCII data.

Should be a straightforward fix.
Comment 6 Andy Gospodarek 2005-12-16 07:30:25 EST
I saw the same thing and actually have it in the Issue Tracker.  Sorry I didn't
update the BZ as well.  

It seems we are crashing in "init/main.c" line 544 (address 0xffffffff80539668)
because 'late_time_init' holds a bogus, non-NULL value (0x656c6f736e6f6320)
loaded into register %rax.  Here is the relevant part of the disassembled vmlinux:

ffffffff80539649:       00 00 00 00
ffffffff8053964d:       e8 be 16 01 00          callq  ffffffff8054ad10 <vfs_caches_init_early>
ffffffff80539652:       e8 87 e9 00 00          callq  ffffffff80547fde <mem_init>
ffffffff80539657:       e8 08 0d 01 00          callq  ffffffff8054a364 <kmem_cache_init>
ffffffff8053965c:       48 8b 05 bd d1 f6 ff    mov    -601667(%rip),%rax        # ffffffff804a6820 <late_time_init>
ffffffff80539663:       48 85 c0                test   %rax,%rax
ffffffff80539666:       74 02                   je     ffffffff8053966a <start_kernel+0x1ef>
ffffffff80539668:       ff d0                   callq  *%eax
ffffffff8053966a:       e8 91 29 bd ff          callq  ffffffff8010c000 <calibrate_delay>
ffffffff8053966f:       e8 d7 f8 00 00          callq  ffffffff80548f4b <pidmap_init>
ffffffff80539674:       e8 b5 0c 01 00          callq  ffffffff8054a32e <prio_tree_init>
ffffffff80539679:       e8 09 10 01 00          callq  ffffffff8054a687 <anon_vma_init>

Here is the matching source from "init/main.c":
      vfs_caches_init_early();
      mem_init();
      kmem_cache_init();
      numa_policy_init();
      if (late_time_init)
              late_time_init();
      calibrate_delay();
      pidmap_init();
      pgtable_cache_init();
      prio_tree_init();
      anon_vma_init();

When I dumped all sections of the kernel, I found that 'late_time_init' sits
right after 'saved_command_line'.

ffffffff804a6700 g     O .bss   0000000000000004 system_state
ffffffff804a6720 g     O .bss   0000000000000100 saved_command_line
ffffffff804a6820 g     O .bss   0000000000000008 late_time_init
ffffffff804a6828 l     O .bss   0000000000000008 execute_command

The data in late_time_init looks like the continuation of the ASCII string from
saved_command_line, so we should consider lengthening that buffer, or find some
other way to prevent more than 256 bytes from being copied into it.
Comment 7 Brian Long 2005-12-16 09:31:41 EST
I understand the U3 kernel is frozen with regard to features, but is this
bug severe enough to warrant a fix in the U3 candidate (i.e. U3 beta)?  Thanks!
Comment 8 Rick Beldin 2005-12-16 10:46:14 EST
The value 0x656c6f736e6f6320 turns out to be the ASCII text ' console'; it reads
backwards in the hex value because x86-64 stores integers little-endian.

If this is from the command line above

ro root=LABEL=/ quiet quiet quiet quiet quiet quiet quiet quiet quiet quiet
quiet quiet quiet quiet quiet quiet quiet quiet quiet quiet quiet quiet quiet
quiet quiet quiet quiet quiet quiet quiet quiet quiet quiet quiet quiet quiet
quiet quiet console=tty

That line is 255 characters long (256 bytes with the terminating NUL), which
means that 'console' starts at about offset 245, inside the buffer rather than
past its end... Odd...
Comment 9 Brian Long 2005-12-16 11:12:10 EST
Remove a few of the quiet words and the kernel will still oops.  It will oops
all the way down to 176 characters.
Comment 18 Jim Paradis 2006-03-16 16:42:00 EST
The patch in Comment 15 was posted to rhkernel-list on 16 Dec 2005 and is a
candidate for inclusion in U4.
Comment 20 Bob Johnson 2006-04-11 12:41:58 EDT
This issue is on Red Hat Engineering's list of planned work items 
for the upcoming Red Hat Enterprise Linux 4.4 release.  Engineering 
resources have been assigned and barring unforeseen circumstances, Red 
Hat intends to include this item in the 4.4 release.
Comment 22 Jason Baron 2006-05-03 13:02:07 EDT
Committed in stream U4 build 34.28. A test kernel with this patch is available
from http://people.redhat.com/~jbaron/rhel4/
Comment 25 Red Hat Bugzilla 2006-08-10 17:44:46 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0575.html
Comment 27 Linda Wang 2006-12-06 13:31:42 EST
*** Bug 185524 has been marked as a duplicate of this bug. ***
