Bug 238768

Summary: can't determine file type of vmlinux-2.6.20-1.2948.fc6kdump
Product: [Fedora] Fedora Reporter: Zing <zing>
Component: kexec-tools Assignee: Neil Horman <nhorman>
Status: CLOSED WONTFIX QA Contact:
Severity: medium Docs Contact:
Priority: medium    
Version: 6 CC: davej, gdelx001, japj, rhbz001, triage
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Whiteboard: bzcl34nup
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2008-05-06 19:33:47 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 241362    
Attachments:
Description                                      Flags
strace of kexec command                          none
test srpm for x86_64                             none
latest fc6 kernel                                none
dump of debug run that worked                    none
patch to retry locate_hole if there is a segment overlap and to limit elfcorehdr buffer to max mmap end    none

Description Zing 2007-05-02 20:38:32 UTC
Description of problem:
I've just attempted to set up kdump and it errors out with:

Cannot determine the file type of /boot/vmlinux-2.6.20-1.2948.fc6kdump

Version-Release number of selected component (if applicable):
kexec-tools-1.101-51.fc6
kernel-kdump-2.6.20-1.2948.fc6

How reproducible:
always

Steps to Reproduce:
0. configure kdump.conf
1. service kdump start
  
Actual results:
above error

Expected results:
no error

Additional info:
my /proc/cmdline:
"ro root=LABEL=/ crashkernel=128@16M"

Comment 1 Neil Horman 2007-05-03 10:59:21 UTC
can you please post the contents of your /etc/sysconfig/kdump file, and do you
by any chance know the last kernel that kexec-tools-1.101-51 did happen to work
with?

Comment 2 Zing 2007-05-03 23:57:49 UTC
no problem.  all the vars in /etc/sysconfig/kdump were empty:

KDUMP_KERNELVER=""
KDUMP_COMMANDLINE=""
KEXEC_ARGS=""

the kdump kernels available to me to test with were:

kernel-kdump-2.6.20-1.2948.fc6
kernel-kdump-2.6.20-1.2944.fc6
kernel-kdump-2.6.20-1.2933.fc6
kernel-kdump-2.6.18-1.2798.fc6

I tested each one by installing it, setting $KDUMP_KERNELVER, and then
doing a "service kdump start".

all the 2.6.20 kernels fail with "Cannot determine file type..."
the 2.6.18 fails with "Invalid memory segment 0x1000000 - 0x1262fff"

Comment 3 Neil Horman 2007-05-04 14:33:31 UTC
Looks like our reads of the section headers from the new kdump kernel images
produce invalid values.  Either the kernel isn't formatting the section headers
properly, or we are reading them incorrectly (my guess is the latter).

Comment 4 Neil Horman 2007-05-04 14:44:41 UTC
Update to previous: it turns out it's the kernel that's borked.  The output
of readelf -S /boot/vmlinux-2.6.20-1.2933.fc6kdump indicates that the section
headers in the kernel image are mangled pretty badly.  Looking into why.


Comment 5 Neil Horman 2007-05-04 17:19:22 UTC
Ok, it appears that the relocatable patch set has reached the fc6 kernels, which
means the kdump kernel for i686 is not completely an elf file (it has some
elf headers, so it is detected as elf, but it fails later tests).  We should be
able to just use the vmlinuz file on this system now, but to do that I'll need
to port over the --args-linux patch and the bzImage patch.  I'll get to those shortly.
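A rough sketch of the distinction described in comment 5, assuming a two-stage probe: the first check (the 4-byte elf magic) passes on such an image, while a later sanity check (that the section header table declared in the elf header actually fits inside the file) can still fail. The function `probe_elf` and its return strings are hypothetical; this is not kexec-tools' actual file-type detection.

```python
import struct

ELF_MAGIC = b"\x7fELF"

# Header layouts after e_ident: e_type, e_machine, e_version, e_entry,
# e_phoff, e_shoff, e_flags, e_ehsize, e_phentsize, e_phnum,
# e_shentsize, e_shnum, e_shstrndx.
ELF32_HDR = "16sHHIIIIIHHHHHH"   # 52 bytes
ELF64_HDR = "16sHHIQQQIHHHHHH"   # 64 bytes


def probe_elf(data: bytes) -> str:
    """Hypothetical two-stage probe: magic first, then a section-header sanity check."""
    if len(data) < 16 or data[:4] != ELF_MAGIC:
        return "not elf"
    layouts = {1: ELF32_HDR, 2: ELF64_HDR}   # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    byte_orders = {1: "<", 2: ">"}            # EI_DATA: 1 = little, 2 = big endian
    if data[4] not in layouts or data[5] not in byte_orders:
        return "bad ident"
    fmt = byte_orders[data[5]] + layouts[data[4]]
    if len(data) < struct.calcsize(fmt):
        return "truncated header"
    fields = struct.unpack_from(fmt, data)
    e_shoff, e_shentsize, e_shnum = fields[6], fields[11], fields[12]
    # The "later test": the declared section header table must lie inside the file.
    if e_shnum and e_shoff + e_shentsize * e_shnum > len(data):
        return "bad section headers"
    return "ok"


if __name__ == "__main__":
    # A 32-bit little-endian header whose section table points past the end of the data.
    ident = b"\x7fELF\x01\x01\x01" + b"\x00" * 9
    hdr = struct.pack("<" + ELF32_HDR, ident, 2, 3, 1, 0, 52, 0x100000,
                      0, 52, 32, 0, 40, 10, 0)
    print(probe_elf(hdr))  # bad section headers
```

A real loader does far more validation; the point is only that "looks like elf" and "is a usable elf image" are separate checks.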


Comment 6 Neil Horman 2007-05-04 17:39:00 UTC
Actually, there is an easier way to do this.  Can you please update your system
to kexec-tools-1.101-69.fc7?  That should set you up to use the relocatable vmlinuz
image and includes all the appropriate patches.  If that works for you I can just
do a wholesale update of the latest kexec-tools into fc6.  Thanks!

Comment 7 Zing 2007-05-04 18:03:47 UTC
hmmm, no go:  /var/log/messages:

kdump: kexec: failed to load kdump kernel
kdump: failed to start up

I dug out the kexec command line, ran it by hand and straced it... I'll
attach the log (I also appended the command line to the end of the log)


Comment 8 Zing 2007-05-04 18:04:51 UTC
Created attachment 154149 [details]
strace of kexec command

Comment 9 Neil Horman 2007-05-04 18:35:46 UTC
Can you tell me a bit more about this system?  Is it an x86 or an x86_64 system?
 How much memory is there in this system?

Comment 10 Zing 2007-05-04 19:09:14 UTC
this is an Intel Xeon dual-processor blade (8843-E9U) in an IBM BladeCenter H
(8852-4XU).  It's got ~6 GB of memory (/proc/meminfo).  I tried both
kernel-2.6.20-1.2948.fc6 and kernel-PAE-2.6.20-1.2948.fc6, but both fail in the
same way (AFAICT).

If you'd like more detailed output from dmesg et al., I can get that for you.


Comment 11 Zing 2007-05-04 19:12:55 UTC
forgot to add... I just tried the fc7 kexec-tools on an fc6 x86 IBM ThinkCentre
desktop (~512MB memory) I have here and that loaded kdump "OK".

Comment 12 Neil Horman 2007-05-04 19:30:31 UTC
Yeah, I figured.  Your Xeon is an x86_64 machine, and it apparently still needs
the kdump (vmlinux) file to boot, rather than the vmlinuz file.  Nominally this
is set automatically in /etc/sysconfig/kdump, but the upgrade process in the spec
file may be a bit broken.  Can you please check /etc/sysconfig/kdump and make
sure that KDUMP_IMG is set to vmlinux rather than vmlinuz?

Thanks!
Neil


Comment 13 Zing 2007-05-04 20:45:20 UTC
sorry, no go.  got back the "Cannot determine file type..." error.

kexec-tools-1.101-69.fc7

/etc/sysconfig/kdump:

KDUMP_KERNELVER="2.6.20-1.2948.fc6kdump"
KDUMP_COMMANDLINE=""
KDUMP_COMMANDLINE_APPEND="irqpoll maxcpus=1"
KEXEC_ARGS=" --args-linux"
KDUMP_BOOTDIR="/boot"
KDUMP_IMG="vmlinux"
KDUMP_IMG_EXT=""

kexec command:

$ /sbin/kexec --args-linux --elf64-core-headers -p \
    --command-line="ro root=LABEL=/ irqpoll maxcpus=1" \
    --initrd=/boot/initrd-2.6.20-1.2948.fc6kdumpkdump.img \
    /boot/vmlinux-2.6.20-1.2948.fc6kdump
Cannot determine the file type of /boot/vmlinux-2.6.20-1.2948.fc6kdump


Comment 14 Neil Horman 2007-05-07 19:51:12 UTC
interesting.  Ok, I was wrong before: you do need the kernel-kdump package's
vmlinux file to load with kdump on x86_64.  I'm currently testing with
1.2929.fc6 here and after adding some debug code to kexec I've noticed something
interesting: we're getting permission denied errors when trying to open
/proc/kcore, even with selinux disabled and running as root.  Looking at the
2.6.20-1.2949.fc6 code base I see that the open_kdump function has been modified
from upstream in the devmem patch such that open_kdump unilaterally returns
-EPERM, meaning the file is forever unreadable.  Davej, can you comment on why
this was done?  Is it in error, or do I need to find another way for kexec to
grab the info it needs about the running kernel?  Thanks much!

Comment 15 Dave Jones 2007-05-15 17:22:56 UTC
you mean open_kcore, I assume?

I'm not sure why it was done that way.
It was done before we added kdump support, so I guess it just wasn't planned ahead.

For similar reasons, 'crash' broke because of this, and instead of making
/proc/kcore read-only, we added a separate read-only /dev/crash


Comment 16 Neil Horman 2007-05-16 13:40:05 UTC
Yes, sorry, open_kcore is what I meant.  So we should be reading /dev/crash
instead.  I'll put a patch together to that end.  Thanks, Dave.

Comment 17 Neil Horman 2007-05-16 14:24:03 UTC
Created attachment 154826 [details]
test srpm for x86_64

Here's an srpm with a fix for the problem Dave described.  Please build/test it
and let me know if it solves your remaining problem.  Thanks!

Comment 18 Zing 2007-05-17 19:09:20 UTC
I still get "Cannot determine the file type...", but could it be because I'm
running fedora x86 on this machine?  My uname -m returns 'i686', and building
the srpm only builds the x86 arch tree on this machine.  I also don't see an
open of "/dev/crash" in an strace of the kexec.

Comment 19 Neil Horman 2007-05-17 20:21:01 UTC
um, yeah.  I wish I'd known that previously.  So are you telling me that you're
running an x86 kernel with an x86 kexec-tools package, but you're trying to get
kexec-tools to load a 64 bit kernel from the x86_64 kernel-kdump package?  That's
not going to work.  I'd suggest we go back to the fc7 kexec package, make sure
that kdump points to the running vmlinuz image (which it should by default if
you deconfigure everything that you had configured before in the sysconfig file),
and then capture from the startup script the errors that kexec spits out.  That's
the case that we need to fix here.  Using a 32 bit kexec-tools package to load a
64 bit kdump kernel isn't going to fly.

Comment 20 Zing 2007-05-18 00:20:14 UTC
sorry for the confusion.  I'm running an x86 kernel with x86 kexec-tools, and was
trying to load the x86 kernel-kdump.

# uname -a
Linux host-new 2.6.20-1.2948.fc6PAE #1 SMP Fri Apr 27 19:30:08 EDT 2007 i686 i686 i386 GNU/Linux

Ok, I went back to the fc7 kexec-tools package:

# rpm -q kexec-tools
kexec-tools-1.101-69.fc7
# rpm --qf '%{ARCH}' -q kexec-tools
i386

contents of /etc/sysconfig/kdump:
KDUMP_KERNELVER=""
KDUMP_COMMANDLINE=""
KDUMP_COMMANDLINE_APPEND="irqpoll maxcpus=1"
KEXEC_ARGS=" --args-linux"
KDUMP_BOOTDIR="/boot"
KDUMP_IMG="vmlinuz"
KDUMP_IMG_EXT=""

$ service kdump start
Detected /etc/kdump.conf or /boot/vmlinuz-2.6.20-1.2948.fc6PAE change
Rebuilding /boot/initrd-2.6.20-1.2948.fc6PAEkdump.img
Starting kdump:                                            [FAILED]

error in /var/log/messages:
kdump: kexec: failed to load kdump kernel
kdump: failed to start up

here is the kexec command the script tried to run which returns an error:
# /sbin/kexec --args-linux --elf64-core-headers -p \
    --command-line="ro root=LABEL=/  irqpoll maxcpus=1" \
    --initrd=/boot/initrd-2.6.20-1.2948.fc6PAEkdump.img \
    /boot/vmlinuz-2.6.20-1.2948.fc6PAE
Cannot load /boot/vmlinuz-2.6.20-1.2948.fc6PAE

Next, I try the following in /etc/sysconfig/kdump:
KDUMP_KERNELVER="2.6.20-1.2948.fc6kdump"
KDUMP_KERNELVER=""
KDUMP_COMMANDLINE=""
KDUMP_COMMANDLINE_APPEND="irqpoll maxcpus=1"
KEXEC_ARGS=" --args-linux"
KDUMP_BOOTDIR="/boot"
KDUMP_IMG="vmlinux"
KDUMP_IMG_EXT=""

# service kdump start
Detected /etc/kdump.conf or /boot/vmlinux-2.6.20-1.2948.fc6kdump change
Rebuilding /boot/initrd-2.6.20-1.2948.fc6kdumpkdump.img
Starting kdump:                                            [FAILED]

error in /var/log/messages:
kdump: kexec: failed to load kdump kernel
kdump: failed to start up

here is the kexec command that tries to run with following error:
# /sbin/kexec --args-linux --elf64-core-headers -p \
    --command-line="ro root=LABEL=/  irqpoll maxcpus=1" \
    --initrd=/boot/initrd-2.6.20-1.2948.fc6kdumpkdump.img \
    /boot/vmlinux-2.6.20-1.2948.fc6kdump
Cannot determine the file type of /boot/vmlinux-2.6.20-1.2948.fc6kdump


Comment 21 Neil Horman 2007-05-18 14:06:17 UTC
Ok, since we're doing x86 here rather than x86_64, I set this up again locally and
it appears that we're back to what I was seeing in comment #4.  We need to
understand why that's happening.

Comment 22 Neil Horman 2007-05-18 14:29:56 UTC
No, scratch that last comment.  Man, this gets confusing.  With the addition of
the relocatable packages, if you are using x86 kernel images, you don't need to
use the kdump images at all.  And it appears that you are running the kdump
kernel as your normal kernel (as evidenced by the fact that your initrd is
named initrd-2.6.20-1.2948.fc6kdumpkdump.img rather than
initrd-2.6.20-1.2948.fc6kdump.img).  Please don't do that.  Just boot the normal
smp/up kernel that fc6 offers.  I just tried that here with the fc7 kexec-tools
(which is required now that we can do relocatability on fc6).

I'm going to try booting the PAE kernel here to see what that does.  My guess is
that it isn't working because fc7 needs an el5 patch that I never got around to
forward-porting to force 64 bit elf headers.  I can fix that shortly if that's
the case, and move all the fc7 code to fc6.


Comment 23 Neil Horman 2007-05-18 14:45:30 UTC
Results from me running kdump on my x86 fc6 PAE kernel here using
kexec-tools-1.101-69.fc7:

[root@hmssabre ~]# /sbin/service kdump start
No kdump initial ramdisk found.                            [WARNING]
Rebuilding /boot/initrd-2.6.20-1.2952.fc6PAEkdump.img
Starting kdump:                                            [  OK  ]

So at this point both the normal kernel and the PAE kernel are loading fine
for me with the fc7 kexec-tools package, and the kdump kernel needs to be
discontinued for x86, so it can be ignored.  I'm beginning to think that all our
reinstalls have left your configuration in some unstable state.  I'd suggest
erasing your kexec-tools package entirely (rpm -e), manually removing the
/etc/kdump.conf and /etc/sysconfig/kdump files, and then installing the fc7
kexec-tools package (make sure it's -69.fc7, which should be the latest), then
restarting the service.  It should just work, as it just did for me.

Comment 24 Zing 2007-05-18 15:55:12 UTC
ok, I removed kexec-tools, kernel-kdump and cleaned out all the config files,
all kdump files, rebooted, started fresh with the latest fc7 kexec-tools:

Using the default installed /etc/sysconfig/kdump, I get:
# service kdump start
No kdump initial ramdisk found.                            [WARNING]
Rebuilding /boot/initrd-2.6.20-1.2948.fc6PAEkdump.img
Starting kdump:                                            [FAILED]

/var/log/messages:
kdump: kexec: failed to load kdump kernel
kdump: failed to start up

the kexec command line:
/sbin/kexec --args-linux --elf64-core-headers -p \
    --command-line="ro root=LABEL=/  irqpoll maxcpus=1" \
    --initrd=/boot/initrd-2.6.20-1.2948.fc6PAEkdump.img \
    /boot/vmlinuz-2.6.20-1.2948.fc6PAE
Cannot load /boot/vmlinuz-2.6.20-1.2948.fc6PAE

I tried the non-PAE kernel and I get the same error.  If it's any help, I tried
the load option (-l) to kexec and that seemed to work:

# /sbin/kexec --args-linux --elf64-core-headers -l \
    --command-line="ro root=LABEL=/ irqpoll maxcpus=1" \
    --initrd=/boot/initrd-2.6.20-1.2948.fc6PAEkdump.img \
    /boot/vmlinuz-2.6.20-1.2948.fc6PAE
# cat /sys/kernel/kexec_loaded
1
# /sbin/kexec --args-linux --elf64-core-headers -u \
    --command-line="ro root=LABEL=/ irqpoll maxcpus=1" \
    --initrd=/boot/initrd-2.6.20-1.2948.fc6PAEkdump.img \
    /boot/vmlinuz-2.6.20-1.2948.fc6PAE
# cat /sys/kernel/kexec_loaded
0
# cat /proc/cmdline
ro root=LABEL=/ crashkernel=128@16M

Don't know why -p wouldn't work... I noticed you have a 2.6.20-1.2952.fc6PAE
version kernel, but I can't find that on the download server... don't know if
that matters though.


Comment 25 Neil Horman 2007-05-18 17:13:58 UTC
Created attachment 155015 [details]
latest fc6 kernel

here you go.  Give it a shot and let me know how it works.  Both it and the PAE
version worked fine for me.

Comment 26 Zing 2007-05-18 19:27:20 UTC
no go...

# service kdump start
Detected /etc/kdump.conf or /boot/vmlinuz-2.6.20-1.2952.fc6 change
Rebuilding /boot/initrd-2.6.20-1.2952.fc6kdump.img
Starting kdump:                                            [FAILED]

from kexec run manually:
Cannot load /boot/vmlinuz-2.6.20-1.2952.fc6

I can see the crash kernel memory reserved:
# cat /proc/iomem
00000000-0009d3ff : System RAM
0009d400-0009ffff : reserved
000a0000-000bffff : Video RAM area
000c0000-000c8fff : Video ROM
000c9000-000ccfff : Adapter ROM
000f0000-000fffff : System ROM
00100000-d7fb057f : System RAM
  00400000-00623830 : Kernel code
  00623831-00745493 : Kernel data
  01000000-08ffffff : Crash kernel
d7fb0580-d7fcffff : ACPI Tables
d7fd0000-d7ffffff : reserved
db000000-dcffffff : PCI Bus #04
  db000000-dcffffff : PCI Bus #05
    dcfe0000-dcfeffff : 0000:05:01.1
      dcfe0000-dcfeffff : tg3
    dcff0000-dcffffff : 0000:05:01.0
      dcff0000-dcffffff : tg3
dd000000-deffffff : PCI Bus #02
  defe0000-defeffff : 0000:02:01.0
  deff0000-deffffff : 0000:02:01.0
e0000000-efffffff : reserved
f0000000-f7ffffff : PCI Bus #01
  f0000000-f7ffffff : 0000:01:01.0
f8000000-f8ffffff : PCI Bus #01
  f8000000-f800ffff : 0000:01:01.0
  f8020000-f803ffff : 0000:01:01.0
f9000000-f90fffff : PCI Bus #02
  f9000000-f90fffff : 0000:02:01.0
f9100000-f91003ff : 0000:00:1f.1
f9100400-f910040f : 0000:00:1d.4
  f9100400-f910040f : i6300ESB timer
fec00000-ffffffff : reserved
100000000-1a7ffffff : System RAM

... and doing kexec -l runs without errors.
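A minimal sketch for reading the /proc/iomem output above, assuming only the "start-end : name" line format shown: it confirms that the "Crash kernel" reservation is 0x01000000-0x08ffffff, i.e. 128 MiB starting at 16 MiB, and flags that the last System RAM range (100000000-1a7ffffff) ends above the 4 GiB boundary. `parse_iomem` is a hypothetical helper, and it ignores the indentation that shows nesting.

```python
from typing import Iterator, Tuple


def parse_iomem(text: str) -> Iterator[Tuple[int, int, str]]:
    """Yield (start, end, name) for 'start-end : name' lines; ends are inclusive."""
    for line in text.splitlines():
        line = line.strip()          # nesting is shown by indentation; ignored here
        if not line:
            continue
        span, _, name = line.partition(" : ")
        start_hex, _, end_hex = span.partition("-")
        yield int(start_hex, 16), int(end_hex, 16), name


# Three lines taken from the /proc/iomem output above.
SAMPLE = """\
00100000-d7fb057f : System RAM
  01000000-08ffffff : Crash kernel
100000000-1a7ffffff : System RAM
"""

if __name__ == "__main__":
    for start, end, name in parse_iomem(SAMPLE):
        size_mib = (end - start + 1) >> 20
        above_4g = end > 0xFFFFFFFF
        print(f"{name}: {start:#x}-{end:#x}, {size_mib} MiB, above 4 GiB: {above_4g}")
```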

Comment 27 Neil Horman 2007-05-21 18:07:45 UTC
Well, I'm not sure what to tell you.  It's pretty clear, when you say that -p
doesn't work but -l does, that there is one extra path in do_bzImage_load that
we go down, which must be where our error is occurring.  Oddly enough the error
doesn't occur for me on the latest kernel, and I can't imagine why it would for
you.  Please run this command line manually again (modified from your comment #24):
/sbin/kexec --args-linux --elf64-core-headers -d -p \
    --command-line="ro root=LABEL=/  irqpoll maxcpus=1" \
    --initrd=/boot/initrd-2.6.20-1.2948.fc6PAEkdump.img \
    /boot/vmlinuz-2.6.20-1.2948.fc6PAE

The extra -d should give us debug information that will hopefully explain why
your system is failing and mine is not.  Thanks

Comment 28 Zing 2007-05-23 14:56:14 UTC
ok, here's the output.  I needed to recompile with -DDEBUG, but I got this:

# /sbin/kexec --args-linux --elf64-core-headers -d -p \
    --command-line="ro root=LABEL=/ irqpoll maxcpus=1" \
    --initrd=/boot/initrd-2.6.20-1.2948.fc6.img \
    /boot/vmlinuz-2.6.20-1.2948.fc6PAE
bzImage is relocatable
Created backup segment at 0x8f60000
Created elf header segment at 0x8ffc000
Cannot load /boot/vmlinuz-2.6.20-1.2948.fc6PAE


Comment 29 Zing 2007-05-23 15:01:51 UTC
oops, here's the exact command you wanted to run... looks the same though:

# /sbin/kexec --args-linux --elf64-core-headers -d -p \
    --command-line="ro root=LABEL=/ irqpoll maxcpus=1" \
    --initrd=/boot/initrd-2.6.20-1.2948.fc6PAEkdump.img \
    /boot/vmlinuz-2.6.20-1.2948.fc6PAE
bzImage is relocatable
Created backup segment at 0x8f60000
Created elf header segment at 0x8ffc000
Cannot load /boot/vmlinuz-2.6.20-1.2948.fc6PAE



Comment 30 Zing 2007-05-23 15:13:18 UTC
ok, I got it to work when I rebooted with "mem=1024M"... so this has something
to do with memory size.  I'll attach the successful debug run...

Comment 31 Zing 2007-05-23 15:14:27 UTC
Created attachment 155257 [details]
dump of debug run that worked

this worked with kernel "mem=1024M"

Comment 32 Neil Horman 2007-05-23 20:14:03 UTC
Interesting, that would explain why it's working for me.  How much RAM do you
have in this box?  I wonder if your RAM size is overflowing the 32 bit pointers
we use for storing physical addresses.
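To illustrate the hypothesis above, assume a physical address is kept in a 32-bit field. Anything below 4 GiB (such as the crash kernel region) survives unchanged, but the top of the System RAM range above 4 GiB in comment 26 (0x1a7ffffff) loses its high bit and becomes 0xa7ffffff, which falls inside the ordinary System RAM range 00100000-d7fb057f, so the error would not be obvious. This only models the suspected overflow; `store_in_u32` is hypothetical and not taken from the kexec-tools source.

```python
MASK32 = 0xFFFFFFFF


def store_in_u32(addr: int) -> int:
    """Model what survives when a physical address is kept in a 32-bit unsigned field."""
    return addr & MASK32


if __name__ == "__main__":
    high_ram_end = 0x1A7FFFFFF          # last byte of RAM above 4 GiB (comment 26)
    low_ram = (0x00100000, 0xD7FB057F)  # System RAM range below 4 GiB (comment 26)
    truncated = store_in_u32(high_ram_end)
    print(hex(truncated))                          # 0xa7ffffff
    print(low_ram[0] <= truncated <= low_ram[1])   # True: aliases into low RAM
```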

Comment 33 Zing 2007-05-23 20:44:41 UTC
it's got 6GB... it's the machine in comment #10.  Here's what /proc/meminfo
looks like:

# cat /proc/meminfo
MemTotal:      6086708 kB
MemFree:       5899600 kB
Buffers:         15552 kB
Cached:         120052 kB
SwapCached:          0 kB
Active:          45440 kB
Inactive:       114044 kB
HighTotal:     5373632 kB
HighFree:      5224340 kB
LowTotal:       713076 kB
LowFree:        675260 kB
SwapTotal:     2031608 kB
SwapFree:      2031608 kB
Dirty:            1188 kB
Writeback:           0 kB
AnonPages:       23836 kB
Mapped:           8752 kB
Slab:            14920 kB
SReclaimable:     3744 kB
SUnreclaim:      11176 kB
PageTables:       1736 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
CommitLimit:   5074960 kB
Committed_AS:   164936 kB
VmallocTotal:   116728 kB
VmallocUsed:      3404 kB
VmallocChunk:   112700 kB
HugePages_Total:     0
HugePages_Free:      0
HugePages_Rsvd:      0
Hugepagesize:     2048 kB

fyi, it didn't matter whether I was running the PAE or non-PAE kernel; kexec still
failed.  Only when I reduced the memory size with the mem= kernel parameter did kexec succeed.
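The /proc/meminfo figures above are self-consistent: LowTotal + HighTotal = 713076 kB + 5373632 kB = 6086708 kB, which is MemTotal, and MemTotal is well above 4 GiB (4194304 kB), so most of this RAM is highmem and at least part of it must sit at physical addresses above 4 GiB. A small sketch of that check, assuming the "Name:   value kB" line format shown; `parse_meminfo` is a hypothetical helper.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Return {field: value} for 'Name:   value [kB]' lines (values as printed)."""
    info: Dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and rest.split():
            info[key.strip()] = int(rest.split()[0])
    return info


SAMPLE = """\
MemTotal:      6086708 kB
HighTotal:     5373632 kB
LowTotal:       713076 kB
HugePages_Total:     0
"""

if __name__ == "__main__":
    info = parse_meminfo(SAMPLE)
    four_gib_kb = 4 * 1024 * 1024
    print(info["LowTotal"] + info["HighTotal"] == info["MemTotal"])  # True
    print(info["MemTotal"] > four_gib_kb)                            # True
```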

Comment 34 Neil Horman 2007-05-24 12:19:28 UTC
hmm, I wonder if there is going to be much we can do about this...

Do you have 64 bit libraries installed on this system, or is it only 32 bit?  Is
it possible for you to install the 64 bit kexec-tools package on this system?

Comment 35 Zing 2007-05-24 16:52:31 UTC
hmmm.  This only has 32 bit libraries.  When you say install the 64 bit
kexec-tools, do you mean install the x86_64 distribution?  I can do this if
the current 32 bit install is a dead end.

Comment 36 Neil Horman 2007-05-24 17:45:28 UTC
you shouldn't need the entire 64 bit distro, just the 64 bit kexec package and
its supporting libraries.  You should just need to install the 64 bit glibc and
64 bit zlib, which should be able to co-exist with their 32 bit counterparts.

Comment 37 Zing 2007-05-24 20:07:11 UTC
1. hmm, doesn't look like rpm is happy about that:

# rpm -ivh kexec-tools-1.101-69.fc7.x86_64.rpm glibc-2.5-10.fc6.x86_64.rpm zlib-1.2.3-3.x86_64.rpm
Preparing...                ########################################### [100%]
        package glibc-2.5-10.fc6 is intended for a x86_64 architecture
        package glibc-2.5-10.fc6 is already installed
        package zlib-1.2.3-3 is intended for a x86_64 architecture
        package zlib-1.2.3-3 is already installed
        package kexec-tools-1.101-69.fc7 is intended for a x86_64 architecture
        file /sbin/ldconfig from install of glibc-2.5-10.fc6 conflicts with file from package glibc-2.5-10.fc6
...

I'm hesitant to --force install that.  If you have thoughts on that, I can try
them...

2. In an effort to see if I could get this to work on a pure x86_64 install, I
installed fc6 x86_64 on another identical blade.  Unfortunately I got this from
the srpm package in comment 17:

kernel: crash memory driver: !page_is_ram(pfn: 0)
kdump: read on /dev/crash of 4096 bytes failed: Bad address
Cannot read /dev/crash: Bad address
Cannot load /boot/vmlinux-2.6.20-1.2948.fc6kdump
kdump: kexec: failed to load kdump kernel
kdump: failed to start up

Sorry to bring up two issues; I can open a new bug report if you'd like.


Comment 38 Neil Horman 2007-05-24 20:22:34 UTC
crud, you're right.  In order to have 32 and 64 bit user space packages, you
need to be running the 64 bit kernel (since the 32 bit kernel can't handle 64
bit args in the system calls).  Can you install and run the 64 bit kernel?

Regarding your other problem, yes, if you could please open up another bug
against the kernel, that would be appreciated.  It's seeming more and more like
the relocatable changes have just completely borked kexec's ability to function
in fc6.  We can't read /proc/kcore, and we can't read /dev/crash, for what looks
to be an addressing problem.

Comment 39 Zing 2007-05-25 13:53:42 UTC
Doesn't look like that's possible, i get:

# rpm -ivh kernel-2.6.20-1.2948.fc6.x86_64.rpm
Preparing...                ########################################### [100%]
        package kernel-2.6.20-1.2948.fc6 is intended for a x86_64 architecture

Comment 40 Neil Horman 2007-05-25 14:59:51 UTC
No, it's possible, I'm sure of that.  grrrr.  I'm starting to wonder if this is
actually 64 bit hardware you're working with.

I'll see if I can locate an x86_64 system here with more than 4GB of RAM to
recreate this with.



Comment 42 Neil Horman 2007-05-29 20:15:47 UTC
Ok, I think I've got your problem recreated.  The segment map seems somehow
wrong when we have more than 1GB of RAM in a system, and it leads to the
elfcorehdr pointer in the x86 load_crashdump_segments() function pointing
outside the range defined in the memmap_p array.  This in turn leads to the
subsequent delete_memmap operation failing, which leads to kexec failure.  I
need to figure out why that's happening.
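A toy model of the failure described above (not the kexec-tools code; `delete_range` is a hypothetical stand-in for delete_memmap): memory is a list of inclusive ranges, and cutting a range out only works when it lies entirely inside one existing entry. Starting from the Crash kernel reservation in comment 26 and cutting out a 640K backup area at 0x8f60000 and then a header area at 0x8f5b000 (the 0x5000 header size is an assumption) leaves 0x1000000-0x8f5afff, which is the memmap=130412K@16384K region that shows up in comment 47. A header address such as 0x8ffc000 from comment 29 is then outside every remaining entry, so the cut fails.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # inclusive [start, end], as in /proc/iomem


def delete_range(ranges: List[Range], start: int, end: int) -> List[Range]:
    """Return a new list with [start, end] cut out of the entry that contains it.

    Raises ValueError when no single entry contains the whole range, which
    models the failed delete described above.
    """
    for i, (r_start, r_end) in enumerate(ranges):
        if r_start <= start and end <= r_end:
            pieces = []
            if r_start < start:
                pieces.append((r_start, start - 1))   # part left below the hole
            if end < r_end:
                pieces.append((end + 1, r_end))       # part left above the hole
            return ranges[:i] + pieces + ranges[i + 1:]
    raise ValueError(f"{start:#x}-{end:#x} is not inside any memory range")


if __name__ == "__main__":
    ranges = [(0x1000000, 0x8FFFFFF)]                    # Crash kernel region
    ranges = delete_range(ranges, 0x8F60000, 0x8FFFFFF)  # 640K backup segment
    ranges = delete_range(ranges, 0x8F5B000, 0x8F5FFFF)  # header, assumed 0x5000 bytes
    start, end = ranges[0]
    print(f"memmap={(end - start + 1) // 1024}K@{start // 1024}K")  # memmap=130412K@16384K
    try:
        delete_range(ranges, 0x8FFC000, 0x8FFFFFF)       # header inside the backup area
    except ValueError as err:
        print("delete failed:", err)
```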

Comment 44 Neil Horman 2007-05-30 18:24:48 UTC
Created attachment 155721 [details]
patch to retry locate_hole if there is a segment overlap and to limit elfcorehdr buffer to max mmap end

Ok, I think I've found a solution, at least to the problem that I recreated
here locally.  It should avoid the deletion of non-existent memory maps and the
overlapping segment problems that I've found.  Please build it and give it a
try.  Thanks!
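The "retry locate_hole if there is a segment overlap" part of the patch can be pictured with the debug output: in comment 29 the elf header segment was created at 0x8ffc000, which lies inside the backup segment at 0x8f60000 if that segment is 640K long (0x9000000 - 0x8f60000 = 0xa0000, assuming it runs up to the end of the crash kernel reservation). After the patch, comment 45 shows the header at 0x8f5b000, directly below the backup segment. The sketch below is a toy top-down search with retry, not kexec-tools' locate_hole(); `place_below`, the 0x1000 alignment, and the 0x5000 header size are assumptions chosen to reproduce that address.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # (start, size) in bytes


def overlaps(a_start: int, a_size: int, b_start: int, b_size: int) -> bool:
    """True if [a_start, a_start + a_size) and [b_start, b_start + b_size) intersect."""
    return a_start < b_start + b_size and b_start < a_start + a_size


def place_below(top: int, size: int, align: int, taken: List[Segment],
                bottom: int = 0) -> int:
    """Toy top-down hole search: try the highest aligned start below `top`;
    on overlap, retry just below the segment it collided with."""
    start = (top - size) & ~(align - 1)
    while start >= bottom:
        clash = next((s for s in taken if overlaps(start, size, *s)), None)
        if clash is None:
            return start
        start = (clash[0] - size) & ~(align - 1)   # retry below the clash
    raise ValueError("no free hole large enough")


if __name__ == "__main__":
    backup = (0x8F60000, 0xA0000)                  # backup segment, assumed 640K
    print(overlaps(0x8FFC000, 0x1000, *backup))   # True: comment 29's placement
    print(hex(place_below(0x9000000, 0x5000, 0x1000, [backup])))  # 0x8f5b000
```

Each retry moves the candidate strictly below the segment it hit, so the loop always terminates, either with a free hole or with an error once it drops below `bottom`.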

Comment 45 Zing 2007-05-30 20:48:03 UTC
Yay!

# service kdump start
bzImage is relocatable
Created backup segment at 0x8f60000
Created elf header segment at 0x8f5b000
Loaded purgatory at addr 0x8f52000
Loaded real-mode code and command line at 0x1000000
Loaded 32bit kernel at 0x1400000
initrd_addr_max is 0x1fffffff
Loaded initrd at 0x8ca5000 size 0x2aca85
Starting kdump:                                            [  OK  ]

Comment 46 Neil Horman 2007-05-31 12:14:41 UTC
do me a favor and make sure that you can still kexec boot on crash using this
patch (my test system is unavailable at the moment).  Thanks!

Comment 47 Zing 2007-05-31 14:50:16 UTC
Ok, I'm not sure if I'm doing this right but:

# echo "c" > /proc/sysrq-trigger

I can see the kexec kernel boot up, but at the point where kdump gets started it
fails (this may be normal when booting the kexec kernel?):

/var/log/messages:
kernel: Kernel command line: ro root=LABEL=/  irqpoll maxcpus=1 memmap=exactmap memmap=640K@0K memmap=130412K@16384K elfcorehdr=146796K
May 31 10:30:03 xxx-new kernel: Misrouted IRQ fixup and polling support enabled
May 31 10:30:03 xxx-new kdump: No crashkernel parameter specified for running kernel
May 31 10:30:03 xxx-new kernel: This may significantly impact system performance
May 31 10:30:03 xxx-new kdump: failed to start up

my /etc/kdump.conf:
ext3 /dev/mainvg/varlv

and my /var/crash directory is empty (an image file of vmcore should be there?),
but the system is up and running in the kexec kernel AFAIK.
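The kdump kernel's command line above lines up with the kexec debug output in comment 45: memmap=130412K@16384K covers 16384K up to 16384K + 130412K = 146796K, which is the elfcorehdr=146796K address, and 146796 * 1024 = 0x8f5b000, the "Created elf header segment" address kexec printed. memmap=640K@0K is likewise the same size as the gap between the backup segment at 0x8f60000 and the 0x9000000 end of the crash kernel reservation. The sketch below repeats that arithmetic; `memmap_layout` and `kib` are hypothetical helpers that only understand the K suffix and the two parameters used here.

```python
from typing import List, Optional, Tuple


def kib(token: str) -> int:
    """Convert a size such as '130412K' to bytes (only the K suffix is handled)."""
    if not token.endswith("K"):
        raise ValueError(f"unsupported size: {token!r}")
    return int(token[:-1]) * 1024


def memmap_layout(cmdline: str) -> Tuple[List[Tuple[int, int]], Optional[int]]:
    """Return ([(start, size), ...] from memmap=size@start, elfcorehdr), in bytes."""
    regions: List[Tuple[int, int]] = []
    elfcorehdr: Optional[int] = None
    for arg in cmdline.split():
        key, _, value = arg.partition("=")
        if key == "memmap" and "@" in value:       # skips memmap=exactmap
            size, _, start = value.partition("@")
            regions.append((kib(start), kib(size)))
        elif key == "elfcorehdr":
            elfcorehdr = kib(value)
    return regions, elfcorehdr


CMDLINE = ("ro root=LABEL=/  irqpoll maxcpus=1 memmap=exactmap memmap=640K@0K "
           "memmap=130412K@16384K elfcorehdr=146796K")

if __name__ == "__main__":
    regions, hdr = memmap_layout(CMDLINE)
    last_start, last_size = regions[-1]
    print(last_start + last_size == hdr)   # True: header sits right after the region
    print(hex(hdr))                        # 0x8f5b000
```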

here are some lines in the kexec boot up which look suspicious (maybe):

Activating logical volumes
  7 logical volume(s) in volume group "mainvg" now active
dm_task_set_name: Device /dev/mapper/-optlv not found
Command failed
dm_task_set_name: Device /dev/mapper/-tmplv not found
Command failed
dm_task_set_name: Device /dev/mapper/-usrlv not found
Command failed
dm_task_set_name: Device /dev/mapper/-varlv not found
Command failed
dm_task_set_name: Device /dev/mapper/-wwwlv not found
Command failed
Saving to the local filesystem /dev/mainvg/varlv
e2fsck 1.38 (30-Jun-2005)
/dev/mainvg/varlv: recovering journal
/dev/mainvg/varlv: clean, 548/2048256 files, 126227/2048000 blocks
kjournald starting.  Commit interval 5 seconds
EXT3 FS on dm-3, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
cp: /proc/vmcore: No such file or directory
Creating root device.
Checking root filesystem.
fsck 1.38 (30-Jun-2005)
fsck: WARNING: couldn't open /etc/fstab: No such file or directory
e2fsck 1.38 (30-Jun-2005)
/: recovering journal
/: clean, 8414/1025024 files, 129369/1024135 blocks
Mounting root filesystem.
fsck: WARNING: couldn't open /etc/fstab: No such file or directory
fsck 1.38 (30-Jun-2005)
fsck: WARNING: couldn't open /etc/fstab: No such file or directory
e2fsck 1.38 (30-Jun-2005)
/: clean, 8414/1025024 files, 129369/1024135 blocks
mount -t ext3 /dev/sda1 /sysroot
kjournald starting.  Commit interval 5 seconds
kjournald starting.  Commit interval 5 seconds
EXT3 FS on sda1, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
Switching to new root and running init.

Let me know; I can test/try out other things.

Comment 48 Neil Horman 2007-05-31 15:17:32 UTC
Well, the kdump kernel boots, so that's good.  This line is fatal, however:

cp: /proc/vmcore: No such file or directory

That's where the core is read from.  This suggests that something is just not set
right in the kernel, as that file has to be there for kdump to do anything.  I
assume this is on your 64 bit hardware.  Does this same kexec-tools package that
you're testing work on your 32 bit hardware?

Also, if you could add this line to /etc/kdump.conf
default_action shell

and restart/retest the service, that would be great.  Doing so should drop you to
a shell in the kdump initrd, which you can use to manually verify that your
target filesystem is mounted, and whether or not /proc/vmcore exists.  If
you could do that I would appreciate it.  Thanks!

Comment 49 Zing 2007-05-31 17:01:04 UTC
Ok, /proc/vmcore was missing on both my 64bit system and a test 32bit system I just set up.

On the 32bit system I could drop into the shell and see that /proc/vmcore was
indeed not there (ps/2 attached keyboard).

On my 64bit system the keyboard is USB-attached and I lose all keyboard input
when dropping into the shell.  So I do get keyboard input without
"default_action shell" when the kexec kernel/initscripts run to completion, but
not when dropping into the kdump/busybox shell... not sure what's going on there...

the messages are the same between the two... "cp: /proc/vmcore: No such file or
directory"

fyi, the 32bit system also had kexec-tools-1.101-69.fc7 installed, but without
your kexec-x86.patch.  It has only ~600MB of RAM.

Comment 50 Neil Horman 2007-05-31 17:56:56 UTC
Fantastic, so FC6 has another problem in the kernel then.  

Ok, here's what we should do, I think.

1) Based on our testing, I'm comfortable that kexec is working properly with this
patch, so I'm going to check it in and close this bz

2) The other bz that you opened (bz 241362) I am going to close, since it was
the result of a patch that is not in the build, and we didn't wind up using it.

3) The current problem with /proc/vmcore: please open a bug on that, it should
come to me, and I'll investigate why it's not getting registered in the kernel.

Thanks

Comment 51 Zing 2007-05-31 19:10:10 UTC
ok, 

I opened bug #241924 for 3.  For point 2, I think we still need that open,
because kexec-tools-1.101-69.fc7.x86_64 fails... I'll add that info.  Thanks.

Comment 52 Neil Horman 2007-05-31 20:00:11 UTC
Yeah, you're right, I forgot why we pursued #2.  I'll use the
open bug to track that.  Thanks.

This patch is checked into -56.fc6, which will be pushed in the next few days. 
Thanks!

Comment 53 Neil Horman 2007-05-31 20:26:30 UTC
*** Bug 220143 has been marked as a duplicate of this bug. ***

Comment 54 Bug Zapper 2008-04-04 07:10:06 UTC
Fedora apologizes that these issues have not been resolved yet. We're
sorry it's taken so long for your bug to be properly triaged and acted
on. We appreciate the time you took to report this issue and want to
make sure no important bugs slip through the cracks.

If you're currently running a version of Fedora Core between 1 and 6,
please note that Fedora no longer maintains these releases. We strongly
encourage you to upgrade to a current Fedora release. In order to
refocus our efforts as a project we are flagging all of the open bugs
for releases which are no longer maintained and closing them.
http://fedoraproject.org/wiki/LifeCycle/EOL

If this bug is still open against Fedora Core 1 through 6, thirty days
from now, it will be closed 'WONTFIX'. If you can reproduce this bug in
the latest Fedora version, please change to the respective version. If
you are unable to do this, please add a comment to this bug requesting
the change.

Thanks for your help, and we apologize again that we haven't handled
these issues to this point.

The process we are following is outlined here:
http://fedoraproject.org/wiki/BugZappers/F9CleanUp

We will be following the process here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping to ensure this
doesn't happen again.

And if you'd like to join the bug triage team to help make things
better, check out http://fedoraproject.org/wiki/BugZappers

Comment 55 Bug Zapper 2008-05-06 19:33:46 UTC
This bug is open for a Fedora version that is no longer maintained and
will not be fixed by Fedora. Therefore we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.