Bug 461000

Summary: two dumps are captured when default action is set as halt.
Product: Red Hat Enterprise Linux 5
Reporter: Hiromitsu KIKUCHI <kikuchi.hiromitsu>
Component: kexec-tools
Assignee: Neil Horman <nhorman>
Status: CLOSED DUPLICATE
QA Contact: Martin Jenner <mjenner>
Severity: medium
Priority: medium
Version: 5.2
CC: benl, hfuchi, lwang, mgahagan, nhorman, qcai, varekova
Target Milestone: beta   
Target Release: ---   
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2008-10-13 10:42:03 UTC
Attachments:
  the console log of the 2nd kernel. (flags: none)
  the 2nd kernel was not dropped to a shell. (flags: none)

Description Hiromitsu KIKUCHI 2008-09-03 09:00:49 UTC
Description of problem:

  I configured kdump.conf with the following contents:
  -------------------
  ext3    /dev/sda12   # dump partition
  path    /crash
  default halt
  -------------------

  I expected that the 2nd kernel would halt as soon as a dump had finished,
  but it didn't.
  As a result, two dumps were captured.

Version-Release number of selected component (if applicable):

  kexec-tools-1.102pre-21.el5

How reproducible:

  Always. (with above config)

Steps to Reproduce:

  1. Configure kdump.conf with halt as the default action.
  2. Run "service kdump restart".
  3. Make the system panic.

Actual results:

  The 2nd kernel does not halt until it mounts the root filesystem 
  and starts up the kdump service.

Expected results:

  The 2nd kernel halts as soon as the dump has finished.

Additional info:

  As the result of the above config, the 2nd kernel proceeded as below:
  (1) dumped to the specified partition. (= /dev/sda12:/crash)
  ... (dump was finished) ...
  (2) `halt -f` was called in the 2nd kernel's initrd, but the system did not halt.
      (Therefore, the initrd proceeded with following script.)
  (3) mounted the root filesystem.
  (4) chrooted to the root filesystem, and /sbin/init was called.
  (5) a number of system services started up.
  (6) (as one of the above services) the kdump service was started.
  (7) dumped to the non-specified partition (= /dev/sdaXX:/crash, /dev/sdaXX is root filesystem.)
  (8) system halted. (as <FINAL_ACTION> in /etc/rc.d/init.d/kdump)

I would like to know how to avoid this.
I'd like the system to halt at (2) regardless of whether the dump succeeds or fails,
and I'd prefer that the 2nd kernel not mount the root filesystem at all.
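
A minimal sketch of the behaviour requested here, assuming the halt command can return instead of stopping the machine, as observed in step (2). The names `halt_or_stop`, `HALT_CMD`, `STOP_ACTION` and `block_forever` are hypothetical and are not part of kexec-tools or mkdumprd; the point is only that the script never falls through to the root mount.

```shell
#!/bin/sh
# Hypothetical guard for the end of the dump step in an initrd init script.
# HALT_CMD and STOP_ACTION are illustrative names, not kexec-tools settings.
HALT_CMD="${HALT_CMD:-halt -f}"
STOP_ACTION="${STOP_ACTION:-block_forever}"

block_forever() {
    # Never return, so the script cannot continue to the root mount.
    while :; do sleep 60; done
}

halt_or_stop() {
    # Try the configured halt command first.
    $HALT_CMD
    # Reaching this line means the halt command returned without halting.
    echo "halt command returned; not continuing to root mount" >&2
    $STOP_ACTION
}
```

In an init script, a call to `halt_or_stop` would take the place of the bare halt call, so that a halt command which returns no longer lets the script proceed.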

I found that the `halt -f` called in the 2nd kernel is provided by the busybox package,
so this behavior may come from busybox.

best regards,

Comment 1 Neil Horman 2008-09-03 10:35:26 UTC
Can you please provide a capture of the serial console on this system taken during the dump capture process?  I'd like to examine the error and subsequent behavior of kdump in these conditions. Thank you.

Comment 2 Hiromitsu KIKUCHI 2008-09-04 05:21:21 UTC
Created attachment 315720 [details]
the console log of the 2nd kernel.

Thank you for your reply.
I attached the console log of the 2nd kernel.

BTW, 

>   (7) dumped to the non-specified partition (= /dev/sdaXX:/crash, /dev/sdaXX is root filesystem.)

/dev/sdaXX:/crash should have been /dev/sdaXX:/var/crash.
(The dump directory is set with coredir="/var/crash/<date>" in the kdump service script.)

This time, I tried to reproduce the problem with the following config:
  -------------------
  core_collector makedumpfile -c -d 1
  ext3    /dev/sda12   # dump partition
  path    /crash
  default halt
  -------------------

I think the system attempted to halt with the following scripts, but it could not halt.

  the init script in initrd.kdump:
  --------------------------------
  ... snip ...
  [ $exitcode == 0 ] && halt -f
  halt -f
  echo Creating root device.
  ...
  ... snip ...
  --------------------------------

After the dump and system reboot, I confirmed that two dumps had been captured, as follows.

[root@kikutiplex755]~# mount -t ext3 /dev/sda12 /mnt/test
[root@kikutiplex755]~#
[root@kikutiplex755]~# ls -l /mnt/test/crash/
drwxr-xr-x 2 root root 4096 Sep  4 11:09 127.0.0.1-2008-09-04-11:08:51
[root@kikutiplex755]~# ls -l /mnt/test/crash/127.0.0.1-2008-09-04-11:08:51
-rw------- 1 root root 282634982 Sep  4 11:09 vmcore
[root@kikutiplex755]~#
[root@kikutiplex755]~# ls -l /var/crash
drwxr-xr-x 2 root root 4096 Sep  4 11:10 2008-09-04-11:10
[root@kikutiplex755]~# ls -l /var/crash/2008-09-04-11:10
-r-------- 1 root root 4018638936 Sep  4 11:10 vmcore
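
The check above can be repeated with a small helper. This is only a sketch; the function name `list_vmcores` is made up for illustration, and it simply prints the path of every file named vmcore under the directories it is given.

```shell
#!/bin/sh
# Print the path of every vmcore file found under the given directories.
# The function name list_vmcores is illustrative only.
list_vmcores() {
    for dir in "$@"; do
        # Skip directories that do not exist (e.g. an unmounted dump partition).
        [ -d "$dir" ] || continue
        find "$dir" -type f -name vmcore -print
    done
}
```

For the layout shown above it could be called as `list_vmcores /mnt/test/crash /var/crash`; more than one path in the output for a single panic indicates that the dump was captured twice.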

regards,

Comment 3 Neil Horman 2008-09-04 11:02:05 UTC
Looks like I may need to update the shutdown utility in busybox.  Out of curiosity, does the system reboot properly with one core if you specify the default action as nothing instead of halt?

Comment 4 Hiromitsu KIKUCHI 2008-09-04 12:59:56 UTC
Created attachment 315746 [details]
the 2nd kernel was not dropped to a shell.

Yes, it rebooted properly and the dump was captured once.
Should I report this as a busybox problem?

BTW:
When I specified the default action as nothing and the dump failed
(e.g. the dump partition was too small),
the 2nd kernel mounted the root filesystem, then rebooted as <FINAL_ACTION>
in the kdump script.
(Thus, the 2nd kernel was not dropped to a shell. I attached this console log.)
This does not seem to be the intended behavior according to kexec-kdump-howto.txt.

best regards,

Comment 5 Neil Horman 2008-09-04 13:53:36 UTC
If you didn't erase your previous dumps, then you likely are out of space on your capture partition.  You need to remove the previous captures.

As for the default action, rebooting is the default; you need to specify shell in the default_action if you want to drop to a shell.
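
As an illustration only, a kdump.conf that asks for a shell on failure might look like the fragment below. It reuses the reporter's partition and path from the description and assumes that `default` accepts `shell` in the same way it accepts `halt`.

```
ext3    /dev/sda12   # dump partition
path    /crash
default shell
```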

I'll re-assign this to busybox to have the halt command looked at.

Comment 6 Qian Cai 2008-10-10 07:46:57 UTC
It looks like the busybox halt command won't work from a Kdump initramfs, although I confirmed it works from a shell. I added the following debug output to the initramfs,

ls -l `which halt`
echo "halt"
halt
echo "halt -n"
halt -n

The output is,

lrwxrwxrwx    1 root     0    7 Oct 10 03:31 /sbin/halt -> busybox
halt
halt -n

None of the above works. It can also be reproduced with the following Kdump configuration file,

ext3 LABEL=/boot
default halt

We use a small boot partition to simulate a failure to save the vmcore,

Scanning logical volumes
  Reading all physical volumes.  This may take a while...
  Found volume group "VolGroup00" using metadata type lvm2
Activating logical volumes
  2 logical volume(s) in volume group "VolGroup00" now active
Saving to the local filesystem LABEL=/boot
e2fsck 1.38 (30-Jun-2005)
/boot: recovering journal
/boot: clean, 36/26104 files, 18134/104388 blocks
cp: Write Error: No space left on device
md: stopping all md devices.
System halted.
Creating root device.
Checking root filesystem.
fsck 1.38 (30-Jun-2005)
... enter INIT ...

Comment 9 Ben Levenson 2008-10-10 15:21:37 UTC
oops. fixing the needinfo flag. sorry for the confusion.

Comment 10 Qian Cai 2008-10-13 05:05:52 UTC
I doubt this is a busybox bug. It is a known issue,

https://bugzilla.redhat.com/show_bug.cgi?id=413921

Ivan said it is not a Kernel bug either, so is it working as expected?
If so, we'll probably need to modify kexec-tools to use the right
command to halt the system. 

If I modify /sbin/mkdumprd to replace "halt -f" with "poweroff -f", the system does halt.

...
Saving to the local filesystem /dev/sda1
e2fsck 1.38 (30-Jun-2005)
/boot: recovering journal
/boot: clean, 42/26104 files, 27988/104388 blocks
kjournald starting.  Commit interval 5 seconds
EXT3 FS on sda1, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
cp: Write Error: No space left on device
md: stopping all md devices.
Synchronizing SCSI cache for disk sda: 
sd 0:0:0:0: [sda] Stopping disk
Power down.
acpi_power_off called
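
The substitution described above can be sketched as follows, working on a copy rather than on /sbin/mkdumprd itself. The function name `swap_halt_for_poweroff` is hypothetical, and the assumption that the text to replace appears literally as `halt -f` is based on the init script excerpt in comment 2.

```shell
#!/bin/sh
# Hypothetical helper: write a copy of a script with every "halt -f"
# replaced by "poweroff -f". The original file is left untouched.
swap_halt_for_poweroff() {
    src="$1"
    dst="$2"
    sed 's/halt -f/poweroff -f/g' "$src" > "$dst"
}
```

For example, `swap_halt_for_poweroff /sbin/mkdumprd /tmp/mkdumprd.poweroff` would produce a modified copy that can be compared with the original using diff before anything is installed.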

Comment 11 Neil Horman 2008-10-13 10:42:03 UTC
Ivan's comments simply don't make sense to me.  Halt halts the system.  It does not say it's halting the system and then return to the console prompt without actually halting the system.  To do so is broken.  I've re-opened Ivan's bug on the subject and am closing this as a dupe of that.  Quite simply, halt has to do what halt says it will do.

*** This bug has been marked as a duplicate of bug 413921 ***