Description of problem:
During testing of kdump, we find that if we leave the path as /var/crash and do not specify a UUID, LABEL, or block device, the kdump test does not pass.

Version-Release number of selected component (if applicable):
The very latest

How reproducible:
100%

Steps to Reproduce:
1. Install v7
2. Enable kdump
3. Configure kdump to dump to /var/crash; do not specify a UUID, block device, or LABEL
4. Execute the kdump test
5. cd /var/crash
6. You cannot find the [ipaddress]-YYYY-MM-DD-HH:MM:SS directory that the kdump.py script looks for.

Actual results:
You can only find what the /etc/init.d/kdump script created: the same name as above, but with no "[ipaddress]-" prefix and no ":SS" suffix.

Expected results:
When the kdump test suite is run, it finds the vmcore at the default location created by the kdump service. The kdump service does not normally create directories that match /ipaddr-YYYY-MM-DD-HH:MM:SS/.

Additional info:
In kdump.conf, if one simply dumps to /var/crash and does not specify a block device, LABEL, UUID, etc., you get a path to the vmcore that matches the init.d service. This is true 100% of the time; all of us here get this failure. Many folks have learned to get around this failure in v7 by using some kind of undocumented workaround built into the makedumpfile binary, but no one had requested documentation of this or shared it with others until I asked about it. There seems to be an undocumented override of the /etc/init.d/kdump script path. In any case, it is very clear that the regex in kdump.py does not match the defaults in the /etc/init.d/kdump script.
If you enable kdump with these two config files and dump core, there is no way kdump.py from v7 will pass. Dump core and look in /var/crash: you will see YYYY-mm-dd-hh:mm, just as the service script says and unlike what kdump.py looks for.

/etc/kdump.conf:

#raw /dev/sda5
#ext4 /dev/sda3
#ext4 LABEL=/boot
#ext4 UUID=03138356-5e61-4ab3-b58e-27507ac41937
#net my.server.com:/export/tmp
#net user.com
#path /var/crash
#core_collector cp --sparse=always
#extra_bins /bin/cp
#link_delay 60
#kdump_post /var/crash/scripts/kdump-post.sh
#extra_bins /usr/bin/lftp
#disk_timeout 30
#extra_modules gfs2
#options modulename options
#default shell
path /var/crash
core_collector makedumpfile -c --message-level 1 -d 31
link_delay 60

My /etc/sysconfig/kdump:

KDUMP_KERNELVER=""
# The kdump commandline is the command line that needs to be passed off to
# the kdump kernel. This will likely match the contents of the grub kernel
# line. For example:
#   KDUMP_COMMANDLINE="ro root=LABEL=/"
# If a command line is not specified, the default will be taken from
# /proc/cmdline
KDUMP_COMMANDLINE=""
# This variable lets us append arguments to the current kdump commandline
# as taken from either KDUMP_COMMANDLINE above, or from /proc/cmdline
KDUMP_COMMANDLINE_APPEND="irqpoll maxcpus=1 reset_devices cgroup_disable=memory"
# Any additional /sbin/mkdumprd arguments required.
MKDUMPRD_ARGS=""
# Any additional kexec arguments required. In most situations, this should
# be left empty.
#
# Example:
#   KEXEC_ARGS="--elf32-core-headers"
KEXEC_ARGS=""
# Where to find the boot image
KDUMP_BOOTDIR="/boot"
# What is the image type used for kdump
KDUMP_IMG="vmlinuz"
# What is the image's extension. Relocatable kernels don't have one
KDUMP_IMG_EXT=""
From the kdump.py, this comment and the regex are not correct, as the /etc/init.d/kdump script will show.

###KDUMP.py
...
# find the vmcore image file matching the timestamp
# vmcore directories are like this: 127.0.0.1-2011-03-10-13:18:27
vmcoreDirectoryPattern = re.compile("(?P<ipaddr>[0-9]+\.[0-9]+\.[0-9]+)-(?P<date>[0-9]+-[0-9]+-[0-9]+)-(?P<time>[0-9]+:[0-9]+:[0-9]+)")

###/etc/init.d/kdump
function save_core()
{
    local kdump_path
    kdump_path=`grep ^path $KDUMP_CONFIG_FILE | cut -d' ' -f2-`
    if [ -z "$kdump_path" ]; then
        coredir="/var/crash/`date +"%Y-%m-%d-%H:%M"`"
    else
        coredir="${kdump_path}/`date +"%Y-%m-%d-%H:%M"`"
    fi
    mkdir -p $coredir
    ...
    ...
    ...
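To make the mismatch concrete, here is a small standalone check. The pattern is copied from the kdump.py excerpt above; the two directory names are the initrd-style name the test expects and the name the init script's save_core() actually creates:

```python
import re

# Pattern as quoted from kdump.py above
vmcoreDirectoryPattern = re.compile(
    r"(?P<ipaddr>[0-9]+\.[0-9]+\.[0-9]+)"
    r"-(?P<date>[0-9]+-[0-9]+-[0-9]+)"
    r"-(?P<time>[0-9]+:[0-9]+:[0-9]+)"
)

# Directory name created inside the initrd (what v7 expects)
initrd_style = "127.0.0.1-2011-03-10-13:18:27"
# Directory name created by /etc/init.d/kdump save_core()
service_style = "2012-04-23-22:04"

print(vmcoreDirectoryPattern.search(initrd_style) is not None)   # found
print(vmcoreDirectoryPattern.search(service_style) is not None)  # not found
```

The service-created name has no dotted IP prefix and no trailing seconds field, so the pattern can never find it, which is exactly the failure reported above.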
At this point of realization, I went to makedumpfile and found it to be an ELF binary, not a script. I did not feel like digging through sources, but the only reason this test will pass is when users hit upon the undocumented override in the makedumpfile logic (this is my guess). I think there is logic in there that puts preformatted strings of %host-%date-%time in the path to the vmcore based on things like device = LABEL, UUID, or block device. Since it is common to specify these when setting up kdump rather than rely on path=/var/crash on the root filesystem, you are not getting a lot of bug reports on this (like I said, a guess). Maybe tonight I'll look at the sources for makedumpfile, but certs have me worn out.
If you leave the "path" variable commented-out in kdump.conf, so it defaults to /var/crash rather than explicitly setting it, does the test pass?
Also, could you attach the test logs? Thanks!
OK, I'll give it a shot. Some of the other folks here said they just use an extN LABEL or UUID or whatever to work around this, but I wanted to stress that multiple folks have seen this. I have not verified that works; during certs I create / large enough to host a full RAM dump for the vmcore, so I never bothered. (My last cert was before kdump was written into the v7 tests.) I'll try a couple of runs and gather output.
With no "path /var/crash", v7 prints this to the console:

Looking for vmcore image directories under /var/crash
Error: could not locate vmcore file
copying attachments...
checking directory /var/log/v7/runs/1/kdump
Skipping output.log
saveOutput: /var/log/v7/runs/1/kdump/output.log
Return value was 1
saved to /var/v7/results.xml

yet /var/crash has this vmcore:

[root@SUT ~]# find /var/crash
/var/crash
/var/crash/2012-04-23-22:04
/var/crash/2012-04-23-22:04/vmcore

[root@SUT kdump]# cat output.log
<output>
Test Parameters:
OUTPUTFILE=/var/log/v7/runs/1/kdump/output.log
DEVICE=local
TESTSERVER=unknown
RUNMODE=normal
DEBUG=off
UDI=
<output name="initialize">
Checking required packages:
kexec-tools-2.0.0-209.el6.x86_64
crash-5.1.8-1.el6.x86_64
kernel-debuginfo-2.6.32-220.el6.x86_64
Checking kdump configuration
Found crashkernel=256M boot parameter
Kernel panic reboot timeout is 0
Setting to 1 sec.
core_collector currently set to "makedumpfile -c -d 31"
link_delay set to 60
Checking kdump service
kdump is running
The test will now cause a kernel panic to exercise kdump
Ready to restart? (y|n) response: y
</output>
Syncing disks
Test Parameters:
OUTPUTFILE=/var/log/v7/runs/1/kdump/output.log
DEVICE=local
TESTSERVER=unknown
RUNMODE=auto
DEBUG=off
UDI=
INCOMPLETE=1
<output name="verify">
reboot took 00:07:00
method: panic
kernel: 2.6.32-220.el6.x86_64
Apr 23 22:04:49 localhost kernel: Linux version 2.6.32-220.el6.x86_64 (mockbuild.bos.redhat.com) (gcc version 4.4.5 20110214 (Red Hat 4.4.5-6) (GCC) ) #1 SMP Wed Nov 9 08:03:13 EST 2011
Apr 23 22:08:23 localhost kernel: Linux version 2.6.32-220.el6.x86_64 (mockbuild.bos.redhat.com) (gcc version 4.4.5 20110214 (Red Hat 4.4.5-6) (GCC) ) #1 SMP Wed Nov 9 08:03:13 EST 2011
Error: system rebooted 2 times
Looking for vmcore image directories under /var/crash
Error: could not locate vmcore file
</output>
</output>
Can the full results package be attached which includes the above run?
Also a copy of the kdump initrd (/boot/initrd-2.6.32-220.el6.x86_64kdump.img).
Created attachment 583068 [details]
kdump initrd
Created attachment 583069 [details] result
Hi, the requested info has been attached.

Question: with a fresh install of the OS, configuring kdump in firstboot, rebooting with kdump active, then a fresh install of v7 and running v7 run --test kdump --device local, are you saying you do not see this mismatch between the initrd scripts that create the path to the vmcore and the v7 regexp used to find the path to the vmcore?
(In reply to comment #17)
> are you saying you do not see this mismatch between initrd scripts to create
> path to vmcore vs v7 regexp for path to find vmcore?

Looks like the kdump init scripts and the kdump initrd do not agree with each other. The attached initrd shows:

mkdir -p /mnt//var/crash/127.0.0.1-$DATE
VMCORE=/mnt//var/crash/127.0.0.1-$DATE/vmcore
export VMCORE
display_mem_usage
load_selinux_policy /mnt
makedumpfile -c -d 31 /proc/vmcore $VMCORE-incomplete >/dev/null

...this is the only reference to makedumpfile in the init of the initrd. This should mean that if kexec boots a second kernel using this initrd, the vmcore should end up where v7 expects to find it.
This is exactly what I am talking about: a hidden requirement or dependency on a certain kdump config. Notice that these lines come after...

...
...
echo Saving to the local filesystem LABEL=Dump
DUMPDEV=LABEL=Dump
IS_LABEL=`echo $DUMPDEV | grep LABEL`
IS_UUID=`echo $DUMPDEV | grep UUID`
if [ -n "$IS_LABEL" -o -n "$IS_UUID" ]
then
    DUMPDEV=`findfs "$DUMPDEV"`
fi
fsck.ext3 -y $DUMPDEV
mount -t ext3 $DUMPDEV /mnt

Now let's look at how one can ever hope to hit the codeblock you highlighted. This seems strange, and I do not understand it: we set DUMPDEV to LABEL=Dump, then set IS_LABEL to non-NULL, then set IS_UUID; then we test IS_LABEL for non-NULL, which of course passes; then set DUMPDEV=`findfs LABEL=Dump`. There is no filesystem with that label, so the fsck and the mount will fail, and when they fail **we skip the codeblock you highlighted.** Our init script will never hit this codeblock in the default circumstance. I think this code gets executed only if you give a raw device as the dump device in /etc/kdump.conf. Your v7 test depends on folks using some kdump config that triggers entry into the codeblock you highlighted. This should be documented, BUT I really think your v7 regex should be more robust, handling IP-%DATE or a date with no trailing :SS as the path to the vmcore. This allows simple configs where folks have enough free HD space on the root fs to store the vmcore!!
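As a sketch of what a more tolerant v7 pattern could look like (this is my suggestion, not the actual v7 code): make the "ipaddr-" prefix and the trailing ":SS" seconds field both optional, so it accepts the directory names produced both by the initramfs and by the kdump service.

```python
import re

# Hypothetical, more tolerant pattern: the "ipaddr-" prefix and the
# trailing ":SS" seconds field are both optional, so it matches
# 127.0.0.1-2012-04-23-22:04:49 as well as 2012-04-23-22:04.
tolerantPattern = re.compile(
    r"^(?:(?P<ipaddr>[0-9]+(?:\.[0-9]+){3})-)?"
    r"(?P<date>[0-9]{4}-[0-9]{2}-[0-9]{2})"
    r"-(?P<time>[0-9]{2}:[0-9]{2}(?::[0-9]{2})?)$"
)

# Initrd-created and service-created directory names both match now
for name in ("127.0.0.1-2012-04-23-22:04:49", "2012-04-23-22:04"):
    print(bool(tolerantPattern.match(name)))
```

Anchoring with ^ and $ and using fixed field widths also keeps the pattern from accidentally matching unrelated directory names under /var/crash.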
> notice that these lines come after

Should have been: "Notice that the codeblock you highlighted comes after the following...."

> ...
> ...
>
> echo Saving to the local filesystem LABEL=Dump
> DUMPDEV=LABEL=Dump
> IS_LABEL=`echo $DUMPDEV | grep LABEL`
> IS_UUID=`echo $DUMPDEV | grep UUID`
> if [ -n "$IS_LABEL" -o -n "$IS_UUID" ]
> then
>     DUMPDEV=`findfs "$DUMPDEV"`
> fi
> fsck.ext3 -y $DUMPDEV
> mount -t ext3 $DUMPDEV /mnt
In other words, v7 will find the vmcore only if we can mount LABEL=Dump on some block device, and this can only happen in a very specific kdump config. I have lots of HD space and would like to dump core to my root filesystem; I will never hit this codeblock unless I am told in documentation to do what it takes to get some raw device formatted with ext3 and labeled "Dump". That seems to mean /etc/kdump.conf must have some config that forces this to happen, probably raw /dev/<<BLOCKDEV>> (guessing; I'd have to really look at the makedumpfile sources), but it is clear that most folks hit this bug and never report it. They just eventually hit on the right kdump.conf directive to enter that codeblock.

fsck.ext3 -y $DUMPDEV
mount -t ext3 $DUMPDEV /mnt
if [ $? == 0 ]
then
    mkdir -p /mnt//crash/127.0.0.1-$DATE
    VMCORE=/mnt//crash/127.0.0.1-$DATE/vmcore
    export VMCORE
    display_mem_usage
    load_selinux_policy /mnt
    makedumpfile -d 31 -c /proc/vmcore $VMCORE-incomplete >/dev/null
    exitcode=$?
    if [ $exitcode == 0 ]
    then
        mv $VMCORE-incomplete $VMCORE
        echo -e "\\033[0JSaving core complete"
    fi
    sync
    [ -e /mnt/selinux ] && umount /mnt/selinux
    umount /mnt
    if [ $exitcode == 0 ]
    then
        reboot -f
    fi
fi
FWIW, the IP address in the initrd is being added by the /sbin/mkdumprd script and the IP address missing from the kdump init scripts is a bug being addressed in 6.3 (Bug#755760).
Added the HP group as 755760 is HP private. Also forgot to mention I have not looked to see if the patch to 755760 is in a section of code that is or isn't reached when $path is configured.
Hi, basically in looking at /sbin/mkdumprd and /etc/init.d/kdump, we can conclude:

If the vmcore is written by the code in the initramfs init script, you will have a path that matches 127.0.0.1-YYYY-MM-DD-HH:MM:SS. In order to do this, there must be a filesystem with LABEL="Dump", and it must be mounted read-write from within the initramfs, **before the pivot root to the real root fs** and the start of OS services.

If those criteria are not met, the vmcore cannot be generated by the init running in the initramfs. We skip that code, do a pivot root, and start init from the real root device; then we rely on the kdump service to generate the vmcore. And as we noted, the path to the vmcore created by the kdump service is NOT 127.0.0.1-YYYY-MM-DD-HH:MM:SS; it is YYYY-MM-DD-HH:MM instead.

The v7 kdump test is relying upon the assumption that the vmcore is generated by the code in the initramfs, which implies that the dump device must be a block device that is extX and has LABEL=Dump. This is not documented in the v7 test kit documentation. In general, the OS kdump script and the initramfs code that generate vmcore files should produce paths that match.

So:
1. v7 needs a documentation edit, and
2. kexec-tools needs a patch to the /etc/init.d/kdump script so that the path to the vmcore is the same whether the dump happens before the pivot root (from the initramfs) or after the pivot root (from the kdump service).
As of the latest release, https://rhn.redhat.com/rhn/software/packages/details/Overview.do?pid=704878, there is still a mismatch between the /etc/init.d/kdump code and the initramfs init script code used to generate the path to the vmcore file.
Caspar, we know why the default dump mechanism is failing (see comments 21 and 24); it was done on purpose. The issue is that the directory naming convention does not match between a vmcore created by the initramfs and a core created by userspace. Looking in the code, I've discovered a bug with the date format in the save_core() function in /etc/init.d/kdump: there isn't a :%S appended to the core file format. Since the directory was created without the :SS at the end, v7 couldn't locate the core file and failed the test. A patch was added to kdump for 6.3 (see Bug#755760), and I'm not sure whether that corrected this issue or not.
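The date-format discrepancy can be shown directly. Python's strftime uses the same format codes as date(1) for these fields, so the sketch below (with a fixed timestamp for comparison) contrasts the directory name save_core() builds against the one the v7 pattern requires:

```python
from datetime import datetime

# A fixed timestamp so the two formats can be compared directly
ts = datetime(2012, 4, 23, 22, 4, 49)

# What /etc/init.d/kdump save_core() uses: date +"%Y-%m-%d-%H:%M"
service_dir = ts.strftime("%Y-%m-%d-%H:%M")
# What it would need to produce for the v7 pattern's time field to
# match (the missing "ipaddr-" prefix is a separate issue, Bug#755760)
expected_dir = ts.strftime("%Y-%m-%d-%H:%M:%S")

print(service_dir)   # 2012-04-23-22:04
print(expected_dir)  # 2012-04-23-22:04:49
```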
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-0222.html