Description of problem:
During testing of kdump, we find that if we leave the path as /var/crash and do not specify a UUID, LABEL, or block device, the kdump test does not pass.

Version-Release number of selected component (if applicable):
The very latest

How reproducible:
100%

Steps to Reproduce:
1. Install v7
2. Enable kdump
3. Configure kdump to dump to /var/crash; do not specify a UUID, block device, or LABEL
4. Execute the kdump test
5. cd /var/crash
6. You cannot find the [ipaddress]-YYYY-MM-DD-HH:MM:SS directory that the kdump.py script looks for.

Actual results:
You can only find what the /etc/init.d/kdump script created: the same name as above, but with no "[ipaddress]-" prefix and no ":SS" suffix.

Expected results:
When the kdump test suite is run, it finds the vmcore at the default location created by the kdump service. The kdump service does not normally create directories that match /ipaddr-YYYY-MM-DD-HH:MM:SS/.

Additional info:
In kdump.conf, if one simply dumps to /var/crash and does not specify a block device, LABEL, UUID, etc., you get a path to the vmcore that matches the init.d service. This is true 100% of the time; all of us here get this failure. Many folks have learned to get around this failure in v7 by using some kind of undocumented workaround built into the makedumpfile binary, but no one had requested documentation of this or shared it with others until I asked about it. There seems to be an undocumented override of the /etc/init.d/kdump script path. In any case, it is very clear that the regex in kdump.py does not match the defaults in the /etc/init.d/kdump script.
If you enable kdump with these two config files and dump core, there is no way kdump.py from v7 will pass. Dump core and look in /var/crash: you will see YYYY-mm-dd-hh:mm, just as the service script says and unlike what kdump.py looks for.

/etc/kdump.conf:

#raw /dev/sda5
#ext4 /dev/sda3
#ext4 LABEL=/boot
#ext4 UUID=03138356-5e61-4ab3-b58e-27507ac41937
#net my.server.com:/export/tmp
#net user.com
#path /var/crash
#core_collector cp --sparse=always
#extra_bins /bin/cp
#link_delay 60
#kdump_post /var/crash/scripts/kdump-post.sh
#extra_bins /usr/bin/lftp
#disk_timeout 30
#extra_modules gfs2
#options modulename options
#default shell
path /var/crash
core_collector makedumpfile -c --message-level 1 -d 31
link_delay 60

My /etc/sysconfig/kdump:

KDUMP_KERNELVER=""
# The kdump commandline is the command line that needs to be passed off to
# the kdump kernel. This will likely match the contents of the grub kernel
# line. For example:
#   KDUMP_COMMANDLINE="ro root=LABEL=/"
# If a command line is not specified, the default will be taken from
# /proc/cmdline
KDUMP_COMMANDLINE=""
# This variable lets us append arguments to the current kdump commandline
# as taken from either KDUMP_COMMANDLINE above, or from /proc/cmdline
KDUMP_COMMANDLINE_APPEND="irqpoll maxcpus=1 reset_devices cgroup_disable=memory"
# Any additional /sbin/mkdumprd arguments required.
MKDUMPRD_ARGS=""
# Any additional kexec arguments required. In most situations, this should
# be left empty.
#
# Example:
#   KEXEC_ARGS="--elf32-core-headers"
KEXEC_ARGS=""
# Where to find the boot image
KDUMP_BOOTDIR="/boot"
# What is the image type used for kdump
KDUMP_IMG="vmlinuz"
# What is the image's extension. Relocatable kernels don't have one
KDUMP_IMG_EXT=""
From the kdump.py, this comment and the regex are not correct, as the /etc/init.d/kdump script will show.

###KDUMP.py
...
# find the vmcore image file matching the timestamp
# vmcore directories are like this: 127.0.0.1-2011-03-10-13:18:27
vmcoreDirectoryPattern = re.compile("(?P<ipaddr>[0-9]+\.[0-9]+\.[0-9]+)-(?P<date>[0-9]+-[0-9]+-[0-9]+)-(?P<time>[0-9]+:[0-9]+:[0-9]+)")

###/etc/init.d/kdump
function save_core()
{
    local kdump_path
    kdump_path=`grep ^path $KDUMP_CONFIG_FILE | cut -d' ' -f2-`
    if [ -z "$kdump_path" ]; then
        coredir="/var/crash/`date +"%Y-%m-%d-%H:%M"`"
    else
        coredir="${kdump_path}/`date +"%Y-%m-%d-%H:%M"`"
    fi
    mkdir -p $coredir
    ...
    ...
    ...
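To make the mismatch concrete, here is a small standalone check. The pattern is copied from the kdump.py excerpt above; the two directory names are the initrd-style name the test expects and the name the init script's save_core() actually creates:

```python
import re

# Pattern as quoted from kdump.py above
vmcoreDirectoryPattern = re.compile(
    r"(?P<ipaddr>[0-9]+\.[0-9]+\.[0-9]+)"
    r"-(?P<date>[0-9]+-[0-9]+-[0-9]+)"
    r"-(?P<time>[0-9]+:[0-9]+:[0-9]+)"
)

# Directory name created inside the initrd (what v7 expects)
initrd_style = "127.0.0.1-2011-03-10-13:18:27"
# Directory name created by /etc/init.d/kdump save_core()
service_style = "2012-04-23-22:04"

print(vmcoreDirectoryPattern.search(initrd_style) is not None)   # found
print(vmcoreDirectoryPattern.search(service_style) is not None)  # not found
```

The service-created name has no dotted IP prefix and no trailing seconds field, so the pattern can never find it, which is exactly the failure reported above.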
At this point of realization, I went to makedumpfile and found it to be an ELF binary, not a script. I did not feel like digging through sources, but the only reason this test will pass is when users hit upon the undocumented override in the makedumpfile logic (this is my guess). I think there is logic in there that puts preformatted strings of %host-%date-%time in the path to the vmcore based on things like device = LABEL, UUID, or block device. Since it is common to specify these when setting up kdump rather than rely on path=/var/crash on the root filesystem, you are not getting a lot of bug reports on this (like I said, a guess). Maybe tonight I'll look at the sources for makedumpfile, but certs have me worn out.
If you leave the "path" variable commented-out in kdump.conf, so it defaults to /var/crash rather than explicitly setting it, does the test pass?
Also, could you attach the test logs? Thanks!
OK, I'll give it a shot. Some of the other folks here said they just use an extN LABEL or UUID or whatever to work around this, but I wanted to stress that multiple folks have seen this. I have not verified that works; during certs I create / large enough to host a full RAM dump for the vmcore, so I never bothered. (My last cert was before kdump was written into the v7 tests.) I'll try a couple of runs and gather output.
With no "path /var/crash", v7 prints this to the console:

Looking for vmcore image directories under /var/crash
Error: could not locate vmcore file
copying attachments...
checking directory /var/log/v7/runs/1/kdump
Skipping output.log
saveOutput: /var/log/v7/runs/1/kdump/output.log
Return value was 1
saved to /var/v7/results.xml

yet /var/crash has this vmcore:

[root@SUT ~]# find /var/crash
/var/crash
/var/crash/2012-04-23-22:04
/var/crash/2012-04-23-22:04/vmcore

[root@SUT kdump]# cat output.log
<output>
Test Parameters:
OUTPUTFILE=/var/log/v7/runs/1/kdump/output.log
DEVICE=local
TESTSERVER=unknown
RUNMODE=normal
DEBUG=off
UDI=
<output name="initialize">
Checking required packages:
kexec-tools-2.0.0-209.el6.x86_64
crash-5.1.8-1.el6.x86_64
kernel-debuginfo-2.6.32-220.el6.x86_64
Checking kdump configuration
Found crashkernel=256M boot parameter
Kernel panic reboot timeout is 0
Setting to 1 sec.
core_collector currently set to "makedumpfile -c -d 31"
link_delay set to 60
Checking kdump service
kdump is running
The test will now cause a kernel panic to exercise kdump
Ready to restart? (y|n) response: y
</output>
Syncing disks
Test Parameters:
OUTPUTFILE=/var/log/v7/runs/1/kdump/output.log
DEVICE=local
TESTSERVER=unknown
RUNMODE=auto
DEBUG=off
UDI=
INCOMPLETE=1
<output name="verify">
reboot took 00:07:00
method: panic
kernel: 2.6.32-220.el6.x86_64
Apr 23 22:04:49 localhost kernel: Linux version 2.6.32-220.el6.x86_64 (mockbuild.bos.redhat.com) (gcc version 4.4.5 20110214 (Red Hat 4.4.5-6) (GCC) ) #1 SMP Wed Nov 9 08:03:13 EST 2011
Apr 23 22:08:23 localhost kernel: Linux version 2.6.32-220.el6.x86_64 (mockbuild.bos.redhat.com) (gcc version 4.4.5 20110214 (Red Hat 4.4.5-6) (GCC) ) #1 SMP Wed Nov 9 08:03:13 EST 2011
Error: system rebooted 2 times
Looking for vmcore image directories under /var/crash
Error: could not locate vmcore file
</output>
</output>
Can the full results package be attached which includes the above run?
Also a copy of the kdump initrd (/boot/initrd-2.6.32-220.el6.x86_64kdump.img).
Created attachment 583068 [details]
kdump initrd
Created attachment 583069 [details] result
Hi, the requested info has been attached.

Question: with a fresh install of the OS, configuring kdump in firstboot, rebooting with kdump active, then a fresh install of v7 and running v7 run --test kdump --device local, are you saying you do not see this mismatch between the initrd scripts that create the path to the vmcore and the v7 regexp used to find the path to the vmcore?
(In reply to comment #17)
> are you saying you do not see this mismatch between initrd scripts to create
> path to vmcore vs v7 regexp for path to find vmcore?

Looks like the kdump init scripts and the kdump initrd do not agree with each other. The attached initrd shows:

mkdir -p /mnt//var/crash/127.0.0.1-$DATE
VMCORE=/mnt//var/crash/127.0.0.1-$DATE/vmcore
export VMCORE
display_mem_usage
load_selinux_policy /mnt
makedumpfile -c -d 31 /proc/vmcore $VMCORE-incomplete >/dev/null

...this is the only reference to makedumpfile in the init of the initrd. This should mean that if kexec boots a second kernel using this initrd, the vmcore should end up where v7 expects to find it.
This is exactly what I am talking about: a hidden requirement or dependency on a certain kdump config. Notice that these lines come after...

...
...
echo Saving to the local filesystem LABEL=Dump
DUMPDEV=LABEL=Dump
IS_LABEL=`echo $DUMPDEV | grep LABEL`
IS_UUID=`echo $DUMPDEV | grep UUID`
if [ -n "$IS_LABEL" -o -n "$IS_UUID" ]
then
    DUMPDEV=`findfs "$DUMPDEV"`
fi
fsck.ext3 -y $DUMPDEV
mount -t ext3 $DUMPDEV /mnt

Now let's look at how one can ever hope to hit the codeblock you highlighted. This seems strange, and I do not understand it: we set DUMPDEV to LABEL=Dump, then set IS_LABEL to non-NULL, then set IS_UUID; then we test IS_LABEL for non-NULL, which of course passes; then set DUMPDEV=`findfs LABEL=Dump`. There is no filesystem with that label, so the fsck and the mount will fail, and when they fail **we skip the codeblock you highlighted.** Our init script will never hit this codeblock in the default circumstance. I think this code gets executed only if you give a raw device as the dump device in /etc/kdump.conf. Your v7 test depends on folks using some kdump config that triggers entry into the codeblock you highlighted. This should be documented, BUT I really think your v7 regex should be more robust, handling IP-%DATE or a date with no trailing :SS as the path to the vmcore. This allows simple configs where folks have enough free HD space on the root fs to store the vmcore!!
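As a sketch of what a more tolerant v7 pattern could look like (this is my suggestion, not the actual v7 code): make the "ipaddr-" prefix and the trailing ":SS" seconds field both optional, so it accepts the directory names produced both by the initramfs and by the kdump service.

```python
import re

# Hypothetical, more tolerant pattern: the "ipaddr-" prefix and the
# trailing ":SS" seconds field are both optional, so it matches
# 127.0.0.1-2012-04-23-22:04:49 as well as 2012-04-23-22:04.
tolerantPattern = re.compile(
    r"^(?:(?P<ipaddr>[0-9]+(?:\.[0-9]+){3})-)?"
    r"(?P<date>[0-9]{4}-[0-9]{2}-[0-9]{2})"
    r"-(?P<time>[0-9]{2}:[0-9]{2}(?::[0-9]{2})?)$"
)

# Initrd-created and service-created directory names both match now
for name in ("127.0.0.1-2012-04-23-22:04:49", "2012-04-23-22:04"):
    print(bool(tolerantPattern.match(name)))
```

Anchoring with ^ and $ and using fixed field widths also keeps the pattern from accidentally matching unrelated directory names under /var/crash.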
> notice that these lines come after

Should have been: "Notice that the codeblock you highlighted comes after the following...."

> ...
> ...
>
> echo Saving to the local filesystem LABEL=Dump
> DUMPDEV=LABEL=Dump
> IS_LABEL=`echo $DUMPDEV | grep LABEL`
> IS_UUID=`echo $DUMPDEV | grep UUID`
> if [ -n "$IS_LABEL" -o -n "$IS_UUID" ]
> then
>     DUMPDEV=`findfs "$DUMPDEV"`
> fi
> fsck.ext3 -y $DUMPDEV
> mount -t ext3 $DUMPDEV /mnt
In other words, v7 will find the vmcore only if we can mount LABEL=Dump on some block device, and this can only happen in a very specific kdump config. I have lots of HD space and would like to dump core to my root filesystem; I will never hit this codeblock unless I am told in documentation to do what it takes to get some raw device formatted with ext3 and labeled "Dump". That seems to mean /etc/kdump.conf must have some config that forces this to happen, probably raw /dev/<<BLOCKDEV>> (guessing; I'd have to really look at the makedumpfile sources), but it is clear that most folks hit this bug and never report it. They just eventually hit on the right kdump.conf directive to enter that codeblock.

fsck.ext3 -y $DUMPDEV
mount -t ext3 $DUMPDEV /mnt
if [ $? == 0 ]
then
    mkdir -p /mnt//crash/127.0.0.1-$DATE
    VMCORE=/mnt//crash/127.0.0.1-$DATE/vmcore
    export VMCORE
    display_mem_usage
    load_selinux_policy /mnt
    makedumpfile -d 31 -c /proc/vmcore $VMCORE-incomplete >/dev/null
    exitcode=$?
    if [ $exitcode == 0 ]
    then
        mv $VMCORE-incomplete $VMCORE
        echo -e "\\033[0JSaving core complete"
    fi
    sync
    [ -e /mnt/selinux ] && umount /mnt/selinux
    umount /mnt
    if [ $exitcode == 0 ]
    then
        reboot -f
    fi
fi
FWIW, the IP address in the initrd is being added by the /sbin/mkdumprd script and the IP address missing from the kdump init scripts is a bug being addressed in 6.3 (Bug#755760).
Added the HP group as 755760 is HP private. Also forgot to mention I have not looked to see if the patch to 755760 is in a section of code that is or isn't reached when $path is configured.
Hi, basically in looking at /sbin/mkdumprd and /etc/init.d/kdump, we can conclude:

If the vmcore is written by the code in the initramfs init script, you will have a path that matches 127.0.0.1-YYYY-MM-DD-HH:MM:SS. In order to do this, there must be a filesystem with LABEL="Dump", and it must be mounted read-write from within the initramfs, **before the pivot root to the real root fs** and the start of OS services.

If those criteria are not met, the vmcore cannot be generated by the init running in the initramfs. We skip that code, do a pivot root, and start init from the real root device; then we rely on the kdump service to generate the vmcore. And as we noted, the path to the vmcore created by the kdump service is NOT 127.0.0.1-YYYY-MM-DD-HH:MM:SS; it is YYYY-MM-DD-HH:MM instead.

The v7 kdump test is relying upon the assumption that the vmcore is generated by the code in the initramfs, which implies that the dump device must be a block device that is extX and has LABEL=Dump. This is not documented in the v7 test kit documentation. In general, the OS kdump script and the initramfs code that generate vmcore files should produce paths that match.

So:
1. v7 needs a documentation edit, and
2. kexec-tools needs a patch to the /etc/init.d/kdump script so that the path to the vmcore is the same whether the dump happens before the pivot root (from the initramfs) or after the pivot root (from the kdump service).
As of the latest release, https://rhn.redhat.com/rhn/software/packages/details/Overview.do?pid=704878, there is still a mismatch between the /etc/init.d/kdump code and the initramfs init script code used to generate the path to the vmcore file.
Caspar, we know why the default dump mechanism is failing (see comments 21 and 24); it was done on purpose. The issue is that the directory naming convention does not match between a vmcore created by the initramfs and a core created by userspace. Looking in the code, I've discovered a bug with the date format in the save_core() function in /etc/init.d/kdump: there isn't a :%S appended to the core file format. Since the directory was created without the :SS at the end, v7 couldn't locate the core file and failed the test. A patch was added to kdump for 6.3 (see Bug#755760), and I'm not sure whether that corrected this issue or not.
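The date-format discrepancy can be shown directly. Python's strftime uses the same format codes as date(1) for these fields, so the sketch below (with a fixed timestamp for comparison) contrasts the directory name save_core() builds against the one the v7 pattern requires:

```python
from datetime import datetime

# A fixed timestamp so the two formats can be compared directly
ts = datetime(2012, 4, 23, 22, 4, 49)

# What /etc/init.d/kdump save_core() uses: date +"%Y-%m-%d-%H:%M"
service_dir = ts.strftime("%Y-%m-%d-%H:%M")
# What it would need to produce for the v7 pattern's time field to
# match (the missing "ipaddr-" prefix is a separate issue, Bug#755760)
expected_dir = ts.strftime("%Y-%m-%d-%H:%M:%S")

print(service_dir)   # 2012-04-23-22:04
print(expected_dir)  # 2012-04-23-22:04:49
```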
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-0222.html