Bug 243118
| Summary: | kexec-tools package needs update to work with xen | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Gerd Hoffmann <kraxel> |
| Component: | kexec-tools | Assignee: | Neil Horman <nhorman> |
| Status: | CLOSED ERRATA | QA Contact: | |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 5.0 | CC: | anderson, bstein, ddomingo, djuran, dzickus, hbrock, jan.kratochvil, jarod, jfeeney, nhorman, nobody+mkumar, tao, vgoyal, xen-maint |
| Target Milestone: | --- | Keywords: | Reopened |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | RHBA-2007-0548 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2007-11-07 18:03:00 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 212843, 244301 | | |
| Bug Blocks: | | | |
| Attachments: | | | |

Comment 1
Gerd Hoffmann
2007-06-07 12:52:11 UTC
One more thing: in the xen case the crashkernel= command line is passed to the xen kernel, not the linux kernel, and thus it isn't visible in /proc/cmdline. The sanity check in /etc/init.d/kdump fails due to that. Suggested fix: look for a sane crash kernel region in /proc/iomem instead, like /sbin/kexec does.

So we pretty clearly need the patch above in Comment #1 for this to work, but the script errors described in the initial comment should have been fixed by now. If you would please test with the latest kexec-tools package (kexec-tools-1.101-173.el5) to confirm that those script errors are resolved, I'll pull in the additional kexec patch referenced above. Thanks!

Created attachment 156504 [details]
patch to enable xen crashdumps
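
The /proc/iomem check suggested in Comment 1 could look roughly like the following. This is only a minimal Python sketch of the idea, not the actual initscript change or the logic in /sbin/kexec: the function names are hypothetical, and it assumes the reserved range appears in /proc/iomem under the resource label "Crash kernel".

```python
import re

# Matches one /proc/iomem line such as "  02000000-07ffffff : Crash kernel".
IOMEM_LINE = re.compile(r"^\s*([0-9a-fA-F]+)-([0-9a-fA-F]+)\s*:\s*(.+?)\s*$")


def find_crash_kernel_region(iomem_text):
    """Return (start, end) of the first "Crash kernel" range, or None.

    The label "Crash kernel" is an assumption about how the reserved
    region is named in /proc/iomem.
    """
    for line in iomem_text.splitlines():
        match = IOMEM_LINE.match(line)
        if match and match.group(3) == "Crash kernel":
            start = int(match.group(1), 16)
            end = int(match.group(2), 16)
            if end > start:  # reject an empty or inverted range
                return start, end
    return None


def crash_region_reserved(path="/proc/iomem"):
    """Hypothetical sanity check: True if a crash kernel region is listed."""
    try:
        with open(path) as iomem:
            return find_crash_kernel_region(iomem.read()) is not None
    except OSError:
        return False
```

Because the check reads the memory map rather than /proc/cmdline, it would not care whether crashkernel= was given to the hypervisor or to the dom0 kernel.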
I actually take back what I said before. Looking at the upstream kexec-tools,
I think there is much more to xen support than the referenced patch. There is
quite a bit of infrastructure in place upstream for this, which can be
backported, but it's not quite as simple as one patch. Also, based on the
initial comment, our xen (dom0) kernels have no support in them for kexec yet
(as evidenced by the lack of /sys/kernel/kexec_crash_loaded). Until the kernel
inherits kexec support from upstream, I'm not sure there's a whole lot of worth
in incorporating this, as there will be no way to test our kexec with our kernel.
I'd say at this point, let's test with this patch in place, to verify that it
doesn't cause any regressions in our xen kernel as it is, verify that the
latest kdump initscript doesn't fail in the way described, and then let's wait
until our kernel gets kexec support in xen to square away any remaining edges
from this backport.
Ok, I'm still catching up on this bug, I see where Gerd has posted the upstream xen kdump patches. To be honest, I'm not thrilled with us taking these patches so close to the 5.1 submit deadline (we should incorporate them right after we release to maximize testing). But if it's a 5.1 requirement I don't know what else we can do. Try the patch I uploaded and see if it does what we need it to do. If it misses the mark, I'll load a xen kernel on my debug system in the AM and fish the rest of the xen bits out of upstream.

Doesn't build for me. Patch incomplete maybe? It's a fresh distcvs checkout (173) plus comment 4 patch.

gcc -Wall -g -fno-strict-aliasing -I./include -I./util_lib/include -DVERSION='"1.101"' -DRELEASE_DATE='"15 February 2005"' -DPACKAGE_NAME=\"\" -DPACKAGE_TARNAME=\"\" -DPACKAGE_VERSION=\"\" -DPACKAGE_STRING=\"\" -DPACKAGE_BUGREPORT=\"\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_ZLIB_H=1 -Ikexec/arch/x86_64/include -o /home/kraxel/BUILD/kexec-tools-1.101/objdir-x86_64-redhat-linux-gnu/kexec/crashdump-xen.o -c kexec/crashdump-xen.c
kexec/crashdump-xen.c:35: warning: ‘struct crash_elf_info’ declared inside parameter list
kexec/crashdump-xen.c:35: warning: its scope is only this definition or declaration, which is probably not what you want
kexec/crashdump-xen.c: In function ‘xen_architecture’:
kexec/crashdump-xen.c:37: error: dereferencing pointer to incomplete type
kexec/crashdump-xen.c: In function ‘xen_get_nr_phys_cpus’:
kexec/crashdump-xen.c:106: warning: statement with no effect
kexec/crashdump-xen.c:92: warning: unused variable ‘match’
make: *** [/home/kraxel/BUILD/kexec-tools-1.101/objdir-x86_64-redhat-linux-gnu/kexec/crashdump-xen.o] Error 1

Created attachment 156565 [details]
new version of patch
Sorry, forgot to back up one of the files that needed to be changed, so it
didn't get picked up in the diff. New patch attached.
Patch is incomplete too. xen infrastructure is there now, it also builds, but fails to load the crash kernel because the important chunk linked in comment #1 isn't included.

Created attachment 156570 [details]
new patch
dang, my bad. Here it is, fixed.
now it works, thanks.

Ok, then we just need to get this pm and qa acked for me to commit.

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

Thanks James! Fixed in -174.el5

Can you clarify the exact setup procedure for xen kernels? I'm running the 2.6.18-20.el5.kraxel.6xen and kexec-tools-1.101-174.el5. I've modified the /etc/sysconfig/kdump file to use the stock kernel as the kdump kernel:

KDUMP_KERNELVER="2.6.18-20.el5.kraxel.6"

But on both x86 and x86_64, I get:

kdump: Cannot load /boot/vmlinuz-2.6.18-20.el5.kraxel.6
kdump: kexec: failed to load kdump kernel
kdump: failed to start up

On both machines, /boot/vmlinuz-2.6.18-20.el5.kraxel.6 and /boot/initrd-2.6.18-20.el5.kraxel.6kdump.img files exist, and I'm setting crashkernel=96M@16M (which works for the non-xen kernels).

> and I'm setting crashkernel=96M@16M (which works for the
> non-xen kernels)

BTW, I left the crashkernel=96M@16M on the vmlinuz line in grub, which I now see is the wrong thing to do, since the kernel logs show the message from parse_cmdline_early():

"Ignoring crashkernel command line, parameter will be supplied by xen"

But moving the crashkernel=96M@16M line to the "/xen.gz-2.6.18-20.el5.kraxel.6" kernel line in grub.conf, at least on x86_64, results in what Jaron reports in BZ #243880 "[RHEL5.1 Xen Kdump] Panic: unable to reserve kdump memory":

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=243880

You'll need to load @32M, not @16M, due to the memory layout of the hypervisor. Beyond that, "sh -x /etc/rc.d/init.d/kdump start" is the easiest way to find out what's going wrong with the kdump script.

Ok, excellent -- thanks, that loads OK on x86_64.
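
For reference, the crashkernel=size@offset value discussed above can be decoded as in the following minimal Python sketch. It is an illustration only: the helper names are hypothetical, and the 32M minimum is simply the figure given in this thread for this hypervisor build, not a general rule.

```python
import re

# Multipliers for the optional K/M/G suffixes.
_UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}
_CRASHKERNEL = re.compile(
    r"(?:^|\s)crashkernel=(\d+)([KMG]?)@(\d+)([KMG]?)(?=\s|$)"
)


def parse_crashkernel(cmdline):
    """Return (size_bytes, offset_bytes) from a boot line, or None."""
    match = _CRASHKERNEL.search(cmdline)
    if match is None:
        return None
    size = int(match.group(1)) * _UNITS[match.group(2)]
    offset = int(match.group(3)) * _UNITS[match.group(4)]
    return size, offset


def offset_at_least(cmdline, minimum=32 << 20):
    """Hypothetical check; the 32M default is taken from this thread."""
    parsed = parse_crashkernel(cmdline)
    return parsed is not None and parsed[1] >= minimum
```

With these helpers, a xen.gz line carrying crashkernel=96M@16M would be flagged, while the same line with @32M would pass.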
So just to be clear, the state of the kexec-tools now is that the crashkernel= line needs to be placed on *both* the xen-gz and vmlinuz lines, because the init.d/kdump script parses /proc/cmdline to get the parameters.

(In reply to comment #18)
> Ok, excellent -- thanks, that loads OK on x86_64.

Likewise here.

> So just to be clear, the state of the kexec-tools now is that the
> crashkernel= line needs to be placed on *both* the xen-gz and vmlinuz
> lines, because the init.d/kdump script parses /proc/cmdline to get
> the parameters.

I believe there was a suggestion to have the kdump initscript parse /proc/iomem instead, not sure if that has been investigated just yet. Neil?

Right, parsing /proc/iomem would be far superior: it's just asking for trouble if we expect to parse both the xen and vmlinuz lines for this info.

Another thing -- has anybody actually been able to analyze the resultant xen vmcores? I was successful in *creating* an x86 xen vmcore:

# strings vmcore | grep "Linux ver"
Linux version 2.6.18-20.el5.kraxel.6xen (root.boston.redhat.com) (gcc version 4.1.1 20070105 (Red Hat 4.1.1-52)) #1 SMP Fri Jun 8 15:43:18 EDT 2007
#

But the vmcore appears to be missing the NT_PRSTATUS, and the xen-specific XEN_ELFNOTE_CRASH_INFO and XEN_ELFNOTE_CRASH_REGS notes sections:

# readelf -a vmcore
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              CORE (Core file)
  Machine:                           Intel 80386
  Version:                           0x1
  Entry point address:               0x0
  Start of program headers:          64 (bytes into file)
  Start of section headers:          0 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         5
  Size of section headers:           0 (bytes)
  Number of section headers:         0
  Section header string table index: 0

There are no sections in this file.

There are no sections in this file.

Program Headers:
  Type  Offset              VirtAddr            PhysAddr            FileSiz             MemSiz              Flags  Align
  NOTE  0x0000000000000158  0x0000000000000000  0x0000000000000000  0x0000000000000000  0x0000000000000000         0
  LOAD  0x0000000000000158  0x00000000c0000000  0x0000000000000000  0x00000000000a0000  0x00000000000a0000  RWE    0
  LOAD  0x00000000000a0158  0x00000000c0100000  0x0000000000100000  0x0000000001f00000  0x0000000001f00000  RWE    0
  LOAD  0x0000000001fa0158  0x00000000c8000000  0x0000000008000000  0x0000000030000000  0x0000000030000000  RWE    0
  LOAD  0x0000000031fa0158  0xffffffffffffffff  0x0000000038000000  0x0000000007ee0000  0x0000000007ee0000  RWE    0

There is no dynamic section in this file.

There are no relocations in this file.

There are no unwind sections in this file.

No version information found in this file.
#

... and so the crash utility cannot handle it, i.e., it doesn't even recognize it as a xen kdump dumpfile.

Here's the output from a sample x86 xen vmcore that I used for development, which was given to me by Magnus Damm. Note the extra xen sections at the end of the readelf output:

# readelf -a \
    vmcore-12733-i386-kexec-tools-testing-b5c22baac1a632363a91da666886bb0ae285bd67
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              CORE (Core file)
  Machine:                           Intel 80386
  Version:                           0x1
  Entry point address:               0x0
  Start of program headers:          64 (bytes into file)
  Start of section headers:          0 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         5
  Size of section headers:           0 (bytes)
  Number of section headers:         0
  Section header string table index: 0

There are no sections in this file.

Program Headers:
  Type  Offset              VirtAddr            PhysAddr            FileSiz             MemSiz              Flags  Align
  NOTE  0x0000000000000158  0x0000000000000000  0x0000000000000000  0x00000000000001bc  0x00000000000001bc         0
  LOAD  0x0000000000000314  0x00000000c0000000  0x0000000000000000  0x00000000000a0000  0x00000000000a0000  RWE    0
  LOAD  0x00000000000a0314  0x00000000c0100000  0x0000000000100000  0x0000000001f00000  0x0000000001f00000  RWE    0
  LOAD  0x0000000001fa0314  0x00000000c6000000  0x0000000006000000  0x0000000032000000  0x0000000032000000  RWE    0
  LOAD  0x0000000033fa0314  0xffffffffffffffff  0x0000000038000000  0x00000000077f0000  0x00000000077f0000  RWE    0

There is no dynamic segment in this file.

There are no relocations in this file.

There are no unwind sections in this file.

No version information found in this file.

Notes at offset 0x00000158 with length 0x000001bc:
  Owner  Data size   Description
  CORE   0x00000090  NT_PRSTATUS (prstatus structure)
  Xen    0x00000010  Unknown note type: (0x01000002)
  Xen    0x00000024  Unknown note type: (0x01000001)
  CORE   0x00000090  NT_PRSTATUS (prstatus structure)
  Xen    0x00000010  Unknown note type: (0x01000002)
#

There's an NT_PRSTATUS note for each of 2 cpus, a single XEN_ELFNOTE_CRASH_INFO (0x01000001) note and two XEN_ELFNOTE_CRASH_REGS (0x01000002) notes, also 1 per cpu. The XEN_ELFNOTE_CRASH_INFO note is what's crucial, as it contains the key to translating the dom0 pfns into the physical memory described by the PT_LOAD segments. My understanding is that those notes get set up at kexec_load time while running in the first kernel, and should be sitting there for the secondary kernel to export in /proc/vmcore.

Looks like we have to pull more xen support bits into kexec-tools. When compiling the xen-tools-testing tree as-is (see comment #1) and using the resulting kexec binary, the generated vmcore actually has the notes. Looking ...

Created attachment 157006 [details]
additional bits for xen support
Tested on i386, will look at x86_64 now, stay tuned ...
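
The note check described earlier in this thread (does the vmcore's PT_NOTE segment carry NT_PRSTATUS, XEN_ELFNOTE_CRASH_INFO and XEN_ELFNOTE_CRASH_REGS?) can be sketched in Python as follows. This is an illustrative sketch, not what crash or readelf actually run: the function names are hypothetical, the input is assumed to be the raw bytes of the PT_NOTE segment already read from the file, and little-endian layout with 4-byte padding is assumed, as on the x86 dumps shown above.

```python
import struct

NT_PRSTATUS = 1
XEN_ELFNOTE_CRASH_INFO = 0x01000001  # values as quoted in this thread
XEN_ELFNOTE_CRASH_REGS = 0x01000002


def _align4(n):
    """Round n up to the next multiple of 4 (note padding)."""
    return (n + 3) & ~3


def build_note(owner, ntype, desc):
    """Encode one note; only used here to make test data."""
    name = owner.encode("ascii") + b"\0"
    header = struct.pack("<III", len(name), len(desc), ntype)
    return (header
            + name.ljust(_align4(len(name)), b"\0")
            + desc.ljust(_align4(len(desc)), b"\0"))


def iter_notes(data):
    """Yield (owner, type, desc) for each note in a PT_NOTE segment."""
    pos = 0
    while pos + 12 <= len(data):
        namesz, descsz, ntype = struct.unpack_from("<III", data, pos)
        pos += 12
        owner = data[pos:pos + namesz].rstrip(b"\0").decode("ascii", "replace")
        pos += _align4(namesz)
        desc = data[pos:pos + descsz]
        pos += _align4(descsz)
        yield owner, ntype, desc


def missing_xen_notes(data):
    """Return the (owner, type) pairs that a xen kdump vmcore lacks."""
    wanted = {
        ("CORE", NT_PRSTATUS),
        ("Xen", XEN_ELFNOTE_CRASH_INFO),
        ("Xen", XEN_ELFNOTE_CRASH_REGS),
    }
    seen = {(owner, ntype) for owner, ntype, _ in iter_notes(data)}
    return sorted(wanted - seen)
```

An empty PT_NOTE segment, like the one with FileSiz 0 in the first readelf output, would report all three note kinds as missing.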
Created attachment 157015 [details]
64bit bits
Gerd, please don't post to bz's after they're in modified state, otherwise I tend to lose track of them (I filter them out in my bz view). If you could open a new bz with these patches, I'd be happy to incorporate them. Thanks!

New bz to track this is bug 244301.

Note that we appear to still need additional patches. Or at least my test boxes do. I can't even get a dump with the non-xen kraxel kernel on two boxes that work fine w/the 5.0GA kernel...

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0548.html