Bug 588609
Summary: | To kexec the pxeboot image can cause system hang. | | |
---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Kirby Zhou <kirbyzhou> |
Component: | kexec-tools | Assignee: | Neil Horman <nhorman> |
Status: | CLOSED NOTABUG | QA Contact: | Red Hat Kernel QE team <kernel-qe> |
Severity: | medium | Priority: | low |
Version: | 6.0 | CC: | amwang |
Target Milestone: | rc | Doc Type: | Bug Fix |
Hardware: | All | OS: | Linux |
Last Closed: | 2010-07-14 12:14:29 UTC | | |
Description
Kirby Zhou
2010-05-04 04:44:21 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux major release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux major release. This request is not yet committed for inclusion.

This may be a duplicate of another bug I have. How are you crashing the kernel after you load the panic kernel? Also, why are you manually running kexec rather than using the service script? Can you provide a full console log of the reproducing environment during a crash? Thanks!

I do not know what you mean by "using the service script". I just want to use kexec to reinstall a RHEL6 system through an ssh connection, but it failed. To find the reason, I tested the simplest situation: on a brand-new RHEL6 installation (x86_64), just run kexec with the simplest arguments and watch it from the console. The system is successfully shut down by kexec, but hangs after the message "Starting the new kernel".

OK, so you're not using kexec as part of the kdump service. In that case can you try loading the kdump kernel first (via kexec -l), then running kexec -e to execute it? Also, if you're just trying to get a RHEL6 system installed over ssh, there are far simpler ways to do it. Why not just proxy a vnc session over an ssh tunnel?

> Also, if you're just trying to get a RHEL6 system installed over ssh, there are far simpler ways to do it. Why not just proxy a vnc session over an ssh tunnel?

I think you misunderstand my requirements. I want to automatically reinstall a system WITHOUT touching the keyboard, the monitor, or the CD-ROM drive of the PC, and without touching any part of the DHCP system. Starting a vnc session is easy; the difficulty is how to boot the installer.

Ah, I see, you're trying to pxeboot a system from a running kernel to re-install the system. I get it now.
My previous suggestion still stands though: please try using the kexec -l command to load the kernel, then follow it with kexec -e to actually execute the new kernel.

'kexec -l' which kernel? The pxeboot kernel or something else?

kexec -l vmlinuz --initrd=initrd.img
kexec -e

Just a "Starting the new kernel", then a hang. This time, the system is not automatically shut down before "Starting the new kernel".

OK, can you try attaching a serial console or a monitor and enabling earlyprintk in the kernel (http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=Documentation/kernel-parameters.txt;h=839b21b0699ac10a1991c47455cddd05ced6491b;hb=HEAD)? That will give us a better idea of how far it's getting during the boot process. Also, if you're trying this from an X session, make sure that you're just on a VT so that we can be sure the video mode is such that you can get console output on your monitor.

I do not use X.

[root@djt-18-97 ~]# cat /proc/cmdline
ro root=/dev/mapper/vgroot-lvroot rd_LVM_LV=vgroot/lvroot rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb quiet earlyprintk=vga
[root@djt-18-97 ~]# kexec --initrd=/boot/install/initrd.img -l /boot/install/vmlinuz --command-line='earlyprintk=vga'
[root@djt-18-97 ~]# kexec -e

It waits for 1 minute with nothing output to the screen, then prints "Starting the new kernel" and HANGS.

That means you're not getting into the new kernel at all. The system is hanging sometime prior to jumping into the new kernel. I wonder if the problem here isn't that we're loading the kdump kernel into a location greater than 4GB. Given that you have 48GB of RAM, I can imagine that is entirely possible. And since you're using the facility directly, instead of for a crash kernel (as it's nominally used), I would not be at all surprised if that's happening. Unfortunately, it's difficult to tell with -l where the kernel / initramfs will reside because the memory is allocated dynamically.
And if that's the problem I'm not sure there's anything we'll be able to do about it. The test for this should be pretty straightforward. You should try either of the following:

1) boot the system with mem=4G, and attempt your test above

OR

1) boot with this on the command line: crashkernel=128M@1G
2) load the pxeboot image using the -p option rather than the -l option
3) crash the kernel with this command: echo c > /proc/sysrq-trigger

I would prefer the second method as it guarantees the placement of the kernel and initramfs in memory. But either should work. If you run these tests and you manage to boot your pxe kernel, that would indicate that straightforward use of the kexec utility on this system is placing the kernel in a bad location.

Failed with method 2. 128M@1G failed; 256M@1G also failed with the same message.

]# kexec --initrd=/boot/install/initrd.img -p /boot/install/vmlinuz --command-line='earlyprintk=vga'
Could not find a free area of memory of 395000 bytes...
]# dmesg | fgrep crash
Reserving 256MB of memory at 1024MB for crashkernel (System RAM: 51200MB)
Kernel command line: ro root=/dev/mapper/vgroot-lvroot LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb quiet earlyprintk=vga crashkernel=256M@1G
crash memory driver: version 1.0
]# free
             total       used       free     shared    buffers     cached
Mem:      49290352    1451012   47839340          0      58816     278200
-/+ buffers/cache:    1113996   48176356
Swap:      8388600          0    8388600

Sorry, I forgot this system's profile; you'll actually need to reserve a larger amount of RAM (more along the lines of 512M-1024M) to get all the space you need.

]# kexec --initrd=/boot/install/initrd.img -p /boot/install/vmlinuz --command-line='earlyprintk=vga'
Could not find a free area of memory of 395000 bytes...
]# cat /proc/cmdline
ro root=/dev/mapper/vgroot-lvroot LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb quiet earlyprintk=vga crashkernel=768M@1G mem=3500M

(In reply to comment #15)
> ]# kexec --initrd=/boot/install/initrd.img -p /boot/install/vmlinuz --command-line='earlyprintk=vga'
> Could not find a free area of memory of 395000 bytes...
> ]# cat /proc/cmdline
> ro root=/dev/mapper/vgroot-lvroot LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb quiet earlyprintk=vga crashkernel=768M@1G mem=3500M

What does your /proc/iomem look like? Perhaps the crashkernel memory reservation failed this time.

Without crashkernel= and mem=
I think there is no problem with 768M@1G:

00100000-7d90bfff : System RAM
01000000-014c9f4e : Kernel code
014c9f4f-01899b2f : Kernel data
019c2000-01c1bd17 : Kernel bss

]# cat /proc/iomem
00000000-0009c3ff : System RAM
0009c400-0009ffff : reserved
000e0000-000fffff : reserved
00100000-7d90bfff : System RAM
01000000-014c9f4e : Kernel code
014c9f4f-01899b2f : Kernel data
019c2000-01c1bd17 : Kernel bss
7d90c000-7d9bbfff : reserved
7d9bc000-7f68efff : System RAM
7f68f000-7f6defff : reserved
7f6df000-7f7defff : ACPI Non-volatile Storage
7f7df000-7f7fefff : ACPI Tables
7f7ff000-7f7fffff : System RAM
7f800000-8fffffff : reserved
80000000-8fffffff : PCI MMCONFIG 0 [00-ff]
80000000-8fffffff : pnp 00:0a
90000000-900fffff : PCI Bus 0000:1a
90000000-9003ffff : 0000:1a:00.0
90100000-902fffff : PCI Bus 0000:01
90300000-904fffff : PCI Bus 0000:01
92000000-95ffffff : PCI Bus 0000:0b
92000000-93ffffff : 0000:0b:00.0
92000000-93ffffff : bnx2
94000000-95ffffff : 0000:0b:00.1
94000000-95ffffff : bnx2
96000000-96ffffff : PCI Bus 0000:06
96000000-96ffffff : PCI Bus 0000:07
96000000-96ffffff : 0000:07:00.0
96000000-96ffffff : matroxfb FB
97000000-978fffff : PCI Bus 0000:06
97000000-978fffff : PCI Bus 0000:07
97000000-977fffff : 0000:07:00.0
97800000-97803fff : 0000:07:00.0
97800000-97803fff : matroxfb MMIO
97900000-979fffff : PCI Bus 0000:1a
97900000-9793ffff : 0000:1a:00.0
97900000-9793ffff : megasas: LSI
97940000-97943fff : 0000:1a:00.0
97940000-97943fff : megasas: LSI
97a00000-97a03fff : 0000:00:16.0
97a00000-97a03fff : ioatdma
97a04000-97a07fff : 0000:00:16.1
97a04000-97a07fff : ioatdma
97a08000-97a0bfff : 0000:00:16.2
97a08000-97a0bfff : ioatdma
97a0c000-97a0ffff : 0000:00:16.3
97a0c000-97a0ffff : ioatdma
97a10000-97a13fff : 0000:00:16.4
97a10000-97a13fff : ioatdma
97a14000-97a17fff : 0000:00:16.5
97a14000-97a17fff : ioatdma
97a18000-97a1bfff : 0000:00:16.6
97a18000-97a1bfff : ioatdma
97a1c000-97a1ffff : 0000:00:16.7
97a1c000-97a1ffff : ioatdma
97a21000-97a213ff : 0000:00:1d.7
97a21000-97a213ff : ehci_hcd
97a21400-97a217ff : 0000:00:1a.7
97a21400-97a217ff : ehci_hcd
97a21800-97a218ff : 0000:00:1f.3
fc000000-fcffffff : pnp 00:0a
fe710000-fe711fff : pnp 00:0a
fe800000-fe9fffff : pnp 00:0a
fea00000-feafffff : pnp 00:0a
feb00000-febfffff : pnp 00:0a
fec00000-fec00fff : IOAPIC 0
fec80000-fec80fff : IOAPIC 1
fed00000-fed003ff : HPET 0
fed00000-fed003ff : pnp 00:05
fed1c000-fed1ffff : reserved
fed1c000-fed1ffff : pnp 00:0a
fee00000-feefffff : pnp 00:0a
fee00000-fee00fff : Local APIC
ff800000-ffffffff : reserved
ffc00000-ffffffff : pnp 00:0a
100000000-c7fffffff : System RAM

(In reply to comment #17)
> Without crashkernel= and mem=
> I think there is no problem with 768M@1G
>
> ]# cat /proc/iomem
>
> 00000000-0009c3ff : System RAM
> 0009c400-0009ffff : reserved
> 000e0000-000fffff : reserved
> 00100000-7d90bfff : System RAM
> 01000000-014c9f4e : Kernel code
> 014c9f4f-01899b2f : Kernel data
> 019c2000-01c1bd17 : Kernel bss
> 7d90c000-7d9bbfff : reserved
> 7d9bc000-7f68efff : System RAM

28M here.

> 7f7ff000-7f7fffff : System RAM

Only 4K here.

> 100000000-c7fffffff : System RAM

46G@4G here.

So probably kexec loaded the kernel into the last memory area, which is @4G.
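As a cross-check on the region sizes discussed in this comment, here is a minimal Python sketch (not part of the original thread) that computes the size of each System RAM range in a /proc/iomem listing and flags the ranges that start at or above 4 GiB. The sample text is copied from the listing above; the function names `system_ram_regions` and `human` are illustrative assumptions.

```python
import re

# System RAM lines copied from the /proc/iomem listing in this thread.
IOMEM_SAMPLE = """\
00000000-0009c3ff : System RAM
00100000-7d90bfff : System RAM
7d9bc000-7f68efff : System RAM
7f7ff000-7f7fffff : System RAM
100000000-c7fffffff : System RAM
"""

# "<start>-<end> : <name>" with hexadecimal addresses, as printed by /proc/iomem.
LINE_RE = re.compile(r"^\s*([0-9a-fA-F]+)-([0-9a-fA-F]+)\s*:\s*(.+?)\s*$")

FOUR_GIB = 4 * 2**30


def system_ram_regions(iomem_text):
    """Return (start, end, size_in_bytes) for every 'System RAM' line."""
    regions = []
    for line in iomem_text.splitlines():
        match = LINE_RE.match(line)
        if match and match.group(3) == "System RAM":
            start = int(match.group(1), 16)
            end = int(match.group(2), 16)
            # The end address is inclusive, hence the +1.
            regions.append((start, end, end - start + 1))
    return regions


def human(size):
    """Format a byte count using binary units."""
    for unit, factor in (("GiB", 2**30), ("MiB", 2**20), ("KiB", 2**10)):
        if size >= factor:
            return f"{size / factor:.2f} {unit}"
    return f"{size} B"


for start, end, size in system_ram_regions(IOMEM_SAMPLE):
    where = "above 4 GiB" if start >= FOUR_GIB else "below 4 GiB"
    print(f"{start:#x}-{end:#x}: {human(size)} ({where})")
```

On a live system the same function could be fed the contents of /proc/iomem instead of the sample text, although on recent kernels the addresses may read as zero without root privileges.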
I think this is similar to Bug 580843; currently there are some problems if we load the kernel into memory higher than 4G. Vitaly is working on this.

Note, comment #18 is for your 'kexec -l' case. For the 'kexec -p' case, that should be a bug in kexec. I can reproduce it here; 'service kdump start' failed too. I am checking the code.

Hi, does appending "nomodeset" to your kernel command line help? I noticed KMS doesn't work with kdump.

Created attachment 417292 [details]
nomodeset does not help.
~]# cat /proc/cmdline
ro root=/dev/mapper/vgroot-lvroot LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb quiet earlyprintk=vga nomodeset
]# kexec --initrd=/boot/install/initrd.img /boot/install/vmlinuz --command-line='earlyprintk=vga nomodeset'
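The load-then-execute sequence suggested earlier in this thread (kexec -l followed by kexec -e) can be sketched in Python as two argument lists. This is only an illustration, not a recommended tool: the helper names are hypothetical, the paths are the ones used in the thread, and the sketch prints the commands rather than running them, since kexec -e replaces the running kernel.

```python
import shlex


def kexec_load_argv(kernel, initrd, cmdline):
    """Build the argv for staging a kernel with 'kexec -l' (hypothetical helper)."""
    return [
        "kexec",
        "-l",
        kernel,
        f"--initrd={initrd}",
        f"--command-line={cmdline}",
    ]


def kexec_exec_argv():
    """Build the argv for jumping into the previously loaded kernel."""
    return ["kexec", "-e"]


# Paths and command line taken from the reporter's test in this thread.
load = kexec_load_argv(
    "/boot/install/vmlinuz",
    "/boot/install/initrd.img",
    "earlyprintk=vga",
)

# Print the two steps in order; nothing is executed here.
for argv in (load, kexec_exec_argv()):
    print(shlex.join(argv))
```

Keeping the two steps separate makes it possible to confirm that the load succeeded before committing to the jump into the new kernel.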
Can you try reserving the memory without specifying a location? That should give the kernel the freedom to find a sufficiently sized hole.

It seems workable now.

#] kexec --initrd=/boot/initramfs-2.6.32-37.el6.x86_64.img --command-line='ro root=/dev/mapper/vgroot-lvroot rd_LVM_LV=vgroot/lvroot rd_LVM_LV=vgroot/lvswap rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us crashkernel=auto rhgb quiet' /boot/vmlinuz-2.6.32-37.el6.x86_64

After a short while, the system is up again.

OK, I expect you are somehow trampling on system space that the kernel or hardware is using for something else (usually ACPI space). Since we don't really use the precise location syntax anymore for just this reason, I think we can close this as working.
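To illustrate the difference between the fixed-location form (crashkernel=768M@1G) and the location-free form discussed above, here is a rough Python sketch that parses the simple `size` and `size@offset` forms and checks whether a fixed reservation lies inside a System RAM range. It is a sanity check under stated assumptions, not how the kernel actually decides: the function names are hypothetical, the RAM ranges are the two largest ones from the /proc/iomem listing in this thread, and other forms such as crashkernel=auto are not handled.

```python
import re

UNITS = {"": 1, "K": 2**10, "M": 2**20, "G": 2**30}


def parse_size(text):
    """Parse strings such as '768M' or '1G' into a byte count."""
    match = re.fullmatch(r"(\d+)([KMG]?)", text.strip().upper())
    if match is None:
        raise ValueError(f"unsupported size: {text!r}")
    return int(match.group(1)) * UNITS[match.group(2)]


def parse_crashkernel(spec):
    """Return (size, offset); offset is None when no location is given."""
    size_text, _, offset_text = spec.partition("@")
    offset = parse_size(offset_text) if offset_text else None
    return parse_size(size_text), offset


def fits_in_ram(size, offset, ram_ranges):
    """True if [offset, offset + size) lies entirely within one RAM range."""
    last = offset + size - 1
    return any(start <= offset and last <= end for start, end in ram_ranges)


# Two System RAM ranges (inclusive) taken from the listing in this thread.
RAM_RANGES = [
    (0x00100000, 0x7D90BFFF),
    (0x100000000, 0xC7FFFFFFF),
]

for spec in ("128M@1G", "768M@1G", "512M"):
    size, offset = parse_crashkernel(spec)
    if offset is None:
        print(f"{spec}: no fixed location, the kernel chooses the placement")
    else:
        print(f"{spec}: inside System RAM = {fits_in_ram(size, offset, RAM_RANGES)}")
```

A reservation that passes this check can still collide with memory the firmware or kernel uses for other purposes, which is the situation the closing comment describes.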