LTC Owner is: jstultz.com
LTC Originator is: ankigarg.com

Problem description:
Kdump kernel panics while loading EDAC modules

Provide output from "uname -a", if possible:
Linux llm39.in.ibm.com 2.6.18-8.el5 #1 SMP Fri Jan 26 14:15:14 EST 2007 x86_64 x86_64 x86_64 GNU/Linux

Hardware Environment
Machine type (p650, x235, SF2, etc.): LS20/LS21
Cpu type (Power4, Power5, IA-64, etc.): Dual Core AMD Opteron(tm) Processor 275

Is this reproducible? Yes.
    service kdump start
    echo c > /proc/sysrq-trigger

---
I posted the attached patch to the edac mailing list. The maintainer, Doug Thompson, has agreed to the fix and will pick it up in the next release of the k8_edac module. The discussion on the mailing list is at:
http://sourceforge.net/mailarchive/forum.php?thread_name=20070424100935.GA3039%40in.ibm.com&forum_name=bluesmoke-devel
Created attachment 155341 [details] Kernel boot log
Created attachment 155342 [details] fix_kdump_panic_k8_edac.patch Fix for kdump panic due to k8_edac modules.
Hmmmm. My tree doesn't have a drivers/edac/k8_edac.c... Am I missing another patch to create k8_edac.c? Clark
changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mannthey.com

------- Additional Comments From mannthey.com (prefers email at kmannth.com) 2007-05-25 14:29 EDT -------
Clark, the k8_edac driver is in RHEL5 only; it is not in mainline. That code currently exists only in the EDAC cvs tree. I have had some issues with the driver that are on my todo list to fix. I will be asking Red Hat to pick up the fixed k8_edac driver as part of our userspace ECC error detection work within the next 2 weeks.
----- Additional Comments From ankigarg.com (prefers email at ankita.com) 2007-06-02 06:19 EDT ------- Clark, Sorry for not making things clear. Here are the details: On RHEL5, the kernel is relocatable. Thus the same kernel is used as the kdump kernel as well. No extra kernel need be shipped. But for -rt, the support for relocatable kernel is not yet in. This would require an extra rpm be shipped for the kdump kernel. But to cut down on the extra effort to build and maintain an additional kernel rpm from RedHat's and our perspective, it would be nice to use RHEL5 kernel itself as the kdump kernel for -rt, made possible because of relocatable kernel support. But, RHEL5 kernel panics on LS20/21 due to EDAC drivers, which as Keith mentioned is not mainline but present in RHEL5. With this patch, RHEL5 kernel would work absolutely fine with -rt as the first kernel. Also, this would fix RHEL5 on LS20/21.
None of this answers the question: *WHY* did the system log a corrupt processor context? According to the docs I have, that is a serious hardware failure event, and it would be nice to know why such an event is "just lying around" when the kdump kernel is loaded.
----- Additional Comments From ankigarg.com (prefers email at ankita.com) 2007-06-05 05:38 EDT ------- There is some description in RedHat Issue Tracker No. 119116. The following is from the findings by Chandru from the IS team. The k8_edac module checks for memory context of the processor. In the kdump kernel, the kernel tries to access memory outside of its own (while copying /proc/vmcore, it accesses the memory of the first kernel, which is outside of what it is allowed). This is reported by the EDAC modules as corrupt memory context and panics. Hope that helps.
This doesn't make any sense. It's at odds with the documentation and at odds with tested behaviour of fault handling on those processors. Accessing the memory of the first kernel shouldn't be causing the CPU to log an unrecoverable CPU error, and if it does, the kernel MCE traps ought to be panicking when it does this. What exactly is being touched when this occurs? Are you erroneously copying I/O mappings and thus confusing hardware (which needs to be fixed properly), or what? What occurs if you check the bit every time you copy a page of the old kernel? What bus address range is triggering the fault, and what is located there?
----- Additional Comments From ankigarg.com (prefers email at ankita.com) 2007-06-05 06:09 EDT ------- Chandru, could you please provide some details on the copying of /proc/vmcore?
----- Additional Comments From chandru.s.com 2007-06-06 05:03 EDT -------
From earlier investigation, edac was looping through the following code every poll_msec interval:
-----------
EDAC DEBUG: do_edac_check()
EDAC DEBUG: check_mc_devices()
EDAC DEBUG: k8_check()
EDAC DEBUG: k8_check()
EDAC DEBUG: do_pci_parity_check()
-----------
Once we attempted to copy /proc/vmcore (via 'cp /proc/vmcore <destination>'), we would succeed in copying a partial vmcore, of a different size on each run (mostly because the poll interval ran out). The loop above would then detect an error condition during the next polling cycle, log the 'GART TLB' and 'processor context corrupt' errors, and call panic. We probably need to find a way to check the condition (regs->nbsh & BIT(25)) every time a page of the old kernel is copied.
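The condition quoted in this comment, `regs->nbsh & BIT(25)`, can be modelled in user space roughly as follows. This is only a sketch: `NBSH_PCC` and `nbsh_pcc_set()` are names invented for illustration and are not taken from the k8_edac source. It takes a raw value of the Northbridge Status High register and reports whether bit 25, the "processor context corrupt" bit discussed in this report, is set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name for bit 25 of the Northbridge Status High register,
 * the bit this report refers to as "processor context corrupt" (PCC). */
#define NBSH_PCC (UINT32_C(1) << 25)

/* Returns true when the PCC bit is set in a raw nbsh value. */
static bool nbsh_pcc_set(uint32_t nbsh)
{
    return (nbsh & NBSH_PCC) != 0;
}
```

Calling this helper after every copied page, rather than once per poll interval, is the kind of check the comment suggests for narrowing down which access latches the bit.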
Or wait twice the poll time after each page during the copy so you know about which page was hit. My guess is still that there are I/O space mappings or ACPI mappings which are being copied and causing the CPU faults. If so these need to be fixed not the edac code.
----- Additional Comments From mannthey.com (prefers email at kmannth.com) 2007-06-06 14:04 EDT -------
Looking into this, I have the following thoughts.

1. We do need to track down why the error is happening. The MCE processor framework thinks something is wrong.
2. Changing this error to a printk is the right thing to do.

As stated on the mailing list, the error being raised is a Processor Context Corrupt. Polling this bit may not make a lot of sense, and a panic is too heavy-handed. Eric Biederman explains the situation best:

"
Re: [PATCH] Fix to make k8_edac kdump aware
From: <ebiederm@xm...> - 2007-05-07 06:24

Doug Thompson <norsk5@ya...> writes:

> In another email from Eric, he did point out that the PCC error really
> does NOT have much information on what it really means. I am leaning to
> just PULL the check of the PCC bit, and thus pull the logging and the
> panic as well.
>
> AMD's BKDG does not give much information on it really.

PCC is extremely well defined. Processor Context Corrupt means that the machine check handler does not have enough information to resume the instruction stream, the exception interrupted. It is part of the generic machine check infrastructure.

However if you are not in a machine check handler PCC is much less meaningful. The notion that you can't return to the interrupted exception stream doesn't mean much when you haven't interrupted an exception stream. All you know is that the error is BAD.

So the ambiguity of PCC comes from the fact that we are polling.

Doug does that make sense?

Eric
"

This change from a panic to a printk has been made in the current edac/bluesmoke cvs tree.
----- Additional Comments From mannthey.com (prefers email at kmannth.com) 2007-06-06 21:26 EDT -------
Hmm, I was able to dd /proc/vmcore without an issue from the kexec kernel context. I was booted to a shell after the panic and did the following:

root:/> dd if=/proc/vmcore of=/dev/null bs=512
16532583+1 records in
16532583+1 records out
root:/> cat /proc/version
Linux version 2.6.18-8.el5 (brewbuilder.redhat.com)

I have tried a few other things but have not been able to recreate the EDAC panic. I am on an LS21. I tried moving the kexec kernel hole to 128M@16M, but kexec fails to load the kernel. How can I recreate this issue?
------- Additional Comments From ankigarg.com (prefers email at ankita.com) 2007-06-07 06:28 EDT -------
(In reply to comment #20)
> I tried moving the kexec kernel hole to 128M@16M but kexec fails to load the
> kernel. How can I recreate this issue?

Keith, to recreate it, instead of dropping into a shell, enable copying of the vmcore to a particular location. For example, you could uncomment the 'path /var/crash/' option in /etc/kdump.conf. Moreover, we want this option to work, as in this case the user would not need to do anything extra to store the dump. I tried the above on an LS21.
changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|block                       |high

------- Additional Comments From dvhltc.com 2007-06-07 19:39 EDT -------
Dropping Severity to high as IBM has a workaround for internal use while it's being worked.
changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P1                          |P2

------- Additional Comments From dvhltc.com 2007-06-07 19:43 EDT -------
Dropping prio to P2 as we have a workaround for now.
Is the attached patch, changing the panic to a printk, the only thing needed here?
------- Additional Comments From ankigarg.com (prefers email at ankita.com) 2007-06-20 07:46 EDT -------
(In reply to comment #29)
> ------- Additional Comments From streeter 2007-06-19 14:46 EST -------
> Is the attached patch, changing the panic to a printk, the only thing needed here?

I have tested it and it seems to fix the issue for me. I have tried it on an LS20 and an LS21.
I don't understand this bug report at all. It appears to be a bug in something that is not in our RT kernel. What do you expect us to do about it?
----- Additional Comments From mannthey.com (prefers email at kmannth.com) 2007-06-20 18:39 EDT ------- I believe this bug was filed to keep track of the issue more than anything else, as we were without a kdump solution for RHEL5-RT. The fix was just to change the panic to a printk, and we believe 2.6.18-23.el5 has the patch we need. We have tested kdump and the EDAC error has been fixed (or masked if you will), so it should be safe to close this bug. There are still outstanding questions about why the error was raised in the first place, but we have been a little scattered about understanding the root cause.
The patch you presented is utterly bogus. You've completely failed to do the necessary trivial debugging to understand why it occurs, and the results could be really messy in future if the bug is (as I suspect) that you are writing out MMIO mapped pages from the old kernel. You've not fixed a bug. The CPU is still reporting you did something terrible and undefined and unsafe. You've papered over it and prayed. That kind of horrible hack doesn't belong in an enterprise grade Linux product. Guy: Please ensure this horrible hack doesn't get into the kernel.
------- Additional Comments From ankigarg.com (prefers email at ankita.com) 2007-06-21 08:01 EDT -------
(In reply to comment #35)
> We have tested kdump and the EDAC error has been fixed (or masked if you will)
> so it should be safe to close this bug.

On most of the machines I tested, the EDAC issue is still seen, so the patch is required. But yes, we could close this particular bug, as the same issue is now being actively tracked in 33374 for inclusion of the patch into plain RHEL5. Keith, I have pinged Red Hat to point me to the sources of the -23 kernel so that I can verify whether the patch is in, as this kernel fails to boot even as the first kernel on many machines.
----- Additional Comments From smaneesh.com (prefers email at maneesh.com) 2007-06-25 08:22 EDT ------- As per the latest email exchanges, it seems that RH is now working around this issue by using the "reset_devices" flag. IIUC, the plan is to check this flag in the edac driver during kdump boot and ignore the error. Meanwhile, I tried the edac driver code from upstream (http://sourceforge.net/projects/bluesmoke/), the edac-2007-may-2 release, on the RHEL5-RT kernel, and I could copy the kdump without any error message or panic. I am looking at the diffs, but a large restructuring seems to have been done. I am currently trying to narrow this down to the minimum set of changes. It would be helpful if an edac expert could also look at these changes.
----- Additional Comments From ankigarg.com (prefers email at ankita.com) 2007-06-25 08:34 EDT ------- Ok, I updated my findings in the wrong bug. So as was suggested, tried kdump with iommu=off and swiotlb=force parameters independently set and also together. But the issue persisted.
Upstream contains the erroneous panic/printk change
From the l/k list it's now looking like the issue is not programs referencing the GART but the fact that GART mappings are in use by existing drivers during the kexec/kdump, and the kdump kernel then invalidates the mappings on them.
----- Additional Comments From ankigarg.com (prefers email at ankita.com) 2007-06-25 08:34 EDT ------- Attaching edac debug messages obtained from the second kernel, before panic is triggered.
Created attachment 157749 [details] dmesg_kdump_kernel_edac_debug_on
----- Additional Comments From ankigarg.com (prefers email at ankita.com) 2007-06-25 08:35 EDT ------- edac_debug_messages_from_second_kernel
------- Additional Comments From ankigarg.com (prefers email at ankita.com) 2007-06-25 08:39 EDT -------
(In reply to comment #38)
> As per the latest email exchanges, it seems that RH is now working around this
> issue by using "reset_devices" flag. IIUC, the plan is to check this flag in
> edac driver while kdump boot and ignore the error.

But this approach has not been ACKed by the community and is indeed a wrong usage of the reset_devices flag. In fact, by converting the panic to a printk, at least we would know that something went amiss; by avoiding the check completely, we would miss that piece of information. Also, I would like to point out that the AMD documentation says that if the 25th bit of the Northbridge Status High register is 1, then the processor context _might_ be corrupted. So it is possible that this is a false alarm. I am investigating further to obtain more information from the EDAC code.
changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P2                          |P1

------- Additional Comments From ankigarg.com (prefers email at ankita.com) 2007-06-25 08:50 EDT -------
Bumping prio as the code freeze is on 27 June.
------- Additional Comments From smaneesh.com (prefers email at maneesh.com) 2007-06-25 12:14 EDT -------
(In reply to comment #43)
> ----- Additional Comments From alan 2007-06-25 08:39 EST -------
> Upstream contains the erroneous panic/printk change

I have checked the source code and also tested the "edac-2007-may-2" release from the sourceforge link I mentioned, and the panic/printk change is _not_ there yet.
----- Additional Comments From mannthey.com (prefers email at kmannth.com) 2007-06-25 13:03 EDT ------- Maneesh, The panic/printk change was in cvs (last week). May 2 is too old. Please see the current cvs tree and the thread from comment #4.
------- Additional Comments From smaneesh.com (prefers email at maneesh.com) 2007-06-25 14:03 EDT -------
(In reply to comment #47)
> Maneesh,
> The panic/printk change was in cvs (last week). May 2 is too old. Please see
> the current cvs tree and the thread from comment #4.

Keith, agreed, but the May 2 release is more recent than the RHEL5 level and does work well. I picked the edac-2007-may-2 release because it is the last stable edac release on the sourceforge site.
----- Additional Comments From smaneesh.com (prefers email at maneesh.com) 2007-06-26 13:27 EDT -------
I could narrow it down to one patch from the edac-2007-may-2 release, driver-edac-add-nmi.patch, which made the difference. But close analysis of the code revealed that the error condition is bypassed there as well. The patch makes the main error-checking loop in the edac_kernel_thread() routine conditional:

    if(edac_assert_error_check_and_clear())
        do_edac_check();

The code for enabling the assert is missing for the K8 chipset, so the assert never fires. IOW, there is no point backporting the upstream edac code. Ankita, any more information from your debugging?
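To make the effect of that conditional concrete, here is a simplified user-space model. It is a sketch under assumptions, not the upstream code: only `edac_assert_error_check_and_clear()` and `do_edac_check()` are names from the comment above, while `edac_assert_error()`, `edac_poll_once()`, and the plain (non-atomic) flag are hypothetical simplifications. It shows that the full check runs only after something has raised the assert.

```c
#include <stdbool.h>

/* Simplified, non-atomic stand-in for the "error asserted" state. */
static bool error_asserted;
/* Counts how many times the full check actually ran. */
static int checks_run;

/* Hypothetical helper: what chipset code would call to enable the
 * assert. According to the comment above, K8 has no such caller. */
static void edac_assert_error(void)
{
    error_asserted = true;
}

/* Test-and-clear: reports whether an error was asserted since the
 * previous call, then resets the flag. */
static bool edac_assert_error_check_and_clear(void)
{
    bool was_set = error_asserted;
    error_asserted = false;
    return was_set;
}

/* Stand-in for the real status check; here it only counts calls. */
static void do_edac_check(void)
{
    checks_run++;
}

/* One iteration of the polling thread body, as quoted above. */
static void edac_poll_once(void)
{
    if (edac_assert_error_check_and_clear())
        do_edac_check();
}
```

In this model, calling `edac_poll_once()` any number of times without a prior `edac_assert_error()` leaves `checks_run` at zero, which mirrors why the panic disappears with that release without the underlying condition having been addressed.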
------- Additional Comments From ankigarg.com (prefers email at ankita.com) 2007-06-26 14:15 EDT -------
(In reply to comment #49)
> I could narrow it down to one patch, driver-edac-add-nmi.patch from
> edac-2007-may-2 release which made the difference. But close analysis of the
> code revealed that the error condition was bypassed there also. It has made main
> error checking loop as conditional, in edac_kernel_thread() routine.
>
> if(edac_assert_error_check_and_clear())
> do_edac_check();
>
> And the code for enabling the assert is missing for K8 chipset. So, the assert
> never gets fired. IOW, no point backporting upstream edac code.

Yeah, so here also we are trying to bypass the checking!

> Ankita, any more information from your debugging?

I tried printing the addresses that the copy of the vmcore file was accessing. At the point the EDAC error showed up, the addresses were well within the System RAM range.
----- Additional Comments From ankigarg.com (prefers email at ankita.com) 2007-06-26 14:28 EDT ------- Came across a few relevant commandline options which I need to try next.
----- Additional Comments From ankigarg.com (prefers email at ankita.com) 2007-06-27 07:14 EDT ------- Since the panic shows up only while reading the vmcore file, I am trying to get more information on the addresses accessed. Just printing the address might miss the offending one, so I am using other aids to do so.
----- Additional Comments From ankigarg.com (prefers email at ankita.com) 2007-06-27 09:41 EDT -------
OK, so here is some debug data that I managed to collect. While the vmcore file is being copied, as each page is accessed, I call the do_edac_check routine to perform a status check and also save the page address into a global variable. When the panic situation is reported, the global address value is printed. Also pasted is the output of /proc/iomem and /proc/meminfo from the first kernel.

[root@llm38 ~]# cat ~ankita/latest_iomem
00000000-0009d3ff : System RAM
0009d400-0009ffff : reserved
000a0000-000bffff : Video RAM area
000c0000-000c8fff : Video ROM
000c9000-000ca5ff : Adapter ROM
000f0000-000fffff : System ROM
00100000-edfcddbf : System RAM
  00200000-0045a997 : Kernel code
  0045a998-0059052f : Kernel data
  01000000-08ffffff : Crash kernel
edfcddc0-edfcffff : ACPI Tables
edfd0000-edffffff : reserved
ee000000-efffffff : PCI Bus #02
  effa0000-effbffff : 0000:02:02.0
  effc0000-effdffff : 0000:02:02.0
  effe0000-effeffff : 0000:02:01.1
    effe0000-effeffff : tg3
  efff0000-efffffff : 0000:02:01.0
    efff0000-efffffff : tg3
f0000000-fcffffff : PCI Bus #01
  f0000000-f7ffffff : 0000:01:04.0
  f8000000-f801ffff : 0000:01:04.0
fd000000-feafffff : PCI Bus #01
  feae0000-feaeffff : 0000:01:04.0
  feafe000-feafefff : 0000:01:00.1
    feafe000-feafefff : ohci_hcd
  feaff000-feafffff : 0000:01:00.0
    feaff000-feafffff : ohci_hcd
feb00000-febfffff : PCI Bus #02
  feb00000-febfffff : 0000:02:02.0
fec00000-ffffffff : reserved
100000000-151ffffff : System RAM

[root@llm38 ~]# cat ~ankita/latest_meminfo
MemTotal: 4950964 kB
MemFree: 4693984 kB
Buffers: 15168 kB
Cached: 169424 kB
SwapCached: 0 kB
Active: 62376 kB
Inactive: 157244 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 4950964 kB
LowFree: 4693984 kB
SwapTotal: 2040244 kB
SwapFree: 2040244 kB
Dirty: 168 kB
Writeback: 0 kB
AnonPages: 35008 kB
Mapped: 9120 kB
Slab: 17104 kB
PageTables: 3516 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
CommitLimit: 4515724 kB
Committed_AS: 76636 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 1360 kB
VmallocChunk: 34359736783 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
Hugepagesize: 2048 kB

Data regarding the page addresses is attached. It indicates 32 contiguous page accesses.
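The instrumentation described in this comment (remember the page being copied in a global, then run the status check) can be modelled in user space as below. This is a hedged sketch, not the actual debug patch: `copy_pages_checked()`, `read_nbsh_fn`, `last_page_addr`, and the fake status source are all hypothetical names, and the real page copy is reduced to a comment.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit 25 of the Northbridge Status High register (hypothetical name). */
#define NBSH_PCC (UINT32_C(1) << 25)

/* Hypothetical callback type: returns the current raw nbsh value. */
typedef uint32_t (*read_nbsh_fn)(void);

/* Global updated before each page is copied, as in the comment above. */
static uint64_t last_page_addr;

/* Walks the page list; for each page, records its address, "copies" it,
 * then polls the status. Returns true as soon as PCC is seen, leaving
 * last_page_addr at the page that was being handled at that moment. */
static bool copy_pages_checked(const uint64_t *pages, size_t n,
                               read_nbsh_fn read_nbsh)
{
    for (size_t i = 0; i < n; i++) {
        last_page_addr = pages[i];
        /* The real copy of one old-kernel page would happen here. */
        if (read_nbsh() & NBSH_PCC)
            return true;
    }
    return false;
}

/* Fake status source for trying the sketch: pretends the error is
 * latched on the third status read. */
static int fake_reads;
static uint32_t fake_read_nbsh(void)
{
    return (++fake_reads >= 3) ? UINT32_C(0xa6000002) : 0;
}
```

Checking after every page trades speed for precision: the address left in the global is then at most one page away from the access that coincided with the error, instead of anywhere within a whole poll interval.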
Created attachment 158007 [details] llm38_edac_addr_printk_2.log
----- Additional Comments From ankigarg.com (prefers email at ankita.com) 2007-06-27 09:44 EDT -------
Addresses of the vmcore page accesses that result in the edac error. The format is:

NorthBridge ERROR: mci(0xffff810008bf4000) node(1) ErrAddr(0x00000000-37c90008) nbsh(0xa6000002) nbsl(0x0005001b)
EDAC k8 MC1: GART TLB errorr: transaction type(generic), cache level(generic)
EDAC k8 MC1: extended error code: GART error
MC1: processor context corrupt
the addr value is : 201337056
                    ^^^^^^^^^ is the page address.
----- Additional Comments From ankigarg.com (prefers email at ankita.com) 2007-06-28 01:08 EDT ------- From the log messages, the addresses being accessed while reading vmcore are within the System RAM areas from the first kernel.
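A check of this kind can be sketched as follows. The three ranges are the top-level "System RAM" entries from the /proc/iomem listing earlier in this report; `struct mem_range` and `in_system_ram()` are hypothetical names, and treating the logged values as physical addresses is an assumption made only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct mem_range {
    uint64_t start;
    uint64_t end;   /* inclusive, as printed by /proc/iomem */
};

/* Top-level "System RAM" entries copied from the first kernel's
 * /proc/iomem output in this report. */
static const struct mem_range system_ram[] = {
    { 0x00000000ULL,  0x0009d3ffULL  },
    { 0x00100000ULL,  0xedfcddbfULL  },
    { 0x100000000ULL, 0x151ffffffULL },
};

/* Returns true when addr falls inside one of the System RAM ranges. */
static bool in_system_ram(uint64_t addr)
{
    for (size_t i = 0; i < sizeof(system_ram) / sizeof(system_ram[0]); i++) {
        if (addr >= system_ram[i].start && addr <= system_ram[i].end)
            return true;
    }
    return false;
}
```

For example, the value 201337056 printed in the debug log and the ErrAddr 0x37c90008 both fall in the second range, while an address such as 0xee000000 (the start of a PCI bus window in the listing) does not.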
----- Additional Comments From ankigarg.com (prefers email at ankita.com) 2007-07-02 01:31 EDT -------
Here is the thread that was started on the EDAC mailing list to get some information on the possible causes of processor context corruption in the kdump context.

--- Ankita Garg <ankita.com> wrote:

> On Wed, Jun 27, 2007 at 10:39:40AM -0700, Doug Thompson wrote:
> > Hi,
> >
> > Yes, we had a conversation this issue.
> > The panic for a MCC error has been removed/changed to a warning instead.
> >
>
> Thanks Doug for your response. Yes, I am aware that the panic call has been changed to
> a warning message in the k8 module when checking for the 25th bit.

ok, great

> But at this
> time we want to debug this further from kdump perspective, to try and
> see if it is really doing something that it should not, for e.g, trying
> to map pages non-RAM pages (IOMMU pages, etc). For this, it would help
> if someone could point out some example scenarios that might lead to the
> hardware setting the 25th bit of the Northbridge Satus High register,
> indicating the context corruption.

like other "bits", there is NOT much doc on its semantics, as you probably know. While at Linux Networx, we would get some PCC events during system burnin, but it was infrequent and we had no known mechanism to trigger this event.

Are you at OLS this week? I attended Vivek's paper presentation this morning, friday

If you are or one of your guys, we can meet. Yet Eric B knows just as much as well, but we can discuss options

doug t

>
> This could provide us a good starting point.
>
> > With Kdump, we think the changing of the kernels (via kexec) sets/alters/mods the bit.
> > so we concluded to remove that panic check from the k8 module.
> >
> > If you are building from source, go and comment out that panic call when the bit 25 is
> > checked.
> >
> > doug t
> >
> >
> > --- Ankita Garg <ankita.com> wrote:
> >
> > > Hi all,
> > >
> > > When trying to copy the vmcore file while in the kdump kernel, I hit the
> > > following panic:
> > >
> > > NorthBridge ERROR: mci(0xffff8100086cf000) node(0)
> > > ErrAddr(0x00000000-37f00030) nbsh(0xa6000001) nbsl(0x0005001b)
> > > EDAC k8 MC0: GART TLB errorr: transaction type(generic), cache
> > > level(generic)
> > > EDAC k8 MC0: extended error code: GART error
> > > Kernel panic - not syncing: MC0: processor context corrupt
> > >
> > > AMD documentation mentions that the 25th bit of the Northbridge Status High Register
> > > indicates a probable processor context corruption.
> > >
> > > Could someone please provide some information on the possible reasons of
> > > processor context getting corrupt ? in general or in the kdump scenario?
> >
>
> --
> Regards,
> Ankita Garg (ankita.com)
> Linux Technology Center
> IBM India Systems & Technology Labs,
> Bangalore, India
>
The patch in bug #237950 comment #38 appears to be what we will be putting into RHEL5.1. This is also being worked upstream (primarily by IBM), so a better approach may come from that. Currently this is closed/notabug, which is wrong, so I am reopening.
----- Additional Comments From ankigarg.com (prefers email at ankita.com) 2007-07-03 14:09 EDT ------- Trying the approach of shutting down GART when doing kexec (patch for this has been posted on LKML: http://lkml.org/lkml/2007/6/25/242). For kdump kernel, the shutdown action needs to be performed in the second kernel. Also, the second kernel uses swiotlb to use software iommu.
----- Additional Comments From ankigarg.com (prefers email at ankita.com) 2007-07-31 10:02 EDT ------- The fix is in the RHEL5 U1 kernel. Verified that the kdump kernel no longer panics on our hardware.
------- Comment From ankigarg.com 2007-08-20 08:00 EDT------- Will verify with an LS21 and confirm. After which we could close this bug.
Verified that the patch in bug #237950 comment #38 is in RHEL5.1 kernel. But the error still persists as this patch requires the kdump kernel to be passed the 'reset_devices=1' parameter. Found that RHEL5.1 kexec-tools rpm does not pass this parameter in /etc/sysconfig/kdump file. This needs to be fixed before we can close this bug. Sample patch attached.
------- Comment From ankigarg.com 2007-10-05 01:04 EDT------- Updated in RH bugzilla: Comment #51 From Ankita Garg (ankita.com) on 2007-10-05 01:03 EST [reply]
Created attachment 217061 [details] Pass reset_devices=1 parameter to kdump kernel
Is there a reason to universally set reset_devices, or should it simply be documented as necessary for some controllers?
------- Comment From ankigarg.com 2007-10-08 07:17 EDT------- It has been agreed in mainline to use reset_devices flag in the kdump kernel for signaling the various devices of the context and to reset accordingly. Currently, aacraid driver is using this flag (in current mainline) and more drivers are expected to use it in future. Besides, for EDAC driver, we need this flag being passed to the kdump kernel.
I believe Neil Horman will be adding the reset_devices argument to the command line argument in /etc/sysconfig/kdump by default, but I note that it doesn't appear to be there in the version to be released with the RHEL5.1 kexec-tools user-package errata. Perhaps it's queued for RHEL5.2? But I've added Neil to the cc: list for his take on the matter.
I do have it queued for 5.2, yes. If you need to do it in the interim, you can use the KDUMP_COMMANDLINE_APPEND variable in /etc/sysconfig/kdump to get it in place.
------- Comment From ankigarg.com 2007-10-22 04:12 EDT------- Neil, on our LS20/LS21 we are currently editing the /etc/sysconfig/kdump file to pass the flag. But wouldn't we need to document that this flag needs to be passed on x86_64 systems? Thanks, Ankita
This has been added to the online Release Notes and kdump/kexec HowTo.
------- Comment From ankigarg.com 2008-02-11 06:06 EDT------- Sripathi, I am not yet sure if we can close this bug. In order to resolve this issue, besides the kernel patch, our build system was modified to pass certain kernel parameters to the kdump kernel. I will need to confirm whether we do so from the RHEL5RT setup.
------- Comment From ankigarg.com 2008-02-11 06:56 EDT------- From comment #69, it looks like this has been documented in the kexec/kdump release notes. So the user will need to manually edit the /etc/sysconfig/kdump file to pass the reset_devices parameter to the kdump kernel. The only other thing left is to try kdump on the latest RHEL5RT source and verify that it works on the LS21. Will test and confirm.