Bug 458013
Summary: | Kdump on Dom0 Failed with CCISS | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | Qian Cai <qcai> |
Component: | kexec-tools | Assignee: | Neil Horman <nhorman> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Red Hat Kernel QE team <kernel-qe> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 5.2 | CC: | coldwell, coughlan, dchapman, mike.miller, thenzl |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2008-09-23 10:42:01 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Attachments: |
Description
Qian Cai
2008-08-06 05:43:26 UTC
I wonder if this has something to do with bz 230717. Cai, can you try this with kernel 2.6.18-90.el5? That should not have the patch for 230717 in place. Thanks!

Created attachment 313610
Kdump on Dom0 worked correctly with .88.el5 Kernel
From earlier testing logs, I found that it had worked fine with the .88.el5 kernel.
Tomas, given that you were the Red Hat technical contact on bz 230717, can you shed any light on what's going on here?

(In reply to comment #3)
> Tomas, given that you were the Red Hat technical contact on bz 230717, can you
> shed any light on whats going on here?

I don't know where the difference could be. I'm going to set up my test box with a xen kernel to see if I'm able to verify it and then add some other debug messages.

Thank you, let me know what you find.

On my local machine, which is not MSI-X capable, it works, but it is unreliable: I noticed once that the vmcore was not created even though the kdump kernel booted. I'll continue tomorrow.

Thanks, let us know what you find.

Neil, the message "cciss: resetting MSI-X" is not shown because the test (control & PCI_MSIX_FLAGS_ENABLE) in the xen kernel is zero (see code below). Could this be caused by the fact that the xen kernel/hypervisor is not using MSI-X even though it is possible? On this box almost every test passed; only once was the vmcore created with the name "vmcore-incomplete" (with the right size). I'm using kernel 2.6.18-92.1.10.el5xen, kdump kernel 2.6.18-92.1.10.el5, and kexec-tools 1.102pre.

    pos = pci_find_capability(pdev, PCI_CAP_ID_MSIX);
    if (pos) {
        pci_read_config_word(pdev, msi_control_reg(pos), &control);
        if (control & PCI_MSIX_FLAGS_ENABLE) {
            printk(KERN_INFO "cciss: resetting MSI-X\n");
            /* ... rest of the reset path not quoted ... */
        }
    }

I honestly don't know. If you look earlier in the logs, assign_interrupt_mode seems to indicate that we can use MSI features in the dom0 kernel. As to why cciss isn't detecting MSI capabilities in the production xen kernel, I'm not sure, but should that really matter? If we reboot into a non-xen kernel and all of a sudden the cciss driver detects that it can use MSI, it should be able to reset it without oopsing or deadlocking, shouldn't it? Or am I missing something?

Also, and thank you for pointing this out, but Cai, why were you using the kdump kernel on this system?
IIRC with 5.2 you should be able to use the xen kernel itself during kdump operations. Tomas, does the problem persist if you use the xen kernel to boot kdump?

(In reply to comment #9)
> I honestly don't know. If you look earlier in the logs, assign_interrupt_mode
> seems to indicate that we can use MSI features in the dom0 kernel. As to why
> cciss isn't detecting MSI capabilities in the production xen kernel, I'm not
> sure, but should that really matter? if we reboot into a non xen kernel and
> all of a sudden the cciss driver detects that it can use MSI, it should be able
> to reset it without oopsing or deadlocking, shouldn't it? Or am I missing
> something?

Sure it should.

>
> Also, and thank you for pointing this out, but cai, why were you using the
> kdump kernel in this system? IIRC with 5.2 you should be able to use the xen
> kernel itself during kdump operations. Tomas, does the problem persist if you
> use the xen kernel to boot kdump?

I don't know if it is possible to use the xen kernel for this. On a box with only a xen kernel, creation of the initial kdump image fails, so I was forced to install a non-xen kernel.

Cai, I tried this today on hp-dl360g5-01:

    [root@hp-dl360g5-01 ~]# uname -a
    Linux hp-dl360g5-01.rhts.bos.redhat.com 2.6.18-92.el5xen #1 SMP Tue Apr 29 13:31:30 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux

three times (echo c > /proc/sysrq-trigger), and every time the vmcore was created without a deadlock. This means I'm probably doing something differently from you; please tell me what I should do to see the problem. The box is reserved now, feel free to use it (I'm leaving now).

> I honestly don't know. If you look earlier in the logs, assign_interrupt_mode
> seems to indicate that we can use MSI features in the dom0 kernel. As to why
> cciss isn't detecting MSI capabilities in the production xen kernel, I'm not
It's not that cciss doesn't detect the MSI/X capabilities. The problem is that the vector is already allocated, and the code that resets MSI/X is not getting called.
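The cciss excerpt quoted earlier walks the device's PCI capability list to find the MSI-X capability and then tests the Enable bit in its Message Control word; when that bit reads as zero, as reported under the xen kernel, the reset branch never runs. Below is a minimal user-space sketch of the same check, assuming a 256-byte copy of configuration space. The register offsets are the standard values from linux/pci_regs.h, but the helper names (cfg_read16, find_capability, msix_enabled, load_config) are hypothetical and this is not the driver's code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Standard offsets and bits, same values as linux/pci_regs.h. */
#define PCI_CFG_SIZE          256
#define PCI_STATUS            0x06
#define PCI_STATUS_CAP_LIST   0x10   /* device has a capability list */
#define PCI_CAPABILITY_LIST   0x34   /* pointer to the first capability */
#define PCI_CAP_ID_MSIX       0x11
#define PCI_MSIX_FLAGS        2      /* Message Control, relative to the cap */
#define PCI_MSIX_FLAGS_ENABLE 0x8000
#define PCI_FIND_CAP_TTL      48     /* bound on list walk, guards against loops */

/* Configuration space is little-endian. */
static uint16_t cfg_read16(const uint8_t *cfg, unsigned off)
{
    assert(off + 1 < PCI_CFG_SIZE);
    return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

/* Hypothetical analogue of pci_find_capability():
 * return the offset of capability `id`, or 0 if it is absent. */
static unsigned find_capability(const uint8_t *cfg, uint8_t id)
{
    if (!(cfg_read16(cfg, PCI_STATUS) & PCI_STATUS_CAP_LIST))
        return 0;
    unsigned pos = cfg[PCI_CAPABILITY_LIST] & ~3u;
    for (int ttl = PCI_FIND_CAP_TTL; ttl > 0 && pos >= 0x40; ttl--) {
        if (cfg[pos] == 0xff)      /* device missing or list corrupt */
            break;
        if (cfg[pos] == id)
            return pos;
        pos = cfg[pos + 1] & ~3u;  /* follow the next-capability pointer */
    }
    return 0;
}

/* 1 if MSI-X is present and enabled: the condition under which the
 * quoted excerpt prints "cciss: resetting MSI-X". */
static int msix_enabled(const uint8_t *cfg)
{
    unsigned pos = find_capability(cfg, PCI_CAP_ID_MSIX);
    if (pos == 0)
        return 0;
    return (cfg_read16(cfg, pos + PCI_MSIX_FLAGS) & PCI_MSIX_FLAGS_ENABLE) != 0;
}

/* Load a sysfs config file such as /sys/bus/pci/devices/<BDF>/config.
 * Bytes not read stay zero. Returns bytes read, or -1 on error. */
static long load_config(const char *path, uint8_t cfg[PCI_CFG_SIZE])
{
    memset(cfg, 0, PCI_CFG_SIZE);
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    size_t n = fread(cfg, 1, PCI_CFG_SIZE, f);
    fclose(f);
    return (long)n;
}
```

On a live system the <BDF> placeholder would be the controller's PCI address. Reading past the first 64 bytes of the sysfs config file generally requires root, so an unprivileged run would report MSI-X as absent rather than disabled.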
I have tried on two machines:

    hp-dl360g5-01.rhts.bos.redhat.com
    hp-dl360g5-02.rhts.bos.redhat.com

It looks like Kdump on Dom0 works occasionally. However, when I ran the following cron job:

    @reboot echo a >>/root/log; sleep 120; rm -rf /var/crash/*; sync; echo c >/proc/sysrq-trigger

I was able to see problems before long, with lots of strange failures. Please see the following attachments. I have also run the same cron job for Kdump on bare metal, and it worked fine without any problems. I have the machine hp-dl360g5-02.rhts.bos.redhat.com reserved; feel free to grab it.

Created attachment 314083
Kdump on Dom0 failed with exe SIGSEGV.
Created attachment 314085
Kdump on Dom0 failed with init SIGSEGV.
Created attachment 314086
Kdump on Dom0 failed with dropping to rootfs
Created attachment 314088
Kdump on Dom0 failed with capture Kernel panic.
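The cron job quoted earlier appends one line to /root/log on every boot before crashing the machine again, so the number of lines in that file counts completed reboot cycles. A minimal sketch of tallying it follows; count_boots is a hypothetical helper, and the thread does not say how its kdump counts were actually obtained. Because the job deletes /var/crash/* on each pass, this count alone cannot confirm that every vmcore was complete (for example, one saved as "vmcore-incomplete").

```c
#include <assert.h>
#include <stdio.h>

/* Count newline-terminated records in the marker log written by
 *   @reboot echo a >>/root/log; ...
 * Each boot appends one "a\n", so the result is the number of boots
 * since the job was installed. Returns -1 if the file cannot be opened. */
static long count_boots(const char *path)
{
    assert(path != NULL);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    long lines = 0;
    int c;
    while ((c = fgetc(f)) != EOF)
        if (c == '\n')
            lines++;
    fclose(f);
    return lines;
}
```

Usage would be along the lines of printf("%ld boots\n", count_boots("/root/log")). Counting '\n' characters directly avoids any fixed line-buffer limit.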
Hi, I tried out some changes in the reset code during driver initialization, without success: sometimes I got 20 successful kdumps, but then it failed again at random places. Today RHTS is not working for me, so I'll continue next week.

Thanks Tomas, let us know what you find out.

Neil, after some problems with previous kernels that weren't working (you know, the vmcore zero issue), I'm now testing kernel-2.6.18-115.el5.src.rpm on my box, and it has been working flawlessly for several days (continuous kdumping). On the RHTS machine hp-dl360g5-01.rhts.bos.redhat.com, which was more vulnerable to the problem, I have now stopped the tests after 892 successful kdumps. I also tested the upstream 2.6.26 kernel to find some inspiration, but it failed after 514 and then after 50 tests. So at the moment, in this area, our kernel is better than the upstream kernel.

Ok, it sounds like the cciss maintainer has some work to do upstream then. Where does that leave us with this bug? Shall we close it?

Please close it. Thanks!

Copy that. Thanks Cai!