Bug 754907
Summary: | 2.6.41.1-1.fc15.x86_64: cciss module crash
---|---
Product: | [Fedora] Fedora
Reporter: | Jan ONDREJ <ondrejj>
Component: | kernel
Assignee: | Kernel Maintainer List <kernel-maint>
Status: | CLOSED ERRATA
QA Contact: | Fedora Extras Quality Assurance <extras-qa>
Severity: | unspecified
Priority: | unspecified
Version: | 15
CC: | gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda, sgruszka, steve.cameron, thenzl, xiaoli
Hardware: | x86_64
OS: | Linux
Fixed In Version: | kernel-2.6.41.4-1.fc15
Doc Type: | Bug Fix
Last Closed: | 2011-12-10 19:51:25 UTC
What controller and what server is this?

What kernel are you upgrading from?

And 2.6.41? I thought they stopped at 2.6.38 and then 3.0.

-- steve

(In reply to comment #1)
> What controller and what server is this?

[root@ftp ~]# cciss_vol_status /dev/cciss/c0d0
/dev/cciss/c0d0: (Smart Array 641) RAID 5 Volume 0 status: OK.
/dev/cciss/c0d0: (Smart Array 641) Enclosure PROLIANT 6L6I (S/N: ) on Bus 0, Physical Port J1 status: OK.

product: ProLiant ML350 G4

> What kernel are you upgrading from?
>
> And 2.6.41? I thought they stopped at 2.6.38 and then 3.0.

Last working kernel: 2.6.40.6-0.fc15.x86_64
First bad kernel (no newer fedora kernel yet): kernel-2.6.41.1-1.fc15.x86_64

(In reply to comment #2)
> > What kernel are you upgrading from?
> >
> > And 2.6.41? I thought they stopped at 2.6.38 and then 3.0.
>
> Last working kernel: 2.6.40.6-0.fc15.x86_64
> First bad kernel (no newer fedora kernel yet): kernel-2.6.41.1-1.fc15.x86_64

2.6.40.6 is 3.0.6 renamed to avoid breaking F15 userspace that wasn't ready for the 3.0 change. Similarly 2.6.41.1 is 3.1.1.

FYI. This is the entirety of the difference between the cciss drivers in 3.0.6 and 3.1.1 from kernel.org:

[scameron@localhost fedora-bug]$ for x in linux-3.0.6/drivers/block/cciss*[ch]; do f=`basename $x`; echo ==== $f ====; diff -u linux-3.0.6/drivers/block/$f linux-3.1.1/drivers/block/$f; done
==== cciss.c ====
--- linux-3.0.6/drivers/block/cciss.c 2011-10-03 15:25:23.000000000 -0500
+++ linux-3.1.1/drivers/block/cciss.c 2011-11-11 14:19:27.000000000 -0600
@@ -4533,6 +4533,13 @@
 		pmcsr &= ~PCI_PM_CTRL_STATE_MASK;
 		pmcsr |= PCI_D0;
 		pci_write_config_word(pdev, pos + PCI_PM_CTRL, pmcsr);
+
+		/*
+		 * The P600 requires a small delay when changing states.
+		 * Otherwise we may think the board did not reset and we bail.
+		 * This for kdump only and is particular to the P600.
+		 */
+		msleep(500);
 	}
 	return 0;
 }
==== cciss_cmd.h ====
==== cciss.h ====
==== cciss_scsi.c ====
--- linux-3.0.6/drivers/block/cciss_scsi.c 2011-10-03 15:25:23.000000000 -0500
+++ linux-3.1.1/drivers/block/cciss_scsi.c 2011-11-11 14:19:27.000000000 -0600
@@ -33,7 +33,7 @@
 #include <linux/slab.h>
 #include <linux/string.h>
 
-#include <asm/atomic.h>
+#include <linux/atomic.h>
 
 #include <scsi/scsi_cmnd.h>
 #include <scsi/scsi_device.h>
==== cciss_scsi.h ====
[scameron@localhost fedora-bug]$

I think whatever broke must reside outside the driver.

-- steve

git-bisect would probably pin it down.

It seems that only the IRQ routing changed, so cciss now shares an interrupt with another device. Since cciss calls request_irq without the IRQF_SHARED flag, the request fails. Is there any reason why cciss cannot share interrupts?

No. Most smart arrays use MSI or MSIX these days, so... wouldn't be shared, right? (I don't need IRQF_SHARED when using MSI/MSIX, correct? But for non-MSI/MSIX, then I do need IRQF_SHARED, correct?)

This appears to be the commit which removed IRQF_SHARED:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=0c2b39087c900bdb240b50ac95ee9da00d844565

That was more than a year ago, 2010-08-07... if that's it, I'm surprised nobody has complained before.

Maybe something like this?

Author: Stephen M. Cameron <scameron.hp.com>
Date: Wed Nov 23 10:16:34 2011 -0600

    cciss: Add IRQF_SHARED back in for the non-MSI(X) interrupt handler

diff --git a/drivers/block/cciss.c b/drivers/block/cciss.c
index 486f94e..942ccf8 100644
--- a/drivers/block/cciss.c
+++ b/drivers/block/cciss.c
@@ -4884,7 +4884,7 @@ static int cciss_request_irq(ctlr_info_t *h,
 	}
 
 	if (!request_irq(h->intr[h->intr_mode], intxhandler,
-			IRQF_DISABLED, h->devname, h))
+			IRQF_DISABLED | IRQF_SHARED, h->devname, h))
 		return 0;
 	dev_err(&h->pdev->dev, "Unable to get irq %d for %s\n",
 		h->intr[h->intr_mode], h->devname);

If that is correct, hpsa probably needs something similar (though I think all boards officially supported by hpsa use MSI, but the "hpsa_allow_any=1" kernel option may expose older boards.)

-- steve

Jan, can you apply the patch from comment 7 and test? Let me know if you are not familiar with kernel compilation; I will launch a kernel build with the patch in http://koji.fedoraproject.org/koji/ .

(In reply to comment #9)
> Jan, can you apply the patch from comment 7 and test? Let me know if you are not
> familiar with kernel compilation; I will launch a kernel build with the patch in
> http://koji.fedoraproject.org/koji/ .

Hello. I have no time to build a kernel now, but if you can build one for me in koji, it is no problem to test it.

Ok, here is the kernel with patch:
http://koji.fedoraproject.org/koji/taskinfo?taskID=3537034

(In reply to comment #11)
> Ok, here is the kernel with patch:
> http://koji.fedoraproject.org/koji/taskinfo?taskID=3537034

Works well. All disks are present.

Created attachment 535864 [details]
cciss.patch
This is exact patch I used in the test kernel. Josh please apply it. Stephen please post it :-) Also note that you can get rid of IRQF_DISABLED, according to include/linux/interrupt.h it is noop and deprecated.
(In reply to comment #13)
> Created attachment 535864 [details]
> cciss.patch
>
> This is exact patch I used in the test kernel. Josh please apply it. Stephen
> please post it :-) Also note that you can get rid of IRQF_DISABLED, according
> to include/linux/interrupt.h it is noop and deprecated.

So I also have IRQF_DISABLED in the msix path as the only flag. Should I just use 0 for the flags there? Should I add in IRQF_SAMPLE_RANDOM? I seem to remember that used to be in there at one time as well.

-- steve

(In reply to comment #14)
> (In reply to comment #13)
> > Created attachment 535864 [details]
> > cciss.patch
> >
> > This is exact patch I used in the test kernel. Josh please apply it. Stephen
> > please post it :-) Also note that you can get rid of IRQF_DISABLED, according
> > to include/linux/interrupt.h it is noop and deprecated.
>
> So I also have IRQF_DISABLED in the msix path as the only flag. Should I just
> use 0 for the flags there? Should I add in IRQF_SAMPLE_RANDOM? I seem to
> remember that used to be in there at one time as well.
>
> -- steve

Well, digging around, I see there are plenty of uses of request_irq with flags passed as 0. No scsi drivers use IRQF_SAMPLE_RANDOM, and only one block driver does, so I guess I shouldn't use IRQF_SAMPLE_RANDOM.

Zero should be fine, as long as the device does not generate interrupts in a truly random manner.

Created attachment 537541 [details]
Patch to add IRQF_SHARED flag to hpsa for non-msi interrupt handler
Here is the patch I sent to the lkml for hpsa to add IRQF_SHARED to non msi(x) interrupt request.
Created attachment 537543 [details]
Patch to add IRQF_SHARED flag to cciss for non msi(x) interrupts
Here is the patch I sent to lkml for cciss to add IRQF_SHARED to the non msi(x) interrupt request.
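The flag choice discussed in this thread (sharing allowed for the legacy interrupt line, plain 0 for MSI/MSI-X) can be sketched in standalone C. This is only an illustration, not the driver's code: the `intr_mode` enum and `irq_flags_for_mode()` are hypothetical names, and the `IRQF_SHARED` value is copied from include/linux/interrupt.h as an assumption so the snippet compiles outside the kernel.

```c
/* Standalone userspace sketch; none of this is taken from cciss or hpsa. */

/* Assumed to match the value in include/linux/interrupt.h. */
#define IRQF_SHARED 0x00000080UL

/* Hypothetical interrupt modes a controller might end up using. */
enum intr_mode { MODE_INTX, MODE_MSI, MODE_MSIX };

/*
 * Pick request_irq() flags for a given mode. A legacy INTx line can be
 * routed to several devices, so the handler has to allow sharing.
 * MSI and MSI-X vectors belong to a single device, so 0 is enough there.
 */
unsigned long irq_flags_for_mode(enum intr_mode mode)
{
	return mode == MODE_INTX ? IRQF_SHARED : 0UL;
}
```

With this sketch, `irq_flags_for_mode(MODE_INTX)` yields `IRQF_SHARED` and the two message-signalled modes yield 0, which mirrors the "zero should be fine" conclusion above.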
Patches added to F15 and F16, will be in the next update.

kernel-2.6.41.4-1.fc15 has been submitted as an update for Fedora 15.
https://admin.fedoraproject.org/updates/kernel-2.6.41.4-1.fc15

Package kernel-2.6.41.4-1.fc15:
* should fix your issue,
* was pushed to the Fedora 15 testing repository,
* should be available at your local mirror within two days.

Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-2.6.41.4-1.fc15'
as soon as you are able to, then reboot.

Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2011-16621/kernel-2.6.41.4-1.fc15
then log in and leave karma (feedback).

kernel-2.6.41.4-1.fc15 has been pushed to the Fedora 15 stable repository. If problems still persist, please make note of it in this bug report.
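The version renaming mentioned earlier in this bug (2.6.40.6 is 3.0.6, 2.6.41.1 is 3.1.1) also applies to the fixed kernel-2.6.41.4. A minimal sketch of that mapping follows. `f15_to_upstream()` is a hypothetical helper, not part of any Fedora tool, and the rule "third field minus 40 is the upstream minor version" is an assumption generalized from the two examples given in the thread.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: translate a Fedora 15 "2.6.4x.y" kernel version into
 * the upstream "3.<minor>.<patch>" version it was renamed from.
 * Assumed rule (from the examples in this bug): minor = third field - 40.
 * Returns 0 on success, -1 if the string does not follow that scheme.
 */
int f15_to_upstream(const char *f15, int *minor, int *patch)
{
	int third, fourth;

	assert(f15 != NULL && minor != NULL && patch != NULL);

	/* Expect "2.6.<third>.<fourth>"; a trailing "-1.fc15..." is ignored. */
	if (sscanf(f15, "2.6.%d.%d", &third, &fourth) != 2)
		return -1;

	/* Below 40 it is a genuine 2.6.x kernel, not a renamed 3.x one. */
	if (third < 40 || fourth < 0)
		return -1;

	*minor = third - 40;
	*patch = fourth;
	return 0;
}
```

Under that assumption, passing "2.6.41.4-1.fc15" gives minor 1 and patch 4, i.e. upstream 3.1.4.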
Created attachment 534351 [details]
Full dmesg

Description of problem:
After the kernel upgrade and a reboot, the cciss driver doesn't work.

Version-Release number of selected component (if applicable):
kernel-2.6.41.1-1.fc15.x86_64

How reproducible:
always

Actual results:
[ 1093.978987] HP CISS Driver (v 3.6.26)
[ 1094.003078] IRQ handler type mismatch for IRQ 16
[ 1094.003228] current handler: uhci_hcd:usb2
[ 1094.003366] Pid: 2921, comm: modprobe Not tainted 2.6.41.1-1.fc15.x86_64 #1
[ 1094.003509] Call Trace:
[ 1094.003659] [<ffffffff810b1d76>] __setup_irq+0x39e/0x432
[ 1094.003805] [<ffffffff8111971c>] ? kmem_cache_alloc_trace+0xb3/0xc5
[ 1094.003956] [<ffffffffa01d5899>] ? process_indexed_cmd+0xa6/0xa6 [cciss]
[ 1094.004001] [<ffffffff810b1ef4>] request_threaded_irq+0xea/0x116
[ 1094.004001] [<ffffffffa01d7bca>] cciss_request_irq+0x66/0x98 [cciss]
[ 1094.004001] [<ffffffffa01d6ddb>] cciss_init_one+0x1123/0x1a2f [cciss]
[ 1094.004001] [<ffffffff8111a707>] ? kmem_cache_alloc+0x31/0xf8
[ 1094.004001] [<ffffffff811843a4>] ? sysfs_find_dirent+0x3c/0x55
[ 1094.004001] [<ffffffff81085d77>] ? arch_local_irq_save+0x15/0x1b
[ 1094.004001] [<ffffffff81262487>] local_pci_probe+0x44/0x75
[ 1094.004001] [<ffffffff81262fea>] pci_device_probe+0xd0/0xff
[ 1094.004001] [<ffffffff81301017>] driver_probe_device+0x131/0x213
[ 1094.004001] [<ffffffff81301153>] __driver_attach+0x5a/0x7e
[ 1094.004001] [<ffffffff813010f9>] ? driver_probe_device+0x213/0x213
[ 1094.004001] [<ffffffff8130009f>] bus_for_each_dev+0x53/0x89
[ 1094.004001] [<ffffffff81300bf6>] driver_attach+0x1e/0x20
[ 1094.004001] [<ffffffff8130081a>] bus_add_driver+0xd1/0x224
[ 1094.004001] [<ffffffffa016e000>] ? 0xffffffffa016dfff
[ 1094.004001] [<ffffffff813015f7>] driver_register+0x98/0x105
[ 1094.004001] [<ffffffffa016e000>] ? 0xffffffffa016dfff
[ 1094.004001] [<ffffffff812638ad>] __pci_register_driver+0x56/0xc1
[ 1094.004001] [<ffffffffa016e000>] ? 0xffffffffa016dfff
[ 1094.004001] [<ffffffffa016e07d>] cciss_init+0x7d/0xa1 [cciss]
[ 1094.004001] [<ffffffff81002099>] do_one_initcall+0x7f/0x136
[ 1094.004001] [<ffffffff8108a59d>] sys_init_module+0x88/0x1d0
[ 1094.004001] [<ffffffff814a3102>] system_call_fastpath+0x16/0x1b
[ 1094.011026] cciss 0000:09:02.0: Unable to get irq 16 for cciss0
[ 1094.012661] cciss: probe of 0000:09:02.0 failed with error -1

Additional info:
I can make some tests on this machine, if required.
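The "IRQ handler type mismatch" in the trace above is consistent with the kernel's rule that a second handler can join an interrupt line only if both the existing and the new handler were registered with IRQF_SHARED. The following is a simplified userspace model of that rule, not kernel code: `struct irq_line`, `model_request_irq()` and `MODEL_EBUSY` are hypothetical names, the `IRQF_SHARED` value is assumed to match include/linux/interrupt.h, and the real check also compares trigger-type flags, which this sketch leaves out.

```c
#include <stddef.h>

#define IRQF_SHARED 0x00000080UL	/* assumed to match the kernel header */
#define MODEL_EBUSY 16			/* stand-in for the kernel's EBUSY */

/* One interrupt line; the model remembers a single handler's registration. */
struct irq_line {
	const char *owner;	/* name of the current handler, NULL if free */
	unsigned long flags;	/* flags that handler registered with */
};

/*
 * Model of the sharing check: if the line already has a handler, the new
 * request succeeds only when BOTH sets of flags contain IRQF_SHARED.
 * Returns 0 on success, -MODEL_EBUSY on a handler type mismatch.
 */
int model_request_irq(struct irq_line *line, const char *name,
		      unsigned long flags)
{
	if (line->owner != NULL &&
	    !((line->flags & flags) & IRQF_SHARED))
		return -MODEL_EBUSY;

	/* Record the newest handler; on a shared line its flags are shared too. */
	line->owner = name;
	line->flags = flags;
	return 0;
}
```

In this model, a line first claimed by "uhci_hcd:usb2" with `IRQF_SHARED` rejects a later "cciss0" request made with flags 0, and accepts the same request once `IRQF_SHARED` is passed, which matches the behavior before and after the patch in this bug.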