|Summary:||22.214.171.124-1.fc15.x86_64: cciss module crash|
|Product:||[Fedora] Fedora||Reporter:||Jan ONDREJ <ondrejj>|
|Component:||kernel||Assignee:||Kernel Maintainer List <kernel-maint>|
|Status:||CLOSED ERRATA||QA Contact:||Fedora Extras Quality Assurance <extras-qa>|
|Version:||15||CC:||gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda, sgruszka, steve.cameron, thenzl, xiaoli|
|Fixed In Version:||kernel-126.96.36.199-1.fc15||Doc Type:||Bug Fix|
|Doc Text:||Story Points:||---|
|Last Closed:||2011-12-10 19:51:25 UTC||Type:||---|
Description Jan ONDREJ 2011-11-18 07:26:22 UTC
Created attachment 534351 [details]
Full dmesg

Description of problem:
After upgrade and reboot doesn't work.

Version-Release number of selected component (if applicable):
kernel-188.8.131.52-1.fc15.x86_64

How reproducible:
always

Actual results:
[ 1093.978987] HP CISS Driver (v 3.6.26)
[ 1094.003078] IRQ handler type mismatch for IRQ 16
[ 1094.003228] current handler: uhci_hcd:usb2
[ 1094.003366] Pid: 2921, comm: modprobe Not tainted 184.108.40.206-1.fc15.x86_64 #1
[ 1094.003509] Call Trace:
[ 1094.003659] [<ffffffff810b1d76>] __setup_irq+0x39e/0x432
[ 1094.003805] [<ffffffff8111971c>] ? kmem_cache_alloc_trace+0xb3/0xc5
[ 1094.003956] [<ffffffffa01d5899>] ? process_indexed_cmd+0xa6/0xa6 [cciss]
[ 1094.004001] [<ffffffff810b1ef4>] request_threaded_irq+0xea/0x116
[ 1094.004001] [<ffffffffa01d7bca>] cciss_request_irq+0x66/0x98 [cciss]
[ 1094.004001] [<ffffffffa01d6ddb>] cciss_init_one+0x1123/0x1a2f [cciss]
[ 1094.004001] [<ffffffff8111a707>] ? kmem_cache_alloc+0x31/0xf8
[ 1094.004001] [<ffffffff811843a4>] ? sysfs_find_dirent+0x3c/0x55
[ 1094.004001] [<ffffffff81085d77>] ? arch_local_irq_save+0x15/0x1b
[ 1094.004001] [<ffffffff81262487>] local_pci_probe+0x44/0x75
[ 1094.004001] [<ffffffff81262fea>] pci_device_probe+0xd0/0xff
[ 1094.004001] [<ffffffff81301017>] driver_probe_device+0x131/0x213
[ 1094.004001] [<ffffffff81301153>] __driver_attach+0x5a/0x7e
[ 1094.004001] [<ffffffff813010f9>] ? driver_probe_device+0x213/0x213
[ 1094.004001] [<ffffffff8130009f>] bus_for_each_dev+0x53/0x89
[ 1094.004001] [<ffffffff81300bf6>] driver_attach+0x1e/0x20
[ 1094.004001] [<ffffffff8130081a>] bus_add_driver+0xd1/0x224
[ 1094.004001] [<ffffffffa016e000>] ? 0xffffffffa016dfff
[ 1094.004001] [<ffffffff813015f7>] driver_register+0x98/0x105
[ 1094.004001] [<ffffffffa016e000>] ? 0xffffffffa016dfff
[ 1094.004001] [<ffffffff812638ad>] __pci_register_driver+0x56/0xc1
[ 1094.004001] [<ffffffffa016e000>] ? 0xffffffffa016dfff
[ 1094.004001] [<ffffffffa016e07d>] cciss_init+0x7d/0xa1 [cciss]
[ 1094.004001] [<ffffffff81002099>] do_one_initcall+0x7f/0x136
[ 1094.004001] [<ffffffff8108a59d>] sys_init_module+0x88/0x1d0
[ 1094.004001] [<ffffffff814a3102>] system_call_fastpath+0x16/0x1b
[ 1094.011026] cciss 0000:09:02.0: Unable to get irq 16 for cciss0
[ 1094.012661] cciss: probe of 0000:09:02.0 failed with error -1

Additional info:
I can make some tests on this machine, if required.
Comment 1 Stephen Cameron 2011-11-22 19:01:05 UTC
What controller and what server is this?

What kernel are you upgrading from?

And 2.6.41? I thought they stopped at 2.6.38 and then 3.0.

-- steve
Comment 2 Jan ONDREJ 2011-11-22 19:12:52 UTC
(In reply to comment #1)
> What controller and what server is this?

[root@ftp ~]# cciss_vol_status /dev/cciss/c0d0
/dev/cciss/c0d0: (Smart Array 641) RAID 5 Volume 0 status: OK.
/dev/cciss/c0d0: (Smart Array 641) Enclosure PROLIANT 6L6I (S/N: ) on Bus 0, Physical Port J1 status: OK.

product: ProLiant ML350 G4

> What kernel are you upgrading from?
>
> And 2.6.41? I thought they stopped at 2.6.38 and then 3.0.

Last working kernel: 220.127.116.11-0.fc15.x86_64
First bad kernel (no newer fedora kernel yet): kernel-18.104.22.168-1.fc15.x86_64
Comment 3 Josh Boyer 2011-11-22 19:25:22 UTC
(In reply to comment #2)
> > What kernel are you upgrading from?
> >
> > And 2.6.41? I thought they stopped at 2.6.38 and then 3.0.
>
> Last working kernel: 22.214.171.124-0.fc15.x86_64
> First bad kernel (no newer fedora kernel yet): kernel-126.96.36.199-1.fc15.x86_64

188.8.131.52 is 3.0.6 renamed to avoid breaking F15 userspace that wasn't ready for the 3.0 change. Similarly 184.108.40.206 is 3.1.1. FYI.
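The renaming described in comment 3 can be sketched as a small helper. This is only an illustration: the rule that a Fedora 15 version 2.6.(40+N).M stands for upstream 3.N.M is an assumption inferred from this comment and from the "2.6.41" mentioned in comment 1, and the function name `f15_to_upstream` is hypothetical.

```c
#include <stdio.h>

/*
 * Assumption: Fedora 15 kernels numbered 2.6.(40+N).M are upstream 3.N.M
 * under another name. Returns 0 on success and fills major/minor/patch,
 * or -1 if the string does not look like a renamed 3.x kernel.
 */
static int f15_to_upstream(const char *f15, int *major, int *minor, int *patch)
{
	int sub, stable;

	/* Anything after the fourth number (release, arch) is ignored. */
	if (sscanf(f15, "2.6.%d.%d", &sub, &stable) != 2)
		return -1;
	if (sub < 40)
		return -1;	/* a genuine 2.6.x kernel, not a renamed 3.x */

	*major = 3;
	*minor = sub - 40;
	*patch = stable;
	return 0;
}
```

Under that assumption, a version beginning with 2.6.40.6 maps to 3.0.6 and one beginning with 2.6.41.1 maps to 3.1.1, which matches the two upstream versions named above.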
Comment 4 Stephen Cameron 2011-11-22 19:40:08 UTC
This is the entirety of the difference between the cciss drivers in 3.0.6 and 3.1.1 from kernel.org:

[scameron@localhost fedora-bug]$ for x in linux-3.0.6/drivers/block/cciss*[ch]; do f=`basename $x`; echo ==== $f ====; diff -u linux-3.0.6/drivers/block/$f linux-3.1.1/drivers/block/$f; done
==== cciss.c ====
--- linux-3.0.6/drivers/block/cciss.c 2011-10-03 15:25:23.000000000 -0500
+++ linux-3.1.1/drivers/block/cciss.c 2011-11-11 14:19:27.000000000 -0600
@@ -4533,6 +4533,13 @@
 		pmcsr &= ~PCI_PM_CTRL_STATE_MASK;
 		pmcsr |= PCI_D0;
 		pci_write_config_word(pdev, pos + PCI_PM_CTRL, pmcsr);
+
+		/*
+		 * The P600 requires a small delay when changing states.
+		 * Otherwise we may think the board did not reset and we bail.
+		 * This for kdump only and is particular to the P600.
+		 */
+		msleep(500);
 	}
 	return 0;
 }
==== cciss_cmd.h ====
==== cciss.h ====
==== cciss_scsi.c ====
--- linux-3.0.6/drivers/block/cciss_scsi.c 2011-10-03 15:25:23.000000000 -0500
+++ linux-3.1.1/drivers/block/cciss_scsi.c 2011-11-11 14:19:27.000000000 -0600
@@ -33,7 +33,7 @@
 #include <linux/slab.h>
 #include <linux/string.h>
 
-#include <asm/atomic.h>
+#include <linux/atomic.h>
 
 #include <scsi/scsi_cmnd.h>
 #include <scsi/scsi_device.h>
==== cciss_scsi.h ====
[scameron@localhost fedora-bug]$

I think whatever broke must reside outside the driver.

-- steve
Comment 5 Stephen Cameron 2011-11-22 19:41:36 UTC
git-bisect would probably pin it down.
Comment 6 Stanislaw Gruszka 2011-11-23 14:58:32 UTC
It seems that only the IRQ routing changed, so cciss now shares an interrupt with another device. Since cciss calls request_irq without the IRQF_SHARED flag, the request fails. Is there any reason why cciss cannot share interrupts?
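The failure mode described in comment 6 can be illustrated with a simplified userspace model of the sharing rule. This is a sketch and not the kernel's code: `struct irq_line` and `model_request_irq` are hypothetical names, trigger-type checks are left out, and it is assumed that the handler already on IRQ 16 (uhci_hcd) registered with IRQF_SHARED.

```c
#include <errno.h>

#define IRQF_SHARED 0x80UL	/* same bit the kernel uses for this flag */

/* Hypothetical model of one interrupt line and its installed handlers. */
struct irq_line {
	int nr_handlers;
	unsigned long first_flags;	/* flags of the first handler installed */
};

/*
 * A second handler is accepted only if both the existing handler and the
 * new one passed IRQF_SHARED; otherwise the request is refused, which the
 * kernel reports as "IRQ handler type mismatch".
 */
static int model_request_irq(struct irq_line *line, unsigned long flags)
{
	if (line->nr_handlers > 0 &&
	    !((line->first_flags & flags) & IRQF_SHARED))
		return -EBUSY;

	if (line->nr_handlers == 0)
		line->first_flags = flags;
	line->nr_handlers++;
	return 0;
}
```

In this model, a first request with IRQF_SHARED succeeds, a later request on the same line without the flag is refused, and the same request with IRQF_SHARED set is accepted. That mirrors what the dmesg output shows for cciss on IRQ 16.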
Comment 7 Stephen Cameron 2011-11-23 16:16:46 UTC
No. Most smart arrays use MSI or MSIX these days, so... wouldn't be shared, right? (I don't need IRQF_SHARED when using MSI/MSIX, correct? But for non-MSI/MSIX, then I do need IRQF_SHARED, correct?)

This appears to be the commit which removed IRQF_SHARED
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=0c2b39087c900bdb240b50ac95ee9da00d844565

That was more than a year ago, 2010-08-07... if that's it, I'm surprised nobody has complained before.

Maybe something like this?

Author: Stephen M. Cameron <firstname.lastname@example.org>
Date: Wed Nov 23 10:16:34 2011 -0600

    cciss: Add IRQF_SHARED back in for the non-MSI(X) interrupt handler

diff --git a/drivers/block/cciss.c b/drivers/block/cciss.c
index 486f94e..942ccf8 100644
--- a/drivers/block/cciss.c
+++ b/drivers/block/cciss.c
@@ -4884,7 +4884,7 @@ static int cciss_request_irq(ctlr_info_t *h,
 	}
 
 	if (!request_irq(h->intr[h->intr_mode], intxhandler,
-			IRQF_DISABLED, h->devname, h))
+			IRQF_DISABLED | IRQF_SHARED, h->devname, h))
 		return 0;
 	dev_err(&h->pdev->dev, "Unable to get irq %d for %s\n",
 		h->intr[h->intr_mode], h->devname);
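The question in comment 7 about which interrupt modes need IRQF_SHARED can be summarised in a small sketch. It is not the driver's code: `enum intr_kind` and `irq_flags_for` are hypothetical names, and the sketch assumes that MSI and MSI-X vectors belong to a single device while a legacy INTx line may be routed to several. It passes no IRQF_DISABLED, unlike the patch above.

```c
#define IRQF_SHARED 0x80UL	/* same bit the kernel uses for this flag */

/* Hypothetical classification of how the controller delivers interrupts. */
enum intr_kind { INTR_INTX, INTR_MSI, INTR_MSIX };

/*
 * Pick request_irq() flags for a given delivery mode: only the legacy
 * INTx line can end up shared with another device, so only that path
 * asks for IRQF_SHARED.
 */
static unsigned long irq_flags_for(enum intr_kind kind)
{
	return kind == INTR_INTX ? IRQF_SHARED : 0;
}
```

With this choice, a board such as the Smart Array 641 in this report, which falls back to the INTx path, would request a shareable line, while MSI and MSI-X boards would keep passing no sharing flag.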
Comment 8 Stephen Cameron 2011-11-23 16:19:41 UTC
If that is correct, probably need similar for hpsa (though I think all boards officially supported by hpsa use MSI, but the "hpsa_allow_any=1" kernel option may expose older boards.)

-- steve
Comment 9 Stanislaw Gruszka 2011-11-24 07:23:34 UTC
Jan, can you apply the patch from comment 7 and test? Let me know if you are not familiar with kernel compilation; I will launch a kernel build with the patch in http://koji.fedoraproject.org/koji/ .
Comment 10 Jan ONDREJ 2011-11-24 07:26:51 UTC
(In reply to comment #9)
> Jan, can you apply patch from comment 7 and test? Let me know, if you are not
> familiar with kernel compilation, I will launch kernel build with patch in
> http://koji.fedoraproject.org/koji/ .

Hello. I have no time to build a kernel now, but if you can run a build for me in koji, I will have no problem testing it.
Comment 11 Stanislaw Gruszka 2011-11-24 11:04:16 UTC
Ok, here is the kernel with patch: http://koji.fedoraproject.org/koji/taskinfo?taskID=3537034
Comment 12 Jan ONDREJ 2011-11-24 11:12:33 UTC
(In reply to comment #11)
> Ok, here is the kernel with patch:
> http://koji.fedoraproject.org/koji/taskinfo?taskID=3537034

Works well. All disks are present.
Comment 13 Stanislaw Gruszka 2011-11-24 11:40:05 UTC
Created attachment 535864 [details]
cciss.patch

This is the exact patch I used in the test kernel. Josh, please apply it. Stephen, please post it :-) Also note that you can get rid of IRQF_DISABLED; according to include/linux/interrupt.h it is a no-op and deprecated.
Comment 14 Stephen Cameron 2011-11-28 15:03:14 UTC
(In reply to comment #13)
> Created attachment 535864 [details]
> cciss.patch
>
> This is exact patch I used in the test kernel. Josh please apply it. Stephen
> please post it :-) Also note that you can get rid of IRQF_DISABLED, according
> to include/linux/interrupt.h it is noop and deprecated.

So I also have IRQF_DISABLED in the msix path as the only flag. Should I just use 0 for the flags there? Should I add in IRQF_SAMPLE_RANDOM? I seem to remember that used to be in there at one time as well.

-- steve
Comment 15 Stephen Cameron 2011-11-28 15:07:19 UTC
(In reply to comment #14)
> (In reply to comment #13)
> > Created attachment 535864 [details]
> > cciss.patch
> >
> > This is exact patch I used in the test kernel. Josh please apply it. Stephen
> > please post it :-) Also note that you can get rid of IRQF_DISABLED, according
> > to include/linux/interrupt.h it is noop and deprecated.
>
> So I also have IRQF_DISABLED in the msix path as the only flag. Should I just
> use 0 for the flags there? Should I add in IRQF_SAMPLE_RANDOM? I seem to
> remember that used to be in there at one time as well.
>
> -- steve

Well, digging around, I see there are plenty of uses of request_irq with flags passed as 0, and no scsi drivers use IRQF_SAMPLE_RANDOM and only one block driver, so I guess I shouldn't use IRQF_SAMPLE_RANDOM.
Comment 16 Stanislaw Gruszka 2011-11-28 15:42:55 UTC
Zero should be fine, as long as the device does not generate interrupts in a truly random manner.
Comment 17 Stephen Cameron 2011-11-28 17:11:41 UTC
Created attachment 537541 [details]
Patch to add IRQF_SHARED flag to hpsa for non-msi interrupt handler

Here is the patch I sent to the lkml for hpsa to add IRQF_SHARED to non msi(x) interrupt request.
Comment 18 Stephen Cameron 2011-11-28 17:12:51 UTC
Created attachment 537543 [details]
Patch to add IRQF_SHARED flag to cciss for non msi(x) interrupts

Here is the patch I sent to lkml for cciss to add IRQF_SHARED to the non msi(x) interrupt request.
Comment 19 Chuck Ebbert 2011-11-28 21:39:12 UTC
Patches added to F15 and F16, will be in the next update.
Comment 20 Fedora Update System 2011-11-29 13:51:50 UTC
kernel-220.127.116.11-1.fc15 has been submitted as an update for Fedora 15. https://admin.fedoraproject.org/updates/kernel-18.104.22.168-1.fc15
Comment 21 Fedora Update System 2011-11-30 02:03:19 UTC
Package kernel-22.214.171.124-1.fc15:
* should fix your issue,
* was pushed to the Fedora 15 testing repository,
* should be available at your local mirror within two days.

Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-126.96.36.199-1.fc15'
as soon as you are able to, then reboot.

Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2011-16621/kernel-188.8.131.52-1.fc15
then log in and leave karma (feedback).
Comment 22 Fedora Update System 2011-12-10 19:51:25 UTC
kernel-184.108.40.206-1.fc15 has been pushed to the Fedora 15 stable repository. If problems still persist, please make note of it in this bug report.