Bug 754907 - 2.6.41.1-1.fc15.x86_64: cciss module crash
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 15
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Reported: 2011-11-18 07:26 UTC by Jan ONDREJ
Modified: 2012-07-06 14:43 UTC (History)

Fixed In Version: kernel-2.6.41.4-1.fc15
Doc Type: Bug Fix
Last Closed: 2011-12-10 19:51:25 UTC


Attachments
Full dmesg (68.15 KB, application/octet-stream)
2011-11-18 07:26 UTC, Jan ONDREJ
cciss.patch (453 bytes, text/plain)
2011-11-24 11:40 UTC, Stanislaw Gruszka
Patch to add IRQF_SHARED flag to hpsa for non-msi interrupt handler (1.17 KB, text/plain)
2011-11-28 17:11 UTC, Stephen Cameron
Patch to add IRQF_SHARED flag to cciss for non msi(x) interrupts (1.19 KB, text/plain)
2011-11-28 17:12 UTC, Stephen Cameron

Description Jan ONDREJ 2011-11-18 07:26:22 UTC
Created attachment 534351 [details]
Full dmesg

Description of problem:
After upgrading the kernel and rebooting, the cciss driver fails to probe the controller (see the trace below).

Version-Release number of selected component (if applicable):
kernel-2.6.41.1-1.fc15.x86_64

How reproducible:
always
  
Actual results:
[ 1093.978987] HP CISS Driver (v 3.6.26)
[ 1094.003078] IRQ handler type mismatch for IRQ 16
[ 1094.003228] current handler: uhci_hcd:usb2
[ 1094.003366] Pid: 2921, comm: modprobe Not tainted 2.6.41.1-1.fc15.x86_64 #1
[ 1094.003509] Call Trace:
[ 1094.003659]  [<ffffffff810b1d76>] __setup_irq+0x39e/0x432
[ 1094.003805]  [<ffffffff8111971c>] ? kmem_cache_alloc_trace+0xb3/0xc5
[ 1094.003956]  [<ffffffffa01d5899>] ? process_indexed_cmd+0xa6/0xa6 [cciss]
[ 1094.004001]  [<ffffffff810b1ef4>] request_threaded_irq+0xea/0x116
[ 1094.004001]  [<ffffffffa01d7bca>] cciss_request_irq+0x66/0x98 [cciss]
[ 1094.004001]  [<ffffffffa01d6ddb>] cciss_init_one+0x1123/0x1a2f [cciss]
[ 1094.004001]  [<ffffffff8111a707>] ? kmem_cache_alloc+0x31/0xf8
[ 1094.004001]  [<ffffffff811843a4>] ? sysfs_find_dirent+0x3c/0x55
[ 1094.004001]  [<ffffffff81085d77>] ? arch_local_irq_save+0x15/0x1b
[ 1094.004001]  [<ffffffff81262487>] local_pci_probe+0x44/0x75
[ 1094.004001]  [<ffffffff81262fea>] pci_device_probe+0xd0/0xff
[ 1094.004001]  [<ffffffff81301017>] driver_probe_device+0x131/0x213
[ 1094.004001]  [<ffffffff81301153>] __driver_attach+0x5a/0x7e
[ 1094.004001]  [<ffffffff813010f9>] ? driver_probe_device+0x213/0x213
[ 1094.004001]  [<ffffffff8130009f>] bus_for_each_dev+0x53/0x89
[ 1094.004001]  [<ffffffff81300bf6>] driver_attach+0x1e/0x20
[ 1094.004001]  [<ffffffff8130081a>] bus_add_driver+0xd1/0x224
[ 1094.004001]  [<ffffffffa016e000>] ? 0xffffffffa016dfff
[ 1094.004001]  [<ffffffff813015f7>] driver_register+0x98/0x105
[ 1094.004001]  [<ffffffffa016e000>] ? 0xffffffffa016dfff
[ 1094.004001]  [<ffffffff812638ad>] __pci_register_driver+0x56/0xc1
[ 1094.004001]  [<ffffffffa016e000>] ? 0xffffffffa016dfff
[ 1094.004001]  [<ffffffffa016e07d>] cciss_init+0x7d/0xa1 [cciss]
[ 1094.004001]  [<ffffffff81002099>] do_one_initcall+0x7f/0x136
[ 1094.004001]  [<ffffffff8108a59d>] sys_init_module+0x88/0x1d0
[ 1094.004001]  [<ffffffff814a3102>] system_call_fastpath+0x16/0x1b
[ 1094.011026] cciss 0000:09:02.0: Unable to get irq 16 for cciss0
[ 1094.012661] cciss: probe of 0000:09:02.0 failed with error -1

Additional info:
I can run tests on this machine if required.

Comment 1 Stephen Cameron 2011-11-22 19:01:05 UTC
What controller and what server is this?
What kernel are you upgrading from?

And 2.6.41?  I thought they stopped at 2.6.38 and then 3.0.

-- steve

Comment 2 Jan ONDREJ 2011-11-22 19:12:52 UTC
(In reply to comment #1)
> What controller and what server is this?

[root@ftp ~]# cciss_vol_status /dev/cciss/c0d0
/dev/cciss/c0d0: (Smart Array 641) RAID 5 Volume 0 status: OK. 
/dev/cciss/c0d0: (Smart Array 641) Enclosure PROLIANT 6L6I (S/N: ) on Bus 0, Physical Port J1 status: OK.

product: ProLiant ML350 G4

> What kernel are you upgrading from?
> 
> And 2.6.41?  I thought they stopped at 2.6.38 and then 3.0.

Last working kernel: 2.6.40.6-0.fc15.x86_64
First bad kernel (no newer fedora kernel yet): kernel-2.6.41.1-1.fc15.x86_64

Comment 3 Josh Boyer 2011-11-22 19:25:22 UTC
(In reply to comment #2)
> > What kernel are you upgrading from?
> > 
> > And 2.6.41?  I thought they stopped at 2.6.38 and then 3.0.
> 
> Last working kernel: 2.6.40.6-0.fc15.x86_64
> First bad kernel (no newer fedora kernel yet): kernel-2.6.41.1-1.fc15.x86_64

2.6.40.6 is 3.0.6 renamed to avoid breaking F15 userspace that wasn't ready for the 3.0 change.

Similarly 2.6.41.1 is 3.1.1.

FYI.
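
The renaming described above can be sketched as a small C helper. This is only an illustration: the function name f15_to_upstream is hypothetical, and it assumes the general pattern 2.6.(40+N).Y = 3.N.Y, while this thread itself only confirms the two cases 2.6.40.6 = 3.0.6 and 2.6.41.1 = 3.1.1.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper, not part of any Fedora tool.
 * Assumption: Fedora 15 kernels named 2.6.(40+N).Y correspond to
 * upstream 3.N.Y. Only 2.6.40.6 -> 3.0.6 and 2.6.41.1 -> 3.1.1 are
 * confirmed in this bug report.
 *
 * Writes the upstream version into out and returns 0, or returns -1
 * if the input does not look like a renamed F15 kernel version.
 */
static int f15_to_upstream(const char *f15, char *out, size_t outlen)
{
	int minor, stable;

	assert(f15 != NULL && out != NULL);

	/* Anything after the second number (e.g. "-1.fc15.x86_64") is ignored. */
	if (sscanf(f15, "2.6.%d.%d", &minor, &stable) != 2 || minor < 40)
		return -1;

	snprintf(out, outlen, "3.%d.%d", minor - 40, stable);
	return 0;
}
```

Under that assumption, passing "2.6.41.4-1.fc15" (the kernel that later carried the fix) would yield "3.1.4".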

Comment 4 Stephen Cameron 2011-11-22 19:40:08 UTC
This is the entirety of the difference between the cciss drivers in 3.0.6 and 3.1.1 from kernel.org:

[scameron@localhost fedora-bug]$ for x in linux-3.0.6/drivers/block/cciss*[ch]; do  f=`basename $x`; echo ==== $f ====; diff -u linux-3.0.6/drivers/block/$f linux-3.1.1/drivers/block/$f; done
==== cciss.c ====
--- linux-3.0.6/drivers/block/cciss.c	2011-10-03 15:25:23.000000000 -0500
+++ linux-3.1.1/drivers/block/cciss.c	2011-11-11 14:19:27.000000000 -0600
@@ -4533,6 +4533,13 @@
 		pmcsr &= ~PCI_PM_CTRL_STATE_MASK;
 		pmcsr |= PCI_D0;
 		pci_write_config_word(pdev, pos + PCI_PM_CTRL, pmcsr);
+
+		/*
+		 * The P600 requires a small delay when changing states.
+		 * Otherwise we may think the board did not reset and we bail.
+		 * This for kdump only and is particular to the P600.
+		 */
+		msleep(500);
 	}
 	return 0;
 }
==== cciss_cmd.h ====
==== cciss.h ====
==== cciss_scsi.c ====
--- linux-3.0.6/drivers/block/cciss_scsi.c	2011-10-03 15:25:23.000000000 -0500
+++ linux-3.1.1/drivers/block/cciss_scsi.c	2011-11-11 14:19:27.000000000 -0600
@@ -33,7 +33,7 @@
 #include <linux/slab.h>
 #include <linux/string.h>
 
-#include <asm/atomic.h>
+#include <linux/atomic.h>
 
 #include <scsi/scsi_cmnd.h>
 #include <scsi/scsi_device.h>
==== cciss_scsi.h ====
[scameron@localhost fedora-bug]$

I think whatever broke must reside outside the driver.

-- steve

Comment 5 Stephen Cameron 2011-11-22 19:41:36 UTC
git-bisect would probably pin it down.

Comment 6 Stanislaw Gruszka 2011-11-23 14:58:32 UTC
It seems that only the IRQ routing changed, so cciss now shares its interrupt line with another device (uhci_hcd:usb2 on IRQ 16 in the trace above). Because cciss calls request_irq without the IRQF_SHARED flag, the request fails.

Is there any reason why cciss cannot share interrupts?
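
To illustrate the failure mode, here is a small userspace model of the sharing check that produces the "IRQ handler type mismatch" message. It is a simplified sketch, not kernel code: struct irq_line and model_request_irq are hypothetical names, and only the flag values are taken from include/linux/interrupt.h of that era.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag values as in include/linux/interrupt.h (3.x kernels). */
#define IRQF_TRIGGER_MASK 0x0000000fUL
#define IRQF_DISABLED     0x00000020UL
#define IRQF_SHARED       0x00000080UL

/* Hypothetical stand-in for the kernel's per-IRQ descriptor. */
struct irq_line {
	bool in_use;          /* a handler is already installed */
	unsigned long flags;  /* flags the first handler was registered with */
};

/*
 * Simplified model of the check made when a handler is added to a line.
 * Returns 0 on success, -1 on a handler type mismatch.
 */
static int model_request_irq(struct irq_line *line, unsigned long new_flags)
{
	assert(line != NULL);

	if (line->in_use) {
		/* Both the existing and the new handler must agree to share. */
		if (!(line->flags & new_flags & IRQF_SHARED))
			return -1;
		/* Their trigger types must match as well. */
		if ((line->flags ^ new_flags) & IRQF_TRIGGER_MASK)
			return -1;
		return 0;
	}

	line->in_use = true;
	line->flags = new_flags;
	return 0;
}
```

In this model, if a first handler registers with IRQF_SHARED (as uhci_hcd presumably does here) and a second one then asks for the same line with only IRQF_DISABLED, the second request fails; with IRQF_DISABLED | IRQF_SHARED it succeeds.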

Comment 7 Stephen Cameron 2011-11-23 16:16:46 UTC
No.

Most smart arrays use MSI or MSIX these days, so... wouldn't be shared, right?
(I don't need IRQF_SHARED when using MSI/MSIX, correct?  But for non-MSI/MSIX, then I do need IRQF_SHARED, correct?)

This appears to be the commit that removed IRQF_SHARED:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=0c2b39087c900bdb240b50ac95ee9da00d844565

That was more than a year ago, 2010-08-07... if that's it, I'm surprised nobody has complained before.

Maybe something like this?

Author: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Date:   Wed Nov 23 10:16:34 2011 -0600

    cciss: Add IRQF_SHARED back in for the non-MSI(X) interrupt handler

diff --git a/drivers/block/cciss.c b/drivers/block/cciss.c
index 486f94e..942ccf8 100644
--- a/drivers/block/cciss.c
+++ b/drivers/block/cciss.c
@@ -4884,7 +4884,7 @@ static int cciss_request_irq(ctlr_info_t *h,
        }
 
        if (!request_irq(h->intr[h->intr_mode], intxhandler,
-                       IRQF_DISABLED, h->devname, h))
+                       IRQF_DISABLED | IRQF_SHARED, h->devname, h))
                return 0;
        dev_err(&h->pdev->dev, "Unable to get irq %d for %s\n",
                h->intr[h->intr_mode], h->devname);

Comment 8 Stephen Cameron 2011-11-23 16:19:41 UTC
If that is correct, hpsa probably needs a similar change (though I think all boards officially supported by hpsa use MSI, the "hpsa_allow_any=1" kernel option may expose older boards).

-- steve

Comment 9 Stanislaw Gruszka 2011-11-24 07:23:34 UTC
Jan, can you apply the patch from comment 7 and test it? Let me know if you are not familiar with kernel compilation; I will launch a kernel build with the patch in http://koji.fedoraproject.org/koji/ .

Comment 10 Jan ONDREJ 2011-11-24 07:26:51 UTC
(In reply to comment #9)
> Jan, can you apply the patch from comment 7 and test it? Let me know if you are
> not familiar with kernel compilation; I will launch a kernel build with the
> patch in http://koji.fedoraproject.org/koji/ .

Hello. I have no time to build a kernel right now, but if you can run a build for me in koji, I will gladly test it.

Comment 11 Stanislaw Gruszka 2011-11-24 11:04:16 UTC
Ok, here is the kernel with patch:
http://koji.fedoraproject.org/koji/taskinfo?taskID=3537034

Comment 12 Jan ONDREJ 2011-11-24 11:12:33 UTC
(In reply to comment #11)
> Ok, here is the kernel with patch:
> http://koji.fedoraproject.org/koji/taskinfo?taskID=3537034

Works well. All disks are present.

Comment 13 Stanislaw Gruszka 2011-11-24 11:40:05 UTC
Created attachment 535864 [details]
cciss.patch

This is the exact patch I used in the test kernel. Josh, please apply it. Stephen, please post it :-) Also note that you can get rid of IRQF_DISABLED; according to include/linux/interrupt.h it is a no-op and deprecated.

Comment 14 Stephen Cameron 2011-11-28 15:03:14 UTC
(In reply to comment #13)
> Created attachment 535864 [details]
> cciss.patch
> 
> This is the exact patch I used in the test kernel. Josh, please apply it.
> Stephen, please post it :-) Also note that you can get rid of IRQF_DISABLED;
> according to include/linux/interrupt.h it is a no-op and deprecated.

So I also have IRQF_DISABLED in the msix path as the only flag.  Should I just use 0 for the flags there?  Should I add in IRQF_SAMPLE_RANDOM?  I seem to remember that used to be in there at one time as well.

-- steve

Comment 15 Stephen Cameron 2011-11-28 15:07:19 UTC
(In reply to comment #14)
> (In reply to comment #13)
> > Created attachment 535864 [details]
> > cciss.patch
> > 
> > This is the exact patch I used in the test kernel. Josh, please apply it.
> > Stephen, please post it :-) Also note that you can get rid of IRQF_DISABLED;
> > according to include/linux/interrupt.h it is a no-op and deprecated.
> 
> So I also have IRQF_DISABLED in the msix path as the only flag.  Should I just
> use 0 for the flags there?  Should I add in IRQF_SAMPLE_RANDOM?  I seem to
> remember that used to be in there at one time as well.
> 
> -- steve

Well, digging around, I see there are plenty of uses of request_irq with flags passed as 0, and no scsi drivers use IRQF_SAMPLE_RANDOM and only one block driver, so I guess I shouldn't use IRQF_SAMPLE_RANDOM.

Comment 16 Stanislaw Gruszka 2011-11-28 15:42:55 UTC
Zero should be fine, as long as the device does not generate interrupts in a truly random manner.
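
One way to read the flag discussion in comments 7 and 13 to 16 is sketched below as a small C helper. The enum and the function name irq_flags_for are hypothetical and exist only for illustration; whether the posted patches match this exactly is not shown in this thread, so treat it as an assumption rather than the driver's actual code.

```c
#include <assert.h>

/* Flag value as in include/linux/interrupt.h. */
#define IRQF_SHARED 0x00000080UL

/* Hypothetical enumeration of the interrupt modes discussed here. */
enum intr_mode { INTR_INTX, INTR_MSI, INTR_MSIX };

/*
 * Sketch of the flag choice suggested by the discussion:
 *  - legacy INTx lines can be routed onto a shared IRQ, so ask for sharing;
 *  - MSI and MSI-X vectors are not shared, so plain 0 is enough;
 *  - IRQF_DISABLED is dropped everywhere because it is a deprecated no-op.
 */
static unsigned long irq_flags_for(enum intr_mode mode)
{
	assert(mode == INTR_INTX || mode == INTR_MSI || mode == INTR_MSIX);

	return mode == INTR_INTX ? IRQF_SHARED : 0UL;
}
```

The value returned would then be passed as the flags argument of request_irq for the corresponding handler.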

Comment 17 Stephen Cameron 2011-11-28 17:11:41 UTC
Created attachment 537541 [details]
Patch to add IRQF_SHARED flag to hpsa for non-msi interrupt handler

Here is the patch I sent to lkml for hpsa to add IRQF_SHARED to the non msi(x) interrupt request.

Comment 18 Stephen Cameron 2011-11-28 17:12:51 UTC
Created attachment 537543 [details]
Patch to add IRQF_SHARED flag to cciss for non msi(x) interrupts

Here is the patch I sent to lkml for cciss to add IRQF_SHARED to the non msi(x) interrupt request.

Comment 19 Chuck Ebbert 2011-11-28 21:39:12 UTC
Patches added to F15 and F16, will be in the next update.

Comment 20 Fedora Update System 2011-11-29 13:51:50 UTC
kernel-2.6.41.4-1.fc15 has been submitted as an update for Fedora 15.
https://admin.fedoraproject.org/updates/kernel-2.6.41.4-1.fc15

Comment 21 Fedora Update System 2011-11-30 02:03:19 UTC
Package kernel-2.6.41.4-1.fc15:
* should fix your issue,
* was pushed to the Fedora 15 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-2.6.41.4-1.fc15'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2011-16621/kernel-2.6.41.4-1.fc15
then log in and leave karma (feedback).

Comment 22 Fedora Update System 2011-12-10 19:51:25 UTC
kernel-2.6.41.4-1.fc15 has been pushed to the Fedora 15 stable repository.  If problems still persist, please make note of it in this bug report.

