Bug 664592 - a test unit ready causes a panic on 5.6 (CCISS driver)
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.6
Hardware: Unspecified
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assigned To: Tomas Henzl
QA Contact: Chao Ye
Keywords: Regression, ZStream
Depends On:
Blocks: 668976 707606
Reported: 2010-12-20 16:28 EST by Barry Donahue
Modified: 2012-05-04 20:29 EDT
Doc Type: Bug Fix
Doc Text:
When a TUR (Test Unit Ready) command was executed with the cciss driver, the rq->bio pointer in the blk_rq_bytes function was NULL; dereferencing it caused a kernel panic. With this update, the rq->bio pointer is used only when the blk_fs_request(rq) condition is true, so the kernel panic no longer occurs.
Last Closed: 2011-07-21 06:01:08 EDT


Attachments
fix panic in blk_rq_bytes (512 bytes, patch)
2011-01-05 02:44 EST, Tomas Henzl

Description Barry Donahue 2010-12-20 16:28:24 EST
Description of problem: Running a recommended test case from BZ 662154 causes a panic on CCISS systems.


Steps to Reproduce:
1. Install 5.6 onto a CCISS system and run the following test case.
#!/bin/bash

# Send 100000 TEST UNIT READY commands to the first cciss logical drive.
((i=0))

while (( i < 100000 ))
do
    sg_turs /dev/cciss/c0d0
    ((i+=1))
done


The system will panic.
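
For reference, the loop above can be approximated in C. This is a minimal sketch, assuming sg_turs issues TEST UNIT READY through the Linux SG_IO ioctl; the helper names build_tur and send_tur, the sense buffer size, and the timeout are illustrative choices and are not taken from this report. What it shows is that a TUR has no data phase (SG_DXFER_NONE, zero transfer length), which is consistent with the request reaching the driver without an attached bio.

```c
#include <assert.h>
#include <string.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>

/* TEST UNIT READY is a 6-byte CDB with opcode 0x00 and no data phase. */
enum { TUR_CDB_LEN = 6, SENSE_LEN = 32 };

/*
 * Fill in an SG_IO header for one TEST UNIT READY command.
 * cdb and sense are caller-owned buffers of TUR_CDB_LEN and SENSE_LEN bytes.
 */
static void build_tur(struct sg_io_hdr *hdr, unsigned char *cdb,
                      unsigned char *sense)
{
    assert(hdr != NULL && cdb != NULL && sense != NULL);

    memset(hdr, 0, sizeof(*hdr));
    memset(cdb, 0, TUR_CDB_LEN);          /* opcode 0x00, remaining bytes zero */

    hdr->interface_id    = 'S';           /* required by the sg interface */
    hdr->cmd_len         = TUR_CDB_LEN;
    hdr->cmdp            = cdb;
    hdr->dxfer_direction = SG_DXFER_NONE; /* no data-in or data-out phase */
    hdr->dxfer_len       = 0;
    hdr->dxferp          = NULL;
    hdr->mx_sb_len       = SENSE_LEN;
    hdr->sbp             = sense;
    hdr->timeout         = 20000;         /* milliseconds; arbitrary choice */
}

/* Send one TUR to an already opened device; returns the ioctl() result. */
static int send_tur(int fd)
{
    struct sg_io_hdr hdr;
    unsigned char cdb[TUR_CDB_LEN];
    unsigned char sense[SENSE_LEN];

    build_tur(&hdr, cdb, sense);
    return ioctl(fd, SG_IO, &hdr);
}
```

Calling send_tur() in a loop on a file descriptor opened on /dev/cciss/c0d0 would mirror the shell script. On an affected kernel that is expected to trigger the same panic, so it should only be run on a test system.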

# ./test.sh 
Unable to handle kernel NULL pointer dereference at 0000000000000030 RIP: 
 [<ffffffff880b94fc>] :cciss:cciss_softirq_done+0xea/0x36d
PGD 0 
Oops: 0000 [1] SMP 
last sysfs file: /devices/pci0000:00/0000:00:00.0/resource
CPU 1 
Modules linked in: autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ipv6 xfrm_nalgo crypto_api loop dm_multipath scsi_dh video backlight sbs power_meter hwmon i2c_ec i2c_core dell_wmi wmi button battery asus_acpi acpi_memhotplug ac parport_pc lp parport ata_piix libata ide_cd be2iscsi i5000_edac cdrom pcspkr libiscsi2 be2net edac_mc tpm_tis scsi_transport_iscsi2 serio_raw scsi_transport_iscsi 8021q hpilo tpm tpm_bios bnx2 shpchp dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod cciss sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 0, comm: swapper Not tainted 2.6.18-236.el5 #1
RIP: 0010:[<ffffffff880b94fc>]  [<ffffffff880b94fc>] :cciss:cciss_softirq_done+0xea/0x36d
RSP: 0018:ffff8101aff37ec0  EFLAGS: 00010246
RAX: 0000000040002988 RBX: 0000000000000002 RCX: ffff81019b287608
RDX: ffff8101aff37f20 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffff810037e00034 R08: 0000000000000000 R09: ffff8101aff31e38
R10: 0000000000000082 R11: 0000000000000048 R12: 0000000000000000
R13: ffff81019b2875f8 R14: ffff8101aff50000 R15: ffff810037e00000
FS:  0000000000000000(0000) GS:ffff8101aff147c0(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000030 CR3: 0000000000201000 CR4: 00000000000006e0
Process swapper (pid: 0, threadinfo ffff8101aff30000, task ffff8101aff18100)
Stack:  0000000000000086 0000000000000082 ffffffff880bc1c9 0000000000000046
 ffff8101aff50000 0000000000000046 0000000000000001 ffffffff8043cfc0
 000000000000000a 0000000000000001 ffffffff8044f280 ffffffff80037e5a
Call Trace:
 <IRQ>  [<ffffffff880bc1c9>] :cciss:do_cciss_intr+0xaab/0xae8
 [<ffffffff80037e5a>] blk_done_softirq+0x5f/0x6d
 [<ffffffff80012464>] __do_softirq+0x89/0x133
 [<ffffffff8005e2fc>] call_softirq+0x1c/0x28
 [<ffffffff8006d5f5>] do_softirq+0x2c/0x7d
 [<ffffffff8006d485>] do_IRQ+0xec/0xf5
 [<ffffffff80057083>] mwait_idle+0x0/0x20
 [<ffffffff8005d615>] ret_from_intr+0x0/0xa
 <EOI>  [<ffffffff8006b981>] mwait_idle_with_hints+0x66/0x67
 [<ffffffff8005708f>] mwait_idle+0xc/0x20
 [<ffffffff80049360>] cpu_idle+0x95/0xb8
 [<ffffffff80078672>] start_secondary+0x490/0x49f


Code: 8b 57 30 74 0a 89 d5 81 e5 00 fe ff ff eb 07 41 8b ad dc 00 
RIP  [<ffffffff880b94fc>] :cciss:cciss_softirq_done+0xea/0x36d
 RSP <ffff8101aff37ec0>
CR2: 0000000000000030
 <0>Kernel panic - not syncing: Fatal exception
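
A note on reading the oops: RDI is 0 and CR2 is 0x30, and the Code bytes begin with 8b 57 30, which on x86_64 decodes to a 32-bit load from 0x30(%rdi). That pattern is what a read of a struct member through a NULL pointer looks like. The sketch below is only an illustration: struct model_bio is a hypothetical model of the leading fields of a 2.6.18-era struct bio on x86_64, written from memory rather than taken from this report, and fault_address is an illustrative helper. Under that assumed layout the byte count field sits at offset 0x30, which would match the faulting address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of the first fields of struct bio (assumed layout,
 * x86_64). Fixed-width types keep the offsets independent of the host.
 */
struct model_bio {
    uint64_t bi_sector;
    uint64_t bi_next;            /* a pointer in the kernel */
    uint64_t bi_bdev;            /* a pointer in the kernel */
    uint64_t bi_flags;
    uint64_t bi_rw;
    uint16_t bi_vcnt;
    uint16_t bi_idx;
    uint16_t bi_phys_segments;
    uint16_t bi_hw_segments;
    uint32_t bi_size;            /* byte count; a sector count is bi_size >> 9 */
};

/* Address touched when reading a member at member_offset from an object at base. */
static uint64_t fault_address(uint64_t base, size_t member_offset)
{
    return base + (uint64_t)member_offset;
}

/* With the assumed layout, bi_size lands at the offset seen in CR2. */
static_assert(offsetof(struct model_bio, bi_size) == 0x30,
              "bi_size is expected at offset 0x30 in this model");
```

With base 0 (the NULL rq->bio) and the offset of bi_size, fault_address() gives 0x30, the value reported in CR2.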
Comment 1 Tomas Henzl 2010-12-21 11:01:42 EST
(In reply to comment #0)
I was able to reproduce this on another system; I will look into it.
Comment 2 Tomas Henzl 2010-12-22 08:35:01 EST
Hi Mike, Steve,
It is very likely that this is caused by the latest patchset (the update to 3.6.22 ported to RHEL5). Please look at this issue.
Comment 3 Tomas Henzl 2011-01-05 02:44:36 EST
Created attachment 471810
fix panic in blk_rq_bytes

When the TUR is sent down, rq->bio is NULL in blk_rq_bytes, which causes a null pointer dereference here: int nr_sectors = bio_sectors(rq->bio);

Let me know if the patch is ok for you.
Comment 4 RHEL Product and Program Management 2011-01-05 03:10:03 EST
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 7 Stephen Cameron 2011-01-05 09:12:54 EST
Tomas, your patch attachment 471810 looks fine.  For comparison, here is the blk_rq_bytes function from our CVS for the driver which we build for RHEL5:

/**
 * blk_rq_bytes - Returns bytes left to complete in the entire request
 * @rq: the request being processed
 * this function is copied from later kernels (2.6.29-ish), where it is
 * normally defined in blk/blk-core.c
 * Slightly modified for older kernels.
 **/
static unsigned int blk_rq_bytes(struct request *rq)
{
        int nr_sectors;

        if (blk_fs_request(rq))  {
                BUG_ON(!rq->bio);
                nr_sectors = bio_sectors(rq->bio);
                return nr_sectors << 9;
        }

        return rq->data_len;
}

-- steve
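
The behaviour of the function above can be exercised outside the kernel with stand-in types. This is a sketch only: struct model_bio, struct model_request, and the model_ prefixed helpers are hypothetical simplifications that carry just the fields used here, and assert() stands in for BUG_ON().

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types (hypothetical). */
struct model_bio {
    unsigned int bi_size;        /* bytes left in the bio */
};

enum model_req_type { MODEL_REQ_FS, MODEL_REQ_PC };

struct model_request {
    enum model_req_type type;    /* filesystem request or packet command */
    struct model_bio *bio;       /* NULL for commands without data, e.g. a TUR */
    unsigned int data_len;       /* byte count for packet commands */
};

static int model_blk_fs_request(const struct model_request *rq)
{
    return rq->type == MODEL_REQ_FS;
}

static unsigned int model_bio_sectors(const struct model_bio *bio)
{
    return bio->bi_size >> 9;    /* 512-byte sectors */
}

/* Same shape as the function above: rq->bio is touched only for fs requests. */
static unsigned int model_blk_rq_bytes(const struct model_request *rq)
{
    if (model_blk_fs_request(rq)) {
        assert(rq->bio != NULL); /* stands in for BUG_ON(!rq->bio) */
        return model_bio_sectors(rq->bio) << 9;
    }

    return rq->data_len;
}
```

A TUR-like request (packet command, bio set to NULL, data_len of 0) takes the second return and never reads rq->bio, which is the case that used to fault.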
Comment 8 Tomas Henzl 2011-01-05 09:47:10 EST
(In reply to comment #7)
> Tomas, your patch attachment 471810 looks fine.
Thanks Steve.
In our git tree, blk_rq_bytes is called only when blk_pc_request(rq) is true,
so the branch
>         if (blk_fs_request(rq))  {
>                 BUG_ON(!rq->bio);
>                 nr_sectors = bio_sectors(rq->bio);
>                 return nr_sectors << 9;
>         }
is never taken.

Maybe I could misuse the opportunity :) - could you also review the
https://bugzilla.redhat.com/attachment.cgi?id=467508
in
https://bugzilla.redhat.com/show_bug.cgi?id=635143#c28
?
Comment 9 Stephen Cameron 2011-01-05 10:01:39 EST
(In reply to comment #8)
> (In reply to comment #7)
> > Tomas, your patch attachment 471810 looks fine.
> Thanks Steve.
> In our git the blk_rq_bytes is called only when blk_pc_request(rq) is true
> so the branch 
> >         if (blk_fs_request(rq))  {
> >                 BUG_ON(!rq->bio);
> >                 nr_sectors = bio_sectors(rq->bio);
> >                 return nr_sectors << 9;
> >         }
> is never taken.

Hmm.  This appears to be true in our driver as well.  Good catch.  Not that it hurts much as it is.  The compiler might even be smart enough to figure that out, since blk_rq_bytes is only called in one place that I see and so probably gets inlined, and those blk_xx_request() are macros iirc (that got removed in later kernels for some reason)... but might be a long shot for it to be that smart.
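
To make the point about the call site concrete, here is a small model. It is a sketch under assumptions: the request type checks are modeled as flag-test macros, the flag values, struct model_request, and the function names are all hypothetical, and a request is assumed to carry exactly one of the two type flags. Under that assumption, every request that passes the packet-command guard gets the same answer from the two-branch helper as from a helper that only returns data_len. A compiler would need to prove the same exclusivity before it could drop the branch on its own.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits; the real values live in the kernel headers. */
#define MODEL_REQ_CMD      (1u << 0)   /* filesystem request */
#define MODEL_REQ_BLOCK_PC (1u << 1)   /* packet command, e.g. a TUR */

struct model_request {
    unsigned int flags;
    unsigned int data_len;   /* byte count for packet commands */
    unsigned int bio_bytes;  /* stands in for bio_sectors(rq->bio) << 9 */
};

#define model_blk_fs_request(rq) (((rq)->flags & MODEL_REQ_CMD) != 0)
#define model_blk_pc_request(rq) (((rq)->flags & MODEL_REQ_BLOCK_PC) != 0)

/* Helper with both branches, like the function quoted above. */
static unsigned int bytes_full(const struct model_request *rq)
{
    if (model_blk_fs_request(rq))
        return rq->bio_bytes;
    return rq->data_len;
}

/* What is left if the filesystem branch can never be reached. */
static unsigned int bytes_pc_only(const struct model_request *rq)
{
    return rq->data_len;
}

/*
 * Hypothetical call site: only packet-command requests reach the helper.
 * Returns 1 and stores the byte count in *out, or 0 if rq was not handled.
 */
static int completed_bytes(const struct model_request *rq, unsigned int *out)
{
    assert(rq != NULL && out != NULL);

    if (!model_blk_pc_request(rq))
        return 0;

    *out = bytes_full(rq);
    return 1;
}
```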

> 
> Maybe I could misuse the opportunity :) - could you review also the 
> https://bugzilla.redhat.com/attachment.cgi?id=467508
> in
> https://bugzilla.redhat.com/show_bug.cgi?id=635143#c28
> ?

Ok, will take a look.

-- steve
Comment 13 Jarod Wilson 2011-01-26 16:09:12 EST
in kernel-2.6.18-241.el5
You can download this test kernel (or newer) from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.
Comment 23 Martin Prpic 2011-07-13 16:20:23 EDT
Technical note added. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
When a TUR (Test Unit Ready) command was executed with the cciss driver, the rq->bio pointer in the blk_rq_bytes function was NULL; dereferencing it caused a kernel panic. With this update, the rq->bio pointer is used only when the blk_fs_request(rq) condition is true, so the kernel panic no longer occurs.
Comment 24 errata-xmlrpc 2011-07-21 06:01:08 EDT
An advisory has been issued which should resolve the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-1065.html
