Bug 505506

Summary: RHEL4.8: crash in do_cciss_request()
Product: Red Hat Enterprise Linux 4 Reporter: Lachlan McIlroy <lmcilroy>
Component: kernelAssignee: Tomas Henzl <thenzl>
Status: CLOSED ERRATA QA Contact: Storage QE <storage-qe>
Severity: high Docs Contact:
Priority: urgent    
Version: 4.8CC: bmarzins, bubrown, coughlan, dhoward, fge, jolsa, jtluka, mike.miller, mmatsuya, tao, vgaikwad
Target Milestone: rcKeywords: ZStream
Target Release: 4.9   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 662154 (view as bug list) Environment:
Last Closed: 2011-02-16 15:47:41 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 513189, 662154    
Attachments:
Description Flags
Patch to initialize entire request structure none

Description Lachlan McIlroy 2009-06-12 06:49:44 UTC
PROBLEM:

The multipath executable frequently crashes the system showing:


----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at cciss:3307
invalid operand: 0000 [1] SMP
CPU 1
Modules linked in: mptctl mptbase sg ipmi_si(U) ipmi_devintf(U) ipmi_msghandler(U) md5 ipv6 netconsole netdum
p hp_ilo(U) cpufreq_powersave loop dm_round_robin dm_multipath button battery ac joydev ehci_hcd uhci_hcd i50
00_edac edac_mc hw_random e1000(U) bnx2(U) bonding(U) dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod ata_piix
libata cciss(U) sd_mod qla2xxx(U) scsi_mod qla2xxx_conf(U)
Pid: 8563, comm: multipath Not tainted 2.6.9-78.ELsmp
RIP: 0010:[<ffffffffa0176222>] <ffffffffa0176222>{:cciss:do_cciss_request+78}
RSP: 0018:0000010687a99568  EFLAGS: 00010012
RAX: 0000010696829678 RBX: 000001069651dac0 RCX: 00000106968a7280
RDX: 00000106968a7280 RSI: 00000106968a7248 RDI: 000001069651dac0
RBP: 0000010696829678 R08: 0000000000000002 R09: 0000000000000060
R10: 000001069651dac0 R11: ffffffffa01761d4 R12: 0000000000000000
R13: 000001069651dac0 R14: 00000106965e0000 R15: 000001069892de00
FS:  0000002a9557c8c0(0000) GS:ffffffff8050d300(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000003e2d592c0f CR3: 00000000cfe08000 CR4: 00000000000006e0
Process multipath (pid: 8563, threadinfo 0000010687a98000, task 00000106864d17f0)
Stack: 0000000000000000 0000000000000000 0000000000000000 000001069651dac0
      0000000000000000 0000000000000000 0000000000000000 0000000000000000
      0000000000000000 0000000000000246
Call Trace:<ffffffff8016375a>{cache_alloc_refill+390} <ffffffff8015e738>{mempool_alloc+129}
      <ffffffff801360cc>{autoremove_wake_function+0} <ffffffff80318304>{thread_return+0}
      <ffffffff803183c5>{thread_return+193} <ffffffff80140a40>{__mod_timer+293}
      <ffffffff80286986>{cdrom_timer_expiry+0} <ffffffff80286986>{cdrom_timer_expiry+0}
      <ffffffff8027b289>{ide_outb+0} <ffffffff80318304>{thread_return+0}
      <ffffffff803183c5>{thread_return+193} <ffffffff80140a40>{__mod_timer+293}
      <ffffffff80286986>{cdrom_timer_expiry+0} <ffffffff80286986>{cdrom_timer_expiry+0}
      <ffffffff8027b289>{ide_outb+0} <ffffffff80286c24>{cdrom_transfer_packet_command+172}
      <ffffffff80318588>{wait_for_completion+231} <ffffffff80134709>{default_wake_function+0}
      <ffffffff8015e738>{mempool_alloc+129} <ffffffff801360cc>{autoremove_wake_function+0}
      <ffffffff8015e738>{mempool_alloc+129} <ffffffff801360cc>{autoremove_wake_function+0}
      <ffffffff801360cc>{autoremove_wake_function+0} <ffffffff8025d3cd>{cfq_insert_request+63}
      <ffffffff8025d220>{cfq_next_request+46} <ffffffff8025481b>{generic_unplug_device+24}
      <ffffffff80254f99>{blk_execute_rq+166} <ffffffff801360cc>{autoremove_wake_function+0}
      <ffffffff801360cc>{autoremove_wake_function+0} <ffffffff80259107>{sg_io+455}
      <ffffffff80191f35>{dput+56} <ffffffff802594eb>{scsi_cmd_ioctl+740}
      <ffffffff80193046>{__d_lookup+287} <ffffffff8018891f>{do_lookup+44}
      <ffffffff80191f35>{dput+56} <ffffffff80189a56>{__link_path_walk+3562}
      <ffffffffa017569a>{:cciss:cciss_ioctl+3385} <ffffffff80191f35>{dput+56}
      <ffffffff801ef3a1>{__up_read+16} <ffffffff8025125a>{kobj_lookup+358}
      <ffffffff80183514>{do_open+928} <ffffffff80257943>{blkdev_ioctl+1674}
      <ffffffff8017b095>{__dentry_open+248} <ffffffff8017b246>{filp_open+95}
      <ffffffff801848fa>{sys_newstat+32} <ffffffff801f1b75>{strncpy_from_user+74}
      <ffffffff8018dc81>{sys_ioctl+853} <ffffffff801102f6>{system_call+126}





ANALYSIS:

Analysis of the dump reveals that we BUG() because of the entry shown below:

crash> struct request 0000010696829678
struct request {
 queuelist = {
   next = 0x1069651dac0,
   prev = 0x1069651dac0
 },
 flags = 1120,
 sector = 1743,
 nr_sectors = 50,
 current_nr_sectors = 1,
 hard_sector = 1743,
 hard_nr_sectors = 50,
 hard_cur_sectors = 1,
 nr_cbio_segments = 1,
 nr_cbio_sectors = 1,
 cbio = 0x106941d9880,
 bio = 0x0,
 biotail = 0x0,
 elevator_private = 0x106968a7248,
 rq_status = 1,
 rq_disk = 0x1069892de00,
 errors = 0,
 start_time = 4327277071,
 nr_phys_segments = 50,        <------- nr_phys_segments = 50 > MAXSGENTRIES
 nr_hw_segments = 50,
 tag = -1,
 buffer = 0x0,
 ref_count = 2,
 q = 0x1069651dac0,


Which is from do_cciss_request():


3002         if (creq->nr_phys_segments > MAXSGENTRIES)
3003                 BUG();


Further analysis of the IO shows that we were attempting to perform a SCSI
TUR (test-unit-ready).  I could not reproduce this using multipath but was able
to crash the system in the same way by calling sg_turs /dev/cciss/c0d0 in a loop.
The multipath binary performs exactly the same ioctl():


From a multipath strace:

7992  open("/dev/cciss/c0d0", O_RDONLY) = 4
7992  ioctl(4, SG_IO, {'S', SG_DXFER_NONE, cmd[6]=[00, 00, 00, 00, 00, 00], mx_sb_len=32, iovec_count=0, dxfer_len=0, timeout=20000,
flags=0, status=00, masked_status=00, sb[0]=[], host_status=0, driver_status=0, resid=0, duration=0, info=0}) = 0


From an sg_turs strace:

8024  ioctl(3, SG_IO, {'S', SG_DXFER_NONE, cmd[6]=[00, 00, 00, 00, 00, 00], mx_sb_len=32, iovec_count=0, dxfer_len=0, timeout=60000,
flags=0, status=00, masked_status=00, sb[0]=[], host_status=0, driver_status=0, resid=0, duration=0, info=0}) = 0

Only slight diff is the timeout.

In the dumps I produced it shows the same stack trace and seemingly apparent problem, although
mine additionally shows evidence of data corruption, eg:

crash> struct request 00000103a4f30eb8
struct request {
 queuelist = {
   next = 0x101f8e9c080,
   prev = 0x101f8e9c080
 },
 flags = 1120,
 sector = 16514577519336752818,       <------------------ All looks bad from here
 nr_sectors = 17875655937648256291,
 current_nr_sectors = 3109905447,
 hard_sector = 17482706148473970798,
 hard_nr_sectors = 9979233092578643319,
 hard_cur_sectors = 442092742,
 nr_cbio_segments = 13892,
 nr_cbio_sectors = 6024779977099665222,           <------ to here!!
 cbio = 0xe431dfdc03593afe,
 bio = 0x0,
 biotail = 0x0,
 elevator_private = 0x103f809f290,
 rq_status = 1,
 rq_disk = 0x103f8fee800,
 errors = 0,
 start_time = 17968924045220252110,
 nr_phys_segments = 25835,             <--- causes the BUG() crash, way out of range!
 nr_hw_segments = 28685,
 tag = 1596562687,
 buffer = 0x0,
 ref_count = 2,
 q = 0x101f8e9c080,
 rl = 0x101f8e9c180,
 waiting = 0x101f759fa18,
 special = 0x0,
 cmd_len = 6,
 cmd = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000",
 data_len = 0,
 data = 0x0,
 sense_len = 0,
 sense = 0x101f759fa78,
 timeout = 60000,
 pm = 0xa29fc14af076012d
}

So it is possible that we are looking at an additional problem or perhaps the same
one exacerbated by the frequency of the IOs.  I started to analyse the issue
to look for root cause but this has still yet to be determined.


The customer has been able to work-around this problem in their environment with
multipath by adding cciss to the blacklisted devices, eg:

 
devnode_blacklist {
#   wwid   26353900f02796769
  devnode   "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
  devnode   "^hd[a-z][[0-9]*]"
#   devnode   "^cciss!c[0-9]d[0-9]*"   <--- uncommenting this should help!
}


Now multipath no longer accesses the device.




REPRODUCER:

The system can be made to crash within a few seconds if the following script
is executed:


# more tur-loop
#!/bin/bash

((i=0))

while (( i<100000))
do
 sg_turs /dev/cciss/c0d0
 ((i+=1))
done


Investigation results using systemtap revealed the following after
a number of iterations of the reproducer script:


   0 sg_turs(8792): -> cciss_open (8792, 5860887988348), cpu# 1
   14 sg_turs(8792):  -> get_host (8792, 5860888011243), cpu# 1
   20 sg_turs(8792):   -> get_drv (8792, 5860888025666), cpu# 1
   26 sg_turs(8792):   <- cciss_open (8792, 5860888041781), cpu# 1
   87 sg_turs(8792):   -> cciss_ioctl (8792, 5860888200681), cpu# 1
   92 sg_turs(8792):    -> get_host (8792, 5860888212167), cpu# 1
   96 sg_turs(8792):     -> get_drv (8792, 5860888223285), cpu# 1
  101 sg_turs(8792):      -> scsi_cmd_ioctl (8792, 5860888237689), cpu# 1
  107 sg_turs(8792):       -> constant_test_bit (8792, 5860888253856), cpu# 1
  111 sg_turs(8792):        -> blk_get_queue (8792, 5860888263492), cpu# 1
  115 sg_turs(8792):         -> atomic_inc (8792, 5860888274862), cpu# 1
  120 sg_turs(8792):         <- blk_get_queue (8792, 5860888286863), cpu# 1
  125 sg_turs(8792):         -> sg_io (8792, 5860888299438), cpu# 1
  130 sg_turs(8792):          -> verify_command (8792, 5860888312890), cpu# 1
  134 sg_turs(8792):          <- verify_command (8792, 5860888324740), cpu# 1
  140 sg_turs(8792):          -> blk_get_request (8792, 5860888339244), cpu# 1
  145 sg_turs(8792):           -> get_request_wait (8792, 5860888353116), cpu# 1
  151 sg_turs(8792):            -> get_request (8792, 5860888367427), cpu# 1
  156 sg_turs(8792):             -> get_io_context (8792, 5860888381467), cpu# 1
  161 sg_turs(8792):              -> atomic_inc (8792, 5860888394839), cpu# 1
  166 sg_turs(8792):              <- get_io_context (8792, 5860888406646), cpu# 1
  170 sg_turs(8792):              -> blk_queue_full (8792, 5860888419168), cpu# 1
  175 sg_turs(8792):               -> constant_test_bit (8792, 5860888429981), cpu# 1
  179 sg_turs(8792):                -> blk_alloc_request (8792, 5860888441320), cpu# 1
  496 sg_turs(8792):                 -> elv_set_request (8792, 5860888464864), cpu# 1
  502 sg_turs(8792):                  -> cfq_set_request (8792, 5860888481758), cpu# 1
  508 sg_turs(8792):                   -> cfq_get_queue (8792, 5860888495975), cpu# 1
  513 sg_turs(8792):                    -> __cfq_get_queue (8792, 5860888511323), cpu# 1
  518 sg_turs(8792):                     -> __cfq_find_cfq_hash (8792, 5860888524459), cpu# 1
  524 sg_turs(8792):                      -> prefetch (8792, 5860888538224), cpu# 1
  528 sg_turs(8792):                       -> __cfq_find_cfq_hash (8792, 5860888550704), cpu# 1
  533 sg_turs(8792):                        -> prefetch (8792, 5860888562843), cpu# 1
  539 sg_turs(8792):                         -> list_add (8792, 5860888578383), cpu# 1

 544 sg_turs(8792):                          -> __list_add (8792, 5860888591592), cpu# 1
  549 sg_turs(8792):                          <- __cfq_get_queue (8792, 5860888605811), cpu# 1
  555 sg_turs(8792):                         <- cfq_set_request (8792, 5860888619959), cpu# 1
  559 sg_turs(8792):                        <- elv_set_request (8792, 5860888632358), cpu# 1
  565 sg_turs(8792):                        -> ioc_batching (8792, 5860888645983), cpu# 1
  570 sg_turs(8792):                         -> put_io_context (8792, 5860888660562), cpu# 1
  576 sg_turs(8792):                          -> atomic_dec_and_test (8792, 5860888674279), cpu# 1
  581 sg_turs(8792):                          <- put_io_context (8792, 5860888687735), cpu# 1
  586 sg_turs(8792):                         <- get_request (8792, 5860888700379), cpu# 1
  591 sg_turs(8792):                        <- get_request_wait (8792, 5860888712829), cpu# 1
  595 sg_turs(8792):                       <- blk_get_request (8792, 5860888725045), cpu# 1
kernel.function("blk_execute_rq@drivers/block/ll_rw_blk.c:2016")
q = 0x000001001fb30080, rq = 0x000001001178ca98, rq->q= 0x000001001fb30080
rq->flags = 0x400
rq->current_nr_sectors =852049
rq->hard_sector = 10010608
rq->hard_nr_sectors = 65
rq->hard_cur_sectors = 2506225280
rq->nr_cbio_segments = 42
rq->nr_cbio_sectors = 655439
rq->cbio= 0x0000002a9561bac0
rq->bio = 0x0000000000000000
rq->biotail= 0x0000000000000000
rq->nr_phys_segments= 47872
rq->nr_hw_segments= 38241
rq->ref_count = 1
  610 sg_turs(8792):                       -> blk_execute_rq (8792, 5860888763412), cpu# 1
  616 sg_turs(8792):                        -> elv_add_request (8792, 5860888779482), cpu# 1



The above request structure will result in the system crashing.
Analysis of the
functions in the call graph suggest that this might be just a simple
case of
structure initialisation (or lack thereof) in here:



static inline struct request *blk_alloc_request(request_queue_t *q,int gfp_mask)
{
       struct request *rq = mempool_alloc(q->rq.rq_pool, gfp_mask);

       if (!rq)
               return NULL;

       if (!elv_set_request(q, rq, gfp_mask))
               return rq;

       mempool_free(rq, q->rq.rq_pool);
       return NULL;
}


I checked RHEL5 and it looked no different so I checked further upstream
and found the following:


http://lxr.linux.no/linux+v2.6.29/block/blk-core.c#L649



648static struct request *
649blk_alloc_request(struct request_queue *q, int rw, int priv, gfp_t gfp_mask)
650{
651        struct request *rq = mempool_alloc(q->rq.rq_pool, gfp_mask);
652
653        if (!rq)
654                return NULL;
655
656        blk_rq_init(q, rq);               <----------------------
657
658        rq->cmd_flags = rw | REQ_ALLOCED;
659
660        if (priv) {
661                if (unlikely(elv_set_request(q, rq, gfp_mask))) {
662                        mempool_free(rq, q->rq.rq_pool);
663                        return NULL;
664                }
665                rq->cmd_flags |= REQ_ELVPRIV;
666        }
667
668        return rq;
669}



123void blk_rq_init(struct request_queue *q, struct request *rq)
124{
125        memset(rq, 0, sizeof(*rq));       <------------
126
127        INIT_LIST_HEAD(&rq->queuelist);
128        INIT_LIST_HEAD(&rq->timeout_list);
129        rq->cpu = -1;
130        rq->q = q;
131        rq->sector = rq->hard_sector = (sector_t) -1;
132        INIT_HLIST_NODE(&rq->hash);
133        RB_CLEAR_NODE(&rq->rb_node);
134        rq->cmd = rq->__cmd;
135        rq->tag = -1;
136        rq->ref_count = 1;
137}




I have therefore modified the code to include a structure
initialisation:


static inline struct request *blk_alloc_request(request_queue_t *q,int gfp_mask)
{
       struct request *rq = mempool_alloc(q->rq.rq_pool, gfp_mask);

       if (!rq)
               return NULL;

       memset(rq, 0, sizeof(struct request));      <--------

       if (!elv_set_request(q, rq, gfp_mask))
               return rq;

       mempool_free(rq, q->rq.rq_pool);
       return NULL;
}

Whether it requires the additional changes to be backported in to RHEL4
I'll leave to RH to determine.  However testing of the above change
produced zero crashes after 100000 iterations of the test script calling
sg_turs.

Comment 1 Lachlan McIlroy 2009-06-12 06:55:50 UTC
Created attachment 347516 [details]
Patch to initialize entire request structure

Comment 4 Ben Marzinski 2009-06-25 22:13:36 UTC
While the fix in comment #1 seems correct to me, since this is only happening for the cciss driver, in seems pretty reasonable to fix it there.

Looking at do_cciss_request(), it seems like you should be able to fix this with something like

--- linux-2.6.9/drivers/block/cciss.c   2009-06-25 09:08:02.000000000 -0500
+++ linux-2.6.9/drivers/block/cciss.c.505506    2009-06-25 10:47:22.000000000 -0500
@@ -3328,7 +3328,7 @@ queue:
        if (!creq)
                goto startio;
 
-       if (creq->nr_phys_segments > MAXSGENTRIES)
+       if (creq->bio && creq->nr_phys_segments > MAXSGENTRIES)
                 BUG();
 
        if (( c = cmd_alloc(h, 1)) == NULL)


I don't have any hardware to test this on, so I'm not positive that it works, but it seems like it should. Just my two cents.

Comment 5 Lachlan McIlroy 2009-06-26 03:43:57 UTC
I'd be inclined to use the suggested patch for the following reasons

- it's been tested and is known to fix the problem
- it may fix the same issue for other drivers (that don't detect the
  issue with a BUG())
- it may fix other issues related to other fields in the request structure
  that are not being initialised
- it's what is now being done in mainline

Comment 6 Tomas Henzl 2009-07-01 11:40:21 UTC
I've built a test kernel in brew (taskID=1869626). 
It is accessible here -> http://people.redhat.com/thenzl/505506/ , if you need another kernel flavour let me know I can post it also, but my quota is limited.

Please retest it with kernel from people/... , so I can then post it to rhkl.
And thanks for the analysis and patch.

Comment 7 Tomas Henzl 2009-07-01 13:35:48 UTC
(In reply to comment #4)
> While the fix in comment #1 seems correct to me, since this is only happening
> for the cciss driver, in seems pretty reasonable to fix it there.
> 
> I don't have any hardware to test this on, so I'm not positive that it works,
> but it seems like it should. Just my two cents.  

Your patch works, I've tested it. 
I've decided to take the first patch because
- it was verified by the reporter 
- it will help also our customers with cciss driver directly from HP

Thanks,
Tomas

Comment 8 Lachlan McIlroy 2009-07-02 03:06:49 UTC
Tomas,

I tested your kernel with the test case from IT#302867 and there were no problems so it looks good.

Lachlan

Comment 10 Tomas Henzl 2009-07-02 12:32:13 UTC
Lachlan,
thanks, it's just posted to rhkl.

Comment 11 RHEL Program Management 2009-07-02 12:33:03 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 13 Lachlan McIlroy 2009-07-10 01:09:59 UTC
Tomas,

Any progress on this fix?

Lachlan

Comment 14 Tomas Henzl 2009-07-10 07:57:24 UTC
(In reply to comment #13)

> Any progress on this fix?
It is posted - http://post-office.corp.redhat.com/archives/rhkernel-list/2009-July/msg00074.html
now it is waiting for ack's from other developers.

Comment 17 Vivek Goyal 2009-07-28 17:23:08 UTC
Committed in 89.7.EL . RPMS are available at http://people.redhat.com/vgoyal/rhel4/

Comment 22 Gris Ge 2011-01-11 05:23:59 UTC
Reproduced this on kernel-2.6.9-89.EL, got the same kernel panic.

On kernel-2.6.9-95.EL, performed 100000 sg_turs /dev/cciss/c0d0, no error occurred.

Change bug status into verified.

Comment 23 errata-xmlrpc 2011-02-16 15:47:41 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0263.html