PROBLEM:

The multipath executable frequently crashes the system showing:

----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at cciss:3307
invalid operand: 0000 [1] SMP
CPU 1
Modules linked in: mptctl mptbase sg ipmi_si(U) ipmi_devintf(U) ipmi_msghandler(U) md5 ipv6 netconsole netdump hp_ilo(U) cpufreq_powersave loop dm_round_robin dm_multipath button battery ac joydev ehci_hcd uhci_hcd i5000_edac edac_mc hw_random e1000(U) bnx2(U) bonding(U) dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod ata_piix libata cciss(U) sd_mod qla2xxx(U) scsi_mod qla2xxx_conf(U)
Pid: 8563, comm: multipath Not tainted 2.6.9-78.ELsmp
RIP: 0010:[<ffffffffa0176222>] <ffffffffa0176222>{:cciss:do_cciss_request+78}
RSP: 0018:0000010687a99568  EFLAGS: 00010012
RAX: 0000010696829678 RBX: 000001069651dac0 RCX: 00000106968a7280
RDX: 00000106968a7280 RSI: 00000106968a7248 RDI: 000001069651dac0
RBP: 0000010696829678 R08: 0000000000000002 R09: 0000000000000060
R10: 000001069651dac0 R11: ffffffffa01761d4 R12: 0000000000000000
R13: 000001069651dac0 R14: 00000106965e0000 R15: 000001069892de00
FS:  0000002a9557c8c0(0000) GS:ffffffff8050d300(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000003e2d592c0f CR3: 00000000cfe08000 CR4: 00000000000006e0
Process multipath (pid: 8563, threadinfo 0000010687a98000, task 00000106864d17f0)
Stack: 0000000000000000 0000000000000000 0000000000000000 000001069651dac0
       0000000000000000 0000000000000000 0000000000000000 0000000000000000
       0000000000000000 0000000000000246
Call Trace:
  <ffffffff8016375a>{cache_alloc_refill+390} <ffffffff8015e738>{mempool_alloc+129}
  <ffffffff801360cc>{autoremove_wake_function+0} <ffffffff80318304>{thread_return+0}
  <ffffffff803183c5>{thread_return+193} <ffffffff80140a40>{__mod_timer+293}
  <ffffffff80286986>{cdrom_timer_expiry+0} <ffffffff80286986>{cdrom_timer_expiry+0}
  <ffffffff8027b289>{ide_outb+0} <ffffffff80318304>{thread_return+0}
  <ffffffff803183c5>{thread_return+193} <ffffffff80140a40>{__mod_timer+293}
  <ffffffff80286986>{cdrom_timer_expiry+0} <ffffffff80286986>{cdrom_timer_expiry+0}
  <ffffffff8027b289>{ide_outb+0} <ffffffff80286c24>{cdrom_transfer_packet_command+172}
  <ffffffff80318588>{wait_for_completion+231} <ffffffff80134709>{default_wake_function+0}
  <ffffffff8015e738>{mempool_alloc+129} <ffffffff801360cc>{autoremove_wake_function+0}
  <ffffffff8015e738>{mempool_alloc+129} <ffffffff801360cc>{autoremove_wake_function+0}
  <ffffffff801360cc>{autoremove_wake_function+0} <ffffffff8025d3cd>{cfq_insert_request+63}
  <ffffffff8025d220>{cfq_next_request+46} <ffffffff8025481b>{generic_unplug_device+24}
  <ffffffff80254f99>{blk_execute_rq+166} <ffffffff801360cc>{autoremove_wake_function+0}
  <ffffffff801360cc>{autoremove_wake_function+0} <ffffffff80259107>{sg_io+455}
  <ffffffff80191f35>{dput+56} <ffffffff802594eb>{scsi_cmd_ioctl+740}
  <ffffffff80193046>{__d_lookup+287} <ffffffff8018891f>{do_lookup+44}
  <ffffffff80191f35>{dput+56} <ffffffff80189a56>{__link_path_walk+3562}
  <ffffffffa017569a>{:cciss:cciss_ioctl+3385} <ffffffff80191f35>{dput+56}
  <ffffffff801ef3a1>{__up_read+16} <ffffffff8025125a>{kobj_lookup+358}
  <ffffffff80183514>{do_open+928} <ffffffff80257943>{blkdev_ioctl+1674}
  <ffffffff8017b095>{__dentry_open+248} <ffffffff8017b246>{filp_open+95}
  <ffffffff801848fa>{sys_newstat+32} <ffffffff801f1b75>{strncpy_from_user+74}
  <ffffffff8018dc81>{sys_ioctl+853} <ffffffff801102f6>{system_call+126}

ANALYSIS:

Analysis of the dump reveals that we BUG() because of the entry shown below:

crash> struct request 0000010696829678
struct request {
  queuelist = {
    next = 0x1069651dac0,
    prev = 0x1069651dac0
  },
  flags = 1120,
  sector = 1743,
  nr_sectors = 50,
  current_nr_sectors = 1,
  hard_sector = 1743,
  hard_nr_sectors = 50,
  hard_cur_sectors = 1,
  nr_cbio_segments = 1,
  nr_cbio_sectors = 1,
  cbio = 0x106941d9880,
  bio = 0x0,
  biotail = 0x0,
  elevator_private = 0x106968a7248,
  rq_status = 1,
  rq_disk = 0x1069892de00,
  errors = 0,
  start_time = 4327277071,
  nr_phys_segments = 50,      <------- nr_phys_segments = 50 > MAXSGENTRIES
  nr_hw_segments = 50,
  tag = -1,
  buffer = 0x0,
  ref_count = 2,
  q = 0x1069651dac0,

The check that triggers the BUG() is in do_cciss_request():

3002         if (creq->nr_phys_segments > MAXSGENTRIES)
3003                 BUG();

Further analysis of the IO shows that we were attempting to perform a SCSI TUR (test unit ready). I could not reproduce this using multipath, but was able to crash the system in the same way by calling sg_turs /dev/cciss/c0d0 in a loop. The multipath binary performs exactly the same ioctl() (a minimal C sketch of this call is included at the end of this analysis).

From a multipath strace:

7992  open("/dev/cciss/c0d0", O_RDONLY) = 4
7992  ioctl(4, SG_IO, {'S', SG_DXFER_NONE, cmd[6]=[00, 00, 00, 00, 00, 00], mx_sb_len=32, iovec_count=0, dxfer_len=0, timeout=20000, flags=0, status=00, masked_status=00, sb[0]=[], host_status=0, driver_status=0, resid=0, duration=0, info=0}) = 0

From an sg_turs strace:

8024  ioctl(3, SG_IO, {'S', SG_DXFER_NONE, cmd[6]=[00, 00, 00, 00, 00, 00], mx_sb_len=32, iovec_count=0, dxfer_len=0, timeout=60000, flags=0, status=00, masked_status=00, sb[0]=[], host_status=0, driver_status=0, resid=0, duration=0, info=0}) = 0

The only slight difference is the timeout.

The dumps I produced show the same stack trace and apparently the same problem, although mine additionally show evidence of data corruption, e.g.:

crash> struct request 00000103a4f30eb8
struct request {
  queuelist = {
    next = 0x101f8e9c080,
    prev = 0x101f8e9c080
  },
  flags = 1120,
  sector = 16514577519336752818,            <------------------ All looks bad from here
  nr_sectors = 17875655937648256291,
  current_nr_sectors = 3109905447,
  hard_sector = 17482706148473970798,
  hard_nr_sectors = 9979233092578643319,
  hard_cur_sectors = 442092742,
  nr_cbio_segments = 13892,
  nr_cbio_sectors = 6024779977099665222,    <------ to here!!
  cbio = 0xe431dfdc03593afe,
  bio = 0x0,
  biotail = 0x0,
  elevator_private = 0x103f809f290,
  rq_status = 1,
  rq_disk = 0x103f8fee800,
  errors = 0,
  start_time = 17968924045220252110,
  nr_phys_segments = 25835,                 <--- causes the BUG() crash, way out of range!
  nr_hw_segments = 28685,
  tag = 1596562687,
  buffer = 0x0,
  ref_count = 2,
  q = 0x101f8e9c080,
  rl = 0x101f8e9c180,
  waiting = 0x101f759fa18,
  special = 0x0,
  cmd_len = 6,
  cmd = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000",
  data_len = 0,
  data = 0x0,
  sense_len = 0,
  sense = 0x101f759fa78,
  timeout = 60000,
  pm = 0xa29fc14af076012d
}

So it is possible that we are looking at an additional problem, or the same one exacerbated by the frequency of the IOs. I started to analyse the issue to look for the root cause, but it has yet to be determined.

The customer has been able to work around this problem in their environment by adding the cciss devices to multipath's blacklist, e.g.:

devnode_blacklist {
#       wwid 26353900f02796769
        devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
        devnode "^hd[a-z][[0-9]*]"
#       devnode "^cciss!c[0-9]d[0-9]*"      <--- uncommenting this should help!
}

With that line uncommented, multipath no longer accesses the device.
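For reference, here is a minimal C sketch of the SG_IO TEST UNIT READY call shown in the straces above. This is not taken from the multipath or sg_turs sources; the device path and field values simply mirror the strace output, and error handling is kept to a minimum:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <scsi/sg.h>

int main(void)
{
        unsigned char cdb[6] = { 0, 0, 0, 0, 0, 0 };    /* TEST UNIT READY */
        unsigned char sense[32];
        struct sg_io_hdr hdr;

        int fd = open("/dev/cciss/c0d0", O_RDONLY);
        if (fd < 0) {
                perror("open");
                return 1;
        }

        memset(&hdr, 0, sizeof(hdr));
        hdr.interface_id    = 'S';
        hdr.dxfer_direction = SG_DXFER_NONE;    /* no data phase, so dxfer_len stays 0 */
        hdr.cmd_len         = sizeof(cdb);
        hdr.cmdp            = cdb;
        hdr.mx_sb_len       = sizeof(sense);
        hdr.sbp             = sense;
        hdr.timeout         = 20000;            /* ms; sg_turs uses 60000 */

        if (ioctl(fd, SG_IO, &hdr) < 0)
                perror("SG_IO");
        else
                printf("status=0x%x host_status=0x%x driver_status=0x%x\n",
                       hdr.status, hdr.host_status, hdr.driver_status);

        close(fd);
        return 0;
}

Calling this in a tight loop exercises the same kernel path as the reproducer in the next section.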
REPRODUCER:

The system can be made to crash within a few seconds if the following script is executed:

# more tur-loop
#!/bin/bash
((i=0))
while (( i<100000 ))
do
        sg_turs /dev/cciss/c0d0
        ((i+=1))
done

Investigation with systemtap revealed the following after a number of iterations of the reproducer script:

     0 sg_turs(8792): -> cciss_open (8792, 5860887988348), cpu# 1
    14 sg_turs(8792): -> get_host (8792, 5860888011243), cpu# 1
    20 sg_turs(8792): -> get_drv (8792, 5860888025666), cpu# 1
    26 sg_turs(8792): <- cciss_open (8792, 5860888041781), cpu# 1
    87 sg_turs(8792): -> cciss_ioctl (8792, 5860888200681), cpu# 1
    92 sg_turs(8792): -> get_host (8792, 5860888212167), cpu# 1
    96 sg_turs(8792): -> get_drv (8792, 5860888223285), cpu# 1
   101 sg_turs(8792): -> scsi_cmd_ioctl (8792, 5860888237689), cpu# 1
   107 sg_turs(8792): -> constant_test_bit (8792, 5860888253856), cpu# 1
   111 sg_turs(8792): -> blk_get_queue (8792, 5860888263492), cpu# 1
   115 sg_turs(8792): -> atomic_inc (8792, 5860888274862), cpu# 1
   120 sg_turs(8792): <- blk_get_queue (8792, 5860888286863), cpu# 1
   125 sg_turs(8792): -> sg_io (8792, 5860888299438), cpu# 1
   130 sg_turs(8792): -> verify_command (8792, 5860888312890), cpu# 1
   134 sg_turs(8792): <- verify_command (8792, 5860888324740), cpu# 1
   140 sg_turs(8792): -> blk_get_request (8792, 5860888339244), cpu# 1
   145 sg_turs(8792): -> get_request_wait (8792, 5860888353116), cpu# 1
   151 sg_turs(8792): -> get_request (8792, 5860888367427), cpu# 1
   156 sg_turs(8792): -> get_io_context (8792, 5860888381467), cpu# 1
   161 sg_turs(8792): -> atomic_inc (8792, 5860888394839), cpu# 1
   166 sg_turs(8792): <- get_io_context (8792, 5860888406646), cpu# 1
   170 sg_turs(8792): -> blk_queue_full (8792, 5860888419168), cpu# 1
   175 sg_turs(8792): -> constant_test_bit (8792, 5860888429981), cpu# 1
   179 sg_turs(8792): -> blk_alloc_request (8792, 5860888441320), cpu# 1
   496 sg_turs(8792): -> elv_set_request (8792, 5860888464864), cpu# 1
   502 sg_turs(8792): -> cfq_set_request (8792, 5860888481758), cpu# 1
   508 sg_turs(8792): -> cfq_get_queue (8792, 5860888495975), cpu# 1
   513 sg_turs(8792): -> __cfq_get_queue (8792, 5860888511323), cpu# 1
   518 sg_turs(8792): -> __cfq_find_cfq_hash (8792, 5860888524459), cpu# 1
   524 sg_turs(8792): -> prefetch (8792, 5860888538224), cpu# 1
   528 sg_turs(8792): -> __cfq_find_cfq_hash (8792, 5860888550704), cpu# 1
   533 sg_turs(8792): -> prefetch (8792, 5860888562843), cpu# 1
   539 sg_turs(8792): -> list_add (8792, 5860888578383), cpu# 1
   544 sg_turs(8792): -> __list_add (8792, 5860888591592), cpu# 1
   549 sg_turs(8792): <- __cfq_get_queue (8792, 5860888605811), cpu# 1
   555 sg_turs(8792): <- cfq_set_request (8792, 5860888619959), cpu# 1
   559 sg_turs(8792): <- elv_set_request (8792, 5860888632358), cpu# 1
   565 sg_turs(8792): -> ioc_batching (8792, 5860888645983), cpu# 1
   570 sg_turs(8792): -> put_io_context (8792, 5860888660562), cpu# 1
   576 sg_turs(8792): -> atomic_dec_and_test (8792, 5860888674279), cpu# 1
   581 sg_turs(8792): <- put_io_context (8792, 5860888687735), cpu# 1
   586 sg_turs(8792): <- get_request (8792, 5860888700379), cpu# 1
   591 sg_turs(8792): <- get_request_wait (8792, 5860888712829), cpu# 1
   595 sg_turs(8792): <- blk_get_request (8792, 5860888725045), cpu# 1

kernel.function("blk_execute_rq@drivers/block/ll_rw_blk.c:2016")
q = 0x000001001fb30080, rq = 0x000001001178ca98, rq->q = 0x000001001fb30080
rq->flags = 0x400
rq->current_nr_sectors = 852049
rq->hard_sector = 10010608
rq->hard_nr_sectors = 65
rq->hard_cur_sectors = 2506225280
rq->nr_cbio_segments = 42
rq->nr_cbio_sectors = 655439
rq->cbio = 0x0000002a9561bac0
rq->bio = 0x0000000000000000
rq->biotail = 0x0000000000000000
rq->nr_phys_segments = 47872
rq->nr_hw_segments = 38241
rq->ref_count = 1

   610 sg_turs(8792): -> blk_execute_rq (8792, 5860888763412), cpu# 1
   616 sg_turs(8792): -> elv_add_request (8792, 5860888779482), cpu# 1

The above request structure will result in the system crashing. Analysis of the functions in the call graph suggests that this may simply be a case of missing structure initialisation in blk_alloc_request() (a small userspace sketch of the underlying reuse hazard is included at the end of this section):

static inline struct request *blk_alloc_request(request_queue_t *q, int gfp_mask)
{
        struct request *rq = mempool_alloc(q->rq.rq_pool, gfp_mask);

        if (!rq)
                return NULL;

        if (!elv_set_request(q, rq, gfp_mask))
                return rq;

        mempool_free(rq, q->rq.rq_pool);
        return NULL;
}

I checked RHEL5 and it looked no different, so I checked further upstream and found the following:

http://lxr.linux.no/linux+v2.6.29/block/blk-core.c#L649

648 static struct request *
649 blk_alloc_request(struct request_queue *q, int rw, int priv, gfp_t gfp_mask)
650 {
651         struct request *rq = mempool_alloc(q->rq.rq_pool, gfp_mask);
652
653         if (!rq)
654                 return NULL;
655
656         blk_rq_init(q, rq);                     <----------------------
657
658         rq->cmd_flags = rw | REQ_ALLOCED;
659
660         if (priv) {
661                 if (unlikely(elv_set_request(q, rq, gfp_mask))) {
662                         mempool_free(rq, q->rq.rq_pool);
663                         return NULL;
664                 }
665                 rq->cmd_flags |= REQ_ELVPRIV;
666         }
667
668         return rq;
669 }

123 void blk_rq_init(struct request_queue *q, struct request *rq)
124 {
125         memset(rq, 0, sizeof(*rq));             <------------
126
127         INIT_LIST_HEAD(&rq->queuelist);
128         INIT_LIST_HEAD(&rq->timeout_list);
129         rq->cpu = -1;
130         rq->q = q;
131         rq->sector = rq->hard_sector = (sector_t) -1;
132         INIT_HLIST_NODE(&rq->hash);
133         RB_CLEAR_NODE(&rq->rb_node);
134         rq->cmd = rq->__cmd;
135         rq->tag = -1;
136         rq->ref_count = 1;
137 }

I have therefore modified the code to include a structure initialisation:

static inline struct request *blk_alloc_request(request_queue_t *q, int gfp_mask)
{
        struct request *rq = mempool_alloc(q->rq.rq_pool, gfp_mask);

        if (!rq)
                return NULL;

        memset(rq, 0, sizeof(struct request));  <--------

        if (!elv_set_request(q, rq, gfp_mask))
                return rq;

        mempool_free(rq, q->rq.rq_pool);
        return NULL;
}

Whether the additional upstream changes should also be backported into RHEL4 I'll leave to Red Hat to determine. However, testing of the above change produced zero crashes after 100000 iterations of the test script calling sg_turs.
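To make the failure mode concrete, below is a small userspace analogue (not kernel code, and not part of the proposed patch) of what happens when a pool hands back a previously used object without reinitialising it; mempool_alloc() behaves the same way, and the memset()/blk_rq_init() call is what clears the stale fields. The struct and values here are purely illustrative:

#include <stdio.h>
#include <string.h>

struct fake_request {
        unsigned int nr_phys_segments;
        unsigned int nr_hw_segments;
};

static struct fake_request pool_slot;           /* stands in for the mempool   */

static struct fake_request *pool_alloc(void)
{
        return &pool_slot;                      /* returned as-is, like mempool_alloc() */
}

static void pool_free(struct fake_request *rq)
{
        (void)rq;                               /* contents are left behind in the pool */
}

int main(void)
{
        struct fake_request *rq = pool_alloc();
        rq->nr_phys_segments = 25835;           /* an earlier I/O wrote real values */
        pool_free(rq);

        rq = pool_alloc();                      /* a later request gets the same object back */
        printf("stale nr_phys_segments = %u\n", rq->nr_phys_segments);  /* 25835 */

        memset(rq, 0, sizeof(*rq));             /* the proposed fix: start from zero */
        printf("after memset           = %u\n", rq->nr_phys_segments);  /* 0 */
        return 0;
}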
Created attachment 347516 [details] Patch to initialize entire request structure
While the fix in comment #1 seems correct to me, since this is only happening for the cciss driver, it seems pretty reasonable to fix it there. Looking at do_cciss_request(), it seems like you should be able to fix this with something like:

--- linux-2.6.9/drivers/block/cciss.c	2009-06-25 09:08:02.000000000 -0500
+++ linux-2.6.9/drivers/block/cciss.c.505506	2009-06-25 10:47:22.000000000 -0500
@@ -3328,7 +3328,7 @@ queue:
 	if (!creq)
 		goto startio;
 
-	if (creq->nr_phys_segments > MAXSGENTRIES)
+	if (creq->bio && creq->nr_phys_segments > MAXSGENTRIES)
 		BUG();
 
 	if (( c = cmd_alloc(h, 1)) == NULL)

I don't have any hardware to test this on, so I'm not positive that it works, but it seems like it should. Just my two cents.
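For readability, here is the changed check as it would read with the hunk above applied, with an explanatory comment added (the comment is mine, not part of the proposed change, and reflects the bio = 0x0 / garbage nr_phys_segments requests seen in the dumps):

	if (!creq)
		goto startio;

	/* Only trust nr_phys_segments when the request actually carries a bio;
	 * bio-less pass-through requests (such as the SG_IO TUR above) can
	 * leave the field uninitialised on this kernel. */
	if (creq->bio && creq->nr_phys_segments > MAXSGENTRIES)
		BUG();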
I'd be inclined to use the suggested patch for the following reasons:
- it's been tested and is known to fix the problem
- it may fix the same issue for other drivers (that don't detect the issue with a BUG())
- it may fix other issues related to other fields in the request structure that are not being initialised
- it's what is now being done in mainline
I've built a test kernel in brew (taskID=1869626). It is accessible here:
http://people.redhat.com/thenzl/505506/

If you need another kernel flavour, let me know and I can post it as well, but my quota is limited. Please retest with the kernel from people/..., so I can then post the patch to rhkl.

Thanks for the analysis and patch.
(In reply to comment #4)
> While the fix in comment #1 seems correct to me, since this is only happening
> for the cciss driver, it seems pretty reasonable to fix it there.
>
> I don't have any hardware to test this on, so I'm not positive that it works,
> but it seems like it should. Just my two cents.

Your patch works; I've tested it. I've decided to take the first patch because:
- it was verified by the reporter
- it will also help our customers who use the cciss driver directly from HP

Thanks,
Tomas
Tomas,

I tested your kernel with the test case from IT#302867 and there were no problems, so it looks good.

Lachlan
Lachlan, thanks. It has just been posted to rhkl.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Tomas,

Any progress on this fix?

Lachlan
(In reply to comment #13)
> Any progress on this fix?

It has been posted:
http://post-office.corp.redhat.com/archives/rhkernel-list/2009-July/msg00074.html

It is now waiting for acks from the other developers.
Committed in 89.7.EL. RPMs are available at http://people.redhat.com/vgoyal/rhel4/
Reproduced this on kernel-2.6.9-89.EL and got the same kernel panic. On kernel-2.6.9-95.EL, 100000 iterations of sg_turs /dev/cciss/c0d0 completed with no errors. Changing bug status to VERIFIED.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0263.html