Bug 848798 - Oops in libsas on IO error in sas_eh_finish_cmd+0x2f
Summary: Oops in libsas on IO error in sas_eh_finish_cmd+0x2f
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 17
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2012-08-16 12:33 UTC by Oleg Drokin
Modified: 2013-01-02 13:22 UTC

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2013-01-02 13:22:37 UTC
Type: Bug
Embargoed:



Description Oleg Drokin 2012-08-16 12:33:34 UTC
I have a system with a bad hard drive in it. It was working "fine" in FC15 with the latest kernel available from updates (that is, once you hit the bad sector, you eventually get IO errors).
Starting with the FC16 install DVD, and still with the FC17 install DVD, the kernel crashes when trying to access that bad sector:

[17590.128031] sas: command 0xffff88011dd6bf00, task 0xffff880101396140, timed out: BLK_EH_NOT_HANDLED
[17590.128050] sas: Enter sas_scsi_recover_host
[17590.128053] sas: trying to find task 0xffff880101396140
[17590.128055] sas: sas_scsi_find_task: aborting task 0xffff880101396140
[17590.128127] sas: sas_scsi_find_task: querying task 0xffff880101396140
[17590.128129] sas: sas_scsi_find_task: aborting task 0xffff880101396140
[17590.128336] sas: sas_scsi_find_task: querying task 0xffff880101396140
[17590.128340] sas: sas_scsi_find_task: aborting task 0xffff880101396140
[17590.128450] sas: sas_scsi_find_task: querying task 0xffff880101396140
[17590.128453] sas: sas_scsi_find_task: aborting task 0xffff880101396140
[17590.128581] sas: sas_scsi_find_task: querying task 0xffff880101396140
[17590.128584] sas: sas_scsi_find_task: aborting task 0xffff880101396140
[17590.128723] sas: sas_scsi_find_task: querying task 0xffff880101396140
[17590.128728] sas: task 0xffff880101396140 is not at LU: I_T recover
[17590.128730] sas: I_T nexus reset for dev 500304800004fa51
[17590.155338] sas: sas_form_port: phy1 belongs to port0 already(1)!
[17590.629353] sas: I_T 500304800004fa51 recovered
[17590.629357] sas: sas_ata_task_done: SAS error 8d
[17590.629378] BUG: unable to handle kernel NULL pointer dereference at 000000000000001c
[17590.629563] IP: [<ffffffffa0213cbf>] sas_eh_finish_cmd+0x2f/0x60 [libsas]
[17590.629686] PGD 0 
[17590.629722] Oops: 0002 [#1] SMP 
[17590.629786] CPU 2 
[17590.629821] Modules linked in: aic94xx radeon i2c_algo_bit ttm drm_kms_helper drm ioatdma i2c_i801 parport_pc i2c_core parport serio_raw shpchp dca nls_utf8 libsas e1000e scsi_transport_sas e100 mii sunrpc xts lrw gf128mul sha256_generic dm_crypt dm_round_robin dm_multipath linear raid10 raid456 async_raid6_recov async_memcpy async_pq raid6_pq async_xor xor async_tx raid1 raid0 iscsi_ibft iscsi_boot_sysfs edd floppy scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi squashfs cramfs [last unloaded: aic94xx]
[17590.630306] 
[17590.630306] Pid: 990, comm: scsi_eh_3 Not tainted 3.3.4-5.fc17.x86_64 #1 Supermicro X7DB8/X7DB8
[17590.630306] RIP: 0010:[<ffffffffa0213cbf>]  [<ffffffffa0213cbf>] sas_eh_finish_cmd+0x2f/0x60 [libsas]
[17590.630306] RSP: 0018:ffff8801013c5d60  EFLAGS: 00010286
[17590.630306] RAX: 0000000000000000 RBX: ffff88011dd6b600 RCX: ffff88011dd6b600
[17590.630306] RDX: ffff88011ccc6000 RSI: ffff8801246680e8 RDI: 0000000000000000
[17590.630306] RBP: ffff8801013c5d70 R08: ffff8801246680e8 R09: 0000000180240001
[17590.630306] R10: 0000000001395301 R11: 0000000000000000 R12: ffff880124668010
[17590.630306] R13: 0000000000000004 R14: ffff88011d347000 R15: ffff88011dd6a900
[17590.630306] FS:  0000000000000000(0000) GS:ffff88012fc80000(0000) knlGS:0000000000000000
[17590.630306] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[17590.630306] CR2: 000000000000001c CR3: 0000000001a05000 CR4: 00000000000006e0
[17590.630306] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[17590.630306] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[17590.630306] Process scsi_eh_3 (pid: 990, threadinfo ffff8801013c4000, task ffff880127f68000)
[17590.630306] Stack:
[17590.630306]  ffff88012798be00 ffff88011dd6a900 ffff8801013c5e20 ffffffffa02152e6
[17590.630306]  0000000000000000 0000000000013580 ffff8801233b3f00 ffff880124668010
[17590.630306]  ffff880124668010 ffff88011ccc6000 ffff88011d347000 ffff8801013c5db8
[17590.630306] Call Trace:
[17590.630306]  [<ffffffffa02152e6>] sas_scsi_recover_host+0x9a6/0x9c0 [libsas]
[17590.630306]  [<ffffffff813bdb53>] scsi_error_handler+0x143/0x6c0
[17590.630306]  [<ffffffff813bda10>] ? scsi_eh_get_sense+0x1d0/0x1d0
[17590.630306]  [<ffffffff81078843>] kthread+0x93/0xa0
[17590.630306]  [<ffffffff815f4ca4>] kernel_thread_helper+0x4/0x10
[17590.630306]  [<ffffffff810787b0>] ? flush_kthread_worker+0x80/0x80
[17590.630306]  [<ffffffff815f4ca0>] ? gs_change+0x13/0x13
[17590.630306] Code: 48 83 ec 10 48 89 5d f0 4c 89 65 f8 66 66 66 66 90 48 8b 17 48 8b 87 d8 00 00 00 48 89 fb 48 8b 12 48 89 c7 4c 8b a2 b0 06 00 00 <83> 60 1c fb ff 90 60 01 00 00 48 89 df 49 8d b4 24 d8 00 00 00 
[17590.630306] RIP  [<ffffffffa0213cbf>] sas_eh_finish_cmd+0x2f/0x60 [libsas]
[17590.630306]  RSP <ffff8801013c5d60>
[17590.630306] CR2: 000000000000001c
[17590.757544] ---[ end trace ac662b0b36c4e5a4 ]---

The SAS controller is:
04:02.0 Serial Attached SCSI controller: Adaptec AIC-9410W SAS (Razor ASIC non-RAID) (rev 09)
which is served by the aic94xx module.

I searched and this does not appear to have been reported yet.
I know that I need to replace the drive in question, which I will do eventually, but oopsing like that is bad too and needs to be fixed.
At the moment it is not easy for me to test any fixes, as the system is in a somewhat bad state some 6000 km away, but I can get people there to try new things from time to time.
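
For triage, the faulting instruction can be read straight out of the Code: line in the trace above. Below is a minimal sketch in Python, assuming the usual oops layout in which the byte at the faulting RIP is wrapped in angle brackets. The helper names (split_code_line, decode_modrm) are made up for illustration and are not part of any kernel tool.

```python
# Code: line copied from the oops above.
CODE_LINE = (
    "[17590.630306] Code: 48 83 ec 10 48 89 5d f0 4c 89 65 f8 66 66 66 66 90 "
    "48 8b 17 48 8b 87 d8 00 00 00 48 89 fb 48 8b 12 48 89 c7 4c 8b a2 b0 06 "
    "00 00 <83> 60 1c fb ff 90 60 01 00 00 48 89 df 49 8d b4 24 d8 00 00 00 "
)


def split_code_line(line):
    """Split an oops Code: line into (bytes before RIP, bytes from RIP on)."""
    _, _, hexdump = line.partition("Code:")
    tokens = hexdump.split()
    for i, tok in enumerate(tokens):
        # The byte at the faulting RIP is printed as <xx>.
        if tok.startswith("<") and tok.endswith(">"):
            before = [int(t, 16) for t in tokens[:i]]
            after = [int(tok[1:-1], 16)] + [int(t, 16) for t in tokens[i + 1:]]
            return before, after
    raise ValueError("no <xx> marker found in Code: line")


def decode_modrm(byte):
    """Return the (mod, reg, rm) fields of an x86 ModRM byte."""
    return (byte >> 6) & 3, (byte >> 3) & 7, byte & 7


before, after = split_code_line(CODE_LINE)
opcode, modrm, disp8, imm8 = after[:4]
mod, reg, rm = decode_modrm(modrm)

# Opcode 0x83 with reg field 4 is AND r/m32, imm8 (sign-extended).
# mod=1, rm=0 means the memory operand is disp8(%rax).
mask = (imm8 - 0x100 if imm8 & 0x80 else imm8) & 0xFFFFFFFF
cleared_bits = ~mask & 0xFFFFFFFF

print("opcode=%#x mod=%d reg=%d rm=%d disp8=%#x mask=%#x cleared=%#x"
      % (opcode, mod, reg, rm, disp8, mask, cleared_bits))
```

Read this way, the faulting bytes 83 60 1c fb decode to an AND of the 32-bit value at 0x1c(%rax) with 0xfffffffb. With RAX = 0 in the register dump, that access lands on address 0x1c, which matches CR2. That is consistent with a NULL structure pointer having a flag bit of value 0x4 cleared at offset 0x1c. Which libsas field and flag that corresponds to would need to be checked against the 3.3.4 source.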

Comment 1 Josh Boyer 2012-11-28 14:43:41 UTC
[17590.630306] Pid: 990, comm: scsi_eh_3 Not tainted 3.3.4-5.fc17.x86_64 #1 Supermicro X7DB8/X7DB8

3.3.4 is fairly old.  Do you see this with 3.6.x?

