Bug 653706
I talked with Mike a little about this. The problem here is that the host reset ends up removing all the devices/paths (even the good ones). That is most likely an mpt driver problem, as opposed to the layers above. Andrius, please add the right contacts from LSI to this BZ. (In reply to comment #0) > Created attachment 460697 [details] > In this test I set 3 Luns. One is for OS and two are for data. And there are two paths to each LUN? (An sosreport would help us interpret the logs better.) Has this problem been seen on other kernel/o.s. versions? Yes, this is a dual storage controller system using ALUA, so there are two paths to each LUN, one active and one passive. If the active path goes down the standby path should become active. We can also produce the problem on Ubuntu 10.4 using 2.6.32 kernel. We do not seem to see the problem so far on SLES11 SP1. We never saw the problem on RH 5.x. I'm not familiar with sosreport... if there is additional debug data we can provide, please provide some more details, and we'll do our best to gather it. (In reply to comment #3) > I'm not familiar with sosreport... if there is additional debug data we can > provide, please provide some more details, and we'll do our best to gather it. man sosreport If just generates a tarball with the output of a bunch of "show" commands, /var/log/messages, and so on. Attached log is get by "console redirect to serial" and It is equivalent with "/var/log/message". Main reason we use "console redirect mechanism" is that system can not save error log (mpt/scsi) when error occured. We use SAN stroage for OS, and OS cannot access /var/log/message when this error occured. So it is impossible to save log /var/log/message in error case. Please, check attached log. Thanks. Hi,Tom sorry, I'm confused about previous comment. Whatever, In error case, system already is in hang status. So we cannot run sosreport. We can only run soreport after reboot. Is that OK with you? Thanks. (In reply to comment #6) > Hi,Tom > We can only run soreport after reboot. Is that OK with you? Yes, that is what I had in mind - a view of the system in its "normal" state. This will give a bit of background on version numbers and how things are connected (multipath -v4 -ll) and such. Created attachment 462780 [details]
RHEL6 32bit SOS-report files
This RHEL6 32bit sosreport result.
Created attachment 462781 [details]
RHEL6 32bit multipath information (multipath -ll/v4, dmsetup ls --tree)
This file save the result of (multipath -ll/ dmsetup ls --tree/ multpath -v4).
Created attachment 462782 [details]
RHEL6 64bit SOS-report files
This file save reulst of sosrepot in RHEL6 64bit.
Created attachment 462783 [details]
RHEL6 64bit multipath information (multipath -ll/v4, dmsetup ls --tree)
This file save result of multipath -ll / dmsetup ls --tree/ multipath -v4 in RHEL6 64bit.
Hi, Tom I update sosreport file and multpath information file for the both 32bit and 64bit in Normal status. Thanks. Hi Kashyap, per comment#2 it looks like the problem is in the mptsas driver. Please take a look at it. (In reply to comment #13) > Hi Kashyap, > > per comment#2 it looks like the problem is in the mptsas driver. > Please take a look at it. I am going through the contents right now. Meanwhile can someone explain test scenario ? And yes please enable driver logging level to 0x8188. ~ Kashyap Hi, kashyap. Intel multiplex has dual storage controller and it supports ALUA feature. So we supports dual path with alua feature. In this test, system is configured with three LUNs- one is for OS(RHEL6) and others are for data. we create I/O for the data LUNs and then we test failover/failback by rebooting each storage contorller. Hi, Kashyap. You want to set mpt_debug_level as 0x8188. Is that right? Created attachment 463589 [details]
RHEL6 32bit serial console log with mptbase debug (0x8188) and scsi_logging (0x124)
This file get with setting - mpt debug level (0x8188) scsi logging level (0x124).
Created attachment 463590 [details]
RHEL6 64bit serial console log with mptbase debug (0x8188) and scsi_logging (0x124)
This file get with seeting - mptbase debug level (0x8188) and scsi logging level(0x124).
Hi, Kashyap. I attached log file. Thanks. ilgu hong, Your input helps me. I wanted exactly same logfiles. Here is my analysis... [Search for "kashyap" ] -- sg_remove: sg0 sg_device_destroy: sg0 device-mapper: multipath: Failing path 8:0. sd 0:0:3:0: [sda] Synchronizing SCSI cache <-- "Kashyap": sda is removed and it is assigned to 0:0:3:0 mptbase: ioc0: MPT event:(16h) : SAS Discovery: Stop mptbase: ioc0: LogInfo(0x31140000): Originator={PL}, Code={IO Exend_request: I/O error, dev sdb, sector 12887056 device-mapper: multipath: Failing path 8:16. mptbase: ioc0: LogInfo(0x31140000): Originator={PL}, Code={IO Executed}, SubCode(0x0000) cb_idx mptscsih_io_done mptbase: ioc0: LogInfo(0x31140000): Originator={PL}, Code={IO Executed}, SubCode(0x0000) cb_idx mptscsih_io_done mptsas: ioc0: TaskMgmt completed: (mf = ffff8800778c8400, mr = ffff8800778c1720) mptsas: ioc0: TaskMgmt completed: fw_channel = 0, fw_id = 0, task_type = 0x03, iocstatus = 0x0000 loginfo = 0x00000000, response_code = 0x00, term_cmnds = 10 mptsas: ioc0: TaskMgmt: completed (0 seconds) mptsas: ioc0: mptsas_add_fw_event: add (fw_event=0xffff8800782d8480) Error handler scsi_eh_0 waking up sd 0:0:3:0: scsi_eh_prt_fail_): 2a 00 01 d4 11 70 00 00 08 00 // "Kashyap": Task abort is for same sda device is started here.....(See it is 0:0:3:0 device) mptscsih: ioc0: TaskMgmt request (mf=ffff8800778c8500) mptscsih: ioc0: TaskMgmt: ctx2abort (0x00000000) task_type = 0x04, timeout = 10 mptbase: ioc0: MPT event:(17h) : SAS Broadcase Primative: phy=3 port=3 width=1 primative=0x04 mptsas: ioc0: mptsas_add_fw_event: add (fw_event=0xffff8800786ca200) mptbase: ioc0: MPT event:(16h) : SAS Discovery: Start mptbase: ioc0: MPT event:(16h) : SAS Discovery: Stop mptscsih: ioc0: TaskMgmt completed (mf=ffff8800778c8500, mr=ffff8800778c18b0) mptscsih: ioc0: TaskMgmt fw_channel = 0, fw_id = 0, task_type = 0x04, iocstatus = 0x0000, loginfo = 0x00000000, response_code = 0x00, term_cmnds = 0 - Error handling is started for device which is a deleting at the same time.. Due to race in EEH it is happening...To avoid this mptsas driver has callback "eh_timed_out" It has been done using below Bugzilla https://bugzilla.redhat.com/show_bug.cgi?id=615866 But most of the time it happens only when Device Missing Delay timer is ON. (it means chances are high when DMD is on, but very rare we have seen similar root cause for device offline issue). In this case without DMD also we are seeing similar issue reported in bugzilla 615866. Please try the attached patch.! Thanks, Kashyap Created attachment 463699 [details]
Avoid Error handling when device is deleting
Hi Kashyap- We are looking into adding the patch and retesting RHEL6 to see if this helps us. I did want to note that we previously mentioned we also see this problem on Ubuntu 10.4 and in that case we are using the latest upstream and already have this patch. We'll still try this patch in the context of RHEL6, I just wanted to share the concern that this may not be a full solution for the problem. The comments below are from one of the developers working on that Ubuntu build: The bad news is we already have this fix because we've been using the latest upstream, so it's something else. diff -Naurp fusion_orig/mptsas.c fusion/mptsas.c --- fusion_orig/mptsas.c 2010-12-01 03:37:24.102545034 +0530 +++ fusion/mptsas.c 2010-12-01 03:38:28.458582184 +0530 @@ -1925,7 +1925,8 @@ static enum blk_eh_timer_return mptsas_e } vdevice = sc->device->hostdata; - if (vdevice && vdevice->vtarget && vdevice->vtarget->inDMD) { + if (vdevice && vdevice->vtarget && (vdevice->vtarget->inDMD + || vdevice->vtarget->deleted)) { dtmprintk(ioc, printk(MYIOC_s_WARN_FMT ": %s: target removed " "or in device removal delay (sc=%p)\n", ioc->name, __func__, sc )); and my mptsas tree: vdevice = sc->device->hostdata; if (vdevice && vdevice->vtarget && (vdevice->vtarget->inDMD || vdevice->vtarget->deleted)) { dtmprintk(ioc, printk(MYIOC_s_WARN_FMT ": %s: target removed " "or in device removal delay (sc=%p)\n", ioc->name, __func__, sc)); rc = BLK_EH_RESET_TIMER; goto done; } The concept here is to reset the timer on the scmd so that it stays out of the error handler long enough to be reconciled by the hotplug removal. Problem is, you can only reset the timer so many times, perhaps 3, no more than 5. Created attachment 464085 [details]
Test log with mptsas patch
This log include 3 files.
RHEL6-32.log is serail console log for 32bit host, and it looks fine with mptsas patch.
RHEL6-64.log is serial console log for 64bit host, and it shows new error symptom.
error-var-log-message-RHEL6-64.txt is /var/log/message for 64bit host, and I just added this file for the conveninet of debugging.
Hi Kashyap, I checked your patch. It looks like there is little improvement on MPTsas behavior but it also creates new symptoms. I tested your patch with both 32bit and 64bit RHEL6. In 32bit RHEL6 host, works fine more than 23 iteration (each iteration crates 5 times failover/failback for each controller) and it still works fine. But 65bit RHEL6 host, It looks like go to endless loop in the Mptsas driver/firmware. check below log. In the below log, mptsas_eh_timed_out handler contineously called event though failback finished, and it looks like mptsas go to abnormal status and does not works. I attached serial console log file and I also attached /var/log/message for the convenient for your debug. In the RHEL6-64.log serial log, error symptom starts at the line 5665. In the error-var-log-message-RHEL6-64.txt log, error symptom starts at the line 12585. < Error symptom - serial log for RHEL6 -64> <<======================== failover end_device-0:83:1: mptsas: ioc0: removing ssp device: fw_channel 0, fw_id 0, phy 13,sas_addr 0x5000155000020408 phy-0:83:6470DID_NO_CONNECT driverbyte=DRIVER_OK scsi 0:0:83:0: [sda] CDB: Write(10): 2a 00 02 14 6f b0 00 00 30 00 end_request: I/O error, dev sda, sector 34893744 scsi 0:0:83:0: mptscsih: ioc0: completing cmds: fw_channel 0, fw_id 0, sc=ffff88003e4dec80, mf = ffff880077e8a580, idx=fb scsi 0:0:83:0: [sda] Unhandled error code scsi 0:0:83:0: [sda] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK scsi 0:0:83:0: [sda] CDB: Write(10): 2a 00 01 96 65 70 00 00 08 00 end_request: I/O error, dev sda, sector 26633584 scsi 0:0:83:0: [sda] Unhandled error code <<=========================== sg1 removed sg_remove: sg1 sg_device_destroy: sg1 scsi 0:0:83:0: [sda] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK scsi 0:0:83:0: [sda] CDB: Write(10): 2a 00 01 55 b4 90 00 00 08 00 end_request: I/O error, dev sda, sector 22394000 sd 0:0:83:1: [sdb] Synchronizing SCSI cachcode sd 0:0:83:1: [sdb] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK sd 0:0:83:1: [sdb] CDB: Read(10): 28 00 00 01 ec 98 00 00 08 00 end_request: I/O error, dev sdb, sector 126104 device-mapper: multipath: Failing path 8:16. <<=========================== mptsas eh timeout handler mptsas: ioc0: WARNING - : mptsas_eh_timed_out: target removed or in device removal delay (sc=ffff8800438b6c80) sd 0:0:85:0: alua: port group 01 state S supports touSnA sd 0:0:85:1: alua: port group 01 state S supports touSnA sd 0:0:85:1: alua: port group 01 switched to state A sd 0:0:85:0: alua: port group 01 switched to state A sd 0:0:85:0: alua: port group 01 state A supports touSnA scsi 0:0:83:0: alua: Detached sd 0:0:83:1: alua: Detached sd 0:0:85:1: alua: port group 01 state A supports touSnA device-mapper: multipath: Failing path 8:32. <<============================ mptsas eh timeout handler mptsas: ioc0: WARNING - : mptsas_eh_timed_out: target removed or in device removal delay (sc=ffff8800438b6c80) <<============================ failback mptbase: ioc0: MPT event:(16h) : SAS Discovery: Start mptbase: ioc0: MPT event:(12h) : SAS PHY Link Status: Phy=2: Rate 3.0 Gpbs mptsas: ioc0: mptsas_add_fw_event: add (fw_event=0xffff880004147980) mptbase: ioc0: MPT event:(1Bh) : Expander Status Change: Added mptsas: ioc0: mptsas_add_fw_event: add (fw_event=0xffff880004147740) mptbase: ioc0: MPT event:(12h) : SAS PHY Link Status: Phy=6: Rate 3.0 Gpbs mptsas: ioc0: mptsas_add_fw_event: add (fw_event=0xffff880004147d40) mptbase: ioc0: MPT event:(12h) : SAS PHY Link Status: Phy=13: Rate 3.0 Gpbs mptsas: ioc0: mptsas_add_fw_event: add (fw_event=0xffff880004147ec0) mptbase: ioc0: MPT event:(16h) : SAS Discovery: Stop mptbase: ioc0: MPT event:(0Fh) : SAS Device Status Change: Added: id=0 channel=0 mptsas: ioc0: mptsas_add_fw_event: add (fw_event=0xffff880004147680) <<============================ still see : mptsas eh timeout handler mptsas: ioc0: WARNING - : mptsas_eh_timed_out: target removed or in device removal delay (sc=ffff8800438b6c80) mptsas: ioc0: WARNING - : mptsas_eh_timed_out: target removed or in device removal delay (sc=ffff8800438b6c80) INFO: task mpt/0:366 blocked for more than 120 seconds. "echo 08/0x80 [ext4] [<ffffffffa0254249>] ext4_orphan_add+0xb9/0x1e0 [ext4] [<ffffffffa021d4e5>] ? jbd2_journal_start+0xb5/0x100 [jbd2] [<ffffffffa0252a60>] ext4_setattr+0x240/0x390 [ext4] [<ffffffff81188318>] notify_change+0x168/0x340 [<ffffffff8116ab84>] do_truncate+0x64/0xa0 [<ffffffff8117d468>] do_filp_open+0x748/0xd40 [<ffffffff81171b14>] ? cp_new_stat+0xe4/0x100 [<ffffffff811892f2>] ? alloc_fd+0x92/0x160 [<ffffffff81169929>] do_sys_open+0x69/0x140 [<ffffffff81169a40>] sys_open+0x20/0x30 [<ffffffff81013172>] system_call_fastpath+0x16/0x1b mptsas: ioc0: WARNING - : mptsas_eh_timed_out: target removed or in device removal delay (sc=ffff8800438b6c80) mptsas: ioc0: WARNING - : mptsas_eh_timed_out: target removed or in device removal delay (sc=ffff8800438b6c80) mptbase: ioc0: MPT event:(16h) : SAS Discovery: Start mptbase: ioc0: MPT event:(12h) : SAS PHY Link Status: Phy=2: Rate Unknown mptsas: ioc0: mptsas_add_fw_event: add (fw_event=0xffff88007b2a43c0) mptbase: ioc0: MPT event:(1Bh) : Expander Status Change: Deleted mptsas: ioc0: mptsas_add_fw_event: add (fw_event=0xffff880040a8e6c0) mptbase: ioc0: MPT event:(12h) : SAS PHY Link Status: Phy=6: Rate Unknown mptsas: ioc0: mptsas_add_fw_event: add (fw_event=0xffff88003780a200) mptbase: ioc0: MPT event:(12h) : SAS PHY Link Status: Phy=13: Rate Unknown mptsas: ioc0: mptsas_add_fw_event: add (fw_event=0xffff880077f64440) mptbase: ioc0: MPT event:(0Fh) : SAS Device Status Change: Deleted: id=0 channel=0 mptsas: ioc0: TaskMgmt request (mf=ffff880077e86f00) mptsas: ioc0: TaskMgmt type=3 (sas device delete) fw_channel = 0 fw_id = 0) mptbase: ioc0: clearing discovery_quiesce_io flag mptbase: ioc0: MPT event:(16h) : SAS Discovery: Stop mptsas: ioc0: TaskMgmt completed: (mf = ffff880077e86f00, mr = ffff880077e82030) mptsas: ioc0: TaskMgmt completed: fw_channel = 0, fw_id = 0, task_type = 0x03, iocstatus = 0x0000 loginfo = 0x00000000, response_code = 0x00, term_cmnds = 0 mptsas: ioc0: TaskMgmt: completed (0 seconds) mptsas: ioc0: mptsas_add_fw_event: add (fw_event=0xffff8800137a4ec0) INFO: task mpt/0:366 blocked for more than 120 seconds. "echo 0toremove_wake_function+0x0/0x40 [<ffffffff8120bf4b>] ? selinux_file_permission+0xfb/0x150 [<ffffffff811ff3b6>] ? security_file_permission+0x16/0x20 [<ffffffff8116c818>] vfs_write+0xb8/0x1a0 [<ffffffff810d40a2>] ? audit_syscall_entry+0x272/0x2a0 [<ffffffff8116d251>] sys_write+0x51/0x90 [<ffffffff81013172>] system_call_fastpath+0x16/0x1b mptsas: ioc0: WARNING - : mptsas_eh_timed_out: target removed or in device removal delay (sc=ffff8800438b6c80) mptsas: ioc0: WARNING - : mptsas_eh_timed_out: target removed or in device removal delay (sc=ffff8800438b6c80) mptbase: ioc0: MPT event:(16h) : SAS Discovery: Start mptbase: ioc0: MPT event:(12h) : SAS PHY Link Status: Phy=2: Rate 3.0 Gpbs mptsas: ioc0: mptsas_add_fw_event: add (fw_event=0xffff88007b2a4480) mptbase: ioc0: MPT event:(1Bh) : Expander Status Change: Added mptsas: ioc0: mptsas_add_fw_event: add (fw_event=0xffff880052ddacc0) mptbase: ioc0: MPT event:(12h) : SAS PHY Link Status: Phy=6: Rate 3.0 Gpbs mptsas: ioc0: mptsas_add_fw_event: add (fw_event=0xffff8800782098c0) mptbase: ioc0: MPT event:(12h) : SAS PHY Link Status: Phy=13: Rate 3.0 Gpbs mptsas: ioc0: mptsas_add_fw_event: add (fw_event=0xffff88007b2a4180) mptbase: ioc0: MPT event:(16h) : SAS Discovery: Stop mptbase: ioc0: MPT event:(0Fh) : SAS Device Status Change: Added: id=0 channel=0 mptsas: ioc0: mptsas_add_fw_event: add (fw_event=0xffff880078209740) mptsas: ioc0: WARNING - : mptsas_eh_timed_out: target removed or in device removal delay (sc=ffff8800438b6c80) mptsas: ioc0: WARNING - : mptsas_eh_timed_out: target removed or in device removal delay (sc=ffff8800438b6c80) mptsas: ioc0: WARNING - : mptsas_eh_timed_out: target removed or in device removal delay (sc=ffff8800438b6c80) (In reply to comment #22) > Hi Kashyap- > > We are looking into adding the patch and retesting RHEL6 to see if this helps > us. > > I did want to note that we previously mentioned we also see this problem on > Ubuntu 10.4 and in that case we are using the latest upstream and already have > this patch. We'll still try this patch in the context of RHEL6, I just wanted > to share the concern that this may not be a full solution for the problem. The > comments below are from one of the developers working on that Ubuntu build: > > The bad news is we already have this fix because we've been using the latest > upstream, so it's something else. > > diff -Naurp fusion_orig/mptsas.c fusion/mptsas.c > --- fusion_orig/mptsas.c 2010-12-01 03:37:24.102545034 +0530 > +++ fusion/mptsas.c 2010-12-01 03:38:28.458582184 +0530 > @@ -1925,7 +1925,8 @@ static enum blk_eh_timer_return mptsas_e > } > > vdevice = sc->device->hostdata; > - if (vdevice && vdevice->vtarget && vdevice->vtarget->inDMD) { > + if (vdevice && vdevice->vtarget && (vdevice->vtarget->inDMD > + || vdevice->vtarget->deleted)) { > dtmprintk(ioc, printk(MYIOC_s_WARN_FMT ": %s: target removed " > "or in device removal delay (sc=%p)\n", > ioc->name, __func__, sc )); > > and my mptsas tree: > > vdevice = sc->device->hostdata; > if (vdevice && vdevice->vtarget && (vdevice->vtarget->inDMD > || vdevice->vtarget->deleted)) { > dtmprintk(ioc, printk(MYIOC_s_WARN_FMT ": %s: target removed " > "or in device removal delay (sc=%p)\n", > ioc->name, __func__, sc)); > rc = BLK_EH_RESET_TIMER; > goto done; > } > > The concept here is to reset the timer on the scmd so that it stays out of the > error handler long enough to be reconciled by the hotplug removal. Problem is, > you can only reset the timer so many times, perhaps 3, no more than 5. I am about to look at new logs provided in this bugzilla, but in parallel can we have test result of another two patches ? I have already proposed those patches for Ubuntu 10.4. I have attached both the patches. ~ Kashyap Created attachment 464262 [details]
4x.100 patch
Created attachment 464263 [details]
4x.101 patch
Hi Kashyap, Those new patches are not made from RHEL6's message/funsion/ source tree. Please, make a patch for RHEL6 and re-attached those patches. Thanks. Created attachment 464388 [details]
RHEL6 variant of .100 patch
Created attachment 464389 [details]
RHEL6 variant of .101 patch
Created attachment 464685 [details]
failed test serial log for 100.patch
This file get with 100 path and 64bit RHEL6.
I enabled 0x8188 debug level for mptbase and 0x124 for scsi_mod.
Created attachment 464686 [details]
failed test /var/log/message for RHEL6 64bit
This is system /var/log/message for 100.patch with RHEL6 64bit.
Hi Kashyap- I still see same problem with "100 patch" in the RHEL6 64bit. One thing interesting is that it still occured only 64bit RHEL6 and it also occured almost same iteration. Is there any problem related with 64bit supporting? I attached log for this- see line 5792 at the RHEL6-64-100.log and see also line 12537 at the var_log_message_rhel6-64-100-patch.txt Please, check below. <<<<<< ======================== failover start ioc0: TaskMgmt type=3 (sas device delete) fw_channel = 0 fw_id =nt=ffff8800520e99c0 timeout + 60HZ sd 0:0:81:1: [sdb] Synchronizing SCSI cache mptscsih: ioc0: WARNING - qcmd: SCpnt=ffff8800520e99c0 timeout + 60HZ mptbase: ioc0: MPT event:(16h) : SAS Discovery: Stop mptbase: ioc0: LogInfo(0x31140000): Originator={PL}, Code={IO Exfw_event=0xffff88003902cb00) sd 0:0:82:0: alua: port group 01 state A supports touSnA sd 0:0:82:1: alua: port group 01 state A supports touSnA sd 0:0:81:1: alua: Detached sd 0:0:82:1: alua: port group 01 state A supports touSnA scsi 0:0:81:0: alua: Detached sd 0:0:82:0: alua: port group 01 state A supports touSnA sd 0:0:81:2: [sdc] CDB: Test Unit Ready: 00 00 00 00 00 00 sd 0:0:82:2: alua: port group 01 state A supports touSnA sd 0:0:81:2: [sdc] CDB: Test Unit Ready: 00 00 00 00 00 00 sd 0:0:81:2: [sdc] CDB: Test Unit Ready: 00 00 00 00 00 00 <<<<<< =========== mptsas_eh_timed_out called mptsas: ioc0: WARNING - : mptsas_eh_timed_out: target removed or in device removal delay (sc=ffff880031e1de80) sd 0:0:81:2: [sdc] CDB: Test Unit Ready: 00 00 00 00 00 00 sd 0:0:81:2: [sdc] CDB: Test Unit Ready: 00 00 00 00 00 00 <<<<<< ========== failover mptbase: ioc0: MPT event:(16h) : SAS Discovery: Start mptbase: ioc0: MPT event:(12h) : SAS PHY Link Status: Phy=2: Rate 3.0 Gpbs mptsas: ioc0: mptsas_add_fw_event: add (fw_event=0xffff88003902ca40) mptbase: ioc0: MPT event:(1Bh) : Expander Status Change: Added mptsas: ioc0: mptsas_add_fw_event: add (fw_event=0xffff88003902ce00) mptbase: ioc0: MPT event:(12h) : SAS PHY Link Status: Phy=6: Rate 3.0 Gpbs mptsas: ioc0: mptsas_add_fw_event: add (fw_event=0xffff88003902c740) mptbase: ioc0: MPT event:(12h) : SAS PHY Link Status: Phy=13: Rate 3.0 Gpbs mptsas: ioc0: mptsas_add_fw_event: add (fw_event=0xffff88003902c2c0) <<<<<< ============ another failover mptbase: ioc0: MPT event:(16h) : SAS Discovery: Stop mptbase: ioc0: MPT event:(0Fh) : SAS Device Status Change: Added: id=0 channel=0 mptsas: ioc0: mptsas_add_fw_event: add (fw_event=0xffff88003902c200) <<<<<< =========== still see mptsas_eh_timed_out called with same sc( scsi command) mptsas: ioc0: WARNING - : mptsas_eh_timed_out: target removed or in device removal delay (sc=ffff880031e1de80) mptsas: ioc0: WARNING - : mptsas_eh_timed_out: target removed or in device removal delay (sc=ffff880031e1de80) INFO: task mpt/0:347 blocked for more than 120 seconds. "echo 075>] inode_setattr+0x35/0x170 [<ffffffffa02529a6>] ext4_setattr+0x186/0x390 [ext4] [<ffffffff81188318>] notify_change+0x168/0x340 [<ffffffff8116ab84>] do_truncate+0x64/0xa0 [<ffffffff8117d468>] do_filp_open+0x748/0xd40 [<ffffffff81171b14>] ? cp_new_stat+0xe4/0x100 [<ffffffff811892f2>] ? alloc_fd+0x92/0x160 [<ffffffff81169929>] do_sys_open+0x69/0x140 [<ffffffff81169a40>] sys_open+0x20/0x30 [<ffffffff81013172>] system_call_fastpath+0x16/0x1b mptsas: ioc0: WARNING - : mptsas_eh_timed_out: target removed or in device removal delay (sc=ffff880031e1de80) mptsas: ioc0: WARNING - : mptsas_eh_timed_out: target removed or in device removal delay (sc=ffff880031e1de80) mptbase: ioc0: MPT event:(16h) : SAS Discovery: Start mptbase: ioc0: MPT event:(12h) : SAS PHY Link Status: Phy=3: Rate Unknown mptsas: ioc0: mptsas_add_fw_event: add (fw_event=0xffff880033f509c0) mptbase: ioc0: MPT event:(1Bh) : Expander Status Change: Deleted mptsas: ioc0: mptsas_add_fw_event: add (fw_event=0xffff880033f50a80) mptbase: ioc0: MPT event:(12h) : SAS PHY Link Status: Phy=6: Rate Unknown mptsas: ioc0: mptsas_add_fw_event: add (fw_event=0xffff880033f50840) mptbase: ioc0: MPT event:(12h) : SAS PHY Link Status: Phy=13: Rate Unknown mptsas: ioc0: mptsas_add_fw_event: add (fw_event=0xffff880075c93480) mptbase: ioc0: MPT event:(0Fh) : SAS Device Status Change: Deleted: id=1 channel=0 mptsas: ioc0: TaskMgmt request (mf=ffff880077a89900) mptsas: ioc0: TaskMgmt type=3 (sas device delete) fw_channel = 0 fw_id = 1) mptbase: ioc0: MPT event:(16h) : SAS Discovery: Stop mptsas: ioc0: TaskMgmt completed: (mf = ffff880077a89900, mr = ffff880077a800a0) mptsas: ioc0: TaskMgmt completed: fw_channel = 0, fw_id = 1, task_type = 0x03, iocstatus = 0x0000 loginfo = 0x00000000, response_code = 0x00, term_cmnds = 0 mptsas: ioc0: TaskMgmt: completed (0 seconds) mptsas: ioc0: mptsas_add_fw_event: add (fw_event=0xffff880012c2ba40) INFO: task mpt/0:347 blocked for more than 120 seconds. "echo 0/0xe0 [<ffffffff8107112c>] ? do_setitimer+0x1ac/0x220 [<ffffffff81180619>] sys_getdents+0x89/0xf0 [<ffffffff81013172>] system_call_fastpath+0x16/0x1b mptsas: ioc0: WARNING - : mptsas_eh_timed_out: target removed or in device removal delay (sc=ffff880031e1de80) Hi,all. There is typo at the upper comment. change "failover" to "failback" " <<<<<< =========== mptsas_eh_timed_out called mptsas: ioc0: WARNING - : mptsas_eh_timed_out: target removed or in device removal delay (sc=ffff880031e1de80) sd 0:0:81:2: [sdc] CDB: Test Unit Ready: 00 00 00 00 00 00 sd 0:0:81:2: [sdc] CDB: Test Unit Ready: 00 00 00 00 00 00 <<<<<< ========== failover " <<<<<< =========== mptsas_eh_timed_out called mptsas: ioc0: WARNING - : mptsas_eh_timed_out: target removed or in device removal delay (sc=ffff880031e1de80) sd 0:0:81:2: [sdc] CDB: Test Unit Ready: 00 00 00 00 00 00 sd 0:0:81:2: [sdc] CDB: Test Unit Ready: 00 00 00 00 00 00 <<<<<< ========== failback Created attachment 464688 [details]
failed test serial log for 101.patch with rhel6 64bit
This serial log get with 0x8188 mptbase debug level and 0x124 scsi_mod loggig level.
Created attachment 464689 [details]
failed test /var/log/message for RHEL6 64bit with 101 patch
/var/log/message for RHEL6 64bit host with 101 patch
Hi Kashyap- I see mptsas driver makes kernel panic and go to the mal status with "101 patch". It looks like more unstable than 101 patch. I attached log files. see line 1107 at the RHEL6-64-101.log and see line 979 at the var_log_message_rhel6_64_101-patch.txt. see below: <<<<<=================== failover end_device-0:7:1: mptsas: ioc0: removing ssp device: fw_choval delay (sc=ffff880005446dc0) scsi 0:0:7:0: [sda] Unhandled error code sg_remove: sg1 sg_device_destroy: sg1 scsi 0:0:7:0: [sda] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK scsi 0:0:7:0: [sda] CDB: Write(10): 2a 00 01 95 c2 60 00 00 10 00 end_request: I/O error, dev sda, sector 26591840 sd 0:0:7:1: [sdb] Synchronizing SCSI cache mptbase: ioc0: MPT event:(16h) : SAS Discovery: Stop mptbase: ioc0: LogInfo(0x31140000): Originator={PL}, Code={IO Exmoved or in device removal delay (sc=ffff8800261696c0) sd 0:0:8:1: alua: port group 01 state A supports touSnA scsi 0:0:7:0: alua: Detached sd 0:0:8:0: alua: port group 01 state A supports touSnA sd 0:0:7:1: alua: Detached sd 0:0:8:1: alua: port group 01 state A supports touSnA device-mapper: multipath: Failing path 8:32. <<<<<================== mptsas_eh_timed_out is called. mptsas: ioc0: WARNING - : mptsas_eh_timed_out: target removed or in device removal delay (sc=ffff8800261696c0) <<<<<=================== faiback mptbase: ioc0: MPT event:(16h) : SAS Discovery: Start mptbase: ioc0: MPT event:(12h) : SAS PHY Link Status: Phy=2: Rate 3.0 Gpbs mptsas: ioc0: mptsas_add_fw_event: add (fw_event=0xffff880075f98d80) mptbase: ioc0: MPT event:(1Bh) : Expander Status Change: Added mptsas: ioc0: mptsas_add_fw_event: add (fw_event=0xffff880075f98180) mptbase: ioc0: MPT event:(12h) : SAS PHY Link Status: Phy=6: Rate 3.0 Gpbs mptsas: ioc0: mptsas_add_fw_event: add (fw_event=0xffff880075f98540) mptbase: ioc0: MPT event:(12h) : SAS PHY Link Status: Phy=13: Rate 3.0 Gpbs mptsas: ioc0: mptsas_add_fw_event: add (fw_event=0xffff880075f98f00) mptbase: ioc0: MPT event:(16h) : SAS Discovery: Stop mptbase: ioc0: MPT event:(0Fh) : SAS Device Status Change: Added: id=0 channel=0 mptsas: ioc0: mptsas_add_fw_event: add (fw_event=0xffff880075f98600) <<<<<================ still see mptsas_eh_timed_out is called mptsas: ioc0: WARNING - : mptsas_eh_timed_out: target removed or in device removal delay (sc=ffff8800261696c0) INFO: task mpt/0:351 blocked for more than 120 seconds. <<<<================= mptsas makes oops. "echo 0fffffa01d15bb>] mptsas_expander_delete+0x12b/0x320 [mptsas] [<ffffffffa01d4b50>] ? mptsas_firmware_event_work+0x0/0xf9c [mptsas] [<ffffffffa01d571e>] mptsas_firmware_event_work+0xbce/0xf9c [mptsas] [<ffffffff814c8286>] ? thread_return+0x4e/0x778 [<ffffffff81091f8e>] ? prepare_to_wait+0x4e/0x80 [<ffffffffa01d4b50>] ? mptsas_firmware_event_work+0x0/0xf9c [mptsas] [<ffffffff8108c610>] worker_thread+0x170/0x2a0 [<ffffffff81091ca0>] ? autoremove_wake_function+0x0/0x40 [<ffffffff8108c4a0>] ? worker_thread+0x0/0x2a0 [<ffffffff81091936>] kthread+0x96/0xa0 [<ffffffff810141ca>] child_rip+0xa/0x20 [<ffffffff810918a0>] ? kthread+0x0/0xa0 [<ffffffff810141c0>] ? child_rip+0x0/0x20 mptsas: ioc0: WARNING - : mptsas_eh_timed_out: target removed or in device removal delay (sc=ffff8800261696c0) mptsas: ioc0: WARNING - : mptsas_eh_timed_out: target removed or in device removal delay (sc=ffff8800261696c0) INFO: task mpt/0:351 blocked for more than 120 seconds. "echo 0nt_fs_time+0x27/0x30 [<ffffffff8112889d>] ? shmem_xattr_security_get+0x1d/0x30 [<ffffffff8110e230>] __generic_file_aio_write+0x250/0x480 [<ffffffff81013c8e>] ? apic_timer_interrupt+0xe/0x20 [<ffffffff811a339c>] blkdev_aio_write+0x3c/0xa0 [<ffffffff8116c51a>] do_sync_write+0xfa/0x140 [<ffffffff81091ca0>] ? autoremove_wake_function+0x0/0x40 [<ffffffff8120bf4b>] ? selinux_file_permission+0xfb/0x150 [<ffffffff811ff3b6>] ? security_file_permission+0x16/0x20 [<ffffffff8116c818>] vfs_write+0xb8/0x1a0 [<ffffffff810d40a2>] ? audit_syscall_entry+0x272/0x2a0 [<ffffffff8116d251>] sys_write+0x51/0x90 [<ffffffff81013172>] system_call_fastpath+0x16/0x1b mptbase: ioc0: MPT event:(16h) : SAS Discovery: Start mptbase: ioc0: MPT event:(12h) : SAS PHY Link Status: Phy=3: Rate Unknown mptsas: ioc0: mptsas_add_fw_event: add (fw_event=0xffff880077ed2800) mptbase: ioc0: MPT event:(1Bh) : Expander Status Change: Deleted mptsas: ioc0: mptsas_add_fw_event: add (fw_event=0xffff880077ed2b00) mptbase: ioc0: MPT event:(12h) : SAS PHY Link Status: Phy=6: Rate Unknown mptsas: ioc0: mptsas_add_fw_event: add (fw_event=0xffff880077ed2d40) mptbase: ioc0: MPT event:(12h) : SAS PHY Link Status: Phy=13: Rate Unknown mptsas: ioc0: mptsas_add_fw_event: add (fw_event=0xffff880077ed2ec0) mptbase: ioc0: MPT event:(0Fh) : SAS Device Status Change: Deleted: id=1 channel=0 mptsas: ioc0: TaskMgmt request (mf=ffff88007750a200) mptsas: ioc0: TaskMgmt type=3 (sas device delete) fw_channel = 0 fw_id = 1) mptbase: ioc0: MPT event:(16h) : SAS Discovery: Stop mptsas: ioc0: WARNING - : mptsas_eh_timed_out: target removed or in device removal delay (sc=ffff8800261696c0) mptsas: ioc0: TaskMgmt completed: (mf = ffff88007750a200, mr = ffff880077500c80) mptsas: ioc0: TaskMgmt completed: fw_channel = 0, fw_id = 1, task_type = 0x03, iocstatus = 0x0000 loginfo = 0x00000000, response_code = 0x00, term_cmnds = 0 mptsas: ioc0: TaskMgmt: completed (0 seconds) mptsas: ioc0: mptsas_add_fw_event: add (fw_event=0xffff880077ed2bc0) mptsas: ioc0: WARNING - : mptsas_eh_timed_out: target removed or in device removal delay (sc=ffff8800261696c0) mptbase: ioc0: MPT event:(16h) : SAS Discovery: Start mptbase: ioc0: MPT event:(12h) : SAS PHY Link Status: Phy=3: Rate 3.0 Gpbs mptsas: ioc0: mptsas_add_fw_event: add (fw_event=0xffff880037644ec0) mptbase: ioc0: MPT event:(1Bh) : Expander Status Change: Added mptsas: ioc0: mptsas_add_fw_event: add (fw_event=0xffff880077ed2440) mptbase: ioc0: MPT event:(12h) : SAS PHY Link Status: Phy=6: Rate 3.0 Gpbs mptsas: ioc0: mptsas_add_fw_event: add (fw_event=0xffff880037644d40) mptbase: ioc0: MPT event:(12h) : SAS PHY Link Status: Phy=13: Rate 3.0 Gpbs mptsas: ioc0: mptsas_add_fw_event: add (fw_event=0xffff880077ed2500) mptbase: ioc0: MPT event:(16h) : SAS Discovery: Stop mptbase: ioc0: MPT event:(0Fh) : SAS Device Status Change: Added: id=1 channel=0 mptsas: ioc0: mptsas_add_fw_event: add (fw_event=0xffff880077ed22c0) mptsas: ioc0: WARNING - : mptsas_eh_timed_out: target removed or in device removal delay (sc=ffff8800261696c0) mptsas: ioc0: WARNING - : mptsas_eh_timed_out: target removed or in device removal delay (sc=ffff8800261696c0) mptsas: ioc0: WARNING - : mptsas_eh_timed_out: target removed or in device removal delay (sc=ffff8800261696c0) mptbase: ioc0: MPT event:(16h) : SAS Discovery: Start mptbase: ioc0: MPT event:(12h) : SAS PHY Link Status: Phy=2: Rate Unknown mptsas: ioc0: mptsas_add_fw_event: add (fw_event=0xffff880077ed2e00) mptbase: ioc0: MPT event:(1Bh) : Expander Status Change: Deleted mptsas: ioc0: mptsas_add_fw_event: add (fw_event=0xffff880077ed2140) mptbase: ioc0: MPT event:(12h) : SAS PHY Link Status: Phy=6: Rate Unknown mptsas: ioc0: mptsas_add_fw_event: add (fw_event=0xffff8800599cc080) mptbase: ioc0: MPT event:(12h) : SAS PHY Link Status: Phy=13: Rate Unknown mptsas: ioc0: mptsas_add_fw_event: add (fw_event=0xffff8800599cc380) mptbase: ioc0: MPT event:(0Fh) : SAS Device Status Change: Deleted: id=0 channel=0 ilgu hong, Your attached logs is having all prints which we wanted to debug so I will say logs are fine. Now, When I analyzed both the logs for .100 and .101, symptoms of the issue is same. (though you have reported .101 is having kernel crash, it is not actually kernel crash, but mpt/0 is hang for more than 120 seconds) 1. Very first thing to mention here is, we have not hit the "SAME" issue this time as reported earlier that "Device goes OFFLINE" * I have not seen any single Task abort (Error handling was never kicked off). It means whatever patches I have give (.100 and .101) were only active only if there is a Error handling kicked off. To me it looks like even without the patch, also you may hit this new issue [ mpt/0 hang for more than 120 second ] NOTE : Once mpt/0 is hanged we will not see any further device ADD/REMOVE to OS. 2. Why "mpt/0" Hangs ? I have very basic pointer on this. Once driver receive "Expander deleted" event as part of failover, It will remove all three associate LUN and later "expander" will be removed.. _BUT_ somehow expander_delete is not able to remove device and it hangs (see below stack dump) --- ptbase: ioc0: MPT event:(0Fh) : SAS Device Status Change: Added: id=0 channel=0 mptsas: ioc0: mptsas_add_fw_event: add (fw_event=0xffff880075f98600) mptsas: ioc0: WARNING - : mptsas_eh_timed_out: target removed or in device removal delay (sc=ffff8800261696c0) INFO: task mpt/0:351 blocked for more than 120 seconds. "echo 0fffffa01d15bb>] mptsas_expander_delete+0x12b/0x320 [mptsas] [<ffffffffa01d4b50>] ? mptsas_firmware_event_work+0x0/0xf9c [mptsas] [<ffffffffa01d571e>] mptsas_firmware_event_work+0xbce/0xf9c [mptsas] [<ffffffff814c8286>] ? thread_return+0x4e/0x778 [<ffffffff81091f8e>] ? prepare_to_wait+0x4e/0x80 ---- I would suggest, if we can reproduce original issue without any of the (.100/.101) patch, and later we can verify if that issue goes away after applying .100/.101 patch, we can resolve it and as I have mentioned to target this issue(mpt/0 hangs), we can open another bugzilla and track it over there. What is you suggestion ? Thanks, Kashyap (In reply to comment #38) > What is you suggestion ? ilgu? I'm OK with kashyap's suggestion, but I wonder whether LSI/RedHat knows the root cause of previous problem. Whatever, I have no right to decide closing/opening BT in RHEL. Please, ask Joe. HI, Joe. What do you think about Kashyap's suggestion. thanks. If mpt/0 hang is a separate issue, it makes sense to track it in a separate bug. As far as this bug, I don't think we've verified .100 or .101 fix the problem, so this needs to remain open. Are these patches based on the 4.x branch? Perhaps we would have more luck with patches against the 3.x branch (where the problem was originally observed). This request was evaluated by Red Hat Product Management for inclusion in the current release of Red Hat Enterprise Linux. Because the affected component is not scheduled to be updated in the current release, Red Hat is unfortunately unable to address this request at this time. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux. If you would like it considered as an exception in the current release, please ask your support representative. Ilgu, it look to so now that with the attachment 463699 [details] applied the very original issue is gone (comment #24). this without the .100 and .101 patch. Please correct me if I'm wrong. For the other issue reported in comment #24 the mpt/0 hang, where we don't have a solution yet, please open a new bugzilla. You can use 'clone this bug'. I can do that, if you agree with my conclusion. Hi, Josep. What do you think about upper comment? Thansk. That seems OK. We can close this based on the patch and open a new bug to chase the mpt/0 hang we see with the patch applied. Tomas, I'll take you up on your offer to clone the bug, if you don't mind doing that for us. Thanks. (In reply to comment #45) > That seems OK. We can close this based on the patch and open a new bug to > chase the mpt/0 hang we see with the patch applied. Tomas, I'll take you up on > your offer to clone the bug, if you don't mind doing that for us. I'm going to do that, but there are some problems with the bugzilla settings right now. I've to ask for help an administrator, please be patient. (In reply to comment #45) > That seems OK. We can close this based on the patch and open a new bug to > chase the mpt/0 hang we see with the patch applied. Tomas, I'll take you up on > your offer to clone the bug, if you don't mind doing that for us. It's a little bit better, when the real reporter does this - I'm the reporter now in the new bug, but I hope that it is not so important. (In reply to comment #46) > I'm going to do that, but there are some problems with the bugzilla settings > right now. I've to ask for help an administrator, please be patient. Bugzilla now works for me - thanks - new bug is created - bz#672278 This request was evaluated by Red Hat Product Management for inclusion in the current release of Red Hat Enterprise Linux. Because the affected component is not scheduled to be updated in the current release, Red Hat is unfortunately unable to address this request at this time. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux. If you would like it considered as an exception in the current release, please ask your support representative. This request was erroneously denied for the current release of Red Hat Enterprise Linux. The error has been fixed and this request has been re-proposed for the current release. The original issue is solved with the inDMD patch, this is planned for next release. I'm closing this one now. *** This bug has been marked as a duplicate of bug 642618 *** |
Created attachment 460697 [details] logs In this test I set 3 Luns. One is for OS and two are for data. Symptom : It looks like RHEL 6 run error handler routine before recognize third LUN removed and it makes Host reset. And Host reset remove survival path and then system lost all available path. See below for detail. <<<<<<<<<<<<<< first lun is remove by failover sg_remove: sg0 sg_device_destroy: sg0 sd 0:0:67:0: [sda] Synchronizing SCSI cache sd 0:0:67:0: [sda] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK scsi 0:0:67:0: [sda] Unhandled error code device-mapper: multipath: Failing path 8:0. <<<<<<<<<<<<<< Second lun is removed by failover sg_remove: sg1 sg_device_destroy: sg1 device-mapper: multipath: Failing path 8:16. device-mapper: multipath: Failing path 8:32. scsi 0:0:67:0: [sda] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK scsi 0:0:67:0: [sda] CDB: Write(10): 2a 00 00 62 4d 58 00 00 08 00 end_request: I/O error, dev sda, sector 6442328 sd 0:0:67:1: [sdb] Synchronizing SCSI cache mptbase: ioc0: LogInfo(0x31140000): Originator={PL}, Code={IO Executed}, SubCode(0x0000) cb_idx mptscsih_io_done mptbase: ioc0: LogInfo(0x31140000): Originator={PL}, Code={IO Executed}, SubCode(0x0000) cb_idx mptscsih_io_done mptbase: ioc0: LogInfo(0x31140000): Originator={PL}, Code={IO Executed}, SubCode(0x0000) cb_idx mptscsih_io_done <<<<<<<<<<<<<< SCSI error handler routine is called (target reset) Error handler scsi_eh_0 waking up sd 0:0:67:2: scsi_eh_prt_failet 59 scsi_eh_0: Target reset failed target: 59 scsi_eh_0: Sending target reset to target 60 scsi_eh_0: Target reset failed target: 60 scsi_eh_0: Sending target reset to target 61 scsi_eh_0: Target reset failed target: 61 scsi_eh_0: Sending target reset to target 62 scsi_eh_0: Target reset failed target: 62 scsi_eh_0: Sending target reset to target 63 scsi_eh_0: Target reset failed target: 63 scsi_eh_0: Sending target reset to target 64 scsi_eh_0: Target reset failed target: 64 scsi_eh_0: Sending target reset to target 65 scsi_eh_0: Target reset failed target: 65 scsi_eh_0: Sending target reset to target 66 scsi_eh_0: Target reset failed target: 66 scsi_eh_0: Sending target reset to target 67 scsi_eh_0: Target reset failed target: 67 scsi_eh_0: Sending BRST chan: 0 <<<<<<<<<<<<<<<<<< bus reset scsi_try_bus_reset: Snd Bus RST mptscsih: ioc0: attempting bus reset! (sc=c5938e00) sd 0:0:67:2: [sdc] CDB: Read(10): 28 00 01 66 f9 90 00 00 08 00 mptscsih: ioc0: bus reset: SUCCESS (sc=c5938e00) <<<<<<<<<<<<<< HBA report DID_DISCONNECT , but it looks like scsi error handler routine does not care about it). scsi_eh_done scmd: c5938e00 result: 10000 scsi_send_eh_cmnd: scmd: c5938e00, timeleft: 10000 scsi_send_eh_cmnd: scsi_eh_completed_normally 2003 scsi_eh_tur: scmd c5938e00 rtn 2003 <<<<<<<<<<<<<<<<< Host reset scsi_eh_0: Sending HRST scsi_try_host_reset: Snd Host RST mptscsih: ioc0: attempting host reset! (sc=c5938e00) mptbase: ioc0: Initiating recovery mptscsih: ioc0: host reset: SUCCESS (sc=c5938e00) scsi_eh_done scmd: c5938e00 result: 10000 scsi_send_eh_cmnd: scmultipath: Failing path 8:80. mptbase: ioc0: LogInfo(0x30030501sd 0:0:68:0: [sdd] Synchronizing SCSI cache sd 0:0:68:0: [sdd] 0: LogInfo(0x30030501): Originator={IOP}, Code={Invalid Page}, SubCode(0x0501) cb_idx mptbase_reply mptbase: ioc0: LogInfo(0x30030501): Originator={IOP}, Code={Invalid Page}, SubCode(0x0501) cb_idx mptbase_reply mptbase: ioc0: LogInfo(0x30030501): Originator={IOP}, Code={Invalid Page}, SubCode(0x0501) cb_idx mptbase_reply mptsas: ioc0: attaching ssp device: fw_channel 0, fw_id 0 GiB) <<<<<<<<<<<<<<<<<<<< All paths are removed by Host reset and then new path added. sdb: sd 0:0:69:2: [sdc] Write Protect is off sd 0:0:69:2: [sdc] Write cache: enabled, read cache: enabled, supports DPO and FUA sdc: unknown partition table sd 0:0:69:0: [sda] Write Protect is off sd 0:0:69:0: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA sd 0:0:69:2: [sdc] Attached SCSI disk sda: sda1 sda2 unknown partition table sd 0:0:69:0: [sda] Attached SCSI disk sd 0:0:69:1: [sdb] Attached SCSI disk mptsas: ioc0: add expander: num_phys 25, sas_addr (0x5000423e2e4c52ff) mptsas: ioc0: attaching ssp device: fw_channel 0, fw_id 1, phy 1 sd 0:0:70:2: [sdi] 31457280 512-byte logical blocks: (16.1 GB/15.0 GiB) sd 0:0:70:2: [sdi] Write Protect is off sd 0:0:70:2: [sdi] Write cache: enabled, read cache: enabled, supports DPO and FUA sd 0:0:70:0: [sdg] Write Protect is off sdi:Dev sdi: unable to read RDB block 0 unable to read partition table sd 0:0:70:1: [sdh] Attached SCSI disk sd 0:0:70:0: [sdg] Write cache: enabled, read cache: enabled, supports DPO and FUA sd 0:0:70:2: [sdi] Attached SCSI disk sdg:Dev sdg: unable to read RDB block 0 unable to read partition table sd 0:0:70:0: [sdg] Attached SCSI disk <<<<<<<<<<<<<<<<<<<<< kernel panic INFO: task jbd2/dm-3-8:622 blocked for more than 120 seconds. ace+0xa0/0xb0 [<c059e462>] ? selinux_file_alloc_security+0x42/0xc0 [<c052ada1>] ? path_walk+0x51/0xc0 [<c052af29>] ? do_path_lookup+0x59/0x90 [<c052bc69>] ? do_filp_open+0xc9/0xb10 [<c04fa822>] ? handle_mm_fault+0x132/0x1d0 [<c080f730>] ? do_page_fault+0x0/0x480 [<c051b568>] ? do_sys_open+0x58/0x130 [<c04a993c>] ? audit_syscall_entry+0x21c/0x240 [<c051b6bc>] ? sys_open+0x2c/0x40 [<c04099fb>] ? sysenter_do_call+0x12/0x28 PuTTYsg_remove: sg0 sg_device_destroy: sg0 sd 0:0:69:0: alua: Detached sd 0:0:69:0: [sda] Synchronizing SCSI cache sd 0:0:69:0: [sda] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK sg_remove: sg1 sg_device_destroy: sg1 sd 0:0:69:1: alua: Detached sd 0:0:69:1: [sdb] Synchronizing SCSI cache sd 0:0:69:1: [sdb] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK sg_remove: sg2 sg_device_destroy: sg2 sd 0:0:69:2: alua: Detached sd 0:0:69:2: [sdc] Synchronizing SCSI cache sd 0:0:69:2: [sdc] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK scsi target0:0:69: mptsas: ioc0: delete device: fw_channel 0, fw_id 0, phy 12, sas_addr 0x5000155000020509