| Summary: | IBM zfcp driver recovery after a fabric events lands up in "Medium access timeout failure. Offlining disk!" | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | loberman <loberman> |
| Component: | kernel | Assignee: | Ewan D. Milne <emilne> |
| kernel sub component: | Storage Drivers | QA Contact: | guazhang <guazhang> |
| Status: | CLOSED WONTFIX | Docs Contact: | |
| Severity: | urgent | ||
| Priority: | unspecified | CC: | cww, emilne, guazhang, loberman, mgandhi, mgoodwin, mlombard, mpg, nkshirsa |
| Version: | 6.8 | ||
| Target Milestone: | rc | ||
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2017-11-08 14:44:46 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Bug Depends On: | |||
| Bug Blocks: | 1374441, 1461138 | ||
|
Description
loberman
2016-08-25 14:26:21 UTC
This is very similar to BZ #1182838, where a switch firmware rolling update caused path flapping (see also linked customer case 01321891). In that BZ/case, the scsi TUR path checker was also in use, and during the firmware update TURs issued by multipathd were succeeding, but subsequent I/O was failing .. i.e. we had failover/failback flapping every checker_interval seconds. The bug is that TUR should NOT have been succeeding (because the device is not ready even though technically it is accessible). The workaround/solution was to change to the directio path_checker and the case was resolved. In the case attached to this BZ, if TURs were not succeeding during the firmware update, then the paths would have remained failed until they returned after the upgrade, and hence we would not have tripped max_medium_access_timeouts and hence the scsi err handler would not have eventually offlined the devices ... which is when everything went downhill with manual intervention required to recover. So it seems an alternative solution/workaround here would be to change to path_checker = directio in multipath.conf to avoid the TUR issue. Test kernel with patch to SCSI error handling available for testing at: http://people.redhat.com/emilne/RPMS/.bz1370212/ Contains the following change: commit 8ab5d0046f69034fb7f74abc25f209262d2098c1 Author: Ewan D. Milne <emilne> Date: Fri Sep 23 09:50:12 2016 -0400 scsi_error: count medium access timeout only once per EH run The current medium access timeout counter will be increased for each command, so if there are enough failed commands we'll hit the medium access timeout for even a single failure. Fix this by making the timeout per EH run, ie the counter will only be increased once per device and EH run. Signed-off-by: Hannes Reinecke <hare> (Modified for RHEL6 -- KABI changes, also changed to add argument to scsi_eh_action, scsi_driver.eh_action, and sd_eh_action instead of overloading the existing eh_disp argument with a reset flag.) Signed-off-by: Ewan D. Milne <emilne> --- diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c index 047cc20..d3e4550 100644 --- a/drivers/scsi/scsi_error.c +++ b/drivers/scsi/scsi_error.c @@ -50,6 +50,7 @@ #define HOST_RESET_SETTLE_TIME (10) static int scsi_eh_try_stu(struct scsi_cmnd *scmd); +static int scsi_eh_action(struct scsi_cmnd *scmd, int rtn, bool reset); /* called with shost->host_lock held */ void scsi_eh_wakeup(struct Scsi_Host *shost) @@ -130,6 +131,7 @@ int scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag) ret = 1; scmd->eh_eflags |= eh_flag; + scsi_eh_action(scmd, 0, 1); list_add_tail(&scmd->eh_entry, &shost->eh_cmd_q); shost->host_failed++; scsi_eh_wakeup(shost); @@ -975,12 +977,12 @@ static int scsi_request_sense(struct scsi_cmnd *scmd) return scsi_send_eh_cmnd(scmd, NULL, 0, scmd->device->eh_timeout, ~0); } -static int scsi_eh_action(struct scsi_cmnd *scmd, int rtn) +static int scsi_eh_action(struct scsi_cmnd *scmd, int rtn, bool reset) { if (scmd->request->cmd_type != REQ_TYPE_BLOCK_PC) { struct scsi_driver *sdrv = scsi_cmd_to_driver(scmd); if (sdrv->eh_action) - rtn = sdrv->eh_action(scmd, rtn); + rtn = sdrv->eh_action(scmd, rtn, reset); } return rtn; } @@ -1155,7 +1157,7 @@ static int scsi_eh_test_devices(struct list_head *cmd_list, if (scmd->device == sdev) { if (finish_cmds && (try_stu || - scsi_eh_action(scmd, SUCCESS) == SUCCESS)) + scsi_eh_action(scmd, SUCCESS, 0) == SUCCESS)) scsi_eh_finish_cmd(scmd, done_q); else list_move_tail(&scmd->eh_entry, work_q); @@ -1289,7 +1291,7 @@ static int scsi_eh_stu(struct Scsi_Host *shost, list_for_each_entry_safe(scmd, next, work_q, eh_entry) { if (scmd->device == sdev && - scsi_eh_action(scmd, SUCCESS) == SUCCESS) + scsi_eh_action(scmd, SUCCESS, 0) == SUCCESS) scsi_eh_finish_cmd(scmd, done_q); } } @@ -1353,7 +1355,7 @@ static int scsi_eh_bus_device_reset(struct Scsi_Host *shost, list_for_each_entry_safe(scmd, next, work_q, eh_entry) { if (scmd->device == sdev && - scsi_eh_action(scmd, rtn) != FAILED) + scsi_eh_action(scmd, rtn, 0) != FAILED) scsi_eh_finish_cmd(scmd, done_q); } diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index f812367..d6ed528 100644 --- a/drivers/scsi/sd.c +++ b/drivers/scsi/sd.c @@ -105,7 +105,7 @@ static int sd_suspend(struct device *, pm_message_t state); static int sd_resume(struct device *); static void sd_rescan(struct device *); static int sd_done(struct scsi_cmnd *); -static int sd_eh_action(struct scsi_cmnd *, int); +static int sd_eh_action(struct scsi_cmnd *, int, bool); static void sd_read_capacity(struct scsi_disk *sdkp, unsigned char *buffer); static void scsi_disk_release(struct device *cdev); static void sd_print_sense_hdr(struct scsi_disk *, struct scsi_sense_hdr *); @@ -1349,6 +1349,7 @@ static const struct block_device_operations sd_fops = { * sd_eh_action - error handling callback * @scmd: sd-issued command that has failed * @eh_disp: The recovery disposition suggested by the midlayer + * @reset: Reset the medium access timed out increment flag * * This function is called by the SCSI midlayer upon completion of an * error test command (currently TEST UNIT READY). The result of sending @@ -1357,10 +1358,14 @@ static const struct block_device_operations sd_fops = { * test unit ready (so wrongly see the device as having a successful * recovery) **/ -static int sd_eh_action(struct scsi_cmnd *scmd, int eh_disp) +static int sd_eh_action(struct scsi_cmnd *scmd, int eh_disp, bool reset) { struct scsi_disk *sdkp = scsi_disk(scmd->request->rq_disk); + if (reset) { + sdkp->medium_access_reset = 0; + return eh_disp; + } if (!scsi_device_online(scmd->device) || !scsi_medium_access_command(scmd) || host_byte(scmd->result) != DID_TIME_OUT || @@ -1374,7 +1379,10 @@ static int sd_eh_action(struct scsi_cmnd *scmd, int eh_disp) * process of recovering or has it suffered an internal failure * that prevents access to the storage medium. */ - sdkp->medium_access_timed_out++; + if (!sdkp->medium_access_reset) { + sdkp->medium_access_timed_out++; + sdkp->medium_access_reset = 1; + } /* * If the device keeps failing read/write commands but TEST UNIT diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h index ebf68e3..c7c7434 100644 --- a/drivers/scsi/sd.h +++ b/drivers/scsi/sd.h @@ -90,6 +90,7 @@ struct scsi_disk { unsigned lbpvpd : 1; #ifndef __GENKSYMS__ unsigned cache_override : 1; /* temp override of WCE,RCD */ + unsigned medium_access_reset : 1; #endif }; #define to_scsi_disk(obj) container_of(obj,struct scsi_disk,dev) diff --git a/include/scsi/scsi_driver.h b/include/scsi/scsi_driver.h index 20fdfc2..e1dd47a 100644 --- a/include/scsi/scsi_driver.h +++ b/include/scsi/scsi_driver.h @@ -16,7 +16,7 @@ struct scsi_driver { void (*rescan)(struct device *); int (*done)(struct scsi_cmnd *); - int (*eh_action)(struct scsi_cmnd *, int); + int (*eh_action)(struct scsi_cmnd *, int, bool); }; #define to_scsi_driver(drv) \ container_of((drv), struct scsi_driver, gendrv) Many Thanks Ewan, I have offered this to the customer to test it for us. Thanks. Let me know how it works out. We have had a bit of discussion, and I think there are other issues that this does not fix, but I would rather fix 95% of the problem for the customer now and worry about other aspects later. This is a -660 kernel and as far as I can tell it does not have any recent zfcp fixes, we are going to need a separate BZ to track those if we don't have one already. Understood I will open a new BZ and track the coming zfcp changes when they show up upstream Steffen did not say when they would be coming out, rather he just said for me to keep an eye out. When they show up will open the BZ to get them back-ported as it would appear they are important in the general stability sense for the zfcp driver. This new BZ to tyrack the zfcp changes will be have to be 2 BZ'S one for 6.9 and 7.2+ Thanks!! Patch was missing a line
diff -Nurp linux-2.6.32-573.18.1.el6.orig/drivers/scsi/libfc/fc_exch.c linux-2.6.32-573.18.1.el6/drivers/scsi/libfc/fc_exch.c
--- linux-2.6.32-573.18.1.el6.orig/drivers/scsi/libfc/fc_exch.c 2016-01-06 10:15:32.000000000 -0500
+++ linux-2.6.32-573.18.1.el6/drivers/scsi/libfc/fc_exch.c 2016-10-12 20:33:54.558469871 -0400
@@ -815,14 +815,19 @@ err:
* EM is selected when a NULL match function pointer is encountered
* or when a call to a match function returns true.
*/
-static inline struct fc_exch *fc_exch_alloc(struct fc_lport *lport,
- struct fc_frame *fp)
+static struct fc_exch *fc_exch_alloc(struct fc_lport *lport,
+ struct fc_frame *fp)
{
struct fc_exch_mgr_anchor *ema;
+ struct fc_exch *ep;
- list_for_each_entry(ema, &lport->ema_list, ema_list)
- if (!ema->match || ema->match(fp))
- return fc_exch_em_alloc(lport, ema->mp);
+ list_for_each_entry(ema, &lport->ema_list, ema_list) {
+ if (!ema->match || ema->match(fp)) {
+ ep = fc_exch_em_alloc(lport, ema->mp);
+ if (ep)
+ return ep;
+ }
+ }
return NULL;
}
diff -Nurp linux-2.6.32-573.18.1.el6.orig/include/linux/netdevice.h linux-2.6.32-573.18.1.el6/include/linux/netdevice.h
--- linux-2.6.32-573.18.1.el6.orig/include/linux/netdevice.h 2016-01-06 10:15:59.000000000 -0500
+++ linux-2.6.32-573.18.1.el6/include/linux/netdevice.h 2016-10-12 20:08:52.828043196 -0400
@@ -1103,6 +1103,10 @@ struct net_device
#define NETIF_F_ALL_TSO (NETIF_F_TSO | NETIF_F_TSO6 | NETIF_F_TSO_ECN)
+#define NETIF_F_ALL_FCOE (NETIF_F_FCOE_CRC | NETIF_F_FCOE_MTU | \
+ NETIF_F_FSO)
+
+
/*
* If one device supports one of these features, then enable them
* for all in netdev_increment_features.
diff -Nurp linux-2.6.32-573.18.1.el6.orig/include/scsi/fc_frame.h linux-2.6.32-573.18.1.el6/include/scsi/fc_frame.h
--- linux-2.6.32-573.18.1.el6.orig/include/scsi/fc_frame.h 2016-01-06 10:15:10.000000000 -0500
+++ linux-2.6.32-573.18.1.el6/include/scsi/fc_frame.h 2016-10-12 20:08:52.829043197 -0400
@@ -137,6 +137,8 @@ static inline struct fc_frame *fc_frame_
fp = fc_frame_alloc_fill(dev, len);
else
fp = _fc_frame_alloc(len);
+ if(!fp)
+ printk("RHDEBUG: In fcp_frame_alloc, we returned fp = %p\n",fp);
return fp;
}
diff -Nurp linux-2.6.32-573.18.1.el6.orig/net/8021q/vlan_dev.c linux-2.6.32-573.18.1.el6/net/8021q/vlan_dev.c
--- linux-2.6.32-573.18.1.el6.orig/net/8021q/vlan_dev.c 2016-01-06 10:15:58.000000000 -0500
+++ linux-2.6.32-573.18.1.el6/net/8021q/vlan_dev.c 2016-10-12 20:08:52.829043197 -0400
@@ -522,7 +522,9 @@ static int vlan_dev_init(struct net_devi
netdev_extended(dev)->hw_features = NETIF_F_ALL_CSUM | NETIF_F_SG |
NETIF_F_FRAGLIST | NETIF_F_ALL_TSO |
- NETIF_F_HIGHDMA | NETIF_F_SCTP_CSUM;
+ NETIF_F_HIGHDMA | NETIF_F_SCTP_CSUM |
+ NETIF_F_ALL_FCOE;
+
dev->features |= real_dev->vlan_features | NETIF_F_LLTX;
dev->gso_max_size = real_dev->gso_max_size;
[loberman@dhcp-33-21 SOURCES]$
Yes mistake. 26 is for another BZ. Thanks for catching that. Yes build from source. Ignore comnent 26 (In reply to loberman from comment #29) > Yes mistake. 26 is for another BZ. > Thanks for catching that. > > Yes build from source. > > Ignore comnent 26 Thanks Laurence! Let me build the test kernel for all arch with patch in comment 17. I have put an s390x version of the test kernel with patch to SCSI error handling in: http://people.redhat.com/emilne/RPMS/.bz1370212/ >Hi Ewan, I have got update from the customer, that they could test a kernel for >s390 system. The link in comment#17 shows test kernel for x86_64 arch only. >Could you please let me know if there is a test kernel available for s390 arch. >that I can share with the customer. > >Thanks, >Milan. > >(In reply to Ewan D. Milne from comment #33) > I have put an s390x version of the test kernel with patch to SCSI error > handling in: > > http://people.redhat.com/emilne/RPMS/.bz1370212/ Is there any update on whether the customer was able to test this? We are past the 6.9 deadline at this point. Upstream has a fix forthcoming: https://marc.info/?l=linux-scsi&m=148827743226480&w=2 Regards Laurence Per discussion w/support, closing as WONTFIX. There is a workaround available, which is to increase max_medium_access_timeouts to a large value (see KB article https://access.redhat.com/site/solutions/2575901). It appears as if the problem may no longer be occurring due to fixes on the array side as well as we are no longer receiving reports of this problem. |