Description of problem:

The SCSI ALUA handler does not handle the ALUA transitioning state properly. For example, for an ALUA-enabled NetApp controller which supports implicit ALUA alone (with the following valid states - port group 00 state A supports ToUsNA), the following code snippet is seen in alua_rtpg:

	if (h->tpgs & TPGS_MODE_EXPLICIT) {
		switch (h->state) {
		case TPGS_STATE_TRANSITIONING:
			/* State transition, retry */
			goto retry;
			break;
		case TPGS_STATE_OFFLINE:
			/* Path is offline, fail */
			err = SCSI_DH_DEV_OFFLINED;
			break;
		default:
			break;
		}
	} else {
		/* Only Implicit ALUA support */
		if (h->state == TPGS_STATE_OPTIMIZED ||
		    h->state == TPGS_STATE_NONOPTIMIZED ||
		    h->state == TPGS_STATE_STANDBY)
			/* Useable path if active */
			err = SCSI_DH_OK;
		else
			/* Path unuseable for unavailable/offline */
			err = SCSI_DH_DEV_OFFLINED;
	}

During NetApp controller faults, the LUN is in the 'transitioning' state. But from the above code, it seems this is handled for explicit ALUA alone, and not for implicit ALUA. It should ideally be handled for both.

Secondly, in alua_prep_fn:

	if (h->state != TPGS_STATE_OPTIMIZED &&
	    h->state != TPGS_STATE_NONOPTIMIZED) {
		ret = BLKPREP_KILL;
		req->flags |= REQ_QUIET;
	}

Why is TPGS_STATE_TRANSITIONING not handled above? For this state, I suppose the prep_fn should be returning BLKPREP_DEFER.

Because of these issues with the ALUA handler, we seem to have hit delayed dm-multipath IO (on SCSI devices using the ALUA handler) as described in bug 606259.

Version-Release number of selected component (if applicable):
kernel-2.6.18-194.el5 (RHEL 5.5)
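For reference, the single letter in the "port group 00 state A supports ToUsNA" messages is the LUN's current ALUA access state, as rendered by the handler's state-to-character helper. A minimal sketch of that mapping, based on print_alua_state() in upstream scsi_dh_alua.c (the helper in a given RHEL5 kernel may differ slightly):

	static char print_alua_state(int state)
	{
		switch (state) {
		case TPGS_STATE_OPTIMIZED:
			return 'A';	/* active/optimized */
		case TPGS_STATE_NONOPTIMIZED:
			return 'N';	/* active/non-optimized */
		case TPGS_STATE_STANDBY:
			return 'S';
		case TPGS_STATE_UNAVAILABLE:
			return 'U';
		case TPGS_STATE_OFFLINE:
			return 'O';
		case TPGS_STATE_TRANSITIONING:
			return 'T';
		default:
			return 'X';	/* unknown */
		}
	}

So "state A" above means the LUN currently reports active/optimized, and a "state T" line is what should appear while a controller fault keeps the LUN in the transitioning state.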
The "log messages" attached to bug#606259 only ever show alua_rtpg() logging of the form: port group 00 state A supports ToUsNA These "ToUsNA" flags map to the following supported states: TPGS_SUPPORT_TRANSITION TPGS_SUPPORT_UNAVAILABLE TPGS_SUPPORT_NONOPTIMIZED TPGS_SUPPORT_OPTIMIZED comment#0 shows the block of code that handles these states for explicit and implicit alua. I agree that alua_rtpg() clearly lacks implicit alua support for TPGS_STATE_TRANSITIONING (which this NetApp LUN clearly needs given TPGS_SUPPORT_TRANSITION). But it strikes me as odd that we don't see something like the following in the messages file (from bug#606259) when all the controller faults occur: port group 00 state T supports ToUsNA So does alua_rtpg() ever actually get h->state == TPGS_STATE_TRANSITIONING for this implicit alua LUN? Anyway, ignoring my concern about alua_rtpg() possibly never seeing TPGS_STATE_TRANSITIONING for a moment, something like the following may suffice (this will need Mike Christie's feedback): diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c b/drivers/scsi/device_handler/scsi_dh_alua.c index a78aaa6..9e116de 100644 --- a/drivers/scsi/device_handler/scsi_dh_alua.c +++ b/drivers/scsi/device_handler/scsi_dh_alua.c @@ -610,6 +610,9 @@ static int alua_rtpg(struct scsi_device *sdev, struct alua_dh_data *h) h->state == TPGS_STATE_STANDBY) /* Useable path if active */ err = SCSI_DH_OK; + else if (h->state == TPGS_STATE_TRANSITIONING) + /* State transition, retry */ + goto retry; else /* Path unuseable for unavailable/offline */ err = SCSI_DH_DEV_OFFLINED; @@ -686,8 +689,10 @@ static int alua_prep_fn(struct scsi_device *sdev, struct request *req) struct alua_dh_data *h = get_alua_data(sdev); int ret = BLKPREP_OK; - if (h->state != TPGS_STATE_OPTIMIZED && - h->state != TPGS_STATE_NONOPTIMIZED) { + if (h->state == TPGS_STATE_TRANSITIONING) + ret = BLKPREP_DEFER; + else if (h->state != TPGS_STATE_OPTIMIZED && + h->state != TPGS_STATE_NONOPTIMIZED) { ret = BLKPREP_KILL; req->flags |= REQ_QUIET; }
Mike, could you please review comment#1, thanks.
Patch looks good to me. I do not know why transitioning was not handled in the prep_fn function before, but BLKPREP_DEFER makes sense to me.
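For context on the DEFER vs KILL distinction being discussed, the prep_fn return codes come from the block layer. A rough orientation (the defines below are as in include/linux/blkdev.h of that era; check the exact tree being patched):

	#define BLKPREP_OK	0	/* request is ready to be dispatched */
	#define BLKPREP_KILL	1	/* fatal error, complete the request with an error */
	#define BLKPREP_DEFER	2	/* leave the request on the queue, try again later */

So BLKPREP_KILL errors the I/O immediately (with REQ_QUIET suppressing the error logging), which dm-multipath then sees as an I/O error on that path, while BLKPREP_DEFER keeps the request queued until the path leaves the transitioning state.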
Hi Martin,

Here is a summary of outstanding questions that we have:

1) Do you ever see the scsi_dh_alua handler process TPGS_STATE_TRANSITIONING?
   - something like the following in the kernel log:
     port group 00 state T supports ToUsNA

2) What is the maximum time that a NetApp LUN can be in the transitioning state when ALUA is used?
   - is this highly dependent on the amount of IO in the controller cache?
   - seems the kernel is doing the right thing of continuing to retry:
     https://bugzilla.redhat.com/show_bug.cgi?id=559586#c9
   - but that the NetApp LUN stays in the transitioning state beyond 360 seconds:
     https://bugzilla.redhat.com/show_bug.cgi?id=606259#c57

3) Is there an alternative, NetApp supported, configuration if ALUA is disabled (in both the NetApp controller and linux/device-mapper-multipath)?
   - would this alternative resolve the delayed IO behavior seen in the host (Linux) or would the IO delays persist?
(In reply to comment #4)
> Hi Martin,
>
> Here is a summary of outstanding questions that we have:
>
> 1) Do you ever see the scsi_dh_alua handler process TPGS_STATE_TRANSITIONING?
>    - something like the following in the kernel log:
>      port group 00 state T supports ToUsNA

Yes. We see this message when the ALUA handler is in use for NetApp LUNs. NetApp supports implicit ALUA alone, with the following valid states - TRANSITION, UNAVAILABLE, NONOPTIMIZED & OPTIMIZED.

> 2) What is the maximum time that a NetApp LUN can be in the transitioning state
>    when ALUA is used?

This should not exceed 120 seconds.

>    - is this highly dependent on the amount of IO in the controller cache?

This is actually dependent on the controller config. If you have several aggregates, volumes, snapshots, etc., on the controllers, the NetApp LUN 'TRANSITIONING' time would be higher during cf takeovers/givebacks.

>    - seems the kernel is doing the right thing of continuing to retry:
>      https://bugzilla.redhat.com/show_bug.cgi?id=559586#c9

Yes. We want the kernel to retry till the 'TRANSITION' is complete, and that's why we chose the ALUA handler.

>    - but that the NetApp LUN stays in the transitioning state beyond 360 seconds:
>      https://bugzilla.redhat.com/show_bug.cgi?id=606259#c57

Hmm.. let me look into that.

> 3) Is there an alternative, NetApp supported, configuration if ALUA is disabled
>    (in both the NetApp controller and linux/device-mapper-multipath)?

Yes, you can use non-ALUA configs as well. For this, disable ALUA on the corresponding igroup on the NetApp controller and then use mpath_prio_ontap instead of mpath_prio_alua in the host multipath.conf.

>    - would this alternative resolve the delayed IO behavior seen in the host
>      (Linux) or would the IO delays persist?

Yes, it would resolve the delayed IO behavior, since the delayed IO is seen on ALUA setups alone.
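To illustrate the host-side difference Martin describes, the change is essentially the prio callout used in the multipath.conf device stanza. These fragments are hypothetical and only show the relevant line; the authoritative stanzas should come from the NetApp / Red Hat interoperability documentation:

	# ALUA enabled on the NetApp igroup:
	device {
		vendor		"NETAPP"
		product		"LUN"
		prio_callout	"/sbin/mpath_prio_alua /dev/%n"
		# (remaining settings per the NetApp recommendations)
	}

	# ALUA disabled on the NetApp igroup (the non-ALUA alternative):
	device {
		vendor		"NETAPP"
		product		"LUN"
		prio_callout	"/sbin/mpath_prio_ontap /dev/%n"
		# (remaining settings per the NetApp recommendations)
	}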
(In reply to comment #5)
> (In reply to comment #4)
> > Hi Martin,
> >
> > Here is a summary of outstanding questions that we have:
> >
> > 1) Do you ever see the scsi_dh_alua handler process TPGS_STATE_TRANSITIONING?
> >    - something like the following in the kernel log:
> >      port group 00 state T supports ToUsNA
>
> Yes. We see this message when the ALUA handler is in use for NetApp LUNs.
> NetApp supports implicit ALUA alone, with the following valid states -
> TRANSITION, UNAVAILABLE, NONOPTIMIZED & OPTIMIZED.

I can confirm that I have seen instances of the following too:

  scsi 1:0:3:0: alua: port group 01 state T supports ToUsNA

This means the alua_rtpg() hunk from the patch in comment#1 is beneficial.

But I have yet to see proof (from my debug kernel's scsi_dh_alua tracing) that the 2nd hunk of the patch in comment#1, which changes alua_prep_fn, helps. My debugging would print "alua_prep_fn: TPGS_STATE_TRANSITIONING" if that path was taken (and I've been doing a lot of takeover/giveback testing under dt load).

> > 2) What is the maximum time that a NetApp LUN can be in the transitioning state
> >    when ALUA is used?
>
> This should not exceed 120 seconds.
>
> >    - is this highly dependent on the amount of IO in the controller cache?
>
> This is actually dependent on the controller config. If you have several
> aggregates, volumes, snapshots, etc., on the controllers, the NetApp LUN
> 'TRANSITIONING' time would be higher during cf takeovers/givebacks.

Is there anything of note about your backend LUN config that we should look to replicate in our config related to the above? Meaning: do you have many snapshots, several aggregates, etc.?
(In reply to comment #6)
> Is there anything of note about your backend LUN config that we should look to
> replicate in our config related to the above? Meaning: do you have many
> snapshots, several aggregates, etc.?

No. You can ignore snapshots, aggregates, etc. Just stick to the config mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=606259#c52
(In reply to comment #6)
> But I have yet to see proof (from my debug kernel's scsi_dh_alua tracing) that
> the 2nd hunk of the patch in comment#1, which changes alua_prep_fn, helps.
> My debugging would print "alua_prep_fn: TPGS_STATE_TRANSITIONING" if that path
> was taken.

I'm seeing similar behavior as well on our setup here. Only "alua_rtpg: trying submit_rtpg" messages are visible, but not the "alua_prep_fn: TPGS_STATE_TRANSITIONING" messages.
And now I have hit something worse. To avoid hitting bug 599487 on Emulex hosts, I turned the Emulex heartbeat parameter 'lpfc_enable_hba_heartbeat' off (set to 0), as recommended in that bug.

And the host panicked during controller takeover/givebacks - this is with the alua debug kernel containing lpfc driver v8.2.0.63.3p:

Kernel BUG at drivers/scsi/lpfc/lpfc_scsi.c:2206
invalid opcode: 0000 [1] SMP
last sysfs file: /block/dm-14/dev
CPU 3
Modules linked in: nfs fscache nfs_acl autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc be2iscsi ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_addr iscsi_tcp bnx2i cnic ipv6 xfrm_nalgo crypto_api uio cxgb3i iw_cxgb3 ib_core cxgb3 8021q libiscsi_tcp libiscsi2 scsi_transport_iscsi2 scsi_transport_iscsi video backlight sbs power_meter hwmon i2c_ec dell_wmi wmi button battery asus_acpi acpi_memhotplug ac parport_pc lp parport sg floppy tg3 pcspkr i2c_i801 i2c_core e752x_edac edac_mc ide_cd serio_raw cdrom dm_raid45 dm_message dm_region_hash dm_mem_cache dm_round_robin dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_dh dm_snapshot dm_zero dm_mirror dm_log dm_mod ata_piix libata shpchp lpfc scsi_transport_fc sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 438, comm: scsi_eh_0 Not tainted 2.6.18-194.11.1.el5.alua_dbg #1
RIP: 0010:[<ffffffff880ff793>]  [<ffffffff880ff793>] :lpfc:lpfc_abort_handler+0x58/0x33d
RSP: 0018:ffff81007e705dd0  EFLAGS: 00010246
RAX: ffff81003531a680 RBX: ffff81003531a680 RCX: ffff81007e705e90
RDX: ffff81007e705e90 RSI: ffff81003531a698 RDI: ffff81007e6eb050
RBP: ffff81007e624000 R08: ffff81007e704000 R09: 000000000000003c
R10: ffff810002390a90 R11: ffffffff880ff73b R12: 0000000000000000
R13: 0000000000000282 R14: ffff81007e401b58 R15: ffffffff800a07c0
FS:  0000000000000000(0000) GS:ffff8100026ca6c0(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00002b081fcca1d0 CR3: 0000000013601000 CR4: 00000000000006e0
Process scsi_eh_0 (pid: 438, threadinfo ffff81007e704000, task ffff81007f949100)
Stack:  ffff81003531a680 ffff81007e6eb000 ffff81007e6eb4f8 000020023b9aca00
 ffff810000000000 0000000300000001 ffff81007e705e00 ffff81007e705e00
 0000958c9102002a 0000000000000018 ffff810000000001 ffff81007e705e28
Call Trace:
 [<ffffffff800a07c0>] keventd_create_kthread+0x0/0xc4
 [<ffffffff880791a4>] :scsi_mod:scsi_error_handler+0x290/0x4ac
 [<ffffffff88078f14>] :scsi_mod:scsi_error_handler+0x0/0x4ac
 [<ffffffff800a07c0>] keventd_create_kthread+0x0/0xc4
 [<ffffffff8003287b>] kthread+0xfe/0x132
 [<ffffffff8005dfb1>] child_rip+0xa/0x11
 [<ffffffff800a07c0>] keventd_create_kthread+0x0/0xc4
 [<ffffffff8003277d>] kthread+0x0/0x132
 [<ffffffff8005dfa7>] child_rip+0x0/0x11

Code: 0f 0b 68 dd e0 11 88 c2 9e 08 4d 8b 7c 24 10 4c 3b 3c 24 0f
RIP  [<ffffffff880ff793>] :lpfc:lpfc_abort_handler+0x58/0x33d
 RSP <ffff81007e705dd0>
<0>Kernel panic - not syncing: Fatal exception
I'll now try with the lpfc patch mentioned in the same bug 599487 - hopefully that should resolve the panic.

Meanwhile, tests on the QLogic host are running fine so far - I have not hit any delayed IO on it yet. And from /var/log/messages, I see both hunks of the patch being executed:

# cat /var/log/messages|grep ToUsNA
Aug 10 16:41:03 IBMx336-200-134 kernel: scsi 0:0:0:0: alua: port group 01 state N supports ToUsNA
Aug 10 16:41:04 IBMx336-200-134 kernel: scsi 0:0:0:11: alua: port group 01 state N supports ToUsNA
Aug 10 16:41:04 IBMx336-200-134 kernel: scsi 0:0:0:12: alua: port group 01 state N supports ToUsNA
Aug 10 16:41:04 IBMx336-200-134 kernel: scsi 0:0:0:13: alua: port group 01 state N supports ToUsNA
....

# cat /var/log/messages|grep alua_rtpg
Aug 10 16:41:03 IBMx336-200-134 kernel: scsi 0:0:0:0: alua_rtpg: trying submit_rtpg
Aug 10 16:41:04 IBMx336-200-134 kernel: scsi 0:0:0:11: alua_rtpg: trying submit_rtpg
Aug 10 16:41:04 IBMx336-200-134 kernel: scsi 0:0:0:12: alua_rtpg: trying submit_rtpg
Aug 10 16:41:04 IBMx336-200-134 kernel: scsi 0:0:0:14: alua_rtpg: trying submit_rtpg
....

# cat /var/log/messages|grep alua_prep
Aug 10 22:02:45 IBMx336-200-134 kernel: sd 0:0:0:1: alua_prep_fn: TPGS_STATE_TRANSITIONING
Aug 10 22:02:45 IBMx336-200-134 kernel: sd 0:0:0:1: alua_prep_fn: TPGS_STATE_TRANSITIONING
Aug 10 22:02:46 IBMx336-200-134 kernel: sd 0:0:0:1: alua_prep_fn: TPGS_STATE_TRANSITIONING
Aug 10 22:02:46 IBMx336-200-134 kernel: sd 0:0:0:1: alua_prep_fn: TPGS_STATE_TRANSITIONING
....

But I noticed that the alua_prep_fn messages shown above are logged for sd 0:0:0:1 alone.
(In reply to comment #10) > I'll now try with the lpfc patch mentioned in the same bug 599487 - hopefully > that should resolve the panic. > > Meanwhile tests on the QLogic host are running fine so far - I have not hit any > delayed IO on it yet. And from the /var/log/messages, I see both hunks of the > patch being executed: OK, I'll be posting the patch upstream as well as prep'ing a patch for 5.6. > # cat /var/log/messages|grep ToUsNA > Aug 10 16:41:03 IBMx336-200-134 kernel: scsi 0:0:0:0: alua: port group 01 state > N supports ToUsNA > Aug 10 16:41:04 IBMx336-200-134 kernel: scsi 0:0:0:11: alua: port group 01 > state N supports ToUsNA > Aug 10 16:41:04 IBMx336-200-134 kernel: scsi 0:0:0:12: alua: port group 01 > state N supports ToUsNA > Aug 10 16:41:04 IBMx336-200-134 kernel: scsi 0:0:0:13: alua: port group 01 > state N supports ToUsNA > .... OK, but to be clear, the "N" variety was always possible. The new code I added introduces messages with "T" like: scsi 1:0:3:0: alua: port group 01 state T supports ToUsNA > # cat /var/log/messages|grep alua_prep > Aug 10 22:02:45 IBMx336-200-134 kernel: sd 0:0:0:1: alua_prep_fn: > TPGS_STATE_TRANSITIONING > Aug 10 22:02:45 IBMx336-200-134 kernel: sd 0:0:0:1: alua_prep_fn: > TPGS_STATE_TRANSITIONING > Aug 10 22:02:46 IBMx336-200-134 kernel: sd 0:0:0:1: alua_prep_fn: > TPGS_STATE_TRANSITIONING > Aug 10 22:02:46 IBMx336-200-134 kernel: sd 0:0:0:1: alua_prep_fn: > TPGS_STATE_TRANSITIONING > .... > > But I noticed that the alua_prep_fn messages shown above is called for sd > 0:0:0:1 alone from the logs. OK, its not clear to me why TPGS_STATE_TRANSITIONING would be confined to that one device.
(In reply to comment #9)
> And now I have hit something worse. To avoid hitting bug 599487 on Emulex
> hosts, I turned the Emulex heartbeat parameter 'lpfc_enable_hba_heartbeat'
> off (set to 0), as recommended in that bug.
>
> And the host panicked during controller takeover/givebacks - this is with the
> alua debug kernel containing lpfc driver v8.2.0.63.3p:

I'm running that same kernel (2.6.18-194.11.1.el5.alua_dbg) without problems during the cf takeover/giveback test on a host with lpfc (0:8.2.0.63.3p). I guess I just haven't been unlucky enough to hit bug 599487 -- that said, I haven't disabled 'lpfc_enable_hba_heartbeat' either.
(In reply to comment #11)
> OK, but to be clear, the "N" variety was always possible. The new code I added
> introduces messages with "T" like:
>   scsi 1:0:3:0: alua: port group 01 state T supports ToUsNA

Yes, I see that as well:

# cat /var/log/messages|grep "state T"
Aug 10 19:59:27 IBMx336-200-134 kernel: sd 0:0:0:10: alua: port group 01 state T supports ToUsNA
Aug 10 22:02:45 IBMx336-200-134 kernel: sd 0:0:0:1: alua: port group 01 state T supports ToUsNA
Aug 10 22:22:38 IBMx336-200-134 kernel: sd 1:0:1:25: alua: port group 03 state T supports ToUsNA
....
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
(In reply to comment #11)
> (In reply to comment #10)
> > Meanwhile, tests on the QLogic host are running fine so far - I have not hit any
> > delayed IO on it yet. And from /var/log/messages, I see both hunks of the
> > patch being executed:
>
> OK, I'll be posting the patch upstream as well as prep'ing a patch for 5.6.

In response to having posted the patch upstream (to linux-scsi), Hannes Reinecke had the following insight:
http://www.spinics.net/lists/linux-scsi/msg46193.html

Hannes' first critique is actually what was intended by the patch:
"The path is retried indefinitely. Arrays are _supposed_ to be in 'transitioning' only temporary; however, if the array is stuck due to a fw error we're stuck in 'defer', too."

> > # cat /var/log/messages|grep alua_prep
> > Aug 10 22:02:45 IBMx336-200-134 kernel: sd 0:0:0:1: alua_prep_fn: TPGS_STATE_TRANSITIONING
> > Aug 10 22:02:45 IBMx336-200-134 kernel: sd 0:0:0:1: alua_prep_fn: TPGS_STATE_TRANSITIONING
> > Aug 10 22:02:46 IBMx336-200-134 kernel: sd 0:0:0:1: alua_prep_fn: TPGS_STATE_TRANSITIONING
> > Aug 10 22:02:46 IBMx336-200-134 kernel: sd 0:0:0:1: alua_prep_fn: TPGS_STATE_TRANSITIONING
> > ....
> >
> > But I noticed that the alua_prep_fn messages shown above are logged for sd
> > 0:0:0:1 alone.
>
> OK, it's not clear to me why TPGS_STATE_TRANSITIONING would be confined to that
> one device.

But Hannes' second point of critique may help explain the behaviour Martin saw:
"Secondly this path fails with 'directio' multipath checker. Remember that 'directio' is using 'fs' requests, not block-pc ones. Hence for all I/O the prep_fn() callback is evaluated, which will return 'DEFER' here once the path is in transitioning. And the state is never updated as RTPG is never called."

So I think the 2nd hunk of the patch (which modifies alua_prep_fn) needs to be dropped.

I'll ping Mike Christie to see what he thinks.
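To restate Hannes' second point as a sketch of the control flow with the comment#1 patch applied (assumptions: directio issues plain fs reads, and h->state is only refreshed when alua_rtpg() runs from the activate path, as Hannes' note implies):

	/*
	 * multipathd 'directio' checker issues a normal (fs) read
	 *   -> block layer calls alua_prep_fn()
	 *      -> h->state == TPGS_STATE_TRANSITIONING, so return BLKPREP_DEFER
	 *         (the read is requeued and never reaches the array)
	 *
	 * h->state is only refreshed by alua_rtpg(), and nothing in this
	 * deferred fs-request path ever triggers it, so the path can sit in
	 * 'transitioning' (and keep deferring) indefinitely.
	 */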
Just posted v3, which is the result of upstream review on linux-scsi: http://www.spinics.net/lists/linux-scsi/msg46988.html
in kernel-2.6.18-225.el5
You can download this test kernel (or newer) from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.
(In reply to comment #16)
> But Hannes' second point of critique may help explain the behaviour Martin saw:
> "Secondly this path fails with 'directio' multipath checker. Remember that
> 'directio' is using 'fs' requests, not block-pc ones. Hence for all I/O the
> prep_fn() callback is evaluated, which will return 'DEFER' here once the path
> is in transitioning. And the state is never updated as RTPG is never called."
>
> So I think the 2nd hunk of the patch (which modifies alua_prep_fn) needs to be
> dropped.
>
> I'll ping Mike Christie to see what he thinks.

So what's the final take on this? Is there a problem using the directio checker with the ALUA handler? i.e. should we switch to some other path checkers like tur or readsector0 when using the ALUA handler?
(In reply to comment #23)
> (In reply to comment #16)
> > But Hannes' second point of critique may help explain the behaviour Martin saw:
> > "Secondly this path fails with 'directio' multipath checker. Remember that
> > 'directio' is using 'fs' requests, not block-pc ones. Hence for all I/O the
> > prep_fn() callback is evaluated, which will return 'DEFER' here once the path
> > is in transitioning. And the state is never updated as RTPG is never called."
> >
> > So I think the 2nd hunk of the patch (which modifies alua_prep_fn) needs to be
> > dropped.
> >
> > I'll ping Mike Christie to see what he thinks.
>
> So what's the final take on this? Is there a problem using the directio checker
> with the ALUA handler? i.e. should we switch to some other path checkers like
> tur or readsector0 when using the ALUA handler?

The above concern with the directio path checker was specific to the patch that was being discussed upstream. This concern with directio has been resolved with the 5.6 fix (which is the equivalent of the upstream fix). So I'm not aware of any reason why directio should be avoided for ALUA.

Though Mike Christie did have some concern that directio could cause unnecessary transitions here:
https://bugzilla.redhat.com/show_bug.cgi?id=606259#c69

Setting needinfo to get Mike Christie's (or Ben Marzinski's) thoughts on tur vs directio w/ ALUA.
(In reply to comment #24)
> Setting needinfo to get Mike Christie's (or Ben Marzinski's) thoughts on tur vs
> directio w/ ALUA.

Ben actually responded to Mike Christie's question with this:
https://bugzilla.redhat.com/show_bug.cgi?id=606259#c75

So Martin, RHEL5 kernel >= 2.6.18-225.el5 has the fix, and Jarod provided a link to download this kernel in comment#19.

Have you tried this kernel with directio? Do you have a specific concern, or was your question just a continuation of the previous concern raised in bz#606259?
I think directio would only be a problem if multipathd was testing all paths. If it is only testing paths that are down, we should be ok.
(In reply to comment #25)
> (In reply to comment #24)
>
> So Martin, RHEL5 kernel >= 2.6.18-225.el5 has the fix, and Jarod provided a
> link to download this kernel in comment#19.
>
> Have you tried this kernel with directio?

No, I have not yet tried this.

> Do you have a specific concern, or was your question just a continuation of the
> previous concern raised in bz#606259?

My query was in the context of both - the directio concerns raised during the upstream discussion & the concerns raised by Ben & Mike Christie in bug 606259. Seeing these discussions, one does get the impression that you may run into problems if using directio - something that may be avoided with other checkers like tur. I am just looking for a confirmation of this from Red Hat.
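For reference, the checker being asked about is selected per device (or in the defaults section) of multipath.conf. A purely illustrative fragment of the switch in question - not a statement of what is supported, which is exactly what this comment asks Red Hat to confirm:

	device {
		vendor		"NETAPP"
		product		"LUN"
		path_checker	tur		# instead of directio
		# (remaining settings unchanged)
	}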
(In reply to comment #27)
> (In reply to comment #25)
> > Do you have a specific concern, or was your question just a continuation of the
> > previous concern raised in bz#606259?
>
> My query was in the context of both - the directio concerns raised during the
> upstream discussion

That discussion is independent of any code that has ever shipped in RHEL or upstream. It was a problem with a specific patch that was proposed. That patch was never used.

> & the concerns raised by Ben & Mike Christie in bug 606259.
> Seeing these discussions, one does get the impression that you may run into
> problems if using directio - something that may be avoided with other checkers
> like tur. I am just looking for a confirmation of this from Red Hat.

But I'll follow up with Ben on the use of directio, given his reply here:
https://bugzilla.redhat.com/show_bug.cgi?id=606259#c75

In comment#26, Mike Christie speculated that testing all paths with directio could be a problem. So we'll work on getting you confirmation. Thanks.
Without setting up a NetApp box to test this, I can't say for certain, but we've had the checker set to directio for a while now, and when I've used one in the past, I've never noticed any ping-ponging. Multipath does check both the active and the failed paths, but ping-ponging on an ALUA setup seems unlikely to me.

Do you know if reading a single sector's worth of IO from the non-optimal path will cause it to transition if you have an implicit ALUA setup? I assume not. On most arrays, the non-optimal path needs to receive significantly more IO than the optimal path for the array to switch which controller manages the LUN. Otherwise, what you have is an active/passive array that can automatically transfer the active path, which is not what ALUA is.

With multipathd, both paths get checked just as often, so the amount of checker IO should be the same; however, the optimal path gets all the IO coming to the multipath device, so there will never be a time when the non-optimal path is getting more IO than the optimal path.
Could we please have this ALUA transitioning fix backported to 5.5.z? That means backporting the jiffies-related fix from bug 556476 to 5.5.z as well.
*** Bug 606259 has been marked as a duplicate of this bug. ***
Reminder! There should be a fix present for this BZ in snapshot 3 -- unless otherwise noted in a previous comment. Please test and update this BZ with test results as soon as possible.
Any test results available here?
Action on NetApp to test this ASAP. Any results, Martin???
(In reply to comment #39)
> Action on NetApp to test this ASAP. Any results, Martin???

Martin,

The 5.6 kernel may be downloaded from here:
http://people.redhat.com/jwilson/el5/238.el5/
Test results look good. Updated 'PartnerVerified' accordingly.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0017.html