Bug 1643231 - [RFE] enable ALUA support at the gluster handler
Summary: [RFE] enable ALUA support at the gluster handler
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: tcmu-runner
Version: ocs-3.11
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Xiubo Li
QA Contact: Prasanth
URL:
Whiteboard:
Depends On: 1643195 1761365
Blocks: OCS-3.11.1-devel-triage-done
Reported: 2018-10-25 17:52 UTC by Prasanna Kumar Kalever
Modified: 2021-05-06 08:25 UTC
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-05-06 08:25:18 UTC
Embargoed:
xiubli: needinfo-



Links
| System | ID | Private | Priority | Status | Summary | Last Updated |
|---|---|---|---|---|---|---|
| Red Hat Bugzilla | 1669500 | 0 | unspecified | CLOSED | 'hardware_handler "1 alua" ' missing from multipath.conf file | 2021-02-22 00:41:40 UTC |
| Red Hat Bugzilla | 1669984 | 0 | unspecified | CLOSED | 'hardware_handler "1 alua" ' missing from multipath.conf file | 2021-02-22 00:41:40 UTC |

Description Prasanna Kumar Kalever 2018-10-25 17:52:31 UTC
Description of problem:

Add ALUA support to the glfs handler of tcmu-runner.

ALUA is a must for gluster-block. Because of the design of LIO and tcmu, if one path is blocked for a long time (for example, because of a network problem), the I/O requests on the client side time out and the client resends them through the other path. If the blocked path then recovers, it continues sending the old I/O requests to the backend, which may overwrite newer data and corrupt it.

### When does ALUA help?

Let me explain this with a simple example:
Example 1:
Say the initiator sends a write request (call it cmd[0]) to tcmu-runner on node 1, which sends it down to glusterfs. For some reason cmd[0] is delayed at the glusterfs layer. Meanwhile, a path switch happens, the application resends the same cmd[0] through node 2, and gluster returns success. Then another write request (cmd[1]) arrives at the same offset as the previous command; assume cmd[1] also succeeds. Now what if the cmd[0] that was sent through node 1 and is still lingering finally goes into action? Corruption at that offset?
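
The ordering problem in Example 1 can be reproduced with a toy model. The sketch below is only an illustration: the dict standing in for the block device, the `replay` helper, the offset value, and the payloads are all made up and have nothing to do with the actual tcmu-runner or glusterfs code.

```python
# Toy model of Example 1. The "device" is just a dict of offset -> data.
# All names and values here are hypothetical, for illustration only.

def replay(events):
    """Apply (node, cmd, offset, data) writes in the order the backend sees them."""
    device = {}
    for _node, _cmd, offset, data in events:
        device[offset] = data  # last writer wins; the backend cannot tell stale from fresh
    return device

OFFSET = 4096

# Order in which the writes finally reach the backend:
backend_order = [
    ("node2", "cmd[0]", OFFSET, b"old"),  # retried via node 2 after the path switch
    ("node2", "cmd[1]", OFFSET, b"new"),  # newer write to the same offset
    ("node1", "cmd[0]", OFFSET, b"old"),  # delayed original from node 1 lands last
]

device = replay(backend_order)
print(device[OFFSET])  # b'old': the stale cmd[0] has overwritten cmd[1]
```

The application was told that cmd[1] succeeded, yet the block ends up holding cmd[0]'s payload.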

Example 2:
Say a write arrives at tcmu-runner (call it cmd[0]) and is delayed in the tcmu-runner layer for some reason, so it has not been sent to gluster yet. If it is delayed too long, for reasons such as a resource crunch or a network disconnect between tcmu-runner and the initiator, there will be a path switch. Now consider the same case as before: cmd[0] is resent via path 2, and a new write cmd[1] comes in at the same offset, also via path 2. After this, what happens if the delayed cmd[0] is issued from tcmu-runner on node 1 to glusterfs? Corruption?

Yes, this is what we want to solve with ALUA.
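
To make the desired guarantee concrete, here is a minimal sketch in which the backend only accepts writes from the node that currently owns the device. This is an assumption made for illustration: the `Backend` class, its `acquire`/`write` methods, and the single-owner token are hypothetical and are not the tcmu-runner, LIO, or glusterfs API. How ALUA and the glfs handler would actually enforce ownership is covered in the issues linked in this report, not here.

```python
# Hypothetical single-owner guard; not real tcmu-runner or glusterfs code.

class Backend:
    """Toy backend that rejects writes from any node that is not the current owner."""

    def __init__(self):
        self.owner = None   # node holding the (made-up) ownership token
        self.blocks = {}    # offset -> data

    def acquire(self, node):
        # Called when a path becomes active; the previous owner loses the token.
        self.owner = node

    def write(self, node, offset, data):
        if node != self.owner:
            return False    # stale path: the write is refused
        self.blocks[offset] = data
        return True


backend = Backend()
backend.acquire("node1")                          # node 1 starts as the active path
# cmd[0] is sent via node 1 but stalls before reaching the backend ...
backend.acquire("node2")                          # path switch: node 2 takes over
ok_retry = backend.write("node2", 4096, b"old")   # cmd[0] retried via node 2
ok_new = backend.write("node2", 4096, b"new")     # cmd[1]
ok_stale = backend.write("node1", 4096, b"old")   # delayed cmd[0] from node 1

print(ok_retry, ok_new, ok_stale)
print(backend.blocks[4096])
```

With the guard in place, the late cmd[0] from node 1 is refused and the block keeps cmd[1]'s data, which is the outcome both examples ask for.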

Read more details at:
- https://github.com/gluster/glusterfs/issues/466#issuecomment-425428654
- https://github.com/gluster/gluster-block/issues/53#issuecomment-432924044

Comment 2 Niels de Vos 2019-02-07 11:18:53 UTC
What is the current status of this?

Bugs 1669500 and 1669984 have been reported for the multipath configuration and seem related. Is it expected that ALUA is configured already, without this BZ being addressed?

Comment 5 Prasanna Kumar Kalever 2020-02-28 06:59:52 UTC
Xiubo,

Please add the patch link and move this to POST.

thanks!
