1643231 – [RFE] enable ALUA support at the gluster handler

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	tcmu-runner
Sub Component:
Version:	ocs-3.11
Hardware:	Unspecified
OS:	Unspecified
Priority:	urgent
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Xiubo Li
QA Contact:	Prasanth
Docs Contact:
URL:
Whiteboard:
Depends On:	1643195 1761365
Blocks:	OCS-3.11.1-devel-triage-done

Reported:	2018-10-25 17:52 UTC by Prasanna Kumar Kalever
Modified:	2021-05-06 08:25 UTC
CC List:	6 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2021-05-06 08:25:18 UTC
Embargoed:
Dependent Products:
Flags:	xiubli: needinfo-

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Bugzilla	1669500	0	unspecified	CLOSED	'hardware_handler "1 alua" ' missing from multipath.conf file	2021-02-22 00:41:40 UTC
Red Hat Bugzilla	1669984	0	unspecified	CLOSED	'hardware_handler "1 alua" ' missing from multipath.conf file	2021-02-22 00:41:40 UTC

Description Prasanna Kumar Kalever 2018-10-25 17:52:31 UTC

Description of problem:

Add ALUA support to the glfs handler of tcmu-runner.

ALUA is a must for gluster-block. Because of the design of LIO and tcmu, if one path is blocked for a long time (for example, due to a network problem), the IO requests on the client side will time out and the client will resend them through the other path. If the blocked path then recovers, it will continue sending the old IO requests to the backend, which may overwrite newer data and corrupt it.

### When does ALUA help?

Let me explain this with a simple example:
Example 1:
Say the initiator has sent a write request (let's call it cmd[0]) to tcmu-runner, which sends it down to glusterfs. For some reason cmd[0] is delayed at the glusterfs layer. In the meantime, a path switch happens, the same cmd[0] is resent by the application through node2, and gluster returns success. Now another write request (cmd[1]) arrives at the same offset as the previous command, and cmd[1] also succeeds. What if the lingering cmd[0] that was sent through node1 now takes effect? Corruption at that offset?
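
The ordering in Example 1 can be played out as a toy model. This is only a sketch: the byte-array "backing store" and the `apply_write` helper are hypothetical stand-ins for the gluster volume and are not part of tcmu-runner or glusterfs.

```python
# Toy model of Example 1. All names here are hypothetical and for
# illustration only; nothing below is tcmu-runner or glusterfs code.

def apply_write(store: bytearray, offset: int, data: bytes) -> None:
    """Apply a write to the backing store, as the backend would on completion."""
    store[offset:offset + len(data)] = data


backing_store = bytearray(b"........")

# cmd[0] is sent via node1 but is delayed in the glusterfs layer,
# so nothing has landed on the backing store yet.
delayed_cmd0 = (0, b"OLD!")

# The initiator times out, switches paths, and resends cmd[0] via node2.
apply_write(backing_store, 0, b"OLD!")

# cmd[1] then writes newer data at the same offset, also via node2.
apply_write(backing_store, 0, b"NEW!")
after_cmd1 = bytes(backing_store)

# Finally the delayed copy of cmd[0] on node1 completes and
# silently overwrites the data written by cmd[1].
apply_write(backing_store, *delayed_cmd0)
final_state = bytes(backing_store)

print(after_cmd1)
print(final_state)
```

The application was told that cmd[1] succeeded, yet the final contents are those of cmd[0].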

Example 2:
Say a write arrives at tcmu-runner (call it cmd[0]) and is delayed in the tcmu-runner layer for some reason (it has not been sent to gluster yet). If it is delayed too long, because of a resource crunch or a network disconnect between tcmu-runner and the initiator, there will be a path switch. Now consider the same case as before: cmd[0] is resent via path2, and a new write cmd[1] comes in at the same offset via path2. After this, what happens if cmd[0] is issued from tcmu-runner on node1 to glusterfs? Corruption?

Yes, this is what we want to solve with ALUA.
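
One way to picture the protection we are after is a backend that only accepts writes from the node that currently owns the active path. The sketch below assumes a simple ownership-plus-generation scheme; the `FencedStore` class and its methods are hypothetical and do not describe how tcmu-runner, LIO or glusterfs actually implement ALUA or locking.

```python
# Toy model: writes are accepted only from the current owner of the
# active path. The ownership/generation scheme is an assumption made
# for illustration, not the real tcmu-runner or glusterfs mechanism.

class FencedStore:
    def __init__(self, size: int) -> None:
        self.data = bytearray(b"." * size)
        self.owner = None
        self.generation = 0

    def acquire(self, node: str) -> int:
        """Make `node` the active path owner and invalidate earlier owners."""
        self.generation += 1
        self.owner = node
        return self.generation

    def write(self, node: str, generation: int, offset: int, payload: bytes) -> bool:
        """Apply the write only if it was issued under the current ownership."""
        if node != self.owner or generation != self.generation:
            return False  # stale path: the write is rejected, not applied
        self.data[offset:offset + len(payload)] = payload
        return True


store = FencedStore(8)

# node1 owns the active path; cmd[0] is issued under gen1 but gets delayed.
gen1 = store.acquire("node1")

# Path switch: node2 takes over, which invalidates node1's ownership.
gen2 = store.acquire("node2")
resent_ok = store.write("node2", gen2, 0, b"OLD!")  # cmd[0] resent via node2
cmd1_ok = store.write("node2", gen2, 0, b"NEW!")    # cmd[1] via node2

# The delayed cmd[0] from node1 finally arrives and is refused.
stale_ok = store.write("node1", gen1, 0, b"OLD!")
final_state = bytes(store.data)

print(resent_ok, cmd1_ok, stale_ok)
print(final_state)
```

In this model the late cmd[0] from node1 is dropped, so the data written by cmd[1] survives.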

Read more details at:
- https://github.com/gluster/glusterfs/issues/466#issuecomment-425428654
- https://github.com/gluster/gluster-block/issues/53#issuecomment-432924044

Comment 2 Niels de Vos 2019-02-07 11:18:53 UTC

What is the current status of this?

Bugs 1669500 and 1669984 have been reported for the multipath configuration and seem related. Is it expected that ALUA is configured already, without this BZ being addressed?

Comment 5 Prasanna Kumar Kalever 2020-02-28 06:59:52 UTC

Xiubo,

Please add the patch link and move this to POST.

thanks!
