Bug 1179560 - sanlock directio test file (__DIRECT_IO_TEST__) is triggering self heal
Summary: sanlock directio test file (__DIRECT_IO_TEST__) is triggering self heal
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: replicate
Version: rhgs-3.0
Hardware: x86_64
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Pranith Kumar K
QA Contact: SATHEESARAN
URL:
Whiteboard:
Depends On:
Blocks: Generic_Hyper_Converged_Host
 
Reported: 2015-01-07 05:49 UTC by Paul Cuzner
Modified: 2017-08-28 15:21 UTC
CC List: 13 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-08-28 15:21:28 UTC
Target Upstream Version:



Description Paul Cuzner 2015-01-07 05:49:14 UTC
Description of problem:
The __DIRECT_IO_TEST__ file used by sanlock is triggering self-heal activity on DC start-up, on VM start/shutdown cycles, and whenever probes are made to the glusterfs volume. The volume option network.remote-dio is set to on.
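
For reference, the probe in question is an O_DIRECT write at the root of the storage domain. A rough shell approximation from a client mount of the volume, using a placeholder mount point and a scratch file rather than the real __DIRECT_IO_TEST__, plus a check that remote-dio is actually in effect:

# Hypothetical client mount of the affected volume; adjust to your setup.
MNT=/mnt/glustervol

# O_DIRECT write similar to the direct-I/O probe (scratch file, not the real one).
dd if=/dev/zero of=$MNT/dio_probe oflag=direct bs=4096 count=1
rm -f $MNT/dio_probe

# If network.remote-dio has been set, it shows up under "Options Reconfigured".
gluster volume info <volname> | grep network.remote-dio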


Version-Release number of selected component (if applicable):
rhss 3.0.x, glusterfs 3.6.0.30 el7 builds

How reproducible:
I see this entry in the self-heal info output all the time.

Steps to Reproduce:
1. Build an environment based on RHEL 7.1 beta, glusterfs 3.6 el7, and RHEV-M 3.5 beta.
2. Activate the RHEV cluster.
3. Check the volume heal info output for the VM storage domain (see the sketch below).
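
A sketch of the check in step 3, with <volname> standing in for the VM storage-domain volume:

# Entries the self-heal daemon still considers pending:
gluster volume heal <volname> info

# Optionally, list genuine split-brain entries as well:
gluster volume heal <volname> info split-brain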

Actual results:
__DIRECT_IO_TEST__ keeps appearing in the self heal output

Expected results:
This file should not initiate any self heal/recovery action with remote-dio enabled.

Additional info:

Comment 4 Sahina Bose 2016-03-09 06:17:05 UTC
Sas, do you see this with a replica 3 volume and with sharding turned on?

Comment 5 Scott Harvanek 2016-07-25 16:02:41 UTC
I can confirm I see the same thing in a distributed-replicate volume:

gluster vol info gv0
 
Volume Name: gv0
Type: Distributed-Replicate
Volume ID: 08773fa0-d57d-4b0a-a517-eaba19e7d58c
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 172.16.17.1:/gluster/brick1/gv0
Brick2: 172.16.17.2:/gluster/brick1/gv0
Brick3: 172.16.17.3:/gluster/brick1/gv0
Brick4: 172.16.17.4:/gluster/brick1/gv0
Options Reconfigured:
performance.read-ahead: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
storage.owner-gid: 36
storage.owner-uid: 36
performance.readdir-ahead: on

gluster volume heal gv0 info
Brick 172.16.17.1:/gluster/brick1/gv0
/__DIRECT_IO_TEST__ 
Number of entries: 1

Brick 172.16.17.2:/gluster/brick1/gv0
/__DIRECT_IO_TEST__ 
Number of entries: 1

Brick 172.16.17.3:/gluster/brick1/gv0
Number of entries: 0

Brick 172.16.17.4:/gluster/brick1/gv0
Number of entries: 0
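
One way to tell whether an entry like this is genuinely pending or just noise is to look at the AFR changelog xattrs directly on the bricks. A rough sketch using the brick paths above (the exact trusted.afr.* attribute names depend on the volume name and client indices):

# Run on each brick host; non-zero trusted.afr.* values mean pending
# operations against the other replica, all-zero means nothing to heal.
getfattr -d -m . -e hex /gluster/brick1/gv0/__DIRECT_IO_TEST__

# If the entry is genuinely pending, an index heal can be kicked off with:
gluster volume heal gv0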

Comment 6 Scott Harvanek 2016-07-25 16:07:23 UTC
Slightly more updated versions, however:

RHEV-H 7.1 (20150603.0.el7ev). 7.2 sees it and wants to mark the array as inoperable; I'm not sure it's actually having the issue, though, as I have the 7.1 hosts in service and all the 7.2 hosts in MTX due to this.

Gluster server version 3.7.12

RHEV-M - 3.5.8-0.1

Comment 8 Pranith Kumar K 2017-02-10 07:13:54 UTC
Sas,
   Is this issue still re-creatable? We are doing the planning for 3.3.0. Let us know your inputs.

Comment 9 SATHEESARAN 2017-02-16 16:35:12 UTC
(In reply to Pranith Kumar K from comment #8)
> Sas,
>    Is this issue still re-creatable? We are doing the planning for 3.3.0.
> Let us know your inputs.

I am not seeing this issue with Gluster 3.8.4 and oVirt 4.1.

But I remember Kasturi reporting such an issue with an arbiter volume.
Let me redo the test with arbiter and raise a bug accordingly if the issue is seen.

Comment 10 SATHEESARAN 2017-02-16 16:36:42 UTC
(In reply to Scott Harvanek from comment #6)
> Slightly more updated versions however-
> 
> RHEV-H 7.1 (20150603.0.el7ev). 7.2 sees it and wants to mark the array as
> inoperable; I'm not sure it's actually having the issue, though, as I have
> the 7.1 hosts in service and all the 7.2 hosts in MTX due to this.
> 
> Gluster server version 3.7.12
> 
> RHEV-M - 3.5.8-0.1

Scott,

Kindly check whether you are seeing this issue with oVirt 4.1 and Gluster 3.8.

Comment 11 Scott Harvanek 2017-02-16 16:45:34 UTC
My issue was related to the arrangement of my Distributed-Replicate volume being unsupported; I moved away from 2x2 and haven't had an issue since.

Comment 12 SATHEESARAN 2017-02-16 17:51:27 UTC
(In reply to Scott Harvanek from comment #11)
> My issue was related to the arrangement of my Distributed-Replicate volume
> being unsupported; I moved away from 2x2 and haven't had an issue since.

Thanks Scott, nice to hear that. It's not that distributed-replicate is unsupported; rather, replica 2 is prone to split-brain issues. It's the replica 3 flavor that provides better consistency and availability (to a certain extent).
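
For anyone wanting to move off replica 2, a 2 x 2 volume like the one in comment 5 can be grown to replica 3 by adding one brick per replica set. A sketch with hypothetical hosts and brick paths:

# Placeholder bricks: one new brick per existing replica pair.
gluster volume add-brick gv0 replica 3 \
    172.16.17.5:/gluster/brick1/gv0 172.16.17.6:/gluster/brick1/gv0

# Populate the new bricks from the existing copies:
gluster volume heal gv0 full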

Comment 14 Sahina Bose 2017-08-28 10:55:41 UTC
I think we can close this based on Comment 11 and comment 9?

Comment 15 SATHEESARAN 2017-08-28 15:21:28 UTC
(In reply to Sahina Bose from comment #14)
> I think we can close this based on Comment 11 and comment 9?

Yes, that makes sense. 

I am closing this bug as CLOSED CURRENTRELEASE, as this issue was not reproducible with RHGS 3.2.0.

