Bug 1278383

Summary: Data Tiering:detach tier commit with force is resulting in loss of IO(Transport endpoint is not connected)
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Nag Pavan Chilakam <nchilaka>
Component: tier Assignee: Mohammed Rafi KC <rkavunga>
Status: CLOSED NOTABUG QA Contact: Nag Pavan Chilakam <nchilaka>
Severity: high Docs Contact:
Priority: high    
Version: rhgs-3.1 CC: dlambrig, rhs-bugs, sankarshan, storage-qa-internal
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: tier-attach-detach
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1264926 Environment:
Last Closed: 2016-07-01 04:15:38 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1264926    
Bug Blocks: 1276742    

Description Nag Pavan Chilakam 2015-11-05 11:32:22 UTC
+++ This bug was initially created as a clone of Bug #1264926 +++

Description of problem:
========================
I issued a commit force directly on a tiered volume while I/O was in progress (for example, creating 10000 files). Many of the new file creates failed during the commit phase, as shown below:
touch: cannot touch ‘f995’: Transport endpoint is not connected
touch: cannot touch ‘f996’: Transport endpoint is not connected
touch: cannot touch ‘f997’: Transport endpoint is not connected
touch: cannot touch ‘f998’: Transport endpoint is not connected
touch: cannot touch ‘f999’: Transport endpoint is not connected
touch: cannot touch ‘f1000’: Transport endpoint is not connected
touch: cannot touch ‘f1001’: Transport endpoint is not connected
touch: cannot touch ‘f1002’: Transport endpoint is not connected
touch: cannot touch ‘f1003’: Transport endpoint is not connected
touch: cannot touch ‘f1004’: Transport endpoint is not connected
touch: cannot touch ‘f1005’: Transport endpoint is not connected
touch: cannot touch ‘f1006’: Transport endpoint is not connected
touch: cannot touch ‘f1007’: Transport endpoint is not connected
touch: cannot touch ‘f1008’: Transport endpoint is not connected
touch: cannot touch ‘f1009’: Transport endpoint is not connected


However, file creation resumed after a certain point.
In my case I ran touch f{1..10000} and issued a detach-tier force commit.
Files f840 to f1019 failed to be created, with the "Transport endpoint is not connected" error.
Files f1020 through f10000 were created.

[root@zod ~]# gluster --version
glusterfs 3.7.4 built on Sep 19 2015 01:30:43
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.
[root@zod ~]# rpm -qa|grep gluster
glusterfs-3.7.4-0.43.gitf139283.el7.centos.x86_64
glusterfs-fuse-3.7.4-0.43.gitf139283.el7.centos.x86_64
glusterfs-debuginfo-3.7.4-0.33.git1d02d4b.el7.centos.x86_64
glusterfs-api-3.7.4-0.43.gitf139283.el7.centos.x86_64
glusterfs-client-xlators-3.7.4-0.43.gitf139283.el7.centos.x86_64
glusterfs-server-3.7.4-0.43.gitf139283.el7.centos.x86_64
glusterfs-cli-3.7.4-0.43.gitf139283.el7.centos.x86_64
glusterfs-libs-3.7.4-0.43.gitf139283.el7.centos.x86_64
[root@zod ~]# 

Steps to Reproduce:
1. Create some 10000 files using touch.
2. Issue a detach-tier force commit while the creates are still running.
3. I/O errors will be seen for some time.
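
A minimal sketch of the I/O side of these steps, for anyone trying to reproduce. The function name create_files and the mount path in the usage note are illustrative assumptions, not part of the original report. It touches f1..fN one at a time and reports how many creates failed, so the size of the failure window can be compared across runs.

```shell
# create_files DIR COUNT
# Touch f1..fCOUNT under DIR one at a time (hypothetical helper).
# touch's own error messages go to stderr as usual; the number of
# failed creates is printed on stdout when the loop finishes.
create_files() {
    dir=$1
    count=$2
    failed=0
    i=1
    while [ "$i" -le "$count" ]; do
        # Keep going after a failure so later files are still attempted.
        touch "$dir/f$i" || failed=$((failed + 1))
        i=$((i + 1))
    done
    echo "$failed"
}
```

Usage, assuming the tiered volume is FUSE-mounted at /mnt/tiervol (a made-up path): run `create_files /mnt/tiervol 10000` in one terminal and issue the detach-tier force commit from another while it is still running.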

--- Additional comment from Dan Lambright on 2015-10-02 09:48:36 EDT ---

I believe an I/O error is unavoidable because the 'commit force' erases the tier translator in the middle of an I/O. Its job is to detach the tier immediately, regardless of what else is happening. Based on that, I would not call this a bug: the force option is dangerous but is working as intended. Let's discuss in scrum to make a final decision on what to do about this.

Comment 4 Mohammed Rafi KC 2016-07-01 04:15:38 UTC
detach force is the command used to forcefully remove the tier. With the force command we change the configuration immediately, so ongoing I/O is expected to fail. For this reason we give a warning about possible data loss.

The graceful way to remove a tier is detach start followed by detach commit.
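
A rough sketch of that graceful sequence, with some assumptions stated up front: the `gluster volume detach-tier <VOLNAME> start|status|commit` spelling is what I recall from the glusterfs 3.7-era CLI and should be checked against `gluster volume help` on the installed version; the helper names and the DRY_RUN switch are made up for illustration. By default the script only prints the commands instead of running them.

```shell
# DRY_RUN=1 (the default) prints each command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"

# run CMD ARGS...: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# graceful_detach VOLNAME (hypothetical helper)
# Subcommand spelling is an assumption based on the 3.7-era CLI.
graceful_detach() {
    vol=$1
    # 1. Start the detach without removing the tier yet.
    run gluster volume detach-tier "$vol" start
    # 2. Check progress. In real use, repeat this until the detach is
    #    reported complete; the status output is not parsed here.
    run gluster volume detach-tier "$vol" status
    # 3. Commit only after the status shows the detach has finished.
    run gluster volume detach-tier "$vol" commit
}
```

Calling `graceful_detach tiervol` (volume name made up) in dry-run mode prints the three commands in order. The difference from commit force is that the configuration changes only at the final commit, after the start phase has had time to finish.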

So I am closing this bug.