Bug 1340049
Summary: | Gluster related xattr are not present on the brick after workload is run in the volume | |
---|---|---|---
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Shekhar Berry <shberry>
Component: | glusterd | Assignee: | Atin Mukherjee <amukherj>
Status: | CLOSED NOTABUG | QA Contact: | storage-qa-internal <storage-qa-internal>
Severity: | high | Docs Contact: |
Priority: | unspecified | |
Version: | rhgs-3.1 | CC: | eboyd, jeder, jharriga, mliyazud, rcyriac, rhs-bugs, shberry, storage-qa-internal, vbellur
Target Milestone: | --- | Keywords: | ZStream
Target Release: | --- | |
Hardware: | x86_64 | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2016-06-03 05:41:52 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Description Shekhar Berry 2016-05-26 11:22:05 UTC
Link to log files: http://perf1.perf.lab.eng.bos.redhat.com/pub/mpillai/aplo/

So here is the analysis:

From gprfs13 cmd_history.log:

[2016-05-25 05:47:53.805251] : volume create cgluster replica 2 172.17.40.13:/bricks/b/g 172.17.40.14:/bricks/b/g 172.17.40.15:/bricks/b/g 172.17.40.16:/bricks/b/g 172.17.40.22:/bricks/b/g 172.17.40.24:/bricks/b/g : SUCCESS
[2016-05-25 05:47:54.632645] : v start cgluster : SUCCESS

So the volume was created and started at 05:47.

From gprfs14 bricks/bricks-b-g.log:

[2016-05-25 10:26:15.925553] W [MSGID: 113075] [posix-helpers.c:1824:posix_health_check_thread_proc] 0-cgluster-posix: health_check on /bricks/b/g returned [No such file or directory]
[2016-05-25 10:26:15.925563] M [MSGID: 113075] [posix-helpers.c:1845:posix_health_check_thread_proc] 0-cgluster-posix: health-check failed, going down
[2016-05-25 10:26:45.965533] M [MSGID: 113075] [posix-helpers.c:1851:posix_health_check_thread_proc] 0-cgluster-posix: still alive! -> SIGTERM
[2016-05-25 10:26:45.982655] W [glusterfsd.c:1251:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dc5) [0x7f241b8f6dc5] -->/usr/sbin/glusterfsd(glusterfs_sigwaiter+0xe5) [0x7f241cf70915] -->/usr/sbin/glusterfsd(cleanup_and_exit+0x6b) [0x7f241cf7078b] ) 0-: received signum (15), shutting down

The first log entry shows that the posix health check failed with ENOENT, which means the brick path was deleted manually while the volume was up and running.

From gprfs13 cmd_history.log:

[2016-05-25 11:25:27.752059] : v stop cgluster : SUCCESS

The volume was stopped. Note that volume stop succeeds even if the brick path has been deleted.

[2016-05-25 11:25:40.199575] : v start cgluster : FAILED : Pre Validation failed on 172.17.40.14. Failed to get extended attribute trusted.glusterfs.volume-id for brick dir /bricks/b/g. Reason : No data available

Volume start failed here. This indicates that the brick directory exists again; otherwise the failure reason would have been that the brick does not exist. Instead, glusterd could not find the xattrs on the brick, which indicates the brick directory was recreated manually after the volume had already been configured; the xattrs were lost in the process and the volume could not be started.

From the look of the log files this is a setup issue: the bricks were removed manually while the volume still existed, which is *not supported*.

Shekhar,

As discussed, do you mind closing this bug now, given that you are unable to hit it? Feel free to reopen if you hit it again.

~Atin

I am closing this bug now; please feel free to reopen if you hit it again.

As Atin mentioned, it's closed.
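For reference, one way to verify whether the volume-id xattr is present on a brick path is getfattr (from the attr package, run as root on the brick host); the brick path below is the one from this setup, and the commands are only a verification sketch, not part of the original report:

    # dump all xattrs (including the trusted.* namespace) on the brick root, hex-encoded
    getfattr -d -m . -e hex /bricks/b/g

    # or query the volume-id xattr directly
    getfattr -n trusted.glusterfs.volume-id -e hex /bricks/b/g

On a healthy brick the output includes trusted.glusterfs.volume-id; its absence, as in the "No data available" pre-validation error above, means the directory was recreated without the xattrs that glusterd stamped on it at volume-create time.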