Description of problem:

My testing environment
----------------------
1) 6-node OSE with 1 master and 5 nodes running OSE v3.2
2) 6 RHGS PODs created on the same OSE environment (containerized Gluster)
3) Gluster volume mounted on application PODs using the GlusterFS volume plugin

Whenever I run sysbench with a mariadb workload on my Gluster volume, the IOPS
is served by only one set of bricks in the 3x2 Distributed-Replicate volume.
The other two replica pairs never get any IO. Once the test is over, the
Gluster-related xattrs are missing from those bricks, which is why they were
not serving IOPS.

Version-Release number of selected component (if applicable):

rpm -qa | grep gluster
glusterfs-libs-3.7.9-5.el7rhgs.x86_64
glusterfs-3.7.9-5.el7rhgs.x86_64
glusterfs-api-3.7.9-5.el7rhgs.x86_64
glusterfs-cli-3.7.9-5.el7rhgs.x86_64
glusterfs-server-3.7.9-5.el7rhgs.x86_64
glusterfs-client-xlators-3.7.9-5.el7rhgs.x86_64
glusterfs-fuse-3.7.9-5.el7rhgs.x86_64
glusterfs-geo-replication-3.7.9-5.el7rhgs.x86_64

rpm -qa | grep atomic
atomic-openshift-clients-3.2.0.15-1.git.0.e88b10d.el7.x86_64
atomic-openshift-3.2.0.15-1.git.0.e88b10d.el7.x86_64
atomic-openshift-master-3.2.0.15-1.git.0.e88b10d.el7.x86_64
tuned-profiles-atomic-openshift-node-3.2.0.15-1.git.0.e88b10d.el7.x86_64
atomic-openshift-sdn-ovs-3.2.0.15-1.git.0.e88b10d.el7.x86_64
atomic-openshift-node-3.2.0.15-1.git.0.e88b10d.el7.x86_64

How reproducible:

Issue encountered:

Step 1: Creation of containerized Gluster (go to Step 5 to view the issue directly)
-----------------------------------------------------------------------------------
6 RHGS PODs created on OSE v3.2:

rhgs-0    0/1    Running    0    13m
rhgs-1    0/1    Running    0    13m
rhgs-2    0/1    Running    0    13m
rhgs-3    0/1    Running    0    13m
rhgs-4    0/1    Running    0    13m
rhgs-5    0/1    Running    0    13m

A single distributed-replicated (replica 2) Gluster volume is carved from the above pods.
oc exec rhgs-0 -- gluster volume create cgluster replica 2 \
    172.17.40.13:/bricks/b/g 172.17.40.14:/bricks/b/g \
    172.17.40.15:/bricks/b/g 172.17.40.16:/bricks/b/g \
    172.17.40.22:/bricks/b/g 172.17.40.24:/bricks/b/g

oc exec rhgs-0 -- gluster vol info

Volume Name: cgluster
Type: Distributed-Replicate
Volume ID: 7fef1e8f-2dbd-42e8-a3fa-eaa26c9e7489
Status: Started
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: 172.17.40.13:/bricks/b/g
Brick2: 172.17.40.14:/bricks/b/g
Brick3: 172.17.40.15:/bricks/b/g
Brick4: 172.17.40.16:/bricks/b/g
Brick5: 172.17.40.22:/bricks/b/g
Brick6: 172.17.40.24:/bricks/b/g
Options Reconfigured:
performance.readdir-ahead: on

Step 2: Populating gluster with files and verifying it is working
-----------------------------------------------------------------
mkdir -pv /mnt/glusterfs
mount -t glusterfs 172.17.40.13:/cgluster /mnt/glusterfs
touch /mnt/glusterfs/abcd{1..10}

oc exec rhgs-0 -- ls -lR /bricks/b/g/
/bricks/b/g/:
total 0
-rw-r--r--. 2 root root 0 May 25 01:53 abcd2
-rw-r--r--. 2 root root 0 May 25 01:53 abcd3
-rw-r--r--. 2 root root 0 May 25 01:53 abcd4
-rw-r--r--. 2 root root 0 May 25 01:53 abcd5
-rw-r--r--. 2 root root 0 May 25 01:53 abcd7

oc exec rhgs-2 -- ls -lR /bricks/b/g/
/bricks/b/g/:
total 0
-rw-r--r--. 2 root root 0 May 25 01:53 abcd10
-rw-r--r--. 2 root root 0 May 25 01:53 abcd6
-rw-r--r--. 2 root root 0 May 25 01:53 abcd9

oc exec rhgs-4 -- ls -lR /bricks/b/g/
/bricks/b/g/:
total 0
-rw-r--r--. 2 root root 0 May 25 01:53 abcd1
-rw-r--r--. 2 root root 0 May 25 01:53 abcd8

As you can see above, files are distributed to all servers as per DHT. It is a
3x2 volume, so I am showing the output of only three bricks (one per replica pair).
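As a side note, a short illustrative sketch (mine, not part of the report): in a replica-2 volume, consecutive bricks in the "gluster volume create" argument list form replica pairs, which is how the six bricks above become the 3 x 2 = 6 layout reported by "gluster vol info".

```python
# Sketch: group the brick list from the create command above into replica sets.
bricks = [
    "172.17.40.13:/bricks/b/g", "172.17.40.14:/bricks/b/g",
    "172.17.40.15:/bricks/b/g", "172.17.40.16:/bricks/b/g",
    "172.17.40.22:/bricks/b/g", "172.17.40.24:/bricks/b/g",
]
replica = 2

# Consecutive bricks in the argument list form one replica set each.
pairs = [bricks[i:i + replica] for i in range(0, len(bricks), replica)]
for n, pair in enumerate(pairs, start=1):
    print(f"replica set {n}: {pair}")
```

This is why .13/.14, .15/.16, and .22/.24 hold identical file sets in the listings above.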
rm -rf /mnt/glusterfs/*
umount /mnt/glusterfs

Step 3: Creation of application PODs and mounting gluster inside the POD using
the gluster volume plugin
------------------------------------------------------------------------------
pod-0 below is the application POD on node1 of the 6-node OSE v3.2 cluster:

oc get pods -o wide
NAME                      READY     STATUS    RESTARTS   AGE   NODE
docker-registry-4-tx3s3   1/1       Running   4          40d   ose3-node2.example.com
pod-0                     1/1       Running   0          22s   ose3-node1.example.com
rhgs-0                    0/1       Running   0          33m   ose3-master.example.com
rhgs-1                    0/1       Running   0          33m   ose3-node1.example.com
rhgs-2                    0/1       Running   0          33m   ose3-node2.example.com
rhgs-3                    0/1       Running   0          33m   ose3-node3.example.com
rhgs-4                    0/1       Running   0          33m   ose3-node4.example.com
rhgs-5                    0/1       Running   0          33m   ose3-node5.example.com
router-2-fatmi            1/1       Running   8          40d   ose3-node3.example.com

On node1, the mount command shows the details of the gluster volume plugin mount:

mount | grep gluster
172.17.40.13:cgluster on /var/lib/origin/openshift.local.volumes/pods/b3360cf1-2240-11e6-aa52-782bcb736d36/volumes/kubernetes.io~glusterfs/glusterfsvol type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)

A df inside the POD shows the gluster volume mounted on /mnt/glusterfs:

oc exec pod-0 -- df | grep gluster
172.17.40.13:cgluster  1456411776  545044608  911367168  38%  /mnt/glusterfs

Step 4: Creation of files on the gluster mount point inside the POD, and
checking where the files land on the RHGS servers
------------------------------------------------------------------------
oc exec pod-0 -- touch /mnt/glusterfs/abcd{1..10}

The files get distributed as per DHT:

oc exec rhgs-0 -- ls -lR /bricks/b/g/
/bricks/b/g/:
total 0
-rw-r--r--. 2 root root 0 May 25 01:53 abcd2
-rw-r--r--. 2 root root 0 May 25 01:53 abcd3
-rw-r--r--. 2 root root 0 May 25 01:53 abcd4
-rw-r--r--. 2 root root 0 May 25 01:53 abcd5
-rw-r--r--. 2 root root 0 May 25 01:53 abcd7

oc exec rhgs-2 -- ls -lR /bricks/b/g/
/bricks/b/g/:
total 0
-rw-r--r--. 2 root root 0 May 25 01:53 abcd10
-rw-r--r--. 2 root root 0 May 25 01:53 abcd6
-rw-r--r--. 2 root root 0 May 25 01:53 abcd9

oc exec rhgs-4 -- ls -lR /bricks/b/g/
/bricks/b/g/:
total 0
-rw-r--r--. 2 root root 0 May 25 01:53 abcd1
-rw-r--r--. 2 root root 0 May 25 01:53 abcd8

Step 5: Executed the sysbench test on the gluster volume and checked the status
after the test completed
-------------------------------------------------------------------------------
Only two of the gluster bricks were active and handling IO. The remaining four
bricks had failed.

gluster v status
Status of volume: cgluster
Gluster process                       TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 172.17.40.13:/bricks/b/g        49152     0          Y       464
Brick 172.17.40.14:/bricks/b/g        N/A       N/A        N       N/A
Brick 172.17.40.15:/bricks/b/g        N/A       N/A        N       N/A
Brick 172.17.40.16:/bricks/b/g        49152     0          Y       364
Brick 172.17.40.22:/bricks/b/g        N/A       N/A        N       N/A
Brick 172.17.40.24:/bricks/b/g        N/A       N/A        N       N/A
NFS Server on localhost               2049      0          Y       486
Self-heal Daemon on localhost         N/A       N/A        Y       491
NFS Server on 172.17.40.16            2049      0          Y       386
Self-heal Daemon on 172.17.40.16      N/A       N/A        Y       391
NFS Server on 172.17.40.24            2049      0          Y       378
Self-heal Daemon on 172.17.40.24      N/A       N/A        Y       383
NFS Server on 172.17.40.14            2049      0          Y       385
Self-heal Daemon on 172.17.40.14      N/A       N/A        Y       390
NFS Server on 172.17.40.22            2049      0          Y       376
Self-heal Daemon on 172.17.40.22      N/A       N/A        Y       381
NFS Server on 172.17.40.15            2049      0          Y       397
Self-heal Daemon on 172.17.40.15      N/A       N/A        Y       402

Task Status of Volume cgluster
------------------------------------------------------------------------------
There are no active volume tasks

I stopped the gluster volume and tried to restart it. It gave the following error:

gluster v start cgluster
volume start: cgluster: failed: Pre Validation failed on 172.17.40.14.
Failed to get extended attribute trusted.glusterfs.volume-id for brick dir /bricks/b/g. Reason : No data available
Pre Validation failed on 172.17.40.15. Failed to get extended attribute trusted.glusterfs.volume-id for brick dir /bricks/b/g. Reason : No data available
Pre Validation failed on 172.17.40.24. Failed to get extended attribute trusted.glusterfs.volume-id for brick dir /bricks/b/g. Reason : No data available
Pre Validation failed on 172.17.40.22. Failed to get extended attribute trusted.glusterfs.volume-id for brick dir /bricks/b/g. Reason : No data available

Here is the output of 'getfattr -d -m . -e hex /bricks/b/g/' from all the
servers where my Gluster PODs were created:

--- 172.17.40.13 ---
getfattr: Removing leading '/' from absolute path names
# file: bricks/b/g/
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.cgluster-client-1=0x000000000000000000000005
trusted.afr.dirty=0x000000000000000000000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x0000000100000000aaaaaaaaffffffff
trusted.glusterfs.volume-id=0x7fef1e8f2dbd42e8a3faeaa26c9e7489

--- 172.17.40.14 ---
getfattr: Removing leading '/' from absolute path names
# file: bricks/b/g/
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.glusterfs.dht=0x0000000100000000aaaaaaaaffffffff

--- 172.17.40.15 ---
getfattr: Removing leading '/' from absolute path names
# file: bricks/b/g/
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.glusterfs.dht=0x00000001000000000000000055555554

--- 172.17.40.16 ---
getfattr: Removing leading '/' from absolute path names
# file: bricks/b/g/
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.cgluster-client-2=0x000000000000000000000003
trusted.afr.dirty=0x000000000000000000000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x00000001000000000000000055555554
trusted.glusterfs.volume-id=0x7fef1e8f2dbd42e8a3faeaa26c9e7489

--- 172.17.40.22 ---
getfattr: Removing leading '/' from absolute path names
# file: bricks/b/g/
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.glusterfs.dht=0x000000010000000055555555aaaaaaa9

--- 172.17.40.24 ---
getfattr: Removing leading '/' from absolute path names
# file: bricks/b/g/
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.glusterfs.dht=0x000000010000000055555555aaaaaaa9

I have attached the /var/log/glusterfs/* files from each server below.
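To make the hex values above easier to read, here is a small decoding sketch (my own illustration, not from the report; the field layout of trusted.glusterfs.dht as four big-endian 32-bit integers — count, type, start, stop — is my reading of the DHT layout format). Note that the volume-id decoded from 172.17.40.13 matches the Volume ID shown by 'gluster vol info', while the failed bricks (.14, .15, .22, .24) have no trusted.glusterfs.volume-id at all, which is exactly what the "No data available" pre-validation error complains about.

```python
import struct
import uuid

def decode_volume_id(hexval):
    # trusted.glusterfs.volume-id stores the raw 16-byte volume UUID.
    return str(uuid.UUID(bytes=bytes.fromhex(hexval[2:])))  # [2:] strips "0x"

def decode_dht(hexval):
    # Assumed layout: four big-endian 32-bit fields (count, type, start, stop).
    cnt, typ, start, stop = struct.unpack(">IIII", bytes.fromhex(hexval[2:]))
    return (start, stop)

# Matches "Volume ID: 7fef1e8f-2dbd-42e8-a3fa-eaa26c9e7489" from vol info.
print(decode_volume_id("0x7fef1e8f2dbd42e8a3faeaa26c9e7489"))

# The three replica pairs tile the 32-bit DHT hash ring between them.
for h in ("0x00000001000000000000000055555554",   # .15/.16
          "0x000000010000000055555555aaaaaaa9",   # .22/.24
          "0x0000000100000000aaaaaaaaffffffff"):  # .13/.14
    start, stop = decode_dht(h)
    print(f"hash range 0x{start:08x} - 0x{stop:08x}")
```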
Link to Log files: http://perf1.perf.lab.eng.bos.redhat.com/pub/mpillai/aplo/
So here is the analysis:

From gprfs13 cmd_history.log:

[2016-05-25 05:47:53.805251] : volume create cgluster replica 2 172.17.40.13:/bricks/b/g 172.17.40.14:/bricks/b/g 172.17.40.15:/bricks/b/g 172.17.40.16:/bricks/b/g 172.17.40.22:/bricks/b/g 172.17.40.24:/bricks/b/g : SUCCESS
[2016-05-25 05:47:54.632645] : v start cgluster : SUCCESS

So the volume was created and started at 05:47.

From gprfs14 bricks/bricks-b-g.log:

[2016-05-25 10:26:15.925553] W [MSGID: 113075] [posix-helpers.c:1824:posix_health_check_thread_proc] 0-cgluster-posix: health_check on /bricks/b/g returned [No such file or directory]
[2016-05-25 10:26:15.925563] M [MSGID: 113075] [posix-helpers.c:1845:posix_health_check_thread_proc] 0-cgluster-posix: health-check failed, going down
[2016-05-25 10:26:45.965533] M [MSGID: 113075] [posix-helpers.c:1851:posix_health_check_thread_proc] 0-cgluster-posix: still alive! -> SIGTERM
[2016-05-25 10:26:45.982655] W [glusterfsd.c:1251:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dc5) [0x7f241b8f6dc5] -->/usr/sbin/glusterfsd(glusterfs_sigwaiter+0xe5) [0x7f241cf70915] -->/usr/sbin/glusterfsd(cleanup_and_exit+0x6b) [0x7f241cf7078b] ) 0-: received signum (15), shutting down

The first log entry indicates that the posix health check failed with ENOENT,
which means the brick path was deleted manually, and that too while the volume
was already up and running.

From gprfs13 cmd_history.log:

[2016-05-25 11:25:27.752059] : v stop cgluster : SUCCESS

The volume was stopped. Note that even if the brick path has been deleted,
volume stop will still succeed.

[2016-05-25 11:25:40.199575] : v start cgluster : FAILED : Pre Validation failed on 172.17.40.14. Failed to get extended attribute trusted.glusterfs.volume-id for brick dir /bricks/b/g. Reason : No data available

The volume start failed here. This indicates that the brick directory now
exists; otherwise it would have failed with a different error saying the brick
doesn't exist.
But it failed to find the xattrs, which indicates that the brick directory was
recreated manually. Since the volume was already configured, the xattrs were
lost and the volume couldn't be started. From the log files this definitely
looks like a setup issue, where bricks were removed manually while the volume
still existed, which is *not supported*.
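For readers unfamiliar with the health-check behaviour referenced above, here is a minimal sketch (my own illustration, not GlusterFS source): the posix health-check thread periodically stats the brick path, and when the directory has been removed out from under a running brick, the stat fails with ENOENT and the brick process shuts itself down, matching the "health-check failed, going down" log entries.

```python
import errno
import os
import tempfile

def health_check(brick_path):
    """Return None if the brick path is healthy, or the errno of the failure."""
    try:
        os.stat(brick_path)
        return None
    except OSError as e:
        return e.errno

tmp = tempfile.mkdtemp()
brick = os.path.join(tmp, "g")
os.mkdir(brick)

print(health_check(brick))   # healthy: prints None
os.rmdir(brick)              # simulate manual deletion of the brick path
print(health_check(brick))   # now fails with ENOENT: "going down"
os.rmdir(tmp)                # clean up
```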
Shekhar,

As discussed, do you mind closing this bug now, given that you are unable to
hit it again? Feel free to reopen if you hit it again.

~Atin
I am closing this bug now. Please feel free to reopen if you hit it again.
As Atin mentioned, it's closed.