Description of problem:
Mount point is inaccessible when accessed.

Version-Release number of selected component (if applicable):
[root@dhcp46-231 gluster]# rpm -qa | grep gluster
gluster-nagios-addons-0.2.7-1.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-2.26.git0a405a4.el7rhgs.x86_64
glusterfs-server-3.8.4-2.26.git0a405a4.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
glusterfs-3.8.4-2.26.git0a405a4.el7rhgs.x86_64
glusterfs-api-3.8.4-2.26.git0a405a4.el7rhgs.x86_64
glusterfs-cli-3.8.4-2.26.git0a405a4.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-2.26.git0a405a4.el7rhgs.x86_64
vdsm-gluster-4.17.33-1.el7rhgs.noarch
glusterfs-libs-3.8.4-2.26.git0a405a4.el7rhgs.x86_64
glusterfs-fuse-3.8.4-2.26.git0a405a4.el7rhgs.x86_64

How reproducible:
Hit it once. Logs placed at rhsqe-repo.lab.eng.blr.redhat.com:/var/www/html/sosreports/<bug>

Steps performed:
1. Create a 1x3 arbiter volume named mdcache.
2. Mount the volume at /mnt on two different clients.
3. Replace brick0 with a new brick; check heal info and wait for the heal to complete.
4. touch files{1..10000} from the first client.
5. Replace brick2 (the arbiter) with a new brick while simultaneously creating newfiles{1..10000} on the mount point from the second client.
6. When that completes, run echo 1234 > newfiles{1..10000} from the first client using script.sh (placed with the log files).
7. Check gluster volume heal mdcache info.
8. The / directory of the brick and one more file still need to be healed.

[root@dhcp46-231 gluster]# gluster volume heal mdcache info
Brick dhcp46-231.lab.eng.blr.redhat.com:/bricks/brick1/mdcache
/ - Possibly undergoing heal
/newfiles0
Status: Connected
Number of entries: 2

Brick dhcp46-50.lab.eng.blr.redhat.com:/bricks/brick0/mdcache
Status: Connected
Number of entries: 0

Brick dhcp47-111.lab.eng.blr.redhat.com:/bricks/brick1/mdcache
/ - Possibly undergoing heal
/newfiles0
Status: Connected
Number of entries: 2

##################################################################
[root@dhcp46-231 gluster]# getfattr -d -m . -e hex /bricks/brick1/mdcache/
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick1/mdcache/
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.mdcache-client-1=0x000000000000000000000008
trusted.afr.mdcache-client-2=0x000000000000000000000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.volume-id=0x5a5aa31b79d84f458641f7c032141e53
*****************************
[root@dhcp47-111 gluster]# getfattr -d -m . -e hex /bricks/brick1/mdcache/
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick1/mdcache/
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.mdcache-client-1=0x000000000000000000000008
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.volume-id=0x5a5aa31b79d84f458641f7c032141e53

Actual results:
1) Hangs were observed on the mount points.
2) Heals could not complete.
3) "Transport endpoint is not connected" errors were observed in the client and brick logs.
4) Multiple blocked locks were observed in the statedumps of the bricks.
5) The mount point is not accessible.

Expected results:
No hangs should be observed.
No pending heals should remain.

Additional info:
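For readers of the getfattr output above: the trusted.afr.* values are pending-operation counters that AFR uses to decide what needs healing. Below is a minimal sketch to decode one of them, assuming the usual AFR layout of three 4-byte big-endian counts (data, metadata, entry) and using the brick path from this report:

# Sketch: decode a trusted.afr pending xattr into its three counters.
# Assumes the standard AFR layout (data/metadata/entry, 4 bytes each).
HEX=$(getfattr -n trusted.afr.mdcache-client-1 -e hex /bricks/brick1/mdcache/ 2>/dev/null |
      awk -F'=0x' '/^trusted.afr/ {print $2}')
echo "data pending:     $((16#${HEX:0:8}))"
echo "metadata pending: $((16#${HEX:8:8}))"
echo "entry pending:    $((16#${HEX:16:8}))"

For the value 0x000000000000000000000008 seen on both data bricks, this reads as 8 pending entry operations blamed on mdcache-client-1, which is consistent with '/' showing up as possibly undergoing heal.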
Karan,

Can you attach the brick and client log files?

regards,
Raghavendra
Also, has this test case been tried on a 3.2 build without the md-cache options?
Poornima, yes, I tried the build without md-cache, but I wasn't able to hit it.

Thanks & regards,
Karan Sandha
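For anyone repeating this retest on a build that does include md-cache, something like the following should disable the md-cache-related options instead (a sketch: the option names are those visible in the "Options Reconfigured" list later in this bug, and mdcache is the volume name from the description):

# Sketch: turn off the md-cache-related options for a retest.
gluster volume set mdcache performance.stat-prefetch off        # unloads md-cache
gluster volume set mdcache performance.cache-invalidation off
gluster volume set mdcache features.cache-invalidation off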
*** Bug 1388414 has been marked as a duplicate of this bug. ***
I hit this case in my systemic testing, where one brick of a replica pair is down. However, the client sees both bricks as down in spite of one being up. Hence, if we try to cat a file sitting on that brick, we get a "Transport endpoint is not connected" error, and if we try to write to a file on that brick, we get EIO.

Version: 3.8.4-5
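To cross-check the client's view of the bricks against the servers', something like the following can help (a sketch: the log path assumes a FUSE mount at /mnt, the message strings are typical client-translator log lines that vary across versions, and the volume name is taken from the setup in the next comment):

# Sketch: compare client-side connection state with server-side status.
VOL=sysvol   # volume name from the gluster v info output below
grep -E 'Connected to|disconnected from' /var/log/glusterfs/mnt.log | tail -n 20
gluster volume status "$VOL" | grep '^Brick'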
sosreport of the client is available at:
[qe@rhsqe-repo nchilaka]$ pwd
/var/www/html/sosreports/nchilaka
[qe@rhsqe-repo nchilaka]$
/var/www/html/sosreports/nchilaka/bug.1385605

[root@dhcp35-191 ~]# gluster v info gl

Volume Name: sysvol
Type: Distributed-Replicate
Volume ID: b1ef4d84-0614-4d5d-9e2e-b19183996e43
Status: Started
Snapshot Count: 0
Number of Bricks: 4 x 2 = 8
Transport-type: tcp
Bricks:
Brick1: 10.70.35.191:/rhs/brick1/sysvol
Brick2: 10.70.37.108:/rhs/brick1/sysvol
Brick3: 10.70.35.3:/rhs/brick1/sysvol
Brick4: 10.70.37.66:/rhs/brick1/sysvol
Brick5: 10.70.35.191:/rhs/brick2/sysvol
Brick6: 10.70.37.108:/rhs/brick2/sysvol
Brick7: 10.70.35.3:/rhs/brick2/sysvol
Brick8: 10.70.37.66:/rhs/brick2/sysvol
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
performance.stat-prefetch: on
performance.cache-invalidation: on
cluster.shd-max-threads: 10
features.cache-invalidation-timeout: 400
features.cache-invalidation: on
performance.md-cache-timeout: 300
features.uss: on
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on

[root@dhcp35-191 ~]# gluster v status
Status of volume: sysvol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.191:/rhs/brick1/sysvol       N/A       N/A        N       N/A
Brick 10.70.37.108:/rhs/brick1/sysvol       49152     0          Y       27848
Brick 10.70.35.3:/rhs/brick1/sysvol         N/A       N/A        N       N/A
Brick 10.70.37.66:/rhs/brick1/sysvol        49152     0          Y       28853
Brick 10.70.35.191:/rhs/brick2/sysvol       49153     0          Y       18344
Brick 10.70.37.108:/rhs/brick2/sysvol       N/A       N/A        N       N/A
Brick 10.70.35.3:/rhs/brick2/sysvol         49153     0          Y       11727
Brick 10.70.37.66:/rhs/brick2/sysvol        N/A       N/A        N       N/A
Snapshot Daemon on localhost                49154     0          Y       18461
Self-heal Daemon on localhost               N/A       N/A        Y       18364
Quota Daemon on localhost                   N/A       N/A        Y       18410
Snapshot Daemon on 10.70.35.3               49154     0          Y       11826
Self-heal Daemon on 10.70.35.3              N/A       N/A        Y       11747
Quota Daemon on 10.70.35.3                  N/A       N/A        Y       11779
Snapshot Daemon on 10.70.37.66              49154     0          Y       28970
Self-heal Daemon on 10.70.37.66             N/A       N/A        Y       28892
Quota Daemon on 10.70.37.66                 N/A       N/A        Y       28923
Snapshot Daemon on 10.70.37.108             49154     0          Y       27965
Self-heal Daemon on 10.70.37.108            N/A       N/A        Y       27887
Quota Daemon on 10.70.37.108                N/A       N/A        Y       27918

Task Status of Volume sysvol
------------------------------------------------------------------------------
There are no active volume tasks

[root@dhcp35-191 ~]#
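Given the blocked locks reported earlier in this bug, the brick statedumps are worth checking on this setup as well; a minimal sketch, assuming the default dump directory /var/run/gluster (it differs if server.statedump-path was reconfigured):

# Sketch: generate brick statedumps and count blocked lock entries.
gluster volume statedump sysvol
# Lock entries in the dumps carry an ACTIVE/BLOCKED marker.
grep -c 'BLOCKED' /var/run/gluster/*.dump.*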
*** Bug 1392906 has been marked as a duplicate of this bug. ***
Patch posted upstream at http://review.gluster.org/#/c/15916
Upstream master: http://review.gluster.org/15916
Upstream release-3.8: http://review.gluster.org/16025
Upstream release-3.9: http://review.gluster.org/16026
Downstream: https://code.engineering.redhat.com/gerrit/92095
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html
Thanks, Nag, for the update.

@Rejy: do we need the hotfix flag set on this bug?
*** Bug 1429145 has been marked as a duplicate of this bug. ***