Bug 1385605 - fuse mount point not accessible
Summary: fuse mount point not accessible
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: rpc
Version: rhgs-3.2
Hardware: All
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: RHGS 3.2.0
Assignee: Raghavendra Talur
QA Contact: Karan Sandha
URL:
Whiteboard:
Duplicates: 1388414 1429145
Depends On:
Blocks: 1351528 1386626 1388323 1392906 1397267 1398930 1401534 1408949 1474007
Reported: 2016-10-17 12:06 UTC by Karan Sandha
Modified: 2018-01-16 05:53 UTC
CC List: 22 users

Fixed In Version: glusterfs-3.8.4-7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1386626
Environment:
Last Closed: 2017-03-23 06:11:05 UTC



Links:
Red Hat Product Errata RHSA-2017:0486 (priority: normal; status: SHIPPED_LIVE; last updated 2017-03-23 09:18:45 UTC)
Summary: Moderate: Red Hat Gluster Storage 3.2.0 security, bug fix, and enhancement update

Description Karan Sandha 2016-10-17 12:06:52 UTC
Description of problem:
The mount point is inaccessible when trying to access it.

Version-Release number of selected component (if applicable):
[root@dhcp46-231 gluster]# rpm -qa | grep gluster
gluster-nagios-addons-0.2.7-1.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-2.26.git0a405a4.el7rhgs.x86_64
glusterfs-server-3.8.4-2.26.git0a405a4.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
glusterfs-3.8.4-2.26.git0a405a4.el7rhgs.x86_64
glusterfs-api-3.8.4-2.26.git0a405a4.el7rhgs.x86_64
glusterfs-cli-3.8.4-2.26.git0a405a4.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-2.26.git0a405a4.el7rhgs.x86_64
vdsm-gluster-4.17.33-1.el7rhgs.noarch
glusterfs-libs-3.8.4-2.26.git0a405a4.el7rhgs.x86_64
glusterfs-fuse-3.8.4-2.26.git0a405a4.el7rhgs.x86_64


How reproducible: Hit it once.
Logs are placed at rhsqe-repo.lab.eng.blr.redhat.com:/var/www/html/sosreports/<bug>

Steps performed:
1. Create a 1 x 3 arbiter volume named mdcache.
2. Mount the volume at /mnt on two different clients.
3. Replace brick0 with a new brick. Check heal info and wait for the heal to complete.
4. From the first client, run touch files{1..10000}.
5. Replace brick2 (the arbiter) with a new brick; simultaneously, create newfiles{1..10000} on the mount point from the second client.
6. When that completes, write 1234 to newfiles{1..10000} from the first client, using script.sh (placed with the log files).
7. Check gluster volume heal mdcache info.
8. The / directory of the brick and one more file (/newfiles0) still need to be healed:

 [root@dhcp46-231 gluster]# gluster volume heal mdcache info 
Brick dhcp46-231.lab.eng.blr.redhat.com:/bricks/brick1/mdcache
/ - Possibly undergoing heal

/newfiles0 
Status: Connected
Number of entries: 2

Brick dhcp46-50.lab.eng.blr.redhat.com:/bricks/brick0/mdcache
Status: Connected
Number of entries: 0

Brick dhcp47-111.lab.eng.blr.redhat.com:/bricks/brick1/mdcache
/ - Possibly undergoing heal

/newfiles0 
Status: Connected
Number of entries: 2
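To total the pending entries per brick in heal-info output like the one above, a small filter can help. This is a minimal sketch, not part of the original report: summarize_heal_info is a hypothetical helper name, and it assumes the plain-text layout shown above (a "Brick <host:path>" line followed, a few lines later, by a "Number of entries: N" line for each brick). A non-numeric count such as "-" is treated as 0.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize `gluster volume heal <VOL> info` text read
# from stdin. Prints "<brick> <entries>" per brick, then a TOTAL line.
# Assumes the plain-text layout shown in this report.
summarize_heal_info() {
  awk '
    /^Brick / { brick = $2 }            # e.g. host:/bricks/brick1/mdcache
    /^Number of entries:/ {
      n = $NF + 0                       # last field is the count; "-" becomes 0
      printf "%s %d\n", brick, n
      total += n
    }
    END { printf "TOTAL %d\n", total }
  '
}
```

Usage: gluster volume heal mdcache info | summarize_heal_info. For the output above it reports 2, 0 and 2 entries for the three bricks, with a total of 4.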


##################################################################
[root@dhcp46-231 gluster]# getfattr -d -m . -e hex /bricks/brick1/mdcache/
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick1/mdcache/
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.mdcache-client-1=0x000000000000000000000008
trusted.afr.mdcache-client-2=0x000000000000000000000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.volume-id=0x5a5aa31b79d84f458641f7c032141e53

*****************************
[root@dhcp47-111 gluster]#  getfattr -d -m . -e hex /bricks/brick1/mdcache/
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick1/mdcache/
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.mdcache-client-1=0x000000000000000000000008
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.volume-id=0x5a5aa31b79d84f458641f7c032141e53
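The trusted.afr.mdcache-client-1 value on both bricks above can be read as AFR's pending-operation changelog. Assuming the usual 12-byte layout (three big-endian 32-bit counters for pending data, metadata and entry operations, in that order), the hypothetical helper below decodes it. Applied to 0x000000000000000000000008 it prints data=0 metadata=0 entry=8: both bricks blame client-1 for 8 pending entry (directory) operations on /, which is consistent with / being reported as "Possibly undergoing heal". If client indices follow the brick order shown by heal info (an assumption), client-1 is dhcp46-50:/bricks/brick0/mdcache, the brick that itself reports 0 entries.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decode an AFR changelog xattr printed by `getfattr -e hex`.
# Assumed layout: 12 bytes = three big-endian 32-bit counters
# (pending data, metadata, entry operations).
decode_afr_changelog() {
  local hex=${1#0x}                     # drop the 0x prefix
  if [[ ${#hex} -ne 24 || ! $hex =~ ^[0-9a-fA-F]+$ ]]; then
    echo "expected 24 hex digits, got '$1'" >&2
    return 1
  fi
  # Each counter is 8 hex digits; 16#... converts base-16 to decimal.
  printf 'data=%d metadata=%d entry=%d\n' \
    "$((16#${hex:0:8}))" "$((16#${hex:8:8}))" "$((16#${hex:16:8}))"
}
```

Usage: decode_afr_changelog 0x000000000000000000000008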

Actual results:

1) Hangs were observed on the mount points.
2) Heals could not complete.
3) "Transport endpoint is not connected" errors were observed in the client and brick logs.
4) Multiple blocked locks were observed in the brick statedumps.
5) The mount point was not accessible.


Expected results:
No hangs should be observed.
There should be no pending heals.

Additional info:

Comment 2 Raghavendra G 2016-10-18 04:56:23 UTC
Karan,

Can you attach brick and client log files?

regards,
Raghavendra

Comment 5 Poornima G 2016-10-20 07:03:34 UTC
Also, has this test case been tried on the 3.2 build without the md-cache options?

Comment 7 Karan Sandha 2016-10-24 06:58:50 UTC
Poornima,

Yes, I tried without md-cache, but I wasn't able to hit it.

Thanks & regards
Karan Sandha

Comment 11 Pranith Kumar K 2016-10-25 11:14:40 UTC
*** Bug 1388414 has been marked as a duplicate of this bug. ***

Comment 15 nchilaka 2016-11-15 09:39:40 UTC
I hit this case in my systemic testing, where one brick of a replica pair is down.
However, the client sees both bricks as down in spite of one being up.
Hence, if we try to cat a file on that replica pair, we get a "Transport endpoint is not connected" error,
and if we try to write to a file on it, we get EIO.

Version: 3.8.4-5

Comment 16 nchilaka 2016-11-15 10:03:33 UTC
The client sosreport is available at rhsqe-repo:/var/www/html/sosreports/nchilaka/bug.1385605

[root@dhcp35-191 ~]# gluster v info
Volume Name: sysvol
Type: Distributed-Replicate
Volume ID: b1ef4d84-0614-4d5d-9e2e-b19183996e43
Status: Started
Snapshot Count: 0
Number of Bricks: 4 x 2 = 8
Transport-type: tcp
Bricks:
Brick1: 10.70.35.191:/rhs/brick1/sysvol
Brick2: 10.70.37.108:/rhs/brick1/sysvol
Brick3: 10.70.35.3:/rhs/brick1/sysvol
Brick4: 10.70.37.66:/rhs/brick1/sysvol
Brick5: 10.70.35.191:/rhs/brick2/sysvol
Brick6: 10.70.37.108:/rhs/brick2/sysvol
Brick7: 10.70.35.3:/rhs/brick2/sysvol
Brick8: 10.70.37.66:/rhs/brick2/sysvol
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
performance.stat-prefetch: on
performance.cache-invalidation: on
cluster.shd-max-threads: 10
features.cache-invalidation-timeout: 400
features.cache-invalidation: on
performance.md-cache-timeout: 300
features.uss: on
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
[root@dhcp35-191 ~]# gluster v status
Status of volume: sysvol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.191:/rhs/brick1/sysvol       N/A       N/A        N       N/A  
Brick 10.70.37.108:/rhs/brick1/sysvol       49152     0          Y       27848
Brick 10.70.35.3:/rhs/brick1/sysvol         N/A       N/A        N       N/A  
Brick 10.70.37.66:/rhs/brick1/sysvol        49152     0          Y       28853
Brick 10.70.35.191:/rhs/brick2/sysvol       49153     0          Y       18344
Brick 10.70.37.108:/rhs/brick2/sysvol       N/A       N/A        N       N/A  
Brick 10.70.35.3:/rhs/brick2/sysvol         49153     0          Y       11727
Brick 10.70.37.66:/rhs/brick2/sysvol        N/A       N/A        N       N/A  
Snapshot Daemon on localhost                49154     0          Y       18461
Self-heal Daemon on localhost               N/A       N/A        Y       18364
Quota Daemon on localhost                   N/A       N/A        Y       18410
Snapshot Daemon on 10.70.35.3               49154     0          Y       11826
Self-heal Daemon on 10.70.35.3              N/A       N/A        Y       11747
Quota Daemon on 10.70.35.3                  N/A       N/A        Y       11779
Snapshot Daemon on 10.70.37.66              49154     0          Y       28970
Self-heal Daemon on 10.70.37.66             N/A       N/A        Y       28892
Quota Daemon on 10.70.37.66                 N/A       N/A        Y       28923
Snapshot Daemon on 10.70.37.108             49154     0          Y       27965
Self-heal Daemon on 10.70.37.108            N/A       N/A        Y       27887
Quota Daemon on 10.70.37.108                N/A       N/A        Y       27918
 
Task Status of Volume sysvol
------------------------------------------------------------------------------
There are no active volume tasks
 
[root@dhcp35-191 ~]#
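With replica 2, the status output above can be checked per replica pair. The sketch below is a hypothetical helper (replica_sets_online is not a gluster command); it assumes bricks are listed in volume order, so each consecutive group of "replica" bricks forms one replica set, and that each brick entry fits on one line. For the sysvol output above it reports 1/2 online for each of the four pairs, i.e. on the server side every pair still had one brick up, consistent with Comment 15.

```shell
#!/usr/bin/env bash
# Hypothetical helper: group "Brick" lines of `gluster volume status` (stdin)
# into replica sets of size $1 and report how many bricks per set are online.
# Assumes volume order and single-line brick entries; an incomplete trailing
# set is not printed.
replica_sets_online() {
  local replica=$1
  awk -v r="$replica" '
    $1 == "Brick" {
      idx = n % r                       # position inside the current set
      if (idx == 0) { nset++; up = 0; members = "" }
      if ($5 == "Y") up++               # 5th field is the Online column
      members = members " " $2 "(" $5 ")"
      n++
      if (idx == r - 1) printf "set %d: %d/%d online:%s\n", nset, up, r, members
    }
  '
}
```

Usage: gluster v status sysvol | replica_sets_online 2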

Comment 17 Jiffin 2016-11-16 12:46:11 UTC
*** Bug 1392906 has been marked as a duplicate of this bug. ***

Comment 18 Raghavendra Talur 2016-11-23 12:49:40 UTC
Patch posted upstream at http://review.gluster.org/#/c/15916

Comment 19 rjoseph 2016-12-05 14:22:09 UTC
Upstream master      : http://review.gluster.org/15916
Upstream release-3.8 : http://review.gluster.org/16025
Upstream release-3.9 : http://review.gluster.org/16026

Downstream : https://code.engineering.redhat.com/gerrit/92095

Comment 23 errata-xmlrpc 2017-03-23 06:11:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html

Comment 40 Bipin Kunal 2017-06-19 07:11:08 UTC
Thanks, Nag, for the update.

@Rejy: do we need the hotfix flag set on this bug?

Comment 44 Raghavendra G 2017-12-01 06:35:21 UTC
*** Bug 1429145 has been marked as a duplicate of this bug. ***

