Bug 1443941 - Brick Multiplexing: seeing Input/Output Error for .trashcan
Summary: Brick Multiplexing: seeing Input/Output Error for .trashcan
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: core
Version: rhgs-3.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: RHGS 3.3.0
Assignee: Mohit Agrawal
QA Contact: Nag Pavan Chilakam
URL:
Whiteboard: brick-multiplexing
Depends On: 1447389 1450728 1450729
Blocks: 1417151
 
Reported: 2017-04-20 10:04 UTC by Nag Pavan Chilakam
Modified: 2018-11-30 05:38 UTC (History)
CC: 5 users

Fixed In Version: glusterfs-3.8.4-27
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-09-21 04:39:40 UTC
Embargoed:


Attachments: (none)


Links:
Red Hat Product Errata RHBA-2017:2774 (normal, SHIPPED_LIVE): glusterfs bug fix and enhancement update - Last Updated: 2017-09-21 08:16:29 UTC

Description Nag Pavan Chilakam 2017-04-20 10:04:50 UTC
Description of problem:
=====================
Observation: I had enabled brick multiplexing.
I am seeing EIO (Input/output error) for the .trashcan folder on the mount of an EC volume, as below:
[root@dhcp35-103 ecv82]# ls -la
total 12
drwxr-xr-x.  4 root root 4096 Apr 20 15:22 .
drwxr-xr-x. 15 root root 4096 Apr 20 15:17 ..
drwxr-xr-x.  2 root root 4096 Apr 20 15:22 dir1
[root@dhcp35-103 ecv82]# ls -lA
ls: cannot access .trashcan: Input/output error
total 4
drwxr-xr-x. 2 root root 4096 Apr 20 15:22 dir1
d?????????? ? ?    ?       ?            ? .trashcan
[root@dhcp35-103 ecv82]# 



Steps
=====
Step1:
I had a 6-node setup on which I created the volumes below.
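
(For reference, the 6-node trusted storage pool would have been formed roughly as below, run from one node. The peer IPs are taken from the volume status output later in this report; the actual probe sequence was not captured.)

gluster peer probe 10.70.35.138
gluster peer probe 10.70.35.130
gluster peer probe 10.70.35.122
gluster peer probe 10.70.35.23
gluster peer probe 10.70.35.112
gluster peer status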

Step2:
Enabled brick multiplexing.
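
(The exact command was not captured; brick multiplexing is a cluster-wide option, so it would have been enabled roughly as follows.)

gluster volume set all cluster.brick-multiplex enable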

Step3:
Created the volumes below (a sketch of the create commands follows the list):
ecv82    --> an EC (disperse) volume of 2x(8+2), spanning nodes n1..n5
distrep3 --> a distributed-replicate 3 (3x3) volume, spanning nodes n1..n3
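
(A sketch of the create commands, reconstructed from the brick layout in the gluster v info output further below; the actual invocations were not captured. 'force' is shown for ecv82 because each disperse set places two bricks on the same host, which gluster normally warns about.)

gluster volume create ecv82 disperse-data 8 redundancy 2 \
    10.70.35.{138,130,122,23,112}:/rhs/brick1/ecv82 \
    10.70.35.{138,130,122,23,112}:/rhs/brick2/ecv82 \
    10.70.35.{138,130,122,23,112}:/rhs/brick3/ecv82 \
    10.70.35.{138,130,122,23,112}:/rhs/brick4/ecv82 force
gluster volume create distrep3 replica 3 \
    10.70.35.{138,130,122}:/rhs/brick11/distrep3 \
    10.70.35.{138,130,122}:/rhs/brick12/distrep3 \
    10.70.35.{138,130,122}:/rhs/brick13/distrep3
gluster volume start ecv82
gluster volume start distrep3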

Now, as expected, the brick PIDs for all bricks hosted on a given node are the same because brick multiplexing is enabled (see the gluster v status output under logs below).

Step4:
I then changed the brick log level to DEBUG for distrep3 (which should use the same brick logs as ecv82, since their bricks are multiplexed into the same processes).
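
(The command would have been along these lines; the DEBUG setting is visible in the gluster v info output below.)

gluster volume set distrep3 diagnostics.brick-log-level DEBUG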

Step5:
I then mounted distrep3 on a FUSE client.
=======>NOTE: I am not seeing the .trashcan folder on this mount; I don't know why.
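
(A sketch of the mount, assuming /mnt/distrep3 as the mount point and 10.70.35.138 as the volfile server; neither was recorded in this report.)

mount -t glusterfs 10.70.35.138:/distrep3 /mnt/distrep3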

Step6:
Did some I/O.

Step7:
Set the min-free-disk limit to 50% for distrep3.
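
(The corresponding command, matching the cluster.min-free-disk: 50 entry in the volume info below.)

gluster volume set distrep3 cluster.min-free-disk 50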

Step8:
Did I/O (a sketch of the workload follows) to see whether I would get a warning for breaching min-free-disk, and got the messages below in the client FUSE log:
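
(A sketch of the kind of I/O that would push the bricks past the threshold; the mount point, file names and sizes are placeholders, as the actual workload was not recorded.)

# write 100 files of 100 MB each onto the distrep3 mount until the warning appears
for i in $(seq 1 100); do
    dd if=/dev/zero of=/mnt/distrep3/file$i bs=1M count=100
done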

[2017-04-20 09:45:58.717617] W [MSGID: 109033] [dht-diskusage.c:263:dht_is_subvol_filled] 0-distrep3-dht: disk space on subvolume 'distrep3-replicate-1' is getting full (55.00 %), consider adding more bricks
[2017-04-20 09:46:50.749409] W [MSGID: 109033] [dht-diskusage.c:263:dht_is_subvol_filled] 0-distrep3-dht: disk space on subvolume 'distrep3-replicate-2' is getting full (54.00 %), consider adding more bricks


Step9:
Now mounted ecv82 on a FUSE client.
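
(Again a sketch; the client prompt below suggests the mount point directory was named ecv82, but the full path and volfile server were not recorded.)

mount -t glusterfs 10.70.35.138:/ecv82 /mnt/ecv82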

Step10:
Did an ls -lA and got the EIO:


[root@dhcp35-103 ecv82]# ls -lA
ls: cannot access .trashcan: Input/output error
total 4
drwxr-xr-x. 2 root root 4096 Apr 20 15:22 dir1
d?????????? ? ?    ?       ?            ? .trashcan
[root@dhcp35-103 ecv82]# 






#############logs ##############
[root@dhcp35-45 ~]# 
[root@dhcp35-45 ~]# gluster v status
Status of volume: distrep3
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.138:/rhs/brick11/distrep3    49152     0          Y       25929
Brick 10.70.35.130:/rhs/brick11/distrep3    49152     0          Y       6100 
Brick 10.70.35.122:/rhs/brick11/distrep3    49152     0          Y       26260
Brick 10.70.35.138:/rhs/brick12/distrep3    49152     0          Y       25929
Brick 10.70.35.130:/rhs/brick12/distrep3    49152     0          Y       6100 
Brick 10.70.35.122:/rhs/brick12/distrep3    49152     0          Y       26260
Brick 10.70.35.138:/rhs/brick13/distrep3    49152     0          Y       25929
Brick 10.70.35.130:/rhs/brick13/distrep3    49152     0          Y       6100 
Brick 10.70.35.122:/rhs/brick13/distrep3    49152     0          Y       26260
Self-heal Daemon on localhost               N/A       N/A        Y       27630
Self-heal Daemon on 10.70.35.23             N/A       N/A        Y       11436
Self-heal Daemon on 10.70.35.112            N/A       N/A        Y       27693
Self-heal Daemon on 10.70.35.122            N/A       N/A        Y       26404
Self-heal Daemon on 10.70.35.138            N/A       N/A        Y       26077
Self-heal Daemon on 10.70.35.130            N/A       N/A        Y       6247 
 
Task Status of Volume distrep3
------------------------------------------------------------------------------
There are no active volume tasks
 
Status of volume: ecv82
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.138:/rhs/brick1/ecv82        49152     0          Y       25929
Brick 10.70.35.130:/rhs/brick1/ecv82        49152     0          Y       6100 
Brick 10.70.35.122:/rhs/brick1/ecv82        49152     0          Y       26260
Brick 10.70.35.23:/rhs/brick1/ecv82         49152     0          Y       11328
Brick 10.70.35.112:/rhs/brick1/ecv82        49152     0          Y       27485
Brick 10.70.35.138:/rhs/brick2/ecv82        49152     0          Y       25929
Brick 10.70.35.130:/rhs/brick2/ecv82        49152     0          Y       6100 
Brick 10.70.35.122:/rhs/brick2/ecv82        49152     0          Y       26260
Brick 10.70.35.23:/rhs/brick2/ecv82         49152     0          Y       11328
Brick 10.70.35.112:/rhs/brick2/ecv82        49152     0          Y       27485
Brick 10.70.35.138:/rhs/brick3/ecv82        49152     0          Y       25929
Brick 10.70.35.130:/rhs/brick3/ecv82        49152     0          Y       6100 
Brick 10.70.35.122:/rhs/brick3/ecv82        49152     0          Y       26260
Brick 10.70.35.23:/rhs/brick3/ecv82         49152     0          Y       11328
Brick 10.70.35.112:/rhs/brick3/ecv82        49152     0          Y       27485
Brick 10.70.35.138:/rhs/brick4/ecv82        49152     0          Y       25929
Brick 10.70.35.130:/rhs/brick4/ecv82        49152     0          Y       6100 
Brick 10.70.35.122:/rhs/brick4/ecv82        49152     0          Y       26260
Brick 10.70.35.23:/rhs/brick4/ecv82         49152     0          Y       11328
Brick 10.70.35.112:/rhs/brick4/ecv82        49152     0          Y       27485
Self-heal Daemon on localhost               N/A       N/A        Y       27630
Self-heal Daemon on 10.70.35.122            N/A       N/A        Y       26404
Self-heal Daemon on 10.70.35.138            N/A       N/A        Y       26077
Self-heal Daemon on 10.70.35.23             N/A       N/A        Y       11436
Self-heal Daemon on 10.70.35.130            N/A       N/A        Y       6247 
Self-heal Daemon on 10.70.35.112            N/A       N/A        Y       27693
 
Task Status of Volume ecv82
------------------------------------------------------------------------------
There are no active volume tasks
 
[root@dhcp35-45 ~]# gluster v info
 
Volume Name: distrep3
Type: Distributed-Replicate
Volume ID: 28a6c08e-b7a0-4135-88fa-4b9ae250d609
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x 3 = 9
Transport-type: tcp
Bricks:
Brick1: 10.70.35.138:/rhs/brick11/distrep3
Brick2: 10.70.35.130:/rhs/brick11/distrep3
Brick3: 10.70.35.122:/rhs/brick11/distrep3
Brick4: 10.70.35.138:/rhs/brick12/distrep3
Brick5: 10.70.35.130:/rhs/brick12/distrep3
Brick6: 10.70.35.122:/rhs/brick12/distrep3
Brick7: 10.70.35.138:/rhs/brick13/distrep3
Brick8: 10.70.35.130:/rhs/brick13/distrep3
Brick9: 10.70.35.122:/rhs/brick13/distrep3
Options Reconfigured:
cluster.min-free-disk: 50
cluster.quorum-count: 1
diagnostics.brick-log-level: DEBUG
transport.address-family: inet
nfs.disable: on
cluster.brick-multiplex: enable
 
Volume Name: ecv82
Type: Distributed-Disperse
Volume ID: c2a84a0f-a95f-4264-984b-2e0879da7f99
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x (8 + 2) = 20
Transport-type: tcp
Bricks:
Brick1: 10.70.35.138:/rhs/brick1/ecv82
Brick2: 10.70.35.130:/rhs/brick1/ecv82
Brick3: 10.70.35.122:/rhs/brick1/ecv82
Brick4: 10.70.35.23:/rhs/brick1/ecv82
Brick5: 10.70.35.112:/rhs/brick1/ecv82
Brick6: 10.70.35.138:/rhs/brick2/ecv82
Brick7: 10.70.35.130:/rhs/brick2/ecv82
Brick8: 10.70.35.122:/rhs/brick2/ecv82
Brick9: 10.70.35.23:/rhs/brick2/ecv82
Brick10: 10.70.35.112:/rhs/brick2/ecv82
Brick11: 10.70.35.138:/rhs/brick3/ecv82
Brick12: 10.70.35.130:/rhs/brick3/ecv82
Brick13: 10.70.35.122:/rhs/brick3/ecv82
Brick14: 10.70.35.23:/rhs/brick3/ecv82
Brick15: 10.70.35.112:/rhs/brick3/ecv82
Brick16: 10.70.35.138:/rhs/brick4/ecv82
Brick17: 10.70.35.130:/rhs/brick4/ecv82
Brick18: 10.70.35.122:/rhs/brick4/ecv82
Brick19: 10.70.35.23:/rhs/brick4/ecv82
Brick20: 10.70.35.112:/rhs/brick4/ecv82
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
cluster.brick-multiplex: enable
[root@dhcp35-45 ~]# 




Rationale of the testing:
I wanted to check the behavior when brick multiplexing is in effect and we try to change some per-volume brick settings.


Version-Release number of selected component (if applicable):
========
3.8.4-22

Comment 2 Nag Pavan Chilakam 2017-04-20 10:20:20 UTC
logs available at http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/nchilaka/bug.1443941/

Comment 3 Nag Pavan Chilakam 2017-04-20 10:28:57 UTC
fuse mount log:
[2017-04-20 09:48:19.851779] W [fuse-resolve.c:61:fuse_resolve_entry_cbk] 0-fuse: 00000000-0000-0000-0000-000000000001/.trashcan: failed to resolve (Input/output error)
[2017-04-20 09:48:19.854563] I [MSGID: 109063] [dht-layout.c:713:dht_layout_normalize] 0-ecv82-dht: Found anomalies in /.trashcan (gfid = 00000000-0000-0000-0000-000000000005). Holes=1 overlaps=0
[2017-04-20 09:48:19.855996] W [MSGID: 109065] [dht-selfheal.c:1410:dht_selfheal_dir_mkdir_lock_cbk] 0-ecv82-dht: acquiring inodelk failed for /.trashcan [Input/output error]
[2017-04-20 09:48:19.856077] W [fuse-bridge.c:471:fuse_entry_cbk] 0-glusterfs-fuse: 23: LOOKUP() /.trashcan => -1 (Input/output error)
[2017-04-20 09:52:35.145993] I [MSGID: 109063] [dht-layout.c:713:dht_layout_normalize] 0-ecv82-dht: Found anomalies in /.trashcan (gfid = 00000000-0000-0000-0000-000000000005). Holes=1 overlaps=0
[2017-04-20 09:52:35.147543] W [MSGID: 109065] [dht-selfheal.c:1410:dht_selfheal_dir_mkdir_lock_cbk] 0-ecv82-dht: acquiring inodelk failed for /.trashcan [Input/output error]
[2017-04-20 09:52:35.147583] W [fuse-resolve.c:61:fuse_resolve_entry_cbk] 0-fuse: 00000000-0000-0000-0000-000000000001/.trashcan: failed to resolve (Input/output error)
[2017-04-20 09:52:35.152434] W [fuse-bridge.c:471:fuse_entry_cbk] 0-glusterfs-fuse: 866: LOOKUP() /.trashcan => -1 (Input/output error)
[2017-04-20 09:52:35.150938] I [MSGID: 109063] [dht-layout.c:713:dht_layout_normalize] 0-ecv82-dht: Found anomalies in /.trashcan (gfid = 00000000-0000-0000-0000-000000000005). Holes=1 overlaps=0
[2017-04-20 09:52:35.152401] W [MSGID: 109065] [dht-selfheal.c:1410:dht_selfheal_dir_mkdir_lock_cbk] 0-ecv82-dht: acquiring inodelk failed for /.trashcan [Input/output error]

Comment 5 Atin Mukherjee 2017-05-09 15:52:03 UTC
upstream patch : https://review.gluster.org/#/c/17225

Comment 8 Atin Mukherjee 2017-05-15 04:45:16 UTC
downstream patch : https://code.engineering.redhat.com/gerrit/#/c/106137

Comment 9 Atin Mukherjee 2017-05-17 05:24:19 UTC
Looks like we have an issue with this patch, moving this bug to POST.

Comment 10 Atin Mukherjee 2017-06-05 04:50:22 UTC
downstream patch :https://code.engineering.redhat.com/gerrit/#/c/108021/

Comment 12 Nag Pavan Chilakam 2017-06-10 07:26:25 UTC
Tested on 3.8.4-27:
With the above steps I am not seeing the problem anymore, hence moving to Verified.
Also, I now see .trashcan on all volumes with brick multiplexing enabled.
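
(The verification amounts to rerunning the reproducer on the fixed build; /mnt/ecv82 is a placeholder path for the FUSE mount.)

# .trashcan should now be listed without an Input/output error
ls -lA /mnt/ecv82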

Comment 14 errata-xmlrpc 2017-09-21 04:39:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2774

