+++ This bug was initially created as a clone of Bug #1443941 +++

Description of problem:
=====================
Observation: I had enabled brick multiplexing and I am seeing EIO for the .trashcan folder on the mount of an EC volume, as below:

# ls -la
total 12
drwxr-xr-x.  4 root root 4096 Apr 20 15:22 .
drwxr-xr-x. 15 root root 4096 Apr 20 15:17 ..
drwxr-xr-x.  2 root root 4096 Apr 20 15:22 dir1

# ls -lA
ls: cannot access .trashcan: Input/output error
total 4
drwxr-xr-x. 2 root root 4096 Apr 20 15:22 dir1
d?????????? ? ? ? ? ? .trashcan

Steps
=====
Step 1: I had a 6-node setup on which I created the volumes below.
Step 2: Enabled brick multiplexing.
Step 3: Created the volumes below:
        ecv82    --> an EC volume, 2 x (8+2), spanning nodes n1..n5
        distrep3 --> a distributed-replicate (replica 3) volume, 3 x 3, spanning n1..n3
        As expected, the brick PIDs for all bricks hosted on a given node are the same because brick multiplexing is enabled (check under logs).
Step 4: Changed the log level to DEBUG for distrep3 (which should use the same brick log as ecv82).
Step 5: Mounted distrep3 on a FUSE client.
        =======> NOTE: I am not seeing the .trashcan folder on this mount; I don't know why.
Step 6: Did some I/O.
Step 7: Set the min-free-disk limit to 50% for distrep3.
Step 8: Did I/O to check whether I get a warning for breaching min-free-disk, and got the below in the client FUSE log:
[2017-04-20 09:45:58.717617] W [MSGID: 109033] [dht-diskusage.c:263:dht_is_subvol_filled] 0-distrep3-dht: disk space on subvolume 'distrep3-replicate-1' is getting full (55.00 %), consider adding more bricks
[2017-04-20 09:46:50.749409] W [MSGID: 109033] [dht-diskusage.c:263:dht_is_subvol_filled] 0-distrep3-dht: disk space on subvolume 'distrep3-replicate-2' is getting full (54.00 %), consider adding more bricks
Step 9: Mounted ecv82 on a FUSE client.
Step 10: Did an ls -lA and got the EIO:
[root@dhcp35-103 ecv82]# ls -lA
ls: cannot access .trashcan: Input/output error
total 4
drwxr-xr-x. 2 root root 4096 Apr 20 15:22 dir1
d?????????? ? ? ? ? ? .trashcan
[root@dhcp35-103 ecv82]#
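For reference, a rough sketch of the CLI commands behind Steps 2-7 (this is an assumption reconstructed from the options visible in the 'gluster v info' output under logs; brick lists are elided, and the server/mount point in the mount line is just one example):

# gluster volume set all cluster.brick-multiplex enable             <-- Step 2 (cluster-wide option)
# gluster volume create ecv82 disperse-data 8 redundancy 2 <20 bricks across n1..n5> force
# gluster volume create distrep3 replica 3 <9 bricks across n1..n3>
# gluster volume start ecv82 && gluster volume start distrep3
# gluster volume status                                             <-- brick PIDs on one node should match
# gluster volume set distrep3 diagnostics.brick-log-level DEBUG     <-- Step 4
# mount -t glusterfs 10.70.35.138:/distrep3 /mnt/distrep3           <-- Step 5 (any server in the pool works)
# gluster volume set distrep3 cluster.min-free-disk 50              <-- Step 7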
############# logs ##############

Task Status of Volume ecv82
------------------------------------------------------------------------------
There are no active volume tasks

# gluster v info

Volume Name: distrep3
Type: Distributed-Replicate
Volume ID: 28a6c08e-b7a0-4135-88fa-4b9ae250d609
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x 3 = 9
Transport-type: tcp
Bricks:
Brick1: 10.70.35.138:/rhs/brick11/distrep3
Brick2: 10.70.35.130:/rhs/brick11/distrep3
Brick3: 10.70.35.122:/rhs/brick11/distrep3
Brick4: 10.70.35.138:/rhs/brick12/distrep3
Brick5: 10.70.35.130:/rhs/brick12/distrep3
Brick6: 10.70.35.122:/rhs/brick12/distrep3
Brick7: 10.70.35.138:/rhs/brick13/distrep3
Brick8: 10.70.35.130:/rhs/brick13/distrep3
Brick9: 10.70.35.122:/rhs/brick13/distrep3
Options Reconfigured:
cluster.min-free-disk: 50
cluster.quorum-count: 1
diagnostics.brick-log-level: DEBUG
transport.address-family: inet
nfs.disable: on
cluster.brick-multiplex: enable

Volume Name: ecv82
Type: Distributed-Disperse
Volume ID: c2a84a0f-a95f-4264-984b-2e0879da7f99
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x (8 + 2) = 20
Transport-type: tcp
Bricks:
Brick1: 10.70.35.138:/rhs/brick1/ecv82
Brick2: 10.70.35.130:/rhs/brick1/ecv82
Brick3: 10.70.35.122:/rhs/brick1/ecv82
Brick4: 10.70.35.23:/rhs/brick1/ecv82
Brick5: 10.70.35.112:/rhs/brick1/ecv82
Brick6: 10.70.35.138:/rhs/brick2/ecv82
Brick7: 10.70.35.130:/rhs/brick2/ecv82
Brick8: 10.70.35.122:/rhs/brick2/ecv82
Brick9: 10.70.35.23:/rhs/brick2/ecv82
Brick10: 10.70.35.112:/rhs/brick2/ecv82
Brick11: 10.70.35.138:/rhs/brick3/ecv82
Brick12: 10.70.35.130:/rhs/brick3/ecv82
Brick13: 10.70.35.122:/rhs/brick3/ecv82
Brick14: 10.70.35.23:/rhs/brick3/ecv82
Brick15: 10.70.35.112:/rhs/brick3/ecv82
Brick16: 10.70.35.138:/rhs/brick4/ecv82
Brick17: 10.70.35.130:/rhs/brick4/ecv82
Brick18: 10.70.35.122:/rhs/brick4/ecv82
Brick19: 10.70.35.23:/rhs/brick4/ecv82
Brick20: 10.70.35.112:/rhs/brick4/ecv82
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
cluster.brick-multiplex: enable
[root@dhcp35-45 ~]#

Rationale of the testing: I wanted to check the behavior when we have brick multiplexing in effect and we try to change some brick settings.

--- Additional comment from nchilaka on 2017-04-20 06:28:57 EDT ---

fuse mount log:
[2017-04-20 09:48:19.851779] W [fuse-resolve.c:61:fuse_resolve_entry_cbk] 0-fuse: 00000000-0000-0000-0000-000000000001/.trashcan: failed to resolve (Input/output error)
[2017-04-20 09:48:19.854563] I [MSGID: 109063] [dht-layout.c:713:dht_layout_normalize] 0-ecv82-dht: Found anomalies in /.trashcan (gfid = 00000000-0000-0000-0000-000000000005). Holes=1 overlaps=0
[2017-04-20 09:48:19.855996] W [MSGID: 109065] [dht-selfheal.c:1410:dht_selfheal_dir_mkdir_lock_cbk] 0-ecv82-dht: acquiring inodelk failed for /.trashcan [Input/output error]
[2017-04-20 09:48:19.856077] W [fuse-bridge.c:471:fuse_entry_cbk] 0-glusterfs-fuse: 23: LOOKUP() /.trashcan => -1 (Input/output error)
[2017-04-20 09:52:35.145993] I [MSGID: 109063] [dht-layout.c:713:dht_layout_normalize] 0-ecv82-dht: Found anomalies in /.trashcan (gfid = 00000000-0000-0000-0000-000000000005). Holes=1 overlaps=0
[2017-04-20 09:52:35.147543] W [MSGID: 109065] [dht-selfheal.c:1410:dht_selfheal_dir_mkdir_lock_cbk] 0-ecv82-dht: acquiring inodelk failed for /.trashcan [Input/output error]
[2017-04-20 09:52:35.147583] W [fuse-resolve.c:61:fuse_resolve_entry_cbk] 0-fuse: 00000000-0000-0000-0000-000000000001/.trashcan: failed to resolve (Input/output error)
[2017-04-20 09:52:35.152434] W [fuse-bridge.c:471:fuse_entry_cbk] 0-glusterfs-fuse: 866: LOOKUP() /.trashcan => -1 (Input/output error)
[2017-04-20 09:52:35.150938] I [MSGID: 109063] [dht-layout.c:713:dht_layout_normalize] 0-ecv82-dht: Found anomalies in /.trashcan (gfid = 00000000-0000-0000-0000-000000000005). Holes=1 overlaps=0
[2017-04-20 09:52:35.152401] W [MSGID: 109065] [dht-selfheal.c:1410:dht_selfheal_dir_mkdir_lock_cbk] 0-ecv82-dht: acquiring inodelk failed for /.trashcan [Input/output error]

--- Additional comment from Jiffin on 2017-04-27 06:17:27 EDT ---

While trying out this bug, I found the following: when the volume is started with brick multiplexing enabled, ".trashcan" was created on only three of the 20 bricks (2x(8+2)), not on all of the subvolumes. I have a gut feeling that this might be the cause of this bug and of bz1443939.

P.S.: I don't have enough knowledge about brick multiplexing to comment on why this happens, and the test was performed on my workstation. Also, if possible, I request QA to retest the above scenario using the following steps (a command sketch follows the list):
1.) Create the volume.
2.) Start the volume.
3.) Enable brick multiplexing.
4.) Restart the volume (stop and start).
5.) Then retest the case.
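A possible concrete form of that retest, plus a brick-side check for .trashcan (a sketch only: it assumes the ecv82 layout from the 'gluster v info' output above, the mount point is made up, and the /rhs/brick{1..4} glob must be run on each brick server):

# gluster volume create ecv82 disperse-data 8 redundancy 2 <20 bricks> force
# gluster volume start ecv82
# gluster volume set all cluster.brick-multiplex enable
# gluster volume stop ecv82 && gluster volume start ecv82
# mount -t glusterfs 10.70.35.138:/ecv82 /mnt/ecv82 && ls -lA /mnt/ecv82     <-- expect .trashcan to list without EIO
# for b in /rhs/brick{1..4}/ecv82; do ls -ld "$b/.trashcan"; done            <-- on each server: .trashcan should exist on every brick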
REVIEW: https://review.gluster.org/17225 (glusterfsd: send PARENT_UP on brick attach) posted (#1) for review on master by Atin Mukherjee (amukherj)
REVIEW: https://review.gluster.org/17225 (glusterfsd: send PARENT_UP on brick attach) posted (#2) for review on master by Atin Mukherjee (amukherj)
REVIEW: https://review.gluster.org/17225 (glusterfsd: send PARENT_UP on brick attach) posted (#3) for review on master by Atin Mukherjee (amukherj)
REVIEW: https://review.gluster.org/17225 (glusterfsd: send PARENT_UP on brick attach) posted (#4) for review on master by Atin Mukherjee (amukherj)
REVIEW: https://review.gluster.org/17225 (glusterfsd: send PARENT_UP on brick attach) posted (#5) for review on master by Atin Mukherjee (amukherj)
REVIEW: https://review.gluster.org/17225 (glusterfsd: send PARENT_UP on brick attach) posted (#6) for review on master by Atin Mukherjee (amukherj)
REVIEW: https://review.gluster.org/17225 (glusterfsd: send PARENT_UP on brick attach) posted (#7) for review on master by Atin Mukherjee (amukherj)
COMMIT: https://review.gluster.org/17225 committed in master by Jeff Darcy (jeff.us)
------
commit 86ad032949cb80b6ba3df9dc8268243529d4eb84
Author: Atin Mukherjee <amukherj>
Date:   Tue May 9 21:05:50 2017 +0530

    glusterfsd: send PARENT_UP on brick attach

    With brick multiplexing enabled, if a brick instance is attached to a
    running brick process, a PARENT_UP event is needed so that it reaches
    all the way down to the posix layer, and from posix a CHILD_UP event
    is then sent back to all the children.

    Change-Id: Ic341086adb3bbbde0342af518e1b273dd2f669b9
    BUG: 1447389
    Signed-off-by: Atin Mukherjee <amukherj>
    Reviewed-on: https://review.gluster.org/17225
    NetBSD-regression: NetBSD Build System <jenkins.org>
    Smoke: Gluster Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Jeff Darcy <jeff.us>
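A rough way to exercise the attach path this patch touches from the CLI (a sketch only; it assumes another multiplexed volume is already started, as in the original report, and the volume name ecv-new, brick list, server and mount point are made up for illustration):

# gluster volume create ecv-new disperse-data 8 redundancy 2 <bricks> force
# gluster volume start ecv-new                       <-- bricks attach to the already-running brick process
# gluster volume status                              <-- ecv-new brick PIDs should match the existing volume's brick PIDs
# mount -t glusterfs <server>:/ecv-new /mnt/ecv-new
# ls -lA /mnt/ecv-new                                <-- with the fix, .trashcan should resolve without EIO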
The patch at https://review.gluster.org/17225 is broken; moving this bug back to ASSIGNED.
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.12.0, please open a new bug report.

glusterfs-3.12.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-September/000082.html
[2] https://www.gluster.org/pipermail/gluster-users/