Bug 1508999 - [Fuse Sub-dir] After performing add-brick on volume,doing rm -rf * on subdir mount point fails with "Transport endpoint is not connected"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: fuse
Version: rhgs-3.3
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Target Release: RHGS 3.4.0
Assignee: Amar Tumballi
QA Contact: Manisha Saini
URL:
Whiteboard:
Depends On:
Blocks: 1503134 1549915
Reported: 2017-11-02 16:17 UTC by Manisha Saini
Modified: 2018-09-20 16:00 UTC (History)
CC List: 9 users

Fixed In Version: glusterfs-3.12.2-5
Doc Type: Bug Fix
Doc Text:
Clients that mount a subdirectory ('subdir' mounts) could not heal the directory structure after an 'add-brick' operation, because during directory self-heal the distribute layer does not know the parent directories of the mounted subdirectory. The fix is to mount the volume (without the subdirectory) on one of the servers after add-brick and run self-heal operations on the 'subdir' directories. This is now performed by 'hook' scripts, so no user intervention is required.
Clone Of:
Clones: 1549915
Environment:
Last Closed: 2018-09-04 06:38:02 UTC
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2018:2607 0 None None None 2018-09-04 06:39:45 UTC

Description Manisha Saini 2017-11-02 16:17:47 UTC
Description of problem:

When a sub-dir is mounted on the client and add-brick is performed, running rm -rf * on the mount point fails to delete the directories present there.



Version-Release number of selected component (if applicable):
glusterfs-api-3.8.4-51.el7rhgs.x86_64

How reproducible:
2/2

Steps to Reproduce:
1. Create a 3 x (2 + 1) = 9 arbiter volume.
2. Mount the volume on the client via FUSE.
3. Create a directory, say "dir1", inside the mount point.
4. Set permissions for the directory on the volume:
# gluster v set glustervol auth.allow "/dir1(10.70.37.192)"
volume set: success

5. Mount the sub-dir "dir1" on the client:
# mount -t glusterfs dhcp42-125.lab.eng.blr.redhat.com:glustervol/dir1 /mnt/posix_Parent/

6. Create 1000 directories on the mount point.

7. Perform add-brick:
# gluster v add-brick glustervol dhcp42-127.lab.eng.blr.redhat.com:/gluster/brick3/3 dhcp42-129.lab.eng.blr.redhat.com:/gluster/brick3/3 dhcp42-119.lab.eng.blr.redhat.com:/gluster/brick3/3
volume add-brick: success

8. After performing add-brick, run rm -rf * on the mount point.
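
The directory-creation step could be scripted roughly as follows. This is only a sketch: the helper name create_test_dirs is hypothetical, and the "sd" name prefix is inferred from the rm output shown in this report rather than stated in the steps.

```shell
# Hypothetical helper: create COUNT directories named sd1..sdCOUNT under TARGET.
# The "sd" prefix is an assumption taken from the rm output in this report.
create_test_dirs() (
    target="$1"
    count="$2"
    i=1
    while [ "$i" -le "$count" ]; do
        mkdir -p "${target}/sd${i}" || exit 1
        i=$((i + 1))
    done
)

# Example, using the subdir mount point from the steps above:
#   create_test_dirs /mnt/posix_Parent 1000
```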




Actual results:

rm -rf * on the mount point fails with "Transport endpoint is not connected", even though the subdir is mounted on the client.

rm: cannot remove ‘sd979’: Transport endpoint is not connected
rm: cannot remove ‘sd98’: Transport endpoint is not connected
rm: cannot remove ‘sd980’: Transport endpoint is not connected
rm: cannot remove ‘sd981’: Transport endpoint is not connected
rm: cannot remove ‘sd982’: Transport endpoint is not connected
rm: cannot remove ‘sd983’: Transport endpoint is not connected
rm: cannot remove ‘sd984’: Transport endpoint is not connected
rm: cannot remove ‘sd985’: Transport endpoint is not connected
rm: cannot remove ‘sd986’: Transport endpoint is not connected

# df
Filesystem                                         1K-blocks      Used  Available Use% Mounted on
/dev/mapper/rhel_dhcp37--192-root                   17811456   2959232   14852224  17% /
devtmpfs                                             1930048         0    1930048   0% /dev
tmpfs                                                1940904         0    1940904   0% /dev/shm
tmpfs                                                1940904     75644    1865260   4% /run
tmpfs                                                1940904         0    1940904   0% /sys/fs/cgroup
/dev/sda1                                            1038336    219524     818812  22% /boot
rhsqe-repo.lab.eng.blr.redhat.com:/opt            1953887232 405861376 1448791040  22% /opt
tmpfs                                                 388184         0     388184   0% /run/user/0
dhcp42-125.lab.eng.blr.redhat.com:glustervol/dir1   62539776    163712   62376064   1% /mnt/posix_Parent


Client mount logs:

================

[2017-11-02 15:48:31.961601] W [fuse-bridge.c:1355:fuse_unlink_cbk] 0-glusterfs-fuse: 11954: RMDIR() /sd974 => -1 (Transport endpoint is not connected)
[2017-11-02 15:48:31.977846] W [fuse-bridge.c:1355:fuse_unlink_cbk] 0-glusterfs-fuse: 11962: RMDIR() /sd975 => -1 (Transport endpoint is not connected)
[2017-11-02 15:48:31.992027] W [fuse-bridge.c:1355:fuse_unlink_cbk] 0-glusterfs-fuse: 11970: RMDIR() /sd976 => -1 (Transport endpoint is not connected)
[2017-11-02 15:48:32.003641] W [fuse-bridge.c:1355:fuse_unlink_cbk] 0-glusterfs-fuse: 11978: RMDIR() /sd977 => -1 (Transport endpoint is not connected)
[2017-11-02 15:48:32.014953] W [fuse-bridge.c:1355:fuse_unlink_cbk] 0-glusterfs-fuse: 11986: RMDIR() /sd978 => -1 (Transport endpoint is not connected)
[2017-11-02 15:48:32.026279] W [fuse-bridge.c:1355:fuse_unlink_cbk] 0-glusterfs-fuse: 11994: RMDIR() /sd979 => -1 (Transport endpoint is not connected)
[2017-11-02 15:48:32.038034] W [fuse-bridge.c:1355:fuse_unlink_cbk] 0-glusterfs-fuse: 12002: RMDIR() /sd98 => -1 (Transport endpoint is not connected)
[2017-11-02 15:48:32.050025] W [fuse-bridge.c:1355:fuse_unlink_cbk] 0-glusterfs-fuse: 12010: RMDIR() /sd980 => -1 (Transport endpoint is not connected)
[2017-11-02 15:48:32.062428] W [fuse-bridge.c:1355:fuse_unlink_cbk] 0-glusterfs-fuse: 12018: RMDIR() /sd981 => -1 (Transport endpoint is not connected)
[2017-11-02 15:48:32.074406] W [fuse-bridge.c:1355:fuse_unlink_cbk] 0-glusterfs-fuse: 12026: RMDIR() /sd982 => -1 (Transport endpoint is not connected)
[2017-11-02 15:48:32.085732] W [fuse-bridge.c:1355:fuse_unlink_cbk] 0-glusterfs-fuse: 12034: RMDIR() /sd983 => -1 (Transport endpoint is not connected)
[2017-11-02 15:48:32.099808] W [fuse-bridge.c:1355:fuse_unlink_cbk] 0-glusterfs-fuse: 12042: RMDIR() /sd984 => -1 (Transport endpoint is not connected)
[2017-11-02 15:48:32.114376] W [fuse-bridge.c:1355:fuse_unlink_cbk] 0-glusterfs-fuse: 12050: RMDIR() /sd985 => -1 (Transport endpoint is not connected)
[2017-11-02 15:48:32.128303] W [fuse-bridge.c:1355:fuse_unlink_cbk] 0-glusterfs-fuse: 12058: RMDIR() /sd986 => -1 (Transport endpoint is not connected)
[2017-11-02 15:48:32.146789] W [fuse-bridge.c:1355:fuse_unlink_cbk] 0-glusterfs-fuse: 12066: RMDIR() /sd987 => -1 (Transport endpoint is not connected)
[2017-11-02 15:48:32.161191] W [fuse-bridge.c:1355:fuse_unlink_cbk] 0-glusterfs-fuse: 12074: RMDIR() /sd988 => -1 (Transport endpoint is not connected)
[2017-11-02 15:48:32.174965] W [fuse-bridge.c:1355:fuse_unlink_cbk] 0-glusterfs-fuse: 12082: RMDIR() /sd989 => -1 (Transport endpoint is not connected)
[2017-11-02 15:48:32.189015] W [fuse-bridge.c:1355:fuse_unlink_cbk] 0-glusterfs-fuse: 12090: RMDIR() /sd99 => -1 (Transport endpoint is not connected)
=================

Expected results:
rm -rf * should delete the directories present on the mount point.

Additional info:

Attaching sosreports shortly

Comment 3 Amar Tumballi 2017-11-03 05:18:50 UTC
I propose making this a known issue, as the feature is in Tech Preview (TP).

Either of the following steps resolves this issue:

* After the 'add-brick' operation, do a 'stat ${all_subdirs_exported}' on the full volume mount, and then continue operations on the subdir mount points.

Or,

* After the 'add-brick' operation, run 'rebalance' (even 'rebalance fix-layout' alone is enough), and then continue the rm -rf operations on the subdir mount points.
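
A minimal sketch of the first workaround, assuming the whole volume (not a subdirectory) is temporarily mounted somewhere such as /mnt/fullvol. The helper name lookup_subdirs and that mount path are hypothetical; the volume and host names are the ones from this report.

```shell
# Hypothetical helper: stat each exported subdirectory through a mount of the
# full volume. MOUNT_POINT must be a mount of the whole volume, not a subdir.
lookup_subdirs() (
    mount_point="$1"
    shift
    status=0
    for subdir in "$@"; do
        # Strip a leading "/" so both "/dir1" and "dir1" are accepted.
        if ! stat "${mount_point}/${subdir#/}" > /dev/null; then
            echo "lookup failed: ${subdir}" >&2
            status=1
        fi
    done
    exit "$status"
)

# Example (the /mnt/fullvol path is an assumption):
#   mount -t glusterfs dhcp42-125.lab.eng.blr.redhat.com:glustervol /mnt/fullvol
#   lookup_subdirs /mnt/fullvol /dir1
#   umount /mnt/fullvol
#
# The second workaround, a layout-only rebalance:
#   gluster volume rebalance glustervol fix-layout start
```

The helper returns non-zero if any subdirectory could not be looked up, so a caller can hold off resuming operations on the subdir mounts.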

Comment 4 Amar Tumballi 2017-11-03 06:36:17 UTC
https://review.gluster.org/18645 is one way to fix it, but the patch needs more review and testing. It doesn't look like we can fix it by 3.3.1, so I still recommend treating this as a known issue.

Marking it as POST, as the RCA is known and a patch to handle it automatically is posted upstream. (Note that we may need a similar hook script for replace-brick too.)
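
To run the lookups automatically, a script first needs the list of exported subdirectories. The sketch below shows one way to derive it from an auth.allow value of the form used in this report. It is an illustration only and not the content of the upstream patch; the function name subdirs_from_auth_allow is hypothetical, and how the option value is read from the volume is left out.

```shell
# Hypothetical helper: print the subdirectory paths named in an auth.allow
# value such as "/dir1(10.70.37.192),/(*)", one per line.
# Entries for the volume root ("/") and entries without a path are skipped.
# Returns non-zero when no subdirectory entry is present.
subdirs_from_auth_allow() {
    printf '%s\n' "$1" |
        tr ',' '\n' |        # one entry per line
        sed -e 's/(.*$//' |  # drop the "(address list)" part
        grep '^/.'           # keep paths other than the bare "/"
}

# Example with the value set earlier in this report:
#   subdirs_from_auth_allow "/dir1(10.70.37.192)"
```

Each printed path could then be passed to a stat on a full-volume mount, as described in the workaround in Comment 3.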

Comment 8 Amar Tumballi 2017-11-17 05:30:18 UTC
DocText looks fine.

Comment 14 Manisha Saini 2018-03-28 07:24:23 UTC
Verified this BZ on glusterfs-3.12.2-6.el7rhgs.x86_64

Steps:
1. Create a 4*3 dist-replicate volume.
2. Mount the volume on the client via FUSE.
3. Create 4 dirs inside the mount point.
4. Set auth.allow permissions on the volume:
# gluster v set Ganeshavol1 auth.allow "/dir1(10.70.46.125),/dir2(10.70.46.20),/dir3(10.70.47.33),/(*)"
5. Mount the subdirs on the respective clients.
6. Perform some I/O (create directories).
7. Perform an add-brick operation on that volume:
# gluster v add-brick Ganeshavol1 dhcp47-193.lab.eng.blr.redhat.com:/gluster/brick1/new1 dhcp46-116.lab.eng.blr.redhat.com:/gluster/brick1/new1 dhcp46-184.lab.eng.blr.redhat.com:/gluster/brick1/new1
8. Perform rm -rf * from all the mount points.


Moving this BZ to verified state.

Comment 17 errata-xmlrpc 2018-09-04 06:38:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2607

Comment 18 Amar Tumballi 2018-09-04 07:22:57 UTC
Made minor changes, and everything looks good now, IMO.

