Description of problem:
=======================
Had a 2x2 distributed-replicate cold tier and a 2x2 distributed-replicate hot tier on a 6-node cluster. I/O was running from 2 clients, one over FUSE and one over NFS, on different directory paths. Took a couple of snapshots _while_ the I/O was going on. Stopped the I/O, took the volume offline, did a snapshot restore, and brought the volume back online.

Post that, did a 'rm -rf *' on the FUSE as well as the NFS client in their respective directory paths. The command on the FUSE client failed with "Directory not empty" on one of the nested directory paths, yet the mountpoint does not list any file/directory at that location. The backend bricks were checked, and they showed 1 file in the hot tier and the corresponding T (linkto) file in the cold tier. In both tiers the file was present on only ONE brick of its replica pair, not on both. However, 'gluster volume heal info' showed no pending heals. When attached to gdb, it showed that READDIRP was served from the replica brick which did not have the file, so the directory contents were shown as empty at the mountpoint.

Ravishankar suggested a workaround: execute 'ls <filename>' or 'stat <filename>' on the mountpoint path, thereby triggering an internal heal, and after that run 'rm -rf *' again, which should successfully delete the file and the directory. That worked!

As to why the filesystem ended up in this state: one hypothesis is that the 'snapshot create' command was triggered at the very instant when the file had been written to only _one_ brick of the replica pair (or some other race around that point), and the corresponding snapshot restore then brought the filesystem back to that same inconsistent state.
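For reference, a minimal sketch of the workaround as it was applied here (mount path and filename taken from the client/brick logs further below; substitute your own path and the name of the file that is present only on the bricks):

  # On the FUSE mount: name-lookup the missing file explicitly so AFR's
  # lookup path heals the entry, then retry the cleanup.
  cd /mnt/vola_new/fuse/crefi/level05/level15/level25
  stat '58ad2d28%%4GV4TOB7H4'      # or: ls '58ad2d28%%4GV4TOB7H4'
  cd /mnt/vola_new/fuse/crefi
  rm -rf *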
Sosreports will be copied to http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/<bugnumber>/

Version-Release number of selected component (if applicable):
=============================================================
3.8.4-14

How reproducible:
================
1:1

Additional info:
================
[root@dhcp46-239 ~]# gluster v info vola

Volume Name: vola
Type: Tier
Volume ID: 48ab6954-765c-46e2-846b-cfc1412c96ad
Status: Started
Snapshot Count: 2
Number of Bricks: 8
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick1: 10.70.46.222:/run/gluster/snaps/454807c64c5249a6a2c8fe20def23806/brick1/vola_tier3
Brick2: 10.70.46.221:/run/gluster/snaps/454807c64c5249a6a2c8fe20def23806/brick2/vola_tier2
Brick3: 10.70.46.222:/run/gluster/snaps/454807c64c5249a6a2c8fe20def23806/brick3/vola_tier1
Brick4: 10.70.46.221:/run/gluster/snaps/454807c64c5249a6a2c8fe20def23806/brick4/vola_tier0
Cold Tier:
Cold Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick5: 10.70.46.239:/run/gluster/snaps/454807c64c5249a6a2c8fe20def23806/brick5/vola_0
Brick6: 10.70.46.240:/run/gluster/snaps/454807c64c5249a6a2c8fe20def23806/brick6/vola_1
Brick7: 10.70.46.242:/run/gluster/snaps/454807c64c5249a6a2c8fe20def23806/brick7/vola_2
Brick8: 10.70.46.218:/run/gluster/snaps/454807c64c5249a6a2c8fe20def23806/brick8/vola_3
Options Reconfigured:
cluster.tier-mode: cache
features.ctr-enabled: on
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: off
features.quota: off
features.inode-quota: off
features.quota-deem-statfs: off

[root@dhcp46-239 ~]# gluster snap list
ozone_snap1
ozone_snap2
ozone_snap3
vola_snap1_GMT-2017.02.22-06.03.37
vola_snap2

[root@dhcp46-239 ~]# rpm -qa | grep gluster
glusterfs-events-3.8.4-14.el7rhgs.x86_64
glusterfs-3.8.4-14.el7rhgs.x86_64
glusterfs-server-3.8.4-14.el7rhgs.x86_64
glusterfs-debuginfo-3.8.4-15.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-14.el7rhgs.x86_64
glusterfs-fuse-3.8.4-14.el7rhgs.x86_64
glusterfs-rdma-3.8.4-14.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
glusterfs-api-3.8.4-14.el7rhgs.x86_64
glusterfs-libs-3.8.4-14.el7rhgs.x86_64
glusterfs-cli-3.8.4-14.el7rhgs.x86_64
vdsm-gluster-4.17.33-1.1.el7rhgs.noarch
python-gluster-3.8.4-14.el7rhgs.noarch
glusterfs-geo-replication-3.8.4-14.el7rhgs.x86_64
gluster-nagios-addons-0.2.8-1.el7rhgs.x86_64

[root@dhcp46-239 ~]# gluster peer status
Number of Peers: 5

Hostname: dhcp46-242.lab.eng.blr.redhat.com
Uuid: 838465bf-1fd8-4f85-8599-dbc8367539aa
State: Peer in Cluster (Connected)
Other names:
10.70.46.242

Hostname: 10.70.46.240
Uuid: 5bff39d7-cd9c-4dbb-86eb-2a7ba6dfea3d
State: Peer in Cluster (Connected)

Hostname: 10.70.46.218
Uuid: c2fbc432-b7a9-4db1-9b9d-a8d82e998923
State: Peer in Cluster (Connected)

Hostname: 10.70.46.221
Uuid: 1277cf78-640e-46e8-a3d1-46e067508814
State: Peer in Cluster (Connected)

Hostname: 10.70.46.222
Uuid: 81184471-cbf7-47aa-ba41-21f32bb644b0
State: Peer in Cluster (Connected)

[root@dhcp46-239 ~]# gluster v status vola
Status of volume: vola
Gluster process                                                                           TCP Port  RDMA Port  Online  Pid
---------------------------------------------------------------------------------------------------------------------------
Hot Bricks:
Brick 10.70.46.222:/run/gluster/snaps/454807c64c5249a6a2c8fe20def23806/brick1/vola_tier3  49153     0          Y       10182
Brick 10.70.46.221:/run/gluster/snaps/454807c64c5249a6a2c8fe20def23806/brick2/vola_tier2  49153     0          Y       9472
Brick 10.70.46.222:/run/gluster/snaps/454807c64c5249a6a2c8fe20def23806/brick3/vola_tier1  49156     0          Y       10202
Brick 10.70.46.221:/run/gluster/snaps/454807c64c5249a6a2c8fe20def23806/brick4/vola_tier0  49156     0          Y       9492
Cold Bricks:
Brick 10.70.46.239:/run/gluster/snaps/454807c64c5249a6a2c8fe20def23806/brick5/vola_0      49152     0          Y       29348
Brick 10.70.46.240:/run/gluster/snaps/454807c64c5249a6a2c8fe20def23806/brick6/vola_1      49152     0          Y       29232
Brick 10.70.46.242:/run/gluster/snaps/454807c64c5249a6a2c8fe20def23806/brick7/vola_2      49153     0          Y       26742
Brick 10.70.46.218:/run/gluster/snaps/454807c64c5249a6a2c8fe20def23806/brick8/vola_3      49155     0          Y       13921
NFS Server on localhost                                                                   2049      0          Y       14049
Self-heal Daemon on localhost                                                             N/A       N/A        Y       29398
NFS Server on dhcp46-242.lab.eng.blr.redhat.com                                           2049      0          Y       14531
Self-heal Daemon on dhcp46-242.lab.eng.blr.redhat.com                                     N/A       N/A        Y       26803
NFS Server on 10.70.46.240                                                                2049      0          Y       8021
Self-heal Daemon on 10.70.46.240                                                          N/A       N/A        Y       29281
NFS Server on 10.70.46.218                                                                2049      0          Y       21403
Self-heal Daemon on 10.70.46.218                                                          N/A       N/A        Y       13961
NFS Server on 10.70.46.222                                                                2049      0          Y       21343
Self-heal Daemon on 10.70.46.222                                                          N/A       N/A        Y       10253
NFS Server on 10.70.46.221                                                                2049      0          Y       20786
Self-heal Daemon on 10.70.46.221                                                          N/A       N/A        Y       9545

Task Status of Volume vola
------------------------------------------------------------------------------
There are no active volume tasks

[root@dhcp46-239 ~]#

<COLD TIER>
===========
[root@dhcp46-240 ~]# ll /run/gluster/snaps/454807c64c5249a6a2c8fe20def23806/brick6/vola_1/fuse/crefi/level05/level15/level25/58ad2d28%%4GV4TOB7H4
---------T. 2 root root 0 Feb 22 11:48 /run/gluster/snaps/454807c64c5249a6a2c8fe20def23806/brick6/vola_1/fuse/crefi/level05/level15/level25/58ad2d28%%4GV4TOB7H4
[root@dhcp46-240 ~]#

<HOT TIER>
==========
[root@dhcp46-222 ~]# ll /run/gluster/snaps/454807c64c5249a6a2c8fe20def23806/brick1/vola_tier3/fuse/crefi/level05/level15/level25/
-rw-r--r--. 2 root root 0 Feb 22 11:48 /run/gluster/snaps/454807c64c5249a6a2c8fe20def23806/brick1/vola_tier3/fuse/crefi/level05/level15/level25/58ad2d28%%4GV4TOB7H4
[root@dhcp46-222 ~]#

[root@dhcp46-221 ~]# ll /run/gluster/snaps/454807c64c5249a6a2c8fe20def23806/brick2/vola_tier2/fuse/crefi/level05/level15/level25/
total 0
[root@dhcp46-221 ~]#

========================
CLIENT LOGS
========================
[root@dhcp46-245 ~]# cd /mnt/vola_new/fuse/crefi/
[root@dhcp46-245 crefi]# ls
level05
[root@dhcp46-245 crefi]# rm -rf *
rm: cannot remove ‘level05/level15/level25’: Directory not empty
[root@dhcp46-245 crefi]# ls -al level05/level15/level25
total 8
drwxr-xr-x. 2 root root 4096 Feb 22 12:11 .
drwxr-xr-x. 3 root root 4096 Feb 22 12:12 ..
[root@dhcp46-245 crefi]# pwd
/mnt/vola_new/fuse/crefi
[root@dhcp46-245 crefi]#
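As an aside, a hedged sketch of how the 0-byte sticky-bit entry seen on the cold-tier brick above could be confirmed to be a DHT link file, and how its replication xattrs could be inspected (assumes the attr package is installed on the brick node; exact xattr names can vary across versions):

  # On the cold-tier brick node (dhcp46-240) that holds the ---------T entry:
  getfattr -d -m . -e hex \
    /run/gluster/snaps/454807c64c5249a6a2c8fe20def23806/brick6/vola_1/fuse/crefi/level05/level15/level25/58ad2d28%%4GV4TOB7H4
  # A trusted.glusterfs.dht.linkto xattr marks it as a DHT link file pointing
  # at the subvolume that holds the data file; no non-zero trusted.afr.* xattrs
  # would be consistent with 'gluster volume heal info' reporting nothing.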
[qe@rhsqe-repo 1426128]$ pwd
/home/repo/sosreports/1426128
[qe@rhsqe-repo 1426128]$ hostname
rhsqe-repo.lab.eng.blr.redhat.com
[qe@rhsqe-repo 1426128]$ ll
total 339168
-rwxr-xr-x. 1 qe qe 58412064 Feb 23 14:55 sosreport-dhcp46-218.lab.eng.blr.redhat.com-20170222175628.tar.xz
-rwxr-xr-x. 1 qe qe 48999220 Feb 23 14:55 sosreport-dhcp46-221.lab.eng.blr.redhat.com-20170222175638.tar.xz
-rwxr-xr-x. 1 qe qe 47173564 Feb 23 14:55 sosreport-dhcp46-222.lab.eng.blr.redhat.com-20170222175652.tar.xz
-rwxr-xr-x. 1 qe qe 61076304 Feb 23 14:55 sosreport-dhcp46-239.lab.eng.blr.redhat.com-20170222175607.tar.xz
-rwxr-xr-x. 1 qe qe 61326704 Feb 23 14:55 sosreport-dhcp46-240.lab.eng.blr.redhat.com-20170222175612.tar.xz
-rwxr-xr-x. 1 qe qe 70310468 Feb 23 14:55 sosreport-dhcp46-242.lab.eng.blr.redhat.com-20170222175620.tar.xz
[qe@rhsqe-repo 1426128]$
Some additional information: Just to confirm the snapshot-vs-create race hypothesized in this BZ, I also looked at the xattrs of the file/T-file and the corresponding parent directories on the bricks of the replica. There were no AFR xattrs on them, and there were no entry-self-heal log messages in the mount or shd logs.

Pranith,
1) I'm wondering if we should document this as a known issue, with the workaround that we need to stat the file from the mount if rmdir fails with ENOTEMPTY. This would involve comparing the directory on the bricks of the replica subvol to find the list of missing files in the first place (see the sketch below).
2) Disabling optimistic-change-log for entry transactions could solve the issue, since the presence of the dirty xattr on the parent directory can then trigger a heal (conservative merge). But this option is not exposed via the CLI, and even if it were, toggling it every time we take a snapshot does not seem practical.
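For item 1, a rough sketch of what that brick-level comparison might look like, using the cold-tier replica pair from this setup (brick5 on 10.70.46.239 and brick6 on 10.70.46.240); in practice this would have to be repeated for every replica pair and every suspect directory:

  # List the same directory on both bricks of a replica pair and show entries
  # present on only one of them (run from a node with ssh access to both).
  DIR=fuse/crefi/level05/level15/level25
  diff \
    <(ssh 10.70.46.239 "ls /run/gluster/snaps/454807c64c5249a6a2c8fe20def23806/brick5/vola_0/$DIR") \
    <(ssh 10.70.46.240 "ls /run/gluster/snaps/454807c64c5249a6a2c8fe20def23806/brick6/vola_1/$DIR")

Each entry reported by diff would then need a 'stat <entry>' from the mount to trigger the lookup-driven heal before retrying the rmdir.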
Edited the doc text slightly for the release notes
What's the plan for addressing this bug? Are we going to address this known issue in the near future?