Tiered volume files disappear when a hot brick is failed/restored, until the tier is detached.

Files residing in the hot tier of a volume with a distributed hot tier disappear when a hot tier brick is failed and then restored, and the missing files do not reappear until the tier is detached. Files resident on the hot tier are expected to become inaccessible while the brick holding them is down. When the brick is restored they should come back, but often they do not: the files either show up in an 'ls -lsh' of the mount point as '??????????' entries, or stop showing up at all. In some cases an ls or open on the full path name of the file will bring it back; other times it will not, and the hot tier has to be detached to get it back.

The problem occurs with both NFS and CIFS/FUSE mounts. It was first seen with a Disperse cold tier, but also occurs with a Distribute cold tier. It was first seen on GlusterFS 3.12.14 and has been reproduced on GlusterFS 5.2. Note that this first happened on a production system, and was then reproduced in a lab environment. The test plan is below.

# glusterd -V
glusterfs 5.2

##### Create the brick dirs and the cold tier volume.
# mkdir /exports/cold-brick-1/dir
# mkdir /exports/cold-brick-2/dir
# mkdir /exports/cold-brick-3/dir
# mkdir /exports/hot-brick-1/dir
# mkdir /exports/hot-brick-2/dir
# mkdir /exports/hot-brick-3/dir
# gluster volume create tiered-vol transport tcp 10.0.0.5:/exports/cold-brick-1/dir
volume create: tiered-vol: success: please start the volume to access data
# gluster volume start tiered-vol
volume start: tiered-vol: success

##### Expand the cold tier volume.
# gluster volume add-brick tiered-vol 10.0.0.5:/exports/cold-brick-2/dir/
volume add-brick: success
# gluster volume add-brick tiered-vol 10.0.0.5:/exports/cold-brick-3/dir/
volume add-brick: success

##### Mount the volume.
# gluster volume set tiered-vol nfs.disable off
volume set: success
# mount 127.0.0.1:tiered-vol /mnt/tiered-vol/

##### Create files on the volume; nothing is tiered yet.
# xfs_mkfile 1G /mnt/tiered-vol/file-1
# xfs_mkfile 1G /mnt/tiered-vol/file-2
# xfs_mkfile 1G /mnt/tiered-vol/file-3
# ls -lsh /mnt/tiered-vol/
total 3.0G
1.0G -rw------- 1 root root 1.0G Jan 28 08:49 file-1
1.0G -rw------- 1 root root 1.0G Jan 28 08:50 file-2
1.0G -rw------- 1 root root 1.0G Jan 28 08:50 file-3
# ls -lsh /exports/*brick*/dir/*
1.1G -rw------- 2 root root 1.0G Jan 28 08:50 /exports/cold-brick-2/dir/file-3
1.1G -rw------- 2 root root 1.0G Jan 28 08:49 /exports/cold-brick-3/dir/file-1
1.1G -rw------- 2 root root 1.0G Jan 28 08:50 /exports/cold-brick-3/dir/file-2
# gluster volume info tiered-vol

Volume Name: tiered-vol
Type: Distribute
Volume ID: 0639e4e4-249d-485c-9995-90aa8be9c94e
Status: Started
Snapshot Count: 0
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: 10.0.0.5:/exports/cold-brick-1/dir
Brick2: 10.0.0.5:/exports/cold-brick-2/dir
Brick3: 10.0.0.5:/exports/cold-brick-3/dir
Options Reconfigured:
transport.address-family: inet
nfs.disable: off
# gluster volume status tiered-vol
Status of volume: tiered-vol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.0.0.5:/exports/cold-brick-1/dir    62002     0          Y       120790
Brick 10.0.0.5:/exports/cold-brick-2/dir    62003     0          Y       120929
Brick 10.0.0.5:/exports/cold-brick-3/dir    62004     0          Y       120978
NFS Server on localhost                     2049      0          Y       121103

Task Status of Volume tiered-vol
------------------------------------------------------------------------------
Task                 : Rebalance
ID                   : 13a856c2-f511-475c-b2ff-f6e0190ade50
Status               : completed

##### Kill one of the brick processes, and note that the files on that brick disappear. This is normal and expected.
# kill 120929
# ls -lsh /mnt/tiered-vol/
total 2.0G
1.0G -rw------- 1 root root 1.0G Jan 28 08:49 file-1
1.0G -rw------- 1 root root 1.0G Jan 28 08:50 file-2

##### Start the brick process again, and see that all files are back.
# gluster volume start tiered-vol force
volume start: tiered-vol: success
# ls -lsh /mnt/tiered-vol/
total 3.0G
1.0G -rw------- 1 root root 1.0G Jan 28 08:49 file-1
1.0G -rw------- 1 root root 1.0G Jan 28 08:50 file-2
1.0G -rw------- 1 root root 1.0G Jan 28 08:50 file-3

##### Attach the hot tier, and create new files that are stored there.
# gluster volume tier tiered-vol attach 10.0.0.5:/exports/hot-brick-1/dir 10.0.0.5:/exports/hot-brick-2/dir 10.0.0.5:/exports/hot-brick-3/dir
volume attach-tier: success
# xfs_mkfile 1G /mnt/tiered-vol/file-hot-1
# xfs_mkfile 1G /mnt/tiered-vol/file-hot-2
# xfs_mkfile 1G /mnt/tiered-vol/file-hot-3
# ls -lsh /mnt/tiered-vol/
total 6.0G
1.0G -rw------- 1 root root 1.0G Jan 28 08:49 file-1
1.0G -rw------- 1 root root 1.0G Jan 28 08:50 file-2
1.0G -rw------- 1 root root 1.0G Jan 28 08:50 file-3
1.0G -rw------- 1 root root 1.0G Jan 28 08:58 file-hot-1
1.0G -rw------- 1 root root 1.0G Jan 28 08:58 file-hot-2
1.0G -rw------- 1 root root 1.0G Jan 28 08:59 file-hot-3
# ls -lsh /exports/*brick*/dir/*
1.1G -rw------- 2 root root 1.0G Jan 28 08:50 /exports/cold-brick-2/dir/file-3
   0 ---------T 2 root root    0 Jan 28 08:57 /exports/cold-brick-2/dir/file-hot-1
1.1G -rw------- 2 root root 1.0G Jan 28 08:49 /exports/cold-brick-3/dir/file-1
1.1G -rw------- 2 root root 1.0G Jan 28 08:50 /exports/cold-brick-3/dir/file-2
   0 ---------T 2 root root    0 Jan 28 08:58 /exports/cold-brick-3/dir/file-hot-2
   0 ---------T 2 root root    0 Jan 28 08:58 /exports/cold-brick-3/dir/file-hot-3
1.1G -rw------- 2 root root 1.0G Jan 28 08:58 /exports/hot-brick-2/dir/file-hot-1
1.1G -rw------- 2 root root 1.0G Jan 28 08:58 /exports/hot-brick-3/dir/file-hot-2
1.1G -rw------- 2 root root 1.0G Jan 28 08:59 /exports/hot-brick-3/dir/file-hot-3
# gluster volume status tiered-vol
Status of volume: tiered-vol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Hot Bricks:
Brick 10.0.0.5:/exports/hot-brick-3/dir     62007     0          Y       127766
Brick 10.0.0.5:/exports/hot-brick-2/dir     62006     0          Y       127744
Brick 10.0.0.5:/exports/hot-brick-1/dir     62003     0          Y       127722
Cold Bricks:
Brick 10.0.0.5:/exports/cold-brick-1/dir    62002     0          Y       120790
Brick 10.0.0.5:/exports/cold-brick-2/dir    62005     0          Y       123087
Brick 10.0.0.5:/exports/cold-brick-3/dir    62004     0          Y       120978
Tier Daemon on localhost                    N/A       N/A        Y       127804
NFS Server on localhost                     2049      0          Y       127795

##### Kill a brick process for the distributed hot tier. See that the files stored there cannot be accessed. This is normal and expected; this is a case where things worked correctly.
# kill 127744
# ls -lsh /mnt/tiered-vol/
ls: cannot access /mnt/tiered-vol/file-hot-1: No such file or directory
total 5.0G
1.0G -rw------- 1 root root 1.0G Jan 28 08:49 file-1
1.0G -rw------- 1 root root 1.0G Jan 28 08:50 file-2
1.0G -rw------- 1 root root 1.0G Jan 28 08:50 file-3
   ? ?????????? ? ?    ?       ?            ? file-hot-1
1.0G -rw------- 1 root root 1.0G Jan 28 08:58 file-hot-2
1.0G -rw------- 1 root root 1.0G Jan 28 08:59 file-hot-3
# ls -lsh /mnt/tiered-vol/
ls: cannot access /mnt/tiered-vol/file-hot-1: No such file or directory
total 5.0G
1.0G -rw------- 1 root root 1.0G Jan 28 08:49 file-1
1.0G -rw------- 1 root root 1.0G Jan 28 08:50 file-2
1.0G -rw------- 1 root root 1.0G Jan 28 08:50 file-3
   ? ?????????? ? ?    ?       ?            ? file-hot-1
1.0G -rw------- 1 root root 1.0G Jan 28 08:58 file-hot-2
1.0G -rw------- 1 root root 1.0G Jan 28 08:59 file-hot-3

##### Start the hot tier brick process again, and note that all files are back.
# gluster volume start tiered-vol force
volume start: tiered-vol: success
# ls -lsh /mnt/tiered-vol/
total 6.0G
1.0G -rw------- 1 root root 1.0G Jan 28 08:49 file-1
1.0G -rw------- 1 root root 1.0G Jan 28 08:50 file-2
1.0G -rw------- 1 root root 1.0G Jan 28 08:50 file-3
1.0G -rw------- 1 root root 1.0G Jan 28 08:58 file-hot-1
1.0G -rw------- 1 root root 1.0G Jan 28 08:58 file-hot-2
1.0G -rw------- 1 root root 1.0G Jan 28 08:59 file-hot-3
# ls -lsh /exports/*brick*/dir/*
1.1G -rw------- 2 root root 1.0G Jan 28 08:50 /exports/cold-brick-2/dir/file-3
   0 ---------T 2 root root    0 Jan 28 08:57 /exports/cold-brick-2/dir/file-hot-1
1.1G -rw------- 2 root root 1.0G Jan 28 08:49 /exports/cold-brick-3/dir/file-1
1.1G -rw------- 2 root root 1.0G Jan 28 08:50 /exports/cold-brick-3/dir/file-2
   0 ---------T 2 root root    0 Jan 28 08:58 /exports/cold-brick-3/dir/file-hot-2
   0 ---------T 2 root root    0 Jan 28 08:58 /exports/cold-brick-3/dir/file-hot-3
1.1G -rw------- 2 root root 1.0G Jan 28 08:58 /exports/hot-brick-2/dir/file-hot-1
1.1G -rw------- 2 root root 1.0G Jan 28 08:58 /exports/hot-brick-3/dir/file-hot-2
1.1G -rw------- 2 root root 1.0G Jan 28 08:59 /exports/hot-brick-3/dir/file-hot-3
# gluster volume status tiered-vol
Status of volume: tiered-vol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Hot Bricks:
Brick 10.0.0.5:/exports/hot-brick-3/dir     62007     0          Y       127766
Brick 10.0.0.5:/exports/hot-brick-2/dir     62010     0          Y       130185
Brick 10.0.0.5:/exports/hot-brick-1/dir     62003     0          Y       127722
Cold Bricks:
Brick 10.0.0.5:/exports/cold-brick-1/dir    62002     0          Y       120790
Brick 10.0.0.5:/exports/cold-brick-2/dir    62005     0          Y       123087
Brick 10.0.0.5:/exports/cold-brick-3/dir    62004     0          Y       120978
Tier Daemon on localhost                    N/A       N/A        Y       127804
NFS Server on localhost                     2049      0          Y       130217

##### Kill another brick process for the distributed hot tier. See that the files stored there cannot be accessed.
This time the first 'ls' still shows the missing files as '??????????' entries, but the second one does not show them at all. These files will *not* come back when the brick is restored. This is the problem.
# kill 127766
# ls -lsh /mnt/tiered-vol/
ls: cannot access /mnt/tiered-vol/file-hot-2: No such file or directory
ls: cannot access /mnt/tiered-vol/file-hot-3: No such file or directory
total 4.0G
1.0G -rw------- 1 root root 1.0G Jan 28 08:49 file-1
1.0G -rw------- 1 root root 1.0G Jan 28 08:50 file-2
1.0G -rw------- 1 root root 1.0G Jan 28 08:50 file-3
1.0G -rw------- 1 root root 1.0G Jan 28 08:58 file-hot-1
   ? ?????????? ? ?    ?       ?            ? file-hot-2
   ? ?????????? ? ?    ?       ?            ? file-hot-3
# ls -lsh /mnt/tiered-vol/
total 4.0G
1.0G -rw------- 1 root root 1.0G Jan 28 08:49 file-1
1.0G -rw------- 1 root root 1.0G Jan 28 08:50 file-2
1.0G -rw------- 1 root root 1.0G Jan 28 08:50 file-3
1.0G -rw------- 1 root root 1.0G Jan 28 08:58 file-hot-1

##### Restore the failed brick, but note that the files that were on it are still missing from the mount, even though they still exist on the bricks.
# gluster volume start tiered-vol force
volume start: tiered-vol: success
# ls -lsh /mnt/tiered-vol/
total 4.0G
1.0G -rw------- 1 root root 1.0G Jan 28 08:49 file-1
1.0G -rw------- 1 root root 1.0G Jan 28 08:50 file-2
1.0G -rw------- 1 root root 1.0G Jan 28 08:50 file-3
1.0G -rw------- 1 root root 1.0G Jan 28 08:58 file-hot-1
# ls -lsh /exports/*brick*/dir/*
1.1G -rw------- 2 root root 1.0G Jan 28 08:50 /exports/cold-brick-2/dir/file-3
   0 ---------T 2 root root    0 Jan 28 08:57 /exports/cold-brick-2/dir/file-hot-1
1.1G -rw------- 2 root root 1.0G Jan 28 08:49 /exports/cold-brick-3/dir/file-1
1.1G -rw------- 2 root root 1.0G Jan 28 08:50 /exports/cold-brick-3/dir/file-2
1.1G -rw------- 2 root root 1.0G Jan 28 08:58 /exports/hot-brick-2/dir/file-hot-1
1.1G -rw------- 2 root root 1.0G Jan 28 08:58 /exports/hot-brick-3/dir/file-hot-2
1.1G -rw------- 2 root root 1.0G Jan 28 08:59 /exports/hot-brick-3/dir/file-hot-3

##### Accessing the missing files by their full path sometimes brings them back, but not in this case.
# ls -lsh /mnt/tiered-vol/file-hot-2
ls: cannot access /mnt/tiered-vol/file-hot-2: No such file or directory
# file /mnt/tiered-vol/file-hot-2
/mnt/tiered-vol/file-hot-2: cannot open `/mnt/tiered-vol/file-hot-2' (No such file or directory)
# ls -lsh /mnt/tiered-vol/
total 4.0G
1.0G -rw------- 1 root root 1.0G Jan 28 08:49 file-1
1.0G -rw------- 1 root root 1.0G Jan 28 08:50 file-2
1.0G -rw------- 1 root root 1.0G Jan 28 08:50 file-3
1.0G -rw------- 1 root root 1.0G Jan 28 08:58 file-hot-1

##### Stopping and starting the volume does not help.
# gluster volume stop tiered-vol
Stopping volume will make its data inaccessible. Do you want to continue?
(y/n) y
volume stop: tiered-vol: success
# gluster volume start tiered-vol
volume start: tiered-vol: success
# ls -lsh /mnt/tiered-vol/
total 4.0G
1.0G -rw------- 1 root root 1.0G Jan 28 08:49 file-1
1.0G -rw------- 1 root root 1.0G Jan 28 08:50 file-2
1.0G -rw------- 1 root root 1.0G Jan 28 08:50 file-3
1.0G -rw------- 1 root root 1.0G Jan 28 08:58 file-hot-1
# ls -lsh /exports/*brick*/dir/*
1.1G -rw------- 2 root root 1.0G Jan 28 08:50 /exports/cold-brick-2/dir/file-3
   0 ---------T 2 root root    0 Jan 28 08:57 /exports/cold-brick-2/dir/file-hot-1
1.1G -rw------- 2 root root 1.0G Jan 28 08:49 /exports/cold-brick-3/dir/file-1
1.1G -rw------- 2 root root 1.0G Jan 28 08:50 /exports/cold-brick-3/dir/file-2
1.1G -rw------- 2 root root 1.0G Jan 28 08:58 /exports/hot-brick-2/dir/file-hot-1
1.1G -rw------- 2 root root 1.0G Jan 28 08:58 /exports/hot-brick-3/dir/file-hot-2
1.1G -rw------- 2 root root 1.0G Jan 28 08:59 /exports/hot-brick-3/dir/file-hot-3

##### Detaching the hot tier does usually bring the missing files back.
# gluster volume tier tiered-vol detach start
volume detach tier start: success
ID: cec68278-f0b9-4289-81ab-6f3a60246c3e
# gluster volume tier tiered-vol detach status
volume detach tier status: success
Node       Rebalanced-files  size    scanned  failures  skipped  status       run time in h:m:s
---------  ----------------  ------  -------  --------  -------  -----------  -----------------
localhost  0                 0Bytes  7        0         0        in progress  0:00:21
# ls -lsh /mnt/tiered-vol/
total 6.0G
1.0G -rw------- 1 root root 1.0G Jan 28 08:49 file-1
1.0G -rw------- 1 root root 1.0G Jan 28 08:50 file-2
1.0G -rw------- 1 root root 1.0G Jan 28 08:50 file-3
1.0G -rw------- 1 root root 1.0G Jan 28 08:58 file-hot-1
1.0G -rw------- 1 root root 1.0G Jan 28 08:58 file-hot-2
1.0G -rw------- 1 root root 1.0G Jan 28 08:59 file-hot-3
# ls -lsh /exports/*brick*/dir/*
1.1G -rw------- 2 root root 1.0G Jan 28 08:50 /exports/cold-brick-2/dir/file-3
1.0G ---------T 2 root root 1.0G Jan 28 09:09 /exports/cold-brick-2/dir/file-hot-1
1.1G -rw------- 2 root root 1.0G Jan 28 08:49 /exports/cold-brick-3/dir/file-1
1.1G -rw------- 2 root root 1.0G Jan 28 08:50 /exports/cold-brick-3/dir/file-2
1.0G ---------T 2 root root 1.0G Jan 28 09:09 /exports/cold-brick-3/dir/file-hot-2
1.0G ---------T 2 root root 1.0G Jan 28 09:09 /exports/cold-brick-3/dir/file-hot-3
1.1G -rw---S--T 2 root root 1.0G Jan 28 08:58 /exports/hot-brick-2/dir/file-hot-1
1.1G -rw---S--T 2 root root 1.0G Jan 28 08:58 /exports/hot-brick-3/dir/file-hot-2
1.1G -rw---S--T 2 root root 1.0G Jan 28 08:59 /exports/hot-brick-3/dir/file-hot-3
# gluster volume tier tiered-vol detach status
volume detach tier status: success
Node       Rebalanced-files  size    scanned  failures  skipped  status       run time in h:m:s
---------  ----------------  ------  -------  --------  -------  -----------  -----------------
localhost  3                 3.0GB   7        0         0        completed    0:01:25
# gluster volume tier tiered-vol detach commit
volume detach tier commit: success
# ls -lsh /mnt/tiered-vol/
total 6.0G
1.0G -rw------- 1 root root 1.0G Jan 28 08:49 file-1
1.0G -rw------- 1 root root 1.0G Jan 28 08:50 file-2
1.0G -rw------- 1 root root 1.0G Jan 28 08:50 file-3
1.0G -rw------- 1 root root 1.0G Jan 28 08:58 file-hot-1
1.0G -rw------- 1 root root 1.0G Jan 28 08:58 file-hot-2
1.0G -rw------- 1 root root 1.0G Jan 28 08:59 file-hot-3
# ls -lsh /exports/*brick*/dir/*
1.1G -rw------- 2 root root 1.0G Jan 28 08:50 /exports/cold-brick-2/dir/file-3
1.0G -rw------- 2 root root 1.0G Jan 28 08:58 /exports/cold-brick-2/dir/file-hot-1
1.1G -rw------- 2 root root 1.0G Jan 28 08:49 /exports/cold-brick-3/dir/file-1
1.1G -rw------- 2 root root 1.0G Jan 28 08:50 /exports/cold-brick-3/dir/file-2
1.0G -rw------- 2 root root 1.0G Jan 28 08:58 /exports/cold-brick-3/dir/file-hot-2
1.0G -rw------- 2 root root 1.0G Jan 28 08:59 /exports/cold-brick-3/dir/file-hot-3
Patch https://review.gluster.org/#/c/glusterfs/+/21331/ removes tier functionality from GlusterFS. The recommendation is to convert your tiered volume to a regular volume (Replicate, Disperse, or plain Distribute) with the "tier detach" command before upgrading, and to use backend features such as dm-cache to provide caching from the backend for better performance and functionality.
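For reference, a minimal sketch of that migration, using the same detach sequence shown in the test plan above. The volume name is hypothetical; substitute your own:

```shell
#!/bin/sh
# Hypothetical volume name -- replace with your own tiered volume.
VOL=tiered-vol

# Start migrating data off the hot tier back onto the cold tier bricks.
gluster volume tier "$VOL" detach start

# Re-run this until the status column reports "completed" for every node.
gluster volume tier "$VOL" detach status

# Once migration is complete, remove the hot tier bricks from the volume.
gluster volume tier "$VOL" detach commit
```

After the commit the volume is a regular (non-tiered) volume and can be upgraded to a release without tier support; caching can then be handled at the backend, e.g. with dm-cache.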
(In reply to hari gowtham from comment #1)
> Patch https://review.gluster.org/#/c/glusterfs/+/21331/ removes tier
> functionality from GlusterFS.

Therefore, closing this one as WONTFIX.