Description of problem:
=======================
Created an ec volume (8+4) and attached a rep 2 tier volume. Created 1000 files and waited till demotion of all files. Repeated the step till 5000 files. During file creation, brought down 2 of the cold tier bricks and all the files got healed.

Some of the files didn't get demoted and file type shows as DBase 3 data file while it should be a data file. Checked the gfid's of the file on cold and hot tier and both are different. The file sizes are also different.

[root@transformers ~]# file /rhs/brick1/b1/files/testfile.5241
/rhs/brick1/b1/files/testfile.5241: sticky data
[root@transformers ~]# file /rhs/brick12/vol1-tier1/files/testfile.5241
/rhs/brick12/vol1-tier1/files/testfile.5241: DBase 3 data file
[root@transformers ~]#
[root@transformers ~]# ls -lh /rhs/brick1/b1/files/testfile.5241
---------T. 2 root root 2.6M Oct 23 12:12 /rhs/brick1/b1/files/testfile.5241
[root@transformers ~]# ls -lh /rhs/brick12/vol1-tier1/files/testfile.5241
-rw-r--r--. 2 root root 1.0M Oct 26 14:40 /rhs/brick12/vol1-tier1/files/testfile.5241
[root@transformers ~]#
[root@transformers ~]# getfattr -d -e hex -m. /rhs/brick1/b1/files/testfile.5241
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/b1/files/testfile.5241
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.bit-rot.version=0x0200000000000000562728e200093316
trusted.ec.config=0x0000080c04000200
trusted.ec.size=0x0000000001479000
trusted.ec.version=0x00000000000005870000000000000b0e
trusted.gfid=0x91acb23edeb04945a8fc73023d897e41
trusted.pgfid.3eec8567-4b3b-4890-8858-55142006e8e7=0x00000001
trusted.tier-gfid.linkto=0x766f6c312d686f742d64687400

[root@transformers ~]# getfattr -d -e hex -m. /rhs/brick12/vol1-tier1/files/testfile.5241
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick12/vol1-tier1/files/testfile.5241
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x020000000000000056273ba7000c2168
trusted.gfid=0xa5caddeda04441faae826fa3edaff790
trusted.glusterfs.quota.3eec8567-4b3b-4890-8858-55142006e8e7.contri=0x00000000001000000000000000000001
trusted.pgfid.3eec8567-4b3b-4890-8858-55142006e8e7=0x00000001

[root@transformers ~]#

Version-Release number of selected component (if applicable):
=============================================================
3.7.5.0-3

How reproducible:
=================
Tried once

Steps to Reproduce:
===================
As in description

Actual results:
===============
Data corruption

Expected results:
=================
No data corruption

Additional info:
================
sosreports will be copied to rhsqe-repo/sosreports/<bugid>
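The hex-encoded extended attributes in the getfattr output of the description can be decoded to make the mismatch easier to read. The following is only a minimal sketch and not part of the original report: the helper name `xattr_bytes` is hypothetical, the values are copied from the getfattr output, and reading trusted.ec.size as a big-endian byte count is an assumption.

```python
import uuid


def xattr_bytes(hex_value):
    """Convert a 'getfattr -e hex' value such as '0x766f...' to raw bytes.

    Hypothetical helper for illustration only.
    """
    if hex_value.startswith("0x"):
        hex_value = hex_value[2:]
    return bytes.fromhex(hex_value)


# Values copied from the getfattr output in the description.
cold = {
    "trusted.gfid": "0x91acb23edeb04945a8fc73023d897e41",
    "trusted.ec.size": "0x0000000001479000",
    "trusted.tier-gfid.linkto": "0x766f6c312d686f742d64687400",
}
hot = {
    "trusted.gfid": "0xa5caddeda04441faae826fa3edaff790",
}

# The linkto value is a NUL-terminated ASCII string.
linkto = xattr_bytes(cold["trusted.tier-gfid.linkto"]).rstrip(b"\x00").decode("ascii")

# Assumption: trusted.ec.size is a big-endian unsigned integer (bytes).
ec_size = int.from_bytes(xattr_bytes(cold["trusted.ec.size"]), "big")

# GFIDs are 16 raw bytes; printing them as UUIDs makes comparison easier.
cold_gfid = uuid.UUID(bytes=xattr_bytes(cold["trusted.gfid"]))
hot_gfid = uuid.UUID(bytes=xattr_bytes(hot["trusted.gfid"]))

print("cold linkto :", linkto)
print("cold ec.size:", ec_size, "bytes")
print("cold gfid   :", cold_gfid)
print("hot gfid    :", hot_gfid)
print("gfid match  :", cold_gfid == hot_gfid)
```

The decoded linkto value is vol1-hot-dht, which suggests the cold tier copy is still marked as a link to the hot tier, while the two GFIDs differ as stated in the description.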
I tried to reproduce this on cold/hot: 1 x (4 + 2) / 3 x 2

1. Created 5000 50K files.
2. In parallel, killed two cold EC bricks.
3. Waited for all files to demote.
4. Observed that they all demoted successfully. Observed that their size/type was correct.

There have been many EC-related fixes for tiering since this bug was opened. Can QE also try to reproduce it again? If neither of us is able to, we should close it. Otherwise, we should exchange information on how to reproduce the problem.
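For anyone retrying this, the file creation in step 1 could be scripted roughly as below. This is only a sketch under assumptions: the function name `create_files`, the `testfile.N` naming (borrowed from the file names in the description) and the random file contents are illustrative and not the script that was actually used. Killing the bricks and waiting for demotion are not covered.

```python
import os
import tempfile


def create_files(directory, count=5000, size=50 * 1024, prefix="testfile"):
    """Create `count` files of `size` bytes each under `directory`.

    Hypothetical helper; file contents are random bytes.
    Returns the list of created paths.
    """
    os.makedirs(directory, exist_ok=True)
    paths = []
    for i in range(count):
        path = os.path.join(directory, "%s.%d" % (prefix, i))
        with open(path, "wb") as f:
            f.write(os.urandom(size))
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Small demonstration in a temporary directory. For a real run,
    # point `directory` at a directory on the mounted volume instead.
    with tempfile.TemporaryDirectory() as tmp:
        created = create_files(tmp, count=10, size=50 * 1024)
        print("created", len(created), "files")
```

Running it against the client mount (not directly on a brick) while the cold tier bricks are brought down should approximate steps 1 and 2.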
I have tried to reproduce this a couple of times without success. I am closing this bug for now and will reopen it if it is seen again.
Note: this can be a "potential risk" for the feature.