Description of problem:
A tiered volume has bit rot enabled. Whenever a file is signed by the bitd daemon, the file gets promoted and demoted, and this repeats in a continuous loop. After some time, the files in the cold tier no longer have the bit-rot version and signature.

Version-Release number of selected component (if applicable):
glusterfs-3.7.5-5.el6rhs.x86_64

How reproducible:

Steps to Reproduce:
1. Create a tiered volume with an EC cold tier and a replicate hot tier.
2. Enable bit rot on the volume.
3. Set the promote and demote frequency to 360.
4. FUSE mount the volume and create some files.

Actual results:
Promotions and demotions of the files happen in a continuous loop and, after some time, files in the cold tier do not have the bit-rot version and signature.

Expected results:
The bit-rot version and signature should not be missing from files.

Additional info:
SOS Reports are present at the link below: http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1277368/
After some investigation, we (Kasturi and I) observed the following:

1) bitd calculates a signature for the linkto file on the hot tier, and also remembers the bit-rot.version the file had before it was migrated to the cold tier.
2) When a file gets migrated to a tier, the write I/Os that immediately follow are considered part of the migration, and there is no version change until the version timeout.
(Point 2 is not a problem, just an observation.)

Proof:
=====

Setup:
=======
[root@fedora1 test]# gluster volume info

Volume Name: test
Type: Tier
Volume ID: 888f73b8-b5bc-4f0f-91ba-bf8dd39884d5
Status: Started
Number of Bricks: 8
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick1: fedora1:/home/ssd/small_brick3/s3
Brick2: fedora1:/home/ssd/small_brick2/s2
Brick3: fedora1:/home/ssd/small_brick1/s1
Brick4: fedora1:/home/ssd/small_brick0/s0
Cold Tier:
Cold Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick5: fedora1:/home/disk/d1
Brick6: fedora1:/home/disk/d2
Brick7: fedora1:/home/disk/d3
Brick8: fedora1:/home/disk/d4
Options Reconfigured:
features.scrub: Active
features.bitrot: on
features.record-counters: on
features.ctr-enabled: on
performance.readdir-ahead: on
[root@fedora1 test]#

Create a file called "file1":

[root@fedora1 test]# echo "hello world" > file1

This is how the bricks look after the creation of the file:

[root@fedora1 test]# ls -l /home/disk/d* /home/ssd/small_brick*/s*
/home/disk/d1:
total 0

/home/disk/d2:
total 0

/home/disk/d3:
total 0

/home/disk/d4:
total 0

/home/ssd/small_brick0/s0:
total 8
-rw-r--r-- 2 root root 12 Nov 11 14:46 file1

/home/ssd/small_brick1/s1:
total 8
-rw-r--r-- 2 root root 12 Nov 11 14:46 file1

/home/ssd/small_brick2/s2:
total 0

/home/ssd/small_brick3/s3:
total 0
[root@fedora1 test]#

And this is the bit-rot version:

Every 1.0s: getfattr -d -m . -e hex /home/ssd/small_brick0/s0/file1
Wed Nov 11 14:47:41 2015

getfattr: Removing leading '/' from absolute path names
# file: home/ssd/small_brick0/s0/file1
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x02000000000000005643068900006547
trusted.gfid=0xb2cb50645c644b1995299097bd8e44e4

After 2 minutes we see the signature for bit-rot version 2:

getfattr: Removing leading '/' from absolute path names
# file: home/ssd/small_brick0/s0/file1
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.signature=0x010200000000000000a948904f2f0f479b8f8197694b30184b0d2ed1c1cd2a1ec0fb85d299a192a447
trusted.bit-rot.version=0x02000000000000005643068900006547
trusted.gfid=0xb2cb50645c644b1995299097bd8e44e4

Let's write some more data to bump up the version (the file is still in the hot tier):

echo "hello world" >> file1

This is the signature for version 3:

Every 1.0s: getfattr -d -m . -e hex /home/ssd/small_brick0/s0/file1
Wed Nov 11 14:51:43 2015

getfattr: Removing leading '/' from absolute path names
# file: home/ssd/small_brick0/s0/file1
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.signature=0x010300000000000000ec498a36221dd860c6f24ea26cb29cec68a38479496f78e54ce35f34c8106847
trusted.bit-rot.version=0x03000000000000005643068900006547
trusted.gfid=0xb2cb50645c644b1995299097bd8e44e4

Now let the file get demoted to the cold tier. This is how the bricks look after the demotion:

[root@fedora1 test]# ls -l /home/disk/d* /home/ssd/small_brick*/s*
/home/disk/d1:
total 0

/home/disk/d2:
total 0

/home/disk/d3:
total 8
-rw-r--r-- 2 root root 24 Nov 11 14:49 file1

/home/disk/d4:
total 8
-rw-r--r-- 2 root root 24 Nov 11 14:49 file1

/home/ssd/small_brick0/s0:
total 0
---------T 2 root root 0 Nov 11 14:52 file1

/home/ssd/small_brick1/s1:
total 0
---------T 2 root root 0 Nov 11 14:52 file1

/home/ssd/small_brick2/s2:
total 0

/home/ssd/small_brick3/s3:
total 0
[root@fedora1 test]#

The hot tier has the linkto file and the cold tier has the actual file.
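The hex values in these dumps can be split into fields to make the version numbers easier to read. The sketch below is only an illustration: the layout it assumes (one type byte, an 8-byte little-endian version, then a 32-byte digest for the signature; an 8-byte little-endian version followed by two big-endian 32-bit values for the version xattr) is inferred from the dumps in this comment, not taken from the GlusterFS sources, and the helper names are hypothetical.

```python
import struct


def _unhex(value):
    """Turn a getfattr '-e hex' value such as '0x0102...' into bytes."""
    return bytes.fromhex(value[2:] if value.startswith("0x") else value)


def parse_signature(value):
    """Split trusted.bit-rot.signature (layout ASSUMED from the dumps)."""
    raw = _unhex(value)
    (version,) = struct.unpack("<Q", raw[1:9])  # 8-byte little-endian
    return {"type": raw[0], "version": version, "digest": raw[9:].hex()}


def parse_version(value):
    """Split trusted.bit-rot.version (layout ASSUMED from the dumps)."""
    raw = _unhex(value)
    (version,) = struct.unpack("<Q", raw[:8])
    # The trailing 8 bytes look like seconds and microseconds (assumption).
    seconds, microseconds = struct.unpack(">II", raw[8:16])
    return {"version": version, "seconds": seconds,
            "microseconds": microseconds}


if __name__ == "__main__":
    sig = ("0x010300000000000000"
           "ec498a36221dd860c6f24ea26cb29cec68a38479496f78e54ce35f34c8106847")
    ver = "0x03000000000000005643068900006547"
    print(parse_signature(sig))
    print(parse_version(ver))
```

Under these assumptions, the last hot-tier dump above decodes to a signature for version 3 and an on-disk version of 3, which is the "signed and up to date" state.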
These are the xattrs on the linkto file and the actual file immediately after the demotion:

getfattr: Removing leading '/' from absolute path names
# file: home/ssd/small_brick0/s0/file1
trusted.gfid=0xb2cb50645c644b1995299097bd8e44e4
trusted.tier.tier-dht.linkto=0x746573742d636f6c642d64687400

Every 1.0s: getfattr -d -m . -e hex /home/disk/d3/*
Wed Nov 11 14:52:49 2015

getfattr: Removing leading '/' from absolute path names
# file: home/disk/d3/file1
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x02000000000000005643067600008eeb
trusted.gfid=0xb2cb50645c644b1995299097bd8e44e4

Observe that the linkto file has no version or signature now, and the actual file has a fresh version 2.

This is how the xattrs look after the files are signed by bitd:

Every 1.0s: getfattr -d -m . -e hex /home/ssd/small_brick0/s0/file1
Wed Nov 11 14:54:08 2015

getfattr: Removing leading '/' from absolute path names
# file: home/ssd/small_brick0/s0/file1
trusted.bit-rot.signature=0x010100000000000000e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
trusted.gfid=0xb2cb50645c644b1995299097bd8e44e4
trusted.tier.tier-dht.linkto=0x746573742d636f6c642d64687400

Every 1.0s: getfattr -d -m . -e hex /home/disk/d3/*
Wed Nov 11 14:54:38 2015

getfattr: Removing leading '/' from absolute path names
# file: home/disk/d3/file1
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.signature=0x010200000000000000ec498a36221dd860c6f24ea26cb29cec68a38479496f78e54ce35f34c8106847
trusted.bit-rot.version=0x02000000000000005643067600008eeb
trusted.gfid=0xb2cb50645c644b1995299097bd8e44e4

Please observe the xattrs on the linkto file on the hot tier:
1) there is no version on it
2) but there is a signature for the version 3, which was the version on the file when it was last on the hot tier
3) the signature "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855" on the linkto file is the same as the checksum we calculated directly using sha256sum:

[root@fedora1 ~]# sha256sum /home/ssd/small_brick0/s0/file1
e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855  /home/ssd/small_brick0/s0/file1

But it is different from that of the actual file, which is "ec498a36221dd860c6f24ea26cb29cec68a38479496f78e54ce35f34c8106847":

[root@fedora1 ~]# sha256sum /home/disk/d3/file1
ec498a36221dd860c6f24ea26cb29cec68a38479496f78e54ce35f34c8106847  /home/disk/d3/file1

Now let's heat up the file:

[root@fedora1 test]# echo "hello world" >> file1
[root@fedora1 test]#
[root@fedora1 test]#
[root@fedora1 test]# echo "hello world" >> file1
[root@fedora1 test]#

and let it get promoted to the hot tier:

[root@fedora1 test]# ls -l /home/disk/d* /home/ssd/small_brick*/s*
/home/disk/d1:
total 0

/home/disk/d2:
total 0

/home/disk/d3:
total 0

/home/disk/d4:
total 0

/home/ssd/small_brick0/s0:
total 8
-rw-r--r-- 2 root root 48 Nov 11 14:55 file1

/home/ssd/small_brick1/s1:
total 8
-rw-r--r-- 2 root root 48 Nov 11 14:55 file1

/home/ssd/small_brick2/s2:
total 0

/home/ssd/small_brick3/s3:
total 0
[root@fedora1 test]#

Now observe the signature and version on the file in the hot tier immediately after the promotion. We have the signature of the linkto file, marked as version 3, and the version incremented to 4 from the previous stale version.

Every 1.0s: getfattr -d -m . -e hex /home/ssd/small_brick0/s0/file1
Wed Nov 11 18:04:06 2015

getfattr: Removing leading '/' from absolute path names
# file: home/ssd/small_brick0/s0/file1
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.signature=0x010300000000000000e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
trusted.bit-rot.version=0x04000000000000005643068900006547
trusted.gfid=0xb2cb50645c644b1995299097bd8e44e4

After some time bitd signs the file with version 4.

Every 1.0s: getfattr -d -m . -e hex /home/ssd/small_brick0/s0/file1
Wed Nov 11 18:06:03 2015

getfattr: Removing leading '/' from absolute path names
# file: home/ssd/small_brick0/s0/file1
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.signature=0x010400000000000000e54e4db5a177bd4d986796a020f202ec0f90f4aef037769bda181efd79cff9b2
trusted.bit-rot.version=0x04000000000000005643068900006547
trusted.gfid=0xb2cb50645c644b1995299097bd8e44e4
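Two values in the dumps above can be checked without a Gluster setup. The digest stored on the linkto file, e3b0c442...b855, is the SHA-256 of zero bytes of input, which is consistent with bitd having hashed the empty (0-byte) linkto file rather than the migrated data. The linkto xattr value is plain ASCII for the name of the subvolume that holds the data. The sketch below uses only the Python standard library; the function names are hypothetical.

```python
import hashlib

# Values copied from the getfattr dumps above.
LINKTO_SIGNATURE_DIGEST = (
    "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855")
FIRST_SIGNATURE_DIGEST = (
    "a948904f2f0f479b8f8197694b30184b0d2ed1c1cd2a1ec0fb85d299a192a447")
LINKTO_XATTR = "0x746573742d636f6c642d64687400"


def sha256_hex(data):
    """Return the SHA-256 digest of some bytes as lowercase hex."""
    return hashlib.sha256(data).hexdigest()


def decode_linkto(value):
    """Decode a hex-encoded, NUL-terminated linkto xattr value."""
    raw = bytes.fromhex(value[2:] if value.startswith("0x") else value)
    return raw.rstrip(b"\x00").decode("ascii")


if __name__ == "__main__":
    # The linkto file is 0 bytes long, so its checksum is that of no data.
    print("linkto digest is SHA-256 of empty input:",
          sha256_hex(b"") == LINKTO_SIGNATURE_DIGEST)
    # The 12-byte file written by: echo "hello world" > file1
    print("first signature matches file content:",
          sha256_hex(b"hello world\n") == FIRST_SIGNATURE_DIGEST)
    print("linkto target:", decode_linkto(LINKTO_XATTR))
```

If the assumed signature layout (type byte, 8-byte version, digest) is right, this also means the stale signature that survives the promotion describes an empty file, not the 48-byte file now on the hot tier.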
(In reply to Joseph Elwin Fernandes from comment #5)
> This is the signature for the version 3
>
> Every 1.0s: getfattr -d -m . -e hex /home/ssd/small_brick0/s0/file1
> Wed Nov 11 14:51:43 2015
>
> getfattr: Removing leading '/' from absolute path names
> # file: home/ssd/small_brick0/s0/file1
> trusted.afr.dirty=0x000000000000000000000000
> trusted.bit-rot.signature=0x010300000000000000ec498a36221dd860c6f24ea26cb29cec68a38479496f78e54ce35f34c8106847
> trusted.bit-rot.version=0x03000000000000005643068900006547
> trusted.gfid=0xb2cb50645c644b1995299097bd8e44e4

Everything is good up to here.

> The hot tier has the linkto file and cold tier as the actual file.
> And this is the xattrs on the linkto file and actual file immediately after
> the demotion.
>
> getfattr: Removing leading '/' from absolute path names
> # file: home/ssd/small_brick0/s0/file1
> trusted.gfid=0xb2cb50645c644b1995299097bd8e44e4
> trusted.tier.tier-dht.linkto=0x746573742d636f6c642d64687400

The data object gets converted to a link-to file, and the xattrs got removed here. I guess the code migrates the object and does an ftruncate() followed by some setattr() calls. The ftruncate() should have resulted in the version getting incremented. Instead, the version and signature xattrs are missing. Need to examine why.

> Every 1.0s: getfattr -d -m . -e hex /home/disk/d3/*
> Wed Nov 11 14:52:49 2015
>
> getfattr: Removing leading '/' from absolute path names
> # file: home/disk/d3/file1
> trusted.afr.dirty=0x000000000000000000000000
> trusted.bit-rot.version=0x02000000000000005643067600008eeb
> trusted.gfid=0xb2cb50645c644b1995299097bd8e44e4
>
> Observe that the linkto file has no version or signature now and the actual
> file has the new fresh version 2

This is fine.

> This is how the xattrs look like after the signing of the files by bitd
>
> Every 1.0s: getfattr -d -m . -e hex /home/ssd/small_brick0/s0/file1
> Wed Nov 11 14:54:08 2015
>
> getfattr: Removing leading '/' from absolute path names
> # file: home/ssd/small_brick0/s0/file1
> trusted.bit-rot.signature=0x010100000000000000e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
> trusted.gfid=0xb2cb50645c644b1995299097bd8e44e4
> trusted.tier.tier-dht.linkto=0x746573742d636f6c642d64687400

Normally versioning would start with version = 2, so this is fishy in two ways. One, the version xattr is missing (it was already missing when the data object got converted to a link-to file). Two, the signature xattr shows up from nowhere with version = 1, even though the starting version for an object is 2.

> Every 1.0s: getfattr -d -m . -e hex /home/disk/d3/*
> Wed Nov 11 14:54:38 2015
>
> getfattr: Removing leading '/' from absolute path names
> # file: home/disk/d3/file1
> trusted.afr.dirty=0x000000000000000000000000
> trusted.bit-rot.signature=0x010200000000000000ec498a36221dd860c6f24ea26cb29cec68a38479496f78e54ce35f34c8106847
> trusted.bit-rot.version=0x02000000000000005643067600008eeb
> trusted.gfid=0xb2cb50645c644b1995299097bd8e44e4

This is fine.

> now lets heat up the file

Before debugging further, the two "unknowns" relating to the link-to file need to be solved.
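The two anomalies called out in this reply (a signature without a version xattr, and a signature whose version is below the starting version of 2) can be expressed as a small check over a getfattr dump. This is a sketch only: `bitrot_anomalies` is a hypothetical helper, and it assumes the signature layout inferred from the dumps (one type byte followed by an 8-byte little-endian version), which is not confirmed anywhere in this report.

```python
STARTING_VERSION = 2  # per the comment above: objects start at version 2


def bitrot_anomalies(xattrs):
    """Return a list of anomalies for one file's xattrs (name -> hex value)."""
    problems = []
    signature = xattrs.get("trusted.bit-rot.signature")
    if signature is None:
        return problems
    hex_part = signature[2:] if signature.startswith("0x") else signature
    raw = bytes.fromhex(hex_part)
    # ASSUMED layout: byte 0 is a type, bytes 1..8 the signed version.
    signed_version = int.from_bytes(raw[1:9], "little")
    if "trusted.bit-rot.version" not in xattrs:
        problems.append("signature present but version xattr missing")
    if signed_version < STARTING_VERSION:
        problems.append(
            "signature carries version %d, below starting version %d"
            % (signed_version, STARTING_VERSION))
    return problems


if __name__ == "__main__":
    # xattrs of the linkto file on the hot tier, copied from the dump.
    linkto_file = {
        "trusted.bit-rot.signature":
            "0x010100000000000000"
            "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
        "trusted.gfid": "0xb2cb50645c644b1995299097bd8e44e4",
        "trusted.tier.tier-dht.linkto": "0x746573742d636f6c642d64687400",
    }
    for problem in bitrot_anomalies(linkto_file):
        print(problem)
```

Run against the cold-tier copy of the file (version 2, signature for version 2), the same check reports nothing, matching the "This is fine" remarks above.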
[Commenting on behalf of Johnny, who is busy with other things]

So, this is what Johnny (rabhat@) found in the QA cluster: the file which had the bitrot extended attributes missing also had its .glusterfs linkage missing. Bitrot relies on GFID-based operations for its correct functioning, so this looks like the likely cause of the absence of the xattrs on the file.

By the looks of it (given the above cause), this does not look like a bug directly related to bitrot. However, how the file ended up in such a state is currently unknown.
I looked into a fresh setup provided by QE and was able to see the bitrot anomaly once in several runs. The issue always happens in the hot tier and not in the cold tier. Before I could debug further, the file got migrated back to the cold tier and got the version/signature xattrs (cold tier).

To debug further, I changed the promote/demote frequencies to 15 and 300 respectively, for quick promotion and delayed demotion. This triggered something unusual: none of the files are getting promoted to the hot tier, but freshly created files do go to the hot tier and get demoted eventually. Also, demotions now don't leave a link-to file in the hot tier, and the log files get filled with:

/var/log/glusterfs/bricks/rhgs-brick6-b13.log:[2015-11-19 13:32:06.951477] I [MSGID: 115060] [server-rpc-fops.c:890:_gf_server_log_setxattr_failure] 0-vol1-server: 26221: SETXATTR /h11 (47389175-7d5d-48eb-a05a-36a69872defa) ==> trusted.tier.tier-dht.linkto

Setting them back to 240/240 (which is what I got in the setup) seems to get the migrations working. So, my debugging starts again...
> Setting them back to 240/240 (which is what I got in the setup) seems to get
> the migrations working. So, my debugging start again...

Ummm.. migrations are still stuck.
(In reply to Venky Shankar from comment #9)
> > Setting them back to 240/240 (which is what I got in the setup) seems to get
> > the migrations working. So, my debugging start again...
>
> Ummm.. migrations are still stuck.

I get this in the tier log file (<volume>-tier.log):

[2015-11-19 16:28:03.702276] I [MSGID: 109070] [dht-common.c:1840:dht_lookup_linkfile_cbk] 0-vol1-tier-dht: Lookup of //file13 on vol1-cold-dht (following linkfile) failed ,gfid = 2801b8d5-5646-487d-9597-08a6b3087e7c [Invalid argument]
[2015-11-19 16:28:03.705075] I [MSGID: 109069] [dht-common.c:1159:dht_lookup_unlink_stale_linkto_cbk] 0-vol1-tier-dht: Returned with op_ret -1 and op_errno 16 for //file13
[2015-11-19 16:28:03.705109] E [MSGID: 109037] [tier.c:418:tier_migrate_using_query_file] 0-vol1-tier-dht: Failed to do lookup on file file13
[2015-11-19 16:28:03.708875] I [MSGID: 109070] [dht-common.c:1840:dht_lookup_linkfile_cbk] 0-vol1-tier-dht: Lookup of //file13 on vol1-cold-dht (following linkfile) failed ,gfid = b3c92be7-80b5-4a64-b93d-e0f2e8b40325 [Invalid argument]
[2015-11-19 16:28:03.711957] I [MSGID: 109069] [dht-common.c:1159:dht_lookup_unlink_stale_linkto_cbk] 0-vol1-tier-dht: Returned with op_ret -1 and op_errno 16 for //file13
[2015-11-19 16:28:03.711999] E [MSGID: 109037] [tier.c:418:tier_migrate_using_query_file] 0-vol1-tier-dht: Failed to do lookup on file file13
[2015-11-19 16:28:03.990334] W [MSGID: 109023] [dht-rebalance.c:591:__dht_rebalance_create_dst_file] 0-vol1-tier-dht: //file13: failed to set xattr on vol1-hot-dht (Permission denied)
[2015-11-19 16:28:04.337050] W [MSGID: 109023] [dht-rebalance.c:1462:dht_migrate_file] 0-vol1-tier-dht: Migrate file failed://file13: failed to set xattr on vol1-hot-dht (Permission denied)

Any clues?
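For anyone sifting through a large tier log, the lines above follow a regular shape (timestamp, level letter, MSGID, source location, translator name, message), so they can be split with a regular expression. The sketch below is written against the lines quoted in this comment only; the pattern and the function name are assumptions, not an official log format definition. It also maps a numeric op_errno to its symbolic name; on Linux, errno 16 is EBUSY ("Device or resource busy").

```python
import errno
import re

# Pattern derived from the log lines quoted above (an assumption, not a spec).
LOG_RE = re.compile(
    r"\[(?P<timestamp>[^\]]+)\] (?P<level>[A-Z]) "
    r"\[MSGID: (?P<msgid>\d+)\] "
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<function>[^\]]+)\] "
    r"(?P<domain>\S+): (?P<message>.*)"
)


def parse_tier_log_line(line):
    """Split one log line into fields; return None if it does not match."""
    match = LOG_RE.match(line.strip())
    if match is None:
        return None
    info = match.groupdict()
    info["line"] = int(info["line"])
    found = re.search(r"op_errno (\d+)", info["message"])
    if found:
        code = int(found.group(1))
        info["errno"] = code
        # Symbolic names are platform specific; this uses the local table.
        info["errno_name"] = errno.errorcode.get(code, "unknown")
    return info


if __name__ == "__main__":
    sample = (
        "[2015-11-19 16:28:03.705075] I [MSGID: 109069] "
        "[dht-common.c:1159:dht_lookup_unlink_stale_linkto_cbk] "
        "0-vol1-tier-dht: Returned with op_ret -1 and op_errno 16 "
        "for //file13"
    )
    print(parse_tier_log_line(sample))
```

Grouping the parsed lines by `function` and `level` gives a quick view of which step fails first: here the linkfile lookup on the cold tier, then the stale-linkto unlink, then the setxattr on the hot tier.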
I will have a look at this.
I just cross-verified on the upstream build. The T file is left on the hot tier after demotion. Maybe some fix didn't make it downstream.

To test with the current upstream, please run:

gluster volume set test cluster.watermark-low 1

This will make sure files get demoted even for a small data set. Please refer to "gluster volume set help" for details on this option.

Let me know if this helps.
The downstream patch which prevents the linkfile from being demoted was merged recently. It should be available in the next build.
(https://code.engineering.redhat.com/gerrit/#/c/61840/)

Instead of setting cluster.watermark-low, you can use cluster.tier-mode to ignore watermarks:

gluster volume set <volname> cluster.tier-mode test
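When repeating this on several test volumes, the two alternatives mentioned in this thread can be generated rather than typed. The sketch below only builds and prints the command lines; it does not run them. The option names and values are the ones quoted in the comments above, while the helper name and the volume name "test" are illustrative assumptions.

```python
import shlex


def volume_set_command(volume, option, value):
    """Build (but do not run) a 'gluster volume set' command line."""
    return ["gluster", "volume", "set", volume, option, str(value)]


# The two alternatives suggested in this thread for forcing demotions
# on a small data set; use one or the other.
DEMOTION_TEST_ALTERNATIVES = [
    ("cluster.watermark-low", 1),
    ("cluster.tier-mode", "test"),
]

if __name__ == "__main__":
    for option, value in DEMOTION_TEST_ALTERNATIVES:
        argv = volume_set_command("test", option, value)
        print(" ".join(shlex.quote(part) for part in argv))
```

Keeping the commands as argument lists means they can later be handed to subprocess.run() without shell quoting issues, if actually executing them is wanted.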
(In reply to Nithya Balachandran from comment #14)
> The downstream patch which prevents the linkfile from being demoted was
> merged recently. It should be available in the next build.
> (https://code.engineering.redhat.com/gerrit/#/c/61840/)

Thanks!

Another problem in the cluster was that when a file gets demoted to the cold tier, further I/Os on the file do not migrate it to the hot tier, with some lookup() failures in the tier log file as per comment #10. Is this a side effect or something else?

> Instead of setting cluster.watermark-low , you can use cluster.tier-mode to
> ignore watermarks.
>
> gluster volume set <volname> cluster.tier-mode test
That needs to be analysed. Do these errors show up in the latest master where the linkto file is not deleted?
(In reply to Nithya Balachandran from comment #16)
> That needs to be analysed. Do these errors show up in the latest master
> where the linkto file is not deleted?

I haven't tried tier with the latest master. Johnny (rabhat@) was able to give it a run and found that files were not getting demoted. I guess the watermark option (or cluster.tier-mode) needs to be set for aggressive demotions. I'll try to give it a run in some time.
This seems to work with the upstream build. It seems the patch, which is now merged, did not make it into the last build. Moving this to MODIFIED.
Hi,

Can you please add the link to the patch which has the fix for this issue?

Thanks,
kasturi
1) The problem of deleting the T file is solved by Nithya's patch, which she has mentioned in the bug; see comment https://bugzilla.redhat.com/show_bug.cgi?id=1277368#c14

2) The patches for ignoring bit rot internal traffic in CTR:
https://code.engineering.redhat.com/gerrit/60999
https://code.engineering.redhat.com/gerrit/61241
Verified and works fine with glusterfs-3.7.5-7.el7rhgs.x86_64. Did not observe the bit-rot version and signature going missing for files when promotions and demotions happen.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-0193.html