Description of problem:
=======================
While renames are in progress from the client, the tier log continuously records messages such as the following:

[2015-11-05 06:44:03.199205] E [MSGID: 109037] [tier.c:523:tier_migrate_using_query_file] 0-tiervolume-tier-dht: ERROR -22 in current migration 563a464d%%U6G090S2MV /thread2/level01/level11/level21/level31/level41/level51/563a464d%%U6G090S2MV
[2015-11-05 06:44:03.238177] W [MSGID: 114031] [client-rpc-fops.c:1512:client3_3_ftruncate_cbk] 0-tiervolume-client-18: remote operation failed [Invalid argument]
[2015-11-05 06:44:03.238701] W [MSGID: 114031] [client-rpc-fops.c:1512:client3_3_ftruncate_cbk] 0-tiervolume-client-19: remote operation failed [Invalid argument]
[2015-11-05 06:44:03.239626] E [MSGID: 109023] [dht-rebalance.c:598:__dht_rebalance_create_dst_file] 0-tiervolume-tier-dht: ftruncate failed for /thread2/level01/level11/level21/level31/level41/level51/level61/563a4674%%UHH1Y238FG on tiervolume-hot-dht (Invalid argument)
[2015-11-05 06:44:03.323548] W [MSGID: 114031] [client-rpc-fops.c:904:client3_3_writev_cbk] 0-tiervolume-client-19: remote operation failed [Bad file descriptor]
[2015-11-05 06:44:03.330006] W [MSGID: 114031] [client-rpc-fops.c:904:client3_3_writev_cbk] 0-tiervolume-client-18: remote operation failed [Bad file descriptor]
[2015-11-05 06:44:05.183978] W [dht-rebalance.c:114:dht_write_with_holes] 0-tiervolume-tier-dht: failed to write (Bad file descriptor)
[2015-11-05 06:44:05.184050] E [MSGID: 109023] [dht-rebalance.c:1337:dht_migrate_file] 0-tiervolume-tier-dht: Migrate file failed: /thread2/level01/level11/level21/level31/level41/level51/level61/563a4674%%UHH1Y238FG: failed to migrate data
[2015-11-05 06:44:05.197632] W [MSGID: 114031] [client-rpc-fops.c:1512:client3_3_ftruncate_cbk] 0-tiervolume-client-19: remote operation failed [Invalid argument]
[2015-11-05 06:44:05.197714] W [MSGID: 114031] [client-rpc-fops.c:1512:client3_3_ftruncate_cbk] 0-tiervolume-client-18: remote operation failed [Invalid argument]
[2015-11-05 06:44:05.199015] E [MSGID: 109023] [dht-rebalance.c:1587:dht_migrate_file] 0-tiervolume-tier-dht: Migrate file failed: /thread2/level01/level11/level21/level31/level41/level51/level61/563a4674%%UHH1Y238FG: failed to reset target size back to 0 [Invalid argument]
[2015-11-05 06:44:05.201792] E [MSGID: 109037] [tier.c:523:tier_migrate_using_query_file] 0-tiervolume-tier-dht: ERROR -22 in current migration 563a4674%%UHH1Y238FG /thread2/level01/level11/level21/level31/level41/level51/level61/563a4674%%UHH1Y238FG

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.7.5-5.el7rhgs.x86_64

How reproducible:
=================
Happens frequently

Steps to Reproduce:
===================
1. Create and start a tier volume with a cold tier {2x(4+2)} and a hot tier {6x2} (a CLI sketch follows at the end of this description)
2. Mount the volume and start creating data:
   [root@dj ~]# crefi --multi -n 10 -b 10 -d 10 --max=1024k --min=5k --random -T 5 -t text -I 5 --fop=create /mnt/fuse/
3. Perform operations like chmod, chown, chgrp, symlink, and truncate from the client on the existing data
4. Perform rename operations on the existing data

Actual results:
===============
Lots of the following errors are observed:
    Migrate file failed
    remote operation failed [Invalid argument]
    ftruncate failed
    remote operation failed [Bad file descriptor]
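A minimal sketch of step 1, assuming hypothetical hostnames (server1..server6) and brick paths; the volume name "tiervolume" is taken from the logs above, and "attach-tier" is the 3.7-era tiering CLI:

  # Cold tier: 12 bricks grouped as 2 x (4+2) disperse subvolumes
  gluster volume create tiervolume disperse 6 redundancy 2 \
      server{1..6}:/bricks/cold1 server{1..6}:/bricks/cold2
  gluster volume start tiervolume

  # Hot tier: 12 bricks grouped as 6 x 2 distributed-replicate, layered on top
  gluster volume attach-tier tiervolume replica 2 \
      server{1..6}:/bricks/hot1 server{1..6}:/bricks/hot2

  # FUSE mount for step 2 (crefi writes into /mnt/fuse)
  mount -t glusterfs server1:/tiervolume /mnt/fuse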
I was not able to reproduce this. I spoke with Rahul Hinduja; he suggested using the crefi tool to perform operations on the files. Now trying it.
I am able to reproduce this bug in glusterfs-3.7.5-5 with the above steps, but not in the latest build, glusterfs-3.7.5-7. I am trying to find the root cause of the problem and why it is not reproducible in the latest build.
(In reply to Mohamed Ashiq from comment #6)
> I am able to reproduce this bug in glusterfs-3.7.5-5 with the above steps,
> but not in the latest build, glusterfs-3.7.5-7. I am trying to find the
> root cause of the problem and why it is not reproducible in the latest
> build.

With the latest build, watermarks are enabled by default, and the failure reported in this bug occurs during migration. Please check whether promotes/demotes are happening. If the hot tier has plenty of free space, either tune the "cluster.watermark-hi" and "cluster.watermark-low" options or use test mode: "cluster.tier-mode test".
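For reference, a sketch of the tuning suggested above; the option names come from this comment, while the values are only illustrative (consult "gluster volume set help" on the installed build):

  # Tune the watermarks that gate promotion/demotion (illustrative values)
  gluster volume set tiervolume cluster.watermark-hi 90
  gluster volume set tiervolume cluster.watermark-low 75

  # Or bypass the watermark logic so migrations run unconditionally
  gluster volume set tiervolume cluster.tier-mode test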
(In reply to Rahul Hinduja from comment #7)
> (In reply to Mohamed Ashiq from comment #6)
> > I am able to reproduce this bug in glusterfs-3.7.5-5 with the above
> > steps, but not in the latest build, glusterfs-3.7.5-7. I am trying to
> > find the root cause of the problem and why it is not reproducible in
> > the latest build.
>
> With the latest build, watermarks are enabled by default, and the failure
> reported in this bug occurs during migration. Please check whether
> promotes/demotes are happening. If the hot tier has plenty of free space,
> either tune the "cluster.watermark-hi" and "cluster.watermark-low" options
> or use test mode: "cluster.tier-mode test".

It was not reproducible in 3.7.5-7 because watermarks are enabled by default there. After setting "cluster.tier-mode test" as Rahul suggested, I am able to reproduce this bug. It happens because posix_ftruncate fails with EINVAL when the tier tries to migrate a file. Nithya has filed a bug [1] to address the issue and has sent a patch upstream [2]. I applied the patch and tried reproducing the issue; the errors no longer appear in the logs.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1284823
[2] http://review.gluster.org/12750

Could you please check the same.
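To check for the failure signature before and after applying the patch, one can grep the tier log; the /var/log/glusterfs/<volname>-tier.log location matches the verification in the next comment:

  # Look for the EINVAL ftruncate/migration failures in the tier log
  grep -E "ftruncate failed|failed to reset target size" \
      /var/log/glusterfs/tiervolume-tier.log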
Verified with build: glusterfs-3.7.5-13.el7rhgs.x86_64

Performed create, chmod, chown, chgrp, symlink, truncate, and rename operations. No errors related to "failed to reset" were logged. Moving the bug to the verified state.

[root@dhcp37-165 glusterfs]# grep -i "failed to reset" vol0-tier.log
[root@dhcp37-165 glusterfs]#
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0193.html