Description of problem:
=======================
While renames are in progress from the client, the tier log continuously records messages such as the following:

[2015-11-05 06:44:03.199205] E [MSGID: 109037] [tier.c:523:tier_migrate_using_query_file] 0-tiervolume-tier-dht: ERROR -22 in current migration 563a464d%%U6G090S2MV /thread2/level01/level11/level21/level31/level41/level51/563a464d%%U6G090S2MV
[2015-11-05 06:44:03.238177] W [MSGID: 114031] [client-rpc-fops.c:1512:client3_3_ftruncate_cbk] 0-tiervolume-client-18: remote operation failed [Invalid argument]
[2015-11-05 06:44:03.238701] W [MSGID: 114031] [client-rpc-fops.c:1512:client3_3_ftruncate_cbk] 0-tiervolume-client-19: remote operation failed [Invalid argument]
[2015-11-05 06:44:03.239626] E [MSGID: 109023] [dht-rebalance.c:598:__dht_rebalance_create_dst_file] 0-tiervolume-tier-dht: ftruncate failed for /thread2/level01/level11/level21/level31/level41/level51/level61/563a4674%%UHH1Y238FG on tiervolume-hot-dht (Invalid argument)
[2015-11-05 06:44:03.323548] W [MSGID: 114031] [client-rpc-fops.c:904:client3_3_writev_cbk] 0-tiervolume-client-19: remote operation failed [Bad file descriptor]
[2015-11-05 06:44:03.330006] W [MSGID: 114031] [client-rpc-fops.c:904:client3_3_writev_cbk] 0-tiervolume-client-18: remote operation failed [Bad file descriptor]
[2015-11-05 06:44:05.183978] W [dht-rebalance.c:114:dht_write_with_holes] 0-tiervolume-tier-dht: failed to write (Bad file descriptor)
[2015-11-05 06:44:05.184050] E [MSGID: 109023] [dht-rebalance.c:1337:dht_migrate_file] 0-tiervolume-tier-dht: Migrate file failed: /thread2/level01/level11/level21/level31/level41/level51/level61/563a4674%%UHH1Y238FG: failed to migrate data
[2015-11-05 06:44:05.197632] W [MSGID: 114031] [client-rpc-fops.c:1512:client3_3_ftruncate_cbk] 0-tiervolume-client-19: remote operation failed [Invalid argument]
[2015-11-05 06:44:05.197714] W [MSGID: 114031] [client-rpc-fops.c:1512:client3_3_ftruncate_cbk] 0-tiervolume-client-18: remote operation failed [Invalid argument]
[2015-11-05 06:44:05.199015] E [MSGID: 109023] [dht-rebalance.c:1587:dht_migrate_file] 0-tiervolume-tier-dht: Migrate file failed: /thread2/level01/level11/level21/level31/level41/level51/level61/563a4674%%UHH1Y238FG: failed to reset target size back to 0 [Invalid argument]
[2015-11-05 06:44:05.201792] E [MSGID: 109037] [tier.c:523:tier_migrate_using_query_file] 0-tiervolume-tier-dht: ERROR -22 in current migration 563a4674%%UHH1Y238FG /thread2/level01/level11/level21/level31/level41/level51/level61/563a4674%%UHH1Y238FG

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.7.5-5.el7rhgs.x86_64

How reproducible:
=================
Happens frequently

Steps to Reproduce:
===================
1. Create and start a tier volume with a cold tier {2x(4+2)} and a hot tier {6x2} (a CLI sketch follows at the end of this description)
2. Mount the volume and start creating data:
   [root@dj ~]# crefi --multi -n 10 -b 10 -d 10 --max=1024k --min=5k --random -T 5 -t text -I 5 --fop=create /mnt/fuse/
3. Perform operations like chmod, chown, chgrp, symlink, and truncate from the client on the existing data
4. Perform rename operations on the existing data

Actual results:
===============
Lots of the following errors are observed:
    Migrate file failed
    remote operation failed [Invalid argument]
    ftruncate failed
    remote operation failed [Bad file descriptor]
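A minimal sketch of step 1, assuming hypothetical hostnames (server1..server6) and brick paths; the volume name "tiervolume" is taken from the logs above, and "attach-tier" is the 3.7-era tiering CLI:

  # Cold tier: 12 bricks grouped as 2 x (4+2) disperse subvolumes
  gluster volume create tiervolume disperse 6 redundancy 2 \
      server{1..6}:/bricks/cold1 server{1..6}:/bricks/cold2
  gluster volume start tiervolume

  # Hot tier: 12 bricks grouped as 6 x 2 distributed-replicate, layered on top
  gluster volume attach-tier tiervolume replica 2 \
      server{1..6}:/bricks/hot1 server{1..6}:/bricks/hot2

  # FUSE mount for step 2 (crefi writes into /mnt/fuse)
  mount -t glusterfs server1:/tiervolume /mnt/fuse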
I was not able to reproduce this. I spoke with Rahul Hinduja; he suggested using the crefi tool to perform operations on the files. Now trying it.
I am able to reproduce this bug in glusterfs-3.7.5-5 with the above steps, but not in the latest build, glusterfs-3.7.5-7. I am trying to find the root cause of the problem and why it is not reproducible in the latest build.
(In reply to Mohamed Ashiq from comment #6)
> I am able to reproduce this bug in glusterfs-3.7.5-5 with the above steps,
> but not in the latest build, glusterfs-3.7.5-7. I am trying to find the
> root cause of the problem and why it is not reproducible in the latest
> build.

With the latest build, watermarks are enabled by default, and the failure reported in this bug occurs during migration. Please check whether promotes/demotes are happening. If the hot tier has plenty of free space, either tune the "cluster.watermark-hi" and "cluster.watermark-low" options or use test mode: "cluster.tier-mode test".
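For reference, a sketch of the tuning suggested above; the option names come from this comment, while the values are only illustrative (consult "gluster volume set help" on the installed build):

  # Tune the watermarks that gate promotion/demotion (illustrative values)
  gluster volume set tiervolume cluster.watermark-hi 90
  gluster volume set tiervolume cluster.watermark-low 75

  # Or bypass the watermark logic so migrations run unconditionally
  gluster volume set tiervolume cluster.tier-mode test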
(In reply to Rahul Hinduja from comment #7)
> (In reply to Mohamed Ashiq from comment #6)
> > I am able to reproduce this bug in glusterfs-3.7.5-5 with the above
> > steps, but not in the latest build, glusterfs-3.7.5-7. I am trying to
> > find the root cause of the problem and why it is not reproducible in
> > the latest build.
>
> With the latest build, watermarks are enabled by default, and the failure
> reported in this bug occurs during migration. Please check whether
> promotes/demotes are happening. If the hot tier has plenty of free space,
> either tune the "cluster.watermark-hi" and "cluster.watermark-low" options
> or use test mode: "cluster.tier-mode test".

It was not reproducible in 3.7.5-7 because watermarks are enabled by default there. After setting "cluster.tier-mode test" as Rahul suggested, I am able to reproduce this bug. It happens because posix_ftruncate fails with EINVAL when the tier tries to migrate a file. Nithya has filed a bug [1] to address the issue and has sent a patch upstream [2]. I applied the patch and tried reproducing the issue; the errors no longer appear in the logs.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1284823
[2] http://review.gluster.org/12750

Could you please check the same.
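To check for the failure signature before and after applying the patch, one can grep the tier log; the /var/log/glusterfs/<volname>-tier.log location matches the verification in the next comment:

  # Look for the EINVAL ftruncate/migration failures in the tier log
  grep -E "ftruncate failed|failed to reset target size" \
      /var/log/glusterfs/tiervolume-tier.log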
Verified with build: glusterfs-3.7.5-13.el7rhgs.x86_64

Performed create, chmod, chown, chgrp, symlink, truncate, and rename operations. No errors related to "failed to reset" were logged. Moving the bug to the verified state.

[root@dhcp37-165 glusterfs]# grep -i "failed to reset" vol0-tier.log
[root@dhcp37-165 glusterfs]#
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0193.html