Bug 1289228 - [Tiering] + [DHT] - Detach tier fails to migrate the files when there are corrupted objects in hot tier.
Summary: [Tiering] + [DHT] - Detach tier fails to migrate the files when there are corrupted objects in hot tier.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: tier
Version: rhgs-3.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: RHGS 3.1.2
Assignee: Bug Updates Notification Mailing List
QA Contact: RamaKasturi
URL:
Whiteboard:
Depends On:
Blocks: 1290965 1293300
 
Reported: 2015-12-07 17:04 UTC by RamaKasturi
Modified: 2016-09-17 15:44 UTC
CC: 8 users

Fixed In Version: glusterfs-3.7.5-13
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1290965 (view as bug list)
Environment:
Last Closed: 2016-03-01 06:01:53 UTC
Embargoed:


Attachments: None


Links:
Red Hat Product Errata RHBA-2016:0193 (normal, SHIPPED_LIVE): Red Hat Gluster Storage 3.1 update 2, last updated 2016-03-01 10:20:36 UTC

Description RamaKasturi 2015-12-07 17:04:33 UTC
Description of problem:
When there are corrupted objects in the hot tier, running detach tier on the volume fails to migrate the files. Detach tier should display a message stating that there are corrupted files and that they must be recovered before the detach is performed.

When one subvolume of a replica pair in the hot tier holds a corrupted file and the other subvolume has a good copy, detach tier fails to migrate the good copy to the cold tier. Detach tier should migrate such files, since a good copy exists.
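
Until this is fixed, the corrupted objects can be located by hand before starting the detach; a rough sketch (the scrub status subcommand is assumed to be available in this build):

# List the objects the scrubber has flagged as corrupted
gluster volume bitrot vol1 scrub status

# Or check a suspect file directly on its brick: a corrupted object
# carries the trusted.bit-rot.bad-file xattr (see comment 2 below)
getfattr -n trusted.bit-rot.bad-file -e hex /bricks/brick2/h1/ff1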

Version-Release number of selected component (if applicable):
glusterfs-3.7.5-9.el7rhgs.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Create a tiered volume with both the hot and cold tiers configured as distributed-replicate.
2. Mount the volume using NFS and create some data.
3. Edit the files from the backend (directly on the bricks) and wait for the scrubber to mark them as corrupted (see the sketch after these steps).
4. Run 'gluster volume detach-tier vol1 start' to detach the hot tier.
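
Step 3 can be reproduced by modifying a file directly on a hot-tier brick, bypassing the client mount; a minimal sketch, using paths from the setup in comment 2 (with features.scrub-freq set to hourly, the bad-file marker can take up to an hour to appear):

# Append garbage on the brick backend so the data no longer matches
# the stored bit-rot signature
echo "corrupt" >> /bricks/brick2/h1/ff1

# After the next scrub run, the object should carry the bad-file marker
getfattr -n trusted.bit-rot.bad-file -e hex /bricks/brick2/h1/ff1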

Actual results:
Detach tier neither demotes the files to the cold tier nor complains about the corrupted files. This can lead to data loss if the user commits the detach and removes the tier.

Expected results:
Detach tier should either complain about the corrupted files or migrate them, since a good copy is available on the other subvolume of the replica pair.
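
Whether a usable copy exists can be checked on the other subvolume of the replica pair; a sketch based on the layout in comment 2, where rhs-client38:/bricks/brick2/h2 is the replica of the brick holding the bad copy of ff1:

# On the replica peer, the absence of the bad-file xattr means this
# copy is clean and could have been demoted
ssh rhs-client38 getfattr -d -m trusted.bit-rot -e hex /bricks/brick2/h2/ff1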

Additional info:

Comment 2 RamaKasturi 2015-12-07 17:11:11 UTC
gluster vol info output:
===========================
[root@rhs-client2 ~]# gluster vol info vol1
 
Volume Name: vol1
Type: Tier
Volume ID: 385fdb1e-1034-40ca-9a14-e892e68b500b
Status: Started
Number of Bricks: 8
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick1: rhs-client38:/bricks/brick3/h4
Brick2: rhs-client2:/bricks/brick3/h3
Brick3: rhs-client38:/bricks/brick2/h2
Brick4: rhs-client2:/bricks/brick2/h1
Cold Tier:
Cold Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick5: rhs-client2:/bricks/brick0/c1
Brick6: rhs-client38:/bricks/brick0/c2
Brick7: rhs-client2:/bricks/brick1/c3
Brick8: rhs-client38:/bricks/brick1/c4
Options Reconfigured:
cluster.tier-mode: cache
features.ctr-enabled: on
cluster.watermark-hi: 2
cluster.watermark-low: 1
features.scrub-freq: hourly
features.scrub: Active
features.bitrot: on
performance.readdir-ahead: on
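
For reference, a comparable setup can be built as follows; a rough sketch, not the exact commands used here (the attach-tier and bitrot syntax is that of the glusterfs 3.7 CLI):

# 2x2 distributed-replicate cold tier
gluster volume create vol1 replica 2 \
    rhs-client2:/bricks/brick0/c1 rhs-client38:/bricks/brick0/c2 \
    rhs-client2:/bricks/brick1/c3 rhs-client38:/bricks/brick1/c4
gluster volume start vol1

# 2x2 distributed-replicate hot tier
gluster volume attach-tier vol1 replica 2 \
    rhs-client38:/bricks/brick3/h4 rhs-client2:/bricks/brick3/h3 \
    rhs-client38:/bricks/brick2/h2 rhs-client2:/bricks/brick2/h1

# Enable bitrot detection with an hourly scrub, matching the options above
gluster volume bitrot vol1 enable
gluster volume bitrot vol1 scrub-frequency hourly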



Files inside the bricks before detach tier:
=================================================
[root@rhs-client2 ~]# ls -l /bricks/brick*/h*
/bricks/brick2/h1:
total 4
-rw-r--r--. 2 root root 75 Dec  7 12:17 ff1

/bricks/brick3/h3:
total 1361028
-rw-r--r--. 2 root root         76 Dec  7 12:17 ff2
-rw-r--r--. 2 root root 1393688576 Dec  7 12:18 rhgsc-appliance005
[root@rhs-client2 ~]# getfattr -d -m . -e hex /bricks/brick2/h1/ff1
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick2/h1/ff1
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.bad-file=0x3100
trusted.bit-rot.signature=0x0102000000000000006fde5302afacc9901c00de8ffc8c7aeaa4ea094d7edfe0e216b094d3877660da
trusted.bit-rot.version=0x02000000000000005665763d00014924
trusted.gfid=0xc00e4e43eb5849618f0a0f37501f7613

[root@rhs-client2 ~]# getfattr -d -m . -e hex /bricks/brick3/h3/ff2
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick3/h3/ff2
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.bad-file=0x3100
trusted.bit-rot.signature=0x0102000000000000000afb51faf10aa5d634c55290d8a3b579d1935c80c3c1a3f7f92fd812239c5ef8
trusted.bit-rot.version=0x02000000000000005665763d0001f921
trusted.gfid=0x810ff1ae3dd84d1a8422afa1283d2a78
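
The hex xattr values above decode to simple markers; a quick sketch for reading them (assumes xxd is installed):

# trusted.bit-rot.bad-file=0x3100 is the ASCII character "1" plus a
# trailing NUL, i.e. the object is flagged as bad
echo 3100 | xxd -r -p | od -c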

[root@rhs-client2 ~]# ls -l /bricks/brick*/c*
/bricks/brick0/c1:
total 2722064
---------T. 2 root root          0 Dec  7 12:14 ff2
-rw-r--r--. 2 root root         21 Dec  4 12:21 file2_hot
-rw-r--r--. 2 root root          8 Dec  4 07:42 file3
-rw-r--r--. 2 root root          8 Dec  4 07:42 file4
-rw-r--r--. 2 root root         67 Dec  7 11:40 file_demote1
-rw-r--r--. 2 root root 1393688576 Dec  7 09:36 rhgsc-appliance004
---------T. 2 root root          0 Dec  7 12:14 rhgsc-appliance005
-rw-r--r--. 2 root root 1393688576 Dec  7 09:28 rhgsc-appliance-03

/bricks/brick1/c3:
total 4083100
---------T. 2 root root          0 Dec  7 12:14 ff1
-rw-r--r--. 2 root root        237 Dec  7 09:18 file1
-rw-r--r--. 2 root root        201 Dec  7 09:12 file2
-rw-r--r--. 2 root root         51 Dec  4 12:22 file2_hot1
-rw-r--r--. 2 root root          8 Dec  4 07:42 file5
-rw-r--r--. 2 root root          8 Dec  4 07:42 file6
-rw-r--r--. 2 root root         77 Dec  7 11:40 file_demote
-rw-r--r--. 2 root root         19 Dec  7 10:58 file_demote2
-rw-r--r--. 2 root root 1393688576 Dec  7 09:31 rhgsc-appliance00
-rw-r--r--. 2 root root 1393688576 Dec  7 09:26 rhgsc-appliance-02
-rw-r--r--. 2 root root 1393688576 Dec  4 12:06 rhgsc-appliance03
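
The zero-byte entries with mode ---------T above are DHT linkto files: placeholders left on the cold tier pointing at the hot-tier subvolume that holds the real data. This can be confirmed from the xattr; a sketch:

# A linkto file names the subvolume where the data actually lives
getfattr -n trusted.glusterfs.dht.linkto -e text /bricks/brick0/c1/ff2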


gluster volume detach-tier status after starting it:
==========================================================

[root@rhs-client2 ~]# gluster volume detach-tier vol1 status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost                0        0Bytes            19             0             0            completed               0.00
                             10.70.36.62                3         1.3GB             3             0             0            completed              42.00
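
Note that the failures column reads 0 even though the corrupted objects were never migrated; the CLI gives no hint that anything was left behind. The migration log can be searched for errors; a sketch (the exact log file name under /var/log/glusterfs/ is an assumption):

# Look for migration errors or skipped files in the rebalance/tier log
grep -iE 'migrate|error' /var/log/glusterfs/vol1-rebalance.log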


Even after the status reports the migration as completed, the corrupted files have not been moved; the cold-tier bricks still show only zero-byte linkto entries for them:
===============================================================================

[root@rhs-client2 ~]# ls -l /bricks/brick*/c*
/bricks/brick0/c1:
total 4083088
---------T. 2 root root          0 Dec  7 16:53 ff2
-rw-r--r--. 2 root root         21 Dec  4 12:21 file2_hot
-rw-r--r--. 2 root root          8 Dec  4 07:42 file3
-rw-r--r--. 2 root root          8 Dec  4 07:42 file4
-rw-r--r--. 2 root root         67 Dec  7 11:40 file_demote1
-rw-r--r--. 2 root root 1393688576 Dec  7 09:36 rhgsc-appliance004
-rw-r--r--. 2 root root 1393688576 Dec  7 12:17 rhgsc-appliance005
-rw-r--r--. 2 root root 1393688576 Dec  7 09:28 rhgsc-appliance-03

/bricks/brick1/c3:
total 4083100
---------T. 2 root root          0 Dec  7 16:53 ff1
-rw-r--r--. 2 root root        237 Dec  7 09:18 file1
-rw-r--r--. 2 root root        201 Dec  7 09:12 file2
-rw-r--r--. 2 root root         51 Dec  4 12:22 file2_hot1
-rw-r--r--. 2 root root          8 Dec  4 07:42 file5
-rw-r--r--. 2 root root          8 Dec  4 07:42 file6
-rw-r--r--. 2 root root         77 Dec  7 11:40 file_demote
-rw-r--r--. 2 root root         19 Dec  7 10:58 file_demote2
-rw-r--r--. 2 root root 1393688576 Dec  7 09:31 rhgsc-appliance00
-rw-r--r--. 2 root root 1393688576 Dec  7 09:26 rhgsc-appliance-02
-rw-r--r--. 2 root root 1393688576 Dec  4 12:06 rhgsc-appliance03
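
That ff1 and ff2 were left behind can be confirmed directly: their cold-tier entries are still zero-byte linkto stubs while the hot-tier bricks still hold the flagged copies. A sketch:

# Still a 0-byte ---------T stub on the cold tier
stat -c '%A %s %n' /bricks/brick1/c3/ff1

# The real (corrupted) copy remains on the hot-tier brick
stat -c '%A %s %n' /bricks/brick2/h1/ff1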

Comment 3 RamaKasturi 2015-12-07 17:28:13 UTC
sos reports can be found at the link below:

http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1289228/

Comment 9 RamaKasturi 2015-12-23 14:02:35 UTC
Verified and works fine with build glusterfs-3.7.5-13.el7rhgs.x86_64.

Good files get migrated to the cold tier when there are corrupted files in the hot tier and the user performs a detach tier on the volume.

Comment 11 errata-xmlrpc 2016-03-01 06:01:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0193.html

