Bug 1229233

Summary: Data Tiering:3.7.0:data loss:detach-tier not flushing data to cold-tier
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Nag Pavan Chilakam <nchilaka>
Component: tier
Assignee: Dan Lambright <dlambrig>
Status: CLOSED CURRENTRELEASE
QA Contact: Nag Pavan Chilakam <nchilaka>
Severity: urgent
Docs Contact:
Priority: urgent
Version: rhgs-3.1
CC: asrivast, bugs, dlambrig, gluster-bugs, josferna, ndevos, nsathyan, rhs-bugs, storage-qa-internal, trao
Target Milestone: ---
Keywords: Triaged, ZStream
Target Release: ---
Hardware: Unspecified
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1205540
Environment:
Last Closed: 2015-10-30 12:39:57 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1205540    
Bug Blocks: 1186580, 1202842, 1219513, 1220047    
Attachments:
Description                              Flags
server#1 logs sosreports failed_qa       none
server#2 logs sosreports failed_qa       none
server#3 logs sosreports failed_qa       none
problem 1 console logs failed qa         none

Description Nag Pavan Chilakam 2015-06-08 10:15:15 UTC
+++ This bug was initially created as a clone of Bug #1205540 +++

Description of problem:
=======================
In a tiered volume, when a tier is detached, the operation reports success but the data on the hot tier is not flushed to the cold tier.
This leads to data loss.


Version-Release number of selected component (if applicable):
============================================================
3.7 upstream nightlies build http://download.gluster.org/pub/gluster/glusterfs/nightly/glusterfs/epel-6-x86_64/glusterfs-3.7dev-0.777.git2308c07.autobuild/


How reproducible:
=================
Easy to reproduce


Steps to Reproduce:
==================
1. Create a Gluster volume (a distribute volume was used here) and start it.
2. Attach a tier to the volume using attach-tier.
3. Write some files to the volume. All files (if sufficient space is available) will land on the hot tier.
4. Detach the tier using the detach-tier command (a minimal reproduction sketch follows below).
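
A minimal reproduction sketch of the steps above, assuming hypothetical hostnames (server1, server2) and brick paths:

# 1. create and start a plain distribute volume
gluster volume create vol1 server1:/bricks/vol1/b1 server2:/bricks/vol1/b1
gluster volume start vol1

# 2. attach a hot tier
gluster volume attach-tier vol1 server1:/bricks/vol1_hot/hb1 server2:/bricks/vol1_hot/hb1

# 3. write some files from a fuse client; with space available they land on the hot tier
mount -t glusterfs server1:/vol1 /mnt/vol1
for i in $(seq 1 10); do dd if=/dev/urandom of=/mnt/vol1/file$i bs=1M count=1; done

# 4. detach the tier; on the affected build this returns success without flushing the hot tier
gluster volume detach-tier vol1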


Actual results:
===============
When the tier is detached, it is removed without flushing the data on the hot tier to the cold tier, resulting in data loss.

Expected results:
================
Detach-tier should succeed only after all data has been flushed to the cold tier (see the sketch below).
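
A sketch of the expected workflow, assuming the detach start/commit subcommands introduced by the patch referenced later in this bug (hostnames and volume name are placeholders):

# begin draining the hot tier; lookups are served from the cold tier while this runs
gluster volume detach-tier vol1 start

# the status/stop subcommands were to be submitted separately per the patch,
# so completion has to be confirmed out of band (rebalance log, brick contents)

# only after the hot tier is drained should the detach be committed
gluster volume detach-tier vol1 commit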


Additional info (CLI logs):
===============
[root@rhs-client44 everglades]# gluster v info vol1
 
Volume Name: vol1
Type: Distribute
Volume ID: 3382e788-ee37-4d6c-b214-8469ca68e376
Status: Started
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: rhs-client44:/pavanbrick1/vol1/b1
Brick2: rhs-client38:/pavanbrick1/vol1/b1
Brick3: rhs-client37:/pavanbrick1/vol1/b1
[root@rhs-client44 everglades]# gluster v status vol1
Status of volume: vol1
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick rhs-client44:/pavanbrick1/vol1/b1     49152     0          Y       29969
Brick rhs-client38:/pavanbrick1/vol1/b1     49152     0          Y       30514
Brick rhs-client37:/pavanbrick1/vol1/b1     49152     0          Y       29475
NFS Server on localhost                     2049      0          Y       29993
NFS Server on rhs-client38                  2049      0          Y       30538
NFS Server on rhs-client37                  2049      0          Y       29499
 
Task Status of Volume vol1
------------------------------------------------------------------------------
There are no active volume tasks
 
[root@rhs-client44 everglades]# gluster v attach-tier vol1 rhs-client44:/pavanbrick2/vol1_hot/hb1 rhs-client37:/pavanbrick2/vol1_hot/hb1
volume add-brick: success
[root@rhs-client44 everglades]# gluster v info vol1
 
Volume Name: vol1
Type: Tier
Volume ID: 3382e788-ee37-4d6c-b214-8469ca68e376
Status: Started
Number of Bricks: 5 x 1 = 5
Transport-type: tcp
Bricks:
Brick1: rhs-client37:/pavanbrick2/vol1_hot/hb1
Brick2: rhs-client44:/pavanbrick2/vol1_hot/hb1
Brick3: rhs-client44:/pavanbrick1/vol1/b1
Brick4: rhs-client38:/pavanbrick1/vol1/b1
Brick5: rhs-client37:/pavanbrick1/vol1/b1



[root@rhs-client44 everglades]# gluster v detach-tier vol1
volume remove-brick unknown: success
[root@rhs-client44 everglades]# gluster v info vol1
 
Volume Name: vol1
Type: Distribute
Volume ID: 3382e788-ee37-4d6c-b214-8469ca68e376
Status: Started
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: rhs-client44:/pavanbrick1/vol1/b1
Brick2: rhs-client38:/pavanbrick1/vol1/b1
Brick3: rhs-client37:/pavanbrick1/vol1/b1

--- Additional comment from Anand Avati on 2015-04-01 18:56:08 EDT ---

REVIEW: http://review.gluster.org/10108 (glusterd: WIP support for tier volumes 'detach start' and 'detach commit') posted (#1) for review on master by Dan Lambright (dlambrig)

--- Additional comment from Anand Avati on 2015-04-07 07:14:48 EDT ---

REVIEW: http://review.gluster.org/10108 (glusterd: WIP support for tier volumes 'detach start' and 'detach commit') posted (#2) for review on master by Dan Lambright (dlambrig)

--- Additional comment from Anand Avati on 2015-04-09 06:08:37 EDT ---

REVIEW: http://review.gluster.org/10108 (glusterd: WIP support for tier volumes 'detach start' and 'detach commit') posted (#3) for review on master by Dan Lambright (dlambrig)

--- Additional comment from Anand Avati on 2015-04-14 00:12:50 EDT ---

REVIEW: http://review.gluster.org/10108 (glusterd: support for tier volumes 'detach start' and 'detach commit') posted (#4) for review on master by Dan Lambright (dlambrig)

--- Additional comment from Anand Avati on 2015-04-16 06:19:59 EDT ---

REVIEW: http://review.gluster.org/10108 (glusterd: support for tier volumes 'detach start' and 'detach commit') posted (#5) for review on master by Dan Lambright (dlambrig)

--- Additional comment from Anand Avati on 2015-04-18 07:59:25 EDT ---

REVIEW: http://review.gluster.org/10108 (glusterd: support for tier volumes 'detach start' and 'detach commit') posted (#6) for review on master by Dan Lambright (dlambrig)

--- Additional comment from Anand Avati on 2015-04-21 16:52:11 EDT ---

REVIEW: http://review.gluster.org/10108 (glusterd: support for tier volumes 'detach start' and 'detach commit') posted (#7) for review on master by Dan Lambright (dlambrig)

--- Additional comment from Anand Avati on 2015-04-22 06:20:24 EDT ---

REVIEW: http://review.gluster.org/10108 (glusterd: support for tier volumes 'detach start' and 'detach commit') posted (#8) for review on master by Dan Lambright (dlambrig)

--- Additional comment from Anand Avati on 2015-04-22 10:39:46 EDT ---

REVIEW: http://review.gluster.org/10108 (glusterd: support for tier volumes 'detach start' and 'detach commit') posted (#9) for review on master by Kaleb KEITHLEY (kkeithle)

--- Additional comment from Anand Avati on 2015-04-22 10:51:06 EDT ---

COMMIT: http://review.gluster.org/10108 committed in master by Kaleb KEITHLEY (kkeithle) 
------
commit 86b02afab780e559e82399b9e96381d8df594ed6
Author: Dan Lambright <dlambrig>
Date:   Mon Apr 13 02:42:12 2015 +0100

    glusterd: support for tier volumes 'detach start' and 'detach commit'
    
    These commands work in a manner analogous to rebalancing when removing a
    brick. The existing migration daemon detects "detach start" and switches
    to moving data off the hot tier. While in this state all lookups are
    directed to the cold tier.
    
    gluster v detach-tier <vol> start
    gluster v detach-tier <vol> commit
    
    The status and stop cli commands shall be submitted separately.
    
    Change-Id: I24fda5cc3ba74f5fb8aa9a3234ad51f18b80a8a0
    BUG: 1205540
    Signed-off-by: Dan Lambright <dlambrig>
    Signed-off-by: root <root>
    Signed-off-by: Dan Lambright <dlambrig>
    Reviewed-on: http://review.gluster.org/10108
    Reviewed-by: Kaleb KEITHLEY <kkeithle>
    Tested-by: NetBSD Build System
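
For comparison, a sketch of the remove-brick rebalance flow that the commit message above says detach-tier is analogous to (volume and brick names are placeholders):

# drain a brick of a plain distribute volume before removing it
gluster volume remove-brick vol1 server1:/bricks/vol1/b1 start
gluster volume remove-brick vol1 server1:/bricks/vol1/b1 status
gluster volume remove-brick vol1 server1:/bricks/vol1/b1 commit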

--- Additional comment from Niels de Vos on 2015-05-15 06:44:50 EDT ---

Bugs should only move to ON_QA when there is an alpha/beta version available. If this is available in a nightly build, please set the "fixed in version" field accordingly.

Comment 5 Nag Pavan Chilakam 2015-06-29 12:05:58 UTC
Created attachment 1044337 [details]
server#1 logs sosreports failed_qa

Comment 6 Nag Pavan Chilakam 2015-06-29 12:08:46 UTC
Created attachment 1044339 [details]
server#2 logs sosreports failed_qa

Comment 7 Nag Pavan Chilakam 2015-06-29 12:10:40 UTC
Created attachment 1044340 [details]
server#3 logs sosreports failed_qa

Comment 8 Nag Pavan Chilakam 2015-06-29 12:12:08 UTC
Created attachment 1044341 [details]
problem 1 console logs failed qa

Comment 9 Nag Pavan Chilakam 2015-06-29 12:26:13 UTC
Moving the bug to FailedQA for the reasons below.

=====Problem 1===== (refer to attachment 1044341 [details])
1) Had a setup with 3 nodes: A (tettnang), B (zod), and C (yarrow).
2) Created a 2x2 dist-rep volume with bricks belonging only to nodes B and C:
[root@tettnang ~]# gluster v info v1
 
Volume Name: v1
Type: Distributed-Replicate
Volume ID: acd70756-8a8c-4cd9-a4c4-b5cc4bfad8ee
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: zod:/rhs/brick1/v1
Brick2: yarrow:/rhs/brick1/v1
Brick3: zod:/rhs/brick2/v1
Brick4: yarrow:/rhs/brick2/v1
Options Reconfigured:
performance.readdir-ahead: on
[root@tettnang ~]# gluster v status v1
Status of volume: v1
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick zod:/rhs/brick1/v1                    49160     0          Y       26484
Brick yarrow:/rhs/brick1/v1                 49159     0          Y       11082
Brick zod:/rhs/brick2/v1                    49161     0          Y       26504
Brick yarrow:/rhs/brick2/v1                 49160     0          Y       11100
NFS Server on localhost                     N/A       N/A        N       N/A  
Self-heal Daemon on localhost               N/A       N/A        Y       21765
NFS Server on yarrow                        N/A       N/A        N       N/A  
Self-heal Daemon on yarrow                  N/A       N/A        Y       11130
NFS Server on zod                           N/A       N/A        N       N/A  
Self-heal Daemon on zod                     N/A       N/A        Y       26548
 
Task Status of Volume v1
------------------------------------------------------------------------------
There are no active volume tasks

3) Attached a hot tier (pure distribute), again with bricks belonging only to nodes B and C.
4) Created some files on the mount point (FUSE mount).
5) Ran detach-tier start. After the start, checking the backend bricks showed that link files had been created on the cold tier while the hot tier still held the cached file contents.
6) Ran detach-tier commit, and the commit passed.
The files still exist on the mount (i.e. the file names appear on the cold tier, at least in the sense that the same names were created via T files), but the file contents are missing.

Checking the backend bricks shows that reading the cold-brick files (T files) returns no content, while the hot bricks still hold the file contents, which means the file data is not being flushed (a verification sketch follows below).

Note: one file that was read/accessed after detach start but before commit did have its contents moved to the cold brick.
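
A hedged sketch of the backend check described above; brick paths and the file name are placeholders, and the exact linkto xattr name depends on the DHT/tier xlator, so a full getfattr dump is used to spot it:

# on a cold-tier brick: T files show up as zero-byte files with the sticky bit set
ls -l /rhs/brick1/v1/somefile                      # expect mode ---------T and size 0
getfattr -d -m . -e hex /rhs/brick1/v1/somefile    # look for a *.linkto xattr

# on a hot-tier brick: after the bad commit the real contents are still here
ls -l /rhs/brick_hot/v1/somefile
md5sum /rhs/brick_hot/v1/somefile                  # contents never made it to the cold tier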


=====Problem 2=====
On the same setup, I created a tiered volume (distribute hot tier over a dist-rep cold tier), this time using all the nodes.
On detach I got the following error:
[root@tettnang ~]# gluster v detach-tier v2 start
volume detach-tier start: failed: Bricks not from same subvol for distribute


[2015-06-29 11:21:40.950707] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/54cb1d4c2770a75d3e2bccd62ecdecc8.socket failed (Invalid argument)
[2015-06-29 11:21:41.487010] I [MSGID: 106484] [glusterd-brick-ops.c:819:__glusterd_handle_remove_brick] 0-management: Received rem brick req
[2015-06-29 11:21:41.494410] E [MSGID: 106265] [glusterd-brick-ops.c:1063:__glusterd_handle_remove_brick] 0-management: Bricks not from same subvol for distribute
[2015-06-29 11:21:43.951084] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/54cb1d4c2770a75d3e2bccd62ecdecc8.socket failed (Invalid argument)
[2015-06-29 11:21:46.951408] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/54cb1d4c2770a75d3e2bccd62ecdecc8.socket failed (Invalid argument)
[2015-06-29 11:21:49.951739] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/54cb1d4c2770a75d3e2bccd62ecdecc8.socket failed (Invalid argument)
[2015-06-29 11:21:52.952016] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/54cb1d4c2770a75d3e2bccd62ecdecc8.socket failed (Invalid argument)
[2015-06-29 11:21:55.952298] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/54cb1d4c2770a75d3e2bccd62ecdecc8.socket failed (Invalid argument)
[2015-06-29 11:21:58.444632] E [MSGID: 106301] [glusterd-op-sm.c:4043:glusterd_op_ac_send_stage_op] 0-management: Staging of operation 'Volume Rebalance' failed on localhost : Detach-tier not started.

Comment 10 Nag Pavan Chilakam 2015-06-29 12:52:33 UTC
I have tried the validation on a tiered volume with the hot tier as distribute and the cold tier as disperse. The data gets flushed, but I see the following errors in the rebalance log of the volume:
[2015-06-29 09:42:56.872045] E [MSGID: 109023] [dht-rebalance.c:553:__dht_rebalance_create_dst_file] 0-vol1-tier-dht: ftruncate failed for /coldir/hotf on vol1-cold-dht (Input/output error)
[2015-06-29 09:42:56.872082] E [MSGID: 108008] [afr-transaction.c:1984:afr_transaction] 0-vol1-cold-replicate-0: Failing FSETATTR on gfid 00000000-0000-0000-0000-000000000000: split-brain observed. [Input/output error]
[2015-06-29 09:42:56.872342] E [MSGID: 109023] [dht-rebalance.c:562:__dht_rebalance_create_dst_file] 0-vol1-tier-dht: chown failed for /coldir/hotf on vol1-cold-dht (Input/output error)
[2015-06-29 09:42:56.875000] E [MSGID: 109039] [dht-helper.c:1162:dht_rebalance_inprogress_task] 0-vol1-hot-dht: /coldir/hotf: failed to get the 'linkto' xattr [No data available]
[2015-06-29 09:42:56.875321] E [MSGID: 109023] [dht-rebalance.c:792:__dht_rebalance_open_src_file] 0-vol1-tier-dht: failed to set xattr on /coldir/hotf in vol1-hot-dht (Invalid argument)
[2015-06-29 09:42:56.875335] E [MSGID: 109023] [dht-rebalance.c:1098:dht_migrate_file] 0-vol1-tier-dht: Migrate file failed: failed to open /coldir/hotf on vol1-hot-dht
[2015-06-29 09:42:56.875794] I [MSGID: 109028] [dht-rebalance.c:3029:gf_defrag_status_get] 0-vol1-tier-dht: Rebalance is completed. Time taken is 0.00 secs
[2015-06-29 09:42:56.875816] I [MSGID: 109028] [dht-rebalance.c:3033:gf_defrag_status_get] 0-vol1-tier-dht: Files migrated: 0, size: 0, lookups: 9, failures: 0, skipped: 3



Failing to set the xattr seems expected, since a disperse (EC) volume does not have hash ranges, I suppose.

From a sanity perspective, flushing data off the disperse cold tier seems to work.

Comment 11 Nagaprasad Sathyanarayana 2015-06-30 05:26:12 UTC
*** Bug 1227485 has been marked as a duplicate of this bug. ***