Bug 1331628 - [Tiering]: detach tier operation fails
Summary: [Tiering]: detach tier operation fails
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: tier
Version: rhgs-3.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: hari gowtham
QA Contact: krishnaram Karthick
URL:
Whiteboard:
Depends On:
Blocks: 1332957
 
Reported: 2016-04-29 05:38 UTC by krishnaram Karthick
Modified: 2016-09-17 15:37 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-07-16 03:11:01 UTC
Embargoed:



Description krishnaram Karthick 2016-04-29 05:38:55 UTC
Description of problem:
The detach tier operation on a tiered volume failed. Although a ganesha mount was used for the testing, it should not be the cause, because no I/O other than tier migrations was performed during detach tier.

Volume Name: ganesha-tier
Type: Tier
Volume ID: 5dc054c0-b15c-49dc-9494-38bd04d05819
Status: Started
Number of Bricks: 8
Transport-type: tcp
Hot Tier :
Hot Tier Type : Replicate
Number of Bricks: 1 x 2 = 2
Brick1: 10.70.47.156:/bricks/brick1/l1
Brick2: 10.70.47.156:/bricks/brick0/l1
Cold Tier:
Cold Tier Type : Disperse
Number of Bricks: 1 x (4 + 2) = 6
Brick3: 10.70.47.192:/bricks/brick0/l1
Brick4: 10.70.47.178:/bricks/brick0/l1
Brick5: 10.70.47.160:/bricks/brick0/l1
Brick6: 10.70.47.192:/bricks/brick1/l1
Brick7: 10.70.47.178:/bricks/brick1/l1
Brick8: 10.70.47.160:/bricks/brick1/l1
Options Reconfigured:
cluster.watermark-hi: 10
cluster.watermark-low: 5
cluster.tier-mode: cache
features.ctr-enabled: on
features.inode-quota: off
features.quota: off
ganesha.enable: on
features.cache-invalidation: on
nfs.disable: on
performance.readdir-ahead: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable



[root@dhcp47-156 gluster]# gluster v status ganesha-tier
Status of volume: ganesha-tier
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Hot Bricks:
Brick 10.70.47.156:/bricks/brick1/l1        49153     0          Y       19944
Brick 10.70.47.156:/bricks/brick0/l1        49152     0          Y       19924
Cold Bricks:
Brick 10.70.47.192:/bricks/brick0/l1        49153     0          Y       8824 
Brick 10.70.47.178:/bricks/brick0/l1        49152     0          Y       4509 
Brick 10.70.47.160:/bricks/brick0/l1        49152     0          Y       692  
Brick 10.70.47.192:/bricks/brick1/l1        49154     0          Y       8869 
Brick 10.70.47.178:/bricks/brick1/l1        49153     0          Y       4528 
Brick 10.70.47.160:/bricks/brick1/l1        49153     0          Y       766  
Self-heal Daemon on localhost               N/A       N/A        Y       20019
Self-heal Daemon on 10.70.47.178            N/A       N/A        Y       16055
Self-heal Daemon on 10.70.47.192            N/A       N/A        Y       8247 
Self-heal Daemon on 10.70.47.160            N/A       N/A        Y       14582
 
Task Status of Volume ganesha-tier
------------------------------------------------------------------------------
Task                 : Detach tier         
ID                   : 91018093-6969-4406-b494-9007a661d167
Status               : failed              
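
For anyone scripting this check, the task section of the status output above can be read into a small dictionary. This is only a minimal sketch: the function name `task_status` is made up for illustration, and it assumes the output has a single task block laid out exactly as pasted here.

```python
def task_status(status_output):
    """Parse the 'Task Status of Volume' block into a dict of key/value pairs.

    Assumes at most one task is listed, as in the output pasted in this report.
    """
    fields = {}
    in_task_section = False
    for line in status_output.splitlines():
        if line.startswith("Task Status of Volume"):
            in_task_section = True
            continue
        # Skip everything before the task header, plus separator and blank lines.
        if not in_task_section or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


# Sample copied from the status output in this report.
SAMPLE_STATUS = """\
Self-heal Daemon on 10.70.47.160            N/A       N/A        Y       14582

Task Status of Volume ganesha-tier
------------------------------------------------------------------------------
Task                 : Detach tier
ID                   : 91018093-6969-4406-b494-9007a661d167
Status               : failed
"""

if __name__ == "__main__":
    info = task_status(SAMPLE_STATUS)
    print(info["Task"], "->", info["Status"])
```

A test script could poll the status command and stop as soon as `Status` reads `failed`, which is the state this bug is about.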


The following messages are seen in the tier logs:

[2016-04-29 10:01:32.676177] W [glusterfsd.c:1251:cleanup_and_exit] (-->/lib64/libglusterfs.so.0(synctask_wrap+0x12) [0x7f35a8be0fe2] -->/usr/sbin/glusterfs(glusterfs_handle_terminate+0x15) [0x7f35a9075415] -->/usr/sbin/glusterfs(cleanup_and_exit+0x69) [0x7f35a9072739] ) 0-: received signum (15), shutting down
[2016-04-29 10:01:34.772142] I [timer.c:48:gf_timer_call_after] (-->/lib64/libglusterfs.so.0(gf_timer_proc+0x11b) [0x7f35a8bbe98b] -->/lib64/libgfrpc.so.0(+0xff83) [0x7f35a896cf83] -->/lib64/libglusterfs.so.0(gf_timer_call_after+0x166) [0x7f35a8bbe6c6] ) 0-timer: ctx cleanup started
[2016-04-29 10:01:34.772196] W [rpc-clnt.c:170:call_bail] 0-glusterfs: Cannot create bailout timer for 127.0.0.1:24007 	


[2016-04-29 10:01:44.314059] E [MSGID: 109037] [tier.c:694:tier_migrate_using_query_file] 0-ganesha-tier-tier-dht: Failed to lookup file omap2420-n8x0-common.dtsi
 [Invalid argument]
[2016-04-29 10:00:23.626146] E [MSGID: 109037] [tier.c:694:tier_migrate_using_query_file] 0-ganesha-tier-tier-dht: Failed to lookup file snvs-pwrkey.txt
 [Invalid argument]
[2016-04-29 10:00:23.634539] E [MSGID: 109037] [tier.c:694:tier_migrate_using_query_file] 0-ganesha-tier-tier-dht: Failed to lookup file map.h
 [Invalid argument]
[2016-04-29 10:00:23.784007] E [MSGID: 109037] [tier.c:694:tier_migrate_using_query_file] 0-ganesha-tier-tier-dht: Failed to lookup file gpio.txt
 [Invalid argument]
[2016-04-29 10:00:23.920081] E [MSGID: 109037] [tier.c:694:tier_migrate_using_query_file] 0-ganesha-tier-tier-dht: Failed to lookup file file-957
 [Invalid argument]
[2016-04-29 10:01:48.799826] W [glusterfsd.c:1251:cleanup_and_exit] (-->/lib64/libglusterfs.so.0(synctask_wrap+0x12) [0x7fb760552fe2] -->/usr/sbin/glusterfs(glusterfs_handle_terminate+0x15) [0x7fb7609e7415] -->/usr/sbin/glusterfs(cleanup_and_exit+0x69) [0x7fb7609e4739] ) 0-: received signum (15), shutting down
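
To get a quick list of the files that hit the lookup failure, the tier log can be filtered for MSGID 109037. This is a rough sketch, not a supported tool: the helper name `failed_lookups` is hypothetical, and the pattern assumes the message layout shown in the excerpt above, where the file name ends the line and `[Invalid argument]` follows on the next one.

```python
import re

# Matches the tier migration error; the file name is the remainder of the line.
LOOKUP_FAILED = re.compile(
    r"^\[(?P<ts>[\d\- :.]+)\] E \[MSGID: 109037\] "
    r"\[tier\.c:\d+:tier_migrate_using_query_file\] "
    r"(?P<xlator>\S+): Failed to lookup file (?P<name>.+?)\s*$"
)


def failed_lookups(log_text):
    """Return (timestamp, file name) pairs for every failed tier lookup."""
    hits = []
    for line in log_text.splitlines():
        match = LOOKUP_FAILED.match(line)
        if match:
            hits.append((match.group("ts"), match.group("name")))
    return hits


# Two entries copied from the tier log excerpt in this report.
SAMPLE_LOG = """\
[2016-04-29 10:00:23.626146] E [MSGID: 109037] [tier.c:694:tier_migrate_using_query_file] 0-ganesha-tier-tier-dht: Failed to lookup file snvs-pwrkey.txt
 [Invalid argument]
[2016-04-29 10:00:23.634539] E [MSGID: 109037] [tier.c:694:tier_migrate_using_query_file] 0-ganesha-tier-tier-dht: Failed to lookup file map.h
 [Invalid argument]
"""

if __name__ == "__main__":
    for timestamp, name in failed_lookups(SAMPLE_LOG):
        print(timestamp, name)
```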


Version-Release number of selected component (if applicable):
glusterfs-server-3.7.9-2.el7rhgs.x86_64

How reproducible:
frequently

Steps to Reproduce:
1. Create a dispersed volume.
2. Create a bunch of files and directories, and untar a kernel source tree.
3. While step 2 is in progress, attach a tier.
4. Allow files to be promoted and new files to be written to the hot tier.
5. Reduce the watermark levels so that the high watermark is hit.
6. Detach the tier.
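
The CLI side of the steps above can be sketched as a dry run that only prints the commands. The volume name, brick paths and watermark values are taken from the volume info in this report. The exact command syntax is an assumption based on the tiering CLI in this release and should be checked against `gluster volume help`. The client I/O in steps 2 and 4 is not covered, and `repro_commands` is a made-up helper name.

```python
import shlex

# Values copied from the volume info in this report.
VOLUME = "ganesha-tier"
COLD_BRICKS = [
    "10.70.47.192:/bricks/brick0/l1",
    "10.70.47.178:/bricks/brick0/l1",
    "10.70.47.160:/bricks/brick0/l1",
    "10.70.47.192:/bricks/brick1/l1",
    "10.70.47.178:/bricks/brick1/l1",
    "10.70.47.160:/bricks/brick1/l1",
]
HOT_BRICKS = [
    "10.70.47.156:/bricks/brick1/l1",
    "10.70.47.156:/bricks/brick0/l1",
]


def repro_commands(volume, cold_bricks, hot_bricks, low=5, high=10):
    """Build the gluster command lines for steps 1, 3, 5 and 6 (assumed syntax)."""
    gluster = ["gluster", "volume"]
    return [
        # Step 1: 4 + 2 dispersed volume, then start it.
        gluster + ["create", volume, "disperse", "6", "redundancy", "2"] + cold_bricks,
        gluster + ["start", volume],
        # Step 3: attach a replicated hot tier while client I/O is running.
        gluster + ["tier", volume, "attach", "replica", "2"] + hot_bricks,
        # Step 5: lower the watermarks. Low is set first so that the two values
        # never cross (an assumption about how glusterd validates them).
        gluster + ["set", volume, "cluster.watermark-low", str(low)],
        gluster + ["set", volume, "cluster.watermark-hi", str(high)],
        # Step 6: start the detach and check on it.
        gluster + ["tier", volume, "detach", "start"],
        gluster + ["tier", volume, "detach", "status"],
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for argv in repro_commands(VOLUME, COLD_BRICKS, HOT_BRICKS):
        print(shlex.join(argv))
```

Nothing is executed here; the printed lines are meant to be reviewed and run by hand on a test cluster.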

Actual results:
Detach tier starts but eventually fails.

Expected results:
Detach tier should succeed.

Additional info:
sosreports will be attached shortly.

Comment 2 krishnaram Karthick 2016-04-29 06:06:29 UTC
In the test run, detach tier was executed after fix layout was complete.

Comment 6 Joseph Elwin Fernandes 2016-05-06 13:13:10 UTC

*** This bug has been marked as a duplicate of bug 1332957 ***

Comment 7 Joseph Elwin Fernandes 2016-05-06 13:14:41 UTC
*** Bug 1333804 has been marked as a duplicate of this bug. ***

Comment 8 hari gowtham 2016-05-09 09:54:20 UTC
Partial RCA: a GFID mismatch was found during the detach operation.

Comment 9 hari gowtham 2016-05-09 11:41:44 UTC
The error messages are:
[2016-05-01 08:07:31.938737] W [MSGID: 122019] [ec-helpers.c:361:ec_loc_gfid_check] 0-ganesha-tier-disperse-0: Mismatching GFID's in loc
[2016-05-01 08:07:31.938886] E [MSGID: 109023] [dht-rebalance.c:2353:gf_defrag_get_entry] 0-ganesha-tier-tier-dht: Migrate file failed:/linux-kernel/linux-4.5.2/Kbuild lookup failed
[2016-05-01 08:07:31.938945] I [dht-rebalance.c:2672:gf_defrag_process_dir] 0-DHT: Found critical error from gf_defrag_get_entry
[2016-05-01 08:07:31.939203] E [MSGID: 109111] [dht-rebalance.c:2943:gf_defrag_fix_layout] 0-ganesha-tier-tier-dht: gf_defrag_process_dir failed for directory: /linux-kernel/linux-4.5.2
[2016-05-01 08:07:31.939245] E [MSGID: 109016] [dht-rebalance.c:3120:gf_defrag_fix_layout] 0-ganesha-tier-tier-dht: Fix layout failed for /linux-kernel/linux-4.5.2
[2016-05-01 08:07:31.939264] E [MSGID: 109016] [dht-rebalance.c:3120:gf_defrag_fix_layout] 0-ganesha-tier-tier-dht: Fix layout failed for /linux-kernel
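
To follow how the GFID mismatch warning turns into the fix-layout failure, it can help to split each log line into its fields. The sketch below is an illustration only: `LogEntry` and `parse_log` are hypothetical names, and the pattern assumes the line layout seen in this excerpt (timestamp, level letter, optional MSGID, then `file:line:function`).

```python
import re
from typing import List, NamedTuple, Optional


class LogEntry(NamedTuple):
    timestamp: str
    level: str
    msgid: Optional[int]
    source: str
    function: str
    message: str


# Layout assumed from the excerpt: [ts] LEVEL [MSGID: n] [file:line:func] message
LINE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\] "
    r"(?P<level>[A-Z]) "
    r"(?:\[MSGID: (?P<msgid>\d+)\] )?"
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\] "
    r"(?P<msg>.*)$"
)


def parse_log(log_text: str) -> List[LogEntry]:
    """Parse every line that matches the assumed layout; skip the rest."""
    entries = []
    for raw in log_text.splitlines():
        m = LINE.match(raw)
        if not m:
            continue
        msgid = int(m.group("msgid")) if m.group("msgid") else None
        source = f"{m.group('file')}:{m.group('line')}"
        entries.append(LogEntry(m.group("ts"), m.group("level"), msgid,
                                source, m.group("func"), m.group("msg")))
    return entries


# Three lines copied from the error messages in this comment.
SAMPLE = """\
[2016-05-01 08:07:31.938737] W [MSGID: 122019] [ec-helpers.c:361:ec_loc_gfid_check] 0-ganesha-tier-disperse-0: Mismatching GFID's in loc
[2016-05-01 08:07:31.938886] E [MSGID: 109023] [dht-rebalance.c:2353:gf_defrag_get_entry] 0-ganesha-tier-tier-dht: Migrate file failed:/linux-kernel/linux-4.5.2/Kbuild lookup failed
[2016-05-01 08:07:31.938945] I [dht-rebalance.c:2672:gf_defrag_process_dir] 0-DHT: Found critical error from gf_defrag_get_entry
"""

if __name__ == "__main__":
    for entry in parse_log(SAMPLE):
        print(entry.timestamp, entry.level, entry.msgid, entry.function)
```

Reading the parsed entries in timestamp order shows the sequence reported here: the disperse translator warns about the GFID mismatch, the file migration fails on lookup, and the failure is then reported for each parent directory.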

Comment 10 Dan Lambright 2016-05-16 15:32:39 UTC
Discussed in scrum; QE is working on steps to make this reproducible.

Comment 14 Dan Lambright 2016-07-16 03:11:01 UTC
Closing per discussion with QE.

Nithya: We could not reproduce this. Karthick, can we close this as WorksForMe and reopen if seen again?

Karthick: Yes, this issue wasn't seen in later stages of 3.1.3.

