Description of problem:
Detach tier operation on tiered volume failed. Although ganesha mount was used for the testing, it shouldn't be the cause as no IO was performed during detach tier except for tier migrations.

Volume Name: ganesha-tier
Type: Tier
Volume ID: 5dc054c0-b15c-49dc-9494-38bd04d05819
Status: Started
Number of Bricks: 8
Transport-type: tcp
Hot Tier :
Hot Tier Type : Replicate
Number of Bricks: 1 x 2 = 2
Brick1: 10.70.47.156:/bricks/brick1/l1
Brick2: 10.70.47.156:/bricks/brick0/l1
Cold Tier:
Cold Tier Type : Disperse
Number of Bricks: 1 x (4 + 2) = 6
Brick3: 10.70.47.192:/bricks/brick0/l1
Brick4: 10.70.47.178:/bricks/brick0/l1
Brick5: 10.70.47.160:/bricks/brick0/l1
Brick6: 10.70.47.192:/bricks/brick1/l1
Brick7: 10.70.47.178:/bricks/brick1/l1
Brick8: 10.70.47.160:/bricks/brick1/l1
Options Reconfigured:
cluster.watermark-hi: 10
cluster.watermark-low: 5
cluster.tier-mode: cache
features.ctr-enabled: on
features.inode-quota: off
features.quota: off
ganesha.enable: on
features.cache-invalidation: on
nfs.disable: on
performance.readdir-ahead: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable

[root@dhcp47-156 gluster]# gluster v status ganesha-tier
Status of volume: ganesha-tier
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Hot Bricks:
Brick 10.70.47.156:/bricks/brick1/l1        49153     0          Y       19944
Brick 10.70.47.156:/bricks/brick0/l1        49152     0          Y       19924
Cold Bricks:
Brick 10.70.47.192:/bricks/brick0/l1        49153     0          Y       8824
Brick 10.70.47.178:/bricks/brick0/l1        49152     0          Y       4509
Brick 10.70.47.160:/bricks/brick0/l1        49152     0          Y       692
Brick 10.70.47.192:/bricks/brick1/l1        49154     0          Y       8869
Brick 10.70.47.178:/bricks/brick1/l1        49153     0          Y       4528
Brick 10.70.47.160:/bricks/brick1/l1        49153     0          Y       766
Self-heal Daemon on localhost               N/A       N/A        Y       20019
Self-heal Daemon on 10.70.47.178            N/A       N/A        Y       16055
Self-heal Daemon on 10.70.47.192            N/A       N/A        Y       8247
Self-heal Daemon on 10.70.47.160            N/A       N/A        Y       14582

Task Status of Volume ganesha-tier
------------------------------------------------------------------------------
Task                 : Detach tier
ID                   : 91018093-6969-4406-b494-9007a661d167
Status               : failed

Following messages are seen in the tier logs.

[2016-04-29 10:01:32.676177] W [glusterfsd.c:1251:cleanup_and_exit] (-->/lib64/libglusterfs.so.0(synctask_wrap+0x12) [0x7f35a8be0fe2] -->/usr/sbin/glusterfs(glusterfs_handle_terminate+0x15) [0x7f35a9075415] -->/usr/sbin/glusterfs(cleanup_and_exit+0x69) [0x7f35a9072739] ) 0-: received signum (15), shutting down
[2016-04-29 10:01:34.772142] I [timer.c:48:gf_timer_call_after] (-->/lib64/libglusterfs.so.0(gf_timer_proc+0x11b) [0x7f35a8bbe98b] -->/lib64/libgfrpc.so.0(+0xff83) [0x7f35a896cf83] -->/lib64/libglusterfs.so.0(gf_timer_call_after+0x166) [0x7f35a8bbe6c6] ) 0-timer: ctx cleanup started
[2016-04-29 10:01:34.772196] W [rpc-clnt.c:170:call_bail] 0-glusterfs: Cannot create bailout timer for 127.0.0.1:24007
[2016-04-29 10:01:44.314059] E [MSGID: 109037] [tier.c:694:tier_migrate_using_query_file] 0-ganesha-tier-tier-dht: Failed to lookup file omap2420-n8x0-common.dtsi [Invalid argument]
[2016-04-29 10:00:23.626146] E [MSGID: 109037] [tier.c:694:tier_migrate_using_query_file] 0-ganesha-tier-tier-dht: Failed to lookup file snvs-pwrkey.txt [Invalid argument]
[2016-04-29 10:00:23.634539] E [MSGID: 109037] [tier.c:694:tier_migrate_using_query_file] 0-ganesha-tier-tier-dht: Failed to lookup file map.h [Invalid argument]
[2016-04-29 10:00:23.784007] E [MSGID: 109037] [tier.c:694:tier_migrate_using_query_file] 0-ganesha-tier-tier-dht: Failed to lookup file gpio.txt [Invalid argument]
[2016-04-29 10:00:23.920081] E [MSGID: 109037] [tier.c:694:tier_migrate_using_query_file] 0-ganesha-tier-tier-dht: Failed to lookup file file-957 [Invalid argument]
[2016-04-29 10:01:48.799826] W [glusterfsd.c:1251:cleanup_and_exit] (-->/lib64/libglusterfs.so.0(synctask_wrap+0x12) [0x7fb760552fe2] -->/usr/sbin/glusterfs(glusterfs_handle_terminate+0x15) [0x7fb7609e7415] -->/usr/sbin/glusterfs(cleanup_and_exit+0x69) [0x7fb7609e4739] ) 0-: received signum (15), shutting down

Version-Release number of selected component (if applicable):
glusterfs-server-3.7.9-2.el7rhgs.x86_64

How reproducible:
frequently

Steps to Reproduce:
1. create a dispersed volume
2. create bunch of files, dirs, kernel untar
3. while step 2 is in progress, attach tier
4. Allow files to be promoted, new files to be written in hot tier
5. reduce watermark levels so that high watermark is hit
6. detach tier

Actual results:
detach tier starts, but fails eventually.

Expected results:
detach tier should succeed.

Additional info:
sosreports shall be attached shortly.
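For triage, the files that failed lookup during migration can be pulled out of the tier log with a short script. This is only a sketch: it assumes the log line layout shown in the excerpt above (MSGID 109037 followed by "Failed to lookup file <name> [<reason>]"), and the function name `failed_lookups` is made up for illustration, not part of any gluster tool.

```python
import re

# Pattern modelled on the MSGID 109037 lines quoted in this report.
# Assumption: file names contain no whitespace, as in the excerpt.
LOOKUP_FAILURE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] E \[MSGID: 109037\] .*?"
    r"Failed to lookup file (?P<name>\S+) \[(?P<reason>[^\]]+)\]"
)


def failed_lookups(lines):
    """Return (timestamp, file name, reason) for each failed tier lookup."""
    results = []
    for line in lines:
        match = LOOKUP_FAILURE.match(line.strip())
        if match:
            results.append(
                (match.group("ts"), match.group("name"), match.group("reason"))
            )
    return results
```

Feeding it the lines of the tier log (for example `failed_lookups(open(path))`, with `path` pointing at wherever the tier log lives on the node) gives a list that can be compared against the files present on the hot bricks.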
In the test run, detach tier was executed after fix layout was complete.
*** This bug has been marked as a duplicate of bug 1332957 ***
*** Bug 1333804 has been marked as a duplicate of this bug. ***
Partial RCA: there was a GFID mismatch found during detach operation.
The error messages are:

[2016-05-01 08:07:31.938737] W [MSGID: 122019] [ec-helpers.c:361:ec_loc_gfid_check] 0-ganesha-tier-disperse-0: Mismatching GFID's in loc
[2016-05-01 08:07:31.938886] E [MSGID: 109023] [dht-rebalance.c:2353:gf_defrag_get_entry] 0-ganesha-tier-tier-dht: Migrate file failed:/linux-kernel/linux-4.5.2/Kbuild lookup failed
[2016-05-01 08:07:31.938945] I [dht-rebalance.c:2672:gf_defrag_process_dir] 0-DHT: Found critical error from gf_defrag_get_entry
[2016-05-01 08:07:31.939203] E [MSGID: 109111] [dht-rebalance.c:2943:gf_defrag_fix_layout] 0-ganesha-tier-tier-dht: gf_defrag_process_dir failed for directory: /linux-kernel/linux-4.5.2
[2016-05-01 08:07:31.939245] E [MSGID: 109016] [dht-rebalance.c:3120:gf_defrag_fix_layout] 0-ganesha-tier-tier-dht: Fix layout failed for /linux-kernel/linux-4.5.2
[2016-05-01 08:07:31.939264] E [MSGID: 109016] [dht-rebalance.c:3120:gf_defrag_fix_layout] 0-ganesha-tier-tier-dht: Fix layout failed for /linux-kernel
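To see at a glance how the GFID mismatch warning cascades into the fix-layout failures, the log lines can be tallied by severity and MSGID. A minimal sketch, assuming the "[timestamp] LEVEL [MSGID: n] [source]" layout seen in the messages above; `summarize` is a hypothetical helper name, and lines without a MSGID are counted under `None`.

```python
import re
from collections import Counter

# Layout assumed from the log excerpt: timestamp, one-letter level,
# optional MSGID, then the "file:line:function" source tag.
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]) "
    r"(?:\[MSGID: (?P<msgid>\d+)\] )?\[(?P<src>[^\]]+)\]"
)


def summarize(lines):
    """Count log lines per (level, MSGID) pair; MSGID is None when absent."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:
            counts[(match.group("level"), match.group("msgid"))] += 1
    return counts
```

On the six lines quoted in this comment, the single `W`/122019 GFID warning is followed by four `E` entries, which fits the reading that one failed lookup aborted fix layout for the whole directory chain.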
Discussed in scrum; QE is working on steps to make this reproducible.
Closing per discussion with QE.
Nithya: We could not reproduce this. Karthick, can we close this as WorksForMe and reopen if seen again?
Karthik: Yes, this issue wasn't seen in later stages of 3.1.3.