Description of problem:
=======================
On a longevity setup consisting of a hot tier {6x2} and a cold tier {2x(4+2)}, stopping the volume and starting it again triggers a fix-layout, which eventually fails on the local host.

Node 1 (the node used to stop and start the volume):

Tier logs:
==========
[2015-10-29 08:17:06.988839] I [MSGID: 109081] [dht-common.c:3810:dht_setxattr] 0-tiervolume-tier-dht: fixing the layout of /
[2015-10-29 08:17:06.988865] W [MSGID: 109016] [dht-selfheal.c:1487:dht_fix_layout_of_directory] 0-tiervolume-tier-dht: Layout fix failed: 1 subvolume(s) are down. Skipping fix layout.
[2015-10-29 08:17:06.989085] E [MSGID: 109026] [dht-rebalance.c:2992:gf_defrag_start_crawl] 0-tiervolume-tier-dht: fix layout on / failed
[2015-10-29 08:17:06.989127] I [MSGID: 109028] [dht-rebalance.c:3327:gf_defrag_status_get] 0-tiervolume-tier-dht: Rebalance is failed. Time taken is 0.00 secs

All the other nodes log the following:
======================================
[2015-10-29 08:41:58.665874] E [MSGID: 109037] [dht-rebalance.c:2666:gf_fix_layout_tier_attach_lookup] 0-tiervolume-tier-dht: /thread0/level01/562d7312%%DIIOR3J5QX lookup failed
[2015-10-29 08:41:58.679069] E [MSGID: 109037] [dht-rebalance.c:2666:gf_fix_layout_tier_attach_lookup] 0-tiervolume-tier-dht: /thread0/level01/562d72ec%%J3OIWCRM3T lookup failed
[2015-10-29 08:41:58.687659] E [MSGID: 109037] [dht-rebalance.c:2666:gf_fix_layout_tier_attach_lookup] 0-tiervolume-tier-dht: /thread0/level01/562d72ed%%NO1PDGAEIH lookup failed
[2015-10-29 08:41:58.698332] E [MSGID: 109037] [dht-rebalance.c:2666:gf_fix_layout_tier_attach_lookup] 0-tiervolume-tier-dht: /thread0/level01/562d7315%%ZPOFELKK0S lookup failed
[2015-10-29 08:41:58.706011] E [MSGID: 109037] [dht-rebalance.c:2666:gf_fix_layout_tier_attach_lookup] 0-tiervolume-tier-dht: /thread0/level01/562d732a%%JZAJN3YPML lookup failed
[2015-10-29 08:41:58.716345] E [MSGID: 109037] [dht-rebalance.c:2666:gf_fix_layout_tier_attach_lookup] 0-tiervolume-tier-dht: /thread0/level01/562d72e9%%NBAS57MF0H lookup failed
[2015-10-29 08:41:58.724244] E [MSGID: 109037] [dht-rebalance.c:2666:gf_fix_layout_tier_attach_lookup] 0-tiervolume-tier-dht: /thread0/level01/562d72ed%%NL8UAU9PYM lookup failed
[2015-10-29 08:41:58.735774] E [MSGID: 109037] [dht-rebalance.c:2666:gf_fix_layout_tier_attach_lookup] 0-tiervolume-tier-dht: /thread0/level01/562d733a%%RJS5J2ETDI lookup failed
[2015-10-29 08:41:58.743622] E [MSGID: 109037] [dht-rebalance.c:2666:gf_fix_layout_tier_attach_lookup] 0-tiervolume-tier-dht: /thread0/level01/562d72eb%%TLH672U96B lookup failed
[2015-10-29 08:41:58.749406] E [MSGID: 109037] [dht-rebalance.c:2666:gf_fix_layout_tier_attach_lookup] 0-tiervolume-tier-dht: /thread0/level01/562d7322%%04BKVRVH3U lookup failed
[2015-10-29 08:41:58.760800] E [MSGID: 109037] [dht-rebalance.c:2666:gf_fix_layout_tier_attach_lookup] 0-tiervolume-tier-dht: /thread0/level01/562d732c%%TV3DZHKDER lookup failed
[2015-10-29 08:41:58.764740] E [MSGID: 109037] [dht-rebalance.c:2666:gf_fix_layout_tier_attach_lookup] 0-tiervolume-tier-dht: /thread0/level01/562d733b%%1F737UYTWG lookup failed
[2015-10-29 08:41:58.789417] I [MSGID: 109081] [dht-common.c:3810:dht_setxattr] 0-tiervolume-tier-dht: fixing the layout of /thread0/level02
[2015-10-29 08:41:58.789460] I [MSGID: 109045] [dht-selfheal.c:1509:dht_fix_layout_of_directory] 0-tiervolume-tier-dht: subvolume 0 (tiervolume-cold-dht): 1222023 chunks
[2015-10-29 08:41:58.789472] I [MSGID: 109045] [dht-selfheal.c:1509:dht_fix_layout_of_directory] 0-tiervolume-tier-dht: subvolume 1 (tiervolume-hot-dht): 916518 chunks
[2015-10-29 08:41:58.841486] I [MSGID: 109081] [dht-common.c:3810:dht_setxattr] 0-tiervolume-tier-dht: fixing the layout of /thread0/level02/level12
[2015-10-29 08:41:58.841531] I [MSGID: 109045] [dht-selfheal.c:1509:dht_fix_layout_of_directory] 0-tiervolume-tier-dht: subvolume 0 (tiervolume-cold-dht): 1222023 chunks
[2015-10-29 08:41:58.841542] I [MSGID: 109045] [dht-selfheal.c:1509:dht_fix_layout_of_directory] 0-tiervolume-tier-dht: subvolume 1 (tiervolume-hot-dht): 916518 chunks
[2015-10-29 08:41:58.930999] I [MSGID: 109081] [dht-common.c:3810:dht_setxattr] 0-tiervolume-tier-dht: fixing the layout of /thread0/level02/level12/level22
[2015-10-29 08:41:58.931045] I [MSGID: 109045] [dht-selfheal.c:1509:dht_fix_layout_of_directory] 0-tiervolume-tier-dht: subvolume 0 (tiervolume-cold-dht): 1222023 chunks
[2015-10-29 08:41:58.931056] I [MSGID: 109045] [dht-selfheal.c:1509:dht_fix_layout_of_directory] 0-tiervolume-tier-dht: subvolume 1 (tiervolume-hot-dht): 916518 chunks
[2015-10-29 08:41:58.968563] I [MSGID: 109081] [dht-common.c:3810:dht_setxattr] 0-tiervolume-tier-dht: fixing the layout of /thread0/level02/level12/level22/level32
[2015-10-29 08:41:58.968608] I [MSGID: 109045] [dht-selfheal.c:1509:dht_fix_layout_of_directory] 0-tiervolume-tier-dht: subvolume 0 (tiervolume-cold-dht): 1222023 chunks
[2015-10-29 08:41:58.968627] I [MSGID: 109045] [dht-selfheal.c:1509:dht_fix_layout_of_directory] 0-tiervolume-tier-dht: subvolume 1 (tiervolume-hot-dht): 916518 chunks
[2015-10-29 08:41:59.009282] I [MSGID: 109081] [dht-common.c:3810:dht_setxattr] 0-tiervolume-tier-dht: fixing the layout of /thread0/level02/level12/level22/level32/level42
[2015-10-29 08:41:59.009340] I [MSGID: 109045] [dht-selfheal.c:1509:dht_fix_layout_of_directory] 0-tiervolume-tier-dht: subvolume 0 (tiervolume-cold-dht): 1222023 chunks
[2015-10-29 08:41:59.009371] I [MSGID: 109045] [dht-selfheal.c:1509:dht_fix_layout_of_directory] 0-tiervolume-tier-dht: subvolume 1 (tiervolume-hot-dht): 916518 chunks
[2015-10-29 08:41:59.089329] I [MSGID: 109081] [dht-common.c:3810:dht_setxattr] 0-tiervolume-tier-dht: fixing the layout of /thread0/level02/level12/level22/level32/level42/level52

Number of times these messages were logged:

[root@dhcp37-133 glusterfs]# grep "dht-common.c:3810:dht_setxattr" tiervolume-tier.log | wc -l
16545
[root@dhcp37-133 glusterfs]# grep "gf_fix_layout_tier_attach_lookup" tiervolume-tier.log | wc -l
24178
[root@dhcp37-133 glusterfs]#

Even after 15 minutes, the same lookup and fix-layout messages keep being logged:

[2015-10-29 08:44:09.023724] I [MSGID: 109045] [dht-selfheal.c:1509:dht_fix_layout_of_directory] 0-tiervolume-tier-dht: subvolume 1 (tiervolume-hot-dht): 916518 chunks
[2015-10-29 08:44:09.093113] I [MSGID: 109081] [dht-common.c:3810:dht_setxattr] 0-tiervolume-tier-dht: fixing the layout of /thread2/level07/level17/level27/level37/level47/level57/level67/level77/level87/level97
[2015-10-29 08:44:09.093166] I [MSGID: 109045] [dht-selfheal.c:1509:dht_fix_layout_of_directory] 0-tiervolume-tier-dht: subvolume 0 (tiervolume-cold-dht): 1222023 chunks
[2015-10-29 08:44:09.093179] I [MSGID: 109045] [dht-selfheal.c:1509:dht_fix_layout_of_directory] 0-tiervolume-tier-dht: subvolume 1 (tiervolume-hot-dht): 916518 chunks
[2015-10-29 08:44:09.197870] E [MSGID: 109037] [dht-rebalance.c:2666:gf_fix_layout_tier_attach_lookup] 0-tiervolume-tier-dht: /thread2/level07/level17/level27/level37/level47/level57/level67/level77/level87/level97/562d85cb%%NU7OTMO8OQ lookup failed
[2015-10-29 08:44:09.231579] E [MSGID: 109037] [dht-rebalance.c:2666:gf_fix_layout_tier_attach_lookup] 0-tiervolume-tier-dht: /thread2/level07/level17/level27/level37/level47/level57/level67/level77/level87/level97/562d85d4%%I0FZDK9ZMY lookup failed
[2015-10-29 08:44:09.243277] E [MSGID: 109037] [dht-rebalance.c:2666:gf_fix_layout_tier_attach_lookup] 0-tiervolume-tier-dht: /thread2/level07/level17/level27/level37/level47/level57/level67/level77/level87/level97/562d85e5%%5DMZ9FFO9H lookup failed
[2015-10-29 08:44:09.249661] E [MSGID: 109037] [dht-rebalance.c:2666:gf_fix_layout_tier_attach_lookup] 0-tiervolume-tier-dht: /thread2/level07/level17/level27/level37/level47/level57/level67/level77/level87/level97/562d85ed%%NYTHVBACBH lookup failed
[2015-10-29 08:44:10.654213] I [MSGID: 109081] [dht-common.c:3810:dht_setxattr] 0-tiervolume-tier-dht: fixing the layout of /thread2/level08
[2015-10-29 08:44:10.654266] I [MSGID: 109045] [dht-selfheal.c:1509:dht_fix_layout_of_directory] 0-tiervolume-tier-dht: subvolume 0 (tiervolume-cold-dht): 1222023 chunks
[2015-10-29 08:44:10.654284] I [MSGID: 109045] [dht-selfheal.c:1509:dht_fix_layout_of_directory] 0-tiervolume-tier-dht: subvolume 1 (tiervolume-hot-dht): 916518 chunks
[2015-10-29 08:44:10.701865] I [MSGID: 109081] [dht-common.c:3810:dht_setxattr] 0-tiervolume-tier-dht: fixing the layout of /thread2/level08/level18
[2015-10-29 08:44:10.701917] I [MSGID: 109045] [dht-selfheal.c:1509:dht_fix_layout_of_directory] 0-tiervolume-tier-dht: subvolume 0 (tiervolume-cold-dht): 1222023 chunks
[2015-10-29 08:44:10.701937] I [MSGID: 109045] [dht-selfheal.c:1509:dht_fix_layout_of_directory] 0-tiervolume-tier-dht: subvolume 1 (tiervolume-hot-dht): 916518 chunks
[2015-10-29 08:44:10.749581] I [MSGID: 109081] [dht-common.c:3810:dht_setxattr] 0-tiervolume-tier-dht: fixing the layout of /thread2/level08/level18/level28

Rebalance on the local node failed:
===================================
[root@dhcp37-165 glusterfs]# gluster v rebal tiervolume status
Node          Rebalanced-files  size    scanned  failures  skipped  status       run time in secs
---------     ----------------  ------  -------  --------  -------  -----------  ----------------
localhost     0                 0Bytes  0        1         0        failed       0.00
10.70.37.133  0                 0Bytes  0        0         0        in progress  1672.00
10.70.37.160  0                 0Bytes  0        0         0        in progress  1672.00
10.70.37.158  0                 0Bytes  0        0         0        in progress  1672.00
10.70.37.110  0                 0Bytes  0        0         0        in progress  1672.00
10.70.37.155  0                 0Bytes  0        0         0        in progress  1672.00
10.70.37.99   0                 0Bytes  0        0         0        in progress  1672.00
10.70.37.88   0                 0Bytes  0        0         0        in progress  1672.00
10.70.37.112  0                 0Bytes  0        0         0        in progress  1672.00
10.70.37.199  0                 0Bytes  0        0         0        in progress  1672.00
10.70.37.162  0                 0Bytes  0        0         0        in progress  1672.00
10.70.37.87   0                 0Bytes  0        0         0        in progress  1672.00
volume rebalance: tiervolume: success:
[root@dhcp37-165 glusterfs]#

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.7.5-0.3.el7rhgs.x86_64

Steps carried out:
==================
1. 12-node cluster
2. Hot tier {6x2}, cold tier {2x(4+2)}
3. Mounted the volume on 7.2, 7.1 and 6.7 clients
4. Created a large data set on the volume {148GB}
5. Stopped the volume {no data creation or I/O was in progress at this time}
6. Started the volume
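The message counts in the description were gathered with `grep ... | wc -l`. Below is a minimal Python sketch that tallies every source-location tag in one pass. It assumes one entry per line in the glusterfs format seen in these logs (`[timestamp] LEVEL [MSGID: n] [file.c:line:function] ...`). `count_tags` and `count_tags_in_file` are hypothetical helpers for illustration, not part of glusterfs.

```python
import re
from collections import Counter

# Matches the source-location tag that follows the MSGID in a glusterfs
# log entry, e.g. "[MSGID: 109081] [dht-common.c:3810:dht_setxattr]".
# Entries that carry no MSGID are not counted by this pattern.
TAG_RE = re.compile(r"\[MSGID: \d+\] \[([\w.-]+:\d+:\w+)\]")


def count_tags(log_text: str) -> Counter:
    """Count how often each file:line:function tag appears in the log text."""
    return Counter(TAG_RE.findall(log_text))


def count_tags_in_file(path: str) -> Counter:
    """Read a log file (e.g. tiervolume-tier.log) and tally its tags."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return count_tags(f.read())
```

For example, `count_tags_in_file("tiervolume-tier.log").most_common(5)` would list the five noisiest call sites. With one entry per line, the count for `dht-rebalance.c:2666:gf_fix_layout_tier_attach_lookup` should equal the `grep "gf_fix_layout_tier_attach_lookup" | wc -l` figure.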
*** Bug 1229270 has been marked as a duplicate of this bug. ***
master: http://review.gluster.org/#/c/12718/
release-3.7: http://review.gluster.org/#/c/12749/
While verifying this bug, I hit another issue, reported in bz 1288051. Since tierd goes into a faulty state, verification of this bug depends on the closure of bz 1288051. Marking it as dependent.
Verified with build: glusterfs-3.7.5-14.el7rhgs.x86_64

Restarting a volume still triggers a fix-layout, which is known to the tier team, but the fix-layout failure reported in this bug is no longer seen. Moving this bug to VERIFIED.

[root@dhcp37-165 glusterfs]# grep -i "gf_defrag_start_crawl" tiervolume-tier.log | grep -i "failed"
[root@dhcp37-165 glusterfs]#
[root@dhcp37-165 glusterfs]# grep -i "failed" tiervolume-tier.log | grep -i "fix"
[root@dhcp37-165 glusterfs]#
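The two verification greps can be sketched in Python as a repeatable check. `grep_all` and `verification_hits` are hypothetical helpers. The patterns are treated as fixed, case-insensitive substrings; since they contain no regex metacharacters, this matches what `grep -i` does here. Note that the second check also matches the `gf_fix_layout_tier_attach_lookup ... lookup failed` entries from the original report, so an empty result rules those out as well.

```python
from typing import Dict, Iterable, List


def grep_all(lines: Iterable[str], *patterns: str) -> List[str]:
    """Return the lines that contain every pattern, ignoring case.

    Mirrors `grep -i p1 FILE | grep -i p2` for fixed-string patterns.
    """
    wanted = [p.lower() for p in patterns]
    return [line for line in lines if all(p in line.lower() for p in wanted)]


def verification_hits(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Run both verification checks; empty lists mean nothing was found."""
    lines = list(lines)  # materialize so the input can be scanned twice
    return {
        "crawl_failed": grep_all(lines, "gf_defrag_start_crawl", "failed"),
        "fix_failed": grep_all(lines, "failed", "fix"),
    }
```

Usage: `with open("tiervolume-tier.log") as f: hits = verification_hits(f)`. The verification passes when `not any(hits.values())`.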
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-0193.html