Bug 1258144

Summary: Data Tiering: Tier daemon crashed when detach tier start was issued while IOs were happening
Product: [Community] GlusterFS
Reporter: Nag Pavan Chilakam <nchilaka>
Component: tiering
Assignee: Dan Lambright <dlambrig>
Status: CLOSED EOL
QA Contact: bugs <bugs>
Severity: urgent
Priority: high
Version: 3.7.5
CC: bugs, dlambrig, rkavunga, sankarshan
Keywords: Triaged
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2017-03-08 10:52:04 UTC
Type: Bug
Bug Blocks: 1260923

Description Nag Pavan Chilakam 2015-08-29 15:40:56 UTC
Description of problem:
=========================
I created a replicated hot tier over a distributed-replicate volume, mounted the volume over NFS, and turned on CTR.
I generated a fair amount of I/O by untarring a Linux kernel tarball.
The files were demoted after some time, as expected.
I then renamed the existing untarred directory and started the untar again.
While this was in progress, I issued a detach-tier start.
I noted the following observations:
1) The tier daemon crashed.
2) As a result, the rebalance tier status and rebalance status both show as failed, as below:
gluster v rebal g1  status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost                0        0Bytes             0             0             0               failed               0.00
                             10.70.46.36                0        0Bytes             0             0             0               failed               0.00



3) *IMPORTANT* The I/O nevertheless continued, and the new files were written to the hot tier only (this could eventually fill the hot tier).
4) After some time, when I issued "gluster v status <vname>", it failed as below:
[root@nag-manual-node1 ~]# gluster v status g1
Commit failed on localhost. Please check the log file for more details.
5) The AFR daemons were also not showing up in ps -ef.


Version-Release number of selected component (if applicable):
=================================================================

[root@nag-manual-node1 ~]# rpm -qa|grep gluster
glusterfs-libs-3.7.3-0.82.git6c4096f.el6.x86_64
glusterfs-fuse-3.7.3-0.82.git6c4096f.el6.x86_64
glusterfs-server-3.7.3-0.82.git6c4096f.el6.x86_64
glusterfs-3.7.3-0.82.git6c4096f.el6.x86_64
glusterfs-api-3.7.3-0.82.git6c4096f.el6.x86_64
glusterfs-cli-3.7.3-0.82.git6c4096f.el6.x86_64
glpython-gluster-3.7.3-0.82.git6c4096f.el6.noarch
glusterfs-client-xlators-3.7.3-0.82.git6c4096f.el6.x86_64
[root@nag-manual-node1 ~]# gluster --version
glusterfs 3.7.3 built on Aug 27 2015 01:23:05
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.




Steps to Reproduce:
===================
1. Created a 2x2 volume and started it.
2. Attached a 1x2 replica hot tier and mounted the volume over NFS.
3. Performed a Linux kernel untar.
4. Demotions happened after some time (as expected).
5. Renamed the old directory and started the Linux untar again.
6. While the untar was in progress, issued a detach-tier start.
7. This caused the tier daemon to crash (and probably the replica daemons too, though I am not sure: ps -ef did not show them, but the files that were still being untarred were available on both bricks of the hot pair).
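
For anyone retrying this, the steps above might be scripted roughly as follows. This is only a sketch: the host names, brick paths, mount point and tarball name are hypothetical placeholders (only the volume name g1 and the linux-4.1.6 tree come from this report), and the exact commands the reporter ran are not recorded here. It assumes the 3.7-era CLI (attach-tier / detach-tier, the features.ctr-enabled option) and an NFSv3 mount. The function only builds the command strings; it does not run them.

```python
def repro_commands(vol="g1", hosts=("host1", "host2"), mount="/mnt/g1",
                   tarball="linux-4.1.6.tar.xz", tree="linux-4.1.6"):
    """Return the reproduction commands, in order, as plain strings.

    All defaults except the volume name and the kernel tree name are
    hypothetical placeholders, not values taken from the bug report.
    """
    # 2x2 cold volume: consecutive bricks form a replica pair.
    cold = " ".join(f"{h}:/bricks/{vol}/cold{i}" for i in (1, 2) for h in hosts)
    # 1x2 replicated hot tier.
    hot = " ".join(f"{h}:/bricks/{vol}/hot" for h in hosts)
    return [
        f"gluster volume create {vol} replica 2 {cold}",
        f"gluster volume start {vol}",
        f"gluster volume attach-tier {vol} replica 2 {hot}",
        f"gluster volume set {vol} features.ctr-enabled on",
        f"mount -t nfs -o vers=3 {hosts[0]}:/{vol} {mount}",
        f"tar -xf {tarball} -C {mount}",
        "# wait here until the files have been demoted to the cold tier",
        f"mv {mount}/{tree} {mount}/{tree}.old",
        # Second untar runs in the background so detach-tier overlaps with I/O.
        f"tar -xf {tarball} -C {mount} &",
        f"gluster volume detach-tier {vol} start",
    ]


if __name__ == "__main__":
    for cmd in repro_commands():
        print(cmd)
```

The printed list can be reviewed and pasted into a shell on a test node; the wait for demotion is left as a manual step because its duration depends on the tier settings.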



CRASH
======
[2015-08-29 16:02:01.669901] E [MSGID: 109037] [tier.c:898:tier_start] 0-g1-tier-dht: Demotion failed!
[2015-08-29 16:02:00.311020] I [MSGID: 109038] [tier.c:350:tier_migrate_using_query_file] 0-g1-tier-dht: Tier 0 src_subvol g1-hot-dht file .gitignore
[2015-08-29 16:02:00.312280] I [MSGID: 109038] [tier.c:109:tier_check_same_node] 0-g1-tier-dht: /linux-4.1.6/.gitignore does not belong to this node
[2015-08-29 16:04:00.698176] I [MSGID: 109038] [tier.c:574:tier_build_migration_qfile] 0-g1-tier-dht: Failed to remove /var/run/gluster/demotequeryfile-20559
pending frames:
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 
2015-08-29 16:04:00
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.7.3
/usr/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb6)[0x3560c25936]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x32f)[0x3560c4549f]
/lib64/libc.so.6[0x340e8326a0]
/lib64/libc.so.6[0x340e93372f]
/usr/lib64/libgfdb.so.0(gf_sql_query_function+0xdf)[0x7fa812066dcf]
/usr/lib64/libgfdb.so.0(gf_sqlite3_find_unchanged_for_time+0xd5)[0x7fa81206bb05]
/usr/lib64/libgfdb.so.0(find_unchanged_for_time+0x4f)[0x7fa812065f1f]
/usr/lib64/glusterfs/3.7.3/xlator/cluster/tier.so(+0x5410d)[0x7fa81266f10d]
/usr/lib64/libglusterfs.so.0(dict_foreach_match+0x74)[0x3560c1d2d4]
/usr/lib64/libglusterfs.so.0(dict_foreach+0x18)[0x3560c1d388]
/usr/lib64/glusterfs/3.7.3/xlator/cluster/tier.so(+0x55ea7)[0x7fa812670ea7]
/lib64/libpthread.so.0[0x340ec07a51]
/lib64/libc.so.6(clone+0x6d)[0x340e8e89ad]
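
To read a trace like the one above more quickly, a small parser can split each frame into object, symbol, offset and address and then skip the frames that belong to the crash reporter itself. The sketch below is illustrative only: the frame format is inferred from the lines in this report, and the set of "reporter" symbols is simply the two helper names seen at the top of this trace.

```python
import re

# Frame formats seen in the trace:
#   /usr/lib64/libgfdb.so.0(gf_sql_query_function+0xdf)[0x7fa812066dcf]
#   /usr/lib64/glusterfs/3.7.3/xlator/cluster/tier.so(+0x5410d)[0x7fa81266f10d]
#   /lib64/libc.so.6[0x340e8326a0]
FRAME_RE = re.compile(
    r"^(?P<obj>/\S+?)"                                    # shared object path
    r"(?:\((?P<sym>[^+)]*)\+(?P<off>0x[0-9a-fA-F]+)\))?"  # optional (symbol+offset)
    r"\[(?P<addr>0x[0-9a-fA-F]+)\]$"                      # absolute address
)

# Frames emitted by the trace printer itself (names taken from this trace).
REPORTER_SYMBOLS = {"_gf_msg_backtrace_nomem", "gf_print_trace"}


def parse_frames(text):
    """Return one dict per backtrace frame found in text; other lines are ignored."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            frames.append({
                "object": m.group("obj"),
                "symbol": m.group("sym") or None,  # empty symbol becomes None
                "offset": m.group("off"),
                "address": m.group("addr"),
            })
    return frames


def first_named_frame(frames):
    """Return the innermost frame that has a symbol and is not a reporter helper."""
    for frame in frames:
        if frame["symbol"] and frame["symbol"] not in REPORTER_SYMBOLS:
            return frame
    return None


TRACE = """\
/usr/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb6)[0x3560c25936]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x32f)[0x3560c4549f]
/lib64/libc.so.6[0x340e8326a0]
/lib64/libc.so.6[0x340e93372f]
/usr/lib64/libgfdb.so.0(gf_sql_query_function+0xdf)[0x7fa812066dcf]
/usr/lib64/libgfdb.so.0(gf_sqlite3_find_unchanged_for_time+0xd5)[0x7fa81206bb05]
"""

if __name__ == "__main__":
    frame = first_named_frame(parse_frames(TRACE))
    print(frame["object"], frame["symbol"], frame["offset"])
```

Run on the first six frames of this trace, it reports gf_sql_query_function in libgfdb.so.0 as the innermost named caller; the two unnamed libc frames above it are where the signal was actually raised.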

Comment 1 Nag Pavan Chilakam 2015-08-29 15:49:06 UTC
sosreports are available on rhsqe-repo at:
/home/repo/sosreports/bug.1258144

Comment 2 Kaushal 2017-03-08 10:52:04 UTC
This bug is getting closed because GlusterFS-3.7 has reached its end-of-life.

Note: This bug is being closed using a script. No verification has been performed to check if it still exists on newer releases of GlusterFS.
If this bug still exists in newer GlusterFS releases, please reopen this bug against the newer release.