Bug 1258144 - Data Tiering: Tier daemon crashed when detach tier start was issued while IOs were happening
Status: CLOSED EOL
Product: GlusterFS
Classification: Community
Component: tiering
Version: 3.7.5
Hardware: Unspecified OS: Unspecified
Priority: high Severity: urgent
Assigned To: Dan Lambright
bugs@gluster.org
: Triaged
Depends On:
Blocks: 1260923
Reported: 2015-08-29 11:40 EDT by nchilaka
Modified: 2017-03-08 05:52 EST (History)
4 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-03-08 05:52:04 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments

None
Description nchilaka 2015-08-29 11:40:56 EDT
Description of problem:
=========================
I created a replicated tier over a dist-rep volume, mounted the volume over NFS, and turned on CTR.
I ran a fair amount of I/O by untarring a Linux kernel tarball.
The files were demoted after some time, as expected.
I then renamed the existing untarred directory and started the untar again.
While this was in progress, I issued a detach tier start.
I noted the following observations:
1) The tier daemon crashed.
2) Consequently, the tier rebalance status and rebalance status show as failed,
 as below:
gluster v rebal g1  status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost                0        0Bytes             0             0             0               failed               0.00
                             10.70.46.36                0        0Bytes             0             0             0               failed               0.00



3) *IMPORTANT* I/O was nevertheless still in progress, and new files were being written only to the hot tier (this could eventually fill the hot tier).
4) After some time, when I issued "gluster v status <vname>", it failed as below:
[root@nag-manual-node1 ~]# gluster v status g1
Commit failed on localhost. Please check the log file for more details.
5) The AFR daemons were also no longer showing up in ps -ef.


Version-Release number of selected component (if applicable):
=================================================================

[root@nag-manual-node1 ~]# rpm -qa|grep gluster
glusterfs-libs-3.7.3-0.82.git6c4096f.el6.x86_64
glusterfs-fuse-3.7.3-0.82.git6c4096f.el6.x86_64
glusterfs-server-3.7.3-0.82.git6c4096f.el6.x86_64
glusterfs-3.7.3-0.82.git6c4096f.el6.x86_64
glusterfs-api-3.7.3-0.82.git6c4096f.el6.x86_64
glusterfs-cli-3.7.3-0.82.git6c4096f.el6.x86_64
python-gluster-3.7.3-0.82.git6c4096f.el6.noarch
glusterfs-client-xlators-3.7.3-0.82.git6c4096f.el6.x86_64
[root@nag-manual-node1 ~]# gluster --version
glusterfs 3.7.3 built on Aug 27 2015 01:23:05
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.




Steps to Reproduce:
===================
1. Created a 2x2 dist-rep volume and started it.
2. Attached a 1x2 replica hot tier and mounted the volume over NFS.
3. Performed a Linux kernel untar.
4. Demotes happened after some time (expected).
5. Renamed the old directory and ran the untar again.
6. While the untar was in progress, issued a detach-tier start.
7. This crashed the tier daemon (and possibly the replica daemons as well, but I am not sure: ps -ef did not show them, yet the files still being untarred were available on both bricks of the hot pair).
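The steps above can be sketched as the following command sequence. Node names, brick paths, the volume name g1, and the tarball name are illustrative assumptions, not taken from the report; the script only echoes each command so it can be reviewed before being run by hand on a disposable test cluster.

```shell
#!/bin/sh
# Reproduction sketch for bug 1258144 (hypothetical hosts/paths).
# run() prints the command instead of executing it (dry-run).
run() { echo "$@"; }

# 1. Create and start a 2x2 dist-rep volume.
run gluster volume create g1 replica 2 \
    node1:/bricks/b1 node2:/bricks/b1 node1:/bricks/b2 node2:/bricks/b2
run gluster volume start g1

# 2. Attach a 1x2 replicated hot tier and enable CTR.
run gluster volume attach-tier g1 replica 2 \
    node1:/bricks/hot node2:/bricks/hot
run gluster volume set g1 features.ctr-enabled on

# 3-5. Mount over NFS, untar a kernel tarball, wait for demotion,
#      then rename the tree and untar again.
run mount -t nfs node1:/g1 /mnt/g1
run tar -xf linux-4.1.6.tar.xz -C /mnt/g1
run mv /mnt/g1/linux-4.1.6 /mnt/g1/linux-4.1.6.old
run tar -xf linux-4.1.6.tar.xz -C /mnt/g1

# 6. While the second untar is still running, start the detach.
run gluster volume detach-tier g1 start

# 7. Check for the tierd crash / failed rebalance status.
run gluster volume rebalance g1 status
```

The critical timing is step 6: the detach-tier start must land while the second untar is still generating I/O against the hot tier.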



CRASH
======
[2015-08-29 16:02:01.669901] E [MSGID: 109037] [tier.c:898:tier_start] 0-g1-tier-dht: Demotion failed!
[2015-08-29 16:02:00.311020] I [MSGID: 109038] [tier.c:350:tier_migrate_using_query_file] 0-g1-tier-dht: Tier 0 src_subvol g1-hot-dht file .gitignore
[2015-08-29 16:02:00.312280] I [MSGID: 109038] [tier.c:109:tier_check_same_node] 0-g1-tier-dht: /linux-4.1.6/.gitignore does not belong to this node
[2015-08-29 16:04:00.698176] I [MSGID: 109038] [tier.c:574:tier_build_migration_qfile] 0-g1-tier-dht: Failed to remove /var/run/gluster/demotequeryfile-20559
pending frames:
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 
2015-08-29 16:04:00
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.7.3
/usr/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb6)[0x3560c25936]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x32f)[0x3560c4549f]
/lib64/libc.so.6[0x340e8326a0]
/lib64/libc.so.6[0x340e93372f]
/usr/lib64/libgfdb.so.0(gf_sql_query_function+0xdf)[0x7fa812066dcf]
/usr/lib64/libgfdb.so.0(gf_sqlite3_find_unchanged_for_time+0xd5)[0x7fa81206bb05]
/usr/lib64/libgfdb.so.0(find_unchanged_for_time+0x4f)[0x7fa812065f1f]
/usr/lib64/glusterfs/3.7.3/xlator/cluster/tier.so(+0x5410d)[0x7fa81266f10d]
/usr/lib64/libglusterfs.so.0(dict_foreach_match+0x74)[0x3560c1d2d4]
/usr/lib64/libglusterfs.so.0(dict_foreach+0x18)[0x3560c1d388]
/usr/lib64/glusterfs/3.7.3/xlator/cluster/tier.so(+0x55ea7)[0x7fa812670ea7]
/lib64/libpthread.so.0[0x340ec07a51]
/lib64/libc.so.6(clone+0x6d)[0x340e8e89ad]
Comment 1 nchilaka 2015-08-29 11:49:06 EDT
sosreports @ rhsqe-repo bug.1258144]# pwd
/home/repo/sosreports/bug.1258144
Comment 2 Kaushal 2017-03-08 05:52:04 EST
This bug is being closed because GlusterFS-3.7 has reached its end of life.

Note: This bug is being closed using a script. No verification has been performed to check if it still exists on newer releases of GlusterFS.
If this bug still exists in newer GlusterFS releases, please reopen this bug against the newer release.
