Bug 1263585
| Summary: | Data Tiering: new crash seen with tier rebalance daemon | | |
| --- | --- | --- | --- |
| Product: | [Community] GlusterFS | Reporter: | Nag Pavan Chilakam <nchilaka> |
| Component: | tiering | Assignee: | Dan Lambright <dlambrig> |
| Status: | CLOSED DUPLICATE | QA Contact: | bugs <bugs> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 3.7.4 | CC: | bugs, dlambrig, nbalacha |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2015-09-25 19:23:23 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1260923 | | |
Description
Nag Pavan Chilakam
2015-09-16 08:55:52 UTC
Cores and sosreports @ rhsqe-repo.lab.eng.blr.redhat.com:

```
[root@rhsqe-repo sosreports]# /home/repo/sosreports/bug.1263585
```

Analysis: The bricklists have an extra entry at the end. I have verified this on a running tier process for a newly created volume as well.

```
Core was generated by `/usr/sbin/glusterfs -s localhost --volfile-id rebalance/gold --xlator-option *d'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007ff8f9f82af0 in syncop_ipc (subvol=0x7ff7a2de5700, op=op@entry=1, xdata_in=xdata_in@entry=0x0,
    xdata_out=xdata_out@entry=0x0) at syncop.c:2823
2823            SYNCOP (subvol, (&args), syncop_ipc_cbk, subvol->fops->ipc,
(gdb) bt
#0  0x00007ff8f9f82af0 in syncop_ipc (subvol=0x7ff7a2de5700, op=op@entry=1, xdata_in=xdata_in@entry=0x0,
    xdata_out=xdata_out@entry=0x0) at syncop.c:2823
#1  0x00007ff8e7919fb9 in tier_process_brick_cbk (args=<synthetic pointer>, local_brick=0x7ff8db43bc60) at tier.c:548
#2  tier_build_migration_qfile (is_promotion=_gf_false, query_cbk_args=0x7ff79fddee70, args=0x7ff8db43bcc0) at tier.c:608
#3  tier_demote (args=0x7ff8db43bcc0) at tier.c:668
#4  0x00007ff8f8d99df5 in start_thread () from /lib64/libpthread.so.0
#5  0x00007ff8f86e01ad in clone () from /lib64/libc.so.6
(gdb) p *subvol
$1 = {name = 0x7ff7a2de5700 "", type = 0x7ff8cc037460 "\002", next = 0x7ff7a2de5700, prev = 0x1, parents = 0x0,
  children = 0x5fd8b3b6b4645700, options = 0x86286dc0e2942214, dlhandle = 0x0, fops = 0x0, cbks = 0x0, dumpops = 0x0,
  volume_options = {next = 0x0, prev = 0x0}, fini = 0x0, init = 0x0, reconfigure = 0x0, mem_acct_init = 0x0,
  notify = 0x0, loglevel = GF_LOG_NONE, latencies = {{min = 0, max = 0, total = 0, std = 0, mean = 0, count = 0} <repeats 11 times>,
    {min = 0, max = 0, total = 0, std = 6.9515818434295051e-310, mean = 6.9515814287748329e-310, count = 140527034957824},
    {min = 140701566130656, max = 140701566130656, total = -nan(0xfffffffffffe0), std = 0, mean = 0, count = 140701566127856},
    {min = 16, max = 1, total = 0, std = 0, mean = 0, count = 1},
    {min = 0, max = 0, total = 0, std = 0, mean = 0, count = 0}, {min = 0, max = 0, total = 0, std = 0, mean = 0, count = 0},
    {min = 0, max = 0, total = 0, std = 0, mean = 0, count = 0}, {min = 0, max = 0, total = 0, std = 0, mean = 0, count = 0},
    {min = 0, max = 0, total = 0, std = 0, mean = 0, count = 0}, {min = 0, max = 0, total = 0, std = 0, mean = 0, count = 0},
    {min = 0, max = 0, total = 0, std = 0, mean = 0, count = 0}, {min = 0, max = 0, total = 0, std = 0, mean = 0, count = 0},
    {min = 0, max = 0, total = 0, std = 0, mean = 0, count = 0},
    {min = 0, max = 0, total = 0, std = 0, mean = 0, count = 140701566130704},
    {min = 0, max = 0, total = 0, std = 0, mean = 0, count = 0}, {min = 0, max = 0, total = 0, std = 0, mean = 0, count = 0},
    {min = 0, max = 0, total = 0, std = 0, mean = 0, count = 0}, {min = 0, max = 0, total = 0, std = 0, mean = 0, count = 0},
    {min = 0, max = 0, total = 0, std = 0, mean = 0, count = 0},
    {min = 0, max = 0, total = 0, std = 1.4308692302641265e-308, mean = 0, count = 0},
    {min = 0, max = 140707013696768, total = 6.9518399606381148e-310, std = 0, mean = 0, count = 0},
    {min = 0, max = 0, total = 0, std = 0, mean = 0, count = 140701557739520},
    {min = 8392704, max = 4096, total = 2.0236928853657458e-320, std = 0, mean = 0, count = 0},
    {min = 0, max = 0, total = 0, std = 0, mean = 0, count = 0} <repeats 17 times>}, history = 0x0, ctx = 0x0,
  graph = 0x0, itable = 0x0, init_succeeded = 0 '\000', private = 0x0, mem_acct = 0x0, winds = 0,
  switched = 0 '\000', local_pool = 0x0, is_autoloaded = _gf_false}
```
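Note in the dump that fops is 0x0 and every other field is garbage. For illustration only, here is a minimal, hypothetical sketch, using made-up stand-in types rather than the real xlator_t or SYNCOP definitions, of why merely evaluating subvol->fops->ipc on such a pointer faults; the guard shown is purely illustrative and is not the fix applied for this bug:

```c
/*
 * Hypothetical, simplified stand-ins for xlator_t and its fops table,
 * NOT the real GlusterFS definitions -- only to illustrate the access
 * pattern that faults at syncop.c:2823.
 */
#include <stdio.h>

struct fake_fops {
    int (*ipc)(void *subvol, int op);      /* stand-in for fops->ipc */
};

struct fake_xlator {
    const char       *name;
    struct fake_fops *fops;                /* fops == 0x0 in the dump above */
};

/* Rough shape of the failing call: SYNCOP() is handed subvol->fops->ipc,
 * so just evaluating that argument dereferences subvol->fops.  With a
 * pointer that is not a valid xlator (fops NULL or garbage), that read
 * is the SIGSEGV seen in frame #0. */
static int call_ipc(struct fake_xlator *subvol, int op)
{
    if (!subvol || !subvol->fops || !subvol->fops->ipc) {
        fprintf(stderr, "not a valid xlator, refusing to call ipc\n");
        return -1;
    }
    return subvol->fops->ipc(subvol, op);
}

int main(void)
{
    /* Mimic the corrupted entry from the core: fops is NULL. */
    struct fake_xlator bogus = { "", NULL };
    return call_ipc(&bogus, 1);            /* guarded here; unguarded in the crash */
}
```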
The subvol is not a valid xlator. Dereferencing it causes the crash.

```
(gdb) f 2
#2  tier_build_migration_qfile (is_promotion=_gf_false, query_cbk_args=0x7ff79fddee70, args=0x7ff8db43bcc0) at tier.c:608
608                     ret = tier_process_brick_cbk (local_brick,
(gdb) l
603             gfdb_brick_dict_info.time_stamp = &time_in_past;
604             gfdb_brick_dict_info._gfdb_promote = is_promotion;
605             gfdb_brick_dict_info._query_cbk_args = query_cbk_args;
606
607             list_for_each_entry (local_brick, args->brick_list, list) {
608                     ret = tier_process_brick_cbk (local_brick,
609                                     &gfdb_brick_dict_info);
610                     if (ret) {
611                             gf_msg (args->this->name, GF_LOG_ERROR, 0,
612                                     DHT_MSG_BRICK_QUERY_FAILED,
(gdb)
613                                     "Brick query failed\n");
614                             goto out;
615                     }
616             }
(gdb) p *args
$2 = {this = 0x0, defrag = 0x0, brick_list = 0x0, freq_time = 0, return_value = 0}
```

args has been cleared (all fields are NULL/zero) at this point because of the same issue as in BZ 1263200. However, it looks like the demote thread had already started processing the brick list before args was cleared.

Switching to the tier_start thread and traversing bricklist_hot:

```
(gdb) bt
#0  0x00007ff8f8da099d in nanosleep () from /lib64/libpthread.so.0
#1  0x00007ff8f9f50914 in gf_timer_proc (ctx=0x7ff8fb371010) at timer.c:205
#2  0x00007ff8f8d99df5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007ff8f86e01ad in clone () from /lib64/libc.so.6
(gdb) t 3
[Switching to thread 3 (Thread 0x7ff8ef549700 (LWP 32723))]
#0  0x00007ff8f86a748d in nanosleep () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ff8f86a748d in nanosleep () from /lib64/libc.so.6
#1  0x00007ff8f86a7324 in sleep () from /lib64/libc.so.6
#2  0x00007ff8e791af2a in tier_start (this=0x7ff8e8020920, defrag=0x7ff8e8028c10) at tier.c:860
#3  0x00007ff8e7dafd34 in gf_defrag_start_crawl (data=0x7ff8e8020920) at dht-rebalance.c:2841
#4  0x00007ff8f9f72d72 in synctask_wrap (old_task=<optimized out>) at syncop.c:380
#5  0x00007ff8f86310f0 in ?? () from /lib64/libc.so.6
#6  0x0000000000000000 in ?? ()
(gdb) f 2
#2  0x00007ff8e791af2a in tier_start (this=0x7ff8e8020920, defrag=0x7ff8e8028c10) at tier.c:860
860                     sleep(1);
(gdb) p *((brick_list_t *)((char *)(bricklist_hot->next)-(unsigned long)(&((brick_list_t *)0)->list)))
$3 = {xlator = 0x7ff8e8009c70, brick_db_path = 0x7ff8cc00c260 "/rhs/brick6/gold_hot/.glusterfs/gold_hot.db",
  list = {next = 0x7ff8cc000db0, prev = 0x7ff8db43bc70}}
(gdb) p *((brick_list_t *)((char *)(bricklist_hot->next->next)-(unsigned long)(&((brick_list_t *)0)->list)))
$4 = {xlator = 0x7ff8e800be70, brick_db_path = 0x7ff8cc00d2b0 "/rhs/brick7/gold_hot/.glusterfs/gold_hot.db",
  list = {next = 0x7ff8db43bc70, prev = 0x7ff8cc000d40}}
(gdb) p *((brick_list_t *)((char *)(bricklist_hot->next->next->next)-(unsigned long)(&((brick_list_t *)0)->list)))
$5 = {xlator = 0x7ff7a2de5700, brick_db_path = 0x7ff79fddf700 "",
  list = {next = 0x7ff8cc000d40, prev = 0x7ff8cc000db0}}                    =====> Invalid xlator
(gdb) p *((brick_list_t *)((char *)(bricklist_hot->next->next->next->next)-(unsigned long)(&((brick_list_t *)0)->list)))
$6 = {xlator = 0x7ff8e8009c70, brick_db_path = 0x7ff8cc00c260 "/rhs/brick6/gold_hot/.glusterfs/gold_hot.db",
  list = {next = 0x7ff8cc000db0, prev = 0x7ff8db43bc70}}                    ==> First entry
```

The "extra" entry in the list is actually the head of the list, and the head is an empty brick entry. Note that when we traverse the list using list_for_each_entry, we start from head->next and go until we reach head again, so this "extra" entry is skipped. Since that behaviour is built into the API, I believe it is normal. I am fairly confident this is a duplicate of BZ 1263200.
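As an aside on the list mechanics above, here is a minimal, self-contained sketch of kernel-style intrusive lists, with simplified macros and names rather than the exact definitions from GlusterFS's list.h: the hand-written GDB expressions are just the list_entry()/container_of arithmetic done by hand, and list_for_each_entry starts at head->next and stops when it wraps back to head, which is why the empty sentinel head never reaches tier_process_brick_cbk during normal iteration.

```c
/*
 * Illustrative sketch only: simplified kernel-style list macros (not the
 * exact GlusterFS list.h definitions) showing (a) what the hand-written
 * GDB expressions above compute and (b) why list_for_each_entry skips
 * the sentinel head.
 */
#include <stddef.h>
#include <stdio.h>

struct list_head {
    struct list_head *next, *prev;
};

/* The GDB expressions above do exactly this by hand: subtract the offset
 * of the embedded 'list' member from the node pointer to recover the
 * enclosing structure. */
#define list_entry(node, type, member) \
    ((type *)((char *)(node) - offsetof(type, member)))

/* Iteration starts at head->next and stops once it wraps back to head,
 * so the sentinel head itself is never handed to the loop body. */
#define list_for_each_entry(pos, head, type, member)              \
    for ((pos) = list_entry((head)->next, type, member);          \
         &(pos)->member != (head);                                \
         (pos) = list_entry((pos)->member.next, type, member))

struct brick_entry {                   /* stand-in for brick_list_t */
    const char      *brick_db_path;
    struct list_head list;             /* embedded link */
};

int main(void)
{
    struct list_head head = { &head, &head };   /* empty sentinel head */
    struct brick_entry b1 = { "brick6.db", { NULL, NULL } };
    struct brick_entry b2 = { "brick7.db", { NULL, NULL } };

    /* Link: head <-> b1 <-> b2 <-> head */
    head.next = &b1.list;     b1.list.prev = &head;
    b1.list.next = &b2.list;  b2.list.prev = &b1.list;
    b2.list.next = &head;     head.prev = &b2.list;

    struct brick_entry *pos;
    list_for_each_entry(pos, &head, struct brick_entry, list)
        printf("%s\n", pos->brick_db_path);     /* prints b1 and b2 only */

    /* Following head->next->next->next by hand, as in the GDB session,
     * lands on the sentinel head itself; applying list_entry() to it
     * yields a bogus "entry" whose fields are whatever memory happens to
     * precede 'head' -- the "invalid xlator" seen in $5 above. */
    return 0;
}
```

The `(brick_list_t *)((char *)(...) - (unsigned long)(&((brick_list_t *)0)->list))` casts used in the GDB session spell out the same offsetof arithmetic explicitly.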
If we see this again we should reopen it, but that is the likeliest explanation. *** This bug has been marked as a duplicate of bug 1263200 ***