Bug 1289414 - [tiering]: Tier daemon crashed on two of eight nodes and lot of "demotion failed" seen in the system
Status: CLOSED CURRENTRELEASE
Product: GlusterFS
Classification: Community
Component: tiering
3.7.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Assigned To: Nithya Balachandran
bugs@gluster.org
Depends On: 1288995
Blocks:
Reported: 2015-12-07 22:41 EST by Nithya Balachandran
Modified: 2016-04-19 03:50 EDT (History)
5 users

See Also:
Fixed In Version: glusterfs-3.7.7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1288995
Environment:
Last Closed: 2016-04-19 03:50:21 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Nithya Balachandran 2015-12-07 22:41:18 EST
+++ This bug was initially created as a clone of Bug #1288995 +++

+++ This bug was initially created as a clone of Bug #1288003 +++

Description of problem:

Tier daemon seems to have crashed on two of eight nodes, and core files have been generated.

A huge number of 'demotion failed' error messages are seen on a few nodes. No demotion takes place even though multiple files are eligible for demotion and the watermark levels have been crossed.

The hot tier size is 100 GB, with watermarks set so that the low watermark is at 10 GB and the high watermark is at 30 GB. Hot tier usage has currently crossed 30 GB, yet no files are being demoted.

A single distributed-dispersed volume with 12 bricks was configured, to which a distributed-replicated (4 brick) hot tier was attached. The crash was seen a few hours after configuring the volume and running some IO.

Please note that the gluster cluster is configured on RHEL 6.7.

sosreport and core files shall be attached shortly.

[root@dhcp37-121 ~]# gluster vol info
 
Volume Name: tiering-test-vol-01
Type: Tier
Volume ID: 8afb30c0-bd3e-4248-b4ba-8a6bfe8d237e
Status: Started
Number of Bricks: 16
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick1: 10.70.37.111:/rhs/brick4/leg1
Brick2: 10.70.37.154:/rhs/brick4/leg1
Brick3: 10.70.37.121:/rhs/brick4/leg1
Brick4: 10.70.37.191:/rhs/brick4/leg1
Cold Tier:
Cold Tier Type : Distributed-Disperse
Number of Bricks: 2 x (4 + 2) = 12
Brick5: 10.70.37.191:/rhs/brick1/leg1
Brick6: 10.70.37.121:/rhs/brick1/leg1
Brick7: 10.70.37.154:/rhs/brick1/leg1
Brick8: 10.70.37.111:/rhs/brick1/leg1
Brick9: 10.70.37.140:/rhs/brick15/leg1
Brick10: 10.70.37.132:/rhs/brick15/leg1
Brick11: 10.70.37.180:/rhs/brick15/leg1
Brick12: 10.70.37.48:/rhs/brick15/leg1
Brick13: 10.70.37.191:/rhs/brick2/leg1
Brick14: 10.70.37.121:/rhs/brick2/leg1
Brick15: 10.70.37.154:/rhs/brick2/leg1
Brick16: 10.70.37.111:/rhs/brick2/leg1
Options Reconfigured:
cluster.tier-promote-frequency: 7200
cluster.tier-max-files: 1000
cluster.write-freq-threshold: 3
cluster.watermark-hi: 30
cluster.watermark-low: 10
cluster.tier-demote-frequency: 120
cluster.tier-mode: cache
features.ctr-enabled: on
performance.readdir-ahead: on


Version-Release number of selected component (if applicable):


How reproducible:
No pattern found yet.

Steps to Reproduce:
1. Configure an 8-node gluster cluster.
2. Configure a 2 x (4 + 2) distributed-dispersed volume.
3. Attach a distributed-replicated hot tier.
4. Change the tiering-related parameters on the volume; values are set as shown in the vol info output above.
5. Run IO.
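
The reproduction steps above correspond roughly to the following gluster CLI invocations. This is a sketch only: the hostnames and brick paths are placeholders, and `attach-tier` is the 3.7-era command syntax.

```
# Create and start a 2 x (4+2) distributed-dispersed cold tier (12 bricks).
gluster volume create tiering-test-vol-01 disperse 6 redundancy 2 \
    host{1..4}:/rhs/brick1/leg1 host{5..8}:/rhs/brick15/leg1 \
    host{1..4}:/rhs/brick2/leg1
gluster volume start tiering-test-vol-01

# Attach a 2 x 2 distributed-replicated hot tier.
gluster volume attach-tier tiering-test-vol-01 replica 2 \
    host{1..4}:/rhs/brick4/leg1

# Tiering parameters as in the vol info output above.
gluster volume set tiering-test-vol-01 cluster.tier-mode cache
gluster volume set tiering-test-vol-01 cluster.watermark-low 10
gluster volume set tiering-test-vol-01 cluster.watermark-hi 30
gluster volume set tiering-test-vol-01 cluster.tier-demote-frequency 120
gluster volume set tiering-test-vol-01 cluster.tier-promote-frequency 7200
gluster volume set tiering-test-vol-01 cluster.tier-max-files 1000
gluster volume set tiering-test-vol-01 cluster.write-freq-threshold 3
gluster volume set tiering-test-vol-01 features.ctr-enabled on
```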

Actual results:
Demotion failures are seen, along with a crash.

Expected results:
No crashes or promotion/demotion failures should be seen.

Additional info:

<<<log snippet from a node where crash is seen>>>

[2015-12-03 00:01:25.433174] C [rpc-clnt-ping.c:165:rpc_clnt_ping_timer_expired] 0-tiering-test-vol-01-client-9: server 10.70.37.121:49153 has not responded in the last 42 seconds, disconnecting.
[2015-12-03 00:01:25.433755] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1eb)[0x7fdf009026eb] (--> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e7)[0x7fdf006cd227] (--> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fdf006cd33e] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xab)[0x7fdf006cd40b] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1c2)[0x7fdf006cda42] ))))) 0-tiering-test-vol-01-client-9: forced unwinding frame type(GlusterFS 3.3) op(IPC(47)) called at 2015-12-03 00:00:01.130434 (xid=0xcc177)
[2015-12-03 00:01:25.433787] W [MSGID: 114031] [client-rpc-fops.c:2265:client3_3_ipc_cbk] 0-tiering-test-vol-01-client-9: remote operation failed [Transport endpoint is not connected]
[2015-12-03 00:01:25.433966] E [MSGID: 109107] [tier.c:838:tier_process_ctr_query] 0-tiering-test-vol-01-tier-dht: Failed query on /rhs/brick2/leg1/.glusterfs/leg1.db ret -107
pending frames:
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 
2015-12-03 00:01:25
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
[2015-12-03 00:01:25.434002] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1eb)[0x7fdf009026eb] (--> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e7)[0x7fdf006cd227] (--> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fdf006cd33e] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xab)[0x7fdf006cd40b] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1c2)[0x7fdf006cda42] ))))) 0-tiering-test-vol-01-client-9: forced unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2015-12-03 00:00:01.132380 (xid=0xcc178)
[2015-12-03 00:01:25.435552] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 0-tiering-test-vol-01-client-9: remote operation failed. Path: <gfid:44203d75-0c8a-4062-a504-ff8a21ec47ef> (44203d75-0c8a-4062-a504-ff8a21ec47ef) [Transport endpoint is not connected]
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.7.5
[2015-12-03 00:01:25.435741] W [MSGID: 122053] [ec-common.c:116:ec_check_status] 0-tiering-test-vol-01-disperse-1: Operation failed on some subvolumes (up=3F, mask=3F, remaining=0, good=37, bad=8)
[2015-12-03 00:01:25.435762] W [MSGID: 122002] [ec-common.c:71:ec_heal_report] 0-tiering-test-vol-01-disperse-1: Heal failed [Invalid argument]
/usr/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb6)[0x7fdf008fe9e6]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x32f)[0x7fdf0091e54f]
/lib64/libc.so.6(+0x3d4ac326a0)[0x7fdeff29c6a0]
/usr/lib64/libglusterfs.so.0(__gf_free+0xc5)[0x7fdf00934d75]
/usr/lib64/glusterfs/3.7.5/xlator/cluster/tier.so(+0x5b267)[0x7fdef2887267]
/usr/lib64/glusterfs/3.7.5/xlator/cluster/tier.so(+0x5ca70)[0x7fdef2888a70]
/lib64/libpthread.so.0(+0x3d4b007a51)[0x7fdeff9e8a51]
/lib64/libc.so.6(clone+0x6d)[0x7fdeff35293d]
---------


<<<log snippet from second node where crash is seen>>>

[2015-12-02 13:52:50.473989] I [glusterfsd-mgmt.c:1596:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
The message "I [MSGID: 109103] [dht-shared.c:469:dht_reconfigure] 0-DHT: conf->dthrottle: normal, conf->defrag->recon_thread_count: 2" repeated 5 times between [2015-12-02 13:52:30.764481] and [2015-12-02 13:52:50.473688]
pending frames:
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 6
time of crash: 
2015-12-03 03:35:12
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.7.5
/usr/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb6)[0x7f6d288509e6]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x32f)[0x7f6d2887054f]
/lib64/libc.so.6(+0x33ed4326a0)[0x7f6d271ee6a0]
/lib64/libc.so.6(gsignal+0x35)[0x7f6d271ee625]
/lib64/libc.so.6(abort+0x175)[0x7f6d271efe05]
/lib64/libc.so.6(+0x33ed470537)[0x7f6d2722c537]
/lib64/libc.so.6(+0x33ed475f4e)[0x7f6d27231f4e]
/lib64/libc.so.6(+0x33ed476353)[0x7f6d27232353]
/lib64/libc.so.6(+0x33ed479c28)[0x7f6d27235c28]
/lib64/libc.so.6(__libc_malloc+0x5c)[0x7f6d27236b1c]
/usr/lib64/libglusterfs.so.0(glusterfs_lkowner_buf_get+0x22)[0x7f6d2888b7b2]
/usr/lib64/libglusterfs.so.0(lkowner_utoa+0x18)[0x7f6d28870188]
/usr/lib64/libglusterfs.so.0(gf_proc_dump_call_stack+0x1d2)[0x7f6d2888ece2]
/usr/lib64/libglusterfs.so.0(gf_proc_dump_pending_frames+0xde)[0x7f6d2888f04e]
/usr/lib64/libglusterfs.so.0(gf_proc_dump_info+0xccb)[0x7f6d2888dafb]
/usr/sbin/glusterfs(glusterfs_sigwaiter+0xcd)[0x7f6d28d1d10d]
/lib64/libpthread.so.0(+0x33ed807a51)[0x7f6d2793aa51]
/lib64/libc.so.6(clone+0x6d)[0x7f6d272a493d]
---------



--- Additional comment from krishnaram Karthick on 2015-12-04 04:22:03 EST ---

Pasting BT for both the cores


<<<<<<BT from dhcp37-121.core.5159>>>>>>

(gdb) bt
#0  0x00007fdf00934d75 in __gf_free (free_ptr=0x7fdebc001970) at mem-pool.c:313
#1  0x00007fdef2887267 in tier_process_ctr_query (args=0x7fdee9fa3c40, query_cbk_args=<value optimized out>, is_promotion=<value optimized out>) at tier.c:878
#2  tier_process_brick (args=0x7fdee9fa3c40, query_cbk_args=<value optimized out>, is_promotion=<value optimized out>) at tier.c:967
#3  tier_build_migration_qfile (args=0x7fdee9fa3c40, query_cbk_args=<value optimized out>, is_promotion=<value optimized out>) at tier.c:1043
#4  0x00007fdef2888a70 in tier_promote (args=0x7fdee9fa3c40) at tier.c:1143
#5  0x00007fdeff9e8a51 in start_thread () from /lib64/libpthread.so.0
#6  0x00007fdeff35293d in clone () from /lib64/libc.so.6

<<<<<<<BT from dhcp37-111.core.5424>>>>>>

#0  0x00007f6d271ee625 in raise () from /lib64/libc.so.6
#1  0x00007f6d271efe05 in abort () from /lib64/libc.so.6
#2  0x00007f6d2722c537 in __libc_message () from /lib64/libc.so.6
#3  0x00007f6d27231f4e in malloc_printerr () from /lib64/libc.so.6
#4  0x00007f6d27232353 in malloc_consolidate () from /lib64/libc.so.6
#5  0x00007f6d27235c28 in _int_malloc () from /lib64/libc.so.6
#6  0x00007f6d27236b1c in malloc () from /lib64/libc.so.6
#7  0x00007f6d2888b7b2 in __gf_default_malloc () at mem-pool.h:106
#8  glusterfs_lkowner_buf_get () at globals.c:329
#9  0x00007f6d28870188 in lkowner_utoa (lkowner=0x7f6d2625f970) at common-utils.c:2407
#10 0x00007f6d2888ece2 in gf_proc_dump_call_stack (call_stack=0x7f6d2625f718, key_buf=<value optimized out>) at stack.c:167
#11 0x00007f6d2888f04e in gf_proc_dump_pending_frames (call_pool=0x7f6d299727a0) at stack.c:210
#12 0x00007f6d2888dafb in gf_proc_dump_info (signum=<value optimized out>, ctx=0x7f6d29950010) at statedump.c:825
#13 0x00007f6d28d1d10d in glusterfs_sigwaiter (arg=<value optimized out>) at glusterfsd.c:2020
#14 0x00007f6d2793aa51 in start_thread () from /lib64/libpthread.so.0
#15 0x00007f6d272a493d in clone () from /lib64/libc.so.6

--- Additional comment from Vijay Bellur on 2015-12-07 03:08:08 EST ---

REVIEW: http://review.gluster.org/12890 (cluster/tier : Fix double free in tier process) posted (#1) for review on master by N Balachandran (nbalacha@redhat.com)

--- Additional comment from Nithya Balachandran on 2015-12-07 03:14:42 EST ---

Analysis of dhcp37-121.core.5159:


From the logs:
[2015-12-03 00:01:25.433174] C [rpc-clnt-ping.c:165:rpc_clnt_ping_timer_expired] 0-tiering-test-vol-01-client-9: server 10.70.37.121:49153 has not responded in the last 42 seconds, disconnecting.
[2015-12-03 00:01:25.433755] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1eb)[0x7fdf009026eb] (--> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e7)[0x7fdf006cd227] (--> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fdf006cd33e] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xab)[0x7fdf006cd40b] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1c2)[0x7fdf006cda42] ))))) 0-tiering-test-vol-01-client-9: forced unwinding frame type(GlusterFS 3.3) op(IPC(47)) called at 2015-12-03 00:00:01.130434 (xid=0xcc177)
[2015-12-03 00:01:25.433787] W [MSGID: 114031] [client-rpc-fops.c:2265:client3_3_ipc_cbk] 0-tiering-test-vol-01-client-9: remote operation failed [Transport endpoint is not connected]
[2015-12-03 00:01:25.433966] E [MSGID: 109107] [tier.c:838:tier_process_ctr_query] 0-tiering-test-vol-01-tier-dht: Failed query on /rhs/brick2/leg1/.glusterfs/leg1.db ret -107
pending frames:



This means that the syncop_ipc call in tier_process_ctr_query failed.



        ret = dict_set_bin (ctr_ipc_in_dict, GFDB_IPC_CTR_GET_QUERY_PARAMS,
                                ipc_ctr_params, sizeof (*ipc_ctr_params));
        if (ret) {
                gf_msg (this->name, GF_LOG_ERROR, 0, LG_MSG_SET_PARAM_FAILED,
                        "Failed setting %s to params dictionary",
                        GFDB_IPC_CTR_GET_QUERY_PARAMS);
                goto out;
        }

        ret = syncop_ipc (local_brick->xlator, GF_IPC_TARGET_CTR,
                                ctr_ipc_in_dict, &ctr_ipc_out_dict);
        if (ret) {
                gf_msg (this->name, GF_LOG_ERROR, 0,
                        DHT_MSG_LOG_IPC_TIER_ERROR, "Failed query on %s ret %d",
                        local_brick->brick_db_path, ret);
                goto out;
        }

Since the call to syncop_ipc() failed, ctr_ipc_out_dict is NULL. On goto out:


out:

        if (ctr_ipc_in_dict) {
                dict_unref(ctr_ipc_in_dict); <-- this will free ipc_ctr_params
                ctr_ipc_in_dict = NULL;
        }

        if (ctr_ipc_out_dict) {
                dict_unref(ctr_ipc_out_dict);
                ctr_ipc_out_dict = NULL;
                ipc_ctr_params = NULL; <-- skipped when ctr_ipc_out_dict is NULL, so ipc_ctr_params is not reset
        }

        GF_FREE (ipc_ctr_params);  <-- double free on the failure path

        return ret;
}


The dict_unref(ctr_ipc_in_dict) call will invoke GF_FREE on ipc_ctr_params as part of dict_destroy()->data_unref(), since data->is_static is false.

As that memory has already been freed, the second GF_FREE (ipc_ctr_params) call crashes.

--- Additional comment from Vijay Bellur on 2015-12-07 09:36:04 EST ---

COMMIT: http://review.gluster.org/12890 committed in master by Dan Lambright (dlambrig@redhat.com) 
------
commit 06818a0fd69bb0d6daabde73e5c3cc2661a70854
Author: N Balachandran <nbalacha@redhat.com>
Date:   Mon Dec 7 13:32:57 2015 +0530

    cluster/tier : Fix double free in tier process
    
    The tier process tries to free ipc_ctr_params twice
    if the syncop_ipc call in tier_process_ctr_query fails.
    ipc_ctr_params is freed when ctr_ipc_in_dict is freed.
    But ctr_ipc_out_dict is NULL when syncop_ipc fails, causing
    GF_FREE to be called on a non-NULL ipc_ctr_params ptr again.
    
    Change-Id: Ia15f36dfbcd97be5524588beb7caad5cb79efdb4
    BUG: 1288995
    Signed-off-by: N Balachandran <nbalacha@redhat.com>
    Reviewed-on: http://review.gluster.org/12890
    Reviewed-by: Joseph Fernandes
    Tested-by: NetBSD Build System <jenkins@build.gluster.org>
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Comment 1 Vijay Bellur 2015-12-07 22:58:13 EST
REVIEW: http://review.gluster.org/12904 (cluster/tier : Fix double free in tier process) posted (#1) for review on release-3.7 by N Balachandran (nbalacha@redhat.com)
Comment 2 Vijay Bellur 2015-12-08 13:28:05 EST
COMMIT: http://review.gluster.org/12904 committed in release-3.7 by Dan Lambright (dlambrig@redhat.com) 
------
commit 88cfaa33b3c9949936490fc0da5add177b4979ad
Author: N Balachandran <nbalacha@redhat.com>
Date:   Mon Dec 7 13:32:57 2015 +0530

    cluster/tier : Fix double free in tier process
    
    The tier process tries to free ipc_ctr_params twice
    if the syncop_ipc call in tier_process_ctr_query fails.
    ipc_ctr_params is freed when ctr_ipc_in_dict is freed.
    But ctr_ipc_out_dict is NULL when syncop_ipc fails, causing
    GF_FREE to be called on a non-NULL ipc_ctr_params ptr again.
    
    > Change-Id: Ia15f36dfbcd97be5524588beb7caad5cb79efdb4
    > Signed-off-by: N Balachandran <nbalacha@redhat.com>
    > Reviewed-on: http://review.gluster.org/12890
    > Reviewed-by: Joseph Fernandes
    > Tested-by: NetBSD Build System <jenkins@build.gluster.org>
    > Tested-by: Gluster Build System <jenkins@build.gluster.com>
    > Reviewed-by: Dan Lambright <dlambrig@redhat.com>
    > (cherry picked from commit 06818a0fd69bb0d6daabde73e5c3cc2661a70854)
    
    Change-Id: Ida2da0416272ff1b04cf51f76467e27b62121f41
    BUG: 1289414
    Signed-off-by: N Balachandran <nbalacha@redhat.com>
    Reviewed-on: http://review.gluster.org/12904
    Reviewed-by: Joseph Fernandes
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
    Tested-by: NetBSD Build System <jenkins@build.gluster.org>
    Reviewed-by: Dan Lambright <dlambrig@redhat.com>
    Tested-by: Dan Lambright <dlambrig@redhat.com>
Comment 3 Kaushal 2016-04-19 03:50:21 EDT
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.7, please open a new bug report.

glusterfs-3.7.7 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://www.gluster.org/pipermail/gluster-users/2016-February/025292.html
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user
