Bug 1288995
| Summary: | [tiering]: Tier daemon crashed on two of eight nodes and lot of "demotion failed" seen in the system | |||
|---|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | Nithya Balachandran <nbalacha> | |
| Component: | tiering | Assignee: | Nithya Balachandran <nbalacha> | |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | bugs <bugs> | |
| Severity: | high | Docs Contact: | ||
| Priority: | unspecified | |||
| Version: | mainline | CC: | bugs, dlambrig, kramdoss, nchilaka, rhs-bugs | |
| Target Milestone: | --- | |||
| Target Release: | --- | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | glusterfs-3.8rc2 | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | ||
| Clone Of: | 1288003 | |||
| : | 1289414 | Environment: | ||
| Last Closed: | 2016-06-16 13:49:19 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1289414 | |||
Description
Nithya Balachandran
2015-12-07 08:00:38 UTC
REVIEW: http://review.gluster.org/12890 (cluster/tier : Fix double free in tier process) posted (#1) for review on master by N Balachandran (nbalacha)

Analysis of dhcp37-121.core.5159:
From the logs:
```
[2015-12-03 00:01:25.433174] C [rpc-clnt-ping.c:165:rpc_clnt_ping_timer_expired] 0-tiering-test-vol-01-client-9: server 10.70.37.121:49153 has not responded in the last 42 seconds, disconnecting.
[2015-12-03 00:01:25.433755] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1eb)[0x7fdf009026eb] (--> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e7)[0x7fdf006cd227] (--> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fdf006cd33e] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xab)[0x7fdf006cd40b] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1c2)[0x7fdf006cda42] ))))) 0-tiering-test-vol-01-client-9: forced unwinding frame type(GlusterFS 3.3) op(IPC(47)) called at 2015-12-03 00:00:01.130434 (xid=0xcc177)
[2015-12-03 00:01:25.433787] W [MSGID: 114031] [client-rpc-fops.c:2265:client3_3_ipc_cbk] 0-tiering-test-vol-01-client-9: remote operation failed [Transport endpoint is not connected]
[2015-12-03 00:01:25.433966] E [MSGID: 109107] [tier.c:838:tier_process_ctr_query] 0-tiering-test-vol-01-tier-dht: Failed query on /rhs/brick2/leg1/.glusterfs/leg1.db ret -107
pending frames:
```
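This means that the syncop_ipc call in tier_process_ctr_query failed: ret -107 is -ENOTCONN on Linux, the same "Transport endpoint is not connected" error that client3_3_ipc_cbk reported after client-9 was disconnected.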
```c
        ret = dict_set_bin (ctr_ipc_in_dict, GFDB_IPC_CTR_GET_QUERY_PARAMS,
                            ipc_ctr_params, sizeof (*ipc_ctr_params));
        if (ret) {
                gf_msg (this->name, GF_LOG_ERROR, 0, LG_MSG_SET_PARAM_FAILED,
                        "Failed setting %s to params dictionary",
                        GFDB_IPC_CTR_GET_QUERY_PARAMS);
                goto out;
        }

        ret = syncop_ipc (local_brick->xlator, GF_IPC_TARGET_CTR,
                          ctr_ipc_in_dict, &ctr_ipc_out_dict);
        if (ret) {
                gf_msg (this->name, GF_LOG_ERROR, 0,
                        DHT_MSG_LOG_IPC_TIER_ERROR, "Failed query on %s ret %d",
                        local_brick->brick_db_path, ret);
                goto out;
        }
```
Because syncop_ipc() failed, ctr_ipc_out_dict is still NULL when the code jumps to out:
```c
out:
        if (ctr_ipc_in_dict) {
                dict_unref(ctr_ipc_in_dict);    /* frees ipc_ctr_params */
                ctr_ipc_in_dict = NULL;
        }

        if (ctr_ipc_out_dict) {                 /* NULL here: syncop_ipc failed */
                dict_unref(ctr_ipc_out_dict);
                ctr_ipc_out_dict = NULL;
                ipc_ctr_params = NULL;          /* never reached, so ipc_ctr_params stays non-NULL */
        }

        GF_FREE (ipc_ctr_params);               /* double free */
        return ret;
}
```
dict_unref(ctr_ipc_in_dict) already calls GF_FREE on ipc_ctr_params as part of dict_destroy() -> data_unref(), because data->is_static is false for the value stored with dict_set_bin().
Since that memory has already been freed, the later GF_FREE (ipc_ctr_params) is a double free and crashes the tier daemon.
COMMIT: http://review.gluster.org/12890 committed in master by Dan Lambright (dlambrig)
------
commit 06818a0fd69bb0d6daabde73e5c3cc2661a70854
Author: N Balachandran <nbalacha>
Date: Mon Dec 7 13:32:57 2015 +0530

cluster/tier : Fix double free in tier process

The tier process tries to free ipc_ctr_params twice if the syncop_ipc call in tier_process_ctr_query fails. ipc_ctr_params is freed when ctr_ipc_in_dict is freed. But ctr_ipc_out_dict is NULL when syncop_ipc fails, causing GF_FREE to be called on a non-NULL ipc_ctr_params ptr again.

Change-Id: Ia15f36dfbcd97be5524588beb7caad5cb79efdb4
BUG: 1288995
Signed-off-by: N Balachandran <nbalacha>
Reviewed-on: http://review.gluster.org/12890
Reviewed-by: Joseph Fernandes
Tested-by: NetBSD Build System <jenkins.org>
Tested-by: Gluster Build System <jenkins.com>
Reviewed-by: Dan Lambright <dlambrig>

This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user