Bug 1399034 - Timer errors getting in the glusterd log when updated the rhgs3.1.3 to 3.2.
Summary: Timer errors getting in the glusterd log when updated the rhgs3.1.3 to 3.2.
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: glusterd
Version: rhgs-3.2
Hardware: x86_64
OS: Linux
unspecified
high
Target Milestone: ---
: ---
Assignee: Samikshan Bairagya
QA Contact: Bala Konda Reddy M
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2016-11-28 06:36 UTC by Byreddy
Modified: 2019-08-28 08:49 UTC (History)
7 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-12-21 05:56:13 UTC
Target Upstream Version:


Attachments (Terms of Use)

Description Byreddy 2016-11-28 06:36:06 UTC
Description of problem:
======================
Getting the below  timer error messages when updated the RHGS3.1.3 nodes to 3.2 using in-service steps.


I am seeing these error messages when i do update from 3.1.3 to 3.2 only.

5] -->glusterd(cleanup_and_exit+0x6b) [0x7fc738835abb] ) 0-: received signum (15), shutting down
[2016-11-28 06:11:39.133153] I [MSGID: 101190] [event-epoll.c:628:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2016-11-28 06:11:39.133693] I [timer.c:212:gf_timer_registry_init] (-->/lib64/libgfrpc.so.0(rpc_clnt_submit+0x3c0) [0x7fc73810ae50] -->/lib64/libgfrpc.so.0(__save_frame+0x5
d) [0x7fc73810968d] -->/lib64/libglusterfs.so.0(gf_timer_call_after+0x215) [0x7fc73834df95] ) 0-timer: ctx cleanup started
[2016-11-28 06:11:39.133788] E [timer.c:44:gf_timer_call_after] (-->/lib64/libgfrpc.so.0(rpc_clnt_submit+0x3c0) [0x7fc73810ae50] -->/lib64/libgfrpc.so.0(__save_frame+0x5d) [
0x7fc73810968d] -->/lib64/libglusterfs.so.0(gf_timer_call_after+0x25b) [0x7fc73834dfdb] ) 0-timer: !reg
[2016-11-28 06:11:39.134861] I [timer.c:212:gf_timer_registry_init] (-->/lib64/libgfrpc.so.0(rpc_clnt_submit+0x3c0) [0x7fc73810ae50] -->/lib64/libgfrpc.so.0(__save_frame+0x5
d) [0x7fc73810968d] -->/lib64/libglusterfs.so.0(gf_timer_call_after+0x215) [0x7fc73834df95] ) 0-timer: ctx cleanup started
[2016-11-28 06:11:39.134958] E [timer.c:44:gf_timer_call_after] (-->/lib64/libgfrpc.so.0(rpc_clnt_submit+0x3c0) [0x7fc73810ae50] -->/lib64/libgfrpc.so.0(__save_frame+0x5d) [0x7fc73810968d] -->/lib64/libglusterfs.so.0(gf_timer_call_after+0x25b) [0x7fc73834dfdb] ) 0-timer: !reg
[2016-11-28 06:11:39.135869] I [timer.c:212:gf_timer_registry_init] (-->/lib64/libgfrpc.so.0(rpc_clnt_submit+0x3c0) [0x7fc73810ae50] -->/lib64/libgfrpc.so.0(__save_frame+0x5d) [0x7fc73810968d] -->/lib64/libglusterfs.so.0(gf_timer_call_after+0x215) [0x7fc73834df95] ) 0-timer: ctx cleanup started
[2016-11-28 06:11:39.135963] E [timer.c:44:gf_timer_call_after] (-->/lib64/libgfrpc.so.0(rpc_clnt_submit+0x3c0) [0x7fc73810ae50] -->/lib64/libgfrpc.so.0(__save_frame+0x5d) [0x7fc73810968d] -->/lib64/libglusterfs.so.0(gf_timer_call_after+0x25b) [0x7fc73834dfdb] ) 0-timer: !reg



Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.8.4-5

How reproducible:
=================
Always


Steps to Reproduce:
===================
1. Have two rhgs3.1.3 nodes and one client
2. Fuse mount the volume and start the IO
3. Start the in-service update using the steps documented in the guide
4. Check the glusterd logs after updating each node.

You will see above timer error messages

Actual results:
===============
Timer error messages getting when updated the rhgs version from 3.1.3 to 3.2


Expected results:
=================
No timer error messages should come.


Additional info:
================
I am hitting this issue when ever i do update testing.

Comment 2 Byreddy 2016-11-28 06:39:26 UTC
Full glusterd log:
=================

[2016-11-28 06:11:38.352944] I [MSGID: 100030] [glusterfsd.c:2412:main] 0-glusterd: Started running glusterd version 3.8.4 (args: glusterd --xlator-option *.upgrade=on -N)
[2016-11-28 06:11:38.361307] I [MSGID: 101173] [graph.c:269:gf_add_cmdline_options] 0-management: adding option 'upgrade' for volume 'management' with value 'on'
[2016-11-28 06:11:38.361488] I [MSGID: 106478] [glusterd.c:1380:init] 0-management: Maximum allowed open file descriptors set to 65536
[2016-11-28 06:11:38.361527] I [MSGID: 106479] [glusterd.c:1429:init] 0-management: Using /var/lib/glusterd as working directory
[2016-11-28 06:11:38.374999] E [rpc-transport.c:283:rpc_transport_load] 0-rpc-transport: /usr/lib64/glusterfs/3.8.4/rpc-transport/rdma.so: cannot open shared object file: No
 such file or directory
[2016-11-28 06:11:38.375040] W [rpc-transport.c:287:rpc_transport_load] 0-rpc-transport: volume 'rdma.management': transport-type 'rdma' is not valid or not found on this ma
chine
[2016-11-28 06:11:38.375060] W [rpcsvc.c:1617:rpcsvc_create_listener] 0-rpc-service: cannot create listener, initing the transport failed
[2016-11-28 06:11:38.375081] E [MSGID: 106243] [glusterd.c:1653:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
[2016-11-28 06:11:38.377503] I [MSGID: 106513] [glusterd-store.c:2101:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 30712
[2016-11-28 06:11:38.784792] I [MSGID: 106498] [glusterd-handler.c:3660:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
[2016-11-28 06:11:38.784929] I [rpc-clnt.c:1033:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2016-11-28 06:11:39.083683] I [MSGID: 106544] [glusterd.c:156:glusterd_uuid_init] 0-management: retrieved UUID: d2e81697-b43e-4988-9603-967788a3b932
Final graph:
+------------------------------------------------------------------------------+
  1: volume management
  2:     type mgmt/glusterd
  3:     option rpc-auth.auth-glusterfs on
  4:     option rpc-auth.auth-unix on
  5:     option rpc-auth.auth-null on
  6:     option rpc-auth-allow-insecure on
  7:     option transport.socket.listen-backlog 128
  8:     option upgrade on
  9:     option event-threads 1
 10:     option ping-timeout 0
 11:     option transport.socket.read-fail-log off
 12:     option transport.socket.keepalive-interval 2
 13:     option transport.socket.keepalive-time 10
 14:     option transport-type rdma
 15:     option working-directory /var/lib/glusterd
 16: end-volume
 17:  
+------------------------------------------------------------------------------+

[2016-11-28 06:11:39.132473] W [glusterfsd.c:1288:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dc5) [0x7fc7371a1dc5] -->glusterd(glusterfs_sigwaiter+0xe5) [0x7fc738835c4
5] -->glusterd(cleanup_and_exit+0x6b) [0x7fc738835abb] ) 0-: received signum (15), shutting down
[2016-11-28 06:11:39.133153] I [MSGID: 101190] [event-epoll.c:628:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2016-11-28 06:11:39.133693] I [timer.c:212:gf_timer_registry_init] (-->/lib64/libgfrpc.so.0(rpc_clnt_submit+0x3c0) [0x7fc73810ae50] -->/lib64/libgfrpc.so.0(__save_frame+0x5
d) [0x7fc73810968d] -->/lib64/libglusterfs.so.0(gf_timer_call_after+0x215) [0x7fc73834df95] ) 0-timer: ctx cleanup started
[2016-11-28 06:11:39.133788] E [timer.c:44:gf_timer_call_after] (-->/lib64/libgfrpc.so.0(rpc_clnt_submit+0x3c0) [0x7fc73810ae50] -->/lib64/libgfrpc.so.0(__save_frame+0x5d) [
0x7fc73810968d] -->/lib64/libglusterfs.so.0(gf_timer_call_after+0x25b) [0x7fc73834dfdb] ) 0-timer: !reg
[2016-11-28 06:11:39.134861] I [timer.c:212:gf_timer_registry_init] (-->/lib64/libgfrpc.so.0(rpc_clnt_submit+0x3c0) [0x7fc73810ae50] -->/lib64/libgfrpc.so.0(__save_frame+0x5
d) [0x7fc73810968d] -->/lib64/libglusterfs.so.0(gf_timer_call_after+0x215) [0x7fc73834df95] ) 0-timer: ctx cleanup started
[2016-11-28 06:11:39.134958] E [timer.c:44:gf_timer_call_after] (-->/lib64/libgfrpc.so.0(rpc_clnt_submit+0x3c0) [0x7fc73810ae50] -->/lib64/libgfrpc.so.0(__save_frame+0x5d) [0x7fc73810968d] -->/lib64/libglusterfs.so.0(gf_timer_call_after+0x25b) [0x7fc73834dfdb] ) 0-timer: !reg
[2016-11-28 06:11:39.135869] I [timer.c:212:gf_timer_registry_init] (-->/lib64/libgfrpc.so.0(rpc_clnt_submit+0x3c0) [0x7fc73810ae50] -->/lib64/libgfrpc.so.0(__save_frame+0x5d) [0x7fc73810968d] -->/lib64/libglusterfs.so.0(gf_timer_call_after+0x215) [0x7fc73834df95] ) 0-timer: ctx cleanup started
[2016-11-28 06:11:39.135963] E [timer.c:44:gf_timer_call_after] (-->/lib64/libgfrpc.so.0(rpc_clnt_submit+0x3c0) [0x7fc73810ae50] -->/lib64/libgfrpc.so.0(__save_frame+0x5d) [0x7fc73810968d] -->/lib64/libglusterfs.so.0(gf_timer_call_after+0x25b) [0x7fc73834dfdb] ) 0-timer: !reg
[2016-11-28 06:11:39.136488] I [MSGID: 106163] [glusterd-handshake.c:1274:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30712
[2016-11-28 06:07:18.816149] I [MSGID: 100030] [glusterfsd.c:2412:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.8.4 (args: /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO)

Comment 3 Atin Mukherjee 2016-11-28 07:05:06 UTC
[2016-11-28 06:11:39.133788] E [timer.c:44:gf_timer_call_after] (-->/lib64/libgfrpc.so.0(rpc_clnt_submit+0x3c0) [0x7fc73810ae50] -->/lib64/libgfrpc.so.0(__save_frame+0x5d) [
0x7fc73810968d] -->/lib64/libglusterfs.so.0(gf_timer_call_after+0x25b) [0x7fc73834dfdb] ) 0-timer: !reg

This error log is due to the interim restarts of GlusterD during the upgrade path. When the gf_timer_call_after () was in execution,  ctx->cleanup_started was set to true from the clean up thread which resulted in these error logs. This looks to be a race to me however Byreddy claims that he is only seeing it during upgrade from 3.1.3 to 3.2 which we'd need to find out the reason for.

Given there is no functionality impact here  and after discussing it with Byreddy I am taking this bug out of 3.2.0.


Note You need to log in before you can comment on or make changes to this bug.