Bug 1084432 - Service fails to restart after 3.4.3 update
Summary: Service fails to restart after 3.4.3 update
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: 3.4.3
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: low
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2014-04-04 10:36 UTC by Adam Huffman
Modified: 2015-08-11 13:09 UTC
CC: 5 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2015-08-11 13:09:04 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Adam Huffman 2014-04-04 10:36:45 UTC
Description of problem:

After the 3.4.3 update this week, the Gluster service failed to restart. I had to start it manually.

Version-Release number of selected component (if applicable):
glusterfs-3.4.3-2.el6.x86_64

How reproducible:
This happened with the upgrade from 3.4.1 to 3.4.2 as well.

Steps to Reproduce:
1. Run glusterd (3.4.x) on a cluster of peers.
2. Apply the glusterfs 3.4.3 package update.
3. Check whether the glusterd service is still running after the update completes.

Actual results:
Service stopped after the upgrade.

Expected results:
Service should restart cleanly after the upgrade.

Additional info:

Logs from one of the two peers:

[2014-04-04 03:01:34.200464] I [glusterfsd.c:1910:main] 0-glusterd: Started running glusterd version 3.4.3 (glusterd --xlator-option *.upgrade=on -N)
[2014-04-04 03:01:34.128382] W [glusterfsd.c:1002:cleanup_and_exit] (-->/lib64/libc.so.6(clone+0x6d) [0x3465ce8b6d] (-->/lib64/libpthread.so.0() [0x34668079d1] (-->/usr/sbin/glusterd(glusterfs_sigwaiter+0xcd) [0x40533d]))) 0-: received signum (15), shutting down
[2014-04-04 03:01:34.276339] I [graph.c:239:gf_add_cmdline_options] 0-management: adding option 'upgrade' for volume 'management' with value 'on'
[2014-04-04 03:01:34.276405] I [glusterd.c:961:init] 0-management: Using /var/lib/glusterd as working directory
[2014-04-04 03:01:34.279097] I [socket.c:3480:socket_init] 0-socket.management: SSL support is NOT enabled
[2014-04-04 03:01:34.279119] I [socket.c:3495:socket_init] 0-socket.management: using system polling thread
[2014-04-04 03:01:34.284855] E [rpc-transport.c:253:rpc_transport_load] 0-rpc-transport: /usr/lib64/glusterfs/3.4.3/rpc-transport/rdma.so: cannot open shared object file: No such file or directory
[2014-04-04 03:01:34.284881] W [rpc-transport.c:257:rpc_transport_load] 0-rpc-transport: volume 'rdma.management': transport-type 'rdma' is not valid or not found on this machine
[2014-04-04 03:01:34.284897] W [rpcsvc.c:1389:rpcsvc_transport_create] 0-rpc-service: cannot create listener, initing the transport failed
[2014-04-04 03:01:34.305013] I [glusterd.c:354:glusterd_check_gsync_present] 0-glusterd: geo-replication module not installed in the system
[2014-04-04 03:01:34.306322] I [glusterd-store.c:1339:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 2
[2014-04-04 03:01:34.306866] E [glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-: Unknown key: brick-0
[2014-04-04 03:01:34.306887] E [glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-: Unknown key: brick-1
[2014-04-04 03:01:34.306897] E [glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-: Unknown key: brick-2
[2014-04-04 03:01:34.306907] E [glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-: Unknown key: brick-3
[2014-04-04 03:01:34.306916] E [glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-: Unknown key: brick-4
[2014-04-04 03:01:34.306925] E [glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-: Unknown key: brick-5
[2014-04-04 03:01:34.800739] E [glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-: Unknown key: brick-0
[2014-04-04 03:01:34.800778] E [glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-: Unknown key: brick-1
[2014-04-04 03:01:34.812699] I [glusterd-handler.c:2818:glusterd_friend_add] 0-management: connect returned 0
[2014-04-04 03:01:34.812787] I [rpc-clnt.c:962:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2014-04-04 03:01:34.812857] I [socket.c:3480:socket_init] 0-management: SSL support is NOT enabled
[2014-04-04 03:01:34.812871] I [socket.c:3495:socket_init] 0-management: using system polling thread
[2014-04-04 03:01:34.855467] I [glusterd.c:125:glusterd_uuid_init] 0-management: retrieved UUID: 74a5cdfc-9ae1-4e56-b26c-58a55ec79e0c
Given volfile:
+------------------------------------------------------------------------------+
  1: volume management
  2:     type mgmt/glusterd
  3:     option working-directory /var/lib/glusterd
  4:     option transport-type socket,rdma
  5:     option transport.socket.keepalive-time 10
  6:     option transport.socket.keepalive-interval 2
  7:     option transport.socket.read-fail-log off
  8: #   option base-port 49152
  9: end-volume

+------------------------------------------------------------------------------+
[2014-04-04 03:01:34.899760] W [glusterfsd.c:1002:cleanup_and_exit] (-->/lib64/libc.so.6(clone+0x6d) [0x3465ce8b6d] (-->/lib64/libpthread.so.0() [0x34668079d1] (-->glusterd(glusterfs_sigwaiter+0xcd) [0x40533d]))) 0-: received signum (15), shutting down

Logs from the other peer:

[2014-04-04 03:01:34.227678] W [socket.c:514:__socket_rwv] 0-management: readv failed (No data available)
[2014-04-04 03:01:34.245378] W [socket.c:1962:__socket_proto_state_machine] 0-management: reading from socket failed. Error (No data available), peer (<peer IP>:24007)
[2014-04-04 03:01:36.229577] E [socket.c:2157:socket_connect_finish] 0-management: connection to <peer IP>:24007 failed (Connection refused)
[2014-04-04 03:01:36.229645] W [socket.c:514:__socket_rwv] 0-management: readv failed (No data available)
[2014-04-04 03:01:39.237886] W [socket.c:514:__socket_rwv] 0-management: readv failed (No data available)
[2014-04-04 03:01:42.245867] W [socket.c:514:__socket_rwv] 0-management: readv failed (No data available)
[2014-04-04 03:01:45.253866] W [socket.c:514:__socket_rwv] 0-management: readv failed (No data available)
[2014-04-04 03:01:48.260865] W [socket.c:514:__socket_rwv] 0-management: readv failed (No data available)
[2014-04-04 03:01:51.268866] W [socket.c:514:__socket_rwv] 0-management: readv failed (No data available)
[2014-04-04 03:01:54.275863] W [socket.c:514:__socket_rwv] 0-management: readv failed (No data available)
[2014-04-04 03:01:57.293863] W [socket.c:514:__socket_rwv] 0-management: readv failed (No data available)
[2014-04-04 03:02:00.301865] W [socket.c:514:__socket_rwv] 0-management: readv failed (No data available)
[2014-04-04 03:02:03.309865] W [socket.c:514:__socket_rwv] 0-management: readv failed (No data available)
[2014-04-04 03:02:06.315745] W [socket.c:514:__socket_rwv] 0-management: readv failed (No data available)
[2014-04-04 03:02:09.323865] W [socket.c:514:__socket_rwv] 0-management: readv failed (No data available)
[2014-04-04 03:02:12.333009] W [socket.c:514:__socket_rwv] 0-management: readv failed (No data available)
[2014-04-04 03:02:15.341909] W [socket.c:514:__socket_rwv] 0-management: readv failed (No data available)
[2014-04-04 03:02:18.347391] W [socket.c:514:__socket_rwv] 0-management: readv failed (No data available)
[2014-04-04 03:02:21.354829] W [socket.c:514:__socket_rwv] 0-management: readv failed (No data available)
[2014-04-04 03:02:24.362682] W [socket.c:514:__socket_rwv] 0-management: readv failed (No data available)
[2014-04-04 03:02:27.370725] W [socket.c:514:__socket_rwv] 0-management: readv failed (No data available)
[2014-04-04 03:02:30.376402] W [socket.c:514:__socket_rwv] 0-management: readv failed (No data available)
[2014-04-04 03:02:33.383732] W [socket.c:514:__socket_rwv] 0-management: readv failed (No data available)
[2014-04-04 03:02:36.391906] W [socket.c:514:__socket_rwv] 0-management: readv failed (No data available)
[2014-04-04 03:02:39.398903] W [socket.c:514:__socket_rwv] 0-management: readv failed (No data available)
[2014-04-04 03:02:42.406912] W [socket.c:514:__socket_rwv] 0-management: readv failed (No data available)
[2014-04-04 03:02:45.414910] W [socket.c:514:__socket_rwv] 0-management: readv failed (No data available)
[2014-04-04 03:02:48.422912] W [socket.c:514:__socket_rwv] 0-management: readv failed (No data available)
[2014-04-04 03:02:51.430908] W [socket.c:514:__socket_rwv] 0-management: readv failed (No data available)
[2014-04-04 03:02:54.437852] W [socket.c:514:__socket_rwv] 0-management: readv failed (No data available)
[2014-04-04 03:02:57.445859] W [socket.c:514:__socket_rwv] 0-management: readv failed (No data available)
[2014-04-04 03:03:00.469856] W [socket.c:514:__socket_rwv] 0-management: readv failed (No data available)
[2014-04-04 03:03:03.482854] W [socket.c:514:__socket_rwv] 0-management: readv failed (No data available)
...
[2014-04-04 03:24:16.200230] W [socket.c:514:__socket_rwv] 0-management: readv failed (No data available)
[2014-04-04 03:24:19.207768] W [socket.c:514:__socket_rwv] 0-management: readv failed (No data available)
[2014-04-04 03:24:21.842249] I [glusterfsd.c:1910:main] 0-glusterd: Started running glusterd version 3.4.3 (glusterd --xlator-option *.upgrade=on -N)
[2014-04-04 03:24:21.880991] I [graph.c:239:gf_add_cmdline_options] 0-management: adding option 'upgrade' for volume 'management' with value 'on'
[2014-04-04 03:24:21.881067] I [glusterd.c:961:init] 0-management: Using /var/lib/glusterd as working directory
[2014-04-04 03:24:21.883998] I [socket.c:3480:socket_init] 0-socket.management: SSL support is NOT enabled
[2014-04-04 03:24:21.884021] I [socket.c:3495:socket_init] 0-socket.management: using system polling thread
[2014-04-04 03:24:21.889760] E [socket.c:695:__socket_server_bind] 0-socket.management: binding to  failed: Address already in use
[2014-04-04 03:24:21.889787] E [socket.c:698:__socket_server_bind] 0-socket.management: Port is already in use
[2014-04-04 03:24:21.889809] W [rpcsvc.c:1396:rpcsvc_transport_create] 0-rpc-service: listening on transport failed
[2014-04-04 03:24:21.889837] E [glusterd.c:1055:init] 0-management: creation of listener failed
[2014-04-04 03:24:21.889850] E [xlator.c:390:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
[2014-04-04 03:24:21.889862] E [graph.c:292:glusterfs_graph_init] 0-management: initializing translator failed
[2014-04-04 03:24:21.889875] E [graph.c:479:glusterfs_graph_activate] 0-graph: init failed
[2014-04-04 03:24:21.890072] W [glusterfsd.c:1002:cleanup_and_exit] (-->glusterd(main+0x5d2) [0x406802] (-->glusterd(glusterfs_volumes_init+0xb7) [0x4051b7] (-->glusterd(glusterfs_process_volfp+0x103) [0x4050c3]))) 0-: received signum (0), shutting down
[2014-04-04 03:24:21.902895] W [glusterfsd.c:1002:cleanup_and_exit] (-->/lib64/libc.so.6(clone+0x6d) [0x3484ce8b6d] (-->/lib64/libpthread.so.0() [0x34854079d1] (-->/usr/sbin/glusterd(glusterfs_sigwaiter+0xcd) [0x40533d]))) 0-: received signum (15), shutting down

Comment 1 Joe Julian 2014-04-04 21:14:02 UTC
The "glusterd --xlator-option *.upgrade=on -N" run in %postinstall for 3.4.3 exits when it completes (I think that's new?), so the upgrade leaves glusterd not running.

Perhaps we should test the exit status (if it's at all valuable) and run glusterd upon a successful upgrade.
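A minimal sketch of that suggestion, with hypothetical helper and parameter names (this is not the actual glusterfs.spec scriptlet; "service glusterd start" assumes EL6 SysV init):

```shell
# Hypothetical sketch of the suggestion above: run the one-shot upgrade
# pass, then restart the daemon only if the upgrade pass exited cleanly.
restart_after_upgrade() {
    upgrade_cmd=$1   # e.g. "glusterd --xlator-option *.upgrade=on -N"
    restart_cmd=$2   # e.g. "service glusterd start" on EL6
    if $upgrade_cmd; then
        # Upgrade pass finished successfully; bring glusterd back up.
        $restart_cmd
        return $?
    fi
    echo "glusterd upgrade pass failed; not restarting" >&2
    return 1
}
```

On systemd-based releases the restart command would differ (e.g. "systemctl start glusterd"); the point is simply gating the restart on the exit status of the upgrade run instead of leaving the daemon down.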

Comment 2 Adam Huffman 2014-04-15 10:38:41 UTC
Just to let you know this happened again with the 3.4.3-3 update overnight. Both peers were left with the gluster daemon stopped.

Comment 3 Niels de Vos 2015-05-17 21:58:16 UTC
GlusterFS 3.7.0 has been released (http://www.gluster.org/pipermail/gluster-users/2015-May/021901.html), and the Gluster project maintains N-2 supported releases. The last two releases before 3.7 are still maintained, at the moment these are 3.6 and 3.5.

This bug has been filed against the 3.4 release, and will not get fixed in a 3.4 version any more. Please verify whether newer versions are affected by the reported problem. If that is the case, update the bug with a note, and update the version if you can. If updating the version is not possible, leave a comment in this bug report with the version you tested, and set the "Need additional information the selected bugs from" field below the comment box to "bugs".

If there is no response by the end of the month, this bug will get automatically closed.

