On Debian Wheezy. Tried to upgrade 3.6.2 -> 3.7.6.

# gluster volume info

Volume Name: shared
Type: Replicate
Volume ID: 6b0fa9ec-71dd-441c-9f99-4b0a9317e19d
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 192.168.0.5:/data/exports/shared
Brick2: 192.168.0.6:/data/exports/shared
Brick3: 192.168.0.7:/data/exports/shared
Options Reconfigured:
nfs.register-with-portmap: on
nfs.addr-namelookup: off
nfs.rpc-auth-allow: 192.168.0.5,192.168.0.6,192.168.0.7
cluster.quorum-type: auto

On all 3 nodes:
* Stopped gluster
* Unmounted the volume

Then, on all 3 nodes:
* Upgraded glusterfs to 3.7.6

GlusterFS failed to start at all. This was repeating in the log on all 3 nodes:

[2015-12-29 14:14:30.481273] W [socket.c:869:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 8, Invalid argument
[2015-12-29 14:14:30.481303] E [socket.c:2965:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
[2015-12-29 14:14:30.481592] W [socket.c:869:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 13, Invalid argument
[2015-12-29 14:14:30.481613] E [socket.c:2965:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
[2015-12-29 14:14:30.481813] W [socket.c:588:__socket_rwv] 0-management: readv on 192.168.0.5:24007 failed (Connection reset by peer)
[2015-12-29 14:14:30.481956] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1e7)[0x7f648ae07a57] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1be)[0x7f648abce1de] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f648abce2ee] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x88)[0x7f648abcfc78] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x1d0)[0x7f648abd02c0] ))))) 0-management: forced unwinding frame type(GLUSTERD-DUMP) op(DUMP(1)) called at 2015-12-29 14:14:30.481589 (xid=0xd7)
[2015-12-29 14:14:30.482053] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1e7)[0x7f648ae07a57] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1be)[0x7f648abce1de] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f648abce2ee] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x88)[0x7f648abcfc78] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x1d0)[0x7f648abd02c0] ))))) 0-management: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2015-12-29 14:14:30.481619 (xid=0xd8)
[2015-12-29 14:14:30.482067] W [socket.c:588:__socket_rwv] 0-management: readv on 192.168.0.6:24007 failed (Connection reset by peer)
[2015-12-29 14:14:30.482190] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1e7)[0x7f648ae07a57] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1be)[0x7f648abce1de] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f648abce2ee] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x88)[0x7f648abcfc78] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x1d0)[0x7f648abd02c0] ))))) 0-management: forced unwinding frame type(GLUSTERD-DUMP) op(DUMP(1)) called at 2015-12-29 14:14:30.481837 (xid=0xd7)
[2015-12-29 14:14:30.482286] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1e7)[0x7f648ae07a57] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1be)[0x7f648abce1de] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f648abce2ee] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x88)[0x7f648abcfc78] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x1d0)[0x7f648abd02c0] ))))) 0-management: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2015-12-29 14:14:30.481866 (xid=0xd8)
[2015-12-29 14:14:30.482306] W [rpc-clnt-ping.c:208:rpc_clnt_ping_cbk] 0-management: socket disconnected
[2015-12-29 14:14:30.482405] W [glusterd-locks.c:681:glusterd_mgmt_v3_unlock] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x60) [0x7f6485db82a0] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.6/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x382) [0x7f6485dc2772] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x58a) [0x7f6485e5f10a] ) 0-management: Lock for vol shared not held
[2015-12-29 14:14:30.482089] W [rpc-clnt-ping.c:208:rpc_clnt_ping_cbk] 0-management: socket disconnected
[2015-12-29 14:14:30.482556] W [glusterd-locks.c:681:glusterd_mgmt_v3_unlock] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x60) [0x7f6485db82a0] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.6/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x382) [0x7f6485dc2772] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x58a) [0x7f6485e5f10a] ) 0-management: Lock for vol shared not held

Solved by downgrading back to 3.6.2. Is there anything I can do, or should I just stick to 3.6?
Do you mean GlusterD failed to start? Looking at the log, it doesn't seem so. Do you see any error log related to "failed to initialize, please review volfile again"? If not, then GlusterD should run. A more detailed description would help us figure this out. Can you attach both the glusterd and glusterfs brick logs?
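One way to look for the error the developer mentions is to grep the glusterd log for that message. On 3.7 the log typically lives at /var/log/glusterfs/etc-glusterfs-glusterd.vol.log (the path is an assumption and may differ per distro); a temporary stand-in file is used below so the snippet is self-contained:

```shell
#!/bin/sh
# Stand-in for the real glusterd log, so this snippet runs anywhere.
# On an affected node you would grep the actual log file instead, e.g.
# /var/log/glusterfs/etc-glusterfs-glusterd.vol.log (path may vary).
log=$(mktemp)
printf '%s\n' 'failed to initialize, please review volfile again' > "$log"

# The check itself: does the volfile-initialization error appear?
if grep -q 'failed to initialize, please review volfile again' "$log"; then
    echo 'volfile initialization error found'
fi
rm -f "$log"
```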
Created attachment 1111898 [details]
glusterd log

I stopped all Gluster daemons and removed the /var/log/glusterfs directory completely to capture clean logs. Only glusterd was logging, and that is the file attached. There were no brick logs.

I managed to figure out how to avoid this issue. In short, this is my upgrade path now, which seems to work fine:

1) Stop the volume
2) Stop the GlusterFS daemons
3) Upgrade to 3.7
4) Start the GlusterFS daemons
5) Start the volume

Please let me know if you need any other info. Thank you.
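The upgrade path above can be sketched as a per-step command sequence. This is only an illustration under assumptions not stated in the report: Debian packaging with a `glusterfs-server` service, and the volume name `shared` from `gluster volume info`. With `DRY_RUN` left at its default of 1 the script only prints the commands; set `DRY_RUN=0` on a real node to execute them.

```shell
#!/bin/sh
# Sketch of the 5-step upgrade path (assumed service/package names).
# DRY_RUN=1 (default) prints each command; DRY_RUN=0 runs it.
run() { if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi; }

run gluster volume stop shared        # 1) stop the volume (once, from any node)
run service glusterfs-server stop     # 2) stop the daemons (on every node)
run apt-get install glusterfs-server  # 3) upgrade to 3.7 (on every node)
run service glusterfs-server start    # 4) start the daemons (on every node)
run gluster volume start shared       # 5) start the volume (once, from any node)
```

Steps 2-4 are repeated on every node before the volume is started again, matching the order that worked in this report.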
Since the issue is resolved, do you mind closing this bug?
No, not at all. You might just want to consider updating the documentation on upgrading 3.6 -> 3.7, but that's it. Thank you for your time.