Bug 1294668

Summary: GlusterFS fails to start after upgrade from 3.6.2
Product: [Community] GlusterFS
Component: core
Version: 3.7.6
Hardware: x86_64
OS: Linux
Status: CLOSED WORKSFORME
Severity: high
Priority: medium
Reporter: Nikola Kotur <nikola.kotur>
Assignee: Atin Mukherjee <amukherj>
CC: amukherj, bugs, gluster-bugs, hgowtham, nikola.kotur
Keywords: Triaged
Doc Type: Bug Fix
Type: Bug
Last Closed: 2016-01-06 09:23:31 UTC
Attachments: glusterd log

Description Nikola Kotur 2015-12-29 14:46:40 UTC
On Debian Wheezy, I tried to upgrade from 3.6.2 to 3.7.6.

# gluster volume info

Volume Name: shared
Type: Replicate
Volume ID: 6b0fa9ec-71dd-441c-9f99-4b0a9317e19d
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 192.168.0.5:/data/exports/shared
Brick2: 192.168.0.6:/data/exports/shared
Brick3: 192.168.0.7:/data/exports/shared
Options Reconfigured:
nfs.register-with-portmap: on
nfs.addr-namelookup: off
nfs.rpc-auth-allow: 192.168.0.5,192.168.0.6,192.168.0.7
cluster.quorum-type: auto

On all 3 nodes:

* Stopped gluster
* Unmounted the volume

Then, on all 3 nodes:

* Upgraded glusterfs to 3.7.6 (roughly the commands sketched below)
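
For reference, the per-node commands looked roughly like this. This is only a sketch: it assumes the Debian service name glusterfs-server and a hypothetical mount point /mnt/shared, and it omits the repository setup for the 3.7.6 packages.

# service glusterfs-server stop
# umount /mnt/shared
# apt-get update && apt-get install glusterfs-server glusterfs-client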

GlusterFS failed to start at all. This was repeating in the log on all 3 nodes:

[2015-12-29 14:14:30.481273] W [socket.c:869:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 8, Invalid argument
[2015-12-29 14:14:30.481303] E [socket.c:2965:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
[2015-12-29 14:14:30.481592] W [socket.c:869:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 13, Invalid argument
[2015-12-29 14:14:30.481613] E [socket.c:2965:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
[2015-12-29 14:14:30.481813] W [socket.c:588:__socket_rwv] 0-management: readv on 192.168.0.5:24007 failed (Connection reset by peer)
[2015-12-29 14:14:30.481956] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1e7)[0x7f648ae07a57] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1be)[0x7f648abce1de] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f648abce2ee] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x88)[0x7f648abcfc78] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x1d0)[0x7f648abd02c0] ))))) 0-management: forced unwinding frame type(GLUSTERD-DUMP) op(DUMP(1)) called at 2015-12-29 14:14:30.481589 (xid=0xd7)
[2015-12-29 14:14:30.482053] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1e7)[0x7f648ae07a57] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1be)[0x7f648abce1de] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f648abce2ee] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x88)[0x7f648abcfc78] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x1d0)[0x7f648abd02c0] ))))) 0-management: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2015-12-29 14:14:30.481619 (xid=0xd8)
[2015-12-29 14:14:30.482067] W [socket.c:588:__socket_rwv] 0-management: readv on 192.168.0.6:24007 failed (Connection reset by peer)
[2015-12-29 14:14:30.482190] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1e7)[0x7f648ae07a57] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1be)[0x7f648abce1de] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f648abce2ee] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x88)[0x7f648abcfc78] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x1d0)[0x7f648abd02c0] ))))) 0-management: forced unwinding frame type(GLUSTERD-DUMP) op(DUMP(1)) called at 2015-12-29 14:14:30.481837 (xid=0xd7)
[2015-12-29 14:14:30.482286] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1e7)[0x7f648ae07a57] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1be)[0x7f648abce1de] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f648abce2ee] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x88)[0x7f648abcfc78] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x1d0)[0x7f648abd02c0] ))))) 0-management: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2015-12-29 14:14:30.481866 (xid=0xd8)
[2015-12-29 14:14:30.482306] W [rpc-clnt-ping.c:208:rpc_clnt_ping_cbk] 0-management: socket disconnected
[2015-12-29 14:14:30.482405] W [glusterd-locks.c:681:glusterd_mgmt_v3_unlock] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x60) [0x7f6485db82a0] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.6/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x382) [0x7f6485dc2772] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x58a) [0x7f6485e5f10a] ) 0-management: Lock for vol shared not held
[2015-12-29 14:14:30.482089] W [rpc-clnt-ping.c:208:rpc_clnt_ping_cbk] 0-management: socket disconnected
[2015-12-29 14:14:30.482556] W [glusterd-locks.c:681:glusterd_mgmt_v3_unlock] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x60) [0x7f6485db82a0] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.6/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x382) [0x7f6485dc2772] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x58a) [0x7f6485e5f10a] ) 0-management: Lock for vol shared not held

Solved by downgrading back to 3.6.2.

Is there anything I can do, or should I stick to 3.6?

Comment 1 Atin Mukherjee 2015-12-29 14:59:25 UTC
Do you mean GlusterD failed to start? Looking at the log, it doesn't seem that way. Do you see any error log along the lines of "failed to initialize, please review volfile again"? If not, GlusterD should be running.

A more detailed description would help us figure this out. Can you attach both the glusterd and the glusterfs brick logs?
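
For example, a quick check for that message (assuming the default log location on Debian; the file name may differ on your setup):

# grep -i "failed to initialize" /var/log/glusterfs/etc-glusterfs-glusterd.vol.log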

Comment 2 Nikola Kotur 2016-01-05 15:46:19 UTC
Created attachment 1111898 [details]
glusterd log

I stopped all Gluster daemons and removed the /var/log/glusterfs directory completely to capture clean logs. Only glusterd was logging; that is the file attached. There were no brick logs.

I managed to figure out how to avoid this issue. In short, this is my upgrade path now, which seems to work fine (sketched below):

1) stop volume
2) stop glusterfs daemons
3) upgrade to 3.7
4) start glusterfs daemons
5) start volume
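
Roughly, the commands are the following. This is only a sketch: it assumes the Debian service name glusterfs-server and the volume name "shared" from above, and it omits the repository setup for the 3.7 packages.

On any one node:
# gluster volume stop shared

On all 3 nodes:
# service glusterfs-server stop
# apt-get update && apt-get install glusterfs-server glusterfs-client
# service glusterfs-server start

On any one node:
# gluster volume start shared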

Please let me know if you need any other info.

Thank you.

Comment 3 Atin Mukherjee 2016-01-06 04:33:15 UTC
Since the issue is resolved, do you mind closing this bug?

Comment 4 Nikola Kotur 2016-01-06 09:23:31 UTC
No, not at all. You might just want to consider updating the documentation on upgrading 3.6 -> 3.7, but that is all.

Thank you for your time.