1294668 – GlusterFS fails to start after upgrade from 3.6.2

Keywords:
Status:	CLOSED WORKSFORME
Alias:	None
Product:	GlusterFS
Classification:	Community
Component:	core
Sub Component:
Version:	3.7.6
Hardware:	x86_64
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Assignee:	Atin Mukherjee
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2015-12-29 14:46 UTC by Nikola Kotur
Modified:	2016-01-06 09:23 UTC
CC List:	5 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2016-01-06 09:23:31 UTC
Regression:	---
Mount Type:	---
Documentation:	---
CRM:
Verified Versions:
Embargoed:
Dependent Products:

Attachments
glusterd log (15.21 KB, text/plain) 2016-01-05 15:46 UTC, Nikola Kotur

Description Nikola Kotur 2015-12-29 14:46:40 UTC

On Debian Wheezy, I tried to upgrade from 3.6.2 to 3.7.6.

# gluster volume info

Volume Name: shared
Type: Replicate
Volume ID: 6b0fa9ec-71dd-441c-9f99-4b0a9317e19d
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 192.168.0.5:/data/exports/shared
Brick2: 192.168.0.6:/data/exports/shared
Brick3: 192.168.0.7:/data/exports/shared
Options Reconfigured:
nfs.register-with-portmap: on
nfs.addr-namelookup: off
nfs.rpc-auth-allow: 192.168.0.5,192.168.0.6,192.168.0.7
cluster.quorum-type: auto

On all 3 nodes:

* Stopped gluster
* Unmounted the volume

Then, on all 3 nodes:

* Upgraded glusterfs to 3.7.6

GlusterFS failed to start at all. The following messages kept repeating in the log on all 3 nodes:

[2015-12-29 14:14:30.481273] W [socket.c:869:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 8, Invalid argument
[2015-12-29 14:14:30.481303] E [socket.c:2965:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
[2015-12-29 14:14:30.481592] W [socket.c:869:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 13, Invalid argument
[2015-12-29 14:14:30.481613] E [socket.c:2965:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
[2015-12-29 14:14:30.481813] W [socket.c:588:__socket_rwv] 0-management: readv on 192.168.0.5:24007 failed (Connection reset by peer)
[2015-12-29 14:14:30.481956] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1e7)[0x7f648ae07a57] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1be)[0x7f648abce1de] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f648abce2ee] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x88)[0x7f648abcfc78] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x1d0)[0x7f648abd02c0] ))))) 0-management: forced unwinding frame type(GLUSTERD-DUMP) op(DUMP(1)) called at 2015-12-29 14:14:30.481589 (xid=0xd7)
[2015-12-29 14:14:30.482053] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1e7)[0x7f648ae07a57] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1be)[0x7f648abce1de] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f648abce2ee] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x88)[0x7f648abcfc78] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x1d0)[0x7f648abd02c0] ))))) 0-management: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2015-12-29 14:14:30.481619 (xid=0xd8)
[2015-12-29 14:14:30.482067] W [socket.c:588:__socket_rwv] 0-management: readv on 192.168.0.6:24007 failed (Connection reset by peer)
[2015-12-29 14:14:30.482190] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1e7)[0x7f648ae07a57] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1be)[0x7f648abce1de] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f648abce2ee] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x88)[0x7f648abcfc78] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x1d0)[0x7f648abd02c0] ))))) 0-management: forced unwinding frame type(GLUSTERD-DUMP) op(DUMP(1)) called at 2015-12-29 14:14:30.481837 (xid=0xd7)
[2015-12-29 14:14:30.482286] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1e7)[0x7f648ae07a57] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1be)[0x7f648abce1de] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f648abce2ee] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x88)[0x7f648abcfc78] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x1d0)[0x7f648abd02c0] ))))) 0-management: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2015-12-29 14:14:30.481866 (xid=0xd8)
[2015-12-29 14:14:30.482306] W [rpc-clnt-ping.c:208:rpc_clnt_ping_cbk] 0-management: socket disconnected
[2015-12-29 14:14:30.482405] W [glusterd-locks.c:681:glusterd_mgmt_v3_unlock] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x60) [0x7f6485db82a0] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.6/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x382) [0x7f6485dc2772] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x58a) [0x7f6485e5f10a] ) 0-management: Lock for vol shared not held
[2015-12-29 14:14:30.482089] W [rpc-clnt-ping.c:208:rpc_clnt_ping_cbk] 0-management: socket disconnected
[2015-12-29 14:14:30.482556] W [glusterd-locks.c:681:glusterd_mgmt_v3_unlock] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x60) [0x7f6485db82a0] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.6/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x382) [0x7f6485dc2772] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x58a) [0x7f6485e5f10a] ) 0-management: Lock for vol shared not held

Solved by downgrading back to 3.6.2.

Is there anything I can do, or should I stick to 3.6?

Comment 1 Atin Mukherjee 2015-12-29 14:59:25 UTC

Do you mean GlusterD failed to start? Looking at the log, it doesn't seem so. Do you see any error log related to "failed to initialize, please review volfile again"? If not, then GlusterD should be running.

A more detailed description would help us figure this out. Can you attach both the glusterd and glusterfs brick logs?
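
For anyone triaging a similar report, the check asked for above can be scripted. This is only a rough sketch: the log path is an assumption (a common default location for the glusterd log on 3.x installs), and because the exact wording of the message may differ between releases, the match is deliberately loose (any line containing both "failed" and "volfile again"). The function names are hypothetical.

```python
# Assumed default glusterd log location; adjust for your installation.
DEFAULT_GLUSTERD_LOG = "/var/log/glusterfs/etc-glusterfs-glusterd.vol.log"


def is_init_failure(line):
    """Loose match for the 'failed to initialize ... volfile again' style message."""
    lowered = line.lower()
    return "failed" in lowered and "volfile again" in lowered


def glusterd_init_failed(log_lines):
    """Return True if any log line looks like a glusterd initialization failure."""
    return any(is_init_failure(line) for line in log_lines)


def check_log(path=DEFAULT_GLUSTERD_LOG):
    """Scan a glusterd log file on disk; undecodable bytes are replaced."""
    with open(path, errors="replace") as handle:
        return glusterd_init_failed(handle)
```

If `check_log()` returns False, the log contains no initialization failure of this kind, which matches the reading above that GlusterD itself should be running.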

Comment 2 Nikola Kotur 2016-01-05 15:46:19 UTC

Created attachment 1111898
glusterd log

I stopped all Gluster daemons and removed the /var/log/glusterfs directory completely to capture clean logs. Only glusterd was logging, and that is the file attached. There were no brick logs.

I managed to figure out how to avoid this issue. In short, this is my upgrade path now, which seems to work fine:

1) stop volume
2) stop glusterfs daemons
3) upgrade to 3.7
4) start glusterfs daemons
5) start volume
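
The five steps above could be scripted roughly as follows. This is a sketch under several assumptions that are not stated in the report: the service and package are both named `glusterfs-server` (as on Debian), the 3.7 package source is already configured for `apt-get`, and `--mode=script` is used so that `gluster volume stop` does not wait for confirmation. The volume stop and start presumably need to run only once per cluster, while steps 2 to 4 would run on every node. The helper names `upgrade_plan` and `run_plan` are hypothetical.

```python
import subprocess


def upgrade_plan(volume, service="glusterfs-server", package="glusterfs-server"):
    """Return the upgrade commands in the order listed above."""
    return [
        # 1) stop volume (script mode skips the interactive confirmation)
        ["gluster", "--mode=script", "volume", "stop", volume],
        # 2) stop glusterfs daemons (service name is an assumption)
        ["service", service, "stop"],
        # 3) upgrade to 3.7 (assumes the 3.7 apt source is already set up)
        ["apt-get", "install", "-y", package],
        # 4) start glusterfs daemons
        ["service", service, "start"],
        # 5) start volume
        ["gluster", "volume", "start", volume],
    ]


def run_plan(plan, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for command in plan:
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)


if __name__ == "__main__":
    # Dry run by default: only prints the commands for the "shared" volume.
    run_plan(upgrade_plan("shared"))
```

Running it as-is only prints the commands, so the order can be reviewed before anything is executed with `dry_run=False`.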

Please let me know if you need any other info.

Thank you.

Comment 3 Atin Mukherjee 2016-01-06 04:33:15 UTC

Since the issue is resolved, would you mind closing this bug?

Comment 4 Nikola Kotur 2016-01-06 09:23:31 UTC

No, not at all. You might just want to consider updating the documentation on upgrading from 3.6 to 3.7, but that is all.

Thank you for your time.
