Bug 1294668

Summary: GlusterFS fails to start after upgrade from 3.6.2
Product: [Community] GlusterFS
Component: core
Version: 3.7.6
Hardware: x86_64
OS: Linux
Status: CLOSED WORKSFORME
Severity: high
Priority: medium
Reporter: Nikola Kotur <nikola.kotur>
Assignee: Atin Mukherjee <amukherj>
CC: amukherj, bugs, gluster-bugs, hgowtham, nikola.kotur
Keywords: Triaged
Doc Type: Bug Fix
Type: Bug
Last Closed: 2016-01-06 09:23:31 UTC
Attachments: glusterd log

Description Nikola Kotur 2015-12-29 14:46:40 UTC
On Debian Wheezy, I tried to upgrade from 3.6.2 to 3.7.6.

# gluster volume info

Volume Name: shared
Type: Replicate
Volume ID: 6b0fa9ec-71dd-441c-9f99-4b0a9317e19d
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 192.168.0.5:/data/exports/shared
Brick2: 192.168.0.6:/data/exports/shared
Brick3: 192.168.0.7:/data/exports/shared
Options Reconfigured:
nfs.register-with-portmap: on
nfs.addr-namelookup: off
nfs.rpc-auth-allow: 192.168.0.5,192.168.0.6,192.168.0.7
cluster.quorum-type: auto

On all 3 nodes:

* Stopped gluster
* Unmounted the volume

Then, on all 3 nodes:

* Upgraded glusterfs to 3.7.6 (roughly the commands sketched below)
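
For reference, the per-node commands looked roughly like this. This is only a sketch: it assumes the Debian service name glusterfs-server and a hypothetical mount point /mnt/shared, and it omits the repository setup for the 3.7.6 packages.

# service glusterfs-server stop
# umount /mnt/shared
# apt-get update && apt-get install glusterfs-server glusterfs-client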

GlusterFS failed to start at all. This was repeating in the log on all 3 nodes:

[2015-12-29 14:14:30.481273] W [socket.c:869:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 8, Invalid argument
[2015-12-29 14:14:30.481303] E [socket.c:2965:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
[2015-12-29 14:14:30.481592] W [socket.c:869:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 13, Invalid argument
[2015-12-29 14:14:30.481613] E [socket.c:2965:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
[2015-12-29 14:14:30.481813] W [socket.c:588:__socket_rwv] 0-management: readv on 192.168.0.5:24007 failed (Connection reset by peer)
[2015-12-29 14:14:30.481956] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1e7)[0x7f648ae07a57] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1be)[0x7f648abce1de] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f648abce2ee] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x88)[0x7f648abcfc78] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x1d0)[0x7f648abd02c0] ))))) 0-management: forced unwinding frame type(GLUSTERD-DUMP) op(DUMP(1)) called at 2015-12-29 14:14:30.481589 (xid=0xd7)
[2015-12-29 14:14:30.482053] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1e7)[0x7f648ae07a57] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1be)[0x7f648abce1de] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f648abce2ee] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x88)[0x7f648abcfc78] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x1d0)[0x7f648abd02c0] ))))) 0-management: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2015-12-29 14:14:30.481619 (xid=0xd8)
[2015-12-29 14:14:30.482067] W [socket.c:588:__socket_rwv] 0-management: readv on 192.168.0.6:24007 failed (Connection reset by peer)
[2015-12-29 14:14:30.482190] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1e7)[0x7f648ae07a57] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1be)[0x7f648abce1de] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f648abce2ee] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x88)[0x7f648abcfc78] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x1d0)[0x7f648abd02c0] ))))) 0-management: forced unwinding frame type(GLUSTERD-DUMP) op(DUMP(1)) called at 2015-12-29 14:14:30.481837 (xid=0xd7)
[2015-12-29 14:14:30.482286] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1e7)[0x7f648ae07a57] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1be)[0x7f648abce1de] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f648abce2ee] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x88)[0x7f648abcfc78] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x1d0)[0x7f648abd02c0] ))))) 0-management: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2015-12-29 14:14:30.481866 (xid=0xd8)
[2015-12-29 14:14:30.482306] W [rpc-clnt-ping.c:208:rpc_clnt_ping_cbk] 0-management: socket disconnected
[2015-12-29 14:14:30.482405] W [glusterd-locks.c:681:glusterd_mgmt_v3_unlock] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x60) [0x7f6485db82a0] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.6/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x382) [0x7f6485dc2772] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x58a) [0x7f6485e5f10a] ) 0-management: Lock for vol shared not held
[2015-12-29 14:14:30.482089] W [rpc-clnt-ping.c:208:rpc_clnt_ping_cbk] 0-management: socket disconnected
[2015-12-29 14:14:30.482556] W [glusterd-locks.c:681:glusterd_mgmt_v3_unlock] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x60) [0x7f6485db82a0] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.6/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x382) [0x7f6485dc2772] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x58a) [0x7f6485e5f10a] ) 0-management: Lock for vol shared not held

Solved by downgrading back to 3.6.2.

Is there anything I can do, or should I stick to 3.6?

Comment 1 Atin Mukherjee 2015-12-29 14:59:25 UTC
Do you mean GlusterD failed to start? Looking at the log, it doesn't seem that way. Do you see any error log along the lines of "failed to initialize, please review volfile again"? If not, GlusterD should be running.

A more detailed description would help us figure this out. Can you attach both the glusterd and the glusterfs brick logs?
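
For example, a quick check for that message (assuming the default log location on Debian; the file name may differ on your setup):

# grep -i "failed to initialize" /var/log/glusterfs/etc-glusterfs-glusterd.vol.log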

Comment 2 Nikola Kotur 2016-01-05 15:46:19 UTC
Created attachment 1111898 [details]
glusterd log

I stopped all Gluster daemons and removed the /var/log/glusterfs directory completely to capture clean logs. Only glusterd was logging; that is the file attached. There were no brick logs.

I managed to figure out how to avoid this issue. In short, this is my upgrade path now, which seems to work fine (sketched below):

1) stop volume
2) stop glusterfs daemons
3) upgrade to 3.7
4) start glusterfs daemons
5) start volume
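
Roughly, the commands are the following. This is only a sketch: it assumes the Debian service name glusterfs-server and the volume name "shared" from above, and it omits the repository setup for the 3.7 packages.

On any one node:
# gluster volume stop shared

On all 3 nodes:
# service glusterfs-server stop
# apt-get update && apt-get install glusterfs-server glusterfs-client
# service glusterfs-server start

On any one node:
# gluster volume start shared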

Please let me know if you need any other info.

Thank you.

Comment 3 Atin Mukherjee 2016-01-06 04:33:15 UTC
Since the issue is resolved, do you mind closing this bug?

Comment 4 Nikola Kotur 2016-01-06 09:23:31 UTC
No, not at all. You might just want to consider updating the documentation on upgrading 3.6 -> 3.7, but that is all.

Thank you for your time.