Bug 1111124 - core: volume start fails during upgrade
Summary: core: volume start fails during upgrade
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: mainline
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Kaushal
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 1110224
 
Reported: 2014-06-19 10:16 UTC by Kaushal
Modified: 2014-11-11 08:35 UTC
CC: 7 users

Fixed In Version: glusterfs-3.6.0beta1
Doc Type: Bug Fix
Doc Text:
Clone Of: 1110224
Environment:
Last Closed: 2014-11-11 08:35:28 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments

Description Kaushal 2014-06-19 10:16:35 UTC
+++ This bug was initially created as a clone of Bug #1110224 +++

Description of problem:
I upgraded my cluster from glusterfs-3.6.0.15-1.el6rhs.x86_64 to glusterfs-3.6.0.18-1.el6rhs.x86_64.

The process followed was: stop the volume, stop glusterd on all nodes, and yum update the glusterfs RPMs (sketched below).
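For reference, the upgrade procedure described above corresponds roughly to the following commands (a sketch only; it assumes the dist-rep volume from this report and EL6-style service management):

# Offline upgrade sketch; adjust the package glob to the build being installed.
gluster volume stop dist-rep        # once, from any one node
service glusterd stop               # on every node
yum update 'glusterfs*'             # on every node
service glusterd start              # on every node
gluster volume start dist-rep       # once, after all glusterds are back up (this is the step that fails below)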
Then start the volume. It fails with this error:
[root@nfs1 ~]# gluster volume start dist-rep
volume start: dist-rep: failed: Commit failed on 10.70.37.44. Please check log file for details.
Commit failed on 10.70.37.215. Please check log file for details.
Commit failed on 10.70.37.201. Please check log file for details.


gluster volume info on node1,
[root@nfs1 ~]# gluster volume info dist-rep
 
Volume Name: dist-rep
Type: Distributed-Replicate
Volume ID: 7ab235ad-a666-44b3-a46f-d3321f3eb4d6
Status: Stopped
Snap Volume: no
Number of Bricks: 7 x 2 = 14
Transport-type: tcp
Bricks:
Brick1: 10.70.37.62:/bricks/d1r1
Brick2: 10.70.37.215:/bricks/d1r2
Brick3: 10.70.37.44:/bricks/d2r1
Brick4: 10.70.37.201:/bricks/d2r2
Brick5: 10.70.37.62:/bricks/d3r1
Brick6: 10.70.37.215:/bricks/d3r2
Brick7: 10.70.37.44:/bricks/d4r1
Brick8: 10.70.37.201:/bricks/d4r2
Brick9: 10.70.37.62:/bricks/d5r1
Brick10: 10.70.37.215:/bricks/d5r2
Brick11: 10.70.37.44:/bricks/d6r1
Brick12: 10.70.37.201:/bricks/d6r2
Brick13: 10.70.37.62:/bricks/d1r1-add
Brick14: 10.70.37.215:/bricks/d1r2-add
Options Reconfigured:
nfs.addr-namelookup: on
nfs.rpc-auth-allow: *.lab.eng.blr.redhat.com
nfs.rpc-auth-reject: 10.70.35.33
nfs.export-dirs: on
nfs.export-dir: /1(rhsauto054.lab.eng.blr.redhat.com),/2(172.16.0.0/27)
features.quota: on
snap-max-hard-limit: 256
snap-max-soft-limit: 90
auto-delete: disable

To summarize, here is the sequence of events:

1. The cluster had glusterfs-3.6.0.15-1.el6rhs.x86_64 on all nodes.
2. A 6x2 volume existed on this cluster.
3. The volume was NFS-mounted and iozone was run on the mount point.
4. While iozone was running, I killed processes on node2 and node3.
5. iozone finished successfully.
6. Executed gluster volume start <vol-name> force to bring back all bricks.
   This failed with the error:
   "volume start: dist-rep: failed: Commit failed on localhost. Please check log file for details."
7. Executed gluster volume set all op-version 30000.
8. start force still failed (the attempted commands are sketched after this list).
9. Then, as discussed with a developer, upgraded the RPMs to the .18 build; now even a normal gluster volume start fails, as described above.
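For reference, the recovery attempts in steps 6-8 correspond to the commands below (a sketch; the volume name and op-version value are taken from this report, and the option name is as typed by the reporter):

gluster volume start dist-rep force         # step 6: try to bring the killed bricks back
cat /var/lib/glusterd/glusterd.info         # check this node's operating-version
gluster volume set all op-version 30000     # step 7: raise the cluster op-version
gluster volume start dist-rep force         # step 8: still fails at this point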

Version-Release number of selected component (if applicable):
glusterfs-3.6.0.18-1.el6rhs.x86_64

How reproducible:
happening on this build

Actual results:
gluster volume start fails, as mentioned above

Expected results:
The volume start should succeed without issues.

Additional info:
glusterd.info from each node,
node1,
[root@nfs1 ~]# cat /var/lib/glusterd/glusterd.info 
UUID=bd23f0cb-d64a-4ddb-8543-6e1bbc812c7d
operating-version=30000

node2,
[root@nfs2 ~]# cat /var/lib/glusterd/glusterd.info
UUID=db4a5cde-f048-4796-84dd-19ba9ca98e6f
operating-version=30000

node3,
[root@nfs3 ~]# cat /var/lib/glusterd/glusterd.info
UUID=7f8f341e-4274-40f0-ae83-bde70365d2f4
operating-version=30000

node4,
[root@nfs4 ~]# cat /var/lib/glusterd/glusterd.info
UUID=9512d008-9dd8-4a5b-bf8c-983862a86c4a
operating-version=30000
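
For a quick cross-check, the operating-version of all four nodes can be gathered in one pass; a minimal sketch, assuming passwordless ssh and the nfs1-nfs4 hostnames seen in the shell prompts above:

# Gather operating-version from all nodes.
for h in nfs1 nfs2 nfs3 nfs4; do
    echo "== $h =="
    ssh "$h" 'grep operating-version /var/lib/glusterd/glusterd.info'
done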

Comment 1 Anand Avati 2014-06-19 10:32:21 UTC
REVIEW: http://review.gluster.org/8113 (glusterd: Check mount_dir for own bricks only during start) posted (#1) for review on master by Kaushal M (kaushal)

Comment 2 Anand Avati 2014-06-20 06:02:01 UTC
COMMIT: http://review.gluster.org/8113 committed in master by Krishnan Parthasarathi (kparthas) 
------
commit 3fe1a14a82f3894e6b9e9d3004a185c48ea4bf6b
Author: Kaushal M <kaushal>
Date:   Thu Jun 19 15:21:33 2014 +0530

    glusterd: Check mount_dir for own bricks only during start
    
    During the start volume commit op brickinfo->mount_dir was being checked
    for all bricks by glusterd. This could lead to failures starting the
    volumes which were carried forward on upgrade.
    
    Change-Id: If3d3ee4b2b9f68341ff4422dd90faf32bc3e898f
    BUG: 1111124
    Signed-off-by: Kaushal M <kaushal>
    Reviewed-on: http://review.gluster.org/8113
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Rajesh Joseph <rjoseph>
    Reviewed-by: Krishnan Parthasarathi <kparthas>
    Tested-by: Krishnan Parthasarathi <kparthas>
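
Once a build containing this commit is installed, the original failure can be re-checked with the same CLI used in this report; a minimal sketch, assuming the dist-rep volume above:

gluster volume start dist-rep       # expected to succeed now
gluster volume status dist-rep      # confirm bricks and the NFS server are online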

Comment 3 Niels de Vos 2014-09-22 12:43:17 UTC
A beta release for GlusterFS 3.6.0 has been released [1]. Please verify whether this release resolves the bug for you. If the glusterfs-3.6.0beta1 release does not include a resolution for this issue, leave a comment in this bug and move the status to ASSIGNED. If this release fixes the problem for you, leave a note and change the status to VERIFIED.

Packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure (possibly an "updates-testing" repository) for your distribution.

[1] http://supercolony.gluster.org/pipermail/gluster-users/2014-September/018836.html
[2] http://supercolony.gluster.org/pipermail/gluster-users/

Comment 4 Niels de Vos 2014-11-11 08:35:28 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.6.1, please reopen this bug report.

glusterfs-3.6.1 has been announced [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://supercolony.gluster.org/pipermail/gluster-users/2014-November/019410.html
[2] http://supercolony.gluster.org/mailman/listinfo/gluster-users

