1085202 – [SNAPSHOT]: While rebalance is in progress as part of remove-brick the snapshot creation fails with prevalidation

Summary: [SNAPSHOT]: While rebalance is in progress as part of remove-brick the snapsh...

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	snapshot
Sub Component:
Version:	rhgs-3.0
Hardware:	x86_64
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Target Release:	RHGS 3.1.0
Assignee:	Joseph Elwin Fernandes
QA Contact:	Rahul Hinduja
Docs Contact:
URL:
Whiteboard:	SNAPSHOT
Depends On:
Blocks:	1101993 1202842 1223636

Reported:	2014-04-08 05:39 UTC by Rahul Hinduja
Modified:	2016-09-17 12:57 UTC
CC List:	9 users
Fixed In Version:	glusterfs-3.7.0-3.el6rhs
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Clones:	1101993
Environment:
Last Closed:	2015-07-29 04:31:27 UTC
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2015:1495	0	normal	SHIPPED_LIVE	Important: Red Hat Gluster Storage 3.1 update	2015-07-29 08:26:26 UTC

Description Rahul Hinduja 2014-04-08 05:39:19 UTC

Description of problem:
=======================

While a rebalance started with "gluster volume rebalance <VOLNAME> start" is in progress, snapshot creation fails, as expected, with the message "snapshot create: failed: rebalance process is running for the volume vol2".

[root@snapshot-12 ~]# gluster snapshot create r5 vol2
snapshot create: failed: rebalance process is running for the volume vol2
Snapshot command failed
[root@snapshot-12 ~]# 


However, when the rebalance is running as part of a remove-brick operation, snapshot creation fails with a generic pre-validation error instead. In both cases the volume has a rebalance in progress, so this case should also report that snapshot creation failed because a rebalance is in progress.

[root@snapshot-09 ~]# gluster volume rebalance vol0 status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost                0        0Bytes             0             0             0          not started               0.00
                             10.70.43.20                0        0Bytes             0             0             0          not started               0.00
                            10.70.43.186              343         1.5MB           743             0             0          in progress              30.00
                             10.70.43.70                0        0Bytes          2711             0             0          in progress              30.00
volume rebalance: vol0: success: 
[root@snapshot-09 ~]# 

[root@snapshot-10 ~]# gluster snapshot create r1 vol0
snapshot create: failed: Pre Validation failed on 10.70.43.186. Please check log file for details.
Snapshot command failed
[root@snapshot-10 ~]#


Version-Release number of selected component (if applicable):
=============================================================

glusterfs-3.4.1.7.snap.mar27.2014git-1.el6.x86_64


How reproducible:
=================
1/1


Steps to Reproduce:
===================
1. Create and start a volume
2. Mount the volume and create files on it
3. Remove a brick using "gluster volume remove-brick <VOLNAME> <BRICK> start"
4. The remove-brick should succeed and start a rebalance
5. Create a snapshot of the volume while that rebalance is still in progress
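The reproduction steps above can be sketched as a shell script. This is a minimal sketch, not the reporter's actual procedure: the host names (node1 to node4), brick paths, mount point and the copied data set are hypothetical, and it assumes the peers are already probed and every brick sits on a thinly provisioned LVM volume, which GlusterFS snapshots require. Because the commands create and modify a volume, they are only printed unless EXECUTE=1 is set.

```shell
#!/bin/sh
# Hypothetical reproduction sketch: host names, brick paths, mount point and
# data set are illustrative assumptions, not values taken from this report.
# Snapshots additionally require each brick to be on thinly provisioned LVM.
# By default every command is only printed; export EXECUTE=1 to really run it.

VOL=vol0
HOSTS="node1 node2 node3 node4"   # hypothetical peers, already probed
MNT=/mnt/vol0                     # hypothetical mount point

# Print the command (dry run) or execute it when EXECUTE=1.
run() {
    if [ "${EXECUTE:-0}" = 1 ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

reproduce() {
    # 1. Create and start a plain distribute volume, one brick per host.
    bricks=""
    for h in $HOSTS; do bricks="$bricks $h:/brick0/b0"; done
    run gluster volume create "$VOL" $bricks
    run gluster volume start "$VOL"

    # 2. Mount the volume and fill it with many small files
    #    (any directory tree with plenty of files will do).
    run mkdir -p "$MNT"
    run mount -t glusterfs node1:/"$VOL" "$MNT"
    run cp -a /usr/share/doc "$MNT/doc"

    # 3-4. Start removing one brick; this triggers a data-migrating rebalance.
    run gluster volume remove-brick "$VOL" node4:/brick0/b0 start
    run gluster volume remove-brick "$VOL" node4:/brick0/b0 status

    # 5. Try to snapshot the volume while the migration is still "in progress".
    run gluster snapshot create snap0 "$VOL"
}
```

Source the file and call `reproduce`; export `EXECUTE=1` first to run the commands for real on a disposable test cluster. On the affected builds, the final `gluster snapshot create` is where the "Pre Validation failed on ..." message appeared.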

Actual results:
===============

Snapshot creation fails with a generic pre-validation error ("Pre Validation failed on <node>. Please check log file for details.")


Expected results:
=================
It should fail gracefully with a proper message such as "rebalance is in progress" or "remove-brick is in progress".
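Until the CLI reports such a message, an administrator could guard snapshot creation with a check like the one below. It is a hypothetical helper, not part of GlusterFS: `rebalance_guard` reads the table printed by `gluster volume rebalance <VOLNAME> status` or `gluster volume remove-brick <VOLNAME> <BRICK> status` on stdin and, assuming the column layout shown in the status tables in this report, refuses to continue while any node's status reads "in progress".

```shell
#!/bin/sh
# Hypothetical client-side guard, not part of GlusterFS.
# Reads a rebalance / remove-brick status table on stdin and returns 1
# (printing the busy nodes to stderr) while any row is "in progress".

rebalance_guard() {
    vol="$1"
    # The first column of each matching row is the node name; awk's default
    # field splitting skips the leading padding in the status table.
    busy_nodes=$(awk '/in progress/ { print $1 }')
    if [ -n "$busy_nodes" ]; then
        # Unquoted $busy_nodes joins multiple node names with single spaces.
        echo "rebalance is in progress for the volume $vol on:" $busy_nodes >&2
        return 1
    fi
    return 0
}
```

For example: `gluster volume remove-brick vol0 <BRICK> status | rebalance_guard vol0 && gluster snapshot create snap0 vol0`. Matching the text "in progress" is brittle because it depends on the CLI's human-readable output; where the installed CLI supports `--xml` output for these status commands, parsing that would be sturdier.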

Comment 3 Nagaprasad Sathyanarayana 2014-04-21 06:18:14 UTC

Marking snapshot BZs for RHS 3.0.

Comment 4 Joseph Elwin Fernandes 2014-05-20 06:39:11 UTC

1) Was not able to reproduce the issue with glusterfs-3.6.0.4 
2) Fixed in http://review.gluster.org/#/c/7128/
3) Moving the bug ON_QA

Comment 5 Rahul Hinduja 2014-05-26 06:45:37 UTC

(In reply to Joseph Elwin Fernandes from comment #4)
> 1) Was not able to reproduce the issue with glusterfs-3.6.0.4 
> 2) Fixed in http://review.gluster.org/#/c/7128/
> 3) Moving the bug ON_QA

Did you confirm that the issue was reproducible with the earlier bits? If so, can you post its probable cause and what might have fixed it in the newer build?

For the review link you provided, the last build was generated on 11-April-2014 while this bug was reported on 08-April-2014, which would mean that something between those dates fixed it. Please provide the proper pointer or an analysis.

Note: This issue was fairly reproducible; I will try to reproduce it on the latest bits as well.

Comment 6 Rahul Hinduja 2014-05-26 07:11:04 UTC

Able to hit this issue with build glusterfs-3.6.0.5-1.el6rhs.x86_64, using exactly the same steps as in the Description.


Rebalance is in progress:
==========================
[root@snapshot13 ~]# gluster volume remove-brick vol0 snapshot16.lab.eng.blr.redhat.com:/brick0/b0 snapshot15.lab.eng.blr.redhat.com:/brick0/b0 status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
       snapshot15.lab.eng.blr.redhat.com               78       324.3KB           183             0             0          in progress               6.00
       snapshot16.lab.eng.blr.redhat.com                0        0Bytes           584             0             0          in progress               6.00
[root@snapshot13 ~]# 
[root@snapshot13 ~]# 

Snapshot creation fails with pre-validation:
============================================

[root@snapshot13 ~]# gluster snapshot create snap0 vol0
snapshot create: failed: Pre Validation failed on snapshot15.lab.eng.blr.redhat.com. Please check log file for details.
Pre Validation failed on snapshot16.lab.eng.blr.redhat.com. Please check log file for details.
Snapshot command failed
[root@snapshot13 ~]#

Comment 7 Joseph Elwin Fernandes 2014-05-29 05:07:57 UTC

https://bugzilla.redhat.com/show_bug.cgi?id=1101993#c1

Anand Avati 2014-05-29 01:05:14 EDT
REVIEW: http://review.gluster.org/7899 ([glusterd/snapshot] Fix for snap create preval for remote peer err msg) posted (#7) for review on master by Joseph Fernandes (josferna)

Comment 9 Joseph Elwin Fernandes 2014-06-12 06:54:58 UTC

Downstream submission:
https://code.engineering.redhat.com/gerrit/#/c/26717/

Comment 12 senaik 2015-02-20 12:21:14 UTC

Adding to the comments in Comment 11:

On an nx2 volume with a node down, snapshot creation fails with the error message "quorum is not met", whereas on an nx3 volume with a node down it fails with "One or more bricks may be down". The error message should be the same on nx2 and nx3 volumes.

Comment 14 senaik 2015-07-07 10:13:25 UTC

Version : glusterfs-3.7.1-7.el6rhs.x86_64

gluster v remove-brick vol0 replica 3 rhs-arch-srv4.lab.eng.blr.redhat.com:/rhs/brick5/b5 inception.lab.eng.blr.redhat.com:/rhs/brick6/b6 rhs-arch-srv2.lab.eng.blr.redhat.com:/rhs/brick6/b6 status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost               19        50.1KB            28             0             0          in progress               8.00
    rhs-arch-srv2.lab.eng.blr.redhat.com               13        46.8KB           176             0             0          in progress               7.00
    rhs-arch-srv4.lab.eng.blr.redhat.com               70       777.1KB            76             0             0          in progress               7.00


gluster snapshot create A1 vol0
snapshot create: failed: rebalance process is running for the volume vol0
Snapshot command failed

Marking the bug 'verified'

Comment 16 errata-xmlrpc 2015-07-29 04:31:27 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1495.html
