Bug 1108652 - [SNAPSHOT]: Restore fails with prevalidation when the glusterd is restarted after the quorum didn't match
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: snapshot
Version: rhgs-3.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: RHGS 3.0.0
Assignee: rjoseph
QA Contact: Rahul Hinduja
URL:
Whiteboard: SNAPSHOT
Depends On:
Blocks: 1067342 1100282 1109024
Reported: 2014-06-12 11:42 UTC by Rahul Hinduja
Modified: 2016-09-17 12:59 UTC
CC List: 4 users

Fixed In Version: glusterfs-3.6.0.17-1.el6rhs
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 1109024
Environment:
Last Closed: 2014-09-22 19:41:19 UTC
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHEA-2014:1278 | 0 | normal | SHIPPED_LIVE | Red Hat Storage Server 3.0 bug fix and enhancement update | 2014-09-22 23:26:55 UTC

Description Rahul Hinduja 2014-06-12 11:42:29 UTC
Description of problem:
=======================

If a restore fails because quorum is not met, a subsequent restore also fails pre-validation, even after quorum is met again.

The impact is major: the volume is also marked for deletion, and its entries are removed from the volume information.

Version-Release number of selected component (if applicable):
=============================================================

glusterfs-3.6.0.16-1.el6rhs.x86_64


How reproducible:
=================
1/1


Steps to Reproduce:
===================
1. Create and start a 2x2 volume across 4 nodes.
2. Create a snapshot of the volume.
3. Kill glusterd on node2.
4. Bring down node4 (power off).
5. Stop the volume from node1 (gluster volume stop volume).
6. Restore the volume to the snapshot taken in step 2.
7. The restore should fail because quorum is not met.
8. Start glusterd on node2.
9. Restore the volume to the snapshot taken in step 2 again.
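The quorum arithmetic behind steps 3-9 can be sketched as a toy model. The rule used here, that strictly more than half of the nodes must have glusterd running, is an assumption on our part; it is consistent with the transcripts in Comment 5 (2 of 4 connected: "glusterds are not in quorum"; 3 of 4 connected: restore succeeds). The function name and node names are hypothetical. This models only the quorum precondition, not the bug itself, which was that the first, rejected restore left behind state (the volume under /var/lib/glusterd/trash) that made the second attempt fail pre-validation.

```python
"""Toy model of the glusterd quorum precondition for `gluster snapshot restore`.

Assumption (not taken from glusterd source): quorum means strictly more than
half of the nodes in the trusted storage pool have a running glusterd.
"""


def glusterd_quorum_met(glusterd_up: dict) -> bool:
    """Return True if strictly more than half of the nodes have glusterd up."""
    up = sum(1 for running in glusterd_up.values() if running)
    return 2 * up > len(glusterd_up)


if __name__ == "__main__":
    # Step 1: four-node pool (hypothetical names), glusterd running everywhere.
    nodes = {"node1": True, "node2": True, "node3": True, "node4": True}

    # Steps 3-4: glusterd killed on node2, node4 powered off -> 2 of 4 up.
    nodes["node2"] = False
    nodes["node4"] = False
    print("step 6 restore passes quorum check:", glusterd_quorum_met(nodes))

    # Step 8: glusterd restarted on node2 -> 3 of 4 up.
    nodes["node2"] = True
    print("step 9 restore passes quorum check:", glusterd_quorum_met(nodes))
```

Under this model, step 6 prints False and step 9 prints True, so the restore in step 9 should get past the quorum check; the pre-validation failure reported below came from elsewhere.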

Actual results:
===============

The restore fails pre-validation:

[root@snapshot13 ~]# gluster snapshot restore snap1
snapshot restore: failed: Pre-validation failed on localhost. Please check log file for details
Snapshot command failed
[root@snapshot13 ~]# 

Expected results:
=================

The restore should succeed, since quorum is met again.


Additional info:
================

On 2 of the machines, the glusterd trash directory contains the volume information:

[root@snapshot13 ~]# ls /var/lib/glusterd/trash/
vols-vol0.deleted
[root@snapshot13 ~]#

Comment 5 Rahul Hinduja 2014-06-16 09:44:08 UTC
Verified with build: glusterfs-3.6.0.17-1.el6rhs.x86_64

[root@snapshot13 ~]# gluster snapshot list vol0
snap1
[root@snapshot13 ~]# cat /var/lib/glusterd/snaps/missed_snaps_list 
[root@snapshot13 ~]# gluster peer status
Number of Peers: 3

Hostname: snapshot14.lab.eng.blr.redhat.com
Uuid: 359bb151-a987-4dd1-a1e6-6fe2c3c30b9e
State: Peer in Cluster (Disconnected)

Hostname: snapshot15.lab.eng.blr.redhat.com
Uuid: 262f8999-3c5e-4ccf-8efc-47ccce690ff8
State: Peer in Cluster (Connected)

Hostname: snapshot16.lab.eng.blr.redhat.com
Uuid: 4afe2c38-2cb0-432a-81ec-18799eaea5cd
State: Peer in Cluster (Disconnected)
[root@snapshot13 ~]# gluster volume stop vol0
Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y
volume stop: vol0: success
[root@snapshot13 ~]# 


[root@snapshot13 ~]# ls /var/lib/glusterd/trash/
ls: cannot access /var/lib/glusterd/trash/: No such file or directory
[root@snapshot13 ~]# cat /var/lib/glusterd/snaps/missed_snaps_list 
[root@snapshot13 ~]# gluster snapshot restore snap1
snapshot restore: failed: glusterds are not in quorum
Snapshot command failed
[root@snapshot13 ~]# ls /var/lib/glusterd/trash/
ls: cannot access /var/lib/glusterd/trash/: No such file or directory
[root@snapshot13 ~]# gluster peer status
Number of Peers: 3

Hostname: snapshot14.lab.eng.blr.redhat.com
Uuid: 359bb151-a987-4dd1-a1e6-6fe2c3c30b9e
State: Peer in Cluster (Connected)

Hostname: snapshot15.lab.eng.blr.redhat.com
Uuid: 262f8999-3c5e-4ccf-8efc-47ccce690ff8
State: Peer in Cluster (Connected)

Hostname: snapshot16.lab.eng.blr.redhat.com
Uuid: 4afe2c38-2cb0-432a-81ec-18799eaea5cd
State: Peer in Cluster (Disconnected)
[root@snapshot13 ~]# cat /var/lib/glusterd/snaps/missed_snaps_list 
[root@snapshot13 ~]# gluster snapshot restore snap1
Snapshot restore: snap1: Snap restored successfully
[root@snapshot13 ~]# cat /var/lib/glusterd/snaps/missed_snaps_list 
4afe2c38-2cb0-432a-81ec-18799eaea5cd:9e782c7e-54ac-4b88-8360-e74e072c8336=1eebeff0a2e34fe5b3ffe2460843a341:4:/var/run/gluster/snaps/1eebeff0a2e34fe5b3ffe2460843a341/brick4/b0:3:1
[root@snapshot13 ~]# ls /var/lib/glusterd/trash/
ls: cannot access /var/lib/glusterd/trash/: No such file or directory
[root@snapshot13 ~]# 





[root@snapshot16 ~]# ls /var/lib/glusterd/snaps/
missed_snaps_list
[root@snapshot16 ~]# cat /var/lib/glusterd/snaps/missed_snaps_list 
4afe2c38-2cb0-432a-81ec-18799eaea5cd:9e782c7e-54ac-4b88-8360-e74e072c8336=1eebeff0a2e34fe5b3ffe2460843a341:4:/var/run/gluster/snaps/1eebeff0a2e34fe5b3ffe2460843a341/brick4/b0:3:2
[root@snapshot16 ~]# 
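The two missed_snaps_list entries above can be compared field by field with a small parser. The field layout assumed here (node_uuid:snap_uuid=snap_vol_id:brick_num:brick_path:op:status) is our reading of the entries, not something this report documents, and the names MissedSnapEntry and parse_missed_snap_entry are hypothetical. The first UUID, 4afe2c38-..., is snapshot16's UUID in the peer status output, which was the node still down during the successful restore. The entries differ only in the last field (1 on snapshot13, 2 on snapshot16); reading that as "pending" vs. "done" is an interpretation.

```python
"""Split one line of /var/lib/glusterd/snaps/missed_snaps_list into fields.

Assumed layout (our reading, not documented in this report):
    node_uuid:snap_uuid=snap_vol_id:brick_num:brick_path:op:status
"""
from typing import NamedTuple


class MissedSnapEntry(NamedTuple):
    node_uuid: str   # UUID of the node that missed the operation
    snap_uuid: str
    snap_vol_id: str
    brick_num: int
    brick_path: str
    op: int          # assumed: snapshot operation code
    status: int      # assumed: progress of the missed operation


def parse_missed_snap_entry(line: str) -> MissedSnapEntry:
    key, sep, value = line.strip().partition("=")
    if not sep:
        raise ValueError(f"no '=' in entry: {line!r}")
    node_uuid, snap_uuid = key.split(":")
    # Take the fixed leading fields from the left and the trailing ones from
    # the right, so the brick path in the middle is kept whole.
    snap_vol_id, brick_num, rest = value.split(":", 2)
    brick_path, op, status = rest.rsplit(":", 2)
    return MissedSnapEntry(node_uuid, snap_uuid, snap_vol_id,
                           int(brick_num), brick_path, int(op), int(status))


if __name__ == "__main__":
    common = ("4afe2c38-2cb0-432a-81ec-18799eaea5cd:"
              "9e782c7e-54ac-4b88-8360-e74e072c8336="
              "1eebeff0a2e34fe5b3ffe2460843a341:4:"
              "/var/run/gluster/snaps/1eebeff0a2e34fe5b3ffe2460843a341/brick4/b0:3:")
    on_snapshot13 = parse_missed_snap_entry(common + "1")
    on_snapshot16 = parse_missed_snap_entry(common + "2")
    # Same missed operation, recorded with a different status on each node.
    print(on_snapshot13._replace(status=0) == on_snapshot16._replace(status=0))  # True
    print(on_snapshot13.status, on_snapshot16.status)  # 1 2
```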


Moving the bug to verified state.

Comment 7 errata-xmlrpc 2014-09-22 19:41:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-1278.html

