Bug 1232428

Summary: [SNAPSHOT] : Snapshot delete fails with error - Snap might not be in an usable state
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: senaik
Component: snapshot
Assignee: Avra Sengupta <asengupt>
Status: CLOSED ERRATA
QA Contact: senaik
Severity: urgent
Priority: urgent
Docs Contact:
Version: rhgs-3.1
CC: asengupt, hawk, nsathyan, rhs-bugs, sgraf, storage-qa-internal, vagarwal
Target Milestone: ---
Keywords: Regression, TestBlocker, Triaged
Target Release: RHGS 3.1.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: glusterfs-3.7.1-4
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1232430 (view as bug list)
Environment:
Last Closed: 2015-07-29 05:05:11 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Embargoed:
Bug Depends On:    
Bug Blocks: 1202842, 1223636, 1232430, 1232887    

Description senaik 2015-06-16 18:02:49 UTC
Description of problem:
======================
Snapshot delete fails with error - Snap might not be in an usable state

Version-Release number of selected component (if applicable):
============================================================
glusterfs-3.7.1-3.el6rhs.x86_64

How reproducible:
=================
always


Steps to Reproduce:
==================
1. Create a 6x3 distributed-replicate volume (a setup sketch follows these steps).
2. Create a snapshot.
3. Delete the snapshot. It fails with the error below:

gluster snapshot delete S2_GMT-2015.06.16-15.55.07
Deleting snap will erase all the information about the snap. Do you still want to continue? (y/n) y
snapshot delete: failed: Snapshot S2_GMT-2015.06.16-15.55.07 might not be in an usable state.
Snapshot command failed

4. Restore the volume to one of its snapshots. This succeeds.

5. Restore the volume to another snapshot. This fails:

gluster snapshot restore S1_GMT-2015.06.16-15.54.28
Restore operation will replace the original volume with the snapshotted volume. Do you still want to continue? (y/n) y
Snapshot restore: S1_GMT-2015.06.16-15.54.28: Snap restored successfully
[root@inception ~]# gluster snapshot restore S2_GMT-2015.06.16-15.55.07
Restore operation will replace the original volume with the snapshotted volume. Do you still want to continue? (y/n) y
snapshot restore: failed: Commit failed on localhost. Please check log file for details.
Snapshot command failed
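
(For reference, a minimal sketch of the kind of brick layout step 1 assumes: gluster volume snapshots require each brick to sit on a thinly provisioned LV. All names below, i.e. the VG/LV names, mount points and hostnames, are illustrative placeholders and not the reporter's actual setup.)

# Illustrative placeholders only: one thin pool and one thin LV per brick
lvcreate -L 100G -T RHS_vg1/thinpool
lvcreate -V 50G -T RHS_vg1/thinpool -n brick1_lv
mkfs.xfs -i size=512 /dev/RHS_vg1/brick1_lv
mkdir -p /rhgs/brick1 && mount /dev/RHS_vg1/brick1_lv /rhgs/brick1

# 18 such bricks (6 per node, 3 nodes) laid out as a 6x3 distributed-replicate volume;
# each group of 3 consecutive bricks forms one replica set
bricks=$(for b in {1..6}; do printf 'node%s:/rhgs/brick%s/b ' 1 $b 2 $b 3 $b; done)
gluster volume create vol0 replica 3 $bricks
gluster volume start vol0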


------------------Part of the log------------------
[2015-06-16 17:56:23.496062] E [MSGID: 106044] [glusterd-snapshot.c:2804:glusterd_lvm_snapshot_remove] 0-management: Failed to remove the snapshot /var/run/gluster/snaps/5f9c535f5cad499d88f75c85c83f22ba/brick5/b2 (/dev/RHS_vg2/5f9c535f5cad499d88f75c85c83f22ba_1)
[2015-06-16 17:56:23.496125] W [MSGID: 106073] [glusterd-snapshot.c:2622:glusterd_do_lvm_snapshot_remove] 0-management: Getting the root of the brick for volume 5f9c535f5cad499d88f75c85c83f22ba (snap S2_GMT-2015.06.16-15.55.07) failed. Removing lv (/dev/RHS_vg3/5f9c535f5cad499d88f75c85c83f22ba_2).
[2015-06-16 17:56:23.684402] E [MSGID: 106044] [glusterd-snapshot.c:2676:glusterd_do_lvm_snapshot_remove] 0-management: removing snapshot of the brick (inception.lab.eng.blr.redhat.com:/var/run/gluster/snaps/5f9c535f5cad499d88f75c85c83f22ba/brick9/b3) of device /dev/RHS_vg3/5f9c535f5cad499d88f75c85c83f22ba_2 failed
[2015-06-16 17:56:23.684484] E [MSGID: 106044] [glusterd-snapshot.c:2804:glusterd_lvm_snapshot_remove] 0-management: Failed to remove the snapshot /var/run/gluster/snaps/5f9c535f5cad499d88f75c85c83f22ba/brick9/b3 (/dev/RHS_vg3/5f9c535f5cad499d88f75c85c83f22ba_2)
[2015-06-16 17:56:23.684589] W [MSGID: 106073] [glusterd-snapshot.c:2622:glusterd_do_lvm_snapshot_remove] 0-management: Getting the root of the brick for volume 5f9c535f5cad499d88f75c85c83f22ba (snap S2_GMT-2015.06.16-15.55.07) failed. Removing lv (/dev/RHS_vg5/5f9c535f5cad499d88f75c85c83f22ba_3).
[2015-06-16 17:56:23.869829] E [MSGID: 106044] [glusterd-snapshot.c:2676:glusterd_do_lvm_snapshot_remove] 0-management: removing snapshot of the brick (inception.lab.eng.blr.redhat.com:/var/run/gluster/snaps/5f9c535f5cad499d88f75c85c83f22ba/brick13/b5) of device /dev/RHS_vg5/5f9c535f5cad499d88f75c85c83f22ba_3 failed
[2015-06-16 17:56:23.869916] E [MSGID: 106044] [glusterd-snapshot.c:2804:glusterd_lvm_snapshot_remove] 0-management: Failed to remove the snapshot /var/run/gluster/snaps/5f9c535f5cad499d88f75c85c83f22ba/brick13/b5 (/dev/RHS_vg5/5f9c535f5cad499d88f75c85c83f22ba_3)
[2015-06-16 17:56:23.870004] W [MSGID: 106073] [glusterd-snapshot.c:2622:glusterd_do_lvm_snapshot_remove] 0-management: Getting the root of the brick for volume 5f9c535f5cad499d88f75c85c83f22ba (snap S2_GMT-2015.06.16-15.55.07) failed. Removing lv (/dev/RHS_vg6/5f9c535f5cad499d88f75c85c83f22ba_4).
[2015-06-16 17:56:24.049981] E [MSGID: 106044] [glusterd-snapshot.c:2676:glusterd_do_lvm_snapshot_remove] 0-management: removing snapshot of the brick (inception.lab.eng.blr.redhat.com:/var/run/gluster/snaps/5f9c535f5cad499d88f75c85c83f22ba/brick17/b6) of device /dev/RHS_vg6/5f9c535f5cad499d88f75c85c83f22ba_4 failed
[2015-06-16 17:56:24.050058] E [MSGID: 106044] [glusterd-snapshot.c:2804:glusterd_lvm_snapshot_remove] 0-management: Failed to remove the snapshot /var/run/gluster/snaps/5f9c535f5cad499d88f75c85c83f22ba/brick17/b6 (/dev/RHS_vg6/5f9c535f5cad499d88f75c85c83f22ba_4)
[2015-06-16 17:56:24.050328] W [MSGID: 106033] [glusterd-snapshot.c:2850:glusterd_lvm_snapshot_remove] 0-management: Failed to rmdir: /var/run/gluster/snaps/5f9c535f5cad499d88f75c85c83f22ba/, err: Directory not empty. More than one glusterd running on this node. [Directory not empty]
[2015-06-16 17:56:24.050358] W [MSGID: 106044] [glusterd-snapshot.c:2921:glusterd_snap_volume_remove] 0-management: Failed to remove lvm snapshot volume 5f9c535f5cad499d88f75c85c83f22ba
[2015-06-16 17:56:24.050376] W [MSGID: 106044] [glusterd-snapshot.c:3017:glusterd_snap_remove] 0-management: Failed to remove volinfo 5f9c535f5cad499d88f75c85c83f22ba for snap S2_GMT-2015.06.16-15.55.07
[2015-06-16 17:56:24.050399] E [MSGID: 106044] [glusterd-snapshot.c:6011:glusterd_snapshot_remove_commit] 0-management: Failed to remove snap S2_GMT-2015.06.16-15.55.07
[2015-06-16 17:56:24.050419] E [MSGID: 106044] [glusterd-snapshot.c:8007:glusterd_snapshot] 0-management: Failed to delete snapshot
[2015-06-16 17:56:24.050453] W [MSGID: 106123] [glusterd-mgmt.c:242:gd_mgmt_v3_commit_fn] 0-management: Snapshot Commit Failed
[2015-06-16 17:56:24.050472] E [MSGID: 106123] [glusterd-mgmt.c:1240:glusterd_mgmt_v3_commit] 0-management: Commit failed for operation Snapshot on local node
[2015-06-16 17:56:24.050488] E [MSGID: 106123] [glusterd-mgmt.c:2112:glusterd_mgmt_v3_initiate_snap_phases] 0-management: Commit Op Failed
----------------------------------------------------------------------

Actual results:
Snapshot delete fails with "Snapshot <snapname> might not be in an usable state", and a subsequent restore to that snapshot fails with "Commit failed on localhost."

Expected results:
Snapshot delete should succeed, and the snapshot's LVs and brick mount directories should be cleaned up.

Additional info:
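A hedged sketch of commands for inspecting what the failed delete leaves behind on an affected node; the snap volume id and paths are taken from the log above, and this is only an inspection aid, not a workaround:

# Leftover snapshot LVs for the snap volume id seen in the log
lvs -o vg_name,lv_name,pool_lv,origin | grep 5f9c535f5cad499d88f75c85c83f22ba
# Brick mounts and the directory the rmdir complained about ("Directory not empty")
mount | grep /var/run/gluster/snaps/5f9c535f5cad499d88f75c85c83f22ba
ls /var/run/gluster/snaps/5f9c535f5cad499d88f75c85c83f22ba/
# The same warning also mentions more than one glusterd running on the node; check for that
pgrep -af glusterd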

Comment 2 senaik 2015-06-17 06:18:28 UTC
Snapshot delete worked in the previous build, glusterfs-3.7.1-2.el6rhs.x86_64. This is a regression introduced in the latest build, glusterfs-3.7.1-3.el6rhs.x86_64.

Comment 6 senaik 2015-06-19 09:43:48 UTC
Version : glusterfs-3.7.1-4.el6rhs.x86_64

Deleting a snapshot is successful. Marking the bug 'Verified'.

Performed the following steps:

gluster snapshot create S1 vol0
snapshot create: success: Snap S1_GMT-2015.06.19-09.32.42 created successfully

[root@inception ~]# gluster snapshot activate S1_GMT-2015.06.19-09.32.42
Snapshot activate: S1_GMT-2015.06.19-09.32.42: Snap activated successfully

[root@inception ~]# gluster snapshot delete S1_GMT-2015.06.19-09.32.42
Deleting snap will erase all the information about the snap. Do you still want to continue? (y/n) y
snapshot delete: S1_GMT-2015.06.19-09.32.42: snap removed successfully

Also performed back-to-back snapshot restores on the volume; both were successful:

gluster snapshot restore S12_GMT-2015.06.19-09.38.37
Restore operation will replace the original volume with the snapshotted volume. Do you still want to continue? (y/n) y
Snapshot restore: S12_GMT-2015.06.19-09.38.37: Snap restored successfully

[root@inception ~]# gluster snapshot restore S12_GMT-2015.06.19-09.39.41
Restore operation will replace the original volume with the snapshotted volume. Do you still want to continue? (y/n) y
Snapshot restore: S12_GMT-2015.06.19-09.39.41: Snap restored successfully

Comment 7 Richard Neuboeck 2015-07-17 06:00:02 UTC
I can confirm that the problem (still) exists in glusterfs 3.7.2

Version: glusterfs-3.7.2-3.el7.x86_64 (gluster repo)
OS: CentOS 7.1 64bit

Steps to recreate:

# gluster snapshot create snap1 tvol1 description 'test snapshot'
snapshot create: success: Snap snap1_GMT-2015.07.16-11.16.03 created successfully
# gluster snapshot list
snap1_GMT-2015.07.16-11.16.03

# gluster snapshot delete snap1_GMT-2015.07.16-11.16.03
Deleting snap will erase all the information about the snap. Do you still want to continue? (y/n) y
snapshot delete: failed: Snapshot snap1_GMT-2015.07.16-11.16.03 might not be in an usable state.
Snapshot command failed

# gluster snapshot delete all
System contains 1 snapshot(s).
Do you still want to continue and delete them?  (y/n) y
snapshot delete: failed: Snapshot snap1_GMT-2015.07.16-11.16.03 might not be in an usable state.
Snapshot command failed

Comment 8 Avra Sengupta 2015-07-17 07:08:07 UTC
Richard, the build you are using appears to be the latest upstream gluster release, while this bug tracks the downstream RHGS product. Can we please continue further investigation of the issue on the upstream bug (https://bugzilla.redhat.com/show_bug.cgi?id=1232430)?

Could you also update that bug with the details of your setup: what kind of volume you are using, how many bricks are in the volume, and how many nodes are in the cluster. Please also attach to that bug all the logs present in /var/log/glusterfs/ on every node in the cluster.
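
For reference, one possible way to collect those logs from every node; the hostnames are placeholders, and this assumes ssh access to the cluster members:

for h in node1 node2 node3; do
    # stream /var/log/glusterfs from each node into a local per-node tarball
    ssh "$h" tar czf - /var/log/glusterfs > "glusterfs-logs-$h.tar.gz"
done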

Comment 9 errata-xmlrpc 2015-07-29 05:05:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1495.html