Bug 1104635
| Summary: | [SNAPSHOT]: if the node goes down before the snap is marked to be deleted, the snaps are propagated to other nodes and glusterd hangs | | | |
|---|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Rahul Hinduja <rhinduja> | |
| Component: | snapshot | Assignee: | Avra Sengupta <asengupt> | |
| Status: | CLOSED ERRATA | QA Contact: | Rahul Hinduja <rhinduja> | |
| Severity: | urgent | Docs Contact: | ||
| Priority: | urgent | |||
| Version: | rhgs-3.0 | CC: | asengupt, nsathyan, rhs-bugs, rjoseph, senaik, sharne, ssamanta, storage-qa-internal | |
| Target Milestone: | --- | Keywords: | ZStream | |
| Target Release: | RHGS 3.0.3 | |||
| Hardware: | x86_64 | |||
| OS: | Linux | |||
| Whiteboard: | SNAPSHOT | |||
| Fixed In Version: | glusterfs-3.6.0.33-1 | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 1104714 | Environment: | | |
| Last Closed: | 2015-01-15 13:37:37 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | 1086145 | |||
| Bug Blocks: | 1087818, 1104714, 1162694, 1175754 | |||
Description
Rahul Hinduja
2014-06-04 11:53:23 UTC
Verified this with build: glusterfs-3.6.0.17-1.el6rhs.x86_64
Initially had 180 snaps and started deleting them in a loop. While deletion was in progress, glusterd was brought down and back up multiple times on one server.
A few of the snap deletes failed with the message:
"snapshot delete: failed: snap snap70 might not be in an usable state.
Snapshot command failed"
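For reference, a minimal sketch of the above procedure (hypothetical snap names snap1..snap180; --mode=script suppresses the interactive y/n prompt):
# Node A: delete all snaps in a loop.
for i in $(seq 1 180); do
    gluster snapshot delete "snap$i" --mode=script
done
# Node B, while the loop above runs: bounce glusterd a few times.
for n in 1 2 3; do
    service glusterd stop
    sleep 5
    service glusterd start
    sleep 30
done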
Once the deletion loop completed and glusterd was brought back online, all the snaps except the ones that might be in an unusable state were deleted. The respective entries were marked as 2:2 in missed_snaps_list:
[root@rhs-arch-srv2 ~]# cat /var/lib/glusterd/snaps/missed_snaps_list | wc
171 171 30609
[root@rhs-arch-srv2 ~]#
[root@rhs-arch-srv2 ~]#
[root@rhs-arch-srv2 ~]# service glusterd status
glusterd (pid 19503) is running...
[root@rhs-arch-srv2 ~]# cat /var/lib/glusterd/snaps/missed_snaps_list | grep ":2:2" | wc
171 171 30609
[root@rhs-arch-srv2 ~]#
The above confirms that the snaps were marked for deletion and were successfully deleted after the handshake.
glusterd did not hang and was able to delete the snaps for which snapshot delete had failed.
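As a rough illustration of what the 171 entries above encode (assuming each missed_snaps_list line ends in :<op>:<status>, where op 2 is a missed delete, status 2 means it has been replayed, and status 1 means it is still pending — the exact field layout is an assumption here), the replayed/pending split can be counted with awk:
# Count replayed vs. still-pending missed deletes (format assumption noted above).
awk -F: '$(NF-1) == 2 && $NF == 2 { replayed++ }
         $(NF-1) == 2 && $NF == 1 { pending++ }
         END { printf "replayed: %d  pending: %d\n", replayed, pending }' \
    /var/lib/glusterd/snaps/missed_snaps_list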
[root@inception ~]# ls /var/lib/glusterd/snaps/
missed_snaps_list snap143 snap166 snap50 snap70 snap91
[root@inception ~]#
[root@inception ~]# gluster snapshot list
snap50
snap70
snap91
snap143
snap166
[root@inception ~]# gluster snapshot delete snap143
Deleting snap will erase all the information about the snap. Do you still want to continue? (y/n) y
snapshot delete: snap143: snap removed successfully
[root@inception ~]# gluster snapshot list
snap50
snap70
snap91
snap166
[root@inception ~]# gluster snapshot delete snap50
Deleting snap will erase all the information about the snap. Do you still want to continue? (y/n) y
snapshot delete: snap50: snap removed successfully
[root@inception ~]#
[root@inception ~]#
[root@inception ~]# gluster snapshot list
No snapshots present
[root@inception ~]#
[root@rhs-arch-srv2 ~]# gluster snapshot list
snap50
snap70
snap91
snap143
snap166
[root@rhs-arch-srv2 ~]# gluster snapshot list
snap50
snap70
snap91
snap166
[root@rhs-arch-srv2 ~]# gluster snapshot delete snap166
Deleting snap will erase all the information about the snap. Do you still want to continue? (y/n) y
snapshot delete: snap166: snap removed successfully
[root@rhs-arch-srv2 ~]# gluster snapshot delete snap91
Deleting snap will erase all the information about the snap. Do you still want to continue? (y/n) y
snapshot delete: snap91: snap removed successfully
[root@rhs-arch-srv2 ~]# gluster snapshot delete snap70
Deleting snap will erase all the information about the snap. Do you still want to continue? (y/n) y
snapshot delete: snap70: snap removed successfully
[root@rhs-arch-srv2 ~]#
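The per-snap deletions above could equally be scripted; a minimal sketch, assuming script mode and that gluster snapshot list prints one snap name per line when snaps exist:
# Delete every remaining snapshot without the interactive prompt.
for snap in $(gluster snapshot list); do
    gluster snapshot delete "$snap" --mode=script
done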
Moving the bug to verified state.
Version : glusterfs 3.6.0.28
========
While deleting snaps in a loop, restarted glusterd on a few nodes. Some snapshots still remained on the system because they were not marked for decommission on the nodes where glusterd went down. When glusterd came back up on those nodes, it recreated the snaps on the other nodes. So when the snap deletion is tried again, it fails. However, glusterd does not hang.

The snapshots below failed with a 'Commit failed' error when glusterd was restarted on other nodes:

gluster snapshot list
vol1_snap_6
vol1_snap_18
vol1_snap_19
vol1_snap_56
vol1_snap_71
vol1_snap_72
vol1_snap_73
vol1_snap_87
vol1_snap_107
vol1_snap_113
vol1_snap_138
vol1_snap_143
vol1_snap_177
vol1_snap_189

Delete snapshot:
gluster snapshot delete vol1_snap_6
Deleting snap will erase all the information about the snap. Do you still want to continue? (y/n) y
snapshot delete: failed: Commit failed on snapshot14.lab.eng.blr.redhat.com. Please check log file for details.
Commit failed on snapshot16.lab.eng.blr.redhat.com. Please check log file for details.
Commit failed on snapshot15.lab.eng.blr.redhat.com. Please check log file for details.
Snapshot command failed

Re-opening the bug.

Edited doc text. Please review and sign-off.

The doc text looks fine to me.

Version : glusterfs 3.6.0.37
========
Retried the steps mentioned in the Description and Comment 5; unable to reproduce the issue. Marking the bug as 'Verified'.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0038.html