Bug 1107575
Summary: | [SNAPSHOT]: Restoring snapshot to a volume fails with "Commit failed on <peer_node_name>", when glusterd is restarted after creation of the snapshot | ||
---|---|---|---|
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | senaik |
Component: | snapshot | Assignee: | Joseph Elwin Fernandes <josferna> |
Status: | CLOSED ERRATA | QA Contact: | senaik |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | rhgs-3.0 | CC: | josferna, ltroan, nsathyan, rhinduja, rhs-bugs, sankarshan, ssamanta, storage-qa-internal, vagarwal |
Target Milestone: | --- | ||
Target Release: | RHGS 3.0.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | SNAPSHOT | ||
Fixed In Version: | glusterfs-3.6.0.16-1 | Doc Type: | Bug Fix |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2014-09-22 19:40:55 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1098087 | ||
Bug Blocks: |
Description
senaik
2014-06-10 09:08:02 UTC
sosreports:
==========
http://rhsqe-repo.lab.eng.blr.redhat.com/bugs_necessary_info/snapshots/1107575/

Was able to reproduce the issue with the following steps (without activate/deactivate):

1) Create volume
2) Start volume
3) Create data via nfs/gluster client
4) Create snap
5) service glusterd restart
6) Restore snap (fails, asking that the volume be stopped first)
7) Stop volume
8) Restore snap

[root@joeremote1 ~]# gluster snapshot restore snap1
snapshot restore: failed: Commit failed on joeremote2. Error Please check log file for details.
Snapshot command failed

9) Snap info

[root@joeremote1 ~]# gluster snapshot info snap1
Snapshot info : failed: Operation failed
Snapshot command failed

10) When checked, all snap bricks and volume bricks are down

Clearly not an activate/deactivate issue. Will investigate more on why this is happening.

The log suggests that copying of the quota config files failed:

[2014-06-10 10:56:38.558608] E [glusterd-utils.c:12250:glusterd_copy_quota_files] 0-management: /var/lib/glusterd/snaps/snap1/81c37f9740054c0d9f077195f7d5f3b9/quota.cksum not found
[2014-06-10 10:56:38.558627] E [glusterd-snapshot.c:7130:gd_restore_snap_volume] 0-management: Failed to restore quota files for snap snap1
[2014-06-10 10:56:38.558642] E [glusterd-snapshot.c:744:glusterd_snapshot_restore] 0-management: Failed to restore snap for snap1
[2014-06-10 10:56:38.558651] W [glusterd-snapshot.c:5888:glusterd_snapshot] 0-management: Failed to restore snapshot
[2014-06-10 10:56:38.558658] W [glusterd-mgmt.c:222:gd_mgmt_v3_commit_fn] 0-management: Snapshot Commit Failed
[2014-06-10 10:56:38.558665] E [glusterd-mgmt-handler.c:554:glusterd_handle_commit_fn] 0-management: commit failed on operation Snapshot
[2014-06-10 10:56:38.559101] I [glusterd-pmap.c:271:pmap_registry_remove] 0-pmap: removing brick /var/run/gluster/snaps/81c37f9740054c0d9f077195f7d5f3b9/brick2/brick1 on port 49154
[2014-06-10 10:56:38.559542] W [socket.c:522:__socket_rwv] 0-socket.management: writev on 10.70.43.166:1018 failed (Broken pipe)
[2014-06-10 10:56:38.559560] I [socket.c:2239:socket_event_handler] 0-transport: disconnecting now
[2014-06-10 10:56:38.559593] I [glusterd-pmap.c:271:pmap_registry_remove] 0-pmap: removing brick /var/run/gluster/snaps/81c37f9740054c0d9f077195f7d5f3b9/brick3/brick2 on port 49155
[2014-06-10 10:56:38.559903] W [socket.c:522:__socket_rwv] 0-socket.management: writev on 10.70.43.166:1017 failed (Broken pipe)
[2014-06-10 10:56:38.559918] I [socket.c:2239:socket_event_handler] 0-transport: disconnecting now

This issue is fixed by the patch http://review.gluster.org/#/c/7934/. The root cause is a quota.conf checksum mismatch when glusterd is restarted; the above patch fixes it. This bug depends on the fix for bug 1101483. The patch has been merged upstream. I have tested the fix and it works.

Version: glusterfs-3.6.0.16-1.el6rhs.x86_64
Repeated the steps as mentioned in 'Steps to Reproduce' and Comment 3. Could not reproduce the issue. Marking the bug as 'Verified'.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-1278.html
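
For reference, a minimal shell sketch of the reproduction steps above. The host names joeremote1/joeremote2 and the snapshot name snap1 come from the report; the volume name vol0, the brick paths, and the client mount are hypothetical placeholders, and the bricks are assumed to sit on thinly provisioned LVM as gluster snapshots require.

```sh
# Run on joeremote1. Names other than joeremote1/joeremote2 and snap1
# are illustrative, not taken from the original report.
gluster volume create vol0 replica 2 \
    joeremote1:/bricks/b1 joeremote2:/bricks/b1
gluster volume start vol0

# 3) create some data through a client mount (NFS or FUSE), e.g.:
#    mount -t glusterfs joeremote1:/vol0 /mnt && cp -r /etc /mnt/

gluster snapshot create snap1 vol0     # 4) create the snapshot
service glusterd restart               # 5) restart glusterd

# 6)-8) restore; the volume must be stopped first (the stop command
# prompts for confirmation unless run with --mode=script)
gluster volume stop vol0
gluster snapshot restore snap1         # fails: "Commit failed on joeremote2"
gluster snapshot info snap1            # 9) also fails after the failed restore
```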
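
The failing call in the log is glusterd_copy_quota_files, which cannot find the snapshot's quota.cksum. A quick check for that symptom, assuming the /var/lib/glusterd/snaps layout shown in the log (the UUID-named directory differs per snapshot):

```sh
# Run on every peer in the trusted pool. The snapshot name "snap1" and the
# directory layout are taken from the log messages above.
ls -l /var/lib/glusterd/snaps/snap1/*/quota.conf \
      /var/lib/glusterd/snaps/snap1/*/quota.cksum

# If quota.cksum is present on one peer but missing or mismatched on another,
# the restore commit can fail on that peer, matching the
# "Commit failed on joeremote2" output in the report.
```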