Bug 1180560

Summary: [SNAPSHOT]: Snapshot restore fails after adding a node to master with geo-replication involved
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: shilpa <smanjara>
Component: snapshot
Assignee: Avra Sengupta <asengupt>
Status: CLOSED ERRATA
QA Contact: senaik
Severity: high
Docs Contact:
Priority: high
Version: rhgs-3.0
CC: annair, asengupt, nsathyan, rcyriac, rhs-bugs, rjoseph, senaik, spandit, storage-qa-internal
Target Milestone: ---
Keywords: ZStream
Target Release: RHGS 3.0.4
Hardware: Unspecified
OS: Unspecified
Whiteboard: snapshot
Fixed In Version: glusterfs-3.6.0.45-1
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Clones: 1181418
Environment:
Last Closed: 2015-03-26 06:35:11 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1181418, 1182947, 1186192

Description shilpa 2015-01-09 12:58:51 UTC
Description of problem: After adding a new node to the master cluster, snapshot restore fails because the geo-replication folder is missing from the snapshot directory on the new node.


Version-Release number of selected component (if applicable):
glusterfs-3.6.0.42


How reproducible: Always


Steps to Reproduce:
1. Create and start a master-slave geo-replication session.
2. Stop the session.
3. Take a snapshot.
4. Add a node to the master cluster.
5. Restore the volume to the snapshot.
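The steps above can be sketched as a shell script driving the gluster CLI. This is a hypothetical reproduction aid, not part of the original report: MASTER_VOL, SLAVE, SNAP, NEW_NODE and their defaults are placeholders, and DRY_RUN is an invented switch that only prints the commands. It assumes passwordless SSH to the slave and the common pem keys (gluster system:: execute gsec_create) are already set up, and that the bricks sit on thinly provisioned LVM so a snapshot can be taken.

```shell
#!/usr/bin/env bash
# Hypothetical reproduction sketch. MASTER_VOL, SLAVE, SNAP and NEW_NODE are
# placeholders (not from the report); override them through the environment.

# Print a command, then run it unless DRY_RUN is set to a non-empty value.
run() {
    echo "+ $*"
    if [[ -z "${DRY_RUN:-}" ]]; then
        "$@"
    fi
}

# --mode=script makes the gluster CLI skip its interactive (y/n) prompts.
gl() {
    run gluster --mode=script "$@"
}

# The body runs in a subshell, so `set -eu` does not leak into the caller.
reproduce() (
    set -eu
    master=${MASTER_VOL:-test_vol}
    slave=${SLAVE:-slavehost::slave_vol}
    snap=${SNAP:-snap1}
    new_node=${NEW_NODE:-newnode}

    # 1. Create and start the geo-replication session.
    gl volume geo-replication "$master" "$slave" create push-pem
    gl volume geo-replication "$master" "$slave" start
    # 2. Stop the session.
    gl volume geo-replication "$master" "$slave" stop
    # 3. Snapshot the master volume.
    gl snapshot create "$snap" "$master"
    # 4. Add a node to the master cluster (trusted storage pool).
    gl peer probe "$new_node"
    # 5. Restore requires the volume to be stopped first.
    gl volume stop "$master"
    gl snapshot restore "$snap"
)
```

For example, DRY_RUN=1 reproduce prints the seven commands without touching the cluster. Without DRY_RUN, on an affected build (the report used glusterfs-3.6.0.42) the final snapshot restore would be expected to fail on the newly added node, as shown under Additional info.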


Actual results:

Restore fails because the geo-replication folder in the snapshot folder is not copied to the new node.

Expected results:

Restore should not fail. 

Additional info:

# gluster vol stop test_vol && gluster snap restore snap1 && gluster vol start test_vol
Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y
volume stop: test_vol: success
Restore operation will replace the original volume with the snapshotted volume. Do you still want to continue? (y/n) y
snapshot restore: failed: Commit failed on 10.70.42.33. Please check log file for details.
Snapshot command failed



Source logs
##################
[2015-01-09 12:17:52.190749] D [glusterd-utils.c:1246:glusterd_volume_brickinfo_get] 0-management: Returning 0
[2015-01-09 12:17:52.191195] D [glusterd-utils.c:1335:glusterd_volinfo_find] 0-management: Volume test_vol found
[2015-01-09 12:17:52.191207] D [glusterd-utils.c:1342:glusterd_volinfo_find] 0-management: Returning 0
[2015-01-09 12:17:52.191289] E [glusterd-snapshot-utils.c:2899:glusterd_copy_folder] 0-management: Unable to open /var/lib/glusterd/snaps/snap1/geo-replication/test_vol_10.x.x.x_slave_vol
[2015-01-09 12:17:52.191309] E [glusterd-snapshot-utils.c:3178:glusterd_restore_geo_rep_files] 0-management: Could not copy /var/lib/glusterd/snaps/snap1/geo-replication/test_vol_10.x.x.x_slave_vol to /var/lib/glusterd/geo-replication/test_vol_10.x.x.x_slave_vol
[2015-01-09 12:17:52.191324] E [glusterd-snapshot.c:8221:gd_restore_snap_volume] 0-management: Failed to restore geo-rep files for snap snap1
[2015-01-09 12:17:52.191338] D [glusterd-utils.c:732:glusterd_volume_brickinfos_delete] 0-management: Returning 0
[2015-01-09 12:17:52.191348] D [store.c:458:gf_store_handle_destroy] 0-: Returning 0
[2015-01-09 12:17:52.191355] D [glusterd-utils.c:776:glusterd_volinfo_delete] 0-management: Returning 0
[2015-01-09 12:17:52.191362] E [glusterd-snapshot.c:836:glusterd_snapshot_restore] 0-management: Failed to restore snap for snap1
[2015-01-09 12:17:52.191368] W [glusterd-snapshot.c:6900:glusterd_snapshot] 0-management: Failed to restore snapshot
[2015-01-09 12:17:52.191375] W [glusterd-mgmt.c:224:gd_mgmt_v3_commit_fn] 0-management: Snapshot Commit Failed
[2015-01-09 12:17:52.191381] D [glusterd-mgmt.c:235:gd_mgmt_v3_commit_fn] 0-management: OP = 28. Returning -1
[2015-01-09 12:17:52.191388] E [glusterd-mgmt-handler.c:567:glusterd_handle_commit_fn] 0-management: commit failed on operation Snapshot
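The first error in the log shows that glusterd on the failing node could not open the snapshot's copy of the geo-replication session directory. A minimal diagnostic sketch, run on each node before attempting the restore, can confirm whether that directory exists. It assumes the default glusterd working directory /var/lib/glusterd and the snaps/<snap>/geo-replication/<session> layout visible in the log; check_snap_georep is a hypothetical helper, not a gluster command.

```shell
#!/usr/bin/env bash
# Hypothetical pre-restore check. The layout
#   <glusterd_dir>/snaps/<snap>/geo-replication/<session>
# is taken from the log above; /var/lib/glusterd is an assumed default.
#
# Usage: check_snap_georep SNAP SESSION [GLUSTERD_DIR]
# Returns 0 if the snapshot's copy of the session directory exists;
# otherwise prints the missing path to stderr and returns 1.
check_snap_georep() {
    local snap=$1 session=$2 gd_dir=${3:-/var/lib/glusterd}
    local path="$gd_dir/snaps/$snap/geo-replication/$session"
    if [[ -d "$path" ]]; then
        return 0
    fi
    echo "missing: $path" >&2
    return 1
}
```

For example, check_snap_georep snap1 <session> (substituting the real session name, which the log above elides as test_vol_10.x.x.x_slave_vol) would be expected to report the path as missing on 10.70.42.33, the node where the commit failed.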

Comment 3 shilpa 2015-02-19 13:12:36 UTC
Tested on glusterfs-3.6.0.45-1. Snapshot restore succeeds after adding a new node.

# gluster snapshot restore snap1
Snapshot restore: snap1: Snap restored successfully


Volume Name: master
Type: Distributed-Replicate
Volume ID: 894bc69c-1b46-463e-bac1-817d2ec6c667
Status: Stopped
Snap Volume: no
Number of Bricks: 4 x 2 = 8
Transport-type: tcp
Bricks:
Brick1: ccr:/var/run/gluster/snaps/3e87c2ab27214296920cc0e5b3ffc1ef/brick1/a1
Brick2: metallica:/var/run/gluster/snaps/3e87c2ab27214296920cc0e5b3ffc1ef/brick2/a2
Brick3: pinkfloyd:/var/run/gluster/snaps/3e87c2ab27214296920cc0e5b3ffc1ef/brick3/a3
Brick4: beatles:/var/run/gluster/snaps/3e87c2ab27214296920cc0e5b3ffc1ef/brick4/a4
Brick5: ccr:/var/run/gluster/snaps/3e87c2ab27214296920cc0e5b3ffc1ef/brick5/a5
Brick6: metallica:/var/run/gluster/snaps/3e87c2ab27214296920cc0e5b3ffc1ef/brick6/a6
Brick7: pinkfloyd:/var/run/gluster/snaps/3e87c2ab27214296920cc0e5b3ffc1ef/brick7/a7
Brick8: beatles:/var/run/gluster/snaps/3e87c2ab27214296920cc0e5b3ffc1ef/brick8/a8
Options Reconfigured:
performance.readdir-ahead: on
geo-replication.indexing: on
geo-replication.ignore-pid-check: on
changelog.changelog: on
auto-delete: disable
snap-max-soft-limit: 90
snap-max-hard-limit: 256

Comment 5 errata-xmlrpc 2015-03-26 06:35:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0682.html