Description of problem:
After adding a new node to master cluster, snap restore fails due to missing geo-replication folder in the snap directory of the new brick.

Version-Release number of selected component (if applicable):
glusterfs-3.6.0.42

How reproducible:
Always

Steps to Reproduce:
1. Create and start a master slave geo-rep session.
2. Stop the session.
3. Take a snapshot.
4. Add a node to the master cluster.
5. Try to perform restore to the snapshot.

Actual results:
Restore fails because geo-replication folder in snap folder is not copied to the new node.

Expected results:
Restore should not fail.

Additional info:

# gluster vol stop test_vol && gluster snap restore snap1 && gluster vol start test_vol
Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y
volume stop: test_vol: success
Restore operation will replace the original volume with the snapshotted volume. Do you still want to continue? (y/n) y
snapshot restore: failed: Commit failed on 10.70.42.33. Please check log file for details.
Snapshot command failed

Source logs
##################
[2015-01-09 12:17:52.190749] D [glusterd-utils.c:1246:glusterd_volume_brickinfo_get] 0-management: Returning 0
[2015-01-09 12:17:52.191195] D [glusterd-utils.c:1335:glusterd_volinfo_find] 0-management: Volume test_vol found
[2015-01-09 12:17:52.191207] D [glusterd-utils.c:1342:glusterd_volinfo_find] 0-management: Returning 0
[2015-01-09 12:17:52.191289] E [glusterd-snapshot-utils.c:2899:glusterd_copy_folder] 0-management: Unable to open /var/lib/glusterd/snaps/snap1/geo-replication/test_vol_10.x.x.x_slave_vol
[2015-01-09 12:17:52.191309] E [glusterd-snapshot-utils.c:3178:glusterd_restore_geo_rep_files] 0-management: Could not copy /var/lib/glusterd/snaps/snap1/geo-replication/test_vol_10.x.x.x_slave_vol to /var/lib/glusterd/geo-replication/test_vol_10.x.x.x_slave_vol
[2015-01-09 12:17:52.191324] E [glusterd-snapshot.c:8221:gd_restore_snap_volume] 0-management: Failed to restore geo-rep files for snap snap1
[2015-01-09 12:17:52.191338] D [glusterd-utils.c:732:glusterd_volume_brickinfos_delete] 0-management: Returning 0
[2015-01-09 12:17:52.191348] D [store.c:458:gf_store_handle_destroy] 0-: Returning 0
[2015-01-09 12:17:52.191355] D [glusterd-utils.c:776:glusterd_volinfo_delete] 0-management: Returning 0
[2015-01-09 12:17:52.191362] E [glusterd-snapshot.c:836:glusterd_snapshot_restore] 0-management: Failed to restore snap for snap1
[2015-01-09 12:17:52.191368] W [glusterd-snapshot.c:6900:glusterd_snapshot] 0-management: Failed to restore snapshot
[2015-01-09 12:17:52.191375] W [glusterd-mgmt.c:224:gd_mgmt_v3_commit_fn] 0-management: Snapshot Commit Failed
[2015-01-09 12:17:52.191381] D [glusterd-mgmt.c:235:gd_mgmt_v3_commit_fn] 0-management: OP = 28. Returning -1
[2015-01-09 12:17:52.191388] E [glusterd-mgmt-handler.c:567:glusterd_handle_commit_fn] 0-management: commit failed on operation Snapshot
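
For anyone re-testing, the reproduction steps in the description can be sketched as a script. This is only a rough outline, not the exact commands used by the reporter: the slave specification, the new node's host name and the `run`/`reproduce` helpers are hypothetical placeholders, and "add a node" is assumed here to mean a peer probe. It defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Reproduction sketch for the steps in this report.
# SLAVE and NEW_NODE are placeholders (assumptions), not values from the bug.
MASTER_VOL="${MASTER_VOL:-test_vol}"
SLAVE="${SLAVE:-slave-host::slave_vol}"   # hypothetical slave host and volume
NEW_NODE="${NEW_NODE:-new-node}"          # hypothetical host name of the added node
SNAP="${SNAP:-snap1}"
DRY_RUN="${DRY_RUN:-1}"                   # 1 = print commands only, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

reproduce() {
    # 1. Create and start a master/slave geo-rep session.
    run gluster volume geo-replication "$MASTER_VOL" "$SLAVE" create push-pem &&
    run gluster volume geo-replication "$MASTER_VOL" "$SLAVE" start &&
    # 2. Stop the session.
    run gluster volume geo-replication "$MASTER_VOL" "$SLAVE" stop &&
    # 3. Take a snapshot (some later releases append a timestamp to the name).
    run gluster snapshot create "$SNAP" "$MASTER_VOL" &&
    # 4. Add a node to the master cluster (assumed to be a peer probe).
    run gluster peer probe "$NEW_NODE" &&
    # 5. Stop the volume and try to restore; --mode=script skips the y/n prompts.
    run gluster --mode=script volume stop "$MASTER_VOL" &&
    run gluster --mode=script snapshot restore "$SNAP"
}

reproduce
```

Set DRY_RUN=0 on a test cluster to actually execute the commands. On an affected build the last step is expected to fail with the "Commit failed" error shown above.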
Fixed with https://code.engineering.redhat.com/gerrit/40842 and https://code.engineering.redhat.com/gerrit/40843/
Tested on 3.6.0.45-1. Snapshot restore successful after addition of new node.

# gluster snapshot restore snap1
Snapshot restore: snap1: Snap restored successfully

Volume Name: master
Type: Distributed-Replicate
Volume ID: 894bc69c-1b46-463e-bac1-817d2ec6c667
Status: Stopped
Snap Volume: no
Number of Bricks: 4 x 2 = 8
Transport-type: tcp
Bricks:
Brick1: ccr:/var/run/gluster/snaps/3e87c2ab27214296920cc0e5b3ffc1ef/brick1/a1
Brick2: metallica:/var/run/gluster/snaps/3e87c2ab27214296920cc0e5b3ffc1ef/brick2/a2
Brick3: pinkfloyd:/var/run/gluster/snaps/3e87c2ab27214296920cc0e5b3ffc1ef/brick3/a3
Brick4: beatles:/var/run/gluster/snaps/3e87c2ab27214296920cc0e5b3ffc1ef/brick4/a4
Brick5: ccr:/var/run/gluster/snaps/3e87c2ab27214296920cc0e5b3ffc1ef/brick5/a5
Brick6: metallica:/var/run/gluster/snaps/3e87c2ab27214296920cc0e5b3ffc1ef/brick6/a6
Brick7: pinkfloyd:/var/run/gluster/snaps/3e87c2ab27214296920cc0e5b3ffc1ef/brick7/a7
Brick8: beatles:/var/run/gluster/snaps/3e87c2ab27214296920cc0e5b3ffc1ef/brick8/a8
Options Reconfigured:
performance.readdir-ahead: on
geo-replication.indexing: on
geo-replication.ignore-pid-check: on
changelog.changelog: on
auto-delete: disable
snap-max-soft-limit: 90
snap-max-hard-limit: 256
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2015-0682.html