1180560 – [SNAPSHOT]: Snapshot restore fails after adding a node to master with geo-replication involved


Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	snapshot
Sub Component:
Version:	rhgs-3.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	RHGS 3.0.4
Assignee:	Avra Sengupta
QA Contact:	senaik
Docs Contact:
URL:
Whiteboard:	snapshot
Depends On:
Blocks:	1181418 1182947 1186192

Reported:	2015-01-09 12:58 UTC by shilpa
Modified:	2016-09-17 12:56 UTC
CC List:	9 users
Fixed In Version:	glusterfs-3.6.0.45-1
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Clones:	1181418
Environment:
Last Closed:	2015-03-26 06:35:11 UTC
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2015:0682	0	normal	SHIPPED_LIVE	Red Hat Storage 3.0 enhancement and bug fix update #4	2015-03-26 10:32:55 UTC

Description shilpa 2015-01-09 12:58:51 UTC

Description of problem: After a new node is added to the master cluster, snapshot restore fails because the geo-replication folder is missing from the snapshot directory on the new node.


Version-Release number of selected component (if applicable):
glusterfs-3.6.0.42


How reproducible: Always


Steps to Reproduce:
1. Create and start a geo-replication session between the master and slave volumes.
2. Stop the session.
3. Take a snapshot of the master volume.
4. Add a node to the master cluster.
5. Try to restore the snapshot.
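
The steps above can be sketched as a sequence of gluster CLI invocations. This is only an illustration: the slave host, slave volume and new node names are hypothetical placeholders, and it assumes the node was added with `gluster peer probe` (the report does not say whether bricks were added as well). The function only builds the argument lists and does not run anything.

```python
def reproduction_commands(master_vol, slave_host, slave_vol, snap_name, new_node):
    """Return the gluster CLI calls for the five steps as argv lists (nothing is executed)."""
    # Common prefix for every geo-replication subcommand on this session.
    session = ["gluster", "volume", "geo-replication", master_vol,
               f"{slave_host}::{slave_vol}"]
    return [
        session + ["create", "push-pem"],                          # step 1: create the session
        session + ["start"],                                       # step 1: start it
        session + ["stop"],                                        # step 2: stop the session
        ["gluster", "snapshot", "create", snap_name, master_vol],  # step 3: take a snapshot
        ["gluster", "peer", "probe", new_node],                    # step 4: add a node (assumed method)
        ["gluster", "volume", "stop", master_vol],                 # step 5: volume is stopped first
        ["gluster", "snapshot", "restore", snap_name],             # step 5: attempt the restore
    ]


if __name__ == "__main__":
    # All host and volume names below are placeholders, not values from the report.
    for argv in reproduction_commands("test_vol", "slave-host", "slave_vol",
                                      "snap1", "new-node"):
        print(" ".join(argv))
```

The volume stop and snapshot restore commands prompt for confirmation when run interactively, as the transcript under "Additional info" shows.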


Actual results:

Restore fails because the geo-replication folder inside the snapshot folder is not copied to the new node.
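
A quick way to confirm this condition on a given node is to check whether the snapshot's saved copy of the geo-replication session directory exists. The sketch below is a hypothetical helper, not part of glusterd; the directory layout is inferred from the paths in the log under "Source logs", and `/var/lib/glusterd` is assumed to be the working directory.

```python
from pathlib import Path

# Default glusterd working directory, as seen in the log paths of this report.
GLUSTERD_WORKDIR = "/var/lib/glusterd"


def snap_georep_dir(snap_name, session, workdir=GLUSTERD_WORKDIR):
    """Path of the snapshot's saved copy of one geo-replication session directory."""
    return Path(workdir) / "snaps" / snap_name / "geo-replication" / session


def restore_source_present(snap_name, session, workdir=GLUSTERD_WORKDIR):
    """True if the directory that restore copies from exists on this node."""
    return snap_georep_dir(snap_name, session, workdir).is_dir()


if __name__ == "__main__":
    # Session name taken verbatim from the log (the address is masked in the report).
    session = "test_vol_10.x.x.x_slave_vol"
    print(snap_georep_dir("snap1", session), restore_source_present("snap1", session))
```

Run on the newly added node, a `False` result would match the "Unable to open" error reported by `glusterd_copy_folder`.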

Expected results:

Restore should not fail. 

Additional info:

# gluster vol stop test_vol && gluster snap restore snap1 && gluster vol start test_vol
Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y
volume stop: test_vol: success
Restore operation will replace the original volume with the snapshotted volume. Do you still want to continue? (y/n) y
snapshot restore: failed: Commit failed on 10.70.42.33. Please check log file for details.
Snapshot command failed



Source logs
##################
[2015-01-09 12:17:52.190749] D [glusterd-utils.c:1246:glusterd_volume_brickinfo_get] 0-management: Returning 0
[2015-01-09 12:17:52.191195] D [glusterd-utils.c:1335:glusterd_volinfo_find] 0-management: Volume test_vol found
[2015-01-09 12:17:52.191207] D [glusterd-utils.c:1342:glusterd_volinfo_find] 0-management: Returning 0
[2015-01-09 12:17:52.191289] E [glusterd-snapshot-utils.c:2899:glusterd_copy_folder] 0-management: Unable to open /var/lib/glusterd/snaps/snap1/geo-replication/test_vol_10.x.x.x_slave_vol
[2015-01-09 12:17:52.191309] E [glusterd-snapshot-utils.c:3178:glusterd_restore_geo_rep_files] 0-management: Could not copy /var/lib/glusterd/snaps/snap1/geo-replication/test_vol_10.x.x.x_slave_vol to /var/lib/glusterd/geo-replication/test_vol_10.x.x.x_slave_vol
[2015-01-09 12:17:52.191324] E [glusterd-snapshot.c:8221:gd_restore_snap_volume] 0-management: Failed to restore geo-rep files for snap snap1
[2015-01-09 12:17:52.191338] D [glusterd-utils.c:732:glusterd_volume_brickinfos_delete] 0-management: Returning 0
[2015-01-09 12:17:52.191348] D [store.c:458:gf_store_handle_destroy] 0-: Returning 0
[2015-01-09 12:17:52.191355] D [glusterd-utils.c:776:glusterd_volinfo_delete] 0-management: Returning 0
[2015-01-09 12:17:52.191362] E [glusterd-snapshot.c:836:glusterd_snapshot_restore] 0-management: Failed to restore snap for snap1
[2015-01-09 12:17:52.191368] W [glusterd-snapshot.c:6900:glusterd_snapshot] 0-management: Failed to restore snapshot
[2015-01-09 12:17:52.191375] W [glusterd-mgmt.c:224:gd_mgmt_v3_commit_fn] 0-management: Snapshot Commit Failed
[2015-01-09 12:17:52.191381] D [glusterd-mgmt.c:235:gd_mgmt_v3_commit_fn] 0-management: OP = 28. Returning -1
[2015-01-09 12:17:52.191388] E [glusterd-mgmt-handler.c:567:glusterd_handle_commit_fn] 0-management: commit failed on operation Snapshot
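
To pull the error-level records out of a log excerpt like the one above, a small parser can help. This is a minimal sketch that assumes each record has the shape `[timestamp] LEVEL [file:line:function] domain: message`, as the entries in this report do; it collapses whitespace first so that records hard-wrapped by mail or a web form still parse. The names `RECORD`, `parse_records` and `errors` are illustrative only.

```python
import re

# One record: [timestamp] LEVEL [file:line:function] domain: message
RECORD = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\]\s+"
    r"(?P<level>[A-Z])\s+"
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\]\s+"
    r"(?P<domain>\S*):\s+"
    # The message runs until the next timestamp or the end of the text.
    r"(?P<msg>.*?)(?=\s*\[\d{4}-\d{2}-\d{2} |\s*$)"
)


def parse_records(text):
    """Return one dict per log record, tolerating hard-wrapped lines."""
    flat = " ".join(text.split())  # collapse newlines and runs of spaces
    return [m.groupdict() for m in RECORD.finditer(flat)]


def errors(text):
    """Return only the records logged at level E."""
    return [r for r in parse_records(text) if r["level"] == "E"]


if __name__ == "__main__":
    import sys
    # Usage: python parse_log.py < glusterd-log-excerpt
    for rec in errors(sys.stdin.read()):
        print(rec["ts"], rec["func"], rec["msg"])
```

Because whitespace is collapsed before matching, a path that was split in the middle by wrapping will come back with a space in it and needs manual repair.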

Comment 2 Avra Sengupta 2015-01-28 14:19:46 UTC

Fixed with https://code.engineering.redhat.com/gerrit/40842 and https://code.engineering.redhat.com/gerrit/40843/

Comment 3 shilpa 2015-02-19 13:12:36 UTC

Tested on glusterfs-3.6.0.45-1. Snapshot restore succeeded after a new node was added.

# gluster snapshot restore snap1
Snapshot restore: snap1: Snap restored successfully


Volume Name: master
Type: Distributed-Replicate
Volume ID: 894bc69c-1b46-463e-bac1-817d2ec6c667
Status: Stopped
Snap Volume: no
Number of Bricks: 4 x 2 = 8
Transport-type: tcp
Bricks:
Brick1: ccr:/var/run/gluster/snaps/3e87c2ab27214296920cc0e5b3ffc1ef/brick1/a1
Brick2: metallica:/var/run/gluster/snaps/3e87c2ab27214296920cc0e5b3ffc1ef/brick2/a2
Brick3: pinkfloyd:/var/run/gluster/snaps/3e87c2ab27214296920cc0e5b3ffc1ef/brick3/a3
Brick4: beatles:/var/run/gluster/snaps/3e87c2ab27214296920cc0e5b3ffc1ef/brick4/a4
Brick5: ccr:/var/run/gluster/snaps/3e87c2ab27214296920cc0e5b3ffc1ef/brick5/a5
Brick6: metallica:/var/run/gluster/snaps/3e87c2ab27214296920cc0e5b3ffc1ef/brick6/a6
Brick7: pinkfloyd:/var/run/gluster/snaps/3e87c2ab27214296920cc0e5b3ffc1ef/brick7/a7
Brick8: beatles:/var/run/gluster/snaps/3e87c2ab27214296920cc0e5b3ffc1ef/brick8/a8
Options Reconfigured:
performance.readdir-ahead: on
geo-replication.indexing: on
geo-replication.ignore-pid-check: on
changelog.changelog: on
auto-delete: disable
snap-max-soft-limit: 90
snap-max-hard-limit: 256

Comment 5 errata-xmlrpc 2015-03-26 06:35:11 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0682.html
