1277414 – [Snapshot]: Snapshot restore gets stuck in post validation.

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	snapshot
Sub Component:
Version:	rhgs-3.1
Hardware:	x86_64
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	RHGS 3.1.3
Assignee:	Avra Sengupta
QA Contact:	Anil Shah
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1299184 1300979 1301030

Reported:	2015-11-03 09:40 UTC by Shashank Raj
Modified:	2016-11-08 03:52 UTC
CC List:	7 users
Fixed In Version:	glusterfs-3.7.9-1
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Clones:	1300979
Environment:
Last Closed:	2016-06-23 04:55:54 UTC
Embargoed:
Dependent Products:

Attachments
restore_failure_logs (9.92 KB, text/plain) 2015-11-03 09:40 UTC, Shashank Raj

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2016:1240	0	normal	SHIPPED_LIVE	Red Hat Gluster Storage 3.1 Update 3	2016-06-23 08:51:28 UTC

Description Shashank Raj 2015-11-03 09:40:53 UTC

Created attachment 1088849
restore_failure_logs

Description of problem:
After repeated snapshot restores on the same volume, a restore gets stuck in post validation.

Version-Release number of selected component (if applicable):
glusterfs-3.7.5-5

How reproducible:
1/1

Steps to Reproduce:
1. Create a tiered volume and start it.
2. Create 10 snapshots of the volume.
3. Restore the snapshots one by one. During the 6th restore, the command fails with "Request timed out", and the logs show that the restore is stuck in post validation.
4. After that, all snapshot commands get stuck and time out.
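
The restore loop in the steps above can be sketched as a dry-run command generator. This is a minimal sketch, not the reporter's actual script: the volume name `tiervol` and the snapshot names `snap1` to `snap10` are hypothetical, the tiered volume is assumed to exist already, and the stop/start around each restore is an assumption taken from the later comments, where the volume is stopped before every restore. `--mode=script` and `no-timestamp` are used only to suppress prompts and keep snapshot names predictable.

```python
def restore_loop_commands(volume="tiervol", snap_count=10):
    """Build (but do not run) the gluster commands for the restore loop.

    Names are hypothetical; the volume is assumed to exist and be started.
    """
    gluster = ["gluster", "--mode=script"]  # script mode: no interactive prompts
    snaps = [f"snap{i}" for i in range(1, snap_count + 1)]
    cmds = []

    # Step 2: create the snapshots with predictable names.
    for snap in snaps:
        cmds.append(gluster + ["snapshot", "create", snap, volume, "no-timestamp"])

    # Step 3: restore them one by one (assumption: volume is stopped
    # before each restore and started again afterwards).
    for snap in snaps:
        cmds.append(gluster + ["volume", "stop", volume])
        cmds.append(gluster + ["snapshot", "restore", snap])
        cmds.append(gluster + ["volume", "start", volume])
    return cmds


if __name__ == "__main__":
    for cmd in restore_loop_commands():
        print(" ".join(cmd))
```

Printing the commands rather than executing them keeps the sketch safe to run on a machine without Gluster; on a test cluster each list could be handed to `subprocess.run` instead.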

Actual results:

After repeated snapshot restores on the same volume, a restore gets stuck in post validation.

Expected results:

Additional info:

Logs are attached for reference.

Comment 2 Shashank Raj 2016-01-21 12:55:49 UTC

Hit this issue again with the latest build, this time on a volume cloned from a dist-rep volume: the second restore timed out, and after that all snapshot commands on that node time out.

BUILD: glusterfs-3.7.5-17

Steps followed:

1) Create a dist-replica volume and start it.
2) FUSE mount the volume and write some files from the mount point.
3) Create a snapshot of the volume and activate it.
4) Create a clone of the snapshot and mount it using FUSE.
5) Create data on the cloned volume from FUSE (file1 to file10).
6) Create a snapshot of the cloned volume (snap1).
7) Create some more data on cloned volume from FUSE (file11 to file20).
8) Create another snapshot of the cloned volume (snap2).
9) Repeat steps 5 to 8 (until 50 files and 5 snaps).
10) Stop the cloned volume.
11) Restore the cloned volume to snap3 created above.
12) Start the volume and check for files. Cloned volume should have files from file1 to file30.
13) List the snapshots of the cloned volume. It should show all snapshots except snap3.
14) Stop the cloned volume and restore it again, this time to snap5.
15) Observe that the restore times out, and that all subsequent snapshot commands also time out on the node from which the restore command was issued.
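
The expectations in steps 12 and 13 can be written down as a small model, which is handy when checking a restore by hand. This is only a sketch of the expected state, under the assumption taken from steps 5 to 9 that each snapshot adds 10 files (snapN holds file1 to file(10*N)) and that a restored snapshot disappears from the snapshot list; the function name is hypothetical and nothing here talks to Gluster.

```python
def expected_after_restore(snap_index, remaining_snaps, files_per_snap=10):
    """Return (files, snapshots) expected after restoring to snap<snap_index>.

    Models the test plan only: snapN contains file1..file(N * files_per_snap),
    and the snapshot used for the restore is no longer listed afterwards.
    """
    name = f"snap{snap_index}"
    if name not in remaining_snaps:
        raise ValueError(f"{name} is not available for restore")
    files = [f"file{i}" for i in range(1, snap_index * files_per_snap + 1)]
    snaps = [s for s in remaining_snaps if s != name]
    return files, snaps


if __name__ == "__main__":
    all_snaps = [f"snap{i}" for i in range(1, 6)]
    files, left = expected_after_restore(3, all_snaps)
    print(len(files), "files expected; snapshots left:", left)
```

For the second restore in step 14, the function would be called with index 5 and the list returned by the first call.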

The sos reports from the nodes are available under http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1277414

Comment 3 Avra Sengupta 2016-01-22 09:22:25 UTC

Reproducible every time with the following steps:

1. Create and start a volume and take 5 snapshots of it.
2. Stop the volume and restore it to snap1.
3. Keep an fd open on one of the brick backends (this simulates a umount failure on one of the nodes).
4. Restore the volume to snap2.
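
Step 3 can be done with any process that keeps a file descriptor open on the brick's filesystem, since a filesystem with open descriptors cannot be unmounted normally. A minimal sketch follows, assuming the brick backend path is known and passed in by the caller (no real brick path is given in this report); the helper name `hold_open_fd` is hypothetical.

```python
import contextlib
import os


@contextlib.contextmanager
def hold_open_fd(path):
    """Keep a read-only fd open on `path` for the duration of the block.

    While the fd is open, an ordinary umount of the filesystem containing
    `path` is expected to fail with "target is busy".
    """
    fd = os.open(path, os.O_RDONLY)  # works for directories on Linux
    try:
        yield fd
    finally:
        os.close(fd)  # release the fd so the mount can be cleaned up


if __name__ == "__main__":
    import sys

    # Usage: python hold_fd.py <brick-backend-path>
    with hold_open_fd(sys.argv[1]):
        input("fd held open; run the second restore, then press Enter...")
```

Run on one node between steps 2 and 4, this keeps the descriptor open until Enter is pressed, after which the fd is closed again.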

Comment 5 Avra Sengupta 2016-03-10 09:49:43 UTC

Master URL: http://review.gluster.org/#/c/13282/ (MERGED)
Release 3.7 URL: http://review.gluster.org/#/c/13548/ (IN REVIEW)

Comment 6 Avra Sengupta 2016-03-11 08:10:04 UTC

Master URL: http://review.gluster.org/#/c/13282/ (MERGED)
Release 3.7 URL: http://review.gluster.org/#/c/13548/ (MERGED)

Comment 8 Anil Shah 2016-04-01 10:28:19 UTC

Performed consecutive snapshot restores for 15 snapshots. Did not see any failures or "post validation failed" error messages in the logs.


Bug verified on build glusterfs-3.7.9-1.el7rhgs.x86_64

Comment 10 errata-xmlrpc 2016-06-23 04:55:54 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1240
