Bug 1277414 - [Snapshot]: Snapshot restore gets stuck in post validation.
Summary: [Snapshot]: Snapshot restore gets stuck in post validation.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: snapshot
Version: rhgs-3.1
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: RHGS 3.1.3
Assignee: Avra Sengupta
QA Contact: Anil Shah
URL:
Whiteboard:
Depends On:
Blocks: 1299184 1300979 1301030
Reported: 2015-11-03 09:40 UTC by Shashank Raj
Modified: 2016-11-08 03:52 UTC

Fixed In Version: glusterfs-3.7.9-1
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 1300979
Environment:
Last Closed: 2016-06-23 04:55:54 UTC
Embargoed:


Attachments
restore_failure_logs (9.92 KB, text/plain)
2015-11-03 09:40 UTC, Shashank Raj


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2016:1240 0 normal SHIPPED_LIVE Red Hat Gluster Storage 3.1 Update 3 2016-06-23 08:51:28 UTC

Description Shashank Raj 2015-11-03 09:40:53 UTC
Created attachment 1088849
restore_failure_logs

Description of problem:
After repeated consecutive restores of snapshots, the restore operation gets stuck in post validation.

Version-Release number of selected component (if applicable):
glusterfs-3.7.5-5

How reproducible:
1/1

Steps to Reproduce:
1. Create a tiered volume and start it.
2. Create 10 snapshots of the volume.
3. Restore the snapshots one by one. During the 6th restore, the command fails with "Request timed out", and the logs show that the restore is stuck in post validation.
4. After that, all snapshot commands get stuck and end in a timeout.
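
The steps above can be sketched as a small script. This is a minimal dry-run sketch, not the reporter's actual procedure: the volume name `tiervol`, the snapshot names `snap1` to `snap10`, the `no-timestamp` option, and the stop/start around each restore are assumptions (a volume has to be stopped before `gluster snapshot restore` is run). It only builds and prints the commands; nothing is executed unless `run()` is called with `dry_run=False` on a node with a running cluster.

```python
import shlex
import subprocess

GLUSTER = ["gluster", "--mode=script"]  # script mode suppresses confirmation prompts


def restore_loop_commands(volume, count=10):
    """Build the gluster commands for: take `count` snapshots, then restore each in turn."""
    names = [f"snap{i}" for i in range(1, count + 1)]  # hypothetical snapshot names
    cmds = []
    for name in names:
        # no-timestamp keeps the snapshot name exactly as given (assumption for readability)
        cmds.append(GLUSTER + ["snapshot", "create", name, volume, "no-timestamp"])
    for name in names:
        cmds.append(GLUSTER + ["volume", "stop", volume])
        cmds.append(GLUSTER + ["snapshot", "restore", name])
        cmds.append(GLUSTER + ["volume", "start", volume])
    return cmds


def run(cmds, dry_run=True, timeout=300):
    """Print each command; execute it only when dry_run is False."""
    for cmd in cmds:
        print(shlex.join(cmd))
        if not dry_run:
            # A hung restore surfaces here as subprocess.TimeoutExpired
            subprocess.run(cmd, check=True, timeout=timeout)


if __name__ == "__main__":
    run(restore_loop_commands("tiervol"))
```

With the defaults this prints 40 commands: 10 snapshot creations followed by 10 stop/restore/start triples.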

Actual results:

After repeated consecutive restores of snapshots, the restore operation gets stuck in post validation.

Expected results:

Additional info:

Logs are attached for reference.

Comment 2 Shashank Raj 2016-01-21 12:55:49 UTC
Hitting this issue again with the latest build, on a volume cloned from a dist-rep volume: the second restore timed out, and after that all snapshot commands on that node time out.

BUILD: glusterfs-3.7.5-17

Steps followed:

1) Create a dist-replica volume and start it.
2) FUSE mount the volume and write some files from the mount point.
3) Create a snapshot of the volume and activate it.
4) Create a clone of the snapshot and mount it using FUSE.
5) Create data on the cloned volume from FUSE (file 1 to file10).
6) Create a snapshot of the cloned volume (snap1).
7) Create some more data on cloned volume from FUSE (file11 to file20).
8) Create another snapshot of the cloned volume (snap2).
9) Repeat steps 5 to 8 (until 50 files and 5 snaps).
10) Stop the cloned volume.
11) Restore the cloned volume to snap3 created above.
12) Start the volume and check for files. Cloned volume should have files from file1 to file30.
13) List the snapshots of the cloned volume. It should show all snapshots except snap3.
14) Stop the cloned volume and restore it again, this time to snap5.
15) Observe that the restore times out, and that all subsequent snapshot commands also time out on the node from which the restore command was issued.
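
The expected state in steps 12 and 13 follows from simple arithmetic, sketched below. The helper names are hypothetical; the sketch assumes, as in steps 5 to 9, that each snapshot is taken after another 10 files are written, so snapN contains file1 through file(10*N), and that a restore removes the snapshot that was restored.

```python
def expected_files(snap_index, files_per_snap=10):
    """Files that should be present after restoring to snap<snap_index>."""
    return [f"file{i}" for i in range(1, snap_index * files_per_snap + 1)]


def remaining_snapshots(total, restored):
    """Snapshots still listed after the snapshots in `restored` have been consumed by a restore."""
    return [f"snap{i}" for i in range(1, total + 1) if i not in restored]


if __name__ == "__main__":
    files = expected_files(3)
    print(files[0], "to", files[-1])      # range of files expected after restoring snap3
    print(remaining_snapshots(5, {3}))    # snapshots expected in the list afterwards
```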

sos reports from the nodes are available at http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1277414

Comment 3 Avra Sengupta 2016-01-22 09:22:25 UTC
Reproducible every time with the following steps:

1. Create and start a volume and take 5 snapshots of it.
2. Stop the volume and restore it to snap1
3. Have an open fd at one of the brick backends (This step is to simulate umount failure on one of the nodes.)
4. Restore the volume to snap2.
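
One way to hold the open fd from step 3 is sketched below. This is an illustrative helper, not taken from the fix or the test suite: the function name and the brick path in the usage note are hypothetical. It relies on the general Linux behaviour that a filesystem with an open file descriptor on it is busy, so a plain umount of that filesystem fails.

```python
import contextlib
import os


@contextlib.contextmanager
def hold_open_fd(path):
    """Keep a read-only descriptor open on `path` for the duration of the with-block."""
    fd = os.open(path, os.O_RDONLY)  # works for directories as well as files on Linux
    try:
        yield fd
    finally:
        os.close(fd)  # release the descriptor so the filesystem can be unmounted again
```

Usage would look like `with hold_open_fd("/path/to/brick/mount"):` around the second restore (step 4), with the path replaced by the actual brick backend mount point on one node.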

Comment 5 Avra Sengupta 2016-03-10 09:49:43 UTC
Master URL: http://review.gluster.org/#/c/13282/ (MERGED)
Release 3.7 URL: http://review.gluster.org/#/c/13548/ (IN REVIEW)

Comment 6 Avra Sengupta 2016-03-11 08:10:04 UTC
Master URL: http://review.gluster.org/#/c/13282/ (MERGED)
Release 3.7 URL: http://review.gluster.org/#/c/13548/ (MERGED)

Comment 8 Anil Shah 2016-04-01 10:28:19 UTC
Performed consecutive restores for 15 snapshots. Did not see any failures or "post validation failed" error messages in the logs.


Bug verified on build glusterfs-3.7.9-1.el7rhgs.x86_64

Comment 10 errata-xmlrpc 2016-06-23 04:55:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1240

