Bug 1277414 - [Snapshot]: Snapshot restore gets stuck in post validation.
Status: CLOSED ERRATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: snapshot
Version: 3.1
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: RHGS 3.1.3
Assigned To: Avra Sengupta
QA Contact: Anil Shah
Keywords: Triaged, ZStream
Depends On:
Blocks: 1299184 1300979 1301030
Reported: 2015-11-03 04:40 EST by Shashank Raj
Modified: 2016-11-07 22:52 EST
CC List: 7 users

See Also:
Fixed In Version: glusterfs-3.7.9-1
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Clones: 1300979
Environment:
Last Closed: 2016-06-23 00:55:54 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
restore_failure_logs (9.92 KB, text/plain)
2015-11-03 04:40 EST, Shashank Raj
no flags
Description Shashank Raj 2015-11-03 04:40:53 EST
Created attachment 1088849
restore_failure_logs

Description of problem:
After repeated snapshot restores, the restore operation gets stuck in post validation.

Version-Release number of selected component (if applicable):
glusterfs-3.7.5-5

How reproducible:
1/1

Steps to Reproduce:
1. Create a tiered volume and start it.
2. Create 10 snapshots of the volume.
3. Restore the snapshots one by one. During the 6th restore, the command fails with "Request timed out", and the logs show that the restore is stuck in post validation.
4. After that, all subsequent snapshot commands hang and time out.
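The steps above can be sketched as a dry-run command generator. This is a minimal sketch, assuming the glusterfs 3.7 CLI forms `gluster snapshot create <snapname> <volname> no-timestamp` and `gluster snapshot restore <snapname>`, and assuming the volume is stopped once before the restores, since a snapshot restore requires a stopped volume. The volume name `tiervol`, the snapshot names `snap1` to `snap10`, and both helper functions are hypothetical; creating the tiered volume itself is not shown.

```python
import subprocess


def restore_loop_commands(volname, count):
    """Build gluster CLI calls for steps 2-3: take `count` snapshots, then restore them in order."""
    cmds = []
    # Step 2: create snap1..snapN; "no-timestamp" keeps the snapshot names predictable.
    for i in range(1, count + 1):
        cmds.append(["gluster", "snapshot", "create", f"snap{i}", volname, "no-timestamp"])
    # Restore needs a stopped volume; --mode=script suppresses the interactive y/n prompt.
    cmds.append(["gluster", "--mode=script", "volume", "stop", volname])
    # Step 3: restore the snapshots one by one (in this report the 6th restore hung).
    for i in range(1, count + 1):
        cmds.append(["gluster", "--mode=script", "snapshot", "restore", f"snap{i}"])
    return cmds


def run_all(cmds, dry_run=True, timeout=300):
    """Print each command, or execute it when dry_run is False (stops at the first error or timeout)."""
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True, timeout=timeout)
```

For example, `run_all(restore_loop_commands("tiervol", 10))` only prints the 21 commands; passing `dry_run=False` would run them, and a hung restore would surface as a `subprocess.TimeoutExpired` exception.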

Actual results:

After repeated snapshot restores, the restore operation gets stuck in post validation.

Expected results:

Additional info:

Logs are attached for reference.
Comment 2 Shashank Raj 2016-01-21 07:55:49 EST
Hit this issue again with the latest build, on a volume cloned from a distributed-replicate volume: the second restore timed out, and after that all snapshot commands on that node time out.

BUILD: glusterfs-3.7.5-17

Steps followed:

1) Create a dist-replica volume and start it.
2) FUSE mount the volume and write some files from the mount point.
3) Create a snapshot of the volume and activate it.
4) Create a clone of the snapshot and mount it using FUSE.
5) Create data on the cloned volume from FUSE (file1 to file10).
6) Create a snapshot of the cloned volume (snap1).
7) Create some more data on cloned volume from FUSE (file11 to file20).
8) Create another snapshot of the cloned volume (snap2).
9) Repeat steps 5 to 8 until there are 50 files and 5 snapshots (snap1 to snap5).
10) Stop the cloned volume.
11) Restore the cloned volume to snap3 created above.
12) Start the volume and check for files. Cloned volume should have files from file1 to file30.
13) List the snapshots of the cloned volume. It should show all snapshots except snap3.
14) Stop the cloned volume and restore it again, this time to snap5.
15) Observe that the restore times out, and all subsequent snapshot commands time out on the node from which the restore command was issued.
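The expected state checked in steps 12 and 13 can be captured in a small toy model. It does not call gluster; it models the cloned volume's contents as a set of file names and relies only on what step 13 states, namely that a restored snapshot disappears from the snapshot list. The function and parameter names are hypothetical.

```python
def simulate_clone_restore(rounds=5, files_per_round=10, restore_to="snap3"):
    """Toy model of steps 5-13: returns (files on the volume, remaining snapshot names)."""
    files = set()
    snapshots = {}  # snapshot name -> frozen copy of the file set at snapshot time
    next_file = 1
    for n in range(1, rounds + 1):
        # Steps 5/7: write the next batch of files from the FUSE mount.
        for _ in range(files_per_round):
            files.add(f"file{next_file}")
            next_file += 1
        # Steps 6/8: snapshot the cloned volume.
        snapshots[f"snap{n}"] = frozenset(files)
    # Steps 10-11: with the volume stopped, restore; the restored snapshot leaves the list.
    files = set(snapshots.pop(restore_to))
    return files, sorted(snapshots)


if __name__ == "__main__":
    files, remaining = simulate_clone_restore()
    # Step 12 expects file1..file30; step 13 expects every snapshot except snap3.
    print(len(files), remaining)  # prints: 30 ['snap1', 'snap2', 'snap4', 'snap5']
```

In the report, this state was reached correctly; the failure only appeared on the following restore to snap5.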

sosreports from the nodes are available at http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1277414
Comment 3 Avra Sengupta 2016-01-22 04:22:25 EST
Reproducible every time with the following steps:

1. Create and start a volume and take 5 snapshots of it.
2. Stop the volume and restore it to snap1.
3. Keep an fd open on one of the brick backends (this simulates an umount failure on one of the nodes).
4. Restore the volume to snap2.
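Steps 3 and 4 of this reproducer can be sketched as follows. On Linux, any open file descriptor on a mounted filesystem keeps it busy, so a plain (non-lazy) umount fails with EBUSY; opening the brick directory itself read-only achieves that without writing anything to the brick. This is a minimal sketch: `brick_dir` is a hypothetical placeholder for a brick path taken from `gluster volume info`, the helper names are hypothetical, and by default the restore command is only printed.

```python
import contextlib
import os


@contextlib.contextmanager
def hold_open_fd(brick_dir):
    """Step 3: keep a read-only fd open on brick_dir so the brick mount stays busy."""
    fd = os.open(brick_dir, os.O_RDONLY)  # opens the directory itself; nothing is written
    try:
        yield fd
    finally:
        os.close(fd)


def restore_with_busy_brick(brick_dir, snapname="snap2", run=print):
    """Step 4: issue the restore while the fd from step 3 is still held open."""
    with hold_open_fd(brick_dir):
        # --mode=script suppresses the interactive confirmation prompt.
        run(["gluster", "--mode=script", "snapshot", "restore", snapname])
```

To actually execute the restore, pass something like `run=lambda cmd: subprocess.run(cmd, timeout=300)` after importing `subprocess`; on an affected build the restore is expected to hang in post validation.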
Comment 5 Avra Sengupta 2016-03-10 04:49:43 EST
Master URL: http://review.gluster.org/#/c/13282/ (MERGED)
Release 3.7 URL: http://review.gluster.org/#/c/13548/ (IN REVIEW)
Comment 6 Avra Sengupta 2016-03-11 03:10:04 EST
Master URL: http://review.gluster.org/#/c/13282/ (MERGED)
Release 3.7 URL: http://review.gluster.org/#/c/13548/ (MERGED)
Comment 8 Anil Shah 2016-04-01 06:28:19 EDT
Performed consecutive snapshot restores of 15 snapshots. No failures or "post validation failed" error messages were seen in the logs.


Bug verified on build glusterfs-3.7.9-1.el7rhgs.x86_64
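The log check from this verification can be sketched as a simple scan. The regular expression is an assumption: it loosely matches "post validation failed" case-insensitively, with an optional space or hyphen, rather than any exact glusterd message text. The function names are hypothetical, and because the glusterd log location differs between versions, the path is passed in.

```python
import re

# Assumed pattern: matches "Post Validation failed", "post-validation failed", "postvalidation failed".
POST_VALIDATION_FAILED = re.compile(r"post[\s-]?validation\s+failed", re.IGNORECASE)


def find_post_validation_failures(lines):
    """Return (line_number, text) for every log line that mentions a post validation failure."""
    return [
        (n, line.rstrip("\n"))
        for n, line in enumerate(lines, start=1)
        if POST_VALIDATION_FAILED.search(line)
    ]


def scan_log(path):
    """Scan a glusterd log file; an empty list corresponds to the clean result reported here."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return find_post_validation_failures(f)
```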
Comment 10 errata-xmlrpc 2016-06-23 00:55:54 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1240
