Bug 1279730 - guest paused due to IO error from gluster based storage doesn't resume automatically or manually
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: write-behind
Version: mainline
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Raghavendra G
QA Contact:
URL:
Whiteboard: gluster
Depends On:
Blocks: 1293534
 
Reported: 2015-11-10 07:09 UTC by Raghavendra G
Modified: 2016-06-16 13:43 UTC

Fixed In Version: glusterfs-3.8rc2
Clone Of: 1171261
: 1293534 (view as bug list)
Environment:
Last Closed: 2016-06-16 13:43:35 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Comment 1 Krutika Dhananjay 2015-11-10 12:15:53 UTC
If a guest whose disk is on a gluster volume is paused due to a storage error, it
does not resume when the storage comes back up. Such guests cannot be resumed manually either.
The disks in the storage domain are readable and writable (touch).

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1.
2.
3.

Actual results:
The VM is paused and does not resume, automatically or manually, once the storage is up.

Expected results:
The VM unpauses automatically once the storage domain comes up.

Comment 2 Vijay Bellur 2015-11-17 07:38:17 UTC
REVIEW: http://review.gluster.org/12593 (performance/write-behind: retry "failed syncs to backend") posted (#1) for review on release-3.7 by Raghavendra G (rgowdapp)

Comment 3 Vijay Bellur 2015-11-17 07:40:15 UTC
REVIEW: http://review.gluster.org/12594 (performance/write-behind: retry "failed syncs to backend") posted (#1) for review on master by Raghavendra G (rgowdapp)

Comment 4 Vijay Bellur 2015-11-17 08:31:38 UTC
REVIEW: http://review.gluster.org/12594 (performance/write-behind: retry "failed syncs to backend") posted (#2) for review on master by Raghavendra G (rgowdapp)

Comment 5 Vijay Bellur 2015-11-17 18:31:40 UTC
REVIEW: http://review.gluster.org/12594 (performance/write-behind: retry "failed syncs to backend") posted (#3) for review on master by Raghavendra G (rgowdapp)

Comment 6 Vijay Bellur 2015-11-18 07:32:12 UTC
REVIEW: http://review.gluster.org/12594 (performance/write-behind: retry "failed syncs to backend") posted (#4) for review on master by Raghavendra G (rgowdapp)

Comment 7 Vijay Bellur 2015-11-18 07:34:45 UTC
REVIEW: http://review.gluster.org/12594 (performance/write-behind: retry "failed syncs to backend") posted (#5) for review on master by Raghavendra G (rgowdapp)

Comment 8 Vijay Bellur 2015-11-18 10:38:17 UTC
REVIEW: http://review.gluster.org/12594 (performance/write-behind: retry "failed syncs to backend") posted (#6) for review on master by Raghavendra G (rgowdapp)

Comment 9 Vijay Bellur 2015-11-18 17:43:57 UTC
REVIEW: http://review.gluster.org/12594 (performance/write-behind: retry "failed syncs to backend") posted (#7) for review on master by Raghavendra G (rgowdapp)

Comment 10 Vijay Bellur 2015-11-19 06:59:49 UTC
REVIEW: http://review.gluster.org/12594 (performance/write-behind: retry "failed syncs to backend") posted (#8) for review on master by Raghavendra G (rgowdapp)

Comment 11 Vijay Bellur 2015-11-19 08:01:20 UTC
REVIEW: http://review.gluster.org/12594 (performance/write-behind: retry "failed syncs to backend") posted (#9) for review on master by Raghavendra G (rgowdapp)

Comment 12 Vijay Bellur 2015-11-20 07:27:54 UTC
REVIEW: http://review.gluster.org/12594 (performance/write-behind: retry "failed syncs to backend") posted (#10) for review on master by Raghavendra G (rgowdapp)

Comment 13 Vijay Bellur 2015-11-20 09:42:11 UTC
REVIEW: http://review.gluster.org/12594 (performance/write-behind: retry "failed syncs to backend") posted (#11) for review on master by Raghavendra G (rgowdapp)

Comment 14 Vijay Bellur 2015-11-23 07:41:35 UTC
REVIEW: http://review.gluster.org/12594 (performance/write-behind: retry "failed syncs to backend") posted (#12) for review on master by Raghavendra G (rgowdapp)

Comment 15 Vijay Bellur 2015-11-25 05:57:39 UTC
REVIEW: http://review.gluster.org/12594 (performance/write-behind: retry "failed syncs to backend") posted (#13) for review on master by Raghavendra G (rgowdapp)

Comment 16 Vijay Bellur 2015-11-25 05:59:38 UTC
REVIEW: http://review.gluster.org/12594 (performance/write-behind: retry "failed syncs to backend") posted (#14) for review on master by Raghavendra G (rgowdapp)

Comment 17 Vijay Bellur 2015-11-25 10:11:00 UTC
REVIEW: http://review.gluster.org/12594 (performance/write-behind: retry "failed syncs to backend") posted (#15) for review on master by Raghavendra G (rgowdapp)

Comment 18 Vijay Bellur 2015-11-25 18:02:44 UTC
REVIEW: http://review.gluster.org/12594 (performance/write-behind: retry "failed syncs to backend") posted (#16) for review on master by Raghavendra G (rgowdapp)

Comment 19 Vijay Bellur 2015-11-25 19:02:03 UTC
REVIEW: http://review.gluster.org/12594 (performance/write-behind: retry "failed syncs to backend") posted (#17) for review on master by Raghavendra G (rgowdapp)

Comment 20 Vijay Bellur 2015-11-26 04:36:02 UTC
REVIEW: http://review.gluster.org/12594 (performance/write-behind: retry "failed syncs to backend") posted (#18) for review on master by Raghavendra G (rgowdapp)

Comment 21 Vijay Bellur 2015-11-27 06:58:14 UTC
REVIEW: http://review.gluster.org/12594 (performance/write-behind: retry "failed syncs to backend") posted (#19) for review on master by Raghavendra G (rgowdapp)

Comment 22 Vijay Bellur 2015-11-30 17:25:39 UTC
REVIEW: http://review.gluster.org/12594 (performance/write-behind: retry "failed syncs to backend") posted (#20) for review on master by Raghavendra G (rgowdapp)

Comment 23 Vijay Bellur 2015-12-05 13:26:05 UTC
REVIEW: http://review.gluster.org/12594 (performance/write-behind: retry "failed syncs to backend") posted (#21) for review on master by Raghavendra G (rgowdapp)

Comment 24 Vijay Bellur 2015-12-07 11:33:28 UTC
REVIEW: http://review.gluster.org/12594 (performance/write-behind: retry "failed syncs to backend") posted (#22) for review on master by Raghavendra G (rgowdapp)

Comment 25 Vijay Bellur 2015-12-08 05:03:54 UTC
REVIEW: http://review.gluster.org/12594 (performance/write-behind: retry "failed syncs to backend") posted (#23) for review on master by Raghavendra G (rgowdapp)

Comment 26 Vijay Bellur 2015-12-08 21:47:22 UTC
REVIEW: http://review.gluster.org/12594 (performance/write-behind: retry "failed syncs to backend") posted (#24) for review on master by Vijay Bellur (vbellur)

Comment 27 Vijay Bellur 2015-12-09 06:49:32 UTC
REVIEW: http://review.gluster.org/12594 (performance/write-behind: retry "failed syncs to backend") posted (#25) for review on master by Raghavendra G (rgowdapp)

Comment 28 Vijay Bellur 2015-12-10 11:00:18 UTC
REVIEW: http://review.gluster.org/12594 (performance/write-behind: retry "failed syncs to backend") posted (#26) for review on master by Raghavendra G (rgowdapp)

Comment 29 Vijay Bellur 2015-12-17 05:14:29 UTC
REVIEW: http://review.gluster.org/12594 (performance/write-behind: retry "failed syncs to backend") posted (#27) for review on master by Raghavendra G (rgowdapp)

Comment 30 Vijay Bellur 2015-12-20 14:29:45 UTC
REVIEW: http://review.gluster.org/12594 (performance/write-behind: retry "failed syncs to backend") posted (#28) for review on master by Raghavendra G (rgowdapp)

Comment 31 Vijay Bellur 2015-12-21 09:11:14 UTC
REVIEW: http://review.gluster.org/12594 (performance/write-behind: retry "failed syncs to backend") posted (#29) for review on master by Raghavendra G (rgowdapp)

Comment 32 Vijay Bellur 2015-12-21 14:53:52 UTC
REVIEW: http://review.gluster.org/12594 (performance/write-behind: retry "failed syncs to backend") posted (#30) for review on master by Raghavendra G (rgowdapp)

Comment 33 Vijay Bellur 2015-12-21 16:03:40 UTC
REVIEW: http://review.gluster.org/12594 (performance/write-behind: retry "failed syncs to backend") posted (#31) for review on master by Raghavendra G (rgowdapp)

Comment 34 Vijay Bellur 2015-12-22 09:55:25 UTC
REVIEW: http://review.gluster.org/12594 (performance/write-behind: retry "failed syncs to backend") posted (#32) for review on master by Raghavendra G (rgowdapp)

Comment 35 Vijay Bellur 2015-12-22 09:56:00 UTC
COMMIT: http://review.gluster.org/12594 committed in master by Raghavendra G (rgowdapp) 
------
commit 3fcead2de7bcdb4e1312f37e7e750abd8d9d9770
Author: Raghavendra G <rgowdapp>
Date:   Tue Nov 17 12:57:54 2015 +0530

    performance/write-behind: retry "failed syncs to backend"
    
    1. When sync fails, the cached-write is still preserved unless there
       is a flush/fsync waiting on it.
    
    2. When a sync fails and there is a flush/fsync waiting on the
       cached-write, the cache is thrown away and no further retries will
       be made. In other words flush/fsync act as barriers for all the
       previous writes. The behaviour of fsync acting as a barrier is
       controlled by an option (see below for details). All previous
       writes are either successfully synced to backend or forgotten in
       case of an error. Without such a barrier fop (especially flush,
       which is issued prior to a close), we end up retrying forever even
       after the fd is closed.
    
    3. If a fop is waiting on cached-write and syncing to backend fails,
       the waiting fop is failed.
    
    4. Sync failures when no fop is waiting are ignored and are not
       propagated to the application. For example:
       a. the first attempt to sync a cached-write w1 fails
       b. the second attempt to sync w1 succeeds
    
       If no fops dependent on w1 are issued between a and b, the
       application won't know about the failure encountered in a.
    
    5. The effect of repeated sync failures is that there will be no
       cache for future writes and they cannot be written behind.
    
    fsync as a barrier and resync of cached writes post fsync failure:
    ==================================================================
    Whether to keep retrying failed syncs post fsync is controlled by an
    option "resync-failed-syncs-after-fsync". By default, this option is
    set to "off".
    
    If sync of "cached-writes issued before fsync" (to backend) fails,
    this option configures whether to retry syncing them after fsync or
    forget them. If set to on, cached-writes that failed to sync are
    retried until a "flush" fop (or a successful sync). fsync itself
    fails irrespective of the value of this option when the sync of any
    cached-write issued before the fsync fails.
    
    Change-Id: I6097c9257bfb9ee5b15616fbe6a0576ae9af369a
    Signed-off-by: Raghavendra G <rgowdapp>
    BUG: 1279730
    Reviewed-on: http://review.gluster.org/12594
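
The retry rules in the commit message above can be sketched as a toy model. This is not GlusterFS code: the class ToyWriteBehind, its method names, and the helper make_flaky_backend are hypothetical and exist only for illustration. The sketch assumes a backend that returns True or False per write, and it does not model rule 5 (cache exhaustion after repeated failures).

```python
class ToyWriteBehind:
    """Toy model of the retry rules; names are hypothetical, not GlusterFS APIs."""

    def __init__(self, backend, resync_after_fsync=False):
        self.backend = backend                      # callable(data) -> bool
        self.resync_after_fsync = resync_after_fsync
        self.cache = []                             # cached writes not yet synced

    def write(self, data):
        # Writes are acknowledged at once and cached (written behind).
        self.cache.append(data)
        return True

    def _sync(self):
        # Sync cached writes in order; stop at the first failure.
        while self.cache:
            if not self.backend(self.cache[0]):
                return False
            self.cache.pop(0)
        return True

    def background_sync(self):
        # Rules 1 and 4: no fop is waiting, so a failure is not reported
        # and the cached writes are kept for a later retry.
        self._sync()

    def fsync(self):
        ok = self._sync()
        if not ok and not self.resync_after_fsync:
            self.cache.clear()      # fsync acts as a barrier: forget the writes
        return ok                   # fsync fails regardless of the option

    def flush(self):
        ok = self._sync()
        if not ok:
            self.cache.clear()      # flush is always a barrier
        return ok


def make_flaky_backend(failures):
    """Return (backend, stored); the backend fails the first `failures` calls."""
    stored = []
    remaining = [failures]

    def backend(data):
        if remaining[0] > 0:
            remaining[0] -= 1
            return False
        stored.append(data)
        return True

    return backend, stored


backend, stored = make_flaky_backend(failures=1)
wb = ToyWriteBehind(backend)
wb.write(b"w1")
wb.background_sync()    # first sync fails silently; w1 stays cached
wb.background_sync()    # the retry succeeds and w1 reaches the backend
```

In this model, an fsync issued while the backend is failing returns False in both modes; with resync_after_fsync=True the failed write stays cached until a later sync succeeds or a flush gives up on it.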

Comment 36 Vijay Bellur 2015-12-30 09:02:20 UTC
REVIEW: http://review.gluster.org/13113 (wb: Remove inline keyword) posted (#1) for review on master by Raghavendra Talur (rtalur)

Comment 37 Vijay Bellur 2015-12-30 11:00:56 UTC
REVIEW: http://review.gluster.org/13117 (glusterd: GD_OP_VERSION should not be a released one) posted (#1) for review on master by Raghavendra Talur (rtalur)

Comment 38 Vijay Bellur 2015-12-30 18:33:32 UTC
REVIEW: http://review.gluster.org/13117 (glusterd: use yet to be released GD_OP_VERSION) posted (#2) for review on master by Raghavendra Talur (rtalur)

Comment 39 Vijay Bellur 2015-12-31 10:32:46 UTC
REVIEW: http://review.gluster.org/13113 (wb: remove inline keyword) posted (#2) for review on master by Raghavendra Talur (rtalur)

Comment 40 Vijay Bellur 2015-12-31 10:33:14 UTC
COMMIT: http://review.gluster.org/13113 committed in master by Raghavendra Talur (rtalur) 
------
commit 96f4ec28a80c013b71aa723efaa5810d2eacdd7f
Author: Raghavendra Talur <rtalur>
Date:   Wed Dec 30 13:23:33 2015 +0530

    wb: remove inline keyword
    
    When compiled with -Werror flag gcc throws the following
    error:
    
    ‘iov_length’ is static but used in inline
    function ‘__wb_modify_write_request’ which is not static.
    Let gcc decide what functions to inline and remove the inline
    keyword.
    
    Change-Id: I6d832596eefcf08306634936e11d2c8d4b8f9ccd
    BUG: 1279730
    Signed-off-by: Raghavendra Talur <rtalur>
    Reviewed-on: http://review.gluster.org/13113

Comment 41 Vijay Bellur 2016-01-04 07:28:23 UTC
REVIEW: http://review.gluster.org/13117 (glusterd: use yet to be released GD_OP_VERSION) posted (#3) for review on master by Raghavendra G (rgowdapp)

Comment 42 Vijay Bellur 2016-01-05 08:43:38 UTC
REVIEW: http://review.gluster.org/13117 (glusterd: GD_OP_VERSION should not be a released one) posted (#4) for review on master by Raghavendra Talur (rtalur)

Comment 43 Vijay Bellur 2016-01-05 12:50:49 UTC
REVIEW: http://review.gluster.org/13117 (glusterd: GD_OP_VERSION should not be a released one) posted (#5) for review on master by Raghavendra Talur (rtalur)

Comment 44 Vijay Bellur 2016-01-05 17:39:29 UTC
REVIEW: http://review.gluster.org/13177 (performance/write-behind: maintain correct transit size in case of short writes.) posted (#1) for review on master by Raghavendra G (rgowdapp)

Comment 45 Vijay Bellur 2016-01-06 04:54:27 UTC
REVIEW: http://review.gluster.org/13117 (glusterd: GD_OP_VERSION should not be a released one) posted (#6) for review on master by Raghavendra Talur (rtalur)

Comment 46 Vijay Bellur 2016-01-07 06:00:06 UTC
COMMIT: http://review.gluster.org/13177 committed in master by Raghavendra G (rgowdapp) 
------
commit ea42ffa13c00263a574226626d30749b6b0f3776
Author: Raghavendra G <rgowdapp>
Date:   Tue Jan 5 22:16:31 2016 +0530

    performance/write-behind: maintain correct transit size in case of
    short writes.
    
    1. Imagine a write when cache is filled with failed syncs.
    2. This write won't be unwound since cache size has exceeded
    configured limit.
    3. With trickling-writes on by default, the last write request won't be
    considered for winding when there is a non-zero in-transit size.
    4. There was a bug in accounting of in-transit size when winds
    resulted in short writes. Due to this bug, in-transit size used to be
    non-zero even when there are no syncs in progress.
    5. Due to 3 and 4, the current write request won't be wound till there
    is another write or fsync or flush from the application. But the
    application can't do any other fop till the current write request is
    unwound. This resulted in a deadlock, and hence the application would
    hang in 'D' state.
    
    This patch fixes bug in accounting of in-transit size during short
    writes.
    
    Change-Id: I04ce8bb510efaaed7623cac38d69b32dbc3730ce
    Signed-off-by: Raghavendra G <rgowdapp>
    BUG: 1279730
    Reviewed-on: http://review.gluster.org/13177
    Tested-by: Gluster Build System <jenkins.com>
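
A minimal sketch of how a short write can leave a stale in-transit size. The commit message does not say exactly how the accounting went wrong, so the buggy variant here is an assumption: it decrements the counter by the bytes actually written instead of the bytes that were wound. The names TransitCounter and can_wind_last_write are hypothetical and do not come from the write-behind source.

```python
class TransitCounter:
    """Tracks bytes wound to the backend that have not completed yet."""

    def __init__(self, subtract_requested):
        # subtract_requested=False models the assumed bug.
        self.subtract_requested = subtract_requested
        self.in_transit = 0

    def wind(self, requested):
        # A write of `requested` bytes is sent to the backend.
        self.in_transit += requested

    def complete(self, requested, written):
        # The backend replied; `written` may be smaller than `requested`.
        if self.subtract_requested:
            self.in_transit -= requested
        else:
            self.in_transit -= written


def can_wind_last_write(counter):
    # With trickling writes, the last queued write is only wound
    # when nothing is in transit.
    return counter.in_transit == 0


buggy = TransitCounter(subtract_requested=False)
buggy.wind(4096)
buggy.complete(requested=4096, written=1024)    # short write

fixed = TransitCounter(subtract_requested=True)
fixed.wind(4096)
fixed.complete(requested=4096, written=1024)    # short write
```

In the buggy variant, 3072 bytes remain "in transit" although no sync is in progress, so can_wind_last_write stays False and the pending write is never wound, which matches the hang described in steps 3 to 5.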

Comment 47 Vijay Bellur 2016-01-07 13:24:51 UTC
REVIEW: http://review.gluster.org/13117 (glusterd: GD_OP_VERSION should not be a released one) posted (#7) for review on master by Raghavendra Talur (rtalur)

Comment 48 Vijay Bellur 2016-01-11 07:03:50 UTC
REVIEW: http://review.gluster.org/13117 (glusterd: GD_OP_VERSION should not be a released one) posted (#8) for review on master by Raghavendra Talur (rtalur)

Comment 49 Vijay Bellur 2016-01-11 16:10:14 UTC
COMMIT: http://review.gluster.org/13117 committed in master by Raghavendra G (rgowdapp) 
------
commit 7d4f708b18c1e6c965ebe8c84e14dd69ae4b7859
Author: Raghavendra Talur <rtalur>
Date:   Wed Dec 30 16:19:44 2015 +0530

    glusterd: GD_OP_VERSION should not be a released one
    
    performance.resync-failed-syncs-after-fsync was
    introduced after 3.7.6 was released. Hence it should
    use 3_7_7 as op version not 3_7_6.
    
    
    
    Change-Id: If4def1bf0fdc9fa4938ccb78308bec77eeaa2284
    BUG: 1279730
    Signed-off-by: Raghavendra Talur <rtalur>
    Reviewed-on: http://review.gluster.org/13117
    Reviewed-by: Atin Mukherjee <amukherj>
    Tested-by: Gluster Build System <jenkins.com>
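
A rough sketch of why the op version attached to an option matters, assuming that an option may only be set once the cluster's operating version is at least the option's op version. The table OPTION_OP_VERSION and the function can_set_option are hypothetical, and versions are written as tuples instead of the GD_OP_VERSION_* macros.

```python
# Hypothetical table: option name -> minimum cluster op version (as a tuple).
OPTION_OP_VERSION = {
    "performance.resync-failed-syncs-after-fsync": (3, 7, 7),
}


def can_set_option(option, cluster_op_version):
    """Allow the option only if the whole cluster runs a new enough op version."""
    return cluster_op_version >= OPTION_OP_VERSION[option]


opt = "performance.resync-failed-syncs-after-fsync"
allowed_on_376 = can_set_option(opt, (3, 7, 6))    # released before the option existed
allowed_on_377 = can_set_option(opt, (3, 7, 7))
```

Had the option been tagged 3_7_6, a cluster running at the already released 3.7.6 op version would have passed this check even though those binaries do not know the option.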

Comment 50 Niels de Vos 2016-06-16 13:43:35 UTC
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

