Bug 1366648 - [GSS] A hot tier brick becomes full, causing the entire volume to have issues and returns stale file handle and input/output error.
Summary: [GSS] A hot tier brick becomes full, causing the entire volume to have issues...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: tiering
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Milind Changire
QA Contact: bugs@gluster.org
URL:
Whiteboard:
Depends On:
Blocks: 1361759 1394482
Reported: 2016-08-12 14:09 UTC by Milind Changire
Modified: 2017-03-06 17:22 UTC
CC List: 8 users

Fixed In Version: glusterfs-3.10.0
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1361759
: 1394482 (view as bug list)
Environment:
Last Closed: 2017-03-06 17:22:10 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Comment 1 Vijay Bellur 2016-08-12 14:53:23 UTC
REVIEW: http://review.gluster.org/15158 (cluster/tier: handle fast demotions) posted (#1) for review on master by Milind Changire (mchangir)

Comment 2 Vijay Bellur 2016-08-13 06:52:48 UTC
REVIEW: http://review.gluster.org/15158 (cluster/tier: handle fast demotions) posted (#2) for review on master by Milind Changire (mchangir)

Comment 3 Vijay Bellur 2016-08-13 16:53:31 UTC
REVIEW: http://review.gluster.org/15158 (cluster/tier: handle fast demotions) posted (#3) for review on master by Milind Changire (mchangir)

Comment 4 Vijay Bellur 2016-08-14 09:20:54 UTC
REVIEW: http://review.gluster.org/15158 (cluster/tier: handle fast demotions) posted (#4) for review on master by Milind Changire (mchangir)

Comment 5 Milind Changire 2016-08-16 07:08:03 UTC
Problem Description:
Hot tier brick gets 100% full even when cluster.watermark-hi has been set to 90.

Analysis:
When I/O is started on a tiered volume, nothing checks whether the usage of the hot tier brick has breached the hi-watermark. A file being written can therefore keep growing past the hi-watermark and, if the I/O continues unchecked, eventually consume the entire brick space.
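
To make the missing check concrete, here is a minimal sketch in Python (not the GlusterFS code, which is C) of a hi-watermark test on brick usage. The function names, the byte-count inputs, and the choice of ">=" as the breach condition are assumptions made for illustration; only the value 90 for cluster.watermark-hi comes from the description above.

```python
def brick_usage_percent(used_bytes, total_bytes):
    """Return brick usage as a percentage of total capacity."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * used_bytes / total_bytes


def breaches_hi_watermark(used_bytes, total_bytes, watermark_hi=90):
    """Hypothetical check: True once usage reaches the hi-watermark.

    Assumption: usage equal to the watermark counts as a breach.
    """
    return brick_usage_percent(used_bytes, total_bytes) >= watermark_hi


def write_with_check(used_bytes, total_bytes, chunk_bytes, chunks, watermark_hi=90):
    """Simulate appending chunks to a file, stopping at the hi-watermark.

    Returns the brick usage in bytes after the writes. Without the check
    inside the loop, the writes would continue until the brick is full.
    """
    for _ in range(chunks):
        if breaches_hi_watermark(used_bytes, total_bytes, watermark_hi):
            break  # a real system would trigger demotion here
        used_bytes += chunk_bytes
    return used_bytes
```

With a 100-byte brick and 10-byte chunks, write_with_check(0, 100, 10, 20) stops at 90 bytes used instead of filling the brick.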

Comment 6 Vijay Bellur 2016-08-16 17:17:18 UTC
REVIEW: http://review.gluster.org/15158 (cluster/tier: handle fast demotions) posted (#5) for review on master by Milind Changire (mchangir)

Comment 7 Vijay Bellur 2016-08-18 13:25:56 UTC
REVIEW: http://review.gluster.org/15158 (cluster/tier: handle fast demotions) posted (#6) for review on master by Milind Changire (mchangir)

Comment 8 Vijay Bellur 2016-08-18 21:34:20 UTC
REVIEW: http://review.gluster.org/15158 (cluster/tier: handle fast demotions) posted (#7) for review on master by Milind Changire (mchangir)

Comment 9 Worker Ant 2016-09-08 13:58:26 UTC
REVIEW: http://review.gluster.org/15158 (cluster/tier: handle fast demotions) posted (#8) for review on master by Milind Changire (mchangir)

Comment 10 Worker Ant 2016-09-08 14:11:00 UTC
REVIEW: http://review.gluster.org/15158 (cluster/tier: handle fast demotions) posted (#9) for review on master by Milind Changire (mchangir)

Comment 12 Worker Ant 2016-10-14 06:11:24 UTC
REVIEW: http://review.gluster.org/15158 (cluster/tier: handle fast demotions) posted (#10) for review on master by Milind Changire (mchangir)

Comment 13 Milind Changire 2016-10-14 06:22:51 UTC
Oonkwee,
We are indeed working hard to formulate a reasonable solution.
The draft solution needs more discussion on the implementation approach.

Comment 14 Worker Ant 2016-10-15 05:20:06 UTC
REVIEW: http://review.gluster.org/15158 (cluster/tier: handle fast demotions) posted (#11) for review on master by Milind Changire (mchangir)

Comment 15 Worker Ant 2016-10-19 19:51:54 UTC
COMMIT: http://review.gluster.org/15158 committed in master by Dan Lambright (dlambrig) 
------
commit 460016428cf27484c333227f534c2e2f73a37fb1
Author: Milind Changire <mchangir>
Date:   Sat Oct 15 10:49:19 2016 +0530

    cluster/tier: handle fast demotions
    
    Demote files on priority if hi-watermark has been breached and continue
    to demote until the watermark drops below hi-watermark.
    
    Monitor watermark more frequently.
    Trigger demotion as soon as hi-watermark is breached.
    Add cluster.tier-emergency-demote-query-limit option to limit number
    of files returned from the database query for every iteration of
    tier_migrate_using_query_file(). If watermark hasn't dropped below
    hi-watermark during the first iteration, the next iteration will be
    triggered approximately 1 second after tier_demote() returns to the
    main tiering loop.
    Update changetimerecorder xlator to handle query for emergency demote
    mode.
    
    Add tier-ctr-interface.h:
    Move tier and ctr interface specific macros and struct definition from
    libglusterfs/src/gfdb/gfdb_data_store.h to new header
    libglusterfs/src/tier-ctr-interface.h
    
    Change-Id: If56af78c6c81d37529b9b6e65ae606ba5c99a811
    BUG: 1366648
    Signed-off-by: Milind Changire <mchangir>
    Reviewed-on: http://review.gluster.org/15158
    Smoke: Gluster Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    Reviewed-by: Dan Lambright <dlambrig>
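
The demotion behaviour described in the commit message above can be sketched roughly as follows. This is an illustrative Python model, not the tier xlator code: the function name emergency_demote, the in-memory dict standing in for the database query, and the re-check of usage after each batch are all assumptions. The query_limit parameter plays the role of the query-limit option, and pause_seconds models the approximately 1 second gap between iterations (it defaults to 0 so the sketch runs instantly).

```python
import time


def emergency_demote(hot_files, capacity, watermark_hi=90, query_limit=100,
                     pause_seconds=0.0):
    """Demote files in batches until usage drops below the hi-watermark.

    hot_files: dict mapping file name -> size on the hot tier (mutated).
    Returns (list of demoted file names, number of iterations run).
    """
    def usage_percent():
        return 100.0 * sum(hot_files.values()) / capacity

    demoted = []
    iterations = 0
    while hot_files and usage_percent() >= watermark_hi:
        if iterations > 0 and pause_seconds > 0:
            time.sleep(pause_seconds)  # gap before the next iteration
        iterations += 1
        # Stand-in for the database query: at most query_limit candidates.
        batch = list(hot_files)[:query_limit]
        for name in batch:
            del hot_files[name]  # "demote" the file off the hot tier
            demoted.append(name)
    return demoted, iterations
```

For example, with ten 10-byte files on a 100-byte hot tier and query_limit=1, the loop runs twice and demotes two files, leaving usage at 80 percent. A small limit keeps each iteration short at the cost of more iterations.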

Comment 16 Milind Changire 2016-12-01 12:55:21 UTC
op-version needs fix

Comment 17 Worker Ant 2016-12-01 12:57:11 UTC
REVIEW: http://review.gluster.org/15990 (cluster/tier: fix op-version for tier-query-limit) posted (#1) for review on master by Milind Changire (mchangir)

Comment 18 Worker Ant 2016-12-02 04:07:03 UTC
COMMIT: http://review.gluster.org/15990 committed in master by Atin Mukherjee (amukherj) 
------
commit 530453c78146e8ba4f13636e1dec1ea59849c783
Author: Milind Changire <mchangir>
Date:   Thu Dec 1 18:18:27 2016 +0530

    cluster/tier: fix op-version for tier-query-limit
    
    Correct the op-version for tier-query-limit option from 3.9.0 to 3.9.1
    
    Change-Id: I3a52a94c2708a97c18377e945d559a51d8025c41
    BUG: 1366648
    Signed-off-by: Milind Changire <mchangir>
    Reviewed-on: http://review.gluster.org/15990
    Reviewed-by: Dan Lambright <dlambrig>
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Atin Mukherjee <amukherj>

Comment 19 Shyamsundar 2017-03-06 17:22:10 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.10.0, please open a new bug report.

glusterfs-3.10.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-users/2017-February/030119.html
[2] https://www.gluster.org/pipermail/gluster-users/

