Bug 1272398 - Data Tiering: Lots of promotion/demotion failed error messages
Summary: Data Tiering: Lots of promotion/demotion failed error messages
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: tiering
Version: 3.7.6
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Assignee: Dan Lambright
QA Contact: bugs@gluster.org
URL:
Whiteboard:
Depends On:
Blocks: 1272409 1273043 glusterfs-3.7.6 1295700 1311865
Reported: 2015-10-16 10:32 UTC by Nag Pavan Chilakam
Modified: 2016-02-25 08:58 UTC
CC List: 5 users

Fixed In Version: glusterfs-3.7.6
Clone Of:
Clones: 1272409
Environment:
Last Closed: 2015-11-17 06:01:01 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Description Nag Pavan Chilakam 2015-10-16 10:32:12 UTC
Description of problem:
======================
On QE setups, such as the one used for longevity and stress testing, we are seeing a lot of "promote/demote failed" error messages in the tier log.
If they are genuine, we need to find the cause and fix it.
If they are spurious, we need to clean them up before the product release, because they make it very difficult to debug or root-cause other issues.



Version-Release number of selected component (if applicable):
=======================================================
glusterfs-server-3.7.5-0.19.git0f5c3e8.el7.centos.x86_64

Steps Carried:
==============

1. Create a 12-node cluster.
2. Create a tiered volume with the hot tier as 6 x 2 (distributed-replicate) and the cold tier as 2 x (6 + 2) = 16 bricks (distributed-disperse).
3. FUSE-mount the volume on 3 clients (RHEL 7.2, RHEL 7.1 and RHEL 6.7).
4. Start creating data from each client:

Client 1:
=========
[root@dj ~]# crefi --multi -n 10 -b 10 -d 10 --max=1024k --min=5k --random -T 5 -t text -I 5 --fop=create /mnt/fuse/

Client 2:
=========
[root@mia ~]# cd /mnt/fuse/
[root@mia fuse]# for i in {1..10}; do cp -rf /etc etc.$i ; sleep 100 ; done

Client 3:
=========
[root@wingo fuse]# for i in {1..999}; do dd if=/dev/zero of=dd.$i bs=1M count=1 ; sleep 10 ; done

5. After a while, data creation from clients 1 and 2 completes, while data creation from client 3 is still in progress.

6. At this point, only client 3 is creating data, at a rate of one file every 10 seconds.

7. Monitor the CPU usage using top.

Comment 2 Vijay Bellur 2015-10-19 18:00:00 UTC
REVIEW: http://review.gluster.org/12394 (cluster/tier remove suprious log messages on valid failed migration) posted (#1) for review on release-3.7 by Dan Lambright (dlambrig)

Comment 3 Vijay Bellur 2015-10-19 18:18:18 UTC
REVIEW: http://review.gluster.org/12395 (cluster/tier update man pages for tier feature) posted (#1) for review on release-3.7 by Dan Lambright (dlambrig)

Comment 4 Vijay Bellur 2015-10-19 20:00:27 UTC
COMMIT: http://review.gluster.org/12394 committed in release-3.7 by Dan Lambright (dlambrig) 
------
commit 6fe5d09826542c37626f8f63299d6bce4671c34f
Author: Dan Lambright <dlambrig>
Date:   Mon Oct 19 09:04:07 2015 -0400

    cluster/tier remove suprious log messages on valid failed migration
    
    Backport fix 12391
    
    > On a write to a replica volume, we record an entry in every brick's database.
    > When the tier daemon runs, it will only move the file if its node is the true
    > owner of the file, as defined by XATTR_NODE_UUID_KEY.
    
    > Change-Id: Ib82717f87a3f94f3d0d9f969773de9e88d6aaf22
    > BUG: 1273043
    > Signed-off-by: Dan Lambright <dlambrig>
    > Reviewed-on: http://review.gluster.org/12391
    > Reviewed-by: Joseph Fernandes
    > Tested-by: NetBSD Build System <jenkins.org>
    > Tested-by: Gluster Build System <jenkins.com>
    Signed-off-by: Dan Lambright <dlambrig>
    
    Change-Id: I12147f878cd1927f845867fb7c0b84c4db017ee1
    BUG: 1272398
    Reviewed-on: http://review.gluster.org/12394
    Reviewed-by: Joseph Fernandes
    Tested-by: NetBSD Build System <jenkins.org>
    Reviewed-by: Dan Lambright <dlambrig>
    Tested-by: Dan Lambright <dlambrig>
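The ownership rule described in the backported commit above can be illustrated with a small shell sketch. It is a hypothetical model, not GlusterFS code: the function `should_migrate` and the node names are invented for illustration, and the sketch does not read `XATTR_NODE_UUID_KEY` from any real file. It only shows why a replica node that does not own a file should skip it quietly rather than report a failed migration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not GlusterFS code: models the per-node decision the
# tier daemon makes for a file recorded in every replica brick's database.
# In GlusterFS the owner comes from XATTR_NODE_UUID_KEY; here it is passed in.

# should_migrate LOCAL_UUID OWNER_UUID
# Prints "migrate" when this node owns the file, otherwise "skip".
# A skip is an expected outcome on non-owner replicas, so it is not an error.
should_migrate() {
    local local_uuid="$1" owner_uuid="$2"
    if [ "$local_uuid" = "$owner_uuid" ]; then
        echo "migrate"
    else
        echo "skip"
    fi
}

# Usage: a file written to a 2-way replica is recorded on both nodes,
# but only the owner (hypothetical node "node-b") migrates it.
owner="node-b"
for node in node-a node-b; do
    printf '%s: %s\n' "$node" "$(should_migrate "$node" "$owner")"
done
# node-a: skip
# node-b: migrate
```

Reading the commit subject, the non-owner case appears to be the "valid failed migration" that was previously logged as an error on every replica.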

Comment 5 Vijay Bellur 2015-10-19 21:48:50 UTC
COMMIT: http://review.gluster.org/12395 committed in release-3.7 by Dan Lambright (dlambrig) 
------
commit 05ad7bc4e15b1b0d50d406cdc26402963b22ac77
Author: Dan Lambright <dlambrig>
Date:   Mon Oct 19 14:16:42 2015 -0400

    cluster/tier update man pages for tier feature
    
    Add to gluster man pages instructions for tier commands.
    
    Backport fix 12391
    
    > Change-Id: I0918460eeaba22bb6a11238d4f5501fa8e61da88
    > BUG: 1272557
    > Signed-off-by: Dan Lambright <dlambrig>
    > Reviewed-on: http://review.gluster.org/12380
    > Tested-by: NetBSD Build System <jenkins.org>
    > Reviewed-by: N Balachandran <nbalacha>
    
    Change-Id: I2cc16defb2eeb56075357c32d4ef71d6869891bb
    BUG: 1272398
    Signed-off-by: Dan Lambright <dlambrig>
    Reviewed-on: http://review.gluster.org/12395
    Tested-by: Gluster Build System <jenkins.com>
    Tested-by: NetBSD Build System <jenkins.org>

Comment 6 Raghavendra Talur 2015-10-28 07:35:11 UTC
Dan,

The merged patch is only for the man page; does this complete the bug fix?
If not, we can move the state back to ASSIGNED.

Comment 7 Dan Lambright 2015-10-29 20:36:49 UTC
There was a clerical error on my part. The man page fix should not have been associated with this bug. 

I will move the bug back to ASSIGNED and propagate the correct fix, which has already been merged upstream, to 3.7.

Comment 8 Vijay Bellur 2015-10-29 20:45:31 UTC
REVIEW: http://review.gluster.org/12465 (cluster/tier do not log error message on lookup heal for files on hot tier) posted (#1) for review on release-3.7 by Dan Lambright (dlambrig)

Comment 9 Vijay Bellur 2015-10-29 20:46:19 UTC
REVIEW: http://review.gluster.org/12465 (cluster/tier dont log error on lookup heal for files on hot tier) posted (#2) for review on release-3.7 by Dan Lambright (dlambrig)

Comment 10 Vijay Bellur 2015-10-30 12:53:36 UTC
COMMIT: http://review.gluster.org/12465 committed in release-3.7 by Dan Lambright (dlambrig) 
------
commit c360e8d3e33ac02a3bdb11d16fa4f638fc7dea9c
Author: Dan Lambright <dlambrig>
Date:   Mon Oct 26 14:19:24 2015 -0400

    cluster/tier dont log error on lookup heal for files on hot tier
    
    This is a backport of 12430
    
    During fix-layout heal, files are scanned. Each file found exists on either the
    hot or the cold subvolume, so files not found on the cold tier exist on the hot
    tier and should not be flagged as errors.
    
    Replace INFO with TRACE for common tier migration logs. Frequent migration
    was growing the log files too quickly.
    
    On migration failures, do not accrue files toward the cycle limit's budget.
    
    > Change-Id: Ie832ee07c43bce5477ae81c939d1fe8416a11615
    > BUG: 1275383
    > Signed-off-by: Dan Lambright <dlambrig>
    > Reviewed-on: http://review.gluster.org/12430
    > Tested-by: Gluster Build System <jenkins.com>
    > Reviewed-by: Joseph Fernandes
    Signed-off-by: Dan Lambright <dlambrig>
    
    Change-Id: Ia1ce5c3ac9c8c43cf3f3f7e0bd6161aa13affe5f
    BUG: 1272398
    Signed-off-by: Dan Lambright <dlambrig>
    Reviewed-on: http://review.gluster.org/12465
    Tested-by: Gluster Build System <jenkins.com>
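The two behavioral changes in this backport (treating a file missing from the cold tier as living on the hot tier, and not charging failed migrations against the per-cycle budget) can be modeled in a short shell sketch. Everything here is hypothetical: `run_cycle`, its budget argument and the `ok`/`fail`/`hot` status words are illustrative stand-ins, not GlusterFS options or log strings, and folding the fix-layout case into a migration loop is a simplification.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not GlusterFS code: one promotion/demotion cycle with
# a per-cycle file budget.

# run_cycle MAX_FILES STATUS...
# Each STATUS is "ok" (migration succeeded), "fail" (migration failed) or
# "hot" (file not on the cold tier because it already lives on the hot tier).
# Only successful migrations count toward the budget.
run_cycle() {
    local max_files="$1"; shift
    local migrated=0 failed=0 status
    for status in "$@"; do
        if [ "$migrated" -ge "$max_files" ]; then
            break  # budget for this cycle is used up
        fi
        case "$status" in
            ok)   migrated=$((migrated + 1)) ;;
            fail) failed=$((failed + 1)) ;;  # tracked, but not charged to the budget
            hot)  : ;;                        # expected case, not treated as an error
        esac
    done
    echo "migrated=$migrated failed=$failed"
}

# Usage: a budget of 2 files with a mix of outcomes.
run_cycle 2 fail ok hot fail ok ok
# migrated=2 failed=2
```

If failed migrations were also charged to the budget, the same input would stop after the first two entries with only one file actually migrated.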

Comment 11 Raghavendra Talur 2015-11-17 06:01:01 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.6, please open a new bug report.

glusterfs-3.7.6 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://www.gluster.org/pipermail/gluster-users/2015-November/024359.html
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

