Bug 1427419 - Warning messages thrown when an offline EC volume brick comes up are difficult for the end user to understand.
Summary: Warning messages thrown when an offline EC volume brick comes up are difficult...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: disperse
Version: 3.8
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Sunil Kumar Acharya
QA Contact:
URL:
Whiteboard:
Depends On: 1408361 1409202 1435592
Blocks: 1414347 1427089
 
Reported: 2017-02-28 07:31 UTC by Sunil Kumar Acharya
Modified: 2017-03-24 10:24 UTC
CC List: 9 users

Fixed In Version: glusterfs-3.8.10
Clone Of: 1409202
Environment:
Last Closed: 2017-03-18 10:52:28 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Comment 1 Worker Ant 2017-02-28 07:40:59 UTC
REVIEW: https://review.gluster.org/16781 (cluster/ec: Fixing log message) posted (#1) for review on release-3.8 by Sunil Kumar Acharya (sheggodu)

Comment 2 Ashish Pandey 2017-02-28 07:55:23 UTC
Description of problem:
=======================
When any brick of an EC volume goes down and comes back up while I/O is in progress, the warning messages below appear in the self-heal daemon log (shd log). The end user cannot tell which subvolumes the problem is with, because the subvolume sets are printed as hexadecimal bitmask values, and the user has to do a lot of math to map them back to subvolumes.

These warning messages need to be improved so that the end user can understand them.



[2016-12-23 04:52:00.658995] W [MSGID: 122053] [ec-common.c:116:ec_check_status] 0-Disperse1-disperse-0: Operation failed on some subvolumes (up=3F, mask=3F, remaining=0, good=3E, bad=1)
[2016-12-23 04:52:00.659085] W [MSGID: 122002] [ec-common.c:71:ec_heal_report] 0-Disperse1-disperse-0: Heal failed [Invalid argument]
[2016-12-23 04:52:00.812666] W [MSGID: 122053] [ec-common.c:116:ec_check_status] 0-Disperse1-disperse-0: Operation failed on some subvolumes (up=3F, mask=3F, remaining=0, good=3E, bad=1)
[2016-12-23 04:52:00.812709] W [MSGID: 122002] [ec-common.c:71:ec_heal_report] 0-Disperse1-disperse-0: Heal failed [Invalid argument]
[2016-12-23 04:52:01.053575] W [MSGID: 122053] [ec-common.c:116:ec_check_status] 0-Disperse1-disperse-0: Operation failed on some subvolumes (up=3F, mask=3F, remaining=0, good=3E, bad=1)
[2016-12-23 04:52:01.053651] W [MSGID: 122002] [ec-common.c:71:ec_heal_report] 0-Disperse1-disperse-0: Heal failed [Invalid argument]
[2016-12-23 04:52:01.059907] W [MSGID: 122053] [ec-common.c:116:ec_check_status] 0-Disperse1-disperse-0: Operation failed on some subvolumes (up=3F, mask=3F, remaining=0, good=3E, bad=1)
[2016-12-23 04:52:01.059983] W [MSGID: 122002] [ec-common.c:71:ec_heal_report] 0-Disperse1-disperse-0: Heal failed [Invalid argument]
[2016-12-23 04:52:01.085491] W [MSGID: 122053] [ec-common.c:116:ec_check_status] 0-Disperse1-disperse-0: Operation failed on some subvolumes
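
For context, the up, mask, remaining, good and bad fields in these messages are bitmasks with one bit per brick of the disperse set (bit 0 being the first brick). The small standalone program below only illustrates the conversion the end user currently has to do by hand; it is not GlusterFS code, and the 6-brick (4+2) layout is assumed from the 0x3F masks in the log:

/* Illustration only (not GlusterFS code): decode the hexadecimal subvolume
 * bitmasks printed by ec_check_status() into per-brick indices. */
#include <stdio.h>

static void print_bricks(const char *label, unsigned long mask, int bricks)
{
    printf("%-4s =", label);
    for (int i = 0; i < bricks; i++)
        if (mask & (1UL << i))
            printf(" brick-%d", i);
    printf("\n");
}

int main(void)
{
    int bricks = 6;            /* assumed 4+2 disperse set, since up=0x3F */
    print_bricks("up",   0x3F, bricks);   /* all six bricks are up        */
    print_bricks("good", 0x3E, bricks);   /* bricks 1-5 succeeded         */
    print_bricks("bad",  0x01, bricks);   /* brick 0 failed the operation */
    return 0;
}

So in the excerpt above, good=3E and bad=1 mean that only the first brick (bit 0) of Disperse1-disperse-0 failed the operation, which is exactly the translation the log message should be doing for the user.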


Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.8.4-9.el6rhs.x86_64.

 
How reproducible:
=================
Always


Steps to Reproduce:
===================
1. Create a basic recommended EC volume setup.
2. FUSE-mount the volume.
3. Bring one brick down and start I/O on the mount point.
4. After some I/O has run, bring the offline brick back online using 'gluster volume start <volname> force'.
5. Check the self-heal daemon logs for the above-mentioned warning messages.

Actual results:
===============
Warning messages thrown when an offline brick of an EC volume comes back up are difficult for the end user to understand.

Expected results:
=================
Improve the warning messages thrown when an offline brick of an EC volume comes back up so that the end user can understand them.
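
As an illustration of the kind of message that would help, the bitmask could be expanded into the names of the affected subvolumes before logging. This is a hypothetical sketch only, not the wording of the eventual fix; ec_mask_to_str() and the client names are made up for the example:

/* Hypothetical sketch only: expand a subvolume bitmask into a
 * comma-separated list of child names so the warning can name the
 * affected bricks instead of printing a raw hexadecimal value. */
#include <stdio.h>
#include <string.h>

static void ec_mask_to_str(unsigned long mask, const char **names, int count,
                           char *buf, size_t len)
{
    buf[0] = '\0';
    for (int i = 0; i < count; i++) {
        if (!(mask & (1UL << i)))
            continue;
        if (buf[0] != '\0')
            strncat(buf, ", ", len - strlen(buf) - 1);
        strncat(buf, names[i], len - strlen(buf) - 1);
    }
}

int main(void)
{
    const char *names[] = { "vol-client-0", "vol-client-1", "vol-client-2",
                            "vol-client-3", "vol-client-4", "vol-client-5" };
    char bad[256];

    ec_mask_to_str(0x01, names, 6, bad, sizeof(bad));
    printf("Operation failed on the following subvolumes: %s\n", bad);
    /* prints: Operation failed on the following subvolumes: vol-client-0 */
    return 0;
}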

Comment 3 Worker Ant 2017-03-10 08:33:11 UTC
COMMIT: https://review.gluster.org/16781 committed in release-3.8 by jiffin tony Thottan (jthottan) 
------
commit a76304cd434028215de39cf3b45672cc7ec6ca70
Author: Sunil Kumar H G <sheggodu>
Date:   Fri Dec 30 14:11:15 2016 +0530

    cluster/ec: Fixing log message
    
    Updating the warning message with details to improve
    user understanding.
    
    >BUG: 1409202
    >Change-Id: I001f8d5c01c97fff1e4e1a3a84b62e17c025c520
    >Signed-off-by: Sunil Kumar H G <sheggodu>
    >Reviewed-on: http://review.gluster.org/16315
    >Tested-by: Sunil Kumar Acharya
    >Smoke: Gluster Build System <jenkins.org>
    >NetBSD-regression: NetBSD Build System <jenkins.org>
    >CentOS-regression: Gluster Build System <jenkins.org>
    >Reviewed-by: Xavier Hernandez <xhernandez>
    
    BUG: 1427419
    Change-Id: I34a869d7cd7630881c897e0e4ecac367cd2820f9
    Signed-off-by: Sunil Kumar Acharya <sheggodu>
    Reviewed-on: https://review.gluster.org/16781
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: Ashish Pandey <aspandey>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: jiffin tony Thottan <jthottan>

Comment 4 Niels de Vos 2017-03-18 10:52:28 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.10, please open a new bug report.

glusterfs-3.8.10 has been announced on the Gluster mailing lists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-March/000068.html
[2] https://www.gluster.org/pipermail/gluster-users/

