Bug 1427419
Summary: | Warning messages thrown when an offline EC volume brick comes up are difficult for the end user to understand. | ||
---|---|---|---|
Product: | [Community] GlusterFS | Reporter: | Sunil Kumar Acharya <sheggodu> |
Component: | disperse | Assignee: | Sunil Kumar Acharya <sheggodu> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | 3.8 | CC: | aflyhorse, amukherj, aspandey, bsrirama, bugs, nchilaka, rhs-bugs, sheggodu, storage-qa-internal |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | glusterfs-3.8.10 | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | 1409202 | Environment: | |
Last Closed: | 2017-03-18 10:52:28 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1408361, 1409202, 1435592 | ||
Bug Blocks: | 1414347, 1427089 |
Comment 1
Worker Ant
2017-02-28 07:40:59 UTC
Description of problem:
=======================
When any brick of an EC volume goes down and comes back up while IO is in progress, the warning messages below appear in the self-heal daemon log (shd log). The subvolumes are printed as hexadecimal bitmask values, so the end user cannot tell which subvolumes are affected without doing a lot of bit arithmetic by hand. These warning messages need to be improved so that the end user can understand them.

[2016-12-23 04:52:00.658995] W [MSGID: 122053] [ec-common.c:116:ec_check_status] 0-Disperse1-disperse-0: Operation failed on some subvolumes (up=3F, mask=3F, remaining=0, good=3E, bad=1)
[2016-12-23 04:52:00.659085] W [MSGID: 122002] [ec-common.c:71:ec_heal_report] 0-Disperse1-disperse-0: Heal failed [Invalid argument]
[2016-12-23 04:52:00.812666] W [MSGID: 122053] [ec-common.c:116:ec_check_status] 0-Disperse1-disperse-0: Operation failed on some subvolumes (up=3F, mask=3F, remaining=0, good=3E, bad=1)
[2016-12-23 04:52:00.812709] W [MSGID: 122002] [ec-common.c:71:ec_heal_report] 0-Disperse1-disperse-0: Heal failed [Invalid argument]
[2016-12-23 04:52:01.053575] W [MSGID: 122053] [ec-common.c:116:ec_check_status] 0-Disperse1-disperse-0: Operation failed on some subvolumes (up=3F, mask=3F, remaining=0, good=3E, bad=1)
[2016-12-23 04:52:01.053651] W [MSGID: 122002] [ec-common.c:71:ec_heal_report] 0-Disperse1-disperse-0: Heal failed [Invalid argument]
[2016-12-23 04:52:01.059907] W [MSGID: 122053] [ec-common.c:116:ec_check_status] 0-Disperse1-disperse-0: Operation failed on some subvolumes (up=3F, mask=3F, remaining=0, good=3E, bad=1)
[2016-12-23 04:52:01.059983] W [MSGID: 122002] [ec-common.c:71:ec_heal_report] 0-Disperse1-disperse-0: Heal failed [Invalid argument]
[2016-12-23 04:52:01.085491] W [MSGID: 122053] [ec-common.c:116:ec_check_status] 0-Disperse1-disperse-0: Operation failed on some subvolumes

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.8.4-9.el6rhs.x86_64

How reproducible:
=================
Always

Steps to Reproduce:
===================
1. Set up a basic recommended EC volume.
2. FUSE-mount the volume.
3. Bring one brick down and start IO on the mount point.
4. After some IO has happened, bring the offline brick back up using volume start force.
5. Check the self-heal daemon logs for the warning messages mentioned above.

Actual results:
===============
The warning messages thrown when an offline EC volume brick comes up are difficult for the end user to understand.

Expected results:
=================
Improve the warning messages thrown when an offline EC volume brick comes up so that the end user can understand them.

COMMIT: https://review.gluster.org/16781 committed in release-3.8 by jiffin tony Thottan (jthottan)
------
commit a76304cd434028215de39cf3b45672cc7ec6ca70
Author: Sunil Kumar H G <sheggodu>
Date: Fri Dec 30 14:11:15 2016 +0530

    cluster/ec: Fixing log message

    Updating the warning message with details to improve user understanding.

    >BUG: 1409202
    >Change-Id: I001f8d5c01c97fff1e4e1a3a84b62e17c025c520
    >Signed-off-by: Sunil Kumar H G <sheggodu>
    >Reviewed-on: http://review.gluster.org/16315
    >Tested-by: Sunil Kumar Acharya
    >Smoke: Gluster Build System <jenkins.org>
    >NetBSD-regression: NetBSD Build System <jenkins.org>
    >CentOS-regression: Gluster Build System <jenkins.org>
    >Reviewed-by: Xavier Hernandez <xhernandez>

    BUG: 1427419
    Change-Id: I34a869d7cd7630881c897e0e4ecac367cd2820f9
    Signed-off-by: Sunil Kumar Acharya <sheggodu>
    Reviewed-on: https://review.gluster.org/16781
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: Ashish Pandey <aspandey>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: jiffin tony Thottan <jthottan>

This bug is being closed because a release that should address the reported issue has been made available. If the problem is still not fixed with glusterfs-3.8.10, please open a new bug report.
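For context on the bit arithmetic the original messages forced on the user: each field in the warning (up=3F, mask=3F, good=3E, bad=1) is a hexadecimal bitmask in which bit i corresponds to subvolume (brick) i of the disperse set. A minimal sketch of the decoding, assuming the six-brick disperse volume implied by up=3F; the helper name `decode_mask` is hypothetical and not part of the fix:

```python
def decode_mask(hex_mask: str, total_bricks: int = 6):
    """Return the brick indices whose bit is set in the given hex bitmask."""
    value = int(hex_mask, 16)
    return [i for i in range(total_bricks) if value & (1 << i)]

# up=3F   -> 0b111111: all six bricks are up
print(decode_mask("3F"))  # [0, 1, 2, 3, 4, 5]
# good=3E -> 0b111110: bricks 1-5 are healthy
print(decode_mask("3E"))  # [1, 2, 3, 4, 5]
# bad=1   -> 0b000001: brick 0 (the one that was offline) failed
print(decode_mask("1"))   # [0]
```

So the sample warning says the operation failed only on brick 0, which is exactly the detail the improved message spells out for the user.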
glusterfs-3.8.10 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-March/000068.html
[2] https://www.gluster.org/pipermail/gluster-users/