Bug 1408361
| Field | Value | Field | Value |
| --- | --- | --- | --- |
| Summary | Warning messages thrown when an offline brick of an EC volume comes back up are difficult for the end user to understand. | | |
| Product | [Red Hat Storage] Red Hat Gluster Storage | Reporter | Byreddy <bsrirama> |
| Component | disperse | Assignee | Sunil Kumar Acharya <sheggodu> |
| Status | CLOSED ERRATA | QA Contact | Nag Pavan Chilakam <nchilaka> |
| Severity | high | Docs Contact | |
| Priority | unspecified | | |
| Version | rhgs-3.2 | CC | amukherj, aspandey, asrivast, rhinduja, rhs-bugs, sheggodu, storage-qa-internal |
| Target Milestone | --- | | |
| Target Release | RHGS 3.3.0 | | |
| Hardware | x86_64 | | |
| OS | Linux | | |
| Whiteboard | | | |
| Fixed In Version | glusterfs-3.8.4-19 | Doc Type | If docs needed, set a value |
| Doc Text | | Story Points | --- |
| Clone Of | | | |
| | 1409202 (view as bug list) | Environment | |
| Last Closed | 2017-09-21 04:30:55 UTC | Type | Bug |
| Regression | --- | Mount Type | --- |
| Documentation | --- | CRM | |
| Verified Versions | | Category | --- |
| oVirt Team | --- | RHEL 7.3 requirements from Atomic Host | |
| Cloudforms Team | --- | Target Upstream Version | |
| Embargoed | | | |
| Bug Depends On | | | |
| Bug Blocks | 1409202, 1414347, 1417147, 1417151, 1427089, 1427419, 1435592 | | |
Description
Byreddy
2016-12-23 05:15:11 UTC
Not a blocker for 3.2; can be deferred.

The fix for the issue is ported to downstream as part of: https://code.engineering.redhat.com/gerrit/#/c/95621/

Comment 6 of BUG 1409202 explains the expected output. I am not seeing any such logs (not even the logs reported as part of the bz). Can you confirm what I am missing; otherwise I may have to fail the bz.

[2017-05-02 09:50:11.000192] W [MSGID: 114031] [client-rpc-fops.c:1102:client3_3_getxattr_cbk] 0-ecv-client-0: remote operation failed. Path: / (00000000-0000-0000-0000-000000000001). Key: trusted.glusterfs.pathinfo [Transport endpoint is not connected]
[2017-05-02 09:50:11.000431] W [MSGID: 114031] [client-rpc-fops.c:1102:client3_3_getxattr_cbk] 0-ecv-client-0: remote operation failed. Path: / (00000000-0000-0000-0000-000000000001). Key: glusterfs.xattrop_index_gfid [Transport endpoint is not connected]
[2017-05-02 09:50:11.000470] W [MSGID: 122032] [ec-heald.c:252:ec_shd_index_sweep] 0-ecv-disperse-0: unable to get index-dir on ecv-client-0
[2017-05-02 09:50:11.005913] W [MSGID: 122035] [ec-common.c:464:ec_child_select] 0-ecv-disperse-0: Executing operation with some subvolumes unavailable (1)
[2017-05-02 09:50:12.038091] W [MSGID: 122006] [ec-combine.c:206:ec_iatt_combine] 0-ecv-disperse-0: Failed to combine iatt (inode: 9530459126316708641-9530459126316708641, links: 1-1, uid: 0-0, gid: 0-0, rdev: 0-0, size: 1883136-1884672, mode: 100644-100644)
[2017-05-02 09:50:12.431567] W [MSGID: 122035] [ec-common.c:464:ec_child_select] 0-ecv-disperse-0: Executing operation with some subvolumes unavailable (1)
[2017-05-02 09:50:14.830091] W [glusterfsd.c:1288:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dc5) [0x7f561366bdc5] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x7f5614d01ef5] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x7f5614d01d6b] ) 0-: received signum (15), shutting down
[2017-05-02 09:51:19.007132] W [MSGID: 122006] [ec-combine.c:206:ec_iatt_combine] 0-ecv-disperse-0: Failed to combine iatt (inode: 12856018975539067138-12856018975539067138, links: 1-1, uid: 0-0, gid: 0-0, rdev: 0-0, size: 872448-871936, mode: 100644-100644)
[2017-05-02 09:51:19.007270] W [MSGID: 122006] [ec-combine.c:206:ec_iatt_combine] 0-ecv-disperse-0: Failed to combine iatt (inode: 12856018975539067138-12856018975539067138, links: 1-1, uid: 0-0, gid: 0-0, rdev: 0-0, size: 872448-871936, mode: 100644-100644)
[2017-05-02 09:51:19.007704] W [MSGID: 122006] [ec-combine.c:206:ec_iatt_combine] 0-ecv-disperse-0: Failed to combine iatt (inode: 12856018975539067138-12856018975539067138, link

I am not sure what was missed, but I am able to notice the message in the mount log by following the steps given above for recreating the issue.
*************************
/var/log/glusterfs/LAB-fuse_mounts-mount.log:[2017-05-02 10:50:17.040814] W [MSGID: 122053] [ec-common.c:161:ec_check_status] 0-ec-vol-disperse-0: Operation failed on 1 of 3 subvolumes.(up=111, mask=011, remaining=000, good=011, bad=100)
/var/log/glusterfs/LAB-fuse_mounts-mount.log:The message "W [MSGID: 122053] [ec-common.c:161:ec_check_status] 0-ec-vol-disperse-0: Operation failed on 1 of 3 subvolumes.(up=111, mask=011, remaining=000, good=011, bad=100)" repeated 27 times between [2017-05-02 10:50:17.040814] and [2017-05-02 10:50:17.263437]
*************************

on_qa validation: Moving to verified, as the fix works.

Note: The logs are seen in the mount logs rather than the shd logs, unlike what was mentioned in the description details of this bz (https://bugzilla.redhat.com/show_bug.cgi?id=1408361#c0).

Before the fix (checked on the 3.2 GA build, 3.8.4-18):

[2017-05-02 11:56:33.808629] W [MSGID: 122053] [ec-common.c:154:ec_check_status] 0-ec-disperse-0: Operation failed on some subvolumes (up=3F, mask=3E, remaining=0, good=3E, bad=1)
[2017-05-02 11:56:35.324088] I [MSGID: 122058] [ec-heal.c:2380:ec_heal_do] 0-ec-disperse-0: /dir1/file.32: name heal successful on 3F
The message "I [MSGID: 122058] [ec-heal.c:2380:ec_heal_do] 0-ec-disperse-0: /dir1/file.32: name heal successful on 3F" repeated 68 times between [2017-05-02 11:56:35.324088] and [2017-05-02 11:56:36.178563]
[2017-05-02 11:56:36.180624] W [MSGID: 122053] [ec-common.c:154:ec_check_status] 0-ec-disperse-0: Operation failed on some subvolumes (up=3F, mask=3E, remaining=0, good=3E, bad=1)
[2017-05-02 11:56:36.193575] I [MSGID: 122058] [ec-heal.c:2380:ec_heal_do] 0-ec-disperse-0: /dir1/file.32: name heal successful on 3F

With the fix (3.8.4-24), observed the logs to be more meaningful, as mentioned in comment#9:
==================
2: name heal successful on 3FF
The message "I [MSGID: 122058] [ec-heal.c:2380:ec_heal_do] 0-ecv-disperse-0: /dir3/filo.22: name heal successful on 3FF" repeated 4 times between [2017-05-02 09:49:04.591874] and [2017-05-02 09:49:04.654420]
[2017-05-02 09:49:04.665916] W [MSGID: 122006] [ec-combine.c:206:ec_iatt_combine] 0-ecv-disperse-0: Failed to combine iatt (inode: 12307727039240654507-12307727039240654507, links: 1-1, uid: 0-0, gid: 0-0, rdev: 0-0, size: 0-869376, mode: 100644-100644)
[2017-05-02 09:49:04.666009] N [MSGID: 122029] [ec-combine.c:93:ec_combine_write] 0-ecv-disperse-0: Mismatching iatt in answers of 'WRITE'
[2017-05-02 09:49:04.666481] W [MSGID: 122053] [ec-common.c:161:ec_check_status] 0-ecv-disperse-0: Operation failed on 1 of 10 subvolumes.(up=1111111111, mask=1111111111, remaining=0000000000, good=1111111110, bad=0000000001)
[2017-05-02 09:49:04.668391] I [MSGID: 122058] [ec-heal.c:2380:ec_heal_do] 0-ecv-disperse-0: /dir3/filo.22: name heal successful on 3FF
The message "I [MSGID: 122058] [ec-heal.c:2380:ec_heal_do] 0-ecv-disperse-0: /dir3/filo.22: name heal successful on 3FF" repeated 108 times between [2017-05-02 09:49:04.668391] and [2017-05-02 09:49:06.947593]
[2017-05-02 09:49:06.960188] W [MSGID: 122053] [ec-common.c:161:ec_check_status] 0-ecv-disperse-0: Operation failed on 1 of 10 subvolumes.(up=1111111111, mask=1111111110, remaining=0000000000, good=1111111110, bad=0000000001)
[2017-05-02 09:49:06.992557] I [MSGID: 122058] [ec-heal.c:2380:ec_heal_do] 0-ecv-disperse-0: /dir3/filo.22: name heal successful on 3FF
The message "I [MSGID: 122058] [ec-heal.c:2380:ec_heal_do] 0-ecv-disperse-0: /dir3/filo.22: name heal successful on 3FF" repeated 3 times between [2017-05-02 09:49:06.992557] and [2017-05-02 09:49:07.057389]
[2017-05-02 09:49:07.080095] W [MSGID: 122053] [ec-common.c:161:ec_check_status] 0-ecv-disperse-0: Operation failed on 1 of 10 subvolumes.(up=1111111111, mask=1111111110, remaining=0000000000, good=1111111110, bad=0000000001)
[2017-05-02 09:49:07.083458] I [MSGID: 122058] [ec-heal.c:2380:ec_heal_do] 0-ecv-disperse-0: /dir3/filo.22: name heal successful on 3FF
The message "W [MSGID: 122035] [ec-common.c:464:ec_child_select] 0-ecv-disperse-0: Executing operation with some subvolumes unavailable (1)" repeated 61 times between [2017-05-02 09:48:25.989120] and [2017-05-02 09:49:03.021721]
The message "I [MSGID: 122058] [ec-heal.c:2380:ec_heal_do] 0-ecv-disperse-0: /dir3/filo.22: name heal successful on 3FF" repeated 15 times between [2017-05-02 09:49:07.083458] and [2017-05-02 09:49:07.401948]

*** Bug 1435592 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2774
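For readers comparing the before/after messages above, here is a minimal illustrative sketch of the idea behind the improved format. It is not the actual ec-common.c code; the helper names, the 6-brick example values, and the assumption that each mask is a bitmap with one bit per subvolume are mine. It simply shows how a pre-fix hex mask such as 3E can be expanded into the per-subvolume binary fields and the explicit "failed on X of N" count used by the post-fix message:

```c
/*
 * Minimal illustrative sketch -- NOT the actual ec-common.c implementation.
 * Demonstrates the idea behind the improved message: each mask is assumed to
 * be a bitmap with one bit per subvolume, and the fix prints the masks in
 * binary plus an explicit "failed on X of N" count instead of raw hex.
 */
#include <stdio.h>
#include <stdint.h>

/* Render the lowest 'total' bits of 'mask' as a binary string,
 * highest-numbered subvolume first, e.g. 0x3E with total = 6 -> "111110". */
static void mask_to_bin(uintptr_t mask, int total, char *out)
{
    for (int i = 0; i < total; i++)
        out[i] = (mask & ((uintptr_t)1 << (total - 1 - i))) ? '1' : '0';
    out[total] = '\0';
}

/* Count how many subvolumes are flagged in 'mask'. */
static int count_bits(uintptr_t mask, int total)
{
    int n = 0;
    for (int i = 0; i < total; i++)
        if (mask & ((uintptr_t)1 << i))
            n++;
    return n;
}

int main(void)
{
    /* Example values taken from the pre-fix log line:
     * up=3F, mask=3E, remaining=0, good=3E, bad=1 on a 6-brick disperse set. */
    int total = 6;
    uintptr_t up = 0x3F, mask = 0x3E, remaining = 0x0, good = 0x3E, bad = 0x1;
    char up_s[65], mask_s[65], rem_s[65], good_s[65], bad_s[65];

    mask_to_bin(up, total, up_s);
    mask_to_bin(mask, total, mask_s);
    mask_to_bin(remaining, total, rem_s);
    mask_to_bin(good, total, good_s);
    mask_to_bin(bad, total, bad_s);

    printf("Operation failed on %d of %d subvolumes."
           "(up=%s, mask=%s, remaining=%s, good=%s, bad=%s)\n",
           count_bits(bad, total), total, up_s, mask_s, rem_s, good_s, bad_s);
    return 0;
}
```

With this rendering (the bit ordering is an assumption), the example prints "Operation failed on 1 of 6 subvolumes.(up=111111, mask=111110, remaining=000000, good=111110, bad=000001)", so the bad field points directly at the failing brick without decoding hex masks by hand, which is the readability improvement this bug tracked.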