Bug 1609451
| Summary: | [Tracker for gluster bug 1609450] Bricks are marked as down, after node reboot | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | SATHEESARAN <sasundar> |
| Component: | rhhi | Assignee: | Gobinda Das <godas> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | SATHEESARAN <sasundar> |
| Severity: | high | Docs Contact: | |
| Priority: | medium | | |
| Version: | rhgs-3.3 | CC: | guillaume.pavese, rhs-bugs, storage-qa-internal, vbellur |
| Target Milestone: | --- | Keywords: | Reopened, Tracking |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | When a node rebooted, including as part of upgrades or updates, subsequent runs of `gluster volume status` sometimes incorrectly reported that bricks were not running, even when the relevant `glusterfsd` processes were running as expected. State is now reported correctly in these circumstances. | | |
| Story Points: | --- | | |
| Clone Of: | 1609450 | Environment: | RHHI |
| Last Closed: | 2020-08-17 15:16:31 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1609450, 1758438 | | |
| Bug Blocks: | 1548985 | | |
| Attachments: | | | |
Description SATHEESARAN 2018-07-28 01:27:20 UTC
I have a suspicion around the following messages in glusterd logs:

<snip>
[2018-07-27 15:49:12.961132] I [MSGID: 106493] [glusterd-rpc-ops.c:693:__glusterd_friend_update_cbk] 0-management: Received ACC from uuid: 6715c775-6021-4f21-a669-83bee56e55c5
[2018-07-27 15:49:12.967504] I [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2018-07-27 15:49:12.972686] I [MSGID: 106005] [glusterd-handler.c:6122:__glusterd_brick_rpc_notify] 0-management: Brick rhsqa-grafton12.lab.eng.blr.redhat.com:/gluster_bricks/data/data has disconnected from glusterd.
[2018-07-27 15:49:12.980700] I [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2018-07-27 15:49:12.986954] I [MSGID: 106005] [glusterd-handler.c:6122:__glusterd_brick_rpc_notify] 0-management: Brick rhsqa-grafton12.lab.eng.blr.redhat.com:/gluster_bricks/engine/engine has disconnected from glusterd.
[2018-07-27 15:49:12.993857] I [socket.c:2465:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2018-07-27 15:49:13.000230] I [MSGID: 106005] [glusterd-handler.c:6122:__glusterd_brick_rpc_notify] 0-management: Brick rhsqa-grafton12.lab.eng.blr.redhat.com:/gluster_bricks/vmstore/vmstore has disconnected from glusterd.
</snip>

Closed as per the status of the tracking bug.

I have hit the same issue while upgrading from RHHI-V 1.1 to RHHI-V 1.5:

RHHI-V 1.1 - glusterfs-3.8.4-15.8.el7rhgs
RHHI-V 1.5 - glusterfs-3.12.2-25.el7rhgs

Post upgrade, the RHVH node was rebooted. When the node came up, I issued `gluster volume status` and noticed that the bricks were reported as down, but on investigating the brick processes, they were up and running. So re-opening the bug.

Created attachment 1497833 [details]
glusterd.log
Attaching the glusterd.log from the host (RHVH node) which was upgraded and rebooted
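
As a side note, a minimal shell sketch (not part of the original report; the volume name and brick path are illustrative, taken from the log snippet above) of how the reported brick state can be cross-checked against the actual brick processes:

```
# Ask glusterd what it thinks the brick state is (the 'data' volume is
# used here purely as an example).
gluster volume status data

# Independently list the running brick processes on this node; a glusterfsd
# whose command line contains the brick path means the brick itself is up,
# even if 'gluster volume status' reports it as down.
pgrep -af glusterfsd | grep '/gluster_bricks/data/data'
```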
This issue is to be documented as a known_issue.

Workaround: Restart glusterd on the RHVH node after the upgrade/update and reboot.

Adding this as a known_issue for RHHI-V 1.6.

(In reply to SATHEESARAN from comment #8)
> Adding this as a known_issue for RHHI-V 1.6

Why? You could not reproduce it (https://bugzilla.redhat.com/show_bug.cgi?id=1609450#c28) - please close both.

(In reply to Yaniv Kaul from comment #9)
> (In reply to SATHEESARAN from comment #8)
> > Adding this as a known_issue for RHHI-V 1.6
>
> Why? You could not reproduce it
> (https://bugzilla.redhat.com/show_bug.cgi?id=1609450#c28) - please close
> both.

Yaniv, I think you have misunderstood the issue. It is only with DEBUG logs enabled that I could not hit it. The issue is still seen, particularly during upgrade/update: the brick process is not shown as running in `gluster volume status`, but it is still running on the node.

@Laura, doc_text looks good. I have edited the text to clarify that this issue is hit only sometimes, not every time.

This issue is no longer seen with RHHI-V 1.8 with RHV 4.4.
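
A minimal sketch of the workaround described above, assuming a systemd-managed glusterd as on RHVH/RHGS nodes (this is an illustration, not a command sequence from the original report):

```
# After the upgrade/update and reboot, restart glusterd on the affected
# RHVH node, then re-check the brick state it reports.
systemctl restart glusterd
gluster volume status
```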