Bug 1836233
Summary: | RHHI-V 1.6: one host becomes non-operational | |
---|---|---|---
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | SATHEESARAN <sasundar> |
Component: | sharding | Assignee: | Krutika Dhananjay <kdhananj> |
Status: | CLOSED ERRATA | QA Contact: | SATHEESARAN <sasundar> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | rhgs-3.5 | CC: | atoborek, dberry, kdhananj, mpandey, pprakash, puebele, rafrojas, rhs-bugs, rkothiya, sheggodu, storage-qa-internal |
Target Milestone: | --- | Keywords: | ZStream |
Target Release: | RHGS 3.5.z Batch Update 2 | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | glusterfs-6.0-35 | Doc Type: | No Doc Update |
Doc Text: | Story Points: | --- | |
Clone Of: | 1815192 | Environment: | |
Last Closed: | 2020-06-16 06:20:09 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1815192 |
Description SATHEESARAN 2020-05-15 13:00:04 UTC
There are two parts to this issue:

1. The crash itself, which is fixed by https://review.gluster.org/#/c/glusterfs/+/24244/. This is a simple null-dereference bug, and the patch is non-intrusive (see the sketch at the end of this report for the general pattern).
2. The source of the ESTALE error itself, which triggers the error logging in shard where the crash happens. The ESTALE issue is seen only in distributed-replicate RHHI volumes and is being tracked separately at https://bugzilla.redhat.com/show_bug.cgi?id=1835180 as a bug in DHT, the source of the ESTALE errors.

The combined effect is that as long as the ESTALE error persists, the mount process keeps crashing in the same place every time, even after restarts. So far two customers have hit this bug.

The fix for 1) has been merged upstream. If we at least take the fix for 1), customers won't see a service disruption in terms of mounts crashing; the fuse mount point will continue to operate without interruption. Background deletion will keep failing with ESTALE, but that is acceptable since it is a background job.

-Krutika

Verified with RHGS 3.5.2 (glusterfs-6.0-37.el8rhgs) and RHVH 4.4.1:

1. Created an RHHI-V deployment with a glusterfs storage domain for hosted-engine storage
2. Created a few more gluster storage domains backed by 2x3 gluster replicate volumes
3. Ran various workloads and VM life-cycle operations
4. Added more bricks and expanded the volume to 3x3
5. Triggered rebalance
6. Removed the set of bricks

With all these operations, no errors were seen.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2572

*** Bug 1851526 has been marked as a duplicate of this bug. ***

*** Bug 2128703 has been marked as a duplicate of this bug. ***
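For context on what the null-dereference fix amounts to, below is a minimal standalone sketch of the defensive pattern: when an error callback fires (e.g. on ESTALE from background shard deletion), guard against a missing context before dereferencing it for logging. This is an illustration under assumed, hypothetical names (shard_ctx, shard_log_failure), not the actual glusterfs shard code or the upstream patch.

```c
/*
 * Hypothetical sketch of the defensive pattern behind the crash fix.
 * Names and structures are illustrative only, not the real shard xlator.
 */
#include <stdio.h>
#include <errno.h>

struct shard_ctx {
    const char *base_gfid;   /* identifier used only for log messages */
};

static void
shard_log_failure(struct shard_ctx *ctx, int op_errno)
{
    /* The reported crash was a plain NULL dereference in an error-logging
     * path; the fix amounts to bailing out (logging less detail) when the
     * context is absent instead of dereferencing it unconditionally. */
    if (!ctx) {
        fprintf(stderr, "shard: operation failed (errno=%d), no context\n",
                op_errno);
        return;
    }
    fprintf(stderr, "shard: failure on base file %s (errno=%d)\n",
            ctx->base_gfid, op_errno);
}

int
main(void)
{
    /* Simulate the error path with and without a valid context. */
    struct shard_ctx ctx = { .base_gfid = "example-gfid" };

    shard_log_failure(&ctx, ESTALE);
    shard_log_failure(NULL, ESTALE);   /* the unguarded version of this
                                          pattern is what crashed */
    return 0;
}
```

With such a guard in place, a persistent ESTALE from the DHT side (bug 1835180) still makes background deletion fail, but it no longer brings down the fuse mount process.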