Bug 1877413
Summary: | ceph: problems with clusters containing nodes on s390x for some specific configurations and workloads | |||
---|---|---|---|---|
Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | eshiskki | |
Component: | RADOS | Assignee: | Neha Ojha <nojha> | |
Status: | CLOSED ERRATA | QA Contact: | Manohar Murthy <mmurthy> | |
Severity: | high | Docs Contact: | ||
Priority: | high | |||
Version: | 4.2 | CC: | akupczyk, asakthiv, bhubbard, bniver, ceph-eng-bugs, ceph-qe-bugs, dzafman, eshiskki, gmeno, hannsj_uhl, kchai, kdreyer, madam, nojha, ocs-bugs, pdonnell, rzarzyns, sseshasa, tserlin, uweigand, vereddy | |
Target Milestone: | --- | |||
Target Release: | 4.2 | |||
Hardware: | s390x | |||
OS: | Linux | |||
Whiteboard: | ||||
Fixed In Version: | ceph-14.2.11-67.el8cp, ceph-14.2.11-67.el7cp | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 1895040 | Environment: | |
Last Closed: | 2021-01-12 14:57:01 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1895040 |
Description
eshiskki
2020-09-09 15:08:51 UTC
The root cause of the problems was identified as a number of endian bugs during Ceph message encoding/decoding that were causing various issues on IBM Z. Proposed fixups were sent to upstream. The related upstream pull requests are:

https://github.com/ceph/ceph/pull/35920 (msg/msg_types: entity_addrvec_t: fix decode on big-endian hosts)
This is probably not critical on its own, but it did cause ceph unit test failures.

https://github.com/ceph/ceph/pull/36697 (messages,mds: Fix decoding of enum types on big-endian systems)
This showed up as MON daemons crashing whenever they receive a HEALTH_WARN message (and probably other problems).

https://github.com/ceph/ceph/pull/36992 (include/encoding: Fix encode/decode of float types on big-endian systems)
This causes immediate crashes of an IBM Z OSD attempting to join an x86 based Ceph cluster (and probably other problems).

Does not seem to be an OCS bug. Moving to RHCS.

Eduard please follow these instructions to get this in the product repo https://mojo.redhat.com/docs/DOC-1221008 let me know if you run into trouble

*** Bug 1890640 has been marked as a duplicate of this bug. ***

ok noted, removing this bz from the release notes.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat Ceph Storage 4.2 Security and Bug Fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:0081
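
For readers unfamiliar with this class of bug, the float encode/decode problem named in pull request 36992 can be illustrated with a minimal sketch. This is not the Ceph code and does not reproduce the actual patch; it assumes a little-endian wire format, and the function names (`encode_float_native_be`, `encode_float_wire`, `decode_float_wire`) are hypothetical. It only shows what goes wrong when a big-endian host writes a float's in-memory bytes without converting them to the wire byte order.

```python
import struct


def encode_float_native_be(value: float) -> bytes:
    """Simulate a big-endian host copying its in-memory double straight to the wire."""
    return struct.pack(">d", value)


def encode_float_wire(value: float) -> bytes:
    """Encode a double in an explicit little-endian wire format (assumed here)."""
    return struct.pack("<d", value)


def decode_float_wire(buf: bytes) -> float:
    """Decode a double from the assumed little-endian wire format."""
    return struct.unpack("<d", buf)[0]


value = 0.85

# Correct path: both sides agree on the wire byte order, so the value round-trips.
good = decode_float_wire(encode_float_wire(value))

# Buggy path: the sender skipped the byte swap, so the receiver reads the
# bytes in reverse order and reconstructs an unrelated number.
bad = decode_float_wire(encode_float_native_be(value))

print("round-trip with explicit byte order:", good)
print("round-trip with missing byte swap:  ", bad)
```

On a little-endian host such as x86 the missing swap goes unnoticed because the native and wire byte orders coincide, which is consistent with the problems in this report appearing only once s390x nodes joined the cluster.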