Bug 1870631
| Summary: | OCS 4.6 Deployment : RGW pods went into 'CrashLoopBackOff' state on Z Platform | ||
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Container Storage | Reporter: | Venkat <vpiniset> |
| Component: | ceph | Assignee: | Matt Benjamin (redhat) <mbenjamin> |
| Status: | CLOSED ERRATA | QA Contact: | Raz Tamir <ratamir> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 4.6 | CC: | bniver, ebenahar, jthottan, kdreyer, madam, muagarwa, ocs-bugs, sostapov, tdesala, uweigand |
| Target Milestone: | --- | Keywords: | AutomationBackLog |
| Target Release: | OCS 4.6.0 | Flags: | mbenjamin: needinfo-, mbenjamin: needinfo- |
| Hardware: | s390x | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | No Doc Update | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2020-12-17 06:23:47 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Venkat
2020-08-20 13:41:05 UTC
Mark Kogan <mkogan> Aug 18, 2020, 5:55 PM (2 days ago)
to Yaniv, me, rhocs-eng, poornima.nayak, Chidanand, Ulrich, OCS-QE, Nourhane, Matt

Hello,

From the log:
"debug 2020-08-18 07:22:12.763 3ffbb8df110 0 WARNING: skipping unknown framework: beast"

When last checked, the Boost library version that we used (1.67) did not support the Boost.Context library on the Z platform, thus s390x builds are configured with the -DWITH_BOOST_CONTEXT=OFF CMake flag, which disables the beast frontend.
Please check that changing the rook-ceph-rgw* pods configuration to use the civetweb frontend resolves the issue.

Following this mail, re-checked the current status of this limitation and the circumstances have changed,
ceph version 14.2.8-91.el8cp was updated to build with a newer version of the boost library - boost 1.72
which per the boost context library documentation[1] has added support for s390x architecture.

If it's possible to arrange access to an s390x development VM (with RHEL or Fedora) for vstart environment,
would re-test the Beast framework for compilation and functional issues with boost 1.72 on the Z platform.

[1] https://www.boost.org/doc/libs/1_72_0/libs/context/doc/html/context/architectures.html

Regards,
Mark

Ulrich Weigand Aug 18, 2020, 9:02 PM (2 days ago)
to Ken, Mark, Chidanand, Matt, Nourhane, OCS-QE, poornima.nayak, rhocs-eng, me, Yaniv

Mark Kogan <mkogan> wrote on 18.08.2020 14:25:12:

> From the log:
> "debug 2020-08-18 07:22:12.763 3ffbb8df110 0 WARNING: skipping
> unknown framework: beast"

[snip]

> Following this mail, re-checked the current status of this
> limitation and the circumstances have changed,
> ceph version 14.2.8-91.el8cp was updated to build with a newer
> version of the boost library - boost 1.72
> which per the boost context library documentation[1] has added
> support for s390x architecture.

Turns out the support in 1.72 was incomplete; we've added full support in 1.73.
But for the RH Ceph builds, we provided a backport of the necessary changes as a patch against 1.72, which I understand should have made boost context (and therefore the beast frontend) work properly on Z. Ken Dreyer worked on integrating this into the latest Ceph builds.

Ken, is this supposed to be working now?

> If it's possible to arrange access to an s390x development VM (with
> RHEL or Fedora) for vstart environment,
> would re-test the Beast framework for compilation and functional
> issues with boost 1.72 on the Z platform.

There is supposed to be a dev environment available to Christina Meno's team, but I'm not sure this is already fully set up ...

Bye,
Ulrich

Elad, is this really a blocker? Is it consistently seen, or is QE blocked because of this?
There is even a workaround available, if I am not wrong, and this seems to be more of a retest issue.
>> Please check that changing the rook-ceph-rgw* pods configuration to use the civetweb frontend resolves the issue.
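Before trying the civetweb workaround, it can help to confirm that a crashing RGW pod is actually hitting the "skipping unknown framework" warning quoted above. The following is a minimal sketch, not part of the original report: the function name `skipped_frameworks` is hypothetical, and it only assumes the log line has the same shape as the one in this bug.

```shell
#!/bin/sh
# Hypothetical helper: read RGW log text on stdin and print the name of each
# frontend framework the radosgw binary reported as unknown, one per line.
# It matches lines shaped like the one quoted in this bug:
#   "... WARNING: skipping unknown framework: beast"
skipped_frameworks() {
    sed -n 's/.*WARNING: skipping unknown framework: \([A-Za-z0-9_]*\).*/\1/p' | sort -u
}

# Example use against a pod in CrashLoopBackOff (the pod name is a placeholder;
# --previous asks for the log of the last terminated container):
#   oc logs --previous <rgw-pod-name> | skipped_frameworks
```

If this prints `beast`, the Ceph build in the image was most likely compiled without the beast frontend, which is the condition discussed in the mail thread.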
Hi Scott,
Can someone help us in determining the correct build and therefore the appropriate bug state and release?
Thanks

Mudit, OCS QE are not the ones who actively test over IBM Z. I added the blocker? flag so this bug will not be pushed out to 4.7, in order to allow the IBM team to have a successful deployment in their test executions.

Apologies for the delay. I've tested with the latest build and the issue is not hitting now.

[root@ocsvm2 ~]# oc get csv
NAME                         DISPLAY                       VERSION        REPLACES   PHASE
ocs-operator.v4.6.0-585.ci   OpenShift Container Storage   4.6.0-585.ci              Succeeded
[root@ocsvm2 ~]# oc get pods|grep rgw
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-78d6f7dw9vbd   1/1   Running   0   31h
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-b-769755drwjtb   1/1   Running   0   31h
[root@ocsvm2 ~]#

Clearing the needinfo.

Providing dev_ack based on https://bugzilla.redhat.com/show_bug.cgi?id=1870631#c8; there is no fix from the OCS side though.

Thanks Matt, I think it is clear now. OCS 4.6 is already based on RHCS 4.1z2, which means this issue should have been fixed by now, and that is what is reflected in Venkat's update.
Moving the BZ to ON_QA; QE can mark it VERIFIED.

I've verified this on the latest version of OCS 4.6 and it's fixed now. Hence, this can be closed.

[root@ocplnx31 ~]# oc get csv
NAME                         DISPLAY                       VERSION        REPLACES   PHASE
ocs-operator.v4.6.0-607.ci   OpenShift Container Storage   4.6.0-607.ci              Succeeded
[root@ocplnx31 ~]# oc get pods|grep rgw
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-59cbf87jrcn7   1/1   Running   0   4m49s
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-b-7d6fdfdjvqxg   1/1   Running   0   4m49s
[root@ocplnx31 ~]# oc version
Client Version: 4.5.16
Server Version: 4.5.15
Kubernetes Version: v1.18.3+2fbd7c7
[root@ocplnx31 ~]#

Thank you Venkat. Moving this BZ to verified state based on Comment20.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenShift Container Storage 4.6.0 security, bug fix, enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5605
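
The verification in the comments above was done by eye with `oc get pods|grep rgw`. For repeated retests, the same check can be scripted. This is only a sketch under stated assumptions: the function name `rgw_not_running` is hypothetical, and it assumes the default `oc get pods` column layout (NAME, READY, STATUS, RESTARTS, AGE) and the `rook-ceph-rgw-` pod name prefix seen in this report.

```shell
#!/bin/sh
# Hypothetical helper: read `oc get pods` output on stdin and print the name
# and status of every rook-ceph-rgw pod whose STATUS column is not "Running".
# Exit status is 0 only if at least one RGW pod was seen and all are Running.
rgw_not_running() {
    awk '$1 ~ /^rook-ceph-rgw-/ {
             seen++
             if ($3 != "Running") { print $1, $3; bad = 1 }
         }
         END { exit (bad || seen == 0) }'
}

# Example use, run from the project that holds the OCS pods:
#   oc get pods | rgw_not_running && echo "all RGW pods are Running"
```

Treating "no RGW pods found" as a failure is a design choice made here so that a wrong project or an incomplete deployment is not reported as success.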