Bug 2061675
| Summary: | [IBM Z][External Mode] - segfault in rook objectstore controller when in external mode (ocs-ci tier1 execution) | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | Abdul Kandathil (IBM) <akandath> |
| Component: | rook | Assignee: | Blaine Gardner <brgardne> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Abdul Kandathil (IBM) <akandath> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.10 | CC: | madam, mmuench, muagarwa, ocs-bugs, odf-bz-bot, sostapov, vavuthu |
| Target Milestone: | --- | | |
| Target Release: | ODF 4.10.0 | | |
| Hardware: | s390x | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | 4.10.0-189 | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-04-21 09:12:52 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Abdul Kandathil (IBM)
2022-03-08 09:55:37 UTC
ODF cluster status after the ocs-ci tier1 execution:
--------
[root@m4204001 ~]# oc -n openshift-storage get pod
NAME READY STATUS RESTARTS AGE
csi-addons-controller-manager-59588d8bc6-xhfg4 2/2 Running 0 17h
csi-cephfsplugin-87bh5 3/3 Running 0 17h
csi-cephfsplugin-jpdzv 3/3 Running 0 17h
csi-cephfsplugin-provisioner-6547889564-58bts 6/6 Running 0 11h
csi-cephfsplugin-provisioner-6547889564-v4hgw 6/6 Running 0 17h
csi-cephfsplugin-wpxxq 3/3 Running 0 17h
csi-rbdplugin-hg9ml 4/4 Running 0 17h
csi-rbdplugin-provisioner-6f4cd57fcb-t4c47 7/7 Running 0 17h
csi-rbdplugin-provisioner-6f4cd57fcb-xdhgj 7/7 Running 0 17h
csi-rbdplugin-s8mc6 4/4 Running 0 17h
csi-rbdplugin-vv42j 4/4 Running 0 17h
noobaa-core-0 1/1 Running 0 11h
noobaa-db-pg-0 1/1 Running 0 17h
noobaa-endpoint-5f946897b6-bqhkv 1/1 Running 0 11h
noobaa-endpoint-5f946897b6-ztzzj 1/1 Running 0 17h
noobaa-operator-86dc95f87c-tdnp2 1/1 Running 0 11h
ocs-metrics-exporter-b76b778f5-snvk9 1/1 Running 0 11h
ocs-operator-7445966997-pv79k 1/1 Running 0 17h
odf-console-759876895-gl9k9 1/1 Running 0 17h
odf-operator-controller-manager-5fcb6d85cc-2rrft 2/2 Running 0 17h
rook-ceph-operator-6f87b7f4d8-2jzr4 0/1 CrashLoopBackOff 143 (4m25s ago) 11h
rook-ceph-tools-external-7b6558594f-4cqkp 1/1 Running 0 11h
[root@m4204001 ~]#
[root@m4204001 ~]# oc -n openshift-storage get cephcluster
NAME DATADIRHOSTPATH MONCOUNT AGE PHASE MESSAGE HEALTH EXTERNAL
ocs-external-storagecluster-cephcluster 17h Connecting Attempting to connect to an external Ceph cluster HEALTH_OK true
[root@m4204001 ~]#
[root@m4204001 ~]# oc -n openshift-storage get sc
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
ocs-external-storagecluster-ceph-rbd openshift-storage.rbd.csi.ceph.com Delete Immediate true 17h
ocs-external-storagecluster-ceph-rgw openshift-storage.ceph.rook.io/bucket Delete Immediate false 17h
ocs-external-storagecluster-cephfs openshift-storage.cephfs.csi.ceph.com Delete Immediate true 17h
openshift-storage.noobaa.io openshift-storage.noobaa.io/obc Delete Immediate false 17h
[root@m4204001 ~]#
[root@m4204001 ~]# oc -n openshift-storage get csv
NAME DISPLAY VERSION REPLACES PHASE
mcg-operator.v4.10.0 NooBaa Operator 4.10.0 Succeeded
ocs-operator.v4.10.0 OpenShift Container Storage 4.10.0 Installing
odf-csi-addons-operator.v4.10.0 CSI Addons 4.10.0 Succeeded
odf-operator.v4.10.0 OpenShift Data Foundation 4.10.0 Succeeded
[root@m4204001 ~]#
--------
RHCS cluster status:
--------
[root@xzkvm01 ~]# ceph -s
cluster:
id: bdabda3c-9b00-11ec-9831-525400e56e5d
health: HEALTH_OK
services:
mon: 3 daemons, quorum xzkvm01,xzkvm02,xzkvm03 (age 4d)
mgr: xzkvm01.zblcqg(active, since 4d), standbys: xzkvm02.wippoc
mds: 1/1 daemons up, 2 standby
osd: 3 osds: 3 up (since 4d), 3 in (since 4d)
rgw: 4 daemons active (2 hosts, 1 zones)
data:
volumes: 1/1 healthy
pools: 10 pools, 241 pgs
objects: 14.69k objects, 53 GiB
usage: 61 GiB used, 539 GiB / 600 GiB avail
pgs: 241 active+clean
io:
client: 5.3 KiB/s wr, 0 op/s rd, 0 op/s wr
[root@xzkvm01 ~]#
--------
The operator log shows the following stack:
--------
2022-03-08T08:48:40.962801533Z panic: runtime error: invalid memory address or nil pointer dereference
2022-03-08T08:48:40.962801533Z [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xf81faa]
2022-03-08T08:48:40.962801533Z
2022-03-08T08:48:40.962801533Z goroutine 1077 [running]:
2022-03-08T08:48:40.962801533Z github.com/rook/rook/pkg/apis/ceph.rook.io/v1.(*CephObjectStore).GetObjectKind(0x0, 0x0, 0x0)
2022-03-08T08:48:40.962885446Z <autogenerated>:1 +0xa
2022-03-08T08:48:40.962885446Z github.com/rook/rook/pkg/operator/ceph/reporting.ReportReconcileResult(0xc0001821e0, 0x1de5be0, 0xc000e09e00, 0x1e1c010, 0x0, 0xc00086eed0, 0x0, 0x1dacdd8, 0xc0006ed6c8, 0xc0006ed6c8, ...)
2022-03-08T08:48:40.962885446Z /remote-source/rook/app/pkg/operator/ceph/reporting/reporting.go:46 +0x3c
2022-03-08T08:48:40.962885446Z github.com/rook/rook/pkg/operator/ceph/object.(*ReconcileCephObjectStore).Reconcile(0xc000969760, 0x1debeb8, 0xc00086eed0, 0xc000a8b8f0, 0x11, 0xc0008a0270, 0x2b, 0xc00086eed0, 0xc00086ee70, 0x30, ...)
2022-03-08T08:48:40.962885446Z /remote-source/rook/app/pkg/operator/ceph/object/controller.go:159 +0xac
2022-03-08T08:48:40.962885446Z sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0xc00069f040, 0x1debeb8, 0xc00086ee70, 0xc000a8b8f0, 0x11, 0xc0008a0270, 0x2b, 0xc00086ee70, 0x0, 0x0, ...)
2022-03-08T08:48:40.962885446Z /remote-source/rook/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:114 +0x220
2022-03-08T08:48:40.962885446Z sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc00069f040, 0x1debe10, 0xc0000d2f80, 0x1863620, 0xc0007522a0)
2022-03-08T08:48:40.962885446Z /remote-source/rook/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:311 +0x29c
2022-03-08T08:48:40.962885446Z sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc00069f040, 0x1debe10, 0xc0000d2f80, 0x0)
2022-03-08T08:48:40.962896763Z /remote-source/rook/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266 +0x206
2022-03-08T08:48:40.962896763Z sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2(0xc000044e90, 0xc00069f040, 0x1debe10, 0xc0000d2f80)
2022-03-08T08:48:40.962905747Z /remote-source/rook/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227 +0x5c
2022-03-08T08:48:40.962905747Z created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2
2022-03-08T08:48:40.962916472Z /remote-source/rook/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:223 +0x3ba
--------

This is coming from the source here: https://github.com/red-hat-storage/rook/blob/release-4.10/pkg/operator/ceph/reporting/reporting.go#L46 (a minimal sketch of this failure mode follows the comments below).

Blaine, could you take a look?

I believe I found the source of the issue and am working on a fix upstream.

PR backport to ODF 4.10 here: https://github.com/red-hat-storage/rook/pull/358

This fix has been verified. Not able to reproduce the issue anymore.

*** Bug 2064763 has been marked as a duplicate of this bug. ***
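For illustration only, here is a minimal Go sketch (not Rook's actual code) of the failure mode in the stack above: the reconciler hands a typed-nil CR pointer to a reporting helper, and invoking a pointer-receiver method such as GetObjectKind on it dereferences nil. The names ObjectStore, GetKind, and reportResult are hypothetical stand-ins for CephObjectStore, GetObjectKind, and ReportReconcileResult.
--------
// failure_mode.go - minimal, self-contained illustration (hypothetical names).
package main

import "fmt"

// ObjectStore stands in for a CRD struct such as CephObjectStore.
type ObjectStore struct {
	Kind string
}

// GetKind dereferences its receiver, as the generated GetObjectKind does.
func (o *ObjectStore) GetKind() string {
	return o.Kind // nil pointer dereference when o == nil
}

// kinder mimics an interface such as runtime.Object.
type kinder interface {
	GetKind() string
}

// reportResult mimics a reporting helper that assumes obj is usable.
func reportResult(obj kinder) {
	// An `obj != nil` check here would not catch the problem: the interface
	// value is non-nil because it still carries the *ObjectStore type, even
	// though the pointer stored inside it is nil.
	fmt.Println("reconciled kind:", obj.GetKind())
}

func main() {
	var store *ObjectStore // typed nil, e.g. when the CR could not be fetched
	reportResult(store)    // panic: runtime error: invalid memory address or nil pointer dereference
}
--------
A fix for this shape of bug would presumably avoid passing a nil object into the reporting helper, or guard against it before calling methods on it; the backport PR linked above carries the actual change made in the object store controller.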