Bug 2075581
| Summary: | [IBM Z] : ODF 4.11.0-38 deployment leaves the storagecluster in "Progressing" state although all the openshift-storage pods are up and Running | ||
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | Sravika <sbalusu> |
| Component: | ocs-operator | Assignee: | Travis Nielsen <tnielsen> |
| Status: | CLOSED ERRATA | QA Contact: | Elad <ebenahar> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 4.11 | CC: | bniver, jarrpa, madam, muagarwa, nberry, nigoyal, ocs-bugs, odf-bz-bot, prsurve, sostapov, svenkat, tmuthami, tnielsen, vavuthu |
| Target Milestone: | --- | ||
| Target Release: | ODF 4.11.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | 4.11.0-63 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2022-08-24 13:51:03 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Nitin, PTAL. The StorageCluster is waiting on NooBaa, as shown in the StorageCluster status:
conditions:
- lastHeartbeatTime: "2022-04-14T15:27:04Z"
lastTransitionTime: "2022-04-14T14:49:18Z"
message: Reconcile completed successfully
reason: ReconcileCompleted
status: "True"
type: ReconcileComplete
- lastHeartbeatTime: "2022-04-14T14:45:51Z"
lastTransitionTime: "2022-04-14T14:45:51Z"
message: Initializing StorageCluster
reason: Init
status: "False"
type: Available
- lastHeartbeatTime: "2022-04-14T15:27:04Z"
lastTransitionTime: "2022-04-14T14:45:51Z"
message: Waiting on Nooba instance to finish initialization
reason: NoobaaInitializing
status: "True"
type: Progressing
- lastHeartbeatTime: "2022-04-14T14:45:51Z"
lastTransitionTime: "2022-04-14T14:45:51Z"
message: Initializing StorageCluster
reason: Init
status: "False"
type: Degraded
- lastHeartbeatTime: "2022-04-14T14:45:51Z"
lastTransitionTime: "2022-04-14T14:45:51Z"
message: Initializing StorageCluster
reason: Init
status: Unknown
type: Upgradeable
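A quick way to watch just the Progressing condition (a sketch; assumes the default resource name ocs-storagecluster used above and standard kubectl jsonpath filtering):

# oc -n openshift-storage get storagecluster ocs-storagecluster -o jsonpath='{.status.conditions[?(@.type=="Progressing")].message}{"\n"}'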
Looking at the NooBaa status, it reports that the object storage is not ready:
conditions:
- lastHeartbeatTime: "2022-04-14T14:49:18Z"
lastTransitionTime: "2022-04-14T14:49:18Z"
message: Ceph objectstore user "noobaa-ceph-objectstore-user" is not ready
reason: TemporaryError
status: "False"
type: Available
- lastHeartbeatTime: "2022-04-14T14:49:18Z"
lastTransitionTime: "2022-04-14T14:49:18Z"
message: Ceph objectstore user "noobaa-ceph-objectstore-user" is not ready
reason: TemporaryError
status: "True"
type: Progressing
- lastHeartbeatTime: "2022-04-14T14:49:18Z"
lastTransitionTime: "2022-04-14T14:49:18Z"
message: Ceph objectstore user "noobaa-ceph-objectstore-user" is not ready
reason: TemporaryError
status: "False"
type: Degraded
- lastHeartbeatTime: "2022-04-14T14:49:18Z"
lastTransitionTime: "2022-04-14T14:49:18Z"
message: Ceph objectstore user "noobaa-ceph-objectstore-user" is not ready
reason: TemporaryError
status: "False"
type: Upgradeable
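Since the NooBaa conditions blame the Ceph objectstore user, that CR can be inspected directly (hedged: this assumes "noobaa-ceph-objectstore-user" is a CephObjectStoreUser in the same namespace and that the NooBaa CR is named "noobaa", as in a default ODF install):

# oc -n openshift-storage get cephobjectstoreuser noobaa-ceph-objectstore-user -o yaml
# oc -n openshift-storage get noobaa noobaa -o jsonpath='{.status.conditions}{"\n"}'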
Looking at the CephObjectStore, the connection is refused:
status:
bucketStatus:
details: 'failed to get details from ceph object user "rook-ceph-internal-s3-user-checker-f770bb8b-fafb-44d7-b909-13581aeb1e46":
Get "https://rook-ceph-rgw-ocs-storagecluster-cephobjectstore.openshift-storage.svc:443/admin/user?display-name=rook-ceph-internal-s3-user-checker-f770bb8b-fafb-44d7-b909-13581aeb1e46&format=json&uid=rook-ceph-internal-s3-user-checker-f770bb8b-fafb-44d7-b909-13581aeb1e46":
dial tcp 172.30.244.15:443: connect: connection refused'
health: Failure
lastChanged: "2022-04-14T15:29:09Z"
lastChecked: "2022-04-14T15:30:10Z"
info:
endpoint: http://rook-ceph-rgw-ocs-storagecluster-cephobjectstore.openshift-storage.svc:80
secureEndpoint: https://rook-ceph-rgw-ocs-storagecluster-cephobjectstore.openshift-storage.svc:443
observedGeneration: 1
phase: Failure
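A "connection refused" against the RGW service ClusterIP usually just means no pod is backing the service; a quick sanity check (assuming the service and label names shown above):

# oc -n openshift-storage get endpoints rook-ceph-rgw-ocs-storagecluster-cephobjectstore
# oc -n openshift-storage get pod -l app=rook-ceph-rgw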
@tnielsen Can someone from Rook please take a look?
The rgw deployment status shows:
message: 'pods "rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-74bcdc586b-"
is forbidden: error looking up service account openshift-storage/rook-ceph-rgw:
serviceaccount "rook-ceph-rgw" not found'
The RGW pod failed to start because the new rook-ceph-rgw service account added in 4.11 is evidently missing from the CSV.
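One way to confirm whether the CSV actually ships that service account (a sketch; assumes the usual OLM CSV layout where service accounts are listed under spec.install.spec.permissions):

# oc -n openshift-storage get csv ocs-operator.v4.11.0 -o jsonpath='{.spec.install.spec.permissions[*].serviceAccountName}{"\n"}'
# oc -n openshift-storage get serviceaccount rook-ceph-rgw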
With the resync to downstream 4.11 in https://github.com/red-hat-storage/rook/pull/370, the next build should pick up the fix to create the rook-ceph-rgw service account with the CSV.

We are still seeing the problem. The RGW pod is not getting created:
[root@nx124-411-592e-sao01-bastion-0 ~]# oc get csv odf-operator.v4.11.0 -n openshift-storage -o yaml | grep "full_version"
full_version: 4.11.0-46
[root@nx124-411-592e-sao01-bastion-0 ~]# oc rsh -n openshift-storage rook-ceph-tools-559b64cbb4-r5swn ceph -s
cluster:
id: 86f69cae-50b3-4f73-b12b-4c30ba08bad9
health: HEALTH_OK
services:
mon: 3 daemons, quorum a,c,b (age 4h)
mgr: a(active, since 4h)
mds: 1/1 daemons up, 1 hot standby
osd: 3 osds: 3 up (since 4h), 3 in (since 4h)
data:
volumes: 1/1 healthy
pools: 11 pools, 177 pgs
objects: 1.15k objects, 3.5 GiB
usage: 11 GiB used, 1.5 TiB / 1.5 TiB avail
pgs: 177 active+clean
io:
client: 853 B/s rd, 10 KiB/s wr, 1 op/s rd, 1 op/s wr
[root@nx124-411-592e-sao01-bastion-0 ~]# oc -n openshift-storage get Pod -n openshift-storage --selector=app=rook-ceph-rgw
No resources found in openshift-storage namespace.
[root@nx124-411-592e-sao01-bastion-0 ~]#
I see 4.11.0-46 was published on 4/20 and the comment by Travis was on the same day; I will look for the next build and check.

Thanks for double-checking the next build to make sure the fix is in... Per discussion in gchat, another fix was needed for the service account generation in the CSV. The follow-up fix is merged downstream now with https://github.com/red-hat-storage/rook/pull/372

We are still waiting for a stable build; the latest build again didn't pass the deployment. Please wait for the next stable build.

[root@nx124-411-94c4-syd04-bastion-0 ~]# oc get csv odf-operator.v4.11.0 -o yaml -n openshift-storage | grep full
full_version: 4.11.0-51
[root@nx124-411-94c4-syd04-bastion-0 ~]# oc -n openshift-storage get Pod -n openshift-storage --selector=app=rook-ceph-rgw
No resources found in openshift-storage namespace.
[root@nx124-411-94c4-syd04-bastion-0 ~]#
Checked with build 51 of 4.11; the RGW pod is still missing.
[root@nx124-411-94c4-syd04-bastion-0 ~]# oc -n openshift-storage get StorageCluster ocs-storagecluster
NAME                 AGE   PHASE   EXTERNAL   CREATED AT             VERSION
ocs-storagecluster   56m   Ready              2022-04-27T16:44:54Z   4.11.0
[root@nx124-411-94c4-syd04-bastion-0 ~]#

But the storage cluster is good, in Ready state.

[root@nx124-411-94c4-syd04-bastion-0 ~]# oc get storagesystem -n openshift-storage
NAME                               STORAGE-SYSTEM-KIND                  STORAGE-SYSTEM-NAME
ocs-storagecluster-storagesystem   storagecluster.ocs.openshift.io/v1   ocs-storagecluster
[root@nx124-411-94c4-syd04-bastion-0 ~]#
[root@nx124-411-94c4-syd04-bastion-0 ~]# oc get cephcluster -n openshift-storage
NAME                             DATADIRHOSTPATH   MONCOUNT   AGE   PHASE   MESSAGE                        HEALTH      EXTERNAL
ocs-storagecluster-cephcluster   /var/lib/rook     3          57m   Ready   Cluster created successfully   HEALTH_OK
[root@nx124-411-94c4-syd04-bastion-0 ~]#

Sridhar, what does the following show?
- oc -n openshift-storage describe deploy <rook-ceph-rgw-deployment>
- oc -n openshift-storage get svc

@tnielsen: Please find the output as follows:
# oc -n openshift-storage describe deploy rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a
Name: rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a
Namespace: openshift-storage
CreationTimestamp: Thu, 28 Apr 2022 11:38:41 +0200
Labels: app=rook-ceph-rgw
app.kubernetes.io/component=cephobjectstores.ceph.rook.io
app.kubernetes.io/created-by=rook-ceph-operator
app.kubernetes.io/instance=ocs-storagecluster-cephobjectstore
app.kubernetes.io/managed-by=rook-ceph-operator
app.kubernetes.io/name=ceph-rgw
app.kubernetes.io/part-of=ocs-storagecluster-cephobjectstore
ceph-version=16.2.7-107
ceph_daemon_id=ocs-storagecluster-cephobjectstore
ceph_daemon_type=rgw
rgw=ocs-storagecluster-cephobjectstore
rook-version=v4.11.0-0.354c987b60b7e13e92ac0d69e8504f3cb6c11279
rook.io/operator-namespace=openshift-storage
rook_cluster=openshift-storage
rook_object_store=ocs-storagecluster-cephobjectstore
Annotations: banzaicloud.com/last-applied:
{"metadata":{"labels":{"app":"rook-ceph-rgw","app.kubernetes.io/component":"cephobjectstores.ceph.rook.io","app.kubernetes.io/created-by":...
deployment.kubernetes.io/revision: 1
Selector: app=rook-ceph-rgw,ceph_daemon_id=ocs-storagecluster-cephobjectstore,rgw=ocs-storagecluster-cephobjectstore,rook_cluster=openshift-storage,rook_object_store=ocs-storagecluster-cephobjectstore
Replicas: 1 desired | 0 updated | 0 total | 0 available | 1 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 1 max unavailable, 0 max surge
Pod Template:
Labels: app=rook-ceph-rgw
app.kubernetes.io/component=cephobjectstores.ceph.rook.io
app.kubernetes.io/created-by=rook-ceph-operator
app.kubernetes.io/instance=ocs-storagecluster-cephobjectstore
app.kubernetes.io/managed-by=rook-ceph-operator
app.kubernetes.io/name=ceph-rgw
app.kubernetes.io/part-of=ocs-storagecluster-cephobjectstore
ceph_daemon_id=ocs-storagecluster-cephobjectstore
ceph_daemon_type=rgw
rgw=ocs-storagecluster-cephobjectstore
rook.io/operator-namespace=openshift-storage
rook_cluster=openshift-storage
rook_object_store=ocs-storagecluster-cephobjectstore
Service Account: rook-ceph-rgw
Init Containers:
chown-container-data-dir:
Image: quay.io/rhceph-dev/rhceph@sha256:9f5f2f3444eb3c8aff5b8dde7ac3fe0bfab64a7ee5b90119af717e1e1d76a0eb
Port: <none>
Host Port: <none>
Command:
chown
Args:
--verbose
--recursive
ceph:ceph
/var/log/ceph
/var/lib/ceph/crash
/var/lib/ceph/rgw/ceph-ocs-storagecluster-cephobjectstore
Limits:
cpu: 2
memory: 4Gi
Requests:
cpu: 2
memory: 4Gi
Environment: <none>
Mounts:
/etc/ceph from rook-config-override (ro)
/etc/ceph/keyring-store/ from rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-keyring (ro)
/var/lib/ceph/crash from rook-ceph-crash (rw)
/var/lib/ceph/rgw/ceph-ocs-storagecluster-cephobjectstore from ceph-daemon-data (rw)
/var/log/ceph from rook-ceph-log (rw)
Containers:
rgw:
Image: quay.io/rhceph-dev/rhceph@sha256:9f5f2f3444eb3c8aff5b8dde7ac3fe0bfab64a7ee5b90119af717e1e1d76a0eb
Port: <none>
Host Port: <none>
Command:
radosgw
Args:
--fsid=75dffa61-aa5b-4948-9fe4-55ec8571d348
--keyring=/etc/ceph/keyring-store/keyring
--log-to-stderr=true
--err-to-stderr=true
/etc/ceph/private from rook-ceph-rgw-cert (ro)
/etc/ceph/rgw from rook-ceph-rgw-ocs-storagecluster-cephobjectstore-mime-types (ro)
/var/lib/ceph/crash from rook-ceph-crash (rw)
/var/lib/ceph/rgw/ceph-ocs-storagecluster-cephobjectstore from ceph-daemon-data (rw)
/var/log/ceph from rook-ceph-log (rw)
log-collector:
Image: quay.io/rhceph-dev/rhceph@sha256:9f5f2f3444eb3c8aff5b8dde7ac3fe0bfab64a7ee5b90119af717e1e1d76a0eb
Port: <none>
Host Port: <none>
Command:
/bin/bash
-x
-e
-m
-c
CEPH_CLIENT_ID=ceph-client.rgw.ocs.storagecluster.cephobjectstore.a
PERIODICITY=24h
LOG_ROTATE_CEPH_FILE=/etc/logrotate.d/ceph
if [ -z "$PERIODICITY" ]; then
PERIODICITY=24h
fi
# edit the logrotate file to only rotate a specific daemon log
# otherwise we will logrotate log files without reloading certain daemons
# this might happen when multiple daemons run on the same machine
sed -i "s|*.log|$CEPH_CLIENT_ID.log|" "$LOG_ROTATE_CEPH_FILE"
while true; do
sleep "$PERIODICITY"
echo "starting log rotation"
logrotate --verbose --force "$LOG_ROTATE_CEPH_FILE"
echo "I am going to sleep now, see you in $PERIODICITY"
done
sed -i "s|*.log|$CEPH_CLIENT_ID.log|" "$LOG_ROTATE_CEPH_FILE" [0/1796]
while true; do
sleep "$PERIODICITY"
echo "starting log rotation"
logrotate --verbose --force "$LOG_ROTATE_CEPH_FILE"
echo "I am going to sleep now, see you in $PERIODICITY"
done
Environment: <none>
Mounts:
/etc/ceph from rook-config-override (ro)
/var/lib/ceph/crash from rook-ceph-crash (rw)
/var/log/ceph from rook-ceph-log (rw)
Volumes:
rook-config-override:
Type: Projected (a volume that contains injected data from multiple sources)
ConfigMapName: rook-config-override
ConfigMapOptional: <nil>
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-keyring:
Type: Secret (a volume populated by a Secret)
SecretName: rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-keyring
Optional: false
rook-ceph-log:
Type: HostPath (bare host directory volume)
Path: /var/lib/rook/openshift-storage/log
HostPathType:
rook-ceph-crash:
Type: HostPath (bare host directory volume)
Path: /var/lib/rook/openshift-storage/crash
HostPathType:
ceph-daemon-data:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-mime-types:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: rook-ceph-rgw-ocs-storagecluster-cephobjectstore-mime-types
Optional: false
rook-ceph-rgw-cert:
Type: Secret (a volume populated by a Secret)
SecretName: ocs-storagecluster-cos-ceph-rgw-tls-cert
Optional: false
Priority Class Name: openshift-user-critical
Conditions:
Type Status Reason
---- ------ ------
Available True MinimumReplicasAvailable
ReplicaFailure True FailedCreate
Progressing False ProgressDeadlineExceeded
OldReplicaSets: <none>
NewReplicaSet: rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-f98b59bd5 (0/1 replicas created)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ScalingReplicaSet 30m deployment-controller Scaled up replica set rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-f98b59bd5 to 1
# oc -n openshift-storage get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
csi-addons-controller-manager-metrics-service ClusterIP 172.30.43.175 <none> 8443/TCP 38m
csi-cephfsplugin-metrics ClusterIP 172.30.5.60 <none> 8080/TCP,8081/TCP 37m
csi-rbdplugin-metrics ClusterIP 172.30.43.190 <none> 8080/TCP,8081/TCP 37m
noobaa-db-pg ClusterIP 172.30.210.52 <none> 5432/TCP 34m
noobaa-mgmt LoadBalancer 172.30.52.103 <pending> 80:32344/TCP,443:30675/TCP,8445:30630/TCP,8446:31687/TCP 34m
noobaa-operator-service ClusterIP 172.30.60.183 <none> 443/TCP 34m
ocs-metrics-exporter ClusterIP 172.30.43.168 <none> 8080/TCP,8081/TCP 34m
odf-console-service ClusterIP 172.30.111.187 <none> 9001/TCP 39m
odf-operator-controller-manager-metrics-service ClusterIP 172.30.226.23 <none> 8443/TCP 39m
rook-ceph-mgr ClusterIP 172.30.114.236 <none> 9283/TCP 35m
rook-ceph-mon-a ClusterIP 172.30.171.54 <none> 6789/TCP,3300/TCP 36m
rook-ceph-mon-b ClusterIP 172.30.17.54 <none> 6789/TCP,3300/TCP 36m
rook-ceph-mon-c ClusterIP 172.30.71.96 <none> 6789/TCP,3300/TCP 36m
rook-ceph-rgw-ocs-storagecluster-cephobjectstore ClusterIP 172.30.102.80 <none> 80/TCP,443/TCP 35m
s3 LoadBalancer 172.30.7.38 <pending> 80:32395/TCP,443:30633/TCP,8444:30869/TCP,7004:30373/TCP 34m
sts LoadBalancer 172.30.100.71 <pending> 443:30487/TCP 34m
From my environment:
[root@nx124-411-b853-syd04-bastion-0 ~]# oc get deployment rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a -n openshift-storage
NAME READY UP-TO-DATE AVAILABLE AGE
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a 0/1 0 0 7h37m
[root@nx124-411-b853-syd04-bastion-0 ~]# oc get csv -A
NAMESPACE NAME DISPLAY VERSION REPLACES PHASE
openshift-local-storage local-storage-operator.4.11.0-202204220613 Local Storage 4.11.0-202204220613 Succeeded
openshift-operator-lifecycle-manager packageserver Package Server 0.19.0 Succeeded
openshift-storage mcg-operator.v4.11.0 NooBaa Operator 4.11.0 Succeeded
openshift-storage ocs-operator.v4.11.0 OpenShift Container Storage 4.11.0 Succeeded
openshift-storage odf-csi-addons-operator.v4.11.0 CSI Addons 4.11.0 Succeeded
openshift-storage odf-operator.v4.11.0 OpenShift Data Foundation 4.11.0 Succeeded
[root@nx124-411-b853-syd04-bastion-0 ~]# oc get csv odf-operator.v4.11.0 -n openshift-storage -o yaml | grep "full"
full_version: 4.11.0-51
[root@nx124-411-b853-syd04-bastion-0 ~]# oc get csv ocs-operator.v4.11.0 -n openshift-storage -o yaml | grep "full"
full_version: 4.11.0-51
[root@nx124-411-b853-syd04-bastion-0 ~]#
[root@nx124-411-b853-syd04-bastion-0 ~]# oc -n openshift-storage describe deploy rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a
Name: rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a
Namespace: openshift-storage
CreationTimestamp: Thu, 28 Apr 2022 01:19:41 -0400
Labels: app=rook-ceph-rgw
app.kubernetes.io/component=cephobjectstores.ceph.rook.io
app.kubernetes.io/created-by=rook-ceph-operator
app.kubernetes.io/instance=ocs-storagecluster-cephobjectstore
app.kubernetes.io/managed-by=rook-ceph-operator
app.kubernetes.io/name=ceph-rgw
app.kubernetes.io/part-of=ocs-storagecluster-cephobjectstore
ceph-version=16.2.7-107
ceph_daemon_id=ocs-storagecluster-cephobjectstore
ceph_daemon_type=rgw
rgw=ocs-storagecluster-cephobjectstore
rook-version=v4.11.0-0.354c987b60b7e13e92ac0d69e8504f3cb6c11279
rook.io/operator-namespace=openshift-storage
rook_cluster=openshift-storage
rook_object_store=ocs-storagecluster-cephobjectstore
Annotations: banzaicloud.com/last-applied:
{"metadata":{"labels":{"app":"rook-ceph-rgw","app.kubernetes.io/component":"cephobjectstores.ceph.rook.io","app.kubernetes.io/created-by":...
deployment.kubernetes.io/revision: 1
Selector: app=rook-ceph-rgw,ceph_daemon_id=ocs-storagecluster-cephobjectstore,rgw=ocs-storagecluster-cephobjectstore,rook_cluster=openshift-storage,rook_object_store=ocs-storagecluster-cephobjectstore
Replicas: 1 desired | 0 updated | 0 total | 0 available | 1 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 1 max unavailable, 0 max surge
Pod Template:
Labels: app=rook-ceph-rgw
app.kubernetes.io/component=cephobjectstores.ceph.rook.io
app.kubernetes.io/created-by=rook-ceph-operator
app.kubernetes.io/instance=ocs-storagecluster-cephobjectstore
app.kubernetes.io/managed-by=rook-ceph-operator
app.kubernetes.io/name=ceph-rgw
app.kubernetes.io/part-of=ocs-storagecluster-cephobjectstore
ceph_daemon_id=ocs-storagecluster-cephobjectstore
ceph_daemon_type=rgw
rgw=ocs-storagecluster-cephobjectstore
rook.io/operator-namespace=openshift-storage
rook_cluster=openshift-storage
rook_object_store=ocs-storagecluster-cephobjectstore
Service Account: rook-ceph-rgw
Init Containers:
chown-container-data-dir:
Image: quay.io/rhceph-dev/rhceph@sha256:9f5f2f3444eb3c8aff5b8dde7ac3fe0bfab64a7ee5b90119af717e1e1d76a0eb
Port: <none>
Host Port: <none>
Command:
chown
Args:
--verbose
--recursive
ceph:ceph
/var/log/ceph
/var/lib/ceph/crash
/var/lib/ceph/rgw/ceph-ocs-storagecluster-cephobjectstore
Limits:
cpu: 2
memory: 4Gi
Requests:
cpu: 2
memory: 4Gi
Environment: <none>
Mounts:
/etc/ceph from rook-config-override (ro)
/etc/ceph/keyring-store/ from rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-keyring (ro)
/var/lib/ceph/crash from rook-ceph-crash (rw)
/var/lib/ceph/rgw/ceph-ocs-storagecluster-cephobjectstore from ceph-daemon-data (rw)
/var/log/ceph from rook-ceph-log (rw)
Containers:
rgw:
Image: quay.io/rhceph-dev/rhceph@sha256:9f5f2f3444eb3c8aff5b8dde7ac3fe0bfab64a7ee5b90119af717e1e1d76a0eb
Port: <none>
Host Port: <none>
Command:
radosgw
Args:
--fsid=b31cdec5-ff50-4967-b200-81a7c3fbfa1a
--keyring=/etc/ceph/keyring-store/keyring
--log-to-stderr=true
--err-to-stderr=true
--mon-cluster-log-to-stderr=true
--log-stderr-prefix=debug
--default-log-to-file=false
--default-mon-cluster-log-to-file=false
--mon-host=$(ROOK_CEPH_MON_HOST)
--mon-initial-members=$(ROOK_CEPH_MON_INITIAL_MEMBERS)
--id=rgw.ocs.storagecluster.cephobjectstore.a
--setuser=ceph
--setgroup=ceph
--foreground
--rgw-frontends=beast port=8080 ssl_port=443 ssl_certificate=/etc/ceph/private/rgw-cert.pem ssl_private_key=/etc/ceph/private/rgw-key.pem
--host=$(POD_NAME)
--rgw-mime-types-file=/etc/ceph/rgw/mime.types
--rgw-realm=ocs-storagecluster-cephobjectstore
--rgw-zonegroup=ocs-storagecluster-cephobjectstore
--rgw-zone=ocs-storagecluster-cephobjectstore
Limits:
cpu: 2
memory: 4Gi
Requests:
cpu: 2
memory: 4Gi
Liveness: tcp-socket :8080 delay=10s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:8080/swift/healthcheck delay=10s timeout=1s period=10s #success=1 #failure=3
Startup: tcp-socket :8080 delay=10s timeout=1s period=10s #success=1 #failure=18
Environment:
CONTAINER_IMAGE: quay.io/rhceph-dev/rhceph@sha256:9f5f2f3444eb3c8aff5b8dde7ac3fe0bfab64a7ee5b90119af717e1e1d76a0eb
POD_NAME: (v1:metadata.name)
POD_NAMESPACE: (v1:metadata.namespace)
NODE_NAME: (v1:spec.nodeName)
POD_MEMORY_LIMIT: 4294967296 (limits.memory)
POD_MEMORY_REQUEST: 4294967296 (requests.memory)
POD_CPU_LIMIT: 2 (limits.cpu)
POD_CPU_REQUEST: 2 (requests.cpu)
ROOK_CEPH_MON_HOST: <set to the key 'mon_host' in secret 'rook-ceph-config'> Optional: false
ROOK_CEPH_MON_INITIAL_MEMBERS: <set to the key 'mon_initial_members' in secret 'rook-ceph-config'> Optional: false
Mounts:
/etc/ceph from rook-config-override (ro)
/etc/ceph/keyring-store/ from rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-keyring (ro)
/etc/ceph/private from rook-ceph-rgw-cert (ro)
/etc/ceph/rgw from rook-ceph-rgw-ocs-storagecluster-cephobjectstore-mime-types (ro)
/var/lib/ceph/crash from rook-ceph-crash (rw)
/var/lib/ceph/rgw/ceph-ocs-storagecluster-cephobjectstore from ceph-daemon-data (rw)
/var/log/ceph from rook-ceph-log (rw)
log-collector:
Image: quay.io/rhceph-dev/rhceph@sha256:9f5f2f3444eb3c8aff5b8dde7ac3fe0bfab64a7ee5b90119af717e1e1d76a0eb
Port: <none>
Host Port: <none>
Command:
/bin/bash
-x
-e
-m
-c
CEPH_CLIENT_ID=ceph-client.rgw.ocs.storagecluster.cephobjectstore.a
PERIODICITY=24h
LOG_ROTATE_CEPH_FILE=/etc/logrotate.d/ceph
if [ -z "$PERIODICITY" ]; then
PERIODICITY=24h
fi
# edit the logrotate file to only rotate a specific daemon log
# otherwise we will logrotate log files without reloading certain daemons
# this might happen when multiple daemons run on the same machine
sed -i "s|*.log|$CEPH_CLIENT_ID.log|" "$LOG_ROTATE_CEPH_FILE"
while true; do
sleep "$PERIODICITY"
echo "starting log rotation"
logrotate --verbose --force "$LOG_ROTATE_CEPH_FILE"
echo "I am going to sleep now, see you in $PERIODICITY"
done
Environment: <none>
Mounts:
/etc/ceph from rook-config-override (ro)
/var/lib/ceph/crash from rook-ceph-crash (rw)
/var/log/ceph from rook-ceph-log (rw)
Volumes:
rook-config-override:
Type: Projected (a volume that contains injected data from multiple sources)
ConfigMapName: rook-config-override
ConfigMapOptional: <nil>
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-keyring:
Type: Secret (a volume populated by a Secret)
SecretName: rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-keyring
Optional: false
rook-ceph-log:
Type: HostPath (bare host directory volume)
Path: /var/lib/rook/openshift-storage/log
HostPathType:
rook-ceph-crash:
Type: HostPath (bare host directory volume)
Path: /var/lib/rook/openshift-storage/crash
HostPathType:
ceph-daemon-data:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-mime-types:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: rook-ceph-rgw-ocs-storagecluster-cephobjectstore-mime-types
Optional: false
rook-ceph-rgw-cert:
Type: Secret (a volume populated by a Secret)
SecretName: ocs-storagecluster-cos-ceph-rgw-tls-cert
Optional: false
Priority Class Name: openshift-user-critical
Conditions:
Type Status Reason
---- ------ ------
Available True MinimumReplicasAvailable
Progressing True NewReplicaSetAvailable
ReplicaFailure True FailedCreate
OldReplicaSets: <none>
NewReplicaSet: rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-6f444f4574 (0/1 replicas created)
Events: <none>
[root@nx124-411-b853-syd04-bastion-0 ~]#
[root@nx124-411-b853-syd04-bastion-0 ~]#
[root@nx124-411-b853-syd04-bastion-0 ~]# oc -n openshift-storage get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
csi-addons-controller-manager-metrics-service ClusterIP 172.30.18.152 <none> 8443/TCP 7h49m
csi-cephfsplugin-metrics ClusterIP 172.30.50.86 <none> 8080/TCP,8081/TCP 7h47m
csi-rbdplugin-metrics ClusterIP 172.30.235.99 <none> 8080/TCP,8081/TCP 7h47m
noobaa-db-pg ClusterIP 172.30.144.11 <none> 5432/TCP 7h38m
noobaa-mgmt LoadBalancer 172.30.121.155 <pending> 80:32409/TCP,443:32675/TCP,8445:31121/TCP,8446:32642/TCP 7h38m
noobaa-operator-service ClusterIP 172.30.16.88 <none> 443/TCP 7h38m
ocs-metrics-exporter ClusterIP 172.30.149.141 <none> 8080/TCP,8081/TCP 7h38m
odf-console-service ClusterIP 172.30.189.49 <none> 9001/TCP 7h50m
odf-operator-controller-manager-metrics-service ClusterIP 172.30.219.94 <none> 8443/TCP 7h51m
rook-ceph-mgr ClusterIP 172.30.166.11 <none> 9283/TCP 7h40m
rook-ceph-mon-h ClusterIP 172.30.171.184 <none> 6789/TCP,3300/TCP 3h33m
rook-ceph-mon-i ClusterIP 172.30.173.6 <none> 6789/TCP,3300/TCP 3h33m
rook-ceph-mon-j ClusterIP 172.30.246.166 <none> 6789/TCP,3300/TCP 3h33m
rook-ceph-rgw-ocs-storagecluster-cephobjectstore ClusterIP 172.30.2.139 <none> 80/TCP,443/TCP 7h39m
s3 LoadBalancer 172.30.137.88 <pending> 80:30549/TCP,443:32312/TCP,8444:30988/TCP,7004:30694/TCP 7h38m
sts LoadBalancer 172.30.252.157 <pending> 443:30302/TCP 7h38m
[root@nx124-411-b853-syd04-bastion-0 ~]#
One more question, what does this show: `oc -n openshift-storage get serviceaccount`? The problem is still the missing rook-ceph-rgw service account.

*** Bug 2079975 has been marked as a duplicate of this bug. ***

# oc -n openshift-storage get serviceaccount
NAME                                   SECRETS   AGE
builder                                2         33m
ceph-nfs-external-provisioner-runner   2         33m
csi-addons-controller-manager          2         32m
default                                2         33m
deployer                               2         33m
noobaa                                 2         33m
noobaa-endpoint                        2         33m
noobaa-odf-ui                          2         33m
ocs-metrics-exporter                   2         33m
ocs-operator                           2         33m
ocs-provider-server                    2         33m
odf-operator-controller-manager        2         33m
rook-ceph-cmd-reporter                 2         33m
rook-ceph-mgr                          2         32m
rook-ceph-osd                          2         33m
rook-ceph-purge-osd                    2         33m
rook-ceph-rgw                          2         33m
rook-ceph-system                       2         33m
rook-csi-cephfs-plugin-sa              2         33m
rook-csi-cephfs-provisioner-sa         2         33m
rook-csi-rbd-plugin-sa                 2         32m
rook-csi-rbd-provisioner-sa            2         32m

From IBM Power environment:

[root@nx124-411-2f02-syd04-bastion-0 ~]# oc -n openshift-storage get serviceaccount
NAME                                   SECRETS   AGE
builder                                2         5h33m
ceph-nfs-external-provisioner-runner   2         5h32m
csi-addons-controller-manager          2         5h30m
default                                2         5h33m
deployer                               2         5h33m
noobaa                                 2         5h32m
noobaa-endpoint                        2         5h32m
noobaa-odf-ui                          2         5h32m
ocs-metrics-exporter                   2         5h32m
ocs-operator                           2         5h32m
ocs-provider-server                    2         5h32m
odf-operator-controller-manager        2         5h32m
rook-ceph-cmd-reporter                 2         5h32m
rook-ceph-mgr                          2         5h32m
rook-ceph-osd                          2         5h32m
rook-ceph-purge-osd                    2         5h32m
rook-ceph-rgw                          2         5h32m
rook-ceph-system                       2         5h32m
rook-csi-cephfs-plugin-sa              2         5h32m
rook-csi-cephfs-provisioner-sa         2         5h32m
rook-csi-rbd-plugin-sa                 2         5h32m
rook-csi-rbd-provisioner-sa            2         5h32m
[root@nx124-411-2f02-syd04-bastion-0 ~]#

Same as what Sravika posted above.

Ok, good to see the rook-ceph-rgw service account has been created. Now there must be a different error preventing the pod from starting. The describe on the deployment doesn't show the errors though. What does describe show for the rgw replicaset? If the rgw pod doesn't exist, the rgw replicaset must give some indication of the error.

[root@nx124-411-2f02-syd04-bastion-0 ~]# oc describe replicaset rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-7dfd5d9b98 -n openshift-storage
Name: rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-7dfd5d9b98
Namespace: openshift-storage
Selector: app=rook-ceph-rgw,ceph_daemon_id=ocs-storagecluster-cephobjectstore,pod-template-hash=7dfd5d9b98,rgw=ocs-storagecluster-cephobjectstore,rook_cluster=openshift-storage,rook_object_store=ocs-storagecluster-cephobjectstore
Labels: app=rook-ceph-rgw
app.kubernetes.io/component=cephobjectstores.ceph.rook.io
app.kubernetes.io/created-by=rook-ceph-operator
app.kubernetes.io/instance=ocs-storagecluster-cephobjectstore
app.kubernetes.io/managed-by=rook-ceph-operator
app.kubernetes.io/name=ceph-rgw
app.kubernetes.io/part-of=ocs-storagecluster-cephobjectstore
ceph_daemon_id=ocs-storagecluster-cephobjectstore
ceph_daemon_type=rgw
pod-template-hash=7dfd5d9b98
rgw=ocs-storagecluster-cephobjectstore
rook.io/operator-namespace=openshift-storage
rook_cluster=openshift-storage
rook_object_store=ocs-storagecluster-cephobjectstore
Annotations: banzaicloud.com/last-applied:
{"metadata":{"labels":{"app":"rook-ceph-rgw","app.kubernetes.io/component":"cephobjectstores.ceph.rook.io","app.kubernetes.io/created-by":...
deployment.kubernetes.io/desired-replicas: 1
deployment.kubernetes.io/max-replicas: 1
deployment.kubernetes.io/revision: 1
Controlled By: Deployment/rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a
Replicas: 0 current / 1 desired
Pods Status: 0 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
Labels: app=rook-ceph-rgw
app.kubernetes.io/component=cephobjectstores.ceph.rook.io
app.kubernetes.io/created-by=rook-ceph-operator
app.kubernetes.io/instance=ocs-storagecluster-cephobjectstore
app.kubernetes.io/managed-by=rook-ceph-operator
app.kubernetes.io/name=ceph-rgw
app.kubernetes.io/part-of=ocs-storagecluster-cephobjectstore
ceph_daemon_id=ocs-storagecluster-cephobjectstore
ceph_daemon_type=rgw
pod-template-hash=7dfd5d9b98
rgw=ocs-storagecluster-cephobjectstore
rook.io/operator-namespace=openshift-storage
rook_cluster=openshift-storage
rook_object_store=ocs-storagecluster-cephobjectstore
Service Account: rook-ceph-rgw
Init Containers:
chown-container-data-dir:
Image: quay.io/rhceph-dev/rhceph@sha256:9f5f2f3444eb3c8aff5b8dde7ac3fe0bfab64a7ee5b90119af717e1e1d76a0eb
Port: <none>
Host Port: <none>
Command:
chown
Args:
--verbose
--recursive
ceph:ceph
/var/log/ceph
/var/lib/ceph/crash
/var/lib/ceph/rgw/ceph-ocs-storagecluster-cephobjectstore
Limits:
cpu: 2
memory: 4Gi
Requests:
cpu: 2
memory: 4Gi
Environment: <none>
Mounts:
/etc/ceph from rook-config-override (ro)
/etc/ceph/keyring-store/ from rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-keyring (ro)
/var/lib/ceph/crash from rook-ceph-crash (rw)
/var/lib/ceph/rgw/ceph-ocs-storagecluster-cephobjectstore from ceph-daemon-data (rw)
/var/log/ceph from rook-ceph-log (rw)
Containers:
rgw:
Image: quay.io/rhceph-dev/rhceph@sha256:9f5f2f3444eb3c8aff5b8dde7ac3fe0bfab64a7ee5b90119af717e1e1d76a0eb
Port: <none>
Host Port: <none>
Command:
radosgw
Args:
--fsid=a6d856ac-f24b-4b9a-bf18-7ba8b4f710d8
--keyring=/etc/ceph/keyring-store/keyring
--log-to-stderr=true
--err-to-stderr=true
--mon-cluster-log-to-stderr=true
--log-stderr-prefix=debug
--default-log-to-file=false
--default-mon-cluster-log-to-file=false
--mon-host=$(ROOK_CEPH_MON_HOST)
--mon-initial-members=$(ROOK_CEPH_MON_INITIAL_MEMBERS)
--id=rgw.ocs.storagecluster.cephobjectstore.a
--setuser=ceph
--setgroup=ceph
--foreground
--rgw-frontends=beast port=8080 ssl_port=443 ssl_certificate=/etc/ceph/private/rgw-cert.pem ssl_private_key=/etc/ceph/private/rgw-key.pem
--host=$(POD_NAME)
--rgw-mime-types-file=/etc/ceph/rgw/mime.types
--rgw-realm=ocs-storagecluster-cephobjectstore
--rgw-zonegroup=ocs-storagecluster-cephobjectstore
--rgw-zone=ocs-storagecluster-cephobjectstore
Limits:
cpu: 2
memory: 4Gi
Requests:
cpu: 2
memory: 4Gi
Liveness: tcp-socket :8080 delay=10s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:8080/swift/healthcheck delay=10s timeout=1s period=10s #success=1 #failure=3
Startup: tcp-socket :8080 delay=10s timeout=1s period=10s #success=1 #failure=18
Environment:
CONTAINER_IMAGE: quay.io/rhceph-dev/rhceph@sha256:9f5f2f3444eb3c8aff5b8dde7ac3fe0bfab64a7ee5b90119af717e1e1d76a0eb
POD_NAME: (v1:metadata.name)
POD_NAMESPACE: (v1:metadata.namespace)
NODE_NAME: (v1:spec.nodeName)
POD_MEMORY_LIMIT: 4294967296 (limits.memory)
POD_MEMORY_REQUEST: 4294967296 (requests.memory)
POD_CPU_LIMIT: 2 (limits.cpu)
POD_CPU_REQUEST: 2 (requests.cpu)
ROOK_CEPH_MON_HOST: <set to the key 'mon_host' in secret 'rook-ceph-config'> Optional: false
ROOK_CEPH_MON_INITIAL_MEMBERS: <set to the key 'mon_initial_members' in secret 'rook-ceph-config'> Optional: false
Mounts:
/etc/ceph from rook-config-override (ro)
/etc/ceph/keyring-store/ from rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-keyring (ro)
/etc/ceph/private from rook-ceph-rgw-cert (ro)
/etc/ceph/rgw from rook-ceph-rgw-ocs-storagecluster-cephobjectstore-mime-types (ro)
/var/lib/ceph/crash from rook-ceph-crash (rw)
/var/lib/ceph/rgw/ceph-ocs-storagecluster-cephobjectstore from ceph-daemon-data (rw)
/var/log/ceph from rook-ceph-log (rw)
log-collector:
Image: quay.io/rhceph-dev/rhceph@sha256:9f5f2f3444eb3c8aff5b8dde7ac3fe0bfab64a7ee5b90119af717e1e1d76a0eb
Port: <none>
Host Port: <none>
Command:
/bin/bash
-x
-e
-m
-c
CEPH_CLIENT_ID=ceph-client.rgw.ocs.storagecluster.cephobjectstore.a
PERIODICITY=24h
LOG_ROTATE_CEPH_FILE=/etc/logrotate.d/ceph
if [ -z "$PERIODICITY" ]; then
PERIODICITY=24h
fi
# edit the logrotate file to only rotate a specific daemon log
# otherwise we will logrotate log files without reloading certain daemons
# this might happen when multiple daemons run on the same machine
sed -i "s|*.log|$CEPH_CLIENT_ID.log|" "$LOG_ROTATE_CEPH_FILE"
while true; do
sleep "$PERIODICITY"
echo "starting log rotation"
logrotate --verbose --force "$LOG_ROTATE_CEPH_FILE"
echo "I am going to sleep now, see you in $PERIODICITY"
done
Environment: <none>
Mounts:
/etc/ceph from rook-config-override (ro)
/var/lib/ceph/crash from rook-ceph-crash (rw)
/var/log/ceph from rook-ceph-log (rw)
Volumes:
rook-config-override:
Type: Projected (a volume that contains injected data from multiple sources)
ConfigMapName: rook-config-override
ConfigMapOptional: <nil>
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-keyring:
Type: Secret (a volume populated by a Secret)
SecretName: rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-keyring
Optional: false
rook-ceph-log:
Type: HostPath (bare host directory volume)
Path: /var/lib/rook/openshift-storage/log
HostPathType:
rook-ceph-crash:
Type: HostPath (bare host directory volume)
Path: /var/lib/rook/openshift-storage/crash
HostPathType:
ceph-daemon-data:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-mime-types:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: rook-ceph-rgw-ocs-storagecluster-cephobjectstore-mime-types
Optional: false
rook-ceph-rgw-cert:
Type: Secret (a volume populated by a Secret)
SecretName: ocs-storagecluster-cos-ceph-rgw-tls-cert
Optional: false
Priority Class Name: openshift-user-critical
Conditions:
Type Status Reason
---- ------ ------
ReplicaFailure True FailedCreate
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedCreate 4m35s (x139 over 12h) replicaset-controller Error creating: pods "rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-7dfd5d9b98-" is forbidden: unable to validate against any security context constraint: [provider "anyuid": Forbidden: not usable by user or serviceaccount, spec.volumes[2]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[3]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.initContainers[0].securityContext.privileged: Invalid value: true: Privileged containers are not allowed, spec.containers[0].securityContext.privileged: Invalid value: true: Privileged containers are not allowed, spec.containers[1].securityContext.privileged: Invalid value: true: Privileged containers are not allowed, provider "restricted": Forbidden: not usable by user or serviceaccount, provider "nonroot-v2": Forbidden: not usable by user or serviceaccount, provider "nonroot": Forbidden: not usable by user or serviceaccount, provider "noobaa": Forbidden: not usable by user or serviceaccount, provider "noobaa-endpoint": Forbidden: not usable by user or serviceaccount, provider "hostmount-anyuid": Forbidden: not usable by user or serviceaccount, provider "machine-api-termination-handler": Forbidden: not usable by user or serviceaccount, provider "hostnetwork-v2": Forbidden: not usable by user or serviceaccount, provider "hostnetwork": Forbidden: not usable by user or serviceaccount, provider "hostaccess": Forbidden: not usable by user or serviceaccount, provider "rook-ceph": Forbidden: not usable by user or serviceaccount, provider "node-exporter": Forbidden: not usable by user or serviceaccount, provider "rook-ceph-csi": Forbidden: not usable by user or serviceaccount, provider "privileged": Forbidden: not usable by user or serviceaccount]
[root@nx124-411-2f02-syd04-bastion-0 ~]#
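The FailedCreate event above points at SCC admission rather than a missing service account. A way to check which service accounts the rook-ceph SCC currently admits (hedged: assumes the SCC is named rook-ceph and carries a users list, as in upstream Rook's OpenShift SCC yaml; the rgw entry would look like system:serviceaccount:openshift-storage:rook-ceph-rgw):

# oc get scc rook-ceph -o jsonpath='{.users}{"\n"}'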
Aha, if you look at the SCC, it must not have picked up the new rgw binding that was added here: https://github.com/rook/rook/pull/9964/files#diff-b218546ccf5d4a03e758f01e20b4eccde26c61450fccb27b1d21f1a122217e67R74
Now we need the OCS operator to update to the latest Rook with this update to the SCC.

Update to the OCS operator is in progress...

*** Bug 2081690 has been marked as a duplicate of this bug. ***

This fix merged, and the latest build should be working now.

Deployment succeeded on the vSphere platform with build ocs-registry:4.11.0-63.
job: https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster-prod/4157/consoleFull
logs:
2022-05-06 12:07:06  06:37:06 - MainThread - ocs_ci.ocs.resources.storage_cluster - INFO - Check if StorageCluster: ocs-storagecluster is in Succeeded phase
2022-05-06 12:07:06  06:37:06 - MainThread - ocs_ci.utility.utils - INFO - Executing command: oc -n openshift-storage get StorageCluster ocs-storagecluster -n openshift-storage -o yaml
2022-05-06 12:07:06  06:37:06 - MainThread - ocs_ci.ocs.ocp - INFO - Resource ocs-storagecluster is in phase: Ready!
2022-05-06 12:07:37  06:37:35 - MainThread - ocs_ci.ocs.resources.storage_cluster - INFO - Verifying ceph health
2022-05-06 12:07:37  06:37:35 - MainThread - ocs_ci.utility.utils - INFO - Executing command: oc wait --for condition=ready pod -l app=rook-ceph-tools -n openshift-storage --timeout=120s
2022-05-06 12:07:37  06:37:36 - MainThread - ocs_ci.utility.utils - INFO - Executing command: oc -n openshift-storage get pod -l 'app=rook-ceph-tools' -o jsonpath='{.items[0].metadata.name}'
2022-05-06 12:07:37  06:37:36 - MainThread - ocs_ci.utility.utils - INFO - Executing command: oc -n openshift-storage exec rook-ceph-tools-5cfbd9fdc8-htfgh -- ceph health
2022-05-06 12:07:37  06:37:36 - MainThread - ocs_ci.utility.utils - INFO - Ceph cluster health is HEALTH_OK.

ODF deployment on IBM Z succeeds with the latest build 4.11.0-63.
# oc get storagecluster -A
NAMESPACE NAME AGE PHASE EXTERNAL CREATED AT VERSION
openshift-storage ocs-storagecluster 6m44s Ready 2022-05-06T09:35:16Z 4.11.0
# oc get csv odf-operator.v4.11.0 -n openshift-storage -oyaml | grep full_version
full_version: 4.11.0-63
# oc get po -n openshift-storage | grep rgw
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-7d7d4d8rsb6g 2/2 Running 0 8m39s
# oc get po -n openshift-storage
NAME READY STATUS RESTARTS AGE
csi-addons-controller-manager-5988b4d8b6-s2l9b 2/2 Running 0 12m
csi-cephfsplugin-b776p 3/3 Running 0 12m
csi-cephfsplugin-d99tm 3/3 Running 0 12m
csi-cephfsplugin-gj8p2 3/3 Running 0 12m
csi-cephfsplugin-provisioner-75bcbb797-bdkhq 6/6 Running 0 12m
csi-cephfsplugin-provisioner-75bcbb797-qqggf 6/6 Running 0 12m
csi-rbdplugin-b9jkg 4/4 Running 0 12m
csi-rbdplugin-cndhd 4/4 Running 0 12m
csi-rbdplugin-provisioner-5b9f4659f8-qbvgl 7/7 Running 0 12m
csi-rbdplugin-provisioner-5b9f4659f8-wc9kd 7/7 Running 0 12m
csi-rbdplugin-zlb96 4/4 Running 0 12m
noobaa-core-0 1/1 Running 0 8m51s
noobaa-db-pg-0 1/1 Running 0 8m51s
noobaa-endpoint-79cf94ddc9-8xjd9 1/1 Running 0 7m5s
noobaa-operator-555bb8d4-rtq65 1/1 Running 1 (8m50s ago) 13m
ocs-metrics-exporter-85744bfc5d-gzpxt 1/1 Running 0 13m
ocs-operator-5778c4c589-h8hsv 1/1 Running 0 13m
odf-console-7f84466d46-l2594 1/1 Running 0 13m
odf-operator-controller-manager-54b75784c7-5rnlb 2/2 Running 0 13m
rook-ceph-crashcollector-worker-0.ocsm4205001.lnxero1.boe-2lzl8 1/1 Running 0 9m21s
rook-ceph-crashcollector-worker-1.ocsm4205001.lnxero1.boe-842hj 1/1 Running 0 9m56s
rook-ceph-crashcollector-worker-2.ocsm4205001.lnxero1.boe-ft2w7 1/1 Running 0 9m19s
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-6c947c8dzhrbx 2/2 Running 0 9m21s
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-7dcc78c9qmhgs 2/2 Running 0 9m19s
rook-ceph-mgr-a-577896b47f-8r7db 2/2 Running 0 10m
rook-ceph-mon-a-769d6d576-j845l 2/2 Running 0 11m
rook-ceph-mon-b-5c8c84f7bc-djsmd 2/2 Running 0 10m
rook-ceph-mon-c-677d84597-qlp8w 2/2 Running 0 10m
rook-ceph-operator-55f596475d-krlx8 1/1 Running 0 13m
rook-ceph-osd-0-78fff9f694-zd69k 2/2 Running 0 9m36s
rook-ceph-osd-1-8484c487cb-xmzgk 2/2 Running 0 9m37s
rook-ceph-osd-2-f776cd8f6-5vwnt 2/2 Running 0 9m35s
rook-ceph-osd-prepare-11714a315d0086dac219451384576567-twpnp 0/1 Completed 0 9m51s
rook-ceph-osd-prepare-54a70174d45406e740930e7995d8fed4-kf9tp 0/1 Completed 0 9m51s
rook-ceph-osd-prepare-e2a03e34e2eb44d41486e5c31a6ca356-d6mqz 0/1 Completed 0 9m51s
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-7d7d4d8rsb6g 2/2 Running 0 9m6s
rook-ceph-tools-5cfbd9fdc8-ql5c2 1/1 Running 0 9m19s
It is looking good on the IBM Power platform as well:
[root@nx124-411-402a-syd04-bastion-0 ~]# oc -n openshift-storage get Pod -n openshift-storage --selector=app=rook-ceph-rgw
NAME READY STATUS RESTARTS AGE
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-6b48d7488lnj 2/2 Running 0 72m
[root@nx124-411-402a-syd04-bastion-0 ~]# oc get csv odf-operator.v4.11.0 -n openshift-storage -o yaml | grep "full"
full_version: 4.11.0-63
[root@nx124-411-402a-syd04-bastion-0 ~]#
Moving to VERIFIED based on comment #29.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.11.0 security, enhancement, & bugfix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:6156
Description of problem (please be detailed as possible and provide log snippets):

ODF 4.11.0-38 deployment leaves the storagecluster in "Progressing" state although all the openshift-storage pods are up and Running.

# oc -n openshift-storage get StorageCluster ocs-storagecluster
NAME                 AGE   PHASE         EXTERNAL   CREATED AT             VERSION
ocs-storagecluster   34m   Progressing              2022-04-14T14:45:50Z   4.11.0

# oc get storagesystem -n openshift-storage
NAMESPACE           NAME                               STORAGE-SYSTEM-KIND                  STORAGE-SYSTEM-NAME
openshift-storage   ocs-storagecluster-storagesystem   storagecluster.ocs.openshift.io/v1   ocs-storagecluster

# oc get cephcluster -n openshift-storage
NAME                             DATADIRHOSTPATH   MONCOUNT   AGE   PHASE   MESSAGE                        HEALTH      EXTERNAL
ocs-storagecluster-cephcluster   /var/lib/rook     3          37m   Ready   Cluster created successfully   HEALTH_OK

# oc get po -n openshift-storage
NAME                                                              READY   STATUS      RESTARTS      AGE
csi-addons-controller-manager-6fdddd684c-kc7rv                    2/2     Running     0             35m
csi-cephfsplugin-6v8f8                                            3/3     Running     0             34m
csi-cephfsplugin-g2mfl                                            3/3     Running     0             34m
csi-cephfsplugin-mdtcr                                            3/3     Running     0             34m
csi-cephfsplugin-provisioner-694458df4-7trk6                      6/6     Running     0             34m
csi-cephfsplugin-provisioner-694458df4-x8qw7                      6/6     Running     0             34m
csi-rbdplugin-hk5s7                                               4/4     Running     0             34m
csi-rbdplugin-provisioner-5ff764646d-79f6n                        7/7     Running     0             34m
csi-rbdplugin-provisioner-5ff764646d-xm8qj                        7/7     Running     0             34m
csi-rbdplugin-vxczr                                               4/4     Running     0             34m
csi-rbdplugin-z4ssw                                               4/4     Running     0             34m
noobaa-core-0                                                     1/1     Running     0             31m
noobaa-db-pg-0                                                    1/1     Running     0             31m
noobaa-endpoint-54565dbb4b-d9czf                                  1/1     Running     0             29m
noobaa-operator-9bc6685d9-c4hvc                                   1/1     Running     1 (31m ago)   36m
ocs-metrics-exporter-858dd8784d-vhjw9                             1/1     Running     0             36m
ocs-operator-67cbb8dfc5-dtrhj                                     1/1     Running     0             36m
odf-console-5f9bf644cd-85q55                                      1/1     Running     0             36m
odf-operator-controller-manager-6c8ccd88f8-bm72q                  2/2     Running     0             36m
rook-ceph-crashcollector-worker-0.ocsm4205001.lnxne.boe-85z9dwr   1/1     Running     0             32m
rook-ceph-crashcollector-worker-1.ocsm4205001.lnxne.boe-56qs4j9   1/1     Running     0             32m
rook-ceph-crashcollector-worker-2.ocsm4205001.lnxne.boe-56hhfzj   1/1     Running     0             32m
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-8b9cf9b5gp2dg   2/2     Running     0             32m
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-59b96569chjjt   2/2     Running     0             32m
rook-ceph-mgr-a-667fd9fd66-7jpfz                                  2/2     Running     0             32m
rook-ceph-mon-a-6bb76f6b79-js4dp                                  2/2     Running     0             34m
rook-ceph-mon-b-6db796ffdc-b4h8h                                  2/2     Running     0             33m
rook-ceph-mon-c-5f89c7bc9d-c6s2x                                  2/2     Running     0             33m
rook-ceph-operator-dd6cccfb8-rg968                                1/1     Running     0             36m
rook-ceph-osd-0-556bc8866-c66md                                   2/2     Running     0             32m
rook-ceph-osd-1-84d4f4b99b-r8jqd                                  2/2     Running     0             32m
rook-ceph-osd-2-5c69c4684f-jsz2z                                  2/2     Running     0             32m
rook-ceph-osd-prepare-636d124a0298dd81229f6875f74008ce-pw6hz      0/1     Completed   0             32m
rook-ceph-osd-prepare-aad2ba905033100da0435c48775d47ff-lkv9g      0/1     Completed   0             32m
rook-ceph-osd-prepare-d578d363cc45f10c7a4e945322c848c1-l7ttt      0/1     Completed   0             32m
rook-ceph-tools-68cf9db877-dnhh6                                  1/1     Running     0             32m

# oc -n openshift-storage rsh rook-ceph-tools-68cf9db877-dnhh6
sh-4.4$ ceph -s
  cluster:
    id:     6ba16244-ba70-4157-afac-be6eabc223d2
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c (age 33m)
    mgr: a(active, since 32m)
    mds: 1/1 daemons up, 1 hot standby
    osd: 3 osds: 3 up (since 32m), 3 in (since 32m)

  data:
    volumes: 1/1 healthy
    pools:   11 pools, 177 pgs
    objects: 124 objects, 135 MiB
    usage:   350 MiB used, 1.5 TiB / 1.5 TiB avail
    pgs:     177 active+clean

  io:
    client: 853 B/s rd, 3.7 KiB/s wr, 1 op/s rd, 0 op/s wr

Version of all relevant components (if applicable):
ODF 4.11.0-38

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?

Is there any workaround available to the best of your knowledge?

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?

Can this issue reproducible? Yes

Can this issue reproduce from the UI?

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Deploy OCP 4.11
2. Deploy ODF 4.11.0-38
3.

Actual results:
Storage cluster is in "Progressing" state although all the openshift-storage pods are up and running

Expected results:
Storage cluster should be in "Ready" state

Additional info:
Must-gather logs: https://drive.google.com/file/d/1jefxEqQsBF5UjvblPvlm8-ADVRSGd-Zf/view?usp=sharing