This is fixed in 4.8, so I'm setting the target to 4.7.z and removing the blocker flag. Randy, Mudit, are you good with that?
(In reply to Sébastien Han from comment #4)
> This is fixed in 4.8, so I'm setting the target to 4.7.z and removing the
> blocker flag.
> Randy, Mudit, are you good with that?

Moving to POST since this is fixed in 4.8 / upstream.
The upstream/4.8 fix is this PR: https://github.com/rook/rook/pull/7374. I do not see any other related changes to this issue. I've created the downstream PR for 4.7, ready to merge when the BZ is fully acked. https://github.com/openshift/rook/pull/254
Please add doc text
LGTM.
The OSDs first failed to initialize, and only after 35 minutes did they come up with encryption working. Since the failed OSD pods were replaced, I don't have logs from them. I will deploy a new cluster soon and capture logs from the failing pods.
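As a side note, if a failing pod is still around, the init container log can usually be pulled before the pod is replaced. A rough sketch, assuming the openshift-storage namespace and the encryption-kms-get-kek init container name seen on this cluster (substitute the actual OSD pod name):

oc -n openshift-storage logs <osd-pod-name> -c encryption-kms-get-kek --previous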
Events:
  Type     Reason                 Age                  From               Message
  ----     ------                 ----                 ----               -------
  Normal   Scheduled              2m45s                default-scheduler  Successfully assigned openshift-storage/rook-ceph-osd-0-749f8ddbc8-s7sfq to ip-10-0-135-29.us-east-2.compute.internal
  Normal   SuccessfulMountVolume  2m44s                kubelet            MapVolume.MapPodDevice succeeded for volume "pvc-a57a35c5-3758-4001-a3cf-2fb122a3b4bc" globalMapPath "/var/lib/kubelet/plugins/kubernetes.io/aws-ebs/volumeDevices/aws:/us-east-2a/vol-0f7e552d0a9953032"
  Normal   SuccessfulMountVolume  2m44s                kubelet            MapVolume.MapPodDevice succeeded for volume "pvc-a57a35c5-3758-4001-a3cf-2fb122a3b4bc" volumeMapPath "/var/lib/kubelet/pods/19692ab7-841b-454b-8e8d-a930a39116bf/volumeDevices/kubernetes.io~aws-ebs"
  Normal   AddedInterface         2m43s                multus             Add eth0 [10.131.0.76/23]
  Normal   Pulled                 2m42s                kubelet            Container image "quay.io/rhceph-dev/rhceph@sha256:725f93133acc0fb1ca845bd12e77f20d8629cad0e22d46457b2736578698eb6c" already present on machine
  Normal   Created                2m42s                kubelet            Created container blkdevmapper
  Normal   Started                2m42s                kubelet            Started container blkdevmapper
  Normal   Pulled                 2m (x4 over 2m41s)   kubelet            Container image "quay.io/rhceph-dev/rhceph@sha256:725f93133acc0fb1ca845bd12e77f20d8629cad0e22d46457b2736578698eb6c" already present on machine
  Normal   Created                2m (x4 over 2m41s)   kubelet            Created container encryption-kms-get-kek
  Normal   Started                2m (x4 over 2m41s)   kubelet            Started container encryption-kms-get-kek
  Warning  BackOff                80s (x8 over 2m39s)  kubelet            Back-off restarting failed container

oc logs rook-ceph-osd-0-749f8ddbc8-s7sfq -c encryption-kms-get-kek
no encryption key rook-ceph-osd-encryption-key-ocs-deviceset-gp2-0-data-0q6whj present in vault ["Invalid path for a versioned K/V secrets engine. See the API docs for the appropriate API endpoints to use. If using the Vault CLI, use 'vault kv get' for this operation."]

rook-ceph-osd-0-749f8ddbc8-s7sfq   0/2   Init:CrashLoopBackOff   4   2m25s
rook-ceph-osd-1-55b97c64fc-fb2g6   0/2   Init:CrashLoopBackOff   4   2m18s
rook-ceph-osd-2-6b8c95f8d5-tr2g8   0/2   Init:CrashLoopBackOff   4   2m15s
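The Vault error above ("Invalid path for a versioned K/V secrets engine ... use 'vault kv get'") is the message Vault typically returns when a KV v2 (versioned) mount is read through a v1-style path. A quick manual check against the Vault instance, assuming the default "secret" mount and the key name from the log (adjust both to the actual setup):

# v1-style read; against a KV v2 mount this fails with the error seen above
vault read secret/rook-ceph-osd-encryption-key-ocs-deviceset-gp2-0-data-0q6whj

# v2-style read, as the error message itself suggests
vault kv get secret/rook-ceph-osd-encryption-key-ocs-deviceset-gp2-0-data-0q6whj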
Tested another install, and it took 10 minutes before the cluster was up. Attached the rook log above as "rook log 10 minutes".
Just to clarify, the reason we don't see this in 4.8 is that in 4.8 Rook cancels any ongoing orchestration on a CR update. This is not the case in 4.7, so the operator runs the provisioning sequence once, times out, and then retries successfully. Hence the 10-minute wait.
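For anyone reproducing this on 4.7, the retry can be observed by tailing the operator log and watching the OSD pods come up on the second orchestration pass. The namespace and label below are the usual OCS/Rook defaults, so adjust if the deployment differs:

oc -n openshift-storage logs deploy/rook-ceph-operator -f
oc -n openshift-storage get pods -l app=rook-ceph-osd -w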
https://bugzilla.redhat.com/show_bug.cgi?id=1977609 was raised to handle 4.7.3. See also https://access.redhat.com/solutions/6150022.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Red Hat OpenShift Container Storage 4.7.2 bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2632