A FIPS CI job on an e2e-suite PR failed in setup with [1]:

E0611 00:52:21.614878 45 reflector.go:307] k8s.io/client-go/tools/watch/informerwatcher.go:146: Failed to watch *v1.ConfigMap: Get https://api.ci-op-8qi7qqf9-b5b45.origin-ci-int-aws.dev.rhcloud.com:6443/api/v1/namespaces/kube-system/configmaps?allowWatchBookmarks=true&fieldSelector=metadata.name%3Dbootstrap&resourceVersion=3194&timeoutSeconds=577&watch=true: dial tcp 52.52.41.37:6443: connect: connection refused
level=info msg="Pulling debug logs from the bootstrap machine"
level=info msg="Bootstrap gather logs captured here \"/tmp/artifacts/installer/log-bundle-20200611010248.tar.gz\""
level=fatal msg="Bootstrap failed to complete: failed to wait for bootstrapping to complete: timed out waiting for the condition"

From the log bundle [2]:

$ grep '"attempt": [^0]' bootstrap/containers/*.inspect
bootstrap/containers/cloud-credential-operator-ad0c8f95af28155d280a26416ede6a94adc46f57b724f6b54f08905018d6fb83.inspect: "attempt": 11,
bootstrap/containers/cluster-version-operator-1f6115911bbfa402033b04dcbee0fee34760f69d80d236666a3d7c712efb9cd1.inspect: "attempt": 1,
bootstrap/containers/kube-apiserver-de38bbc76573b2c14da39eb8d966dc8698dd0286d75acc0827901dce64405bf6.inspect: "attempt": 1,
bootstrap/containers/kube-apiserver-insecure-readyz-7526e4441fd4413f7fc6fbc29785f486342621c74dc7c7639725328ed9834008.inspect: "attempt": 2,
bootstrap/containers/kube-apiserver-insecure-readyz-bc4f1fa5dda3d31498b9cc06bbe86318697aa3e36567948b1d592180979ebb46.inspect: "attempt": 1,
bootstrap/containers/kube-controller-manager-c1d4b4e90e251f9e20e1afa9ab9a7adc648230fb16ce667325e847daa6a52126.inspect: "attempt": 1,
bootstrap/containers/kube-scheduler-ba28db102059ca25af13633430885cdfa3f9c4efafdf3dc6f02bf4fd15294a57.inspect: "attempt": 1,
bootstrap/containers/setup-4affb608eb4017587629de44dcf89e31b8a8e40fc52884027c8e061e8b573b73.inspect: "attempt": 1,

$ tail -n2 bootstrap/containers/cloud-credential-operator-ad0c8f95af28155d280a26416ede6a94adc46f57b724f6b54f08905018d6fb83.log
time="2020-06-11T01:02:38Z" level=info msg="setting up AWS pod identity controller"
time="2020-06-11T01:02:38Z" level=fatal msg="unable to register controllers to the manager" error="AWS_POD_IDENTITY_WEBHOOK_IMAGE is not set"

Looks like AWS_POD_IDENTITY_WEBHOOK_IMAGE was added about a week ago [3]. From the manifest [4]:

- name: RELEASE_VERSION
  value: 0.0.1-2020-06-11-001518
- name: AWS_POD_IDENTITY_WEBHOOK_IMAGE
  value: registry.svc.ci.openshift.org/ci-op-8qi7qqf9/stable@sha256:7181998a260035fdcc06a65fb261289503f8f6b07d36f2db6a14b7161fc15d0c

So that looks like it's set to me.
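For reference, the "attempt": 11 in the inspect output means the CCO container was crash-looping on the bootstrap node, and the fatal line is the operator refusing to register its controllers when the variable is empty. A minimal Go sketch of that kind of hard startup check; the function name and structure are illustrative, not the actual CCO source:

package main

import (
	"fmt"
	"os"

	log "github.com/sirupsen/logrus"
)

// setupPodIdentityController is a hypothetical reconstruction of a startup
// check that would produce the fatal log line above: the controller requires
// the webhook image at registration time and aborts when the env var is empty.
func setupPodIdentityController() error {
	log.Info("setting up AWS pod identity controller")
	if os.Getenv("AWS_POD_IDENTITY_WEBHOOK_IMAGE") == "" {
		return fmt.Errorf("AWS_POD_IDENTITY_WEBHOOK_IMAGE is not set")
	}
	// ...register the controller with the manager...
	return nil
}

func main() {
	if err := setupPodIdentityController(); err != nil {
		// Fatal exits the process, which is what drives the crash loop above.
		log.WithError(err).Fatal("unable to register controllers to the manager")
	}
}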
But the bootstrap pod has:

$ jq .info.runtimeSpec.process.env bootstrap/containers/cloud-credential-operator-ad0c8f95af28155d280a26416ede6a94adc46f57b724f6b54f08905018d6fb83.inspect
[
  "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
  "TERM=xterm",
  "HOSTNAME=ip-10-0-21-86",
  "foo=bar",
  "OPENSHIFT_BUILD_NAME=cloud-credential-operator",
  "OPENSHIFT_BUILD_NAMESPACE=ci-op-gkt0g4hd",
  "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
  "container=oci"
]

And the rendered pod spec was:

$ cat rendered-assets/openshift/cco-bootstrap/bootstrap-manifests/cloud-credential-operator-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: cloud-credential-operator
  namespace: openshift-cloud-credential-operator
spec:
  containers:
  - command:
    - /usr/bin/cloud-credential-operator
    args:
    - operator
    - --log-level=debug
    - --kubeconfig=/etc/kubernetes/secrets/kubeconfig
    image: registry.svc.ci.openshift.org/ci-op-8qi7qqf9/stable@sha256:db9944c2ca1c542822860e01b6a17a22cd36dbb615978dc41141f1fe97ba92d1
    imagePullPolicy: IfNotPresent
    name: cloud-credential-operator
    volumeMounts:
    - mountPath: /etc/kubernetes/secrets
      name: secrets
      readOnly: true
  hostNetwork: true
  volumes:
  - hostPath:
      path: /etc/kubernetes/bootstrap-secrets
    name: secrets

Note there is no env section at all. So maybe cred#195 missed some changes that need to happen to the generated bootstrap pod YAML? Or we need to soften the requirement, like the in-flight [5] (a sketch of the first option follows the links below).

[1]: https://deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gcs/origin-ci-test/pr-logs/pull/25094/pull-ci-openshift-origin-master-e2e-aws-fips/3270
[2]: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/25094/pull-ci-openshift-origin-master-e2e-aws-fips/3270/artifacts/e2e-aws-fips/installer/
[3]: https://github.com/openshift/cloud-credential-operator/pull/195/files#diff-77b40adb1e22a95f65dd1acda430bc80R110
[4]: https://storage.googleapis.com/origin-ci-test/pr-logs/pull/25094/pull-ci-openshift-origin-master-e2e-aws-fips/3270/artifacts/release-latest/release-payload-latest/0000_50_cloud-credential-operator_03-deployment.yaml
[5]: https://github.com/openshift/cloud-credential-operator/pull/206#discussion_r438448965
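If the first theory is right, the fix would inject the env vars the deployment manifest [4] already carries into the generated bootstrap pod. A sketch of what that could look like with the core/v1 types; the function name and surrounding render plumbing are assumptions on my part, not taken from cred#195:

package render

import (
	corev1 "k8s.io/api/core/v1"
)

// withOperatorEnv appends the env vars from the deployment manifest [4] to
// the cloud-credential-operator container in the generated bootstrap pod.
// Hypothetical helper; the real change may live elsewhere in the render code.
func withOperatorEnv(pod *corev1.Pod, releaseVersion, webhookImage string) {
	for i := range pod.Spec.Containers {
		c := &pod.Spec.Containers[i]
		if c.Name != "cloud-credential-operator" {
			continue
		}
		c.Env = append(c.Env,
			corev1.EnvVar{Name: "RELEASE_VERSION", Value: releaseVersion},
			corev1.EnvVar{Name: "AWS_POD_IDENTITY_WEBHOOK_IMAGE", Value: webhookImage},
		)
	}
}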
As far as I was aware, the CCO was only run with the `render` command during install, not in actual operator mode. Is this no longer true? If so, why did this change?
The CCO binary is invoked with the `render` command to generate a limited Pod definition (https://github.com/openshift/cloud-credential-operator/blob/master/pkg/cmd/render/render.go#L38-L62) that does run as a static Pod on the bootstrap node.
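For anyone unfamiliar with the flow, the shape of it is roughly: fill in a static-pod template and write it under bootstrap-manifests/, where the bootstrap node's kubelet picks it up. A simplified Go sketch under those assumptions; paths, template fields, and the function name are illustrative, see the linked render.go for the real implementation:

package render

import (
	"os"
	"path/filepath"
	"text/template"
)

// A cut-down stand-in for the real bootstrap pod template.
const podTemplate = `apiVersion: v1
kind: Pod
metadata:
  name: cloud-credential-operator
  namespace: openshift-cloud-credential-operator
spec:
  containers:
  - name: cloud-credential-operator
    image: {{ .OperatorImage }}
`

// renderBootstrapPod fills the static-pod template and writes it where the
// bootstrap kubelet looks for static pod manifests.
func renderBootstrapPod(destDir, operatorImage string) error {
	tmpl, err := template.New("pod").Parse(podTemplate)
	if err != nil {
		return err
	}
	out := filepath.Join(destDir, "bootstrap-manifests", "cloud-credential-operator-pod.yaml")
	if err := os.MkdirAll(filepath.Dir(out), 0o755); err != nil {
		return err
	}
	f, err := os.Create(out)
	if err != nil {
		return err
	}
	defer f.Close()
	return tmpl.Execute(f, struct{ OperatorImage string }{operatorImage})
}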
PR in code review.
The bug has been fixed:

INFO[0003] registering components
INFO[0003] setting up scheme
INFO[0003] setting up controller
INFO[0005] Setting up secret annotator. Platform Type is AWS
INFO[0006] setting up AWS pod identity controller
WARN[0006] AWS_POD_IDENTITY_WEBHOOK_IMAGE is not set, AWS pod identity webhook will not be deployed controller=awspodidentity
INFO[0008] setting up AWS OIDC Discovery Endpoint Controller
INFO[0012] initializing AWS actuator
INFO[0012] starting the cmd
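That WARN line matches the softened requirement discussed in [5]: a missing AWS_POD_IDENTITY_WEBHOOK_IMAGE now skips deploying the webhook instead of aborting controller registration, so bootstrap proceeds. A sketch of the softened check, the counterpart of the fatal one above, with illustrative names only:

package main

import (
	"os"

	log "github.com/sirupsen/logrus"
)

// setupPodIdentityController now treats a missing webhook image as a
// degraded-but-acceptable state instead of a fatal error. Hypothetical
// reconstruction matching the WARN line in the verification log.
func setupPodIdentityController() error {
	img := os.Getenv("AWS_POD_IDENTITY_WEBHOOK_IMAGE")
	if img == "" {
		log.WithField("controller", "awspodidentity").
			Warn("AWS_POD_IDENTITY_WEBHOOK_IMAGE is not set, AWS pod identity webhook will not be deployed")
		// Bootstrap proceeds; the webhook controller is simply not registered.
		return nil
	}
	// ...register the controller that deploys the webhook using img...
	return nil
}

func main() {
	if err := setupPodIdentityController(); err != nil {
		log.WithError(err).Fatal("unable to register controllers to the manager")
	}
}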
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196