Description of problem:
During upgrade testing from 4.9 to 4.10 on AWS, the cloud-network-config-controller pod cannot start because the secret "cloud-credentials" is not found.

$ oc describe pod cloud-network-config-controller-5644f5c845-8ns9f -n openshift-cloud-network-config-controller
Name:                 cloud-network-config-controller-5644f5c845-8ns9f
Namespace:            openshift-cloud-network-config-controller
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Node:                 ip-10-0-74-183.us-east-2.compute.internal/10.0.74.183
Start Time:           Tue, 11 Jan 2022 13:04:43 +0800
Labels:               app=cloud-network-config-controller
                      component=network
                      openshift.io/component=network
                      pod-template-hash=5644f5c845
                      type=infra
Annotations:          <none>
Status:               Pending
IP:
IPs:                  <none>
Controlled By:        ReplicaSet/cloud-network-config-controller-5644f5c845
Containers:
  controller:
    Container ID:
    Image:          quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:3fd9fe9fc419ec1288bb0383f0d4550b9cdfbe662986fe617d72264ca93d38e3
    Image ID:
    Port:           <none>
    Host Port:      <none>
    Command:
      /usr/bin/cloud-network-config-controller
    Args:
      -platform-type AWS -platform-region us-east-2 -secret-name cloud-credentials
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Requests:
      cpu:     10m
      memory:  50Mi
    Environment:
      CONTROLLER_NAMESPACE:     openshift-cloud-network-config-controller (v1:metadata.namespace)
      CONTROLLER_NAME:          cloud-network-config-controller-5644f5c845-8ns9f (v1:metadata.name)
      KUBERNETES_SERVICE_PORT:  6443
      KUBERNETES_SERVICE_HOST:  api-int.zzhao11103141.qe.devcluster.openshift.com
    Mounts:
      /etc/secret/cloudprovider from cloud-provider-secret (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-sn7bz (ro)
      /var/run/secrets/openshift/serviceaccount from bound-sa-token (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  cloud-provider-secret:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  cloud-credentials
    Optional:    false
  bound-sa-token:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3600
  kube-api-access-sn7bz:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
    ConfigMapName:           openshift-service-ca.crt
    ConfigMapOptional:       <nil>
QoS Class:       Burstable
Node-Selectors:  node-role.kubernetes.io/master=
Tolerations:     node-role.kubernetes.io/master:NoSchedule op=Exists
                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                 node.kubernetes.io/not-ready:NoSchedule op=Exists
                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason       Age                     From     Message
  ----     ------       ----                    ----     -------
  Warning  FailedMount  21m (x50 over 7h16m)    kubelet  Unable to attach or mount volumes: unmounted volumes=[cloud-provider-secret], unattached volumes=[kube-api-access-sn7bz cloud-provider-secret bound-sa-token]: timed out waiting for the condition
  Warning  FailedMount  13m (x217 over 7h18m)   kubelet  MountVolume.SetUp failed for volume "cloud-provider-secret" : secret "cloud-credentials" not found
  Warning  FailedMount  8m (x99 over 7h14m)     kubelet  Unable to attach or mount volumes: unmounted volumes=[cloud-provider-secret], unattached volumes=[cloud-provider-secret bound-sa-token kube-api-access-sn7bz]: timed out waiting for the condition
  Warning  FailedMount  3m30s (x40 over 7h12m)  kubelet  Unable to attach or mount volumes: unmounted volumes=[cloud-provider-secret], unattached volumes=[bound-sa-token kube-api-access-sn7bz cloud-provider-secret]: timed out waiting for the condition

Version-Release number of selected component (if applicable):
4.9 to 4.10

How reproducible:
always

Steps to Reproduce:
1. do upgrade from 4.9 to 4.10
2.
3.

Actual results:

Expected results:

Additional info:
This seems to be a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=2034413
*** This bug has been marked as a duplicate of bug 2034413 ***
Reopening this bug since the issue still happens on 4.9.14-x86_64 --> 4.10.0-0.nightly-2022-01-25-023600 with disconnected UPI on AWS and a private cluster.

$ omg get pod -n openshift-cloud-network-config-controller -o wide
NAME                                              READY  STATUS   RESTARTS  AGE  IP  NODE
cloud-network-config-controller-5fc5c59cdd-qdsnm  0/1    Pending  0         30m      ip-10-0-71-73.us-east-2.compute.internal

$ omg get event -n openshift-cloud-network-config-controller
LAST SEEN  TYPE     REASON             OBJECT                                                 MESSAGE
Unknown    Normal   Scheduled          pod/cloud-network-config-controller-5fc5c59cdd-fj75r   Successfully assigned openshift-cloud-network-config-controller/cloud-network-config-controller-5fc5c59cdd-fj75r to ip-10-0-57-229.us-east-2.compute.internal
33m        Warning  FailedMount        pod/cloud-network-config-controller-5fc5c59cdd-fj75r   MountVolume.SetUp failed for volume "cloud-provider-secret" : secret "cloud-credentials" not found
44m        Warning  FailedMount        pod/cloud-network-config-controller-5fc5c59cdd-fj75r   Unable to attach or mount volumes: unmounted volumes=[cloud-provider-secret], unattached volumes=[bound-sa-token kube-api-access-qh6cr cloud-provider-secret kube-cloud-config trusted-ca]: timed out waiting for the condition
46m        Warning  FailedMount        pod/cloud-network-config-controller-5fc5c59cdd-fj75r   Unable to attach or mount volumes: unmounted volumes=[cloud-provider-secret], unattached volumes=[kube-api-access-qh6cr cloud-provider-secret kube-cloud-config trusted-ca bound-sa-token]: timed out waiting for the condition
37m        Warning  FailedMount        pod/cloud-network-config-controller-5fc5c59cdd-fj75r   Unable to attach or mount volumes: unmounted volumes=[cloud-provider-secret], unattached volumes=[kube-cloud-config trusted-ca bound-sa-token kube-api-access-qh6cr cloud-provider-secret]: timed out waiting for the condition
41m        Warning  FailedMount        pod/cloud-network-config-controller-5fc5c59cdd-fj75r   Unable to attach or mount volumes: unmounted volumes=[cloud-provider-secret], unattached volumes=[trusted-ca bound-sa-token kube-api-access-qh6cr cloud-provider-secret kube-cloud-config]: timed out waiting for the condition
39m        Warning  FailedMount        pod/cloud-network-config-controller-5fc5c59cdd-fj75r   Unable to attach or mount volumes: unmounted volumes=[cloud-provider-secret], unattached volumes=[cloud-provider-secret kube-cloud-config trusted-ca bound-sa-token kube-api-access-qh6cr]: timed out waiting for the condition
Unknown    Normal   Scheduled          pod/cloud-network-config-controller-5fc5c59cdd-qdsnm   Successfully assigned openshift-cloud-network-config-controller/cloud-network-config-controller-5fc5c59cdd-qdsnm to ip-10-0-71-73.us-east-2.compute.internal
3m59s      Warning  FailedMount        pod/cloud-network-config-controller-5fc5c59cdd-qdsnm   MountVolume.SetUp failed for volume "cloud-provider-secret" : secret "cloud-credentials" not found
28m        Warning  FailedMount        pod/cloud-network-config-controller-5fc5c59cdd-qdsnm   Unable to attach or mount volumes: unmounted volumes=[cloud-provider-secret], unattached volumes=[bound-sa-token kube-api-access-cjwr8 cloud-provider-secret kube-cloud-config trusted-ca]: timed out waiting for the condition
8m0s       Warning  FailedMount        pod/cloud-network-config-controller-5fc5c59cdd-qdsnm   Unable to attach or mount volumes: unmounted volumes=[cloud-provider-secret], unattached volumes=[trusted-ca bound-sa-token kube-api-access-cjwr8 cloud-provider-secret kube-cloud-config]: timed out waiting for the condition
21m        Warning  FailedMount        pod/cloud-network-config-controller-5fc5c59cdd-qdsnm   Unable to attach or mount volumes: unmounted volumes=[cloud-provider-secret], unattached volumes=[kube-api-access-cjwr8 cloud-provider-secret kube-cloud-config trusted-ca bound-sa-token]: timed out waiting for the condition
10m        Warning  FailedMount        pod/cloud-network-config-controller-5fc5c59cdd-qdsnm   Unable to attach or mount volumes: unmounted volumes=[cloud-provider-secret], unattached volumes=[cloud-provider-secret kube-cloud-config trusted-ca bound-sa-token kube-api-access-cjwr8]: timed out waiting for the condition
59m        Normal   SuccessfulCreate   replicaset/cloud-network-config-controller-5fc5c59cdd  Created pod: cloud-network-config-controller-5fc5c59cdd-fj75r
30m        Normal   SuccessfulCreate   replicaset/cloud-network-config-controller-5fc5c59cdd  Created pod: cloud-network-config-controller-5fc5c59cdd-qdsnm
59m        Normal   ScalingReplicaSet  deployment/cloud-network-config-controller             Scaled up replica set cloud-network-config-controller-5fc5c59cdd to 1

must-gather download link:
http://10.73.131.57:9000/openshift-must-gather/2022-01-25-23-36-44/must-gather.local.6311488711253628560.tar.gz?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=openshift%2F20220125%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20220125T233700Z&X-Amz-Expires=604800&X-Amz-SignedHeaders=host&X-Amz-Signature=12c24c8bb0f2d014f018fe575e4d93f04201260343746f684187fa7f8c9edc0c
Hi Zhao,

Are you performing the manual steps described in https://bugzilla.redhat.com/show_bug.cgi?id=2034413#c10?

The secret is clearly not created, hence that pod will not start. In such an environment, it's up to the user to manually create all required secrets (I believe). Did you do so when testing? If you did, what didn't work when you tried?

We need a step-by-step reproducer for this.

/Alex
Looking at the cloud-credential-operator, this does not seem like a bug.

$ omg logs -c cloud-credential-operator cloud-credential-operator-7bbf4df879-kchl6 -n openshift-cloud-credential-operator | less
...
2022-01-25T23:05:54.644335898Z time="2022-01-25T23:05:54Z" level=info msg="found credentials request for namespace" cr=openshift-cloud-network-config-controller/cloud-credentials namespace=openshift-cloud-network-config-controller
2022-01-25T23:05:54.644335898Z time="2022-01-25T23:05:54Z" level=info msg="found credentials request for namespace" cr=openshift-cloud-network-config-controller/cloud-credentials namespace=openshift-cloud-network-config-controller
2022-01-25T23:05:54.644335898Z time="2022-01-25T23:05:54Z" level=info msg="found credentials request for namespace" cr=openshift-cloud-network-config-controller/cloud-credentials namespace=openshift-cloud-network-config-controller
...
2022-01-25T23:05:55.296494202Z time="2022-01-25T23:05:55Z" level=info msg="operator set to disabled / manual mode" controller=credreq cr=openshift-cloud-credential-operator/openshift-cloud-network-config-controller-aws
2022-01-25T23:05:55.296494202Z time="2022-01-25T23:05:55Z" level=info msg="operator set to disabled / manual mode" controller=credreq cr=openshift-cloud-credential-operator/openshift-cloud-network-config-controller-azure

Hence the cloud-credentials requests deployed by the CNO are seen by the credential-operator, but because of manual mode they are not applied. You, as the user, need to do this.

/Alex
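The log check above can be repeated mechanically. The following is a minimal sketch, not part of the original analysis: the function name `manual_mode_requests` is hypothetical, and it assumes the log lines keep the `msg="operator set to disabled / manual mode"` and `cr=` fields exactly as quoted in this comment.

```shell
# Hypothetical helper: list the CredentialsRequests that the CCO log reports
# as skipped because the operator is in disabled / manual mode.
# Reads log text on stdin and prints one "namespace/name" per line, de-duplicated.
manual_mode_requests() {
  grep 'msg="operator set to disabled / manual mode"' \
    | sed -n 's/.* cr=\([^ ]*\).*/\1/p' \
    | sort -u
}
```

It could be fed from the same command used above, for example `omg logs -c cloud-credential-operator cloud-credential-operator-7bbf4df879-kchl6 -n openshift-cloud-credential-operator | manual_mode_requests`.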
(In reply to Alexander Constantinescu from comment #7)
> Looking at the cloud-credential-operator, this does not seem like a bug
>
> $ omg logs -c cloud-credential-operator
> cloud-credential-operator-7bbf4df879-kchl6 -n
> openshift-cloud-credential-operator | less
> ...
> 2022-01-25T23:05:54.644335898Z time="2022-01-25T23:05:54Z" level=info
> msg="found credentials request for namespace"
> cr=openshift-cloud-network-config-controller/cloud-credentials
> namespace=openshift-cloud-network-config-controller
> 2022-01-25T23:05:54.644335898Z time="2022-01-25T23:05:54Z" level=info
> msg="found credentials request for namespace"
> cr=openshift-cloud-network-config-controller/cloud-credentials
> namespace=openshift-cloud-network-config-controller
> 2022-01-25T23:05:54.644335898Z time="2022-01-25T23:05:54Z" level=info
> msg="found credentials request for namespace"
> cr=openshift-cloud-network-config-controller/cloud-credentials
> namespace=openshift-cloud-network-config-controller
> ...
> 2022-01-25T23:05:55.296494202Z time="2022-01-25T23:05:55Z" level=info
> msg="operator set to disabled / manual mode" controller=credreq
> cr=openshift-cloud-credential-operator/openshift-cloud-network-config-
> controller-aws
> 2022-01-25T23:05:55.296494202Z time="2022-01-25T23:05:55Z" level=info
> msg="operator set to disabled / manual mode" controller=credreq
> cr=openshift-cloud-credential-operator/openshift-cloud-network-config-
> controller-azure
>
>
> Hence the cloud-credentials by the CNO are deployed and seen by the
> creedential-operator, but because of manual mode they are not applied. You
> as a user needs to do this.
>
> /Alex

Hi Alex,

Since this cluster was upgraded from 4.9, do you mean the customer needs to perform the steps below manually after upgrading to 4.10? If so, is there any way to make this automatic?

# oc adm release extract registry.ci.openshift.org/ocp/release:4.10.0-0.nightly-2022-01-14-015144 --credentials-requests -a pull-secret --cloud azure
created 0000_50_cluster-network-operator_02-cncc-credentials.yaml

$ cat 0000_50_cluster-network-operator_02-cncc-credentials.yaml
......
  secretRef:
    name: cloud-credentials
    namespace: openshift-cloud-network-config-controller
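The `secretRef` block in the extracted file names the secret that is expected to exist. As a rough sketch only, the hypothetical helper `secret_ref` below pulls that reference out of an extracted CredentialsRequest file; it assumes `name:` and `namespace:` follow `secretRef:` as in the excerpt above and is a line-based match, not a YAML parser.

```shell
# Hypothetical helper: print "namespace/name" for each secretRef block found
# in the given files (or on stdin when no file is given).
# Assumes name: and namespace: appear after secretRef:, as in the excerpt.
secret_ref() {
  awk '
    /^[[:space:]]*secretRef:/           { in_ref = 1; next }
    in_ref && /^[[:space:]]*name:/      { name = $2 }
    in_ref && /^[[:space:]]*namespace:/ { ns = $2 }
    in_ref && name != "" && ns != ""    { print ns "/" name; in_ref = 0; name = ""; ns = "" }
  ' "$@"
}
```

For example, `secret_ref 0000_50_cluster-network-operator_02-cncc-credentials.yaml` would show which namespace and secret name the request targets.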
Hi,

I don't know, to be honest. I wasn't involved in designing the deployment/support model for a "disconnected UPI install with a private cluster", so I am not sure what the right procedure is. It is, however, not a bug in any networking component, as the cloud-credential-operator clearly states that it is disabled in this environment and can't create the secret automatically.

I suppose that when you created that secret manually in #comment 8, everything started working?

/Alex
(In reply to Alexander Constantinescu from comment #9)
> Hi
>
> I don't know to be honest. I wasn't involved in designing the
> deployment/support model for a "disconnected UPI install with a private
> cluster", so I am not sure what the right procedure is. It's however not a
> bug with any networking component as the cloud-credential-operator clearly
> states that it is disabled on this environment and can't create the secret
> automatically. I suppose that when you created that secret manually in
> #comment 8, everything started working?
>
> /Alex

When I try to apply that, it tells me it already exists, so the issue is that there is no secret 'cloud-credentials'. See below:

$ oc create -f 0000_50_cluster-network-operator_02-cncc-credentials.yaml
Error from server (AlreadyExists): error when creating "0000_50_cluster-network-operator_02-cncc-credentials.yaml": credentialsrequests.cloudcredential.openshift.io "openshift-cloud-network-config-controller-aws" already exists

### Yes, this already exists.

$ oc get credentialsrequests.cloudcredential.openshift.io -n openshift-cloud-credential-operator | grep controller-aws
openshift-cloud-network-config-controller-aws   3h41m

### However, the pod is not ready in openshift-cloud-network-config-controller

$ oc get pod -n openshift-cloud-network-config-controller
NAME                                               READY   STATUS              RESTARTS   AGE
cloud-network-config-controller-5d8b594b85-ddpvv   0/1     ContainerCreating   0          3h17m

### secret "cloud-credentials" not found

$ oc describe pod cloud-network-config-controller-5d8b594b85-ddpvv -n openshift-cloud-network-config-controller
Name:                 cloud-network-config-controller-5d8b594b85-ddpvv
Namespace:            openshift-cloud-network-config-controller
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Node:                 ip-10-0-50-15.us-east-2.compute.internal/10.0.50.15
Start Time:           Fri, 28 Jan 2022 14:24:05 +0800
Labels:               app=cloud-network-config-controller
                      component=network
                      openshift.io/component=network
                      pod-template-hash=5d8b594b85
                      type=infra
Annotations:          openshift.io/scc: restricted
Status:               Pending
IP:
IPs:                  <none>
Controlled By:        ReplicaSet/cloud-network-config-controller-5d8b594b85
Containers:
  controller:
    Container ID:
    Image:          quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4dbab8c851e2fcc755ba7269890982ffb8f389d866a88323c1bd3fd8a974137d
    Image ID:
    Port:           <none>
    Host Port:      <none>
    Command:
      /usr/bin/cloud-network-config-controller
    Args:
      -platform-type AWS -platform-region=us-east-2 -platform-api-url= -platform-aws-ca-override= -platform-azure-environment= -secret-name cloud-credentials
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Requests:
      cpu:     10m
      memory:  50Mi
    Environment:
      CONTROLLER_NAMESPACE:     openshift-cloud-network-config-controller (v1:metadata.namespace)
      CONTROLLER_NAME:          cloud-network-config-controller-5d8b594b85-ddpvv (v1:metadata.name)
      KUBERNETES_SERVICE_PORT:  6443
      KUBERNETES_SERVICE_HOST:  api-int.zzhao28092354.qe.devcluster.openshift.com
    Mounts:
      /etc/pki/ca-trust/extracted/pem from trusted-ca (ro)
      /etc/secret/cloudprovider from cloud-provider-secret (ro)
      /kube-cloud-config from kube-cloud-config (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-99c9j (ro)
      /var/run/secrets/openshift/serviceaccount from bound-sa-token (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  cloud-provider-secret:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  cloud-credentials
    Optional:    false
  kube-cloud-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      kube-cloud-config
    Optional:  false
  trusted-ca:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      trusted-ca
    Optional:  false
  bound-sa-token:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3600
  kube-api-access-99c9j:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
    ConfigMapName:           openshift-service-ca.crt
    ConfigMapOptional:       <nil>
QoS Class:       Burstable
Node-Selectors:  node-role.kubernetes.io/master=
Tolerations:     node-role.kubernetes.io/master:NoSchedule op=Exists
                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason       Age                     From     Message
  ----     ------       ----                    ----     -------
  Warning  FailedMount  162m (x6 over 3h12m)    kubelet  Unable to attach or mount volumes: unmounted volumes=[cloud-provider-secret], unattached volumes=[trusted-ca bound-sa-token kube-api-access-99c9j cloud-provider-secret kube-cloud-config]: timed out waiting for the condition
  Warning  FailedMount  58m (x13 over 3h16m)    kubelet  Unable to attach or mount volumes: unmounted volumes=[cloud-provider-secret], unattached volumes=[bound-sa-token kube-api-access-99c9j cloud-provider-secret kube-cloud-config trusted-ca]: timed out waiting for the condition
  Warning  FailedMount  33m (x8 over 3h9m)      kubelet  Unable to attach or mount volumes: unmounted volumes=[cloud-provider-secret], unattached volumes=[kube-cloud-config trusted-ca bound-sa-token kube-api-access-99c9j cloud-provider-secret]: timed out waiting for the condition
  Warning  FailedMount  22m (x11 over 3h)       kubelet  Unable to attach or mount volumes: unmounted volumes=[cloud-provider-secret], unattached volumes=[kube-api-access-99c9j cloud-provider-secret kube-cloud-config trusted-ca bound-sa-token]: timed out waiting for the condition
  Warning  FailedMount  8m34s (x29 over 3h7m)   kubelet  Unable to attach or mount volumes: unmounted volumes=[cloud-provider-secret], unattached volumes=[cloud-provider-secret kube-cloud-config trusted-ca bound-sa-token kube-api-access-99c9j]: timed out waiting for the condition
  Warning  FailedMount  3m25s (x104 over 3h18m) kubelet  MountVolume.SetUp failed for volume "cloud-provider-secret" : secret "cloud-credentials" not found

$ oc get secret -n openshift-cloud-network-config-controller
NAME                                              TYPE                                  DATA   AGE
builder-dockercfg-xtrtr                           kubernetes.io/dockercfg               1      3h46m
builder-token-bv66q                               kubernetes.io/service-account-token   4      3h46m
builder-token-qmlk5                               kubernetes.io/service-account-token   4      3h46m
cloud-network-config-controller-dockercfg-6fqnk   kubernetes.io/dockercfg               1      3h38m
cloud-network-config-controller-token-6nhnk       kubernetes.io/service-account-token   4      3h38m
cloud-network-config-controller-token-t6wbr       kubernetes.io/service-account-token   4      3h38m
default-dockercfg-5nmnd                           kubernetes.io/dockercfg               1      3h46m
default-token-2kvht                               kubernetes.io/service-account-token   4      3h46m
default-token-jdknq                               kubernetes.io/service-account-token   4      3h46m
deployer-dockercfg-c4zqm                          kubernetes.io/dockercfg               1      3h46m
deployer-token-nlm2s                              kubernetes.io/service-account-token   4      3h46m
deployer-token-p5mf9                              kubernetes.io/service-account-token   4      3h46m
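When triaging long event lists like the one above, it may help to reduce them to the names of the secrets that the kubelet could not find. A small sketch follows; the helper name `missing_secrets` is hypothetical, and it assumes the `MountVolume.SetUp failed ... secret "<name>" not found` message format shown in these events.

```shell
# Hypothetical helper: print the unique secret names that kubelet reported
# as "not found" in FailedMount events.
# Reads "oc describe pod" or event output on stdin.
missing_secrets() {
  sed -n 's/.*MountVolume\.SetUp failed for volume "[^"]*" : secret "\([^"]*\)" not found.*/\1/p' \
    | sort -u
}
```

For example: `oc describe pod cloud-network-config-controller-5d8b594b85-ddpvv -n openshift-cloud-network-config-controller | missing_secrets`.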
> When I'm trying to apply that. it told me it already exist, so the issue is no secret 'cloud-credentials', see below

Yes, the cloud-credential-request has already been created by the CNO, as seen in the logs in #comment 7. The CCO is, however, not creating the secret, because this type of cluster doesn't allow it to automatically generate the secret for the cloud-credential-request.

Please check with the QE engineers for the CCO on what the procedure is for this type of cluster. I am not sure, but in any case there is still no evidence linking this problem to the networking components.

/Alex
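For reference, the manual step discussed in this thread amounts to creating the secret that the CredentialsRequest points at. The sketch below is an illustration only and not a confirmed procedure: the helper name `render_cncc_secret` is hypothetical, and the key names `aws_access_key_id` and `aws_secret_access_key` are an assumption based on the usual AWS manual-mode secret layout, which this thread does not confirm. The IAM identity behind the keys would also need the permissions listed in the CredentialsRequest.

```shell
# Hypothetical helper: print a Secret manifest for the
# cloud-network-config-controller on AWS.
# Assumption: key names follow the AWS manual-mode secret layout
# (aws_access_key_id / aws_secret_access_key); not confirmed in this thread.
render_cncc_secret() {
  key_id=$1
  secret_key=$2
  cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: cloud-credentials
  namespace: openshift-cloud-network-config-controller
stringData:
  aws_access_key_id: "${key_id}"
  aws_secret_access_key: "${secret_key}"
EOF
}
```

The output could then be reviewed and piped to `oc apply -f -`; the secret name and namespace match the `secretRef` shown earlier in the thread.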
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days