Description of problem:
This issue was found in https://bugzilla.redhat.com/show_bug.cgi?id=1747366#c5; creating a new bug for tracking.

Version-Release number of selected component (if applicable):
4.2.0-0.nightly-2019-09-15-052022

How reproducible:
Always

Steps to Reproduce:
1. Drop the internet gateway for the private subnets in the VPC to create a disconnected env.
2. Set up a proxy in the public subnets; the proxy can reach both the external and internal networks.
3. Enable the proxy setting in install-config.yaml.
4. Trigger a UPI install on AWS.

Actual results:
Workers were unable to register with the API server. The kubelet log on a worker:

[core@ip-10-0-60-51 ~]$ journalctl -f -u kubelet
-- Logs begin at Tue 2019-09-10 06:33:45 UTC. --
Sep 10 07:49:13 ip-10-0-60-51 hyperkube[1155]: E0910 07:49:13.732272 1155 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1beta1.RuntimeClass: runtimeclasses.node.k8s.io is forbidden: User "system:anonymous" cannot list resource "runtimeclasses" in API group "node.k8s.io" at the cluster scope
Sep 10 07:49:13 ip-10-0-60-51 hyperkube[1155]: I0910 07:49:13.741852 1155 reflector.go:161] Listing and watching *v1.Pod from k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47
Sep 10 07:49:13 ip-10-0-60-51 hyperkube[1155]: E0910 07:49:13.757275 1155 kubelet.go:2254] node "ip-10-0-60-51.us-east-2.compute.internal" not found
Sep 10 07:49:13 ip-10-0-60-51 hyperkube[1155]: E0910 07:49:13.857404 1155 kubelet.go:2254] node "ip-10-0-60-51.us-east-2.compute.internal" not found
Sep 10 07:49:13 ip-10-0-60-51 hyperkube[1155]: E0910 07:49:13.932371 1155 reflector.go:126] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: pods is forbidden: User "system:anonymous" cannot list resource "pods" in API group "" at the cluster scope

Check the kube-controller-manager pod on the masters:

# oc get pod -n openshift-kube-controller-manager kube-controller-manager-ip-10-0-50-134.us-east-2.compute.internal -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/config.hash: 0548758d75ee5d5fc31bb9d869247f8f
    kubernetes.io/config.mirror: 0548758d75ee5d5fc31bb9d869247f8f
    kubernetes.io/config.seen: "2019-09-10T06:32:21.844041416Z"
    kubernetes.io/config.source: file
  creationTimestamp: "2019-09-10T06:32:23Z"
  labels:
    app: kube-controller-manager
    kube-controller-manager: "true"
    revision: "3"
  name: kube-controller-manager-ip-10-0-50-134.us-east-2.compute.internal
  namespace: openshift-kube-controller-manager
  resourceVersion: "23656"
  selfLink: /api/v1/namespaces/openshift-kube-controller-manager/pods/kube-controller-manager-ip-10-0-50-134.us-east-2.compute.internal
  uid: bf8108ef-d394-11e9-bb55-02f0584464f2
spec:
  containers:
  - args:
    - --openshift-config=/etc/kubernetes/static-pod-resources/configmaps/config/config.yaml
    - --kubeconfig=/etc/kubernetes/static-pod-resources/configmaps/controller-manager-kubeconfig/kubeconfig
    - --authentication-kubeconfig=/etc/kubernetes/static-pod-resources/configmaps/controller-manager-kubeconfig/kubeconfig
    - --authorization-kubeconfig=/etc/kubernetes/static-pod-resources/configmaps/controller-manager-kubeconfig/kubeconfig
    - --client-ca-file=/etc/kubernetes/static-pod-certs/configmaps/client-ca/ca-bundle.crt
    - --requestheader-client-ca-file=/etc/kubernetes/static-pod-certs/configmaps/aggregator-client-ca/ca-bundle.crt
    - -v=2
    - --tls-cert-file=/etc/kubernetes/static-pod-resources/secrets/serving-cert/tls.crt
    - --tls-private-key-file=/etc/kubernetes/static-pod-resources/secrets/serving-cert/tls.key
    command:
    - hyperkube
    - kube-controller-manager
    image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:139a691d4372f9deab8510d84fed50d126d6dff42d42b09b0c80d82c7df6c8a9
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: healthz
        port: 10257
        scheme: HTTPS
      initialDelaySeconds: 45
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 10
    name: kube-controller-manager-3
    ports:
    - containerPort: 10257
      hostPort: 10257
      protocol: TCP
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: healthz
        port: 10257
        scheme: HTTPS
      initialDelaySeconds: 10
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 10
    resources:
      requests:
        cpu: 100m
        memory: 200Mi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: FallbackToLogsOnError
    volumeMounts:
    - mountPath: /etc/kubernetes/static-pod-resources
      name: resource-dir
    - mountPath: /etc/kubernetes/static-pod-certs
      name: cert-dir
  - args:
    - --kubeconfig=/etc/kubernetes/static-pod-resources/configmaps/kube-controller-cert-syncer-kubeconfig/kubeconfig
    - --namespace=$(POD_NAMESPACE)
    - --destination-dir=/etc/kubernetes/static-pod-certs
    command:
    - cluster-kube-controller-manager-operator
    - cert-syncer
    env:
    - name: POD_NAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.name
    - name: POD_NAMESPACE
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.namespace
    image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:bbaf989e425fe444582e9e8ead17a07d3197e2cdf6a45274650e09dbb68f789c
    imagePullPolicy: IfNotPresent
    name: kube-controller-manager-cert-syncer-3
    resources:
      requests:
        cpu: 10m
        memory: 50Mi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: FallbackToLogsOnError
    volumeMounts:
    - mountPath: /etc/kubernetes/static-pod-resources
      name: resource-dir
    - mountPath: /etc/kubernetes/static-pod-certs
      name: cert-dir
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  hostNetwork: true
  initContainers:
  - args:
    - |
      echo -n "Waiting for port :10257 to be released."
      while [ -n "$(lsof -ni :10257)" ]; do
        echo -n "."
        sleep 1
      done
    command:
    - /usr/bin/timeout
    - "30"
    - /bin/bash
    - -c
    image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:139a691d4372f9deab8510d84fed50d126d6dff42d42b09b0c80d82c7df6c8a9
    imagePullPolicy: IfNotPresent
    name: wait-for-host-port
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: FallbackToLogsOnError
  nodeName: ip-10-0-50-134.us-east-2.compute.internal
  priority: 2000001000
  priorityClassName: system-node-critical
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
  - operator: Exists
  - effect: NoExecute
    operator: Exists
  volumes:
  - hostPath:
      path: /etc/kubernetes/static-pod-resources/kube-controller-manager-pod-3
      type: ""
    name: resource-dir
  - hostPath:
      path: /etc/kubernetes/static-pod-resources/kube-controller-manager-certs
      type: ""
    name: cert-dir
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2019-09-10T06:32:23Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2019-09-10T07:46:56Z"
    message: 'containers with unready status: [kube-controller-manager-3]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2019-09-10T07:46:56Z"
    message: 'containers with unready status: [kube-controller-manager-3]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2019-09-10T06:30:04Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: cri-o://d183da34c9f8c054c398a07768fe8ef4f45b0a7e6443363b151c23a1437ed71b
    image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:139a691d4372f9deab8510d84fed50d126d6dff42d42b09b0c80d82c7df6c8a9
    imageID: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:139a691d4372f9deab8510d84fed50d126d6dff42d42b09b0c80d82c7df6c8a9
    lastState:
      terminated:
        containerID: cri-o://d183da34c9f8c054c398a07768fe8ef4f45b0a7e6443363b151c23a1437ed71b
        exitCode: 255
        finishedAt: "2019-09-10T07:46:55Z"
        message: |
          nager.svc,kube-controller-manager.openshift-kube-controller-manager.svc.cluster.local] issuer="openshift-service-serving-signer@1568096957" (2019-09-10 06:29:30 +0000 UTC to 2021-09-09 06:29:31 +0000 UTC (now=2019-09-10 07:42:37.688214235 +0000 UTC))
          I0910 07:42:37.688258 1 serving.go:196] [1] "/etc/kubernetes/static-pod-resources/secrets/serving-cert/tls.crt" serving certificate: "openshift-service-serving-signer@1568096957" [] issuer="<self>" (2019-09-10 06:29:16 +0000 UTC to 2020-09-09 06:29:17 +0000 UTC (now=2019-09-10 07:42:37.688251173 +0000 UTC))
          I0910 07:42:37.688273 1 secure_serving.go:125] Serving securely on [::]:10257
          I0910 07:42:37.688356 1 serving.go:78] Starting DynamicLoader
          I0910 07:42:37.688479 1 leaderelection.go:217] attempting to acquire leader lease kube-system/kube-controller-manager...
          I0910 07:44:55.090965 1 leaderelection.go:227] successfully acquired lease kube-system/kube-controller-manager
          I0910 07:44:55.091000 1 event.go:209] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"kube-system", Name:"kube-controller-manager", UID:"0e994c7f-d394-11e9-bb55-02f0584464f2", APIVersion:"v1", ResourceVersion:"23218", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' ip-10-0-50-134_8f79219b-d39e-11e9-b065-02b94235e0a8 became leader
          W0910 07:44:55.110581 1 plugins.go:118] WARNING: aws built-in cloud provider is now deprecated.
          The AWS provider is deprecated and will be removed in a future release
          I0910 07:44:55.110750 1 aws.go:1171] Building AWS cloudprovider
          I0910 07:44:55.110792 1 aws.go:1137] Zone not specified in configuration file; querying AWS metadata service
          F0910 07:46:55.417060 1 controllermanager.go:235] error building controller context: cloud provider could not be initialized: could not init cloud provider "aws": error finding instance i-000f41ff52db3f499: "error listing AWS instances: \"RequestError: send request failed\\ncaused by: Post https://ec2.us-east-2.amazonaws.com/: dial tcp 52.95.16.2:443: i/o timeout\""
        reason: Error
        startedAt: "2019-09-10T07:42:37Z"
    name: kube-controller-manager-3
    ready: false
    restartCount: 10
    state:
      waiting:
        message: Back-off 5m0s restarting failed container=kube-controller-manager-3 pod=kube-controller-manager-ip-10-0-50-134.us-east-2.compute.internal_openshift-kube-controller-manager(0548758d75ee5d5fc31bb9d869247f8f)
        reason: CrashLoopBackOff
  - containerID: cri-o://a7d91919e26f2f29f289bc8d1c60d7421ddce995e672ee289a78873504eda12e
    image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:bbaf989e425fe444582e9e8ead17a07d3197e2cdf6a45274650e09dbb68f789c
    imageID: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:bbaf989e425fe444582e9e8ead17a07d3197e2cdf6a45274650e09dbb68f789c
    lastState: {}
    name: kube-controller-manager-cert-syncer-3
    ready: true
    restartCount: 0
    state:
      running:
        startedAt: "2019-09-10T06:32:23Z"
  hostIP: 10.0.50.134
  initContainerStatuses:
  - containerID: cri-o://6b205452cebfe970f17c6bf6c43be694b153c9f11f01fa82d439db37e5cd1982
    image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:139a691d4372f9deab8510d84fed50d126d6dff42d42b09b0c80d82c7df6c8a9
    imageID: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:139a691d4372f9deab8510d84fed50d126d6dff42d42b09b0c80d82c7df6c8a9
    lastState: {}
    name: wait-for-host-port
    ready: true
    restartCount: 0
    state:
      terminated:
        containerID: cri-o://6b205452cebfe970f17c6bf6c43be694b153c9f11f01fa82d439db37e5cd1982
        exitCode: 0
        finishedAt: "2019-09-10T06:32:23Z"
        reason: Completed
        startedAt: "2019-09-10T06:32:22Z"
  phase: Running
  podIP: 10.0.50.134
  qosClass: Burstable
  startTime: "2019-09-10T06:30:04Z"

From the log, the proxy settings are not injected into the kube-controller-manager pod.

Expected results:
kube-controller-manager should respect the proxy setting.

Additional info:
The present workaround is adding a PrivateLink (VPC endpoint) to the VPC so the EC2 endpoint can be reached.
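For reference, the proxy in step 3 is enabled via the proxy stanza of install-config.yaml, roughly along these lines; the proxy host below is a placeholder, and the installer appends the cluster/service networks and other defaults to the user-supplied noProxy value:

proxy:
  httpProxy: http://<proxy-host>:3128
  httpsProxy: http://<proxy-host>:3128
  noProxy: test.no-proxy.com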
I've discussed this with Michal Fojtik and Tomas Nozicka, and we're having a hard time justifying using a PROXY to access infrastructure components. I'll defer to the architects to make the call; until then, I'm moving the target release for this to 4.3.
This bug also affects disconnected installs on AWS. @Stephen, if this bug is not fixed in 4.2, that means we still need to mix a proxy and VPC endpoints for disconnected installs on AWS.
(In reply to Johnny Liu from comment #2)
> This bug also affects disconnected installs on AWS.
> @Stephen, if this bug is not fixed in 4.2, that means we still need to mix
> a proxy and VPC endpoints for disconnected installs on AWS.

https://bugzilla.redhat.com/show_bug.cgi?id=1743483#c40
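For reference, the VPC-endpoint half of that mix can be created roughly as follows; this is an illustrative AWS CLI call, and the VPC, subnet, and security-group IDs are placeholders:

aws ec2 create-vpc-endpoint \
  --vpc-id vpc-0123456789abcdef0 \
  --vpc-endpoint-type Interface \
  --service-name com.amazonaws.us-east-2.ec2 \
  --subnet-ids subnet-0123456789abcdef0 \
  --security-group-ids sg-0123456789abcdef0 \
  --private-dns-enabled

With private DNS enabled, calls to ec2.us-east-2.amazonaws.com from the private subnets resolve to the endpoint's internal IPs instead of trying to go out through the (missing) internet gateway.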
controllermanager.go:235] error building controller context: cloud provider could not be initialized: could not init cloud provider "aws": error finding instance i-000f41ff52db3f499: "error listing AWS instances: \"RequestError: send request failed\\ncaused by: Post https://ec2.us-east-2.amazonaws.com/: dial tcp 52.95.16.2:443: i/o timeout\""

According to the above error, it appears that this call is not being proxied; otherwise 'proxyconnect' would be used instead of 'dial'. Can you verify reachability to 52.95.16.2? You can also add `.amazonaws.com` to noProxy to ensure the call is bypassing the proxy.
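For example, reachability could be checked from a master node with commands like these; <proxy-host> is a placeholder for the proxy endpoint configured in install-config.yaml:

$ curl -kv --connect-timeout 5 https://52.95.16.2/
    (direct; an i/o timeout here is expected in a disconnected VPC)
$ https_proxy=http://<proxy-host>:3128 curl -kv https://ec2.us-east-2.amazonaws.com/
    (via the proxy; this should succeed if the proxy can reach AWS)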
(In reply to Daneyon Hansen from comment #7)
> controllermanager.go:235] error building controller context: cloud provider
> could not be initialized: could not init cloud provider "aws": error finding
> instance i-000f41ff52db3f499: "error listing AWS instances: \"RequestError:
> send request failed\\ncaused by: Post https://ec2.us-east-2.amazonaws.com/:
> dial tcp 52.95.16.2:443: i/o timeout\""
>
> According to the above error, it appears that this call is not being
> proxied; otherwise 'proxyconnect' would be used instead of 'dial'. Can you
> verify reachability to 52.95.16.2? You can also add `.amazonaws.com` to
> noProxy to ensure the call is bypassing the proxy.

As mentioned in comment 0, the instance has no reachability to the internet (including 52.95.16.2). I am sure the call never reaches the proxy (also confirmed from the proxy log). This bug is requesting that kube-controller-manager use the proxy when a proxy is enabled in install-config.yaml. In my testing, I found that the kubelet service initializes its cloud provider via the proxy, so why doesn't the controller-manager?
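For the record, one way to compare the two is to look at the environment of each process on a node (illustrative commands, run via SSH or oc debug; the process and container names are the ones seen in the logs above):

$ sudo cat /proc/$(pgrep -f 'hyperkube kubelet' | head -1)/environ | tr '\0' '\n' | grep -i proxy
$ sudo crictl exec $(sudo crictl ps --name kube-controller-manager -q | head -1) env | grep -i proxy

The first shows whether the kubelet picked up HTTP_PROXY/HTTPS_PROXY/NO_PROXY; the second shows whether the kube-controller-manager container did.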
This was fixed in https://github.com/openshift/cluster-kube-controller-manager-operator/pull/285
Verified this bug with 4.3.0-0.nightly-2019-10-16-010826, and PASS.

$ oc get pod -n openshift-kube-controller-manager kube-controller-manager-ip-10-0-54-121.us-east-2.compute.internal -o yaml | grep -i proxy -A 1
    - name: HTTPS_PROXY
      value: http://ec2-18-191-189-164.us-east-2.compute.amazonaws.com:3128
    - name: HTTP_PROXY
      value: http://ec2-18-191-189-164.us-east-2.compute.amazonaws.com:3128
    - name: NO_PROXY
      value: .cluster.local,.svc,.us-east-2.compute.internal,10.0.0.0/16,10.128.0.0/14,127.0.0.1,169.254.169.254,172.30.0.0/16,api-int.jialiu-42dis8.qe.devcluster.openshift.com,etcd-0.jialiu-42dis8.qe.devcluster.openshift.com,etcd-1.jialiu-42dis8.qe.devcluster.openshift.com,etcd-2.jialiu-42dis8.qe.devcluster.openshift.com,localhost,test.no-proxy.com
    image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b75a6ed0539724dbdc98c60574254951dd7d435bb3c5816acdcba56df3f410b1
--
    - name: HTTPS_PROXY
      value: http://ec2-18-191-189-164.us-east-2.compute.amazonaws.com:3128
    - name: HTTP_PROXY
      value: http://ec2-18-191-189-164.us-east-2.compute.amazonaws.com:3128
    - name: NO_PROXY
      value: .cluster.local,.svc,.us-east-2.compute.internal,10.0.0.0/16,10.128.0.0/14,127.0.0.1,169.254.169.254,172.30.0.0/16,api-int.jialiu-42dis8.qe.devcluster.openshift.com,etcd-0.jialiu-42dis8.qe.devcluster.openshift.com,etcd-1.jialiu-42dis8.qe.devcluster.openshift.com,etcd-2.jialiu-42dis8.qe.devcluster.openshift.com,localhost,test.no-proxy.com
    image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:dca21970371f9aacb902a04f5e0eed4117cf714a4c7e45ca950175b840b291a9
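The cluster-wide proxy object can also be checked to see the httpProxy/httpsProxy/noProxy values that are expected to be injected (the cluster-scoped object is named "cluster"):

$ oc get proxy/cluster -o jsonpath='{.status}'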
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0062