Bug 1745807 - [Proxy] console operator reporting Available=False
Summary: [Proxy] console operator reporting Available=False
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Management Console
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.2.0
Assignee: Samuel Padgett
QA Contact: Yadan Pei
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-08-27 01:23 UTC by Daneyon Hansen
Modified: 2019-10-16 06:38 UTC
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-10-16 06:37:54 UTC
Target Upstream Version:
Embargoed:




Links
GitHub openshift/console pull 2506 (closed): Bug 1745807: use kubernetes.default.svc to talk to API server (last updated 2021-02-03 17:19:10 UTC)
Red Hat Product Errata RHBA-2019:2922 (last updated 2019-10-16 06:38:02 UTC)

Description Daneyon Hansen 2019-08-27 01:23:06 UTC
Description of problem:
console-operator reports Available=False for an IPI install with proxy enabled. Note: `httpProxy` and `additionalTrustBundle` are specified in the install config.

Version-Release number of selected component (if applicable):
4.2.0-0.okd-2019-08-26-235222

How reproducible:
Every time

Steps to Reproduce:
1. Install a cluster with proxy enabled, e.g. http://pastebin.test.redhat.com/789782

$ oc get proxy/cluster -o yaml
apiVersion: config.openshift.io/v1
kind: Proxy
metadata:
  creationTimestamp: "2019-08-27T00:37:55Z"
  generation: 1
  name: cluster
  resourceVersion: "1782"
  selfLink: /apis/config.openshift.io/v1/proxies/cluster
  uid: e96104b9-c862-11e9-a745-022e53c59f42
spec:
  httpProxy: http://jcallen:6cpbEH6uCepwEhNr2iB05ixP@52.73.102.120:3129
  trustedCA:
    name: user-ca-bundle
status:
  httpProxy: http://jcallen:6cpbEH6uCepwEhNr2iB05ixP@52.73.102.120:3129
  noProxy: .cluster.local,.svc,10.0.0.0/16,10.128.0.0/14,127.0.0.1,169.254.169.254,172.30.0.0/16,api-int.jcallen-proxy.devcluster.openshift.com,api.jcallen-proxy.devcluster.openshift.com,etcd-0.jcallen-proxy.devcluster.openshift.com,etcd-1.jcallen-proxy.devcluster.openshift.com,etcd-2.jcallen-proxy.devcluster.openshift.com,localhost


2. Check the clusteroperator status:

$ oc get clusteroperator/console -o yaml
apiVersion: config.openshift.io/v1
kind: ClusterOperator
metadata:
  creationTimestamp: "2019-08-27T00:45:10Z"
  generation: 1
  name: console
  resourceVersion: "12223"
  selfLink: /apis/config.openshift.io/v1/clusteroperators/console
  uid: ec358346-c863-11e9-a014-020dde85a40c
spec: {}
status:
  conditions:
  - lastTransitionTime: "2019-08-27T00:47:50Z"
    reason: AsExpected
    status: "False"
    type: Degraded
  - lastTransitionTime: "2019-08-27T00:45:11Z"
    message: 'ResourceSyncUpdatesInProgressProgressing: Working toward version 4.2.0-0.okd-2019-08-26-235222'
    reason: ResourceSyncUpdatesInProgressProgressingResourceSyncUpdatesInProgress
    status: "True"
    type: Progressing
  - lastTransitionTime: "2019-08-27T00:47:50Z"
    message: |-
      DeploymentIsReadyAvailable: 0 pods available for console deployment
      DeploymentIsUpdatedAvailable: 0 replicas ready at version 4.2.0-0.okd-2019-08-26-235222
    reason: MultipleConditionsMatching
    status: "False"
    type: Available
  - lastTransitionTime: "2019-08-27T00:45:10Z"
    reason: AsExpected
    status: "True"
    type: Upgradeable
  extension: null
  relatedObjects:
  - group: operator.openshift.io
    name: cluster
    resource: consoles
  - group: config.openshift.io
    name: cluster
    resource: consoles
  - group: config.openshift.io
    name: cluster
    resource: infrastructures
  - group: oauth.openshift.io
    name: console
    resource: oauthclients
  - group: ""
    name: openshift-console-operator
    resource: namespaces
  - group: ""
    name: openshift-console
    resource: namespaces
  - group: ""
    name: console-public
    namespace: openshift-config-managed
    resource: configmaps
  versions:
  - name: operator
    version: 4.2.0-0.okd-2019-08-26-235222


Actual results:
Available=False

Expected results:
Available=True

Additional info:

$ oc get all -n openshift-console-operator
NAME                                    READY   STATUS    RESTARTS   AGE
pod/console-operator-779c7bb45d-5bx6g   1/1     Running   0          32m

NAME                               READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/console-operator   1/1     1            1           33m

NAME                                          DESIRED   CURRENT   READY   AGE
replicaset.apps/console-operator-779c7bb45d   1         1         1       33m

$ oc get all -n openshift-console
NAME                             READY   STATUS    RESTARTS   AGE
pod/console-5cdb5b9b46-9xq2d     0/1     Running   8          31m
pod/console-5cdb5b9b46-sds4t     0/1     Running   8          31m
pod/downloads-66ccc9574d-k6x95   1/1     Running   0          34m
pod/downloads-66ccc9574d-rvqp8   1/1     Running   0          34m

NAME                TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
service/console     ClusterIP   172.30.123.98    <none>        443/TCP   31m
service/downloads   ClusterIP   172.30.188.101   <none>        80/TCP    35m

NAME                        READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/console     0/2     2            0           31m
deployment.apps/downloads   2/2     2            2           35m

NAME                                   DESIRED   CURRENT   READY   AGE
replicaset.apps/console-5cdb5b9b46     2         2         0       31m
replicaset.apps/downloads-66ccc9574d   2         2         2       35m

NAME                                 HOST/PORT                                                                 PATH   SERVICES    PORT    TERMINATION          WILDCARD
route.route.openshift.io/console     console-openshift-console.apps.jcallen-proxy.devcluster.openshift.com            console     https   reencrypt/Redirect   None
route.route.openshift.io/downloads   downloads-openshift-console.apps.jcallen-proxy.devcluster.openshift.com          downloads   http    edge                 None

$ oc get deployment.apps/console -n openshift-console -o yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    console.openshift.io/console-config-version: "12048"
    console.openshift.io/image: registry.svc.ci.openshift.org/origin/4.2-2019-08-26-235222@sha256:08181ba95b5fb4933e98915713912324e58d83303cd08be6664e6eaeb2f047f9
    console.openshift.io/oauth-secret-version: "12053"
    console.openshift.io/service-ca-config-version: "12051"
    deployment.kubernetes.io/revision: "1"
    operator.openshift.io/pull-spec: registry.svc.ci.openshift.org/origin/4.2-2019-08-26-235222@sha256:08181ba95b5fb4933e98915713912324e58d83303cd08be6664e6eaeb2f047f9
  creationTimestamp: "2019-08-27T00:47:50Z"
  generation: 2
  labels:
    app: console
    component: ui
  name: console
  namespace: openshift-console
  resourceVersion: "16691"
  selfLink: /apis/apps/v1/namespaces/openshift-console/deployments/console
  uid: 4bc6403b-c864-11e9-a014-020dde85a40c
spec:
  progressDeadlineSeconds: 600
  replicas: 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: console
      component: ui
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      annotations:
        console.openshift.io/console-config-version: "12048"
        console.openshift.io/image: registry.svc.ci.openshift.org/origin/4.2-2019-08-26-235222@sha256:08181ba95b5fb4933e98915713912324e58d83303cd08be6664e6eaeb2f047f9
        console.openshift.io/oauth-secret-version: "12053"
        console.openshift.io/service-ca-config-version: "12051"
      creationTimestamp: null
      labels:
        app: console
        component: ui
      name: console
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: console
              topologyKey: kubernetes.io/hostname
            weight: 100
      containers:
      - command:
        - /opt/bridge/bin/bridge
        - --public-dir=/opt/bridge/static
        - --config=/var/console-config/console-config.yaml
        - --service-ca-file=/var/service-ca/service-ca.crt
        env:
        - name: HTTP_PROXY
          value: http://jcallen:6cpbEH6uCepwEhNr2iB05ixP@52.73.102.120:3129
        - name: NO_PROXY
          value: .cluster.local,.svc,10.0.0.0/16,10.128.0.0/14,127.0.0.1,169.254.169.254,172.30.0.0/16,api-int.jcallen-proxy.devcluster.openshift.com,api.jcallen-proxy.devcluster.openshift.com,etcd-0.jcallen-proxy.devcluster.openshift.com,etcd-1.jcallen-proxy.devcluster.openshift.com,etcd-2.jcallen-proxy.devcluster.openshift.com,localhost
        image: registry.svc.ci.openshift.org/origin/4.2-2019-08-26-235222@sha256:08181ba95b5fb4933e98915713912324e58d83303cd08be6664e6eaeb2f047f9
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /health
            port: 8443
            scheme: HTTPS
          initialDelaySeconds: 150
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        name: console
        ports:
        - containerPort: 443
          name: https
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /health
            port: 8443
            scheme: HTTPS
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          requests:
            cpu: 10m
            memory: 100Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: FallbackToLogsOnError
        volumeMounts:
        - mountPath: /var/serving-cert
          name: console-serving-cert
          readOnly: true
        - mountPath: /var/oauth-config
          name: console-oauth-config
          readOnly: true
        - mountPath: /var/console-config
          name: console-config
          readOnly: true
        - mountPath: /var/service-ca
          name: service-ca
          readOnly: true
      dnsPolicy: ClusterFirst
      nodeSelector:
        node-role.kubernetes.io/master: ""
      priorityClassName: system-cluster-critical
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
        operator: Exists
      - effect: NoExecute
        key: node.kubernetes.io/unreachable
        operator: Exists
        tolerationSeconds: 120
      - effect: NoExecute
        key: node.kubernetes.io/not-reachable
        operator: Exists
        tolerationSeconds: 120
      volumes:
      - name: console-serving-cert
        secret:
          defaultMode: 420
          secretName: console-serving-cert
      - name: console-oauth-config
        secret:
          defaultMode: 420
          secretName: console-oauth-config
      - configMap:
          defaultMode: 420
          name: console-config
        name: console-config
      - configMap:
          defaultMode: 420
          name: service-ca
        name: service-ca
status:
  conditions:
  - lastTransitionTime: "2019-08-27T00:47:53Z"
    lastUpdateTime: "2019-08-27T00:47:53Z"
    message: Deployment does not have minimum availability.
    reason: MinimumReplicasUnavailable
    status: "False"
    type: Available
  - lastTransitionTime: "2019-08-27T00:57:54Z"
    lastUpdateTime: "2019-08-27T00:57:54Z"
    message: ReplicaSet "console-5cdb5b9b46" has timed out progressing.
    reason: ProgressDeadlineExceeded
    status: "False"
    type: Progressing
  observedGeneration: 2
  replicas: 2
  unavailableReplicas: 2
  updatedReplicas: 2

Comment 1 Daneyon Hansen 2019-08-27 01:24:31 UTC
This bug is similar to https://bugzilla.redhat.com/show_bug.cgi?id=1744532

Comment 2 Samuel Padgett 2019-08-27 02:49:37 UTC
Note that not all of the console proxy changes were in place until 4.2.0-0.okd-2019-08-27-015622

Comment 3 Samuel Padgett 2019-08-27 02:52:56 UTC
Assigning to Jakub to investigate. It would be good to test a build that has https://github.com/openshift/console-operator/pull/265

Comment 4 Daneyon Hansen 2019-08-27 17:13:05 UTC
I confirmed my installer version includes the latest console/console-operator PRs:

$ oc adm release info registry.svc.ci.openshift.org/origin/release:4.2 --commits | grep console
  console                                       https://github.com/openshift/console                                       5f1228ea437b36554be2373a416560101ffc458d
  console-operator                              https://github.com/openshift/console-operator                              fc1858166b63c784b866f3dfae0647d7efd024f7

I see the same issue with the console-operator:
$ oc get clusteroperators
NAME                                       VERSION                         AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.2.0-0.okd-2019-08-27-162528   True        False         False      5m56s
cloud-credential                           4.2.0-0.okd-2019-08-27-162528   True        False         False      20m
cluster-autoscaler                         4.2.0-0.okd-2019-08-27-162528   True        False         False      16m
console                                    4.2.0-0.okd-2019-08-27-162528   False       True          False      12m

$ oc get clusteroperator/console -o yaml
apiVersion: config.openshift.io/v1
kind: ClusterOperator
metadata:
  creationTimestamp: "2019-08-27T16:53:32Z"
  generation: 1
  name: console
  resourceVersion: "13189"
  selfLink: /apis/config.openshift.io/v1/clusteroperators/console
  uid: 33e3d777-c8eb-11e9-8bf2-0a5cd0a7614c
spec: {}
status:
  conditions:
  - lastTransitionTime: "2019-08-27T16:57:31Z"
    reason: AsExpected
    status: "False"
    type: Degraded
  - lastTransitionTime: "2019-08-27T16:53:33Z"
    message: 'ResourceSyncUpdatesInProgressProgressing: Working toward version 4.2.0-0.okd-2019-08-27-162528'
    reason: ResourceSyncUpdatesInProgressProgressingResourceSyncUpdatesInProgress
    status: "True"
    type: Progressing
  - lastTransitionTime: "2019-08-27T16:57:31Z"
    message: |-
      DeploymentIsReadyAvailable: 0 pods available for console deployment
      DeploymentIsUpdatedAvailable: 0 replicas ready at version 4.2.0-0.okd-2019-08-27-162528
    reason: MultipleConditionsMatching
    status: "False"
    type: Available
  - lastTransitionTime: "2019-08-27T16:53:32Z"
    reason: AsExpected
    status: "True"
    type: Upgradeable
  extension: null
  relatedObjects:
  - group: operator.openshift.io
    name: cluster
    resource: consoles
  - group: config.openshift.io
    name: cluster
    resource: consoles
  - group: config.openshift.io
    name: cluster
    resource: infrastructures
  - group: oauth.openshift.io
    name: console
    resource: oauthclients
  - group: ""
    name: openshift-console-operator
    resource: namespaces
  - group: ""
    name: openshift-console
    resource: namespaces
  - group: ""
    name: console-public
    namespace: openshift-config-managed
    resource: configmaps
  versions:
  - name: operator
    version: 4.2.0-0.okd-2019-08-27-162528

Comment 5 Samuel Padgett 2019-08-27 17:42:47 UTC
Can you include the pod logs for one of the console pods in the openshift-console namespace?

Comment 6 Daneyon Hansen 2019-08-27 19:18:47 UTC
Adding '.${AWS_REGION}.compute.internal' to noProxy allowed me to get past this error when trying to get the console pod's logs:

$ oc logs pod/console-54bc64bd44-8lsmc -n openshift-console
Error from server: Get https://ip-10-0-167-253.us-west-2.compute.internal:10250/containerLogs/openshift-console/console-54bc64bd44-8lsmc/console: x509: certificate signed by unknown authority

I can now view the console's logs and see the following:

$ oc logs pod/console-698cf4b87c-b5gdk -n openshift-console
2019/08/27 19:08:13 cmd/main: cookies are secure!
2019/08/27 19:08:18 auth: error contacting auth provider (retrying in 10s): Get https://172.30.0.1:443/.well-known/oauth-authorization-server: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2019/08/27 19:08:33 auth: error contacting auth provider (retrying in 10s): Get https://172.30.0.1:443/.well-known/oauth-authorization-server: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2019/08/27 19:08:48 auth: error contacting auth provider (retrying in 10s): Get https://172.30.0.1:443/.well-known/oauth-authorization-server: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)


and clusteroperator/console still shows Available=False.
$ oc get clusteroperator/console
NAME      VERSION                         AVAILABLE   PROGRESSING   DEGRADED   SINCE
console   4.2.0-0.okd-2019-08-27-174841   False       True          False      14m

The console pod is properly configured for the proxy and the CA trust bundle:
$ oc get pod/console-698cf4b87c-b5gdk -n openshift-console -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    console.openshift.io/console-config-version: "13114"
    console.openshift.io/image: registry.svc.ci.openshift.org/origin/4.2-2019-08-27-174841@sha256:2ea78cea52c263c6075cb839419d52549694ea7b30d6fa843c88a0b50658988d
    console.openshift.io/oauth-secret-version: "13151"
    console.openshift.io/service-ca-config-version: "13117"
    console.openshift.io/trusted-ca-config-version: "13119"
    openshift.io/scc: restricted
    operator.openshift.io/pull-spec: registry.svc.ci.openshift.org/origin/4.2-2019-08-27-174841@sha256:2ea78cea52c263c6075cb839419d52549694ea7b30d6fa843c88a0b50658988d
  creationTimestamp: "2019-08-27T19:02:11Z"
  generateName: console-698cf4b87c-
  labels:
    app: console
    component: ui
    pod-template-hash: 698cf4b87c
  name: console-698cf4b87c-b5gdk
  namespace: openshift-console
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: console-698cf4b87c
    uid: 2ca08267-c8fd-11e9-86e7-02dce463c496
  resourceVersion: "16804"
  selfLink: /api/v1/namespaces/openshift-console/pods/console-698cf4b87c-b5gdk
  uid: 2ca30d98-c8fd-11e9-86e7-02dce463c496
spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - podAffinityTerm:
          labelSelector:
            matchLabels:
              app: console
          topologyKey: kubernetes.io/hostname
        weight: 100
  containers:
  - command:
    - /opt/bridge/bin/bridge
    - --public-dir=/opt/bridge/static
    - --config=/var/console-config/console-config.yaml
    - --service-ca-file=/var/service-ca/service-ca.crt
    env:
    - name: HTTPS_PROXY
      value: http://jcallen:6cpbEH6uCepwEhNr2iB05ixP@52.73.102.120:3128
    - name: HTTP_PROXY
      value: http://jcallen:6cpbEH6uCepwEhNr2iB05ixP@52.73.102.120:3128
    - name: NO_PROXY
      value: .cluster.local,.svc,.us-west-2.compute.internal,10.0.0.0/16,10.128.0.0/14,127.0.0.1,169.254.169.254,172.30.0.0/16,api-int.no-mitm-proxy.devcluster.openshift.com,api.no-mitm-proxy.devcluster.openshift.com,etcd-0.no-mitm-proxy.devcluster.openshift.com,etcd-1.no-mitm-proxy.devcluster.openshift.com,etcd-2.no-mitm-proxy.devcluster.openshift.com,localhost
    image: registry.svc.ci.openshift.org/origin/4.2-2019-08-27-174841@sha256:2ea78cea52c263c6075cb839419d52549694ea7b30d6fa843c88a0b50658988d
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /health
        port: 8443
        scheme: HTTPS
      initialDelaySeconds: 150
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    name: console
    ports:
    - containerPort: 443
      name: https
      protocol: TCP
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /health
        port: 8443
        scheme: HTTPS
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    resources:
      requests:
        cpu: 10m
        memory: 100Mi
    securityContext:
      capabilities:
        drop:
        - KILL
        - MKNOD
        - SETGID
        - SETUID
      runAsUser: 1000460000
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: FallbackToLogsOnError
    volumeMounts:
    - mountPath: /var/serving-cert
      name: console-serving-cert
      readOnly: true
    - mountPath: /var/oauth-config
      name: console-oauth-config
      readOnly: true
    - mountPath: /var/console-config
      name: console-config
      readOnly: true
    - mountPath: /var/service-ca
      name: service-ca
      readOnly: true
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-n879p
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  imagePullSecrets:
  - name: default-dockercfg-5mcrx
  nodeName: ip-10-0-129-230.us-west-2.compute.internal
  nodeSelector:
    node-role.kubernetes.io/master: ""
  priority: 2000000000
  priorityClassName: system-cluster-critical
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext:
    fsGroup: 1000460000
    seLinuxOptions:
      level: s0:c21,c20
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-reachable
    operator: Exists
    tolerationSeconds: 120
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 120
  volumes:
  - name: console-serving-cert
    secret:
      defaultMode: 420
      secretName: console-serving-cert
  - name: console-oauth-config
    secret:
      defaultMode: 420
      secretName: console-oauth-config
  - configMap:
      defaultMode: 420
      name: console-config
    name: console-config
  - configMap:
      defaultMode: 420
      name: service-ca
    name: service-ca
  - name: default-token-n879p
    secret:
      defaultMode: 420
      secretName: default-token-n879p
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2019-08-27T19:02:11Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2019-08-27T19:02:11Z"
    message: 'containers with unready status: [console]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2019-08-27T19:02:11Z"
    message: 'containers with unready status: [console]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2019-08-27T19:02:11Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: cri-o://b66c0e5ea57e505685dd2947e83cc9c666fb4c24378486f9ef53deb33c462682
    image: registry.svc.ci.openshift.org/origin/4.2-2019-08-27-174841@sha256:2ea78cea52c263c6075cb839419d52549694ea7b30d6fa843c88a0b50658988d
    imageID: registry.svc.ci.openshift.org/origin/4.2-2019-08-27-174841@sha256:2ea78cea52c263c6075cb839419d52549694ea7b30d6fa843c88a0b50658988d
    lastState:
      terminated:
        containerID: cri-o://5ede0688927ea5f7a8dc0904d235056bb025a421fdc2c74de8d03abc76c2b0b2
        exitCode: 2
        finishedAt: "2019-08-27T19:11:12Z"
        message: |
          for connection (Client.Timeout exceeded while awaiting headers)
          2019/08/27 19:09:18 auth: error contacting auth provider (retrying in 10s): Get https://172.30.0.1:443/.well-known/oauth-authorization-server: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
          2019/08/27 19:09:33 auth: error contacting auth provider (retrying in 10s): Get https://172.30.0.1:443/.well-known/oauth-authorization-server: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
          2019/08/27 19:09:48 auth: error contacting auth provider (retrying in 10s): Get https://172.30.0.1:443/.well-known/oauth-authorization-server: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
          2019/08/27 19:10:03 auth: error contacting auth provider (retrying in 10s): Get https://172.30.0.1:443/.well-known/oauth-authorization-server: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
          2019/08/27 19:10:18 auth: error contacting auth provider (retrying in 10s): Get https://172.30.0.1:443/.well-known/oauth-authorization-server: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
          2019/08/27 19:10:33 auth: error contacting auth provider (retrying in 10s): Get https://172.30.0.1:443/.well-known/oauth-authorization-server: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
          2019/08/27 19:10:48 auth: error contacting auth provider (retrying in 10s): Get https://172.30.0.1:443/.well-known/oauth-authorization-server: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
          2019/08/27 19:11:03 auth: error contacting auth provider (retrying in 10s): Get https://172.30.0.1:443/.well-known/oauth-authorization-server: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
        reason: Error
        startedAt: "2019-08-27T19:08:13Z"
    name: console
    ready: false
    restartCount: 3
    state:
      running:
        startedAt: "2019-08-27T19:11:13Z"
  hostIP: 10.0.129.230
  phase: Running
  podIP: 10.129.0.33
  qosClass: Burstable
  startTime: "2019-08-27T19:02:11Z"

Comment 7 Daneyon Hansen 2019-08-27 19:32:21 UTC
I can see the correct proxy env vars in the console container:

$ oc exec -it console-698cf4b87c-b5gdk -c console -n openshift-console env | grep PROX
HTTPS_PROXY=http://jcallen:6cpbEH6uCepwEhNr2iB05ixP@52.73.102.120:3128
HTTP_PROXY=http://jcallen:6cpbEH6uCepwEhNr2iB05ixP@52.73.102.120:3128
NO_PROXY=.cluster.local,.svc,.us-west-2.compute.internal,10.0.0.0/16,10.128.0.0/14,127.0.0.1,169.254.169.254,172.30.0.0/16,api-int.no-mitm-proxy.devcluster.openshift.com,api.no-mitm-proxy.devcluster.openshift.com,etcd-0.no-mitm-proxy.devcluster.openshift.com,etcd-1.no-mitm-proxy.devcluster.openshift.com,etcd-2.no-mitm-proxy.devcluster.openshift.com,localhost

Comment 8 Daneyon Hansen 2019-08-27 21:36:22 UTC
It appears that the issue is that the NO_PROXY CIDR for the service network (172.30.0.0/16) is not being respected by the console container. After manually adding 172.30.0.1 to NO_PROXY, the console container started:

$ oc get deployment.apps/console -n openshift-console -o yaml | grep image
    console.openshift.io/image: registry.svc.ci.openshift.org/origin/4.2-2019-08-27-174841@sha256:2ea78cea52c263c6075cb839419d52549694ea7b30d6fa843c88a0b50658988d
        console.openshift.io/image: registry.svc.ci.openshift.org/origin/4.2-2019-08-27-174841@sha256:2ea78cea52c263c6075cb839419d52549694ea7b30d6fa843c88a0b50658988d
        image: quay.io/spadgett/origin-console:k8s-svc
        imagePullPolicy: IfNotPresent

$ oc get po -n openshift-console
NAME                         READY   STATUS    RESTARTS   AGE
console-5ff9d75bd8-5phzl     1/1     Running   0          3m28s
console-5ff9d75bd8-pdx4v     1/1     Running   0          3m58s
downloads-5fcfb5447c-7pzt9   1/1     Running   0          77m
downloads-5fcfb5447c-xhg6j   1/1     Running   0          77m

$ oc logs pod/console-5ff9d75bd8-5phzl -n openshift-console
2019/08/27 20:11:54 cmd/main: cookies are secure!
2019/08/27 20:11:55 cmd/main: Binding to 0.0.0.0:8443...
2019/08/27 20:11:55 cmd/main: using TLS

Note: I am using the console image 'quay.io/spadgett/origin-console:k8s-svc', which implements the fix.

It appears that the console binary is built with a version of Go prior to this fix: https://github.com/golang/go/issues/16704
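
For reference, Go 1.11 and later resolve NO_PROXY through the golang.org/x/net/http/httpproxy package, which does understand IP and CIDR entries. The sketch below is illustrative only (the proxy URL is a placeholder and the NO_PROXY list is trimmed from the console pod env in comment 7), not the console's actual code:

package main

import (
    "fmt"
    "net/url"

    "golang.org/x/net/http/httpproxy"
)

func main() {
    // Proxy settings modeled on the console pod env; the proxy URL is a placeholder.
    cfg := &httpproxy.Config{
        HTTPProxy:  "http://proxy.example.com:3128",
        HTTPSProxy: "http://proxy.example.com:3128",
        NoProxy:    ".cluster.local,.svc,172.30.0.0/16,127.0.0.1,localhost",
    }
    proxy := cfg.ProxyFunc()

    targets := []string{
        // Service VIP: only bypassed if the 172.30.0.0/16 CIDR entry is honored.
        "https://172.30.0.1:443/.well-known/oauth-authorization-server",
        // Hostname: bypassed by plain suffix matching on the .svc entry.
        "https://kubernetes.default.svc/.well-known/oauth-authorization-server",
    }
    for _, target := range targets {
        u, err := url.Parse(target)
        if err != nil {
            panic(err)
        }
        proxyURL, err := proxy(u)
        if err != nil {
            panic(err)
        }
        if proxyURL == nil {
            fmt.Println(target, "-> direct (NO_PROXY match)")
        } else {
            fmt.Println(target, "-> via proxy", proxyURL)
        }
    }
}

The pre-Go-1.11 NO_PROXY handling tracked in golang/go#16704 only matches literal hosts and domain suffixes, so the 172.30.0.0/16 entry is ignored and requests to the 172.30.0.1 service VIP go out through the proxy. Switching the console to kubernetes.default.svc (openshift/console pull 2506, linked above) sidesteps that by relying on the .svc suffix instead of CIDR support.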

Comment 10 shahan 2019-09-03 07:25:24 UTC
$ oc get clusteroperator
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.2.0-0.nightly-2019-09-02-172410   True        False         False      88m
cloud-credential                           4.2.0-0.nightly-2019-09-02-172410   True        False         False      101m
cluster-autoscaler                         4.2.0-0.nightly-2019-09-02-172410   True        False         False      97m
console                                    4.2.0-0.nightly-2019-09-02-172410   True        False         False      91m
dns                                        4.2.0-0.nightly-2019-09-02-172410   True        False         False      101m
...

$ oc exec  -n openshift-console console-7d7ff64b59-hgs6t  -i -t -- env |grep proxy
HTTPS_PROXY=https://proxy-user***@10.0.***8:3130
HTTP_PROXY=http://proxy-user1:J**K@10.0**8:3128
NO_PROXY=.cluster.local,.svc,10.128.0.0/14,127.0.0.1,172.30.0.0/16,api-int.wzheng93.qe.***,api.wzheng93.qe.**,etcd-0.wzheng93.qe.**,etcd-1.wzheng93.***,etcd-2.wzheng93.**,localhost,test.no-proxy.com


[hasha@fedora_pc ~]$ oc get proxy -o yaml
apiVersion: v1
items:
- apiVersion: config.openshift.io/v1
  kind: Proxy
  metadata:
    creationTimestamp: "2019-09-03T05:35:39Z"
    generation: 1
    name: cluster
    resourceVersion: "1538"
    selfLink: /apis/config.openshift.io/v1/proxies/cluster
    uid: a9a2cd59-ce0c-11e9-a7fe-fa163ea1cd26
  spec:
    httpProxy: http://proxy-user1:***:3128
    httpsProxy: https://proxy-user1:***:3130
    noProxy: test.no-proxy.com
    trustedCA:
      name: user-ca-bundle
  status:
    httpProxy: http://proxy-user1:***:3128
    httpsProxy: https://proxy-user1:***:3130
    noProxy: .cluster.local,.svc,10.128.0.0/14,127.0.0.1,172.30.0.0/16,api-int.wzheng93.***,api.wzheng93.qe.**,etcd-0.wzheng93.**,etcd-1.wzheng93.**,etcd-2.wzheng93.**,localhost,test.no-proxy.com
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
 
Console works well with the HTTPS proxy enabled.
4.2.0-0.nightly-2019-09-02-172410

Comment 11 W. Trevor King 2019-10-09 18:52:38 UTC
(In reply to shahan from comment #10)
> Console works well with https proxy enabled.
> 4.2.0-0.nightly-2019-09-02-172410

I dunno what the flake rate is, but I think I see this same issue in this 4.2.0-rc.0->4.2.0-rc.3 upgrade test [1,2]:

    lastState:
      terminated:
        containerID: cri-o://2a94ca623772ba2a7362e1b6e4044b921177be9e1bb46223baf480be997b5669
        exitCode: 2
        finishedAt: 2019-10-09T15:41:12Z
        message: |
          eaders)
          2019/10/9 15:39:17 auth: error contacting auth provider (retrying in 10s): Get https://kubernetes.default.svc/.well-known/oauth-authorization-server: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
          2019/10/9 15:39:32 auth: error contacting auth provider (retrying in 10s): Get https://kubernetes.default.svc/.well-known/oauth-authorization-server: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
          2019/10/9 15:39:47 auth: error contacting auth provider (retrying in 10s): Get https://kubernetes.default.svc/.well-known/oauth-authorization-server: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
          2019/10/9 15:40:02 auth: error contacting auth provider (retrying in 10s): Get https://kubernetes.default.svc/.well-known/oauth-authorization-server: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
          2019/10/9 15:40:17 auth: error contacting auth provider (retrying in 10s): Get https://kubernetes.default.svc/.well-known/oauth-authorization-server: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
          2019/10/9 15:40:32 auth: error contacting auth provider (retrying in 10s): Get https://kubernetes.default.svc/.well-known/oauth-authorization-server: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
          2019/10/9 15:40:47 auth: error contacting auth provider (retrying in 10s): Get https://kubernetes.default.svc/.well-known/oauth-authorization-server: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
          2019/10/9 15:41:02 auth: error contacting auth provider (retrying in 10s): Get https://kubernetes.default.svc/.well-known/oauth-authorization-server: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
        reason: Error
        startedAt: 2019-10-09T15:38:12Z

This is despite 4.2.0-rc.0 being cut from 4.2.0-0.nightly-2019-10-01-124419 [3].

From comment 8:

> It appears the the console bin is built using a version of go prior to this fix: https://github.com/golang/go/issues/16704

Do we know which versions of Go have that fix? 

[1]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.2/11
[2]: https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.2/11/artifacts/e2e-aws-upgrade/must-gather/quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-2130377ba7ab9dbed8350c52b098dae1575a7dbafe279f8c013e6455d2da6a93/namespaces/openshift-console/pods/console-7d756bff-5s8q5/console-7d756bff-5s8q5.yaml
[3]: https://openshift-release.svc.ci.openshift.org/releasestream/4-stable/release/4.2.0-rc.0

Comment 12 W. Trevor King 2019-10-09 18:54:55 UTC
Looks like at least Go 1.11 [1].  Not sure if there were backports to earlier releases.

[1]: https://golang.org/doc/go1.11#net/http

Comment 13 W. Trevor King 2019-10-09 19:15:12 UTC
Using [1]:

$ oc adm release info --image-for=console quay.io/openshift-release-dev/ocp-release:4.2.0-rc.0
quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:3a17711f3bba31e3c9418751804f2ec91e071515e69363688714e3b4ee152804
$ oc image extract --file /opt/bridge/bin/bridge quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:3a17711f3bba31e3c9418751804f2ec91e071515e69363688714e3b4ee152804
$ gdb bridge 
(gdb) p 'runtime.buildVersion'
$1 = 10148116

But I'm not sure how to translate that into a major/minor/patch yet.

[1]: https://dave.cheney.net/2017/06/20/how-to-find-out-which-go-version-built-your-binary
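
With a newer Go toolchain, the embedded version can also be read programmatically instead of via gdb. A minimal sketch using the standard debug/buildinfo package (Go 1.18+, so not available when this comment was written; it also only finds the version in binaries whose toolchain embeds the build-info blob, roughly Go 1.13 and newer, so older builds may still need the gdb or build-log route):

package main

import (
    "debug/buildinfo"
    "fmt"
    "log"
    "os"
)

func main() {
    if len(os.Args) != 2 {
        log.Fatalf("usage: %s <go-binary>", os.Args[0])
    }
    // Read the build-info blob from the binary, e.g. the bridge binary
    // extracted with `oc image extract` above.
    info, err := buildinfo.ReadFile(os.Args[1])
    if err != nil {
        log.Fatal(err)
    }
    // GoVersion reports the toolchain that built the binary, e.g. "go1.13.4".
    fmt.Println(os.Args[1], "built with", info.GoVersion)
}

Saved as, say, readversion.go (name is arbitrary), this would be run as: go run readversion.go ./bridge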

Comment 14 W. Trevor King 2019-10-09 19:36:42 UTC
Luke dug the current 4.2.0-rc* Go version out of the build logs [1]: openshift-golang-builder-container-v1.11.13-3.1

So the issue I'm seeing with the 4.2.0-rc.0->4.2.0-rc.3 upgrade may be a different issue with the same symptoms.

[1]: https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=982761

Comment 15 W. Trevor King 2019-10-09 20:56:45 UTC
Somehow I missed the fact that *this* bug is proxy-specific while my 4.2.0-rc.0->4.2.0-rc.3 upgrade issue had no proxy involved.  Spun the upgrade off into bug 1760103.

Comment 16 errata-xmlrpc 2019-10-16 06:37:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922

