Hide Forgot
Description of problem: If ingress is not completely linked end-to-end before console-operator does its health check, the console ClusterOperator will report Degraded=True for a long time because it does not recheck the health unless something triggers a sync in the controller. https://github.com/openshift/console-operator/blob/fc764e21d15174c495a8b79aa46d4f22158f1049/pkg/console/controllers/route/controller.go#L150 Version-Release number of selected component (if applicable): 4.7.5 How reproducible: Always when console comes up before ingress Steps to Reproduce: 1. 2. 3. Actual results: console ClusterOperator is degraded even though the health route is working Expected results: console ClusterOperator should check health route periodically to minimize time between the health route starting to work and Degraded condition on the console ClusterOperator being set False. Additional info: We have a test that is waiting for all ClusterOperators to be Available=True and Degraded=False as a metrics of cluster installation completion. However, this does not work at the moment with console being Degraded for a long (bound unknown) period of time.
1. Fist set deployment/console-operator unmanaged then update deployment/console-operator to a higher log level, such as Debug $ cat <<EOF >version-patch-first-override.yaml - op: add path: /spec/overrides value: - kind: Deployment group: apps/v1 name: console-operator namespace: openshift-console-operator unmanaged: true EOF $ oc patch clusterversion version --type json -p "$(cat version-patch-first-override.yaml)" $ oc edit deployment console-operator containers: - args: - -v=2 => change it to 4 and Save - --config=/var/run/configmaps/config/controller-config.yaml command: - console - operator 2. Check new console-operator pod logs $ oc get pods -n openshift-console-operator NAME READY STATUS RESTARTS AGE console-operator-d6bb5df8d-2dp9h 1/1 Running 0 3m $ oc logs -f console-operator-d6bb5df8d-2dp9h ..... I0409 05:55:13.234330 1 operator.go:177] started syncing operator "cluster" (2021-04-09 05:55:13.234323559 +0000 UTC m=+86.734094448) ... I0409 05:56:13.233622 1 operator.go:177] started syncing operator "cluster" (2021-04-09 05:56:13.233613838 +0000 UTC m=+146.733384727) ..... I0409 05:57:13.236961 1 operator.go:177] started syncing operator "cluster" (2021-04-09 05:57:13.236956616 +0000 UTC m=+206.736727605) I0409 05:57:13.243762 1 apps.go:154] Deployment "openshift-console/downloads" changes: {"spec":{"progressDeadlineSeconds":null,"revisionHistoryLimit":null,"strategy":{"rollingUpdate":null,"type":null},"template":{"spec":{"containers":[{"args":["-c","cat \u003c\u003cEOF \u003e\u003e/tmp/serve.py\nimport errno, http.server, os, re, signal, socket, sys, tarfile, tempfile, threading, time, zipfile\n\nsignal.signal(signal.SIGTERM, lambda signum, frame: sys.exit(0))\n\ndef write_index(path, message):\n\twith open(path, 'wb') as f:\n\t\tf.write('\\n'.join([\n\t\t\t'\u003c!doctype html\u003e',\n\t\t\t'\u003chtml lang=\"en\"\u003e',\n\t\t\t'\u003chead\u003e',\n\t\t\t' \u003cmeta charset=\"utf-8\"\u003e',\n\t\t\t'\u003c/head\u003e',\n\t\t\t'\u003cbody\u003e',\n\t\t\t' {}'.format(message),\n\t\t\t'\u003c/body\u003e',\n\t\t\t'\u003c/html\u003e',\n\t\t\t'',\n\t\t]).encode('utf-8'))\n\n# Launch multiple listeners as threads\nclass Thread(threading.Thread):\n\tdef __init__(self, i, socket):\n\t\tthreading.Thread.__init__(self)\n\t\tself.i = i\n\t\tself.socket = socket\n\t\tself.daemon = True\n\t\tself.start()\n\n\tdef run(self):\n\t\thttpd = http.server.HTTPServer(addr, http.server.SimpleHTTPRequestHandler, False)\n\n\t\t# Prevent the HTTP server from re-binding every handler.\n\t\t# https://stackoverflow.com/questions/46210672/\n\t\thttpd.socket = self.socket\n\t\thttpd.server_bind = self.server_close = lambda self: None\n\n\t\thttpd.serve_forever()\n\ntemp_dir = tempfile.mkdtemp()\nprint('serving from {}'.format(temp_dir))\nos.chdir(temp_dir)\nfor arch in ['amd64']:\n\tos.mkdir(arch)\n\tfor operating_system in ['linux', 'mac', 'windows']:\n\t\tos.mkdir(os.path.join(arch, operating_system))\nfor arch in ['arm64', 'ppc64le', 's390x']:\n\tos.mkdir(arch)\n\tfor operating_system in ['linux']:\n\t\tos.mkdir(os.path.join(arch, operating_system))\ncontent = ['\u003ca href=\"oc-license\"\u003elicense\u003c/a\u003e']\nos.symlink('/usr/share/openshift/LICENSE', 'oc-license')\n\nfor arch, operating_system, path in [\n\t\t('amd64', 'linux', '/usr/share/openshift/linux_amd64/oc'),\n\t\t('amd64', 'mac', '/usr/share/openshift/mac/oc'),\n\t\t('amd64', 'windows', '/usr/share/openshift/windows/oc.exe'),\n\t\t('arm64', 'linux', '/usr/share/openshift/linux_arm64/oc'),\n\t\t('ppc64le', 'linux', '/usr/share/openshift/linux_ppc64le/oc'),\n\t\t('s390x', 'linux', '/usr/share/openshift/linux_s390x/oc'),\n\t\t]:\n\tbasename = os.path.basename(path)\n\ttarget_path = os.path.join(arch, operating_system, basename)\n\tos.symlink(path, target_path)\n\tbase_root, _ = os.path.splitext(basename)\n\tarchive_path_root = os.path.join(arch, operating_system, base_root)\n\twith tarfile.open('{}.tar'.format(archive_path_root), 'w') as tar:\n\t\ttar.add(path, basename)\n\twith zipfile.ZipFile('{}.zip'.format(archive_path_root), 'w') as zip:\n\t\tzip.write(path, basename)\n\tcontent.append('\u003ca href=\"{0}\"\u003eoc ({1} {2})\u003c/a\u003e (\u003ca href=\"{0}.tar\"\u003etar\u003c/a\u003e \u003ca href=\"{0}.zip\"\u003ezip\u003c/a\u003e)'.format(target_path, arch, operating_system))\n\nfor root, directories, filenames in os.walk(temp_dir):\n\troot_link = os.path.relpath(temp_dir, os.path.join(root, 'child')).replace(os.path.sep, '/')\n\tfor directory in directories:\n\t\twrite_index(\n\t\t\tpath=os.path.join(root, directory, 'index.html'),\n\t\t\tmessage='\u003cp\u003eDirectory listings are disabled. See \u003ca href=\"{}\"\u003ehere\u003c/a\u003e for available content.\u003c/p\u003e'.format(root_link),\n\t\t)\n\nwrite_index(\n\tpath=os.path.join(temp_dir, 'index.html'),\n\tmessage='\\n'.join(\n\t\t['\u003cul\u003e'] +\n\t\t[' \u003cli\u003e{}\u003c/li\u003e'.format(entry) for entry in content] +\n\t\t['\u003c/ul\u003e']\n\t),\n)\n\n# Create socket\n# IPv6 should handle IPv4 passively so long as it is not bound to a\n# specific address or set to IPv6_ONLY\n# https://stackoverflow.com/questions/25817848/python-3-does-http-server-support-ipv6\ntry:\n\taddr = ('::', 8080)\n\tsock = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)\nexcept socket.error as err:\n\t# errno.EAFNOSUPPORT is \"socket.error: [Errno 97] Address family not supported by protocol\"\n\t# When IPv6 is disabled, socket will bind using IPv4.\n\tif err.errno == errno.EAFNOSUPPORT:\n\t\taddr = ('', 8080)\n\t\tsock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)\n\telse:\n\t\traise \nsock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)\nsock.bind(addr)\nsock.listen(5)\n\n[Thread(i, socket=sock) for i in range(100)]\ntime.sleep(9e9)\nEOF\nexec python3 /tmp/serve.py"],"command":["/bin/sh"],"image":"quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:3c18c81505b650222a12eb50d5fdeff4546a01e05d7ae722009a2ef66fa54a07","imagePullPolicy":"IfNotPresent","livenessProbe":{"httpGet":{"path":"/","port":8080,"scheme":"HTTP"}},"name":"download-server","ports":[{"containerPort":8080,"name":"http","protocol":"TCP"}],"readinessProbe":{"failureThreshold":3,"httpGet":{"path":"/","port":8080,"scheme":"HTTP"}},"resources":{"requests":{"cpu":"10m","memory":"50Mi"}},"terminationMessagePolicy":"FallbackToLogsOnError"}],"dnsPolicy":null,"restartPolicy":null,"schedulerName":null,"securityContext":null}}}} I0409 05:57:13.249214 1 event.go:282] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-console-operator", Name:"console-operator", UID:"c959a033-0d2a-4c6b-9d1f-f313f719f71c", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'DeploymentUpdated' Updated Deployment.apps/downloads -n openshift-console because it changed I0409 05:57:13.249640 1 route.go:151] route "downloads" ingress 'default' found and admitted, host: downloads-openshift-console.apps.yapei48.qe.azure.devcluster.openshift.com I0409 05:57:13.251632 1 operator.go:226] console is in a managed state. I0409 05:57:13.251647 1 sync_v400.go:49] running sync loop 4.0.0 I0409 05:57:13.253425 1 controller.go:238] oc-cli-downloads consoleclidownloads custom resource exists and is in the correct state I0409 05:57:13.261187 1 controller.go:238] odo-cli-downloads consoleclidownloads custom resource exists and is in the correct state I0409 05:57:13.282418 1 route.go:151] route "console" ingress 'default' found and admitted, host: console-openshift-console.apps.yapei48.qe.azure.devcluster.openshift.com E0409 05:57:13.290784 1 sync_v400.go:514] failed to get "prometheus-plugin2" plugin: consoleplugin.console.openshift.io "prometheus-plugin2" not found I0409 05:57:13.322490 1 sync_v400.go:377] service-ca configmap exists and is in the correct state I0409 05:57:13.336928 1 sync_v400.go:408] trusted-ca-bundle configmap exists and is in the correct state I0409 05:57:13.336970 1 sync_v400.go:478] no custom logo configured I0409 05:57:13.435253 1 request.go:591] Throttling request took 98.236356ms, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-console/configmaps/default-ingress-cert I0409 05:57:13.635641 1 request.go:591] Throttling request took 196.582115ms, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-console/secrets/console-oauth-config I0409 05:57:13.639303 1 route.go:151] route "console" ingress 'default' found and admitted, host: console-openshift-console.apps.yapei48.qe.azure.devcluster.openshift.com I0409 05:57:13.687257 1 sync_v400.go:135] ----------------------- I0409 05:57:13.687275 1 sync_v400.go:136] sync loop 4.0.0 resources updated: false I0409 05:57:13.687285 1 sync_v400.go:137] ----------------------- I0409 05:57:13.687299 1 sync_v400.go:166] sync_v400: updating console status I0409 05:57:13.687306 1 route.go:151] route "console" ingress 'default' found and admitted, host: console-openshift-console.apps.yapei48.qe.azure.devcluster.openshift.com I0409 05:57:13.687314 1 metrics.go:47] metric console_url host transition: https://console-openshift-console.apps.yapei48.qe.azure.devcluster.openshift.com to https://console-openshift-console.apps.yapei48.qe.azure.devcluster.openshift.com I0409 05:57:13.835404 1 request.go:591] Throttling request took 147.995695ms, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-config-managed/configmaps/console-public I0409 05:57:13.838633 1 sync_v400.go:183] sync loop 4.0.0 complete I0409 05:57:13.838671 1 operator.go:218] finished syncing operator "cluster" (23.3µs) I0409 05:57:16.799595 1 httplog.go:89] "HTTP" verb="GET" URI="/readyz" latency="7.240007ms" userAgent="kube-probe/1.20" srcIP="10.129.0.1:54160" resp=200 I0409 05:57:17.957826 1 reflector.go:530] k8s.io/apiserver/pkg/server/dynamiccertificates/configmap_cafile_content.go:206: Watch close - *v1.ConfigMap total 0 items received I0409 05:57:18.275074 1 reflector.go:530] k8s.io/apiserver/pkg/authentication/request/headerrequest/requestheader_controller.go:172: Watch close - *v1.ConfigMap total 0 items received I0409 05:57:18.452014 1 reflector.go:530] k8s.io/apiserver/pkg/server/dynamiccertificates/configmap_cafile_content.go:206: Watch close - *v1.ConfigMap total 0 items received I0409 05:57:20.097500 1 httplog.go:89] "HTTP" verb="GET" URI="/healthz" latency="1.958728ms" userAgent="kube-probe/1.20" srcIP="10.129.0.1:54190" resp=200 I0409 05:57:26.244241 1 httplog.go:89] "HTTP" verb="GET" URI="/metrics" latency="15.018822ms" userAgent="Prometheus/2.24.1" srcIP="10.131.0.37:35174" resp=200 I0409 05:57:26.793129 1 httplog.go:89] "HTTP" verb="GET" URI="/readyz" latency="516.707µs" userAgent="kube-probe/1.20" srcIP="10.129.0.1:54266" resp=200 I0409 05:57:27.958254 1 reflector.go:530] k8s.io/apiserver/pkg/server/dynamiccertificates/configmap_cafile_content.go:206: Watch close - *v1.ConfigMap total 0 items received I0409 05:57:28.275695 1 reflector.go:530] k8s.io/apiserver/pkg/authentication/request/headerrequest/requestheader_controller.go:172: Watch close - *v1.ConfigMap total 0 items received I0409 05:57:28.452950 1 reflector.go:530] k8s.io/apiserver/pkg/server/dynamiccertificates/configmap_cafile_content.go:206: Watch close - *v1.ConfigMap total 0 items received I0409 05:57:30.099916 1 httplog.go:89] "HTTP" verb="GET" URI="/healthz" latency="4.304364ms" userAgent="kube-probe/1.20" srcIP="10.129.0.1:54298" resp=200 I0409 05:57:31.095523 1 httplog.go:89] "HTTP" verb="GET" URI="/metrics" latency="3.880158ms" userAgent="Prometheus/2.24.1" srcIP="10.128.2.25:54256" resp=200 I0409 05:57:36.800937 1 httplog.go:89] "HTTP" verb="GET" URI="/readyz" latency="8.219122ms" userAgent="kube-probe/1.20" srcIP="10.129.0.1:54372" resp=200 I0409 05:57:37.959423 1 reflector.go:530] k8s.io/apiserver/pkg/server/dynamiccertificates/configmap_cafile_content.go:206: Watch close - *v1.ConfigMap total 0 items received I0409 05:57:38.276210 1 reflector.go:530] k8s.io/apiserver/pkg/authentication/request/headerrequest/requestheader_controller.go:172: Watch close - *v1.ConfigMap total 0 items received I0409 05:57:38.453366 1 reflector.go:530] k8s.io/apiserver/pkg/server/dynamiccertificates/configmap_cafile_content.go:206: Watch close - *v1.ConfigMap total 0 items received I0409 05:57:40.098124 1 httplog.go:89] "HTTP" verb="GET" URI="/healthz" latency="2.273034ms" userAgent="kube-probe/1.20" srcIP="10.129.0.1:54400" resp=200 I0409 05:57:46.797793 1 httplog.go:89] "HTTP" verb="GET" URI="/readyz" latency="4.834372ms" userAgent="kube-probe/1.20" srcIP="10.129.0.1:54480" resp=200 I0409 05:57:47.961905 1 reflector.go:530] k8s.io/apiserver/pkg/server/dynamiccertificates/configmap_cafile_content.go:206: Watch close - *v1.ConfigMap total 0 items received I0409 05:57:48.276846 1 reflector.go:530] k8s.io/apiserver/pkg/authentication/request/headerrequest/requestheader_controller.go:172: Watch close - *v1.ConfigMap total 0 items received I0409 05:57:48.453834 1 reflector.go:530] k8s.io/apiserver/pkg/server/dynamiccertificates/configmap_cafile_content.go:206: Watch close - *v1.ConfigMap total 0 items received I0409 05:57:50.102234 1 httplog.go:89] "HTTP" verb="GET" URI="/healthz" latency="6.474696ms" userAgent="kube-probe/1.20" srcIP="10.129.0.1:54516" resp=200 I0409 05:57:56.243922 1 httplog.go:89] "HTTP" verb="GET" URI="/metrics" latency="14.954522ms" userAgent="Prometheus/2.24.1" srcIP="10.131.0.37:35174" resp=200 I0409 05:57:56.797847 1 httplog.go:89] "HTTP" verb="GET" URI="/readyz" latency="4.847772ms" userAgent="kube-probe/1.20" srcIP="10.129.0.1:54584" resp=200 I0409 05:57:57.962275 1 reflector.go:530] k8s.io/apiserver/pkg/server/dynamiccertificates/configmap_cafile_content.go:206: Watch close - *v1.ConfigMap total 0 items received I0409 05:57:58.277971 1 reflector.go:530] k8s.io/apiserver/pkg/authentication/request/headerrequest/requestheader_controller.go:172: Watch close - *v1.ConfigMap total 0 items received I0409 05:57:58.455054 1 reflector.go:530] k8s.io/apiserver/pkg/server/dynamiccertificates/configmap_cafile_content.go:206: Watch close - *v1.ConfigMap total 0 items received I0409 05:58:00.097874 1 httplog.go:89] "HTTP" verb="GET" URI="/healthz" latency="1.931529ms" userAgent="kube-probe/1.20" srcIP="10.129.0.1:54612" resp=200 I0409 05:58:01.109778 1 httplog.go:89] "HTTP" verb="GET" URI="/metrics" latency="3.764557ms" userAgent="Prometheus/2.24.1" srcIP="10.128.2.25:54256" resp=200 I0409 05:58:06.796767 1 httplog.go:89] "HTTP" verb="GET" URI="/readyz" latency="4.063661ms" userAgent="kube-probe/1.20" srcIP="10.129.0.1:54684" resp=200 I0409 05:58:07.963448 1 reflector.go:530] k8s.io/apiserver/pkg/server/dynamiccertificates/configmap_cafile_content.go:206: Watch close - *v1.ConfigMap total 0 items received I0409 05:58:07.964909 1 reflector.go:434] k8s.io/apiserver/pkg/server/dynamiccertificates/configmap_cafile_content.go:206: watch of *v1.ConfigMap closed with: too old resource version: 113453 (113491) I0409 05:58:08.279187 1 reflector.go:530] k8s.io/apiserver/pkg/authentication/request/headerrequest/requestheader_controller.go:172: Watch close - *v1.ConfigMap total 0 items received I0409 05:58:08.280922 1 reflector.go:434] k8s.io/apiserver/pkg/authentication/request/headerrequest/requestheader_controller.go:172: watch of *v1.ConfigMap closed with: too old resource version: 113456 (113492) I0409 05:58:08.455950 1 reflector.go:530] k8s.io/apiserver/pkg/server/dynamiccertificates/configmap_cafile_content.go:206: Watch close - *v1.ConfigMap total 0 items received I0409 05:58:08.457314 1 reflector.go:434] k8s.io/apiserver/pkg/server/dynamiccertificates/configmap_cafile_content.go:206: watch of *v1.ConfigMap closed with: too old resource version: 113457 (113492) I0409 05:58:09.293723 1 reflector.go:255] Listing and watching *v1.ConfigMap from k8s.io/apiserver/pkg/server/dynamiccertificates/configmap_cafile_content.go:206 I0409 05:58:09.346901 1 reflector.go:255] Listing and watching *v1.ConfigMap from k8s.io/apiserver/pkg/server/dynamiccertificates/configmap_cafile_content.go:206 I0409 05:58:09.485458 1 reflector.go:255] Listing and watching *v1.ConfigMap from k8s.io/apiserver/pkg/authentication/request/headerrequest/requestheader_controller.go:172 I0409 05:58:10.101434 1 httplog.go:89] "HTTP" verb="GET" URI="/healthz" latency="6.096593ms" userAgent="kube-probe/1.20" srcIP="10.129.0.1:54716" resp=200 I0409 05:58:13.238385 1 controller.go:83] console is in a managed state: syncing service I0409 05:58:13.238949 1 controller.go:101] console is in a managed state: syncing ConsoleCliDownloads custom resources .... I0409 05:58:13.241447 1 operator.go:177] started syncing operator "cluster" (2021-04-09 05:58:13.2414394 +0000 UTC m=+266.741210389) From console-operator pog log, we can see that console operator is syncing every minutes, console route health check is synced every minute in route sync controller in which we print log `console is in a managed state: syncing route` for our case Moving to VERIFIED, this is checked against 4.8.0-0.nightly-2021-04-08-124424
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438