Description of problem:
The 4.7 to 4.6 downgrade is stuck at openshift-apiserver with various log errors. This blocks the test of https://issues.redhat.com/browse/MSTR-1055 , so adding the TestBlocker keyword.

Version-Release number of selected component (if applicable):
4.6.0-0.nightly-2021-01-05-203053 upgrades to 4.7.0-0.nightly-2021-01-06-222035 then downgrades to 4.6.0-0.nightly-2021-01-05-203053

How reproducible:
Not sure

Steps to Reproduce:
1. Install 4.6.0-0.nightly-2021-01-05-203053 UPI GCP env successfully. Run:
oc patch apiserver/cluster -p '{"spec":{"encryption": {"type":"aescbc"}}}' --type merge
Wait for 20 mins. Check all pods/nodes/COs; all are well.
2. Upgrade to 4.7.0-0.nightly-2021-01-06-222035 successfully. Check all pods/nodes/COs again; all are still well.
3. Downgrade back to 4.6.0-0.nightly-2021-01-05-203053

Actual results:
3. Downgrade failed. The cluster is stuck in the state below; some debugging output follows:
[xxia@pres 2021-01-07 14:52:44 CST my]$ ogpcn # my script that gets abnormal projects/pods/COs
openshift-multus Terminating 4h49m
openshift-network-diagnostics Terminating 139m
openshift-sdn Terminating 4h49m
openshift-multus network-metrics-daemon-xm84l 0/2 Terminating 0 138m 10.128.2.3 xxia07story-f4zvs-worker-a-pdxts.c.openshift-qe.internal ...
openshift-network-diagnostics network-check-target-77gct 0/1 Terminating 0 139m 10.128.2.2 xxia07story-f4zvs-worker-a-pdxts.c.openshift-qe.internal ...
openshift-oauth-apiserver apiserver-5d44b68d87-qt74h 0/1 Terminating 0 104m 10.129.0.37 xxia07story-f4zvs-m-1.c.openshift-qe.internal ...
Clusteroperators which are not 4.6.0-0.nightly-2021-01-05-203053 True False False:
authentication                            4.6.0-0.nightly-2021-01-05-203053  False  True   True   52m
baremetal                                 4.7.0-0.nightly-2021-01-06-222035  True   False  False  144m
console                                   4.6.0-0.nightly-2021-01-05-203053  True   False  True   54m
dns                                       4.7.0-0.nightly-2021-01-06-222035  True   False  False  4h46m
machine-config                            4.7.0-0.nightly-2021-01-06-222035  True   False  False  110m
monitoring                                4.6.0-0.nightly-2021-01-05-203053  False  False  True   50m
network                                   4.7.0-0.nightly-2021-01-06-222035  True   True   True   134m
openshift-apiserver                       4.6.0-0.nightly-2021-01-05-203053  False  False  False  52m
operator-lifecycle-manager-packageserver  4.6.0-0.nightly-2021-01-05-203053  False  True   False  52m
# all nodes are Ready and v1.20.0+b1e9f0d
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
xxia07story-f4zvs-m-0.c.openshift-qe.internal Ready master 4h49m v1.20.0+b1e9f0d 10.0.0.5 <none> Red Hat Enterprise Linux CoreOS 47.83.202101060443-0 (Ootpa) 4.18.0-240.10.1.el8_3.x86_64 cri-o://1.20.0-0.rhaos4.7.gitd388528.el8.39
..
[xxia@pres 2021-01-07 14:54:16 CST my]$ oc describe co openshift-apiserver
...
Message: APIServicesAvailable: "build.openshift.io.v1" is not ready: 503 (the server is currently unable to handle the request)
[xxia@pres 2021-01-07 14:55:51 CST my]$ oc get ns openshift-sdn -o yaml |& tee openshift-sdn-ns.yaml
...
- lastTransitionTime: "2021-01-07T06:00:19Z"
  message: 'Discovery failed for some groups, 12 failing: unable to retrieve the complete list of server APIs: apps.openshift.io/v1: the server is currently unable to handle the request, ...
  reason: DiscoveryFailed
  status: "True"
  type: NamespaceDeletionDiscoveryFailure

Checked the kube-apiserver logs; many errors like the one below:
2021-01-07T06:57:43.586271409Z E0107 06:57:43.586000 18 available_controller.go:490] v1.quota.openshift.io failed with: failing or missing response from https://10.129.0.50:8443/apis/quota.openshift.io/v1: Get "https://10.129.0.50:8443/apis/quota.openshift.io/v1": dial tcp 10.129.0.50:8443: connect: no route to host

[xxia@pres 2021-01-07 15:14:52 CST downgrade]$ oloas # my script that gets OAS pods and logs
apiserver-697545cc9c-cv64d 2/2 Running 1 81m 10.130.0.38 xxia07story-f4zvs-m-0.c.openshift-qe.internal <none> <none> apiserver=true,app=openshift-apiserver-a,openshift-apiserver-anti-affinity=true,pod-template-hash=697545cc9c,revision=5
apiserver-697545cc9c-dhkbc 2/2 Running 1 83m 10.128.0.32 xxia07story-f4zvs-m-2.c.openshift-qe.internal <none> <none> apiserver=true,app=openshift-apiserver-a,openshift-apiserver-anti-affinity=true,pod-template-hash=697545cc9c,revision=5
apiserver-697545cc9c-lml9m 2/2 Running 1 82m 10.129.0.50 xxia07story-f4zvs-m-1.c.openshift-qe.internal <none> <none> apiserver=true,app=openshift-apiserver-a,openshift-apiserver-anti-affinity=true,pod-template-hash=697545cc9c,revision=5

Checked the openshift-apiserver container logs; many errors like the one below:
2021-01-07T07:15:27.332243189Z E0107 07:15:27.332172 1 cacher.go:416] cacher (*oauth.OAuthAccessToken): unexpected ListAndWatch error: failed to list *oauth.OAuthAccessToken: illegal base64 data at input byte 3; reinitializing...

Checked `oc get po -n openshift-apiserver -o yaml` and saw that "restartCount: 1" exists in the openshift-apiserver-check-endpoints container. The last lines of its previous logs are as below:
[xxia@pres 2021-01-07 15:18:53 CST downgrade]$ oc logs -p -c openshift-apiserver-check-endpoints -n openshift-apiserver apiserver-697545cc9c-cv64d
I0107 05:55:44.320250 1 base_controller.go:113] Shutting down worker of CheckEndpointsTimeToStart controller ...
I0107 05:55:44.320297 1 base_controller.go:103] All CheckEndpointsTimeToStart workers have been terminated
...
I0107 05:55:44.421010 1 base_controller.go:109] Starting #1 worker of check-endpoints controller ...
I0107 06:01:27.841861 1 start_stop_controllers.go:70] The server doesn't have a resource type "podnetworkconnectivitychecks.controlplane.operator.openshift.io".

[xxia@pres 2021-01-07 15:19:39 CST downgrade]$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.7.0-0.nightly-2021-01-06-222035 True True 106m Unable to apply 4.6.0-0.nightly-2021-01-05-203053: the cluster operator openshift-apiserver has not yet successfully rolled out

[xxia@pres 2021-01-07 15:52:36 CST downgrade]$ oc rsh -n openshift-kube-apiserver kube-apiserver-xxia07story-f4zvs-m-0.c.openshift-qe.internal
sh-4.4# curl -k https://10.129.0.50:8443
curl: (7) Failed to connect to 10.129.0.50 port 8443: No route to host

[xxia@pres 2021-01-07 16:03:37 CST downgrade]$ oc rsh -n openshift-apiserver apiserver-697545cc9c-lml9m
sh-4.4# curl -k https://10.129.0.50:8443
...
"message": "forbidden: User \"system:anonymous\" cannot get path \"/\"",
"reason": "Forbidden",
"code": 403
...

[xxia@pres 2021-01-07 16:22:28 CST downgrade]$ oc get pods -n openshift-sdn
No resources found in openshift-sdn namespace.
[xxia@pres 2021-01-07 16:23:37 CST downgrade]$ oc describe co network
Last Transition Time: 2021-01-07T06:00:09Z
Message: Waiting for DaemonSet "openshift-multus/multus" to be created
Waiting for DaemonSet "openshift-multus/network-metrics-daemon" to be created
Waiting for DaemonSet "openshift-multus/multus-admission-controller" to be created
Waiting for DaemonSet "openshift-sdn/sdn-controller" to be created
Waiting for DaemonSet "openshift-sdn/ovs" to be created
Waiting for DaemonSet "openshift-sdn/sdn" to be created
Reason: Deploying
Status: True
Type: Progressing

[xxia@pres 2021-01-07 16:23:43 CST downgrade]$ oc get po -n openshift-network-operator
NAME READY STATUS RESTARTS AGE
network-operator-674b58cd88-pmtbv 1/1 Running 0 166m

[xxia@pres 2021-01-07 16:46:48 CST downgrade]$ oc logs network-operator-674b58cd88-pmtbv -n openshift-network-operator
2021/01/07 05:59:55 Go Version: go1.15.5
...
2021/01/07 06:00:05 Became the leader.
I0107 06:00:06.745945 1 request.go:621] Throttling request took 1.04468026s, request: GET:https://api-int...com:6443/apis/metal3.io/v1alpha1?timeout=32s
2021/01/07 06:00:08 Registering Components.
...
2021/01/07 06:00:08 ConfigMap "openshift-service-ca" not found
2021/01/07 06:00:08 ERROR ConfigMap "openshift-service-ca" not found - Reconciler error
...
2021/01/07 08:35:12 Reconciling update to openshift-multus/multus-admission-controller
2021/01/07 08:35:12 Error getting DaemonSet "openshift-multus/multus": DaemonSet.apps "multus" not found
2021/01/07 08:35:12 Error getting DaemonSet "openshift-multus/network-metrics-daemon": DaemonSet.apps "network-metrics-daemon" not found
2021/01/07 08:35:12 Error getting DaemonSet "openshift-multus/multus-admission-controller": DaemonSet.apps "multus-admission-controller" not found
2021/01/07 08:35:12 Error getting DaemonSet "openshift-sdn/sdn-controller": DaemonSet.apps "sdn-controller" not found
2021/01/07 08:35:12 Error getting DaemonSet "openshift-sdn/ovs": DaemonSet.apps "ovs" not found
2021/01/07 08:35:12 Error getting DaemonSet "openshift-sdn/sdn": DaemonSet.apps "sdn" not found
2021/01/07 08:35:26 Reconciling update for openshift-service-ca from /cluster
2021/01/07 08:35:26 ConfigMap "openshift-service-ca" not found
2021/01/07 08:35:26 ERROR ConfigMap "openshift-service-ca" not found - Reconciler error
2021/01/07 08:40:10 Reconciling update to openshift-multus/multus
2021/01/07 08:40:10 Error getting DaemonSet "openshift-multus/multus": DaemonSet.apps "multus" not found
2021/01/07 08:40:10 Error getting DaemonSet "openshift-multus/network-metrics-daemon": DaemonSet.apps "network-metrics-daemon" not found
2021/01/07 08:40:10 Error getting DaemonSet "openshift-multus/multus-admission-controller": DaemonSet.apps "multus-admission-controller" not found
2021/01/07 08:40:10 Error getting DaemonSet "openshift-sdn/sdn-controller": DaemonSet.apps "sdn-controller" not found
2021/01/07 08:40:10 Error getting DaemonSet "openshift-sdn/ovs": DaemonSet.apps "ovs" not found
2021/01/07 08:40:10 Error getting DaemonSet "openshift-sdn/sdn": DaemonSet.apps "sdn" not found
2021/01/07 08:40:10 Reconciling update to openshift-multus/network-metrics-daemon
...
[xxia@pres 2021-01-07 16:48:58 CST downgrade]$ oc get cm -A | grep " openshift-service-ca "
openshift-controller-manager openshift-service-ca 1 6h44m

Expected results:
3. Downgrade should succeed

Additional info:
must-gather failed:
oc adm must-gather --dest-dir must-gather-xxia07story-130119
error: gather did not start for pod must-gather-7zdw2: timed out waiting for the condition
[xxia 2021-01-07 15:24:06 CST my]$ du -sh must-gather-xxia07story-130119
12K must-gather-xxia07story-130119
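
Note for anyone reproducing the "Clusteroperators which are not ... True False False" check: the ogpcn helper used above is a private script and its contents are not part of this report. A minimal sketch of an equivalent filter, assuming the default column layout of `oc get clusteroperators` (NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE), could look like this. The function name abnormal_operators is hypothetical.

```python
from typing import List

# Expected AVAILABLE, PROGRESSING, DEGRADED values for a settled operator.
HEALTHY = ("True", "False", "False")


def abnormal_operators(table: str, expected_version: str) -> List[str]:
    """Return operator names that are not at expected_version or not True/False/False.

    `table` is the plain-text output of `oc get clusteroperators`.
    Assumption: every data row has a non-empty VERSION column; a row with a
    blank version would shift the columns and be reported as abnormal.
    """
    abnormal = []
    for line in table.strip().splitlines():
        fields = line.split()
        # Skip the header row and anything too short to be a data row.
        if len(fields) < 5 or fields[0] == "NAME":
            continue
        name, version, status = fields[0], fields[1], tuple(fields[2:5])
        if version != expected_version or status != HEALTHY:
            abnormal.append(name)
    return abnormal


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name):
    #   oc get clusteroperators | python3 abnormal_cos.py <expected-version>
    for operator in abnormal_operators(sys.stdin.read(), sys.argv[1]):
        print(operator)
```

Run against this cluster with the 4.6 nightly as the expected version, a filter like this would flag operators still on 4.7 as well as 4.6 operators that are unavailable, progressing, or degraded.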
*** This bug has been marked as a duplicate of bug 1906936 ***