Thank you for the report, Liquan. Some initial thoughts:
- TuneD should not start without a valid profile from the API server, yet that is what happened here.
- We need to review the timeout code.
- We should add a RESYNC to handle connection issues to the API server.
To verify the fix, make the API server inaccessible to the nodes for a prolonged period of time.
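To make the first two points concrete, here is a minimal sketch of the "wait for a valid profile, with a timeout" idea. It is not the operator's actual code: `wait_for_profile`, the `fetch` callback, and all parameter names and defaults are hypothetical, and a `ConnectionError` stands in for any failure to reach the API server.

```python
import time
from typing import Callable, Optional


def wait_for_profile(
    fetch: Callable[[], Optional[str]],      # hypothetical: returns a profile name or None
    timeout_s: float = 60.0,                 # assumed overall deadline
    initial_delay_s: float = 1.0,            # assumed first retry delay
    max_delay_s: float = 30.0,               # assumed cap on the retry delay
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Return the first non-empty profile from fetch(), or raise TimeoutError."""
    deadline = clock() + timeout_s
    delay = initial_delay_s
    while True:
        try:
            profile = fetch()
        except ConnectionError:
            # An unreachable API server is treated the same as "no profile yet".
            profile = None
        if profile:
            return profile
        remaining = deadline - clock()
        if remaining <= 0:
            # Refuse to proceed rather than start without a valid profile.
            raise TimeoutError("no valid profile received from the API server")
        # Exponential backoff, never sleeping past the deadline.
        sleep(min(delay, remaining))
        delay = min(delay * 2, max_delay_s)
```

The caller would start the daemon only after this returns. A periodic resync would be a separate loop that repeats the same fetch on an interval; that part is not shown here.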
Verified Result: Scaled the worker nodes to 7, then upgraded to the latest nightly version. No cluster operator (co) is progressing after the upgrade.

[ocpadmin@ec2-18-217-45-133 ~]$ oc get profile
NAME                                         TUNED                     APPLIED   DEGRADED   AGE
ip-10-0-134-234.us-east-2.compute.internal   openshift-control-plane   True      False      3h10m
ip-10-0-146-194.us-east-2.compute.internal   openshift-node            True      False      3h5m
ip-10-0-163-202.us-east-2.compute.internal   openshift-node            True      False      3h7m
ip-10-0-177-7.us-east-2.compute.internal     openshift-control-plane   True      False      3h10m
ip-10-0-194-38.us-east-2.compute.internal    openshift-node            True      False      3h7m
ip-10-0-206-151.us-east-2.compute.internal   openshift-node            True      False      55m
ip-10-0-208-186.us-east-2.compute.internal   openshift-node            True      False      55m
ip-10-0-213-169.us-east-2.compute.internal   openshift-node            True      False      55m
ip-10-0-214-176.us-east-2.compute.internal   openshift-node            True      False      55m
ip-10-0-217-112.us-east-2.compute.internal   openshift-control-plane   True      False      3h10m

[ocpadmin@ec2-18-217-45-133 ~]$ oc get co
NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.12.0-0.nightly-2022-08-08-193833   True        False         False      176m
baremetal                                  4.12.0-0.nightly-2022-08-08-193833   True        False         False      3h10m
cloud-controller-manager                   4.12.0-0.nightly-2022-08-08-193833   True        False         False      3h12m
cloud-credential                           4.12.0-0.nightly-2022-08-08-193833   True        False         False      3h12m
cluster-autoscaler                         4.12.0-0.nightly-2022-08-08-193833   True        False         False      3h11m
config-operator                            4.12.0-0.nightly-2022-08-08-193833   True        False         False      3h11m
console                                    4.12.0-0.nightly-2022-08-08-193833   True        False         False      178m
csi-snapshot-controller                    4.12.0-0.nightly-2022-08-08-193833   True        False         False      3h11m
dns                                        4.12.0-0.nightly-2022-08-08-193833   True        False         False      3h10m
etcd                                       4.12.0-0.nightly-2022-08-08-193833   True        False         False      3h8m
image-registry                             4.12.0-0.nightly-2022-08-08-193833   True        False         False      3h4m
ingress                                    4.12.0-0.nightly-2022-08-08-193833   True        False         False      3h5m
insights                                   4.12.0-0.nightly-2022-08-08-193833   True        False         False      3h5m
kube-apiserver                             4.12.0-0.nightly-2022-08-08-193833   True        False         False      3h7m
kube-controller-manager                    4.12.0-0.nightly-2022-08-08-193833   True        False         False      3h8m
kube-scheduler                             4.12.0-0.nightly-2022-08-08-193833   True        False         False      3h7m
kube-storage-version-migrator              4.12.0-0.nightly-2022-08-08-193833   True        False         False      3h11m
machine-api                                4.12.0-0.nightly-2022-08-08-193833   True        False         False      3h7m
machine-approver                           4.12.0-0.nightly-2022-08-08-193833   True        False         False      3h10m
machine-config                             4.12.0-0.nightly-2022-08-08-193833   True        False         False      3h9m
marketplace                                4.12.0-0.nightly-2022-08-08-193833   True        False         False      3h10m
monitoring                                 4.12.0-0.nightly-2022-08-08-193833   True        False         False      3h2m
network                                    4.12.0-0.nightly-2022-08-08-193833   True        False         False      3h12m
node-tuning                                4.12.0-0.nightly-2022-08-08-193833   True        False         False      36m
openshift-apiserver                        4.12.0-0.nightly-2022-08-08-193833   True        False         False      3h5m
openshift-controller-manager               4.12.0-0.nightly-2022-08-08-193833   True        False         False      3h7m
openshift-samples                          4.12.0-0.nightly-2022-08-08-193833   True        False         False      39m
operator-lifecycle-manager                 4.12.0-0.nightly-2022-08-08-193833   True        False         False      3h11m
operator-lifecycle-manager-catalog         4.12.0-0.nightly-2022-08-08-193833   True        False         False      3h10m
operator-lifecycle-manager-packageserver   4.12.0-0.nightly-2022-08-08-193833   True        False         False      3h5m
service-ca                                 4.12.0-0.nightly-2022-08-08-193833   True        False         False      3h11m
storage                                    4.12.0-0.nightly-2022-08-08-193833   True        False         False      3h11m

[ocpadmin@ec2-18-217-45-133 ~]$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.12.0-0.nightly-2022-08-08-193833   True        False         23m     Cluster version is 4.12.0-0.nightly-2022-08-08-193833
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:7399