Bug 2114779 - Node Tuning Operator (NTO) - OCP upgrade failed due to node-tuning CO still progressing
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node Tuning Operator
Version: 4.11
Hardware: Unspecified
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.12.0
Assignee: Jiří Mencák
QA Contact: liqcui
URL:
Whiteboard:
Depends On:
Blocks: 2116009
Reported: 2022-08-03 09:17 UTC by liqcui
Modified: 2023-01-17 19:54 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-01-17 19:54:14 UTC
Target Upstream Version:
Embargoed:




Links
System                  ID                                               Private  Priority  Status  Summary                                                     Last Updated
Github                  openshift cluster-node-tuning-operator pull 421  0        None      open    Bug 2114779: openshift-tuned: remember recommended profile  2022-08-04 14:51:37 UTC
Red Hat Product Errata  RHSA-2022:7399                                   0        None      None    None                                                        2023-01-17 19:54:35 UTC

Comment 1 Jiří Mencák 2022-08-03 09:35:53 UTC
Thank you for the report, Liquan.  Some initial thoughts:
  - TuneD should not start without a valid profile from the API server, yet that is what happened here.
  - We need to review the timeout code.
  - We should add a RESYNC to handle connection issues to the API server.

To test that this is fixed, make the API server inaccessible to the Nodes for a prolonged period of time.
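
The behaviour described in the bullets above (do not start without a valid profile, retry on a resync interval when the API server is unreachable, and bound the wait with a timeout) could be sketched roughly as follows. This is only an illustration of the idea, not the operator's actual code and not the change made in the linked pull request. The names wait_for_profile, fetch_profile, resync_period and max_attempts are hypothetical.

```python
import time
from typing import Callable, Optional


def wait_for_profile(
    fetch_profile: Callable[[], Optional[str]],
    resync_period: float = 1.0,
    max_attempts: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Return the first non-empty profile name that fetch_profile yields.

    fetch_profile is a hypothetical callable that asks the API server for
    the recommended profile. It may return None or raise ConnectionError
    while the API server is unreachable.
    """
    attempt = 0
    while max_attempts is None or attempt < max_attempts:
        attempt += 1
        try:
            profile = fetch_profile()
        except ConnectionError:
            # API server unreachable: treat as "no profile yet" and resync.
            profile = None
        if profile:
            # Only now would it be safe to start the tuning daemon.
            return profile
        sleep(resync_period)
    raise TimeoutError("no valid profile received from the API server")
```

With max_attempts left at None the loop waits indefinitely, which matches the first bullet: the caller never proceeds without a valid profile. Setting max_attempts turns the wait into a bounded one that fails loudly instead of starting with a stale or missing profile.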

Comment 3 liqcui 2022-08-09 09:24:18 UTC
Verification result:

Scaled the worker nodes to 7, then upgraded to the latest nightly build. No cluster operator (CO) is still progressing after the upgrade.

[ocpadmin@ec2-18-217-45-133 ~]$ oc get profile
NAME                                         TUNED                     APPLIED   DEGRADED   AGE
ip-10-0-134-234.us-east-2.compute.internal   openshift-control-plane   True      False      3h10m
ip-10-0-146-194.us-east-2.compute.internal   openshift-node            True      False      3h5m
ip-10-0-163-202.us-east-2.compute.internal   openshift-node            True      False      3h7m
ip-10-0-177-7.us-east-2.compute.internal     openshift-control-plane   True      False      3h10m
ip-10-0-194-38.us-east-2.compute.internal    openshift-node            True      False      3h7m
ip-10-0-206-151.us-east-2.compute.internal   openshift-node            True      False      55m
ip-10-0-208-186.us-east-2.compute.internal   openshift-node            True      False      55m
ip-10-0-213-169.us-east-2.compute.internal   openshift-node            True      False      55m
ip-10-0-214-176.us-east-2.compute.internal   openshift-node            True      False      55m
ip-10-0-217-112.us-east-2.compute.internal   openshift-control-plane   True      False      3h10m
[ocpadmin@ec2-18-217-45-133 ~]$ oc get co
NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.12.0-0.nightly-2022-08-08-193833   True        False         False      176m    
baremetal                                  4.12.0-0.nightly-2022-08-08-193833   True        False         False      3h10m   
cloud-controller-manager                   4.12.0-0.nightly-2022-08-08-193833   True        False         False      3h12m   
cloud-credential                           4.12.0-0.nightly-2022-08-08-193833   True        False         False      3h12m   
cluster-autoscaler                         4.12.0-0.nightly-2022-08-08-193833   True        False         False      3h11m   
config-operator                            4.12.0-0.nightly-2022-08-08-193833   True        False         False      3h11m   
console                                    4.12.0-0.nightly-2022-08-08-193833   True        False         False      178m    
csi-snapshot-controller                    4.12.0-0.nightly-2022-08-08-193833   True        False         False      3h11m   
dns                                        4.12.0-0.nightly-2022-08-08-193833   True        False         False      3h10m   
etcd                                       4.12.0-0.nightly-2022-08-08-193833   True        False         False      3h8m    
image-registry                             4.12.0-0.nightly-2022-08-08-193833   True        False         False      3h4m    
ingress                                    4.12.0-0.nightly-2022-08-08-193833   True        False         False      3h5m    
insights                                   4.12.0-0.nightly-2022-08-08-193833   True        False         False      3h5m    
kube-apiserver                             4.12.0-0.nightly-2022-08-08-193833   True        False         False      3h7m    
kube-controller-manager                    4.12.0-0.nightly-2022-08-08-193833   True        False         False      3h8m    
kube-scheduler                             4.12.0-0.nightly-2022-08-08-193833   True        False         False      3h7m    
kube-storage-version-migrator              4.12.0-0.nightly-2022-08-08-193833   True        False         False      3h11m   
machine-api                                4.12.0-0.nightly-2022-08-08-193833   True        False         False      3h7m    
machine-approver                           4.12.0-0.nightly-2022-08-08-193833   True        False         False      3h10m   
machine-config                             4.12.0-0.nightly-2022-08-08-193833   True        False         False      3h9m    
marketplace                                4.12.0-0.nightly-2022-08-08-193833   True        False         False      3h10m   
monitoring                                 4.12.0-0.nightly-2022-08-08-193833   True        False         False      3h2m    
network                                    4.12.0-0.nightly-2022-08-08-193833   True        False         False      3h12m   
node-tuning                                4.12.0-0.nightly-2022-08-08-193833   True        False         False      36m     
openshift-apiserver                        4.12.0-0.nightly-2022-08-08-193833   True        False         False      3h5m    
openshift-controller-manager               4.12.0-0.nightly-2022-08-08-193833   True        False         False      3h7m    
openshift-samples                          4.12.0-0.nightly-2022-08-08-193833   True        False         False      39m     
operator-lifecycle-manager                 4.12.0-0.nightly-2022-08-08-193833   True        False         False      3h11m   
operator-lifecycle-manager-catalog         4.12.0-0.nightly-2022-08-08-193833   True        False         False      3h10m   
operator-lifecycle-manager-packageserver   4.12.0-0.nightly-2022-08-08-193833   True        False         False      3h5m    
service-ca                                 4.12.0-0.nightly-2022-08-08-193833   True        False         False      3h11m   
storage                                    4.12.0-0.nightly-2022-08-08-193833   True        False         False      3h11m   
[ocpadmin@ec2-18-217-45-133 ~]$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.12.0-0.nightly-2022-08-08-193833   True        False         23m     Cluster version is 4.12.0-0.nightly-2022-08-08-193833
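
For repeated verification runs, the "no CO in progress" check above can be automated instead of read by eye. The following is a minimal sketch that scans the text of `oc get co` for operators whose PROGRESSING column is not False. The function name progressing_operators is hypothetical, and the parsing assumes every operator reports a VERSION, so that the whitespace-separated columns line up with the header. Parsing `oc get co -o json` would be more robust.

```python
from typing import List


def progressing_operators(oc_get_co_output: str) -> List[str]:
    """Return names of cluster operators whose PROGRESSING value is not False.

    Expects the default table output of `oc get co`, header line included.
    """
    lines = [line for line in oc_get_co_output.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    idx = header.index("PROGRESSING")
    stuck = []
    for line in lines[1:]:
        fields = line.split()
        # MESSAGE may contain spaces, but it is the last column, so the
        # fields before it keep their positions.
        if len(fields) > idx and fields[idx] != "False":
            stuck.append(fields[0])
    return stuck
```

An empty result for the output pasted above corresponds to the verified state: every operator, including node-tuning, shows PROGRESSING as False.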

Comment 6 errata-xmlrpc 2023-01-17 19:54:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:7399

