Bug 2102109 - co/node-tuning: Waiting for 15/72 Profiles to be applied
Summary: co/node-tuning: Waiting for 15/72 Profiles to be applied
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node Tuning Operator
Version: 4.11
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.12.0
Assignee: Jiří Mencák
QA Contact: liqcui
URL:
Whiteboard:
Depends On:
Blocks: 2105106
Reported: 2022-06-29 10:51 UTC by Hongkai Liu
Modified: 2023-01-17 19:51 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-01-17 19:50:54 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-node-tuning-operator pull 381 | 0 | None | open | Bug 2102109: Remove stale Profiles. | 2022-07-01 15:08:37 UTC
Red Hat Product Errata | RHSA-2022:7399 | 0 | None | None | None | 2023-01-17 19:51:16 UTC

Description Hongkai Liu 2022-06-29 10:51:15 UTC
Description of problem:
co/node-tuning seems stuck with PROGRESSING=true.

oc --context build02 get co node-tuning
NAME          VERSION       AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
node-tuning   4.11.0-fc.3   True        True          False      2m      Waiting for 15/72 Profiles to be applied

Version-Release number of selected component (if applicable):

oc --context build02 get clusterversion
NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-fc.3   True        False         5h30m   Cluster version is 4.11.0-fc.3


oc --context build02 get profile -n openshift-cluster-node-tuning-operator | grep Unknown | wc -l
      15
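
The count above can be broken down by state. A minimal sketch, assuming the table layout shown in this report (header on the first line, APPLIED in the third column); `tally_applied` is a hypothetical helper name, not part of `oc`:

```shell
#!/bin/sh
# Tally the APPLIED column of `oc get profile` table output read from stdin.
# Rows with fewer than five fields (APPLIED/DEGRADED still empty) are
# counted under "<none>".
tally_applied() {
    awk 'NR > 1 && NF > 0 {
             state = (NF >= 5) ? $3 : "<none>"
             count[state]++
         }
         END { for (s in count) print s, count[s] }' | LC_ALL=C sort
}

# Usage against a live cluster (context and namespace as used in this report):
#   oc --context build02 get profile -n openshift-cluster-node-tuning-operator | tally_applied
```

If the 15 in the operator message corresponds to the Profiles that do not report APPLIED=True (consistent with the count above, but not confirmed here), the tally should show them under Unknown.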

Collecting logs ...

oc --context build02 adm inspect namespace/openshift-cluster-node-tuning-operator
Gathering data for ns/openshift-cluster-node-tuning-operator...
Wrote inspect data to inspect.local.7527134849971413775.

oc --context build02 get profile -n openshift-cluster-node-tuning-operator -o yaml > profile.yaml

Every "Unknown" profile is for a node that no longer exists in the cluster.
E.g.,

oc --context build02 get profile -n openshift-cluster-node-tuning-operator build0-gstfj-w-c-mgpxt.c.openshift-ci-build-farm.internal
NAME                                                        TUNED            APPLIED   DEGRADED   AGE
build0-gstfj-w-c-mgpxt.c.openshift-ci-build-farm.internal   openshift-node   Unknown   Unknown    147d

oc --context build02 get node build0-gstfj-w-c-mgpxt.c.openshift-ci-build-farm.internal
Error from server (NotFound): nodes "build0-gstfj-w-c-mgpxt.c.openshift-ci-build-farm.internal" not found


There are "APPLIED" profiles for deleted nodes too. E.g.,

oc --context build02 get profile -n openshift-cluster-node-tuning-operator build0-gstfj-ci-tests-worker-d-v5knp
NAME                                   TUNED            APPLIED   DEGRADED   AGE
build0-gstfj-ci-tests-worker-d-v5knp   openshift-node   True      True       19h

oc --context build02 get node build0-gstfj-ci-tests-worker-d-v5knp
Error from server (NotFound): nodes "build0-gstfj-ci-tests-worker-d-v5knp" not found

Is it expected that profiles are still being processed for those deleted nodes?
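
To list every Profile left behind by a deleted node in one pass, the two name lists can be compared. A minimal sketch, assuming each Profile is named after its node as in the examples above; `stale_profiles` and the two file names are hypothetical:

```shell
#!/bin/sh
# Print Profile names that have no Node of the same name.
# $1: file with Profile names, $2: file with Node names (one name per line).
stale_profiles() {
    profiles_sorted=$(mktemp)
    nodes_sorted=$(mktemp)
    LC_ALL=C sort -u "$1" > "$profiles_sorted"
    LC_ALL=C sort -u "$2" > "$nodes_sorted"
    # comm -23 prints lines that occur only in the first (sorted) input.
    LC_ALL=C comm -23 "$profiles_sorted" "$nodes_sorted"
    rm -f "$profiles_sorted" "$nodes_sorted"
}

# Usage against a live cluster (context and namespace as used in this report);
# `-o name` prints "<resource>/<name>", so the prefix is stripped with sed:
#   oc --context build02 get profile -n openshift-cluster-node-tuning-operator -o name | sed 's|.*/||' > profiles.txt
#   oc --context build02 get node -o name | sed 's|.*/||' > nodes.txt
#   stale_profiles profiles.txt nodes.txt
```

This only reports the candidates; it does not delete anything.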


There are also degraded profiles:

oc --context build02 get profile -n openshift-cluster-node-tuning-operator build0-gstfj-ci-builds-worker-b-2klr7 -o yaml
apiVersion: tuned.openshift.io/v1
kind: Profile
metadata:
  annotations:
    tuned.openshift.io/generated-by-operand-version: 4.11.0-fc.3
  creationTimestamp: "2022-06-29T10:25:54Z"
  generation: 3
  name: build0-gstfj-ci-builds-worker-b-2klr7
  namespace: openshift-cluster-node-tuning-operator
  ownerReferences:
  - apiVersion: tuned.openshift.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: Tuned
    name: default
    uid: 911afa44-9f73-4930-ae17-8f4b60228a5d
  resourceVersion: "2810822439"
  uid: 4f3a6528-4612-4baa-9dba-a7723d73ce0c
spec:
  config:
    debug: false
    providerName: gce
    tunedConfig: {}
    tunedProfile: openshift-node
status:
  bootcmdline: ""
  conditions:
  - lastTransitionTime: "2022-06-29T10:26:15Z"
    message: TuneD profile applied.
    reason: AsExpected
    status: "True"
    type: Applied
  - lastTransitionTime: "2022-06-29T10:26:15Z"
    message: 'TuneD daemon issued one or more error message(s) during profile application.
      TuneD stderr:  WARNING  tuned.plugins.plugin_cpu: your CPU doesn''t support
      MSR_IA32_ENERGY_PERF_BIAS, ignoring CPU energy performance bias'
    reason: TunedError
    status: "True"
    type: Degraded
  tunedProfile: openshift-node

Should we worry about them?
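
The profile above reports Applied=True and Degraded=True at the same time. To see which other profiles are in that state, the table output can be filtered. A minimal sketch, assuming the column order shown in this report (NAME, TUNED, APPLIED, DEGRADED, AGE); `applied_but_degraded` is a hypothetical helper name:

```shell
#!/bin/sh
# Read `oc get profile` table output from stdin and print the names of
# Profiles whose APPLIED and DEGRADED columns are both "True".
applied_but_degraded() {
    awk 'NR > 1 && NF >= 5 && $3 == "True" && $4 == "True" { print $1 }'
}

# Usage against a live cluster (context and namespace as used in this report):
#   oc --context build02 get profile -n openshift-cluster-node-tuning-operator | applied_but_degraded
```

The filter does not say why a profile is degraded; the condition message in the YAML output still has to be checked for each one.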

Comment 9 liqcui 2022-07-12 16:08:59 UTC
$ oc get profile
NAME                                                 TUNED                     APPLIED   DEGRADED   AGE
ip-10-0-128-11.us-east-2.compute.internal            openshift-realtime        False     True       46m
ip-10-0-128-170.us-east-2.compute.internal           openshift-realtime        False     True       46m
ip-10-0-129-206.us-east-2.compute.internal           openshift-realtime        False     True       46m
ip-10-0-130-14.us-east-2.compute.internal            openshift-realtime        False     True       46m
ip-10-0-130-87.us-east-2.compute.internal            openshift-realtime        False     True       46m
ip-10-0-132-208.us-east-2.compute.internal           openshift-realtime        False     True       46m
ip-10-0-134-212.us-east-2.compute.internal           openshift-realtime        False     True       88m
ip-10-0-135-207.us-east-2.compute.internal           openshift-realtime        False     True       45m
ip-10-0-141-48.us-east-2.compute.internal            openshift-realtime        False     True       46m
ip-10-0-143-100.us-east-2.compute.internal           openshift-realtime        False     True       46m
ip-10-0-145-53.us-east-2.compute.internal            openshift-realtime        False     True       46m
ip-10-0-147-88.us-east-2.compute.internal            openshift-control-plane   True      False      92m
ip-10-0-148-143.us-east-2.compute.internal           openshift-realtime        False     True       46m
ip-10-0-148-146.us-east-2.compute.internal           openshift-realtime        False     True       46m
ip-10-0-152-208.us-east-2.compute.internal           openshift-realtime        False     True       46m
ip-10-0-153-212.us-east-2.compute.internal           openshift-realtime        False     True       46m
ip-10-0-154-165.us-east-2.compute.internal           openshift-realtime        False     True       46m
ip-10-0-156-19.us-east-2.compute.internal            openshift-realtime        False     True       46m
ip-10-0-157-200.us-east-2.compute.internal           openshift-realtime        False     True       46m
ip-10-0-157-236.us-east-2.compute.internal           openshift-realtime        False     True       46m
ip-10-0-159-89.us-east-2.compute.internal            openshift-realtime        False     True       46m
ip-10-0-161-225.us-east-2.compute.internal           openshift-control-plane   True      False      92m
ip-10-0-162-252.us-east-2.compute.internal           openshift-realtime        False     True       88m
ip-10-0-211-176.us-east-2.compute.internal           openshift-control-plane   True      False      92m
ip-10-0-223-81.us-east-2.compute.internal            openshift-realtime        False     True       88m
worker-does-not-exist-openshift-aws-devel.internal   openshift-node                                 3m57s
[ocpadmin@ec2-18-217-45-133 ~]$ oc get clusterversion
NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-fc.3   True        False         68m     Cluster version is 4.11.0-fc.3

After upgrade:
$ oc get co
NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.12.0-0.nightly-2022-07-11-054352   True        False         False      152m    
baremetal                                  4.12.0-0.nightly-2022-07-11-054352   True        False         False      173m    
cloud-controller-manager                   4.12.0-0.nightly-2022-07-11-054352   True        False         False      3h3m    
cloud-credential                           4.12.0-0.nightly-2022-07-11-054352   True        False         False      3h3m    
cluster-autoscaler                         4.12.0-0.nightly-2022-07-11-054352   True        False         False      173m    
config-operator                            4.12.0-0.nightly-2022-07-11-054352   True        False         False      177m    
console                                    4.12.0-0.nightly-2022-07-11-054352   True        False         False      160m    
csi-snapshot-controller                    4.12.0-0.nightly-2022-07-11-054352   True        False         False      172m    
dns                                        4.12.0-0.nightly-2022-07-11-054352   True        False         False      173m    
etcd                                       4.12.0-0.nightly-2022-07-11-054352   True        False         False      174m    
image-registry                             4.12.0-0.nightly-2022-07-11-054352   True        False         False      162m    
ingress                                    4.12.0-0.nightly-2022-07-11-054352   True        False         False      162m    
insights                                   4.12.0-0.nightly-2022-07-11-054352   True        False         False      174m    
kube-apiserver                             4.12.0-0.nightly-2022-07-11-054352   True        False         False      171m    
kube-controller-manager                    4.12.0-0.nightly-2022-07-11-054352   True        False         False      171m    
kube-scheduler                             4.12.0-0.nightly-2022-07-11-054352   True        False         False      171m    
kube-storage-version-migrator              4.12.0-0.nightly-2022-07-11-054352   True        False         False      176m    
machine-api                                4.12.0-0.nightly-2022-07-11-054352   True        False         False      170m    
machine-approver                           4.12.0-0.nightly-2022-07-11-054352   True        False         False      173m    
machine-config                             4.12.0-0.nightly-2022-07-11-054352   True        False         False      171m    
marketplace                                4.12.0-0.nightly-2022-07-11-054352   True        False         False      173m    
monitoring                                 4.12.0-0.nightly-2022-07-11-054352   True        False         False      159m    
network                                    4.12.0-0.nightly-2022-07-11-054352   True        False         False      3h2m    
node-tuning                                4.12.0-0.nightly-2022-07-11-054352   True        False         False      5m8s    
openshift-apiserver                        4.12.0-0.nightly-2022-07-11-054352   True        False         False      170m    
openshift-controller-manager               4.12.0-0.nightly-2022-07-11-054352   True        False         False      170m    
openshift-samples                          4.12.0-0.nightly-2022-07-11-054352   True        False         False      51m     
operator-lifecycle-manager                 4.12.0-0.nightly-2022-07-11-054352   True        False         False      173m    
operator-lifecycle-manager-catalog         4.12.0-0.nightly-2022-07-11-054352   True        False         False      173m    
operator-lifecycle-manager-packageserver   4.12.0-0.nightly-2022-07-11-054352   True        False         False      170m    
service-ca                                 4.12.0-0.nightly-2022-07-11-054352   True        False         False      176m    
storage                                    4.12.0-0.nightly-2022-07-11-054352   True        False         False      170m    
[ocpadmin@ec2-18-217-45-133 ~]$ oc get profile
NAME                                         TUNED                     APPLIED   DEGRADED   AGE
ip-10-0-128-11.us-east-2.compute.internal    openshift-realtime        False     True       10m
ip-10-0-128-170.us-east-2.compute.internal   openshift-realtime        False     True       127m
ip-10-0-129-206.us-east-2.compute.internal   openshift-realtime        False     True       127m
ip-10-0-130-14.us-east-2.compute.internal    openshift-realtime        False     True       127m
ip-10-0-130-87.us-east-2.compute.internal    openshift-realtime        False     True       127m
ip-10-0-132-208.us-east-2.compute.internal   openshift-realtime        False     True       127m
ip-10-0-134-212.us-east-2.compute.internal   openshift-realtime        False     True       169m
ip-10-0-135-207.us-east-2.compute.internal   openshift-realtime        False     True       126m
ip-10-0-141-48.us-east-2.compute.internal    openshift-realtime        False     True       127m
ip-10-0-143-100.us-east-2.compute.internal   openshift-realtime        False     True       127m
ip-10-0-145-53.us-east-2.compute.internal    openshift-realtime        False     True       127m
ip-10-0-147-88.us-east-2.compute.internal    openshift-control-plane   True      False      173m
ip-10-0-148-143.us-east-2.compute.internal   openshift-realtime        False     True       127m
ip-10-0-148-146.us-east-2.compute.internal   openshift-realtime        False     True       127m
ip-10-0-152-208.us-east-2.compute.internal   openshift-realtime        False     True       127m
ip-10-0-153-212.us-east-2.compute.internal   openshift-realtime        False     True       127m
ip-10-0-154-165.us-east-2.compute.internal   openshift-realtime        False     True       127m
ip-10-0-156-19.us-east-2.compute.internal    openshift-realtime        False     True       127m
ip-10-0-157-200.us-east-2.compute.internal   openshift-realtime        False     True       127m
ip-10-0-157-236.us-east-2.compute.internal   openshift-realtime        False     True       127m
ip-10-0-159-89.us-east-2.compute.internal    openshift-realtime        False     True       127m
ip-10-0-161-225.us-east-2.compute.internal   openshift-control-plane   True      False      173m
ip-10-0-162-252.us-east-2.compute.internal   openshift-realtime        False     True       169m
ip-10-0-211-176.us-east-2.compute.internal   openshift-control-plane   True      False      173m
ip-10-0-223-81.us-east-2.compute.internal    openshift-realtime        False     True       169m
[ocpadmin@ec2-18-217-45-133 ~]$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.12.0-0.nightly-2022-07-11-054352   True        False         2m33s   Cluster version is 4.12.0-0.nightly-2022-07-11-054352

The profile for the node that doesn't exist was removed after the upgrade to the OCP 4.12 nightly build.

Comment 12 errata-xmlrpc 2023-01-17 19:50:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:7399

