Description of problem:

The following Tuned profile created at 2021-08-26T15:16:49Z includes a configuration (include=openshift-node-performance-profile) which would be created by a PerformanceProfile at a later time 2021-08-26T15:25:04Z. After creating the PerformanceProfile the Tuned configuration still doesn't get applied and the performance profile reports a TunedError. I'd expect that once the performance profile gets created the performance-patch Tuned profile which includes it can continue its configuration.

apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  creationTimestamp: "2021-08-26T15:16:49Z"
  generation: 1
  name: performance-patch
  namespace: openshift-cluster-node-tuning-operator
  resourceVersion: "25666"
  uid: 99e9a0ec-d9dc-4f7e-a515-6ae5b2b2047b
spec:
  profile:
  - data: |
      [main]
      summary=Configuration changes profile inherited from performance created tuned
      include=openshift-node-performance-profile
      [bootloader]
      cmdline_crash=nohz_full=2-23,26-47
      [sysctl]
      kernel.timer_migration=1
      [service]
      service.stalld=start,enable
    name: performance-patch
  recommend:
  - machineConfigLabels:
      machineconfiguration.openshift.io/role: master
    priority: 19
    profile: performance-patch

apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  creationTimestamp: "2021-08-26T15:25:04Z"
  finalizers:
  - foreground-deletion
  generation: 1
  name: openshift-node-performance-profile
  resourceVersion: "36276"
  uid: bf81e817-6347-4393-afff-6ee1850e09e8
spec:
  additionalKernelArgs:
  - idle=poll
  cpu:
    isolated: 2-23,26-47
    reserved: 0-1,24-25
  globallyDisableIrqLoadBalancing: true
  hugepages:
    defaultHugepagesSize: 1G
    pages:
    - count: 32
      size: 1G
  machineConfigPoolSelector:
    pools.operator.machineconfiguration.openshift.io/master: ""
  nodeSelector:
    node-role.kubernetes.io/master: ""
  numa:
    topologyPolicy: restricted
  realTimeKernel:
    enabled: false
status:
  conditions:
  - lastHeartbeatTime: "2021-08-26T15:43:21Z"
    lastTransitionTime: "2021-08-26T15:43:21Z"
    status: "False"
    type: Available
  - lastHeartbeatTime: "2021-08-26T15:43:21Z"
    lastTransitionTime: "2021-08-26T15:43:21Z"
    status: "False"
    type: Upgradeable
  - lastHeartbeatTime: "2021-08-26T15:43:21Z"
    lastTransitionTime: "2021-08-26T15:43:21Z"
    status: "False"
    type: Progressing
  - lastHeartbeatTime: "2021-08-26T15:43:21Z"
    lastTransitionTime: "2021-08-26T15:43:21Z"
    message: |
      Tuned sno.kni-qe-1.lab.eng.rdu2.redhat.com Degraded Reason: TunedError.
      Tuned sno.kni-qe-1.lab.eng.rdu2.redhat.com Degraded Message: Tuned daemon issued one or more error message(s) during profile application..
      Tuned sno.kni-qe-1.lab.eng.rdu2.redhat.com Degraded Reason: TunedError.
      Tuned sno.kni-qe-1.lab.eng.rdu2.redhat.com Degraded Message: Tuned daemon issued one or more error message(s) during profile application..
    reason: TunedProfileDegraded
    status: "True"
    type: Degraded
  runtimeClass: performance-openshift-node-performance-profile
  tuned: openshift-cluster-node-tuning-operator/openshift-node-performance-openshift-node-performance-profile

Version-Release number of selected component (if applicable):
4.8.5

How reproducible:
100%

Steps to Reproduce:
1. Create a Tuned profile which includes configuration set by a performance profile which does not yet exist
2. Create the performance profile at a later time than step 1

Actual results:
Performance profile reports Tuned errors

Expected results:
Tuned configuration retries and succeeds once the performance profile is created

Additional info:
This issue has been observed while testing the DU ZTP flow where the profiles get created by ACM policies and there is no ordering in which resource gets created first.
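Until the retry behaviour described under "Expected results" is available, the ordering problem could in principle be sidestepped on the client side by waiting for the PerformanceProfile before creating the Tuned object. The following is only a rough sketch of that idea, not something taken from the ZTP flow or from the operator: the helper name wait_for is hypothetical, and it only checks that a command succeeds, not that the generated Tuned profile has actually been rendered.

```shell
#!/bin/sh
# Hypothetical helper (not part of oc or the Node Tuning Operator):
# run a command repeatedly until it succeeds or the attempts run out.
# usage: wait_for ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_for() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    # Success of the wrapped command ends the wait immediately.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    # Sleep only if another attempt will follow.
    if [ "$i" -le "$attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}
```

It might be used as: wait_for 30 10 oc get performanceprofile openshift-node-performance-profile && oc create -f performance-patch.yaml, where performance-patch.yaml is an assumed file holding the Tuned manifest shown above.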
Thank you for the report. Could you please provide either must-gather, or the output of:

$ oc get profile -n openshift-cluster-node-tuning-operator

and the logs from the Tuned container on the node that fails to apply the profile?
No need for must-gather or the output I asked for. I have a minimal reproducer for NTO.
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.nightly-2021-09-01-193941   True        False         3h45m   Cluster version is 4.9.0-0.nightly-2021-09-01-193941

$ node=$(oc get nodes | grep -m 1 worker | cut -f 1 -d ' ') && echo $node
pod=$(oc get pods -n openshift-cluster-node-tuning-operator -o wide | grep $node | cut -d ' ' -f 1) && echo $pod
ip-10-0-136-123.us-east-2.compute.internal
tuned-xsxrv

$ oc get routes -n openshift-console
NAME        HOST/PORT                                                                  PATH   SERVICES    PORT    TERMINATION          WILDCARD
console     console-openshift-console.apps.skordas92b.qe.devcluster.openshift.com            console     https   reencrypt/Redirect   None
downloads   downloads-openshift-console.apps.skordas92b.qe.devcluster.openshift.com          downloads   http    edge/Redirect        None

# Log in to the console
# Install Performance Addon Operator
# Operators -> Operator Hub -> Performance Addon Operator -> Install

$ oc get pods -n openshift-operators
NAME                                    READY   STATUS    RESTARTS   AGE
performance-operator-7fc5bcb7c9-4m67g   1/1     Running   0          91s

# Create tuned
oc create -f- <<EOF
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: performance-patch
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - data: |
      [main]
      summary=Configuration changes profile inherited from performance created tuned
      include=openshift-node-performance-profile
      [bootloader]
      cmdline_crash=nohz_full=2-23,26-47
      [sysctl]
      kernel.timer_migration=1
      [service]
      service.stalld=start,enable
    name: performance-patch
  recommend:
  - machineConfigLabels:
      machineconfiguration.openshift.io/role: master
    priority: 19
    profile: performance-patch
EOF

$ oc get tuned -n openshift-cluster-node-tuning-operator
NAME                AGE
default             4h31m
performance-patch   14s
rendered            4h31m

$ oc get profiles -n openshift-cluster-node-tuning-operator
NAME                                         TUNED               APPLIED   DEGRADED   AGE
ip-10-0-136-123.us-east-2.compute.internal   openshift-node      True      False      4h24m
ip-10-0-147-0.us-east-2.compute.internal     performance-patch   False     True       4h31m
ip-10-0-161-12.us-east-2.compute.internal    performance-patch   False     True       4h31m
ip-10-0-178-33.us-east-2.compute.internal    openshift-node      True      False      4h24m
ip-10-0-199-56.us-east-2.compute.internal    performance-patch   False     True       4h31m
ip-10-0-204-47.us-east-2.compute.internal    openshift-node      True      False      4h24m

# Create Performance profile
oc create -f- <<EOF
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  finalizers:
  - foreground-deletion
  name: openshift-node-performance-profile
spec:
  additionalKernelArgs:
  - idle=poll
  cpu:
    isolated: 2-23,26-47
    reserved: 0-1,24-25
  globallyDisableIrqLoadBalancing: true
  hugepages:
    defaultHugepagesSize: 1G
    pages:
    - count: 32
      size: 1G
  machineConfigPoolSelector:
    pools.operator.machineconfiguration.openshift.io/master: ""
  nodeSelector:
    node-role.kubernetes.io/master: ""
  numa:
    topologyPolicy: restricted
  realTimeKernel:
    enabled: false
EOF

$ oc get performanceprofiles.performance.openshift.io -n openshift-operators -o yaml
apiVersion: v1
items:
- apiVersion: performance.openshift.io/v2
  kind: PerformanceProfile
  metadata:
    creationTimestamp: "2021-09-02T15:54:19Z"
    finalizers:
    - foreground-deletion
    generation: 1
    name: openshift-node-performance-profile
    resourceVersion: "105104"
    uid: a227e6c2-8480-49c9-b7d6-619292d2f8eb
  spec:
    additionalKernelArgs:
    - idle=poll
    cpu:
      isolated: 2-23,26-47
      reserved: 0-1,24-25
    globallyDisableIrqLoadBalancing: true
    hugepages:
      defaultHugepagesSize: 1G
      pages:
      - count: 32
        size: 1G
    machineConfigPoolSelector:
      pools.operator.machineconfiguration.openshift.io/master: ""
    nodeSelector:
      node-role.kubernetes.io/master: ""
    numa:
      topologyPolicy: restricted
    realTimeKernel:
      enabled: false
  status:
    conditions:
    - lastHeartbeatTime: "2021-09-02T15:54:20Z"
      lastTransitionTime: "2021-09-02T15:54:20Z"
      status: "True"
      type: Available
    - lastHeartbeatTime: "2021-09-02T15:54:20Z"
      lastTransitionTime: "2021-09-02T15:54:20Z"
      status: "True"
      type: Upgradeable
    - lastHeartbeatTime: "2021-09-02T15:54:20Z"
      lastTransitionTime: "2021-09-02T15:54:20Z"
      status: "False"
      type: Progressing
    - lastHeartbeatTime: "2021-09-02T15:54:20Z"
      lastTransitionTime: "2021-09-02T15:54:20Z"
      status: "False"
      type: Degraded
    runtimeClass: performance-openshift-node-performance-profile
    tuned: openshift-cluster-node-tuning-operator/openshift-node-performance-openshift-node-performance-profile
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

No errors are reported after creating the PerformanceProfile after the Tuned profile.
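For anyone repeating this verification, the per-node state can be re-checked after the PerformanceProfile is created by filtering the oc get profiles table for degraded rows. This is a small sketch only: the function name degraded_profiles is hypothetical, and it assumes the column layout shown in the oc get profiles output above (NAME TUNED APPLIED DEGRADED AGE), which may differ in other releases.

```shell
#!/bin/sh
# Hypothetical helper: reads the table printed by
#   oc get profiles -n openshift-cluster-node-tuning-operator
# on stdin and prints the NAME of every row whose DEGRADED column is "True".
# Assumes DEGRADED is the 4th whitespace-separated field, as in the output above.
degraded_profiles() {
  # NR > 1 skips the header row.
  awk 'NR > 1 && $4 == "True" { print $1 }'
}
```

Possible usage: oc get profiles -n openshift-cluster-node-tuning-operator | degraded_profiles. An empty result would mean no profile in the table is reported as degraded.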
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:3759