Bug 1998247
| Summary: | Tuned configuration fails and does not recover when profile references a not yet existing performance profile configuration | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Marius Cornea <mcornea> |
| Component: | Node Tuning Operator | Assignee: | Jiří Mencák <jmencak> |
| Status: | CLOSED ERRATA | QA Contact: | Simon <skordas> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 4.8 | CC: | aos-bugs, dagray, imiller |
| Target Milestone: | --- | | |
| Target Release: | 4.9.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1999608 | Environment: | |
| Last Closed: | 2021-10-18 17:49:21 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1999608 | | |
Thank you for the report. Could you please provide either a must-gather or the output of `$ oc get profile -n openshift-cluster-node-tuning-operator` and the logs from the Tuned container on the node that fails to apply the profile?

No need for the must-gather or the output I asked for. I have a minimal reproducer for NTO.

```
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.nightly-2021-09-01-193941   True        False         3h45m   Cluster version is 4.9.0-0.nightly-2021-09-01-193941
$ node=$(oc get nodes | grep -m 1 worker | cut -f 1 -d ' ') && echo $node
pod=$(oc get pods -n openshift-cluster-node-tuning-operator -o wide | grep $node | cut -d ' ' -f 1) && echo $pod
ip-10-0-136-123.us-east-2.compute.internal
tuned-xsxrv
$ oc get routes -n openshift-console
NAME        HOST/PORT                                                                 PATH   SERVICES    PORT    TERMINATION          WILDCARD
console     console-openshift-console.apps.skordas92b.qe.devcluster.openshift.com            console     https   reencrypt/Redirect   None
downloads   downloads-openshift-console.apps.skordas92b.qe.devcluster.openshift.com          downloads   http    edge/Redirect        None
# Log in to the console
# Install Performance Addon Operator
# Operators -> Operator Hub -> Performance Addon Operator -> Install
$ oc get pods -n openshift-operators
NAME                                    READY   STATUS    RESTARTS   AGE
performance-operator-7fc5bcb7c9-4m67g   1/1     Running   0          91s
# Create tuned
oc create -f- <<EOF
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: performance-patch
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - data: |
      [main]
      summary=Configuration changes profile inherited from performance created tuned
      include=openshift-node-performance-profile
      [bootloader]
      cmdline_crash=nohz_full=2-23,26-47
      [sysctl]
      kernel.timer_migration=1
      [service]
      service.stalld=start,enable
    name: performance-patch
  recommend:
  - machineConfigLabels:
      machineconfiguration.openshift.io/role: master
    priority: 19
    profile: performance-patch
EOF
```
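The performance-patch Tuned inherits from its parent through the `include=` option in `[main]`. The failure in this report depends on that parent not existing yet. The Python sketch below is a minimal illustration. It extracts the include list from the profile text and reports which parents are missing from a given set of profile names. The function names and the `available` sets are hypothetical. Treating several parents as a comma-separated list is an assumption of this sketch, not something the report states.

```python
import configparser

# Profile text copied from the performance-patch Tuned above.
PATCH_PROFILE = """\
[main]
summary=Configuration changes profile inherited from performance created tuned
include=openshift-node-performance-profile
[bootloader]
cmdline_crash=nohz_full=2-23,26-47
[sysctl]
kernel.timer_migration=1
[service]
service.stalld=start,enable
"""


def included_profiles(profile_text: str) -> list[str]:
    """Return the parent profile names listed in the [main] include= option."""
    # interpolation=None stops '%' in values from being treated as interpolation syntax.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(profile_text)
    raw = parser.get("main", "include", fallback="")
    # Assumption: several parents would be written as a comma-separated list.
    return [name.strip() for name in raw.split(",") if name.strip()]


def missing_includes(profile_text: str, available: set[str]) -> list[str]:
    """Return the included parents that are not present in `available`."""
    return [name for name in included_profiles(profile_text) if name not in available]


if __name__ == "__main__":
    # Hypothetical set of profile names before any PerformanceProfile exists.
    before = {"openshift", "openshift-node", "openshift-control-plane"}
    print(missing_includes(PATCH_PROFILE, before))
    # Prints ['openshift-node-performance-profile']
```

`configparser` splits each option at the first `=`. As a result, `cmdline_crash=nohz_full=2-23,26-47` parses as the key `cmdline_crash` with the value `nohz_full=2-23,26-47`.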
```
$ oc get tuned -n openshift-cluster-node-tuning-operator
NAME                AGE
default             4h31m
performance-patch   14s
rendered            4h31m
$ oc get profiles -n openshift-cluster-node-tuning-operator
NAME                                         TUNED               APPLIED   DEGRADED   AGE
ip-10-0-136-123.us-east-2.compute.internal   openshift-node      True      False      4h24m
ip-10-0-147-0.us-east-2.compute.internal     performance-patch   False     True       4h31m
ip-10-0-161-12.us-east-2.compute.internal    performance-patch   False     True       4h31m
ip-10-0-178-33.us-east-2.compute.internal    openshift-node      True      False      4h24m
ip-10-0-199-56.us-east-2.compute.internal    performance-patch   False     True       4h31m
ip-10-0-204-47.us-east-2.compute.internal    openshift-node      True      False      4h24m
```
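In this listing, only the nodes that received performance-patch are degraded. The nodes on openshift-node applied their profile cleanly. The Python sketch below finds such nodes without reading the table by eye: it parses the plain-text output of `oc get profiles` into records and lists the degraded ones. It is a minimal sketch. It assumes the column layout shown above and cells that contain no whitespace. The record type and function names are hypothetical. A machine-readable output format would be sturdier, but this sketch deliberately does not rely on the Profile object's schema.

```python
from typing import NamedTuple

# Output of `oc get profiles` copied from the transcript above.
SAMPLE = """\
NAME TUNED APPLIED DEGRADED AGE
ip-10-0-136-123.us-east-2.compute.internal openshift-node True False 4h24m
ip-10-0-147-0.us-east-2.compute.internal performance-patch False True 4h31m
ip-10-0-161-12.us-east-2.compute.internal performance-patch False True 4h31m
ip-10-0-178-33.us-east-2.compute.internal openshift-node True False 4h24m
ip-10-0-199-56.us-east-2.compute.internal performance-patch False True 4h31m
ip-10-0-204-47.us-east-2.compute.internal openshift-node True False 4h24m
"""


class ProfileStatus(NamedTuple):
    node: str
    tuned: str
    applied: bool
    degraded: bool


def parse_profiles(output: str) -> list[ProfileStatus]:
    """Parse the default table printed by `oc get profiles`.

    Assumes the NAME, TUNED, APPLIED and DEGRADED columns shown above and
    that no cell contains whitespace.
    """
    lines = [line for line in output.splitlines() if line.strip()]
    # Map each header name to its column index so column order does not matter.
    col = {name: i for i, name in enumerate(lines[0].split())}
    rows = []
    for line in lines[1:]:
        cells = line.split()
        rows.append(
            ProfileStatus(
                node=cells[col["NAME"]],
                tuned=cells[col["TUNED"]],
                applied=cells[col["APPLIED"]] == "True",
                degraded=cells[col["DEGRADED"]] == "True",
            )
        )
    return rows


def degraded_nodes(output: str) -> list[str]:
    """Names of nodes whose Tuned profile is reported as degraded."""
    return [p.node for p in parse_profiles(output) if p.degraded]


if __name__ == "__main__":
    for node in degraded_nodes(SAMPLE):
        print(node)
```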
```
# Create performance profile
oc create -f- <<EOF
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  finalizers:
  - foreground-deletion
  name: openshift-node-performance-profile
spec:
  additionalKernelArgs:
  - idle=poll
  cpu:
    isolated: 2-23,26-47
    reserved: 0-1,24-25
  globallyDisableIrqLoadBalancing: true
  hugepages:
    defaultHugepagesSize: 1G
    pages:
    - count: 32
      size: 1G
  machineConfigPoolSelector:
    pools.operator.machineconfiguration.openshift.io/master: ""
  nodeSelector:
    node-role.kubernetes.io/master: ""
  numa:
    topologyPolicy: restricted
  realTimeKernel:
    enabled: false
EOF
```
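In the PerformanceProfile, `cpu.isolated` and `cpu.reserved` use Linux CPU-list notation. The performance-patch Tuned repeats the isolated set in `cmdline_crash=nohz_full=2-23,26-47`. These values are easy to get out of sync when edited by hand. The Python sketch below expands the lists and checks two things: that isolated and reserved do not overlap, and that `nohz_full` equals the isolated set. Treating a mismatch as a problem is an assumption of this sketch, not a rule enforced by PAO or NTO. The function names are hypothetical.

```python
def parse_cpu_list(spec: str) -> set[int]:
    """Expand a CPU list such as "2-23,26-47" into a set of CPU ids.

    Handles single ids and inclusive a-b ranges only; other kernel cpulist
    forms (for example strides) are not supported in this sketch.
    """
    cpus: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            if start > end:
                raise ValueError(f"descending range: {part!r}")
            cpus.update(range(start, end + 1))
        else:
            cpus.add(int(part))
    return cpus


def layout_problems(isolated: str, reserved: str, nohz_full: str) -> list[str]:
    """Return human-readable problems with the CPU partitioning (empty if none)."""
    iso = parse_cpu_list(isolated)
    res = parse_cpu_list(reserved)
    nohz = parse_cpu_list(nohz_full)
    problems = []
    if iso & res:
        problems.append(f"isolated and reserved overlap on CPUs {sorted(iso & res)}")
    if nohz != iso:
        problems.append("nohz_full does not match the isolated set")
    return problems


if __name__ == "__main__":
    # Values from spec.cpu in the PerformanceProfile and cmdline_crash in performance-patch.
    print(layout_problems("2-23,26-47", "0-1,24-25", "2-23,26-47"))  # []
```

With the values in this report, the isolated set has 44 CPUs and the reserved set has 4. The two sets are disjoint and together cover CPUs 0-47.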
```
$ oc get performanceprofiles.performance.openshift.io -n openshift-operators -o yaml
apiVersion: v1
items:
- apiVersion: performance.openshift.io/v2
  kind: PerformanceProfile
  metadata:
    creationTimestamp: "2021-09-02T15:54:19Z"
    finalizers:
    - foreground-deletion
    generation: 1
    name: openshift-node-performance-profile
    resourceVersion: "105104"
    uid: a227e6c2-8480-49c9-b7d6-619292d2f8eb
  spec:
    additionalKernelArgs:
    - idle=poll
    cpu:
      isolated: 2-23,26-47
      reserved: 0-1,24-25
    globallyDisableIrqLoadBalancing: true
    hugepages:
      defaultHugepagesSize: 1G
      pages:
      - count: 32
        size: 1G
    machineConfigPoolSelector:
      pools.operator.machineconfiguration.openshift.io/master: ""
    nodeSelector:
      node-role.kubernetes.io/master: ""
    numa:
      topologyPolicy: restricted
    realTimeKernel:
      enabled: false
  status:
    conditions:
    - lastHeartbeatTime: "2021-09-02T15:54:20Z"
      lastTransitionTime: "2021-09-02T15:54:20Z"
      status: "True"
      type: Available
    - lastHeartbeatTime: "2021-09-02T15:54:20Z"
      lastTransitionTime: "2021-09-02T15:54:20Z"
      status: "True"
      type: Upgradeable
    - lastHeartbeatTime: "2021-09-02T15:54:20Z"
      lastTransitionTime: "2021-09-02T15:54:20Z"
      status: "False"
      type: Progressing
    - lastHeartbeatTime: "2021-09-02T15:54:20Z"
      lastTransitionTime: "2021-09-02T15:54:20Z"
      status: "False"
      type: Degraded
    runtimeClass: performance-openshift-node-performance-profile
    tuned: openshift-cluster-node-tuning-operator/openshift-node-performance-openshift-node-performance-profile
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
```
No errors after applying the PerformanceProfile after the Tuned.
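Both PerformanceProfile status blocks on this page carry the same four condition types: Available, Upgradeable, Progressing and Degraded. In both, `status.tuned` names the rendered Tuned as `<namespace>/<name>`. The Python sketch below encodes the condition lists from the verification run and from the bug description as plain dictionaries and classifies them. Timestamps and messages are omitted. The health rule is an assumption made for illustration, not PAO's own definition: Available must be "True" and Degraded must not be "True". The helper names are hypothetical.

```python
from typing import Optional

Condition = dict[str, str]

# Condition types and statuses copied from the verification run (2021-09-02).
VERIFIED: list[Condition] = [
    {"type": "Available", "status": "True"},
    {"type": "Upgradeable", "status": "True"},
    {"type": "Progressing", "status": "False"},
    {"type": "Degraded", "status": "False"},
]

# Condition types and statuses copied from the bug description (2021-08-26).
REPORTED: list[Condition] = [
    {"type": "Available", "status": "False"},
    {"type": "Upgradeable", "status": "False"},
    {"type": "Progressing", "status": "False"},
    {"type": "Degraded", "status": "True", "reason": "TunedProfileDegraded"},
]


def find_condition(conditions: list[Condition], cond_type: str) -> Optional[Condition]:
    """Return the first condition with the given type, or None if absent."""
    return next((c for c in conditions if c.get("type") == cond_type), None)


def is_healthy(conditions: list[Condition]) -> bool:
    """Assumed rule: Available is "True" and Degraded is not "True"."""
    available = find_condition(conditions, "Available")
    degraded = find_condition(conditions, "Degraded")
    return (
        available is not None
        and available.get("status") == "True"
        and (degraded is None or degraded.get("status") != "True")
    )


def split_tuned_ref(ref: str) -> tuple[str, str]:
    """Split a status.tuned value of the form "<namespace>/<name>"."""
    namespace, _, name = ref.partition("/")
    return namespace, name


if __name__ == "__main__":
    print(is_healthy(VERIFIED), is_healthy(REPORTED))  # True False
    print(split_tuned_ref(
        "openshift-cluster-node-tuning-operator/"
        "openshift-node-performance-openshift-node-performance-profile"
    ))
```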
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759
Description of problem:
The following Tuned profile was created at 2021-08-26T15:16:49Z. It includes a configuration (include=openshift-node-performance-profile) that a PerformanceProfile creates at a later time, 2021-08-26T15:25:04Z. After the PerformanceProfile is created, the Tuned configuration still does not get applied, and the performance profile reports a TunedError. I'd expect that, once the performance profile is created, the performance-patch Tuned profile that includes it would continue its configuration.

```
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  creationTimestamp: "2021-08-26T15:16:49Z"
  generation: 1
  name: performance-patch
  namespace: openshift-cluster-node-tuning-operator
  resourceVersion: "25666"
  uid: 99e9a0ec-d9dc-4f7e-a515-6ae5b2b2047b
spec:
  profile:
  - data: |
      [main]
      summary=Configuration changes profile inherited from performance created tuned
      include=openshift-node-performance-profile
      [bootloader]
      cmdline_crash=nohz_full=2-23,26-47
      [sysctl]
      kernel.timer_migration=1
      [service]
      service.stalld=start,enable
    name: performance-patch
  recommend:
  - machineConfigLabels:
      machineconfiguration.openshift.io/role: master
    priority: 19
    profile: performance-patch
```

```
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  creationTimestamp: "2021-08-26T15:25:04Z"
  finalizers:
  - foreground-deletion
  generation: 1
  name: openshift-node-performance-profile
  resourceVersion: "36276"
  uid: bf81e817-6347-4393-afff-6ee1850e09e8
spec:
  additionalKernelArgs:
  - idle=poll
  cpu:
    isolated: 2-23,26-47
    reserved: 0-1,24-25
  globallyDisableIrqLoadBalancing: true
  hugepages:
    defaultHugepagesSize: 1G
    pages:
    - count: 32
      size: 1G
  machineConfigPoolSelector:
    pools.operator.machineconfiguration.openshift.io/master: ""
  nodeSelector:
    node-role.kubernetes.io/master: ""
  numa:
    topologyPolicy: restricted
  realTimeKernel:
    enabled: false
status:
  conditions:
  - lastHeartbeatTime: "2021-08-26T15:43:21Z"
    lastTransitionTime: "2021-08-26T15:43:21Z"
    status: "False"
    type: Available
  - lastHeartbeatTime: "2021-08-26T15:43:21Z"
    lastTransitionTime: "2021-08-26T15:43:21Z"
    status: "False"
    type: Upgradeable
  - lastHeartbeatTime: "2021-08-26T15:43:21Z"
    lastTransitionTime: "2021-08-26T15:43:21Z"
    status: "False"
    type: Progressing
  - lastHeartbeatTime: "2021-08-26T15:43:21Z"
    lastTransitionTime: "2021-08-26T15:43:21Z"
    message: |
      Tuned sno.kni-qe-1.lab.eng.rdu2.redhat.com Degraded Reason: TunedError.
      Tuned sno.kni-qe-1.lab.eng.rdu2.redhat.com Degraded Message: Tuned daemon issued one or more error message(s) during profile application..
      Tuned sno.kni-qe-1.lab.eng.rdu2.redhat.com Degraded Reason: TunedError.
      Tuned sno.kni-qe-1.lab.eng.rdu2.redhat.com Degraded Message: Tuned daemon issued one or more error message(s) during profile application..
    reason: TunedProfileDegraded
    status: "True"
    type: Degraded
  runtimeClass: performance-openshift-node-performance-profile
  tuned: openshift-cluster-node-tuning-operator/openshift-node-performance-openshift-node-performance-profile
```

Version-Release number of selected component (if applicable):
4.8.5

How reproducible:
100%

Steps to Reproduce:
1. Create a Tuned profile that includes configuration set by a performance profile that does not exist yet.
2. Create the performance profile at a later time than step 1.

Actual results:
The performance profile reports Tuned errors.

Expected results:
The Tuned configuration retries and succeeds once the performance profile is created.

Additional info:
This issue was observed while testing the DU ZTP flow, where the profiles are created by ACM policies and there is no ordering that determines which resource gets created first.
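The expected result above is that the Tuned configuration retries and succeeds once the performance profile is created. The report does not describe how NTO implements the fix. The Python sketch below therefore only models that expected behavior: a check-and-retry loop with capped exponential backoff that applies a profile once all of its included parents exist. The function name, its parameters and the backoff values are hypothetical. They are not NTO or TuneD APIs.

```python
import time
from typing import Callable, Iterable


def apply_when_includes_exist(
    includes: Iterable[str],
    available_profiles: Callable[[], set[str]],
    apply: Callable[[], None],
    max_attempts: int = 10,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Wait until every included profile exists, then call `apply` once.

    Returns the number of attempts used; raises TimeoutError if the
    includes are still missing after `max_attempts` checks.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    needed = set(includes)
    missing = needed
    for attempt in range(1, max_attempts + 1):
        missing = needed - available_profiles()
        if not missing:
            apply()
            return attempt
        if attempt < max_attempts:
            # Exponential backoff between checks, capped at max_delay.
            sleep(min(base_delay * 2 ** (attempt - 1), max_delay))
    raise TimeoutError(f"still missing after {max_attempts} attempts: {sorted(missing)}")


if __name__ == "__main__":
    # Hypothetical timeline: the included profile appears on the third check.
    timeline = iter([
        {"openshift-node"},
        {"openshift-node"},
        {"openshift-node", "openshift-node-performance-profile"},
    ])
    attempts = apply_when_includes_exist(
        ["openshift-node-performance-profile"],
        available_profiles=lambda: next(timeline),
        apply=lambda: print("applying performance-patch"),
        sleep=lambda seconds: None,  # skip real waiting in this demo
    )
    print(f"applied after {attempts} attempts")
```

The `sleep` parameter is injectable so that the retry schedule can be exercised without waiting. The point of the sketch is that creation order stops mattering: a retry only has to observe the included profile eventually, whichever order the ACM policies create the resources in.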