Description of problem:
Removing the 'worker-cnf' label from a worker node should remove the RT kernel from that node, reverting it to its previous state.

Version-Release number of selected component (if applicable):
v4.4.0-77

How reproducible:
Not consistently reproducible, but we hit this issue twice during our nightly test execution.

Steps to Reproduce:
1. Install OCP 4.4.7.
2. Label a worker node with 'worker-cnf'.
3. Deploy the Performance Addon Operator and a PerformanceProfile with 'realTimeKernel: {enabled: true}'.
4. Ensure that the RT kernel is installed on the 'worker-cnf' labeled worker node.
5. Remove the 'worker-cnf' label.
6. Verify that the RT kernel is reverted back to the non-RT kernel.

Actual results:
The RT kernel is still installed.

Expected results:
The RT kernel is reverted back to the non-RT kernel.

Additional info:
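The RT-kernel checks in the reproduction steps can be scripted by looking for the '.rt' marker that RHEL real-time kernels carry in their version string (a minimal sketch; the helper name is ours, and the sample version strings are taken from the node output in this bug):

```python
def is_rt_kernel(kernel_version):
    """RHEL real-time kernels carry an '.rtNN.' tag in their version string."""
    return ".rt" in kernel_version

# Version strings as reported by `oc get node -o wide` below:
assert is_rt_kernel("4.18.0-147.8.1.rt24.101.el8_1.x86_64")   # RT kernel still installed
assert not is_rt_kernel("4.18.0-147.8.1.el8_1.x86_64")        # stock kernel
```

Against a live cluster the same check would run over `kernelVersion` from `oc get node -o jsonpath='{.status.nodeInfo.kernelVersion}'`.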
*Note: The following information is from a _similar_ cluster where the issue could not be reproduced.*

# oc get node -o wide
NAME       STATUS   ROLES               AGE    VERSION           INTERNAL-IP      EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION                         CONTAINER-RUNTIME
master-0   Ready    master              158m   v1.17.1+3f6f40d   192.168.111.20   <none>        Red Hat Enterprise Linux CoreOS 44.81.202006080130-0 (Ootpa)   4.18.0-147.8.1.el8_1.x86_64            cri-o://1.17.4-14.dev.rhaos4.4.gitb93af5d.el8
master-1   Ready    master              158m   v1.17.1+3f6f40d   192.168.111.21   <none>        Red Hat Enterprise Linux CoreOS 44.81.202006080130-0 (Ootpa)   4.18.0-147.8.1.el8_1.x86_64            cri-o://1.17.4-14.dev.rhaos4.4.gitb93af5d.el8
master-2   Ready    master              158m   v1.17.1+3f6f40d   192.168.111.22   <none>        Red Hat Enterprise Linux CoreOS 44.81.202006080130-0 (Ootpa)   4.18.0-147.8.1.el8_1.x86_64            cri-o://1.17.4-14.dev.rhaos4.4.gitb93af5d.el8
worker-0   Ready    worker,worker-cnf   137m   v1.17.1+3f6f40d   192.168.111.23   <none>        Red Hat Enterprise Linux CoreOS 44.81.202006080130-0 (Ootpa)   4.18.0-147.8.1.rt24.101.el8_1.x86_64   cri-o://1.17.4-14.dev.rhaos4.4.gitb93af5d.el8
worker-1   Ready    worker              138m   v1.17.1+3f6f40d   192.168.111.24   <none>        Red Hat Enterprise Linux CoreOS 44.81.202006080130-0 (Ootpa)   4.18.0-147.8.1.el8_1.x86_64            cri-o://1.17.4-14.dev.rhaos4.4.gitb93af5d.el8
worker-2   Ready    worker              138m   v1.17.1+3f6f40d   192.168.111.25   <none>        Red Hat Enterprise Linux CoreOS 44.81.202006080130-0 (Ootpa)   4.18.0-147.8.1.el8_1.x86_64            cri-o://1.17.4-14.dev.rhaos4.4.gitb93af5d.el8

# oc describe performanceprofiles.performance.openshift.io performance
Name:         performance
Namespace:
Labels:       <none>
Annotations:  kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"performance.openshift.io/v1alpha1","kind":"PerformanceProfile","metadata":{"annotations":{},"name":"performance"},"spec":{"...
API Version:  performance.openshift.io/v1alpha1
Kind:         PerformanceProfile
Metadata:
  Creation Timestamp:  2020-06-09T09:36:40Z
  Finalizers:
    foreground-deletion
  Generation:        7
  Resource Version:  91146
  Self Link:         /apis/performance.openshift.io/v1alpha1/performanceprofiles/performance
  UID:               a2bc52cc-9b13-4904-a47e-9f7a592daf5b
Spec:
  Cpu:
    Isolated:  1-3
    Reserved:  0
  Hugepages:
    Default Hugepages Size:  1G
    Pages:
      Count:  1
      Size:   1G
  Node Selector:
    node-role.kubernetes.io/worker-cnf:
  Real Time Kernel:
    Enabled:  true
Status:
  Conditions:
    Last Heartbeat Time:   2020-06-09T10:08:01Z
    Last Transition Time:  2020-06-09T10:08:01Z
    Status:                True
    Type:                  Available
    Last Heartbeat Time:   2020-06-09T10:08:01Z
    Last Transition Time:  2020-06-09T10:08:01Z
    Status:                True
    Type:                  Upgradeable
    Last Heartbeat Time:   2020-06-09T10:08:01Z
    Last Transition Time:  2020-06-09T10:08:01Z
    Status:                False
    Type:                  Progressing
    Last Heartbeat Time:   2020-06-09T10:08:01Z
    Last Transition Time:  2020-06-09T10:08:01Z
    Status:                False
    Type:                  Degraded
Events:
  Type     Reason              Age                From                            Message
  ----     ------              ----               ----                            -------
  Normal   Creation succeeded  87m                performance-profile-controller  Succeeded to create all components
  Warning  Creation failed     56m                performance-profile-controller  Failed to create all components: Operation cannot be fulfilled on machineconfigs.machineconfiguration.openshift.io "performance-performance": the object has been modified; please apply your changes to the latest version and try again
  Normal   Creation succeeded  10m (x5 over 69m)  performance-profile-controller  Succeeded to create all components

# oc get mcp
NAME         CONFIG                                                 UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master       rendered-master-0112f787eaed5267898b3988aec444eb       True      False      False      3              3                   3                     0                      150m
worker       rendered-worker-2e3a4df0240eb8745b2845521a1f28f8       True      False      False      2              2                   2                     0                      150m
worker-cnf   rendered-worker-cnf-5c311bbe9c9f762497d1753a8fb27536   True      False      False      1              1                   1                     0                      97m

# oc describe mcp worker-cnf
Name:         worker-cnf
Namespace:
Labels: machineconfiguration.openshift.io/role=worker-cnf Annotations: kubectl.kubernetes.io/last-applied-configuration: {"apiVersion":"machineconfiguration.openshift.io/v1","kind":"MachineConfigPool","metadata":{"annotations":{},"labels":{"machineconfigurati... API Version: machineconfiguration.openshift.io/v1 Kind: MachineConfigPool Metadata: Creation Timestamp: 2020-06-09T09:29:14Z Generation: 6 Resource Version: 79333 Self Link: /apis/machineconfiguration.openshift.io/v1/machineconfigpools/worker-cnf UID: 68b55a8e-1d12-4e8f-b57d-86cea701f80d Spec: Configuration: Name: rendered-worker-cnf-5c311bbe9c9f762497d1753a8fb27536 Source: API Version: machineconfiguration.openshift.io/v1 Kind: MachineConfig Name: 00-worker API Version: machineconfiguration.openshift.io/v1 Kind: MachineConfig Name: 00-worker-chronyd-custom API Version: machineconfiguration.openshift.io/v1 Kind: MachineConfig Name: 01-worker-container-runtime API Version: machineconfiguration.openshift.io/v1 Kind: MachineConfig Name: 01-worker-kubelet API Version: machineconfiguration.openshift.io/v1 Kind: MachineConfig Name: 98-worker-92ef12b3-6b66-4348-b91f-f49b08f49348-kubelet API Version: machineconfiguration.openshift.io/v1 Kind: MachineConfig Name: 98-worker-cnf-68b55a8e-1d12-4e8f-b57d-86cea701f80d-kubelet API Version: machineconfiguration.openshift.io/v1 Kind: MachineConfig Name: 99-worker-92ef12b3-6b66-4348-b91f-f49b08f49348-registries API Version: machineconfiguration.openshift.io/v1 Kind: MachineConfig Name: 99-worker-cnf-68b55a8e-1d12-4e8f-b57d-86cea701f80d-kubelet API Version: machineconfiguration.openshift.io/v1 Kind: MachineConfig Name: 99-worker-registries API Version: machineconfiguration.openshift.io/v1 Kind: MachineConfig Name: 99-worker-ssh API Version: machineconfiguration.openshift.io/v1 Kind: MachineConfig Name: performance-performance Machine Config Selector: Match Expressions: Key: machineconfiguration.openshift.io/role Operator: In Values: worker-cnf worker Node Selector: Match 
Labels: node-role.kubernetes.io/worker-cnf: Paused: false # oc describe mc performance-performance Name: performance-performance Namespace: Labels: machineconfiguration.openshift.io/role=worker-cnf Annotations: <none> API Version: machineconfiguration.openshift.io/v1 Kind: MachineConfig Metadata: Creation Timestamp: 2020-06-09T09:36:49Z Generation: 4 Owner References: API Version: performance.openshift.io/v1alpha1 Block Owner Deletion: true Controller: true Kind: PerformanceProfile Name: performance UID: a2bc52cc-9b13-4904-a47e-9f7a592daf5b Resource Version: 91150 Self Link: /apis/machineconfiguration.openshift.io/v1/machineconfigs/performance-performance UID: 841ae0a3-4762-4602-900f-10e988fcb14e Spec: Config: Ignition: Config: Security: Tls: Timeouts: Version: 2.2.0 Networkd: Passwd: Storage: Files: Contents: Source: data:text/plain;charset=utf-8;base64,IyEvdXNyL2Jpbi9lbnYgYmFzaAoKc2V0IC1ldW8gcGlwZWZhaWwKClNZU1RFTV9DT05GSUdfRklMRT0iL2V0Yy9zeXN0ZW1kL3N5c3RlbS5jb25mIgpTWVNURU1fQ09ORklHX0NVU1RPTV9GSUxFPSIvZXRjL3N5c3RlbWQvc3lzdGVtLmNvbmYuZC9zZXRBZmZpbml0eS5jb25mIgoKaWYgWyAtZiAvZXRjL3N5c2NvbmZpZy9pcnFiYWxhbmNlIF0gJiYgWyAtZiAke1NZU1RFTV9DT05GSUdfQ1VTVE9NX0ZJTEV9IF0gICYmIHJwbS1vc3RyZWUgc3RhdHVzIC1iIHwgZ3JlcCAtcSAtZSAiJHtTWVNURU1fQ09ORklHX0ZJTEV9ICR7U1lTVEVNX0NPTkZJR19DVVNUT01fRklMRX0iICYmIGVncmVwIC13cSAiXklSUUJBTEFOQ0VfQkFOTkVEX0NQVVM9JHtSRVNFUlZFRF9DUFVfTUFTS19JTlZFUlR9IiAvZXRjL3N5c2NvbmZpZy9pcnFiYWxhbmNlOyB0aGVuCiAgICBlY2hvICJQcmUgYm9vdCB0dW5pbmcgY29uZmlndXJhdGlvbiBhbHJlYWR5IGFwcGxpZWQiCmVsc2UKICAgICNTZXQgSVJRIGJhbGFuY2UgYmFubmVkIGNwdXMKICAgIGlmIFsgISAtZiAvZXRjL3N5c2NvbmZpZy9pcnFiYWxhbmNlIF07IHRoZW4KICAgICAgICB0b3VjaCAvZXRjL3N5c2NvbmZpZy9pcnFiYWxhbmNlCiAgICBmaQoKICAgIGlmIGdyZXAgLWxzICJJUlFCQUxBTkNFX0JBTk5FRF9DUFVTPSIgL2V0Yy9zeXNjb25maWcvaXJxYmFsYW5jZTsgdGhlbgogICAgICAgIHNlZCAtaSAicy9eLipJUlFCQUxBTkNFX0JBTk5FRF9DUFVTPS4qJC9JUlFCQUxBTkNFX0JBTk5FRF9DUFVTPSR7UkVTRVJWRURfQ1BVX01BU0tfSU5WRVJUfS8iIC9ldGMvc3lzY29uZmlnL2lycWJhbGFuY2UKICAgIGVsc2UKICAgICAgICBlY2hvICJJUlFCQUxBTkNFX
0JBTk5FRF9DUFVTPSR7UkVTRVJWRURfQ1BVX01BU0tfSU5WRVJUfSIgPj4vZXRjL3N5c2NvbmZpZy9pcnFiYWxhbmNlCiAgICBmaQoKICAgIHJwbS1vc3RyZWUgaW5pdHJhbWZzIC0tZW5hYmxlIC0tYXJnPS1JIC0tYXJnPSIke1NZU1RFTV9DT05GSUdfRklMRX0gJHtTWVNURU1fQ09ORklHX0NVU1RPTV9GSUxFfSIgCgogICAgdG91Y2ggL3Zhci9yZWJvb3QKZmkK Verification: Filesystem: root Mode: 448 Path: /usr/local/bin/pre-boot-tuning.sh Contents: Source: data:text/plain;charset=utf-8;base64,IyEvdXNyL2Jpbi9lbnYgYmFzaAoKc2V0IC1ldW8gcGlwZWZhaWwKCm5vZGVzX3BhdGg9Ii9zeXMvZGV2aWNlcy9zeXN0ZW0vbm9kZSIKaHVnZXBhZ2VzX2ZpbGU9IiR7bm9kZXNfcGF0aH0vbm9kZSR7TlVNQV9OT0RFfS9odWdlcGFnZXMvaHVnZXBhZ2VzLSR7SFVHRVBBR0VTX1NJWkV9a0IvbnJfaHVnZXBhZ2VzIgoKaWYgWyAhIC1mICAke2h1Z2VwYWdlc19maWxlfSBdOyB0aGVuCiAgICBlY2hvICJFUlJPUjogJHtodWdlcGFnZXNfZmlsZX0gZG9lcyBub3QgZXhpc3QiCiAgICBleGl0IDEKZmkKCmVjaG8gJHtIVUdFUEFHRVNfQ09VTlR9ID4gJHtodWdlcGFnZXNfZmlsZX0K Verification: Filesystem: root Mode: 448 Path: /usr/local/bin/hugepages-allocation.sh Contents: Source: data:text/plain;charset=utf-8;base64,IyEvdXNyL2Jpbi9lbnYgYmFzaAoKc2V0IC1ldW8gcGlwZWZhaWwKCmlmIFtbIC1mIC92YXIvcmVib290IF1dOyB0aGVuIAogICAgcm0gLWYgL3Zhci9yZWJvb3QKICAgIGVjaG8gIkZpbGUgL3Zhci9yZWJvb3QgZXhpc3RzLCBpbml0aWF0ZSByZWJvb3QiCiAgICBzeXN0ZW1jdGwgcmVib290CmZpCg== Verification: Filesystem: root Mode: 448 Path: /usr/local/bin/reboot.sh Contents: Source: data:text/plain;charset=utf-8;base64,W01hbmFnZXJdCkNQVUFmZmluaXR5PTA= Verification: Filesystem: root Mode: 448 Path: /etc/systemd/system.conf.d/setAffinity.conf Systemd: Units: Contents: [Unit] Description=Preboot tuning patch Before=kubelet.service Before=reboot.service [Service] Environment=RESERVED_CPUS=0 Environment=RESERVED_CPU_MASK_INVERT=ffffffff,fffffffe Type=oneshot RemainAfterExit=true ExecStart=/usr/local/bin/pre-boot-tuning.sh [Install] WantedBy=multi-user.target Enabled: true Name: pre-boot-tuning.service Contents: [Unit] Description=Reboot initiated by pre-boot-tuning Wants=network-online.target After=network-online.target Before=kubelet.service [Service] Type=oneshot 
RemainAfterExit=true ExecStart=/usr/local/bin/reboot.sh [Install] WantedBy=multi-user.target Enabled: true Name: reboot.service Fips: false Kernel Arguments: nohz=on nosoftlockup skew_tick=1 intel_pstate=disable intel_iommu=on iommu=pt rcu_nocbs=1-3 tuned.non_isolcpus=00000001 default_hugepagesz=1G hugepagesz=1G hugepages=1 Kernel Type: realtime Os Image URL: Events: <none> # oc get node worker-1 -o yaml apiVersion: v1 kind: Node metadata: annotations: machine.openshift.io/machine: openshift-machine-api/ostest-worker-0-f78ck machineconfiguration.openshift.io/currentConfig: rendered-worker-2e3a4df0240eb8745b2845521a1f28f8 machineconfiguration.openshift.io/desiredConfig: rendered-worker-2e3a4df0240eb8745b2845521a1f28f8 machineconfiguration.openshift.io/reason: "" machineconfiguration.openshift.io/state: Done volumes.kubernetes.io/controller-managed-attach-detach: "true" creationTimestamp: "2020-06-09T08:56:29Z" labels: beta.kubernetes.io/arch: amd64 beta.kubernetes.io/os: linux kubernetes.io/arch: amd64 kubernetes.io/hostname: worker-1 kubernetes.io/os: linux node-role.kubernetes.io/worker: "" node.openshift.io/os_id: rhcos ptp/slave: "" name: worker-1 resourceVersion: "97743" selfLink: /api/v1/nodes/worker-1 uid: 4560f482-ce60-4aa2-8237-5864c38727da spec: {} status: addresses: - address: 192.168.111.24 type: InternalIP - address: worker-1 type: Hostname allocatable: cpu: 3500m ephemeral-storage: "17683605064" hugepages-1Gi: "0" hugepages-2Mi: "0" memory: 6995204Ki pods: "250" capacity: cpu: "4" ephemeral-storage: 19876Mi hugepages-1Gi: "0" hugepages-2Mi: "0" memory: 8146180Ki pods: "250" conditions: - lastHeartbeatTime: "2020-06-09T11:09:13Z" lastTransitionTime: "2020-06-09T10:49:13Z" message: kubelet has sufficient memory available reason: KubeletHasSufficientMemory status: "False" type: MemoryPressure - lastHeartbeatTime: "2020-06-09T11:09:13Z" lastTransitionTime: "2020-06-09T10:49:13Z" message: kubelet has no disk pressure reason: KubeletHasNoDiskPressure status: 
"False" type: DiskPressure - lastHeartbeatTime: "2020-06-09T11:09:13Z" lastTransitionTime: "2020-06-09T10:49:13Z" message: kubelet has sufficient PID available reason: KubeletHasSufficientPID status: "False" type: PIDPressure - lastHeartbeatTime: "2020-06-09T11:09:13Z" lastTransitionTime: "2020-06-09T10:49:13Z" message: kubelet is posting ready status reason: KubeletReady status: "True" type: Ready daemonEndpoints: kubeletEndpoint: Port: 10250 images: - names: - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ec5c9406f29ac98580228688db7849590f949e01df39952327adfb261197c16c - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none> sizeBytes: 773373898 - names: - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:d5441ced3440395aa27b2d6ceec3315acf55f2dccc19d76a2f0e704d30b77cc0 - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none> sizeBytes: 467404338 - names: - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:a16b0dab9867070830d18ba5cab98d02b92fa367b69d464e6e22860f0a6293e0 - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none> sizeBytes: 454102624 - names: - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:398e40715b7e39428c2c8d8cedfcc024cf0bed8844b80bf22ff4107345e2e298 - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none> sizeBytes: 429949466 - names: - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:79a4986a92c449401e69c14f34f2e0ad92bc219dd43f716ed806d551e2e09f72 - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none> sizeBytes: 428902230 - names: - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:d879eb6e4426c976d1fffef15a38ff2454bca5d382c13c7b157181db9373f43e - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none> sizeBytes: 367621904 - names: - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:81dce0d0947963d8ce70c74bc63eaf2f2cdf17c40b839623c39d52f6a5ef2d61 - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none> sizeBytes: 364552618 - names: - 
quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:127b6610891c6e38d8ee474e134b9799ecc6cf0ae9be659f06f933ea84e7c877 - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none> sizeBytes: 342705361 - names: - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b89e9a469c30c0b707766d0ae31665ccd6a13135c8074087f617ddee12860774 - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none> sizeBytes: 338328160 - names: - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:9e4d6bd70d5c481267ec2c420b7ee40d2addecb190e4787ac912996a29c90d00 - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none> sizeBytes: 334993182 - names: - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ba38bc55a210372ca5243310f470d98e6b0d6712609eccfbcc4e5961c1cb0205 - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none> sizeBytes: 333782407 - names: - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:8376e01b6c5a4f045548a16946d5b24918ddd88f156e26249e2ff38ab539d512 - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none> sizeBytes: 325752144 - names: - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c5ec8d53faf22153f54fc003b094fa263b769b50a457c878824c800e344b1b2f - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none> sizeBytes: 318103617 - names: - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:19f3539c87ed84a02ee5a15438928083aa9e82294ff9a8dcb86d6eeeeff68c0b - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none> sizeBytes: 311095159 - names: - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:3e185bf343f119d7e64d3d4297ad358c86a613bd047f8d45176a0a484f5d87f3 - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none> sizeBytes: 305661722 - names: - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:43e5bf12c08a33946f99f475ae062187242d5ec74b7d23b48621a1b9f91e44b3 - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none> sizeBytes: 299586006 - names: - 
quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:12cce448b8888f398604eeb6a3a7a14c2850cd9dfd463229bd0735aa68ada885 - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none> sizeBytes: 283513263 - names: - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:294f21bc3baa4221f3c02ed3e2dadcf3e61490813691579be9f05d69b3639f3c - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none> sizeBytes: 277408258 - names: - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:dbc06ac527acaed79a4b189ba091fb4a37bfaa4b061f399011e07163fa4fcd26 - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none> sizeBytes: 277232582 - names: - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:9166f5f53cbaddbf11d4895c96d73b16a9b9002cb3fadefd9f7e13e96ee16edc - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none> sizeBytes: 276916452 - names: - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c69cb2203988cae3506a58d8616ac928b7f2f796d147fbc6640f644398fb1949 - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none> sizeBytes: 267689884 - names: - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:6b3e62612cd9192b7af779ff34a95b039cac3058b0c39129eb37a7deec08908e - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none> sizeBytes: 264283423 - names: - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:d7a71a527f95dd9b607cb2db3a2809003818573e6f550a96238ac2a55cc8a635 - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none> sizeBytes: 258272205 - names: - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e08c2b9e5bf641a524636e7b699830d27464636e431dd992a4a49f56b27d4dd2 - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none> sizeBytes: 257276911 - names: - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c14d0178b17e5184c17090bcc624da3e21f84497140865a2e41d6c3d54a0e42a - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none> sizeBytes: 255893796 - names: - 
quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:570784e273b695e401ef77926ca6acc2af523f706e9eff1ddd2bba1568d610a5 - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none> sizeBytes: 251107160 - names: - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:8f5e521c35f624e1d40570f476c590490ee8624c55834c22f3dfa7e7ab4fd79f - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none> sizeBytes: 243361727 - names: - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:6b56ea96a9f19d51b2822719cb5b7e4cd34a5253f97f34538b64f3a7111489d8 - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<none> sizeBytes: 237382357 nodeInfo: architecture: amd64 bootID: e26520f5-f180-4826-ac7a-b206ca16f76b containerRuntimeVersion: cri-o://1.17.4-14.dev.rhaos4.4.gitb93af5d.el8 kernelVersion: 4.18.0-147.8.1.el8_1.x86_64 kubeProxyVersion: v1.17.1+3f6f40d kubeletVersion: v1.17.1+3f6f40d machineID: 7751b004e15743aea649269a173add91 operatingSystem: linux osImage: Red Hat Enterprise Linux CoreOS 44.81.202006080130-0 (Ootpa) systemUUID: 7751b004-e157-43ae-a649-269a173add91
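The file contents in the performance-performance MachineConfig above are embedded as base64 data URLs, and can be decoded for inspection. For example, the payload for /etc/systemd/system.conf.d/setAffinity.conf decodes to a two-line systemd drop-in:

```python
import base64

# Payload taken verbatim from the `Contents: Source:` data URL for setAffinity.conf above.
payload = "W01hbmFnZXJdCkNQVUFmZmluaXR5PTA="
print(base64.b64decode(payload).decode())
# [Manager]
# CPUAffinity=0
```

The same decoding applies to the pre-boot-tuning.sh, hugepages-allocation.sh, and reboot.sh payloads.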
I always wondered whether the label selector in the MCP is correct:

Machine Config Selector:
  Match Expressions:
    Key:       machineconfiguration.openshift.io/role
    Operator:  In
    Values:
      worker-cnf
      worker

Why not match just on worker-cnf?
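For context: matching both roles is deliberate for custom pools, since the worker-cnf pool has to inherit the base worker MachineConfigs and layer the worker-cnf ones on top; this is why the rendered-worker-cnf Source list above contains both the 00-worker/01-worker-* entries and performance-performance. A minimal sketch of that selection logic (the helper and the dict shape are illustrative, not MCO code):

```python
def select_configs(machine_configs, roles):
    """Pick MachineConfigs whose role label is In the given values,
    sorted by name -- later (higher-numbered) configs override earlier ones."""
    picked = [
        mc["name"]
        for mc in machine_configs
        if mc["labels"].get("machineconfiguration.openshift.io/role") in roles
    ]
    return sorted(picked)

# A few names taken from the rendered-worker-cnf Source list above.
mcs = [
    {"name": "00-worker", "labels": {"machineconfiguration.openshift.io/role": "worker"}},
    {"name": "01-worker-kubelet", "labels": {"machineconfiguration.openshift.io/role": "worker"}},
    {"name": "performance-performance", "labels": {"machineconfiguration.openshift.io/role": "worker-cnf"}},
]
# Matching only on worker-cnf would drop the base worker configs:
print(select_configs(mcs, {"worker-cnf"}))            # just performance-performance
print(select_configs(mcs, {"worker", "worker-cnf"}))  # the full layered set
```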
Automation hit this issue again. Observe that worker-1 no longer has the 'worker-cnf' label, yet the RT kernel is still installed.

# oc get node -o wide
NAME       STATUS   ROLES               AGE     VERSION           INTERNAL-IP      EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION                         CONTAINER-RUNTIME
master-0   Ready    master              4h18m   v1.17.1+912792b   192.168.111.20   <none>        Red Hat Enterprise Linux CoreOS 44.81.202006161946-0 (Ootpa)   4.18.0-147.20.1.el8_1.x86_64           cri-o://1.17.4-17.dev.rhaos4.4.gitf0cfdfc.el8
master-1   Ready    master              4h17m   v1.17.1+912792b   192.168.111.21   <none>        Red Hat Enterprise Linux CoreOS 44.81.202006161946-0 (Ootpa)   4.18.0-147.20.1.el8_1.x86_64           cri-o://1.17.4-17.dev.rhaos4.4.gitf0cfdfc.el8
master-2   Ready    master              4h17m   v1.17.1+912792b   192.168.111.22   <none>        Red Hat Enterprise Linux CoreOS 44.81.202006161946-0 (Ootpa)   4.18.0-147.20.1.el8_1.x86_64           cri-o://1.17.4-17.dev.rhaos4.4.gitf0cfdfc.el8
worker-0   Ready    worker,worker-cnf   3h31m   v1.17.1+912792b   192.168.111.23   <none>        Red Hat Enterprise Linux CoreOS 44.81.202006161946-0 (Ootpa)   4.18.0-147.8.1.rt24.101.el8_1.x86_64   cri-o://1.17.4-17.dev.rhaos4.4.gitf0cfdfc.el8
worker-1   Ready    worker              3h30m   v1.17.1+912792b   192.168.111.24   <none>        Red Hat Enterprise Linux CoreOS 44.81.202006161946-0 (Ootpa)   4.18.0-147.8.1.rt24.101.el8_1.x86_64   cri-o://1.17.4-17.dev.rhaos4.4.gitf0cfdfc.el8
worker-2   Ready    worker              3h25m   v1.17.1+912792b   192.168.111.25   <none>        Red Hat Enterprise Linux CoreOS 44.81.202006161946-0 (Ootpa)   4.18.0-147.20.1.el8_1.x86_64           cri-o://1.17.4-17.dev.rhaos4.4.gitf0cfdfc.el8
worker-3   Ready    worker              3h34m   v1.17.1+912792b   192.168.111.26   <none>        Red Hat Enterprise Linux CoreOS 44.81.202006161946-0 (Ootpa)   4.18.0-147.20.1.el8_1.x86_64           cri-o://1.17.4-17.dev.rhaos4.4.gitf0cfdfc.el8
From the machine-config-daemon logs I see that it tries to remove the RT kernel and all the additional arguments from /proc/cmdline (at least the log shows that the rpm-ostree commands executed without errors):

I0623 15:45:13.267134    2751 update.go:1291] Running rpm-ostree [kargs --delete=nohz=on --delete=nosoftlockup --delete=skew_tick=1 --delete=intel_pstate=disable --delete=intel_iommu=on --delete=iommu=pt --delete=rcu_nocbs=1-3 --delete=tuned.non_isolcpus=00000001 --delete=default_hugepagesz=1G --delete=hugepagesz=1G --delete=hugepages=1]
I0623 15:47:47.761103    2751 update.go:1291] Initiating switch from kernel realtime to default
I0623 15:47:47.769240    2751 update.go:1291] Switching to kernelType=default, invoking rpm-ostree ["override" "reset" "kernel" "kernel-core" "kernel-modules" "kernel-modules-extra" "--uninstall" "kernel-rt-core" "--uninstall" "kernel-rt-modules" "--uninstall" "kernel-rt-modules-extra"]
I0623 15:49:48.824461    2751 update.go:1291] initiating reboot: Node will reboot into config rendered-worker-266d98cce7d051b2576afb3add50ec44
I0623 15:49:50.421631    2751 daemon.go:553] Shutting down MachineConfigDaemon
I0623 15:51:05.278092    2586 start.go:74] Version: v4.4.0-202006160135-dirty (b6c95fea3987483780994c8a5809a6afd15a633d)
I0623 15:51:05.290953    2586 start.go:84] Calling chroot("/rootfs")
I0623 15:51:05.293832    2586 rpm-ostree.go:366] Running captured: rpm-ostree status --json
I0623 15:51:05.835798    2586 daemon.go:209] Booted osImageURL: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0dcbd38df25ba42c4f62aba0520509e0fcab5c3ca2679df2f26d1e7de78f1f67 (44.81.202006161946-0)
I0623 15:51:05.845673    2586 metrics.go:106] Registering Prometheus metrics
I0623 15:51:05.845776    2586 metrics.go:111] Starting metrics listener on 127.0.0.1:8797
I0623 15:51:05.861258    2586 update.go:1291] Starting to manage node: worker-1
I0623 15:51:05.892696    2586 rpm-ostree.go:366] Running captured: rpm-ostree status
I0623 15:51:06.170340    2586 daemon.go:778] State: idle
AutomaticUpdates: disabled
Deployments:
* pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0dcbd38df25ba42c4f62aba0520509e0fcab5c3ca2679df2f26d1e7de78f1f67
              CustomOrigin: Managed by machine-config-operator
                   Version: 44.81.202006161946-0 (2020-06-16T19:52:18Z)
       RemovedBasePackages: kernel-core kernel-modules kernel kernel-modules-extra 4.18.0-147.20.1.el8_1
             LocalPackages: kernel-rt-core-4.18.0-147.8.1.rt24.101.el8_1.x86_64 kernel-rt-modules-4.18.0-147.8.1.rt24.101.el8_1.x86_64 kernel-rt-modules-extra-4.18.0-147.8.1.rt24.101.el8_1.x86_64
                 Initramfs: -I '/etc/systemd/system.conf /etc/systemd/system.conf.d/setAffinity.conf'

  pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0dcbd38df25ba42c4f62aba0520509e0fcab5c3ca2679df2f26d1e7de78f1f67
              CustomOrigin: Managed by machine-config-operator
                   Version: 44.81.202006161946-0 (2020-06-16T19:52:18Z)
       RemovedBasePackages: kernel-core kernel-modules kernel kernel-modules-extra 4.18.0-147.20.1.el8_1
             LocalPackages: kernel-rt-core-4.18.0-147.8.1.rt24.101.el8_1.x86_64 kernel-rt-modules-4.18.0-147.8.1.rt24.101.el8_1.x86_64 kernel-rt-modules-extra-4.18.0-147.8.1.rt24.101.el8_1.x86_64
I0623 15:51:06.170367    2586 rpm-ostree.go:366] Running captured: journalctl --list-boots
I0623 15:51:06.214931    2586 daemon.go:785] journalctl --list-boots:
-6 7348b2aa010442fe8f0130c400f2934e Tue 2020-06-23 13:14:00 UTC—Tue 2020-06-23 13:27:49 UTC
-5 7bddde3d635b4625abd942c8615e8d61 Tue 2020-06-23 13:28:02 UTC—Tue 2020-06-23 14:08:06 UTC
-4 9de73ead42a442d089fef57fbe57cde9 Tue 2020-06-23 14:08:19 UTC—Tue 2020-06-23 14:30:48 UTC
-3 c35644ae4cc34beba4c9a48b3c705c23 Tue 2020-06-23 14:30:59 UTC—Tue 2020-06-23 15:35:30 UTC
-2 3ca184967bfd49e29b7999321d362137 Tue 2020-06-23 15:35:44 UTC—Tue 2020-06-23 15:38:48 UTC
-1 1a83ee5e6c48431e9ef47ae63ab554dd Tue 2020-06-23 15:39:04 UTC—Tue 2020-06-23 15:49:55 UTC
 0 b749a5bc768f4a1c9a9e96e100617aeb Tue 2020-06-23 15:50:11 UTC—Tue 2020-06-23 15:51:06 UTC
I0623 15:51:06.215422    2586 daemon.go:528] Starting MachineConfigDaemon
I0623 15:51:06.215734    2586 daemon.go:535] Enabling Kubelet Healthz Monitor
E0623 15:51:09.944725    2586 reflector.go:153] github.com/openshift/machine-config-operator/pkg/generated/informers/externalversions/factory.go:101: Failed to list *v1.MachineConfig: Get https://172.30.0.1:443/apis/machineconfiguration.openshift.io/v1/machineconfigs?limit=500&resourceVersion=0: dial tcp 172.30.0.1:443: connect: no route to host
E0623 15:51:09.945801    2586 reflector.go:153] k8s.io/client-go/informers/factory.go:135: Failed to list *v1.Node: Get https://172.30.0.1:443/api/v1/nodes?limit=500&resourceVersion=0: dial tcp 172.30.0.1:443: connect: no route to host
I0623 15:56:02.145210    2586 daemon.go:731] Current config: rendered-worker-test-ca4f34a17c0142a571e1a28dbd605d89
I0623 15:56:02.145255    2586 daemon.go:732] Desired config: rendered-worker-266d98cce7d051b2576afb3add50ec44
I0623 15:56:02.161923    2586 update.go:1291] Disk currentConfig rendered-worker-266d98cce7d051b2576afb3add50ec44 overrides node annotation rendered-worker-test-ca4f34a17c0142a571e1a28dbd605d89
I0623 15:56:02.169620    2586 daemon.go:955] Validating against pending config rendered-worker-266d98cce7d051b2576afb3add50ec44
I0623 15:56:02.176061    2586 daemon.go:971] Validated on-disk state
I0623 15:56:02.196827    2586 daemon.go:1005] Completing pending config rendered-worker-266d98cce7d051b2576afb3add50ec44
I0623 15:56:02.204937    2586 update.go:1291] completed update for config rendered-worker-266d98cce7d051b2576afb3add50ec44
I0623 15:56:02.212888    2586 daemon.go:1021] In desired config rendered-worker-266d98cce7d051b2576afb3add50ec44

...but the kernel was not reverted.
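Note that the booted deployment still lists the kernel-rt packages under LocalPackages even though the MCD considers the config complete, and that is easy to check for programmatically. A sketch that parses the plain-text `rpm-ostree status` output quoted above (the helper name is ours; the status snippet is abbreviated):

```python
def leftover_rt_packages(status_text):
    """Collect kernel-rt-* entries from LocalPackages lines of `rpm-ostree status` output."""
    pkgs = []
    for line in status_text.splitlines():
        line = line.strip()
        if line.startswith("LocalPackages:"):
            pkgs += [p for p in line.split()[1:] if p.startswith("kernel-rt-")]
    return pkgs

status = """\
State: idle
Deployments:
* pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0dcbd38d...
    RemovedBasePackages: kernel-core kernel-modules kernel kernel-modules-extra 4.18.0-147.20.1.el8_1
    LocalPackages: kernel-rt-core-4.18.0-147.8.1.rt24.101.el8_1.x86_64
"""
print(leftover_rt_packages(status))
# ['kernel-rt-core-4.18.0-147.8.1.rt24.101.el8_1.x86_64']
```

On a node that reverted correctly this list would be empty.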
The kernel arguments are also still present in /proc/cmdline:

# cat /proc/cmdline
BOOT_IMAGE=(hd0,gpt1)/ostree/rhcos-fc107e60e7e98cdbc54ca91fd9294d4cbf2ff5447d7b81511d7204df2f0b0e6c/vmlinuz-4.18.0-147.8.1.rt24.101.el8_1.x86_64 rhcos.root=crypt_rootfs console=tty0 console=ttyS0,115200n8 rd.luks.options=discard ostree=/ostree/boot.0/rhcos/fc107e60e7e98cdbc54ca91fd9294d4cbf2ff5447d7b81511d7204df2f0b0e6c/0 ignition.platform.id=openstack nohz=on nosoftlockup skew_tick=1 intel_pstate=disable intel_iommu=on iommu=pt rcu_nocbs=1-3 tuned.non_isolcpus=00000001 default_hugepagesz=1G hugepagesz=1G hugepages=1
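Comparing /proc/cmdline against the argument list the MCD tried to delete (from the `rpm-ostree kargs` log above) makes the failed revert easy to spot. A minimal sketch, with the cmdline sample truncated to the relevant arguments:

```python
# The arguments the MCD tried to delete, per the `rpm-ostree kargs` log above.
TUNING_ARGS = [
    "nohz=on", "nosoftlockup", "skew_tick=1", "intel_pstate=disable",
    "intel_iommu=on", "iommu=pt", "rcu_nocbs=1-3", "tuned.non_isolcpus=00000001",
    "default_hugepagesz=1G", "hugepagesz=1G", "hugepages=1",
]

def leftover_kargs(cmdline):
    """Return the tuning arguments still present on the kernel command line."""
    present = set(cmdline.split())
    return [arg for arg in TUNING_ARGS if arg in present]

# On a healthy revert this list would be empty; against the /proc/cmdline
# quoted above, all eleven arguments are still there.
cmdline = ("BOOT_IMAGE=(hd0,gpt1)/ostree/.../vmlinuz-4.18.0-147.8.1.rt24.101.el8_1.x86_64 "
           "nohz=on nosoftlockup skew_tick=1 intel_pstate=disable intel_iommu=on iommu=pt "
           "rcu_nocbs=1-3 tuned.non_isolcpus=00000001 default_hugepagesz=1G hugepagesz=1G hugepages=1")
print(leftover_kargs(cmdline))
```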
Here is a log from journalctl:

Jun 27 14:27:25 worker-4 systemd[1]: Unmounting Boot partition...
Jun 27 14:27:25 worker-4 systemd[1]: Unmounting /var/lib/containers/storage/overlay...
Jun 27 14:27:25 worker-4 systemd[1]: Stopped target Host and Network Name Lookups.
Jun 27 14:27:25 worker-4 systemd[1]: Removed slice system-serial\x2dgetty.slice.
Jun 27 14:27:25 worker-4 systemd[1]: system-serial\x2dgetty.slice: Consumed 175ms CPU time
Jun 27 14:27:25 worker-4 systemd[1]: Stopping Permit User Sessions...
Jun 27 14:27:25 worker-4 systemd[1]: Removed slice system-getty.slice.
Jun 27 14:27:25 worker-4 systemd[1]: system-getty.slice: Consumed 37ms CPU time
Jun 27 14:27:25 worker-4 systemd[1]: Unmounted Boot partition.
Jun 27 14:27:25 worker-4 systemd[1]: boot.mount: Consumed 26ms CPU time
Jun 27 14:27:25 worker-4 ostree[55230]: error: Unexpected state: /run/ostree-booted found, but no /boot/loader directory
Jun 27 14:27:25 worker-4 systemd[1]: ostree-finalize-staged.service: Control process exited, code=exited status=1
Jun 27 14:27:25 worker-4 systemd[1]: ostree-finalize-staged.service: Failed with result 'exit-code'.
Jun 27 14:27:25 worker-4 systemd[1]: Stopped OSTree Finalize Staged Deployment.
Jun 27 14:27:25 worker-4 systemd[1]: ostree-finalize-staged.service: Consumed 37ms CPU time

As I understand it, systemd unmounts the /boot partition first, and after that ostree cannot finish its job there: ostree-finalize-staged.service fails, so the staged non-RT deployment is never finalized, which would explain why the node comes back up on the old RT deployment.
Closing, as the fix was made in RHCOS and will be part of the next OCP release.