+++ This bug was initially created as a clone of Bug #1894910 +++

Description of problem:
Our CI recently started to fail because the node drops into the degraded state when we try to update it to use the machineconfig with the real-time option enabled.

I saw two different errors:

1. https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift-kni_performance-addon-operators/433/pull-ci-openshift-kni-performance-addon-operators-master-e2e-gcp/1324108099925053440/artifacts/e2e-gcp/gather-extra/
{
  "lastTransitionTime": "2020-11-04T23:10:07Z",
  "message": "Node ci-op-lx8l2lsg-24cc7-fg9bn-worker-b-8bvw7 is reporting: \"error running rpm-ostree override remove kernel kernel-core kernel-modules kernel-modules-extra --install kernel-rt-core --install kernel-rt-modules --install kernel-rt-modules-extra --install kernel-rt-kvm: error: System transaction in progress\\n: exit status 1\"",
  "reason": "1 nodes are reporting degraded status on sync",
  "status": "True",
  "type": "NodeDegraded"
},

2. https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift-kni_performance-addon-operators/434/pull-ci-openshift-kni-performance-addon-operators-master-e2e-gcp/1323998850603552768/artifacts/e2e-gcp/gather-extra/
"message": "Node ci-op-4pgtrg3b-24cc7-zz7c8-worker-b-58l2r is reporting: \"error removing staged deployment: error running rpm-ostree cleanup -p: error: System transaction in progress\\n: exit status 1: error running rpm-ostree override remove kernel kernel-core kernel-modules kernel-modules-extra --install kernel-rt-core --install kernel-rt-modules --install kernel-rt-modules-extra --install kernel-rt-kvm: Checking out tree 30e9764...done\\nEnabled rpm-md repositories: coreos-extensions\\nrpm-md repo 'coreos-extensions' (cached); generated: 2020-11-04T00:35:32Z\\nImporting rpm-md...done\\nResolving dependencies...done\\nerror: Could not depsolve transaction; 4 problems detected:\\n Problem 1: conflicting requests\\n - nothing provides linux-firmware \u003e= 20200619-99.git3890db36 needed by kernel-rt-core-4.18.0-240.rt7.54.el8.x86_64\\n Problem 2: package kernel-rt-modules-extra-4.18.0-240.rt7.54.el8.x86_64 requires kernel-rt-uname-r = 4.18.0-240.rt7.54.el8.x86_64, but none of the providers can be installed\\n - conflicting requests\\n - nothing provides linux-firmware \u003e= 20200619-99.git3890db36 needed by kernel-rt-core-4.18.0-240.rt7.54.el8.x86_64\\n Problem 3: package kernel-rt-modules-4.18.0-240.rt7.54.el8.x86_64 requires kernel-rt-uname-r = 4.18.0-240.rt7.54.el8.x86_64, but none of the providers can be installed\\n - conflicting requests\\n - nothing provides linux-firmware \u003e= 20200619-99.git3890db36 needed by kernel-rt-core-4.18.0-240.rt7.54.el8.x86_64\\n Problem 4: package kernel-rt-kvm-4.18.0-240.rt7.54.el8.x86_64 requires kernel-rt = 4.18.0-240.rt7.54.el8, but none of the providers can be installed\\n - conflicting requests\\n - nothing provides linux-firmware \u003e= 20200619-99.git3890db36 needed by kernel-rt-core-4.18.0-240.rt7.54.el8.x86_64\\n: exit status 1\"",
"reason": "1 nodes are reporting degraded status on sync"

Version-Release number of selected component (if applicable):
master

How reproducible:
Always under the CI

Steps to Reproduce:
1.
2.
3.

Actual results:
The update of the node to work with the RT kernel fails

Expected results:
The update of the node to work with the RT kernel should succeed

Additional info:
You can find all relevant information under the CI links that I provided above (MCP, MC, must-gather...)

--- Additional comment from Sinny Kumari on 2020-11-05 13:33:30 UTC ---

The 4.6 and 4.7 issues are most likely related. We are seeing a trimmed error message in 4.6 because it doesn't have verbose logging from rpm-ostree enabled - https://github.com/openshift/machine-config-operator/pull/2097.

It seems RHCOS is shipping linux-firmware-20200512-98.gitb2cad6a2.el8, but the latest machine-os-content ships the kernel-rt 4.18.0-240.rt7.54.el8 package, which needs linux-firmware-20200619-99.git3890db36. The machine-os-content needs an update so that the correct linux-firmware dependency is available for the kernel-rt install to succeed.

Marking this bug as urgent, as it also affects MCO 4.7 and 4.6 CI:
4.6 - https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_machine-config-operator/2193/pull-ci-openshift-machine-config-operator-release-4.6-e2e-gcp-op/1324193039505166336
4.7 - https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_machine-config-operator/2035/pull-ci-openshift-machine-config-operator-master-e2e-gcp-op/1324274818467500032

--- Additional comment from Colin Walters on 2020-11-05 13:56:19 UTC ---

This is on track to be fixed by https://gitlab.cee.redhat.com/coreos/redhat-coreos/-/merge_requests/1162

--- Additional comment from Micah Abbott on 2020-11-05 14:00:25 UTC ---

Targeting 4.7; will need a clone for 4.6.z
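
For anyone triaging similar depsolve failures, the linux-firmware mismatch described in the comments above can be pulled out of the rpm-ostree error text mechanically. The following is only a minimal sketch: the sample string is an excerpt of the depsolve output quoted in this bug after JSON decoding (`\u003e` becomes `>`), while `missing_requirements` and `date_release` are hypothetical helper names introduced here, and the version comparison is a simplification for this particular version scheme, not a real RPM version comparison.

```python
import re

# Excerpt of the depsolve output quoted in this bug, after JSON decoding.
DEPSOLVE_OUTPUT = """error: Could not depsolve transaction; 4 problems detected:
 Problem 1: conflicting requests
  - nothing provides linux-firmware >= 20200619-99.git3890db36 needed by kernel-rt-core-4.18.0-240.rt7.54.el8.x86_64
 Problem 2: package kernel-rt-modules-extra-4.18.0-240.rt7.54.el8.x86_64 requires kernel-rt-uname-r = 4.18.0-240.rt7.54.el8.x86_64, but none of the providers can be installed
  - conflicting requests
  - nothing provides linux-firmware >= 20200619-99.git3890db36 needed by kernel-rt-core-4.18.0-240.rt7.54.el8.x86_64
"""

# Matches lines of the form "nothing provides NAME OP VERSION needed by PACKAGE".
MISSING_RE = re.compile(r"nothing provides (\S+) (>=|<=|=|>|<) (\S+) needed by (\S+)")


def missing_requirements(text):
    """Return unique (name, op, version, needed_by) tuples, in order of appearance."""
    seen = []
    for match in MISSING_RE.finditer(text):
        if match.groups() not in seen:
            seen.append(match.groups())
    return seen


def date_release(version):
    """Hypothetical helper: reduce '20200619-99.git3890db36' to (20200619, 99).

    This only handles the date-release scheme seen in this bug and is NOT
    a substitute for a proper RPM version comparison.
    """
    match = re.match(r"(\d+)-(\d+)", version)
    return int(match.group(1)), int(match.group(2))


# linux-firmware version that RHCOS was shipping, according to the comment above.
shipped = "20200512-98.gitb2cad6a2.el8"

reqs = missing_requirements(DEPSOLVE_OUTPUT)
for name, op, wanted, needed_by in reqs:
    satisfied = date_release(shipped) >= date_release(wanted)
    print(f"{needed_by} needs {name} {op} {wanted}; shipped {shipped}; satisfied={satisfied}")
```

The duplicate "nothing provides" lines collapse to a single requirement, which matches the diagnosis above: every one of the four reported problems traces back to the same missing linux-firmware build.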
Verified with 4.6.0-0.nightly-2020-11-07-035509 on GCP

```
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-11-07-035509   True        False         102s    Cluster version is 4.6.0-0.nightly-2020-11-07-035509

$ cat machineConfigs/worker-realtime.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: "worker"
  name: 99-worker-kerneltype
spec:
  kernelType: realtime

$ oc apply -f machineConfigs/worker-realtime.yaml
machineconfig.machineconfiguration.openshift.io/99-worker-kerneltype created

$ oc get mc
NAME                                               GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
00-master                                          054f6197a19ceffff44f361674bd24644d1a2bcb   3.1.0             106m
00-worker                                          054f6197a19ceffff44f361674bd24644d1a2bcb   3.1.0             106m
01-master-container-runtime                        054f6197a19ceffff44f361674bd24644d1a2bcb   3.1.0             106m
01-master-kubelet                                  054f6197a19ceffff44f361674bd24644d1a2bcb   3.1.0             106m
01-worker-container-runtime                        054f6197a19ceffff44f361674bd24644d1a2bcb   3.1.0             106m
01-worker-kubelet                                  054f6197a19ceffff44f361674bd24644d1a2bcb   3.1.0             106m
99-master-generated-registries                     054f6197a19ceffff44f361674bd24644d1a2bcb   3.1.0             106m
99-master-ssh                                                                                 3.1.0             113m
99-worker-generated-registries                     054f6197a19ceffff44f361674bd24644d1a2bcb   3.1.0             106m
99-worker-kerneltype                                                                                            73m
99-worker-ssh                                                                                 3.1.0             113m
rendered-master-26289f039b78077aab0d57f41e7c83fc   054f6197a19ceffff44f361674bd24644d1a2bcb   3.1.0             106m
rendered-worker-07945f23ee9807f5fe10a2cca7d94019   054f6197a19ceffff44f361674bd24644d1a2bcb   3.1.0             72m
rendered-worker-1c3093c409badcfbe46fb3060728b1ef   054f6197a19ceffff44f361674bd24644d1a2bcb   3.1.0             106m

$ oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-26289f039b78077aab0d57f41e7c83fc   True      False      False      3              3                   3                     0                      105m
worker   rendered-worker-07945f23ee9807f5fe10a2cca7d94019   True      False      False      3              3                   3                     0                      105m

$ oc get nodes
NAME                                       STATUS   ROLES    AGE    VERSION
ci-ln-5j7t03k-f76d1-mg4d4-master-0         Ready    master   106m   v1.19.0+9f84db3
ci-ln-5j7t03k-f76d1-mg4d4-master-1         Ready    master   107m   v1.19.0+9f84db3
ci-ln-5j7t03k-f76d1-mg4d4-master-2         Ready    master   106m   v1.19.0+9f84db3
ci-ln-5j7t03k-f76d1-mg4d4-worker-b-kvjlh   Ready    worker   98m    v1.19.0+9f84db3
ci-ln-5j7t03k-f76d1-mg4d4-worker-c-2jmgg   Ready    worker   98m    v1.19.0+9f84db3
ci-ln-5j7t03k-f76d1-mg4d4-worker-d-wg5tx   Ready    worker   98m    v1.19.0+9f84db3

$ oc debug node/ci-ln-5j7t03k-f76d1-mg4d4-worker-b-kvjlh
Starting pod/ci-ln-5j7t03k-f76d1-mg4d4-worker-b-kvjlh-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.32.4
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# rpm-ostree status
State: idle
Deployments:
* pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:47e1213c98063dfd7f5ccae41e611a25446c8ac493cfdd05d8f1c46b61ab13d4
              CustomOrigin: Managed by machine-config-operator
                   Version: 46.82.202011061621-0 (2020-11-06T16:25:16Z)
       RemovedBasePackages: kernel-core kernel-modules kernel kernel-modules-extra 4.18.0-193.29.1.el8_2
           LayeredPackages: kernel-rt-core kernel-rt-kvm kernel-rt-modules kernel-rt-modules-extra

  pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:47e1213c98063dfd7f5ccae41e611a25446c8ac493cfdd05d8f1c46b61ab13d4
              CustomOrigin: Managed by machine-config-operator
                   Version: 46.82.202011061621-0 (2020-11-06T16:25:16Z)
sh-4.4# uname -a
Linux ci-ln-5j7t03k-f76d1-mg4d4-worker-b-kvjlh 4.18.0-193.28.1.rt13.77.el8_2.x86_64 #1 SMP PREEMPT RT Fri Oct 16 14:11:07 EDT 2020 x86_64 x86_64 x86_64 GNU/Linux
sh-4.4# exit
exit
sh-4.4# exit
exit

Removing debug pod ...
```
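
The final check in the transcript above (reading `uname -a` on the node) could be automated along these lines. This is a rough sketch rather than part of the actual verification: `is_realtime_kernel` is a hypothetical helper, and treating the `PREEMPT RT` marker plus an `.rtN.` component in the release string as proof of an RT kernel is an assumption based only on the output shown in this bug.

```python
import re


def is_realtime_kernel(uname_output):
    """Heuristic: True if `uname -a` output looks like a kernel-rt build.

    Assumes the RT kernel reports "PREEMPT RT" in its version string and
    carries an ".rt<N>." component in its release, as in the output above.
    """
    has_marker = "PREEMPT RT" in uname_output
    has_rt_release = re.search(r"\.rt\d+\.", uname_output) is not None
    return has_marker and has_rt_release


# `uname -a` output copied from the verification transcript above.
UNAME_RT = (
    "Linux ci-ln-5j7t03k-f76d1-mg4d4-worker-b-kvjlh "
    "4.18.0-193.28.1.rt13.77.el8_2.x86_64 #1 SMP PREEMPT RT "
    "Fri Oct 16 14:11:07 EDT 2020 x86_64 x86_64 x86_64 GNU/Linux"
)

# Made-up example of a non-RT kernel line, for contrast only.
UNAME_STOCK = (
    "Linux example-node 4.18.0-193.29.1.el8_2.x86_64 #1 SMP "
    "Thu Oct 22 00:00:00 EDT 2020 x86_64 x86_64 x86_64 GNU/Linux"
)

print(is_realtime_kernel(UNAME_RT))
print(is_realtime_kernel(UNAME_STOCK))
```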
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6.4 bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4987