Description of problem:
The second worker node does not have 'Initramfs:' in rpm-ostree status.

Version-Release number of selected component (if applicable):
performance-addon-operator-container-v4.4.0-34

How reproducible:
Always

Steps to Reproduce:
1. Ensure that you have multiple worker nodes labeled as 'worker-cnf'.
2. Deploy performance feature.
3. SSH into the worker nodes and observe the output of `rpm-ostree status -v`

Actual results:

Worker node 1:
[core@worker-0 ~]$ rpm-ostree status -v
State: idle
AutomaticUpdates: disabled
Deployments:
* pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:5fb2f82367d61e0e65103c6ff9721851ae9b7e625b078753fcec4bd6cc774acd
    CustomOrigin: Managed by machine-config-operator
    Version: 44.81.202002240530-0 (2020-02-24T05:36:09Z)
    BaseCommit: 6d68969ffe13a0c2b6d9df62a427d22c8421a652b79ec3d2881c6eaa4ee6e7b2
        |- art-rhcos-4.3 ((invalid timestamp))
        |- art-rhcos-4.4 ((invalid timestamp))
        |- rhel8-fast-datapath ((invalid timestamp))
        |- rhel8-baseos ((invalid timestamp))
        `- rhel8-appstream ((invalid timestamp))
    Commit: 46a992715d34ef5b9729a0057a02e6f971c0cdaa0835ab217b6fc52aa2ff5c12
    Staged: no
    StateRoot: rhcos
    RemovedBasePackages: kernel-core kernel-modules kernel kernel-modules-extra 4.18.0-147.5.1.el8_1
    LocalPackages: kernel-rt-modules-4.18.0-147.5.1.rt24.98.el8_1.x86_64 kernel-rt-core-4.18.0-147.5.1.rt24.98.el8_1.x86_64 kernel-rt-modules-extra-4.18.0-147.5.1.rt24.98.el8_1.x86_64
    Initramfs: -I /etc/systemd/system.conf /etc/systemd/system.conf.d/setAffinity.conf

  pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:5fb2f82367d61e0e65103c6ff9721851ae9b7e625b078753fcec4bd6cc774acd
    CustomOrigin: Managed by machine-config-operator
    Version: 44.81.202002240530-0 (2020-02-24T05:36:09Z)
    BaseCommit: 6d68969ffe13a0c2b6d9df62a427d22c8421a652b79ec3d2881c6eaa4ee6e7b2
        |- art-rhcos-4.3 ((invalid timestamp))
        |- art-rhcos-4.4 ((invalid timestamp))
        |- rhel8-fast-datapath ((invalid timestamp))
        |- rhel8-baseos ((invalid timestamp))
        `- rhel8-appstream ((invalid timestamp))
    Commit: 233544bfc3a6c2d5fdde98bbec1fc78ceed9fd107570bebbaead4d9809d00e41
    StateRoot: rhcos
    RemovedBasePackages: kernel-core kernel-modules kernel kernel-modules-extra 4.18.0-147.5.1.el8_1
    LocalPackages: kernel-rt-modules-4.18.0-147.5.1.rt24.98.el8_1.x86_64 kernel-rt-core-4.18.0-147.5.1.rt24.98.el8_1.x86_64 kernel-rt-modules-extra-4.18.0-147.5.1.rt24.98.el8_1.x86_64
[core@worker-0 ~]$

Worker node 2:
[core@worker-1 ~]$ rpm-ostree status -v
State: idle
AutomaticUpdates: disabled
Deployments:
* pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:5fb2f82367d61e0e65103c6ff9721851ae9b7e625b078753fcec4bd6cc774acd
    CustomOrigin: Managed by machine-config-operator
    Version: 44.81.202002240530-0 (2020-02-24T05:36:09Z)
    BaseCommit: 6d68969ffe13a0c2b6d9df62a427d22c8421a652b79ec3d2881c6eaa4ee6e7b2
        |- art-rhcos-4.3 ((invalid timestamp))
        |- art-rhcos-4.4 ((invalid timestamp))
        |- rhel8-fast-datapath ((invalid timestamp))
        |- rhel8-baseos ((invalid timestamp))
        `- rhel8-appstream ((invalid timestamp))
    Commit: 3d94c3c2ce50de7a7d85ebf8d9fb26bd3711eed1a4bdd714445048662df8b41d
    Staged: no
    StateRoot: rhcos
    RemovedBasePackages: kernel-core kernel-modules kernel kernel-modules-extra 4.18.0-147.5.1.el8_1
    LocalPackages: kernel-rt-modules-4.18.0-147.5.1.rt24.98.el8_1.x86_64 kernel-rt-core-4.18.0-147.5.1.rt24.98.el8_1.x86_64 kernel-rt-modules-extra-4.18.0-147.5.1.rt24.98.el8_1.x86_64

  pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:5fb2f82367d61e0e65103c6ff9721851ae9b7e625b078753fcec4bd6cc774acd
    CustomOrigin: Managed by machine-config-operator
    Version: 44.81.202002240530-0 (2020-02-24T05:36:09Z)
    Commit: 6d68969ffe13a0c2b6d9df62a427d22c8421a652b79ec3d2881c6eaa4ee6e7b2
        |- art-rhcos-4.3 ((invalid timestamp))
        |- art-rhcos-4.4 ((invalid timestamp))
        |- rhel8-fast-datapath ((invalid timestamp))
        |- rhel8-baseos ((invalid timestamp))
        `- rhel8-appstream ((invalid timestamp))
    StateRoot: rhcos
[core@worker-1 ~]$

Expected results:
Initramfs in all the worker nodes labeled with 'worker-cnf'

Additional info:

# oc get node -o wide
NAME       STATUS   ROLES               AGE   VERSION   INTERNAL-IP      EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION                        CONTAINER-RUNTIME
master-0   Ready    master              9h    v1.17.1   192.168.111.20   <none>        Red Hat Enterprise Linux CoreOS 44.81.202002240530-0 (Ootpa)   4.18.0-147.5.1.el8_1.x86_64           cri-o://1.17.0-4.dev.rhaos4.4.gitc3436cc.el8
master-1   Ready    master              9h    v1.17.1   192.168.111.21   <none>        Red Hat Enterprise Linux CoreOS 44.81.202002240530-0 (Ootpa)   4.18.0-147.5.1.el8_1.x86_64           cri-o://1.17.0-4.dev.rhaos4.4.gitc3436cc.el8
master-2   Ready    master              9h    v1.17.1   192.168.111.22   <none>        Red Hat Enterprise Linux CoreOS 44.81.202002240530-0 (Ootpa)   4.18.0-147.5.1.el8_1.x86_64           cri-o://1.17.0-4.dev.rhaos4.4.gitc3436cc.el8
worker-0   Ready    worker,worker-cnf   9h    v1.17.1   192.168.111.23   <none>        Red Hat Enterprise Linux CoreOS 44.81.202002240530-0 (Ootpa)   4.18.0-147.5.1.rt24.98.el8_1.x86_64   cri-o://1.17.0-4.dev.rhaos4.4.gitc3436cc.el8
worker-1   Ready    worker,worker-cnf   9h    v1.17.1   192.168.111.24   <none>        Red Hat Enterprise Linux CoreOS 44.81.202002240530-0 (Ootpa)   4.18.0-147.5.1.rt24.98.el8_1.x86_64   cri-o://1.17.0-4.dev.rhaos4.4.gitc3436cc.el8
worker-2   Ready    worker              9h    v1.17.1   192.168.111.25   <none>        Red Hat Enterprise Linux CoreOS 44.81.202002240530-0 (Ootpa)   4.18.0-147.5.1.el8_1.x86_64           cri-o://1.17.0-4.dev.rhaos4.4.gitc3436cc.el8
worker-3   Ready    worker              9h    v1.17.1   192.168.111.26   <none>        Red Hat Enterprise Linux CoreOS 44.81.202002240530-0 (Ootpa)   4.18.0-147.5.1.el8_1.x86_64           cri-o://1.17.0-4.dev.rhaos4.4.gitc3436cc.el8

# oc get mcp
NAME         CONFIG                                                 UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master       rendered-master-5750e45d70387c7d0df7a824eca28690       True      False      False      3              3                   3                     0                      9h
worker       rendered-worker-a8dcfdd7c435150f70dc3a557418e9f3       True      False      False      2              2                   2                     0                      9h
worker-cnf   rendered-worker-cnf-9665ba66c03c86f98fde3c54015a44cb   True      False      False      2              2                   2                     0                      4h49m

# oc get mcp worker-cnf -o yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"machineconfiguration.openshift.io/v1","kind":"MachineConfigPool","metadata":{"annotations":{},"labels":{"machineconfiguration.openshift.io/role":"worker-cnf"},"name":"worker-cnf"},"spec":{"machineConfigSelector":{"matchExpressions":[{"key":"machineconfiguration.openshift.io/role","operator":"In","values":["worker-cnf","worker"]}]},"maxUnavailable":null,"nodeSelector":{"matchLabels":{"node-role.kubernetes.io/worker-cnf":""}},"paused":false}}
  creationTimestamp: "2020-02-25T06:14:38Z"
  generation: 5
  labels:
    machineconfiguration.openshift.io/role: worker-cnf
  name: worker-cnf
  resourceVersion: "150323"
  selfLink: /apis/machineconfiguration.openshift.io/v1/machineconfigpools/worker-cnf
  uid: 19c3e3b3-7c4d-4785-bde8-5099ec708393
spec:
  configuration:
    name: rendered-worker-cnf-9665ba66c03c86f98fde3c54015a44cb
    source:
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 00-worker
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 00-worker-chronyd-custom
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 01-worker-container-runtime
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 01-worker-kubelet
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 98-worker-c77c686b-12fd-4e58-bbe1-aff3187bb6a2-kubelet
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 98-worker-cnf-19c3e3b3-7c4d-4785-bde8-5099ec708393-kubelet
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 99-worker-c77c686b-12fd-4e58-bbe1-aff3187bb6a2-registries
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 99-worker-cnf-19c3e3b3-7c4d-4785-bde8-5099ec708393-kubelet
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 99-worker-registries
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 99-worker-ssh
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: load-sctp-module
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: performance-performance
  machineConfigSelector:
    matchExpressions:
    - key: machineconfiguration.openshift.io/role
      operator: In
      values:
      - worker-cnf
      - worker
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker-cnf: ""
  paused: false
status:
  conditions:
  - lastTransitionTime: "2020-02-25T06:14:43Z"
    message: ""
    reason: ""
    status: "False"
    type: RenderDegraded
  - lastTransitionTime: "2020-02-25T06:14:48Z"
    message: ""
    reason: ""
    status: "False"
    type: NodeDegraded
  - lastTransitionTime: "2020-02-25T06:14:48Z"
    message: ""
    reason: ""
    status: "False"
    type: Degraded
  - lastTransitionTime: "2020-02-25T06:41:59Z"
    message: All nodes are updated with rendered-worker-cnf-9665ba66c03c86f98fde3c54015a44cb
    reason: ""
    status: "True"
    type: Updated
  - lastTransitionTime: "2020-02-25T06:41:59Z"
    message: ""
    reason: ""
    status: "False"
    type: Updating
  configuration:
    name: rendered-worker-cnf-9665ba66c03c86f98fde3c54015a44cb
    source:
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 00-worker
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 00-worker-chronyd-custom
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 01-worker-container-runtime
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 01-worker-kubelet
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 98-worker-c77c686b-12fd-4e58-bbe1-aff3187bb6a2-kubelet
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 98-worker-cnf-19c3e3b3-7c4d-4785-bde8-5099ec708393-kubelet
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 99-worker-c77c686b-12fd-4e58-bbe1-aff3187bb6a2-registries
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 99-worker-cnf-19c3e3b3-7c4d-4785-bde8-5099ec708393-kubelet
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 99-worker-registries
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 99-worker-ssh
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: load-sctp-module
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: performance-performance
  degradedMachineCount: 0
  machineCount: 2
  observedGeneration: 5
  readyMachineCount: 2
  unavailableMachineCount: 0
  updatedMachineCount: 2
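
For anyone re-checking the expected result, the per-node inspection from the steps to reproduce could be scripted roughly as below. This is only a sketch and not part of the operator: the function names are made up for illustration, and it assumes that the booted deployment is the one marked with `*`, that every deployment header line is an origin URL containing `://` (as in the output above), and that `oc debug` access to the nodes is available.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical, not taken from the operator.

# Reads `rpm-ostree status -v` output on stdin and succeeds only if the
# booted deployment (the one marked with "*") has an "Initramfs:" line.
booted_has_initramfs() {
  awk '
    /^[ \t]*\* /                     { booted = 1; next }  # booted deployment header
    booted && /^[ \t]*[a-z-]+:\/\//  { booted = 0 }        # header of the next deployment
    booted && /^[ \t]*Initramfs:/    { found = 1 }
    END { exit (found ? 0 : 1) }
  '
}

# Runs the check on every node carrying the worker-cnf role label.
check_worker_cnf_nodes() {
  for node in $(oc get nodes -l node-role.kubernetes.io/worker-cnf -o name); do
    if oc debug "$node" -- chroot /host rpm-ostree status -v 2>/dev/null \
        | booted_has_initramfs; then
      echo "$node: Initramfs present"
    else
      echo "$node: Initramfs MISSING"
    fi
  done
}
```

Calling `check_worker_cnf_nodes` from a machine logged in to the cluster would print one line per worker-cnf node; on the cluster above it should flag worker-1.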
I have a suspicion that it is related to a race/sync issue, since we have a systemd unit, pre-boot-tuning.service, that runs:

rpm-ostree initramfs --enable --arg=-I --arg="/etc/systemd/system.conf /etc/systemd/system.conf.d/setAffinity.conf"
touch /var/reboot

And another systemd unit, reboot.service, that runs:

if [[ -f /var/reboot ]]; then
  rm -f /var/reboot
  echo "File /var/reboot exists, initiate reboot"
  systemctl reboot
fi

We might need to remove reboot.service completely and just run the following in the pre-boot-tuning.service script:

rpm-ostree initramfs --enable --arg=-I --arg="/etc/systemd/system.conf /etc/systemd/system.conf.d/setAffinity.conf"

(with the -r reboot flag added to that command)
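
To make the single-unit proposal concrete, here is a rough sketch of what the pre-boot-tuning script body could look like with `-r`. It is an assumption about the shape of the script, not the operator's actual code: the function name, the `RPM_OSTREE` override variable, and the "already applied" check (grepping the booted deployment for `Initramfs:`) are all made up for illustration.

```shell
#!/bin/sh
# Sketch only: names and the idempotency check are hypothetical.
RPM_OSTREE=${RPM_OSTREE:-rpm-ostree}   # overridable so the logic can be exercised without rpm-ostree
INITRAMFS_FILES="/etc/systemd/system.conf /etc/systemd/system.conf.d/setAffinity.conf"

apply_initramfs_and_reboot() {
  # -b limits the status output to the booted deployment.
  if "$RPM_OSTREE" status -b -v | grep -q 'Initramfs:'; then
    echo "Pre boot tuning configuration already applied"
    return 0
  fi
  # -r asks rpm-ostree to reboot once the new deployment is staged, so no
  # marker file and no separate reboot.service are needed.
  "$RPM_OSTREE" initramfs --enable --arg=-I --arg="$INITRAMFS_FILES" -r
}
```

The point of the design is that the reboot is issued by the same process that staged the deployment, which removes the `/var/reboot` handshake between two units.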
Can you inspect the second worker node to see if the `pre-boot-tuning.service` actually ran? Or if it is a race as suggested in comment#2, can you adjust the ordering of the systemd units so that `reboot.service` fires after `pre-boot-tuning.service`? Though the suggestion of using `-r` as part of `rpm-ostree initramfs` would be a good path forward, too.
(In reply to Micah Abbott from comment #3)
> Can you inspect the second worker node to see if the
> `pre-boot-tuning.service` actually ran?

I took a look at the system where Gowrishankar Rajaiyan ran the deployment and saw that pre-boot-tuning.service ran, and its logs look exactly the same as the ones we have on a working node. (shanks, you can re-verify that, even by just running journalctl -u pre-boot-tuning.service on both nodes.)

> Or if it is a race as suggested in comment#2, can you adjust the ordering of
> the systemd units so that `reboot.service` fires after
> `pre-boot-tuning.service`?

The reboot.service is a one-shot run that should already fire after pre-boot-tuning.service; the pre-boot-tuning.service unit declares:

[Unit]
Description=Preboot tuning patch
Before=kubelet.service
Before=reboot.service

> Though the suggestion of using `-r` as part of `rpm-ostree initramfs` would
> be a good path forward, too.

I am working on a PR for that under the performance-addons-operator, but that is out of an assessment of what might have gone wrong. Still need to understand if that is a sync issue or something else.
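
One thing that may be worth confirming on the ordering question: `Before=`/`After=` only hold the later unit back until the earlier one has finished starting, and a service only counts as started after its script has exited when it is `Type=oneshot`. The sketch below checks both properties from `systemctl show` output; the function name is hypothetical, and whether pre-boot-tuning.service is actually declared as oneshot is not stated in this bug.

```shell
#!/bin/sh
# Sketch only: ordering_is_safe is a hypothetical helper.
# Reads `systemctl show -p Type -p Before <unit>` output on stdin and succeeds
# only if the unit is Type=oneshot and is ordered before the unit given as $1.
ordering_is_safe() {
  target=$1
  type=
  before=
  while IFS='=' read -r key value; do
    case "$key" in
      Type)   type=$value ;;
      Before) before=$value ;;
    esac
  done
  [ "$type" = oneshot ] || return 1
  case " $before " in
    *" $target "*) return 0 ;;
    *)             return 1 ;;
  esac
}
```

On a node it could be used as `systemctl show -p Type -p Before pre-boot-tuning.service | ordering_is_safe reboot.service && echo ok`.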
I am raising the severity as we are in trouble if the low latency deployment is not reliable. I do not see any workaround either.
Added a PR for using reboot directly in the rpm-ostree initramfs command: https://github.com/openshift-kni/performance-addon-operators/pull/104
I wonder if this could be related to the other initrd issue we have (file not being added): https://bugzilla.redhat.com/show_bug.cgi?id=1806588
>> Though the suggestion of using `-r` as part of `rpm-ostree initramfs` would
>> be a good path forward, too.

> I am working on a PR for that under the performance-addons-operator, but that is out of an assessment of what might have gone wrong.
> Still need to understand if that is a sync issue or something else.

Although using -r for the rpm-ostree initramfs injection seems to pass CI in the suggested PR:
https://github.com/openshift-kni/performance-addon-operators/pull/104
I still see inconsistent behaviour on my local machine with that fix.

--

I also tried adding a sleep + sync to see whether a possible race condition could be resolved, using:

rpm-ostree initramfs --enable --arg=-I --arg="${SYSTEM_CONFIG_FILE} ${SYSTEM_CONFIG_CUSTOM_FILE}"
sleep 10
sync
systemctl reboot

but locally I still see:

rpm-ostree status -v
State: idle
AutomaticUpdates: disabled
Deployments:
* pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:43e531d989859972b6ec5ed6277ee98288b8a238a9da933d2a9bec6eb93528a6
    CustomOrigin: Managed by machine-config-operator
    Version: 44.81.202001240931.0 (2020-01-24T09:36:54Z)
    BaseCommit: ad2b3d5d3595322bc00278bb0f09542e33f43d4c0d92c91757ea6996818912a1
        |- art-rhcos-4.3 ((invalid timestamp))
        |- art-rhcos-4.4 ((invalid timestamp))
        |- rhel8-fast-datapath ((invalid timestamp))
        |- rhel8-baseos ((invalid timestamp))
        `- rhel8-appstream ((invalid timestamp))
    Commit: b338e6fe85c5bb358c4111b13939d32ef198ad2601dea8b3797b80e1c643789f
    Staged: no
    StateRoot: rhcos
    RemovedBasePackages: kernel-core kernel-modules kernel kernel-modules-extra 4.18.0-147.3.1.el8_1
    LocalPackages: kernel-rt-modules-extra-4.18.0-147.3.1.rt24.96.el8_1.x86_64 kernel-rt-modules-4.18.0-147.3.1.rt24.96.el8_1.x86_64 kernel-rt-core-4.18.0-147.3.1.rt24.96.el8_1.x86_64

  pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:43e531d989859972b6ec5ed6277ee98288b8a238a9da933d2a9bec6eb93528a6
    CustomOrigin: Managed by machine-config-operator
    Version: 44.81.202001240931.0 (2020-01-24T09:36:54Z)
    Commit: ad2b3d5d3595322bc00278bb0f09542e33f43d4c0d92c91757ea6996818912a1
        |- art-rhcos-4.3 ((invalid timestamp))
        |- art-rhcos-4.4 ((invalid timestamp))
        |- rhel8-fast-datapath ((invalid timestamp))
        |- rhel8-baseos ((invalid timestamp))
        `- rhel8-appstream ((invalid timestamp))
    StateRoot: rhcos

Service log:

journalctl -u pre-boot-tuning.service
-- Logs begin at Tue 2020-01-28 15:58:30 UTC, end at Thu 2020-02-27 09:13:20 UTC. --
Feb 27 07:35:29 test-1-84xwn-worker-0-kfpx4 systemd[1]: Starting Preboot tuning patch...
Feb 27 07:35:30 test-1-84xwn-worker-0-kfpx4 pre-boot-tuning.sh[1823]: /etc/sysconfig/irqbalance
Feb 27 07:35:45 test-1-84xwn-worker-0-kfpx4 pre-boot-tuning.sh[1823]: Checking out tree ad2b3d5...done
Feb 27 07:35:45 test-1-84xwn-worker-0-kfpx4 pre-boot-tuning.sh[1823]: Enabled rpm-md repositories:
Feb 27 07:35:45 test-1-84xwn-worker-0-kfpx4 pre-boot-tuning.sh[1823]: Importing rpm-md...done
Feb 27 07:35:46 test-1-84xwn-worker-0-kfpx4 pre-boot-tuning.sh[1823]: Resolving dependencies...done
Feb 27 07:35:46 test-1-84xwn-worker-0-kfpx4 pre-boot-tuning.sh[1823]: Applying 4 overrides and 3 overlays
Feb 27 07:35:47 test-1-84xwn-worker-0-kfpx4 pre-boot-tuning.sh[1823]: Processing packages...done
Feb 27 07:35:47 test-1-84xwn-worker-0-kfpx4 pre-boot-tuning.sh[1823]: Running pre scripts...done
Feb 27 07:36:15 test-1-84xwn-worker-0-kfpx4 pre-boot-tuning.sh[1823]: Running post scripts...done
Feb 27 07:36:34 test-1-84xwn-worker-0-kfpx4 pre-boot-tuning.sh[1823]: Running posttrans scripts...done
Feb 27 07:36:36 test-1-84xwn-worker-0-kfpx4 pre-boot-tuning.sh[1823]: Writing rpmdb...done
Feb 27 07:41:16 test-1-84xwn-worker-0-kfpx4 pre-boot-tuning.sh[1823]: Generating initramfs...done
Feb 27 07:41:22 test-1-84xwn-worker-0-kfpx4 pre-boot-tuning.sh[1823]: Writing OSTree commit...done
Feb 27 07:41:28 test-1-84xwn-worker-0-kfpx4 pre-boot-tuning.sh[1823]: Staging deployment...done
Feb 27 07:41:32 test-1-84xwn-worker-0-kfpx4 pre-boot-tuning.sh[1823]: Freed: 913.4 MB (pkgcache branches: 0)
Feb 27 07:41:32 test-1-84xwn-worker-0-kfpx4 pre-boot-tuning.sh[1823]: Initramfs regeneration is now: enabled
Feb 27 07:41:43 test-1-84xwn-worker-0-kfpx4 systemd[1]: pre-boot-tuning.service: Main process exited, code=killed, status=15/TERM
Feb 27 07:41:43 test-1-84xwn-worker-0-kfpx4 systemd[1]: pre-boot-tuning.service: Failed with result 'signal'.
Feb 27 07:41:43 test-1-84xwn-worker-0-kfpx4 systemd[1]: Stopped Preboot tuning patch.
Feb 27 07:41:43 test-1-84xwn-worker-0-kfpx4 systemd[1]: pre-boot-tuning.service: Consumed 307ms CPU time
-- Reboot --
Feb 27 07:42:59 test-1-84xwn-worker-0-kfpx4 systemd[1]: Starting Preboot tuning patch...
Feb 27 07:43:00 test-1-84xwn-worker-0-kfpx4 pre-boot-tuning.sh[1812]: Pre boot tuning configuration already applied
Feb 27 07:43:00 test-1-84xwn-worker-0-kfpx4 pre-boot-tuning.sh[1812]: Setting kernel rcuo* threads to the housekeeping cpus

To rule out the possibility that the realtime kernel changed the rpm-ostree outcome, these are the kernel versions logged across the realtime kernel update:

Feb 27 00:37:51 localhost kernel: Linux version 4.18.0-147.3.1.el8_1.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 8.3.1 20190507 (Red Hat 8.3.1-4) (GCC)) #1 SMP Wed Nov 27 01:11:44 UTC 2019
Feb 27 07:34:23 localhost kernel: Linux version 4.18.0-147.3.1.rt24.96.el8_1.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 8.3.1 20190507 (Red Hat 8.3.1-4) (GCC)) #1 SMP PREEMPT RT Wed Nov 27 18:29:55 UTC 2019
Feb 27 07:42:01 localhost kernel: Linux version 4.18.0-147.3.1.rt24.96.el8_1.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 8.3.1 20190507 (Red Hat 8.3.1-4) (GCC)) #1 SMP PREEMPT RT Wed Nov 27 18:29:55 UTC 2019
Added a retry mechanism for the initramfs injection under https://github.com/openshift-kni/performance-addon-operators/pull/104
This can resolve the issue at the cost of an additional reboot.
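
For readers who have not opened the PR, a retry of this kind could look roughly like the sketch below. It is not the code from the PR: the attempts file path, the retry limit, the function names, and the `RPM_OSTREE` override variable are all assumptions made for illustration. The idea is that on each boot the script checks the booted deployment, and if `Initramfs:` is still missing it re-runs the injection and reboots again, up to a bounded number of attempts.

```shell
#!/bin/sh
# Sketch only: path, limit and names are hypothetical, not taken from the PR.
RPM_OSTREE=${RPM_OSTREE:-rpm-ostree}
MAX_ATTEMPTS=${MAX_ATTEMPTS:-3}
ATTEMPTS_FILE=${ATTEMPTS_FILE:-/var/lib/pre-boot-tuning.attempts}
INITRAMFS_FILES="/etc/systemd/system.conf /etc/systemd/system.conf.d/setAffinity.conf"

# $1 = yes|no (Initramfs present in the booted deployment), $2 = attempts so far.
decide_action() {
  if [ "$1" = yes ]; then
    echo ok
  elif [ "$2" -lt "$MAX_ATTEMPTS" ]; then
    echo retry
  else
    echo giveup
  fi
}

retry_initramfs_injection() {
  attempts=$(cat "$ATTEMPTS_FILE" 2>/dev/null)
  attempts=${attempts:-0}
  if "$RPM_OSTREE" status -b -v | grep -q 'Initramfs:'; then
    present=yes
  else
    present=no
  fi
  case "$(decide_action "$present" "$attempts")" in
    ok)
      rm -f "$ATTEMPTS_FILE" ;;
    retry)
      # Persist the counter before rebooting so the limit survives the reboot.
      echo $((attempts + 1)) > "$ATTEMPTS_FILE"
      sync
      "$RPM_OSTREE" initramfs --enable --arg=-I --arg="$INITRAMFS_FILES" -r ;;
    giveup)
      echo "Initramfs still missing after $attempts attempts" >&2
      return 1 ;;
  esac
}
```

Bounding the attempts matters here because an unbounded retry would turn a persistent failure into a reboot loop.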
(In reply to Martin Sivák from comment #7)
> I wonder if this could be related to the other initrd issue we have (file
> not being added): https://bugzilla.redhat.com/show_bug.cgi?id=1806588

Yeah, I'm pretty sure this is a duplicate.

*** This bug has been marked as a duplicate of bug 1806588 ***