Description of problem:
During upgrade CI jobs, the MCO node daemon passes an event to client-go's event recorder, which sends it asynchronously, and then reboots the node shortly afterwards. Sometimes the OSUpdateStaged event is not sent before the MCO node daemon is terminated. An alert depends on this event being sent reliably.

Version-Release number of MCO (Machine Config Operator) (if applicable): 4.11

Platform (AWS, VSphere, Metal, etc.): All

Are you certain that the root cause of the issue being reported is the MCO (Machine Config Operator)? (Y/N/Not sure): Yes

How reproducible:
Reproducible only on upgrade runs, in roughly 1 of every 15 runs, with OVN-K as the network plugin. I have looked up SDN.

Did you catch this issue by running a Jenkins job? If yes, please list:
1. OCP CI

Steps to Reproduce:
1. Run the upgrade job 20 times
2. Alert 'Nodes should reach OSUpdateStaged in a timely fashion' will fire

Actual results:

Expected results:

Additional info:
verified on 4.12.0-0.nightly-2022-07-05-083442

$ oc create -f change-workers-chrony-configuration.yaml
machineconfig.machineconfiguration.openshift.io/change-workers-chrony-configuration created

$ oc get mc/change-workers-chrony-configuration -o yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  creationTimestamp: "2022-07-05T12:21:40Z"
  generation: 1
  labels:
    machineconfiguration.openshift.io/role: worker
  name: change-workers-chrony-configuration
  resourceVersion: "32685"
  uid: 0aaed570-61f3-4fe7-b26f-d2628f942595
spec:
  config:
    ignition:
      config: {}
      security:
        tls: {}
      timeouts: {}
      version: 3.2.0
    networkd: {}
    passwd: {}
    storage:
      files:
      - contents:
          source: data:text/plain;charset=utf-8;base64,cG9vbCAwLnJoZWwucG9vbC5udHAub3JnIGlidXJzdApkcmlmdGZpbGUgL3Zhci9saWIvY2hyb255L2RyaWZ0Cm1ha2VzdGVwIDEuMCAzCnJ0Y3N5bmMKbG9nZGlyIC92YXIvbG9nL2Nocm9ueQo=
        mode: 420
        overwrite: true
        path: /etc/chrony.conf
  osImageURL: ""

check events on updated worker nodes

$ for node in $(oc get node -l node-role.kubernetes.io/worker -o name);do echo;echo $node;oc get events -n default --field-selector involvedObject.name=$(echo $node|awk -F/ '{print $2}'),type!=Warning;done

node/ip-10-0-136-4.us-west-2.compute.internal
LAST SEEN   TYPE   REASON   OBJECT   MESSAGE
83m   Normal   RegisteredNode   node/ip-10-0-136-4.us-west-2.compute.internal   Node ip-10-0-136-4.us-west-2.compute.internal event: Registered Node ip-10-0-136-4.us-west-2.compute.internal in Controller
82m   Normal   NodeDone   node/ip-10-0-136-4.us-west-2.compute.internal   Setting node ip-10-0-136-4.us-west-2.compute.internal, currentConfig rendered-worker-48e0e3dccfd04e5cbe958e278e8ed201 to Done
82m   Normal   Uncordon   node/ip-10-0-136-4.us-west-2.compute.internal   Update completed for config rendered-worker-48e0e3dccfd04e5cbe958e278e8ed201 and node has been uncordoned
82m   Normal   ConfigDriftMonitorStarted   node/ip-10-0-136-4.us-west-2.compute.internal   Config Drift Monitor started, watching against rendered-worker-48e0e3dccfd04e5cbe958e278e8ed201
82m   Normal   Starting   node/ip-10-0-136-4.us-west-2.compute.internal   openshift-sdn done initializing node networking.
80m   Normal   RegisteredNode   node/ip-10-0-136-4.us-west-2.compute.internal   Node ip-10-0-136-4.us-west-2.compute.internal event: Registered Node ip-10-0-136-4.us-west-2.compute.internal in Controller
79m   Normal   RegisteredNode   node/ip-10-0-136-4.us-west-2.compute.internal   Node ip-10-0-136-4.us-west-2.compute.internal event: Registered Node ip-10-0-136-4.us-west-2.compute.internal in Controller
76m   Normal   RegisteredNode   node/ip-10-0-136-4.us-west-2.compute.internal   Node ip-10-0-136-4.us-west-2.compute.internal event: Registered Node ip-10-0-136-4.us-west-2.compute.internal in Controller
73m   Normal   RegisteredNode   node/ip-10-0-136-4.us-west-2.compute.internal   Node ip-10-0-136-4.us-west-2.compute.internal event: Registered Node ip-10-0-136-4.us-west-2.compute.internal in Controller
54m   Normal   ConfigDriftMonitorStopped   node/ip-10-0-136-4.us-west-2.compute.internal   Config Drift Monitor stopped
54m   Normal   Cordon   node/ip-10-0-136-4.us-west-2.compute.internal   Cordoned node to apply update
54m   Normal   Drain   node/ip-10-0-136-4.us-west-2.compute.internal   Draining node to update config.
53m   Normal   NodeNotSchedulable   node/ip-10-0-136-4.us-west-2.compute.internal   Node ip-10-0-136-4.us-west-2.compute.internal status is now: NodeNotSchedulable
52m   Normal   OSUpdateStarted   node/ip-10-0-136-4.us-west-2.compute.internal
52m   Normal   OSUpgradeSkipped   node/ip-10-0-136-4.us-west-2.compute.internal   OS upgrade skipped; new MachineConfig (rendered-worker-85595fd0729ecd6e79b0772b1f4b4a46) has same OS image (quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:04b54950ce296d73746f22b66ff6c5484c37be78cb34aaf352338359112fa241) as old MachineConfig (rendered-worker-48e0e3dccfd04e5cbe958e278e8ed201)
>>52m   Normal   OSUpdateStaged   node/ip-10-0-136-4.us-west-2.compute.internal   Changes to OS staged
52m   Normal   PendingConfig   node/ip-10-0-136-4.us-west-2.compute.internal   Written pending config rendered-worker-85595fd0729ecd6e79b0772b1f4b4a46
>>52m   Normal   Reboot   node/ip-10-0-136-4.us-west-2.compute.internal   Node will reboot into config rendered-worker-85595fd0729ecd6e79b0772b1f4b4a46
52m   Normal   NodeNotReady   node/ip-10-0-136-4.us-west-2.compute.internal   Node ip-10-0-136-4.us-west-2.compute.internal status is now: NodeNotReady
51m   Normal   Starting   node/ip-10-0-136-4.us-west-2.compute.internal   Starting kubelet.
51m   Normal   NodeAllocatableEnforced   node/ip-10-0-136-4.us-west-2.compute.internal   Updated Node Allocatable limit across pods
51m   Normal   NodeHasSufficientMemory   node/ip-10-0-136-4.us-west-2.compute.internal   Node ip-10-0-136-4.us-west-2.compute.internal status is now: NodeHasSufficientMemory
51m   Normal   NodeHasNoDiskPressure   node/ip-10-0-136-4.us-west-2.compute.internal   Node ip-10-0-136-4.us-west-2.compute.internal status is now: NodeHasNoDiskPressure
51m   Normal   NodeHasSufficientPID   node/ip-10-0-136-4.us-west-2.compute.internal   Node ip-10-0-136-4.us-west-2.compute.internal status is now: NodeHasSufficientPID
51m   Normal   NodeReady   node/ip-10-0-136-4.us-west-2.compute.internal   Node ip-10-0-136-4.us-west-2.compute.internal status is now: NodeReady
51m   Normal   NodeNotSchedulable   node/ip-10-0-136-4.us-west-2.compute.internal   Node ip-10-0-136-4.us-west-2.compute.internal status is now: NodeNotSchedulable
51m   Normal   Starting   node/ip-10-0-136-4.us-west-2.compute.internal   openshift-sdn done initializing node networking.
51m   Normal   NodeDone   node/ip-10-0-136-4.us-west-2.compute.internal   Setting node ip-10-0-136-4.us-west-2.compute.internal, currentConfig rendered-worker-85595fd0729ecd6e79b0772b1f4b4a46 to Done
51m   Normal   Uncordon   node/ip-10-0-136-4.us-west-2.compute.internal   Update completed for config rendered-worker-85595fd0729ecd6e79b0772b1f4b4a46 and node has been uncordoned
51m   Normal   ConfigDriftMonitorStarted   node/ip-10-0-136-4.us-west-2.compute.internal   Config Drift Monitor started, watching against rendered-worker-85595fd0729ecd6e79b0772b1f4b4a46
51m   Normal   NodeSchedulable   node/ip-10-0-136-4.us-west-2.compute.internal   Node ip-10-0-136-4.us-west-2.compute.internal status is now: NodeSchedulable

node/ip-10-0-145-10.us-west-2.compute.internal
LAST SEEN   TYPE   REASON   OBJECT   MESSAGE
83m   Normal   RegisteredNode   node/ip-10-0-145-10.us-west-2.compute.internal   Node ip-10-0-145-10.us-west-2.compute.internal event: Registered Node ip-10-0-145-10.us-west-2.compute.internal in Controller
82m   Normal   Starting   node/ip-10-0-145-10.us-west-2.compute.internal   openshift-sdn done initializing node networking.
82m   Normal   NodeDone   node/ip-10-0-145-10.us-west-2.compute.internal   Setting node ip-10-0-145-10.us-west-2.compute.internal, currentConfig rendered-worker-48e0e3dccfd04e5cbe958e278e8ed201 to Done
82m   Normal   Uncordon   node/ip-10-0-145-10.us-west-2.compute.internal   Update completed for config rendered-worker-48e0e3dccfd04e5cbe958e278e8ed201 and node has been uncordoned
82m   Normal   ConfigDriftMonitorStarted   node/ip-10-0-145-10.us-west-2.compute.internal   Config Drift Monitor started, watching against rendered-worker-48e0e3dccfd04e5cbe958e278e8ed201
80m   Normal   RegisteredNode   node/ip-10-0-145-10.us-west-2.compute.internal   Node ip-10-0-145-10.us-west-2.compute.internal event: Registered Node ip-10-0-145-10.us-west-2.compute.internal in Controller
79m   Normal   RegisteredNode   node/ip-10-0-145-10.us-west-2.compute.internal   Node ip-10-0-145-10.us-west-2.compute.internal event: Registered Node ip-10-0-145-10.us-west-2.compute.internal in Controller
76m   Normal   RegisteredNode   node/ip-10-0-145-10.us-west-2.compute.internal   Node ip-10-0-145-10.us-west-2.compute.internal event: Registered Node ip-10-0-145-10.us-west-2.compute.internal in Controller
73m   Normal   RegisteredNode   node/ip-10-0-145-10.us-west-2.compute.internal   Node ip-10-0-145-10.us-west-2.compute.internal event: Registered Node ip-10-0-145-10.us-west-2.compute.internal in Controller
56m   Normal   ConfigDriftMonitorStopped   node/ip-10-0-145-10.us-west-2.compute.internal   Config Drift Monitor stopped
56m   Normal   Cordon   node/ip-10-0-145-10.us-west-2.compute.internal   Cordoned node to apply update
56m   Normal   Drain   node/ip-10-0-145-10.us-west-2.compute.internal   Draining node to update config.
56m   Normal   NodeNotSchedulable   node/ip-10-0-145-10.us-west-2.compute.internal   Node ip-10-0-145-10.us-west-2.compute.internal status is now: NodeNotSchedulable
55m   Normal   OSUpdateStarted   node/ip-10-0-145-10.us-west-2.compute.internal
55m   Normal   OSUpgradeSkipped   node/ip-10-0-145-10.us-west-2.compute.internal   OS upgrade skipped; new MachineConfig (rendered-worker-85595fd0729ecd6e79b0772b1f4b4a46) has same OS image (quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:04b54950ce296d73746f22b66ff6c5484c37be78cb34aaf352338359112fa241) as old MachineConfig (rendered-worker-48e0e3dccfd04e5cbe958e278e8ed201)
>>55m   Normal   OSUpdateStaged   node/ip-10-0-145-10.us-west-2.compute.internal   Changes to OS staged
55m   Normal   PendingConfig   node/ip-10-0-145-10.us-west-2.compute.internal   Written pending config rendered-worker-85595fd0729ecd6e79b0772b1f4b4a46
>>55m   Normal   Reboot   node/ip-10-0-145-10.us-west-2.compute.internal   Node will reboot into config rendered-worker-85595fd0729ecd6e79b0772b1f4b4a46
54m   Normal   NodeNotReady   node/ip-10-0-145-10.us-west-2.compute.internal   Node ip-10-0-145-10.us-west-2.compute.internal status is now: NodeNotReady
54m   Normal   Starting   node/ip-10-0-145-10.us-west-2.compute.internal   Starting kubelet.
54m   Normal   NodeAllocatableEnforced   node/ip-10-0-145-10.us-west-2.compute.internal   Updated Node Allocatable limit across pods
54m   Normal   NodeHasSufficientMemory   node/ip-10-0-145-10.us-west-2.compute.internal   Node ip-10-0-145-10.us-west-2.compute.internal status is now: NodeHasSufficientMemory
54m   Normal   NodeHasNoDiskPressure   node/ip-10-0-145-10.us-west-2.compute.internal   Node ip-10-0-145-10.us-west-2.compute.internal status is now: NodeHasNoDiskPressure
54m   Normal   NodeHasSufficientPID   node/ip-10-0-145-10.us-west-2.compute.internal   Node ip-10-0-145-10.us-west-2.compute.internal status is now: NodeHasSufficientPID
54m   Normal   NodeNotReady   node/ip-10-0-145-10.us-west-2.compute.internal   Node ip-10-0-145-10.us-west-2.compute.internal status is now: NodeNotReady
54m   Normal   NodeNotSchedulable   node/ip-10-0-145-10.us-west-2.compute.internal   Node ip-10-0-145-10.us-west-2.compute.internal status is now: NodeNotSchedulable
54m   Normal   NodeDone   node/ip-10-0-145-10.us-west-2.compute.internal   Setting node ip-10-0-145-10.us-west-2.compute.internal, currentConfig rendered-worker-85595fd0729ecd6e79b0772b1f4b4a46 to Done
54m   Normal   Starting   node/ip-10-0-145-10.us-west-2.compute.internal   openshift-sdn done initializing node networking.
54m   Normal   NodeReady   node/ip-10-0-145-10.us-west-2.compute.internal   Node ip-10-0-145-10.us-west-2.compute.internal status is now: NodeReady
54m   Normal   NodeSchedulable   node/ip-10-0-145-10.us-west-2.compute.internal   Node ip-10-0-145-10.us-west-2.compute.internal status is now: NodeSchedulable
54m   Normal   Uncordon   node/ip-10-0-145-10.us-west-2.compute.internal   Update completed for config rendered-worker-85595fd0729ecd6e79b0772b1f4b4a46 and node has been uncordoned
54m   Normal   ConfigDriftMonitorStarted   node/ip-10-0-145-10.us-west-2.compute.internal   Config Drift Monitor started, watching against rendered-worker-85595fd0729ecd6e79b0772b1f4b4a46

node/ip-10-0-228-131.us-west-2.compute.internal
LAST SEEN   TYPE   REASON   OBJECT   MESSAGE
84m   Normal   Starting   node/ip-10-0-228-131.us-west-2.compute.internal   Starting kubelet.
84m   Normal   NodeHasSufficientMemory   node/ip-10-0-228-131.us-west-2.compute.internal   Node ip-10-0-228-131.us-west-2.compute.internal status is now: NodeHasSufficientMemory
84m   Normal   NodeHasNoDiskPressure   node/ip-10-0-228-131.us-west-2.compute.internal   Node ip-10-0-228-131.us-west-2.compute.internal status is now: NodeHasNoDiskPressure
84m   Normal   NodeHasSufficientPID   node/ip-10-0-228-131.us-west-2.compute.internal   Node ip-10-0-228-131.us-west-2.compute.internal status is now: NodeHasSufficientPID
84m   Normal   NodeAllocatableEnforced   node/ip-10-0-228-131.us-west-2.compute.internal   Updated Node Allocatable limit across pods
84m   Normal   RegisteredNode   node/ip-10-0-228-131.us-west-2.compute.internal   Node ip-10-0-228-131.us-west-2.compute.internal event: Registered Node ip-10-0-228-131.us-west-2.compute.internal in Controller
83m   Normal   NodeDone   node/ip-10-0-228-131.us-west-2.compute.internal   Setting node ip-10-0-228-131.us-west-2.compute.internal, currentConfig rendered-worker-48e0e3dccfd04e5cbe958e278e8ed201 to Done
83m   Normal   Uncordon   node/ip-10-0-228-131.us-west-2.compute.internal   Update completed for config rendered-worker-48e0e3dccfd04e5cbe958e278e8ed201 and node has been uncordoned
83m   Normal   ConfigDriftMonitorStarted   node/ip-10-0-228-131.us-west-2.compute.internal   Config Drift Monitor started, watching against rendered-worker-48e0e3dccfd04e5cbe958e278e8ed201
83m   Normal   Starting   node/ip-10-0-228-131.us-west-2.compute.internal   openshift-sdn done initializing node networking.
83m   Normal   NodeReady   node/ip-10-0-228-131.us-west-2.compute.internal   Node ip-10-0-228-131.us-west-2.compute.internal status is now: NodeReady
83m   Normal   RegisteredNode   node/ip-10-0-228-131.us-west-2.compute.internal   Node ip-10-0-228-131.us-west-2.compute.internal event: Registered Node ip-10-0-228-131.us-west-2.compute.internal in Controller
83m   Normal   RegisteredNode   node/ip-10-0-228-131.us-west-2.compute.internal   Node ip-10-0-228-131.us-west-2.compute.internal event: Registered Node ip-10-0-228-131.us-west-2.compute.internal in Controller
80m   Normal   RegisteredNode   node/ip-10-0-228-131.us-west-2.compute.internal   Node ip-10-0-228-131.us-west-2.compute.internal event: Registered Node ip-10-0-228-131.us-west-2.compute.internal in Controller
79m   Normal   RegisteredNode   node/ip-10-0-228-131.us-west-2.compute.internal   Node ip-10-0-228-131.us-west-2.compute.internal event: Registered Node ip-10-0-228-131.us-west-2.compute.internal in Controller
76m   Normal   RegisteredNode   node/ip-10-0-228-131.us-west-2.compute.internal   Node ip-10-0-228-131.us-west-2.compute.internal event: Registered Node ip-10-0-228-131.us-west-2.compute.internal in Controller
73m   Normal   RegisteredNode   node/ip-10-0-228-131.us-west-2.compute.internal   Node ip-10-0-228-131.us-west-2.compute.internal event: Registered Node ip-10-0-228-131.us-west-2.compute.internal in Controller
51m   Normal   ConfigDriftMonitorStopped   node/ip-10-0-228-131.us-west-2.compute.internal   Config Drift Monitor stopped
51m   Normal   Cordon   node/ip-10-0-228-131.us-west-2.compute.internal   Cordoned node to apply update
51m   Normal   Drain   node/ip-10-0-228-131.us-west-2.compute.internal   Draining node to update config.
51m   Normal   NodeNotSchedulable   node/ip-10-0-228-131.us-west-2.compute.internal   Node ip-10-0-228-131.us-west-2.compute.internal status is now: NodeNotSchedulable
50m   Normal   OSUpdateStarted   node/ip-10-0-228-131.us-west-2.compute.internal
50m   Normal   OSUpgradeSkipped   node/ip-10-0-228-131.us-west-2.compute.internal   OS upgrade skipped; new MachineConfig (rendered-worker-85595fd0729ecd6e79b0772b1f4b4a46) has same OS image (quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:04b54950ce296d73746f22b66ff6c5484c37be78cb34aaf352338359112fa241) as old MachineConfig (rendered-worker-48e0e3dccfd04e5cbe958e278e8ed201)
>>50m   Normal   OSUpdateStaged   node/ip-10-0-228-131.us-west-2.compute.internal   Changes to OS staged
50m   Normal   PendingConfig   node/ip-10-0-228-131.us-west-2.compute.internal   Written pending config rendered-worker-85595fd0729ecd6e79b0772b1f4b4a46
>>50m   Normal   Reboot   node/ip-10-0-228-131.us-west-2.compute.internal   Node will reboot into config rendered-worker-85595fd0729ecd6e79b0772b1f4b4a46
49m   Normal   NodeNotReady   node/ip-10-0-228-131.us-west-2.compute.internal   Node ip-10-0-228-131.us-west-2.compute.internal status is now: NodeNotReady
49m   Normal   Starting   node/ip-10-0-228-131.us-west-2.compute.internal   Starting kubelet.
49m   Normal   NodeAllocatableEnforced   node/ip-10-0-228-131.us-west-2.compute.internal   Updated Node Allocatable limit across pods
49m   Normal   NodeHasSufficientMemory   node/ip-10-0-228-131.us-west-2.compute.internal   Node ip-10-0-228-131.us-west-2.compute.internal status is now: NodeHasSufficientMemory
49m   Normal   NodeHasNoDiskPressure   node/ip-10-0-228-131.us-west-2.compute.internal   Node ip-10-0-228-131.us-west-2.compute.internal status is now: NodeHasNoDiskPressure
49m   Normal   NodeHasSufficientPID   node/ip-10-0-228-131.us-west-2.compute.internal   Node ip-10-0-228-131.us-west-2.compute.internal status is now: NodeHasSufficientPID
49m   Normal   NodeReady   node/ip-10-0-228-131.us-west-2.compute.internal   Node ip-10-0-228-131.us-west-2.compute.internal status is now: NodeReady
49m   Normal   NodeNotSchedulable   node/ip-10-0-228-131.us-west-2.compute.internal   Node ip-10-0-228-131.us-west-2.compute.internal status is now: NodeNotSchedulable
49m   Normal   Starting   node/ip-10-0-228-131.us-west-2.compute.internal   openshift-sdn done initializing node networking.
49m   Normal   NodeDone   node/ip-10-0-228-131.us-west-2.compute.internal   Setting node ip-10-0-228-131.us-west-2.compute.internal, currentConfig rendered-worker-85595fd0729ecd6e79b0772b1f4b4a46 to Done
49m   Normal   Uncordon   node/ip-10-0-228-131.us-west-2.compute.internal   Update completed for config rendered-worker-85595fd0729ecd6e79b0772b1f4b4a46 and node has been uncordoned
49m   Normal   ConfigDriftMonitorStarted   node/ip-10-0-228-131.us-west-2.compute.internal   Config Drift Monitor started, watching against rendered-worker-85595fd0729ecd6e79b0772b1f4b4a46
49m   Normal   NodeSchedulable   node/ip-10-0-228-131.us-west-2.compute.internal   Node ip-10-0-228-131.us-west-2.compute.internal status is now: NodeSchedulable
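
The manual check above (looking for the rows marked `>>`) could be scripted. The following is a rough sketch, assuming the rows are saved as text in the order `oc get events` printed them and that this order is chronological; `reasonOf` and `stagedBeforeReboot` are hypothetical helper names, not part of any existing tool. The sample rows are copied from the output for node ip-10-0-136-4 above.

```go
package main

import (
	"fmt"
	"strings"
)

// reasonOf returns the REASON column (third whitespace-separated field)
// of one `oc get events` row, or "" if the row has fewer than three fields.
// A leading ">>" marker is part of the first field, so it does not shift columns.
func reasonOf(line string) string {
	f := strings.Fields(line)
	if len(f) < 3 {
		return ""
	}
	return f[2]
}

// stagedBeforeReboot reports whether an OSUpdateStaged row appears before
// the first Reboot row. It returns false if there is no Reboot row at all.
func stagedBeforeReboot(lines []string) bool {
	staged := false
	for _, l := range lines {
		switch reasonOf(l) {
		case "OSUpdateStaged":
			staged = true
		case "Reboot":
			return staged
		}
	}
	return false
}

func main() {
	// Rows taken from the verification output for one worker node.
	rows := []string{
		"52m Normal OSUpdateStarted node/ip-10-0-136-4.us-west-2.compute.internal",
		">>52m Normal OSUpdateStaged node/ip-10-0-136-4.us-west-2.compute.internal Changes to OS staged",
		"52m Normal PendingConfig node/ip-10-0-136-4.us-west-2.compute.internal Written pending config rendered-worker-85595fd0729ecd6e79b0772b1f4b4a46",
		">>52m Normal Reboot node/ip-10-0-136-4.us-west-2.compute.internal Node will reboot into config rendered-worker-85595fd0729ecd6e79b0772b1f4b4a46",
	}
	fmt.Println("OSUpdateStaged recorded before Reboot:", stagedBeforeReboot(rows))
}
```

Because LAST SEEN has only minute resolution, this check relies on row order rather than timestamps.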
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:7399