Bug 1916169

Summary: A reboot while the MCO is applying changes leaves the node in an undesirable state and the MCP looks fine (UPDATED=true)

Product: OpenShift Container Platform
Component: Machine Config Operator
Machine Config Operator sub component: Machine Config Operator
Version: 4.7
Target Release: 4.10.0
Status: CLOSED ERRATA
Severity: high
Priority: high
Keywords: Reopened
Reporter: Sabina Aledort <saledort>
Assignee: mkenigsb
QA Contact: Rio Liu <rioliu>
Flags: mkenigsb: needinfo- (set twice)
CC: alukiano, aos-bugs, bbreard, dblack, dollierp, fromani, imcleod, jlebon, jligon, keyoung, markmc, mcornea, miabbott, mkarg, mkenigsb, mkrejci, nstielau, rioliu, sbkirova, skumari, smilner, tmulquee, vgrinber, walters, william.caban, ykashtan
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Doc Text:
* Before this update, the Machine Config Operator (MCO) stored pending configuration changes to disk before it applied them to {op-system-first}. If a power loss interrupted the MCO while it was applying the configuration, the MCO treated the configuration as applied and did not validate the changes. If the configuration contained invalid changes, applying them failed. With this update, the MCO saves a configuration to disk only after applying it. This way, if power is lost while the MCO is applying the configuration, it reapplies the configuration after it restarts. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1916169[BZ#1916169])
Type: Bug
Last Closed: 2022-03-10 16:02:37 UTC
Description
Sabina Aledort
2021-01-14 11:43:49 UTC
I looked at the machine-config-daemon log, and I see that for some reason the MCD decided that the node has the desired config when in reality it did not apply the kernel arguments at all.

```
I0114 05:10:26.414913 7452 update.go:1852] Starting update from rendered-worker-duprofile-fddde77e6dd45d93e36ecf425f9f2e3a to rendered-worker-duprofile-0469ef4eb76db5f6c67f60132d80d452: &{osUpdate:false kargs:true fips:false passwd:false files:true units:true kernelType:true extensions:false} <- start of the update
...
A lot of errors regarding "network is unreachable"
...
W0114 05:13:40.994793 7452 run.go:44] nice failed: running nice -- ionice -c 3 podman pull -q --authfile /var/lib/kubelet/config.json quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:96c41d0f9f751623c8dc2d1c02a944642a93b80f4aa80c364218c97b8bfb3c14 failed: Error: unable to pull quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:96c41d0f9f751623c8dc2d1c02a944642a93b80f4aa80c364218c97b8bfb3c14: Error initializing source docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:96c41d0f9f751623c8dc2d1c02a944642a93b80f4aa80c364218c97b8bfb3c14: (Mirrors also failed: [dhcp19-17-175.clus2.t5g.lab.eng.bos.redhat.com:5000/ocp4/openshift4@sha256:96c41d0f9f751623c8dc2d1c02a944642a93b80f4aa80c364218c97b8bfb3c14: error pinging docker registry dhcp19-17-175.clus2.t5g.lab.eng.bos.redhat.com:5000: Get "https://dhcp19-17-175.clus2.t5g.lab.eng.bos.redhat.com:5000/v2/": dial tcp: lookup dhcp19-17-175.clus2.t5g.lab.eng.bos.redhat.com on 10.11.5.19:53: dial udp 10.11.5.19:53: connect: network is unreachable]): quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:96c41d0f9f751623c8dc2d1c02a944642a93b80f4aa80c364218c97b8bfb3c14: error pinging docker registry quay.io: Get "https://quay.io/v2/": dial tcp: lookup quay.io on 10.11.5.19:53: dial udp 10.11.5.19:53: connect: network is unreachable : exit status 125; retrying...
...
I0114 05:18:30.769401 10199 start.go:108] Version: v4.7.0-202101130853.p0-dirty (69ac8b941b0f29d3cfdfced35aded406d75bc84a) <- last error, then a jump back to startup
...
I0114 05:18:32.172888 10199 daemon.go:401] Node cnfdd7.clus2.t5g.lab.eng.bos.redhat.com is not labeled node-role.kubernetes.io/master
I0114 05:18:32.177594 10199 daemon.go:816] Current config: rendered-worker-duprofile-fddde77e6dd45d93e36ecf425f9f2e3a
I0114 05:18:32.177604 10199 daemon.go:817] Desired config: rendered-worker-duprofile-0469ef4eb76db5f6c67f60132d80d452
I0114 05:18:32.184604 10199 update.go:1852] Disk currentConfig rendered-worker-duprofile-0469ef4eb76db5f6c67f60132d80d452 overrides node's currentConfig annotation rendered-worker-duprofile-fddde77e6dd45d93e36ecf425f9f2e3a
I0114 05:18:32.186708 10199 daemon.go:1102] Validating against current config rendered-worker-duprofile-0469ef4eb76db5f6c67f60132d80d452
I0114 05:18:32.206530 10199 daemon.go:1110] Validated on-disk state
I0114 05:18:32.206541 10199 daemon.go:1181] In desired config rendered-worker-duprofile-0469ef4eb76db5f6c67f60132d80d452 <- the MCD decided that the machine has the desired state and updated the current machine config, when in reality it never updated the kernel arguments
```

On a little further analysis, this issue appears specifically in our automation of the DU profile, which is using the PAO. We are not seeing this issue in other automated test cases, so we may need to use our environment to correct the issue.

Can you please attach a must-gather? I *think* I know the problem, but would need more information.
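To make the suspected mechanism concrete, here is a minimal, hypothetical Go sketch of the write-before-apply ordering visible in the log above. The names `applyUpdate`, `validateOnDiskState`, and the state file path are illustrative stand-ins, not the MCD's actual code:

```
package main

import (
	"fmt"
	"os"
)

// stateFile stands in for the MCD's on-disk currentConfig record.
const stateFile = "/tmp/currentconfig"

// applyUpdate models the pre-fix ordering: the pending config is persisted
// to disk *before* the OS changes (kargs, kernel-rt, ...) are applied. If
// the node reboots between the two steps, the on-disk state already claims
// the new config even though nothing was applied.
func applyUpdate(desired string, applyOS func() error) error {
	if err := os.WriteFile(stateFile, []byte(desired), 0o644); err != nil {
		return err
	}
	// <-- a power loss or hard reset here leaves stateFile == desired
	return applyOS()
}

// validateOnDiskState models the post-reboot check seen in the log: the disk
// currentConfig overrides the node's currentConfig annotation, so the daemon
// concludes it is "In desired config".
func validateOnDiskState(desired string) bool {
	b, err := os.ReadFile(stateFile)
	return err == nil && string(b) == desired
}

func main() {
	// Simulate the interruption: the OS update never completes.
	_ = applyUpdate("rendered-worker-duprofile-0469ef4e", func() error {
		return fmt.Errorf("node rebooted mid-apply")
	})
	// After the "reboot", validation trusts the on-disk config anyway.
	fmt.Println(validateOnDiskState("rendered-worker-duprofile-0469ef4e")) // true
}
```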
Ben,

The CI run is here: https://master-jenkins-csb-cnfqe.cloud.paas.psi.redhat.com/view/CNF%20Dev/job/ran-tests/44/

A must-gather is here: https://master-jenkins-csb-cnfqe.cloud.paas.psi.redhat.com/view/CNF%20Dev/job/ran-tests/44/artifact/must-gather.tar.gz

/KenY

It seems that the original problem is what Artyom mentioned in comment#1. Summarizing the problem after looking at the must-gather log and the log of the machine-config-daemon-gk49g pod running on node cnfdd7.clus2.t5g.lab.eng.bos.redhat.com:

- The worker was initially running fine, and then the image registry and pull-secret were updated. <--- Applied successfully

```
2021-01-14T02:51:04.726315851Z I0114 02:51:04.726275 7452 update.go:1852] Starting update from rendered-worker-11d9e183829cdce30de4b92da89fe047 to rendered-worker-fddde77e6dd45d93e36ecf425f9f2e3a: &{osUpdate:false kargs:false fips:false passwd:false files:true units:false kernelType:false extensions:false}
2021-01-14T02:51:04.759054365Z I0114 02:51:04.759025 7452 update.go:512] File diff: detected change to /etc/containers/registries.conf
2021-01-14T02:51:04.759121811Z I0114 02:51:04.759113 7452 update.go:512] File diff: detected change to /var/lib/kubelet/config.json
...
```

- A new MachineConfigPool, worker-duprofile, was created, and the node cnfdd7.clus2.t5g.lab.eng.bos.redhat.com was added to that pool. <--- Applied successfully

- MachineConfig 50-nto-worker-duprofile.yaml was applied, which adds kargs, files, and units and switches to the RT kernel. <---- Problem started here

```
2021-01-14T05:10:26.361347085Z I0114 05:10:26.361303 7452 update.go:596] Checking Reconcilable for config rendered-worker-duprofile-fddde77e6dd45d93e36ecf425f9f2e3a to rendered-worker-duprofile-0469ef4eb76db5f6c67f60132d80d452
2021-01-14T05:10:26.414948492Z I0114 05:10:26.414913 7452 update.go:1852] Starting update from rendered-worker-duprofile-fddde77e6dd45d93e36ecf425f9f2e3a to rendered-worker-duprofile-0469ef4eb76db5f6c67f60132d80d452: &{osUpdate:false kargs:true fips:false passwd:false files:true units:true kernelType:true extensions:false}
2021-01-14T05:10:26.442420037Z I0114 05:10:26.442397 7452 update.go:1852] Update prepared; beginning drain
...
2021-01-14T05:12:10.153351159Z I0114 05:12:10.153328 7452 run.go:18] Running: nice -- ionice -c 3 oc image extract --path /:/run/mco-machine-os-content/os-content-662841183 --registry-config /var/lib/kubelet/config.json quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:96c41d0f9f751623c8dc2d1c02a944642a93b80f4aa80c364218c97b8bfb3c14
2021-01-14T05:12:10.379123537Z error: unable to connect to image repository quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:96c41d0f9f751623c8dc2d1c02a944642a93b80f4aa80c364218c97b8bfb3c14: Get "https://quay.io/v2/": dial tcp: lookup quay.io on 10.11.5.19:53: dial udp 10.11.5.19:53: connect: network is unreachable
2021-01-14T05:12:10.382602694Z W0114 05:12:10.382581 7452 run.go:44] nice failed: running nice -- ionice -c 3 oc image extract --path /:/run/mco-machine-os-content/os-content-662841183 --registry-config /var/lib/kubelet/config.json quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:96c41d0f9f751623c8dc2d1c02a944642a93b80f4aa80c364218c97b8bfb3c14 failed: error: unable to connect to image repository quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:96c41d0f9f751623c8dc2d1c02a944642a93b80f4aa80c364218c97b8bfb3c14: Get "https://quay.io/v2/": dial tcp: lookup quay.io on 10.11.5.19:53: dial udp 10.11.5.19:53: connect: network is unreachable
2021-01-14T05:12:10.382602694Z : exit status 1; retrying...
...
```
```
2021-01-14T05:12:25.537532667Z I0114 05:12:25.537525 7452 update.go:371] Falling back to using podman cp to fetch OS image content
2021-01-14T05:12:25.537559798Z I0114 05:12:25.537551 7452 run.go:18] Running: nice -- ionice -c 3 podman pull -q --authfile /var/lib/kubelet/config.json quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:96c41d0f9f751623c8dc2d1c02a944642a93b80f4aa80c364218c97b8bfb3c14
2021-01-14T05:12:25.627480952Z Error: unable to pull quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:96c41d0f9f751623c8dc2d1c02a944642a93b80f4aa80c364218c97b8bfb3c14: Error initializing source docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:96c41d0f9f751623c8dc2d1c02a944642a93b80f4aa80c364218c97b8bfb3c14: (Mirrors also failed: [dhcp19-17-175.clus2.t5g.lab.eng.bos.redhat.com:5000/ocp4/openshift4@sha256:96c41d0f9f751623c8dc2d1c02a944642a93b80f4aa80c364218c97b8bfb3c14: error pinging docker registry dhcp19-17-175.clus2.t5g.lab.eng.bos.redhat.com:5000: Get "https://dhcp19-17-175.clus2.t5g.lab.eng.bos.redhat.com:5000/v2/": dial tcp: lookup dhcp19-17-175.clus2.t5g.lab.eng.bos.redhat.com on 10.11.5.19:53: dial udp 10.11.5.19:53: connect: network is unreachable]): quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:96c41d0f9f751623c8dc2d1c02a944642a93b80f4aa80c364218c97b8bfb3c14: error pinging docker registry quay.io: Get "https://quay.io/v2/": dial tcp: lookup quay.io on 10.11.5.19:53: dial udp 10.11.5.19:53: connect: network is unreachable
...
```

Due to the network being unreachable, the MCO could not download the machine-os-content that it uses later on to switch to the RT kernel. This looks to me like either the pull-secret config change or the image registry change might have caused the network to become unreachable. @Sabina can you please check?

To add to the issue, we are seeing this error message:

```
I0113 03:29:17.566348 9261 update.go:445] Rolling back applied changes to OS due to error: error running rpm-ostree kargs --delete=skew_tick=1 --delete=nohz=on --delete=rcu_nocbs=1,3,5,7,9-51 --delete=tuned.non_isolcpus=000000ff,ffffffff,fff00000,00000155 --delete=intel_pstate=disable --delete=nosoftlockup --delete=tsc=nowatchdog --delete=intel_iommu=on --delete=iommu=pt --delete=isolcpus=managed_irq,1,3,5,7,9-51 --delete=systemd.cpu_affinity=0,2,4,6,8,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103 --delete=default_hugepagesz=1G --delete=nosmt --append=skew_tick=1 --append=nohz=on --append=rcu_nocbs=1,3,5,7,9-51 --append=tuned.non_isolcpus=000000ff,ffffffff,fff00000,00000155 --append=intel_pstate=disable --append=nosoftlockup --append=tsc=nowatchdog --append=intel_iommu=on --append=iommu=pt --append=isolcpus=managed_irq,1,3,5,7,9-51 --append=systemd.cpu_affinity=0,2,4,6,8,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103 --append=default_hugepagesz=1G --append=nosmt: error: No key 'skew_tick' found : exit status 1
```

We are seeing this error because the MCO keeps retrying to move to the desired config. Note from comment#7 that the MCO (rpm-ostree) never successfully applied any kargs, and hence during recovery `rpm-ostree kargs --delete=skew_tick=1 ...` failed, because rpm-ostree had never applied those kargs in the first place. This is happening because the MCO today does not check for a kargs diff between the old and new MachineConfig before updating kargs.
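Conceptually, the guard that avoids this rollback failure is a kargs diff check before invoking rpm-ostree. A hedged Go sketch of that idea (function names and the `run` callback are illustrative, not the actual MCO change, which landed later in the linked PR):

```
package main

import (
	"fmt"
	"reflect"
)

// updateKernelArguments shells out to rpm-ostree only when the old and new
// rendered configs actually disagree on kernel arguments. Without the guard,
// a retry after a failed update tries to --delete kargs that rpm-ostree
// never applied, which fails with "No key 'skew_tick' found".
func updateKernelArguments(oldKargs, newKargs []string, run func(args ...string) error) error {
	if reflect.DeepEqual(oldKargs, newKargs) {
		return nil // no kargs diff: skip rpm-ostree entirely
	}
	args := []string{"kargs"}
	for _, k := range oldKargs {
		args = append(args, "--delete="+k)
	}
	for _, k := range newKargs {
		args = append(args, "--append="+k)
	}
	return run(args...) // e.g. exec.Command("rpm-ostree", args...).Run()
}

func main() {
	invoked := false
	run := func(args ...string) error { invoked = true; return nil }
	// Identical kargs on both sides: rpm-ostree must not be invoked.
	_ = updateKernelArguments([]string{"nosmt"}, []string{"nosmt"}, run)
	fmt.Println("rpm-ostree invoked:", invoked) // false
}
```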
We will fix this, but please note that the actual cause of the failure lies in comment#7.

I was able to pull the image manually on the affected node.

```
[root@cnfdd5-installer ~]# oc get node
NAME                                             STATUS                     ROLES                     AGE   VERSION
cnfdd5.clus2.t5g.lab.eng.bos.redhat.com          Ready                      worker,worker-cnf         16h   v1.20.0+d9c52cc
cnfdd6.clus2.t5g.lab.eng.bos.redhat.com          Ready                      worker,worker-cnf         16h   v1.20.0+d9c52cc
cnfdd7.clus2.t5g.lab.eng.bos.redhat.com          Ready,SchedulingDisabled   worker,worker-duprofile   16h   v1.20.0+d9c52cc
dhcp19-17-115.clus2.t5g.lab.eng.bos.redhat.com   Ready                      master,virtual            17h   v1.20.0+d9c52cc
dhcp19-17-116.clus2.t5g.lab.eng.bos.redhat.com   Ready                      master,virtual            17h   v1.20.0+d9c52cc
dhcp19-17-117.clus2.t5g.lab.eng.bos.redhat.com   Ready                      master,virtual            17h   v1.20.0+d9c52cc

[root@cnfdd5-installer ~]# ssh core.t5g.lab.eng.bos.redhat.com
Red Hat Enterprise Linux CoreOS 47.83.202101171239-0
  Part of OpenShift 4.7, RHCOS is a Kubernetes native operating system
  managed by the Machine Config Operator (`clusteroperator/machine-config`).

WARNING: Direct SSH access to machines is not recommended; instead,
make configuration changes via `machineconfig` objects:
  https://docs.openshift.com/container-platform/4.7/architecture/architecture-rhcos.html

---
Last login: Wed Jan 20 16:49:49 2021 from 10.19.17.175
[core@cnfdd7 ~]$ sudo -i
[root@cnfdd7 ~]# podman pull -q --authfile /var/lib/kubelet/config.json quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:96c41d0f9f751623c8dc2d1c02a944642a93b80f4aa80c364218c97b8bfb3c14
a117ce8f9053528c60b17c8cafcf16aaa5dcb71e7187b00dafdb3ccca39b91c6
[root@cnfdd7 ~]#
```

```
[root@cnfdd5-installer ~]# oc describe node cnfdd7.clus2.t5g.lab.eng.bos.redhat.com
Name:               cnfdd7.clus2.t5g.lab.eng.bos.redhat.com
Roles:              worker,worker-duprofile
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=cnfdd7.clus2.t5g.lab.eng.bos.redhat.com
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/worker=
                    node-role.kubernetes.io/worker-duprofile=
                    node.openshift.io/os_id=rhcos
                    ptp/slave=
Annotations:        k8s.ovn.org/l3-gateway-config: {"default":{"mode":"shared","interface-id":"br-ex_cnfdd7.clus2.t5g.lab.eng.bos.redhat.com","mac-address":"0c:42:a1:55:e4:8e","ip-addresses...
                    k8s.ovn.org/node-chassis-id: aca6e191-5798-4313-ab86-45a77cade266
                    k8s.ovn.org/node-local-nat-ip: {"default":["169.254.10.219"]}
                    k8s.ovn.org/node-mgmt-port-mac-address: ca:3e:43:82:4f:3f
                    k8s.ovn.org/node-primary-ifaddr: {"ipv4":"10.19.16.104/23","ipv6":"2620:52:0:1310:5c4f:1483:696b:da9b/64"}
                    k8s.ovn.org/node-subnets: {"default":"10.133.2.0/23"}
                    machine.openshift.io/machine: openshift-machine-api/cnfdd5-9v4v4-worker-0-tnqz4
                    machineconfiguration.openshift.io/currentConfig: rendered-worker-duprofile-ca50fb1e10457688976170bcb20389b8
                    machineconfiguration.openshift.io/desiredConfig: rendered-worker-duprofile-e5cca08848ad42fb183fccdb4b27420d
                    machineconfiguration.openshift.io/reason: error running rpm-ostree kargs --delete=skew_tick=1 --delete=nohz=on --delete=rcu_nocbs=1,3,5,7,9-51 --delete=tuned.non_isolcpus=000000ff,...
                      : exit status 1
                    machineconfiguration.openshift.io/state: Degraded
                    sriovnetwork.openshift.io/state: Idle
                    volumes.kubernetes.io/controller-managed-attach-detach: true
```

Maybe it's because the image extract command doesn't wait for NetworkManager to finish?
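On the "doesn't wait for NetworkManager" hypothesis: one way a client can sidestep that race is to poll for registry reachability before the first pull attempt. A speculative Go sketch only, assuming a simple TCP probe is an acceptable readiness signal; this is not what the MCD actually does, and the host and timeouts are arbitrary:

```
package main

import (
	"fmt"
	"net"
	"time"
)

// waitForRegistry polls a TCP dial to the registry until it succeeds or the
// deadline passes, instead of letting the first pull race DNS/NM setup.
func waitForRegistry(addr string, deadline time.Duration) error {
	stop := time.Now().Add(deadline)
	for {
		conn, err := net.DialTimeout("tcp", addr, 5*time.Second)
		if err == nil {
			conn.Close()
			return nil
		}
		if time.Now().After(stop) {
			return fmt.Errorf("registry %s unreachable: %w", addr, err)
		}
		time.Sleep(2 * time.Second)
	}
}

func main() {
	if err := waitForRegistry("quay.io:443", 30*time.Second); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("network up; safe to start the image pull")
}
```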
```
I0113 03:15:58.010936 8404 update.go:1470] Writing systemd unit dropin "10-ovs-vswitchd-restart.conf"
I0113 03:15:58.249746 8404 update.go:1459] Preset systemd unit ovs-vswitchd.service
I0113 03:15:58.249762 8404 update.go:1470] Writing systemd unit dropin "10-ovsdb-restart.conf"
I0113 03:15:58.250067 8404 update.go:1470] Writing systemd unit dropin "10-mco-default-env.conf"
I0113 03:15:58.255736 8404 update.go:1542] Could not reset unit preset for pivot.service, skipping. (Error msg: error running preset on unit: Failed to preset unit: Unit file pivot.service does not exist. )
I0113 03:15:58.255748 8404 update.go:1470] Writing systemd unit dropin "mco-disabled.conf"
I0113 03:15:58.260998 8404 update.go:1542] Could not reset unit preset for zincati.service, skipping. (Error msg: error running preset on unit: Failed to preset unit: Unit file zincati.service does not exist. )
I0113 03:15:58.261010 8404 update.go:1505] Writing systemd unit "stalld.service"
I0113 03:15:58.261196 8404 update.go:1505] Writing systemd unit "update-rps@.service"
I0113 03:15:58.498474 8404 update.go:1459] Preset systemd unit update-rps@.service
I0113 03:15:58.498495 8404 update.go:1505] Writing systemd unit "hugepages-allocation-1048576kB-NUMA0.service"
I0113 03:15:58.747437 8404 update.go:1437] Enabled systemd units: [etc-NetworkManager-system\x2dconnections\x2dmerged.mount kubelet.service machine-config-daemon-firstboot.service machine-config-daemon-pull.service node-valid-hostname.service nodeip-configuration.service openvswitch.service ovs-configuration.service ovsdb-server.service stalld.service hugepages-allocation-1048576kB-NUMA0.service]
I0113 03:15:58.747461 8404 update.go:1288] Deleting stale data
I0113 03:15:58.772154 8404 update.go:1683] Writing SSHKeys at "/home/core/.ssh/authorized_keys"
I0113 03:15:58.798307 8404 run.go:18] Running: nice -- ionice -c 3 oc image extract --path /:/run/mco-machine-os-content/os-content-893969747 --registry-config /var/lib/kubelet/config.json quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:41b5ef878bdd94890a904535618abb95e190f011a0f3bb21b3f9c36baa74c288
I0113 03:16:06.563475 8404 daemon.go:600] Got SIGTERM, but actively updating
W0113 03:16:06.570524 8404 run.go:44] nice failed: running nice -- ionice -c 3 oc image extract --path /:/run/mco-machine-os-content/os-content-893969747 --registry-config /var/lib/kubelet/config.json quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:41b5ef878bdd94890a904535618abb95e190f011a0f3bb21b3f9c36baa74c288 failed: : signal: terminated; retrying...
W0113 03:16:10.810028 8404 daemon.go:645] Got an error from auxiliary tools: kubelet health check has failed 1 times: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused
I0113 03:16:11.570645 8404 run.go:18] Running: nice -- ionice -c 3 oc image extract --path /:/run/mco-machine-os-content/os-content-893969747 --registry-config /var/lib/kubelet/config.json quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:41b5ef878bdd94890a904535618abb95e190f011a0f3bb21b3f9c36baa74c288
error: unable to connect to image repository quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:41b5ef878bdd94890a904535618abb95e190f011a0f3bb21b3f9c36baa74c288: Get "https://quay.io/v2/": dial tcp: lookup quay.io on 10.11.5.19:53: dial udp 10.11.5.19:53: connect: network is unreachable
```

When we extract the OSContainer image, RHCOS is fully running along with the basic systemd services that are part of boot.
The NM-related logs that you are seeing in machine-config-daemon:

```
I0113 03:15:58.747437 8404 update.go:1437] Enabled systemd units: [etc-NetworkManager-system\x2dconnections\x2dmerged.mount kubelet.service machine-config-daemon-firstboot.service machine-config-daemon-pull.service node-valid-hostname.service nodeip-configuration.service openvswitch.service ovs-configuration.service ovsdb-server.service stalld.service hugepages-allocation-1048576kB-NUMA0.service]
```

This basically just makes sure that the NM service will start when the node reboots; it doesn't impact the active boot.

Adding here an offline discussion that I had with Sabina: Sabina provided me access to a cluster where the next CI run occurred. Looking at the machine-config-daemon pod logs, I can confirm that the MCO was able to successfully switch to the RT kernel on the node in the worker-duprofile pool. Adding the relevant logs here:

```
$ oc logs -f -n openshift-machine-config-operator -c machine-config-daemon machine-config-daemon-2rr6m
...
I0121 12:14:50.270445 6850 update.go:1852] Starting update from rendered-worker-243559e29aa74a48d04e1cb5430ec055 to rendered-worker-duprofile-131d78f004c218a04fca898f5c2d5d99: &{osUpdate:false kargs:true fips:false passwd:false files:true units:true kernelType:true extensions:false}
...
I0121 12:14:52.477586 6850 run.go:18] Running: nice -- ionice -c 3 oc image extract --path /:/run/mco-machine-os-content/os-content-129809413 --registry-config /var/lib/kubelet/config.json quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:8672a3ce64788e4138bec0d2cafe701d96b55c35482314d393b52dd414e635a4
I0121 12:15:05.593487 6850 update.go:1852] Running rpm-ostree [kargs --append=skew_tick=1 --append=nohz=on --append=rcu_nocbs=1,3,5,7,9-51 --append=tuned.non_isolcpus=00000155 --append=intel_pstate=disable --append=nosoftlockup --append=tsc=nowatchdog --append=intel_iommu=on --append=iommu=pt --append=isolcpus=managed_irq,1,3,5,7,9-51 --append=systemd.cpu_affinity=0,2,4,6,8 --append=default_hugepagesz=1G --append=nosmt]
I0121 12:15:05.595787 6850 rpm-ostree.go:261] Running captured: rpm-ostree kargs --append=skew_tick=1 --append=nohz=on --append=rcu_nocbs=1,3,5,7,9-51 --append=tuned.non_isolcpus=00000155 --append=intel_pstate=disable --append=nosoftlockup --append=tsc=nowatchdog --append=intel_iommu=on --append=iommu=pt --append=isolcpus=managed_irq,1,3,5,7,9-51 --append=systemd.cpu_affinity=0,2,4,6,8 --append=default_hugepagesz=1G --append=nosmt
I0121 12:15:08.299351 6850 update.go:1852] Initiating switch from kernel default to realtime
I0121 12:15:08.301828 6850 update.go:1852] Switching to kernelType=realtime, invoking rpm-ostree ["override" "remove" "kernel" "kernel-core" "kernel-modules" "kernel-modules-extra" "--install" "kernel-rt-core" "--install" "kernel-rt-modules" "--install" "kernel-rt-modules-extra" "--install" "kernel-rt-kvm"]
I0121 12:15:08.303455 6850 rpm-ostree.go:261] Running captured: rpm-ostree override remove kernel kernel-core kernel-modules kernel-modules-extra --install kernel-rt-core --install kernel-rt-modules --install kernel-rt-modules-extra --install kernel-rt-kvm
I0121 12:16:07.246397 6850 update.go:1852] Rebooting node
I0121 12:16:07.248542 6850 update.go:1852] initiating reboot: Node will reboot into config rendered-worker-duprofile-131d78f004c218a04fca898f5c2d5d99
...
```

Also, I have fixed in the MCO the problem that surfaced in comment#8: https://github.com/openshift/machine-config-operator/pull/2349
Now the MCO will not reapply kargs when it doesn't detect any change in kargs.

I think the problems here all surface from that kernel switch. We have multiple open BZs on kernel-rt that manifest on that node, such as https://bugzilla.redhat.com/show_bug.cgi?id=1886109 (note that it is verified and attached to a RHEL errata, but hasn't hit OCP yet), and I think these can be the root cause of the network problems.

Here's a nice and simple reproducer:

```
apiVersion: v1
kind: Namespace
metadata:
  labels:
    openshift.io/run-level: "1"
  name: openshift-sriov-network-operator
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: sriov-network-operators
  namespace: openshift-sriov-network-operator
spec:
  targetNamespaces:
  - openshift-sriov-network-operator
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: sriov-network-operator-subscription
  namespace: openshift-sriov-network-operator
spec:
  channel: "4.6"
  name: sriov-network-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: policy-mh-dpdk-site-1-fqdn-worker1
  namespace: openshift-sriov-network-operator
spec:
  deviceType: vfio-pci
  isRdma: false
  nicSelector:
    pfNames:
    - ens7f1
  nodeSelector:
    node-role.kubernetes.io/worker-duprofile: ""
  numVfs: 4
  priority: 10
  resourceName: mh_u_site_1_fqdn_worker1
---
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker-duprofile
spec:
  machineConfigSelector:
    matchExpressions:
    - key: machineconfiguration.openshift.io/role
      operator: In
      values:
      - worker-duprofile
      - worker
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker-duprofile: ""
---
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: reproducer-example
  labels:
    machineconfiguration.openshift.io/role: worker-duprofile
spec:
  config:
    ignition:
      config: {}
      security:
        tls: {}
      timeouts: {}
      version: 2.2.0
    networkd: {}
    passwd: {}
    storage: {}
  fips: false
  kernelArguments:
  - nosmt
  kernelType: realtime
  osImageURL: ""
```

To use it: this needs a node with an Intel SR-IOV capable NIC. Make sure to update the SriovNetworkNodePolicy with that NIC name, then:

1. oc apply -f reproducer.yaml  # it is expected to fail on missing CRDs
2. Wait for the cluster to settle and sriov-network-operator to become operational.
3. Apply worker-duprofile to the node.
4. oc apply -f reproducer.yaml  # again, to apply the missing CRs
5. You can inspect the sriov daemon and machine-config-daemon on that node to see what is happening.

And that is it.

Yeah, Vitaly is right! An easier, even simpler reproducer would be without SR-IOV, just:

1. Create the MCP.
2. Label the node.
3. Apply the MC above (or similar, preferably something that will take long enough to run).
4. Reboot the node externally while that is applying.

Ben, I believe this issue should be documented in the 4.7 Release Notes. I am thinking something like this:

Cause: If an asynchronous reboot, such as a power interruption, coincides with changes in the Machine Config policy, the node can be left in an undetermined state. The Machine Config Operator believes that the updated policy has been applied when it actually has not.

Consequence: The node is left in an indeterminate state.

Workaround (if any): If the reboot is caused by applying MCO configuration as well as other configuration that requires a restart of the server, this can be avoided. First apply all MCO configuration and wait for the nodes to settle. Then apply the secondary configuration.
There is no known workaround for power interruptions.

Result: If the configuration of the MCO and the other configuration (such as SR-IOV on an Intel NIC) is completed sequentially, the node will provision correctly.

What do you think? By the way, the deadline for identifying bugs for the Release Notes is close of business tomorrow.

Regards, Ken Y

*** Bug 1928266 has been marked as a duplicate of this bug. ***

This shouldn't be happening now, as this was fixed as part of SR-IOV operator PR https://github.com/k8snetworkplumbingwg/sriov-network-operator/pull/93. The SR-IOV operator now coordinates with the MCO before making changes to a node. See bz#1921321

@skumari I tend to disagree. This has nothing to do with SRIOV, as the reboot can simply be the result of a power loss (or any other kind of interruption). The MCD should be resilient to these kinds of failures on its own.

Thanks, Yuval, for pointing out the power-outage situation; I missed some of the comments while going through them during bug refinement. I tried to reproduce the problem by applying the RT kernel MachineConfig and rebooting the node in the middle of the kernel-switching operation:

1. Couldn't reproduce this exact bug when the reboot was initiated manually by the user via rsh/ssh into the node.
2. Can reproduce the problem with a hard reset of the node externally (on GCP, a reset from the GCE UI).

Reopening this bug; we will try to fix it when we have some free cycles.

Created attachment 1853109 [details]
reproducer script
Added a script to reproduce. The final command should output a kernel version containing "rt", but it does not.
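The script itself is an attachment, but the verification step it describes can be sketched. A hypothetical Go equivalent of checking `uname -r` for an rt kernel (Linux-only; this is not the attached script):

```
package main

import (
	"fmt"
	"strings"
	"syscall"
)

// kernelRelease is the Go equivalent of `uname -r` (Linux-only).
func kernelRelease() (string, error) {
	var uts syscall.Utsname
	if err := syscall.Uname(&uts); err != nil {
		return "", err
	}
	out := make([]byte, 0, len(uts.Release))
	for _, c := range uts.Release {
		if c == 0 {
			break
		}
		out = append(out, byte(c))
	}
	return string(out), nil
}

func main() {
	rel, err := kernelRelease()
	if err != nil {
		fmt.Println("uname failed:", err)
		return
	}
	// On a node that really switched to kernel-rt, the release string
	// contains "rt"; after hitting this bug, it does not.
	fmt.Printf("kernel %s, realtime: %v\n", rel, strings.Contains(rel, "rt"))
}
```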
Final text signed off by @mkenigsb:

Previously, the MCO stored pending configuration changes to disk before OS changes were applied. As a result, in situations such as power loss, the MCO assumed on restart that the OS changes had already been applied, and validation skipped over changes such as kargs and kernel-rt. This was remedied by storing configuration changes to disk only after the OS changes were applied. Now, if power is lost during the configuration application, the MCO knows it must reattempt the configuration application on restart.

The text from Tony looks good! It's not currently entered in the bug's Doc Text field, though.

This should be fixed more comprehensively by https://github.com/openshift/enhancements/pull/1032 (which will effectively implement https://github.com/openshift/machine-config-operator/issues/1190).

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056
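To close the loop on the shipped fix: the ordering from the sketch in the description is simply inverted, so the config is persisted only after the OS changes succeed. A minimal, hypothetical Go sketch mirroring the final doc text (illustrative names only, not the MCO's actual code):

```
package main

import (
	"fmt"
	"os"
)

// stateFile stands in for the MCD's on-disk currentConfig record.
const stateFile = "/tmp/currentconfig"

// applyUpdateFixed models the post-fix ordering: apply first, persist second.
// An interruption mid-apply leaves the old config on disk, so on restart the
// daemon sees it is not yet in the desired config and reattempts the update.
func applyUpdateFixed(desired string, applyOS func() error) error {
	if err := applyOS(); err != nil {
		return err // stateFile still holds the previous config
	}
	return os.WriteFile(stateFile, []byte(desired), 0o644)
}

func main() {
	err := applyUpdateFixed("rendered-worker-duprofile-0469ef4e", func() error {
		return fmt.Errorf("node rebooted mid-apply")
	})
	fmt.Println(err) // update failed; on-disk config unchanged, so it is retried
}
```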