Bug 2104978 - MCD degrades are not overwrite-able by subsequent errors
Summary: MCD degrades are not overwrite-able by subsequent errors
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.11
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.13.0
Assignee: Charlie Doern
QA Contact: Rio Liu
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-07-07 15:42 UTC by Yu Qi Zhang
Modified: 2023-05-17 22:47 UTC (History)
5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-05-17 22:46:53 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Github openshift machine-config-operator pull 3488 0 None open Bug 2104978: fix degraded pool state message 2023-01-17 01:47:53 UTC
Red Hat Product Errata RHSA-2023:1326 0 None None None 2023-05-17 22:47:12 UTC

Internal Links: 2108320

Description Yu Qi Zhang 2022-07-07 15:42:33 UTC
In 4.11 we added daemon validation to the MCD via https://github.com/openshift/machine-config-operator/pull/3105

While trying to test another failure, we encountered the following error:

Last Transition Time:  2022-07-07T15:05:36Z
    Message:               Node ip-10-0-63-89.us-east-2.compute.internal is reporting: "unexpected on-disk state validating against rendered-worker-5a883f3d21fbb30631f333f2923a19c2: error running rpm-ostree kargs: exit status 1\nJob for rpm-ostreed.service failed.\nSee \"systemctl status rpm-ostreed.service\" and \"journalctl -xe\" for details.\n\x1b[0m\x1b[31merror: \x1b[0mexit status: 1\n"
    Reason:                1 nodes are reporting degraded status on sync
    Status:                True
    Type:                  NodeDegraded

Matching timestamps in the MCD log:

E0707 15:05:33.409582    2466 writer.go:200] Marking Degraded due to: unexpected on-disk state validating against rendered-worker-5a883f3d21fbb30631f333f2923a19c2: error running rpm-ostree kargs: exit status 1
Job for rpm-ostreed.service failed.
See "systemctl status rpm-ostreed.service" and "journalctl -xe" for details.
error: exit status: 1

and correspondingly in rpm-ostreed

Jul 07 15:05:31 ip-10-0-63-89 rpm-ostree[2631]: In idle state; will auto-exit in 61 seconds
Jul 07 15:05:33 ip-10-0-63-89 systemd[1]: rpm-ostreed.service: Start request repeated too quickly.
Jul 07 15:05:33 ip-10-0-63-89 systemd[1]: rpm-ostreed.service: Failed with result 'start-limit-hit'.
Jul 07 15:05:33 ip-10-0-63-89 systemd[1]: Failed to start rpm-ostree System Management Daemon.
Jul 07 15:05:33 ip-10-0-63-89 systemd[1]: rpm-ostreed.service: Consumed 98ms CPU time


So it looks like rpm-ostreed hadn't started yet (though it eventually came up successfully); the validator tried running rpm-ostree commands very early, failed without a retry, and we degraded because of it.

The MCD does continue:

I0707 15:06:33.427468    2466 daemon.go:1165] Current+desired config: rendered-worker-5a883f3d21fbb30631f333f2923a19c2
I0707 15:06:33.427487    2466 daemon.go:1175] state: Degraded
I0707 15:06:33.430410    2466 daemon.go:1463] Validating against pending config rendered-worker-5a883f3d21fbb30631f333f2923a19c2
I0707 15:06:33.430427    2466 rpm-ostree.go:324] Running captured: rpm-ostree kargs
E0707 15:06:33.968530    2466 writer.go:200] Marking Degraded due to: unexpected on-disk state validating against rendered-worker-5a883f3d21fbb30631f333f2923a19c2: mode mismatch for file: "/etc/crio/crio.conf.d/01-ctrcfg-pidsLimit"; expected: -rw-r--r--/420/0644; received: ----------/0/0
I0707 15:07:33.985443    2466 daemon.go:1165] Current+desired config: rendered-worker-5a883f3d21fbb30631f333f2923a19c2
I0707 15:07:33.985470    2466 daemon.go:1175] state: Degraded

but this subsequent error doesn't overwrite the original one, so we are stuck in the old state reporting the wrong error.


A few questions then:
1. Why is rpm-ostreed failing like that, and is it expected?
2. Should we be retrying the operation? (It looks like it got past it on the next sync, so maybe not an issue.)
3. Why doesn't the MCO correctly update the error state?
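On question 3, the logs are consistent with the degraded reason only being written on the Done -> Degraded transition, so once the node is Degraded, later (different) errors never replace the stale message. A minimal sketch of that pattern and the obvious fix, with hypothetical names (this is not the actual MCD writer code):

```go
package main

import "fmt"

// nodeWriter is an illustrative stand-in for the MCD's node state
// writer; the type and methods here are assumptions for demonstration.
type nodeWriter struct {
	state  string
	reason string
}

// setDegradedBuggy records the reason only on the transition into
// Degraded, mirroring the reported behavior: the first error sticks.
func (w *nodeWriter) setDegradedBuggy(reason string) {
	if w.state == "Degraded" {
		return // already degraded: the reason is never refreshed
	}
	w.state = "Degraded"
	w.reason = reason
}

// setDegradedFixed refreshes the reason on every sync, even when the
// state is unchanged, so the newest error is always reported.
func (w *nodeWriter) setDegradedFixed(reason string) {
	w.state = "Degraded"
	w.reason = reason
}

func main() {
	buggy := &nodeWriter{state: "Done"}
	buggy.setDegradedBuggy("error running rpm-ostree kargs")
	buggy.setDegradedBuggy("mode mismatch for file")
	fmt.Println(buggy.reason) // stale: error running rpm-ostree kargs

	fixed := &nodeWriter{state: "Done"}
	fixed.setDegradedFixed("error running rpm-ostree kargs")
	fixed.setDegradedFixed("mode mismatch for file")
	fmt.Println(fixed.reason) // current: mode mismatch for file
}
```

If the real writer short-circuits like `setDegradedBuggy`, the fix is to keep the transition guard for the state itself but always update the message/annotation, as the linked PR 3488 appears to address.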

Comment 1 Yu Qi Zhang 2022-07-07 15:43:17 UTC
Full daemon logs:

I0707 15:05:29.238914    2466 start.go:112] Version: v4.11.0-202206250809.p0.g4aa21da.assembly.stream-dirty (4aa21da3efc9b49ff4d8fec0dfe5a8598a19332a)
I0707 15:05:29.243232    2466 start.go:125] Calling chroot("/rootfs")
I0707 15:05:29.245594    2466 update.go:1962] Running: systemctl start rpm-ostreed
I0707 15:05:30.039148    2466 rpm-ostree.go:324] Running captured: rpm-ostree status --json
I0707 15:05:30.116122    2466 rpm-ostree.go:324] Running captured: rpm-ostree status --json
I0707 15:05:30.169454    2466 daemon.go:236] Booted osImageURL: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:02eec9efc74b23a730588ce6a236706b5aeeef5f52010ba25720a611cf1d929c (411.86.202206291708-0)
I0707 15:05:30.290695    2466 start.go:101] Copied self to /run/bin/machine-config-daemon on host
I0707 15:05:30.292230    2466 start.go:189] overriding kubernetes api to https://api-int.sregidor-ssh2.qe.devcluster.openshift.com:6443
I0707 15:05:30.293259    2466 metrics.go:106] Registering Prometheus metrics
I0707 15:05:30.293328    2466 metrics.go:111] Starting metrics listener on 127.0.0.1:8797
I0707 15:05:30.296343    2466 writer.go:93] NodeWriter initialized with credentials from /var/lib/kubelet/kubeconfig
I0707 15:05:30.296475    2466 update.go:1977] Starting to manage node: ip-10-0-63-89.us-east-2.compute.internal
I0707 15:05:30.335927    2466 rpm-ostree.go:324] Running captured: rpm-ostree status
I0707 15:05:30.404093    2466 daemon.go:1220] State: idle
Deployments:
* pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:02eec9efc74b23a730588ce6a236706b5aeeef5f52010ba25720a611cf1d929c
              CustomOrigin: Managed by machine-config-operator
                   Version: 411.86.202206291708-0 (2022-06-29T17:11:39Z)
 
  f9d88d07921009f524c39773d0935a7d1642a02bd37e0d621696bf4f766a0540
                   Version: 45.82.202008010929-0 (2020-08-01T09:33:23Z)
I0707 15:05:30.404613    2466 coreos.go:95] CoreOS aleph version: mtime=2020-08-01 09:35:48.964 +0000 UTC build=45.82.202008010929-0 imgid=rhcos-45.82.202008010929-0-qemu.x86_64.qcow2
No /etc/.ignition-result.json found
I0707 15:05:30.404699    2466 rpm-ostree.go:324] Running captured: journalctl --list-boots
I0707 15:05:30.408951    2466 daemon.go:1229] journalctl --list-boots:
-1 7f00b2ad240347b198afc64b62f0a281 Thu 2022-07-07 15:02:18 UTC—Thu 2022-07-07 15:04:37 UTC
 0 6b64211bdd9f4837af505ea3541c4e4d Thu 2022-07-07 15:04:49 UTC—Thu 2022-07-07 15:05:30 UTC
I0707 15:05:30.409005    2466 rpm-ostree.go:324] Running captured: systemctl list-units --state=failed --no-legend
I0707 15:05:30.417923    2466 daemon.go:1244] systemd service state: OK
I0707 15:05:30.417959    2466 daemon.go:909] Starting MachineConfigDaemon
I0707 15:05:30.418067    2466 daemon.go:916] Enabling Kubelet Healthz Monitor
I0707 15:05:31.332825    2466 daemon.go:451] Node ip-10-0-63-89.us-east-2.compute.internal is not labeled node-role.kubernetes.io/master
I0707 15:05:31.332938    2466 node.go:24] No machineconfiguration.openshift.io/currentConfig annotation on node ip-10-0-63-89.us-east-2.compute.internal: map[cloud.network.openshift.io/egress-ipconfig:[{"interface":"eni-0f535398b33d51315","ifaddr":{"ipv4":"10.0.48.0/20"},"capacity":{"ipv4":14,"ipv6":15}}] machine.openshift.io/machine:openshift-machine-api/sregidor-ssh2-dlnh9-worker-us-east-2a-mod-pflkc machineconfiguration.openshift.io/controlPlaneTopology:HighlyAvailable volumes.kubernetes.io/controller-managed-attach-detach:true], in cluster bootstrap, loading initial node annotation from /etc/machine-config-daemon/node-annotations.json
I0707 15:05:31.333435    2466 node.go:45] Setting initial node config: rendered-worker-5a883f3d21fbb30631f333f2923a19c2
I0707 15:05:31.352098    2466 daemon.go:1137] In bootstrap mode
I0707 15:05:31.352268    2466 daemon.go:1165] Current+desired config: rendered-worker-5a883f3d21fbb30631f333f2923a19c2
I0707 15:05:31.352313    2466 daemon.go:1175] state: Done
I0707 15:05:31.356550    2466 daemon.go:1425] No bootstrap pivot required; unlinking bootstrap node annotations
I0707 15:05:31.356635    2466 daemon.go:1463] Validating against pending config rendered-worker-5a883f3d21fbb30631f333f2923a19c2
I0707 15:05:31.356662    2466 rpm-ostree.go:324] Running captured: rpm-ostree kargs
E0707 15:05:31.429554    2466 writer.go:200] Marking Degraded due to: unexpected on-disk state validating against rendered-worker-5a883f3d21fbb30631f333f2923a19c2: mode mismatch for file: "/etc/crio/crio.conf.d/01-ctrcfg-pidsLimit"; expected: -rw-r--r--/420/0644; received: ----------/0/0
I0707 15:05:33.352033    2466 daemon.go:500] Transitioned from state: Done -> Degraded
I0707 15:05:33.355528    2466 daemon.go:1165] Current+desired config: rendered-worker-5a883f3d21fbb30631f333f2923a19c2
I0707 15:05:33.355544    2466 daemon.go:1175] state: Degraded
I0707 15:05:33.359275    2466 daemon.go:1463] Validating against pending config rendered-worker-5a883f3d21fbb30631f333f2923a19c2
I0707 15:05:33.359294    2466 rpm-ostree.go:324] Running captured: rpm-ostree kargs
E0707 15:05:33.409582    2466 writer.go:200] Marking Degraded due to: unexpected on-disk state validating against rendered-worker-5a883f3d21fbb30631f333f2923a19c2: error running rpm-ostree kargs: exit status 1
Job for rpm-ostreed.service failed.
See "systemctl status rpm-ostreed.service" and "journalctl -xe" for details.
error: exit status: 1
I0707 15:06:33.427468    2466 daemon.go:1165] Current+desired config: rendered-worker-5a883f3d21fbb30631f333f2923a19c2
I0707 15:06:33.427487    2466 daemon.go:1175] state: Degraded
I0707 15:06:33.430410    2466 daemon.go:1463] Validating against pending config rendered-worker-5a883f3d21fbb30631f333f2923a19c2
I0707 15:06:33.430427    2466 rpm-ostree.go:324] Running captured: rpm-ostree kargs
E0707 15:06:33.968530    2466 writer.go:200] Marking Degraded due to: unexpected on-disk state validating against rendered-worker-5a883f3d21fbb30631f333f2923a19c2: mode mismatch for file: "/etc/crio/crio.conf.d/01-ctrcfg-pidsLimit"; expected: -rw-r--r--/420/0644; received: ----------/0/0
I0707 15:07:33.985443    2466 daemon.go:1165] Current+desired config: rendered-worker-5a883f3d21fbb30631f333f2923a19c2
I0707 15:07:33.985470    2466 daemon.go:1175] state: Degraded
I0707 15:07:33.988396    2466 daemon.go:1463] Validating against pending config rendered-worker-5a883f3d21fbb30631f333f2923a19c2
I0707 15:07:33.988413    2466 rpm-ostree.go:324] Running captured: rpm-ostree kargs
E0707 15:07:34.020335    2466 writer.go:200] Marking Degraded due to: unexpected on-disk state validating against rendered-worker-5a883f3d21fbb30631f333f2923a19c2: mode mismatch for file: "/etc/crio/crio.conf.d/01-ctrcfg-pidsLimit"; expected: -rw-r--r--/420/0644; received: ----------/0/0
I0707 15:08:34.035048    2466 daemon.go:1165] Current+desired config: rendered-worker-5a883f3d21fbb30631f333f2923a19c2
I0707 15:08:34.035066    2466 daemon.go:1175] state: Degraded
I0707 15:08:34.037822    2466 daemon.go:1463] Validating against pending config rendered-worker-5a883f3d21fbb30631f333f2923a19c2
I0707 15:08:34.037836    2466 rpm-ostree.go:324] Running captured: rpm-ostree kargs
E0707 15:08:34.071001    2466 writer.go:200] Marking Degraded due to: unexpected on-disk state validating against rendered-worker-5a883f3d21fbb30631f333f2923a19c2: mode mismatch for file: "/etc/crio/crio.conf.d/01-ctrcfg-pidsLimit"; expected: -rw-r--r--/420/0644; received: ----------/0/0
I0707 15:09:34.081452    2466 daemon.go:1165] Current+desired config: rendered-worker-5a883f3d21fbb30631f333f2923a19c2
I0707 15:09:34.081468    2466 daemon.go:1175] state: Degraded
I0707 15:09:34.084325    2466 daemon.go:1463] Validating against pending config rendered-worker-5a883f3d21fbb30631f333f2923a19c2
I0707 15:09:34.084339    2466 rpm-ostree.go:324] Running captured: rpm-ostree kargs
E0707 15:09:34.117176    2466 writer.go:200] Marking Degraded due to: unexpected on-disk state validating against rendered-worker-5a883f3d21fbb30631f333f2923a19c2: mode mismatch for file: "/etc/crio/crio.conf.d/01-ctrcfg-pidsLimit"; expected: -rw-r--r--/420/0644; received: ----------/0/0
I0707 15:10:34.135071    2466 daemon.go:1165] Current+desired config: rendered-worker-5a883f3d21fbb30631f333f2923a19c2
I0707 15:10:34.135088    2466 daemon.go:1175] state: Degraded
I0707 15:10:34.137784    2466 daemon.go:1463] Validating against pending config rendered-worker-5a883f3d21fbb30631f333f2923a19c2
I0707 15:10:34.137797    2466 rpm-ostree.go:324] Running captured: rpm-ostree kargs
E0707 15:10:34.169218    2466 writer.go:200] Marking Degraded due to: unexpected on-disk state validating against rendered-worker-5a883f3d21fbb30631f333f2923a19c2: mode mismatch for file: "/etc/crio/crio.conf.d/01-ctrcfg-pidsLimit"; expected: -rw-r--r--/420/0644; received: ----------/0/0
I0707 15:11:34.185006    2466 daemon.go:1165] Current+desired config: rendered-worker-5a883f3d21fbb30631f333f2923a19c2
I0707 15:11:34.185029    2466 daemon.go:1175] state: Degraded
I0707 15:11:34.187940    2466 daemon.go:1463] Validating against pending config rendered-worker-5a883f3d21fbb30631f333f2923a19c2
I0707 15:11:34.187954    2466 rpm-ostree.go:324] Running captured: rpm-ostree kargs
E0707 15:11:34.220806    2466 writer.go:200] Marking Degraded due to: unexpected on-disk state validating against rendered-worker-5a883f3d21fbb30631f333f2923a19c2: mode mismatch for file: "/etc/crio/crio.conf.d/01-ctrcfg-pidsLimit"; expected: -rw-r--r--/420/0644; received: ----------/0/0
I0707 15:12:34.241692    2466 daemon.go:1165] Current+desired config: rendered-worker-5a883f3d21fbb30631f333f2923a19c2
I0707 15:12:34.241708    2466 daemon.go:1175] state: Degraded
I0707 15:12:34.244578    2466 daemon.go:1463] Validating against pending config rendered-worker-5a883f3d21fbb30631f333f2923a19c2
I0707 15:12:34.244591    2466 rpm-ostree.go:324] Running captured: rpm-ostree kargs
E0707 15:12:34.277219    2466 writer.go:200] Marking Degraded due to: unexpected on-disk state validating against rendered-worker-5a883f3d21fbb30631f333f2923a19c2: mode mismatch for file: "/etc/crio/crio.conf.d/01-ctrcfg-pidsLimit"; expected: -rw-r--r--/420/0644; received: ----------/0/0
I0707 15:13:34.290973    2466 daemon.go:1165] Current+desired config: rendered-worker-5a883f3d21fbb30631f333f2923a19c2
I0707 15:13:34.290997    2466 daemon.go:1175] state: Degraded
I0707 15:13:34.293871    2466 daemon.go:1463] Validating against pending config rendered-worker-5a883f3d21fbb30631f333f2923a19c2
I0707 15:13:34.293884    2466 rpm-ostree.go:324] Running captured: rpm-ostree kargs
E0707 15:13:34.325982    2466 writer.go:200] Marking Degraded due to: unexpected on-disk state validating against rendered-worker-5a883f3d21fbb30631f333f2923a19c2: mode mismatch for file: "/etc/crio/crio.conf.d/01-ctrcfg-pidsLimit"; expected: -rw-r--r--/420/0644; received: ----------/0/0
I0707 15:14:34.341341    2466 daemon.go:1165] Current+desired config: rendered-worker-5a883f3d21fbb30631f333f2923a19c2
I0707 15:14:34.341359    2466 daemon.go:1175] state: Degraded
I0707 15:14:34.344093    2466 daemon.go:1463] Validating against pending config rendered-worker-5a883f3d21fbb30631f333f2923a19c2
I0707 15:14:34.344107    2466 rpm-ostree.go:324] Running captured: rpm-ostree kargs
E0707 15:14:34.375638    2466 writer.go:200] Marking Degraded due to: unexpected on-disk state validating against rendered-worker-5a883f3d21fbb30631f333f2923a19c2: mode mismatch for file: "/etc/crio/crio.conf.d/01-ctrcfg-pidsLimit"; expected: -rw-r--r--/420/0644; received: ----------/0/0
I0707 15:15:34.391092    2466 daemon.go:1165] Current+desired config: rendered-worker-5a883f3d21fbb30631f333f2923a19c2
I0707 15:15:34.391108    2466 daemon.go:1175] state: Degraded
I0707 15:15:34.394009    2466 daemon.go:1463] Validating against pending config rendered-worker-5a883f3d21fbb30631f333f2923a19c2
I0707 15:15:34.394026    2466 rpm-ostree.go:324] Running captured: rpm-ostree kargs
E0707 15:15:35.007586    2466 writer.go:200] Marking Degraded due to: unexpected on-disk state validating against rendered-worker-5a883f3d21fbb30631f333f2923a19c2: mode mismatch for file: "/etc/crio/crio.conf.d/01-ctrcfg-pidsLimit"; expected: -rw-r--r--/420/0644; received: ----------/0/0
I0707 15:16:35.027338    2466 daemon.go:1165] Current+desired config: rendered-worker-5a883f3d21fbb30631f333f2923a19c2
I0707 15:16:35.027355    2466 daemon.go:1175] state: Degraded
I0707 15:16:35.030243    2466 daemon.go:1463] Validating against pending config rendered-worker-5a883f3d21fbb30631f333f2923a19c2
I0707 15:16:35.030257    2466 rpm-ostree.go:324] Running captured: rpm-ostree kargs
E0707 15:16:35.062333    2466 writer.go:200] Marking Degraded due to: unexpected on-disk state validating against rendered-worker-5a883f3d21fbb30631f333f2923a19c2: mode mismatch for file: "/etc/crio/crio.conf.d/01-ctrcfg-pidsLimit"; expected: -rw-r--r--/420/0644; received: ----------/0/0
I0707 15:17:35.074459    2466 daemon.go:1165] Current+desired config: rendered-worker-5a883f3d21fbb30631f333f2923a19c2
I0707 15:17:35.074478    2466 daemon.go:1175] state: Degraded

Full rpm-ostreed logs:
sh-4.4#  journalctl -u rpm-ostreed  
-- Logs begin at Thu 2022-07-07 15:02:18 UTC, end at Thu 2022-07-07 15:24:35 UTC. --
Jul 07 15:03:17 ip-10-0-63-89 systemd[1]: Starting rpm-ostree System Management Daemon...
Jul 07 15:03:18 ip-10-0-63-89 rpm-ostree[2199]: Reading config file '/etc/rpm-ostreed.conf'
Jul 07 15:03:19 ip-10-0-63-89 rpm-ostree[2199]: In idle state; will auto-exit in 64 seconds
Jul 07 15:03:19 ip-10-0-63-89 systemd[1]: Started rpm-ostree System Management Daemon.
Jul 07 15:03:19 ip-10-0-63-89 rpm-ostree[2199]: client(id:cli dbus:1.22 unit:machine-config-daemon-firstboot.service uid:0) added; new total=1
Jul 07 15:03:19 ip-10-0-63-89 rpm-ostree[2199]: client(id:cli dbus:1.22 unit:machine-config-daemon-firstboot.service uid:0) vanished; remaining=0
Jul 07 15:03:19 ip-10-0-63-89 rpm-ostree[2199]: In idle state; will auto-exit in 60 seconds
Jul 07 15:03:19 ip-10-0-63-89 rpm-ostree[2199]: client(id:cli dbus:1.23 unit:machine-config-daemon-firstboot.service uid:0) added; new total=1
Jul 07 15:03:19 ip-10-0-63-89 rpm-ostree[2199]: client(id:cli dbus:1.23 unit:machine-config-daemon-firstboot.service uid:0) vanished; remaining=0
Jul 07 15:03:19 ip-10-0-63-89 rpm-ostree[2199]: In idle state; will auto-exit in 62 seconds
Jul 07 15:03:44 ip-10-0-63-89 rpm-ostree[2199]: client(id:cli dbus:1.24 unit:machine-config-daemon-firstboot.service uid:0) added; new total=1
Jul 07 15:03:44 ip-10-0-63-89 rpm-ostree[2199]: client(id:cli dbus:1.24 unit:machine-config-daemon-firstboot.service uid:0) vanished; remaining=0
Jul 07 15:03:44 ip-10-0-63-89 rpm-ostree[2199]: In idle state; will auto-exit in 62 seconds
Jul 07 15:03:44 ip-10-0-63-89 rpm-ostree[2199]: client(id:cli dbus:1.25 unit:machine-config-daemon-firstboot.service uid:0) added; new total=1
Jul 07 15:03:44 ip-10-0-63-89 rpm-ostree[2199]: Initiated txn UpdateDeployment for client(id:cli dbus:1.25 unit:machine-config-daemon-firstboot.service uid:0): /org/projectatomic/rpmostree1/rhcos
Jul 07 15:04:05 ip-10-0-63-89 rpm-ostree[2199]: Created new deployment /ostree/deploy/rhcos/deploy/cff8e8dfd35b4a8e42f3bd4eb3afbbe9cd2d4fb3896c782585596372a8d059e4.0
Jul 07 15:04:05 ip-10-0-63-89 rpm-ostree[2199]: sanitycheck(/usr/bin/true) successful
Jul 07 15:04:18 ip-10-0-63-89 rpm-ostree[2199]: Txn UpdateDeployment on /org/projectatomic/rpmostree1/rhcos successful
Jul 07 15:04:18 ip-10-0-63-89 rpm-ostree[2199]: client(id:cli dbus:1.25 unit:machine-config-daemon-firstboot.service uid:0) vanished; remaining=0
Jul 07 15:04:18 ip-10-0-63-89 rpm-ostree[2199]: In idle state; will auto-exit in 60 seconds
Jul 07 15:04:19 ip-10-0-63-89 systemd[1]: Stopping rpm-ostree System Management Daemon...
Jul 07 15:04:19 ip-10-0-63-89 systemd[1]: Stopped rpm-ostree System Management Daemon.
Jul 07 15:04:19 ip-10-0-63-89 systemd[1]: rpm-ostreed.service: Consumed 12.887s CPU time
-- Reboot --
Jul 07 15:05:29 ip-10-0-63-89 systemd[1]: Starting rpm-ostree System Management Daemon...
Jul 07 15:05:29 ip-10-0-63-89 rpm-ostree[2631]: Reading config file '/etc/rpm-ostreed.conf'
Jul 07 15:05:30 ip-10-0-63-89 rpm-ostree[2631]: In idle state; will auto-exit in 64 seconds
Jul 07 15:05:30 ip-10-0-63-89 systemd[1]: Started rpm-ostree System Management Daemon.
Jul 07 15:05:30 ip-10-0-63-89 rpm-ostree[2631]: client(id:machine-config-operator dbus:1.85 unit:crio-70a105f9061333d7098020fd9a27314dcbcdc2df057fb2da4fd5862fedd67ac5.scope uid:0) added; new total=1
Jul 07 15:05:30 ip-10-0-63-89 rpm-ostree[2631]: client(id:machine-config-operator dbus:1.85 unit:crio-70a105f9061333d7098020fd9a27314dcbcdc2df057fb2da4fd5862fedd67ac5.scope uid:0) vanished; remaining=0
Jul 07 15:05:30 ip-10-0-63-89 rpm-ostree[2631]: In idle state; will auto-exit in 64 seconds
Jul 07 15:05:30 ip-10-0-63-89 rpm-ostree[2631]: client(id:machine-config-operator dbus:1.86 unit:crio-70a105f9061333d7098020fd9a27314dcbcdc2df057fb2da4fd5862fedd67ac5.scope uid:0) added; new total=1
Jul 07 15:05:30 ip-10-0-63-89 rpm-ostree[2631]: client(id:machine-config-operator dbus:1.86 unit:crio-70a105f9061333d7098020fd9a27314dcbcdc2df057fb2da4fd5862fedd67ac5.scope uid:0) vanished; remaining=0
Jul 07 15:05:30 ip-10-0-63-89 rpm-ostree[2631]: In idle state; will auto-exit in 63 seconds
Jul 07 15:05:30 ip-10-0-63-89 rpm-ostree[2631]: client(id:machine-config-operator dbus:1.87 unit:crio-70a105f9061333d7098020fd9a27314dcbcdc2df057fb2da4fd5862fedd67ac5.scope uid:0) added; new total=1
Jul 07 15:05:30 ip-10-0-63-89 rpm-ostree[2631]: client(id:machine-config-operator dbus:1.87 unit:crio-70a105f9061333d7098020fd9a27314dcbcdc2df057fb2da4fd5862fedd67ac5.scope uid:0) vanished; remaining=0
Jul 07 15:05:30 ip-10-0-63-89 rpm-ostree[2631]: In idle state; will auto-exit in 63 seconds
Jul 07 15:05:31 ip-10-0-63-89 rpm-ostree[2631]: client(id:machine-config-operator dbus:1.94 unit:crio-70a105f9061333d7098020fd9a27314dcbcdc2df057fb2da4fd5862fedd67ac5.scope uid:0) added; new total=1
Jul 07 15:05:31 ip-10-0-63-89 rpm-ostree[2631]: client(id:machine-config-operator dbus:1.94 unit:crio-70a105f9061333d7098020fd9a27314dcbcdc2df057fb2da4fd5862fedd67ac5.scope uid:0) vanished; remaining=0
Jul 07 15:05:31 ip-10-0-63-89 rpm-ostree[2631]: In idle state; will auto-exit in 61 seconds
Jul 07 15:05:33 ip-10-0-63-89 systemd[1]: rpm-ostreed.service: Start request repeated too quickly.
Jul 07 15:05:33 ip-10-0-63-89 systemd[1]: rpm-ostreed.service: Failed with result 'start-limit-hit'.
Jul 07 15:05:33 ip-10-0-63-89 systemd[1]: Failed to start rpm-ostree System Management Daemon.
Jul 07 15:05:33 ip-10-0-63-89 systemd[1]: rpm-ostreed.service: Consumed 98ms CPU time
Jul 07 15:05:41 ip-10-0-63-89 systemd[1]: rpm-ostreed.service: Got notification message from PID 2631, but reception only permitted for main PID which is currently not known
Jul 07 15:05:51 ip-10-0-63-89 systemd[1]: rpm-ostreed.service: Got notification message from PID 2631, but reception only permitted for main PID which is currently not known
Jul 07 15:06:01 ip-10-0-63-89 systemd[1]: rpm-ostreed.service: Got notification message from PID 2631, but reception only permitted for main PID which is currently not known
Jul 07 15:06:11 ip-10-0-63-89 systemd[1]: rpm-ostreed.service: Got notification message from PID 2631, but reception only permitted for main PID which is currently not known
Jul 07 15:06:21 ip-10-0-63-89 systemd[1]: rpm-ostreed.service: Got notification message from PID 2631, but reception only permitted for main PID which is currently not known
Jul 07 15:06:31 ip-10-0-63-89 systemd[1]: rpm-ostreed.service: Got notification message from PID 2631, but reception only permitted for main PID which is currently not known
Jul 07 15:06:32 ip-10-0-63-89 systemd[1]: rpm-ostreed.service: Got notification message from PID 2631, but reception only permitted for main PID which is currently not known
Jul 07 15:06:32 ip-10-0-63-89 systemd[1]: rpm-ostreed.service: Got notification message from PID 2631, but reception only permitted for main PID which is currently not known
Jul 07 15:06:33 ip-10-0-63-89 systemd[1]: Starting rpm-ostree System Management Daemon...
Jul 07 15:06:33 ip-10-0-63-89 rpm-ostree[5215]: Reading config file '/etc/rpm-ostreed.conf'
Jul 07 15:06:33 ip-10-0-63-89 rpm-ostree[5215]: In idle state; will auto-exit in 60 seconds
Jul 07 15:06:33 ip-10-0-63-89 systemd[1]: Started rpm-ostree System Management Daemon.
Jul 07 15:06:33 ip-10-0-63-89 rpm-ostree[5215]: client(id:machine-config-operator dbus:1.149 unit:crio-70a105f9061333d7098020fd9a27314dcbcdc2df057fb2da4fd5862fedd67ac5.scope uid:0) added; new total=1
Jul 07 15:06:33 ip-10-0-63-89 rpm-ostree[5215]: client(id:machine-config-operator dbus:1.149 unit:crio-70a105f9061333d7098020fd9a27314dcbcdc2df057fb2da4fd5862fedd67ac5.scope uid:0) vanished; remaining=0
Jul 07 15:06:33 ip-10-0-63-89 rpm-ostree[5215]: In idle state; will auto-exit in 63 seconds
Jul 07 15:07:34 ip-10-0-63-89 rpm-ostree[5215]: client(id:machine-config-operator dbus:1.150 unit:crio-70a105f9061333d7098020fd9a27314dcbcdc2df057fb2da4fd5862fedd67ac5.scope uid:0) added; new total=1
Jul 07 15:07:34 ip-10-0-63-89 rpm-ostree[5215]: client(id:machine-config-operator dbus:1.150 unit:crio-70a105f9061333d7098020fd9a27314dcbcdc2df057fb2da4fd5862fedd67ac5.scope uid:0) vanished; remaining=0
Jul 07 15:07:34 ip-10-0-63-89 rpm-ostree[5215]: In idle state; will auto-exit in 62 seconds
Jul 07 15:08:34 ip-10-0-63-89 rpm-ostree[5215]: client(id:machine-config-operator dbus:1.151 unit:crio-70a105f9061333d7098020fd9a27314dcbcdc2df057fb2da4fd5862fedd67ac5.scope uid:0) added; new total=1
Jul 07 15:08:34 ip-10-0-63-89 rpm-ostree[5215]: client(id:machine-config-operator dbus:1.151 unit:crio-70a105f9061333d7098020fd9a27314dcbcdc2df057fb2da4fd5862fedd67ac5.scope uid:0) vanished; remaining=0
Jul 07 15:08:34 ip-10-0-63-89 rpm-ostree[5215]: In idle state; will auto-exit in 61 seconds
Jul 07 15:09:34 ip-10-0-63-89 rpm-ostree[5215]: client(id:machine-config-operator dbus:1.156 unit:crio-70a105f9061333d7098020fd9a27314dcbcdc2df057fb2da4fd5862fedd67ac5.scope uid:0) added; new total=1
Jul 07 15:09:34 ip-10-0-63-89 rpm-ostree[5215]: client(id:machine-config-operator dbus:1.156 unit:crio-70a105f9061333d7098020fd9a27314dcbcdc2df057fb2da4fd5862fedd67ac5.scope uid:0) vanished; remaining=0
Jul 07 15:09:34 ip-10-0-63-89 rpm-ostree[5215]: In idle state; will auto-exit in 63 seconds
Jul 07 15:10:34 ip-10-0-63-89 rpm-ostree[5215]: client(id:machine-config-operator dbus:1.161 unit:crio-70a105f9061333d7098020fd9a27314dcbcdc2df057fb2da4fd5862fedd67ac5.scope uid:0) added; new total=1
Jul 07 15:10:34 ip-10-0-63-89 rpm-ostree[5215]: client(id:machine-config-operator dbus:1.161 unit:crio-70a105f9061333d7098020fd9a27314dcbcdc2df057fb2da4fd5862fedd67ac5.scope uid:0) vanished; remaining=0
Jul 07 15:10:34 ip-10-0-63-89 rpm-ostree[5215]: In idle state; will auto-exit in 64 seconds
Jul 07 15:11:34 ip-10-0-63-89 rpm-ostree[5215]: client(id:machine-config-operator dbus:1.162 unit:crio-70a105f9061333d7098020fd9a27314dcbcdc2df057fb2da4fd5862fedd67ac5.scope uid:0) added; new total=1
Jul 07 15:11:34 ip-10-0-63-89 rpm-ostree[5215]: client(id:machine-config-operator dbus:1.162 unit:crio-70a105f9061333d7098020fd9a27314dcbcdc2df057fb2da4fd5862fedd67ac5.scope uid:0) vanished; remaining=0
Jul 07 15:11:34 ip-10-0-63-89 rpm-ostree[5215]: In idle state; will auto-exit in 60 seconds
Jul 07 15:12:34 ip-10-0-63-89 rpm-ostree[5215]: client(id:machine-config-operator dbus:1.163 unit:crio-70a105f9061333d7098020fd9a27314dcbcdc2df057fb2da4fd5862fedd67ac5.scope uid:0) added; new total=1
Jul 07 15:12:34 ip-10-0-63-89 rpm-ostree[5215]: client(id:machine-config-operator dbus:1.163 unit:crio-70a105f9061333d7098020fd9a27314dcbcdc2df057fb2da4fd5862fedd67ac5.scope uid:0) vanished; remaining=0
Jul 07 15:12:34 ip-10-0-63-89 rpm-ostree[5215]: In idle state; will auto-exit in 61 seconds
Jul 07 15:13:34 ip-10-0-63-89 rpm-ostree[5215]: client(id:machine-config-operator dbus:1.164 unit:crio-70a105f9061333d7098020fd9a27314dcbcdc2df057fb2da4fd5862fedd67ac5.scope uid:0) added; new total=1
Jul 07 15:13:34 ip-10-0-63-89 rpm-ostree[5215]: client(id:machine-config-operator dbus:1.164 unit:crio-70a105f9061333d7098020fd9a27314dcbcdc2df057fb2da4fd5862fedd67ac5.scope uid:0) vanished; remaining=0
Jul 07 15:13:34 ip-10-0-63-89 rpm-ostree[5215]: In idle state; will auto-exit in 60 seconds
Jul 07 15:14:34 ip-10-0-63-89 rpm-ostree[5215]: client(id:machine-config-operator dbus:1.165 unit:crio-70a105f9061333d7098020fd9a27314dcbcdc2df057fb2da4fd5862fedd67ac5.scope uid:0) added; new total=1
Jul 07 15:14:34 ip-10-0-63-89 rpm-ostree[5215]: client(id:machine-config-operator dbus:1.165 unit:crio-70a105f9061333d7098020fd9a27314dcbcdc2df057fb2da4fd5862fedd67ac5.scope uid:0) vanished; remaining=0
Jul 07 15:14:34 ip-10-0-63-89 rpm-ostree[5215]: In idle state; will auto-exit in 60 seconds
Jul 07 15:15:34 ip-10-0-63-89 rpm-ostree[5215]: In idle state; will auto-exit in 62 seconds
Jul 07 15:15:34 ip-10-0-63-89 systemd[1]: rpm-ostreed.service: Succeeded.
Jul 07 15:15:34 ip-10-0-63-89 systemd[1]: rpm-ostreed.service: Consumed 130ms CPU time
Jul 07 15:15:34 ip-10-0-63-89 systemd[1]: Starting rpm-ostree System Management Daemon...
Jul 07 15:15:34 ip-10-0-63-89 rpm-ostree[11450]: Reading config file '/etc/rpm-ostreed.conf'
Jul 07 15:15:34 ip-10-0-63-89 rpm-ostree[11450]: In idle state; will auto-exit in 64 seconds
Jul 07 15:15:34 ip-10-0-63-89 systemd[1]: Started rpm-ostree System Management Daemon.
Jul 07 15:15:34 ip-10-0-63-89 rpm-ostree[11450]: client(id:machine-config-operator dbus:1.167 unit:crio-70a105f9061333d7098020fd9a27314dcbcdc2df057fb2da4fd5862fedd67ac5.scope uid:0) added; new total=1
Jul 07 15:15:34 ip-10-0-63-89 rpm-ostree[11450]: client(id:machine-config-operator dbus:1.167 unit:crio-70a105f9061333d7098020fd9a27314dcbcdc2df057fb2da4fd5862fedd67ac5.scope uid:0) vanished; remaining=0
Jul 07 15:15:34 ip-10-0-63-89 rpm-ostree[11450]: In idle state; will auto-exit in 61 seconds
Jul 07 15:16:35 ip-10-0-63-89 rpm-ostree[11450]: client(id:machine-config-operator dbus:1.168 unit:crio-70a105f9061333d7098020fd9a27314dcbcdc2df057fb2da4fd5862fedd67ac5.scope uid:0) added; new total=1
Jul 07 15:16:35 ip-10-0-63-89 rpm-ostree[11450]: client(id:machine-config-operator dbus:1.168 unit:crio-70a105f9061333d7098020fd9a27314dcbcdc2df057fb2da4fd5862fedd67ac5.scope uid:0) vanished; remaining=0
Jul 07 15:16:35 ip-10-0-63-89 rpm-ostree[11450]: In idle state; will auto-exit in 63 seconds
Jul 07 15:17:35 ip-10-0-63-89 rpm-ostree[11450]: client(id:machine-config-operator dbus:1.169 unit:crio-70a105f9061333d7098020fd9a27314dcbcdc2df057fb2da4fd5862fedd67ac5.scope uid:0) added; new total=1
Jul 07 15:17:35 ip-10-0-63-89 rpm-ostree[11450]: client(id:machine-config-operator dbus:1.169 unit:crio-70a105f9061333d7098020fd9a27314dcbcdc2df057fb2da4fd5862fedd67ac5.scope uid:0) vanished; remaining=0
Jul 07 15:17:35 ip-10-0-63-89 rpm-ostree[11450]: In idle state; will auto-exit in 61 seconds
Jul 07 15:18:35 ip-10-0-63-89 rpm-ostree[11450]: client(id:machine-config-operator dbus:1.170 unit:crio-70a105f9061333d7098020fd9a27314dcbcdc2df057fb2da4fd5862fedd67ac5.scope uid:0) added; new total=1
Jul 07 15:18:35 ip-10-0-63-89 rpm-ostree[11450]: client(id:machine-config-operator dbus:1.170 unit:crio-70a105f9061333d7098020fd9a27314dcbcdc2df057fb2da4fd5862fedd67ac5.scope uid:0) vanished; remaining=0
Jul 07 15:18:35 ip-10-0-63-89 rpm-ostree[11450]: In idle state; will auto-exit in 61 seconds
Jul 07 15:19:35 ip-10-0-63-89 rpm-ostree[11450]: client(id:machine-config-operator dbus:1.171 unit:crio-70a105f9061333d7098020fd9a27314dcbcdc2df057fb2da4fd5862fedd67ac5.scope uid:0) added; new total=1
Jul 07 15:19:35 ip-10-0-63-89 rpm-ostree[11450]: client(id:machine-config-operator dbus:1.171 unit:crio-70a105f9061333d7098020fd9a27314dcbcdc2df057fb2da4fd5862fedd67ac5.scope uid:0) vanished; remaining=0
Jul 07 15:19:35 ip-10-0-63-89 rpm-ostree[11450]: In idle state; will auto-exit in 62 seconds
Jul 07 15:20:35 ip-10-0-63-89 rpm-ostree[11450]: client(id:machine-config-operator dbus:1.172 unit:crio-70a105f9061333d7098020fd9a27314dcbcdc2df057fb2da4fd5862fedd67ac5.scope uid:0) added; new total=1
Jul 07 15:20:35 ip-10-0-63-89 rpm-ostree[11450]: client(id:machine-config-operator dbus:1.172 unit:crio-70a105f9061333d7098020fd9a27314dcbcdc2df057fb2da4fd5862fedd67ac5.scope uid:0) vanished; remaining=0
Jul 07 15:20:35 ip-10-0-63-89 rpm-ostree[11450]: In idle state; will auto-exit in 63 seconds
Jul 07 15:21:35 ip-10-0-63-89 rpm-ostree[11450]: client(id:machine-config-operator dbus:1.173 unit:crio-70a105f9061333d7098020fd9a27314dcbcdc2df057fb2da4fd5862fedd67ac5.scope uid:0) added; new total=1
Jul 07 15:21:35 ip-10-0-63-89 rpm-ostree[11450]: client(id:machine-config-operator dbus:1.173 unit:crio-70a105f9061333d7098020fd9a27314dcbcdc2df057fb2da4fd5862fedd67ac5.scope uid:0) vanished; remaining=0
Jul 07 15:21:35 ip-10-0-63-89 rpm-ostree[11450]: In idle state; will auto-exit in 63 seconds
Jul 07 15:22:35 ip-10-0-63-89 rpm-ostree[11450]: client(id:machine-config-operator dbus:1.174 unit:crio-70a105f9061333d7098020fd9a27314dcbcdc2df057fb2da4fd5862fedd67ac5.scope uid:0) added; new total=1
Jul 07 15:22:35 ip-10-0-63-89 rpm-ostree[11450]: client(id:machine-config-operator dbus:1.174 unit:crio-70a105f9061333d7098020fd9a27314dcbcdc2df057fb2da4fd5862fedd67ac5.scope uid:0) vanished; remaining=0
Jul 07 15:22:35 ip-10-0-63-89 rpm-ostree[11450]: In idle state; will auto-exit in 64 seconds
Jul 07 15:23:35 ip-10-0-63-89 rpm-ostree[11450]: client(id:machine-config-operator dbus:1.175 unit:crio-70a105f9061333d7098020fd9a27314dcbcdc2df057fb2da4fd5862fedd67ac5.scope uid:0) added; new total=1
Jul 07 15:23:35 ip-10-0-63-89 rpm-ostree[11450]: client(id:machine-config-operator dbus:1.175 unit:crio-70a105f9061333d7098020fd9a27314dcbcdc2df057fb2da4fd5862fedd67ac5.scope uid:0) vanished; remaining=0
Jul 07 15:23:35 ip-10-0-63-89 rpm-ostree[11450]: In idle state; will auto-exit in 63 seconds
Jul 07 15:24:35 ip-10-0-63-89 rpm-ostree[11450]: client(id:machine-config-operator dbus:1.176 unit:crio-70a105f9061333d7098020fd9a27314dcbcdc2df057fb2da4fd5862fedd67ac5.scope uid:0) added; new total=1
Jul 07 15:24:35 ip-10-0-63-89 rpm-ostree[11450]: client(id:machine-config-operator dbus:1.176 unit:crio-70a105f9061333d7098020fd9a27314dcbcdc2df057fb2da4fd5862fedd67ac5.scope uid:0) vanished; remaining=0
Jul 07 15:24:35 ip-10-0-63-89 rpm-ostree[11450]: In idle state; will auto-exit in 63 seconds

Comment 2 Colin Walters 2022-07-18 21:34:33 UTC
> but this subsequent error doesn't overwrite the original issue and as such we are left in the old state reporting the wrong error.

Seems like an MCO bug here.

The underlying triggering problem though is https://bugzilla.redhat.com/show_bug.cgi?id=2108320

Comment 3 Yu Qi Zhang 2022-07-25 19:19:38 UTC
Ok, renaming this to track the other issue. The problem here is that even after the original error is fixed, if a different error then occurs, the MCD fails to make progress and also does not report the new error up the stack; the new error appears only in the MCD logs, which often causes confusion.

Comment 6 Sergio 2023-02-08 13:47:40 UTC
Verified using IPI on AWS version:

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.13.0-0.nightly-2023-02-07-064924   True        False         178m    Cluster version is 4.13.0-0.nightly-2023-02-07-064924


Steps:

1. Deploy 2 MachineConfigs to create 2 test files on the nodes:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: test-machine-config
spec:
  config:
    ignition:
      security:
      version: 3.1.0
    storage:
      files:
      - contents:
          source: data:text/plain;charset=utf-8;base64,dGVzdA==
        mode: 420
        path: /etc/test-file.test


and

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: test-machine-config-2
spec:
  config:
    ignition:
      security:
      version: 3.1.0
    storage:
      files:
      - contents:
          source: data:text/plain;charset=utf-8;base64,dGVzdA==
        mode: 420
        path: /etc/test-file-2.test
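
The two MachineConfigs embed their file contents as base64 data URLs. As a quick local sanity check (a sketch, not part of the original verification steps), the payload can be decoded to confirm the expected on-disk content:

```shell
# The MachineConfigs above embed the file content as a base64 data URL.
# Decoding the payload shows the exact content the MCD expects on disk.
printf '%s' 'dGVzdA==' | base64 -d
# prints: test
```

The MachineConfigs would then be applied with `oc apply -f <file>.yaml` and the worker pool allowed to finish rolling out the new rendered config before proceeding.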




2. Edit file 1 on the node and check the degraded reason in the MCP:

  - lastTransitionTime: "2023-02-08T13:09:54Z"
    message: 'Node ip-10-0-130-16.us-east-2.compute.internal is reporting: "content
      mismatch for file \"/etc/test-file.test\""'
    reason: 1 nodes are reporting degraded status on sync
    status: "True"
    type: NodeDegraded
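
The mismatch message comes from the MCD comparing each file on disk against the content rendered from the MachineConfig. A minimal local simulation of that check (hypothetical paths and variables, not the actual MCD code):

```shell
# Simulate the MCD's on-disk validation: compare the actual file content
# against the content rendered from the MachineConfig.
expected="test"                                   # decoded from the data URL above
mkdir -p /tmp/mcd-demo                            # stand-in for /etc on the node
printf 'tampered' > /tmp/mcd-demo/test-file.test  # simulates step 2's manual edit
actual="$(cat /tmp/mcd-demo/test-file.test)"
if [ "$actual" != "$expected" ]; then
  msg="content mismatch for file \"/tmp/mcd-demo/test-file.test\""
  echo "$msg"
fi
```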

3. Restore file 1 and check that the NodeDegraded condition clears:

  - lastTransitionTime: "2023-02-08T13:18:09Z"
    message: ""
    reason: ""
    status: "False"
    type: NodeDegraded

4. Edit file 2 and check the degraded reason in the MCP:

  - lastTransitionTime: "2023-02-08T13:19:18Z"
    message: 'Node ip-10-0-130-16.us-east-2.compute.internal is reporting: "content
      mismatch for file \"/etc/test-file-2.test\""'
    reason: 1 nodes are reporting degraded status on sync
    status: "True"
    type: NodeDegraded

5. Edit file 1 again and check the degraded reason in the MCP:

  - lastTransitionTime: "2023-02-08T13:19:18Z"
    message: 'Node ip-10-0-130-16.us-east-2.compute.internal is reporting: "content
      mismatch for file \"/etc/test-file.test\""'
    reason: 1 nodes are reporting degraded status on sync
    status: "True"
    type: NodeDegraded
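
The fix verified above means the degraded message always tracks the most recent sync error rather than sticking with the first one recorded. A toy sketch of that behavior (the `report_sync_error` helper is hypothetical, standing in for the MCD writer updating the node's degraded reason):

```shell
# Toy model of the fixed degraded-state reporting: each new sync error
# overwrites the previous message, so the pool always shows the latest error.
report_sync_error() {
  # hypothetical stand-in for the MCD writer updating the degraded message
  DEGRADED_MESSAGE="$1"
}
report_sync_error 'content mismatch for file "/etc/test-file-2.test"'   # step 4
report_sync_error 'content mismatch for file "/etc/test-file.test"'     # step 5
echo "$DEGRADED_MESSAGE"
```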



We move the status to VERIFIED.

Comment 9 errata-xmlrpc 2023-05-17 22:46:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.13.0 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:1326

