Bug 2102004 - 4.10 to 4.11 update: Degraded node: unexpected on-disk state: mode mismatch for file: "/etc/crio/crio.conf.d/01-ctrcfg-pidsLimit"; expected: -rw-r--r--/420/0644; received: ----------/0/0
Summary: 4.10 to 4.11 update: Degraded node: unexpected on-disk state: mode mismatch f...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.11
Hardware: Unspecified
OS: Unspecified
Severity: high
Priority: high
Target Milestone: ---
Target Release: 4.12.0
Assignee: MCO Bug Bot
QA Contact: Rio Liu
URL:
Whiteboard:
Depends On:
Blocks: 2104561
 
Reported: 2022-06-29 05:12 UTC by W. Trevor King
Modified: 2023-01-17 19:51 UTC (History)
8 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-01-17 19:50:54 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Github openshift machine-config-operator pull 3226 0 None open Bug 2102004: pkg/controller/common/helpers: Explicitly set mode 0644 2022-07-05 23:03:21 UTC
Red Hat Product Errata RHSA-2022:7399 0 None None None 2023-01-17 19:51:16 UTC

Description W. Trevor King 2022-06-29 05:12:10 UTC
Seen in a 4.10.18 to 4.11.0-fc.3 update, the 'worker' pool wasn't completing its update:

  $ oc --as system:admin get machineconfigpools
  NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
  master   rendered-master-12902db670ee8e73aa13a8df2782ddc6   True      False      False      3              3                   3                     0                      2y38d
  worker   rendered-worker-4504e2ba786e399f01bb8e68c54cc177   False     True       True       24             22                  23                    1                      2y38d

Checking on node state:

  $ oc --as system:admin get -o json nodes | jq -r '.items[].metadata.annotations["machineconfiguration.openshift.io/state"]' | sort | uniq -c
        1 Degraded
       29 Done

Getting the name of the degraded node:

  $ oc --as system:admin get -o json nodes | jq -r '.items[].metadata | select(.annotations["machineconfiguration.openshift.io/state"] == "Degraded").name'
  build0-gstfj-w-b-infra-wlvr9.c.openshift-ci-build-farm.internal

Reason for the degradation:

  $ oc --as system:admin get -o json node build0-gstfj-w-b-infra-wlvr9.c.openshift-ci-build-farm.internal | jq -r '.metadata.annotations["machineconfiguration.openshift.io/reason"]'
  unexpected on-disk state validating against rendered-worker-44f2c74623e4d3bbe9557a9e82102c01: mode mismatch for file: "/etc/crio/crio.conf.d/01-ctrcfg-pidsLimit"; expected: -rw-r--r--/420/0644; received: ----------/0/0
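For reference, the three fields in that expected value are one file mode rendered three ways: symbolic, decimal (as Ignition JSON stores it), and octal. A quick illustration of the encoding, not cluster data:

```python
import stat

# 0644 rendered the three ways the reason annotation reports it:
# symbolic (-rw-r--r--), decimal (420), and octal (0644).
mode = 0o644
print(stat.filemode(stat.S_IFREG | mode), mode, format(mode, "04o"))
# -> -rw-r--r-- 420 0644
```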

Hopping onto that node to confirm and adjust:

  $ oc --as system:admin debug node/build0-gstfj-w-b-infra-wlvr9.c.openshift-ci-build-farm.internal
  Warning: would violate PodSecurity "restricted:latest": host namespaces (hostNetwork=true, hostPID=true), privileged (container "container-00" must not set securityContext.privileged=true), allowPrivilegeEscalation != false (container "container-00" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "container-00" must set securityContext.capabilities.drop=["ALL"]), restricted volume types (volume "host" uses restricted volume type "hostPath"), runAsNonRoot != true (pod or container "container-00" must set securityContext.runAsNonRoot=true), runAsUser=0 (container "container-00" must not set runAsUser=0), seccompProfile (pod or container "container-00" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
  Starting pod/build0-gstfj-w-b-infra-wlvr9copenshift-ci-build-farminternal-debug 
  ...
  To use host binaries, run `chroot /host`
  Pod IP: 10.0.32.90
  If you don't see a command prompt, try pressing enter.
  sh-4.4# ls -lh /host/etc/crio/crio.conf.d                        
  total 12K
  -rw-r--r--. 1 root root 2.7K Jun 28 22:38 00-default
  ----------. 1 root root   46 Jun 28 22:38 01-ctrcfg-pidsLimit
  -rw-r--r--. 1 root root   67 Jun 28 22:38 01-mc-seccompUseDefault
  sh-4.4# file /host/etc/crio/crio.conf.d/01-ctrcfg-pidsLimit
  /host/etc/crio/crio.conf.d/01-ctrcfg-pidsLimit: ASCII text
  sh-4.4# cat /host/etc/crio/crio.conf.d/01-ctrcfg-pidsLimit
  [crio]
    [crio.runtime]
      pids_limit = 8096
  sh-4.4# chmod 0644 /host/etc/crio/crio.conf.d/01-ctrcfg-pidsLimit
  sh-4.4# ls -lh /host/etc/crio/crio.conf.d
  total 12K
  -rw-r--r--. 1 root root 2.7K Jun 28 22:38 00-default
  -rw-r--r--. 1 root root   46 Jun 28 22:38 01-ctrcfg-pidsLimit
  -rw-r--r--. 1 root root   67 Jun 28 22:38 01-mc-seccompUseDefault
  sh-4.4# exit

  Removing debug pod ...

That unstuck the update:

  $ oc --as system:admin get machineconfigpools
  NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
  master   rendered-master-12902db670ee8e73aa13a8df2782ddc6   True      False      False      3              3                   3                     0                      2y38d
  worker   rendered-worker-44f2c74623e4d3bbe9557a9e82102c01   True      False      False      32             32                  32                    0                      2y38d
  $ oc --as system:admin adm upgrade
  Cluster version is 4.11.0-fc.3
  ...

Not clear what happened to corrupt the mode.  Checking node logs, the entries for that file look like those for its siblings:

  $ oc --as system:admin adm node-logs build0-gstfj-w-b-infra-wlvr9.c.openshift-ci-build-farm.internal | gzip >node.logs.gz
  $ zgrep -2 01-ctrcfg-pidsLimit node.logs.gz | sed 's/^/  /'
  Jun 28 22:38:52.241334 build0-gstfj-w-b-infra-wlvr9.c.openshift-ci-build-farm.internal systemd[1]: Reloading.
  Jun 28 22:38:52.936870 build0-gstfj-w-b-infra-wlvr9.c.openshift-ci-build-farm.internal ignition[1210]: INFO     : files: createFilesystemsFiles: createFiles: op(21): [finished] writing file "/etc/kubernetes/kubelet.conf"
  Jun 28 22:38:52.936870 build0-gstfj-w-b-infra-wlvr9.c.openshift-ci-build-farm.internal ignition[1210]: INFO     : files: createFilesystemsFiles: createFiles: op(22): [started]  writing file "/etc/crio/crio.conf.d/01-ctrcfg-pidsLimit"
  Jun 28 22:38:52.936870 build0-gstfj-w-b-infra-wlvr9.c.openshift-ci-build-farm.internal ignition[1210]: INFO     : files: createFilesystemsFiles: createFiles: op(22): [finished] writing file "/etc/crio/crio.conf.d/01-ctrcfg-pidsLimit"
  Jun 28 22:38:52.936870 build0-gstfj-w-b-infra-wlvr9.c.openshift-ci-build-farm.internal ignition[1210]: INFO     : files: createFilesystemsFiles: createFiles: op(23): [started]  writing file "/etc/crio/crio.conf.d/01-mc-seccompUseDefault"
  Jun 28 22:38:52.936870 build0-gstfj-w-b-infra-wlvr9.c.openshift-ci-build-farm.internal ignition[1210]: INFO     : files: createFilesystemsFiles: createFiles: op(23): [finished] writing file "/etc/crio/crio.conf.d/01-mc-seccompUseDefault"
  --
  Jun 28 22:38:56.543634 build0-gstfj-w-b-infra-wlvr9.c.openshift-ci-build-farm.internal systemd[1]: Reloading.
  Jun 28 22:38:56.543646 build0-gstfj-w-b-infra-wlvr9.c.openshift-ci-build-farm.internal ignition[1210]: INFO     : files: createFilesystemsFiles: createFiles: op(21): [finished] writing file "/etc/kubernetes/kubelet.conf"
  Jun 28 22:38:56.543658 build0-gstfj-w-b-infra-wlvr9.c.openshift-ci-build-farm.internal ignition[1210]: INFO     : files: createFilesystemsFiles: createFiles: op(22): [started]  writing file "/etc/crio/crio.conf.d/01-ctrcfg-pidsLimit"
  Jun 28 22:38:56.543670 build0-gstfj-w-b-infra-wlvr9.c.openshift-ci-build-farm.internal ignition[1210]: INFO     : files: createFilesystemsFiles: createFiles: op(22): [finished] writing file "/etc/crio/crio.conf.d/01-ctrcfg-pidsLimit"
  Jun 28 22:38:56.543681 build0-gstfj-w-b-infra-wlvr9.c.openshift-ci-build-farm.internal ignition[1210]: INFO     : files: createFilesystemsFiles: createFiles: op(23): [started]  writing file "/etc/crio/crio.conf.d/01-mc-seccompUseDefault"
  Jun 28 22:38:56.543693 build0-gstfj-w-b-infra-wlvr9.c.openshift-ci-build-farm.internal ignition[1210]: INFO     : files: createFilesystemsFiles: createFiles: op(23): [finished] writing file "/etc/crio/crio.conf.d/01-mc-seccompUseDefault"
  --
  Jun 28 22:38:58.520605 build0-gstfj-w-b-infra-wlvr9.c.openshift-ci-build-farm.internal restorecon[1607]: Relabeled /etc/crio/crio.conf.d from system_u:object_r:unlabeled_t:s0 to system_u:object_r:container_config_t:s0
  Jun 28 22:38:58.520605 build0-gstfj-w-b-infra-wlvr9.c.openshift-ci-build-farm.internal restorecon[1607]: Relabeled /etc/crio/crio.conf.d/00-default from system_u:object_r:unlabeled_t:s0 to system_u:object_r:container_config_t:s0
  Jun 28 22:38:58.520605 build0-gstfj-w-b-infra-wlvr9.c.openshift-ci-build-farm.internal restorecon[1607]: Relabeled /etc/crio/crio.conf.d/01-ctrcfg-pidsLimit from system_u:object_r:unlabeled_t:s0 to system_u:object_r:container_config_t:s0
  Jun 28 22:38:58.520605 build0-gstfj-w-b-infra-wlvr9.c.openshift-ci-build-farm.internal restorecon[1607]: Relabeled /etc/crio/crio.conf.d/01-mc-seccompUseDefault from system_u:object_r:unlabeled_t:s0 to system_u:object_r:container_config_t:s0
  Jun 28 22:38:58.520605 build0-gstfj-w-b-infra-wlvr9.c.openshift-ci-build-farm.internal restorecon[1607]: Relabeled /etc/containers/policy.json from system_u:object_r:unlabeled_t:s0 to system_u:object_r:etc_t:s0

Comment 2 W. Trevor King 2022-06-29 05:39:35 UTC
Hmm, this is suspicious:

  $ zgrep -A1 'process 1999' node.logs.gz
  Jun 28 22:41:15.894942 build0-gstfj-w-b-infra-wlvr9.c.openshift-ci-build-farm.internal systemd[1]: rpm-ostreed.service: Found left-over process 1999 (rpm-ostree) in control group while starting unit. Ignoring.
  Jun 28 22:41:15.894958 build0-gstfj-w-b-infra-wlvr9.c.openshift-ci-build-farm.internal systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.

What's the deal with 1999?

  $ zgrep 'rpm-ostree' node.logs.gz | grep -A1 '[[]1999]' | tail -n6
  Jun 28 22:40:56.695627 build0-gstfj-w-b-infra-wlvr9.c.openshift-ci-build-farm.internal rpm-ostree[1999]: client(id:machine-config-operator dbus:1.48 unit:crio-e942a63a45ecbd88571cc4f6803a571d6a66367aece0d1af5d07014a5dfe9d2d.scope uid:0) vanished; remaining=0
  Jun 28 22:40:56.695663 build0-gstfj-w-b-infra-wlvr9.c.openshift-ci-build-farm.internal rpm-ostree[1999]: In idle state; will auto-exit in 63 seconds
  Jun 28 22:40:57.807054 build0-gstfj-w-b-infra-wlvr9.c.openshift-ci-build-farm.internal rpm-ostree[1999]: client(id:machine-config-operator dbus:1.55 unit:crio-e942a63a45ecbd88571cc4f6803a571d6a66367aece0d1af5d07014a5dfe9d2d.scope uid:0) added; new total=1
  Jun 28 22:40:57.817972 build0-gstfj-w-b-infra-wlvr9.c.openshift-ci-build-farm.internal rpm-ostree[1999]: client(id:machine-config-operator dbus:1.55 unit:crio-e942a63a45ecbd88571cc4f6803a571d6a66367aece0d1af5d07014a5dfe9d2d.scope uid:0) vanished; remaining=0
  Jun 28 22:40:57.817995 build0-gstfj-w-b-infra-wlvr9.c.openshift-ci-build-farm.internal rpm-ostree[1999]: In idle state; will auto-exit in 62 seconds
  Jun 28 22:40:59.790481 build0-gstfj-w-b-infra-wlvr9.c.openshift-ci-build-farm.internal systemd[1]: rpm-ostreed.service: Start request repeated too quickly.

I'm not sure what "Start request repeated too quickly" means, but perhaps this is part of the wonky mode...

Comment 3 W. Trevor King 2022-06-29 05:40:30 UTC
More PID 1999 sadness:

$ zgrep 'rpm-ostree' node.logs.gz | grep -A1 ' 1999'
Jun 28 22:41:08.101919 build0-gstfj-w-b-infra-wlvr9.c.openshift-ci-build-farm.internal systemd[1]: rpm-ostreed.service: Got notification message from PID 1999, but reception only permitted for main PID which is currently not known
Jun 28 22:41:15.894942 build0-gstfj-w-b-infra-wlvr9.c.openshift-ci-build-farm.internal systemd[1]: rpm-ostreed.service: Found left-over process 1999 (rpm-ostree) in control group while starting unit. Ignoring.
Jun 28 22:41:15.896417 build0-gstfj-w-b-infra-wlvr9.c.openshift-ci-build-farm.internal systemd[1]: Starting rpm-ostree System Management Daemon...
--
Jun 28 22:41:18.109270 build0-gstfj-w-b-infra-wlvr9.c.openshift-ci-build-farm.internal systemd[1]: rpm-ostreed.service: Got notification message from PID 1999, but reception only permitted for main PID 4990
Jun 28 22:41:28.109373 build0-gstfj-w-b-infra-wlvr9.c.openshift-ci-build-farm.internal systemd[1]: rpm-ostreed.service: Got notification message from PID 1999, but reception only permitted for main PID 4990
Jun 28 22:41:38.109519 build0-gstfj-w-b-infra-wlvr9.c.openshift-ci-build-farm.internal systemd[1]: rpm-ostreed.service: Got notification message from PID 1999, but reception only permitted for main PID 4990
Jun 28 22:41:48.101491 build0-gstfj-w-b-infra-wlvr9.c.openshift-ci-build-farm.internal systemd[1]: rpm-ostreed.service: Got notification message from PID 1999, but reception only permitted for main PID 4990
Jun 28 22:41:58.108844 build0-gstfj-w-b-infra-wlvr9.c.openshift-ci-build-farm.internal systemd[1]: rpm-ostreed.service: Got notification message from PID 1999, but reception only permitted for main PID 4990
Jun 28 22:42:00.101055 build0-gstfj-w-b-infra-wlvr9.c.openshift-ci-build-farm.internal systemd[1]: rpm-ostreed.service: Got notification message from PID 1999, but reception only permitted for main PID 4990
Jun 28 22:42:00.101121 build0-gstfj-w-b-infra-wlvr9.c.openshift-ci-build-farm.internal systemd[1]: rpm-ostreed.service: Got notification message from PID 1999, but reception only permitted for main PID 4990
Jun 28 22:42:00.101714 build0-gstfj-w-b-infra-wlvr9.c.openshift-ci-build-farm.internal systemd[1]: Started rpm-ostree System Management Daemon.

Comment 4 Yu Qi Zhang 2022-06-29 14:51:39 UTC
Interesting, I don't think we've seen this bug before. A few notes:

1. It seems this same config didn't run into the problem on the other nodes, given that most of them updated successfully.
2. "unexpected on-disk state validating against rendered-worker-44f2c74623e4d3bbe9557a9e82102c01" looks like it's validating against the NEW config post-reboot, meaning it was the MCD that wrote the file, not Ignition. Is that correct? Or is that an old config that the node booted with and never updated?
3. That file is managed by the MCO via the containerruntimeconfig controller, and it looks like no mode is specified, which normally means it would have defaulted to 420 (0644) like all the other files.

I don't think rpm-ostreed is directly involved in this process, but that would have to be confirmed with the RHCOS team. I don't see anything that jumps out as "this is the MCO's doing" at the moment, but I'm also not sure how we could have gotten here.

Comment 5 W. Trevor King 2022-06-29 19:46:08 UTC
Same timestamps on:

  ----------. 1 root root   46 Jun 28 22:38 01-ctrcfg-pidsLimit

and:

  Jun 28 22:38:52.936870 build0-gstfj-w-b-infra-wlvr9.c.openshift-ci-build-farm.internal ignition[1210]: INFO     : files: createFilesystemsFiles: createFiles: op(22): [finished] writing file "/etc/crio/crio.conf.d/01-ctrcfg-pidsLimit"

suggests this is an RHCOS thing to me.  But yeah, I didn't see anything that makes me completely sure.

Comment 6 Yu Qi Zhang 2022-06-29 21:07:13 UTC
Did the MCD at any point attempt to perform an update on the files of that node? Or was this caught before the update happened? It sounds like the latter based on your most recent comment, but I am not certain.

Comment 7 W. Trevor King 2022-06-29 21:30:00 UTC
In this cluster today:

$ oc --as system:admin get -l node-role.kubernetes.io/infra= -o json nodes | jq -r '.items[].metadata | .annotations["machineconfiguration.openshift.io/currentConfig"] + " " + .annotations["machineconfiguration.openshift.io/desiredConfig"] + " " + .annotations["machineconfiguration.openshift.io/state"] + " " + .name' | sort
rendered-worker-44f2c74623e4d3bbe9557a9e82102c01 rendered-worker-44f2c74623e4d3bbe9557a9e82102c01 Done build0-gstfj-w-b-infra-5ck6v.c.openshift-ci-build-farm.internal
rendered-worker-44f2c74623e4d3bbe9557a9e82102c01 rendered-worker-44f2c74623e4d3bbe9557a9e82102c01 Done build0-gstfj-w-b-infra-l69dd.c.openshift-ci-build-farm.internal
rendered-worker-44f2c74623e4d3bbe9557a9e82102c01 rendered-worker-44f2c74623e4d3bbe9557a9e82102c01 Done build0-gstfj-w-b-infra-wlvr9.c.openshift-ci-build-farm.internal

And that 44f... MachineConfig was given as the worker target in comment 0's MachineConfigPool list and in the 'unexpected on-disk state validating against rendered-worker-44f2c74623e4d3bbe9557a9e82102c01' message.  The MachineConfigPool list also included the single degraded node, and 23 of 24 nodes in the 'worker' pool were updated.  I think that means the node got stuck after it was asked to update.  But to get at pivot timing, using the gathered node logs:

$ zgrep 44f2c74623e4d3bbe9557a9e82102c01 node.logs.gz 
Jun 28 22:39:13.624843 build0-gstfj-w-b-infra-wlvr9.c.openshift-ci-build-farm.internal machine-config-daemon[2302]: I0628 22:39:13.624799    2302 update.go:513] Checking Reconcilable for config mco-empty-mc to rendered-worker-44f2c74623e4d3bbe9557a9e82102c01
Jun 28 22:39:13.625826 build0-gstfj-w-b-infra-wlvr9.c.openshift-ci-build-farm.internal machine-config-daemon[2302]: I0628 22:39:13.625786    2302 update.go:1977] Starting update from mco-empty-mc to rendered-worker-44f2c74623e4d3bbe9557a9e82102c01: &{osUpdate:true kargs:false fips:false passwd:false files:false units:false kernelType:false extensions:false}
Jun 28 22:39:13.632052 build0-gstfj-w-b-infra-wlvr9.c.openshift-ci-build-farm.internal root[2342]: machine-config-daemon[2302]: Starting update from mco-empty-mc to rendered-worker-44f2c74623e4d3bbe9557a9e82102c01: &{osUpdate:true kargs:false fips:false passwd:false files:false units:false kernelType:false extensions:false}
Jun 28 22:40:08.564733 build0-gstfj-w-b-infra-wlvr9.c.openshift-ci-build-farm.internal logger[2454]: rendered-worker-44f2c74623e4d3bbe9557a9e82102c01
Jun 28 22:40:08.569757 build0-gstfj-w-b-infra-wlvr9.c.openshift-ci-build-farm.internal machine-config-daemon[2302]: I0628 22:40:08.569726    2302 update.go:1977] initiating reboot: Completing firstboot provisioning to rendered-worker-44f2c74623e4d3bbe9557a9e82102c01
Jun 28 22:40:08.572129 build0-gstfj-w-b-infra-wlvr9.c.openshift-ci-build-farm.internal root[2456]: machine-config-daemon[2302]: initiating reboot: Completing firstboot provisioning to rendered-worker-44f2c74623e4d3bbe9557a9e82102c01
Jun 28 22:40:08.585843 build0-gstfj-w-b-infra-wlvr9.c.openshift-ci-build-farm.internal systemd[1]: Started machine-config-daemon: Completing firstboot provisioning to rendered-worker-44f2c74623e4d3bbe9557a9e82102c01.
Jun 28 22:40:08.620927 build0-gstfj-w-b-infra-wlvr9.c.openshift-ci-build-farm.internal systemd[1]: Stopping machine-config-daemon: Completing firstboot provisioning to rendered-worker-44f2c74623e4d3bbe9557a9e82102c01...
Jun 28 22:40:08.732459 build0-gstfj-w-b-infra-wlvr9.c.openshift-ci-build-farm.internal systemd[1]: Stopped machine-config-daemon: Completing firstboot provisioning to rendered-worker-44f2c74623e4d3bbe9557a9e82102c01.
Jun 29 04:57:45.156007 build0-gstfj-w-b-infra-wlvr9.c.openshift-ci-build-farm.internal logger[322343]: rendered-worker-44f2c74623e4d3bbe9557a9e82102c01
Jun 29 04:57:55.188890 build0-gstfj-w-b-infra-wlvr9.c.openshift-ci-build-farm.internal root[322477]: machine-config-daemon[1962]: Update completed for config rendered-worker-44f2c74623e4d3bbe9557a9e82102c01 and node has been successfully uncordoned

"Starting update from mco-empty-mc" is a bit weird.  Ah, it was a new node being born:

$ oc --as system:admin get -o json node build0-gstfj-w-b-infra-wlvr9.c.openshift-ci-build-farm.internal | jq -r .metadata.creationTimestamp
2022-06-28T22:40:51Z

Which explains why Ignition was so active.

$ zgrep 'ignition.*https' node.logs.gz
Jun 28 22:38:48.766270 build0-gstfj-w-b-infra-wlvr9.c.openshift-ci-build-farm.internal ignition[911]: GET https://api-int.build02.gcp.ci.openshift.org:22623/config/worker: attempt #1
Jun 28 22:38:48.884632 build0-gstfj-w-b-infra-wlvr9.c.openshift-ci-build-farm.internal ignition[911]: fetched referenced config at https://api-int.build02.gcp.ci.openshift.org:22623/config/worker with SHA512: 935bd4c87f722517dbe23c1fd4310949bf418aae5b389dffbc746cfd11fcb630724cb6bc71f2385e7886c7dc00b8b239651740670bce704f321a8533e1f07845

I dunno how to map that hash to the rendered MachineConfigs...  It doesn't seem to match what's currently being rendered:

$ oc --as system:admin debug --no-tty -q node/build0-gstfj-w-b-infra-wlvr9.c.openshift-ci-build-farm.internal -- curl -s https://api-int.build02.gcp.ci.openshift.org:22623/config/worker | sha512sum
Warning: would violate PodSecurity "restricted:latest": host namespaces (hostNetwork=true, hostPID=true), privileged (container "container-00" must not set securityContext.privileged=true), allowPrivilegeEscalation != false (container "container-00" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "container-00" must set securityContext.capabilities.drop=["ALL"]), restricted volume types (volume "host" uses restricted volume type "hostPath"), runAsNonRoot != true (pod or container "container-00" must set securityContext.runAsNonRoot=true), runAsUser=0 (container "container-00" must not set runAsUser=0), seccompProfile (pod or container "container-00" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
cf83e1357eefb8bdf1542850d66d8007d620e4050b5715dc83f4a921d36ce9ce47d0d13c5d85f2b0ff8318d2877eec2f63b931bd47417a81a538327af927da3e  -
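Editorial note: that digest is the SHA-512 of empty input, so the in-pod curl produced no bytes at all (possibly because the Machine Config Server requires credentials the debug pod's curl doesn't present; that part is an assumption). The comparison therefore tells us nothing. Easy to confirm:

```python
import hashlib

# The digest printed above is the SHA-512 of zero bytes, i.e. the
# piped curl output was empty.
print(hashlib.sha512(b"").hexdigest())
# -> cf83e1357eefb8bdf1542850d66d8007d620e4050b5715dc83f4a921d36ce9ce47d0d13c5d85f2b0ff8318d2877eec2f63b931bd47417a81a538327af927da3e
```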

Comment 8 W. Trevor King 2022-06-29 21:41:27 UTC
The other two infra machines are quite a bit older:

$ for NODE in $(oc --as system:admin get -l node-role.kubernetes.io/infra= -o jsonpath='{.items[*].metadata.name}' nodes); do echo "${NODE}"; oc --as system:admin get -o jsonpath='{.metadata.creationTimestamp}{"\n"}' node "${NODE}"; oc --as system:admin adm node-logs "${NODE}" | grep 'ignition.*fetched referenced config'; done
build0-gstfj-w-b-infra-5ck6v.c.openshift-ci-build-farm.internal
2022-03-28T10:06:20Z
build0-gstfj-w-b-infra-l69dd.c.openshift-ci-build-farm.internal
2022-01-17T12:55:54Z
build0-gstfj-w-b-infra-wlvr9.c.openshift-ci-build-farm.internal
2022-06-28T22:40:51Z
Jun 28 22:38:48.884632 build0-gstfj-w-b-infra-wlvr9.c.openshift-ci-build-farm.internal ignition[911]: fetched referenced config at https://api-int.build02.gcp.ci.openshift.org:22623/config/worker with SHA512: 935bd4c87f722517dbe23c1fd4310949bf418aae5b389dffbc746cfd11fcb630724cb6bc71f2385e7886c7dc00b8b239651740670bce704f321a8533e1f07845

Broadening the net to include all 'worker' nodes:

$ for NODE in $(oc --as system:admin get -l node-role.kubernetes.io/worker= -o jsonpath='{.items[*].metadata.name}' nodes); do oc --as system:admin adm node-logs "${NODE}" | grep 'ignition.*fetched referenced config'; done | sort
Jun 28 22:38:48.884632 build0-gstfj-w-b-infra-wlvr9.c.openshift-ci-build-farm.internal ignition[911]: fetched referenced config at https://api-int.build02.gcp.ci.openshift.org:22623/config/worker with SHA512: 935bd4c87f722517dbe23c1fd4310949bf418aae5b389dffbc746cfd11fcb630724cb6bc71f2385e7886c7dc00b8b239651740670bce704f321a8533e1f07845
Jun 28 22:43:01.138271 build0-gstfj-w-b-98mtn.c.openshift-ci-build-farm.internal ignition[852]: fetched referenced config at https://api-int.build02.gcp.ci.openshift.org:22623/config/worker with SHA512: d8bc4ab38515a0727a51f492d6c8ec71dd45b9ccd61e071a216fe1ad8bff3965af591cf04b7139edbc93d783851372ec8b110e0616f2afbc990eb400e2bfcbb4
Jun 29 00:03:48.153096 localhost ignition[847]: fetched referenced config at https://api-int.build02.gcp.ci.openshift.org:22623/config/worker with SHA512: d8bc4ab38515a0727a51f492d6c8ec71dd45b9ccd61e071a216fe1ad8bff3965af591cf04b7139edbc93d783851372ec8b110e0616f2afbc990eb400e2bfcbb4
...
Jun 29 21:20:45.979121 localhost ignition[852]: fetched referenced config at https://api-int.build02.gcp.ci.openshift.org:22623/config/worker with SHA512: d8bc4ab38515a0727a51f492d6c8ec71dd45b9ccd61e071a216fe1ad8bff3965af591cf04b7139edbc93d783851372ec8b110e0616f2afbc990eb400e2bfcbb4

So this node is the oldest one to remember its Ignition config, and right after this node, folks started seeing d8bc...  It's a bit weird that this node knew its hostname by the time it made that request; most of the other nodes were still using 'localhost' at that point.  Timestamps are also weird, since 21:20 is still a few hours into the future if we're talking UTC:

$ date --utc
Wed 29 Jun 2022 09:40:49 PM UTC

Comment 9 Yu Qi Zhang 2022-06-29 21:51:05 UTC
The first served ignition content is always preserved on the node, at /etc/mcs-machine-config-content.json

If you want to check contents, that's probably the best place. That file should never get updated again after the MCO takes over.

Could you find the MCD logs of the problematic node, as well as the rendered MC it should be on, and the original content in that file?

Comment 10 W. Trevor King 2022-06-29 22:18:27 UTC
$ oc --as system:admin debug --no-tty -q node/build0-gstfj-w-b-infra-wlvr9.c.openshift-ci-build-farm.internal -- cat /host/etc/mcs-machine-config-content.json >mcs-machine-config-content.json
$ sha512sum mcs-machine-config-content.json 
8afe3d5d2ad4568e362473051cdc7c220f60e0d7d221db678c2352af7d6d81289e4e40046a71908e96e03f15603f490b7fec8e86952bff1c1e4fa06f808ea304  mcs-machine-config-content.json

Doesn't match the 'SHA512: 935b...' from the node logs.  Oh, because this is the initial MachineConfig, not the initial Ignition content:

$ jq -r .metadata.name mcs-machine-config-content.json
rendered-worker-44f2c74623e4d3bbe9557a9e82102c01
$ jq '.spec.config.storage.files[] | select(.path == "/etc/crio/crio.conf.d/01-ctrcfg-pidsLimit")' mcs-machine-config-content.json
{
  "contents": {
    "compression": "",
    "source": "data:text/plain;charset=utf-8;base64,W2NyaW9dCiAgW2NyaW8ucnVudGltZV0KICAgIHBpZHNfbGltaXQgPSA4MDk2Cg=="
  },
  "overwrite": true,
  "path": "/etc/crio/crio.conf.d/01-ctrcfg-pidsLimit"
}

There are some other files that include explicit mode requests:

$ jq -r '.spec.config.storage.files[].mode' mcs-machine-config-content.json | sort | uniq -c
      2 384
     23 420
      1 484
      5 493
      1 null
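Those decimal values are just the octal modes in Ignition's JSON encoding; converting the histogram's values back (a throwaway check, not cluster data):

```python
# Ignition JSON stores file modes as decimal integers; map the
# histogram's values back to the octal modes admins expect.
for mode in (384, 420, 484, 493):
    print(f"{mode} -> 0{mode:o}")
# 384 -> 0600
# 420 -> 0644
# 484 -> 0744
# 493 -> 0755
```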

Hmm, maybe this file is the only one with 'null'?  From the still-live, in-cluster MachineConfig:

$ oc --as system:admin get -o json machineconfig rendered-worker-44f2c74623e4d3bbe9557a9e82102c01 | jq '.spec.config.storage.files[] | select(.mode == null)'
{
  "contents": {
    "compression": "",
    "source": "data:text/plain;charset=utf-8;base64,W2NyaW9dCiAgW2NyaW8ucnVudGltZV0KICAgIHBpZHNfbGltaXQgPSA4MDk2Cg=="
  },
  "overwrite": true,
  "path": "/etc/crio/crio.conf.d/01-ctrcfg-pidsLimit"
}

My guess is that Ignition treats that as 000, and the MCO wants to fall back to 420, and the MCO is mad that Ignition doesn't agree on the default.
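That theory is easy to reproduce locally. This is a sketch of the comparison only, not the MCD's actual validation code; the message shape is copied from the reason annotation above:

```python
import os
import stat
import tempfile

# Create a file with mode 000, as Ignition apparently left it on the
# node, then validate it against the 0644 the MCD expects.
expected = 0o644
fd, path = tempfile.mkstemp()
os.close(fd)
os.chmod(path, 0)

actual = stat.S_IMODE(os.stat(path).st_mode)
if actual != expected:
    print(f'mode mismatch for file: "{path}"; '
          f'expected: {stat.filemode(stat.S_IFREG | expected)}/{expected}/{expected:04o}; '
          f'received: {stat.filemode(stat.S_IFREG | actual)}/{actual}/{actual:04o}')
os.remove(path)
```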

Comment 11 W. Trevor King 2022-06-29 22:36:30 UTC
Auditing for 'null':

$ oc --as system:admin get -o json machineconfigs | jq -r '.items[] | .metadata as $m | .spec.config.storage | (.files // [])[] | select(.mode == null) | $m.creationTimestamp + " " + $m.name + " " + .path'
2020-10-20T21:25:54Z 99-worker-generated-containerruntime /etc/crio/crio.conf.d/01-ctrcfg-pidsLimit
2020-10-26T17:22:49Z 99-worker-generated-kubelet /etc/node-sizing-enabled.env
2020-10-26T17:22:49Z 99-worker-generated-kubelet /etc/kubernetes/kubelet.conf
2022-06-28T22:34:21Z rendered-worker-44f2c74623e4d3bbe9557a9e82102c01 /etc/crio/crio.conf.d/01-ctrcfg-pidsLimit

$ oc --as system:admin get --show-managed-fields -o json machineconfig 99-worker-generated-containerruntime | jq '.metadata | {annotations, creationTimestamp, generation, managedFields: ([.managedFields[] | {manager}])}'
{
  "annotations": {
    "machineconfiguration.openshift.io/generated-by-controller-version": "37b741601f9b7ff9b2e1870102cc2970b24e1835"
  },
  "creationTimestamp": "2020-10-20T21:25:54Z",
  "generation": 4,
  "managedFields": [
    {
      "manager": "machine-config-controller"
    }
  ]
}

Dropping into audit logs:

$ oc --as system:admin adm must-gather --dest-dir=audit-logs -- /usr/bin/gather_audit_logs
$ zgrep -h '"verb":"[^"]*".*"name":"99-worker-generated-containerruntime"' audit-logs/quay-io-*/audit_logs/kube-apiserver/*.gz | jq -r .verb | sort | uniq -c
      3 get

So I wasn't fast enough to catch when the last touch happened.  Checking the covered range:

$ zgrep -h 'stageTimestamp' audit-logs/quay-io-*/audit_logs/kube-apiserver/*.gz | jq .stageTimestamp | sort | head -n2
"2022-06-29T16:29:59.025155Z"
"2022-06-29T16:29:59.026341Z"

which is indeed well after the 28th when this node was coming up.

Comment 12 W. Trevor King 2022-06-29 22:43:30 UTC
Digging into one of the other paths with a null, to understand why that's not making it into the rendered config:

$ oc --as system:admin get -o json machineconfigs | jq -r '.items[] | .metadata as $m | .spec.config.storage | (.files // [])[] | select(.path == "/etc/node-sizing-enabled.env") | $m.creationTimestamp + " " + $m.name + " " + (.mode | tostring)' | sort
2020-05-21T19:36:27Z 00-master 420
2020-05-21T19:36:27Z 00-worker 420
2020-10-26T17:22:49Z 99-worker-generated-kubelet null
2021-06-09T19:27:30Z rendered-master-7905996052725f8e4e7d8c39f4a4e89d 420
2021-06-09T19:27:30Z rendered-worker-ab6e05f3836865910b2d57f7ff099947 420
...so many other rendered-* MachineConfigs, do we not prune?...
2022-06-28T22:34:21Z rendered-master-12902db670ee8e73aa13a8df2782ddc6 420
2022-06-28T22:34:21Z rendered-worker-44f2c74623e4d3bbe9557a9e82102c01 420

Diffing:

$ get_node_sizing_enabled() {
>   NAME="${1}"
>   oc --as system:admin get -o json machineconfig "${NAME}" | jq -r '.spec.config.storage | (.files // [])[] | select(.path == "/etc/node-sizing-enabled.env")'
> }
$ diff -u <(get_node_sizing_enabled 00-worker) <(get_node_sizing_enabled rendered-worker-44f2c74623e4d3bbe9557a9e82102c01)
--- /dev/fd/63  2022-06-29 15:42:38.732172513 -0700
+++ /dev/fd/62  2022-06-29 15:42:38.734172513 -0700
@@ -1,6 +1,7 @@
 {
   "contents": {
-    "source": "data:,NODE_SIZING_ENABLED%3Dfalse%0ASYSTEM_RESERVED_MEMORY%3D1Gi%0ASYSTEM_RESERVED_CPU%3D500m"
+    "compression": "",
+    "source": "data:text/plain;charset=utf-8;base64,Tk9ERV9TSVpJTkdfRU5BQkxFRD1mYWxzZQpTWVNURU1fUkVTRVJWRURfTUVNT1JZPTNHaQpTWVNURU1fUkVTRVJWRURfQ1BVPTEwMDBtCg=="
   },
   "mode": 420,
   "overwrite": true,
$ diff -u <(get_node_sizing_enabled 99-worker-generated-kubelet) <(get_node_sizing_enabled rendered-worker-44f2c74623e4d3bbe9557a9e82102c01)
--- /dev/fd/63  2022-06-29 15:43:03.654170872 -0700
+++ /dev/fd/62  2022-06-29 15:43:03.656170872 -0700
@@ -3,6 +3,7 @@
     "compression": "",
     "source": "data:text/plain;charset=utf-8;base64,Tk9ERV9TSVpJTkdfRU5BQkxFRD1mYWxzZQpTWVNURU1fUkVTRVJWRURfTUVNT1JZPTNHaQpTWVNURU1fUkVTRVJWRURfQ1BVPTEwMDBtCg=="
   },
+  "mode": 420,
   "overwrite": true,
   "path": "/etc/node-sizing-enabled.env"
 }

Is there some kind of union going on?

Comment 13 W. Trevor King 2022-06-29 22:52:05 UTC
Fitting the suspicious 99-worker-generated-containerruntime into the cluster's update history:

$ (oc --as system:admin get -o json clusterversion version | jq -r '.status.history[] | .startedTime + " " + .completionTime + " " + .state + " " + .version'; echo '2020-10-20T21:25:54Z --- 99-worker-generated-containerruntime') | sort
2020-07-29T15:01:15Z 2020-07-29T15:35:42Z Completed 4.5.4
2020-08-27T17:41:55Z 2020-08-27T18:21:45Z Completed 4.5.6
2020-09-25T09:30:46Z 2020-09-25T10:17:32Z Completed 4.5.12
2020-10-20T17:53:50Z 2020-10-20T18:23:58Z Completed 4.5.14
2020-10-20T18:30:58Z 2020-10-21T15:18:00Z Completed 4.6.0-rc.4
2020-10-20T21:25:54Z --- 99-worker-generated-containerruntime
2020-10-23T14:01:15Z 2020-10-23T14:31:36Z Completed 4.6.1
2020-11-17T17:19:57Z 2020-11-17T19:04:40Z Completed 4.6.4
...
2022-06-06T12:43:55Z 2022-06-06T13:51:27Z Completed 4.10.17
2022-06-17T14:26:40Z 2022-06-17T15:01:31Z Completed 4.10.18
2022-06-28T14:12:13Z 2022-06-29T04:59:16Z Completed 4.11.0-fc.3

So possibly this is something the cluster picked up on that 4.6.0-rc.4 install; the MCO was fixed by GA, and we never worried about recovering from the rc.4 behavior because clusters that had touched it were considered tainted and unsupported?

Comment 14 Yu Qi Zhang 2022-06-29 22:54:53 UTC
>Hmm, maybe this file is the only one with 'null'?  From the still-live, in-cluster MachineConfig:

>$ oc --as system:admin get -o json machineconfig rendered-worker-44f2c74623e4d3bbe9557a9e82102c01 | jq '.spec.config.storage.files[] | select(.mode == null)'
>{
>  "contents": {
>    "compression": "",
>    "source": "data:text/plain;charset=utf-8;base64,W2NyaW9dCiAgW2NyaW8ucnVudGltZV0KICAgIHBpZHNfbGltaXQgPSA4MDk2Cg=="
>  },
>  "overwrite": true,
>  "path": "/etc/crio/crio.conf.d/01-ctrcfg-pidsLimit"
>}

>My guess is that Ignition treats that as 000, and the MCO wants to fall back to 420, and the MCO is mad that Ignition doesn't agree on the default.

That is very odd; we never ran into this issue in the past. I don't know what the default Ignition mode would be here. I think we should check with the firstboot team.

> ...so many other rendered-* MachineConfig, do we not prune?...

Nope. There's a card somewhere for that.

> Is there some kind of union going on?

Nope, assuming I read that correctly and it's the same file, the latter overwrites completely.

> Fitting the suspicious 99-worker-generated-containerruntime into the cluster's update history:

The controller should be regenerating that MC every time the MCO updates. Does it have a generated-by-controller version annotation? If so, it's still managed by the MCO and continuously updated; otherwise the MCO would have thrown an error.

If the annotation doesn't exist, that's problematic.

Comment 15 Yu Qi Zhang 2022-06-29 22:56:15 UTC
I think our first and best lead here is to figure out what Ignition's default behaviour is when a file doesn't specify a mode, and continue from there.

And secondly: why did other nodes not fail on this, and why didn't we see this issue until the current update?

Comment 16 W. Trevor King 2022-06-29 22:59:50 UTC
4.6 GAed with 4.6.1 [1].

The MCO at that point was:

$ oc adm release info --commits quay.io/openshift-release-dev/ocp-release:4.6.1-x86_64 | grep machine-config-operator
  machine-config-operator                        https://github.com/openshift/machine-config-operator                        6896f6bc491fde7e8314631652f70841c7f9f31d

By 4.6.3 it was:

$ oc adm release info --commits quay.io/openshift-release-dev/ocp-release:4.6.3-x86_64 | grep machine-config-operator
  machine-config-operator                        https://github.com/openshift/machine-config-operator                        4f29aba0ccd1d7c04f839f5c9ceaa3dd63a04960

And back at 4.6.0-rc.4 it was:

$ oc adm release info --commits quay.io/openshift-release-dev/ocp-release:4.6.0-rc.4-x86_64 | grep machine-config-operator
  machine-config-operator                        https://github.com/openshift/machine-config-operator                        48d52f385642cbecf5c95e0ac4b0ec8c37664fe7

Checking the history:

$ git --no-pager log --oneline --first-parent 48d52f385642c..4f29aba0ccd
4f29aba0c Merge pull request #2164 from vrutkovs/4.6-ovs_installed_workaround
6896f6bc4 Merge pull request #2168 from cgwalters/encap-fix-version-4.6

But no hits in either for '[Mm]ode'.

Your comment 15 follow-ups make sense to me.

[1]: https://github.com/openshift/cincinnati-graph-data/blob/4c92f3ac16b161cf42fdae5f538f93b1741f761d/internal-channels/stable.yaml#L44

Comment 17 W. Trevor King 2022-06-29 23:03:58 UTC
$ git clone --depth 1 --branch main https://github.com/coreos/ignition.git
$ cd ignition
$ git --no-pager log --oneline -1
7652b83 (grafted, HEAD -> main, origin/main, origin/HEAD) Merge pull request #1409 from bgilbert/notes
$ git --no-pager grep '_mode_.*file' docs
docs/configuration-v3_0.md:    * **_mode_** (integer): the file's permission mode. Note that the mode must be properly specified as a **decimal** value (i.e. 0644 -> 420). Setuid/setgid/sticky bits are not supported. If not specified, the permission mode for files defaults to 0644 or the existing file's permissions if `overwrite` is false, `contents.source` is unspecified, and a file already exists at the path.
docs/configuration-v3_1.md:    * **_mode_** (integer): the file's permission mode. Note that the mode must be properly specified as a **decimal** value (i.e. 0644 -> 420). Setuid/setgid/sticky bits are not supported. If not specified, the permission mode for files defaults to 0644 or the existing file's permissions if `overwrite` is false, `contents.source` is unspecified, and a file already exists at the path.
docs/configuration-v3_2.md:    * **_mode_** (integer): the file's permission mode. Note that the mode must be properly specified as a **decimal** value (i.e. 0644 -> 420). Setuid/setgid/sticky bits are not supported. If not specified, the permission mode for files defaults to 0644 or the existing file's permissions if `overwrite` is false, `contents.source` is unspecified, and a file already exists at the path.
docs/configuration-v3_3.md:    * **_mode_** (integer): the file's permission mode. Note that the mode must be properly specified as a **decimal** value (i.e. 0644 -> 420). Setuid/setgid/sticky bits are not supported. If not specified, the permission mode for files defaults to 0644 or the existing file's permissions if `overwrite` is false, `contents.source` is unspecified, and a file already exists at the path.
docs/configuration-v3_4_experimental.md:    * **_mode_** (integer): the file's permission mode. Note that the mode must be properly specified as a **decimal** value (i.e. 0644 -> 420). Setuid/setgid/sticky bits are supported. If not specified, the permission mode for files defaults to 0644 or the existing file's permissions if `overwrite` is false, `contents.source` is unspecified, and a file already exists at the path.

So the v3 specs consistently default to 0644 (420 decimal).  Not sure if I need to worry about v2?
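Since those docs keep emphasizing that the mode must be a decimal value, here's a quick sanity check of the 0644 <-> 420 conversion in plain shell (just an illustration, not part of the investigation):

```shell
# Ignition takes file modes as plain JSON integers, so octal permissions
# have to be written out in decimal (0644 -> 420).  printf converts in
# both directions:
printf '%d\n' 0644   # prints 420 (leading-zero argument is read as octal)
printf '%o\n' 420    # prints 644 (decimal 420 rendered back as octal)
```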

Comment 18 W. Trevor King 2022-06-29 23:05:59 UTC
Here's v2 just in case, always nominally 420 there too:

$ git clone --depth 1 --branch spec2x https://github.com/coreos/ignition.git
$ cd ignition
$ git --no-pager log --oneline -1
4db553d (grafted, HEAD -> spec2x, origin/spec2x) Merge pull request #1067 from bgilbert/spec2y
$ git --no-pager grep '_mode_.*file' doc
doc/configuration-v2_0.md:    * **_mode_** (integer): the file's permission mode. Note that the mode must be properly specified as a **decimal** value (i.e. 0644 -> 420).
doc/configuration-v2_1.md:    * **_mode_** (integer): the file's permission mode. Note that the mode must be properly specified as a **decimal** value (i.e. 0644 -> 420).
doc/configuration-v2_2.md:    * **_mode_** (integer): the file's permission mode. Note that the mode must be properly specified as a **decimal** value (i.e. 0644 -> 420).
doc/configuration-v2_3.md:    * **_mode_** (integer): the file's permission mode. Note that the mode must be properly specified as a **decimal** value (i.e. 0644 -> 420).
doc/configuration-v2_4.md:    * **_mode_** (integer): the file's permission mode. Note that the mode must be properly specified as a **decimal** value (i.e. 0644 -> 420).
doc/configuration-v2_5-experimental.md:    * **_mode_** (integer): the file's permission mode. Note that the mode must be properly specified as a **decimal** value (i.e. 0644 -> 420).

Comment 19 W. Trevor King 2022-06-29 23:07:10 UTC
oh, wait.  There is no "If not specified..." clause in the v2 spec...

Comment 20 W. Trevor King 2022-06-29 23:13:12 UTC
Back to our cluster:

$ oc --as system:admin get -o json machineconfigs | jq -r '.items[].spec.config.ignition.version' | sort | uniq -c
     33 2.2.0
     27 3.1.0
    136 3.2.0
$ oc --as system:admin get -o json machineconfigs | jq -r '.items[] | .spec.config.ignition.version + " " + .metadata.name' | grep -v rendered-
3.2.0 00-master
3.2.0 00-worker
3.2.0 01-master-container-runtime
3.2.0 01-master-kubelet
3.2.0 01-worker-container-runtime
3.2.0 01-worker-kubelet
3.2.0 99-master-generated-crio-seccomp-use-default
3.2.0 99-master-generated-registries
2.2.0 99-master-ssh
3.2.0 99-worker-generated-containerruntime
3.2.0 99-worker-generated-crio-seccomp-use-default
3.2.0 99-worker-generated-kubelet
3.2.0 99-worker-generated-registries
2.2.0 99-worker-ssh

So the 99-worker-generated-containerruntime that seems to be introducing the problem is 3.2.0, where the null mode has (nominally) an explicit default.

Comment 21 W. Trevor King 2022-06-29 23:28:56 UTC
And back to our node logs for the Ignition process that wrote the file at 22:38:52.936870:

$ zgrep 'ignition[[]1210]' node.logs.gz | head -n1
Jun 28 22:38:51.655160 build0-gstfj-w-b-infra-wlvr9.c.openshift-ci-build-farm.internal ignition[1210]: INFO     : Ignition 0.35.0

Looks like that's 2020-01-21 code [1], so the Ignition service is probably down-converting on the fly.

[1]: https://github.com/coreos/ignition/tree/v0.35.0

Comment 22 W. Trevor King 2022-06-29 23:35:11 UTC
heh, here's where comment 7's cf83... hash comes from:

$ sha512sum </dev/null 
cf83e1357eefb8bdf1542850d66d8007d620e4050b5715dc83f4a921d36ce9ce47d0d13c5d85f2b0ff8318d2877eec2f63b931bd47417a81a538327af927da3e  -

Because:

$ oc --as system:admin debug -q node/build0-gstfj-w-b-infra-wlvr9.c.openshift-ci-build-farm.internal -- curl https://api-int.build02.gcp.ci.openshift.org:22623/config/worker
Warning: would violate PodSecurity "restricted:latest": host namespaces (hostNetwork=true, hostPID=true), privileged (container "container-00" must not set securityContext.privileged=true), allowPrivilegeEscalation != false (container "container-00" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "container-00" must set securityContext.capabilities.drop=["ALL"]), restricted volume types (volume "host" uses restricted volume type "hostPath"), runAsNonRoot != true (pod or container "container-00" must set securityContext.runAsNonRoot=true), runAsUser=0 (container "container-00" must not set runAsUser=0), seccompProfile (pod or container "container-00" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0curl: (7) Failed to connect to api-int.build02.gcp.ci.openshift.org port 22623: Connection refused

I'll go read up about how access to the Ignition service is supposed to work.

Comment 23 W. Trevor King 2022-06-29 23:38:43 UTC
$ oc --as system:admin -n openshift-machine-api get -o json machines | jq -r '.items[].spec.providerSpec.value.disks[].image' | sort | uniq -c
      6 build0-gstfj-rhcos-image
     39 projects/rhcos-cloud/global/images/rhcos-46-82-202011260640-0-gcp-x86-64

Hmmm...

Comment 24 W. Trevor King 2022-06-29 23:40:46 UTC
That's from [1].  I'll go talk to the test-platform folks to get some context on the custom boot image.

[1]: https://github.com/openshift/release/blame/7e61829f682e1574513f78c3e94537836d824ab3/clusters/build-clusters/02_cluster/openshift-machine-api/build0-gstfj-w-b-infra_machineset.yaml#L38

Comment 25 W. Trevor King 2022-06-30 21:55:12 UTC
Ok, [1] has fixed that MachineSet:

  $ oc --as system:admin -n openshift-machine-api get -o json machinesets | jq -r '.items[].spec.template.spec.providerSpec.value.disks[].image' | sort | uniq -c
       10 projects/rhcos-cloud/global/images/rhcos-46-82-202011260640-0-gcp-x86-64

And checking the Ignition servers now, there's no sign of older clients:

  $ for POD in $(oc --as system:admin -n openshift-machine-config-operator get -o jsonpath="{.items[*].metadata.name}" -l k8s-app=machine-config-server pods); do oc --as system:admin -n openshift-machine-config-operator logs "${POD}" | grep -o 'User-Agent.*'; done | sort | uniq -c
      866 User-Agent:"Ignition/2.6.0" Accept-Header: "application/vnd.coreos.ignition+json;version=3.1.0, */*;q=0.1"

But these logs only go back so far:

  $ for POD in $(oc --as system:admin -n openshift-machine-config-operator get -o jsonpath="{.items[*].metadata.name}" -l k8s-app=machine-config-server pods); do oc --as system:admin -n openshift-machine-config-operator logs "${POD}" | grep 'User-Agent.*' | head -n1; done
  I0628 23:05:42.511656       1 api.go:109] Pool worker requested by address:"10.0.32.9:51888" User-Agent:"Ignition/2.6.0" Accept-Header: "application/vnd.coreos.ignition+json;version=3.1.0, */*;q=0.1"
  I0628 23:08:20.013154       1 api.go:109] Pool worker requested by address:"10.0.32.43:59892" User-Agent:"Ignition/2.6.0" Accept-Header: "application/vnd.coreos.ignition+json;version=3.1.0, */*;q=0.1"
  I0628 22:43:00.883926       1 api.go:109] Pool worker requested by address:"10.0.32.124:54844" User-Agent:"Ignition/2.6.0" Accept-Header: "application/vnd.coreos.ignition+json;version=3.1.0, */*;q=0.1"

so I don't have the entries from the Jun 28 22:38 UTC Ignition that triggered this bug.

[1]: https://github.com/openshift/release/pull/30001

Comment 26 W. Trevor King 2022-06-30 22:07:24 UTC
Setting up a test MachineSet to reproduce the break:

  $ curl -s https://raw.githubusercontent.com/openshift/release/master/clusters/build-clusters/02_cluster/openshift-machine-api/build0-gstfj-w-b-infra_machineset.yaml | sed 's/\(name: build0-\).*/\1test/;s/image: .*/image: build0-gstfj-rhcos-image/' >machineset.yaml
  $ oc --as system:admin apply -f machineset.yaml
  $ oc --as system:admin -n openshift-machine-api get machineset build0-test
  NAME          DESIRED   CURRENT   READY   AVAILABLE   AGE
  build0-test   1         1                             71s

And there's that ancient client:

  $ for POD in $(oc --as system:admin -n openshift-machine-config-operator get -o jsonpath="{.items[*].metadata.name}" -l k8s-app=machine-config-server pods); do oc --as system:admin -n openshift-machine-config-operator logs "${POD}" | grep -o 'User-Agent.*'; done | sort | uniq -c
        1 User-Agent:"Ignition/0.35.0" Accept-Header: "application/vnd.coreos.ignition+json; version=2.2.0, application/vnd.coreos.ignition+json; version=1; q=0.5, */*; q=0.1"
      868 User-Agent:"Ignition/2.6.0" Accept-Header: "application/vnd.coreos.ignition+json;version=3.1.0, */*;q=0.1"

Machine comes up fine:

  $ oc --as system:admin -n openshift-machine-api get machines | grep build0-test-
  build0-test-k9pcq                          Running    n1-standard-16       us-east1   us-east1-b   3m31s

Node is ready:

  $ oc --as system:admin -n openshift-machine-api get nodes | grep build0-test
  build0-test-k9pcq.c.openshift-ci-build-farm.internal              Ready                      infra,worker   101s    v1.24.0+284d62a

And the mode is 0:

  $ oc --as system:admin debug -q node/build0-test-k9pcq.c.openshift-ci-build-farm.internal -- ls -l /host/etc/crio/crio.conf.d
  Warning: would violate PodSecurity "restricted:latest": host namespaces (hostNetwork=true, hostPID=true), privileged (container "container-00" must not set securityContext.privileged=true), allowPrivilegeEscalation != false (container "container-00" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "container-00" must set securityContext.capabilities.drop=["ALL"]), restricted volume types (volume "host" uses restricted volume type "hostPath"), runAsNonRoot != true (pod or container "container-00" must set securityContext.runAsNonRoot=true), runAsUser=0 (container "container-00" must not set runAsUser=0), seccompProfile (pod or container "container-00" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
  total 12
  -rw-r--r--. 1 root root 2672 Jun 30 21:59 00-default
  ----------. 1 root root   46 Jun 30 21:59 01-ctrcfg-pidsLimit
  -rw-r--r--. 1 root root   67 Jun 30 21:59 01-mc-seccompUseDefault

Deleting the test MachineSet:

  $ oc --as system:admin -n openshift-machine-api delete machineset build0-test

So it's definitely the unset mode in the down-converted Ignition version=2.2.0 config colliding with this old Ignition/0.35.0 client.

Still unclear to me why the mode is unset in the three 99-worker-generated-* MachineConfigs that comment 11 was poking at.
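For anyone wanting to audit their own MachineConfigs offline, the null-mode check used throughout this bug can be sketched against a sample config; the here-doc input below is hypothetical, and on a live cluster you would pipe `oc get -o json machineconfigs` into the same jq filter instead:

```shell
# List storage files that omit "mode" (jq returns null for a missing key,
# so select(.mode == null) catches both explicit nulls and absent fields).
jq -r '.spec.config.storage.files[] | select(.mode == null) | .path' <<'EOF'
{
  "spec": {
    "config": {
      "storage": {
        "files": [
          {"path": "/etc/crio/crio.conf.d/01-ctrcfg-pidsLimit", "overwrite": true},
          {"path": "/etc/node-sizing-enabled.env", "mode": 420, "overwrite": true}
        ]
      }
    }
  }
}
EOF
# prints only /etc/crio/crio.conf.d/01-ctrcfg-pidsLimit
```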

Comment 27 W. Trevor King 2022-06-30 23:13:29 UTC
Testing with cluster bot 'launch 4.11.0-rc.0':

$ oc adm upgrade
Cluster version is 4.11.0-rc.0
$ cat crc.yaml 
apiVersion: machineconfiguration.openshift.io/v1
kind: ContainerRuntimeConfig
metadata:
  name: set-pids-limit
spec:
  containerRuntimeConfig:
    pidsLimit: 8096
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/worker: ""
$ oc apply -f crc.yaml
$ oc get -o json machineconfigs | jq -r '.items[] | .metadata as $m | .spec.config.storage | (.files // [])[] | select(.mode == null) | $m.creationTimestamp + " " + $m.name + " " + .path'
2022-06-30T23:10:08Z 99-worker-generated-containerruntime /etc/crio/crio.conf.d/01-ctrcfg-pidsLimit
2022-06-30T23:10:13Z rendered-worker-0f5cf0eea59f3e588caf37269a3ec44a /etc/crio/crio.conf.d/01-ctrcfg-pidsLimit

So I dunno where in the code things are getting lost, but I would consider any changes that avoid null modes in ContainerRuntimeConfig-to-MachineConfig conversion (and possibly also KubeletConfig-to-MachineConfig conversion [1]?) sufficient to close this bug.

[1]: https://docs.openshift.com/container-platform/4.10/post_installation_configuration/machine-configuration-tasks.html#configuring-machines-with-custom-resources

Comment 28 W. Trevor King 2022-06-30 23:36:54 UTC
And auditing our minor releases for exposure:

* 4.5... [4.5]:

  $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.5-e2e-aws-serial/1537875703121317888/artifacts/e2e-aws-serial/ipi-install-install/artifacts/log-bundle-20220617201122.tar | tar xz --strip-components=1
  $ grep -ro 'User-Agent.*Ignition.*'
  ...no hits, so not sure what 4.5 was running...

* 4.6.59 [4.6]:

  $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.6-e2e-aws-serial/1534496288538628096/artifacts/e2e-aws-serial/gather-extra/artifacts/pods/openshift-machine-config-operator_machine-config-server-xcf27_machine-config-server.log | grep -o 'User-Agent.*' | sort | uniq -c
        4 User-Agent:"Ignition/2.6.0" Accept-Header: "application/vnd.coreos.ignition+json;version=3.1.0, */*;q=0.1"

* 4.8.46 [4.8]

  $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.8-e2e-aws-serial/1542186763156459520/artifacts/e2e-aws-serial/gather-extra/artifacts/pods/openshift-machine-config-operator_machine-config-server-2pclf_machine-config-server.log | grep -o 'User-Agent.*' | sort | uniq -c
        3 User-Agent:"Ignition/2.9.0" Accept-Header: "application/vnd.coreos.ignition+json;version=3.2.0, */*;q=0.1"

* 4.10.21 [4.10]

  $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.10-e2e-aws-serial/1542174109675294720/artifacts/e2e-aws-serial/gather-extra/artifacts/pods/openshift-machine-config-operator_machine-config-server-8gth4_machine-config-server.log | grep -o 'User-Agent.*' | sort | uniq -c
        3 User-Agent:"Ignition/2.13.0" Accept-Header: "application/vnd.coreos.ignition+json;version=3.3.0, */*;q=0.1"

* 4.11.0-rc.0 [4.11]

  $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.11-e2e-aws-serial/1542165038209437696/artifacts/e2e-aws-serial/gather-extra/artifacts/pods/openshift-machine-config-operator_machine-config-server-c85gf_machine-config-server.log | grep -o 'User-Agent.*' | sort | uniq -c
        3 User-Agent:"Ignition/2.13.0" Accept-Header: "application/vnd.coreos.ignition+json;version=3.3.0, */*;q=0.1"

So I'm not entirely clear on what's happening with 4.5 and earlier boot image Ignition clients, but 4.6 and later (at least by their current z-stream tips) are all looking for version=3.*, so they have the comment 17 defaulting and won't be bothered by the nulls.

[4.5]: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.5-blocking#periodic-ci-openshift-release-master-nightly-4.5-e2e-aws-serial
[4.6]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.6-e2e-aws-serial/1534496288538628096
[4.8]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.8-e2e-aws-serial/1542186763156459520
[4.10]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.10-e2e-aws-serial/1542174109675294720
[4.11]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.11-e2e-aws-serial/1542165038209437696

Comment 29 Yu Qi Zhang 2022-07-05 17:21:11 UTC
I think if we want to fundamentally fix this, it may require fixes to 4.5 and earlier. As Trevor mentions, I think we can apply fixes to the newer versions to make sure this doesn't happen, but this issue has (I think) always been around and should be relatively rare.

Could I ask why this is marked as a servicedeliveryblocker?

Comment 30 W. Trevor King 2022-07-05 22:04:51 UTC
I don't think it's worth dusting off the 4.5.z build pipelines to publish changes there.  But shipping something in 4.12, and possibly 4.11.z, that avoids the null mode in ContainerRuntimeConfig-to-MachineConfig conversion (and possibly also KubeletConfig-to-MachineConfig conversion, per comment 27), would protect folks with born-in-4.5-or-earlier clusters who are updating to 4.11.  I'm not actually clear on why these clusters did not have trouble on 4.10...

Comment 31 W. Trevor King 2022-07-05 22:40:14 UTC
Ok, back to cluster bot with 'launch 4.10.18', and otherwise following comment 27:

$ oc adm upgrade
Cluster version is 4.10.18
...
$ cat crc.yaml 
apiVersion: machineconfiguration.openshift.io/v1
kind: ContainerRuntimeConfig
metadata:
  name: set-pids-limit
spec:
  containerRuntimeConfig:
    pidsLimit: 8096
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/worker: ""
$ oc apply -f crc.yaml
$ oc get -o json machineconfigs | jq -r '.items[] | .metadata as $m | .spec.config.storage| (.files // [])[] | select(.mode == null) | $m.creationTimestamp + " " + $m.name + " " + .path'
...no hits...
$ oc get machineconfigs | grep containerruntime
99-worker-generated-containerruntime               7152adb176e7ff4bf6c1d3a2e7b0aae4fd2794b6   3.2.0             42s
$ oc get -o json machineconfig 99-worker-generated-containerruntime | jq '.spec.config.storage.files[]'
{
  "contents": {
    "source": "data:text/plain,%5Bcrio%5D%0A%20%20%5Bcrio.runtime%5D%0A%20%20%20%20pids_limit%20%3D%208096%0A"
  },
  "mode": 420,
  "overwrite": true,
  "path": "/etc/crio/crio.conf.d/01-ctrcfg-pidsLimit"
}


So that explains why this wasn't an issue for born-in-4.old clusters on 4.10.  I'm not clear on exactly what changed between 4.10 and 4.11, but the 4.11 FCs and RCs seem like the first to set the null mode for these generated MachineConfigs.

Comment 32 W. Trevor King 2022-07-05 22:45:21 UTC
Diffing 4.10 and 4.11, I suspect the regression (for folks with boot images that use v2 Ignition clients) came in with [1].

[1]: https://github.com/openshift/machine-config-operator/pull/3128/commits/da5184f3045bbcaa22e5acf04bea707eb714b062#diff-b2ef9db0d76c6145ec1bb5310134037bae64572ab47464d3223745c2a410d3e3L111
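mco#3226 (linked in the tracker above) addresses this on the Go side by explicitly setting mode 0644 when rendering. Conceptually, the same defaulting can be sketched in jq; the sample input below is hypothetical and this is not the actual fix, just an illustration of the transformation:

```shell
# Fill in a default mode of 420 (0644) for any file entry whose mode is
# null or absent.  `//=` assigns the right-hand value only when the
# left-hand side is null or false, so existing explicit modes are kept.
echo '{"storage":{"files":[{"path":"/etc/crio/crio.conf.d/01-ctrcfg-pidsLimit","overwrite":true}]}}' |
  jq '.storage.files |= map(.mode //= 420)'
```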

Comment 34 W. Trevor King 2022-07-06 00:05:56 UTC
Following comment 27 and comment 31 to get some pre-merge verification on mco#3226, I asked cluster bot for a cluster with 'launch 4.12.0-0.nightly-2022-07-05-191634,openshift/machine-config-operator#3226' [1].  And then:

  $ oc adm upgrade
  Cluster version is 0.0.1-0.test-2022-07-05-233035-ci-ln-imcnbfb-latest
  ...
  $ cat crc.yaml 
  apiVersion: machineconfiguration.openshift.io/v1
  kind: ContainerRuntimeConfig
  metadata:
    name: set-pids-limit
  spec:
    containerRuntimeConfig:
      pidsLimit: 8096
    machineConfigPoolSelector:
      matchLabels:
        pools.operator.machineconfiguration.openshift.io/worker: ""
  $ oc apply -f crc.yaml
  $ oc get -o json machineconfigs | jq -r '.items[] | .metadata as $m | .spec.config.storage| (.files // [])[] | select(.mode == null) | $m.creationTimestamp + " " + $m.name + " " + .path'
  ...no hits...
  $ oc get machineconfigs | grep containerruntime
  99-worker-generated-containerruntime               bc6a7da4fd747ab95d2ac9592c4682a32d7ff954   3.2.0             17s
  $ oc get -o json machineconfig 99-worker-generated-containerruntime | jq '.spec.config.storage.files[]'
  {
    "contents": {
      "compression": "",
      "source": "data:text/plain;charset=utf-8;base64,W2NyaW9dCiAgW2NyaW8ucnVudGltZV0KICAgIHBpZHNfbGltaXQgPSA4MDk2Cg=="
    },
    "mode": 420,
    "overwrite": true,
    "path": "/etc/crio/crio.conf.d/01-ctrcfg-pidsLimit"
  }

So hooray, that looks great.

Hunting around for when the default boot images changed to Ignition v3, I've found [2], which is new in 4.6.

  $ git clone https://github.com/openshift/installer
  $ cd installer
  $ git log --oneline origin/release-4.6 | grep 'Migrate to ignition config spec v3'
  b9028eea48 *: Migrate to ignition config spec v3.1
  $ git log --oneline origin/release-4.5 | grep 'Migrate to ignition config spec v3'
  ...no hits...

My cluster bot cluster happens to be GCP, so grabbing the 4.5 boot image:

  $ git cat-file -p origin/release-4.5:data/data/rhcos-amd64.json | jq -r .gcp.image
  rhcos-45-82-202008010929-0-gcp-x86-64

So following comment 26 in an attempt to create a reproducing MachineSet:

  $ oc -n openshift-machine-api get machinesets
  NAME                                 DESIRED   CURRENT   READY   AVAILABLE   AGE
  ci-ln-imcnbfb-72292-ql4sr-worker-a   1         1         1       1           20m
  ci-ln-imcnbfb-72292-ql4sr-worker-b   1         1         1       1           20m
  ci-ln-imcnbfb-72292-ql4sr-worker-c   1         1         1       1           20m
  ci-ln-imcnbfb-72292-ql4sr-worker-f   0         0                             20m
  $ oc -n openshift-machine-api get -o yaml machineset ci-ln-imcnbfb-72292-ql4sr-worker-b >machineset.yaml
  $ emacs machineset.yaml  # remove status and other mutable stuff.  Change name and labels to old-image.  Slot in rhcos-45-82-202008010929-0-gcp-x86-64
  $ oc apply -f machineset.yaml

But no luck:

  $ oc -n openshift-machine-api get machines | grep old-image
  old-image-4m6h5                            Failed                                                  43s
  $ oc -n openshift-machine-api get -o json machine old-image-4m6h5 | jq '.status.providerStatus.conditions[]'
  {
    "lastTransitionTime": "2022-07-06T00:03:01Z",
    "message": "googleapi: Error 403: Required 'compute.images.useReadOnly' permission for 'projects/rhcos-cloud/global/images/rhcos-45-82-202008010929-0-gcp-x86-64', forbidden",
    "reason": "MachineCreationFailed",
    "status": "False",
    "type": "MachineCreated"
  }

So I'd need to dig deeper into GCP MachineSets to figure out how to request a MachineSet with that old 4.5 boot image.

[1]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-launch-gcp-modern/1544461264191754240
[2]: https://github.com/openshift/installer/pull/3871

Comment 35 W. Trevor King 2022-07-06 00:07:33 UTC
I'm going to guess "high" severity so I can set a target release without the bots clearing it off, because I'd like support for these born-in-4.old clusters to be boring and stable, but feel free to adjust as you see fit.

Comment 36 Manuel Dewald 2022-07-06 07:03:14 UTC
This hit an internal cluster during an upgrade to 4.11 and blocked the upgrade, hence I labeled it as a ServiceDeliveryBlocker, as there might be clusters from external customers suffering the same problem. I can't tell why it didn't happen during the upgrade to 4.10; probably something changed on the cluster.

Comment 37 W. Trevor King 2022-07-06 08:15:37 UTC
Comment 32 links the code that regressed the behavior vs. Ignition v2 clients between 4.10 and 4.11, which is why folks didn't bump into this until updating to 4.11.

Comment 38 Yu Qi Zhang 2022-07-06 15:20:09 UTC
Thanks for the fix, Trevor! I think we should backport this ASAP and mark it as a blocker for 4.11.0, given this will probably be an issue for anyone specifying that ContainerRuntimeConfig.

Comment 41 Rio Liu 2022-07-08 18:42:56 UTC
Verified on 4.12.0-0.nightly-2022-07-06-221008.

1. Install OCP 4.5.41, which has an old boot image (Ignition spec v2).
2. Create a ContainerRuntimeConfig to change pidsLimit:

$ cat ctrcfg-pidlimit.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: ContainerRuntimeConfig
metadata:
  name: set-pids-limit
spec:
  containerRuntimeConfig:
    pidsLimit: 8096
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/worker: ""

3. Upgrade the cluster from 4.5.41 to 4.11.0-fc.3, stepping through each minor version:

$ oc get clusterversion -o yaml|yq -y '.items[].status.history'
- completionTime: '2022-07-08T15:23:33Z'
  image: quay.io/openshift-release-dev/ocp-release:4.11.0-fc.3-x86_64
  startedTime: '2022-07-08T14:06:27Z'
  state: Completed
  verified: false
  version: 4.11.0-fc.3
- completionTime: '2022-07-08T11:06:19Z'
  image: quay.io/openshift-release-dev/ocp-release:4.10.18-x86_64
  startedTime: '2022-07-08T09:50:57Z'
  state: Completed
  verified: false
  version: 4.10.18
- completionTime: '2022-07-08T09:41:12Z'
  image: quay.io/openshift-release-dev/ocp-release:4.9.42-x86_64
  startedTime: '2022-07-08T08:09:47Z'
  state: Completed
  verified: false
  version: 4.9.42
- completionTime: '2022-07-08T08:08:32Z'
  image: quay.io/openshift-release-dev/ocp-release:4.8.46-x86_64
  startedTime: '2022-07-08T06:45:39Z'
  state: Completed
  verified: false
  version: 4.8.46
- completionTime: '2022-07-08T06:37:00Z'
  image: quay.io/openshift-release-dev/ocp-release:4.7.54-x86_64
  startedTime: '2022-07-08T05:22:22Z'
  state: Completed
  verified: false
  version: 4.7.54
- completionTime: '2022-07-08T05:05:42Z'
  image: quay.io/openshift-release-dev/ocp-release:4.6.59-x86_64
  startedTime: '2022-07-08T04:03:14Z'
  state: Completed
  verified: false
  version: 4.6.59
- completionTime: '2022-07-08T03:44:39Z'
  image: quay.io/openshift-release-dev/ocp-release@sha256:c67fe644d1c06e6d7694e648a40199cb06e25e1c3cfd5cd4fdac87fd696d2297
  startedTime: '2022-07-08T03:17:19Z'
  state: Completed
  verified: false
  version: 4.5.41

4. Scale up a MachineSet to provision a new node:

$ oc scale --replicas=2 machineset rioliu-070802-sc6nb-worker-us-east-2a -n openshift-machine-api
machineset.machine.openshift.io/rioliu-070802-sc6nb-worker-us-east-2a scaled


5. The new node is degraded due to the mode-mismatch issue:

$ oc get node/ip-10-0-147-62.us-east-2.compute.internal -o yaml | yq -y '.metadata.annotations'
cloud.network.openshift.io/egress-ipconfig: '[{"interface":"eni-02ce464059c75b4fe","ifaddr":{"ipv4":"10.0.128.0/19"},"capacity":{"ipv4":9,"ipv6":10}}]'
csi.volume.kubernetes.io/nodeid: '{"ebs.csi.aws.com":"i-03a76df8c6f58451b"}'
machine.openshift.io/machine: openshift-machine-api/rioliu-070802-sc6nb-worker-us-east-2a-95jkc
machineconfiguration.openshift.io/controlPlaneTopology: HighlyAvailable
machineconfiguration.openshift.io/currentConfig: rendered-worker-103a240b977aedb4265c8501c0f23c1a
machineconfiguration.openshift.io/desiredConfig: rendered-worker-103a240b977aedb4265c8501c0f23c1a
machineconfiguration.openshift.io/reason: 'unexpected on-disk state validating against
  rendered-worker-103a240b977aedb4265c8501c0f23c1a: mode mismatch for file: "/etc/crio/crio.conf.d/01-ctrcfg-pidsLimit";
  expected: -rw-r--r--/420/0644; received: ----------/0/0'
machineconfiguration.openshift.io/state: Degraded
volumes.kubernetes.io/controller-managed-attach-detach: 'true'

$ oc debug node/ip-10-0-147-62.us-east-2.compute.internal -- chroot /host stat /etc/crio/crio.conf.d/01-ctrcfg-pidsLimit
Warning: would violate PodSecurity "restricted:latest": host namespaces (hostNetwork=true, hostPID=true), privileged (container "container-00" must not set securityContext.privileged=true), allowPrivilegeEscalation != false (container "container-00" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "container-00" must set securityContext.capabilities.drop=["ALL"]), restricted volume types (volume "host" uses restricted volume type "hostPath"), runAsNonRoot != true (pod or container "container-00" must set securityContext.runAsNonRoot=true), runAsUser=0 (container "container-00" must not set runAsUser=0), seccompProfile (pod or container "container-00" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
Starting pod/ip-10-0-147-62us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
  File: /etc/crio/crio.conf.d/01-ctrcfg-pidsLimit
  Size: 46        	Blocks: 8          IO Block: 4096   regular file
Device: fd00h/64768d	Inode: 176164947   Links: 1
Access: (0000/----------)  Uid: (    0/    root)   Gid: (    0/    root)
Context: system_u:object_r:container_config_t:s0
Access: 2022-07-08 15:55:26.207531504 +0000
Modify: 2022-07-08 15:51:42.826000000 +0000
Change: 2022-07-08 15:54:19.140076330 +0000
 Birth: 2022-07-08 15:54:19.140076330 +0000

Removing debug pod ...
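
A side note on the reason string above: the MCD reports the file mode in three forms, symbolic/decimal/octal, because Ignition configs store modes as decimal integers. A minimal shell sketch (nothing cluster-specific) confirms that the expected decimal 420 is octal 0644, while the degraded node's file had no permission bits set at all:

```shell
# Ignition stores file modes as decimal integers; the MCD reason string
# prints them as symbolic/decimal/octal. Convert the two values seen above:
printf 'expected: decimal 420 = octal %04o\n' 420   # symbolic -rw-r--r--
printf 'received: decimal 0   = octal %04o\n' 0     # symbolic ----------
```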

6. Scale down the machineset to remove the degraded node

$ oc scale --replicas=1 machineset rioliu-070802-sc6nb-worker-us-east-2a -n openshift-machine-api
machineset.machine.openshift.io/rioliu-070802-sc6nb-worker-us-east-2a scaled

$ oc get node
NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-128-166.us-east-2.compute.internal   Ready    worker   13h   v1.24.0+284d62a
ip-10-0-133-203.us-east-2.compute.internal   Ready    master   13h   v1.24.0+284d62a
ip-10-0-160-136.us-east-2.compute.internal   Ready    master   13h   v1.24.0+284d62a
ip-10-0-170-121.us-east-2.compute.internal   Ready    worker   13h   v1.24.0+284d62a
ip-10-0-202-25.us-east-2.compute.internal    Ready    master   13h   v1.24.0+284d62a
ip-10-0-204-123.us-east-2.compute.internal   Ready    worker   13h   v1.24.0+284d62a

7. Upgrade the cluster to 4.12.0-0.nightly-2022-07-06-221008

$ oc get clusterversion -o yaml|yq -y '.items[].status.history[0]'
completionTime: '2022-07-08T18:13:53Z'
image: registry.ci.openshift.org/ocp/release:4.12.0-0.nightly-2022-07-06-221008
startedTime: '2022-07-08T17:03:41Z'
state: Completed
verified: false
version: 4.12.0-0.nightly-2022-07-06-221008

8. Scale up the machineset to provision a new node

$ oc scale --replicas=2 machineset rioliu-070802-sc6nb-worker-us-east-2a -n openshift-machine-api
machineset.machine.openshift.io/rioliu-070802-sc6nb-worker-us-east-2a scaled


9. Check the new node's status: it is not degraded, and no config drift is found

$ oc get node ip-10-0-139-88.us-east-2.compute.internal -o yaml| yq -y '.metadata.annotations'
cloud.network.openshift.io/egress-ipconfig: '[{"interface":"eni-0da5c4a13816be886","ifaddr":{"ipv4":"10.0.128.0/19"},"capacity":{"ipv4":9,"ipv6":10}}]'
csi.volume.kubernetes.io/nodeid: '{"ebs.csi.aws.com":"i-0984649b2743f74ac"}'
machine.openshift.io/machine: openshift-machine-api/rioliu-070802-sc6nb-worker-us-east-2a-kkh4c
machineconfiguration.openshift.io/controlPlaneTopology: HighlyAvailable
machineconfiguration.openshift.io/currentConfig: rendered-worker-5b907435666b678ea836542b5fd330c2
machineconfiguration.openshift.io/desiredConfig: rendered-worker-5b907435666b678ea836542b5fd330c2
machineconfiguration.openshift.io/desiredDrain: uncordon-rendered-worker-5b907435666b678ea836542b5fd330c2
machineconfiguration.openshift.io/lastAppliedDrain: uncordon-rendered-worker-5b907435666b678ea836542b5fd330c2
machineconfiguration.openshift.io/reason: ''
machineconfiguration.openshift.io/state: Done
volumes.kubernetes.io/controller-managed-attach-detach: 'true'

$ oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-14580ffdad42a4a97812d4472fdb6272   True      False      False      3              3                   3                     0                      15h
worker   rendered-worker-5b907435666b678ea836542b5fd330c2   True      False      False      4              4                   4                     0                      15h

Comment 42 Rio Liu 2022-07-08 18:47:59 UTC
10. Check the file mode in the MachineConfig and on the newly provisioned node

$ oc get mc/99-worker-generated-containerruntime -o yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  annotations:
    machineconfiguration.openshift.io/generated-by-controller-version: 5d54e097506ec8bdd76f3969cc9f382ae1efea3c
  creationTimestamp: "2022-07-08T04:51:05Z"
  generation: 5
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 99-worker-generated-containerruntime
  ownerReferences:
  - apiVersion: machineconfiguration.openshift.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: ContainerRuntimeConfig
    name: set-pids-limit
    uid: 5259a782-0251-4746-89ab-f5637bd3d2ce
  resourceVersion: "460887"
  uid: a314dc86-272b-4aa3-a989-172144b71e30
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
      - contents:
          compression: ""
          source: data:text/plain;charset=utf-8;base64,W2NyaW9dCiAgW2NyaW8ucnVudGltZV0KICAgIHBpZHNfbGltaXQgPSA4MDk2Cg==
        mode: 420
        overwrite: true
        path: /etc/crio/crio.conf.d/01-ctrcfg-pidsLimit
  extensions: null
  fips: false
  kernelArguments: null
  kernelType: ""
  osImageURL: ""
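
The file entry in this MachineConfig is self-describing: `mode: 420` is the decimal form of octal 0644, and the `data:` URL carries the base64-encoded file body. Decoding both (a minimal sketch using only standard tools) reproduces what the MCD should write to disk:

```shell
# Decode the base64 payload from spec.config.storage.files[].contents.source
# and convert the decimal Ignition mode to the familiar octal form.
echo 'W2NyaW9dCiAgW2NyaW8ucnVudGltZV0KICAgIHBpZHNfbGltaXQgPSA4MDk2Cg==' | base64 -d
printf 'mode: decimal 420 = octal %04o\n' 420
```

which matches the `pids_limit = 8096` drop-in and the 0644 mode verified on the node below.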

$ oc debug node/ip-10-0-139-88.us-east-2.compute.internal -- chroot /host stat /etc/crio/crio.conf.d/01-ctrcfg-pidsLimit
Warning: would violate PodSecurity "restricted:latest": host namespaces (hostNetwork=true, hostPID=true), privileged (container "container-00" must not set securityContext.privileged=true), allowPrivilegeEscalation != false (container "container-00" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "container-00" must set securityContext.capabilities.drop=["ALL"]), restricted volume types (volume "host" uses restricted volume type "hostPath"), runAsNonRoot != true (pod or container "container-00" must set securityContext.runAsNonRoot=true), runAsUser=0 (container "container-00" must not set runAsUser=0), seccompProfile (pod or container "container-00" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
Starting pod/ip-10-0-139-88us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
  File: /etc/crio/crio.conf.d/01-ctrcfg-pidsLimit
  Size: 46        	Blocks: 8          IO Block: 4096   regular file
Device: fd00h/64768d	Inode: 54569073    Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Context: system_u:object_r:container_config_t:s0
Access: 2022-07-08 18:22:31.484067929 +0000
Modify: 2022-07-08 18:18:41.156000000 +0000
Change: 2022-07-08 18:21:20.230737254 +0000
 Birth: 2022-07-08 18:21:20.230737254 +0000

Removing debug pod ...

Comment 47 errata-xmlrpc 2023-01-17 19:50:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:7399

