Description of problem:

I am trying to set the iSCSI initiator name on cluster nodes. I have the following ignition snippet applied upon cluster installation:

for role in master worker; do
  initiator_content=$(echo "$(/sbin/iscsi-iname)" | base64 -w0)
  cat > "${CLUSTER_DIR}/openshift/100-${role}-iscsi-initiator.yaml" << __EOF__
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 100-${role}-iscsi-initiator
  labels:
    machineconfiguration.openshift.io/role: ${role}
spec:
  config:
    ignition:
      version: 3.1.0
    storage:
      files:
      - path: /etc/iscsi/initiatorname.iscsi
        overwrite: true
        contents:
          source: data:text/plain;charset=utf-8;base64,${initiator_content}
        mode: 666
__EOF__
done

And I am getting the following error when the machine config is rolled out:

Worker pool is degraded because nodes fail with "3 nodes are reporting degraded status on sync":
"Node cdi-night46-cmf82-worker-0-6gdhv is reporting: \"unexpected on-disk state validating against rendered-worker-a3bf384630c44ff1a1f3ac4ad6936b74: mode mismatch for file: \\\"/etc/iscsi/initiatorname.iscsi\\\"; expected: --w--wx-w-; received: --w--wx-w-\",
Node cdi-night46-cmf82-worker-0-5wzss is reporting: \"unexpected on-disk state validating against rendered-worker-a3bf384630c44ff1a1f3ac4ad6936b74: mode mismatch for file: \\\"/etc/iscsi/initiatorname.iscsi\\\"; expected: --w--wx-w-; received: --w--wx-w-\",
Node cdi-night46-cmf82-worker-0-mvmrg is reporting: \"unexpected on-disk state validating against rendered-worker-a3bf384630c44ff1a1f3ac4ad6936b74: mode mismatch for file: \\\"/etc/iscsi/initiatorname.iscsi\\\"; expected: --w--wx-w-; received: --w--wx-w-\""

I also tried with mode 644, but I get the same error, just with different expected & received strings.

Version-Release number of selected component (if applicable):
OCP 4.6.4

How reproducible:
Always

Steps to Reproduce:
1. Add the ignition config above.
2. Start the install process.
3. Observe the status of the machine config.
Actual results:
mode mismatch for file: "/etc/iscsi/initiatorname.iscsi"; expected: --w--wx-w-; received: --w--wx-w-

Expected results:
The iSCSI initiator name is present on the nodes.

Additional info:
I found another way to do this with the snippet below. But the above case is still relevant.

for role in master worker; do
  cat > "${CLUSTER_DIR}/openshift/100-${role}-iscsi-initiator.yaml" << __EOF__
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 100-${role}-iscsi-initiator
  labels:
    machineconfiguration.openshift.io/role: ${role}
spec:
  config:
    ignition:
      version: 3.1.0
    systemd:
      units:
      - contents: |
          [Unit]
          Description=Generate iSCSI Initiator name
          [Service]
          Type=oneshot
          RemainAfterExit=yes
          ExecStart=-/bin/sh -c 'echo "InitiatorName=\$(/sbin/iscsi-iname)" > /etc/iscsi/initiatorname.iscsi'
          [Install]
          WantedBy=multi-user.target
        enabled: true
        name: iscsi-initiator.service
__EOF__
done
OK, a few things here:

1. The `mode: 666` bit is tricky. It's actually in decimal, not octal (see the documentation for `mode` in https://coreos.github.io/ignition/configuration-v3_1/). So e.g. for octal 0666, you'd write 438.

2. The MachineConfig is shared by all nodes within a pool (see https://github.com/openshift/machine-config-operator/issues/1720), so specifying an initiator name that way will cause all the machines in the pool to have the same name.

3. RHCOS already ships with a systemd service which generates an iSCSI name on first boot: https://github.com/openshift/os/blob/master/overlay.d/05rhcos/usr/lib/systemd/system/coreos-generate-iscsi-initiatorname.service
So if you don't need deterministic names (and simply want to ensure that each node has a different name), then it should work fine not to provide anything.

4. That said, specifying the name via Ignition is fully supported. And even though specifying it via a MachineConfig is likely not what you want, it shouldn't have caused that error. That message looks suspicious to me:

> expected: --w--wx-w-; received: --w--wx-w-

The relevant code in the MCO is:

> if fi.Mode() != mode {
>     return errors.Errorf("mode mismatch for file: %q; expected: %v; received: %v", filePath, mode, fi.Mode())
> }

I wonder if it's differing in the higher mode bits but the %v is only printing the lower bits or something? I'm going to move this over to the MCO for digging into that part some more.
Jonathan Lebon, thank you for the clarification. That is useful.

I added that machine config because the initiator name is not generated on our nodes (rhcos-46.82.202010011740-0) for some reason. Do I need to enable some service, or something like that?
(In reply to Lukas Bednar from comment #3)
> Jonathan Lebon thank you clarification. That is useful.
>
> I added that machine config, because initiator name is not generated on our
> nodes (rhcos-46.82.202010011740-0) from some reason.
> Do I need to enable some service, or something like that ?

That bug is tracked in https://bugzilla.redhat.com/show_bug.cgi?id=1901021 and should be fixed in the next 4.6 release. So let's make this bug strictly about the weird MCO mode-mismatch behaviour.
> I wonder if it's differing in the higher mode bits but the %v is only printing the lower bits or something?

Dug around a bit, and yeah, I think the daemon's logic is not quite right. We compare the full mode, upper and lower bits (the os.FileMode returned by fi.Mode()), but the error message only prints the lower 9 bits with %v, which I think is also wrong. I'm also not sure whether we should just be comparing the lower 9 bits via FileMode.Perm() (https://golang.org/pkg/os/#FileMode.Perm) to begin with, since that seems to be what we care about.
So I believe this is because of the octal-vs-decimal issue, and the config should have specified 438, not 666:
https://github.com/openshift/machine-config-operator/blob/4b73d7536eb7da97f577c200a3c36b282b84f0d4/install/0000_80_machine-config-operator_01_machineconfig.crd.yaml#L177

I'll look into this a bit more to see if I can make the error message clearer.
I opened a PR that fixes the confusing message and adds an explanation:
- https://github.com/openshift/machine-config-operator/pull/2340
Verified on 4.6.0-0.nightly-2021-03-25-051145. The error message is clearer now.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2021-03-25-051145   True        False         24m     Cluster version is 4.6.0-0.nightly-2021-03-25-051145

$ cat << EOF > file.yaml
> apiVersion: machineconfiguration.openshift.io/v1
> kind: MachineConfig
> metadata:
>   labels:
>     machineconfiguration.openshift.io/role: worker
>   name: test-file
> spec:
>   config:
>     ignition:
>       version: 3.1.0
>     storage:
>       files:
>       - contents:
>           source: data:text/plain;charset=utf;base64,c2VydmVyIGZvby5leGFtcGxlLm5ldCBtYXhkZWxheSAwLjQgb2ZmbGluZQpzZXJ2ZXIgYmFyLmV4YW1wbGUubmV0IG1heGRlbGF5IDAuNCBvZmZsaW5lCnNlcnZlciBiYXouZXhhbXBsZS5uZXQgbWF4ZGVsYXkgMC40IG9mZmxpbmUK
>         mode: 0644
>         path: /etc/test
> EOF

$ oc create -f file.yaml
machineconfig.machineconfiguration.openshift.io/test-file created

$ oc get mc
NAME                                               GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
00-master                                          788eb73e87a3f4b51c5e7caceef10c394eb1cf92   3.1.0             49m
00-worker                                          788eb73e87a3f4b51c5e7caceef10c394eb1cf92   3.1.0             49m
01-master-container-runtime                        788eb73e87a3f4b51c5e7caceef10c394eb1cf92   3.1.0             49m
01-master-kubelet                                  788eb73e87a3f4b51c5e7caceef10c394eb1cf92   3.1.0             49m
01-worker-container-runtime                        788eb73e87a3f4b51c5e7caceef10c394eb1cf92   3.1.0             49m
01-worker-kubelet                                  788eb73e87a3f4b51c5e7caceef10c394eb1cf92   3.1.0             49m
99-master-generated-registries                     788eb73e87a3f4b51c5e7caceef10c394eb1cf92   3.1.0             49m
99-master-ssh                                                                                 3.1.0             60m
99-worker-generated-registries                     788eb73e87a3f4b51c5e7caceef10c394eb1cf92   3.1.0             49m
99-worker-ssh                                                                                 3.1.0             60m
rendered-master-3cc83eef4dd58994c72f9910f5315a79   788eb73e87a3f4b51c5e7caceef10c394eb1cf92   3.1.0             49m
rendered-worker-5ddb9087f440dc41a1d3a884da7e7a7b   788eb73e87a3f4b51c5e7caceef10c394eb1cf92   3.1.0             2s
rendered-worker-9893591f36e7bb103247655baf16c147   788eb73e87a3f4b51c5e7caceef10c394eb1cf92   3.1.0             49m
test-file                                                                                     3.1.0             7s

$ oc get mcp/worker
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
worker   rendered-worker-9893591f36e7bb103247655baf16c147   False     True       False      3              0                   0                     0                      50m

$ watch oc get mcp/worker

$ oc get nodes
NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-146-232.us-west-2.compute.internal   Ready    master   60m   v1.19.0+263ee0d
ip-10-0-148-209.us-west-2.compute.internal   Ready    worker   54m   v1.19.0+263ee0d
ip-10-0-181-176.us-west-2.compute.internal   Ready    worker   49m   v1.19.0+263ee0d
ip-10-0-185-77.us-west-2.compute.internal    Ready    master   60m   v1.19.0+263ee0d
ip-10-0-212-137.us-west-2.compute.internal   Ready    master   60m   v1.19.0+263ee0d
ip-10-0-222-59.us-west-2.compute.internal    Ready    worker   49m   v1.19.0+263ee0d

$ oc debug node/ip-10-0-181-176.us-west-2.compute.internal
Starting pod/ip-10-0-181-176us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# ls -la /etc/test
-rw-r--r--. 1 root root 132 Mar 25 13:14 /etc/test
sh-4.4# chmod 1644 /etc/test
sh-4.4# ls -la /etc/test
-rw-r--r-T. 1 root root 132 Mar 25 13:14 /etc/test
sh-4.4# exit
exit
sh-4.2# exit
exit
Removing debug pod ...
$ oc get pods -A --field-selector spec.nodeName=ip-10-0-181-176.us-west-2.compute.internal
NAMESPACE                                NAME                                      READY   STATUS    RESTARTS   AGE
openshift-cluster-csi-drivers            aws-ebs-csi-driver-node-jnvnt             3/3     Running   0          51m
openshift-cluster-node-tuning-operator   tuned-72wfr                               1/1     Running   0          51m
openshift-dns                            dns-default-djjl5                         3/3     Running   0          51m
openshift-image-registry                 image-registry-99cc49fc8-cw9wm            1/1     Running   0          9m33s
openshift-image-registry                 node-ca-5kh5x                             1/1     Running   0          51m
openshift-ingress                        router-default-6977f9bcbc-vflbp           1/1     Running   0          9m33s
openshift-machine-config-operator        machine-config-daemon-wh89d               2/2     Running   0          51m
openshift-monitoring                     alertmanager-main-0                       5/5     Running   0          9m21s
openshift-monitoring                     alertmanager-main-1                       5/5     Running   0          9m22s
openshift-monitoring                     grafana-57996d6746-dq6kb                  2/2     Running   0          9m33s
openshift-monitoring                     kube-state-metrics-8b5c557b6-dxljq        3/3     Running   0          9m33s
openshift-monitoring                     node-exporter-v2mch                       2/2     Running   0          51m
openshift-monitoring                     openshift-state-metrics-868f5f4dc-tsx4d   3/3     Running   0          9m33s
openshift-monitoring                     prometheus-adapter-fd6d44c76-m74mp        1/1     Running   0          9m33s
openshift-monitoring                     prometheus-k8s-0                          6/6     Running   1          8m32s
openshift-monitoring                     telemeter-client-6db8fcb844-847cq         3/3     Running   0          9m32s
openshift-monitoring                     thanos-querier-7cc9647ddb-gnxfd           5/5     Running   0          9m32s
openshift-multus                         multus-d9z57                              1/1     Running   0          51m
openshift-multus                         network-metrics-daemon-6zpgq              2/2     Running   0          51m
openshift-sdn                            ovs-ksw7g                                 1/1     Running   0          51m
openshift-sdn                            sdn-ftn4t                                 2/2     Running   1          51m

$ oc -n openshift-machine-config-operator logs -f machine-config-daemon-wh89d -c machine-config-daemon
*SNIP*
* pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:74b81659ace1115224b515a0af0fff413050ed8705cef6339280e322b7ea6138
              CustomOrigin: Managed by machine-config-operator
                   Version: 46.82.202103211221-0 (2021-03-21T12:24:27Z)

  ostree://cb0327325553e6922ff25822ea7eb1a2ec213e70c7cf6880965e7e2bb5ee7dea
                   Version: 46.82.202011260640-0 (2020-11-26T06:44:15Z)

I0325 13:15:46.419763    1993 rpm-ostree.go:261] Running captured: journalctl --list-boots
I0325 13:15:46.445712    1993 daemon.go:869] journalctl --list-boots:
-2 46b111d093f745c69c32682595df5319 Thu 2021-03-25 12:26:27 UTC—Thu 2021-03-25 12:33:01 UTC
-1 83178576fe6c4459b32988a5bf480fae Thu 2021-03-25 12:33:14 UTC—Thu 2021-03-25 13:15:11 UTC
 0 2480a326752241cfb99453a51366d75a Thu 2021-03-25 13:15:25 UTC—Thu 2021-03-25 13:15:46 UTC
I0325 13:15:46.445801    1993 rpm-ostree.go:261] Running captured: systemctl list-units --state=failed --no-legend
I0325 13:15:46.476182    1993 daemon.go:884] systemd service state: OK
I0325 13:15:46.476201    1993 daemon.go:616] Starting MachineConfigDaemon
I0325 13:15:46.477042    1993 daemon.go:623] Enabling Kubelet Healthz Monitor
E0325 13:15:49.138111    1993 reflector.go:127] github.com/openshift/machine-config-operator/pkg/generated/informers/externalversions/factory.go:101: Failed to watch *v1.MachineConfig: failed to list *v1.MachineConfig: Get "https://172.30.0.1:443/apis/machineconfiguration.openshift.io/v1/machineconfigs?limit=500&resourceVersion=0": dial tcp 172.30.0.1:443: connect: no route to host
E0325 13:15:49.138221    1993 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.Node: failed to list *v1.Node: Get "https://172.30.0.1:443/api/v1/nodes?limit=500&resourceVersion=0": dial tcp 172.30.0.1:443: connect: no route to host
I0325 13:15:51.364493    1993 daemon.go:403] Node ip-10-0-181-176.us-west-2.compute.internal is not labeled node-role.kubernetes.io/master
I0325 13:15:51.370150    1993 daemon.go:815] Current config: rendered-worker-9893591f36e7bb103247655baf16c147
I0325 13:15:51.370167    1993 daemon.go:816] Desired config: rendered-worker-5ddb9087f440dc41a1d3a884da7e7a7b
I0325 13:15:51.376944    1993 update.go:1676] Disk currentConfig rendered-worker-5ddb9087f440dc41a1d3a884da7e7a7b overrides node's currentConfig annotation rendered-worker-9893591f36e7bb103247655baf16c147
I0325 13:15:51.380048    1993 daemon.go:1082] Validating against pending config rendered-worker-5ddb9087f440dc41a1d3a884da7e7a7b
I0325 13:15:51.389902    1993 daemon.go:1093] Validated on-disk state
I0325 13:15:51.404542    1993 daemon.go:1134] Completing pending config rendered-worker-5ddb9087f440dc41a1d3a884da7e7a7b
I0325 13:15:51.417246    1993 update.go:1676] completed update for config rendered-worker-5ddb9087f440dc41a1d3a884da7e7a7b
I0325 13:15:51.422275    1993 daemon.go:1150] In desired config rendered-worker-5ddb9087f440dc41a1d3a884da7e7a7b
^C

$ oc -n openshift-machine-config-operator delete pod machine-config-daemon-wh89d
pod "machine-config-daemon-wh89d" deleted

$ oc get pods -A --field-selector spec.nodeName=ip-10-0-181-176.us-west-2.compute.internal
NAMESPACE                                NAME                                      READY   STATUS    RESTARTS   AGE
openshift-cluster-csi-drivers            aws-ebs-csi-driver-node-jnvnt             3/3     Running   0          52m
openshift-cluster-node-tuning-operator   tuned-72wfr                               1/1     Running   0          52m
openshift-dns                            dns-default-djjl5                         3/3     Running   0          52m
openshift-image-registry                 image-registry-99cc49fc8-cw9wm            1/1     Running   0          10m
openshift-image-registry                 node-ca-5kh5x                             1/1     Running   0          52m
openshift-ingress                        router-default-6977f9bcbc-vflbp           1/1     Running   0          10m
openshift-machine-config-operator        machine-config-daemon-ncvfw               2/2     Running   0          5s
openshift-monitoring                     alertmanager-main-0                       5/5     Running   0          10m
openshift-monitoring                     alertmanager-main-1                       5/5     Running   0          10m
openshift-monitoring                     grafana-57996d6746-dq6kb                  2/2     Running   0          10m
openshift-monitoring                     kube-state-metrics-8b5c557b6-dxljq        3/3     Running   0          10m
openshift-monitoring                     node-exporter-v2mch                       2/2     Running   0          52m
openshift-monitoring                     openshift-state-metrics-868f5f4dc-tsx4d   3/3     Running   0          10m
openshift-monitoring                     prometheus-adapter-fd6d44c76-m74mp        1/1     Running   0          10m
openshift-monitoring                     prometheus-k8s-0                          6/6     Running   1          9m28s
openshift-monitoring                     telemeter-client-6db8fcb844-847cq         3/3     Running   0          10m
openshift-monitoring                     thanos-querier-7cc9647ddb-gnxfd           5/5     Running   0          10m
openshift-multus                         multus-d9z57                              1/1     Running   0          52m
openshift-multus                         network-metrics-daemon-6zpgq              2/2     Running   0          52m
openshift-sdn                            ovs-ksw7g                                 1/1     Running   0          52m
openshift-sdn                            sdn-ftn4t                                 2/2     Running   1          52m

$ oc -n openshift-machine-config-operator logs -f machine-config-daemon-ncvfw -c machine-config-daemon
*SNIP*
I0325 13:26:22.561559   18498 rpm-ostree.go:261] Running captured: systemctl list-units --state=failed --no-legend
I0325 13:26:22.570334   18498 daemon.go:884] systemd service state: OK
I0325 13:26:22.570349   18498 daemon.go:616] Starting MachineConfigDaemon
I0325 13:26:22.570421   18498 daemon.go:623] Enabling Kubelet Healthz Monitor
I0325 13:26:23.502159   18498 daemon.go:403] Node ip-10-0-181-176.us-west-2.compute.internal is not labeled node-role.kubernetes.io/master
I0325 13:26:23.507733   18498 daemon.go:808] Current+desired config: rendered-worker-5ddb9087f440dc41a1d3a884da7e7a7b
I0325 13:26:23.512789   18498 daemon.go:1085] Validating against current config rendered-worker-5ddb9087f440dc41a1d3a884da7e7a7b
E0325 13:26:23.519543   18498 writer.go:135] Marking Degraded due to: unexpected on-disk state validating against rendered-worker-5ddb9087f440dc41a1d3a884da7e7a7b: mode mismatch for file: "/etc/test"; expected: -rw-r--r--/420/0644; received: trw-r--r--/1048996/04000644
I0325 13:26:25.551808   18498 daemon.go:808] Current+desired config: rendered-worker-5ddb9087f440dc41a1d3a884da7e7a7b
I0325 13:26:25.568644   18498 daemon.go:1085] Validating against current config rendered-worker-5ddb9087f440dc41a1d3a884da7e7a7b
E0325 13:26:25.598268   18498 writer.go:135] Marking Degraded due to: unexpected on-disk state validating against rendered-worker-5ddb9087f440dc41a1d3a884da7e7a7b: mode mismatch for file: "/etc/test"; expected: -rw-r--r--/420/0644; received: trw-r--r--/1048996/04000644
I0325 13:26:33.615778   18498 daemon.go:808] Current+desired config: rendered-worker-5ddb9087f440dc41a1d3a884da7e7a7b
I0325 13:26:33.620529   18498 daemon.go:1085] Validating against current config rendered-worker-5ddb9087f440dc41a1d3a884da7e7a7b
E0325 13:26:33.626854   18498 writer.go:135] Marking Degraded due to: unexpected on-disk state validating against rendered-worker-5ddb9087f440dc41a1d3a884da7e7a7b: mode mismatch for file: "/etc/test"; expected: -rw-r--r--/420/0644; received: trw-r--r--/1048996/04000644
I0325 13:26:49.640493   18498 daemon.go:808] Current+desired config: rendered-worker-5ddb9087f440dc41a1d3a884da7e7a7b
I0325 13:26:49.646234   18498 daemon.go:1085] Validating against current config rendered-worker-5ddb9087f440dc41a1d3a884da7e7a7b
E0325 13:26:49.656450   18498 writer.go:135] Marking Degraded due to: unexpected on-disk state validating against rendered-worker-5ddb9087f440dc41a1d3a884da7e7a7b: mode mismatch for file: "/etc/test"; expected: -rw-r--r--/420/0644; received: trw-r--r--/1048996/04000644
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6.23 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:0952