Hide Forgot
Description of problem: MCO is configured to load /etc/iscsi/initiatorname.iscsi in new worker but fails to degraded state. The API respond with the correct file [aguadarr@dlosbastion03 ~]$ curl -ks https://api-int.kubelab2.cscglobal.com:22623/config/worker | jq -r '.storage.files[] | select(.path=="/etc/iscsi/initiatorname.iscsi")' { "filesystem": "root", "path": "/etc/iscsi/initiatorname.iscsi", "contents": { "source": "data:text/plain;charset=utf-8;base64,SW5pdGlhdG9yTmFtZT1pcW4uMTk5NC0wNS5jb20ubnV0YW5peDprOHMtd29ya2VyCg==", "verification": {} }, "mode": 420 } The content of the file /etc/iscsi/initiatorname.iscsi offer by the API config/worker is correct [aguadarr@dlosbastion03 ~]$ echo SW5pdGlhdG9yTmFtZT1pcW4uMTk5NC0wNS5jb20ubnV0YW5peDprOHMtd29ya2VyCg== | base64 -d InitiatorName=iqn.1994-05.com.nutanix:k8s-worker The log shows error reading the file /etc/iscsi/initiatorname.iscsi, even the logs shows that the file was written and selinux relabel the file, but the content is not the provided by the API sh-4.4# journalctl | grep 'etc/iscsi' Aug 05 19:29:25 localhost iscsid[718]: iscsid: can't open InitiatorName configuration file /etc/iscsi/initiatorname.iscsi Aug 05 19:29:25 localhost iscsid[718]: iscsid: Warning: InitiatorName file /etc/iscsi/initiatorname.iscsi does not exist or does not contain a properly formatted InitiatorName. If using software iscsi (iscsi_tcp or ib_iser) or partial offload (bnx2i or cxgbi iscsi), you may not be able to log into or discover targets. Please create a file /etc/iscsi/initiatorname.iscsi that contains a sting with the format: InitiatorName=iqn.yyyy-mm.<reversed domain name>[:identifier]. Aug 05 19:29:25 localhost iscsid[718]: iscsid: can't open InitiatorAlias configuration file /etc/iscsi/initiatorname.iscsi Aug 05 19:31:19 ocp-compute-06.kubelab2.cscglobal.com ignition[1048]: INFO : files: createFilesystemsFiles: createFiles: op(1a): [started] writing file "/etc/iscsi/initiatorname.iscsi" Aug 05 19:31:19 ocp-compute-06.kubelab2.cscglobal.com ignition[1048]: INFO : files: createFilesystemsFiles: createFiles: op(1a): [finished] writing file "/etc/iscsi/initiatorname.iscsi" Aug 05 19:31:20 ocp-compute-06.kubelab2.cscglobal.com ignition[1048]: INFO : files: createFilesystemsFiles: createFiles: op(1a): [started] writing file "/etc/iscsi/initiatorname.iscsi" Aug 05 19:31:20 ocp-compute-06.kubelab2.cscglobal.com ignition[1048]: INFO : files: createFilesystemsFiles: createFiles: op(1a): [finished] writing file "/etc/iscsi/initiatorname.iscsi" Aug 05 19:31:21 ocp-compute-06.kubelab2.cscglobal.com restorecon[1389]: Relabeled /etc/iscsi/initiatorname.iscsi from system_u:object_r:unlabeled_t:s0 to system_u:object_r:etc_t:s0 Aug 05 19:32:07 localhost iscsid[766]: iscsid: can't open InitiatorName configuration file /etc/iscsi/initiatorname.iscsi Aug 05 19:32:07 localhost iscsid[766]: iscsid: Warning: InitiatorName file /etc/iscsi/initiatorname.iscsi does not exist or does not contain a properly formatted InitiatorName. If using software iscsi (iscsi_tcp or ib_iser) or partial offload (bnx2i or cxgbi iscsi), you may not be able to log into or discover targets. Please create a file /etc/iscsi/initiatorname.iscsi that contains a sting with the format: InitiatorName=iqn.yyyy-mm.<reversed domain name>[:identifier]. Aug 05 19:32:07 localhost iscsid[766]: iscsid: can't open InitiatorAlias configuration file /etc/iscsi/initiatorname.iscsi Due this behavior the MCD is in the error loop I0805 19:34:04.175620 3495 daemon.go:955] Validating against pending config rendered-worker-9eb0054542fb4cc0439e927d588d7344 E0805 19:34:04.176304 3495 daemon.go:1350] content mismatch for file /etc/iscsi/initiatorname.iscsi: InitiatorName=iqn.1994-05.com. . A: redhat:c6d9a5616ee1 . . B: nutanix:k8s-worker . . E0805 19:34:04.176397 3495 writer.go:135] Marking Degraded due to: unexpected on-disk state validating against rendered-worker-9eb0054542fb4cc0439e927d588d7344 I0805 19:34:06.175031 3495 daemon.go:724] Current+desired config: rendered-worker-9eb0054542fb4cc0439e927d588d7344 Actual results: Node fails to deploy and goes into Degraded Expected results: /etc/iscsi/initiatorname.iscsi to be successfully deployed and node to be Ready as this occurs with existing nodes
@Brandon please provide a must gather from this cluster.
Hi Kirsten, got a fresh mustgather from the cluster here: https://access.redhat.com/hydra/rest/cases/02702919/attachments/2ab94029-06a7-43b1-9f08-c479ae8b7d8a?usePresignedUrl=true
Hi Brandon, I don't have hydra access can you please upload to GDrive?
Actually I am able to download from the customer portal. No need to reupload.
After discussing with Micah, moving this over to RHCOS for further triage.
I'm able to reproduce this behavior on a 4.6 cluster; it looks like if I restart the `iscsid.service` after the MC is applied and the node comes back, it happily starts. I'm wondering if there is something amiss happening in the initrd re: iscsid, since I see the service starting before the pivot into the real root of RHCOS. ``` Aug 12 20:53:43 localhost systemd[1]: Reached target Basic System. Aug 12 20:53:43 localhost iscsid[627]: iscsid: can't open InitiatorName configuration file /etc/iscsi/initiatorname.iscsi Aug 12 20:53:43 localhost iscsid[627]: iscsid: Warning: InitiatorName file /etc/iscsi/initiatorname.iscsi does not exist or does not contain a properly formatted InitiatorName. If using software iscsi (iscsi_tcp > Aug 12 20:53:43 localhost iscsid[627]: Example: InitiatorName=iqn.2001-04.com.redhat:fc6. Aug 12 20:53:43 localhost iscsid[627]: If using hardware iscsi like qla4xxx this message can be ignored. Aug 12 20:53:43 localhost iscsid[627]: iscsid: can't open InitiatorAlias configuration file /etc/iscsi/initiatorname.iscsi ... Aug 12 20:53:43 localhost systemd[1]: Starting OSTree Prepare OS/... Aug 12 20:53:43 localhost ostree-prepare-root[729]: Resolved OSTree target to: /sysroot/ostree/deploy/rhcos/deploy/aecccdc13ae1926a4907b0e53b959654adcd5dfd6e37728955c33163963c1fba.0 Aug 12 20:53:43 localhost systemd[1]: Started OSTree Prepare OS/. Aug 12 20:53:43 localhost systemd[1]: Reached target Initrd Root File System. Aug 12 20:53:43 localhost systemd[1]: Starting Reload Configuration from the Real Root... Aug 12 20:53:43 localhost systemd[1]: Reloading. Aug 12 20:53:44 localhost systemd[1]: Started Reload Configuration from the Real Root. ... Aug 12 20:53:44 localhost systemd[1]: Starting Switch Root... Aug 12 20:53:44 localhost systemd[1]: Switching root. ... Aug 12 20:53:48 cosa-devsh systemd[1]: iscsi.service: Unit cannot be reloaded because it is inactive. ``` Additional strangeness I observed: after applying the MC, attempts to view the `/etc/iscsi/initiatorname.iscsi` file with `vi` shows an empty file. But a `cat` of the file shows the contents. <time passes> I experimented with `rpm-ostree initramfs` to see if I could include the `/etc/iscsi/initiatorname.iscsi` in the initrd and had some successs: ``` $ sudo rpm-ostree initramfs --arg=-I --arg=/etc/iscsi/initiatorname.iscsi --enable Checking out tree 935aac4... done Generating initramfs... done Writing OSTree commit... done Staging deployment... done Initramfs regeneration is now: enabled [core@cosa-devsh ~]$ rpm-ostree status State: idle Deployments: ostree://935aac448cdde5455ab18e1b6141504504e2fa9f00120a6e262dff1c7e0ffba7 Version: 46.82.202008031957-0 (2020-08-03T20:02:51Z) Initramfs: -I /etc/iscsi/initiatorname.iscsi ... ``` When rebooting into the new deployment, I no longer see any errors about the inability to open the `/etc/iscsi/initiatorname.iscsi` file. ``` $ rpm-ostree status State: idle Deployments: * ostree://935aac448cdde5455ab18e1b6141504504e2fa9f00120a6e262dff1c7e0ffba7 Version: 46.82.202008031957-0 (2020-08-03T20:02:51Z) Initramfs: -I /etc/iscsi/initiatorname.iscsi ostree://935aac448cdde5455ab18e1b6141504504e2fa9f00120a6e262dff1c7e0ffba7 Version: 46.82.202008031957-0 (2020-08-03T20:02:51Z) $ systemctl status iscsid ● iscsid.service - Open-iSCSI Loaded: loaded (/usr/lib/systemd/system/iscsid.service; disabled; vendor preset: disabled) Active: inactive (dead) since Wed 2020-08-12 21:15:44 UTC; 1min 9s ago Docs: man:iscsid(8) man:iscsiuio(8) man:iscsiadm(8) Main PID: 624 (code=exited, status=0/SUCCESS) Status: "Ready to process requests" Aug 12 21:15:42 localhost systemd[1]: Starting Open-iSCSI... Aug 12 21:15:42 localhost systemd[1]: Started Open-iSCSI. Aug 12 21:15:44 localhost systemd[1]: Stopping Open-iSCSI... Aug 12 21:15:44 localhost iscsid[624]: iscsid: iscsid shutting down. Aug 12 21:15:44 localhost systemd[1]: Stopped Open-iSCSI. ``` Compared to the failure case: ``` [core@cosa-devsh ~]$ systemctl status iscsid ● iscsid.service - Open-iSCSI Loaded: loaded (/usr/lib/systemd/system/iscsid.service; disabled; vendor preset: disabled) Active: inactive (dead) since Wed 2020-08-12 21:15:02 UTC; 11s ago Docs: man:iscsid(8) man:iscsiuio(8) man:iscsiadm(8) Main PID: 623 (code=exited, status=0/SUCCESS) Status: "Ready to process requests" Aug 12 21:15:00 localhost systemd[1]: Starting Open-iSCSI... Aug 12 21:15:00 localhost iscsid[623]: iscsid: can't open InitiatorName configuration file /etc/iscsi/initiatorname.iscsi Aug 12 21:15:00 localhost iscsid[623]: iscsid: Warning: InitiatorName file /etc/iscsi/initiatorname.iscsi does not exist or does not contain a properly formatted InitiatorName. If using software iscsi (iscsi_tcp > Aug 12 21:15:00 localhost iscsid[623]: Example: InitiatorName=iqn.2001-04.com.redhat:fc6. Aug 12 21:15:00 localhost iscsid[623]: If using hardware iscsi like qla4xxx this message can be ignored. Aug 12 21:15:00 localhost iscsid[623]: iscsid: can't open InitiatorAlias configuration file /etc/iscsi/initiatorname.iscsi Aug 12 21:15:00 localhost systemd[1]: Started Open-iSCSI. Aug 12 21:15:02 localhost systemd[1]: Stopping Open-iSCSI... Aug 12 21:15:02 localhost iscsid[623]: iscsid: iscsid shutting down. Aug 12 21:15:02 localhost systemd[1]: Stopped Open-iSCSI. ``` So I suspect we need to regenerate the initramfs when the `/etc/iscsi/initiatorname.iscsi` is changed via an MC. Additionally, we may need to change how RHCOS is configured initially so that it can look for the `/etc/iscsi/initiatorname.iscsi` file by default. And/or when a cluster is installed and the `/etc/iscsi/initiatorname.iscsi` is provided as part of a day 1 MC/Ignition config. I'm tenatively targeting for 4.6, but depending on the complexity of the fix, we may need to push it out to a later release.
So I think what's going here is that we currently ship a service which unconditionally regenerates the initiator name on boot and overwrites the one from the Ignition config. Posted a fix in https://gitlab.cee.redhat.com/coreos/redhat-coreos/-/merge_requests/1098 to avoid that. > So I suspect we need to regenerate the initramfs when the `/etc/iscsi/initiatorname.iscsi` is changed via an MC. We should double-check with the customer, but IIUC the context for this is iSCSI-backed persistent storage. So we shouldn't need to modify the initramfs.
Part of this bootimage bump: https://github.com/openshift/installer/pull/4066.
Bootimage bump merged.
(In reply to Jonathan Lebon from comment #7) > So I think what's going here is that we currently ship a service which > unconditionally regenerates the initiator name on boot and overwrites the > one from the Ignition config. Posted a fix in > https://gitlab.cee.redhat.com/coreos/redhat-coreos/-/merge_requests/1098 to > avoid that. > > > So I suspect we need to regenerate the initramfs when the `/etc/iscsi/initiatorname.iscsi` is changed via an MC. > > We should double-check with the customer, but IIUC the context for this is > iSCSI-backed persistent storage. So we shouldn't need to modify the > initramfs. I followed up with the CU and confirmed that this is correct.
Verified with 4.6.0-0.nightly-2020-09-19-004228 ``` $ oc get clusterversion NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.6.0-0.nightly-2020-09-19-004228 True False 21m Cluster version is 4.6.0-0.nightly-2020-09-19-004228 $ oc debug node/ip-10-0-135-149.us-west-2.compute.internal -- chroot /host cat /etc/iscsi/initiatorname.iscsi Starting pod/ip-10-0-135-149us-west-2computeinternal-debug ... To use host binaries, run `chroot /host` InitiatorName=iqn.1994-05.com.redhat:cabdb8e83dc Removing debug pod ... $ cat bz1868174.yaml # This MachineConfig demonstrates the minimal requirements for writing out a # file to the node apiVersion: machineconfiguration.openshift.io/v1 kind: MachineConfig metadata: labels: machineconfiguration.openshift.io/role: worker name: 99-iscsi-config spec: config: ignition: version: 3.1.0 storage: files: - contents: source: data:text/plain;charset=utf-8;base64,SW5pdGlhdG9yTmFtZT1pcW4uMTk5NC0wNS5jb20ubnV0YW5peDprOHMtd29ya2VyCg== filesystem: root path: /etc/iscsi/initiatorname.iscsi overwrite: true mode: 0644 $ echo "SW5pdGlhdG9yTmFtZT1pcW4uMTk5NC0wNS5jb20ubnV0YW5peDprOHMtd29ya2VyCg==" | base64 -d InitiatorName=iqn.1994-05.com.nutanix:k8s-worker $ oc apply -f bz1868174.yaml machineconfig.machineconfiguration.openshift.io/99-iscsi-config created $ oc get mc NAME GENERATEDBYCONTROLLER IGNITIONVERSION AGE 00-master c08c048584ef0bf18ab2dd88fdddd93279e1c6a1 3.1.0 6h19m 00-worker c08c048584ef0bf18ab2dd88fdddd93279e1c6a1 3.1.0 6h19m 01-master-container-runtime c08c048584ef0bf18ab2dd88fdddd93279e1c6a1 3.1.0 6h19m 01-master-kubelet c08c048584ef0bf18ab2dd88fdddd93279e1c6a1 3.1.0 6h19m 01-worker-container-runtime c08c048584ef0bf18ab2dd88fdddd93279e1c6a1 3.1.0 6h19m 01-worker-kubelet c08c048584ef0bf18ab2dd88fdddd93279e1c6a1 3.1.0 6h19m 99-iscsi-config 3.1.0 6s 99-master-generated-registries c08c048584ef0bf18ab2dd88fdddd93279e1c6a1 3.1.0 30m 99-master-ssh 2.2.0 6h20m 99-worker-generated-registries c08c048584ef0bf18ab2dd88fdddd93279e1c6a1 3.1.0 30m 99-worker-ssh 2.2.0 6h20m rendered-master-474cb317eef983b8dcfaabf37c40885f b5ab0fbd69c492007e09455099fd3f3a79d3433b 2.2.0 3h16m rendered-master-54e984e6208d40d139d8bff323169dbf 55f7317224e7d8badc98879662771a14185e5739 2.2.0 4h10m rendered-master-8efa482986f14061666a1c1b38bdd393 204f5642bd3adbfa5c85e74958223fb0cf8ad2db 2.2.0 4h53m rendered-master-90a25e29ca44440df01724fdf78b8d42 f56d736e74af8fb0dc85c4b1ee3cc8d1d1f6600b 2.2.0 6h19m rendered-master-94c0c10ef6d26ac79f83f26dddb2886b 08aad1925d6e29266390ecb6f4e6730d60e44aaf 2.2.0 113m rendered-master-b383db8c79b8e1c414be6000fefe4a4e 08aad1925d6e29266390ecb6f4e6730d60e44aaf 2.2.0 38m rendered-master-ea3d50e6005eb46bd31d38c7af0e1890 c08c048584ef0bf18ab2dd88fdddd93279e1c6a1 3.1.0 30m rendered-worker-0617f5423aec0a1738e3732db8440a1e 204f5642bd3adbfa5c85e74958223fb0cf8ad2db 2.2.0 4h53m rendered-worker-0aee5a3cb624b11e90a5265912b23e1a 55f7317224e7d8badc98879662771a14185e5739 2.2.0 4h10m rendered-worker-63fb5d49df2b0a460bcbd707a0296441 c08c048584ef0bf18ab2dd88fdddd93279e1c6a1 3.1.0 1s rendered-worker-6bc097f3ef8898db9837a3d4399122cf 08aad1925d6e29266390ecb6f4e6730d60e44aaf 2.2.0 113m rendered-worker-7966a5c09dee72a06c3069ebf3b7d66c f56d736e74af8fb0dc85c4b1ee3cc8d1d1f6600b 2.2.0 6h19m rendered-worker-89b4196319e54095cad4b003daa21e96 c08c048584ef0bf18ab2dd88fdddd93279e1c6a1 3.1.0 30m rendered-worker-d406aa7432afacbe2b7243ef8e7d40e4 b5ab0fbd69c492007e09455099fd3f3a79d3433b 2.2.0 3h16m $ oc debug node/ip-10-0-135-149.us-west-2.compute.internal -- chroot /host cat /etc/iscsi/initiatorname.iscsi Starting pod/ip-10-0-135-149us-west-2computeinternal-debug ... To use host binaries, run `chroot /host` InitiatorName=iqn.1994-05.com.nutanix:k8s-worker Removing debug pod ... ```
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196