Bug 1901021
Summary: | RHCOS 4.6.1 missing ISCSI initiatorname.iscsi ! | |||
---|---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | kevin <welin> | |
Component: | RHCOS | Assignee: | Jonathan Lebon <jlebon> | |
Status: | CLOSED ERRATA | QA Contact: | Michael Nguyen <mnguyen> | |
Severity: | urgent | Docs Contact: | ||
Priority: | high | |||
Version: | 4.6 | CC: | aghadge, agudi, bbreard, bgilbert, dornelas, imcleod, jligon, lucab, miabbott, nstielau, rkant, umesh_sunnapu | |
Target Milestone: | --- | Keywords: | Reopened | |
Target Release: | --- | |||
Hardware: | x86_64 | |||
OS: | Linux | |||
Whiteboard: | ||||
Fixed In Version: | Doc Type: | If docs needed, set a value | ||
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 1904243 | Environment: | |
Last Closed: | 2021-01-04 15:50:33 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1186913, 1899176, 1904243 |
Description
kevin
2020-11-24 09:57:05 UTC
In RHCOS 4.5, the default initiatorname.iscsi file existed. In RHCOS 4.6.1, this file is missing!

Misc notes:
* The file under /etc is not supposed to be part of the base OS in either RHCOS-4.5 or RHCOS-4.6. The InitiatorName is machine-specific, so it shouldn't be baked into the OS image.
* The RHEL package has some buggy %post logic in the RPM that we have to bypass on the CoreOS side, see https://bugzilla.redhat.com/show_bug.cgi?id=1734144
* We carry some service units specific to CoreOS in order to generate the initiatorname on first boot, see https://github.com/openshift/os/blob/a06951cb0b9ec63d79087ca257a2135601e027dd/overlay.d/05rhcos/usr/lib/systemd/system/coreos-generate-iscsi-initiatorname.service
* There has been some work in this area in the 4.5->4.6 timeframe which may have resulted in a regression, see https://gitlab.cee.redhat.com/coreos/redhat-coreos/-/merge_requests/1098

It would be good to attach the full journal from this 4.6 node here to pinpoint what's going on.

Ah, I think MR#1098 above renamed the service unit (s/regenerate/generate/), but the `43-manifest-rhcos.preset` content still tries to reference/enable the unit using the old name. Jonathan, can you maybe have a look here?

Micah, did you verify https://bugzilla.redhat.com/show_bug.cgi?id=1868174 using a patched bootimage?

*** Bug 1900926 has been marked as a duplicate of this bug. ***

(In reply to Luca BRUNO from comment #3)
> Ah, I think MR#1098 above renamed the service unit (s/regenerate/generate/),
> but the `43-manifest-rhcos.preset` content still tries to reference/enable
> the unit using the old name.

Hmm indeed, you're right. I'm confused now how this worked during testing and verification. Anyway, will fix this and add a test for it as well.
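The mismatch discussed above (a preset entry enabling a unit name that no longer exists after a rename) is the kind of problem a build-time check can catch. The following is a minimal sketch under stated assumptions, not the test that was actually added for this bug: the function name `find_missing_preset_units` is hypothetical, and both directory arguments are assumed to point into an unpacked image tree (for example `usr/lib/systemd/system-preset` and `usr/lib/systemd/system`).

```shell
# find_missing_preset_units PRESET_DIR UNIT_DIR   (hypothetical helper)
# Prints "<preset file>: <unit>" for every `enable <unit>` line in
# PRESET_DIR/*.preset whose unit file is absent from UNIT_DIR.
# Entries that use glob patterns (e.g. "enable foo-*") are skipped,
# because they cannot be checked by a plain file-existence test.
find_missing_preset_units() {
    local preset_dir=$1 unit_dir=$2
    local preset action="" unit="" _rest=""
    for preset in "$preset_dir"/*.preset; do
        # No preset files at all: the glob stays unexpanded, so skip it.
        [ -e "$preset" ] || continue
        # The `|| [ -n "$action" ]` keeps a final line without a newline.
        while read -r action unit _rest || [ -n "$action" ]; do
            # Comments, blank lines, and `disable` lines are ignored.
            [ "$action" = "enable" ] || continue
            case $unit in
                *'*'*|*'?'*|*'['*) continue ;;
            esac
            if [ ! -e "$unit_dir/$unit" ]; then
                printf '%s: %s\n' "$(basename "$preset")" "$unit"
            fi
        done < "$preset"
    done
}
```

Run against a tree where a preset still enables the pre-rename unit name, this would print one line naming the stale entry. Empty output means every explicitly enabled unit has a matching unit file.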
I have aggregated some logs from the journal related to iSCSI:

[root@worker-1 ~]# journalctl |grep "can't open InitiatorAlias configuration file /etc/iscsi/initiatorname.iscsi" -B 5 -A 5
Nov 24 08:58:12 localhost multipathd[688]: /etc/multipath.conf. See man mpathconf(8) for more details
Nov 24 08:58:12 localhost iscsid[690]: iscsid: can't open InitiatorName configuration file /etc/iscsi/initiatorname.iscsi
Nov 24 08:58:12 localhost iscsid[690]: iscsid: Warning: InitiatorName file /etc/iscsi/initiatorname.iscsi does not exist or does not contain a properly formatted InitiatorName. If using software iscsi (iscsi_tcp or ib_iser) or partial offload (bnx2i or cxgbi iscsi), you may not be able to log into or discover targets. Please create a file /etc/iscsi/initiatorname.iscsi that contains a sting with the format: InitiatorName=iqn.yyyy-mm.<reversed domain name>[:identifier].
Nov 24 08:58:12 localhost iscsid[690]: Example: InitiatorName=iqn.2001-04.com.redhat:fc6.
Nov 24 08:58:12 localhost iscsid[690]: If using hardware iscsi like qla4xxx this message can be ignored.
Nov 24 08:58:12 localhost iscsid[690]: iscsid: can't open InitiatorAlias configuration file /etc/iscsi/initiatorname.iscsi
Nov 24 08:58:12 localhost systemd[1]: Started Open-iSCSI.
Nov 24 08:58:12 localhost systemd[1]: Started Create Volatile Files and Directories.
Nov 24 08:58:12 localhost systemd[1]: Reached target System Initialization.
Nov 24 08:58:12 localhost systemd[1]: Reached target Basic System.
Nov 24 08:58:12 localhost systemd[1]: Starting Ignition (fetch-offline)...
--
Nov 24 09:01:25 localhost multipathd[638]: /etc/multipath.conf. See man mpathconf(8) for more details
Nov 24 09:01:25 localhost iscsid[639]: iscsid: can't open InitiatorName configuration file /etc/iscsi/initiatorname.iscsi
Nov 24 09:01:25 localhost iscsid[639]: iscsid: Warning: InitiatorName file /etc/iscsi/initiatorname.iscsi does not exist or does not contain a properly formatted InitiatorName. If using software iscsi (iscsi_tcp or ib_iser) or partial offload (bnx2i or cxgbi iscsi), you may not be able to log into or discover targets. Please create a file /etc/iscsi/initiatorname.iscsi that contains a sting with the format: InitiatorName=iqn.yyyy-mm.<reversed domain name>[:identifier].
Nov 24 09:01:25 localhost iscsid[639]: Example: InitiatorName=iqn.2001-04.com.redhat:fc6.
Nov 24 09:01:25 localhost iscsid[639]: If using hardware iscsi like qla4xxx this message can be ignored.
Nov 24 09:01:25 localhost iscsid[639]: iscsid: can't open InitiatorAlias configuration file /etc/iscsi/initiatorname.iscsi
Nov 24 09:01:25 localhost systemd[1]: Started Open-iSCSI.
Nov 24 09:01:25 localhost systemd[1]: Started Create Volatile Files and Directories.
Nov 24 09:01:25 localhost systemd[1]: Reached target System Initialization.
Nov 24 09:01:25 localhost systemd[1]: Reached target Basic System.
Nov 24 09:01:25 localhost systemd[1]: Starting dracut initqueue hook...

[root@worker-1 ~]# journalctl -u coreos-generate-iscsi-initiatorname.service
-- Logs begin at Tue 2020-11-24 08:58:09 UTC, end at Wed 2020-11-25 01:58:17 UTC. --
-- No entries --

(In reply to Jonathan Lebon from comment #10)
> Fixed by https://github.com/openshift/installer/pull/4422.

This PR has merged, so moving to MODIFIED.

Which OCP 4.6 minor version will include the fix?

(In reply to kevin from comment #12)
> Which OCP 4.6 minor version will include the fix?

I would expect it to be part of 4.6.7.

Thank you for your reply.

Verified on 4.6.0-0.nightly-2020-12-04-033739.
initiatorname.iscsi file exists

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-12-04-033739   True        False         26m     Cluster version is 4.6.0-0.nightly-2020-12-04-033739

$ oc get nodes
NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-149-143.us-west-2.compute.internal   Ready    worker   43m   v1.19.0+1348ff8
ip-10-0-157-110.us-west-2.compute.internal   Ready    master   51m   v1.19.0+1348ff8
ip-10-0-161-30.us-west-2.compute.internal    Ready    master   50m   v1.19.0+1348ff8
ip-10-0-178-7.us-west-2.compute.internal     Ready    worker   43m   v1.19.0+1348ff8
ip-10-0-216-159.us-west-2.compute.internal   Ready    master   50m   v1.19.0+1348ff8
ip-10-0-216-195.us-west-2.compute.internal   Ready    worker   41m   v1.19.0+1348ff8

$ oc debug node/ip-10-0-157-110.us-west-2.compute.internal -- chroot /host cat /etc/iscsi/initiatorname.iscsi
Starting pod/ip-10-0-157-110us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
InitiatorName=iqn.1994-05.com.redhat:e09b1ba73bf0
Removing debug pod ...

$ oc debug node/ip-10-0-149-143.us-west-2.compute.internal -- chroot /host cat /etc/iscsi/initiatorname.iscsi
Starting pod/ip-10-0-149-143us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
InitiatorName=iqn.1994-05.com.redhat:152d8a345d
Removing debug pod ...

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.6.8 security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2020:5259

@Micah, I checked this in a cluster which I just upgraded to 4.6.8.
File still does not exist

[core@csah samples]$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.8     True        False         110m    Cluster version is 4.6.8

[core@csah samples]$ oc debug node/compute-1.example.com -- chroot /host cat /etc/iscsi/initiatorname.iscsi
Creating debug namespace/openshift-debug-node-d7xz9 ...
Starting pod/compute-1examplecom-debug ...
To use host binaries, run `chroot /host`
cat: /etc/iscsi/initiatorname.iscsi: No such file or directory
Removing debug pod ...
Removing debug namespace/openshift-debug-node-d7xz9 ...
error: non-zero exit code from debug container

Based on the last comment, opening a new bug report.

Hello, I tested an upgrade from OCP 4.6.6 to 4.6.8, and the initiatorname.iscsi file still does not exist!

oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.8     True        False         9m10s   Cluster version is 4.6.8

ansible workers --private-key=${RHCOS_KEY} -u core -m shell -a 'cat /etc/iscsi/initiatorname.iscsi'
worker-3.ocp4-1.example.internal | FAILED | rc=1 >>
cat: /etc/iscsi/initiatorname.iscsi: No such file or directorynon-zero return code
worker-2.ocp4-1.example.internal | FAILED | rc=1 >>
cat: /etc/iscsi/initiatorname.iscsi: No such file or directorynon-zero return code
worker-0.ocp4-1.example.internal | FAILED | rc=1 >>
cat: /etc/iscsi/initiatorname.iscsi: No such file or directorynon-zero return code
worker-1.ocp4-1.example.internal | FAILED | rc=1 >>
cat: /etc/iscsi/initiatorname.iscsi: No such file or directorynon-zero return code

@welin See the comment in the BZ below and check whether that works for you.
https://bugzilla.redhat.com/show_bug.cgi?id=1908830#c3

This is getting messy. There are two things here:

1. The first-boot case: the iSCSI name wasn't being generated on first boot. This is tracked by this RHBZ and is fixed by https://github.com/openshift/os/pull/453.
2. The upgrading case: upgrading nodes still didn't have a generated iSCSI name.
This is tracked by https://bugzilla.redhat.com/show_bug.cgi?id=1908830 and is fixed by https://github.com/openshift/os/pull/473, and has a temporary workaround in https://bugzilla.redhat.com/show_bug.cgi?id=1908830#c3.

I'm going to close this one again since we already have an RHBZ for the upgrading case.

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days
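The manual check repeated throughout this report (does /etc/iscsi/initiatorname.iscsi exist, and does it hold a well-formed name) can be sketched as a small shell function. This is an illustration only: the function name `check_initiatorname` is hypothetical, and the pattern is a loose reading of the format quoted in the iscsid warning (`InitiatorName=iqn.yyyy-mm.<reversed domain name>[:identifier]`), not the validation iscsid itself performs.

```shell
# check_initiatorname FILE   (hypothetical helper)
# Prints "missing" if FILE does not exist, "malformed" if it contains no
# line matching InitiatorName=iqn.yyyy-mm.<reversed domain>[:identifier],
# and otherwise prints the first matching IQN.
# Returns 0 only when a well-formed name was found.
check_initiatorname() {
    local file=$1 name=""
    if [ ! -f "$file" ]; then
        echo "missing"
        return 1
    fi
    # Take the first matching line and keep everything after the "=".
    # `|| true` keeps the assignment from failing when nothing matches.
    name=$(grep -E -m1 \
        '^InitiatorName=iqn\.[0-9]{4}-[0-9]{2}\.[^:[:space:]]+(:[^[:space:]]+)?$' \
        "$file" | cut -d= -f2-) || true
    if [ -z "$name" ]; then
        echo "malformed"
        return 2
    fi
    echo "$name"
}
```

On a node, the function could be pointed at /etc/iscsi/initiatorname.iscsi from a `chroot /host` debug shell. The distinct "missing" and "malformed" results separate the case reported here (no file generated) from a file that exists but that iscsid would still reject.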