Bug 1901021 - RHCOS 4.6.1 missing ISCSI initiatorname.iscsi ! [NEEDINFO]
Summary: RHCOS 4.6.1 missing ISCSI initiatorname.iscsi !
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: RHCOS
Version: 4.6
Hardware: x86_64
OS: Linux
high
urgent
Target Milestone: ---
Target Release: ---
Assignee: Jonathan Lebon
QA Contact: Michael Nguyen
URL:
Whiteboard:
Duplicates: 1900926
Depends On:
Blocks: 1186913 1899176 1904243
Reported: 2020-11-24 09:57 UTC by kevin
Modified: 2021-01-05 05:31 UTC
CC List: 12 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1904243
Environment:
Last Closed: 2021-01-04 15:50:33 UTC
Target Upstream Version:
mnguyen: needinfo? (welin)


Attachments


Links
System                  ID              Private  Priority  Status  Summary  Last Updated
Red Hat Product Errata  RHSA-2020:5259  0        None      None    None     2020-12-14 13:51:42 UTC

Description kevin 2020-11-24 09:57:05 UTC
Description of problem:

According to the official OpenShift 4.6 documentation, the default iSCSI initiator name of an RHCOS worker node is stored in /etc/iscsi/initiatorname.iscsi.

https://docs.openshift.com/container-platform/4.6/storage/persistent_storage/persistent-storage-iscsi.html

But we cannot find this file in RHCOS 4.6.1!


[core@worker-1 ~]$ ls -a /etc/iscsi/
.  ..  iscsid.conf

[core@worker-1 ~]$ hostnamectl
   Static hostname: n/a
Transient hostname: worker-1.ocp4-1.example.internal
         Icon name: computer-vm
           Chassis: vm
        Machine ID: ada997b5da89423e8255633f232d8db4
           Boot ID: f4d48fddb7f0499c97dede5fd1ec47c1
    Virtualization: vmware
  Operating System: Red Hat Enterprise Linux CoreOS 46.82.202010091720-0 (Ootpa)
       CPE OS Name: cpe:/o:redhat:enterprise_linux:8::coreos
            Kernel: Linux 4.18.0-193.24.1.el8_2.dt1.x86_64
      Architecture: x86-64

Comment 1 kevin 2020-11-24 10:15:31 UTC
In RHCOS 4.5, the initiatorname.iscsi file existed by default. But in RHCOS 4.6.1, this file is missing!

Comment 2 Luca BRUNO 2020-11-24 11:04:29 UTC
Misc notes:
 * The file under /etc is not supposed to be part of the base OS in either RHCOS 4.5 or RHCOS 4.6. The InitiatorName is machine-specific, so it shouldn't be baked into the OS image.
 * The RHEL package has some buggy %post logic in the RPM that we have to bypass on the CoreOS side, see https://bugzilla.redhat.com/show_bug.cgi?id=1734144
 * We carry some service units specific to CoreOS in order to generate the initiatorname on first-boot, see https://github.com/openshift/os/blob/a06951cb0b9ec63d79087ca257a2135601e027dd/overlay.d/05rhcos/usr/lib/systemd/system/coreos-generate-iscsi-initiatorname.service
 * There has been some work in this area during the 4.5->4.6 timeframe, which may have resulted in a regression, see https://gitlab.cee.redhat.com/coreos/redhat-coreos/-/merge_requests/1098

It would be good to attach the full journal from this 4.6 node here to pinpoint what's going on.
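The first-boot generation described in the notes above can be sketched as follows. This is a minimal Python illustration, not the actual RHCOS unit: it assumes the only requirements are to leave an existing file untouched and otherwise write a single `InitiatorName=` line using the `iqn.1994-05.com.redhat` prefix seen on the fixed nodes later in this bug. The function name `generate_initiatorname` and the 12-hex-character random suffix are hypothetical; the real unit may rely on its own tooling.

```python
import secrets
from pathlib import Path

# Prefix taken from the InitiatorName values observed on fixed nodes in this bug.
IQN_PREFIX = "iqn.1994-05.com.redhat"


def generate_initiatorname(path: Path) -> bool:
    """Write a machine-specific InitiatorName if none exists yet.

    Hypothetical sketch of a first-boot step: returns True if a new
    file was written, False if an existing file was left untouched.
    """
    if path.exists():
        # Never overwrite: the name must stay stable across reboots.
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    # Random per-machine suffix (assumption: 6 random bytes as 12 hex chars).
    identifier = secrets.token_hex(6)
    path.write_text(f"InitiatorName={IQN_PREFIX}:{identifier}\n")
    return True


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "etc" / "iscsi" / "initiatorname.iscsi"
        print(generate_initiatorname(target))  # first call writes the file
        print(generate_initiatorname(target))  # second call leaves it alone
        print(target.read_text().strip())
```

Because the step is idempotent, the same logic would be safe to run both on first boot and on nodes that were upgraded without ever getting a name.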

Comment 3 Luca BRUNO 2020-11-24 11:14:53 UTC
Ah, I think MR#1098 above renamed the service unit (s/regenerate/generate/), but the `43-manifest-rhcos.preset` content still tries to reference/enable the unit using the old name.

Jonathan, can you maybe have a look here?

Micah, did you verify https://bugzilla.redhat.com/show_bug.cgi?id=1868174 using a patched bootimage?
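To make the preset mismatch described in Comment 3 concrete, here is a hypothetical Python check that flags `enable` lines in a systemd preset file that match no shipped unit. A preset entry that matches no installed unit simply enables nothing, which is how a rename can silently disable a unit. The sample preset line below is illustrative: the old name `coreos-regenerate-iscsi-initiatorname.service` is inferred from the s/regenerate/generate/ rename and is not copied from the real `43-manifest-rhcos.preset`.

```python
import fnmatch


def dangling_enables(preset_text: str, shipped_units: set[str]) -> list[str]:
    """Return unit patterns enabled by a preset that match no shipped unit.

    systemd preset lines look like "enable NAME" or "disable NAME";
    empty lines and lines starting with '#' or ';' are comments.
    """
    dangling = []
    for raw in preset_text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue
        parts = line.split()
        if parts[0] != "enable" or len(parts) < 2:
            continue
        pattern = parts[1]
        # Preset names may be shell-style globs, so match with fnmatch.
        if not any(fnmatch.fnmatchcase(unit, pattern) for unit in shipped_units):
            dangling.append(pattern)
    return dangling


if __name__ == "__main__":
    # Hypothetical inputs mirroring the rename described in Comment 3.
    preset = "enable coreos-regenerate-iscsi-initiatorname.service\n"
    units = {"coreos-generate-iscsi-initiatorname.service"}
    print(dangling_enables(preset, units))
```

A check like this, run against the unit files actually shipped in the image, is one way the "add a test for it" idea in the reply below could be approached.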

Comment 4 Scott Dodson 2020-11-24 13:55:32 UTC
*** Bug 1900926 has been marked as a duplicate of this bug. ***

Comment 5 Jonathan Lebon 2020-11-24 14:25:01 UTC
(In reply to Luca BRUNO from comment #3)
> Ah, I think MR#1098 above renamed the service unit (s/regenerate/generate/),
> but the `43-manifest-rhcos.preset` content still tries to reference/enable
> the unit using the old name.

Hmm indeed, you're right. I'm confused now how this worked during testing and verification.
Anyway, will fix this and add a test for it as well.

Comment 8 kevin 2020-11-24 16:30:39 UTC
I have aggregated some iSCSI-related logs from the journal:

[root@worker-1 ~]# journalctl  |grep "can't open InitiatorAlias configuration file /etc/iscsi/initiatorname.iscsi" -B 5 -A 5
Nov 24 08:58:12 localhost multipathd[688]: /etc/multipath.conf. See man mpathconf(8) for more details
Nov 24 08:58:12 localhost iscsid[690]: iscsid: can't open InitiatorName configuration file /etc/iscsi/initiatorname.iscsi
Nov 24 08:58:12 localhost iscsid[690]: iscsid: Warning: InitiatorName file /etc/iscsi/initiatorname.iscsi does not exist or does not contain a properly formatted InitiatorName. If using software iscsi (iscsi_tcp or ib_iser) or partial offload (bnx2i or cxgbi iscsi), you may not be able to log into or discover targets. Please create a file /etc/iscsi/initiatorname.iscsi that contains a sting with the format: InitiatorName=iqn.yyyy-mm.<reversed domain name>[:identifier].
Nov 24 08:58:12 localhost iscsid[690]: Example: InitiatorName=iqn.2001-04.com.redhat:fc6.
Nov 24 08:58:12 localhost iscsid[690]: If using hardware iscsi like qla4xxx this message can be ignored.
Nov 24 08:58:12 localhost iscsid[690]: iscsid: can't open InitiatorAlias configuration file /etc/iscsi/initiatorname.iscsi
Nov 24 08:58:12 localhost systemd[1]: Started Open-iSCSI.
Nov 24 08:58:12 localhost systemd[1]: Started Create Volatile Files and Directories.
Nov 24 08:58:12 localhost systemd[1]: Reached target System Initialization.
Nov 24 08:58:12 localhost systemd[1]: Reached target Basic System.
Nov 24 08:58:12 localhost systemd[1]: Starting Ignition (fetch-offline)...
--
Nov 24 09:01:25 localhost multipathd[638]: /etc/multipath.conf. See man mpathconf(8) for more details
Nov 24 09:01:25 localhost iscsid[639]: iscsid: can't open InitiatorName configuration file /etc/iscsi/initiatorname.iscsi
Nov 24 09:01:25 localhost iscsid[639]: iscsid: Warning: InitiatorName file /etc/iscsi/initiatorname.iscsi does not exist or does not contain a properly formatted InitiatorName. If using software iscsi (iscsi_tcp or ib_iser) or partial offload (bnx2i or cxgbi iscsi), you may not be able to log into or discover targets. Please create a file /etc/iscsi/initiatorname.iscsi that contains a sting with the format: InitiatorName=iqn.yyyy-mm.<reversed domain name>[:identifier].
Nov 24 09:01:25 localhost iscsid[639]: Example: InitiatorName=iqn.2001-04.com.redhat:fc6.
Nov 24 09:01:25 localhost iscsid[639]: If using hardware iscsi like qla4xxx this message can be ignored.
Nov 24 09:01:25 localhost iscsid[639]: iscsid: can't open InitiatorAlias configuration file /etc/iscsi/initiatorname.iscsi
Nov 24 09:01:25 localhost systemd[1]: Started Open-iSCSI.
Nov 24 09:01:25 localhost systemd[1]: Started Create Volatile Files and Directories.
Nov 24 09:01:25 localhost systemd[1]: Reached target System Initialization.
Nov 24 09:01:25 localhost systemd[1]: Reached target Basic System.
Nov 24 09:01:25 localhost systemd[1]: Starting dracut initqueue hook...
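The iscsid warning above spells out the expected file content: `InitiatorName=iqn.yyyy-mm.<reversed domain name>[:identifier]`. A minimal, hypothetical Python check for that shape follows. It only enforces what the log message quotes (a year-month date, a lowercase reversed domain name, an optional `:identifier`) and is not a complete iSCSI name validator; the function name `parse_initiatorname` is illustrative. An empty or missing file yields `None`, which is the condition iscsid warns about.

```python
import re
from typing import Optional

# Shape quoted by iscsid: iqn.yyyy-mm.<reversed domain name>[:identifier]
_IQN_RE = re.compile(
    r"iqn\.\d{4}-(0[1-9]|1[0-2])"   # "iqn." followed by year-month
    r"\.[a-z0-9-]+(\.[a-z0-9-]+)*"  # reversed domain name, e.g. com.redhat
    r"(:\S+)?"                      # optional identifier after a colon
)


def parse_initiatorname(text: str) -> Optional[str]:
    """Return the IQN from initiatorname.iscsi content, or None if absent or malformed."""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "InitiatorName":
            value = value.strip()
            return value if _IQN_RE.fullmatch(value) else None
    return None


if __name__ == "__main__":
    # Value taken from the verification output later in this bug.
    print(parse_initiatorname("InitiatorName=iqn.1994-05.com.redhat:e09b1ba73bf0\n"))
    print(parse_initiatorname(""))
```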

Comment 9 kevin 2020-11-25 01:59:27 UTC
[root@worker-1 ~]# journalctl -u coreos-generate-iscsi-initiatorname.service
-- Logs begin at Tue 2020-11-24 08:58:09 UTC, end at Wed 2020-11-25 01:58:17 UTC. --
-- No entries --

Comment 10 Jonathan Lebon 2020-11-26 16:32:46 UTC
Fixed by https://github.com/openshift/installer/pull/4422.

Comment 11 Micah Abbott 2020-11-30 21:32:20 UTC
(In reply to Jonathan Lebon from comment #10)
> Fixed by https://github.com/openshift/installer/pull/4422.

This PR has merged, so moving to MODIFIED

Comment 12 kevin 2020-12-01 16:13:17 UTC
Which OCP 4.6 minor version will include the fix?

Comment 13 Micah Abbott 2020-12-01 21:44:46 UTC
(In reply to kevin from comment #12)
> Which OCP 4.6 minor version will include the fix?

I would expect it to be part of 4.6.7

Comment 14 kevin 2020-12-02 12:34:56 UTC
Thank you for your reply

Comment 17 Michael Nguyen 2020-12-04 14:35:01 UTC
Verified on 4.6.0-0.nightly-2020-12-04-033739. The initiatorname.iscsi file exists:

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-12-04-033739   True        False         26m     Cluster version is 4.6.0-0.nightly-2020-12-04-033739
$ oc get nodes
NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-149-143.us-west-2.compute.internal   Ready    worker   43m   v1.19.0+1348ff8
ip-10-0-157-110.us-west-2.compute.internal   Ready    master   51m   v1.19.0+1348ff8
ip-10-0-161-30.us-west-2.compute.internal    Ready    master   50m   v1.19.0+1348ff8
ip-10-0-178-7.us-west-2.compute.internal     Ready    worker   43m   v1.19.0+1348ff8
ip-10-0-216-159.us-west-2.compute.internal   Ready    master   50m   v1.19.0+1348ff8
ip-10-0-216-195.us-west-2.compute.internal   Ready    worker   41m   v1.19.0+1348ff8
$ oc debug node/ip-10-0-157-110.us-west-2.compute.internal -- chroot /host cat /etc/iscsi/initiatorname.iscsi
Starting pod/ip-10-0-157-110us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
InitiatorName=iqn.1994-05.com.redhat:e09b1ba73bf0

Removing debug pod ...
$ oc debug node/ip-10-0-149-143.us-west-2.compute.internal -- chroot /host cat /etc/iscsi/initiatorname.iscsi
Starting pod/ip-10-0-149-143us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
InitiatorName=iqn.1994-05.com.redhat:152d8a345d

Removing debug pod ...

Comment 19 errata-xmlrpc 2020-12-14 13:51:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.6.8 security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5259

Comment 20 umesh_sunnapu 2020-12-17 15:17:35 UTC
@Micah, I checked this on a cluster that I had just upgraded to 4.6.8. The file still does not exist:

[core@csah samples]$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.8     True        False         110m    Cluster version is 4.6.8


[core@csah samples]$ oc debug node/compute-1.example.com -- chroot /host cat /etc/iscsi/initiatorname.iscsi
Creating debug namespace/openshift-debug-node-d7xz9 ...
Starting pod/compute-1examplecom-debug ...
To use host binaries, run `chroot /host`

cat: /etc/iscsi/initiatorname.iscsi: No such file or directory

Removing debug pod ...
Removing debug namespace/openshift-debug-node-d7xz9 ...
error: non-zero exit code from debug container

Based on the last comment, I'm opening a new bug report.

Comment 21 kevin 2020-12-30 02:25:30 UTC
Hello, I tested upgrading from OCP 4.6.6 to 4.6.8, and the initiatorname.iscsi file still does not exist!


oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.8     True        False         9m10s   Cluster version is 4.6.8

ansible workers --private-key=${RHCOS_KEY} -u core -m shell -a 'cat /etc/iscsi/initiatorname.iscsi'

worker-3.ocp4-1.example.internal | FAILED | rc=1 >>
cat: /etc/iscsi/initiatorname.iscsi: No such file or directorynon-zero return code

worker-2.ocp4-1.example.internal | FAILED | rc=1 >>
cat: /etc/iscsi/initiatorname.iscsi: No such file or directorynon-zero return code

worker-0.ocp4-1.example.internal | FAILED | rc=1 >>
cat: /etc/iscsi/initiatorname.iscsi: No such file or directorynon-zero return code

worker-1.ocp4-1.example.internal | FAILED | rc=1 >>
cat: /etc/iscsi/initiatorname.iscsi: No such file or directorynon-zero return code

Comment 23 Michael Nguyen 2021-01-04 14:46:32 UTC
@welin See the comment in BZ 1908830 and check whether that works for you: https://bugzilla.redhat.com/show_bug.cgi?id=1908830#c3

Comment 24 Jonathan Lebon 2021-01-04 15:50:33 UTC
This is getting messy. There are two things here:
1. The first-boot case: the iSCSI name wasn't being generated on first boot. This is tracked by this RHBZ and is fixed by https://github.com/openshift/os/pull/453.
2. The upgrade case: upgraded nodes still didn't have a generated iSCSI name. This is tracked by https://bugzilla.redhat.com/show_bug.cgi?id=1908830, is fixed by https://github.com/openshift/os/pull/473, and has a temporary workaround in https://bugzilla.redhat.com/show_bug.cgi?id=1908830#c3.

I'm going to close this one again, since we already have an RHBZ for the upgrade case.

