Bug 1901021

Summary: RHCOS 4.6.1 missing ISCSI initiatorname.iscsi !
Product: OpenShift Container Platform
Reporter: kevin <welin>
Component: RHCOS
Assignee: Jonathan Lebon <jlebon>
Status: CLOSED ERRATA
QA Contact: Michael Nguyen <mnguyen>
Severity: urgent
Docs Contact:
Priority: high
Version: 4.6
CC: aghadge, agudi, bbreard, bgilbert, dornelas, imcleod, jligon, lucab, miabbott, nstielau, rkant, umesh_sunnapu
Target Milestone: ---
Keywords: Reopened
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1904243
Environment:
Last Closed: 2021-01-04 15:50:33 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1186913, 1899176, 1904243

Description kevin 2020-11-24 09:57:05 UTC
Description of problem:

According to the official OpenShift 4.6 documentation, the RHCOS worker node's default iSCSI initiator name is stored in /etc/iscsi/initiatorname.iscsi

https://docs.openshift.com/container-platform/4.6/storage/persistent_storage/persistent-storage-iscsi.html

But we cannot find this file in RHCOS 4.6.1!


[core@worker-1 ~]$ ls -a /etc/iscsi/
.  ..  iscsid.conf

[core@worker-1 ~]$ hostnamectl
   Static hostname: n/a
Transient hostname: worker-1.ocp4-1.example.internal
         Icon name: computer-vm
           Chassis: vm
        Machine ID: ada997b5da89423e8255633f232d8db4
           Boot ID: f4d48fddb7f0499c97dede5fd1ec47c1
    Virtualization: vmware
  Operating System: Red Hat Enterprise Linux CoreOS 46.82.202010091720-0 (Ootpa)
       CPE OS Name: cpe:/o:redhat:enterprise_linux:8::coreos
            Kernel: Linux 4.18.0-193.24.1.el8_2.dt1.x86_64
      Architecture: x86-64

Comment 1 kevin 2020-11-24 10:15:31 UTC
In RHCOS 4.5, the initiatorname.iscsi file existed by default. But in RHCOS 4.6.1, this file is missing!

Comment 2 Luca BRUNO 2020-11-24 11:04:29 UTC
Misc notes:
 * The file under /etc is not supposed to be part of the base OS in either RHCOS-4.5 or RHCOS-4.6. The InitiatorName is machine-specific, thus it shouldn't be baked into the OS image
 * The RHEL package has some buggy %post logic in the RPM that we have to bypass on CoreOS side, see https://bugzilla.redhat.com/show_bug.cgi?id=1734144 
 * We carry some service units specific to CoreOS in order to generate the initiatorname on first-boot, see https://github.com/openshift/os/blob/a06951cb0b9ec63d79087ca257a2135601e027dd/overlay.d/05rhcos/usr/lib/systemd/system/coreos-generate-iscsi-initiatorname.service
 * There has been some work around this area in the 4.5->4.6 timeframe which may have resulted in a regression, see https://gitlab.cee.redhat.com/coreos/redhat-coreos/-/merge_requests/1098

It would be good to attach the full journal from this 4.6 node here to pinpoint what's going on.

Comment 3 Luca BRUNO 2020-11-24 11:14:53 UTC
Ah, I think MR#1098 above renamed the service unit (s/regenerate/generate/), but the `43-manifest-rhcos.preset` content still tries to reference/enable the unit using the old name.

Jonathan, can you maybe have a look here?

Micah, did you verify https://bugzilla.redhat.com/show_bug.cgi?id=1868174 using a patched bootimage?

Comment 4 Scott Dodson 2020-11-24 13:55:32 UTC
*** Bug 1900926 has been marked as a duplicate of this bug. ***

Comment 5 Jonathan Lebon 2020-11-24 14:25:01 UTC
(In reply to Luca BRUNO from comment #3)
> Ah, I think MR#1098 above renamed the service unit (s/regenerate/generate/),
> but the `43-manifest-rhcos.preset` content still tries to reference/enable
> the unit using the old name.

Hmm indeed, you're right. I'm confused now how this worked during testing and verification.
Anyway, will fix this and add a test for it as well.
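
For illustration, the mismatch described above (a preset enabling a unit name that no longer exists) could be caught with a check along these lines. This is only a sketch, not the test that was actually added: the function name `check_preset_units` is hypothetical, and it assumes the preset uses plain `enable NAME` lines and that unit files live in a single directory passed by the caller.

```shell
#!/bin/sh
# Hypothetical helper: report units that a systemd preset file enables
# but that do not exist in the given unit directory.
# Usage: check_preset_units PRESET_FILE UNIT_DIR
check_preset_units() {
    preset_file=$1
    unit_dir=$2
    missing=0
    # Preset lines look like "enable NAME" or "disable NAME";
    # comments and blank lines fall through the "enable" test below.
    while read -r action unit _rest || [ -n "$action" ]; do
        [ "$action" = "enable" ] || continue
        # Skip glob patterns such as "enable foo-*.service".
        case $unit in
            *[*?[]*) continue ;;
        esac
        if [ ! -e "$unit_dir/$unit" ]; then
            echo "missing: $unit"
            missing=1
        fi
    done < "$preset_file"
    return "$missing"
}
```

On a node this might be run as `check_preset_units /usr/lib/systemd/system-preset/43-manifest-rhcos.preset /usr/lib/systemd/system`, assuming the preset is installed under the usual systemd preset directory. A non-zero exit status means at least one enabled unit name has no matching unit file.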

Comment 8 kevin 2020-11-24 16:30:39 UTC
I have gathered some journal entries related to iSCSI:

[root@worker-1 ~]# journalctl  |grep "can't open InitiatorAlias configuration file /etc/iscsi/initiatorname.iscsi" -B 5 -A 5
Nov 24 08:58:12 localhost multipathd[688]: /etc/multipath.conf. See man mpathconf(8) for more details
Nov 24 08:58:12 localhost iscsid[690]: iscsid: can't open InitiatorName configuration file /etc/iscsi/initiatorname.iscsi
Nov 24 08:58:12 localhost iscsid[690]: iscsid: Warning: InitiatorName file /etc/iscsi/initiatorname.iscsi does not exist or does not contain a properly formatted InitiatorName. If using software iscsi (iscsi_tcp or ib_iser) or partial offload (bnx2i or cxgbi iscsi), you may not be able to log into or discover targets. Please create a file /etc/iscsi/initiatorname.iscsi that contains a sting with the format: InitiatorName=iqn.yyyy-mm.<reversed domain name>[:identifier].
Nov 24 08:58:12 localhost iscsid[690]: Example: InitiatorName=iqn.2001-04.com.redhat:fc6.
Nov 24 08:58:12 localhost iscsid[690]: If using hardware iscsi like qla4xxx this message can be ignored.
Nov 24 08:58:12 localhost iscsid[690]: iscsid: can't open InitiatorAlias configuration file /etc/iscsi/initiatorname.iscsi
Nov 24 08:58:12 localhost systemd[1]: Started Open-iSCSI.
Nov 24 08:58:12 localhost systemd[1]: Started Create Volatile Files and Directories.
Nov 24 08:58:12 localhost systemd[1]: Reached target System Initialization.
Nov 24 08:58:12 localhost systemd[1]: Reached target Basic System.
Nov 24 08:58:12 localhost systemd[1]: Starting Ignition (fetch-offline)...
--
Nov 24 09:01:25 localhost multipathd[638]: /etc/multipath.conf. See man mpathconf(8) for more details
Nov 24 09:01:25 localhost iscsid[639]: iscsid: can't open InitiatorName configuration file /etc/iscsi/initiatorname.iscsi
Nov 24 09:01:25 localhost iscsid[639]: iscsid: Warning: InitiatorName file /etc/iscsi/initiatorname.iscsi does not exist or does not contain a properly formatted InitiatorName. If using software iscsi (iscsi_tcp or ib_iser) or partial offload (bnx2i or cxgbi iscsi), you may not be able to log into or discover targets. Please create a file /etc/iscsi/initiatorname.iscsi that contains a sting with the format: InitiatorName=iqn.yyyy-mm.<reversed domain name>[:identifier].
Nov 24 09:01:25 localhost iscsid[639]: Example: InitiatorName=iqn.2001-04.com.redhat:fc6.
Nov 24 09:01:25 localhost iscsid[639]: If using hardware iscsi like qla4xxx this message can be ignored.
Nov 24 09:01:25 localhost iscsid[639]: iscsid: can't open InitiatorAlias configuration file /etc/iscsi/initiatorname.iscsi
Nov 24 09:01:25 localhost systemd[1]: Started Open-iSCSI.
Nov 24 09:01:25 localhost systemd[1]: Started Create Volatile Files and Directories.
Nov 24 09:01:25 localhost systemd[1]: Reached target System Initialization.
Nov 24 09:01:25 localhost systemd[1]: Reached target Basic System.
Nov 24 09:01:25 localhost systemd[1]: Starting dracut initqueue hook...
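
The iscsid warning above spells out the expected file format (InitiatorName=iqn.yyyy-mm.<reversed domain name>[:identifier]). A rough way to check a file against that format is sketched below. The function name `has_valid_initiatorname` is hypothetical and the regular expression is a loose approximation of the IQN form quoted in the log, not a full validation of iSCSI naming rules.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if FILE is readable and contains a line of the form
#   InitiatorName=iqn.yyyy-mm.<reversed domain name>[:identifier]
# Usage: has_valid_initiatorname FILE
has_valid_initiatorname() {
    [ -r "$1" ] || return 1
    # Loose approximation of the IQN format from the iscsid warning.
    grep -Eq '^InitiatorName=iqn\.[0-9]{4}-[0-9]{2}\.[A-Za-z0-9.-]+(:.+)?$' "$1"
}
```

For example, `has_valid_initiatorname /etc/iscsi/initiatorname.iscsi || echo "initiator name missing or malformed"` would flag the situation reported here, where the file does not exist at all.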

Comment 9 kevin 2020-11-25 01:59:27 UTC
[root@worker-1 ~]# journalctl -u coreos-generate-iscsi-initiatorname.service
-- Logs begin at Tue 2020-11-24 08:58:09 UTC, end at Wed 2020-11-25 01:58:17 UTC. --
-- No entries --

Comment 10 Jonathan Lebon 2020-11-26 16:32:46 UTC
Fixed by https://github.com/openshift/installer/pull/4422.

Comment 11 Micah Abbott 2020-11-30 21:32:20 UTC
(In reply to Jonathan Lebon from comment #10)
> Fixed by https://github.com/openshift/installer/pull/4422.

This PR has merged, so moving to MODIFIED

Comment 12 kevin 2020-12-01 16:13:17 UTC
Which OCP 4.6 minor version will include the fix?

Comment 13 Micah Abbott 2020-12-01 21:44:46 UTC
(In reply to kevin from comment #12)
> Which OCP 4.6 minor version will include the fix?

I would expect it to be part of 4.6.7

Comment 14 kevin 2020-12-02 12:34:56 UTC
Thank you for your reply

Comment 17 Michael Nguyen 2020-12-04 14:35:01 UTC
Verified on 4.6.0-0.nightly-2020-12-04-033739.  initiatorname.iscsi file exists

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-12-04-033739   True        False         26m     Cluster version is 4.6.0-0.nightly-2020-12-04-033739
$ oc get nodes
NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-149-143.us-west-2.compute.internal   Ready    worker   43m   v1.19.0+1348ff8
ip-10-0-157-110.us-west-2.compute.internal   Ready    master   51m   v1.19.0+1348ff8
ip-10-0-161-30.us-west-2.compute.internal    Ready    master   50m   v1.19.0+1348ff8
ip-10-0-178-7.us-west-2.compute.internal     Ready    worker   43m   v1.19.0+1348ff8
ip-10-0-216-159.us-west-2.compute.internal   Ready    master   50m   v1.19.0+1348ff8
ip-10-0-216-195.us-west-2.compute.internal   Ready    worker   41m   v1.19.0+1348ff8
$ oc debug node/ip-10-0-157-110.us-west-2.compute.internal -- chroot /host cat /etc/iscsi/initiatorname.iscsi
Starting pod/ip-10-0-157-110us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
InitiatorName=iqn.1994-05.com.redhat:e09b1ba73bf0

Removing debug pod ...
$ oc debug node/ip-10-0-149-143.us-west-2.compute.internal -- chroot /host cat /etc/iscsi/initiatorname.iscsi
Starting pod/ip-10-0-149-143us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
InitiatorName=iqn.1994-05.com.redhat:152d8a345d

Removing debug pod ...
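
The per-node check above can be repeated across several nodes with a small loop. The following is a sketch only: the function name `check_initiatornames` and the `OC` override variable are hypothetical conveniences, and the `oc debug` invocation is the same one used in the verification above.

```shell
#!/bin/sh
# Hypothetical helper: print "<node> <InitiatorName line>" for each node,
# or "<node> MISSING" when the file cannot be read.
# Usage: check_initiatornames NODE...
# Set OC to use a different client binary (defaults to "oc").
check_initiatornames() {
    oc_cmd=${OC:-oc}
    for node in "$@"; do
        # Same command as in the manual verification; keep only the name line.
        name=$("$oc_cmd" debug "node/$node" -- chroot /host \
            cat /etc/iscsi/initiatorname.iscsi 2>/dev/null \
            | grep '^InitiatorName=') || true
        echo "$node ${name:-MISSING}"
    done
}
```

Called as `check_initiatornames ip-10-0-157-110.us-west-2.compute.internal ip-10-0-149-143.us-west-2.compute.internal`, it prints one line per node, which makes a node that lacks the file easy to spot.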

Comment 19 errata-xmlrpc 2020-12-14 13:51:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.6.8 security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5259

Comment 20 umesh_sunnapu 2020-12-17 15:17:35 UTC
@Micah, I checked this in a cluster which I just upgraded to 4.6.8. The file still does not exist.

[core@csah samples]$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.8     True        False         110m    Cluster version is 4.6.8


[core@csah samples]$ oc debug node/compute-1.example.com -- chroot /host cat /etc/iscsi/initiatorname.iscsi
Creating debug namespace/openshift-debug-node-d7xz9 ...
Starting pod/compute-1examplecom-debug ...
To use host binaries, run `chroot /host`

cat: /etc/iscsi/initiatorname.iscsi: No such file or directory

Removing debug pod ...
Removing debug namespace/openshift-debug-node-d7xz9 ...
error: non-zero exit code from debug container

Based on the last comment, opening a new bug report

Comment 21 kevin 2020-12-30 02:25:30 UTC
Hello, I tested upgrading from OCP 4.6.6 to 4.6.8, and the initiatorname.iscsi file still does not exist!


oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.8     True        False         9m10s   Cluster version is 4.6.8

ansible workers --private-key=${RHCOS_KEY} -u core -m shell -a 'cat /etc/iscsi/initiatorname.iscsi'

worker-3.ocp4-1.example.internal | FAILED | rc=1 >>
cat: /etc/iscsi/initiatorname.iscsi: No such file or directorynon-zero return code

worker-2.ocp4-1.example.internal | FAILED | rc=1 >>
cat: /etc/iscsi/initiatorname.iscsi: No such file or directorynon-zero return code

worker-0.ocp4-1.example.internal | FAILED | rc=1 >>
cat: /etc/iscsi/initiatorname.iscsi: No such file or directorynon-zero return code

worker-1.ocp4-1.example.internal | FAILED | rc=1 >>
cat: /etc/iscsi/initiatorname.iscsi: No such file or directorynon-zero return code

Comment 23 Michael Nguyen 2021-01-04 14:46:32 UTC
@welin See the following BZ comment and check whether it works for you: https://bugzilla.redhat.com/show_bug.cgi?id=1908830#c3

Comment 24 Jonathan Lebon 2021-01-04 15:50:33 UTC
This is getting messy. There are two things here:
1. The first-boot case: the iSCSI name wasn't being generated on first-boot. This is tracked by this RHBZ and is fixed by https://github.com/openshift/os/pull/453.
2. The upgrading case: upgrading nodes still didn't have a generated iSCSI name. This is tracked by https://bugzilla.redhat.com/show_bug.cgi?id=1908830 and is fixed by https://github.com/openshift/os/pull/473, and has a temporary workaround in https://bugzilla.redhat.com/show_bug.cgi?id=1908830#c3.

I'm going to close this one again since we already have an RHBZ for the upgrading case.
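
Both cases come down to the same invariant: the initiator name file should be generated once per machine if it is absent, and left alone otherwise. A minimal sketch of that logic follows. It is not the content of either pull request, nor the workaround from the linked BZ comment. The function name `ensure_initiatorname` is hypothetical, and it assumes the `iscsi-iname` utility from the iSCSI initiator tooling is available on the node.

```shell
#!/bin/sh
# Hypothetical helper: write an initiator name file only if none exists yet.
# Usage: ensure_initiatorname [FILE] [GENERATOR]
# GENERATOR is a command that prints a fresh IQN (assumed default: iscsi-iname).
ensure_initiatorname() {
    file=${1:-/etc/iscsi/initiatorname.iscsi}
    generator=${2:-iscsi-iname}
    # Leave an existing, non-empty file untouched so the node keeps its identity.
    if [ -s "$file" ]; then
        return 0
    fi
    name=$("$generator") || return 1
    [ -n "$name" ] || return 1
    printf 'InitiatorName=%s\n' "$name" > "$file"
}
```

Because iscsid reads this file when it starts, the daemon would likely need a restart after the file is created for the new name to take effect.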

Comment 25 Red Hat Bugzilla 2023-09-15 00:51:47 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days