Bug 1908830 - RHCOS 4.6 - Missing Initiatorname
Summary: RHCOS 4.6 - Missing Initiatorname
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: RHCOS
Version: 4.6
Hardware: x86_64
OS: Linux
Priority: medium
Severity: urgent
Target Milestone: ---
Target Release: 4.7.0
Assignee: Jonathan Lebon
QA Contact: Michael Nguyen
URL:
Whiteboard:
Depends On:
Blocks: 1908847
Reported: 2020-12-17 17:21 UTC by umesh_sunnapu
Modified: 2024-03-25 17:37 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The service unit which regenerates the iSCSI initiator name only worked on first boot. Consequence: Upgrading nodes would not receive uniquely generated initiator names. Fix: The service unit now runs on every boot. Result: Upgrading nodes now receive generated initiator names if one doesn't already exist.
Clone Of:
: 1908847 (view as bug list)
Environment:
Last Closed: 2021-02-24 15:46:26 UTC
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift os pull 473 | 0 | None | closed | coreos-generate-iscsi-initiatorname.service: drop ConditionFirstBoot=true | 2021-02-16 17:08:14 UTC
Red Hat Product Errata | RHSA-2020:5633 | 0 | None | None | None | 2021-02-24 15:46:50 UTC

Description umesh_sunnapu 2020-12-17 17:21:20 UTC
Description of problem:
RHCOS compute nodes do not have /etc/iscsi/initiatorname.iscsi

[core@compute-1 ~]$ cat /etc/iscsi/initiatorname.iscsi
cat: /etc/iscsi/initiatorname.iscsi: No such file or directory

Version-Release number of selected component (if applicable):
[core@csah ~]$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.8     True        False         3h46m   Cluster version is 4.6.8


How reproducible:

- Any basic install of RHCOS will show the file not being created.

Actual results:

[core@compute-1 ~]$ systemctl status coreos-generate-iscsi-initiatorname
● coreos-generate-iscsi-initiatorname.service - CoreOS Generate iSCSI Initiator Name
   Loaded: loaded (/usr/lib/systemd/system/coreos-generate-iscsi-initiatorname.service; enabled; vendor preset: enabled)
   Active: inactive (dead)
Condition: start condition failed at Thu 2020-12-17 17:05:21 UTC; 4s ago
           └─ ConditionFirstBoot=true was not met
     Docs: https://bugzilla.redhat.com/show_bug.cgi?id=1493296
           https://bugzilla.redhat.com/show_bug.cgi?id=1687722

[core@compute-1 ~]$ cat /etc/iscsi/initiatorname.iscsi
cat: /etc/iscsi/initiatorname.iscsi: No such file or directory


Expected results:
- The coreos-generate-iscsi-initiatorname service should run, and the file /etc/iscsi/initiatorname.iscsi should be created with a unique iSCSI initiator name.


Additional info:
The same bug was reported in https://bugzilla.redhat.com/show_bug.cgi?id=1901021. That report says 4.6.7 should fix this, but I am currently on 4.6.8 and still do not see the file, so I am reopening the bug as suggested in the comment there.

https://bugzilla.redhat.com/show_bug.cgi?id=1904243 was also created for the same bug, but its version is set to 4.7. I am not sure whether we need to wait until 4.7 to get this fixed. If that is the case, we need an alternative fix for 4.6.x, as this is a long-term support version.

Comment 1 Micah Abbott 2020-12-17 17:49:37 UTC
The service is configured to run only at first boot, so that's why the file is not being generated on upgrades.

We can modify our service to drop the `ConditionFirstBoot` and just let the service test for the `/etc/iscsi/initiatorname.iscsi` file, so any installs that are missing the file will get a new initiatorname generated.

This matches the upstream change that should be delivered as part of RHEL 8.4

https://bugzilla.redhat.com/show_bug.cgi?id=1734144
https://github.com/open-iscsi/open-iscsi/blob/2.1.2/etc/systemd/iscsi-init.service
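
The "generate only if the file is missing" behavior proposed above can be sketched in shell as follows. This is an illustrative sketch, not the shipped unit: the function name `ensure_initiatorname` and its two parameters are hypothetical, and the generator command is passed in as an argument so the logic can be tried without `/usr/sbin/iscsi-iname`.

```shell
# Hypothetical helper: write an initiator name only if the file is missing.
# $1 = path to the initiator name file (on RHCOS: /etc/iscsi/initiatorname.iscsi)
# $2 = command that prints a fresh IQN (on RHCOS: /usr/sbin/iscsi-iname)
ensure_initiatorname() {
    file=$1
    generator=$2
    # Plays the role of a "path does not exist" start condition:
    # an existing file is left untouched, so the name stays stable.
    if [ -e "$file" ]; then
        return 0
    fi
    echo "InitiatorName=$($generator)" > "$file"
}
```

Called on every boot, a helper like this creates the file once on nodes that lack it and is a no-op everywhere else.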

Comment 2 umesh_sunnapu 2020-12-17 18:05:15 UTC
@micah, do you then recommend that customers already running 4.6.6 or earlier create a machine config that updates /usr/lib/systemd/system/coreos-generate-iscsi-initiatorname.service by removing ConditionFirstBoot=true or setting it to false?

Suppose we go this route and a cluster has a mix of RHCOS and RHEL compute nodes with the same role "worker". Is the machine config also applied to the RHEL-based nodes, or is it somehow ignored for them?

I ask because I have a mix of compute nodes running RHEL and RHCOS. When I tried to update, one of the machine configs created as part of the update looked for a path that exists in RHCOS but not in RHEL, and the RHEL compute nodes were eventually marked SchedulingDisabled (see https://bugzilla.redhat.com/show_bug.cgi?id=1908842).

For my understanding, I would like to know whether there are conditions that differentiate the OS running on compute nodes with the same role when machine configs are applied.

[core@csah ~]$ oc get nodes -o wide
NAME                    STATUS                     ROLES    AGE     VERSION           INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION                 CONTAINER-RUNTIME
compute-1.example.com   Ready                      worker   6d23h   v1.19.0+7070803   <ip address>   <none>        Red Hat Enterprise Linux CoreOS 46.82.202012051820-0 (Ootpa)   4.18.0-193.29.1.el8_2.x86_64   cri-o://1.19.0-26.rhaos4.6.git8a05a29.el8
compute-2.example.com   Ready                      worker   6d23h   v1.19.0+7070803   <ip address>   <none>        Red Hat Enterprise Linux CoreOS 46.82.202012051820-0 (Ootpa)   4.18.0-193.29.1.el8_2.x86_64   cri-o://1.19.0-26.rhaos4.6.git8a05a29.el8
compute-3.example.com   Ready,SchedulingDisabled   worker   16h     v1.19.0+7070803   <ip address>   <none>        OpenShift Enterprise                                           3.10.0-1160.11.1.el7.x86_64    cri-o://1.19.0-118.rhaos4.6.gitf51f94a.el7
etcd-0.example.com      Ready                      master   7d5h    v1.19.0+7070803   <ip address>   <none>        Red Hat Enterprise Linux CoreOS 46.82.202012051820-0 (Ootpa)   4.18.0-193.29.1.el8_2.x86_64   cri-o://1.19.0-26.rhaos4.6.git8a05a29.el8
etcd-1.example.com      Ready                      master   7d5h    v1.19.0+7070803   <ip address>   <none>        Red Hat Enterprise Linux CoreOS 46.82.202012051820-0 (Ootpa)   4.18.0-193.29.1.el8_2.x86_64   cri-o://1.19.0-26.rhaos4.6.git8a05a29.el8
etcd-2.example.com      Ready                      master   7d4h    v1.19.0+7070803   <ip address>   <none>        Red Hat Enterprise Linux CoreOS 46.82.202012051820-0 (Ootpa)   4.18.0-193.29.1.el8_2.x86_64   cri-o://1.19.0-26.rhaos4.6.git8a05a29.el8

FYI: I removed the internal IP address info.

I will create the cluster with 4.6.8 as well, but I would like to know the path for customers who are already running 4.6.6 or earlier.

Comment 3 Micah Abbott 2020-12-17 18:58:09 UTC
You should be able to work around this by providing a MachineConfig that writes a systemd drop-in to reset the `ConditionFirstBoot` in the service:

```
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 98-iscsi-initiator-dropin
spec:
  config:
    ignition:
      version: 3.1.0
    systemd:
      units:
        - name: coreos-generate-iscsi-initiatorname.service
          dropins: 
            - name: 10-disable-firstboot.conf
              contents: |
                [Unit]
                ConditionFirstBoot=
```

Note, you should remove this MachineConfig from the cluster once the `coreos-generate-iscsi-initiatorname.service` has been updated in a newer RHCOS/OCP version.

Comment 4 Micah Abbott 2020-12-17 19:06:00 UTC
(In reply to umesh_sunnapu from comment #2)
> @micah, do you then recommend customers who are already running 4.6.6 or
> lesser, to create a machine config and update the
> /usr/lib/systemd/system/coreos-generate-iscsi-initiatorname.service by
> removing ConditionFirstBoot=true or setting it to false ?
> 
> Suppose we go this route, and clusters have a mix of RHCOS and RHEL compute
> nodes with same ROLE "worker", does the machine config also apply it to RHEL
> based nodes or somehow ignore that for RHEL compute nodes.
> 
> Reason for my ask is, I have a mix of compute nodes running RHEL and RHCOS.
> Upon trying to update, one of the machine config created as part of the
> update is looking for a path that exists in RHCOS but not in RHEL and
> eventually marked the RHEL compute nodes as SchedulingDisabled. (Refer to
> https://bugzilla.redhat.com/show_bug.cgi?id=1908842). 

I've not tried it, but it may be possible to add an additional label in the metadata to match rhcos:

```
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
    node.openshift.io/os_id: rhcos
  name: 98-iscsi-initiator-dropin
spec:
  config:
    ignition:
      version: 3.1.0
    systemd:
      units:
        - name: coreos-generate-iscsi-initiatorname.service
          dropins: 
            - name: 10-disable-firstboot.conf
              contents: |
                [Unit]
                ConditionFirstBoot=
```

TBH, I am not certain of the behavior of the MCO when writing dropins for a service that does not exist.

Comment 5 umesh_sunnapu 2020-12-18 04:32:44 UTC
@Micah, I went ahead and tried creating a machine config with the contents below

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
    node.openshift.io/os_id: rhcos
  name: 98-iscsi-initiator-dropin
spec:
  config:
    ignition:
      version: 3.1.0
    systemd:
      units:
        - name: coreos-generate-iscsi-initiatorname.service
          dropins: 
            - name: 10-disable-firstboot.conf
              contents: |
                [Unit]
                ConditionFirstBoot=false

Nodes (only RHCOS) rebooted. But upon checking, the service was still down.

[root@compute-2 ~]# systemctl status coreos-generate-iscsi-initiatorname.service
● coreos-generate-iscsi-initiatorname.service - CoreOS Generate iSCSI Initiator Name
   Loaded: loaded (/usr/lib/systemd/system/coreos-generate-iscsi-initiatorname.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/coreos-generate-iscsi-initiatorname.service.d
           └─10-disable-firstboot.conf
   Active: inactive (dead)
Condition: start condition failed at Fri 2020-12-18 04:19:36 UTC; 4min 38s ago
           └─ ConditionFirstBoot=true was not met
     Docs: https://bugzilla.redhat.com/show_bug.cgi?id=1493296
           https://bugzilla.redhat.com/show_bug.cgi?id=1687722

I am wondering whether I am missing something that caused this behavior.

Comment 6 umesh_sunnapu 2020-12-18 05:24:00 UTC
@Micah, I went and tried the process below, and things seem to be working fine

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
    node.openshift.io/os_id: rhcos
  name: 98-iscsi-initiator-update
spec:
  config:
    ignition:
      version: 3.1.0
    systemd:
      units:
        - name: coreos-generate-iscsi-initiatorname.service
          contents: |
            [Unit]
            Description=CoreOS Generate iSCSI Initiator Name
            Documentation=https://bugzilla.redhat.com/show_bug.cgi?id=1493296
            Documentation=https://bugzilla.redhat.com/show_bug.cgi?id=1687722
            ConditionFirstBoot=false
            ConditionPathExists=!/etc/iscsi/initiatorname.iscsi
            Before=iscsid.service
            [Service]
            Type=oneshot
            ExecStart=/usr/bin/sh -c 'echo "InitiatorName=`/usr/sbin/iscsi-iname`" > /etc/iscsi/initiatorname.iscsi'
            RemainAfterExit=yes
            [Install]
            WantedBy=multi-user.target

Once the machine config was created, I see that file /etc/iscsi/initiatorname.iscsi is created.

[core@csah ~]$ ssh compute-1 cat /etc/iscsi/initiatorname.iscsi
InitiatorName=iqn.1994-05.com.redhat:xxxxxxxxx

I want to make sure this is a supported method. Please confirm.

Comment 7 Micah Abbott 2020-12-18 19:45:00 UTC
Fix merged and available in RHCOS 47.83.202012180742-0

(In reply to umesh_sunnapu from comment #5)
> @Micah, I went ahead and tried creating machine config with contents below
> 
> apiVersion: machineconfiguration.openshift.io/v1
> kind: MachineConfig
> metadata:
>   labels:
>     machineconfiguration.openshift.io/role: worker
>     node.openshift.io/os_id: rhcos
>   name: 98-iscsi-initiator-dropin
> spec:
>   config:
>     ignition:
>       version: 3.1.0
>     systemd:
>       units:
>         - name: coreos-generate-iscsi-initiatorname.service
>           dropins: 
>             - name: 10-disable-firstboot.conf
>               contents: |
>                 [Unit]
>                 ConditionFirstBoot=false

When a drop-in changes a parameter that the systemd unit already sets, the parameter must first be reset with an empty assignment.

https://www.freedesktop.org/software/systemd/man/systemd.unit.html

```
Note that for drop-in files, if one wants to remove entries from a setting that is parsed as a list (and is not a dependency), such as AssertPathExists= (or e.g. ExecStart= in service units), one needs to first clear the list before re-adding all entries except the one that is to be removed. Dependencies (After=, etc.) cannot be reset to an empty list, so dependencies can only be added in drop-ins. If you want to remove dependencies, you have to override the entire unit.
```

The correct contents would look like:

```
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
    node.openshift.io/os_id: rhcos
  name: 98-iscsi-initiator-dropin
spec:
  config:
    ignition:
      version: 3.1.0
    systemd:
      units:
        - name: coreos-generate-iscsi-initiatorname.service
          dropins: 
            - name: 10-disable-firstboot.conf
              contents: |
                [Unit]
                ConditionFirstBoot=
                ConditionFirstBoot=false
```
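
The reset-then-set rule can be checked mechanically before a drop-in is put into a MachineConfig. The following is a minimal sketch under stated assumptions: `dropin_resets_firstboot` is a hypothetical helper name, and it is a plain textual check on stdin that does not model systemd's full parsing (sections, comments, line continuations).

```shell
# Hypothetical lint: succeed only if the drop-in text on stdin contains an
# empty "ConditionFirstBoot=" assignment, and no non-empty
# "ConditionFirstBoot=<value>" line appears before that reset.
dropin_resets_firstboot() {
    awk '
        /^[ \t]*ConditionFirstBoot=[ \t]*$/ { cleared = 1; next }
        /^[ \t]*ConditionFirstBoot=/        { if (!cleared) bad = 1 }
        END { exit (cleared && !bad) ? 0 : 1 }
    '
}
```

For example, `printf '[Unit]\nConditionFirstBoot=false\n' | dropin_resets_firstboot` fails, which corresponds to the drop-in from comment #5 that left `ConditionFirstBoot=true` in effect.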

Comment 8 Micah Abbott 2020-12-18 19:46:16 UTC
(In reply to umesh_sunnapu from comment #6)
> @Micah, I went and tried the below process and things seems to be working
> fine
> 
> apiVersion: machineconfiguration.openshift.io/v1
> kind: MachineConfig
> metadata:
>   labels:
>     machineconfiguration.openshift.io/role: worker
>     node.openshift.io/os_id: rhcos
>   name: 98-iscsi-initiator-update
> spec:
>   config:
>     ignition:
>       version: 3.1.0
>     systemd:
>       units:
>         - name: coreos-generate-iscsi-initiatorname.service
>           contents: |
>             [Unit]
>             Description=CoreOS Generate iSCSI Initiator Name
>             Documentation=https://bugzilla.redhat.com/show_bug.cgi?id=1493296
>             Documentation=https://bugzilla.redhat.com/show_bug.cgi?id=1687722
>             ConditionFirstBoot=false
>             ConditionPathExists=!/etc/iscsi/initiatorname.iscsi
>             Before=iscsid.service
>             [Service]
>             Type=oneshot
>             ExecStart=/usr/bin/sh -c 'echo "InitiatorName=`/usr/sbin/iscsi-iname`" > /etc/iscsi/initiatorname.iscsi'
>             RemainAfterExit=yes
>             [Install]
>             WantedBy=multi-user.target
> 
> Once the machine config was created, I see that file
> /etc/iscsi/initiatorname.iscsi is created.
> 
> [core@csah ~]$ ssh compute-1 cat /etc/iscsi/initiatorname.iscsi
> InitiatorName=iqn.1994-05.com.redhat:xxxxxxxxx
> 
> Want to make sure this is a supported method. Please confirm the same.

This is technically supported, but the preferred method would be to use a dropin as described in the previous comments.

Comment 9 umesh_sunnapu 2020-12-18 20:02:07 UTC
Thanks @Micah. I will go ahead and test the drop-in again. In the future, when a new RHCOS no longer needs this machine config, is it enough to delete the machine config so that the compute nodes update automatically, or do we have to create another machine config that removes the changes?

What is the correct process?

Comment 11 Micah Abbott 2020-12-18 20:49:26 UTC
(In reply to umesh_sunnapu from comment #9)
> Thanks @Micah. I will go ahead and test the drop in again. So in future,
> when the new RHCOS does not need this machine config, is it just good enough
> to delete the machine config and wondering if the compute nodes
> automatically update or do we have to create another machine config which
> will remove the changes and then the changes are reflected. 
> 
> What is the correct process.

I would first upgrade the cluster/nodes to the latest version of RHCOS that has the fixed service, then you can just delete the Machine Config with the drop-in.  FWIW, this will incur two reboots: one for the upgrade, one for the removed Machine Config.
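
The cleanup step could be wrapped as below. This is a sketch, not a tested procedure: `remove_iscsi_workaround` is a hypothetical function name, the `worker` pool default is an assumption based on the role label used in the MachineConfigs in this report, and it should only be run after the upgrade to a fixed RHCOS has completed.

```shell
# Hypothetical wrapper for the cleanup described above.
# $1 = MachineConfig name (98-iscsi-initiator-dropin in this report)
# $2 = MachineConfigPool to inspect afterwards (assumed default: worker)
remove_iscsi_workaround() {
    mc_name=$1
    pool=${2:-worker}
    # Deleting the MachineConfig triggers the second reboot mentioned above.
    oc delete machineconfig "$mc_name" || return 1
    # Print the pool status so the rollout can be followed.
    oc get machineconfigpool "$pool"
}
```

Usage would be `remove_iscsi_workaround 98-iscsi-initiator-dropin`.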

Comment 13 Michael Nguyen 2021-01-04 15:42:46 UTC
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2020-12-20-055006   True        False         65s     Cluster version is 4.7.0-0.nightly-2020-12-20-055006
$ oc get node -o name | xargs -I {} oc debug {} -- chroot /host cat /etc/iscsi/initiatorname.iscsi
Starting pod/ip-10-0-140-198us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
InitiatorName=iqn.1994-05.com.redhat:99a08e94e237

Removing debug pod ...
Starting pod/ip-10-0-157-134us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
InitiatorName=iqn.1994-05.com.redhat:281e1772a2a4

Removing debug pod ...
Starting pod/ip-10-0-180-7us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
InitiatorName=iqn.1994-05.com.redhat:3e4ebe55cd1

Removing debug pod ...
Starting pod/ip-10-0-189-27us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
InitiatorName=iqn.1994-05.com.redhat:b6311e6c030

Removing debug pod ...
Starting pod/ip-10-0-197-122us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
InitiatorName=iqn.1994-05.com.redhat:e991ec894df0

Removing debug pod ...
Starting pod/ip-10-0-218-10us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
InitiatorName=iqn.1994-05.com.redhat:361fc025336c
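
Because the point of the fix is that each node gets its own initiator name, output like the above can also be checked for duplicates rather than read by eye. A minimal sketch, assuming a hypothetical helper name `initiatornames_are_unique` that reads the command output on stdin:

```shell
# Hypothetical check: keep only the InitiatorName= lines from stdin and
# print any line that occurs more than once.
# Returns 0 when every name is unique, 1 when duplicates were found.
initiatornames_are_unique() {
    dups=$(grep '^InitiatorName=' | sort | uniq -d)
    if [ -n "$dups" ]; then
        printf '%s\n' "$dups"
        return 1
    fi
    return 0
}
```

It can be appended to the verification command, for example `oc get node -o name | xargs -I {} oc debug {} -- chroot /host cat /etc/iscsi/initiatorname.iscsi | initiatornames_are_unique`.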

Comment 18 errata-xmlrpc 2021-02-24 15:46:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

