Bug 1868174

Summary: MCO is being used to deploy worker node but fails to load file
Product: OpenShift Container Platform Reporter: Brandon Anderson <braander>
Component: RHCOSAssignee: Jonathan Lebon <jlebon>
Status: CLOSED ERRATA QA Contact: Michael Nguyen <mnguyen>
Severity: medium Docs Contact:
Priority: medium    
Version: 4.3.zCC: bbreard, dornelas, imcleod, jligon, kgarriso, miabbott, nstielau
Target Milestone: ---   
Target Release: 4.6.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Previously, specifying the iSCSI initiator name from an Ignition config would be ignored and replaced by a dynamically generated one. Now, we only dynamically generate a name if one hasn't already been provided via Ignition.
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-10-27 16:27:34 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1186913    

Description Brandon Anderson 2020-08-11 22:01:25 UTC
Description of problem: MCO is configured to load /etc/iscsi/initiatorname.iscsi in new worker but fails to degraded state.

 The API respond with the correct file


[aguadarr@dlosbastion03 ~]$ curl -ks https://api-int.kubelab2.cscglobal.com:22623/config/worker | jq -r '.storage.files[] | select(.path=="/etc/iscsi/initiatorname.iscsi")'
{
"filesystem": "root",
"path": "/etc/iscsi/initiatorname.iscsi",
"contents": {
 "source": "data:text/plain;charset=utf-8;base64,SW5pdGlhdG9yTmFtZT1pcW4uMTk5NC0wNS5jb20ubnV0YW5peDprOHMtd29ya2VyCg==",
 "verification": {}
},
"mode": 420
}

The content of the file /etc/iscsi/initiatorname.iscsi offer by the API config/worker is correct


[aguadarr@dlosbastion03 ~]$ echo SW5pdGlhdG9yTmFtZT1pcW4uMTk5NC0wNS5jb20ubnV0YW5peDprOHMtd29ya2VyCg== | base64 -d
InitiatorName=iqn.1994-05.com.nutanix:k8s-worker

The log shows error reading the file /etc/iscsi/initiatorname.iscsi, even the logs shows that the file was written and selinux relabel the file, but the content is not the provided by the API


sh-4.4# journalctl  | grep 'etc/iscsi'
Aug 05 19:29:25 localhost iscsid[718]: iscsid: can't open InitiatorName configuration file /etc/iscsi/initiatorname.iscsi
Aug 05 19:29:25 localhost iscsid[718]: iscsid: Warning: InitiatorName file /etc/iscsi/initiatorname.iscsi does not exist or does not contain a properly formatted InitiatorName. If using software iscsi (iscsi_tcp or ib_iser) or partial offload (bnx2i or cxgbi iscsi), you may not be able to log into or discover targets. Please create a file /etc/iscsi/initiatorname.iscsi that contains a sting with the format: InitiatorName=iqn.yyyy-mm.<reversed domain name>[:identifier].
Aug 05 19:29:25 localhost iscsid[718]: iscsid: can't open InitiatorAlias configuration file /etc/iscsi/initiatorname.iscsi
Aug 05 19:31:19 ocp-compute-06.kubelab2.cscglobal.com ignition[1048]: INFO     : files: createFilesystemsFiles: createFiles: op(1a): [started]  writing file "/etc/iscsi/initiatorname.iscsi"
Aug 05 19:31:19 ocp-compute-06.kubelab2.cscglobal.com ignition[1048]: INFO     : files: createFilesystemsFiles: createFiles: op(1a): [finished] writing file "/etc/iscsi/initiatorname.iscsi"
Aug 05 19:31:20 ocp-compute-06.kubelab2.cscglobal.com ignition[1048]: INFO     : files: createFilesystemsFiles: createFiles: op(1a): [started]  writing file "/etc/iscsi/initiatorname.iscsi"
Aug 05 19:31:20 ocp-compute-06.kubelab2.cscglobal.com ignition[1048]: INFO     : files: createFilesystemsFiles: createFiles: op(1a): [finished] writing file "/etc/iscsi/initiatorname.iscsi"
Aug 05 19:31:21 ocp-compute-06.kubelab2.cscglobal.com restorecon[1389]: Relabeled /etc/iscsi/initiatorname.iscsi from system_u:object_r:unlabeled_t:s0 to system_u:object_r:etc_t:s0
Aug 05 19:32:07 localhost iscsid[766]: iscsid: can't open InitiatorName configuration file /etc/iscsi/initiatorname.iscsi
Aug 05 19:32:07 localhost iscsid[766]: iscsid: Warning: InitiatorName file /etc/iscsi/initiatorname.iscsi does not exist or does not contain a properly formatted InitiatorName. If using software iscsi (iscsi_tcp or ib_iser) or partial offload (bnx2i or cxgbi iscsi), you may not be able to log into or discover targets. Please create a file /etc/iscsi/initiatorname.iscsi that contains a sting with the format: InitiatorName=iqn.yyyy-mm.<reversed domain name>[:identifier].
Aug 05 19:32:07 localhost iscsid[766]: iscsid: can't open InitiatorAlias configuration file /etc/iscsi/initiatorname.iscsi

Due this behavior the MCD is in the error loop


I0805 19:34:04.175620    3495 daemon.go:955] Validating against pending config rendered-worker-9eb0054542fb4cc0439e927d588d7344
E0805 19:34:04.176304    3495 daemon.go:1350] content mismatch for file /etc/iscsi/initiatorname.iscsi: InitiatorName=iqn.1994-05.com.
.
A: redhat:c6d9a5616ee1
.
.
B: nutanix:k8s-worker
.
.
E0805 19:34:04.176397    3495 writer.go:135] Marking Degraded due to: unexpected on-disk state validating against rendered-worker-9eb0054542fb4cc0439e927d588d7344
I0805 19:34:06.175031    3495 daemon.go:724] Current+desired config: rendered-worker-9eb0054542fb4cc0439e927d588d7344


Actual results: Node fails to deploy and goes into Degraded 


Expected results: /etc/iscsi/initiatorname.iscsi to be successfully deployed and node to be Ready as this occurs with existing nodes

Comment 1 Kirsten Garrison 2020-08-12 17:02:56 UTC
@Brandon please provide a must gather from this cluster.

Comment 2 Brandon Anderson 2020-08-12 18:52:47 UTC
Hi Kirsten, got a fresh mustgather from the cluster here:

https://access.redhat.com/hydra/rest/cases/02702919/attachments/2ab94029-06a7-43b1-9f08-c479ae8b7d8a?usePresignedUrl=true

Comment 3 Kirsten Garrison 2020-08-12 18:54:50 UTC
Hi Brandon, 

I don't have hydra access can you please upload to GDrive?

Comment 4 Kirsten Garrison 2020-08-12 18:57:52 UTC
Actually I am able to download from the customer portal. No need to reupload.

Comment 5 Kirsten Garrison 2020-08-12 20:08:08 UTC
After discussing with Micah, moving this over to RHCOS for further triage.

Comment 6 Micah Abbott 2020-08-12 21:20:36 UTC
I'm able to reproduce this behavior on a 4.6 cluster; it looks like if I restart the `iscsid.service` after the MC is applied and the node comes back, it happily starts.

I'm wondering if there is something amiss happening in the initrd re: iscsid, since I see the service starting before the pivot into the real root of RHCOS.

```
Aug 12 20:53:43 localhost systemd[1]: Reached target Basic System.  
Aug 12 20:53:43 localhost iscsid[627]: iscsid: can't open InitiatorName configuration file /etc/iscsi/initiatorname.iscsi
Aug 12 20:53:43 localhost iscsid[627]: iscsid: Warning: InitiatorName file /etc/iscsi/initiatorname.iscsi does not exist or does not contain a properly formatted InitiatorName. If using software iscsi (iscsi_tcp >
Aug 12 20:53:43 localhost iscsid[627]: Example: InitiatorName=iqn.2001-04.com.redhat:fc6.
Aug 12 20:53:43 localhost iscsid[627]: If using hardware iscsi like qla4xxx this message can be ignored.
Aug 12 20:53:43 localhost iscsid[627]: iscsid: can't open InitiatorAlias configuration file /etc/iscsi/initiatorname.iscsi
...
Aug 12 20:53:43 localhost systemd[1]: Starting OSTree Prepare OS/...
Aug 12 20:53:43 localhost ostree-prepare-root[729]: Resolved OSTree target to: /sysroot/ostree/deploy/rhcos/deploy/aecccdc13ae1926a4907b0e53b959654adcd5dfd6e37728955c33163963c1fba.0
Aug 12 20:53:43 localhost systemd[1]: Started OSTree Prepare OS/.
Aug 12 20:53:43 localhost systemd[1]: Reached target Initrd Root File System.
Aug 12 20:53:43 localhost systemd[1]: Starting Reload Configuration from the Real Root...
Aug 12 20:53:43 localhost systemd[1]: Reloading.
Aug 12 20:53:44 localhost systemd[1]: Started Reload Configuration from the Real Root.
...
Aug 12 20:53:44 localhost systemd[1]: Starting Switch Root...
Aug 12 20:53:44 localhost systemd[1]: Switching root.
...
Aug 12 20:53:48 cosa-devsh systemd[1]: iscsi.service: Unit cannot be reloaded because it is inactive.
```

Additional strangeness I observed:  after applying the MC, attempts to view the `/etc/iscsi/initiatorname.iscsi` file with `vi` shows an empty file.  But a `cat` of the file shows the contents.

<time passes>

I experimented with `rpm-ostree initramfs` to see if I could include the `/etc/iscsi/initiatorname.iscsi` in the initrd and had some successs:

```
$ sudo rpm-ostree initramfs --arg=-I --arg=/etc/iscsi/initiatorname.iscsi --enable      
Checking out tree 935aac4... done                                                                         
Generating initramfs... done                                                                              
Writing OSTree commit... done                                                                             
Staging deployment... done                                                                                
Initramfs regeneration is now: enabled                                                                    
[core@cosa-devsh ~]$ rpm-ostree status                                                                    
State: idle                                                                                               
Deployments:                                                                                              
  ostree://935aac448cdde5455ab18e1b6141504504e2fa9f00120a6e262dff1c7e0ffba7
                   Version: 46.82.202008031957-0 (2020-08-03T20:02:51Z)
                 Initramfs: -I /etc/iscsi/initiatorname.iscsi 
...        
```

When rebooting into the new deployment, I no longer see any errors about the inability to open the `/etc/iscsi/initiatorname.iscsi` file.

```

$ rpm-ostree status
State: idle
Deployments:
* ostree://935aac448cdde5455ab18e1b6141504504e2fa9f00120a6e262dff1c7e0ffba7
                   Version: 46.82.202008031957-0 (2020-08-03T20:02:51Z)
                 Initramfs: -I /etc/iscsi/initiatorname.iscsi 

  ostree://935aac448cdde5455ab18e1b6141504504e2fa9f00120a6e262dff1c7e0ffba7
                   Version: 46.82.202008031957-0 (2020-08-03T20:02:51Z)

$ systemctl status iscsid
● iscsid.service - Open-iSCSI
   Loaded: loaded (/usr/lib/systemd/system/iscsid.service; disabled; vendor preset: disabled)
   Active: inactive (dead) since Wed 2020-08-12 21:15:44 UTC; 1min 9s ago
     Docs: man:iscsid(8)
           man:iscsiuio(8)
           man:iscsiadm(8)
 Main PID: 624 (code=exited, status=0/SUCCESS)
   Status: "Ready to process requests"

Aug 12 21:15:42 localhost systemd[1]: Starting Open-iSCSI...
Aug 12 21:15:42 localhost systemd[1]: Started Open-iSCSI.
Aug 12 21:15:44 localhost systemd[1]: Stopping Open-iSCSI...
Aug 12 21:15:44 localhost iscsid[624]: iscsid: iscsid shutting down.
Aug 12 21:15:44 localhost systemd[1]: Stopped Open-iSCSI.
```


Compared to the failure case:

```
[core@cosa-devsh ~]$ systemctl status iscsid                                                              
● iscsid.service - Open-iSCSI                                                                             
   Loaded: loaded (/usr/lib/systemd/system/iscsid.service; disabled; vendor preset: disabled)             
   Active: inactive (dead) since Wed 2020-08-12 21:15:02 UTC; 11s ago                                     
     Docs: man:iscsid(8)  
           man:iscsiuio(8)                                                                                
           man:iscsiadm(8)                                                                                
 Main PID: 623 (code=exited, status=0/SUCCESS)                                                            
   Status: "Ready to process requests"                                                                    
                                                                                                          
Aug 12 21:15:00 localhost systemd[1]: Starting Open-iSCSI...           
Aug 12 21:15:00 localhost iscsid[623]: iscsid: can't open InitiatorName configuration file /etc/iscsi/initiatorname.iscsi                                                                                            
Aug 12 21:15:00 localhost iscsid[623]: iscsid: Warning: InitiatorName file /etc/iscsi/initiatorname.iscsi does not exist or does not contain a properly formatted InitiatorName. If using software iscsi (iscsi_tcp >
Aug 12 21:15:00 localhost iscsid[623]: Example: InitiatorName=iqn.2001-04.com.redhat:fc6.
Aug 12 21:15:00 localhost iscsid[623]: If using hardware iscsi like qla4xxx this message can be ignored.
Aug 12 21:15:00 localhost iscsid[623]: iscsid: can't open InitiatorAlias configuration file /etc/iscsi/initiatorname.iscsi
Aug 12 21:15:00 localhost systemd[1]: Started Open-iSCSI.
Aug 12 21:15:02 localhost systemd[1]: Stopping Open-iSCSI...
Aug 12 21:15:02 localhost iscsid[623]: iscsid: iscsid shutting down.
Aug 12 21:15:02 localhost systemd[1]: Stopped Open-iSCSI.
```



So I suspect we need to regenerate the initramfs when the `/etc/iscsi/initiatorname.iscsi` is changed via an MC.

Additionally, we may need to change how RHCOS is configured initially so that it can look for the `/etc/iscsi/initiatorname.iscsi` file by default.  And/or when a cluster is installed and the `/etc/iscsi/initiatorname.iscsi` is provided as part of a day 1 MC/Ignition config.



I'm tenatively targeting for 4.6, but depending on the complexity of the fix, we may need to push it out to a later release.

Comment 7 Jonathan Lebon 2020-08-17 21:46:49 UTC
So I think what's going here is that we currently ship a service which unconditionally regenerates the initiator name on boot and overwrites the one from the Ignition config. Posted a fix in https://gitlab.cee.redhat.com/coreos/redhat-coreos/-/merge_requests/1098 to avoid that.

> So I suspect we need to regenerate the initramfs when the `/etc/iscsi/initiatorname.iscsi` is changed via an MC.

We should double-check with the customer, but IIUC the context for this is iSCSI-backed persistent storage. So we shouldn't need to modify the initramfs.

Comment 8 Jonathan Lebon 2020-08-18 19:52:09 UTC
Part of this bootimage bump: https://github.com/openshift/installer/pull/4066.

Comment 9 Jonathan Lebon 2020-08-21 14:47:54 UTC
Bootimage bump merged.

Comment 10 Brandon Anderson 2020-09-02 14:58:09 UTC
(In reply to Jonathan Lebon from comment #7)
> So I think what's going here is that we currently ship a service which
> unconditionally regenerates the initiator name on boot and overwrites the
> one from the Ignition config. Posted a fix in
> https://gitlab.cee.redhat.com/coreos/redhat-coreos/-/merge_requests/1098 to
> avoid that.
> 
> > So I suspect we need to regenerate the initramfs when the `/etc/iscsi/initiatorname.iscsi` is changed via an MC.
> 
> We should double-check with the customer, but IIUC the context for this is
> iSCSI-backed persistent storage. So we shouldn't need to modify the
> initramfs.

I followed up with the CU and confirmed that this is correct.

Comment 12 Micah Abbott 2020-09-19 22:20:55 UTC
Verified with 4.6.0-0.nightly-2020-09-19-004228

```
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-09-19-004228   True        False         21m     Cluster version is 4.6.0-0.nightly-2020-09-19-004228

$ oc debug node/ip-10-0-135-149.us-west-2.compute.internal -- chroot /host cat /etc/iscsi/initiatorname.iscsi
Starting pod/ip-10-0-135-149us-west-2computeinternal-debug ...                                                                                                 
To use host binaries, run `chroot /host`                                                                                                                       
InitiatorName=iqn.1994-05.com.redhat:cabdb8e83dc                                                                                                               
                                                                                                                                                               
Removing debug pod ...                                                           

$ cat bz1868174.yaml
# This MachineConfig demonstrates the minimal requirements for writing out a
# file to the node
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 99-iscsi-config
spec:
  config:
    ignition:
      version: 3.1.0
    storage:
      files:
        - contents:
            source: data:text/plain;charset=utf-8;base64,SW5pdGlhdG9yTmFtZT1pcW4uMTk5NC0wNS5jb20ubnV0YW5peDprOHMtd29ya2VyCg==
          filesystem: root
          path: /etc/iscsi/initiatorname.iscsi
          overwrite: true
          mode: 0644

$ echo "SW5pdGlhdG9yTmFtZT1pcW4uMTk5NC0wNS5jb20ubnV0YW5peDprOHMtd29ya2VyCg==" | base64 -d
InitiatorName=iqn.1994-05.com.nutanix:k8s-worker

$ oc apply -f bz1868174.yaml                                                                                                                                                                                                                       
machineconfig.machineconfiguration.openshift.io/99-iscsi-config created                                                                                        

$ oc get mc                                                                                                                                                                                                                                        
NAME                                               GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE                                            
00-master                                          c08c048584ef0bf18ab2dd88fdddd93279e1c6a1   3.1.0             6h19m                                          
00-worker                                          c08c048584ef0bf18ab2dd88fdddd93279e1c6a1   3.1.0             6h19m                                          
01-master-container-runtime                        c08c048584ef0bf18ab2dd88fdddd93279e1c6a1   3.1.0             6h19m                                                                                                                                                                                                         
01-master-kubelet                                  c08c048584ef0bf18ab2dd88fdddd93279e1c6a1   3.1.0             6h19m                                          
01-worker-container-runtime                        c08c048584ef0bf18ab2dd88fdddd93279e1c6a1   3.1.0             6h19m                                                                                                                                                                                                         
01-worker-kubelet                                  c08c048584ef0bf18ab2dd88fdddd93279e1c6a1   3.1.0             6h19m                                                                                                                                                                                                         
99-iscsi-config                                                                               3.1.0             6s                                                                                                                                                                                                            
99-master-generated-registries                     c08c048584ef0bf18ab2dd88fdddd93279e1c6a1   3.1.0             30m                                                                                                                                                                                                           
99-master-ssh                                                                                 2.2.0             6h20m                                          
99-worker-generated-registries                     c08c048584ef0bf18ab2dd88fdddd93279e1c6a1   3.1.0             30m                                                                                                                                                                                                           
99-worker-ssh                                                                                 2.2.0             6h20m                                                                                                                                                                                                         
rendered-master-474cb317eef983b8dcfaabf37c40885f   b5ab0fbd69c492007e09455099fd3f3a79d3433b   2.2.0             3h16m                                                                                                                                                                                                         
rendered-master-54e984e6208d40d139d8bff323169dbf   55f7317224e7d8badc98879662771a14185e5739   2.2.0             4h10m                                          
rendered-master-8efa482986f14061666a1c1b38bdd393   204f5642bd3adbfa5c85e74958223fb0cf8ad2db   2.2.0             4h53m                                                                                                                                                                                                         
rendered-master-90a25e29ca44440df01724fdf78b8d42   f56d736e74af8fb0dc85c4b1ee3cc8d1d1f6600b   2.2.0             6h19m                                                                                                                                                                                                         
rendered-master-94c0c10ef6d26ac79f83f26dddb2886b   08aad1925d6e29266390ecb6f4e6730d60e44aaf   2.2.0             113m                                                                                                                                                                                                          
rendered-master-b383db8c79b8e1c414be6000fefe4a4e   08aad1925d6e29266390ecb6f4e6730d60e44aaf   2.2.0             38m                                                                                                                                                                                                           
rendered-master-ea3d50e6005eb46bd31d38c7af0e1890   c08c048584ef0bf18ab2dd88fdddd93279e1c6a1   3.1.0             30m                                                                                                                                                                                                           
rendered-worker-0617f5423aec0a1738e3732db8440a1e   204f5642bd3adbfa5c85e74958223fb0cf8ad2db   2.2.0             4h53m                                                                                                                                                                                                         
rendered-worker-0aee5a3cb624b11e90a5265912b23e1a   55f7317224e7d8badc98879662771a14185e5739   2.2.0             4h10m                                                                                                                                                                                                         
rendered-worker-63fb5d49df2b0a460bcbd707a0296441   c08c048584ef0bf18ab2dd88fdddd93279e1c6a1   3.1.0             1s                                                                                                                                                                                                            
rendered-worker-6bc097f3ef8898db9837a3d4399122cf   08aad1925d6e29266390ecb6f4e6730d60e44aaf   2.2.0             113m                                                                                                                                                                                                          
rendered-worker-7966a5c09dee72a06c3069ebf3b7d66c   f56d736e74af8fb0dc85c4b1ee3cc8d1d1f6600b   2.2.0             6h19m                                                                                                                                                                                                         
rendered-worker-89b4196319e54095cad4b003daa21e96   c08c048584ef0bf18ab2dd88fdddd93279e1c6a1   3.1.0             30m                                                                                                                                                                                                           
rendered-worker-d406aa7432afacbe2b7243ef8e7d40e4   b5ab0fbd69c492007e09455099fd3f3a79d3433b   2.2.0             3h16m    

$ oc debug node/ip-10-0-135-149.us-west-2.compute.internal -- chroot /host cat /etc/iscsi/initiatorname.iscsi
Starting pod/ip-10-0-135-149us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
InitiatorName=iqn.1994-05.com.nutanix:k8s-worker

Removing debug pod ...
```

Comment 14 errata-xmlrpc 2020-10-27 16:27:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196