Bug 1687722

Summary: Fail to use authentication-enabled iSCSI on OCP 4.1
Product: OpenShift Container Platform
Component: RHCOS
Version: 4.1.0
Target Release: 4.1.0
Hardware: Unspecified
OS: Unspecified
Severity: medium
Priority: medium
Status: CLOSED ERRATA
Reporter: Liang Xia <lxia>
Assignee: Steve Milner <smilner>
QA Contact: Liang Xia <lxia>
CC: aos-bugs, aos-storage-staff, bbreard, dustymabe, imcleod, jligon, jsafrane, nstielau, walters, wsun
Doc Type: No Doc Update
Type: Bug
Last Closed: 2019-06-04 10:45:33 UTC
Bug Blocks: 1680012

Description Liang Xia 2019-03-12 08:53:37 UTC
Description of problem:
Trying to use iSCSI on OCP 4.0 with authentication enabled.
As the node OS is immutable, we cannot configure the initiator name on the nodes.
A pod with an iSCSI volume fails with the error below:
  Warning  FailedMount             10s (x8 over 74s)  kubelet, ip-10-0-136-148.us-east-2.compute.internal  MountVolume.WaitForAttach failed for volume "pv-iscsi-9k5fn" : failed to get any path for iscsi disk, last err seen:
iscsi: failed to attach disk: Error: iscsiadm: Could not login to [iface: default, target: iqn.2016-04.test.com:storage.target00, portal: 172.30.187.199,3260].
iscsiadm: initiator reported error (24 - iSCSI login failed due to authorization failure)
iscsiadm: Could not log into all portals
Logging in to [iface: default, target: iqn.2016-04.test.com:storage.target00, portal: 172.30.187.199,3260] (multiple)
 (exit status 24)


Version-Release number of selected component (if applicable):
4.0.0-0.alpha-2019-03-12-005310


How reproducible:
Always

Steps to Reproduce:
1. Create an iSCSI server with authentication enabled, e.g. CHAP.
2. Create an iSCSI PV (see the sketch after this list).
3. As a user, create a pod with the iSCSI volume.
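
For step 2, a minimal sketch of a CHAP-enabled PV plus its credentials secret (the object names, file name, and credential values here are made up; the IQN and portal are taken from the logs above), created with `oc create -f pv-iscsi.yaml`:

    # pv-iscsi.yaml (hypothetical file name)
    apiVersion: v1
    kind: Secret
    metadata:
      name: chap-secret
    type: kubernetes.io/iscsi-chap
    stringData:
      node.session.auth.username: demo-user      # made-up CHAP credentials
      node.session.auth.password: demo-password
    ---
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: pv-iscsi-9k5fn
    spec:
      capacity:
        storage: 1Gi
      accessModes:
        - ReadWriteOnce
      iscsi:
        targetPortal: 172.30.187.199:3260
        iqn: iqn.2016-04.test.com:storage.target00
        lun: 0
        fsType: ext4
        chapAuthSession: true      # session CHAP, authenticated via the secret above
        secretRef:
          name: chap-secret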

Actual results:
$ oc describe pod iscsi-9k5fn 
Name:               iscsi-9k5fn
Namespace:          9k5fn
Priority:           0
PriorityClassName:  <none>
Node:               ip-10-0-136-148.us-east-2.compute.internal/10.0.136.148
Start Time:         Tue, 12 Mar 2019 14:13:53 +0800
Labels:             name=iscsi
Annotations:        openshift.io/scc: node-exporter
Status:             Pending
IP:                 
Containers:
  iscsi:
    Container ID:   
    Image:          jhou/hello-openshift
    Image ID:       
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /mnt/iscsi from iscsi (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-v44mf (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  iscsi:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  pvc-iscsi-9k5fn
    ReadOnly:   false
  default-token-v44mf:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-v44mf
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                  Age                From                                                 Message
  ----     ------                  ----               ----                                                 -------
  Normal   Scheduled               79s                default-scheduler                                    Successfully assigned 9k5fn/iscsi-9k5fn to ip-10-0-136-148.us-east-2.compute.internal
  Normal   SuccessfulAttachVolume  79s                attachdetach-controller                              AttachVolume.Attach succeeded for volume "pv-iscsi-9k5fn"
  Warning  FailedMount             10s (x8 over 74s)  kubelet, ip-10-0-136-148.us-east-2.compute.internal  MountVolume.WaitForAttach failed for volume "pv-iscsi-9k5fn" : failed to get any path for iscsi disk, last err seen:
iscsi: failed to attach disk: Error: iscsiadm: Could not login to [iface: default, target: iqn.2016-04.test.com:storage.target00, portal: 172.30.187.199,3260].
iscsiadm: initiator reported error (24 - iSCSI login failed due to authorization failure)
iscsiadm: Could not log into all portals
Logging in to [iface: default, target: iqn.2016-04.test.com:storage.target00, portal: 172.30.187.199,3260] (multiple)
 (exit status 24)


Expected results:
Document how iSCSI can work with OCP 4.0, e.g. whether authentication should be used or not.

Comment 1 Jan Safranek 2019-03-12 15:39:28 UTC
You are right, the operator had two issues:

* In Kubernetes 1.12, dynamic driver registration is enabled and the operator must start driver registrar with --kubelet-registration-path. This has been fixed in https://github.com/openshift/csi-operator/pull/44/files and I checked that it's available in today's OKD repository (registry.svc.ci.openshift.org/openshift/origin-v4.0:csi-operator)

* For 1.12 a new hostpath driver is required, see https://github.com/openshift/csi-operator/pull/47

Comment 2 Jan Safranek 2019-03-12 15:40:20 UTC
Oops, wrong bug, please ignore comment #1.

Comment 3 Matthew Wong 2019-03-12 18:54:42 UTC
I edited /etc/iscsi/initiatorname.iscsi and then did `systemctl restart iscsid`, and it worked for me. @lxia, does that work? Do we need to document it on the OpenShift side, or is it more of an iSCSI-specific issue (i.e., the admin needs to set up ACLs correctly)?
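
For reference, a sketch of that manual workaround (the IQN value is made up; it has to be one the target's ACL accepts):

    # on each affected node
    echo "InitiatorName=iqn.1994-05.com.redhat:allowed-by-acl" | sudo tee /etc/iscsi/initiatorname.iscsi
    sudo systemctl restart iscsid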

Comment 4 Liang Xia 2019-03-13 06:34:52 UTC
In 4.0, nodes are dynamically provisioned and removed, so manual configuration on the nodes is not acceptable.

Comment 5 Matthew Wong 2019-03-13 18:54:58 UTC
The initiator name is the same for all nodes with the same OS image version, so something (an operator? a node post-provision script?) will need to generate an initiator name for every node (iscsi-iname). The admin will then need to periodically find out the initiator name of every node and keep their iSCSI ACLs updated, so the "operator" may also need to write node-to-initiator-name mappings to an OpenShift object for the admin to parse. Without a complex solution like this, I don't see a way to avoid manually ssh-ing into each node (to either read or write /etc/iscsi/initiatorname.iscsi). We need some more input to figure out a solution. In 3.x, as far as I can tell, openshift-ansible installed iscsi-initiator-utils and then did nothing, which was fine since configuration is a one-time thing.

Comment 6 Matthew Wong 2019-03-13 19:55:25 UTC
We could set the unique part of the initiator name to equal the node name maybe?
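
A sketch of that idea (illustration only; iqn.1994-05.com.redhat is the usual Red Hat IQN prefix, and something would still have to run this once per node):

    # run once per node, deriving the unique suffix from the node's host name
    echo "InitiatorName=iqn.1994-05.com.redhat:$(hostname -s)" | sudo tee /etc/iscsi/initiatorname.iscsi
    sudo systemctl restart iscsid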

Comment 7 Jan Safranek 2019-03-25 11:25:43 UTC
/etc/iscsi/initiatorname.iscsi is created by the iscsi-initiator-utils RPM package during %post. It is then baked into RHCOS images, so every VM has the same initiator name. That's the root cause of the bug: initiatorname.iscsi should be unique on each host.

RHCOS should ship images without /etc/iscsi/initiatorname.iscsi and generate a new one during the first boot. It's quite simple; from the iscsi-initiator-utils %post script:

        if [ ! -f %{_sysconfdir}/iscsi/initiatorname.iscsi ]; then
                echo "InitiatorName=`/usr/sbin/iscsi-iname`" > %{_sysconfdir}/iscsi/initiatorname.iscsi
        fi

(/usr/sbin/iscsi-iname is already installed in current RHCOS 8.)
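
On an immutable host, one way to run that logic at first boot would be a oneshot systemd unit along these lines (a sketch only; the unit name is made up, and this is not necessarily how RHCOS implemented the fix):

    # /etc/systemd/system/iscsi-initiatorname-generate.service (hypothetical)
    [Unit]
    Description=Generate a unique iSCSI initiator name on first boot
    ConditionPathExists=!/etc/iscsi/initiatorname.iscsi
    Before=iscsid.service iscsid.socket

    [Service]
    Type=oneshot
    ExecStart=/bin/sh -c 'echo "InitiatorName=`/usr/sbin/iscsi-iname`" > /etc/iscsi/initiatorname.iscsi'

    [Install]
    WantedBy=multi-user.target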

Comment 8 Steve Milner 2019-03-26 20:18:44 UTC
I'm taking a quick look at the package to see where the disconnect is.

Comment 12 Colin Walters 2019-03-27 15:35:48 UTC
See https://bugzilla.redhat.com/show_bug.cgi?id=1493294
which links to https://bugzilla.redhat.com/show_bug.cgi?id=1493296

Ideally we fix this upstream - a comment from the maintainer would be nice.

Comment 22 Liang Xia 2019-04-19 01:08:26 UTC
Checked with payload 4.1.0-0.nightly-2019-04-18-210657 (Red Hat Enterprise Linux CoreOS 410.8.20190417.0).

Verified that the nodes are using different initiator names:
[core@ip-172-31-136-154 ~]$ cat /etc/iscsi/initiatorname.iscsi 
InitiatorName=iqn.1994-05.com.redhat:ecba29bf977

[core@ip-172-31-136-71 ~]$ cat /etc/iscsi/initiatorname.iscsi 
InitiatorName=iqn.1994-05.com.redhat:aee4174ca864


Also verified that the iSCSI volume is working fine.

Comment 24 errata-xmlrpc 2019-06-04 10:45:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758