Description of problem:
Trying to use iSCSI on OCP 4.0 with authentication enabled. As the nodes are immutable, we cannot configure the initiator on them. A pod with an iSCSI volume fails with the error below:

Warning  FailedMount  10s (x8 over 74s)  kubelet, ip-10-0-136-148.us-east-2.compute.internal  MountVolume.WaitForAttach failed for volume "pv-iscsi-9k5fn" : failed to get any path for iscsi disk, last err seen:
iscsi: failed to attach disk: Error: iscsiadm: Could not login to [iface: default, target: iqn.2016-04.test.com:storage.target00, portal: 172.30.187.199,3260].
iscsiadm: initiator reported error (24 - iSCSI login failed due to authorization failure)
iscsiadm: Could not log into all portals
Logging in to [iface: default, target: iqn.2016-04.test.com:storage.target00, portal: 172.30.187.199,3260] (multiple) (exit status 24)

Version-Release number of selected component (if applicable):
4.0.0-0.alpha-2019-03-12-005310

How reproducible:
Always

Steps to Reproduce:
1. Create an iSCSI server with authentication enabled, e.g. CHAP.
2. Create an iSCSI PV (a sketch follows below).
3. Create a pod with the iSCSI volume.

Actual results:
$ oc describe pod iscsi-9k5fn
Name:               iscsi-9k5fn
Namespace:          9k5fn
Priority:           0
PriorityClassName:  <none>
Node:               ip-10-0-136-148.us-east-2.compute.internal/10.0.136.148
Start Time:         Tue, 12 Mar 2019 14:13:53 +0800
Labels:             name=iscsi
Annotations:        openshift.io/scc: node-exporter
Status:             Pending
IP:
Containers:
  iscsi:
    Container ID:
    Image:          jhou/hello-openshift
    Image ID:
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /mnt/iscsi from iscsi (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-v44mf (ro)
Conditions:
  Type             Status
  Initialized      True
  Ready            False
  ContainersReady  False
  PodScheduled     True
Volumes:
  iscsi:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  pvc-iscsi-9k5fn
    ReadOnly:   false
  default-token-v44mf:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-v44mf
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                  Age                From                                                 Message
  ----     ------                  ----               ----                                                 -------
  Normal   Scheduled               79s                default-scheduler                                    Successfully assigned 9k5fn/iscsi-9k5fn to ip-10-0-136-148.us-east-2.compute.internal
  Normal   SuccessfulAttachVolume  79s                attachdetach-controller                              AttachVolume.Attach succeeded for volume "pv-iscsi-9k5fn"
  Warning  FailedMount             10s (x8 over 74s)  kubelet, ip-10-0-136-148.us-east-2.compute.internal  MountVolume.WaitForAttach failed for volume "pv-iscsi-9k5fn" : failed to get any path for iscsi disk, last err seen:
iscsi: failed to attach disk: Error: iscsiadm: Could not login to [iface: default, target: iqn.2016-04.test.com:storage.target00, portal: 172.30.187.199,3260].
iscsiadm: initiator reported error (24 - iSCSI login failed due to authorization failure)
iscsiadm: Could not log into all portals
Logging in to [iface: default, target: iqn.2016-04.test.com:storage.target00, portal: 172.30.187.199,3260] (multiple) (exit status 24)

Expected results:
Document how iSCSI can work with OCP 4.0, e.g. whether authentication should be used or not.
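Additional info:
For step 2, a minimal sketch of the kind of PV plus CHAP secret involved. The IQN and portal are the ones from this report; the secret/PV names and credentials are placeholders:

oc create -f - <<'EOF'
apiVersion: v1
kind: Secret
metadata:
  name: chap-secret                           # illustrative name
type: kubernetes.io/iscsi-chap
stringData:
  node.session.auth.username: demo-user       # placeholder CHAP credentials
  node.session.auth.password: demo-password
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-iscsi-9k5fn
spec:
  capacity:
    storage: 1Gi
  accessModes: ["ReadWriteOnce"]
  iscsi:
    targetPortal: 172.30.187.199:3260
    iqn: iqn.2016-04.test.com:storage.target00
    lun: 0
    fsType: ext4
    chapAuthSession: true                     # session CHAP; credentials come from the secret above
    secretRef:
      name: chap-secret
EOF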
You are right, the operator had two issues:

* In Kubernetes 1.12, dynamic driver registration is enabled and the operator must start the driver registrar with --kubelet-registration-path. This has been fixed in https://github.com/openshift/csi-operator/pull/44/files and I checked that it's available in today's OKD repository (registry.svc.ci.openshift.org/openshift/origin-v4.0:csi-operator).

* For 1.12 a new hostpath driver is required, see https://github.com/openshift/csi-operator/pull/47.
Oops, wrong bug, please ignore comment #1.
I edited /etc/iscsi/initiatorname.iscsi, then did `systemctl restart iscsid`, and it worked for me. @lxia, does that work for you? Do we need to document it on the OpenShift side, or is it more of an iSCSI-specific issue (i.e. the admin needs to set up ACLs correctly)?
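For the record, a sketch of that workaround, run as root on the node. The initiator name here is made up; it has to match an ACL configured on the target:

echo "InitiatorName=iqn.2016-04.test.com:node1" > /etc/iscsi/initiatorname.iscsi
systemctl restart iscsid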
In 4.0, nodes are dynamically provisioned and removed, so manual configuration on nodes is not acceptable.
The initiator name is the same for all nodes with the same OS image version, so something (an operator? the node post-provision script?) will need to generate an initiator name for every node (iscsi-iname). Then the admin will need to periodically find out what the initiator name of every node is and keep their iSCSI ACLs updated, so maybe the "operator" will need to write node:initiator-name mappings to an OpenShift object for the admin to parse.

Without a complex solution like this, I don't see a way to avoid manually ssh'ing into the node (to either read or write /etc/iscsi/initiatorname.iscsi); see the sketch below for what the admin would have to script today. Need some more input to figure out a solution.

In 3.x, as far as I can tell, openshift-ansible installed iscsi-initiator-utils and then did nothing, which is fine since configuration is a one-time thing.
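A rough sketch of collecting the node:initiator-name mapping by hand, assuming `oc debug` works against the nodes (output parsing is illustrative only):

for node in $(oc get nodes -o name); do
  echo -n "$node: "
  oc debug "$node" -- chroot /host cat /etc/iscsi/initiatorname.iscsi 2>/dev/null
done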
Maybe we could set the unique part of the initiator name to equal the node name?
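I.e. something along these lines on each node (the IQN prefix is illustrative):

echo "InitiatorName=iqn.1994-05.com.redhat:$(hostname -s)" > /etc/iscsi/initiatorname.iscsi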
/etc/iscsi/initiatorname.iscsi is created by the iscsi-initiator-utils RPM package during %post. It is then baked into RHCOS images, and every VM then has the same initiator name. That's the root cause of the bug - initiatorname.iscsi should be unique on each host.

RHCOS should ship images without /etc/iscsi/initiatorname.iscsi and then generate a new one during the first boot. It's quite simple, from the iscsi-initiator-utils %post script:

if [ ! -f %{_sysconfdir}/iscsi/initiatorname.iscsi ]; then
    echo "InitiatorName=`/usr/sbin/iscsi-iname`" > %{_sysconfdir}/iscsi/initiatorname.iscsi
fi

(/usr/sbin/iscsi-iname is installed in current RHCOS8.)
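One possible way to wire that first-boot generation up, a minimal sketch only - the unit name is made up, and the real fix may well live in the RPM %post or the image build instead:

cat > /etc/systemd/system/iscsi-initiatorname.service <<'UNIT'
[Unit]
Description=Generate a unique iSCSI initiator name on first boot
ConditionPathExists=!/etc/iscsi/initiatorname.iscsi
Before=iscsid.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/sh -c 'echo "InitiatorName=$(/usr/sbin/iscsi-iname)" > /etc/iscsi/initiatorname.iscsi'

[Install]
WantedBy=multi-user.target
UNIT
systemctl enable iscsi-initiatorname.service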
I'm taking a quick look at the package to see where the disconnect is.
See https://bugzilla.redhat.com/show_bug.cgi?id=1493294, which links to https://bugzilla.redhat.com/show_bug.cgi?id=1493296. Ideally we fix this upstream - a comment from the maintainer would be nice.
Checked with payload 4.1.0-0.nightly-2019-04-18-210657 (with Red Hat Enterprise Linux CoreOS 410.8.20190417.0). Verified that the nodes are using different initiator names:

[core@ip-172-31-136-154 ~]$ cat /etc/iscsi/initiatorname.iscsi
InitiatorName=iqn.1994-05.com.redhat:ecba29bf977

[core@ip-172-31-136-71 ~]$ cat /etc/iscsi/initiatorname.iscsi
InitiatorName=iqn.1994-05.com.redhat:aee4174ca864

Also verified that the iSCSI volume is working fine.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0758